Instructions to use Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
docker model run hf.co/Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL
- SGLang
How to use Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL with Docker Model Runner:
docker model run hf.co/Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL
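The vLLM and SGLang servers above both expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can also be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `ask` are illustrative and not part of any library, and the response is assumed to follow the usual OpenAI chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request as the curl examples (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send the request to a running server and return the reply text.

    Assumes an OpenAI-style response body; adjust if your server differs.
    """
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(ask("What is the capital of France?"))                    # vLLM, port 8000
# print(ask("What is the capital of France?", "http://localhost:30000"))  # SGLang
```

Port 8000 matches the vLLM example and port 30000 the SGLang example; pass whichever base URL your server listens on.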
Responsible AI Considerations for the Phi3stran Models
Like other language models, the Phi series can potentially exhibit behaviors that are unfair, unreliable, or offensive. It’s important to be aware of some limiting behaviors:
Quality of Service: The Phi models are primarily trained on Italian text. Performance may degrade for languages other than Italian.
Representation of Harms & Perpetuation of Stereotypes: These models can over- or under-represent certain groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite post-training safety measures, these limitations may persist due to varying levels of representation of different groups or the prevalence of negative stereotypes in the training data that reflect real-world patterns and societal biases.
Inappropriate or Offensive Content: The models may generate content that is inappropriate or offensive, which could make them unsuitable for deployment in sensitive contexts without additional, use-case-specific mitigations.
Information Reliability: Language models can produce nonsensical or fabricated content that may seem plausible but is inaccurate or outdated.
Limited Scope for Code: The majority of Phi-3 training data is based on Python and utilizes common packages such as “typing, math, random, collections, datetime, itertools”. If the model generates Python scripts that use other packages or scripts in other languages, manual verification of all API uses is strongly recommended.
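One way to support the manual verification recommended above is to list which packages a generated script imports and flag any that fall outside the common set named in this card. The sketch below is only an aid for deciding where to look. It does not verify that any API is used correctly, and the function name `imports_needing_review` is illustrative.

```python
import ast

# Packages this card names as well covered by the training data.
COMMON_PACKAGES = {"typing", "math", "random", "collections", "datetime", "itertools"}


def imports_needing_review(source: str) -> set[str]:
    """Return top-level package names imported by `source` that are not in COMMON_PACKAGES."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b.c` -> top-level package "a"
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from a.b import c` -> "a"; relative imports (level > 0) are skipped
            found.add(node.module.split(".")[0])
    return found - COMMON_PACKAGES


generated = "import math\nimport requests\nfrom collections import Counter\nfrom os.path import join\n"
print(sorted(imports_needing_review(generated)))
```

Any package reported here is a hint that the surrounding calls deserve a closer look against that package's documentation.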
Developers should employ responsible AI best practices and ensure compliance with relevant laws and regulations (e.g., privacy, trade, etc.) for their specific use cases.
Model in Test: Continuous improvements are being made to the model.
Please note that the responses from the model should not be regarded as absolute truths.
Prompt Template:
Use the Phi 3 model preset.
```
<|system|>
{system_prompt}.<|end|>
<|user|>
{prompt}<|end|>
<|assistant|>
```
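When calling the model without a chat preset, the template above can be filled in by hand. The sketch below assumes that each role tag is followed by a newline and each `<|end|>` closes a turn, as in the upstream Phi-3 chat format. The function name `build_phi3_prompt` is illustrative. The period shown after `{system_prompt}` in the template is treated as part of the system text and left to the caller.

```python
def build_phi3_prompt(user_prompt: str, system_prompt: str = "") -> str:
    """Fill the Phi 3 prompt template (assumed layout: newline after each role tag)."""
    parts = []
    if system_prompt:
        # The system turn is optional; skip it when no system prompt is given.
        parts.append(f"<|system|>\n{system_prompt}<|end|>\n")
    parts.append(f"<|user|>\n{user_prompt}<|end|>\n")
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


print(build_phi3_prompt("Chi sei?", "Sei un assistente utile."))
```

With Transformers, `tokenizer.apply_chat_template` (shown earlier in this card) builds the prompt from the tokenizer's own template and is usually the safer choice.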
Downloading and running the models
You can download the individual files from the Files & versions section.
| Quant type | Download |
|---|---|
| Q5_K_M | PHI3STRAN-GGUF here |
How to Download GGUF Files Manually?
Note for manual downloaders: the following clients download models automatically and provide a list of available models to choose from:
- LM Studio
Use PHI3 config.preset
Credits & License
The license of the smashed model follows the license of the original model. Please check the license of the original model, which provided the base for this model, before using it.
Model tree for Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL
Base model
microsoft/Phi-3-mini-128k-instruct