Instructions for using Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL with libraries and local apps.

Libraries

How to use Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Build the chat prompt and move the tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

```
docker model run hf.co/Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL
```

SGLang

How to use Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
# Start the SGLang server in a container (replace <secret> with your Hugging Face token):
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
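
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat`, the `max_tokens` value, and the timeout are illustrative assumptions, not part of the model card. It assumes one of the servers above is already running locally.

```python
import json
import urllib.request

MODEL_ID = "Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL"


def build_chat_request(prompt, base_url="http://localhost:8000", max_tokens=128):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # assumed cap on reply length, adjust as needed
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the vLLM server the default `base_url` applies; for the SGLang server started on port 30000, call `chat("Qual è la capitale della Francia?", base_url="http://localhost:30000")`.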

Docker Model Runner
How to use Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL with Docker Model Runner:
```
docker model run hf.co/Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Responsible AI Considerations for the Phi3stran Models

Like other language models, the Phi series can potentially exhibit behaviors that are unfair, unreliable, or offensive. It’s important to be aware of some limiting behaviors:

Quality of Service: The Phi models are primarily trained on Italian text. Performance may degrade for languages other than Italian.

Representation of Harms & Perpetuation of Stereotypes: These models can over- or under-represent certain groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite post-training safety measures, these limitations may persist due to varying levels of representation of different groups or the prevalence of negative stereotypes in the training data that reflect real-world patterns and societal biases.

Inappropriate or Offensive Content: The models may generate content that is inappropriate or offensive, which could make them unsuitable for deployment in sensitive contexts without additional, use-case-specific mitigations.

Information Reliability: Language models can produce nonsensical or fabricated content that may seem plausible but is inaccurate or outdated.

Limited Scope for Code: The majority of Phi-3 training data is based on Python and utilizes common packages such as “typing, math, random, collections, datetime, itertools”. If the model generates Python scripts that use other packages or scripts in other languages, manual verification of all API uses is strongly recommended.
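
One way to apply the recommendation above is to flag generated scripts that import anything outside the listed packages, so a person knows which API uses to check by hand. This is a minimal sketch, not a safety guarantee: the function name `imports_needing_review` and the sample script are hypothetical, and the check only inspects import statements.

```python
import ast

# Packages the note above names as common in the Phi-3 training data.
WELL_COVERED = {"typing", "math", "random", "collections", "datetime", "itertools"}


def imports_needing_review(source):
    """Return the sorted top-level packages imported by `source` that are not in WELL_COVERED."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from package.sub import name" statements only
            found.add(node.module.split(".")[0])
    return sorted(found - WELL_COVERED)


# Hypothetical model output used for illustration
generated = (
    "import math\n"
    "import numpy as np\n"
    "from collections import Counter\n"
    "from requests.adapters import HTTPAdapter\n"
)
print(imports_needing_review(generated))  # ['numpy', 'requests']
```

An empty result does not mean the script is correct; it only means no import falls outside the listed packages.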

Developers should employ responsible AI best practices and ensure compliance with relevant laws and regulations (e.g., privacy, trade, etc.) for their specific use cases.

Model in Test: Continuous improvements are being made to the model.

Please note that the responses from the model should not be regarded as absolute truths.

Prompt Template:

Use the Phi 3 model preset with the following prompt template:

<|system|> {system_prompt}.<|end|> <|user|> {prompt}<|end|> <|assistant|>
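
For clients that expect a raw prompt string, the template above can be filled in with plain string formatting. This is a minimal sketch: the function name `format_prompt` and the sample messages are hypothetical, and the template string is copied verbatim from the line above, including the period after the system prompt.

```python
# Template copied verbatim from the model card
PHI3_TEMPLATE = "<|system|> {system_prompt}.<|end|> <|user|> {prompt}<|end|> <|assistant|>"


def format_prompt(prompt, system_prompt):
    """Fill the template with a system prompt and a single user message."""
    return PHI3_TEMPLATE.format(system_prompt=system_prompt, prompt=prompt)


print(format_prompt("Chi sei?", "Sei un assistente utile"))
# <|system|> Sei un assistente utile.<|end|> <|user|> Chi sei?<|end|> <|assistant|>
```

When using Transformers, `tokenizer.apply_chat_template` builds the prompt from the tokenizer's own chat template, so manual formatting like this is mainly useful for tools that take a plain string.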

Downloading and running the models

You can download the individual files from the Files & versions section.

| Quant type | Download |
| --- | --- |
| Q5_K_M | PHI3STRAN-GGUF here |

How to Download GGUF Files Manually?

Note for Manual Downloaders:

The following clients will automatically download models for you, providing a list of available models to choose from:

LM Studio

Use PHI3 config.preset

Credits & License

This model follows the license of the original model. Please check the license of the original (base) model before using this model.

Safetensors

Model size: 4B params
Tensor type: BF16

Model tree for Antonio88/TaliML-PHI3-128K-ITA-V.1.0.FINAL

Base model: microsoft/Phi-3-mini-128k-instruct
Finetuned (44): this model
Merges: 1 model
Quantizations: 2 models