Instructions to use inclusionAI/Ling-2.6-1T with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use inclusionAI/Ling-2.6-1T with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="inclusionAI/Ling-2.6-1T", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("inclusionAI/Ling-2.6-1T", trust_remote_code=True, dtype="auto")
```
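The "load model directly" snippet stops once the weights are loaded. As a rough sketch of the next step, the function below pairs the model with its tokenizer and generates one reply. The helper names `build_messages` and `generate_reply` are hypothetical. The sketch assumes the repository's tokenizer ships a chat template, that `accelerate` is installed so `device_map="auto"` works, and that the available hardware can actually hold a model of this size.

```python
def build_messages(prompt):
    """Wrap one user prompt in the chat-message format used by the pipeline example."""
    return [{"role": "user", "content": prompt}]


def generate_reply(prompt, model_id="inclusionAI/Ling-2.6-1T", max_new_tokens=128):
    """Load tokenizer and model, then return the decoded reply to a single prompt."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        dtype="auto",
        device_map="auto",  # assumption: accelerate is installed and the weights fit
    )
    # Assumption: the tokenizer defines a chat template for this model.
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt),
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("Who are you?")` would download the weights on first use, so it is only practical on a machine sized for the model.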

Local Apps

vLLM

How to use inclusionAI/Ling-2.6-1T with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "inclusionAI/Ling-2.6-1T"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "inclusionAI/Ling-2.6-1T",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
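The same request can be sent from Python without extra dependencies. The sketch below mirrors the curl call using only the standard library. The helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` structure.

```python
import json
import urllib.request


def build_chat_request(prompt, model="inclusionAI/Ling-2.6-1T",
                       base_url="http://localhost:8000"):
    """Build the same POST request as the curl command, without sending it."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the reply as a string. Passing a different `base_url` points the same client at any other OpenAI-compatible endpoint.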

Use Docker

```
docker model run hf.co/inclusionAI/Ling-2.6-1T
```

SGLang

How to use inclusionAI/Ling-2.6-1T with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "inclusionAI/Ling-2.6-1T" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "inclusionAI/Ling-2.6-1T",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "inclusionAI/Ling-2.6-1T" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "inclusionAI/Ling-2.6-1T",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
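The curl call only succeeds once the server has finished loading the model, which for a checkpoint this large may take a while. The sketch below polls the server until it responds. `wait_until_ready` is a hypothetical helper, and the default probe assumes the server exposes the OpenAI-style `/v1/models` route on the port used above. The `probe` parameter exists so a different readiness check can be swapped in.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url="http://localhost:30000", timeout_s=600.0,
                     poll_interval_s=5.0, probe=None):
    """Poll until the probe succeeds; return False if timeout_s elapses first."""
    def default_probe():
        # Assumption: the server exposes the OpenAI-style /v1/models listing.
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            # Connection refused or reset: the server is not up yet.
            return False

    probe = probe or default_probe
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

A script can call `wait_until_ready()` right after starting the container and send the first chat request only when it returns `True`.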

Docker Model Runner
How to use inclusionAI/Ling-2.6-1T with Docker Model Runner:
```
docker model run hf.co/inclusionAI/Ling-2.6-1T
```

Can't install inclusionAI/Ling-2.6-1T

by chovyfu - opened 18 days ago

Discussion

chovyfu

18 days ago

inclusionAI/Ling-2.6-1T

ollama exited 1: pulling manifest
Error: pull model manifest: file does not exist

Akicou

14 days ago

Ollama only works with models in GGUF format, and this isn't one. As far as I know, llama.cpp doesn't support it either. Use vLLM or SGLang for inference instead, if you have the hardware.
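One way to check this before pulling is to list the repository's files and look for the `.gguf` extension. A minimal sketch, assuming the `huggingface_hub` package is installed; `gguf_files` and `repo_has_gguf` are hypothetical helper names:

```python
def gguf_files(filenames):
    """Return the file names that end in .gguf (case-insensitive)."""
    return [name for name in filenames if name.lower().endswith(".gguf")]


def repo_has_gguf(repo_id):
    """Return True if the Hub repository lists at least one .gguf file."""
    # Imported lazily so gguf_files works without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return bool(gguf_files(list_repo_files(repo_id)))
```

`repo_has_gguf("inclusionAI/Ling-2.6-1T")` queries the Hub for the file list. A `False` result means the repository has no GGUF weights to pull.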

m1ngcheng

inclusionAI org 13 days ago

@chovyfu, has your issue been resolved? If so, I will close this discussion.

chovyfu

13 days ago

OK, thanks. I haven't tried vLLM before. I'll take a look. Is it like Ollama?

Akicou

13 days ago

> OK, thanks. I haven't tried vLLM before. I'll take a look. Is it like Ollama?

It's a bit harder to get used to. The repo is https://github.com/vllm-project/vllm, and there are plenty of YouTube tutorials out there.
