Instructions for using huihui-ai/Huihui3.5-67B-A3B with libraries and local apps.

Libraries

How to use huihui-ai/Huihui3.5-67B-A3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="huihui-ai/Huihui3.5-67B-A3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("huihui-ai/Huihui3.5-67B-A3B")
model = AutoModelForImageTextToText.from_pretrained("huihui-ai/Huihui3.5-67B-A3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use huihui-ai/Huihui3.5-67B-A3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "huihui-ai/Huihui3.5-67B-A3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "huihui-ai/Huihui3.5-67B-A3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
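
The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_payload` and `chat` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request

MODEL_ID = "huihui-ai/Huihui3.5-67B-A3B"
IMAGE_URL = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"


def build_payload(prompt, image_url, model=MODEL_ID):
    """Build the same chat-completions request body as the curl example."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload, base_url="http://localhost:8000"):
    """POST the payload to a running server and return the reply text.

    Assumes an OpenAI-style response body; adjust the parsing if yours differs.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the payload needs no server; it only prints the request body.
payload = build_payload("Describe this image in one sentence.", IMAGE_URL)
print(json.dumps(payload, indent=2))
```

Once the server is running, `chat(payload)` sends the request. Pass `base_url="http://localhost:30000"` to target a server listening on port 30000 instead.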

Use Docker

docker model run hf.co/huihui-ai/Huihui3.5-67B-A3B

SGLang

How to use huihui-ai/Huihui3.5-67B-A3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "huihui-ai/Huihui3.5-67B-A3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "huihui-ai/Huihui3.5-67B-A3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "huihui-ai/Huihui3.5-67B-A3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "huihui-ai/Huihui3.5-67B-A3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use huihui-ai/Huihui3.5-67B-A3B with Docker Model Runner:
```
docker model run hf.co/huihui-ai/Huihui3.5-67B-A3B
```

huihui-ai/Huihui3.5-67B-A3B

Model Overview

huihui-ai/Huihui3.5-67B-A3B is a Mixture of Experts (MoE) language model developed by huihui.ai, built upon the Qwen/Qwen3.5-35B-A3B base model. It enhances the standard Transformer architecture by replacing MLP layers with MoE layers, each containing 512 experts, to achieve high performance with efficient inference. The model is designed for natural language processing tasks, including image-text-to-text generation, question answering, and conversational applications.

Note: huihui-ai/Huihui3.5-67B-A3B is not an ablated model.

This release is only a test. It explores another possibility: merging different variants of models of the same type.

Architecture: Qwen3_5MoeForConditionalGeneration model with 512 experts per layer, activating 8 experts per token.
Total Parameters: ~67 billion (67B)
Activated Parameters: ~3 billion (3B) during inference, comparable to Qwen/Qwen3.5-35B-A3B
Developer: huihui.ai
Release Date: January 2026
License: Inherits the license of the Qwen3.5 base model (apache-2.0)
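
To illustrate what "512 experts per layer, activating 8 experts per token" means, here is a minimal pure-Python sketch of top-k expert routing. It is an illustration only: the function name `route_token` is hypothetical, the router scores are random stand-ins, and the model's actual routing code may normalize the weights differently.

```python
import math
import random

NUM_EXPERTS = 512  # experts per MoE layer, from the model card
TOP_K = 8          # experts activated per token, from the model card


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Random stand-in for the router scores of a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)
print(len(selected), "experts selected")
print(f"{TOP_K / NUM_EXPERTS:.4%} of experts active per token")
```

Only the selected experts run for a given token, which is why the activated parameter count (~3B) is far smaller than the total (~67B).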

Expert Models:

Training

Base Model: Qwen/Qwen3.5-35B-A3B
Conversion: The model copies embeddings, self-attention, and normalization weights from Qwen/Qwen3.5-35B-A3B, replacing MLP layers with MoE layers (512 experts). Gating weights are randomly initialized.
Fine-Tuning: Not fine-tuned; users are recommended to fine-tune for specific tasks to optimize expert routing.
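
The conversion described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the actual conversion script: the parameter-name fragments, the helper names `plan_conversion` and `init_gate`, the initialization standard deviation, and the toy hidden size are all hypothetical.

```python
import random

# Hypothetical name fragments for the weights the card says are copied
# (embeddings, self-attention, normalization); real checkpoint keys may differ.
COPIED_FRAGMENTS = ("embed_tokens", "self_attn", "norm")


def plan_conversion(base_keys):
    """Split base parameter names into those copied as-is and those replaced."""
    copied = [k for k in base_keys if any(f in k for f in COPIED_FRAGMENTS)]
    replaced = [k for k in base_keys if k not in copied]
    return copied, replaced


def init_gate(num_experts, hidden_size, std=0.02, seed=0):
    """Randomly initialize a gating matrix of shape [num_experts][hidden_size]."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, std) for _ in range(hidden_size)]
        for _ in range(num_experts)
    ]


# Illustrative parameter names only.
base_keys = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.input_layernorm.weight",
    "model.layers.0.mlp.up_proj.weight",
]

copied, replaced = plan_conversion(base_keys)
gate = init_gate(num_experts=512, hidden_size=16)  # toy hidden size

print("copied:", copied)
print("replaced by MoE layers:", replaced)
print("gate shape:", len(gate), "x", len(gate[0]))
```

Because the gate starts from random values, routing carries no learned preference until the model is fine-tuned.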

Ollama

You can run huihui_ai/huihui3.5:67b directly with Ollama:

ollama run huihui_ai/huihui3.5:67b

Applications

Image-Text-to-Text Generation: Articles, dialogues, and creative writing.
Question Answering: Information retrieval and query resolution.
Conversational AI: Multi-turn dialogues for chatbots.
Research: Exploration of MoE architectures and efficient model scaling.

Limitations

Fine-Tuning Required: Randomly initialized gating weights may lead to suboptimal expert utilization without fine-tuning.
Compatibility: Developed with transformers 5.5.0; ensure matching versions to avoid loading issues.
Inference Speed: While efficient for an MoE model, performance depends on hardware (GPU recommended).
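
For the compatibility point above, a quick check of the installed transformers version might look like the sketch below. It uses only the standard library; the helper names `same_release` and `check_transformers` are hypothetical, and comparing only the major.minor.patch components is an assumption about what counts as a matching version.

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED_TRANSFORMERS = "5.5.0"  # version named in the model card


def same_release(installed, expected=EXPECTED_TRANSFORMERS):
    """Compare the leading major.minor.patch components of two version strings."""
    def head(v):
        # Drop any local suffix such as "+cu121", then keep three components.
        return tuple(v.split("+")[0].split(".")[:3])
    return head(installed) == head(expected)


def check_transformers():
    """Return a human-readable status line for the installed transformers package."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if same_release(installed):
        return f"transformers {installed} matches {EXPECTED_TRANSFORMERS}"
    return f"transformers {installed} differs from {EXPECTED_TRANSFORMERS}"


print(check_transformers())
```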

Ethical Considerations

Bias: Inherits potential biases from the Qwen/Qwen3.5-35B-A3B base model; users should evaluate outputs for fairness.
Usage: Intended for research and responsible applications; avoid generating harmful or misleading content.

Contact

Developer: huihui.ai
Repository: huihui-ai/Huihui3.5-67B-A3B (available locally or on Hugging Face)
Issues: Report bugs or request features via the repository, or send an email to support@huihui.ai