Instructions to use Paranioar/NEO1_0-9B-SFT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Paranioar/NEO1_0-9B-SFT with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Paranioar/NEO1_0-9B-SFT", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Paranioar/NEO1_0-9B-SFT", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Paranioar/NEO1_0-9B-SFT with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Paranioar/NEO1_0-9B-SFT"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Paranioar/NEO1_0-9B-SFT",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/Paranioar/NEO1_0-9B-SFT
```
- SGLang
How to use Paranioar/NEO1_0-9B-SFT with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Paranioar/NEO1_0-9B-SFT" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Paranioar/NEO1_0-9B-SFT",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Paranioar/NEO1_0-9B-SFT" \
    --host 0.0.0.0 \
    --port 30000

# Call the server with the same curl command as in the pip-based setup (port 30000).
```
- Docker Model Runner
How to use Paranioar/NEO1_0-9B-SFT with Docker Model Runner:
```shell
docker model run hf.co/Paranioar/NEO1_0-9B-SFT
```
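The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `chat` are hypothetical, and it assumes the server returns the usual OpenAI-style chat completion body with the reply under `choices[0].message.content`.

```python
import json
import urllib.request

MODEL_ID = "Paranioar/NEO1_0-9B-SFT"


def build_chat_payload(prompt, image_url, model=MODEL_ID):
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url, payload, timeout=120):
    """POST the payload to <base_url>/v1/chat/completions and return the reply text.

    Assumes an OpenAI-compatible response body (hypothetical helper, not part
    of vLLM or SGLang).
    """
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call against a running server (not executed here):
# payload = build_chat_payload(
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(chat("http://localhost:8000", payload))
```

With the commands shown above, the base URL would be `http://localhost:8000` for vLLM and `http://localhost:30000` for SGLang.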
Motivation
Two lingering clouds cast shadows over the widespread exploration and promotion of native VLMs:
What fundamental constraints set native VLMs apart from modular ones, and to what extent can these barriers be overcome?
How can research in native VLMs be made more accessible and democratized, thereby accelerating progress in the field?
We construct native VLMs from first principles, whose primitives should:
effectively align pixel and word representations within a shared semantic space;
seamlessly integrate the strengths of separate vision and language modules;
inherently embody various cross-modal properties that support unified vision-language encoding, aligning, and reasoning.
Highlight
With only 390M image-text examples, NEO develops strong visual perception from scratch inside a dense and monolithic model via elaborate primitives.
NEO serves as a cornerstone for scalable and powerful native VLMs, paired with reusable components that foster a cost-effective and extensible ecosystem.
Model Overview
NEO1_0-9B has the following features:
Model Type: Native Vision-Language Models
Model Mode: Mixed Native-Attn & Native-RoPE
Layer Parameters: 214M per layer, vs. 193M per layer in Qwen3-8B
Model Parameters: 9B (Non-Embedding)
Number of Layers: 42 (6 for Pre-Buffer & 36 for Post-LLM)
Number of Heads: 32 for Q and 8 for KV (GQA)
Head Dimensions: 128 * 2 for QK and 128 for V
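The per-layer and total parameter figures above can be roughly reproduced from the listed head counts and head dimensions. The sketch below is a back-of-the-envelope check, not the authors' accounting: the hidden size (4096) and MLP width (12288) are assumptions taken from the Qwen3-8B configuration and are not stated in this card, all 42 layers are assumed to have the same shape, and norm weights are ignored.

```python
# Assumed values (not stated in this card): Qwen3-8B hidden size and MLP width.
HIDDEN = 4096
MLP_WIDTH = 12288


def layer_params(qk_head_dim, v_head_dim, q_heads=32, kv_heads=8,
                 hidden=HIDDEN, mlp_width=MLP_WIDTH):
    """Count weights in one decoder layer with GQA and a gated MLP (norms ignored)."""
    q_proj = hidden * q_heads * qk_head_dim
    k_proj = hidden * kv_heads * qk_head_dim
    v_proj = hidden * kv_heads * v_head_dim
    o_proj = q_heads * v_head_dim * hidden  # attention output has the V head dimension
    mlp = 3 * hidden * mlp_width            # gate, up, and down projections
    return q_proj + k_proj + v_proj + o_proj + mlp


neo_layer = layer_params(qk_head_dim=128 * 2, v_head_dim=128)
qwen_layer = layer_params(qk_head_dim=128, v_head_dim=128)
neo_total = 42 * neo_layer  # 6 Pre-Buffer + 36 Post-LLM layers, assumed identical in shape

print(round(neo_layer / 1e6))     # 214
print(round(qwen_layer / 1e6))    # 193
print(round(neo_total / 1e9, 2))  # 8.98
```

Under these assumptions the 21M per-layer difference comes entirely from the doubled QK head dimension, and 42 such layers give roughly the 9B non-embedding parameters listed above.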
Model Performance
Model Weights
We release the 9B weights of NEO1_0 from the Pre-Training (PT), Mid-Training (MT), and Supervised Fine-Tuning (SFT) stages.
| Model name | Weight |
|---|---|
| NEO-9B-PT | 🤗 NEO-9B-PT HF link |
| NEO-9B-MT | 🤗 NEO-9B-MT HF link |
| NEO-9B-SFT | 🤗 NEO-9B-SFT HF link |
Citation
If NEO is helpful for your research, please consider giving it a star ⭐ and citing it:
@article{Diao2025NEO,
title = {From Pixels to Words--Towards Native Vision-Language Primitives at Scale},
author = {Diao, Haiwen and Li, Mingxuan and Wu, Silei and Dai, Linjun and Wang, Xiaohua and Deng, Hanming and Lu, Lewei and Lin, Dahua and Liu, Ziwei},
journal = {arXiv preprint arXiv:2510.14979},
year = {2025}
}