Model Card for Qwen3-0.6B-instruction-finetuned
This model is a fine-tuned version of unsloth/Qwen3-0.6B-Base. It has been trained using TRL.
Quick start
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="andresnowak/Qwen3-0.6B-instruction-finetuned", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
Training procedure
This model was trained with supervised instruction fine-tuning using a language-modelling objective, so the loss is computed on both the prompt and the completion. Some random templates were also applied during training to make the model more robust to the different ways a question can be asked, in addition to the dataset already being high quality and containing many such examples. This was done because we weren't allowed to use chat templates for the evaluation. The model probably had two problems during training, however. One is that we did not filter the dataset to keep only examples whose combined prompt and completion fit within 2048 (the maximum length we use), and truncated them instead. Also, this model uses left-side padding in the tokenizer, as flash-attention 2 needs this.
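The difference between filtering and truncating can be illustrated with a minimal sketch. It is not the training code: the `prompt`/`completion` field names, the helper functions, and the whitespace tokenizer are assumptions made for illustration, and truncation is assumed to cut from the right.

```python
from typing import Callable, Dict, List

MAX_LENGTH = 2048  # maximum length stated in the model card

Tokenize = Callable[[str], List[str]]


def combined_length(example: Dict[str, str], tokenize: Tokenize) -> int:
    """Number of tokens in the prompt plus the completion."""
    return len(tokenize(example["prompt"])) + len(tokenize(example["completion"]))


def filter_by_length(examples, tokenize: Tokenize, max_length: int = MAX_LENGTH):
    """Keep only examples whose prompt and completion fit together."""
    return [ex for ex in examples if combined_length(ex, tokenize) <= max_length]


def truncate_to_length(example, tokenize: Tokenize, max_length: int = MAX_LENGTH):
    """Cut the concatenated sequence at max_length (assumed right-side truncation)."""
    tokens = tokenize(example["prompt"]) + tokenize(example["completion"])
    return tokens[:max_length]


# Whitespace splitting stands in for the real tokenizer (assumption).
toy_tokenize = str.split
examples = [
    {"prompt": "Add 2 and 3.", "completion": "5"},
    {"prompt": "Summarize: " + "word " * 10, "completion": "A long answer " * 3},
]

# A tiny limit is used here so the effect is visible on toy data.
kept = filter_by_length(examples, toy_tokenize, max_length=8)
truncated = truncate_to_length(examples[1], toy_tokenize, max_length=8)
```

In this toy run the second example is dropped by the filter, whereas truncation keeps it but cuts away the whole completion, so the model would see a prompt with no answer to learn from.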
environment:
  seed: 42
  use_template: True
model:
  name: Qwen/Qwen3-0.6B-Base
  hub_model_id: andresnowak/Qwen3-0.6B-instruction-finetuned
dataset:
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: codeAlpaca
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: noRobots
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: openMathGsm8k
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: codeV2
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: flanV2
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: ifData
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: mathAlgebra
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: mathGrade
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: oasst1
    size: 0.6
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: sciriff
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: tableGpt
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: tirMath
    size: 0.4
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: wildChat
    size: 0.7
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: mathV5
    size: 0.2
dataset_evaluation:
  - name: cais/mmlu
    config: validation
    subjects: ["abstract_algebra", "anatomy", "astronomy", "college_biology", "college_chemistry", "college_computer_science", "college_mathematics", "college_physics", "computer_security", "conceptual_physics", "electrical_engineering", "elementary_mathematics", "high_school_biology", "high_school_chemistry", "high_school_computer_science", "high_school_mathematics", "high_school_physics", "high_school_statistics", "machine_learning"]
training:
  learning_rate: 1e-5
  per_device_train_batch_size: 16
  per_device_eval_batch_size: 16
  gradient_accumulation_steps: 8
  num_train_epochs: 2
  weight_decay: 0.00
  warmup_ratio: 0.03
  max_grad_norm: 0.5
  lr_scheduler: "linear"
This model was trained with SFT.
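The training values in the configuration imply an effective batch size and an approximate warmup length. The sketch below works through that arithmetic. The number of devices and the dataset size are not given in this card, so `num_devices` and `num_examples` are hypothetical inputs, and the step counts are an approximation rather than the exact figures a trainer would report.

```python
import math

# Values taken from the training configuration in this card.
per_device_train_batch_size = 16
gradient_accumulation_steps = 8
num_train_epochs = 2
warmup_ratio = 0.03


def effective_batch_size(num_devices: int) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


def approximate_schedule(num_examples: int, num_devices: int):
    """Rough total optimizer steps and warmup steps for a linear schedule."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size(num_devices))
    total_steps = steps_per_epoch * num_train_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


# Hypothetical example: 100,000 training examples on a single device.
total_steps, warmup_steps = approximate_schedule(num_examples=100_000, num_devices=1)
```

On a single device each optimizer step therefore covers 16 x 8 = 128 examples, and the learning rate ramps up over roughly the first 3% of steps before decaying linearly.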
Evaluation results
The performance is as follows:
| Benchmark | Accuracy (Acc) | Normalized Accuracy (Acc Norm) |
|---|---|---|
| ARC Challenge | 46.0% | 45.3% |
| ARC Easy | 59.3% | 54.2% |
| GPQA | 29.9% | 27.0% |
| Math QA | 24.0% | 24.8% |
| MCQA Evals | 37.9% | 34.9% |
| MMLU | 47.2% | 47.2% |
| MMLU Pro | 13.2% | 12.0% |
| MuSR | 43.5% | 42.1% |
| NLP4Education | 38.8% | 36.5% |
| Overall | 37.8% | 36.0% |
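As a quick consistency check, the Overall row agrees with the unweighted mean of the nine benchmark rows. The snippet below reproduces that arithmetic from the table values; whether the author computed the row this way is an assumption.

```python
# (accuracy, normalized accuracy) in percent, copied from the table above.
results = {
    "ARC Challenge": (46.0, 45.3),
    "ARC Easy": (59.3, 54.2),
    "GPQA": (29.9, 27.0),
    "Math QA": (24.0, 24.8),
    "MCQA Evals": (37.9, 34.9),
    "MMLU": (47.2, 47.2),
    "MMLU Pro": (13.2, 12.0),
    "MuSR": (43.5, 42.1),
    "NLP4Education": (38.8, 36.5),
}

# Unweighted mean over benchmarks, rounded to one decimal like the table.
acc_mean = round(sum(acc for acc, _ in results.values()) / len(results), 1)
acc_norm_mean = round(sum(norm for _, norm in results.values()) / len(results), 1)

print(acc_mean, acc_norm_mean)
```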
The tests were run with the following prompt (only MuSR used a different one, in which Question: and Narrative: labels are added):
This question assesses challenging STEM problems as found on graduate standardized tests. Carefully evaluate the options and select the correct answer.
---
[Insert Question Here]
---
[Insert Choices Here, e.g.:
A. Option 1
B. Option 2
C. Option 3
D. Option 4]
---
Your response should include the letter and the exact text of the correct choice.
Example: B. Entropy increases.
Answer:
Testing was done on answers of the form [Letter]. [Text answer]
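The prompt layout and answer format above can be sketched as follows. This is an illustration only, not the evaluation harness used for the reported numbers: `build_prompt`, `parse_answer`, and the regular expression are hypothetical helpers, and the MuSR variant is not covered.

```python
import re
from typing import List, Optional, Tuple

INSTRUCTION = (
    "This question assesses challenging STEM problems as found on graduate "
    "standardized tests. Carefully evaluate the options and select the correct answer."
)
FOOTER = (
    "Your response should include the letter and the exact text of the correct choice.\n"
    "Example: B. Entropy increases.\n"
    "Answer:"
)
LETTERS = "ABCDEFGHIJ"


def build_prompt(question: str, choices: List[str]) -> str:
    """Assemble the prompt with '---' separators, as in the template above."""
    options = "\n".join(f"{LETTERS[i]}. {choice}" for i, choice in enumerate(choices))
    return "\n---\n".join([INSTRUCTION, question, options, FOOTER])


# Matches "[Letter]. [Text answer]" (hypothetical pattern, not from the source).
ANSWER_RE = re.compile(r"^\s*([A-J])\.\s*(.+?)\s*$", re.DOTALL)


def parse_answer(text: str) -> Optional[Tuple[str, str]]:
    """Return (letter, answer text), or None if the format is not followed."""
    match = ANSWER_RE.match(text)
    return (match.group(1), match.group(2)) if match else None


prompt = build_prompt("What is 2 + 2?", ["3", "4", "5", "6"])
parsed = parse_answer(" B. Entropy increases.")
```

Because no chat template is applied, the prompt ends with `Answer:` so that the model continues directly with the letter and the choice text.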
Framework versions
- TRL: 0.15.2
- Transformers: 4.51.3
- PyTorch: 2.5.1+cu121
- Datasets: 3.6.0
- Tokenizers: 0.21.0
Citations
Cite TRL as:
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
Model tree for andresnowak/Qwen3-0.6B-instruction-finetuned
Base model
Qwen/Qwen3-0.6B-Base