Qwen3-1.7B-SFT-UltraChat

This model is a fine-tuned version of Qwen/Qwen3-1.7B-Base trained on the HuggingFaceH4/ultrachat_200k dataset using Supervised Fine-Tuning (SFT) with LoRA adapters.

Overview

Qwen3-1.7B-SFT-UltraChat is an instruction-following language model optimized for conversational tasks. It pairs the Qwen3-1.7B base model with high-quality instruction-following data from UltraChat to improve response quality and helpfulness.

Key Features

  • High-Quality Fine-Tuning: Trained on 197,471 UltraChat conversation examples
  • Efficient Training: Uses LoRA (Low-Rank Adaptation) for memory efficiency
  • Strong Performance: Achieves 67.25% token accuracy on the held-out evaluation set
  • Optimized for Inference: Available in multiple formats including GGUF quantizations

Model Details

Developed by: ermiaazarkhalili
License: CC-BY-NC-4.0
Language: English
Base Model: Qwen/Qwen3-1.7B-Base
Model Size: 1.7B parameters
Tensor Type: BF16
Context Length: 2,048 tokens
Training Method: SFT with LoRA

Training Information

Training Configuration

Learning Rate: 0.0002
Batch Size: 8 per device
Effective Batch Size: 16 (with gradient accumulation)
Gradient Accumulation Steps: 2
Number of Epochs: 1
Max Sequence Length: 2,048 tokens
LR Scheduler: Linear warmup + cosine annealing
Precision: BF16 mixed precision
Gradient Checkpointing: Enabled
Optimizer: AdamW
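
To see how these values interact, here is a minimal pure-Python sketch of the learning-rate schedule (linear warmup followed by cosine annealing). The base rate matches the table; the warmup and total step counts are illustrative assumptions, not values reported for this run.

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-4, warmup_steps=100):
    """Linear warmup to base_lr, then cosine annealing toward 0.

    base_lr matches the card (0.0002); warmup_steps and total_steps
    are hypothetical values chosen for illustration.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size = per-device batch * gradient accumulation steps
effective_batch = 8 * 2  # 16, as in the table above

print(lr_at_step(50, 1000))    # halfway through warmup: 1e-4
print(lr_at_step(1000, 1000))  # end of schedule: 0.0
```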

LoRA Configuration

LoRA Rank (r): 32
LoRA Alpha: 64
LoRA Dropout: 0.05
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
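
To make these settings concrete: LoRA freezes each target weight W and learns a low-rank update scaled by alpha/r, so the effective weight is W + (alpha/r)·B·A. A minimal numpy sketch, where r and alpha match the table but the 2048×2048 matrix is a toy stand-in for one projection, not Qwen3's exact shapes:

```python
import numpy as np

r, alpha = 32, 64            # match the table above
d_out, d_in = 2048, 2048     # toy stand-in for one target projection

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))      # frozen base weight (e.g. q_proj)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-init

# Effective weight seen at inference: base plus scaled low-rank update
W_eff = W + (alpha / r) * (B @ A)

# Zero-initialized B means the adapter starts as an exact no-op
assert np.allclose(W_eff, W)

# Trainable parameters per adapted matrix: r*(d_in + d_out) vs d_in*d_out
lora_params, full_params = r * (d_in + d_out), d_out * d_in
print(lora_params, full_params)  # 131072 vs 4194304 (~3% of full)
```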

Training Metrics

Final Training Loss: 1.3051
Final Eval Loss: 1.2908
Token Accuracy: 67.25%
Training Time: 1d 3h 24m
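
"Token accuracy" here is presumably the standard next-token metric: the fraction of supervised positions where the argmax prediction matches the label (the exact masking used during this run is not specified, so treat this definition as an assumption). A toy computation:

```python
# Toy predicted token ids vs. labels; -100 is the usual Hugging Face
# ignore index for positions excluded from the loss (e.g. prompt tokens).
predictions = [42, 17, 99, 5, 7, 31]
labels      = [42, 17, 98, 5, -100, 31]

scored = [(p, l) for p, l in zip(predictions, labels) if l != -100]
accuracy = sum(p == l for p, l in scored) / len(scored)
print(f"{accuracy:.2%}")  # 80.00%
```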

Training Hardware

  • GPU: NVIDIA H100 80GB HBM3
  • CPU: 8 vCPUs
  • Memory: 64GB
  • Platform: Compute Canada (Fir Cluster)

Dataset

This model was trained on the HuggingFaceH4/ultrachat_200k dataset:

Training: 197,471 samples
Evaluation: 10,394 samples

The UltraChat dataset contains high-quality multi-turn conversations designed to improve instruction-following capabilities.
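
Each example stores the conversation as a list of role/content messages (the `messages` field name follows the dataset's published schema; the sample text below is invented for illustration). A minimal sketch of flattening one example:

```python
# Invented UltraChat-style example; real entries are longer multi-turn chats.
example = {
    "messages": [
        {"role": "user", "content": "How can I stay focused while studying?"},
        {"role": "assistant", "content": "Work in short, timed sessions with breaks."},
    ]
}

# Simple flattening; in practice the tokenizer's chat template (see Usage
# below) produces the model-specific prompt format.
text = "\n".join(f"{m['role']}: {m['content']}" for m in example["messages"])
print(text.splitlines()[0])  # user: How can I stay focused while studying?
```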

Usage

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Chat format
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are the key principles of effective communication?"}
]

# Apply chat template
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)

Using Pipeline

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat",
    device_map="auto",
    torch_dtype="auto"
)

messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
output = generator(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])

GGUF Versions

For CPU or mixed CPU/GPU inference, GGUF quantized versions are available at: ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat-GGUF

Available quantizations:

  • Q4_K_M: Best balance of quality and size
  • Q5_K_M: Higher quality, larger size
  • Q8_0: Highest quality quantization

Using with Ollama

ollama pull hf.co/ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat-GGUF:Q4_K_M "Hello, how are you?"

Limitations

  • Language: Primarily trained on English data; performance on other languages may vary
  • Knowledge Cutoff: Base model knowledge is limited to its training data cutoff
  • Hallucinations: Like all LLMs, may generate plausible-sounding but incorrect information
  • Context Length: Limited to 2,048 tokens during fine-tuning
  • Safety: Not extensively safety-tuned; use appropriate content filtering in production

Intended Use

Recommended Uses

  • Conversational AI assistants
  • Question answering systems
  • Text generation and completion
  • Educational applications
  • Research and experimentation

Out-of-Scope Uses

  • Medical, legal, or financial advice without expert oversight
  • Generation of harmful, deceptive, or illegal content
  • High-stakes decision-making without human verification

Citation

@misc{ermiaazarkhalili_qwen3_1_7b_sft_ultrachat,
    author = {Ermia Azarkhalili},
    title = {Qwen3-1.7B-SFT-UltraChat: Fine-tuned Qwen3-1.7B-Base on UltraChat},
    year = {2025},
    publisher = {Hugging Face},
    howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat}}
}

Contact

For questions, issues, or collaborations, please open an issue on the model repository or contact via HuggingFace.
