Qwen3-1.7B-SFT-UltraChat

This model is a fine-tuned version of Qwen/Qwen3-1.7B-Base trained on the HuggingFaceH4/ultrachat_200k dataset using Supervised Fine-Tuning (SFT) with LoRA adapters.

Overview

Qwen3-1.7B-SFT-UltraChat is an instruction-following language model optimized for conversational tasks. It pairs the Qwen3-1.7B base model with high-quality instruction-following data from UltraChat to improve response quality and helpfulness.

Key Features

  • High-Quality Fine-Tuning: Trained on 197,471 UltraChat conversation examples
  • Efficient Training: Uses LoRA (Low-Rank Adaptation) for memory efficiency
  • Strong Performance: Achieves 67.25% token accuracy on the held-out evaluation set
  • Optimized for Inference: Available in multiple formats including GGUF quantizations

Model Details

Developed by: ermiaazarkhalili
License: CC-BY-NC-4.0
Language: English
Base Model: Qwen/Qwen3-1.7B-Base
Model Size: 1.7B parameters
Tensor Type: BF16
Context Length: 2,048 tokens
Training Method: SFT with LoRA

Training Information

Training Configuration

Learning Rate: 0.0002
Batch Size: 8 per device
Effective Batch Size: 16 (with gradient accumulation)
Gradient Accumulation Steps: 2
Number of Epochs: 1
Max Sequence Length: 2,048 tokens
LR Scheduler: Linear warmup + cosine annealing
Precision: BF16 mixed precision
Gradient Checkpointing: Enabled
Optimizer: AdamW
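
To see how these values interact, here is a minimal pure-Python sketch of the learning-rate schedule (linear warmup followed by cosine annealing). The base rate matches the table; the warmup and total step counts are illustrative assumptions, not values reported for this run.

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-4, warmup_steps=100):
    """Linear warmup to base_lr, then cosine annealing toward 0.

    base_lr matches the card (0.0002); warmup_steps and total_steps
    are hypothetical values chosen for illustration.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size = per-device batch * gradient accumulation steps
effective_batch = 8 * 2  # 16, as in the table above

print(lr_at_step(50, 1000))    # halfway through warmup: 1e-4
print(lr_at_step(1000, 1000))  # end of schedule: 0.0
```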

LoRA Configuration

LoRA Rank (r): 32
LoRA Alpha: 64
LoRA Dropout: 0.05
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
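
To make these settings concrete: LoRA freezes each target weight W and learns a low-rank update scaled by alpha/r, so the effective weight is W + (alpha/r)·B·A. A minimal numpy sketch, where r and alpha match the table but the 2048×2048 matrix is a toy stand-in for one projection, not Qwen3's exact shapes:

```python
import numpy as np

r, alpha = 32, 64            # match the table above
d_out, d_in = 2048, 2048     # toy stand-in for one target projection

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))      # frozen base weight (e.g. q_proj)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-init

# Effective weight seen at inference: base plus scaled low-rank update
W_eff = W + (alpha / r) * (B @ A)

# Zero-initialized B means the adapter starts as an exact no-op
assert np.allclose(W_eff, W)

# Trainable parameters per adapted matrix: r*(d_in + d_out) vs d_in*d_out
lora_params, full_params = r * (d_in + d_out), d_out * d_in
print(lora_params, full_params)  # 131072 vs 4194304 (~3% of full)
```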

Training Metrics

Final Training Loss: 1.3051
Final Eval Loss: 1.2908
Token Accuracy: 67.25%
Training Time: 1d 3h 24m
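
"Token accuracy" here is presumably the standard next-token metric: the fraction of supervised positions where the argmax prediction matches the label (the exact masking used during this run is not specified, so treat this definition as an assumption). A toy computation:

```python
# Toy predicted token ids vs. labels; -100 is the usual Hugging Face
# ignore index for positions excluded from the loss (e.g. prompt tokens).
predictions = [42, 17, 99, 5, 7, 31]
labels      = [42, 17, 98, 5, -100, 31]

scored = [(p, l) for p, l in zip(predictions, labels) if l != -100]
accuracy = sum(p == l for p, l in scored) / len(scored)
print(f"{accuracy:.2%}")  # 80.00%
```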

Training Hardware

  • GPU: NVIDIA H100 80GB HBM3
  • CPU: 8 vCPUs
  • Memory: 64GB
  • Platform: Compute Canada (Fir Cluster)

Dataset

This model was trained on the HuggingFaceH4/ultrachat_200k dataset:

Training: 197,471 samples
Evaluation: 10,394 samples

The UltraChat dataset contains high-quality multi-turn conversations designed to improve instruction-following capabilities.
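
Each example stores the conversation as a list of role/content messages (the `messages` field name follows the dataset's published schema; the sample text below is invented for illustration). A minimal sketch of flattening one example:

```python
# Invented UltraChat-style example; real entries are longer multi-turn chats.
example = {
    "messages": [
        {"role": "user", "content": "How can I stay focused while studying?"},
        {"role": "assistant", "content": "Work in short, timed sessions with breaks."},
    ]
}

# Simple flattening; in practice the tokenizer's chat template (see Usage
# below) produces the model-specific prompt format.
text = "\n".join(f"{m['role']}: {m['content']}" for m in example["messages"])
print(text.splitlines()[0])  # user: How can I stay focused while studying?
```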

Usage

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Chat format
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are the key principles of effective communication?"}
]

# Apply chat template
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)

Using Pipeline

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat",
    device_map="auto",
    torch_dtype="auto"
)

messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
output = generator(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])

GGUF Versions

For CPU or mixed CPU/GPU inference, GGUF quantized versions are available at: ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat-GGUF

Available quantizations:

  • Q4_K_M: Best balance of quality and size
  • Q5_K_M: Higher quality, larger size
  • Q8_0: Highest quality quantization

Using with Ollama

ollama pull hf.co/ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat-GGUF:Q4_K_M "Hello, how are you?"

Limitations

  • Language: Primarily trained on English data; performance on other languages may vary
  • Knowledge Cutoff: Base model knowledge is limited to its training data cutoff
  • Hallucinations: Like all LLMs, may generate plausible-sounding but incorrect information
  • Context Length: Limited to 2,048 tokens during fine-tuning
  • Safety: Not extensively safety-tuned; use appropriate content filtering in production

Intended Use

Recommended Uses

  • Conversational AI assistants
  • Question answering systems
  • Text generation and completion
  • Educational applications
  • Research and experimentation

Out-of-Scope Uses

  • Medical, legal, or financial advice without expert oversight
  • Generation of harmful, deceptive, or illegal content
  • High-stakes decision-making without human verification

Citation

@misc{ermiaazarkhalili_qwen3_1_7b_sft_ultrachat,
    author = {Ermia Azarkhalili},
    title = {Qwen3-1.7B-SFT-UltraChat: Fine-tuned Qwen3-1.7B-Base on UltraChat},
    year = {2025},
    publisher = {Hugging Face},
    howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-1.7B-SFT-UltraChat}}
}

Contact

For questions, issues, or collaborations, please open an issue on the model repository or contact via HuggingFace.
