Qwen2.5-7B-Instruct-LoRA-Alpaca-ZH

This is a LoRA fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the Alpaca-GPT4-ZH dataset.

Model Details

  • Base Model: Qwen/Qwen2.5-7B-Instruct (7B parameters)
  • Training Method: QLoRA (4-bit quantization)
  • Trainable Parameters: 20.2M (0.46% of total)
  • Dataset: Alpaca-GPT4-ZH (500 samples)
  • Training Time: ~3.5 minutes
  • Hardware: Lambda Cloud A10 GPU (24GB)
  • Framework: ms-swift

Training Configuration

Model: Qwen/Qwen2.5-7B-Instruct
Training Type: LoRA
Quantization: 4-bit (BitsAndBytes)
LoRA Rank: 8
LoRA Alpha: 32
Target Modules: all-linear
Batch Size: 1
Gradient Accumulation: 4 steps
Learning Rate: 1e-4
Epochs: 1
Max Length: 2048
Training Loss: 1.395
GPU Memory: ~7GB
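
For reference, this configuration corresponds roughly to the following transformers + peft QLoRA setup (a minimal sketch, not the exact ms-swift training script; the NF4 quantization type and bfloat16 compute dtype are assumptions, since the card only specifies "4-bit", and the trainer/dataset wiring is omitted):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization via BitsAndBytes, as listed in the configuration above.
# NF4 + bfloat16 compute are assumptions; the card only states "4-bit".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA Rank 8 / Alpha 32 on all linear layers, matching the table above
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # should report roughly 20.2M trainable parameters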

Usage

Using with Transformers + PEFT

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load LoRA weights
model = PeftModel.from_pretrained(
    base_model,
    "FutureMa/Qwen2.5-7B-Instruct-LoRA-Alpaca-ZH"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Generate response
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "解释什么是人工智能"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)

response = tokenizer.decode(
    outputs[0][len(inputs.input_ids[0]):],
    skip_special_tokens=True
)
print(response)
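
If you want a standalone checkpoint that no longer needs peft at inference time, the LoRA adapter can be merged into the base weights (a minimal sketch continuing from the code above; the output directory name is illustrative):

# Merge the LoRA adapter into the base weights and save a standalone model
merged_model = model.merge_and_unload()
merged_model.save_pretrained("Qwen2.5-7B-Instruct-Alpaca-ZH-merged")
tokenizer.save_pretrained("Qwen2.5-7B-Instruct-Alpaca-ZH-merged")

The merged directory can then be loaded directly with AutoModelForCausalLM.from_pretrained, with no peft dependency.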

Using with ms-swift

# Inference with fine-tuned model
swift infer --ckpt_dir FutureMa/Qwen2.5-7B-Instruct-LoRA-Alpaca-ZH
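
Note that argument names vary across ms-swift releases; if your installed version does not accept --ckpt_dir, check swift infer --help for the current adapter/checkpoint flag.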

Training Results

Comparison: Base vs Fine-tuned Model

Question: "给出三个健康饮食的建议" ("Give three suggestions for healthy eating")

Base Model Response:

  • Lengthy, detailed explanations
  • Often exceeds the generation token limit, leaving answers truncated
  • General knowledge-based

Fine-tuned Model Response:

  • Concise and structured (3 clear points)
  • Direct and actionable advice
  • Matches Alpaca dataset style
  • Complete within token limits

The fine-tuned model shows improvements in:

  • Response structure and clarity
  • Adherence to the instruction format
  • Conciseness without loss of quality
  • Alignment with Chinese instruction-following tasks

Model Performance

  • Training Runtime: 206.66 seconds
  • Training Samples/Second: 2.42
  • Training Loss: 1.395
  • GPU Memory Usage: 7.03 GB (29% of 24GB)
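
These figures are self-consistent: 500 samples / 206.66 s ≈ 2.42 samples per second, and with a batch size of 1 and 4 gradient-accumulation steps, one epoch over 500 samples corresponds to 125 optimizer steps.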

Citation

If you use this model, please cite:

@misc{qwen2.5-7b-lora-alpaca-zh,
  author = {FutureMa},
  title = {Qwen2.5-7B-Instruct-LoRA-Alpaca-ZH},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FutureMa/Qwen2.5-7B-Instruct-LoRA-Alpaca-ZH}}
}

License

This model is released under the Apache 2.0 license, following the base model's licensing.

Acknowledgments

This model builds on Qwen/Qwen2.5-7B-Instruct from the Qwen team, was fine-tuned with the ms-swift framework from the ModelScope team, and uses the Alpaca-GPT4-ZH dataset.