databricks/databricks-dolly-15k
This is a 2.7B-parameter LLaMA model distilled from a 7B LoRA-tuned teacher model using the MiniLLM knowledge distillation framework.
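For context, MiniLLM trains the student by minimizing the reverse KL divergence, KL(student || teacher), rather than the forward direction used in standard distillation. The snippet below is only a toy illustration of that difference on a made-up three-token distribution; the function name and the numbers are assumptions for illustration, and this is not the framework's training code.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Assumes q is strictly positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 3-token vocabulary (made-up numbers).
teacher = [0.5, 0.4, 0.1]
student = [0.9, 0.05, 0.05]

forward_kl = kl_divergence(teacher, student)  # direction used in standard distillation
reverse_kl = kl_divergence(student, teacher)  # direction MiniLLM minimizes

print(f"forward KL: {forward_kl:.4f}")
print(f"reverse KL: {reverse_kl:.4f}")
```

The two directions generally give different values, which is why the choice of direction changes what the student learns to imitate.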
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/llama-2.7b-distilled-from-7b-lora",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/llama-2.7b-distilled-from-7b-lora")
# Generate response
prompt = "Instruction: Explain what machine learning is.\n\nResponse:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
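For a causal language model, `generate` returns the prompt tokens followed by the continuation, so the decoded string printed above starts with the prompt itself. The sketch below shows one way to build prompts in the `Instruction: ...\n\nResponse:` template used above and to keep only the answer. The helper names `build_prompt` and `extract_response` are hypothetical, not part of this model or of `transformers`.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the template used in the example above."""
    return f"Instruction: {instruction}\n\nResponse:"


def extract_response(decoded: str, prompt: str) -> str:
    """Drop the echoed prompt from decoded output, if present, and trim whitespace."""
    if decoded.startswith(prompt):
        decoded = decoded[len(prompt):]
    return decoded.strip()


# Stand-in for a decoded generation (no model call is made here).
prompt = build_prompt("Explain what machine learning is.")
decoded = prompt + " Machine learning is a field of study."
print(extract_response(decoded, prompt))
```

If the tokenizer does not reproduce the prompt text exactly when decoding, the prefix check falls back to returning the full string. Slicing the generated ids with `outputs[0][inputs["input_ids"].shape[1]:]` before decoding avoids that issue.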
This model was trained using the MiniLLM framework with:
If you use this model, please cite the MiniLLM paper:
@inproceedings{minillm,
  title={MiniLLM: Knowledge Distillation of Large Language Models},
  author={Gu, Yuxian and Dong, Li and Wei, Furu and Huang, Minlie},
  booktitle={Proceedings of ICLR},
  year={2024}
}
This model is released under the Apache 2.0 license. Note that LLaMA models are also subject to Meta's own usage terms.
Base model: meta-llama/Llama-2-7b-hf