LLaMA2 K-Pop Q&A Model (Fine-tuned)
This model is a fine-tuned version of meta-llama/Llama-2-7b-hf using PEFT (LoRA) on a custom 20-example K-Pop Q&A dataset.
Overview
- Task: Text generation (K-Pop Question Answering)
- Dataset: 20 manually written K-Pop-themed examples
- Training Setup: LoRA-based fine-tuning on Google Colab Free Tier (15 GB GPU)
- Use Case: Lightweight model for educational/demo use related to K-Pop fan Q&A
Dataset Description
Each example consists of a question, a short context passage, and an answer. Example:
Question: Who is the leader of BTS?
Context: BTS is a popular South Korean boy band formed in 2013.
Answer: RM is the leader of BTS.
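The record layout above can be turned into a single text string for training or prompting. The following is a minimal sketch, assuming the three fields are simply joined with newlines in the order shown; the helper name `format_example` and the dictionary keys are hypothetical and not taken from the original training script.

```python
def format_example(question: str, context: str, answer: str = "") -> str:
    """Render one Q&A record in the Question/Context/Answer layout."""
    prompt = f"Question: {question}\nContext: {context}\nAnswer:"
    # With an answer this is a training text; without one it is an open prompt.
    return f"{prompt} {answer}" if answer else prompt


# The example record shown in this section (key names are an assumption).
record = {
    "question": "Who is the leader of BTS?",
    "context": "BTS is a popular South Korean boy band formed in 2013.",
    "answer": "RM is the leader of BTS.",
}

training_text = format_example(**record)
inference_prompt = format_example(record["question"], record["context"])
print(training_text)
```

Leaving the answer empty produces a prompt that ends in `Answer:`, which matches the prompt format used for inference.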
Training Details
| Parameter | Value |
|---|---|
| Base model | LLaMA2 7B HF |
| Finetuning method | PEFT (LoRA) |
| Epochs | 1 |
| Batch Size | 1 |
| Max Length | 512 tokens |
| Optimizer | AdamW |
| Learning Rate | 2e-4 |
| GPU Used | Free Google Colab |
| Layers Updated | Only 1 (rest frozen) |
| Quantization | 4-bit (bitsandbytes) |
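The table could map onto `peft` and `transformers` configuration objects roughly as follows. This is a sketch under stated assumptions, not the actual training script: the LoRA rank, alpha, dropout, target modules, the index of the single adapted layer, and the NF4 quantization type are not given in this card and are placeholders. The library imports are kept inside `build_configs` so the hyperparameter dictionaries can be inspected without the libraries installed.

```python
# Values taken from the training details table.
TRAINING_HPARAMS = {
    "base_model": "meta-llama/Llama-2-7b-hf",
    "epochs": 1,
    "batch_size": 1,
    "max_length": 512,
    "optimizer": "AdamW",
    "learning_rate": 2e-4,
    "load_in_4bit": True,
}

# ASSUMED values: none of these are stated in the model card.
ASSUMED_LORA = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],
    "layers_to_transform": [31],  # hypothetical choice of the one adapted layer
}


def build_configs():
    """Create the 4-bit quantization config and the LoRA config (lazy imports)."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=TRAINING_HPARAMS["load_in_4bit"],
        bnb_4bit_quant_type="nf4",  # assumption: quantization type is not stated
        bnb_4bit_compute_dtype=torch.float16,
    )
    lora_config = LoraConfig(task_type="CAUSAL_LM", **ASSUMED_LORA)
    return bnb_config, lora_config
```

Restricting `layers_to_transform` to one index is one way to express "only 1 layer updated"; the original run may have achieved this differently.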
How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the fine-tuned model and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("Tammy7777777/kpop-llama2-finetuned", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Tammy7777777/kpop-llama2-finetuned")

# Wrap both in a text-generation pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Prompt in the same Question/Context/Answer format as the training data
prompt = "Question: Who is the leader of BTS?\nContext: BTS is a South Korean boy band formed in 2013.\nAnswer:"
print(pipe(prompt, max_new_tokens=64)[0]['generated_text'])
```
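By default the text-generation pipeline returns the prompt followed by the continuation, so the printed text repeats the question and context. The sketch below shows one way to keep only the answer line; `extract_answer` is a hypothetical helper, and `sample_output` is a made-up string standing in for a real generation, not actual model output.

```python
def extract_answer(generated_text: str, prompt: str) -> str:
    """Return only the first answer line from a text-generation result."""
    # Drop the echoed prompt if the output starts with it.
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Keep the first non-empty line so any trailing "Question:" block is ignored.
    for line in completion.splitlines():
        if line.strip():
            return line.strip()
    return ""


prompt = (
    "Question: Who is the leader of BTS?\n"
    "Context: BTS is a South Korean boy band formed in 2013.\n"
    "Answer:"
)
# Illustrative stand-in for pipe(prompt, ...)[0]["generated_text"].
sample_output = prompt + " RM is the leader of BTS.\nQuestion: Who is the youngest member?"
print(extract_answer(sample_output, prompt))  # RM is the leader of BTS.
```

Passing `return_full_text=False` to the pipeline call is an alternative that omits the prompt from the returned text.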
Evaluation
Base model output: Often generic or unrelated
Fine-tuned model: More aligned, provides specific K-Pop answers
Method: Manual comparison of predictions before & after fine-tuning
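A manual before-and-after comparison like the one described can be organized with a small helper that collects both outputs per prompt. This is only a sketch: `compare_models` and the two stub generators are hypothetical, and in practice each stub would be replaced by a function that calls the base or fine-tuned pipeline.

```python
from typing import Callable, Dict, List


def compare_models(
    prompts: List[str],
    base_generate: Callable[[str], str],
    tuned_generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Collect base and fine-tuned outputs side by side for manual review."""
    rows = []
    for prompt in prompts:
        rows.append({
            "prompt": prompt,
            "base": base_generate(prompt),
            "fine_tuned": tuned_generate(prompt),
        })
    return rows


# Stub generators stand in for real model calls (placeholders, not real outputs).
def fake_base(prompt: str) -> str:
    return "(base model output)"


def fake_tuned(prompt: str) -> str:
    return "(fine-tuned model output)"


rows = compare_models(["Question: Who is the leader of BTS?\nAnswer:"], fake_base, fake_tuned)
for row in rows:
    print(row["prompt"], "|", row["base"], "|", row["fine_tuned"])
```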
Model Architecture
Based on meta-llama/Llama-2-7b-hf
Only one transformer layer fine-tuned using LoRA
Efficient for few-shot adaptation
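To give a sense of how few parameters a single-layer LoRA adapter trains, the arithmetic can be sketched as follows. The hidden size of 4096 is that of Llama-2-7B; the rank of 8 and the choice of two adapted projections (`q_proj` and `v_proj`) are assumptions for illustration, since this card does not state them.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer."""
    # LoRA learns A (rank x in_features) and B (out_features x rank).
    return rank * (in_features + out_features)


HIDDEN_SIZE = 4096        # Llama-2-7B hidden size
RANK = 8                  # ASSUMED: rank is not stated in the model card
ADAPTED_PROJECTIONS = 2   # ASSUMED: q_proj and v_proj in the single adapted layer

trainable = ADAPTED_PROJECTIONS * lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
print(trainable)  # 131072
```

Under these assumptions the adapter trains about 131 thousand parameters, compared with roughly 7 billion in the frozen base model.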
Limitations
Trained on a very small (20-example) dataset
May hallucinate or overfit to the training examples
Not suitable for production use or factually sensitive topics
Author
Tamanna Sheikh (@Tammy7777777)
License
This model is released under the MIT license for research and educational use.