How to use emonty777/QLoRA-Flan-T5-Small with PEFT:
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
model = PeftModel.from_pretrained(base_model, "emonty777/QLoRA-Flan-T5-Small")
This model is a fine-tuned version of google/flan-t5-small on the cnn_dailymail dataset. It achieves the following results on the test set:
This model was fine-tuned for abstractive summarization.
Fine-tuned on the cnn_dailymail training set.
Evaluated on the cnn_dailymail test set.
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# Load the PEFT adapter config; it records the base model name, among other settings
peft_model_id = "emonty777/QLoRA-Flan-T5-Small"
config = PeftConfig.from_pretrained(peft_model_id)
# Load the base model and tokenizer.
# With a GPU, load in 8-bit (requires bitsandbytes and accelerate); otherwise load on the CPU.
if torch.cuda.is_available():
    model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, load_in_8bit=True, device_map={"":0})
else:
    model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id, device_map={"":0})
model.eval()
text = "Your text goes here..."
# Tokenize and move the input IDs to the device the model is on (CPU or GPU)
input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.to(model.device)
# Adjust max_new_tokens to the summary length you want; 120 is set up for article-length inputs
outputs = model.generate(input_ids=input_ids, max_new_tokens=120, do_sample=False)
print(f"input text: {text}\n{'---' * 20}")
print(f"summary:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]}")
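To summarize more than one text, the generation steps above can be wrapped in a small function. This is a minimal sketch, not part of the model repository: the name `summarize` is hypothetical, and it assumes `model` and `tokenizer` are the objects loaded above. It uses `model.device` so the same code runs on CPU or GPU.

```python
def summarize(text, model, tokenizer, max_new_tokens=120):
    """Return a summary of `text` (hypothetical helper, for illustration only)."""
    # Tokenize, truncating over-long inputs, and move the IDs to the model's device
    input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.to(model.device)
    # do_sample=False makes decoding deterministic, matching the example above
    outputs = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode the first (and only) sequence, dropping special tokens
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
```

Usage would look like `print(summarize("Your text goes here...", model, tokenizer))`.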
The following hyperparameters were used during training:
Evaluated on the full CNN/DailyMail test set.
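The metric used for this evaluation is not shown here. Summarization on CNN/DailyMail is commonly scored with ROUGE, so the sketch below assumes a ROUGE-style score and shows a simplified ROUGE-1 F1 (unigram overlap) between a generated summary and a reference. The function name `rouge1_f1` is hypothetical, and because it lowercases and splits on whitespace without stemming, its values will not exactly match those from standard ROUGE packages.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap, lowercase, whitespace tokens (illustrative only)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps, per token, the smaller of the two counts
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For a real evaluation, an established ROUGE implementation is the safer choice; this version only illustrates what the score measures.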