Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data
Paper: arXiv:2502.18679
This model is a fine-tuned version of mistralai/Mistral-7B-v0.1, trained on the siqi00/mistral_ultrafeedback_unhelpful_chatprompt_0.7_1.0_50_320 dataset.
It was trained with Discriminative Fine-Tuning (DFT), as described in the paper "Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data". The training code is available at PenGuln/DFT.
The following hyperparameters were used during training: