Supervised Fine-Tuning as Inverse Reinforcement Learning
Paper: [arXiv:2403.12017](https://arxiv.org/abs/2403.12017)
This model is a fine-tuned version of EleutherAI/gpt-j-6b on the TL;DR dataset. It achieves the following results on the evaluation set:
- Loss: 0.6624
- Accuracy: 0.6857
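As a usage illustration, the sketch below shows one way the checkpoint could be loaded and prompted for TL;DR-style summarisation with the `transformers` library. The repository id, prompt format, and generation settings are assumptions for illustration only; they are not taken from the paper or this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; replace with the actual Hub id of this checkpoint.
model_id = "your-org/gpt-j-6b-tldr-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Assumed TL;DR-style prompt: a post followed by the summary cue.
prompt = "POST: My neighbour's dog barks all night and I cannot sleep.\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=48, do_sample=False)

# Decode only the newly generated tokens (the summary).
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```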
The following results were recorded during training:
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 0.5633 | 1.0 | 22660 | 0.6624 | 0.6857 |
If you would like to cite our paper when using this model, please use:
@article{sun2024supervised,
  title={Supervised Fine-Tuning as Inverse Reinforcement Learning},
  author={Sun, Hao},
  journal={arXiv preprint arXiv:2403.12017},
  year={2024}
}
Base model: EleutherAI/gpt-j-6b