How to use jan-hq/stealth-finance-v2-dpo-adapter with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("TomGrc/FusionNet_7Bx2_MoE_v0.1")
model = PeftModel.from_pretrained(base_model, "jan-hq/stealth-finance-v2-dpo-adapter")

This model is a fine-tuned version of TomGrc/FusionNet_7Bx2_MoE_v0.1 on the following datasets: jan-hq/distilabel_dpo_pairs_binarized, argilla/OpenHermes2.5-dpo-binarized-alpha, jan-hq/capybara_dpo_binarized, jan-hq/bagel_dpo_binarized, jan-hq/ultrafeedback_preferences_cleaned_binarized, jan-hq/openmath_instruct_dpo_binarized, jan-hq/distil_math_dpo_binarized and jan-hq/evol_codealpaca_dpo_binarized. Its results on the evaluation set are reported in the training results table below.
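The loading snippet above can be extended into a small generation helper. This is only a sketch under assumptions: the function name `generate_with_adapter` is hypothetical, it assumes the tokenizer is published in the base model repository, and it passes the prompt as plain text without any chat template. Loading the base model downloads its full weights, so the heavy imports are kept inside the function.

```python
def generate_with_adapter(prompt, max_new_tokens=128):
    """Load the base model plus the DPO adapter and generate a completion.

    Hypothetical helper for illustration; not part of the original model card.
    """
    # Imported lazily so that defining the helper does not require the libraries.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_id = "TomGrc/FusionNet_7Bx2_MoE_v0.1"
    adapter_id = "jan-hq/stealth-finance-v2-dpo-adapter"

    # Assumption: the tokenizer lives in the base model repository.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)

    # Attach the adapter weights on top of the base model.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()

    # Assumption: the prompt is used as-is, with no chat template applied.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_with_adapter("...")` with a prompt of your choice returns the decoded text, including the prompt itself.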
The hyperparameters used during training are not listed here. Training results:
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3593 | 1.0 | 3280 | 0.1290 | -0.1799 | -6.0696 | 0.8597 | 5.8897 | -324.0384 | -275.3572 | -0.7749 | -0.7773 |
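As a quick sanity check on the table, the reward margin in DPO logging is the chosen reward minus the rejected reward, and because averaging is linear the same relation holds for the reported means. The sketch below recomputes it from the row above; the function name `reward_margin` is hypothetical and used only for illustration.

```python
import math


def reward_margin(chosen_reward, rejected_reward):
    """Return the DPO reward margin: chosen reward minus rejected reward."""
    return chosen_reward - rejected_reward


# Values copied from the evaluation row at epoch 1.0, step 3280.
chosen = -0.1799
rejected = -6.0696
reported_margin = 5.8897

margin = reward_margin(chosen, rejected)

# The recomputed margin should agree with the reported one up to rounding.
assert math.isclose(margin, reported_margin, abs_tol=1e-4)
```

The reported Rewards/accuracies value of 0.8597 is the fraction of evaluation pairs whose chosen reward exceeds the rejected reward, so it cannot be recomputed from the averaged rewards alone.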
Base model: TomGrc/FusionNet_7Bx2_MoE_v0.1