abs-bf16-7b-math-code-instruction-lr2e-5-g0.997-l1.0-gpu8-bs4-ga32-ep2-wu50-cut3000

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the 7b_math_95k_2_train, 7b_code_100k_2_train, and 7b_instruction_100k_2_train datasets. It achieves the following results on the evaluation set (see the sketch after this list for one way these metric families may be computed):

  • Loss: 0.0044
  • Token Mean MAE: 1667898715.5410
  • Token Mean RMSE: 829237.4977
  • Token Mean Seq Mean MAE: 58941.5716
  • Token Mean Seq Mean RMSE: 2405.8500
  • Token Mean RelErr: 0.3010
  • Token Mean Seq Mean RelErr: 0.3483
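
The card does not define these metrics, so the following is only a minimal sketch of a plausible convention: the `token_mean_*` family pools the error over every non-padding token in the eval set at once, while the `token_mean_seq_mean_*` family first averages within each sequence and then across sequences. The function names and masking convention are illustrative, not taken from the training code.

```python
import torch

def token_mean_metrics(pred, target, mask):
    # Pool the error over every non-padding token in the eval set at once.
    m = mask.bool()
    err = pred[m] - target[m]
    mae = err.abs().mean()
    rmse = err.pow(2).mean().sqrt()
    relerr = (err.abs() / target[m].abs().clamp_min(1e-8)).mean()
    return mae, rmse, relerr

def token_mean_seq_mean_metrics(pred, target, mask):
    # Average within each sequence first, then across sequences.
    m = mask.bool()
    err = (pred - target) * m
    denom = m.sum(dim=1).clamp_min(1)
    mae = (err.abs().sum(dim=1) / denom).mean()
    rmse = ((err.pow(2).sum(dim=1) / denom).sqrt()).mean()
    rel = err.abs() / target.abs().clamp_min(1e-8) * m
    relerr = (rel.sum(dim=1) / denom).mean()
    return mae, rmse, relerr
```

Under this convention the two families diverge whenever sequence lengths vary: long sequences dominate the flat token pool but count only once in the per-sequence average, which would explain why the paired metrics differ.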

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 1024
  • total_eval_batch_size: 64
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 2.0
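
As a rough translation, the list above corresponds to a transformers TrainingArguments configuration like the sketch below. The output_dir is hypothetical, bf16=True is inferred from the checkpoint's tensor type, and the effective batch size works out to 8 per device × 8 devices × 16 accumulation steps = 1024.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",                 # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,   # 8 x 8 GPUs x 16 = 1024 effective
    num_train_epochs=2.0,
    lr_scheduler_type="constant",
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=0,
    bf16=True,                        # inferred from the bf16 tensor type
)
```

With a constant scheduler and no warmup listed, the learning rate stays at 2e-5 for the full two epochs.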

Training results

| Training Loss | Epoch  | Step | Validation Loss | Token Mean MAE  | Token Mean RMSE | Token Mean Seq Mean MAE | Token Mean Seq Mean RMSE | Token Mean RelErr | Token Mean Seq Mean RelErr |
|:-------------:|:------:|:----:|:---------------:|:---------------:|:---------------:|:-----------------------:|:------------------------:|:-----------------:|:--------------------------:|
| 0.1091        | 0.0868 | 50   | 0.0079          | 2114620342.4301 | 975319.1954     | 74041.4268              | 3094.4201                | 0.4852            | 0.6094                     |
| 0.0833        | 0.1736 | 100  | 0.0056          | 2004614063.9349 | 967153.4426     | 70343.2373              | 2846.9168                | 0.3504            | 0.4287                     |
| 0.0808        | 0.2605 | 150  | 0.0052          | 1844904106.9341 | 901717.9613     | 64873.9665              | 2640.0881                | 0.3428            | 0.3878                     |
| 0.0779        | 0.3473 | 200  | 0.0049          | 1816613973.2032 | 896552.7070     | 63922.2482              | 2590.4385                | 0.3288            | 0.3776                     |
| 0.0752        | 0.4341 | 250  | 0.0049          | 1734559138.3595 | 852339.8631     | 61078.8151              | 2501.3052                | 0.3350            | 0.4076                     |
| 0.0767        | 0.5209 | 300  | 0.0049          | 1777580395.4504 | 877835.4934     | 62422.9097              | 2532.6624                | 0.3442            | 0.4035                     |
| 0.0752        | 0.6078 | 350  | 0.0047          | 1747704572.4626 | 869533.9426     | 61325.1604              | 2509.6978                | 0.3072            | 0.3447                     |
| 0.0716        | 0.6946 | 400  | 0.0046          | 1740305536.6010 | 864221.9411     | 61402.4703              | 2499.7285                | 0.3160            | 0.3630                     |
| 0.0693        | 0.7814 | 450  | 0.0046          | 1777698185.3464 | 889635.5166     | 62513.6725              | 2526.2048                | 0.3031            | 0.3338                     |
| 0.0700        | 0.8682 | 500  | 0.0046          | 1742391892.7631 | 865465.9366     | 61463.7000              | 2489.8043                | 0.3309            | 0.3780                     |
| 0.0741        | 0.9551 | 550  | 0.0045          | 1729399670.6567 | 861427.8125     | 60807.6753              | 2469.3901                | 0.2963            | 0.3366                     |
| 0.0563        | 1.0417 | 600  | 0.0045          | 1745160175.3986 | 868709.1215     | 61411.6929              | 2490.3916                | 0.2906            | 0.3243                     |
| 0.0574        | 1.1285 | 650  | 0.0045          | 1733981739.0859 | 861189.7798     | 61028.6462              | 2472.3316                | 0.2978            | 0.3285                     |
| 0.0597        | 1.2153 | 700  | 0.0046          | 1746525720.9338 | 866094.1289     | 61438.0043              | 2493.2958                | 0.2862            | 0.3191                     |
| 0.0592        | 1.3021 | 750  | 0.0047          | 1807411461.6055 | 890670.7002     | 63807.7708              | 2558.2878                | 0.2874            | 0.3133                     |
| 0.0592        | 1.3890 | 800  | 0.0046          | 1744567760.4542 | 857970.4265     | 61819.2170              | 2506.4203                | 0.2826            | 0.3069                     |
| 0.0571        | 1.4758 | 850  | 0.0045          | 1698453853.9144 | 843351.3908     | 59941.4237              | 2436.3356                | 0.3002            | 0.3399                     |
| 0.0583        | 1.5626 | 900  | 0.0045          | 1674186696.0527 | 830514.9445     | 59018.2140              | 2414.6785                | 0.3086            | 0.3615                     |
| 0.0583        | 1.6494 | 950  | 0.0046          | 1751055790.9167 | 865419.9649     | 61876.0072              | 2498.4548                | 0.2861            | 0.3155                     |
| 0.0644        | 1.7363 | 1000 | 0.0044          | 1667831139.5895 | 815780.6399     | 59169.6027              | 2423.3030                | 0.3005            | 0.3342                     |
| 0.0655        | 1.8231 | 1050 | 0.0044          | 1711587558.1778 | 854067.8211     | 60262.8130              | 2440.6906                | 0.2958            | 0.3278                     |
| 0.0623        | 1.9099 | 1100 | 0.0044          | 1698326682.4013 | 855002.2224     | 59755.1559              | 2422.2892                | 0.2910            | 0.3298                     |
| 0.0564        | 1.9967 | 1150 | 0.0044          | 1705119562.8685 | 854411.8960     | 60067.3141              | 2438.6407                | 0.2901            | 0.3296                     |

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.2
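
A minimal loading sketch with the frameworks above. The repo id is taken from the model's Hub page; since the regression-style eval metrics hint at a task-specific head, treating this as a plain AutoModel checkpoint is an assumption.

```python
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "namezz/lvm-a-qwen2.5-7b-instruct-b-qwen2.5-1.5b-instruct"  # from the Hub page
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# The regression-style metrics suggest a custom head, so AutoModel is a guess;
# the repo may instead require its own class via trust_remote_code=True.
model = AutoModel.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
model.eval()
```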