Length Value Model
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the 7b_math_95k_2_train, 7b_code_100k_2_train, and 7b_instruction_100k_2_train datasets. It achieves the following results on the evaluation set:
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
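Since the card gives no usage snippet, here is a minimal loading sketch using the Hugging Face `transformers` library. The repository id is a placeholder, and it assumes the checkpoint keeps the base causal-LM architecture of Qwen/Qwen2.5-1.5B-Instruct; if the fine-tune added a custom length-regression head, a custom model class would be needed instead.

```python
# Hypothetical usage sketch. MODEL_ID is a placeholder, not the real repo id,
# and loading as a causal LM is an assumption about the checkpoint's
# architecture (a custom value/regression head would require its own class).
MODEL_ID = "your-org/length-value-model"  # placeholder repository id

def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model from the Hub with standard transformers APIs."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
```

The import is deferred into the function so the module can be inspected without `transformers` installed.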
Training hyperparameters

The following hyperparameters were used during training:

Training results
| Training Loss | Epoch | Step | Validation Loss | Token Mean MAE | Token Mean RMSE | Token Mean Seq Mean MAE | Token Mean Seq Mean RMSE | Token Mean RelErr | Token Mean Seq Mean RelErr |
|---|---|---|---|---|---|---|---|---|---|
| 0.1091 | 0.0868 | 50 | 0.0079 | 2114620342.4301 | 975319.1954 | 74041.4268 | 3094.4201 | 0.4852 | 0.6094 |
| 0.0833 | 0.1736 | 100 | 0.0056 | 2004614063.9349 | 967153.4426 | 70343.2373 | 2846.9168 | 0.3504 | 0.4287 |
| 0.0808 | 0.2605 | 150 | 0.0052 | 1844904106.9341 | 901717.9613 | 64873.9665 | 2640.0881 | 0.3428 | 0.3878 |
| 0.0779 | 0.3473 | 200 | 0.0049 | 1816613973.2032 | 896552.7070 | 63922.2482 | 2590.4385 | 0.3288 | 0.3776 |
| 0.0752 | 0.4341 | 250 | 0.0049 | 1734559138.3595 | 852339.8631 | 61078.8151 | 2501.3052 | 0.3350 | 0.4076 |
| 0.0767 | 0.5209 | 300 | 0.0049 | 1777580395.4504 | 877835.4934 | 62422.9097 | 2532.6624 | 0.3442 | 0.4035 |
| 0.0752 | 0.6078 | 350 | 0.0047 | 1747704572.4626 | 869533.9426 | 61325.1604 | 2509.6978 | 0.3072 | 0.3447 |
| 0.0716 | 0.6946 | 400 | 0.0046 | 1740305536.6010 | 864221.9411 | 61402.4703 | 2499.7285 | 0.3160 | 0.3630 |
| 0.0693 | 0.7814 | 450 | 0.0046 | 1777698185.3464 | 889635.5166 | 62513.6725 | 2526.2048 | 0.3031 | 0.3338 |
| 0.0700 | 0.8682 | 500 | 0.0046 | 1742391892.7631 | 865465.9366 | 61463.7000 | 2489.8043 | 0.3309 | 0.3780 |
| 0.0741 | 0.9551 | 550 | 0.0045 | 1729399670.6567 | 861427.8125 | 60807.6753 | 2469.3901 | 0.2963 | 0.3366 |
| 0.0563 | 1.0417 | 600 | 0.0045 | 1745160175.3986 | 868709.1215 | 61411.6929 | 2490.3916 | 0.2906 | 0.3243 |
| 0.0574 | 1.1285 | 650 | 0.0045 | 1733981739.0859 | 861189.7798 | 61028.6462 | 2472.3316 | 0.2978 | 0.3285 |
| 0.0597 | 1.2153 | 700 | 0.0046 | 1746525720.9338 | 866094.1289 | 61438.0043 | 2493.2958 | 0.2862 | 0.3191 |
| 0.0592 | 1.3021 | 750 | 0.0047 | 1807411461.6055 | 890670.7002 | 63807.7708 | 2558.2878 | 0.2874 | 0.3133 |
| 0.0592 | 1.3890 | 800 | 0.0046 | 1744567760.4542 | 857970.4265 | 61819.2170 | 2506.4203 | 0.2826 | 0.3069 |
| 0.0571 | 1.4758 | 850 | 0.0045 | 1698453853.9144 | 843351.3908 | 59941.4237 | 2436.3356 | 0.3002 | 0.3399 |
| 0.0583 | 1.5626 | 900 | 0.0045 | 1674186696.0527 | 830514.9445 | 59018.2140 | 2414.6785 | 0.3086 | 0.3615 |
| 0.0583 | 1.6494 | 950 | 0.0046 | 1751055790.9167 | 865419.9649 | 61876.0072 | 2498.4548 | 0.2861 | 0.3155 |
| 0.0644 | 1.7363 | 1000 | 0.0044 | 1667831139.5895 | 815780.6399 | 59169.6027 | 2423.3030 | 0.3005 | 0.3342 |
| 0.0655 | 1.8231 | 1050 | 0.0044 | 1711587558.1778 | 854067.8211 | 60262.8130 | 2440.6906 | 0.2958 | 0.3278 |
| 0.0623 | 1.9099 | 1100 | 0.0044 | 1698326682.4013 | 855002.2224 | 59755.1559 | 2422.2892 | 0.2910 | 0.3298 |
| 0.0564 | 1.9967 | 1150 | 0.0044 | 1705119562.8685 | 854411.8960 | 60067.3141 | 2438.6407 | 0.2901 | 0.3296 |
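The MAE, RMSE, and relative-error columns in the table can be illustrated with a small sketch. The card does not state the exact formulas the training script logs, so the per-token definitions below (and the idea that the "Seq Mean" variants aggregate within each sequence before averaging) are assumptions.

```python
# Assumed metric definitions for illustration only; the training script's
# actual formulas (including any sequence-level aggregation) are not given
# in the card.
import math

def mae(pred, true):
    """Mean absolute error over paired values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

def rmse(pred, true):
    """Root mean squared error over paired values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))

def rel_err(pred, true, eps=1e-8):
    """Mean relative error |p - t| / (|t| + eps)."""
    return sum(abs(p - t) / (abs(t) + eps) for p, t in zip(pred, true)) / len(pred)

# Toy example: predicted vs. true length values for five tokens.
pred = [100.0, 80.0, 60.0, 40.0, 20.0]
true = [110.0, 75.0, 55.0, 45.0, 10.0]

print(mae(pred, true))      # → 7.0
print(rmse(pred, true))     # ≈ 7.416
print(rel_err(pred, true))  # ≈ 0.272
```

Under these definitions RMSE is always at least MAE, which suggests the "Token Mean MAE" column (values around 1.7e9 against RMSE values around 8.5e5) is likely logging a squared-error or otherwise differently scaled quantity.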