# Length Value Model
This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) on the 7b_math_95k_2_train, 7b_code_100k_2_train, and 7b_instruction_100k_2_train datasets. It achieves the following results on the evaluation set (final evaluation in the table below, step 1150):

- Loss: 0.0049
- Token Mean MAE: 1786144123.4731
- Token Mean RMSE: 887134.9846
- Token Mean Seq Mean MAE: 62914.9836
- Token Mean Seq Mean RMSE: 2544.2375
- Token Mean Relerr: 0.3065
- Token Mean Seq Mean Relerr: 0.3502

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

More information needed

### Training results
| Training Loss | Epoch | Step | Validation Loss | Token Mean MAE | Token Mean RMSE | Token Mean Seq Mean MAE | Token Mean Seq Mean RMSE | Token Mean Relerr | Token Mean Seq Mean Relerr |
|---|---|---|---|---|---|---|---|---|---|
| 0.1178 | 0.0868 | 50 | 0.0073 | 2263844398.6678 | 1071622.5521 | 79472.7089 | 3176.6721 | 0.3927 | 0.4645 |
| 0.1140 | 0.1736 | 100 | 0.0066 | 2078902431.7301 | 981870.9478 | 73235.4770 | 2988.0135 | 0.4147 | 0.4863 |
| 0.0937 | 0.2605 | 150 | 0.0061 | 2030736403.4614 | 974900.0266 | 71506.6388 | 2864.0524 | 0.3395 | 0.3890 |
| 0.1064 | 0.3473 | 200 | 0.0076 | 2311332148.9392 | 1065853.0814 | 82016.4487 | 3206.0134 | 0.3438 | 0.3569 |
| 0.1146 | 0.4341 | 250 | 0.0073 | 2030651502.6936 | 890259.9017 | 72395.3084 | 3097.9291 | 0.3794 | 0.4232 |
| 0.0895 | 0.5209 | 300 | 0.0056 | 1913583638.1415 | 935627.5738 | 66940.5910 | 2692.9061 | 0.3497 | 0.4239 |
| 0.0919 | 0.6078 | 350 | 0.0056 | 1869855950.8649 | 903558.4060 | 65857.3854 | 2683.5201 | 0.3439 | 0.4064 |
| 0.0980 | 0.6946 | 400 | 0.0058 | 1993659656.9446 | 964020.7070 | 70012.4340 | 2799.8679 | 0.3267 | 0.3565 |
| 0.0778 | 0.7814 | 450 | 0.0054 | 1832813236.2907 | 895473.6447 | 64531.9670 | 2623.8486 | 0.3453 | 0.4043 |
| 0.0831 | 0.8682 | 500 | 0.0052 | 1852278254.4045 | 907735.9972 | 65360.6544 | 2642.9237 | 0.3374 | 0.3840 |
| 0.1004 | 0.9551 | 550 | 0.0057 | 1804696164.7212 | 853874.5617 | 63857.3133 | 2630.1284 | 0.3695 | 0.4562 |
| 0.0670 | 1.0417 | 600 | 0.0052 | 1870502953.7651 | 916758.2446 | 65962.7757 | 2642.5650 | 0.3161 | 0.3670 |
| 0.0700 | 1.1285 | 650 | 0.0053 | 1789147480.0786 | 874791.0081 | 63005.9924 | 2562.7974 | 0.3317 | 0.3924 |
| 0.0690 | 1.2153 | 700 | 0.0051 | 1773889181.7713 | 877113.2014 | 62382.9228 | 2540.5305 | 0.3289 | 0.3934 |
| 0.0674 | 1.3021 | 750 | 0.0051 | 1835795742.4188 | 897972.6088 | 64946.3321 | 2609.7523 | 0.3172 | 0.3611 |
| 0.0696 | 1.3890 | 800 | 0.0051 | 1844006975.0281 | 889972.6433 | 65618.5613 | 2648.3364 | 0.3000 | 0.3355 |
| 0.0654 | 1.4758 | 850 | 0.0051 | 1834534190.5460 | 901626.3717 | 64754.8425 | 2605.7707 | 0.3115 | 0.3549 |
| 0.0651 | 1.5626 | 900 | 0.0050 | 1768793425.0901 | 863479.9880 | 62609.2571 | 2544.9282 | 0.3288 | 0.3813 |
| 0.0695 | 1.6494 | 950 | 0.0051 | 1866776729.5138 | 913800.9463 | 65991.4996 | 2644.0386 | 0.3100 | 0.3469 |
| 0.0719 | 1.7363 | 1000 | 0.0050 | 1769232935.6183 | 855875.1185 | 62915.1637 | 2559.8121 | 0.3055 | 0.3313 |
| 0.0746 | 1.8231 | 1050 | 0.0050 | 1895920727.6612 | 929899.4862 | 66864.5839 | 2668.0359 | 0.3043 | 0.3296 |
| 0.0724 | 1.9099 | 1100 | 0.0049 | 1768969140.4923 | 880428.9307 | 62285.4477 | 2516.9485 | 0.3102 | 0.3525 |
| 0.0699 | 1.9967 | 1150 | 0.0049 | 1786144123.4731 | 887134.9846 | 62914.9836 | 2544.2375 | 0.3065 | 0.3502 |
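The card does not define the table's metrics. The column names suggest token-level absolute, squared, and relative errors on predicted length values, averaged either over all tokens ("Token Mean") or per sequence first and then across sequences ("Seq Mean"). The sketch below shows how such metrics are typically computed; the exact definitions used here, and the function name `length_metrics`, are assumptions, not this repo's actual evaluation code.

```python
import math

def length_metrics(preds, targets):
    """Token-level MAE, RMSE, and relative error for per-token length predictions.

    `preds` and `targets` are lists of per-sequence lists of numbers.
    "token mean": average over every token in the batch;
    "seq mean":   average within each sequence first, then across sequences.
    These definitions are an assumption; the card does not spell them out.
    """
    abs_errs, sq_errs, rel_errs = [], [], []
    seq_mae, seq_rmse = [], []
    for p_seq, t_seq in zip(preds, targets):
        seq_abs = [abs(p - t) for p, t in zip(p_seq, t_seq)]
        seq_sq = [(p - t) ** 2 for p, t in zip(p_seq, t_seq)]
        abs_errs += seq_abs
        sq_errs += seq_sq
        # Relative error, guarded against zero-valued targets.
        rel_errs += [abs(p - t) / max(abs(t), 1e-8) for p, t in zip(p_seq, t_seq)]
        seq_mae.append(sum(seq_abs) / len(seq_abs))
        seq_rmse.append(math.sqrt(sum(seq_sq) / len(seq_sq)))
    return {
        "token_mean_mae": sum(abs_errs) / len(abs_errs),
        "token_mean_rmse": math.sqrt(sum(sq_errs) / len(sq_errs)),
        "token_mean_relerr": sum(rel_errs) / len(rel_errs),
        "token_mean_seq_mean_mae": sum(seq_mae) / len(seq_mae),
        "token_mean_seq_mean_rmse": sum(seq_rmse) / len(seq_rmse),
    }
```

Under these definitions, the large gap between the "Token Mean" and "Seq Mean" columns in the table would indicate that a small number of tokens contribute very large absolute errors, which a per-sequence average dampens.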