abs-bf16-7b-math-code-instruction-lr2e-5-g0.997-l1.0-gpu8-bs8-ga16-ep2-wu0-cut3000

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct on the 7b_math_95k_2_train, 7b_code_100k_2_train, and 7b_instruction_100k_2_train datasets. It achieves the following results on the evaluation set (one plausible reading of the aggregated metric names is sketched after the list):

  • Loss: 0.0051
  • Token Mean Mae: 1755490844.4616
  • Token Mean Rmse: 860741.0284
  • Token Mean Seq Mean Mae: 61985.1293
  • Token Mean Seq Mean Rmse: 2517.6665
  • Token Mean Relerr: 0.3471
  • Token Mean Seq Mean Relerr: 0.4281
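
The metric names suggest two aggregation levels: the "Token Mean" metrics pool every token across the evaluation set before averaging, while the "Seq Mean" variants average within each sequence first, so a short sequence weighs as much as a long one. The eval script's exact definitions are not documented here; the NumPy sketch below shows this reading with placeholder arrays.

```python
import numpy as np

# Placeholder per-sequence predictions and targets; the model card does not
# document the eval script's exact metric definitions, so this is one
# plausible reading of the metric names.
preds   = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]
targets = [np.array([1.5, 1.0, 3.5]), np.array([3.0, 7.0])]

abs_err = [np.abs(p - t) for p, t in zip(preds, targets)]

# "Token Mean": pool every token across all sequences, then average.
pooled = np.concatenate(abs_err)
token_mean_mae  = pooled.mean()
token_mean_rmse = np.sqrt((pooled ** 2).mean())

# "Seq Mean": average within each sequence first, then across sequences,
# so a short sequence counts as much as a long one.
seq_mean_mae  = np.mean([e.mean() for e in abs_err])
seq_mean_rmse = np.mean([np.sqrt((e ** 2).mean()) for e in abs_err])

# "Relerr": absolute error normalized by target magnitude (assumed form).
rel = [e / np.maximum(np.abs(t), 1e-8) for e, t in zip(abs_err, targets)]
token_mean_relerr          = np.concatenate(rel).mean()
token_mean_seq_mean_relerr = np.mean([r.mean() for r in rel])
```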

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a matching TrainingArguments sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 1024
  • total_eval_batch_size: 64
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 2.0
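
The per-device and accumulation settings multiply out to the reported totals: 8 per device × 8 GPUs × 16 accumulation steps = 1024 for training, and 8 × 8 = 64 for evaluation. A minimal sketch of the same configuration as transformers TrainingArguments follows; warmup_steps=0 and bf16=True are inferred from the "wu0" and "bf16" tags in the run name rather than stated above, and output_dir is a placeholder.

```python
from transformers import TrainingArguments

# Sketch of the reported configuration, launched across 8 processes
# (e.g. torchrun --nproc_per_node=8). Effective batch sizes:
#   train: 8 per device * 8 GPUs * 16 accumulation steps = 1024
#   eval:  8 per device * 8 GPUs                          = 64
args = TrainingArguments(
    output_dir="out",                  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,
    seed=0,
    optim="adamw_torch_fused",         # AdamW, fused torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=2.0,
    warmup_steps=0,                    # inferred from the "wu0" tag in the run name
    bf16=True,                         # inferred from the "bf16" tag in the run name
)
```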

Training results

| Training Loss | Epoch  | Step | Validation Loss | Token Mean Mae  | Token Mean Rmse | Token Mean Seq Mean Mae | Token Mean Seq Mean Rmse | Token Mean Relerr | Token Mean Seq Mean Relerr |
|--------------:|-------:|-----:|----------------:|----------------:|----------------:|------------------------:|-------------------------:|------------------:|---------------------------:|
| 0.1178        | 0.0868 | 50   | 0.0073          | 2263844398.6678 | 1071622.5521    | 79472.7089              | 3176.6721                | 0.3927            | 0.4645                     |
| 0.1140        | 0.1736 | 100  | 0.0066          | 2078902431.7301 | 981870.9478     | 73235.4770              | 2988.0135                | 0.4147            | 0.4863                     |
| 0.0937        | 0.2605 | 150  | 0.0061          | 2030736403.4614 | 974900.0266     | 71506.6388              | 2864.0524                | 0.3395            | 0.3890                     |
| 0.1064        | 0.3473 | 200  | 0.0076          | 2311332148.9392 | 1065853.0814    | 82016.4487              | 3206.0134                | 0.3438            | 0.3569                     |
| 0.1146        | 0.4341 | 250  | 0.0073          | 2030651502.6936 | 890259.9017     | 72395.3084              | 3097.9291                | 0.3794            | 0.4232                     |
| 0.0895        | 0.5209 | 300  | 0.0056          | 1913583638.1415 | 935627.5738     | 66940.5910              | 2692.9061                | 0.3497            | 0.4239                     |
| 0.0919        | 0.6078 | 350  | 0.0056          | 1869855950.8649 | 903558.4060     | 65857.3854              | 2683.5201                | 0.3439            | 0.4064                     |
| 0.0980        | 0.6946 | 400  | 0.0058          | 1993659656.9446 | 964020.7070     | 70012.4340              | 2799.8679                | 0.3267            | 0.3565                     |
| 0.0778        | 0.7814 | 450  | 0.0054          | 1832813236.2907 | 895473.6447     | 64531.9670              | 2623.8486                | 0.3453            | 0.4043                     |
| 0.0831        | 0.8682 | 500  | 0.0052          | 1852278254.4045 | 907735.9972     | 65360.6544              | 2642.9237                | 0.3374            | 0.3840                     |
| 0.1004        | 0.9551 | 550  | 0.0057          | 1804696164.7212 | 853874.5617     | 63857.3133              | 2630.1284                | 0.3695            | 0.4562                     |
| 0.0670        | 1.0417 | 600  | 0.0052          | 1870502953.7651 | 916758.2446     | 65962.7757              | 2642.5650                | 0.3161            | 0.3670                     |
| 0.0700        | 1.1285 | 650  | 0.0053          | 1789147480.0786 | 874791.0081     | 63005.9924              | 2562.7974                | 0.3317            | 0.3924                     |
| 0.0690        | 1.2153 | 700  | 0.0051          | 1773889181.7713 | 877113.2014     | 62382.9228              | 2540.5305                | 0.3289            | 0.3934                     |
| 0.0674        | 1.3021 | 750  | 0.0051          | 1835795742.4188 | 897972.6088     | 64946.3321              | 2609.7523                | 0.3172            | 0.3611                     |
| 0.0696        | 1.3890 | 800  | 0.0051          | 1844006975.0281 | 889972.6433     | 65618.5613              | 2648.3364                | 0.3000            | 0.3355                     |
| 0.0654        | 1.4758 | 850  | 0.0051          | 1834534190.5460 | 901626.3717     | 64754.8425              | 2605.7707                | 0.3115            | 0.3549                     |
| 0.0651        | 1.5626 | 900  | 0.0050          | 1768793425.0901 | 863479.9880     | 62609.2571              | 2544.9282                | 0.3288            | 0.3813                     |
| 0.0695        | 1.6494 | 950  | 0.0051          | 1866776729.5138 | 913800.9463     | 65991.4996              | 2644.0386                | 0.3100            | 0.3469                     |
| 0.0719        | 1.7363 | 1000 | 0.0050          | 1769232935.6183 | 855875.1185     | 62915.1637              | 2559.8121                | 0.3055            | 0.3313                     |
| 0.0746        | 1.8231 | 1050 | 0.0050          | 1895920727.6612 | 929899.4862     | 66864.5839              | 2668.0359                | 0.3043            | 0.3296                     |
| 0.0724        | 1.9099 | 1100 | 0.0049          | 1768969140.4923 | 880428.9307     | 62285.4477              | 2516.9485                | 0.3102            | 0.3525                     |
| 0.0699        | 1.9967 | 1150 | 0.0049          | 1786144123.4731 | 887134.9846     | 62914.9836              | 2544.2375                | 0.3065            | 0.3502                     |

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.10.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.2
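
For completeness, a minimal loading sketch under the versions listed above, assuming the checkpoint loads as a standard Qwen2.5-style causal LM from the repo id on the model page. The regression-style evaluation metrics hint that the model may carry a custom prediction head, in which case a custom model class (or trust_remote_code=True) would be needed instead.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the model page; the exact head/architecture is an assumption.
repo = "namezz/lvm-a-qwen2.5-7b-instruct-b-qwen2.5-0.5b-instruct"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)
```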