GSM8K-Binary_Llama-3.2-1B-rtd7v6w7

This model is a fine-tuned version of meta-llama/Llama-3.2-1B on an unknown dataset. On the evaluation set it achieves the following results, which match the epoch 6 (step 150) row of the training results:

  • Loss: 0.8309
  • Model Preparation Time: 0.0055
  • Mdl: 2966.8667
  • Accumulated Loss: 2056.4753
  • Correct Preds: 1984.0
  • Total Preds: 2475.0
  • Accuracy: 0.8016
  • Correct Gen Preds: 1758.0
  • Gen Accuracy: 0.7103
  • Correct Gen Preds 34192: 869.0
  • Correct Preds 34192: 1016.0
  • Total Labels 34192: 1196.0
  • Accuracy 34192: 0.8495
  • Gen Accuracy 34192: 0.7266
  • Correct Gen Preds 41568: 881.0
  • Correct Preds 41568: 968.0
  • Total Labels 41568: 1267.0
  • Accuracy 41568: 0.7640
  • Gen Accuracy 41568: 0.6953
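The reported metrics are mutually consistent. Each accuracy is a ratio of counts (for example, 1984 / 2475 = 0.8016). Accumulated Loss equals the mean evaluation loss times Total Preds (0.8309 × 2475 ≈ 2056.48). Mdl equals Accumulated Loss divided by ln 2 (2056.4753 / 0.6931 ≈ 2966.87), which makes it the same quantity expressed in bits if the loss is in nats. These relationships are inferred from the numbers on this card, not from the training code, which is not included. The following minimal Python check assumes that interpretation:

```python
import math

# Values copied from the evaluation results above (epoch 6 checkpoint).
loss = 0.8309            # mean evaluation loss; assumed to be in nats
total_preds = 2475
correct_preds = 1984
correct_gen_preds = 1758

# Per-label counts: label -> (correct gen preds, correct preds, total labels)
per_label = {
    34192: (869, 1016, 1196),
    41568: (881, 968, 1267),
}

def derived_metrics():
    """Recompute derived metrics under the relationships inferred above."""
    out = {
        "accuracy": correct_preds / total_preds,
        "gen_accuracy": correct_gen_preds / total_preds,
        # Assumption: accumulated loss = mean loss * number of predictions.
        "accumulated_loss": loss * total_preds,
    }
    # Assumption: Mdl is the accumulated loss converted from nats to bits.
    out["mdl"] = out["accumulated_loss"] / math.log(2)
    for label, (gen_ok, ok, n) in per_label.items():
        out[f"accuracy_{label}"] = ok / n
        out[f"gen_accuracy_{label}"] = gen_ok / n
    return out

for name, value in derived_metrics().items():
    print(f"{name}: {value:.4f}")
```

Because the card rounds the mean loss to four decimals, the recomputed Accumulated Loss and Mdl differ from the reported values only in the third decimal place.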

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 100
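With lr_scheduler_type cosine and lr_scheduler_warmup_ratio 0.01, the learning rate rises linearly from 0 to 2e-05 during warmup and then follows a half-cosine decay toward 0. The sketch below reimplements that shape in plain Python. It assumes the planned run is 2,500 optimizer steps (25 steps per epoch, as in the training results, times num_epochs 100) and that warmup lasts ceil(0.01 × 2500) = 25 steps. It follows the formula of the transformers helper get_cosine_schedule_with_warmup with its default half cycle, but it is an illustration rather than the library code. Under these assumptions, the training results, which end at step 650, cover only about the first quarter of the planned schedule.

```python
import math

PEAK_LR = 2e-05          # learning_rate
WARMUP_RATIO = 0.01      # lr_scheduler_warmup_ratio
STEPS_PER_EPOCH = 25     # assumption: read off the training results table
NUM_EPOCHS = 100         # num_epochs

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS            # 2500 planned steps
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 25 warmup steps

def lr_at(step: int) -> float:
    """Cosine schedule with linear warmup (half cosine, decaying to 0)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to PEAK_LR over the warmup steps.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup steps completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

for step in (0, 25, 150, 650, 2500):
    print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```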

Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Mdl | Accumulated Loss | Correct Preds | Total Preds | Accuracy | Correct Gen Preds | Gen Accuracy | Correct Gen Preds 34192 | Correct Preds 34192 | Total Labels 34192 | Accuracy 34192 | Gen Accuracy 34192 | Correct Gen Preds 41568 | Correct Preds 41568 | Total Labels 41568 | Accuracy 41568 | Gen Accuracy 41568 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.4656 | 0.0055 | 5233.1723 | 3627.3586 | 1196.0 | 2475.0 | 0.4832 | 1204.0 | 0.4865 | 1196.0 | 1196.0 | 1196.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1267.0 | 0.0 | 0.0 |
| 0.7154 | 1.0 | 25 | 0.6989 | 0.0055 | 2495.4759 | 1729.7321 | 1514.0 | 2475.0 | 0.6117 | 949.0 | 0.3834 | 661.0 | 1174.0 | 1196.0 | 0.9816 | 0.5527 | 280.0 | 340.0 | 1267.0 | 0.2684 | 0.2210 |
| 0.9619 | 2.0 | 50 | 0.7183 | 0.0055 | 2564.7178 | 1777.7269 | 1557.0 | 2475.0 | 0.6291 | 23.0 | 0.0093 | 0.0 | 338.0 | 1196.0 | 0.2826 | 0.0 | 15.0 | 1219.0 | 1267.0 | 0.9621 | 0.0118 |
| 0.7749 | 3.0 | 75 | 0.5842 | 0.0055 | 2086.0604 | 1445.9469 | 1857.0 | 2475.0 | 0.7503 | 260.0 | 0.1051 | 0.0 | 718.0 | 1196.0 | 0.6003 | 0.0 | 252.0 | 1139.0 | 1267.0 | 0.8990 | 0.1989 |
| 0.3418 | 4.0 | 100 | 1.0916 | 0.0055 | 3897.6253 | 2701.6280 | 1690.0 | 2475.0 | 0.6828 | 379.0 | 0.1531 | 272.0 | 1174.0 | 1196.0 | 0.9816 | 0.2274 | 99.0 | 516.0 | 1267.0 | 0.4073 | 0.0781 |
| 0.5389 | 5.0 | 125 | 0.7000 | 0.0055 | 2499.5846 | 1732.5800 | 1936.0 | 2475.0 | 0.7822 | 353.0 | 0.1426 | 115.0 | 1040.0 | 1196.0 | 0.8696 | 0.0962 | 231.0 | 896.0 | 1267.0 | 0.7072 | 0.1823 |
| 0.4228 | 6.0 | 150 | 0.8309 | 0.0055 | 2966.8667 | 2056.4753 | 1984.0 | 2475.0 | 0.8016 | 1758.0 | 0.7103 | 869.0 | 1016.0 | 1196.0 | 0.8495 | 0.7266 | 881.0 | 968.0 | 1267.0 | 0.7640 | 0.6953 |
| 0.0038 | 7.0 | 175 | 1.0896 | 0.0055 | 3890.7175 | 2696.8399 | 1907.0 | 2475.0 | 0.7705 | 1087.0 | 0.4392 | 596.0 | 1096.0 | 1196.0 | 0.9164 | 0.4983 | 484.0 | 811.0 | 1267.0 | 0.6401 | 0.3820 |
| 0.0096 | 8.0 | 200 | 1.1357 | 0.0055 | 4055.3565 | 2810.9589 | 1951.0 | 2475.0 | 0.7883 | 1898.0 | 0.7669 | 963.0 | 987.0 | 1196.0 | 0.8253 | 0.8052 | 927.0 | 964.0 | 1267.0 | 0.7609 | 0.7316 |
| 0.3927 | 9.0 | 225 | 1.4010 | 0.0055 | 5002.5226 | 3467.4844 | 1976.0 | 2475.0 | 0.7984 | 1937.0 | 0.7826 | 1017.0 | 1046.0 | 1196.0 | 0.8746 | 0.8503 | 913.0 | 930.0 | 1267.0 | 0.7340 | 0.7206 |
| 0.0001 | 10.0 | 250 | 1.2540 | 0.0055 | 4477.7630 | 3103.7488 | 1983.0 | 2475.0 | 0.8012 | 1946.0 | 0.7863 | 964.0 | 993.0 | 1196.0 | 0.8303 | 0.8060 | 975.0 | 990.0 | 1267.0 | 0.7814 | 0.7695 |
| 0.3922 | 11.0 | 275 | 1.3906 | 0.0055 | 4965.3047 | 3441.6870 | 1964.0 | 2475.0 | 0.7935 | 1932.0 | 0.7806 | 1029.0 | 1047.0 | 1196.0 | 0.8754 | 0.8604 | 896.0 | 917.0 | 1267.0 | 0.7238 | 0.7072 |
| 0.0 | 12.0 | 300 | 1.4206 | 0.0055 | 5072.5476 | 3516.0220 | 1966.0 | 2475.0 | 0.7943 | 1947.0 | 0.7867 | 1030.0 | 1046.0 | 1196.0 | 0.8746 | 0.8612 | 910.0 | 920.0 | 1267.0 | 0.7261 | 0.7182 |
| 0.3921 | 13.0 | 325 | 1.4252 | 0.0055 | 5089.0899 | 3527.4883 | 1967.0 | 2475.0 | 0.7947 | 1954.0 | 0.7895 | 1029.0 | 1041.0 | 1196.0 | 0.8704 | 0.8604 | 918.0 | 926.0 | 1267.0 | 0.7309 | 0.7245 |
| 0.0 | 14.0 | 350 | 1.4296 | 0.0055 | 5104.8061 | 3538.3819 | 1968.0 | 2475.0 | 0.7952 | 1956.0 | 0.7903 | 1027.0 | 1040.0 | 1196.0 | 0.8696 | 0.8587 | 922.0 | 928.0 | 1267.0 | 0.7324 | 0.7277 |
| 0.3921 | 15.0 | 375 | 1.4327 | 0.0055 | 5115.6967 | 3545.9308 | 1973.0 | 2475.0 | 0.7972 | 1964.0 | 0.7935 | 1029.0 | 1040.0 | 1196.0 | 0.8696 | 0.8604 | 928.0 | 933.0 | 1267.0 | 0.7364 | 0.7324 |
| 0.3921 | 16.0 | 400 | 1.4372 | 0.0055 | 5131.6445 | 3556.9849 | 1972.0 | 2475.0 | 0.7968 | 1961.0 | 0.7923 | 1028.0 | 1039.0 | 1196.0 | 0.8687 | 0.8595 | 926.0 | 933.0 | 1267.0 | 0.7364 | 0.7309 |
| 0.0 | 17.0 | 425 | 1.4412 | 0.0055 | 5146.1676 | 3567.0516 | 1972.0 | 2475.0 | 0.7968 | 1961.0 | 0.7923 | 1027.0 | 1038.0 | 1196.0 | 0.8679 | 0.8587 | 927.0 | 934.0 | 1267.0 | 0.7372 | 0.7316 |
| 0.7841 | 18.0 | 450 | 1.4461 | 0.0055 | 5163.7167 | 3579.2156 | 1970.0 | 2475.0 | 0.7960 | 1962.0 | 0.7927 | 1026.0 | 1037.0 | 1196.0 | 0.8671 | 0.8579 | 929.0 | 933.0 | 1267.0 | 0.7364 | 0.7332 |
| 0.0 | 19.0 | 475 | 1.4499 | 0.0055 | 5177.0033 | 3588.4253 | 1972.0 | 2475.0 | 0.7968 | 1966.0 | 0.7943 | 1027.0 | 1036.0 | 1196.0 | 0.8662 | 0.8587 | 932.0 | 936.0 | 1267.0 | 0.7388 | 0.7356 |
| 0.0 | 20.0 | 500 | 1.4508 | 0.0055 | 5180.3062 | 3590.7146 | 1971.0 | 2475.0 | 0.7964 | 1964.0 | 0.7935 | 1025.0 | 1035.0 | 1196.0 | 0.8654 | 0.8570 | 932.0 | 936.0 | 1267.0 | 0.7388 | 0.7356 |
| 0.3921 | 21.0 | 525 | 1.4534 | 0.0055 | 5189.7817 | 3597.2826 | 1974.0 | 2475.0 | 0.7976 | 1969.0 | 0.7956 | 1029.0 | 1038.0 | 1196.0 | 0.8679 | 0.8604 | 933.0 | 936.0 | 1267.0 | 0.7388 | 0.7364 |
| 0.0 | 22.0 | 550 | 1.4580 | 0.0055 | 5206.0889 | 3608.5858 | 1971.0 | 2475.0 | 0.7964 | 1964.0 | 0.7935 | 1028.0 | 1038.0 | 1196.0 | 0.8679 | 0.8595 | 929.0 | 933.0 | 1267.0 | 0.7364 | 0.7332 |
| 0.0 | 23.0 | 575 | 1.4600 | 0.0055 | 5213.0440 | 3613.4067 | 1975.0 | 2475.0 | 0.7980 | 1968.0 | 0.7952 | 1027.0 | 1037.0 | 1196.0 | 0.8671 | 0.8587 | 934.0 | 938.0 | 1267.0 | 0.7403 | 0.7372 |
| 0.0 | 24.0 | 600 | 1.4608 | 0.0055 | 5216.0428 | 3615.4854 | 1975.0 | 2475.0 | 0.7980 | 1970.0 | 0.7960 | 1027.0 | 1036.0 | 1196.0 | 0.8662 | 0.8587 | 936.0 | 939.0 | 1267.0 | 0.7411 | 0.7388 |
| 0.0 | 25.0 | 625 | 1.4642 | 0.0055 | 5228.0668 | 3623.8198 | 1973.0 | 2475.0 | 0.7972 | 1967.0 | 0.7947 | 1028.0 | 1037.0 | 1196.0 | 0.8671 | 0.8595 | 932.0 | 936.0 | 1267.0 | 0.7388 | 0.7356 |
| 0.0 | 26.0 | 650 | 1.4671 | 0.0055 | 5238.4672 | 3631.0288 | 1973.0 | 2475.0 | 0.7972 | 1969.0 | 0.7956 | 1025.0 | 1034.0 | 1196.0 | 0.8645 | 0.8570 | 937.0 | 939.0 | 1267.0 | 0.7411 | 0.7395 |
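The evaluation results at the top of the card match the epoch 6 row (step 150), which has the highest Accuracy in the table (0.8016). The lowest Validation Loss (0.5842) occurs earlier, at epoch 3. This pattern is consistent with the reported checkpoint having been chosen by accuracy rather than by loss, although the card does not state the selection criterion. The following minimal Python sketch compares the two criteria over the Epoch, Validation Loss, and Accuracy columns, copied from the table:

```python
# (epoch, validation loss, accuracy), copied from the training results table.
ROWS = [
    (0, 1.4656, 0.4832), (1, 0.6989, 0.6117), (2, 0.7183, 0.6291),
    (3, 0.5842, 0.7503), (4, 1.0916, 0.6828), (5, 0.7000, 0.7822),
    (6, 0.8309, 0.8016), (7, 1.0896, 0.7705), (8, 1.1357, 0.7883),
    (9, 1.4010, 0.7984), (10, 1.2540, 0.8012), (11, 1.3906, 0.7935),
    (12, 1.4206, 0.7943), (13, 1.4252, 0.7947), (14, 1.4296, 0.7952),
    (15, 1.4327, 0.7972), (16, 1.4372, 0.7968), (17, 1.4412, 0.7968),
    (18, 1.4461, 0.7960), (19, 1.4499, 0.7968), (20, 1.4508, 0.7964),
    (21, 1.4534, 0.7976), (22, 1.4580, 0.7964), (23, 1.4600, 0.7980),
    (24, 1.4608, 0.7980), (25, 1.4642, 0.7972), (26, 1.4671, 0.7972),
]

def best_by(rows, key_index, maximize):
    """Return the row with the best value in column key_index."""
    chooser = max if maximize else min
    return chooser(rows, key=lambda row: row[key_index])

best_acc = best_by(ROWS, 2, maximize=True)
best_loss = best_by(ROWS, 1, maximize=False)
print("best accuracy:", best_acc)             # (6, 0.8309, 0.8016)
print("lowest validation loss:", best_loss)   # (3, 0.5842, 0.7503)
```

After epoch 3, validation loss climbs to 1.4671 by epoch 26 while accuracy levels off near 0.797, so the two criteria select different checkpoints.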

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1

Safetensors

  • Model size: 1B params
  • Tensor type: BF16

Model tree

  • Repository: donoway/GSM8K-Binary_Llama-3.2-1B-rtd7v6w7
  • Fine-tuned from: meta-llama/Llama-3.2-1B