End of training

Browse files

- README.md          +102 -101
- model.safetensors    +1   -1
- tokenizer.json      +10   -1
- training_args.bin    +1   -1

README.md
CHANGED
@@ -15,7 +15,7 @@ should probably proofread and complete it, then remove this comment. -->

This model is a fine-tuned version of [EleutherAI/pythia-70m-deduped](https://huggingface.co/EleutherAI/pythia-70m-deduped) on an unknown dataset.
It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.7609

## Model description

@@ -35,8 +35,8 @@ More information needed

The following hyperparameters were used during training:
- learning_rate: 5e-05
-- train_batch_size:
-- eval_batch_size:
+- train_batch_size: 8
+- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine

@@ -46,104 +46,105 @@ The following hyperparameters were used during training:

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
-… (98 removed rows; their values are truncated in this view)
+| 1.1912 | 0.0100 | 62 | 1.2654 |
+| 1.1714 | 0.0200 | 124 | 1.1780 |
+| 1.0771 | 0.0301 | 186 | 1.1419 |
+| 1.0829 | 0.0401 | 248 | 1.1046 |
+| 1.0113 | 0.0501 | 310 | 1.0850 |
+| 1.152 | 0.0601 | 372 | 1.0701 |
+| 1.0895 | 0.0701 | 434 | 1.0544 |
+| 0.9123 | 0.0802 | 496 | 1.0484 |
+| 1.0489 | 0.0902 | 558 | 1.0214 |
+| 1.0312 | 0.1002 | 620 | 1.0252 |
+| 0.9756 | 0.1102 | 682 | 1.0020 |
+| 1.0125 | 0.1202 | 744 | 0.9940 |
+| 1.0581 | 0.1303 | 806 | 0.9862 |
+| 1.0726 | 0.1403 | 868 | 0.9809 |
+| 0.9963 | 0.1503 | 930 | 0.9830 |
+| 0.9309 | 0.1603 | 992 | 0.9653 |
+| 0.8858 | 0.1703 | 1054 | 0.9538 |
+| 1.1137 | 0.1803 | 1116 | 0.9472 |
+| 0.9024 | 0.1904 | 1178 | 0.9411 |
+| 0.9812 | 0.2004 | 1240 | 0.9396 |
+| 0.9916 | 0.2104 | 1302 | 0.9254 |
+| 0.9509 | 0.2204 | 1364 | 0.9334 |
+| 0.8848 | 0.2304 | 1426 | 0.9439 |
+| 0.8302 | 0.2405 | 1488 | 0.9175 |
+| 1.0111 | 0.2505 | 1550 | 0.9158 |
+| 1.0273 | 0.2605 | 1612 | 0.9182 |
+| 0.8968 | 0.2705 | 1674 | 0.9116 |
+| 0.8892 | 0.2805 | 1736 | 0.9098 |
+| 0.7539 | 0.2906 | 1798 | 0.8896 |
+| 0.811 | 0.3006 | 1860 | 0.8968 |
+| 0.928 | 0.3106 | 1922 | 0.8875 |
+| 0.8163 | 0.3206 | 1984 | 0.8821 |
+| 0.9202 | 0.3306 | 2046 | 0.8820 |
+| 1.0208 | 0.3407 | 2108 | 0.8811 |
+| 0.8297 | 0.3507 | 2170 | 0.8823 |
+| 0.8213 | 0.3607 | 2232 | 0.8736 |
+| 0.8324 | 0.3707 | 2294 | 0.8698 |
+| 0.7721 | 0.3807 | 2356 | 0.8735 |
+| 0.9504 | 0.3908 | 2418 | 0.8705 |
+| 0.858 | 0.4008 | 2480 | 0.8620 |
+| 0.8791 | 0.4108 | 2542 | 0.8540 |
+| 0.8411 | 0.4208 | 2604 | 0.8606 |
+| 0.8845 | 0.4308 | 2666 | 0.8496 |
+| 0.7752 | 0.4409 | 2728 | 0.8462 |
+| 0.8598 | 0.4509 | 2790 | 0.8481 |
+| 0.7935 | 0.4609 | 2852 | 0.8412 |
+| 0.7352 | 0.4709 | 2914 | 0.8392 |
+| 0.8153 | 0.4809 | 2976 | 0.8426 |
+| 0.7371 | 0.4910 | 3038 | 0.8332 |
+| 0.7136 | 0.5010 | 3100 | 0.8300 |
+| 0.9777 | 0.5110 | 3162 | 0.8294 |
+| 0.8336 | 0.5210 | 3224 | 0.8306 |
+| 0.7546 | 0.5310 | 3286 | 0.8234 |
+| 0.8436 | 0.5410 | 3348 | 0.8237 |
+| 0.9316 | 0.5511 | 3410 | 0.8224 |
+| 0.6996 | 0.5611 | 3472 | 0.8191 |
+| 0.7417 | 0.5711 | 3534 | 0.8146 |
+| 0.8528 | 0.5811 | 3596 | 0.8110 |
+| 0.6861 | 0.5911 | 3658 | 0.8095 |
+| 0.8401 | 0.6012 | 3720 | 0.8096 |
+| 0.7056 | 0.6112 | 3782 | 0.8080 |
+| 0.8643 | 0.6212 | 3844 | 0.8004 |
+| 0.7575 | 0.6312 | 3906 | 0.8018 |
+| 0.8133 | 0.6412 | 3968 | 0.8008 |
+| 0.8221 | 0.6513 | 4030 | 0.7940 |
+| 0.8004 | 0.6613 | 4092 | 0.7948 |
+| 0.7002 | 0.6713 | 4154 | 0.7984 |
+| 0.8425 | 0.6813 | 4216 | 0.7892 |
+| 0.6777 | 0.6913 | 4278 | 0.7876 |
+| 0.9178 | 0.7014 | 4340 | 0.7865 |
+| 0.787 | 0.7114 | 4402 | 0.7844 |
+| 0.6979 | 0.7214 | 4464 | 0.7829 |
+| 0.7954 | 0.7314 | 4526 | 0.7825 |
+| 0.7937 | 0.7414 | 4588 | 0.7792 |
+| 0.7849 | 0.7515 | 4650 | 0.7790 |
+| 0.7108 | 0.7615 | 4712 | 0.7782 |
+| 0.831 | 0.7715 | 4774 | 0.7768 |
+| 0.8242 | 0.7815 | 4836 | 0.7741 |
+| 0.7472 | 0.7915 | 4898 | 0.7731 |
+| 0.8171 | 0.8016 | 4960 | 0.7732 |
+| 0.7857 | 0.8116 | 5022 | 0.7702 |
+| 0.7925 | 0.8216 | 5084 | 0.7707 |
+| 0.7134 | 0.8316 | 5146 | 0.7680 |
+| 0.8401 | 0.8416 | 5208 | 0.7686 |
+| 0.6919 | 0.8516 | 5270 | 0.7679 |
+| 0.7689 | 0.8617 | 5332 | 0.7658 |
+| 0.7899 | 0.8717 | 5394 | 0.7645 |
+| 0.8457 | 0.8817 | 5456 | 0.7639 |
+| 0.7738 | 0.8917 | 5518 | 0.7635 |
+| 0.7943 | 0.9017 | 5580 | 0.7628 |
+| 0.756 | 0.9118 | 5642 | 0.7625 |
+| 0.8021 | 0.9218 | 5704 | 0.7619 |
+| 0.7325 | 0.9318 | 5766 | 0.7615 |
+| 0.7312 | 0.9418 | 5828 | 0.7613 |
+| 0.8255 | 0.9518 | 5890 | 0.7613 |
+| 0.794 | 0.9619 | 5952 | 0.7610 |
+| 0.7392 | 0.9719 | 6014 | 0.7609 |
+| 0.841 | 0.9819 | 6076 | 0.7609 |
+| 0.7018 | 0.9919 | 6138 | 0.7609 |


### Framework versions
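The hyperparameters and evaluation cadence in the updated card map onto a `transformers` training setup roughly like the sketch below. This is a reconstruction under stated assumptions, not the script that produced this commit: the fine-tuning dataset is not identified, the output directory name is invented, and the single epoch is inferred from the epoch column ending just below 1.0. For reference, the final evaluation loss of 0.7609 corresponds to a perplexity of about exp(0.7609) ≈ 2.14.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Base checkpoint named in the card; the fine-tuning dataset is unknown.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m-deduped")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m-deduped")

args = TrainingArguments(
    output_dir="pythia-70m-deduped-finetuned",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="cosine",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",  # called `evaluation_strategy` in older transformers releases
    eval_steps=62,          # matches the 62-step evaluation cadence in the table above
    num_train_epochs=1,     # assumption: the epoch column stops just below 1.0
)

# Dataset construction is omitted because the card does not name the dataset.
# trainer = Trainer(model=model, args=args, tokenizer=tokenizer,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```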
model.safetensors
CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:3152e02065b95cc734022a2ae121b2e6ae85fcd5f3c99c1583760b6cf7bffbfd
size 281715176
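The weight file is stored as a Git LFS pointer: the three lines above (`version`, `oid sha256:…`, `size`) describe the payload rather than containing it. After pulling the real weights (for example with `git lfs pull` or the Hub download client), the file can be checked against the recorded digest. A minimal sketch, assuming the downloaded file sits in the current directory:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so a ~280 MB weights file never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# oid recorded in the LFS pointer above
EXPECTED = "3152e02065b95cc734022a2ae121b2e6ae85fcd5f3c99c1583760b6cf7bffbfd"
assert sha256_of("model.safetensors") == EXPECTED, "downloaded weights do not match the LFS pointer"
```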
tokenizer.json
CHANGED

@@ -6,7 +6,16 @@
    "strategy": "LongestFirst",
    "stride": 0
  },
-  "padding":
+  "padding": {
+    "strategy": {
+      "Fixed": 128
+    },
+    "direction": "Right",
+    "pad_to_multiple_of": null,
+    "pad_id": 0,
+    "pad_type_id": 0,
+    "pad_token": "<|endoftext|>"
+  },
  "added_tokens": [
    {
      "id": 0,
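The added `padding` block fixes every encoded sequence to 128 tokens, padded on the right with `<|endoftext|>` (token id 0 in this tokenizer). One way to produce an equivalent block with the `tokenizers` library is sketched below; it illustrates the serialized settings and is not necessarily how this commit was generated:

```python
from tokenizers import Tokenizer

tok = Tokenizer.from_pretrained("EleutherAI/pythia-70m-deduped")
tok.enable_padding(
    direction="right",
    length=128,                  # serialized as {"strategy": {"Fixed": 128}}
    pad_id=0,
    pad_type_id=0,
    pad_token="<|endoftext|>",
    pad_to_multiple_of=None,
)
tok.save("tokenizer.json")       # writes a "padding" section like the one added above
```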
training_args.bin
CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:c695c5d72fafa83e8e41ba77f5436c18fbd9659321bd9566a8ae7929a7c280c7
size 5048
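`training_args.bin` is the pickled `TrainingArguments` object that `transformers.Trainer` writes alongside a checkpoint, so the exact configuration behind the card can be inspected directly. A minimal sketch, assuming the file was produced by `Trainer` and comes from a source you trust (unpickling runs arbitrary code):

```python
import torch

# The file is a pickled Python object, not a tensor archive, so weights_only=False
# is required on recent PyTorch versions.
args = torch.load("training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_train_batch_size, args.lr_scheduler_type)
```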