End of training

Browse files

- README.md          +102 -101
- model.safetensors    +1   -1
- tokenizer.json      +10   -1
- training_args.bin    +1   -1

README.md
CHANGED
@@ -15,7 +15,7 @@ should probably proofread and complete it, then remove this comment. -->

This model is a fine-tuned version of [EleutherAI/pythia-70m-deduped](https://huggingface.co/EleutherAI/pythia-70m-deduped) on an unknown dataset.
It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.7609

## Model description

@@ -35,8 +35,8 @@ More information needed

The following hyperparameters were used during training:
- learning_rate: 5e-05
-- train_batch_size:
-- eval_batch_size:
+- train_batch_size: 8
+- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine

@@ -46,104 +46,105 @@ The following hyperparameters were used during training:

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
-… (98 removed rows; their values are truncated in this view)
+| 1.1912 | 0.0100 | 62 | 1.2654 |
+| 1.1714 | 0.0200 | 124 | 1.1780 |
+| 1.0771 | 0.0301 | 186 | 1.1419 |
+| 1.0829 | 0.0401 | 248 | 1.1046 |
+| 1.0113 | 0.0501 | 310 | 1.0850 |
+| 1.152 | 0.0601 | 372 | 1.0701 |
+| 1.0895 | 0.0701 | 434 | 1.0544 |
+| 0.9123 | 0.0802 | 496 | 1.0484 |
+| 1.0489 | 0.0902 | 558 | 1.0214 |
+| 1.0312 | 0.1002 | 620 | 1.0252 |
+| 0.9756 | 0.1102 | 682 | 1.0020 |
+| 1.0125 | 0.1202 | 744 | 0.9940 |
+| 1.0581 | 0.1303 | 806 | 0.9862 |
+| 1.0726 | 0.1403 | 868 | 0.9809 |
+| 0.9963 | 0.1503 | 930 | 0.9830 |
+| 0.9309 | 0.1603 | 992 | 0.9653 |
+| 0.8858 | 0.1703 | 1054 | 0.9538 |
+| 1.1137 | 0.1803 | 1116 | 0.9472 |
+| 0.9024 | 0.1904 | 1178 | 0.9411 |
+| 0.9812 | 0.2004 | 1240 | 0.9396 |
+| 0.9916 | 0.2104 | 1302 | 0.9254 |
+| 0.9509 | 0.2204 | 1364 | 0.9334 |
+| 0.8848 | 0.2304 | 1426 | 0.9439 |
+| 0.8302 | 0.2405 | 1488 | 0.9175 |
+| 1.0111 | 0.2505 | 1550 | 0.9158 |
+| 1.0273 | 0.2605 | 1612 | 0.9182 |
+| 0.8968 | 0.2705 | 1674 | 0.9116 |
+| 0.8892 | 0.2805 | 1736 | 0.9098 |
+| 0.7539 | 0.2906 | 1798 | 0.8896 |
+| 0.811 | 0.3006 | 1860 | 0.8968 |
+| 0.928 | 0.3106 | 1922 | 0.8875 |
+| 0.8163 | 0.3206 | 1984 | 0.8821 |
+| 0.9202 | 0.3306 | 2046 | 0.8820 |
+| 1.0208 | 0.3407 | 2108 | 0.8811 |
+| 0.8297 | 0.3507 | 2170 | 0.8823 |
+| 0.8213 | 0.3607 | 2232 | 0.8736 |
+| 0.8324 | 0.3707 | 2294 | 0.8698 |
+| 0.7721 | 0.3807 | 2356 | 0.8735 |
+| 0.9504 | 0.3908 | 2418 | 0.8705 |
+| 0.858 | 0.4008 | 2480 | 0.8620 |
+| 0.8791 | 0.4108 | 2542 | 0.8540 |
+| 0.8411 | 0.4208 | 2604 | 0.8606 |
+| 0.8845 | 0.4308 | 2666 | 0.8496 |
+| 0.7752 | 0.4409 | 2728 | 0.8462 |
+| 0.8598 | 0.4509 | 2790 | 0.8481 |
+| 0.7935 | 0.4609 | 2852 | 0.8412 |
+| 0.7352 | 0.4709 | 2914 | 0.8392 |
+| 0.8153 | 0.4809 | 2976 | 0.8426 |
+| 0.7371 | 0.4910 | 3038 | 0.8332 |
+| 0.7136 | 0.5010 | 3100 | 0.8300 |
+| 0.9777 | 0.5110 | 3162 | 0.8294 |
+| 0.8336 | 0.5210 | 3224 | 0.8306 |
+| 0.7546 | 0.5310 | 3286 | 0.8234 |
+| 0.8436 | 0.5410 | 3348 | 0.8237 |
+| 0.9316 | 0.5511 | 3410 | 0.8224 |
+| 0.6996 | 0.5611 | 3472 | 0.8191 |
+| 0.7417 | 0.5711 | 3534 | 0.8146 |
+| 0.8528 | 0.5811 | 3596 | 0.8110 |
+| 0.6861 | 0.5911 | 3658 | 0.8095 |
+| 0.8401 | 0.6012 | 3720 | 0.8096 |
+| 0.7056 | 0.6112 | 3782 | 0.8080 |
+| 0.8643 | 0.6212 | 3844 | 0.8004 |
+| 0.7575 | 0.6312 | 3906 | 0.8018 |
+| 0.8133 | 0.6412 | 3968 | 0.8008 |
+| 0.8221 | 0.6513 | 4030 | 0.7940 |
+| 0.8004 | 0.6613 | 4092 | 0.7948 |
+| 0.7002 | 0.6713 | 4154 | 0.7984 |
+| 0.8425 | 0.6813 | 4216 | 0.7892 |
+| 0.6777 | 0.6913 | 4278 | 0.7876 |
+| 0.9178 | 0.7014 | 4340 | 0.7865 |
+| 0.787 | 0.7114 | 4402 | 0.7844 |
+| 0.6979 | 0.7214 | 4464 | 0.7829 |
+| 0.7954 | 0.7314 | 4526 | 0.7825 |
+| 0.7937 | 0.7414 | 4588 | 0.7792 |
+| 0.7849 | 0.7515 | 4650 | 0.7790 |
+| 0.7108 | 0.7615 | 4712 | 0.7782 |
+| 0.831 | 0.7715 | 4774 | 0.7768 |
+| 0.8242 | 0.7815 | 4836 | 0.7741 |
+| 0.7472 | 0.7915 | 4898 | 0.7731 |
+| 0.8171 | 0.8016 | 4960 | 0.7732 |
+| 0.7857 | 0.8116 | 5022 | 0.7702 |
+| 0.7925 | 0.8216 | 5084 | 0.7707 |
+| 0.7134 | 0.8316 | 5146 | 0.7680 |
+| 0.8401 | 0.8416 | 5208 | 0.7686 |
+| 0.6919 | 0.8516 | 5270 | 0.7679 |
+| 0.7689 | 0.8617 | 5332 | 0.7658 |
+| 0.7899 | 0.8717 | 5394 | 0.7645 |
+| 0.8457 | 0.8817 | 5456 | 0.7639 |
+| 0.7738 | 0.8917 | 5518 | 0.7635 |
+| 0.7943 | 0.9017 | 5580 | 0.7628 |
+| 0.756 | 0.9118 | 5642 | 0.7625 |
+| 0.8021 | 0.9218 | 5704 | 0.7619 |
+| 0.7325 | 0.9318 | 5766 | 0.7615 |
+| 0.7312 | 0.9418 | 5828 | 0.7613 |
+| 0.8255 | 0.9518 | 5890 | 0.7613 |
+| 0.794 | 0.9619 | 5952 | 0.7610 |
+| 0.7392 | 0.9719 | 6014 | 0.7609 |
+| 0.841 | 0.9819 | 6076 | 0.7609 |
+| 0.7018 | 0.9919 | 6138 | 0.7609 |


### Framework versions
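The hyperparameters and evaluation cadence in the updated card map onto a `transformers` training setup roughly like the sketch below. This is a reconstruction under stated assumptions, not the script that produced this commit: the fine-tuning dataset is not identified, the output directory name is invented, and the single epoch is inferred from the epoch column ending just below 1.0. For reference, the final evaluation loss of 0.7609 corresponds to a perplexity of about exp(0.7609) ≈ 2.14.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Base checkpoint named in the card; the fine-tuning dataset is unknown.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m-deduped")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m-deduped")

args = TrainingArguments(
    output_dir="pythia-70m-deduped-finetuned",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="cosine",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",  # called `evaluation_strategy` in older transformers releases
    eval_steps=62,          # matches the 62-step evaluation cadence in the table above
    num_train_epochs=1,     # assumption: the epoch column stops just below 1.0
)

# Dataset construction is omitted because the card does not name the dataset.
# trainer = Trainer(model=model, args=args, tokenizer=tokenizer,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```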
model.safetensors
CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:3152e02065b95cc734022a2ae121b2e6ae85fcd5f3c99c1583760b6cf7bffbfd
size 281715176
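The weight file is stored as a Git LFS pointer: the three lines above (`version`, `oid sha256:…`, `size`) describe the payload rather than containing it. After pulling the real weights (for example with `git lfs pull` or the Hub download client), the file can be checked against the recorded digest. A minimal sketch, assuming the downloaded file sits in the current directory:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so a ~280 MB weights file never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# oid recorded in the LFS pointer above
EXPECTED = "3152e02065b95cc734022a2ae121b2e6ae85fcd5f3c99c1583760b6cf7bffbfd"
assert sha256_of("model.safetensors") == EXPECTED, "downloaded weights do not match the LFS pointer"
```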
tokenizer.json
CHANGED

@@ -6,7 +6,16 @@
    "strategy": "LongestFirst",
    "stride": 0
  },
-  "padding":
+  "padding": {
+    "strategy": {
+      "Fixed": 128
+    },
+    "direction": "Right",
+    "pad_to_multiple_of": null,
+    "pad_id": 0,
+    "pad_type_id": 0,
+    "pad_token": "<|endoftext|>"
+  },
  "added_tokens": [
    {
      "id": 0,
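The added `padding` block fixes every encoded sequence to 128 tokens, padded on the right with `<|endoftext|>` (token id 0 in this tokenizer). One way to produce an equivalent block with the `tokenizers` library is sketched below; it illustrates the serialized settings and is not necessarily how this commit was generated:

```python
from tokenizers import Tokenizer

tok = Tokenizer.from_pretrained("EleutherAI/pythia-70m-deduped")
tok.enable_padding(
    direction="right",
    length=128,                  # serialized as {"strategy": {"Fixed": 128}}
    pad_id=0,
    pad_type_id=0,
    pad_token="<|endoftext|>",
    pad_to_multiple_of=None,
)
tok.save("tokenizer.json")       # writes a "padding" section like the one added above
```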
training_args.bin
CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:c695c5d72fafa83e8e41ba77f5436c18fbd9659321bd9566a8ae7929a7c280c7
size 5048
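`training_args.bin` is the pickled `TrainingArguments` object that `transformers.Trainer` writes alongside a checkpoint, so the exact configuration behind the card can be inspected directly. A minimal sketch, assuming the file was produced by `Trainer` and comes from a source you trust (unpickling runs arbitrary code):

```python
import torch

# The file is a pickled Python object, not a tensor archive, so weights_only=False
# is required on recent PyTorch versions.
args = torch.load("training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_train_batch_size, args.lr_scheduler_type)
```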