# bert_mn

This model is a fine-tuned version of [tergel/bert_mn](https://huggingface.co/tergel/bert_mn) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 2.2973
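The card does not state the training objective or pipeline task. Assuming the checkpoint carries a masked-language-modeling head (the usual setup when a BERT card reports only a cross-entropy loss) and that the repository id is `tergel/bert_mn` as in the line above, it can be loaded roughly like this:

```python
# Minimal loading sketch. AutoModelForMaskedLM and the repository id are
# assumptions; the card confirms neither. The name suggests a Mongolian BERT.
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_id = "tergel/bert_mn"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
# "Ulaanbaatar is Mongolia's [MASK] city." (Mongolian)
print(fill(f"Улаанбаатар бол Монголын {tokenizer.mask_token} хот."))
```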
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 8e-05
- train_batch_size: 256
- eval_batch_size: 512
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
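A hedged reconstruction of the configuration above, assuming the standard `transformers.Trainer` workflow that auto-generated cards of this form come from; everything except the numeric values is a placeholder:

```python
# TrainingArguments mirroring the listed hyperparameters. Only the numeric
# values come from the card; output_dir and the eval strategy are assumptions
# (the results table logs validation loss once per epoch).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert_mn",
    learning_rate=8e-5,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=512,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    eval_strategy="epoch",
)
```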
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 2.5775 | 1.0 | 577 | 2.5137 |
| 2.5628 | 2.0 | 1154 | 2.4950 |
| 2.5396 | 3.0 | 1731 | 2.4834 |
| 2.5172 | 4.0 | 2308 | 2.4643 |
| 2.4969 | 5.0 | 2885 | 2.4481 |
| 2.4754 | 6.0 | 3462 | 2.4253 |
| 2.4578 | 7.0 | 4039 | 2.4195 |
| 2.4403 | 8.0 | 4616 | 2.4009 |
| 2.4216 | 9.0 | 5193 | 2.3863 |
| 2.4075 | 10.0 | 5770 | 2.3794 |
| 2.3927 | 11.0 | 6347 | 2.3640 |
| 2.3787 | 12.0 | 6924 | 2.3516 |
| 2.3650 | 13.0 | 7501 | 2.3403 |
| 2.3546 | 14.0 | 8078 | 2.3344 |
| 2.3423 | 15.0 | 8655 | 2.3268 |
| 2.3336 | 16.0 | 9232 | 2.3171 |
| 2.3265 | 17.0 | 9809 | 2.3149 |
| 2.3168 | 18.0 | 10386 | 2.3049 |
| 2.3120 | 19.0 | 10963 | 2.3032 |
| 2.3047 | 20.0 | 11540 | 2.2973 |
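The card reports only cross-entropy loss; as a derived figure (not stated in the card), the final validation loss of 2.2973 corresponds to a perplexity of roughly exp(2.2973) ≈ 9.95:

```python
import math

# Perplexity is the exponential of the cross-entropy loss.
print(math.exp(2.2973))  # ~9.95
```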
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1