Update README.md
README.md
CHANGED
@@ -36,25 +36,6 @@ embeddings = model.encode(sentences)
print(embeddings)
```

- ## Model Summary
-
- - Fine-tuning method: Supervised SimCSE
- - Base model: [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3)
- - Training dataset: [JSNLI](https://nlp.ist.i.kyoto-u.ac.jp/?%E6%97%A5%E6%9C%AC%E8%AA%9ESNLI%28JSNLI%29%E3%83%87%E3%83%BC%E3%82%BF%E3%82%BB%E3%83%83%E3%83%88)
- - Pooling strategy: cls (with an extra MLP layer only during training)
- - Hidden size: 768
- - Learning rate: 5e-5
- - Batch size: 512
- - Temperature: 0.05
- - Max sequence length: 64
- - Number of training examples: 2^20
- - Validation interval (steps): 2^6
- - Warmup ratio: 0.1
- - Dtype: BFloat16
-
- See the [GitHub repository](https://github.com/hppRC/simple-simcse-ja) for a detailed experimental setup.
-
-
## Usage (HuggingFace Transformers)
Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.

@@ -96,6 +77,24 @@ SentenceTransformer(
)
```

+ ## Model Summary
+
+ - Fine-tuning method: Supervised SimCSE
+ - Base model: [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3)
+ - Training dataset: [JSNLI](https://nlp.ist.i.kyoto-u.ac.jp/?%E6%97%A5%E6%9C%AC%E8%AA%9ESNLI%28JSNLI%29%E3%83%87%E3%83%BC%E3%82%BF%E3%82%BB%E3%83%83%E3%83%88)
+ - Pooling strategy: cls (with an extra MLP layer only during training)
+ - Hidden size: 768
+ - Learning rate: 5e-5
+ - Batch size: 512
+ - Temperature: 0.05
+ - Max sequence length: 64
+ - Number of training examples: 2^20
+ - Validation interval (steps): 2^6
+ - Warmup ratio: 0.1
+ - Dtype: BFloat16
+
+ See the [GitHub repository](https://github.com/hppRC/simple-simcse-ja) for a detailed experimental setup.
+
## Citing & Authors

```
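
A note on the `## Usage (HuggingFace Transformers)` section that appears as context in the first hunk: the README's own code block is not part of this diff, but the described pattern (run the encoder, then pool the contextualized token embeddings) combined with the Model Summary's `cls` pooling entry suggests a sketch like the one below. The repository id `MODEL_ID` is a placeholder, not taken from this card, and the snippet is an illustration rather than the README's actual example.

```python
# Illustrative sketch only (assumed usage, not the README's own snippet):
# encode sentences with HuggingFace Transformers and take the [CLS] embedding,
# matching "Pooling strategy: cls" from the Model Summary.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "<this-model-repository-id>"  # placeholder; substitute the actual model id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

sentences = ["これは日本語の文です。", "これは別の例文です。"]

# Max sequence length 64 mirrors the training setting listed in the Model Summary.
batch = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# CLS pooling: the hidden state of the first token of each sequence.
embeddings = outputs.last_hidden_state[:, 0]
print(embeddings.shape)  # torch.Size([2, 768]), given the 768 hidden size
```

The extra MLP layer mentioned in the Model Summary applies only during training, so it is omitted here.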
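
Since the Model Summary pairs `Supervised SimCSE` with a temperature of 0.05 and a batch size of 512, a brief sketch of the standard supervised SimCSE objective may help readers interpret those numbers. The formula below is the generic formulation from the SimCSE literature, not something stated in this README, and treating entailment/contradiction pairs as positives and hard negatives is an assumption based on the usual NLI-based recipe.

```latex
% Generic supervised SimCSE objective (sketch; assumed, not stated in this card).
% h_i, h_i^{+}, h_i^{-}: embeddings of a premise, its entailment, and its contradiction.
% N: batch size (512 above), \tau: temperature (0.05 above), sim: cosine similarity.
\ell_i = -\log
  \frac{\exp\!\big(\mathrm{sim}(h_i, h_i^{+}) / \tau\big)}
       {\sum_{j=1}^{N} \Big( \exp\!\big(\mathrm{sim}(h_i, h_j^{+}) / \tau\big)
                           + \exp\!\big(\mathrm{sim}(h_i, h_j^{-}) / \tau\big) \Big)}
```

Under this formulation each sentence is contrasted against the other 2N - 1 = 1023 in-batch embeddings, which is why a relatively large batch size such as 512 matters.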