Qsevent77 committed on
Commit
306867d
·
verified ·
1 Parent(s): 072a865

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
37
+ *.original filter=lfs diff=lfs merge=lfs -text
38
+ onnx/model.onnx_data filter=lfs diff=lfs merge=lfs -text
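The three added patterns route the tokenizer file, any `*.original` file and the external ONNX weights through Git LFS. As a rough illustration of which repository paths they would catch, the sketch below approximates the matching with Python's `fnmatch`. Real gitattributes matching has additional rules, so this is a simplification, and `is_lfs_tracked` is a hypothetical helper name.

```python
from fnmatch import fnmatch
from posixpath import basename

# Patterns added to .gitattributes in this commit.
NEW_LFS_PATTERNS = ["tokenizer.json", "*.original", "onnx/model.onnx_data"]


def is_lfs_tracked(path: str, patterns=NEW_LFS_PATTERNS) -> bool:
    """Approximate check: patterns containing '/' are matched against the
    full repository-relative path, all others against the file name only."""
    for pattern in patterns:
        target = path if "/" in pattern else basename(path)
        if fnmatch(target, pattern):
            return True
    return False


for candidate in ["tokenizer.json", "onnx/model.onnx_data", "config.json"]:
    print(candidate, is_lfs_tracked(candidate))
```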
README.md ADDED
@@ -0,0 +1,491 @@
1
+ ---
2
+ base_model:
3
+ - jinaai/xlm-roberta-flash-implementation
4
+ language:
5
+ - multilingual
6
+ - af
7
+ - am
8
+ - ar
9
+ - as
10
+ - az
11
+ - be
12
+ - bg
13
+ - bn
14
+ - br
15
+ - bs
16
+ - ca
17
+ - cs
18
+ - cy
19
+ - da
20
+ - de
21
+ - el
22
+ - en
23
+ - eo
24
+ - es
25
+ - et
26
+ - eu
27
+ - fa
28
+ - fi
29
+ - fr
30
+ - fy
31
+ - ga
32
+ - gd
33
+ - gl
34
+ - gu
35
+ - ha
36
+ - he
37
+ - hi
38
+ - hr
39
+ - hu
40
+ - hy
41
+ - id
42
+ - is
43
+ - it
44
+ - ja
45
+ - jv
46
+ - ka
47
+ - kk
48
+ - km
49
+ - kn
50
+ - ko
51
+ - ku
52
+ - ky
53
+ - la
54
+ - lo
55
+ - lt
56
+ - lv
57
+ - mg
58
+ - mk
59
+ - ml
60
+ - mn
61
+ - mr
62
+ - ms
63
+ - my
64
+ - ne
65
+ - nl
66
+ - 'no'
67
+ - om
68
+ - or
69
+ - pa
70
+ - pl
71
+ - ps
72
+ - pt
73
+ - ro
74
+ - ru
75
+ - sa
76
+ - sd
77
+ - si
78
+ - sk
79
+ - sl
80
+ - so
81
+ - sq
82
+ - sr
83
+ - su
84
+ - sv
85
+ - sw
86
+ - ta
87
+ - te
88
+ - th
89
+ - tl
90
+ - tr
91
+ - ug
92
+ - uk
93
+ - ur
94
+ - uz
95
+ - vi
96
+ - xh
97
+ - yi
98
+ - zh
99
+ library_name: transformers
100
+ license: cc-by-nc-4.0
101
+ tags:
102
+ - xlm-roberta
103
+ - eva02
104
+ - clip
105
+ - feature-extraction
106
+ - sentence-similarity
107
+ - retrieval
108
+ - multimodal
109
+ - multi-modal
110
+ - crossmodal
111
+ - cross-modal
112
+ - mteb
113
+ - clip-benchmark
114
+ - vidore
115
+ - transformers
116
+ - sentence-transformers
117
+ - onnx
118
+ - safetensors
119
+ - transformers.js
120
+ inference: false
121
+ ---
122
+
123
+ <br><br>
124
+
125
+ <p align="center">
126
+ <img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
127
+ </p>
128
+
129
+
130
+ <p align="center">
131
+ <b>The embedding model trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
132
+ </p>
133
+
134
+ <p align="center">
135
+ <b>Jina CLIP v2: Multilingual Multimodal Embeddings for Texts and Images</b>
136
+ </p>
137
+
138
+ This model is based on the paper [jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images](https://huggingface.co/papers/2412.08802).
139
+
140
+ ## Quick Start
141
+
142
+ [Blog](https://jina.ai/news/jina-clip-v2-multilingual-multimodal-embeddings-for-text-and-images) | [Technical Report](https://arxiv.org/abs/2412.08802) | [Azure](https://azuremarketplace.microsoft.com/en-gb/marketplace/apps/jinaai.jina-clip-v2-vm?tab=Overview) | [AWS SageMaker](https://aws.amazon.com/marketplace/pp/prodview-bfbctuqmky676) | [Google Cloud Platform](https://console.cloud.google.com/marketplace/browse?hl=en&inv=1&invt=AbiD-g&q=jina) | [API](https://jina.ai/embeddings)
143
+
144
+
145
+ ## Intended Usage & Model Info
146
+
147
+ `jina-clip-v2` is a **general-purpose multilingual multimodal embedding model for text & images**.
148
+
149
+ Multimodal embeddings enable searching and understanding data across different modalities through a coherent representation. They serve as the backbone of neural information retrieval and multimodal GenAI applications.
150
+
151
+ Built upon [`jina-clip-v1`](https://huggingface.co/jinaai/jina-clip-v1) and our recently released [`jina-embeddings-v3`](https://huggingface.co/jinaai/jina-embeddings-v3), `jina-clip-v2` features several significant improvements:
152
+
153
+ * **Improved Performance**: v2 shows a 3% performance improvement over v1 in both text-image and text-text retrieval tasks. Similar to v1, v2's text encoder can serve as an effective multilingual long-context dense retriever. It performs on par with our frontier model `jina-embeddings-v3` (currently the best multilingual embeddings under 1B parameters on MTEB).
154
+ * **Multilingual Support**: Using the same backbone as `jina-embeddings-v3` for the text tower, `jina-clip-v2` supports 89 languages for multilingual-image retrieval, showing up to 4% improvement compared to `nllb-clip-large-siglip` on multilingual image retrieval tasks.
155
+ * **Higher Image Resolution**: v2 now supports 512x512 input image resolution, a significant increase from v1's 224x224. This higher resolution enables better processing of detailed images, improved feature extraction, and more accurate recognition of fine-grained visual elements.
156
+ * **Matryoshka Representations**: v2 allows users to truncate the output dimensions of both text and image embeddings from 1024 down to 64, reducing storage and processing overhead while maintaining strong performance.
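The Matryoshka bullet above describes truncating embeddings to fewer dimensions. A minimal sketch of that idea, assuming truncation means keeping the leading components and rescaling to unit length (whether the model's own `truncate_dim` option does exactly this internally is an assumption), with random vectors standing in for real embeddings and `truncate_and_normalize` as a hypothetical helper:

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and rescale to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


# Random stand-ins for two 1024-dimensional embeddings (not real model output).
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 1024)).astype(np.float32)

small = truncate_and_normalize(full, 64)
print(small.shape)  # (2, 64)
```

Re-normalizing after truncation keeps dot products interpretable as cosine similarities at the smaller size.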
157
+
158
+ Measuring 0.9B parameters, `jina-clip-v2` combines two powerful encoders:
159
+ * the text encoder `Jina-XLM-RoBERTa` (the backbone of `jina-embeddings-v3`) and
160
+ * the vision encoder `EVA02-L14` (an efficient vision Transformer developed by BAAI).
161
+
162
+ | FEATURE | TEXT ENCODER | IMAGE ENCODER |
163
+ |-----------------------|-------------------------|------------------|
164
+ | Base Model | Jina-XLM-RoBERTa | EVA02-L |
165
+ | Parameters | 561M | 304M |
166
+ | Input Specification | 8,192 tokens (max) | 512×512 pixels |
167
+ | Min Output Dimensions | 64 | 64 |
168
+ | Max Output Dimensions | 1,024 | 1,024 |
169
+ | Layers | 24 | 24 |
170
+ | Attention Mechanism | FlashAttention2 | xFormers |
171
+ | Pooling Strategy | Mean pooling | CLS pooling |
172
+ | Additional Features | 89 languages supported | Patch size 14x14 |
173
+
174
+
175
+ These encoders are jointly trained to create aligned representations of images and text.
176
+
177
+ CLIP-like models have established themselves as the backbone for general-purpose multimodal applications. With `jina-clip-v2`, we're taking these capabilities to the next level, breaking down language barriers to deliver more accurate cross-modal understanding and retrieval. We're confident this release delivers on the promise of making multimodal search and retrieval both more powerful and more accessible to developers worldwide.
178
+
179
+
180
+
181
+ ## Training, Data, Parameters
182
+
183
+ Please refer to our [technical report of jina-clip-v2](https://arxiv.org/abs/2412.08802) for the model and training details.
184
+
185
+ For the previous model version, see the [technical report of jina-clip-v1](https://arxiv.org/abs/2405.20204).
186
+
187
+ ## Faster Inference: FA2, xFormers and bf16
188
+
189
+ In a CUDA-enabled PyTorch environment, the model is loaded in `torch.bfloat16`
190
+ precision by default. It is highly recommended to install
191
+ [FlashAttention](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features)
192
+ and [xFormers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers)
193
+ to make use of their efficient attention mechanism implementations.
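Before relying on the faster attention paths, it can help to check which optional packages are actually importable. The sketch below assumes the import names `flash_attn` and `xformers`; `optional_backends` is a hypothetical helper, not part of the model's code.

```python
import importlib.util


def optional_backends() -> dict:
    """Report which optional acceleration packages are importable."""
    report = {
        name: importlib.util.find_spec(name) is not None
        for name in ("torch", "flash_attn", "xformers")
    }
    # CUDA availability can only be queried when torch itself is installed.
    report["cuda"] = False
    if report["torch"]:
        import torch

        report["cuda"] = torch.cuda.is_available()
    return report


print(optional_backends())
```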
194
+
195
+
196
+ ## Usage
197
+
198
+ <details>
199
+ <summary>via Jina AI <a href="https://jina.ai/embeddings/">Embedding API</a></summary>
200
+
201
+ ```bash
202
+ curl https://api.jina.ai/v1/embeddings \
203
+ -H "Content-Type: application/json" \
204
+ -H "Authorization: Bearer [JINA_AI_API_TOKEN]" \
205
+ -d @- <<EOFEOF
206
+ {
207
+ "model": "jina-clip-v2",
208
+ "dimensions": 1024,
209
+ "task": "retrieval.query",
210
+ "normalized": true,
211
+ "embedding_type": "float",
212
+ "input": [
213
+ {
214
+ "text": "غروب جميل على الشاطئ"
215
+ },
216
+ {
217
+ "text": "海滩上美丽的日落"
218
+ },
219
+ {
220
+ "text": "A beautiful sunset over the beach"
221
+ },
222
+ {
223
+ "text": "Un beau coucher de soleil sur la plage"
224
+ },
225
+ {
226
+ "text": "Ein wunderschöner Sonnenuntergang am Strand"
227
+ },
228
+ {
229
+ "text": "Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία"
230
+ },
231
+ {
232
+ "text": "समुद्र तट पर एक खूबसूरत सूर्यास्त"
233
+ },
234
+ {
235
+ "text": "Un bellissimo tramonto sulla spiaggia"
236
+ },
237
+ {
238
+ "text": "浜辺に沈む美しい夕日"
239
+ },
240
+ {
241
+ "text": "해변 위로 아름다운 일몰"
242
+ },
243
+ {
244
+ "image": "https://i.ibb.co/nQNGqL0/beach1.jpg"
245
+ },
246
+ {
247
+ "image": "https://i.ibb.co/r5w8hG8/beach2.jpg"
248
+ }
249
+ ]
250
+ }
251
+ EOFEOF
252
+ ```
253
+
254
+ </details>
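The request body from the curl example can also be assembled programmatically. The sketch below only builds and serializes the JSON payload, using the field names shown in the curl call; it does not send a request, and `build_embedding_request` is a hypothetical helper name.

```python
import json


def build_embedding_request(texts, image_urls, dimensions=1024):
    """Assemble the JSON body used by the curl example above."""
    inputs = [{"text": t} for t in texts] + [{"image": u} for u in image_urls]
    return {
        "model": "jina-clip-v2",
        "dimensions": dimensions,
        "task": "retrieval.query",
        "normalized": True,
        "embedding_type": "float",
        "input": inputs,
    }


payload = build_embedding_request(
    ["A beautiful sunset over the beach"],
    ["https://i.ibb.co/nQNGqL0/beach1.jpg"],
)
# ensure_ascii=False keeps non-Latin text readable in the serialized body.
body = json.dumps(payload, ensure_ascii=False)
print(body)
```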
255
+
256
+ <details>
257
+ <summary>via <a href="https://huggingface.co/docs/transformers/en/index">transformers</a></summary>
258
+
259
+ ```python
260
+ # !pip install transformers einops timm pillow
261
+ from transformers import AutoModel
262
+
263
+ # Initialize the model
264
+ model = AutoModel.from_pretrained('jinaai/jina-clip-v2', trust_remote_code=True)
265
+
266
+ # Corpus
267
+ sentences = [
268
+ 'غروب جميل على الشاطئ', # Arabic
269
+ '海滩上美丽的日落', # Chinese
270
+ 'Un beau coucher de soleil sur la plage', # French
271
+ 'Ein wunderschöner Sonnenuntergang am Strand', # German
272
+ 'Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία', # Greek
273
+ 'समुद्र तट पर एक खूबसूरत सूर्यास्त', # Hindi
274
+ 'Un bellissimo tramonto sulla spiaggia', # Italian
275
+ '浜辺に沈む美しい夕日', # Japanese
276
+ '해변 위로 아름다운 일몰', # Korean
277
+ ]
278
+
279
+ # Public image URLs or PIL Images
280
+ image_urls = ['https://i.ibb.co/nQNGqL0/beach1.jpg', 'https://i.ibb.co/r5w8hG8/beach2.jpg']
281
+
282
+ # Choose a matryoshka dimension, set to None to get the full 1024-dim vectors
283
+ truncate_dim = 512
284
+
285
+ # Encode text and images
286
+ text_embeddings = model.encode_text(sentences, truncate_dim=truncate_dim)
287
+ image_embeddings = model.encode_image(
288
+ image_urls, truncate_dim=truncate_dim
289
+ ) # also accepts PIL.Image.Image, local filenames, dataURI
290
+
291
+ # Encode query text
292
+ query = 'beautiful sunset over the beach' # English
293
+ query_embeddings = model.encode_text(
294
+ query, task='retrieval.query', truncate_dim=truncate_dim
295
+ )
296
+
297
+ # Text to Image
298
+ print('En -> Img: ' + str(query_embeddings @ image_embeddings[0].T))
299
+ # Image to Image
300
+ print('Img -> Img: ' + str(image_embeddings[0] @ image_embeddings[1].T))
301
+ # Text to Text
302
+ print('En -> Ar: ' + str(query_embeddings @ text_embeddings[0].T))
303
+ print('En -> Zh: ' + str(query_embeddings @ text_embeddings[1].T))
304
+ print('En -> Fr: ' + str(query_embeddings @ text_embeddings[2].T))
305
+ print('En -> De: ' + str(query_embeddings @ text_embeddings[3].T))
306
+ print('En -> El: ' + str(query_embeddings @ text_embeddings[4].T))
307
+ print('En -> Hi: ' + str(query_embeddings @ text_embeddings[5].T))
308
+ print('En -> It: ' + str(query_embeddings @ text_embeddings[6].T))
309
+ print('En -> Ja: ' + str(query_embeddings @ text_embeddings[7].T))
310
+ print('En -> Ko: ' + str(query_embeddings @ text_embeddings[8].T))
311
+ ```
312
+ </details>
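The example above prints one similarity score per pair. For retrieval, the same dot products are typically used to rank candidates. A minimal sketch, assuming the embeddings are L2-normalized so that the dot product equals the cosine similarity; the toy vectors and `rank_by_similarity` are illustrative, not model output or model API.

```python
import numpy as np


def rank_by_similarity(query: np.ndarray, candidates: np.ndarray) -> list:
    """Return candidate indices sorted from most to least similar.

    Both inputs are assumed to be L2-normalized, so the dot product
    equals the cosine similarity.
    """
    scores = candidates @ query
    return np.argsort(-scores).tolist()


# Toy 3-dimensional unit vectors standing in for real embeddings.
query = np.array([1.0, 0.0, 0.0])
candidates = np.array(
    [
        [0.0, 1.0, 0.0],  # orthogonal to the query
        [0.6, 0.8, 0.0],  # partially aligned
        [1.0, 0.0, 0.0],  # identical to the query
    ]
)
print(rank_by_similarity(query, candidates))  # [2, 1, 0]
```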
313
+
314
+ <details>
315
+ <summary>via <a href="https://sbert.net/">sentence-transformers</a></summary>
316
+
317
+ ```python
318
+ # !pip install sentence-transformers einops timm pillow
319
+ from sentence_transformers import SentenceTransformer
320
+
321
+ # Choose a matryoshka dimension
322
+ truncate_dim = 512
323
+
324
+ # Initialize the model
325
+ model = SentenceTransformer(
326
+ 'jinaai/jina-clip-v2', trust_remote_code=True, truncate_dim=truncate_dim
327
+ )
328
+
329
+ # Corpus
330
+ sentences = [
331
+ 'غروب جميل على الشاطئ', # Arabic
332
+ '海滩上美丽的日落', # Chinese
333
+ 'Un beau coucher de soleil sur la plage', # French
334
+ 'Ein wunderschöner Sonnenuntergang am Strand', # German
335
+ 'Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία', # Greek
336
+ 'समुद्र तट पर एक खूबसूरत सूर्यास्त', # Hindi
337
+ 'Un bellissimo tramonto sulla spiaggia', # Italian
338
+ '浜辺に沈む美しい夕日', # Japanese
339
+ '해변 위로 아름다운 일몰', # Korean
340
+ ]
341
+
342
+ # Public image URLs or PIL Images
343
+ image_urls = ['https://i.ibb.co/nQNGqL0/beach1.jpg', 'https://i.ibb.co/r5w8hG8/beach2.jpg']
344
+
345
+ # Encode text and images
346
+ text_embeddings = model.encode(sentences, normalize_embeddings=True)
347
+ image_embeddings = model.encode(
348
+ image_urls, normalize_embeddings=True
349
+ ) # also accepts PIL.Image.Image, local filenames, dataURI
350
+
351
+ # Encode query text
352
+ query = 'beautiful sunset over the beach' # English
353
+ query_embeddings = model.encode(
354
+ query, prompt_name='retrieval.query', normalize_embeddings=True
355
+ )
356
+ ```
357
+ </details>
358
+
359
+ <details>
360
+ <summary>via <a href="https://huggingface.co/docs/transformers.js/en/index">transformers.js</a></summary>
361
+
362
+ > [!NOTE]
363
+ > JinaCLIP was added in Transformers.js v3.1.0, so make sure you're using a compatible version!
364
+ > See the [release notes](https://github.com/huggingface/transformers.js/releases/tag/3.1.0) for more information.
365
+
366
+ If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
367
+ ```bash
368
+ npm i @huggingface/transformers
369
+ ```
370
+
371
+ **Example:** Compute text and/or image embeddings with `jinaai/jina-clip-v2`:
372
+ ```js
373
+ import { AutoModel, AutoProcessor, RawImage, matmul } from "@huggingface/transformers";
374
+
375
+ // Load processor and model
376
+ const model_id = "jinaai/jina-clip-v2";
377
+ const processor = await AutoProcessor.from_pretrained(model_id);
378
+ const model = await AutoModel.from_pretrained(model_id, { dtype: "q4" /* e.g., "fp16", "q8", or "q4" */ });
379
+
380
+ // Prepare inputs
381
+ const urls = ["https://i.ibb.co/nQNGqL0/beach1.jpg", "https://i.ibb.co/r5w8hG8/beach2.jpg"];
382
+ const images = await Promise.all(urls.map(url => RawImage.read(url)));
383
+ const sentences = [
384
+ "غروب جميل على الشاطئ", // Arabic
385
+ "海滩上美丽的日落", // Chinese
386
+ "Un beau coucher de soleil sur la plage", // French
387
+ "Ein wunderschöner Sonnenuntergang am Strand", // German
388
+ "Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία", // Greek
389
+ "समुद्र तट पर एक खूबसूरत सूर्यास्त", // Hindi
390
+ "Un bellissimo tramonto sulla spiaggia", // Italian
391
+ "浜辺に沈む美しい夕日", // Japanese
392
+ "해변 위로 아름다운 일몰", // Korean
393
+ ];
394
+
395
+ // Encode text and images
396
+ const inputs = await processor(sentences, images, { padding: true, truncation: true });
397
+ const { l2norm_text_embeddings, l2norm_image_embeddings } = await model(inputs);
398
+
399
+ // Encode query (text-only)
400
+ const query_prefix = "Represent the query for retrieving evidence documents: ";
401
+ const query_inputs = await processor(query_prefix + "beautiful sunset over the beach");
402
+ const { l2norm_text_embeddings: query_embeddings } = await model(query_inputs);
403
+
404
+ // Compute text-image similarity scores
405
+ const text_to_image_scores = await matmul(query_embeddings, l2norm_image_embeddings.transpose(1, 0));
406
+ console.log("text-image similarity scores", text_to_image_scores.tolist()[0]); // [0.29530206322669983, 0.3183615803718567]
407
+
408
+ // Compute image-image similarity scores
409
+ const image_to_image_score = await matmul(l2norm_image_embeddings[0], l2norm_image_embeddings[1]);
410
+ console.log("image-image similarity score", image_to_image_score.item()); // 0.9344457387924194
411
+
412
+ // Compute text-text similarity scores
413
+ const text_to_text_scores = await matmul(query_embeddings, l2norm_text_embeddings.transpose(1, 0));
414
+ console.log("text-text similarity scores", text_to_text_scores.tolist()[0]); // [0.5566609501838684, 0.7028406858444214, 0.582255482673645, 0.6648036241531372, 0.5462006330490112, 0.6791588068008423, 0.6192430257797241, 0.6258729100227356, 0.6453716158866882]
415
+ ```
416
+ </details>
417
+
418
+
419
+ <details>
420
+ <summary>via the <a href="https://onnxruntime.ai/">ONNX Runtime</a></summary>
421
+
422
+ ```python
423
+ # !pip install transformers onnxruntime pillow
424
+ import onnxruntime as ort
425
+ from transformers import AutoImageProcessor, AutoTokenizer
426
+
427
+ # Load tokenizer and image processor using transformers
428
+ tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-clip-v2', trust_remote_code=True)
429
+ image_processor = AutoImageProcessor.from_pretrained(
430
+ 'jinaai/jina-clip-v2', trust_remote_code=True
431
+ )
432
+
433
+ # Corpus
434
+ sentences = [
435
+ 'غروب جميل على الشاطئ', # Arabic
436
+ '海滩上美丽的日落', # Chinese
437
+ 'Un beau coucher de soleil sur la plage', # French
438
+ 'Ein wunderschöner Sonnenuntergang am Strand', # German
439
+ 'Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία', # Greek
440
+ 'समुद्र तट पर एक खूबसूरत सूर्यास्त', # Hindi
441
+ 'Un bellissimo tramonto sulla spiaggia', # Italian
442
+ '浜辺に沈む美しい夕日', # Japanese
443
+ '해변 위로 아름다운 일몰', # Korean
444
+ ]
445
+
446
+ # Public image URLs or PIL Images
447
+ image_urls = ['https://i.ibb.co/nQNGqL0/beach1.jpg', 'https://i.ibb.co/r5w8hG8/beach2.jpg']
448
+
449
+ # Tokenize input texts and transform input images
450
+ input_ids = tokenizer(sentences, padding=True, return_tensors='np')['input_ids']
451
+ pixel_values = image_processor(image_urls)['pixel_values']
452
+
453
+ # Start an ONNX Runtime Session
454
+ session = ort.InferenceSession('jina-clip-v2/onnx/model.onnx')
455
+
456
+ # Run inference
457
+ output = session.run(None, {'input_ids': input_ids, 'pixel_values': pixel_values})
458
+
459
+ # Keep the normalized embeddings; the first 2 outputs are un-normalized
460
+ _, _, text_embeddings, image_embeddings = output
461
+ ```
462
+
463
+ </details>
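ONNX Runtime takes rectangular arrays as input, so a batch of sentences with different token counts has to be brought to a common length before it can become a single `input_ids` array. The sketch below shows the padding step in isolation; the token IDs are made up, `pad_id=1` is an assumption for illustration, and `pad_token_ids` is a hypothetical helper (in practice the tokenizer does this itself).

```python
import numpy as np


def pad_token_ids(sequences, pad_id: int) -> np.ndarray:
    """Right-pad variable-length token ID lists into one rectangular array."""
    max_len = max(len(seq) for seq in sequences)
    batch = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    for row, seq in enumerate(sequences):
        batch[row, : len(seq)] = seq
    return batch


# Made-up token IDs; pad_id=1 is an assumption for illustration only.
batch = pad_token_ids([[0, 11, 12, 2], [0, 21, 2]], pad_id=1)
print(batch.shape)  # (2, 4)
```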
464
+
465
+
466
+
467
+ ## License
468
+
469
+ This model is licensed to download and run under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en). It is available for commercial use via the [Jina Embeddings API](https://jina.ai/embeddings/), [AWS](https://aws.amazon.com/marketplace/pp/prodview-bfbctuqmky676), [Azure](https://azuremarketplace.microsoft.com/en-gb/marketplace/apps/jinaai.jina-clip-v2-vm?tab=Overview), and [GCP](https://console.cloud.google.com/marketplace/browse?hl=en&inv=1&invt=AbiFWQ&q=jina). To download for commercial use, please [contact us](https://jina.ai/contact-sales).
470
+
471
+
472
+ ## Contact
473
+
474
+ Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
475
+
476
+
477
+ ## Citation
478
+
479
+ If you find `jina-clip-v2` useful in your research, please cite the following paper:
480
+
481
+ ```bibtex
482
+ @misc{koukounas2024jinaclipv2multilingualmultimodalembeddings,
483
+ title={jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images},
484
+ author={Andreas Koukounas and Georgios Mastrapas and Bo Wang and Mohammad Kalim Akram and Sedigheh Eslami and Michael Günther and Isabelle Mohr and Saba Sturua and Scott Martens and Nan Wang and Han Xiao},
485
+ year={2024},
486
+ eprint={2412.08802},
487
+ archivePrefix={arXiv},
488
+ primaryClass={cs.CL},
489
+ url={https://arxiv.org/abs/2412.08802},
490
+ }
491
+ ```
config.json ADDED
@@ -0,0 +1,70 @@
1
+ {
2
+ "add_projections": false,
3
+ "architectures": [
4
+ "JinaCLIPModel"
5
+ ],
6
+ "auto_map": {
7
+ "AutoConfig": "jinaai/jina-clip-implementation--configuration_clip.JinaCLIPConfig",
8
+ "AutoModel": "jinaai/jina-clip-implementation--modeling_clip.JinaCLIPModel"
9
+ },
10
+ "initializer_factor": 1.0,
11
+ "logit_scale_init_value": 2.6592,
12
+ "matryoshka_dimensions": [32, 64, 128, 256, 512, 768, 1024],
13
+ "model_type": "jina_clip",
14
+ "projection_dim": 1024,
15
+ "text_config": {
16
+ "default_instruction_task": null,
17
+ "default_lora_task": "retrieval.query",
18
+ "embed_dim": 1024,
19
+ "hf_model_config_kwargs": {
20
+ "load_trained_adapters": false,
21
+ "lora_adaptations": [
22
+ "retrieval.query"
23
+ ],
24
+ "lora_alpha": 4,
25
+ "lora_dropout_p": 0.0,
26
+ "lora_main_params_trainable": false,
27
+ "lora_rank": 4,
28
+ "task_instructions": {
29
+ "retrieval.query": "Represent the query for retrieving evidence documents: "
30
+ },
31
+ "use_flash_attn": true
32
+ },
33
+ "hf_model_name_or_path": "jinaai/jina-embeddings-v3",
34
+ "model_type": "jina_clip_text",
35
+ "pooler_type": "mean_pooler",
36
+ "proj_bias": false,
37
+ "proj_type": null
38
+ },
39
+ "torch_dtype": "bfloat16",
40
+ "transformers.js_config": {
41
+ "use_external_data_format": {
42
+ "model.onnx": true
43
+ }
44
+ },
45
+ "truncate_dim": null,
46
+ "use_text_flash_attn": null,
47
+ "use_vision_xformers": null,
48
+ "vision_config": {
49
+ "embed_dim": 1024,
50
+ "fused_layer_norm": false,
51
+ "head_width": 64,
52
+ "image_size": 512,
53
+ "intp_freq": true,
54
+ "layers": 24,
55
+ "ls_init_value": null,
56
+ "mlp_ratio": 2.6667,
57
+ "model_type": "jina_clip_vision",
58
+ "naive_swiglu": true,
59
+ "patch_dropout": 0.1,
60
+ "patch_size": 14,
61
+ "post_norm": false,
62
+ "proj_type": null,
63
+ "pt_hw_seq_len": 16,
64
+ "qkv_bias": true,
65
+ "rope_embeddings": true,
66
+ "subln": true,
67
+ "width": 1024,
68
+ "x_attention": true
69
+ }
70
+ }
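A few derived quantities help when reading the config above. The sketch below rests on common CLIP and ViT conventions, all of which are assumptions here rather than facts confirmed by this commit: the logit scale is stored in log space, patches form a non-overlapping grid with leftover pixels dropped, and the head count is `width / head_width`.

```python
import math

# Values copied from the config.json diff above.
logit_scale_init_value = 2.6592
vision = {"image_size": 512, "patch_size": 14, "width": 1024, "head_width": 64}

# CLIP-style models usually store the logit scale in log space (assumption).
logit_scale = math.exp(logit_scale_init_value)
temperature = 1.0 / logit_scale

# Assumes a non-overlapping patch grid with any remainder pixels dropped.
patches_per_side = vision["image_size"] // vision["patch_size"]
num_patches = patches_per_side ** 2

# Assumes head count = width / head_width.
num_heads = vision["width"] // vision["head_width"]

print(round(temperature, 3), patches_per_side, num_patches, num_heads)
```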
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.3.0",
4
+ "transformers": "4.46.2",
5
+ "pytorch": "2.2.2"
6
+ },
7
+ "prompts":{
8
+ "retrieval.query":"Represent the query for retrieving evidence documents: "
9
+ },
10
+ "default_prompt_name": null,
11
+ "similarity_fn_name": "cosine"
12
+ }
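The `prompts` entry is what `prompt_name='retrieval.query'` refers to in the sentence-transformers usage example: the named prompt string is prepended to the input text before tokenization. A minimal sketch of that lookup, with `apply_prompt` as a hypothetical helper that mimics the behaviour rather than calling the library:

```python
import json

# The relevant part of config_sentence_transformers.json from this commit.
config = json.loads(
    """
    {
      "prompts": {
        "retrieval.query": "Represent the query for retrieving evidence documents: "
      },
      "default_prompt_name": null
    }
    """
)


def apply_prompt(text: str, prompt_name=None) -> str:
    """Prepend the named prompt, falling back to the configured default."""
    name = prompt_name or config["default_prompt_name"]
    if name is None:
        return text
    return config["prompts"][name] + text


print(apply_prompt("beautiful sunset over the beach", "retrieval.query"))
```

Because `default_prompt_name` is `null`, inputs encoded without an explicit prompt name are left unchanged.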
custom_st.py ADDED
@@ -0,0 +1,275 @@
1
+ import base64
2
+ import json
3
+ import os
4
+ from io import BytesIO
5
+ from typing import Any, Dict, List, Literal, Optional, Union
6
+
7
+ import requests
8
+ import torch
9
+ from PIL import Image
10
+ from torch import nn
11
+ from transformers import AutoConfig, AutoImageProcessor, AutoModel, AutoTokenizer
12
+
13
+
14
+ class Transformer(nn.Module):
15
+
16
+ save_in_root: bool = True
17
+
18
+ def __init__(
19
+ self,
20
+ model_name_or_path: str = 'jinaai/jina-clip-v2',
21
+ tokenizer_name_or_path: Optional[str] = None,
22
+ image_processor_name_or_path: Optional[str] = None,
23
+ max_seq_length: Optional[int] = None,
24
+ config_args: Optional[Dict[str, Any]] = None,
25
+ model_args: Optional[Dict[str, Any]] = None,
26
+ tokenizer_args: Optional[Dict[str, Any]] = None,
27
+ image_processor_args: Optional[Dict[str, Any]] = None,
28
+ assume_text_inputs: bool = False,
29
+ cache_dir: Optional[str] = None,
30
+ backend: Literal['torch', 'onnx', 'openvino'] = 'torch',
31
+ **_,
32
+ ) -> None:
33
+ """
34
+ Creates a custom SentenceTransformer module that uses `jinai/jina-clip-v2` to
35
+ map sentences/images to embeddings
36
+
37
+ Args:
38
+ model_name_or_path (str, optional): If it is a filepath on disc, it loads
39
+ the model from that path. If it is not a path, tries to construct a
40
+ model from the Hugging Face Hub with that name. Defaults to
41
+ 'jinaai/jina-clip-v2'
42
+ tokenizer_name_or_path (str, optional): If it is a filepath on disc, it
43
+ loads the tokenizer from that path. If it is not a path, tries to
44
+ construct a tokenizer from the Hugging Face Hub with that name.
45
+ If `None` it is automatically set to the value of `model_name_or_path`
46
+ image_processor_name_or_path (str, optional): If it is a filepath on disc,
47
+ it loads the image processor from that path. If it is not a path, tries
48
+ to construct an image processor from the Hugging Face Hub with that
49
+ name. If `None` it is automatically set to the value of
50
+ `model_name_or_path`
51
+ max_seq_length (int, optional): The maximum sequence length of the model.
52
+ If not provided, will be inferred from model or tokenizer
53
+ config_args (Dict[str, Any], optional): Additional model configuration
54
+ parameters to be passed to the Hugging Face Transformers config
55
+ model_args (Dict[str, Any], optional): Additional model configuration
56
+ parameters to be passed to the Hugging Face Transformers model
57
+ tokenizer_args (Dict[str, Any], optional): Additional tokenizer
58
+ configuration parameters to be passed to the Hugging Face Transformers
59
+ tokenizer
60
+ image_processor_args (Dict[str, Any], optional): Additional image processor
61
+ configuration parameters to be passed to the Hugging Face Transformers
62
+ image processor
63
+ assume_text_inputs (bool, optional): If set to `True`, all inputs are
64
+ treated as texts. Defaults to `False`
65
+ cache_dir (str, optional): The Hugging Face Hub cache directory
66
+ backend (str, optional): Computational backend, only 'torch' is supported
67
+
68
+ Example:
69
+ ::
70
+
71
+ from sentence_transformers import SentenceTransformer
72
+
73
+ model = SentenceTransformer(
74
+ 'jinaai/jina-clip-v2', trust_remote_code=True
75
+ )
76
+ sentences_or_images = [
77
+ "The weather is lovely today.",
78
+ "It's so sunny outside!",
79
+ "/path/to/stadium.jpg",
80
+ ]
81
+ embeddings = model.encode(sentences_or_images)
82
+ print(embeddings.shape)
83
+ # (3, 1024)
84
+
85
+ # Get the similarity scores between all inputs
86
+ similarities = model.similarity(embeddings, embeddings)
87
+ print(similarities)
88
+ # tensor([[1.0000, 0.6817, 0.0492],
89
+ # [0.6817, 1.0000, 0.0421],
90
+ # [0.0492, 0.0421, 1.0000]])
91
+ """
92
+ super(Transformer, self).__init__()
93
+ if backend != 'torch':
94
+ raise ValueError(
95
+ f'Backend \'{backend}\' is not supported, please use \'torch\' instead'
96
+ )
97
+
98
+ config_kwargs = config_args or {}
99
+ model_kwargs = model_args or {}
100
+ tokenizer_kwargs = tokenizer_args or {}
101
+ image_processor_kwargs = {
102
+ 'token': model_kwargs.get('token', None),
103
+ 'trust_remote_code': model_kwargs.get('trust_remote_code', False),
104
+ 'revision': model_kwargs.get('revision', None),
105
+ 'local_files_only': model_kwargs.get('local_files_only', None),
106
+ }
107
+ image_processor_kwargs.update(image_processor_args or {})
108
+
109
+ config = AutoConfig.from_pretrained(
110
+ model_name_or_path, cache_dir=cache_dir, **config_kwargs
111
+ )
112
+ self.model = AutoModel.from_pretrained(
113
+ model_name_or_path, config=config, cache_dir=cache_dir, **model_kwargs
114
+ )
115
+ if max_seq_length is not None and 'model_max_length' not in tokenizer_kwargs:
116
+ tokenizer_kwargs['model_max_length'] = max_seq_length
117
+
118
+ self.tokenizer = AutoTokenizer.from_pretrained(
119
+ tokenizer_name_or_path or model_name_or_path,
120
+ cache_dir=cache_dir,
121
+ **tokenizer_kwargs,
122
+ )
123
+ self.image_processor = AutoImageProcessor.from_pretrained(
124
+ image_processor_name_or_path or model_name_or_path,
125
+ cache_dir=cache_dir,
126
+ **image_processor_kwargs,
127
+ )
128
+ self.assume_text_inputs = assume_text_inputs
129
+
130
+ # No max_seq_length set. Try to infer from model
131
+ if max_seq_length is None:
132
+ if (
133
+ hasattr(self.model, 'config')
134
+ and hasattr(self.model.config, 'max_position_embeddings')
135
+ and hasattr(self.tokenizer, 'model_max_length')
136
+ ):
137
+ max_seq_length = min(
138
+ self.model.config.max_position_embeddings,
139
+ self.tokenizer.model_max_length,
140
+ )
141
+ self.max_seq_length = max_seq_length
142
+ if tokenizer_name_or_path is not None:
143
+ self.model.config.tokenizer_class = self.tokenizer.__class__.__name__
144
+
145
+ @staticmethod
146
+ def _decode_data_image(data_image_str: str) -> Image.Image:
147
+ header, data = data_image_str.split(',', 1)
148
+ image_data = base64.b64decode(data)
149
+ return Image.open(BytesIO(image_data))
150
+
151
+ def tokenize(
152
+ self, texts: List[Union[str, Image.Image]], padding: Union[str, bool] = True
153
+ ) -> Dict[str, torch.Tensor]:
154
+ """
155
+ Encodes input samples. Text samples are tokenized. Image URLs, image data
156
+ buffers and PIL images are passed through the image processor.
157
+ """
158
+ _images = []
159
+ _texts = []
160
+ _image_or_text_descriptors = []
161
+
162
+ if self.assume_text_inputs:
163
+ for sample in texts:
164
+ if isinstance(sample, str):
165
+ _texts.append(sample)
166
+ _image_or_text_descriptors.append(1)
167
+ else:
168
+ for sample in texts:
169
+            if isinstance(sample, str):
+                if sample.startswith('http'):
+                    try:
+                        response = requests.get(sample)
+                        _images.append(
+                            Image.open(BytesIO(response.content)).convert('RGB')
+                        )
+                        _image_or_text_descriptors.append(0)
+                    except Exception as e:
+                        _ = str(e)
+                        _texts.append(sample)
+                        _image_or_text_descriptors.append(1)
+                elif sample.startswith('data:image/'):
+                    _images.append(self._decode_data_image(sample).convert('RGB'))
+                    _image_or_text_descriptors.append(0)
+                else:
+                    try:
+                        _images.append(Image.open(sample).convert('RGB'))
+                        _image_or_text_descriptors.append(0)
+                    except Exception as e:
+                        _ = str(e)
+                        _texts.append(sample)
+                        _image_or_text_descriptors.append(1)
+            elif isinstance(sample, Image.Image):
+                _images.append(sample.convert('RGB'))
+                _image_or_text_descriptors.append(0)
+
+        encoding = {}
+        if len(_texts):
+            encoding['input_ids'] = self.tokenizer(
+                _texts,
+                padding=padding,
+                truncation='longest_first',
+                return_tensors='pt',
+                max_length=self.max_seq_length,
+            ).input_ids
+
+        if len(_images):
+            encoding['pixel_values'] = self.image_processor(
+                _images, return_tensors='pt'
+            ).pixel_values
+
+        encoding['image_text_info'] = _image_or_text_descriptors
+        return encoding
+
+    def forward(self, features: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
+        image_embeddings = []
+        text_embeddings = []
+
+        if 'pixel_values' in features:
+            image_embeddings = self.model.get_image_features(features['pixel_values'])
+        if 'input_ids' in features:
+            text_embeddings = self.model.get_text_features(features['input_ids'])
+
+        sentence_embedding = []
+        image_features = iter(image_embeddings)
+        text_features = iter(text_embeddings)
+        for _, _input_type in enumerate(features['image_text_info']):
+            if _input_type == 0:
+                sentence_embedding.append(next(image_features))
+            else:
+                sentence_embedding.append(next(text_features))
+
+        features['sentence_embedding'] = torch.stack(sentence_embedding).float()
+        return features
+
+    def save(self, output_path: str, safe_serialization: bool = True) -> None:
+        self.model.save_pretrained(output_path, safe_serialization=safe_serialization)
+        self.tokenizer.save_pretrained(output_path)
+        self.image_processor.save_pretrained(output_path)
+
+    @staticmethod
+    def load(input_path: str) -> 'Transformer':
+        # Old classes used other config names than 'sentence_bert_config.json'
+        for config_name in [
+            'sentence_bert_config.json',
+            'sentence_roberta_config.json',
+            'sentence_distilbert_config.json',
+            'sentence_camembert_config.json',
+            'sentence_albert_config.json',
+            'sentence_xlm-roberta_config.json',
+            'sentence_xlnet_config.json',
+        ]:
+            sbert_config_path = os.path.join(input_path, config_name)
+            if os.path.exists(sbert_config_path):
+                break
+
+        with open(sbert_config_path) as fIn:
+            config = json.load(fIn)
+
+        # Don't allow configs to set trust_remote_code
+        if 'config_kwargs' in config and 'trust_remote_code' in config['config_kwargs']:
+            config['config_kwargs'].pop('trust_remote_code')
+        if 'model_kwargs' in config and 'trust_remote_code' in config['model_kwargs']:
+            config['model_kwargs'].pop('trust_remote_code')
+        if (
+            'tokenizer_kwargs' in config
+            and 'trust_remote_code' in config['tokenizer_kwargs']
+        ):
+            config['tokenizer_kwargs'].pop('trust_remote_code')
+        if (
+            'image_processor_kwargs' in config
+            and 'trust_remote_code' in config['image_processor_kwargs']
+        ):
+            config['image_processor_kwargs'].pop('trust_remote_code')
+
+        return Transformer(model_name_or_path=input_path, **config)
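Reviewer note: `forward` restores the caller's input order by walking `image_text_info` (0 for an image, 1 for text) and pulling from two per-modality iterators. The sketch below isolates that step, with plain strings standing in for embedding tensors. `interleave` and the sample values are hypothetical names used only for illustration; they are not part of `custom_st.py`.

```python
from typing import List, Sequence


def interleave(
    descriptors: Sequence[int],
    image_items: Sequence[str],
    text_items: Sequence[str],
) -> List[str]:
    """Rebuild the original input order from per-modality batches.

    descriptors holds 0 for an image slot and 1 for a text slot,
    mirroring the 'image_text_info' list built in tokenize().
    """
    image_iter = iter(image_items)
    text_iter = iter(text_items)
    merged = []
    for kind in descriptors:
        # Pull the next item from whichever modality this slot belongs to.
        merged.append(next(image_iter) if kind == 0 else next(text_iter))
    return merged


# Hypothetical batch: text, image, text.
order = interleave([1, 0, 1], ["img_a"], ["txt_a", "txt_b"])
print(order)
```

Because both per-modality batches keep their relative order, a single pass over the descriptor list is enough to reassemble the mixed batch.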
model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:eff4c0a13ab4de71a9927a56968fef44e626920ff935e503f1bd3e6ec797062d
+size 1730688642
modules.json ADDED
@@ -0,0 +1,14 @@
+[
+  {
+    "idx": 0,
+    "name": "transformer",
+    "path": "",
+    "type": "custom_st.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "normalizer",
+    "path": "1_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
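Reviewer note: `modules.json` describes a two-stage pipeline, the custom `Transformer` followed by a `Normalize` step. The sketch below reads the module order from a copy of the file content, assuming modules are applied in ascending `idx`. `MODULES_JSON` and `module_order` are hypothetical names for this illustration.

```python
import json
from typing import List, Tuple

# Copy of the modules.json content added in this commit.
MODULES_JSON = """
[
  {"idx": 0, "name": "transformer", "path": "", "type": "custom_st.Transformer"},
  {"idx": 1, "name": "normalizer", "path": "1_Normalize",
   "type": "sentence_transformers.models.Normalize"}
]
"""


def module_order(raw: str) -> List[Tuple[str, str]]:
    """Return (name, type) pairs sorted by their 'idx' field."""
    modules = sorted(json.loads(raw), key=lambda m: m["idx"])
    return [(m["name"], m["type"]) for m in modules]


for name, module_type in module_order(MODULES_JSON):
    print(name, "->", module_type)
```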
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e8bdc8a8124e10305fa97140ba11902e06b91f8a4dcf13a1664da521cdc155ed
+size 2090152
onnx/model.onnx_data ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0ec73a3b6f33c472249f7b058c3cbfb9586483b88ee5930b3b3749dff7acd873
+size 3453550848
onnx/model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1029184ed517000825a39faa6ff154c68b98ef06a67dd2f88b3269ef69fec55f
+size 1379631302
onnx/model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:746a78209096d1cd52891b70d752903b8bf86088ba847bd0c56c03fb29256801
+size 1728814880
onnx/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:21b8b77a009865faecaa29f076ee55d6334ea42699a9efa14d542ce8d3938a3f
+size 874350932
onnx/model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:032967b983b9cea6fb94811f4b6fe1986deff89d13ee794a1f3b124df711f5c5
+size 1417712750
onnx/model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9a753f9cdd4061cdd4342beeafeb442b2e5b562b49bb47d3bebedad7693fa602
+size 861019483
onnx/model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:65c6423fc82eecffb7f7f813730c6a6f0d28e2dc908e414250733b1416ed30bf
+size 874351078
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:65c6423fc82eecffb7f7f813730c6a6f0d28e2dc908e414250733b1416ed30bf
+size 874351078
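Reviewer note: each ONNX entry above is a Git LFS pointer (version, oid, size) rather than the model weights themselves. The sketch below parses two of those pointers and compares them; the pointers for `onnx/model_quantized.onnx` and `onnx/model_uint8.onnx` carry the same oid and size, so they reference the same stored blob. `parse_lfs_pointer` is a hypothetical helper written for this note, not a Git LFS API.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the diff above.
QUANTIZED = """version https://git-lfs.github.com/spec/v1
oid sha256:65c6423fc82eecffb7f7f813730c6a6f0d28e2dc908e414250733b1416ed30bf
size 874351078
"""
UINT8 = """version https://git-lfs.github.com/spec/v1
oid sha256:65c6423fc82eecffb7f7f813730c6a6f0d28e2dc908e414250733b1416ed30bf
size 874351078
"""

same_blob = parse_lfs_pointer(QUANTIZED) == parse_lfs_pointer(UINT8)
print("identical pointers:", same_blob)
```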
preprocessor_config.json ADDED
@@ -0,0 +1,22 @@
+{
+  "auto_map": {
+    "AutoImageProcessor": "jinaai/jina-clip-implementation--processing_clip.JinaCLIPImageProcessor",
+    "AutoProcessor": "jinaai/jina-clip-implementation--processing_clip.JinaCLIPProcessor"
+  },
+  "fill_color": 0,
+  "image_processor_type": "JinaCLIPImageProcessor",
+  "interpolation": "bicubic",
+  "mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "processor_class": "JinaCLIPProcessor",
+  "resize_mode": "shortest",
+  "size": 512,
+  "std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ]
+}
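Reviewer note: the `mean` and `std` values in `preprocessor_config.json` are per-channel RGB statistics. The sketch below shows how such values are typically applied, assuming the processor first rescales 8-bit pixels to [0, 1] and then computes `(x - mean) / std` per channel. That order of operations is an assumption; the diff does not show the `JinaCLIPImageProcessor` code. `normalize_pixel` is a hypothetical helper.

```python
from typing import Tuple

# Values copied from preprocessor_config.json above.
MEAN = (0.48145466, 0.4578275, 0.40821073)
STD = (0.26862954, 0.26130258, 0.27577711)


def normalize_pixel(rgb: Tuple[int, int, int]) -> Tuple[float, ...]:
    """Rescale an 8-bit RGB pixel to [0, 1], then standardize each channel.

    Assumption: this mirrors common CLIP-style preprocessing; it is not
    taken from the JinaCLIPImageProcessor source.
    """
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))


print(normalize_pixel((0, 0, 0)))        # black pixel: every channel negative
print(normalize_pixel((255, 255, 255)))  # white pixel: every channel positive
```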
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8721cced553dc439b45c1dfd30d36d7535ab93f92e38aae7fa36f4380ffdd11d
+size 1730896230
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6601c4120779a1a3863897ba332fe3481d548e363bec2c91eba10ef8640a5e93
+size 17082997
tokenizer_config.json ADDED
@@ -0,0 +1,54 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "250001": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": "<mask>",
+  "model_max_length": 8194,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "tokenizer_class": "XLMRobertaTokenizer",
+  "unk_token": "<unk>"
+}
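Reviewer note: `added_tokens_decoder` maps token ids to special-token strings, and the role entries (`bos_token`, `cls_token`, and so on) reuse those strings. The sketch below inverts the mapping to look up the id behind each role. It uses a trimmed copy of the config values above; `SPECIAL_ROLES`, `token_to_id`, and `role_to_id` are hypothetical names for this illustration.

```python
# Trimmed copy of tokenizer_config.json: id -> token content.
ADDED_TOKENS_DECODER = {
    "0": "<s>",
    "1": "<pad>",
    "2": "</s>",
    "3": "<unk>",
    "250001": "<mask>",
}

# Role -> token content, as listed in tokenizer_config.json.
SPECIAL_ROLES = {
    "bos_token": "<s>",
    "cls_token": "<s>",
    "eos_token": "</s>",
    "sep_token": "</s>",
    "pad_token": "<pad>",
    "unk_token": "<unk>",
    "mask_token": "<mask>",
}

# Invert the decoder so each token string maps to its integer id.
token_to_id = {content: int(idx) for idx, content in ADDED_TOKENS_DECODER.items()}
role_to_id = {role: token_to_id[content] for role, content in SPECIAL_ROLES.items()}

for role, token_id in sorted(role_to_id.items()):
    print(f"{role}: {token_id}")
```

Note that `bos_token` and `cls_token` share one id, as do `eos_token` and `sep_token`, because each pair uses the same token string.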