v5 Transformers
- Supersedes https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/discussions/56 (slightly different config, based on the pure transformers version)
- Depends on https://github.com/huggingface/transformers/pull/43067#top
- Works for both(?), i.e. with and without trust_remote_code
It also requires the removal of these 3 lines: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/blob/main/sentence_bert_config.json#L4-L6
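For reference, after removing those lines the file would reduce to something along these lines (a sketch only; the exact remaining keys and values are those in the linked file, and the values shown here are assumptions):

```json
{
  "max_seq_length": 8192,
  "do_lower_case": false
}
```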
Once those are removed, this PR, together with the GitHub PR mentioned above, results in identical performance in my evaluations on NanoMSMARCO:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

# model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", revision="refs/pr/57")
model.prompts = {
    "document": "search_document: ",
    "query": "search_query: ",
}
evaluator = NanoBEIREvaluator(["msmarco"])
results = evaluator(model)
print(results[evaluator.primary_metric])

# Baseline (transformers ~v5.4, trust_remote_code):
# 0.5925795457057265
# New, local transformers:
# 0.5925795457057265
```
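As a side note, the `prompts` dict set above works by prepending the named prefix to each input text before tokenization when `encode()` is called with a matching `prompt_name`. A minimal simplified sketch of that behavior (not the library's actual implementation):

```python
# The prompts dict, as set on the model above.
prompts = {
    "document": "search_document: ",
    "query": "search_query: ",
}

def apply_prompt(texts, prompt_name, prompts):
    """Prepend the named prompt prefix to each text, mirroring
    what encode(..., prompt_name=...) does before tokenization."""
    prefix = prompts[prompt_name]
    return [prefix + text for text in texts]

print(apply_prompt(["what is a transformer?"], "query", prompts))
# ['search_query: what is a transformer?']
```

With the real model, the equivalent call is `model.encode(texts, prompt_name="query")`.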
Of course, the core issue that remains is that merging this PR plus the GitHub PR would break the model for all current users. Could we perhaps keep the auto_map? Then current users can continue to use the model without issues, and we can update the README to suggest that new users avoid trust_remote_code once their transformers version is high enough.
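Keeping the auto_map would mean leaving a block roughly like the following in config.json (a sketch; the exact module and class names are those already in the repo's config, shown here as assumptions):

```json
"auto_map": {
  "AutoConfig": "configuration_hf_nomic_bert.NomicBertConfig",
  "AutoModel": "modeling_hf_nomic_bert.NomicBertModel"
}
```

With that block present, trust_remote_code=True keeps loading the custom classes as today, while loading without it falls back to the native transformers implementation.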
- Tom Aarsen
Just tested this again, and I'm getting identical performance on NanoMSMARCO with Sentence Transformers in each of these configurations:
- trust_remote_code=True on main
- trust_remote_code=True on this PR
- No trust_remote_code on this PR, i.e. with native transformers
Nice work.
- Tom Aarsen