BERT Base Uncased Fine-Tuned on MRPC
This model is a fine-tuned version of bert-base-uncased on the GLUE MRPC (Microsoft Research Paraphrase Corpus) dataset.
It determines whether two English sentences are paraphrases (have the same meaning).
The model was trained using the Hugging Face Transformers library and PyTorch on an Apple M2 machine.
Model Details
Model Description
This is a BERT-based sequence classification model fine-tuned for semantic similarity and paraphrase detection.
It outputs a binary label:
- 1 → the two sentences mean the same thing (paraphrase)
- 0 → they do not mean the same thing (non-paraphrase)

Developed by: Juan Sebastián Reina García
Model type: Transformer-based encoder (BERT)
Language(s): English
License: Apache-2.0 (inherits from BERT base)
Finetuned from model: bert-base-uncased
Model Sources
- Repository: https://huggingface.co/juan-reina33/mrpc-bert-uncased-finetuned
- Paper: GLUE Benchmark (Wang et al., 2018)
- Demo: Coming soon
Uses
Direct Use
Use this model for sentence-pair classification or semantic similarity detection tasks.
Example use cases:
- Detecting duplicate questions or answers
- Identifying paraphrased customer support tickets
- Measuring semantic equivalence in English text
Example code:
from transformers import pipeline

clf = pipeline("text-classification", model="juan-reina33/mrpc-bert-uncased-finetuned")
result = clf({
"text": "The company released a new product.",
"text_pair": "A new product was launched by the company."
})
print(result)
# [{'label': 'paraphrase', 'score': 0.89}]
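The label and score above come from a softmax and argmax over the two logits produced by the classifier head. A minimal pure-Python sketch of that mapping (the logit values here are made up for illustration):

```python
import math

def softmax(logits):
    """Convert raw classifier logits to probabilities."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits: [score for "not paraphrase" (0), score for "paraphrase" (1)]
logits = [-1.2, 0.9]
probs = softmax(logits)
label = max(range(len(probs)), key=probs.__getitem__)  # argmax
print(label, round(probs[label], 2))  # prints: 1 0.89
```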
Downstream Use
You can fine-tune this model further for:
- Natural Language Inference (NLI)
- Question–answer entailment
- Duplicate detection in domain-specific datasets
Out-of-Scope Use
Not intended for:
- Non-English text
- Factual reasoning or high-stakes decision systems (e.g., legal, medical)
Bias, Risks, and Limitations
The model may reflect biases from BERT’s original pretraining data.
Performance may drop on noisy, informal, or domain-specific English.
Not suitable for multilingual text.
Recommendations
Always test the model on your target dataset before deployment. Use human validation for sensitive use cases.
How to Get Started
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("juan-reina33/mrpc-bert-uncased-finetuned")
model = AutoModelForSequenceClassification.from_pretrained("juan-reina33/mrpc-bert-uncased-finetuned")

inputs = tokenizer("The company released a new product.",
                   "A new product was launched by the company.",
                   return_tensors="pt")
pred = model(**inputs).logits.argmax(dim=-1).item()  # 1 = paraphrase, 0 = not
Training Details
Training Data
Dataset: MRPC (Microsoft Research Paraphrase Corpus) from the GLUE benchmark
- 3,668 training pairs
- 408 validation pairs
- Binary labels: 1 = paraphrase, 0 = not a paraphrase
Training Procedure
Optimizer: AdamW
Learning rate: 5e-5
Batch size: 8
Epochs: 3
Scheduler: Linear decay, no warm-up
Precision: fp32
Loss: CrossEntropyLoss
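The hyperparameters listed above correspond roughly to the following Trainer configuration (a sketch only — the card does not include the actual training script, so this simply restates the values above using the Transformers `TrainingArguments` API; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

# Restates the hyperparameters from this card; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="mrpc-bert-uncased-finetuned",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    lr_scheduler_type="linear",  # linear decay
    warmup_steps=0,              # no warm-up
)
```

CrossEntropyLoss and fp32 precision are the Trainer defaults for sequence classification, so they need no explicit arguments.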
Speeds, Sizes, Times
Training time: ~25 min total (3 epochs on Apple M2)
Model size: ~420 MB
Frameworks: PyTorch 2.9.0, Transformers 4.57.1
Evaluation
Metric     Score
Accuracy   84.8%
F1 score   89.2%
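Accuracy and F1 are the two metrics GLUE reports for MRPC (F1 matters here because the label distribution is imbalanced). A minimal pure-Python sketch of how they are computed, using made-up predictions:

```python
def accuracy_and_f1(preds, labels):
    """Accuracy and binary F1 (positive class = 1, i.e. paraphrase)."""
    correct = sum(p == y for p, y in zip(preds, labels))
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))  # true positives
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))  # false positives
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(labels), f1

# Made-up predictions against gold labels, for illustration
preds  = [1, 1, 0, 1, 0, 1]
labels = [1, 0, 0, 1, 1, 1]
acc, f1 = accuracy_and_f1(preds, labels)  # acc = 4/6, f1 = 0.75
```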
Environmental Impact
Hardware: Apple MacBook Air M2, 16 GB RAM
Training duration: ~0.5 hrs
Cloud Provider: None (local)
Estimated CO₂: < 0.01 kg
Citation
BibTeX
@misc{reina2025mrpcbert,
  title = {BERT Base Uncased Fine-Tuned on MRPC},
  author = {Juan Sebastián Reina García},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/juan-reina33/mrpc-bert-uncased-finetuned}}
}
APA
Reina García, J. S. (2025). BERT Base Uncased Fine-Tuned on MRPC [Computer software]. Hugging Face. https://huggingface.co/juan-reina33/mrpc-bert-uncased-finetuned
Model Card Author
Juan Sebastián Reina García
Contact
Hugging Face: @juan-reina33