# DistilBERT IMDb Sentiment Classifier
This model is a fine-tuned version of DistilBERT for sentiment analysis on the IMDb movie reviews dataset.
It classifies reviews into positive or negative sentiment with high accuracy.
## Model Details
- Model type: Transformer-based model (DistilBERT)
- Task: Sentiment Analysis / Text Classification
- Framework: Hugging Face Transformers (PyTorch)
- Dataset: IMDb (50,000+ labeled movie reviews)
- Accuracy: ~91.5% on test data
## Usage

You can load and use this model directly with `transformers`:

```python
from transformers import pipeline

sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model="mirko-doljanica/distilbert-imdb-sentiment",
)
print(sentiment_pipeline("I loved this movie! The performances were amazing."))
```
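Under the hood, the pipeline applies a softmax to the model's two output logits and returns the higher-probability label. A minimal sketch of that post-processing in plain Python (the logit values here are made up for illustration):

```python
import math

# Hypothetical raw logits from the model for one review: [negative, positive]
logits = [-1.2, 2.3]

# Softmax turns logits into probabilities that sum to 1
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

labels = ["NEGATIVE", "POSITIVE"]
best = max(range(len(probs)), key=probs.__getitem__)
print({"label": labels[best], "score": round(probs[best], 4)})
# e.g. {'label': 'POSITIVE', 'score': 0.9707}
```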
## Results
| Metric | Score |
|--------------|-------|
| Accuracy | 0.915 |
| Precision | 0.915 |
| Recall | 0.915 |
| F1-score | 0.915 |
These results were achieved after fine-tuning DistilBERT on the IMDb dataset with 2 epochs and a batch size of 8. The model performs consistently well on binary sentiment classification tasks.
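For reference, all four metrics in the table can be derived from a confusion matrix over the binary predictions; a small self-contained sketch with made-up labels (not the actual evaluation code) shows how they relate:

```python
# Toy labels and predictions (1 = positive, 0 = negative); values are illustrative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Confusion-matrix counts
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
```

On a balanced dataset like IMDb, identical false-positive and false-negative counts make all four metrics coincide, which is consistent with the identical 0.915 scores reported above.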
## Applications
- Review Analysis: Automatically classify product, movie, or app reviews as positive or negative.
- Social Media Monitoring: Analyze public sentiment from platforms like Twitter, Reddit, or forums.
- Customer Feedback Analysis: Segment user feedback into positive and negative categories for better decision-making.
- Opinion Mining: Detect trends in public opinion about topics, brands, or events.
- Research & Prototyping: Quickly integrate into NLP workflows for experimentation and benchmarking.
## Limitations & Bias
- Language Restriction: The model is trained on English text. Performance will degrade significantly for other languages without further fine-tuning.
- Domain Bias: Since the training data consists of movie reviews, performance might vary when used in other domains like finance, healthcare, or politics.
- Data Bias: The dataset may contain subjective or biased opinions reflecting societal stereotypes present in IMDb reviews.
- Nuance Handling: The model might struggle with sarcasm, humor, or context-heavy statements that require deeper semantic understanding.
## How It Was Built
This model was built by fine-tuning the pre-trained DistilBERT model using the IMDb movie reviews dataset.
The training process involved:
- Preprocessing text (tokenization, truncation, padding)
- Splitting into training and validation sets
- Fine-tuning for 2 epochs with a batch size of 8
- Evaluation on the test set to measure accuracy, precision, recall, and F1-score
Training was done with the Hugging Face `transformers`, `datasets`, and `accelerate` libraries, with PyTorch as the backend.
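The steps above could be reproduced with the standard `Trainer` API. A minimal sketch, assuming default settings for everything this card does not specify (only the epoch count and batch size are stated; the checkpoint name and other arguments here are assumptions):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Load IMDb and a pre-trained DistilBERT checkpoint
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Preprocess text: tokenization, truncation, padding
def preprocess(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(preprocess, batched=True)

# Fine-tune for 2 epochs with a batch size of 8, as described above
args = TrainingArguments(
    output_dir="distilbert-imdb-sentiment",
    num_train_epochs=2,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
```

This is a sketch rather than the exact training script; learning rate, weight decay, and the train/validation split strategy are not documented here and would need to match the original run to reproduce the reported scores.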
## Citation

If you use this model in your research, please cite it as follows:

```bibtex
@misc{distilbert_imdb_sentiment,
  author = {Mirko Doljanica},
  title = {DistilBERT IMDb Sentiment Classifier},
  year = {2025},
  howpublished = {\url{https://huggingface.co/mirko-doljanica/distilbert-imdb-sentiment}}
}
```
## Contact
Author: Mirko Doljanica
- Hugging Face: mirko-doljanica
- GitHub: mirko-doljanica
- Email: doljanicamir@gmail.com
Feel free to open issues, contribute, or reach out for collaborations.