# 🎬 DistilBERT IMDb Sentiment Classifier

This model is a fine-tuned version of DistilBERT for sentiment analysis on the IMDb movie reviews dataset.
It classifies reviews into positive or negative sentiment with high accuracy.


## 📊 Model Details

- Model type: Transformer-based model (DistilBERT)
- Task: Sentiment Analysis / Text Classification
- Framework: Hugging Face Transformers (PyTorch)
- Dataset: IMDb (50,000 labeled movie reviews)
- Accuracy: ~91.5% on the test set

πŸ“ Usage

You can load and use this model directly with transformers:

from transformers import pipeline

sentiment_pipeline = pipeline("sentiment-analysis", model="mirko-doljanica/distilbert-imdb-sentiment")
print(sentiment_pipeline("I loved this movie! The performances were amazing."))
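The pipeline returns a list of dicts of the form `[{'label': ..., 'score': ...}]`, where the score is a softmax over the model's two output logits. As a minimal, self-contained sketch of that post-processing step (the logit values below are made up for illustration, not actual model output):

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for [negative, positive]; real values come from the model.
logits = [-1.2, 2.3]
probs = softmax(logits)
label = "POSITIVE" if probs[1] > probs[0] else "NEGATIVE"
print(label, round(probs[1], 3))  # → POSITIVE 0.971
```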

## 📈 Results

| Metric       | Score |
|--------------|-------|
| Accuracy     | 0.915 |
| Precision    | 0.915 |
| Recall       | 0.915 |
| F1-score     | 0.915 |

These results were achieved after fine-tuning DistilBERT on the IMDb dataset for 2 epochs with a batch size of 8. The model performs consistently well on binary sentiment classification tasks.
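For reference, the four metrics in the table can be computed from predictions as follows (a minimal sketch using toy labels, not the actual evaluation run):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Toy example: 4 reviews, one positive review missed.
acc, prec, rec, f1 = binary_metrics([1, 0, 1, 0], [1, 0, 0, 0])
print(acc, prec, rec, f1)  # → 0.75 1.0 0.5 0.6666666666666666
```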

## 📦 Applications

- 📊 Review Analysis: Automatically classify product, movie, or app reviews as positive or negative.
- 💬 Social Media Monitoring: Analyze public sentiment from platforms like Twitter, Reddit, or forums.
- 🧠 Customer Feedback Analysis: Segment user feedback into positive and negative categories for better decision-making.
- 🔍 Opinion Mining: Detect trends in public opinion about topics, brands, or events.
- 🧪 Research & Prototyping: Quickly integrate into NLP workflows for experimentation and benchmarking.

## ⚠️ Limitations & Bias

- 🗣️ Language Restriction: The model is trained on English text. Performance will degrade significantly for other languages without further fine-tuning.
- 🎬 Domain Bias: Since the training data consists of movie reviews, performance might vary when used for other domains like finance, healthcare, or politics.
- 🤖 Data Bias: The dataset may contain subjective or biased opinions reflecting societal stereotypes present in IMDb reviews.
- 📉 Nuance Handling: The model might struggle with sarcasm, humor, or context-heavy statements that require deeper semantic understanding.

πŸ› οΈ How It Was Built

This model was built by fine-tuning the pre-trained DistilBERT model using the IMDb movie reviews dataset.
The training process involved:

  • Preprocessing text (tokenization, truncation, padding)
  • Splitting into training and validation sets
  • Fine-tuning for 2 epochs with a batch size of 8
  • Evaluation on the test set to measure accuracy, precision, recall, and F1-score

Training was done with the Hugging Face `transformers`, `datasets`, and `accelerate` libraries, with PyTorch as the backend.


## 📚 Citation

If you use this model in your research, please cite it as follows:

```bibtex
@misc{distilbert_imdb_sentiment,
  author       = {Mirko Doljanica},
  title        = {DistilBERT IMDb Sentiment Classifier},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/mirko-doljanica/distilbert-imdb-sentiment}}
}
```

## 📬 Contact

Author: Mirko Doljanica

Feel free to open issues, contribute, or reach out for collaborations.
