roberta-finetune-slangs

Fine-tuned RoBERTa model for sentiment analysis of internet slang, abbreviations, and short words, based on the research paper:

Sahil Kamath, Vaishnavi Padiya, Sonia D'Silva, Nilesh Patil, Meera Narvekar. TeenSenti - A novel approach for sentiment analysis of short words and slangs.

Model description

This model is fine-tuned from a pre-trained RoBERTa transformer to classify the sentiment of sentences containing informal internet expressions such as slang, abbreviations, and short forms. The goal is to address the gap in existing sentiment analysis models, which often fail to interpret modern linguistic nuances.

Key features:

  • Handles slang and short words with contextual understanding.
  • Trained using a custom slang dictionary integrated into the dataset.
  • Outperforms the base twitter-roberta-base-sentiment model on slang-heavy datasets.
  • Designed for social media, product reviews, and informal text analysis.
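
A minimal inference sketch using the transformers pipeline API. The repository id matches this model card; the exact label names returned depend on the model config and are shown here only illustratively:

```python
# Minimal inference sketch via the transformers pipeline API. The repo id
# matches this model card; returned label names depend on the model config.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="spectre0108/roberta-finetune-slangs",
)

for text in ["Team India ftw", "I h8 that person"]:
    result = classifier(text)[0]
    print(f"{text!r} -> {result['label']} ({result['score']:.3f})")
```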

Intended uses & limitations

Intended uses

  • Sentiment classification for texts containing slang or abbreviations.
  • Social media monitoring, brand sentiment analysis, or content moderation where informal language is common.

Limitations

  • Optimized for slang/abbreviation-heavy English text; performance may degrade on formal or domain-specific corpora.
  • Slang evolves rapidly; periodic retraining is recommended for sustained accuracy.

Training and evaluation data

  • Dataset: Custom-curated TeenSenti dataset of ~20,000 sentences.
  • Each slang term is paired with both positive and negative example sentences, which were generated and verified.
  • Dataset split: 80% training / 20% testing, performed per slang term so that sentences do not overlap between splits (see the split sketch after this list).
  • Examples include terms like "ftw" ("for the win") and "h8" ("hate").
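
One way such a per-term split can be implemented is sketched below; the (sentence, label, slang_term) tuple layout and the split_per_term helper are hypothetical, not the paper's code:

```python
# Hypothetical sketch of the per-slang-term 80/20 split described above.
# `examples` is assumed to be a list of (sentence, label, slang_term) tuples.
import random
from collections import defaultdict

def split_per_term(examples, train_frac=0.8, seed=42):
    rng = random.Random(seed)
    by_term = defaultdict(list)
    for sentence, label, term in examples:
        by_term[term].append((sentence, label))

    train, test = [], []
    for rows in by_term.values():
        rng.shuffle(rows)
        cut = int(len(rows) * train_frac)
        train.extend(rows[:cut])  # 80% of each term's sentences
        test.extend(rows[cut:])   # remaining 20%; every term appears in both splits
    return train, test
```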

Training procedure

Preprocessing

  • Custom tokenizer preserving slang and short words from the slang dictionary.
  • Tokenization and text processing using Hugging Face transformers.
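
A sketch of one way to keep slang terms intact during tokenization, assuming the base checkpoint is cardiffnlp/twitter-roberta-base-sentiment (the model this card compares against) and using an illustrative slang list; the authors' exact tokenizer customization may differ:

```python
# Sketch: keep slang terms as single tokens by adding them to the vocabulary.
# The base checkpoint and slang list here are assumptions for illustration.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base)

slang_terms = ["ftw", "h8", "smh", "idk"]  # illustrative subset of the dictionary
num_added = tokenizer.add_tokens(slang_terms)
model.resize_token_embeddings(len(tokenizer))  # grow embeddings for new tokens

print(f"Added {num_added} slang tokens")
print(tokenizer.tokenize("Team India ftw"))  # 'ftw' survives as one token
```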

Training hyperparameters

  • Optimizer: AdamW
  • Learning rate schedule: triangular policy
  • Batch size: 16
  • Epochs: 4
  • Max sequence length: 128
  • Precision: float32 (mixed precision not used)
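
A training-loop sketch matching these hyperparameters, reusing `model` from the preprocessing sketch above and assuming a `train_loader` that yields tokenized batches with labels; PyTorch's CyclicLR with mode="triangular" implements the triangular policy, and the base/max learning rates and step size below are assumptions:

```python
# Training-loop sketch matching the hyperparameters above. `model` comes from
# the preprocessing sketch; `train_loader` (batch size 16, max length 128) is
# assumed to yield dicts with input_ids, attention_mask, and labels.
from torch.optim import AdamW
from torch.optim.lr_scheduler import CyclicLR

optimizer = AdamW(model.parameters(), lr=1e-5)  # peak LR is an assumption
scheduler = CyclicLR(
    optimizer,
    base_lr=1e-6,           # assumed lower bound of the triangle
    max_lr=2e-5,            # assumed upper bound
    step_size_up=500,       # assumed steps per half-cycle
    mode="triangular",
    cycle_momentum=False,   # required for Adam-family optimizers
)

model.train()
for epoch in range(4):  # Epochs: 4
    for batch in train_loader:
        optimizer.zero_grad()
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        scheduler.step()
```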

Evaluation

Compared to the base twitter-roberta-base-sentiment model:

| Example sentence   | Base model prediction     | Fine-tuned prediction      |
|--------------------|---------------------------|----------------------------|
| "Team India ftw"   | Neutral                   | Positive                   |
| "I h8 that person" | Negative (low confidence) | Negative (high confidence) |

The fine-tuned model achieves:

  • Accuracy: 0.93
  • F1-score: 0.925
  • Precision: 0.92
  • Recall: 0.93
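
These metrics can be computed with scikit-learn as sketched below; the placeholder label arrays and the "weighted" averaging are assumptions, not the paper's exact evaluation script:

```python
# Sketch of metric computation with scikit-learn; the label arrays are
# placeholders and the "weighted" averaging is an assumption.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0]  # gold labels from the test split (placeholder)
y_pred = [1, 0, 1, 0, 0]  # model predictions (placeholder)

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1-score:  {f1:.3f}")
```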

Framework versions

  • transformers: 4.35.2
  • torch: 2.x
  • tokenizers: 0.15.0
  • datasets: 2.x

Citation

If you use this model, please cite:

@INPROCEEDINGS{10582077,
  author={Kamath, Sahil and Padiya, Vaishnavi and D'Silva, Sonia and Patil, Nilesh and Narvekar, Meera},
  booktitle={2024 International Conference on Advances in Modern Age Technologies for Health and Engineering Science (AMATHE)}, 
  title={TeenSenti - A novel approach for sentiment analysis of short words and slangs}, 
  year={2024},
  volume={},
  number={},
  pages={1-8},
  keywords={Deep learning;Sentiment analysis;Dictionaries;Accuracy;Reviews;Navigation;Oral communication;Sentiment Analysis;Slang;Short Words;NLP;FastText Embeddings},
  doi={10.1109/AMATHE61652.2024.10582077}}