roberta-finetune-slangs
Fine-tuned RoBERTa model for sentiment analysis of internet slang, abbreviations, and short words, based on the research paper:
Sahil Kamath, Vaishnavi Padiya, Sonia D'Silva, Nilesh Patil, Meera Narvekar. TeenSenti - A novel approach for sentiment analysis of short words and slangs.
Model description
This model is fine-tuned from a pre-trained RoBERTa transformer to classify the sentiment of sentences containing informal internet expressions such as slang, abbreviations, and short forms. It addresses a gap in existing sentiment analysis models, which often fail to interpret these modern linguistic nuances. A short usage example follows the feature list below.
Key features:
- Handles slang and short words with contextual understanding.
- Trained using a custom slang dictionary integrated into the dataset.
- Outperforms the base `twitter-roberta-base-sentiment` model on slang-heavy datasets.
- Designed for social media, product reviews, and informal text analysis.
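A minimal usage sketch with the Hugging Face `pipeline` API; the output label names shown in the comment are assumptions and depend on the model's `id2label` mapping:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
classifier = pipeline(
    "sentiment-analysis",
    model="spectre0108/roberta-finetune-slangs",
)

# Slang-heavy inputs that the base model tends to misread (see the table below).
print(classifier(["Team India ftw", "I h8 that person"]))
# Expected output shape (label names depend on the model's id2label mapping):
# [{'label': ..., 'score': ...}, {'label': ..., 'score': ...}]
```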
Intended uses & limitations
Intended uses
- Sentiment classification for texts containing slang or abbreviations.
- Social media monitoring, brand sentiment analysis, or content moderation where informal language is common.
Limitations
- Optimized for slang/abbreviation-heavy English text; performance may degrade on formal or domain-specific corpora.
- Slang evolves rapidly; periodic retraining is recommended for sustained accuracy.
Training and evaluation data
- Dataset: custom-curated TeenSenti dataset of ~20,000 sentences.
- Each slang term has both positive and negative example sentences, generated and verified.
- Dataset split: 80% training, 20% testing (per slang term, to avoid overlap; see the sketch after this list).
- Examples include terms like "ftw" ("for the win") and "h8" ("hate").
Training procedure
Preprocessing
- Custom tokenizer preserving slang and short words from the slang dictionary.
- Tokenization and text processing using Hugging Face `transformers` (a tokenizer sketch follows this list).
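The custom tokenizer itself is not published; a common way to keep dictionary slang terms from being split into subwords with `transformers` is to register them as whole tokens. This sketch assumes the base checkpoint is `cardiffnlp/twitter-roberta-base-sentiment` and uses an illustrative two-entry slang dictionary:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base = "cardiffnlp/twitter-roberta-base-sentiment"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base)

# Register slang terms as whole tokens so they are not split into subwords.
slang_dictionary = ["ftw", "h8"]  # illustrative subset of the slang dictionary
num_added = tokenizer.add_tokens(slang_dictionary)

# Grow the embedding matrix to cover the newly added tokens before fine-tuning.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```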
Training hyperparameters
- Optimizer: AdamW
- Learning rate schedule: triangular policy (see the sketch after this list)
- Batch size: 16
- Epochs: 4
- Max sequence length: 128
- Precision: float32 (mixed precision not used)
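A PyTorch sketch of this configuration, reusing `model` from the preprocessing sketch above; the base/max learning rates and half-cycle length are assumptions, not values from the paper:

```python
import torch

# AdamW optimizer, as listed above (learning rate value is an assumption).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Triangular policy: the learning rate ramps linearly between base_lr and
# max_lr each cycle. cycle_momentum=False is required with AdamW, which
# has no momentum parameter.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=1e-6,      # assumed lower bound
    max_lr=2e-5,       # assumed peak
    step_size_up=500,  # assumed steps per half-cycle
    mode="triangular",
    cycle_momentum=False,
)

# Per training step (4 epochs, batch size 16, max sequence length 128):
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```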
Evaluation
Compared to the base `twitter-roberta-base-sentiment` model:
| Example sentence | Base model prediction | Fine-tuned prediction |
|---|---|---|
| "Team India ftw" | Neutral | Positive |
| "I h8 that person" | Negative (low confidence) | Negative (high confidence) |
On the TeenSenti test set, the fine-tuned model achieves:
- Accuracy: 0.93
- F1-score: 0.925
- Precision: 0.92
- Recall: 0.93
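These aggregate metrics can be computed with `scikit-learn`; this is a sketch with placeholder labels, and the weighted averaging scheme is an assumption:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder gold labels and predictions; substitute the real test split.
y_true = ["Positive", "Negative", "Positive", "Negative"]
y_pred = ["Positive", "Negative", "Negative", "Negative"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"  # averaging scheme is an assumption
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```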
Framework versions
- `transformers`: 4.35.2
- `torch`: 2.x
- `tokenizers`: 0.15.0
- `datasets`: 2.x
Citation
If you use this model, please cite:
@INPROCEEDINGS{10582077,
author={Kamath, Sahil and Padiya, Vaishnavi and D'Silva, Sonia and Patil, Nilesh and Narvekar, Meera},
booktitle={2024 International Conference on Advances in Modern Age Technologies for Health and Engineering Science (AMATHE)},
title={TeenSenti - A novel approach for sentiment analysis of short words and slangs},
year={2024},
volume={},
number={},
pages={1-8},
keywords={Deep learning;Sentiment analysis;Dictionaries;Accuracy;Reviews;Navigation;Oral communication;Sentiment Analysis;Slang;Short Words;NLP;FastText Embeddings},
doi={10.1109/AMATHE61652.2024.10582077}}