---
license: apache-2.0
datasets:
  - 4nkh/theme_data
language:
  - en
metrics:
  - precision
  - f1
  - recall
  - accuracy
base_model:
  - google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
  - multi-label
  - theme_detection
  - mentorship
  - entrepreneurship
  - startup success
  - json automation
---

# Theme classification model (multi-label)

This repository contains a fine-tuned BERT model for classifying short texts into community-oriented themes. The model was trained locally and pushed to the Hugging Face Hub.

## Model details

- Model architecture: `bert-base-uncased` (fine-tuned)
- Problem type: multi-label classification
- Labels: `mentorship`, `entrepreneurship`, `startup success`
- Training data: `train_theme.jsonl` (included)
- Final evaluation (example run):
  - eval_loss: 0.1822
  - eval_micro/f1: 1.0
  - eval_macro/f1: 1.0

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo = "4nkh/theme_model"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

texts = ["Our co-op paired first-time founders with veteran shop owners to troubleshoot setbacks."]
inputs = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Multi-label: apply a sigmoid to each logit independently, then threshold at 0.5.
logits = outputs.logits
probs = torch.sigmoid(logits)
preds = (probs >= 0.5).int()

print("probs", probs.numpy(), "preds", preds.numpy())
```

## Notes

- This model uses a threshold of 0.5 for multi-label predictions. Adjust thresholds per class as needed.
- To re-train or fine-tune further, see `train_theme_model.py` in this folder.

## License

Apache-2.0, as declared in the metadata at the top of this card.
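
The per-class thresholding mentioned in the notes can be sketched as follows. This is a minimal illustration, not part of the released model: the threshold values are made-up assumptions, and in practice each one should be tuned on a validation set (for example, to maximize per-class F1).

```python
import numpy as np

# Label order matches the model card; thresholds below are illustrative
# assumptions, not tuned values.
labels = ["mentorship", "entrepreneurship", "startup success"]
thresholds = np.array([0.5, 0.4, 0.6])

# Example sigmoid outputs (as would come from torch.sigmoid(logits)).
probs = np.array([[0.72, 0.35, 0.61]])

# Compare each class probability against its own threshold.
preds = (probs >= thresholds).astype(int)

for name, p, y in zip(labels, probs[0], preds[0]):
    print(f"{name}: prob={p:.2f} -> {y}")
```

With these example numbers, the second class falls below its 0.4 threshold while the other two clear theirs, so the text would be tagged `mentorship` and `startup success`.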