---
license: apache-2.0
datasets:
  - 4nkh/theme_data
language:
  - en
metrics:
  - precision
  - f1
  - recall
  - accuracy
base_model:
  - google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
  - multi-label
  - theme_detection
  - mentorship
  - entrepreneurship
  - startup success
  - json automation
---

# Theme classification model (multi-label)

This repository contains a fine-tuned BERT model for classifying short texts into community-oriented themes. The model was trained locally and pushed to the Hugging Face Hub.

## Model details

- Model architecture: `bert-base-uncased` (fine-tuned)
- Problem type: multi-label classification
- Labels: `mentorship`, `entrepreneurship`, `startup success`
- Training data: `train_theme.jsonl` (included)
- Final evaluation (example run):
  - eval_loss: 0.1822
  - eval_micro/f1: 1.0
  - eval_macro/f1: 1.0

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo = "4nkh/theme_model"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

texts = ["Our co-op paired first-time founders with veteran shop owners to troubleshoot setbacks."]
inputs = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Multi-label: apply a sigmoid to each logit independently, then threshold at 0.5.
logits = outputs.logits
probs = torch.sigmoid(logits)
preds = (probs >= 0.5).int()

print("probs", probs.numpy(), "preds", preds.numpy())
```

## Notes

- This model uses a threshold of 0.5 for multi-label predictions. Adjust thresholds per class as needed.
- To re-train or fine-tune further, see `train_theme_model.py` in this folder.

## License

Apache-2.0, as declared in the metadata at the top of this card.
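
The per-class thresholding mentioned in the notes can be sketched as follows. This is a minimal illustration, not part of the released model: the threshold values are made-up assumptions, and in practice each one should be tuned on a validation set (for example, to maximize per-class F1).

```python
import numpy as np

# Label order matches the model card; thresholds below are illustrative
# assumptions, not tuned values.
labels = ["mentorship", "entrepreneurship", "startup success"]
thresholds = np.array([0.5, 0.4, 0.6])

# Example sigmoid outputs (as would come from torch.sigmoid(logits)).
probs = np.array([[0.72, 0.35, 0.61]])

# Compare each class probability against its own threshold.
preds = (probs >= thresholds).astype(int)

for name, p, y in zip(labels, probs[0], preds[0]):
    print(f"{name}: prob={p:.2f} -> {y}")
```

With these example numbers, the second class falls below its 0.4 threshold while the other two clear theirs, so the text would be tagged `mentorship` and `startup success`.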