XLM-RoBERTa-based model with a CRF layer for location and address span extraction.
Trained on a weakly labeled dataset (part of Dataset 10 of the Epstein files). Geographical entities in the dataset were labeled with qwen3 70b, and BIO tags were added automatically.
Still needs testing in the wild.
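
The automatic BIO tagging step mentioned above can be illustrated with a minimal sketch. It is not the actual labeling pipeline used for this model: the function name `spans_to_bio`, the single label `LOC`, and the token-index span format are all assumptions made for the example.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive (assumed format)


def spans_to_bio(num_tokens: int, spans: List[Span], label: str = "LOC") -> List[str]:
    """Turn entity spans into a BIO tag sequence.

    The first token of each span gets B-<label>, the remaining tokens
    get I-<label>, and every token outside a span gets O.
    """
    tags = ["O"] * num_tokens
    for start, end in spans:
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"span {(start, end)} is out of range")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical example sentence, not taken from the training data.
tokens = ["He", "flew", "to", "Palm", "Beach", "yesterday"]
example_tags = spans_to_bio(len(tokens), [(3, 5)])
```

Here `example_tags` marks "Palm" as `B-LOC` and "Beach" as `I-LOC`, with all other tokens tagged `O`.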
Trained until: Epoch 6 | Loss: 62.9988 (CRF loss) | Token F1: 0.8452 | Binary F1: 0.8170 | Token Acc: 0.9842 | Span Acc: 0.6357 | Partial: 0.7419
Token F1 - F1 score based on token-level tag matching
Binary F1 - shows performance of geo entity extraction only
Token Acc - accuracy based on token-level tag matching
Span Acc - based on exact span (whole geo entity) matching
Partial - based on span (whole geo entity) matching with at least 50% correct overlap
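
The two span-level metrics can be sketched as follows. This is a hedged illustration, not the evaluation code behind the numbers above. It assumes that both scores are computed per gold span, and that "overlap" means the fraction of a gold span's tokens covered by a single predicted span. The names `bio_to_spans` and `span_scores` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from a BIO tag sequence (single entity type)."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            if start is not None:  # a new B- closes the previous span
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:  # span running to the end of the sequence
        spans.append((start, len(tags)))
    return spans


def span_scores(gold_tags: List[str], pred_tags: List[str],
                min_overlap: float = 0.5) -> Tuple[float, float]:
    """Return (exact, partial) match rates over the gold spans."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    if not gold:
        return 0.0, 0.0
    pred_set = set(pred)
    exact = sum(1 for g in gold if g in pred_set)
    partial = 0
    for gs, ge in gold:
        # Largest token overlap between this gold span and any predicted span.
        best = max((min(ge, pe) - max(gs, ps) for ps, pe in pred), default=0)
        if best / (ge - gs) >= min_overlap:
            partial += 1
    return exact / len(gold), partial / len(gold)


gold = ["O", "B-LOC", "I-LOC", "I-LOC", "O", "B-LOC", "O"]
pred = ["O", "B-LOC", "I-LOC", "O", "O", "B-LOC", "O"]
exact_rate, partial_rate = span_scores(gold, pred)
```

In this example the first gold span is only partly recovered (two of three tokens), so it counts toward the partial score but not the exact one. This is why a partial score is expected to be at least as high as the exact span score.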
language:
- en
metrics:
- f1
- accuracy
base_model:
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
tags:
- geoparsing
- location
- ner
- informationextraction