XLM-RoBERTa-based model with a CRF layer for location and address span extraction.
Trained on a weakly labeled dataset (part of Dataset 10 of the Epstein files). Geographical entities in the dataset were labeled with qwen3 70b, and BIO tags were added automatically.
Still needs testing in the wild.
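The automatic BIO tagging mentioned above can be illustrated with a minimal sketch. This is not the script used to build the training data; the function name `spans_to_bio`, the `LOC` label, and whitespace tokenization are assumptions made for illustration only.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int]],
                 label: str = "LOC") -> List[str]:
    """Convert token-index spans (start inclusive, end exclusive) to BIO tags.

    The label name "LOC" is a placeholder; the tag set of the actual
    dataset is not stated in this card.
    """
    tags = ["O"] * len(tokens)
    for start, end in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span ({start}, {end})")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


tokens = "He flew from Paris to 123 Main Street".split()
# Hypothetical labeler output: "Paris" and "123 Main Street" are geo entities.
tags = spans_to_bio(tokens, [(3, 4), (5, 8)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Each entity starts with a `B-` tag and continues with `I-` tags; every other token is tagged `O`.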

Trained until: Epoch 6 | Loss: 62.9988 (CRF loss) | Token F1: 0.8452 | Binary F1: 0.8170 | Token Acc: 0.9842 | Span Acc: 0.6357 | Partial: 0.7419
Token F1 - F1 based on token matching
Binary F1 - performance of geo entity extraction only
Token Acc - accuracy based on token matching
Span Acc - based on span (whole geo entity) matching
Partial - based on span (whole geo entity) matching (at least 50% correct overlap)
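The two span-level metrics can be sketched as follows. The card does not say exactly how overlap is measured, so this sketch assumes it is the share of a gold span's tokens covered by the best-matching predicted span; `bio_to_spans` and `span_scores` are hypothetical helper names, not part of the released evaluation code.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from a BIO tag sequence (labels are ignored)."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            if start is not None:
                spans.append((start, i))    # close the previous entity
            start = i                       # open a new entity
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def span_scores(gold_tags: List[str], pred_tags: List[str],
                min_overlap: float = 0.5) -> Tuple[float, float]:
    """Return (exact span accuracy, partial span accuracy) over gold spans."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    if not gold:
        return 0.0, 0.0
    exact = sum(1 for g in gold if g in pred)
    partial = 0
    for gs, ge in gold:
        # Largest token overlap between this gold span and any predicted span.
        best = max((min(ge, pe) - max(gs, ps) for ps, pe in pred), default=0)
        if best / (ge - gs) >= min_overlap:
            partial += 1
    return exact / len(gold), partial / len(gold)


gold = ["O", "O", "O", "B-LOC", "I-LOC", "I-LOC", "O", "B-LOC"]
pred = ["O", "O", "O", "B-LOC", "I-LOC", "O", "O", "O"]
exact_acc, partial_acc = span_scores(gold, pred)
print(exact_acc, partial_acc)
```

In this example the first gold entity is only partly recovered (two of three tokens) and the second is missed, so no span matches exactly while one of the two gold spans counts as a partial match.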

language:
- en
metrics:
- f1
- accuracy
base_model:
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
tags:
- geoparsing
- location
- ner
- informationextraction


Dataset used to train laurabernardy/RueBERTa