multiclinner_enigma_cz_procedure_xlmr-crf

Czech clinical Named Entity Recognition model for the PROCEDURE entity type, fine-tuned from xlm-roberta-base on the Czech portion of the MultiClinAI 2026 IberLEF shared task. Developed by Team Enigma at the Faculty of Mathematics and Informatics, Sofia University.

Summary


Task	Token classification (BIO), single entity type
Entity type	`PROCEDURE` (medical procedures and interventions)
Language	Czech (`cs`)
Base model	`xlm-roberta-base`
Architecture	Transformer + CRF
Augmentation	Morphological synonyms (curated + Wikidata)
Training data	MultiClinAI Czech `train + dev` combined (1,258 documents) plus augmentation
Test F1 (strict)	0.6552 (char F1 0.7932)

Quick start

This is a custom Transformer + CRF model, so it cannot be loaded through the standard AutoModel API. The repository ships a self-contained modeling_crf.py next to the weights; the snippet below downloads the whole repository and imports the class from there.

pip install torch transformers pytorch-crf huggingface_hub

import sys
import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

repo = "SU-FMI-AI/multiclinner_enigma_cz_procedure_xlmr-crf"
local_dir = snapshot_download(repo)

# Use the modeling_crf.py shipped inside the repository.
sys.path.insert(0, local_dir)
from modeling_crf import TransformerCRF

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TransformerCRF.from_pretrained(local_dir, device=device).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(local_dir)

LABELS = ["O", "B-PROCEDURE", "I-PROCEDURE"]


@torch.no_grad()
def predict_entities(text: str):
    enc = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=512,
        return_offsets_mapping=True,
    )
    offsets = enc.pop("offset_mapping")[0].tolist()
    enc = {k: v.to(device) for k, v in enc.items()}
    tag_ids = model(enc["input_ids"], enc["attention_mask"])[0]

    spans = []
    in_ent, start, end, prev_os = False, 0, 0, -1
    for tag_id, (os, oe) in zip(tag_ids, offsets):
        if os == oe:                  # special token
            continue
        if os == prev_os:             # SentencePiece sub-token at same offset
            if in_ent:
                end = max(end, oe)
            continue
        prev_os = os

        label = LABELS[tag_id]
        if label.startswith("B-"):
            if in_ent:
                spans.append((start, end, text[start:end]))
            start, end, in_ent = os, oe, True
        elif label.startswith("I-") and in_ent:
            end = oe
        else:
            if in_ent:
                spans.append((start, end, text[start:end]))
            in_ent = False

    if in_ent:
        spans.append((start, end, text[start:end]))
    return spans


text = "Pacient byl přijat s hypertenzí a podstoupil koronarografii."
print(predict_entities(text))
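
The training table lists sentence-level input and a 512-token limit, so longer documents are probably best processed one sentence at a time, with the predicted spans shifted back to document offsets. The following is a minimal sketch of that idea, not part of the repository: `split_sentences` and `predict_document` are hypothetical helpers, the regex splitter is deliberately naive (it will break on abbreviations), and `predict_fn` stands for any function with the same return shape as `predict_entities` above.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, surface text)


def split_sentences(text: str) -> List[Tuple[int, str]]:
    """Naive splitter: returns (start_offset, sentence) pairs."""
    sentences = []
    for match in re.finditer(r"[^.!?]+[.!?]*", text):
        chunk = match.group()
        stripped = chunk.lstrip()
        if not stripped:
            continue  # whitespace-only remainder
        start = match.start() + len(chunk) - len(stripped)
        sentences.append((start, stripped.rstrip()))
    return sentences


def predict_document(text: str, predict_fn: Callable[[str], List[Span]]) -> List[Span]:
    """Run predict_fn per sentence and shift spans to document offsets."""
    spans = []
    for offset, sentence in split_sentences(text):
        for start, end, surface in predict_fn(sentence):
            spans.append((start + offset, end + offset, surface))
    return spans
```

With the model loaded as in the quick start, this would be called as `predict_document(long_text, predict_entities)`. A proper Czech sentence segmenter is likely a better choice than the regex for real clinical notes.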

Label set

The model predicts a single entity type using the BIO tagging scheme:

ID	Label	Meaning
0	`O`	outside any entity
1	`B-PROCEDURE`	beginning of a `PROCEDURE` mention
2	`I-PROCEDURE`	inside a `PROCEDURE` mention
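
To illustrate the scheme, the sketch below assigns BIO labels to whitespace tokens given a character span. It is only an illustration: the model itself labels SentencePiece sub-tokens rather than whitespace tokens, `bio_encode` is a hypothetical helper, and the example span is hand-picked, not a model prediction or a gold annotation.

```python
import re
from typing import List, Tuple

LABEL2ID = {"O": 0, "B-PROCEDURE": 1, "I-PROCEDURE": 2}


def bio_encode(text: str, spans: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Label each whitespace token as O, B-PROCEDURE or I-PROCEDURE."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        label = "O"
        for start, end in spans:
            if match.start() < end and match.end() > start:  # token overlaps span
                # The first overlapping token opens the mention, the rest continue it.
                label = "B-PROCEDURE" if match.start() <= start else "I-PROCEDURE"
                break
        tagged.append((match.group(), label))
    return tagged


text = "Pacient podstoupil transplantaci ledviny."
start = text.index("transplantaci")
span = (start, start + len("transplantaci ledviny"))  # hand-picked example span

for token, label in bio_encode(text, [span]):
    print(f"{token}\t{label}\t{LABEL2ID[label]}")
```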

Intended use

Extracting PROCEDURE mentions from Czech clinical text (discharge summaries, case reports, medical records).
Building block for ensembles (this model was deployed as part of an ensemble in the original submission).
A starting point for further fine-tuning on related Czech biomedical corpora.

Out-of-scope use

Other languages. Although the XLM-RoBERTa-based variants share a multilingual encoder, the classification head was trained only on Czech.
Other domains. The model was not exposed to non-clinical text, social media or layperson descriptions.
Clinical decision making. This is a research artifact. Do not use it as the sole input to any clinical decision.

Training data

Source. MultiClinAI Czech NER: 1,006 train documents and 252 dev documents per entity type in BRAT standoff format, derived from the DisTEMIST, SympTEMIST and MedProcNER corpora, translated and annotation-projected to Czech. For the final submission models, the gold train + dev sets are merged into a single training partition (no held-out validation).
Augmentation. Morphological synonym replacement: rare entity surface forms (occurring at most 5 times in training) are substituted with morphological variants drawn from a curated Czech medical synonym dictionary. The dictionary combines manually curated morphological paradigms for Czech medical terms with Wikidata concept labels for Czech medical entities, totalling roughly 1,400 synonym entries across about 900 morphological families. Approximately 2,200 augmented documents per entity type are appended to the 1,258-document gold train + dev partition.
Tokenisation. SentencePiece tokenizer inherited from the base model.
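
The augmentation step described above can be sketched roughly as follows. This is an assumption-laden illustration, not the project's pipeline (which lives in the GitHub repository): `augment` and the `variants` dictionary are hypothetical, the toy entry stands in for the curated dictionary, and entity spans are assumed not to overlap.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Doc = Tuple[str, List[Tuple[int, int]]]  # (text, entity character spans)


def augment(docs: List[Doc], variants: Dict[str, List[str]],
            max_count: int = 5, seed: int = 42) -> List[Doc]:
    """Replace rare entity surface forms with dictionary variants."""
    rng = random.Random(seed)
    counts = Counter(text[s:e].lower() for text, spans in docs for s, e in spans)
    augmented = []
    for text, spans in docs:
        parts, new_spans, cursor, pos, changed = [], [], 0, 0, False
        for s, e in sorted(spans):
            surface = text[s:e]
            options = variants.get(surface.lower(), [])
            replacement = surface
            if counts[surface.lower()] <= max_count and options:
                replacement = rng.choice(options)  # rare form: swap it
                changed = True
            parts.append(text[cursor:s])
            pos += s - cursor
            parts.append(replacement)
            new_spans.append((pos, pos + len(replacement)))  # re-aligned span
            pos += len(replacement)
            cursor = e
        parts.append(text[cursor:])
        if changed:  # keep only documents that actually differ
            augmented.append(("".join(parts), new_spans))
    return augmented


# Toy dictionary entry (illustrative only, not taken from the real resource).
variants = {"koronarografii": ["koronarografie"]}
text = "Pacient podstoupil koronarografii."
start = text.index("koronarografii")
print(augment([(text, [(start, start + len("koronarografii"))])], variants))
```

Spans are recomputed after each substitution, so the augmented documents stay aligned with their annotations even when the variant has a different length.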

Training procedure

Hyperparameter	Value
Base model	`xlm-roberta-base`
Head	CRF
Optimiser	AdamW
Learning rate	2e-5
Batch size	32
Epochs	5
Max sequence length	512
Input granularity	Sentence-level
Warmup ratio	0.1
Weight decay	0.01
Seed	42
Mixed precision	fp16 (CUDA)
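
The table gives a peak learning rate and a warmup ratio but does not name the scheduler. Assuming the common choice of linear warmup followed by linear decay (an assumption, not something the card confirms), the learning rate at a given step would look like the sketch below. `lr_at` and the `total_steps` value are hypothetical.

```python
def lr_at(step: int, total_steps: int,
          peak_lr: float = 2e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate under linear warmup + linear decay (assumed schedule)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr to 0 over the remaining steps.
    remaining = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / remaining)


total_steps = 1000  # hypothetical; depends on dataset size, batch size and epochs
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps))
```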

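The architecture is a token classification head with a Conditional Random Field (CRF) on top. A linear projection over the encoder outputs produces per-token emission scores, and a CRF layer (torchcrf) decodes the highest-scoring BIO sequence with the Viterbi algorithm. Because the CRF learns transition scores between labels, it can learn to penalise invalid transitions (e.g. I-X cannot follow B-Y or O).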
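
To show why a transition-aware decoder matters, here is a small pure-Python Viterbi sketch over the three labels. It is not the torchcrf implementation: the emission and transition scores are made-up toy values, end-transition scores are omitted for brevity, and `viterbi` is a hypothetical helper.

```python
from typing import List

LABELS = ["O", "B-PROCEDURE", "I-PROCEDURE"]
NEG = -1e4  # large penalty standing in for a "forbidden" transition


def viterbi(emissions: List[List[float]], transitions: List[List[float]],
            start_scores: List[float]) -> List[int]:
    """Return the highest-scoring tag sequence; transitions[prev][cur]."""
    n_tags = len(start_scores)
    score = [start_scores[t] + emissions[0][t] for t in range(n_tags)]
    backptr = []
    for em in emissions[1:]:
        step, new_score = [], []
        for cur in range(n_tags):
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][cur])
            step.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][cur] + em[cur])
        backptr.append(step)
        score = new_score
    path = [max(range(n_tags), key=lambda t: score[t])]
    for step in reversed(backptr):  # follow back-pointers to the first token
        path.append(step[path[-1]])
    return path[::-1]


# Toy scores: token 2 slightly prefers I-PROCEDURE, which would follow O.
emissions = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [0.0, 0.0, 2.0]]
transitions = [[0.0, 0.0, NEG], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # O -> I penalised
start_scores = [0.0, 0.0, NEG]  # a sequence should not start with I

greedy = [max(range(3), key=lambda t: em[t]) for em in emissions]
print([LABELS[t] for t in greedy])  # -> ['O', 'I-PROCEDURE', 'I-PROCEDURE']
print([LABELS[t] for t in viterbi(emissions, transitions, start_scores)])
# -> ['O', 'B-PROCEDURE', 'I-PROCEDURE']
```

Per-token argmax yields an I tag directly after O, while the sequence-level decode repairs it to a well-formed B, I mention.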

Evaluation

Held-out development set

Best dev-set entity-level F1 observed during development: 0.739.

MultiClinAI Czech, official blind test set

Run name in the official MultiClinAI ranking: xlmr-crf-cz-procedure.

Metric	Strict	Character-level
Precision	0.6566	0.7955
Recall	0.6539	0.7909
F1	0.6552	0.7932

Strict matching requires the predicted span to exactly match a gold span (same start, end, and type). Character-level matching gives partial credit for overlapping spans.
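
The difference between the two columns can be illustrated with a small sketch. The official MultiClinAI scorer may differ in details; here character-level scoring is assumed to mean overlap between the sets of labelled character positions, and `prf`, `strict_scores` and `char_scores` are hypothetical helpers.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start, end, entity type)


def prf(tp: int, n_pred: int, n_gold: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def strict_scores(pred: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """A prediction counts only if start, end and type all match."""
    return prf(len(pred & gold), len(pred), len(gold))


def char_scores(pred: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Partial credit: compare labelled character positions (assumed definition)."""
    pred_chars = {(i, label) for s, e, label in pred for i in range(s, e)}
    gold_chars = {(i, label) for s, e, label in gold for i in range(s, e)}
    return prf(len(pred_chars & gold_chars), len(pred_chars), len(gold_chars))


gold = {(19, 33, "PROCEDURE")}
pred = {(19, 30, "PROCEDURE")}   # right start, three characters short
print(strict_scores(pred, gold))  # -> (0.0, 0.0, 0.0)
print(char_scores(pred, gold))    # precision 1.0, recall 11/14, F1 0.88
```

A boundary error like this one scores zero under strict matching but still earns most of the character-level credit, which is consistent with the gap between the two columns in the table. The reported F1 values also agree with the harmonic mean of the listed precision and recall (0.6566 and 0.6539 give 0.6552; 0.7955 and 0.7909 give 0.7932).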

Related models

Other models for the same entity type:

SU-FMI-AI/multiclinner_enigma_cz_procedure_robeczech-os1: robeczech-base, morphological synonyms + 1x oversampling of entity-bearing documents, softmax head, test F1 = 0.6620.

License

Released under the Apache-2.0 license. Base-model and dataset licenses apply to their respective artifacts.

Code and resources

Training code, augmentation pipeline, ablation log and evaluation scripts are available in the project's GitHub repository: https://github.com/TeogopK/MultiClinAI-Czech.
