MAGA-Bench: Machine-Augment-Generated Text via Alignment Detection Benchmark

🖥️ Github 📝 Paper

MAGA is a comprehensive dataset, built via alignment-augment, for research on the generalization of machine-generated text detectors. It contains nearly 1 million generations covering 12 generators, 20 domains (10 English + 10 Chinese), 4 alignment methods, and diverse decoding strategies. It can be used to test detector robustness and to improve the generalization of fine-tuned detectors.

Load the model

To load the model, install the transformers library with pip install transformers (the snippet also requires torch). Then:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Use a GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("anyangsong/MGT-Detector-RB-MAGA-cn")
model = AutoModelForSequenceClassification.from_pretrained("anyangsong/MGT-Detector-RB-MAGA-cn").to(device)

Detect

The following snippet shows how to use the model to detect whether input texts are machine-generated.

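model.eval()  # inference mode: disables dropout
texts = [
    "这是一段人类文本。",  # "This is a piece of human text."
    "这不是一段机器文本。",  # "This is not a piece of machine text."
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
    probs = outputs.logits.softmax(dim=-1)
    # Column 0 is the human probability, column 1 the machine probability.
    human_probs, machine_probs = probs[:, 0], probs[:, 1]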
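
The snippet above scores every text in a single forward pass, which may not fit in memory for long lists. A minimal sketch of one way to split the inputs into smaller batches follows; the helper name `chunked` and the `batch_size` parameter are hypothetical and not part of the released code.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Illustration with placeholder strings instead of real documents.
example_texts = ["text 0", "text 1", "text 2", "text 3", "text 4"]
for batch in chunked(example_texts, batch_size=2):
    # In the detection code, each batch would be passed to the tokenizer
    # and the model in place of the full `texts` list.
    print(batch)
```

Each batch can then be tokenized and scored exactly as in the detection snippet, and the per-batch probabilities concatenated afterwards.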

Our model uses 0.5 as the default threshold, i.e. is_machine = machine_probs >= 0.5
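
To make the thresholding rule concrete, here is a small sketch in plain Python that turns two-class logits into a decision, assuming index 0 is the human class and index 1 the machine class as in the snippet above. The function names `softmax` and `label_texts` and the example logits are hypothetical, chosen only for illustration; they are not model outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_texts(batch_logits, threshold=0.5):
    """Return one dict per text with both probabilities and the decision."""
    results = []
    for logits in batch_logits:
        human_prob, machine_prob = softmax(logits)
        results.append({
            "human_prob": human_prob,
            "machine_prob": machine_prob,
            # Same rule as above: machine if the probability reaches the threshold.
            "is_machine": machine_prob >= threshold,
        })
    return results


# Made-up logits for three texts (assumption, not real model output).
example_logits = [[2.0, -1.0], [-0.5, 1.5], [0.0, 0.0]]
for result in label_texts(example_logits):
    print(result)
```

Raising `threshold` above 0.5 trades fewer false alarms on human text for more missed machine text; whether that trade-off is appropriate depends on the application.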

Citation

If you find MAGA useful for your research and applications, please cite it using the following BibTeX entry:

@misc{song2026maga,
      title={MAGA-Bench: Machine-Augment-Generated Text via Alignment Detection Benchmark}, 
      author={Anyang Song and Ying Cheng and Yiqian Xu and Rui Feng},
      year={2026},
      eprint={2601.04633},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.04633}, 
}