MAGA is a comprehensive dataset for studying how well machine-generated text detectors generalize, built via alignment augmentation. It contains nearly 1 million generations covering 12 generators, 20 domains (10 English + 10 Chinese), 4 alignment methods, and diverse decoding strategies. It can be used to test detector robustness and to improve the generalization of fine-tuned detectors.
Collection
| Name | Link |
|---|---|
| MAGA | https://huggingface.co/datasets/anyangsong/MAGA |
| MAGA-cn | https://huggingface.co/datasets/anyangsong/MAGA-cn |
| MGT-Detector-RB-MAGA | https://huggingface.co/anyangsong/MGT-Detector-RB-MAGA |
| MGT-Detector-RB-MAGA-cn | https://huggingface.co/anyangsong/MGT-Detector-RB-MAGA-cn |
| MGT-Detector-RB-MGB | https://huggingface.co/anyangsong/MGT-Detector-RB-MGB |
| MGT-Detector-RB-MGB-cn | https://huggingface.co/anyangsong/MGT-Detector-RB-MGB-cn |
Load the model
To load the model, install the transformers library together with PyTorch, which the code relies on: pip install transformers torch. Then:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("anyangsong/MGT-Detector-RB-MAGA-cn")
model = AutoModelForSequenceClassification.from_pretrained("anyangsong/MGT-Detector-RB-MAGA-cn").to(device)
Detect
The following snippet shows how to use the model to detect whether input texts are machine-generated.
model.eval()
texts = [
    "这是一段人类文本。",  # "This is a piece of human text."
    "这不是一段机器文本。",  # "This is not a piece of machine text."
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits.softmax(dim=-1)
# Column 0 is the human class, column 1 is the machine class.
human_probs, machine_probs = probs[:, 0], probs[:, 1]
Our model uses 0.5 as the default threshold, i.e. is_machine = machine_probs >= 0.5
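The thresholding step can also be sketched without any dependencies, which may help when experimenting with a different cut-off. The sketch below assumes two-class logits with the machine class at index 1, as in the snippet above; the helper names `softmax` and `flag_machine` and the example logits are hypothetical and are not part of the released code or real model output.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def flag_machine(logit_rows, threshold=0.5):
    """Return (machine_prob, is_machine) per row; index 1 is the machine class."""
    results = []
    for row in logit_rows:
        machine_prob = softmax(row)[1]
        results.append((machine_prob, machine_prob >= threshold))
    return results

# Hypothetical logits for two texts (illustrative values, not real model output).
example_logits = [[2.0, -1.0], [-0.5, 1.5]]
decisions = flag_machine(example_logits)
for machine_prob, is_machine in decisions:
    print(f"machine_prob={machine_prob:.3f} is_machine={is_machine}")
```

With real model output, `outputs.logits.tolist()` could be passed as `logit_rows`. Raising `threshold` above 0.5 would generally flag fewer texts as machine-generated, at the cost of missing more of them; the appropriate value depends on the application.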
Citation
If you find MAGA useful for your research or applications, please cite it using the following BibTeX entry:
@misc{song2026maga,
  title={MAGA-Bench: Machine-Augment-Generated Text via Alignment Detection Benchmark},
  author={Anyang Song and Ying Cheng and Yiqian Xu and Rui Feng},
  year={2026},
  eprint={2601.04633},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.04633},
}
Base model: google-bert/bert-base-chinese