---
title: SWE-Model-Arena
emoji: 🎯
colorFrom: purple
colorTo: green
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
hf_oauth: true
pinned: false
short_description: Model arena for software engineering tasks
---

# SWE-Model-Arena

An open-source platform for evaluating tool-calling models head-to-head. Both sides share the **same scaffolding** ([opencode](https://opencode.ai)) with identical tools, prompts, and sandboxed environments — the **only variable** is the underlying tool-calling model.

**[Try it on Hugging Face Spaces](https://huggingface.co/spaces/SWE-Arena/SWE-Model-Arena)**

## Key Capabilities

- **Agentic evaluation** — models read files, write code, and execute commands in real git repos via [opencode](https://opencode.ai), not just generate text
- **RepoChat** — auto-injects repo context (issues, commits, PRs) from GitHub / GitLab / Hugging Face
- **Multi-round + git diff comparison** — follow up across turns and compare actual diffs side-by-side
- **Rich leaderboard** — Elo, PageRank, modularity clustering, self-play consistency, and efficiency metrics

## How It Works

1. **Submit a task** — sign in, enter a coding task (optionally include a repo URL for RepoChat context)
2. **Models execute** — two randomly selected tool-calling models work independently via [OpenRouter](https://openrouter.ai)
3. **Compare** — view outputs and git diffs side-by-side; send follow-ups for multi-round refinement
4. **Vote** — pick the better model based on code quality, correctness, and approach

## Contributing

Submit tasks, report bugs, or request features via [GitHub Issues](https://github.com/Software-Engineering-Arena/SWE-Model-Arena/issues/new).

## Citation

```bibtex
@inproceedings{zhao2025se,
  title={SE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering},
  author={Zhao, Zhimin},
  booktitle={2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge)},
  pages={78--81},
  year={2025},
  organization={IEEE}
}
```