Commit 14b1a85 (verified) · committed by lewtun (HF Staff) · Parent(s): 89bc25b

Update README.md

Files changed (1):
  1. README.md (+19 −19)
README.md CHANGED

@@ -23,11 +23,11 @@ base_model:
 
 ## Model Summary
 
-QED-Nano is a 4B parameter model explicitly post-trained to strengthen capabilities on Olympiad-level math proof problems. Despite its small size, QED-Nano achieves an impressive 40% score on the challenging IMO-ProofBench benchmark (+20% over the Qwen3 base model), matching the performance of [GPT-OSS-120B](https://huggingface.co/openai/gpt-oss-120b) from OpenAI. With an agent scaffold that scales inference-time compute to over 1M tokens per problem, QED-Nano approaches the performance of Gemini-3-Pro:
+QED-Nano is a 4B parameter model explicitly post-trained to strengthen its proof-writing capabilities. Despite its small size, QED-Nano achieves an impressive 40% score on the challenging IMO-ProofBench benchmark (+20% over the Qwen3 base model), matching the performance of [GPT-OSS-120B](https://huggingface.co/openai/gpt-oss-120b) from OpenAI. With an agent scaffold that scales inference-time compute to over 1M tokens per problem, QED-Nano approaches the performance of Gemini-3-Pro:
 
 ![imoproofbench.png](https://huggingface.co/lm-provers/QED-Nano/resolve/main/imoproofbench.png)
 
-QED-Nano is built by post-training [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507), via a combination of supervised fine-tuning on curated high-quality proofs, and reinforcement learning with a [reasoning cache](https://huggingface.co/papers/2602.03773) meant to improve the trained model's ability to scale test-time compute effectively with agentic scaffolds. We train on a mixture of Olympiads proof problems from various public sources that we also open source, along with our SFT dataset.
+QED-Nano is based on [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507), and was post-trained via a combination of supervised fine-tuning and [reinforcement learning with a reasoning cache](https://huggingface.co/papers/2602.03773) on a mixture of Olympiads proof problems from various public sources.
 
 ## How to use
 
@@ -90,13 +90,6 @@ In this section, we report the evaluation results of QED-Nano. All evaluations a
 
 [ADD TABLE]
 
-## Training
-
-### Model
-
-- **Architecture:** Transformer decoder
-- **Precision:** bfloat16
-
 
 ## Limitations
 
@@ -105,15 +98,22 @@ QED-Nano is a domain-specific model that is designed for one thing and one thing
 ## License
 [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
-## Citation
-
-If you find this model useful in your research, please consider citing it as follows:
-
-```bibtex
-@misc{qwed_nano2026,
-  title={{QED-Nano: Solving Olympiad Math at Gemini Level with a 4B Model}},
-  author={Setlur, Amrith and Dekoninck, Jasper and Qu, Yuxiao and Wu, Ian and Li, Jia and Beeching, Edward and Tunstall, Lewis and Kumar, Aviral},
-  year={2026},
-  howpublished={\url{https://huggingface.co/blog/smollm3}}
-}
-```
+## Acknowledgements
+
+QED-Nano is a joint collaboration between the research teams at CMU, ETH Zürich, Numina, and Hugging Face. Below is a list of the individual contributors and their affiliations:
+
+### CMU
+
+Amrith Setlur, Yuxiao Qu, Ian Wu, and Aviral Kumar
+
+### ETH Zürich
+
+Jasper Dekoninck
+
+### Numina
+
+Jia Li
+
+### Hugging Face
+
+Edward Beeching and Lewis Tunstall
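
The README's "How to use" section is unchanged by this commit, so its body does not appear in the diff. For orientation, a minimal, hypothetical sketch of loading the model with 🤗 Transformers might look like the following; the repo id `lm-provers/QED-Nano` is inferred from the image URL above, and the chat-style prompting and generation budget are assumptions, not taken from the actual (elided) usage section:

```python
# Hypothetical usage sketch for QED-Nano; repo id inferred from the README's
# asset URL, prompt format and token budget are assumptions.
MODEL_ID = "lm-provers/QED-Nano"


def build_messages(problem: str) -> list:
    """Wrap a proof problem as a single-turn chat message list."""
    return [{"role": "user", "content": problem}]


def generate_proof(problem: str, max_new_tokens: int = 32768) -> str:
    """Load the model and generate a proof attempt for one problem."""
    # Imported here so the lightweight helpers above work without torch.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(problem), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens, keeping only the generated continuation.
    return tokenizer.decode(
        output[0][inputs.shape[-1]:], skip_special_tokens=True
    )
```

A call such as `generate_proof("Prove that the sum of two even integers is even.")` would then return the model's written proof; the large `max_new_tokens` default reflects the README's emphasis on scaling test-time compute.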