MiniMax-M2.7-REAP-172B-A10B-BF16
REAP-pruned BF16 variant of MiniMaxAI/MiniMax-M2.7. Uses Cerebras's REAP methodology with a 6-dataset agentic calibration mix to preserve tool-use, code-generation, and reasoning capabilities while reducing expert count by 25%.
This is the intermediate BF16 artifact — the REAP'd model before quantization. Use this if you want to apply your own quantization scheme. For direct single-Spark deployment on NVIDIA DGX Spark (GB10) and Blackwell-family hardware, use the NVFP4 variant: saricles/MiniMax-M2.7-REAP-172B-A10B-NVFP4-GB10.
Model Details
| Property | Value |
|---|---|
| Base Model | MiniMaxAI/MiniMax-M2.7 |
| Architecture | MiniMaxM2ForCausalLM (MoE, 192 experts, top-K=8) |
| Total Parameters | 172B (down from 230B, -25%) |
| Active Parameters | ~10B per token |
| Hidden Layers | 62 |
| Experts per Layer | 192 (pruned from 256) |
| Format | BF16 safetensors (70 shards) |
| Size on Disk | ~345 GB |
| License | Other (inherited from MiniMaxAI/MiniMax-M2.7) |
REAP Pruning Details
- Method: REAP (Router-weighted Expert Activation Pruning), Lasby et al., arXiv:2510.13999
- Keep ratio: 0.75 (kept 192 of 256 experts per layer)
- Scoring: router weight × expert output magnitude, accumulated via forward-pass hooks (see the sketch after this list)
- Calibration samples: 250 per dataset × 6 = 1500 total (1234 after the ≥32-token filter)
- Max sequence length: 2048 tokens
- Hardware used: Hugging Face Jobs, 4× NVIDIA H200 141 GB
- Recipe script: `reap-bf16-agentic.py`
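A minimal, self-contained sketch of that scoring rule on dummy tensors. The shapes are toy-sized and it ignores top-K routing for brevity; the actual hook-based accumulation lives in `reap-bf16-agentic.py`:

```python
import torch

# Toy REAP-style saliency: score each expert by
# (router gate weight) × (L2 norm of the expert's output),
# summed over calibration tokens, then keep the top 75%.
num_experts, num_tokens, hidden = 8, 32, 16
router_probs = torch.softmax(torch.randn(num_tokens, num_experts), dim=-1)
expert_out = torch.randn(num_experts, num_tokens, hidden)  # stand-in for hook captures

# [num_experts, num_tokens] × [num_experts, num_tokens], summed over tokens
saliency = (router_probs.T * expert_out.norm(dim=-1)).sum(dim=-1)

keep = int(0.75 * num_experts)  # keep ratio 0.75, as in this recipe
kept_experts = saliency.topk(keep).indices.sort().values
print("experts kept:", kept_experts.tolist())
```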
Agentic calibration mix
"Which experts survive pruning depends on which activations the calibration data triggers." If you calibrate on pure chat, experts specialized for code / tool-calling / math get scored low and pruned — the resulting model loses those capabilities. This mix is chosen to preserve capabilities for agent-framework deployment (OpenClaw, Aider, Claude Code-style assistants):
| Dataset | Domain |
|---|---|
| theblackcat102/evol-codealpaca-v1 | Code generation |
| Salesforce/xlam-function-calling-60k | Tool calling / function invocation |
| open-r1/Mixture-of-Thoughts (code config) | Code reasoning |
| open-r1/Mixture-of-Thoughts (math config) | Mathematical reasoning |
| open-r1/Mixture-of-Thoughts (science config) | Scientific reasoning |
| SWE-bench/SWE-smith-trajectories (tool split) | Software-engineering agent trajectories |
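For reference, a rough sketch of assembling this mix with the Hugging Face `datasets` library. The split names, streaming sampling, and per-dataset text extraction are simplified assumptions (the ≥32-token filter is applied downstream); the authoritative logic lives in `reap-bf16-agentic.py`:

```python
from itertools import islice
from datasets import load_dataset

# (repo, config) pairs for the six-dataset agentic mix.
SOURCES = [
    ("theblackcat102/evol-codealpaca-v1", None),
    ("Salesforce/xlam-function-calling-60k", None),
    ("open-r1/Mixture-of-Thoughts", "code"),
    ("open-r1/Mixture-of-Thoughts", "math"),
    ("open-r1/Mixture-of-Thoughts", "science"),
    ("SWE-bench/SWE-smith-trajectories", None),  # split/config assumed; see correction below
]

calibration_rows = []
for repo, config in SOURCES:
    ds = load_dataset(repo, name=config, split="train", streaming=True)
    calibration_rows.extend(islice(ds, 250))  # 250 samples per dataset
```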
Performance
Benchmarks pending GB10 deployment validation runs — will be added when available.
Usage
Load with transformers (requires trust_remote_code=True):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "saricles/MiniMax-M2.7-REAP-172B-A10B-BF16",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "saricles/MiniMax-M2.7-REAP-172B-A10B-BF16",
    trust_remote_code=True,
)
```
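Once loaded, a quick smoke test. This assumes the tokenizer ships a chat template inherited from the base model; adjust the prompt format if yours differs:

```python
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(
    inputs, max_new_tokens=256,
    do_sample=True, temperature=1.0, top_p=0.95, top_k=40,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```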
At ~345 GB (172B parameters × 2 bytes each, before KV cache), this model does not fit on a single consumer GPU. For serving, either run multi-GPU tensor parallelism or apply a quantization scheme (NVFP4, AWQ, GPTQ, etc.). For GB10 / Blackwell deployment, consider the NVFP4 variant.
Recommended Sampling Parameters
{ "temperature": 1.0, "top_p": 0.95, "top_k": 40, "min_p": 0.01 }
When to choose this BF16 variant vs. full MiniMax-M2.7
- Use this REAP BF16 if: you want a 25%-smaller MoE model with preserved agentic/coder capabilities, or you plan to apply your own quantization (AWQ, GPTQ, GGUF, different NVFP4 recipe).
- Use MiniMaxAI/MiniMax-M2.7 if: you need the full 256-expert model and can afford the larger deployment footprint.
Target Hardware
This is BF16 — hardware-agnostic. Any inference stack supporting custom transformers modeling code + BF16 safetensors (transformers, vLLM, TGI, etc.) can load it, given enough aggregate GPU memory.
Calibration notes (2026-04-17 correction)
During the REAP scoring pass that produced this artifact, the dataset extractor silently dropped texts from SWE-bench/SWE-smith-trajectories because that dataset stores the messages field as a JSON-encoded string rather than a list of dicts, as the other datasets in the mix do. Our extractor iterated the string character by character, found no dict entries, and collected zero usable texts from SWE-smith.
Net effect on this artifact: 5 of the 6 documented calibration datasets contributed to expert-importance scoring — evol-codealpaca, xlam-function-calling, and the three Mixture-of-Thoughts configs (code, math, science). SWE-smith-trajectories did NOT contribute, contrary to the original claim in this card.
Fix: the recipe script reap-bf16-agentic.py has been updated to parse string-encoded messages fields with json.loads(), plus a per-dataset assertion that fails the run if any dataset yields zero texts. Future REAP variants will include SWE-smith as originally intended.
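In outline, the corrected extraction looks roughly like this (variable and field names are illustrative, not the exact recipe code):

```python
import json

def extract_texts(row):
    # SWE-smith stores `messages` as a JSON-encoded string; the other
    # datasets in the mix store it as a list of dicts. Normalize first.
    messages = row.get("messages", [])
    if isinstance(messages, str):
        messages = json.loads(messages)
    return [
        m["content"]
        for m in messages
        if isinstance(m, dict) and isinstance(m.get("content"), str)
    ]

def assert_nonempty(dataset_name, texts):
    # Per-dataset guard: abort the run instead of silently calibrating
    # without a dataset (the failure mode behind this correction).
    assert texts, f"{dataset_name}: extractor yielded zero calibration texts"
```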
Practical implication: agentic tool-use preservation still benefited from xlam-function-calling. What's missing is the specific activation pattern of multi-turn SWE-agent trajectories (long-horizon read→run→patch→verify chains). If you run this model in long-chain SWE-style agents and see degraded behavior, this gap is the likely cause; please file an issue.
Acknowledgments
- Base model by MiniMax
- REAP methodology + reference implementation: Cerebras Research (Lasby et al., arXiv:2510.13999)
- Calibration-mix design: in the spirit of the multi-domain calibration mix Cerebras recommends for REAP; the specific configs (MoT code/math/science split, SWE-smith tool split) are optimized for agentic workloads