MiniMax-M2.7-REAP-172B-A10B-BF16

REAP-pruned BF16 variant of MiniMaxAI/MiniMax-M2.7. Uses Cerebras's REAP methodology with a 6-dataset agentic calibration mix to preserve tool-use, code-generation, and reasoning capabilities while reducing expert count by 25%.

This is the intermediate BF16 artifact — the REAP'd model before quantization. Use this if you want to apply your own quantization scheme. For direct single-Spark deployment on NVIDIA DGX Spark (GB10) and Blackwell-family hardware, use the NVFP4 variant: saricles/MiniMax-M2.7-REAP-172B-A10B-NVFP4-GB10.

Model Details

  • Base Model: MiniMaxAI/MiniMax-M2.7
  • Architecture: MiniMaxM2ForCausalLM (MoE, 192 experts, top-K=8)
  • Total Parameters: 172B (down from 230B, -25%)
  • Active Parameters: ~10B per token
  • Hidden Layers: 62
  • Experts per Layer: 192 (pruned from 256)
  • Format: BF16 safetensors (70 shards)
  • Size on Disk: ~345 GB
  • License: Other (inherited from MiniMaxAI/MiniMax-M2.7)

REAP Pruning Details

  • Method: REAP (Router-weighted Expert Activation Pruning), Lasby et al. — arXiv:2510.13999
  • Keep ratio: 0.75 (kept 192 / 256 experts per layer)
  • Scoring: Router weight × expert output magnitude, accumulated via forward-pass hooks (see the sketch after this list)
  • Calibration samples: 250 per dataset × 6 = 1500 total (1234 after ≥32-token filter)
  • Max sequence length: 2048 tokens
  • Hardware used: Hugging Face Jobs, 4× NVIDIA H200 141 GB
  • Recipe script: reap-bf16-agentic.py
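
The scoring and keep-ratio steps above can be sketched as follows. This is illustrative only, not the actual reap-bf16-agentic.py recipe: the hook wiring and router-weight bookkeeping are hypothetical and depend on the MiniMax-M2.7 modeling code.

import torch

KEEP_RATIO = 0.75  # keep 192 of 256 experts per layer

def attach_expert_hooks(experts, scores, layer_idx, router_gates):
    # Register forward hooks that accumulate router_gate * ||expert_output||
    # for every expert during the calibration forward passes.
    handles = []
    for e_idx, expert in enumerate(experts):
        def hook(_module, _inputs, output, e_idx=e_idx):
            gate = router_gates[layer_idx][e_idx]  # hypothetical bookkeeping of gate values
            scores[layer_idx][e_idx] += float((gate * output.norm(dim=-1)).sum())
        handles.append(expert.register_forward_hook(hook))
    return handles

def select_kept_experts(layer_scores, num_experts=256):
    # Keep the top KEEP_RATIO experts per layer by accumulated score;
    # the router's gating matrix is then re-indexed to the surviving experts.
    k = int(num_experts * KEEP_RATIO)
    return torch.topk(torch.tensor(layer_scores), k).indices.sort().values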

Agentic calibration mix

"Which experts survive pruning depends on which activations the calibration data triggers." If you calibrate on pure chat, experts specialized for code / tool-calling / math get scored low and pruned — the resulting model loses those capabilities. This mix is chosen to preserve capabilities for agent-framework deployment (OpenClaw, Aider, Claude Code-style assistants):

  • theblackcat102/evol-codealpaca-v1: Code generation
  • Salesforce/xlam-function-calling-60k: Tool calling / function invocation
  • open-r1/Mixture-of-Thoughts (code config): Code reasoning
  • open-r1/Mixture-of-Thoughts (math config): Mathematical reasoning
  • open-r1/Mixture-of-Thoughts (science config): Scientific reasoning
  • SWE-bench/SWE-smith-trajectories (tool split): Software-engineering agent trajectories
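
A rough sketch of how this mix is assembled (250 samples per dataset, then the ≥32-token filter). The extract_text() helper is hypothetical; the real recipe handles each dataset's schema individually (see the calibration notes below):

from datasets import load_dataset
from transformers import AutoTokenizer

SOURCES = [
    ("theblackcat102/evol-codealpaca-v1", None),
    ("Salesforce/xlam-function-calling-60k", None),
    ("open-r1/Mixture-of-Thoughts", "code"),
    ("open-r1/Mixture-of-Thoughts", "math"),
    ("open-r1/Mixture-of-Thoughts", "science"),
    ("SWE-bench/SWE-smith-trajectories", None),  # split/config handling simplified here
]
SAMPLES_PER_DATASET = 250
MIN_TOKENS = 32

tokenizer = AutoTokenizer.from_pretrained(
    "saricles/MiniMax-M2.7-REAP-172B-A10B-BF16", trust_remote_code=True
)

def extract_text(row):
    # Hypothetical flattening; the real recipe is schema-aware
    # (chat messages, function-call records, instruction/output pairs).
    for key in ("text", "output", "response"):
        if isinstance(row.get(key), str):
            return row[key]
    return None

calibration_texts = []
for name, config in SOURCES:
    ds = load_dataset(name, config, split="train").shuffle(seed=0)
    ds = ds.select(range(SAMPLES_PER_DATASET))
    for row in ds:
        text = extract_text(row)
        if text and len(tokenizer(text)["input_ids"]) >= MIN_TOKENS:
            calibration_texts.append(text)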

Performance

Benchmarks pending GB10 deployment validation runs — will be added when available.

Usage

Load with transformers (requires trust_remote_code=True):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "saricles/MiniMax-M2.7-REAP-172B-A10B-BF16",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "saricles/MiniMax-M2.7-REAP-172B-A10B-BF16",
    trust_remote_code=True,
)

At ~345 GB, this model does not fit on a single consumer GPU. For serving, either run it multi-GPU with tensor parallelism or apply a quantization scheme (NVFP4, AWQ, GPTQ, etc.). For GB10 / Blackwell deployment, consider the NVFP4 variant.
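
As one example of the multi-GPU route, here is a minimal vLLM offline-inference sketch. It assumes a node with enough aggregate GPU memory (tensor_parallel_size is set to 8 here; adjust to your hardware) and a vLLM build that supports the MiniMaxM2ForCausalLM architecture; the sampling values anticipate the recommended parameters below:

from vllm import LLM, SamplingParams

llm = LLM(
    model="saricles/MiniMax-M2.7-REAP-172B-A10B-BF16",
    trust_remote_code=True,
    tensor_parallel_size=8,  # assumption: adjust to the GPUs actually available
    dtype="bfloat16",
)
outputs = llm.generate(
    ["Explain what a Mixture-of-Experts router does."],
    SamplingParams(temperature=1.0, top_p=0.95, top_k=40, min_p=0.01),
)
print(outputs[0].outputs[0].text)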

Recommended Sampling Parameters

Per MiniMax documentation:

{ "temperature": 1.0, "top_p": 0.95, "top_k": 40, "min_p": 0.01 }

When to choose this BF16 variant vs. full MiniMax-M2.7

  • Use this REAP BF16 if: you want a 25%-smaller MoE model with preserved agentic/coder capabilities, or you plan to apply your own quantization (AWQ, GPTQ, GGUF, different NVFP4 recipe).
  • Use MiniMaxAI/MiniMax-M2.7 if: you need the full 256-expert model and can afford the larger deployment footprint.

Target Hardware

This is BF16 — hardware-agnostic. Any inference stack supporting custom transformers modeling code + BF16 safetensors (transformers, vLLM, TGI, etc.) can load it, given enough aggregate GPU memory.

Calibration notes (2026-04-17 correction)

During the REAP scoring pass that produced this artifact, the dataset extractor silently dropped texts from SWE-bench/SWE-smith-trajectories because that dataset stores its messages field as a JSON-encoded string rather than the list-of-dicts format used by the other datasets in the mix. The extractor iterated the string character by character, found no dict entries, and collected zero usable texts from SWE-smith.

Net effect on this artifact: 5 of the 6 documented calibration datasets contributed to expert-importance scoring — evol-codealpaca, xlam-function-calling, and the three Mixture-of-Thoughts configs (code, math, science). SWE-smith-trajectories did NOT contribute, contrary to the original claim in this card.

Fix: the recipe script reap-bf16-agentic.py has been updated to parse string-encoded messages fields with json.loads(), and now includes a per-dataset assertion that fails the run if any dataset yields zero texts. Future REAP variants will include SWE-smith as originally intended.
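
For reference, the corrected handling looks roughly like this (illustrative; see reap-bf16-agentic.py for the actual implementation):

import json

def messages_to_text(messages):
    # SWE-smith stores `messages` as a JSON-encoded string; other datasets
    # in the mix use a list of dicts. Handle both.
    if isinstance(messages, str):
        messages = json.loads(messages)
    if not isinstance(messages, list):
        return None
    return "\n".join(
        m.get("content", "") for m in messages if isinstance(m, dict)
    )

# Per-dataset guard applied after extraction (names are illustrative):
# assert len(texts) > 0, f"{dataset_name} yielded zero calibration texts"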

Practical implication: agentic tool-use preservation still benefited from xlam-function-calling. What's missing is the specific activation pattern from multi-turn SWE-agent trajectories (long-horizon read→run→patch→verify chains). If you run this model in long-chain SWE-style agents and see activation-clipping artifacts, that's why — please file an issue.

Acknowledgments

  • Base model by MiniMax
  • REAP methodology + reference implementation: Cerebras Research (Lasby et al., arXiv:2510.13999)
  • Calibration-mix design: in the spirit of the multi-domain calibration mix Cerebras recommends for REAP — the specific configs (MoT code/math/science split, SWE-smith tool split) are optimized for agentic workloads