BitMamba-2-255M


BitMamba-2-255M is the ultra-efficient baseline model of the BitMamba-2 family. It integrates 1.58-bit ternary quantization (BitNet) into the Mamba-2 architecture. Despite its small size, it demonstrates stable convergence and surprising reasoning capabilities, serving as the proof-of-concept for scaling ternary State Space Models.
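
For context, BitNet b1.58 constrains every weight to the ternary set {-1, 0, 1} via absmean quantization. Below is a minimal JAX sketch of that quantizer for illustration only; the function name and epsilon are our own, and the actual training code lives in src/ on GitHub.

import jax.numpy as jnp

def ternary_quantize(w, eps=1e-5):
    # Absmean scaling: normalize by the mean absolute weight (BitNet b1.58).
    scale = jnp.mean(jnp.abs(w)) + eps
    # Round and clip to the ternary set {-1, 0, 1}; keep scale for dequantization.
    w_q = jnp.clip(jnp.round(w / scale), -1.0, 1.0)
    return w_q, scale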

⚑ Key Features

  • Architecture: Mamba-2 SSM + BitNet b1.58 (Ternary Weights).
  • Parameters: 255M.
  • Precision: 1.58-bit (weights {-1, 0, 1}).
  • Training Data: high-quality corpora (FineWeb-Edu, Cosmopedia, Stack-Dedup).
  • Hardware: Trained on Google Cloud TPU v6e.

πŸ“Š Benchmark Results

This model serves as the baseline for our scaling laws analysis.

Benchmark    Metric       BitMamba-2-255M
ARC-Easy     Accuracy     55.51%
PIQA         Accuracy     64.42%
BoolQ        Accuracy     59.30%
HellaSwag    Acc Norm     35.22%
WikiText-2   Perplexity   51.69

As shown in the scaling analysis below, the 255M model (blue line) establishes a stable learning trajectory, which is significantly improved upon by the 1B model (red line).

[Figure: Scaling laws, training curves for the 255M (blue) and 1B (red) models]

πŸš€ Usage (Inference)

This model is optimized for extreme edge deployment (IoT, Mobile, Legacy Hardware) using our custom C++ inference engine.

1. Download the Quantized Model

Download the bitmamba_255m.bin file from the Files tab of this repository.
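
If you prefer to script the download, here is a hedged sketch using huggingface_hub; it assumes the repo id Zhayr1/BitMamba-2-0.25B (this repository) and the filename bitmamba_255m.bin shown in the Files tab.

# Hedged sketch: fetch the quantized binary programmatically.
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="Zhayr1/BitMamba-2-0.25B", filename="bitmamba_255m.bin")
print(path)  # local cache path of the downloaded file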

2. Run with C++

Go to our GitHub Repository to get the inference code.

# Example usage after compiling bitmamba.cpp.
# The prompt is passed as space-separated token IDs; the trailing numbers are
# sampling/generation parameters (see the GitHub repository for their exact
# order and meaning). A smaller context size is used here for speed.
./bitmamba bitmamba_255m.bin "15496 11 314 716" 0.7 1.1 0.05 0.9 40 200

3. JAX/Flax Usage

The bitmamba_255m.msgpack file contains the raw JAX weights for research purposes. You can load them using the source code provided in src/ on GitHub.
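
As an illustration, a minimal sketch for inspecting those weights with flax.serialization is shown below; the parameter-tree layout is model-specific, so building the actual model should follow the src/ code.

from flax import serialization
import jax

with open("bitmamba_255m.msgpack", "rb") as f:
    params = serialization.msgpack_restore(f.read())

# Print the path, shape, and dtype of every parameter array in the tree.
for path, leaf in jax.tree_util.tree_flatten_with_path(params)[0]:
    print(jax.tree_util.keystr(path), leaf.shape, leaf.dtype)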

πŸ› οΈ Efficient Deployment

Running on a consumer Intel Core i3-12100F CPU:

Model             RAM Usage   Speed
BitMamba-2-255M   252 MB      ~146 tok/s

πŸ“œ Citation

If you use this model or our architecture, please cite our paper:

@misc{salazar2026bitmamba2,
  author       = {Salazar, Jesus},
  title        = {BitMamba-2: Efficient Scaling of 1.58-bit State Space Models},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18394665},
  url          = {https://doi.org/10.5281/zenodo.18394665}
}