---
license: apache-2.0
tags:
- robotics
- vision-language-model
- quantization
- w8a8
- bitblas
- pi0
- libero
language:
- en
pipeline_tag: robotics
library_name: transformers
---

# Pi0.5-LIBERO W8A8 Quantized Model

This is a **W8A8 (INT8 weights + INT8 activations)** quantized version of [Pi0.5-LIBERO](https://github.com/Physical-Intelligence/openpi), using [BitBLAS](https://github.com/microsoft/BitBLAS) for efficient INT8 Tensor Core computation.

## Model Description

- **Architecture**: Pi0.5 (PaliGemma 2B VLM + Gemma 300M Action Expert)
- **Quantization**: W8A8 (INT8 weights, INT8 activations) with per-channel weight scales and per-tensor activation scales
- **Backend**: BitBLAS for INT8 Tensor Core acceleration
- **W8A8 Layers**: 180 Linear layers replaced with BitBLASW8A8Linear
- **Model Size**: ~4.90GB (quantized) vs ~6.96GB (FP16)
- **Task**: Robot manipulation (LIBERO benchmark)

## Performance

Tested on the LIBERO benchmark:

| Task Suite | Success Rate |
|------------|--------------|
| libero_spatial | 100% (10/10) |

Inference speed on an NVIDIA A40:

- First inference: ~30s (BitBLAS kernel compilation/caching)
- Subsequent inference: ~150ms per step
- Memory usage: ~4GB VRAM

## Installation

### Prerequisites

- Python 3.11
- CUDA 12.1+ compatible GPU (tested on NVIDIA A40)
- Linux (Ubuntu 22.04 recommended)

### Step 1: Create Conda Environment

```bash
conda create -n openpi_w8a8 python=3.11 -y
conda activate openpi_w8a8
```

### Step 2: Install PyTorch

```bash
pip install torch==2.7.1 torchvision
pip install 'numpy<2.0.0'
```

### Step 3: Install HuggingFace Packages

```bash
pip install transformers==4.53.2 accelerate safetensors huggingface_hub einops
```

### Step 4: Install BitBLAS and Robot Simulation

```bash
pip install bitblas scipy mujoco matplotlib
pip install robosuite==1.4.1
pip install bddl easydict
pip install 'numpy<2.0.0'
```

### Step 5: Install JAX and Flax

```bash
pip install "jax[cuda12]==0.5.3" flax==0.10.2 orbax-checkpoint==0.11.13
```

### Step 6: Install Other Dependencies

```bash
pip install sentencepiece draccus==0.10.0 tyro wandb polars numpydantic augmax \
    beartype==0.19.0 equinox jaxtyping==0.2.36 ml-collections==1.0.0 \
    imageio tqdm-loggable flatbuffers Pillow
```

### Step 7: Install LeRobot

```bash
pip install 'lerobot @ git+https://github.com/huggingface/lerobot@0cf864870cf29f4738d3ade893e6fd13fbd7cdb5' --no-deps
pip install datasets h5py zarr diffusers hydra-core gym jsonlines av torchcodec
```

### Step 8: Clone and Install OpenPI

```bash
git clone https://github.com/JingxuanZhang77/openpi_duquant.git openpi
pip install -e openpi/packages/openpi-client --no-deps
pip install -e openpi --no-deps
```

### Step 9: Clone and Install LIBERO (for evaluation)

```bash
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
touch LIBERO/libero/__init__.py
pip install -e LIBERO --no-deps
```

### Step 10: Copy Custom Transformers Files

```bash
SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
cp -r openpi/src/openpi/models_pytorch/transformers_replace/models/* $SITE_PACKAGES/transformers/models/
```

### Step 11: Set Environment Variables

```bash
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH
export OPENPI_DISABLE_TORCH_COMPILE=1
```

## Quick Start

```python
import os
os.environ["OPENPI_DISABLE_TORCH_COMPILE"] = "1"

from openpi.models_pytorch.bitblas_w8a8_layers import load_w8a8_policy

# Load model from HuggingFace (downloads automatically)
policy = load_w8a8_policy(
    "fatdove/pi05-libero-w8a8",
    policy_config_name="pi05_libero",
    enable_tuning=False,
)
print(f"Model loaded! W8A8 layers: {policy._w8a8_layer_count}")  # 180

# Run inference
import numpy as np

obs = {
    "observation/image": np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8),
    "observation/wrist_image": np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8),
    "observation/state": np.random.randn(8).astype(np.float32),
    "prompt": "pick up the red cube and place it on the blue plate",
}

result = policy.infer(obs)
print(f"Actions shape: {result['actions'].shape}")  # (10, 7)
```

## Run LIBERO Evaluation

After completing the installation, you can run the LIBERO evaluation:

```bash
# Quick test (1 trial per task)
python run_libero_w8a8.py --task-suite libero_spatial --num-trials 1

# Full evaluation (20 trials per task)
python run_libero_w8a8.py --task-suite libero_spatial --num-trials 20
```

## Model Files

- `model.safetensors` - Quantized weights (includes both W8A8 and non-quantized layers)
- `w8a8_config.json` - Quantization configuration (layer names, scales info)
- `assets/` - Normalization statistics for input preprocessing

## Quantization Details

The W8A8 quantization uses:

- **Weight quantization**: Per-channel INT8 with symmetric quantization
- **Activation quantization**: Per-tensor INT8 with dynamic quantization
- **Backend**: BitBLAS Matmul kernels optimized for NVIDIA Tensor Cores

180 Linear layers are quantized, including:

- PaliGemma VLM: All attention (q_proj, k_proj, v_proj, o_proj) and MLP (gate_proj, up_proj, down_proj) layers
- Gemma Expert: All MLP layers
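To make the scheme above concrete, here is a minimal PyTorch sketch of per-channel symmetric weight quantization, per-tensor dynamic activation quantization, and the dequantized W8A8 matmul. It is illustrative only, not the actual `BitBLASW8A8Linear` implementation: the function names are hypothetical, and the real layer dispatches the INT8 GEMM to BitBLAS Tensor Core kernels with INT32 accumulation rather than the float emulation used here.

```python
import torch


def quantize_weight_per_channel(w: torch.Tensor):
    """Symmetric per-channel INT8 weight quantization.

    w: [out_features, in_features]; one scale per output channel (row).
    """
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    w_int8 = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return w_int8, scale  # w_int8: [out, in], scale: [out, 1]


def quantize_activation_per_tensor(x: torch.Tensor):
    """Symmetric per-tensor dynamic INT8 activation quantization.

    A single scale is computed from the current batch at inference time.
    """
    scale = x.abs().amax().clamp(min=1e-8) / 127.0
    x_int8 = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return x_int8, scale  # x_int8: [batch, in], scale: scalar


def w8a8_linear_reference(x, w_int8, w_scale, bias=None):
    """Reference W8A8 linear: quantize activations, INT8 matmul, dequantize.

    The real kernel accumulates INT8 products in INT32 on Tensor Cores;
    the float matmul below is only a portable emulation of that GEMM.
    """
    x_int8, x_scale = quantize_activation_per_tensor(x)
    acc = x_int8.float() @ w_int8.float().t()      # [batch, out]
    y = acc * x_scale * w_scale.squeeze(1)         # apply activation and weight scales
    if bias is not None:
        y = y + bias.float()
    return y.to(x.dtype)


# Example: quantize one linear layer and compare against the full-precision output.
if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(2048, 2048)
    x = torch.randn(4, 2048)
    w_int8, w_scale = quantize_weight_per_channel(w)
    y_q = w8a8_linear_reference(x, w_int8, w_scale)
    y_fp = x @ w.t()
    print("max abs error:", (y_q - y_fp).abs().max().item())
```

Per-channel weight scales preserve accuracy when output channels have very different dynamic ranges, while a single per-tensor activation scale computed at runtime keeps the quantization overhead to one max-reduction per layer.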
policy_config_name="pi05_libero", enable_tuning=False, ) print(f"Model loaded! W8A8 layers: {policy._w8a8_layer_count}") # 180 # Run inference import numpy as np obs = { "observation/image": np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8), "observation/wrist_image": np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8), "observation/state": np.random.randn(8).astype(np.float32), "prompt": "pick up the red cube and place it on the blue plate", } result = policy.infer(obs) print(f"Actions shape: {result['actions'].shape}") # (10, 7) ``` ## Run LIBERO Evaluation After completing the installation, you can run the LIBERO evaluation: ```bash # Quick test (1 trial per task) python run_libero_w8a8.py --task-suite libero_spatial --num-trials 1 # Full evaluation (20 trials per task) python run_libero_w8a8.py --task-suite libero_spatial --num-trials 20 ``` ## Model Files - `model.safetensors` - Quantized weights (includes both W8A8 and non-quantized layers) - `w8a8_config.json` - Quantization configuration (layer names, scales info) - `assets/` - Normalization statistics for input preprocessing ## Quantization Details The W8A8 quantization uses: - **Weight quantization**: Per-channel INT8 with symmetric quantization - **Activation quantization**: Per-tensor INT8 with dynamic quantization - **Backend**: BitBLAS Matmul kernels optimized for NVIDIA Tensor Cores 180 Linear layers are quantized, including: - PaliGemma VLM: All attention (q_proj, k_proj, v_proj, o_proj) and MLP (gate_proj, up_proj, down_proj) layers - Gemma Expert: All MLP layers ## Troubleshooting ### CUDA/BitBLAS Issues Make sure CUDA 12.x is installed and LD_LIBRARY_PATH is set: ```bash export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH ``` ### Import Errors If you get import errors like `cannot import name 'ACT2FN'`, ensure you've copied the transformers_replace files: ```bash SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])") cp -r openpi/src/openpi/models_pytorch/transformers_replace/models/* $SITE_PACKAGES/transformers/models/ ``` ### NumPy Version Errors OpenPI requires numpy<2.0: ```bash pip install 'numpy<2.0.0' ``` ## Citation If you use this model, please cite: ```bibtex ``` ## License Apache 2.0. See the [OpenPI repository](https://github.com/Physical-Intelligence/openpi) for more details.