CUDA OOM on 96GB VRAM during loading
Description
I am trying to load the Qwen3-Next-80B-A3B-Instruct-FP8 model on a single NVIDIA RTX 6000 GPU (approx. 95GB VRAM).
Although the FP8 weights should be roughly 80GB and the GPU has ~95GB of VRAM, loading crashes at 25% (shard 2/8) with a CUDA OOM error.
The error message indicates severe memory fragmentation:
"Of the allocated memory 21.12 GiB is allocated by PyTorch, and 73.21 GiB is reserved by PyTorch but unallocated."
It seems PyTorch reserves almost all available VRAM immediately but fails to allocate new segments for the subsequent shards, even though the actual used memory is only ~21GB at that point.
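To make the accounting concrete, the figures in the error message can be reconciled as follows (a rough sketch using the numbers from the log below; I am assuming the ~0.6 GiB remainder is non-PyTorch memory such as the CUDA context):

```python
# Figures taken from the OOM message (all GiB).
total_capacity = 94.97   # "GPU 0 has a total capacity of 94.97 GiB"
in_use = 94.96           # "this process has 94.96 GiB memory in use"
allocated = 21.12        # allocated by PyTorch (live tensors)
reserved_unused = 73.21  # reserved by PyTorch but unallocated

# The caching allocator's reservation = live tensors + cached-but-free segments.
reserved = allocated + reserved_unused
print(f"reserved by PyTorch: {reserved:.2f} GiB")  # 94.33 GiB

# Whatever remains is non-PyTorch memory (CUDA context, driver, etc.).
other = in_use - reserved
print(f"non-PyTorch overhead: {other:.2f} GiB")  # ~0.63 GiB
```

So nearly the entire card is tied up in the allocator's reservation while only ~21 GiB holds actual tensors.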
Environment
- Model: Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 (local path)
- GPU: NVIDIA RTX 6000 (Ada/Blackwell generation), 94.97 GiB total capacity
- Libraries: transformers (latest), torch (2.x), accelerate installed
- CUDA: 13.0
Reproduction Code
from transformers import AutoModel
import torch

# Path to the downloaded FP8 model
MODEL_PATH = "/path/to/Qwen3-Next-80B-A3B-Instruct-FP8"

# Simple loading with auto device map
model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    device_map="auto",
)
Error Log
Loading checkpoint shards:  25%|██▌       | 2/8 [00:10<00:30, 5.02s/it]
...
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB. GPU 0 has a total capacity of 94.97 GiB of which 1.88 MiB is free. Including non-PyTorch memory, this process has 94.96 GiB memory in use. Of the allocated memory 21.12 GiB is allocated by PyTorch, and 73.21 GiB is reserved by PyTorch but unallocated.
Question
Given that the GPU has significantly more memory (~95GB) than the theoretical model size (~80GB), why does the loader end up with 73GB of reserved-but-unallocated memory this early in the process?
Are there specific quantization_config settings or environment variables required to load this FP8 checkpoint correctly without triggering this fragmentation?
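For reference, the kind of setting I have in mind is PyTorch's caching-allocator config, e.g. expandable segments, which is documented as a way to reduce fragmentation. A minimal sketch of how it would be applied (untested with this model, so I do not know whether it is the right fix here):

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before the first CUDA allocation
# (easiest: before importing torch / calling from_pretrained).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# ...then load exactly as in the reproduction code above:
# model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True,
#                                   device_map="auto")
```

Is this the recommended approach for large FP8 checkpoints, or is a dedicated quantization_config needed instead?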