YOLO26S-pose - ExecuTorch with XNNPACK (Dynamic Shapes)

YOLO26S-pose exported to the ExecuTorch .pte format with the XNNPACK backend for accelerated CPU inference.

Model Details

Base Model: Ultralytics YOLO26S-Pose Estimation
Format: ExecuTorch (.pte)
Backend: XNNPACK (CPU-optimized)
Quantization: None (FP32)
File Size: 44.9 MB

Dynamic Shape Support

This model supports dynamic input shapes within the following constraints:

Dimension	Min	Max	Constraint
Height	320	8192	Multiple of 32
Width	320	8192	Multiple of 32
Batch	1	1	Static

Example supported resolutions (width×height): 320×320, 640×640, 1280×1280, 2560×1440, 7680×4320 (8K). Any other size works as long as both height and width are multiples of 32 between 320 and 8192.
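Arbitrary image sizes such as 1920×1080 do not satisfy the multiple-of-32 rule, so the input usually has to be resized or padded first. The following is a minimal sketch of how a valid target size could be computed from the constraints in the table above. The helper names (is_valid_side, snap_side, snap_size) are hypothetical and not part of ExecuTorch or Ultralytics.

```python
STRIDE = 32      # height and width must be multiples of this
MIN_SIDE = 320   # smallest supported side, from the table above
MAX_SIDE = 8192  # largest supported side, from the table above


def is_valid_side(side: int) -> bool:
    """True if one spatial dimension satisfies the export constraints."""
    return MIN_SIDE <= side <= MAX_SIDE and side % STRIDE == 0


def snap_side(side: int) -> int:
    """Round a side up to the next multiple of 32, then clamp to [320, 8192]."""
    rounded = -(-side // STRIDE) * STRIDE  # integer ceiling to a multiple of STRIDE
    return max(MIN_SIDE, min(MAX_SIDE, rounded))


def snap_size(height: int, width: int) -> tuple[int, int]:
    """Supported (height, width) obtained by snapping each side independently."""
    return snap_side(height), snap_side(width)


if __name__ == "__main__":
    print(snap_size(1080, 1920))  # a 1080p frame needs a padded height
    print(is_valid_side(1440), is_valid_side(1080))
```

Rounding up rather than down means the original image always fits inside the target and can be padded instead of cropped. Sides that exceed 8192 are clamped, in which case the image has to be downscaled.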

Usage

import torch
from executorch.runtime import Runtime

# Load the model
with open("yolo26s-pose_dynamic_xnnpack.pte", "rb") as f:
    pte_buffer = f.read()

runtime = Runtime.get()
program = runtime.load_program(pte_buffer)
method = program.load_method("forward")

# Run inference with different input sizes
for h, w in [(640, 640), (1280, 1280), (1440, 2560)]:  # (height, width)
    input_tensor = torch.randn(1, 3, h, w)
    output = method.execute([input_tensor])
    print(f"Input shape: {(h, w)}, Output shape: {output[0].shape}")

Model Architecture

YOLO26 is an end-to-end NMS-free object detector optimized for edge devices:

End-to-end design (no NMS post-processing required)
Up to 43% faster CPU inference than previous YOLO versions
Optimized for mobile and edge deployment

Performance

Based on Ultralytics YOLO26 benchmarks:

Metric	Value
Parameters	9.7M
Input Size	640×640 (training)
Inference	Supports 320-8192 px (multiples of 32)

Tasks

This model performs pose estimation, detecting human keypoints.

Output includes bounding boxes and 17 COCO keypoints per person.
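This model card does not document the exact output tensor layout, so inspect output[0].shape before relying on any decoder. As a rough sketch, assuming each detection is a flat row of 57 values laid out as [x1, y1, x2, y2, score, class_id] followed by (x, y, visibility) for each of the 17 keypoints, one row could be unpacked as shown below. Both the layout and the decode_person helper are assumptions for illustration. The keypoint names follow the standard COCO order.

```python
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# ASSUMED row layout (not confirmed by this model card):
# [x1, y1, x2, y2, score, class_id, then (x, y, visibility) per keypoint]
HEADER_LEN = 6
ROW_LEN = HEADER_LEN + 3 * len(COCO_KEYPOINT_NAMES)  # 57 values


def decode_person(row):
    """Split one flat detection row into box, score, and named keypoints."""
    if len(row) != ROW_LEN:
        raise ValueError(f"expected {ROW_LEN} values, got {len(row)}")
    x1, y1, x2, y2, score, _class_id = row[:HEADER_LEN]
    keypoints = {}
    for i, name in enumerate(COCO_KEYPOINT_NAMES):
        start = HEADER_LEN + 3 * i
        kx, ky, vis = row[start:start + 3]
        keypoints[name] = (kx, ky, vis)
    return {"box": (x1, y1, x2, y2), "score": score, "keypoints": keypoints}
```

If the real output has a different row length, the length check raises immediately instead of silently mislabeling keypoints.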

Troubleshooting

Low confidence / incorrect outputs with non-contiguous inputs

If your outputs look wrong, check that the input tensor passed to ExecuTorch is contiguous. With object-detection models, the symptom can be that every confidence is capped at about 0.20 (20%) and no detections are returned.

Example:

import torch

# img_hwc: float32 HWC image (e.g. RGB) in [0, 1]
x = torch.from_numpy(img_hwc).permute(2, 0, 1).unsqueeze(0)  # NCHW (often non-contiguous)
x = x.contiguous()  # IMPORTANT

outputs = method.execute([x])

Detection symptom example (before fix):

Confidence range: [0.0004, 0.2012]
Detections: 0

After fix (.contiguous()):

Confidence range: [0.0001, 0.9589]
Detections: 12
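One plausible explanation for this behavior, which the source does not confirm, is that the runtime reads the tensor's underlying buffer in storage order rather than following the strides of the permuted view. The pure-Python sketch below illustrates that idea on a tiny 2×2 RGB image. It does not call ExecuTorch or PyTorch, and chw_view is a hypothetical helper that only mimics the indexing of permute(2, 0, 1).

```python
# A tiny 2x2 RGB image stored in HWC order: each pixel is (R, G, B).
H, W, C = 2, 2, 3
hwc_buffer = [
    10, 20, 30,   11, 21, 31,   # row 0: pixel (0,0), pixel (0,1)
    12, 22, 32,   13, 23, 33,   # row 1: pixel (1,0), pixel (1,1)
]


def chw_view(c, h, w):
    """Element [c, h, w] of the permuted view, looked up in the HWC buffer."""
    return hwc_buffer[(h * W + w) * C + c]


# Logical CHW order: the layout a contiguous copy would have in memory.
contiguous_chw = [
    chw_view(c, h, w) for c in range(C) for h in range(H) for w in range(W)
]

# What a consumer sees if it ignores strides and reads the buffer front to back.
raw_read = list(hwc_buffer)

print(contiguous_chw)  # red plane, then green plane, then blue plane
print(raw_read)        # still interleaved R, G, B per pixel
```

The two lists hold the same values in a different order. A model that expects planar channels but receives interleaved pixels would see a scrambled image, which is consistent with the low confidences shown above.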

License

This model is released under the AGPL-3.0 license. See Ultralytics YOLO26 for more details.
