YOLO26S-pose - ExecuTorch with XNNPACK (Dynamic Shapes)
YOLO26S-pose exported to the ExecuTorch .pte format with the XNNPACK backend for accelerated CPU inference.
Model Details
- Base Model: Ultralytics YOLO26S-Pose Estimation
- Format: ExecuTorch (.pte)
- Backend: XNNPACK (CPU-optimized)
- Quantization: None (FP32)
- File Size: 44.9 MB
Dynamic Shape Support
This model supports dynamic input shapes within the following constraints:
| Dimension | Min | Max | Constraint |
|---|---|---|---|
| Height | 320 | 8192 | Multiple of 32 |
| Width | 320 | 8192 | Multiple of 32 |
| Batch | 1 | 1 | Static |
Example supported resolutions: 320×320, 640×640, 1280×1280, 2560×1440, and 7680×4320 (8K). Any other size also works as long as both height and width are multiples of 32 between 320 and 8192.
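The constraints in the table can be checked before running inference. The following is a minimal sketch, not part of the exported model or the ExecuTorch API; the helper names `is_supported_size` and `snap_to_supported` are hypothetical and only encode the min, max, and multiple-of-32 rules listed above.

```python
MIN_SIDE = 320   # minimum height/width from the table above
MAX_SIDE = 8192  # maximum height/width from the table above
STRIDE = 32      # both sides must be a multiple of this value


def is_supported_size(height: int, width: int) -> bool:
    """Return True if (height, width) satisfies the export constraints."""
    return all(
        MIN_SIDE <= side <= MAX_SIDE and side % STRIDE == 0
        for side in (height, width)
    )


def snap_to_supported(side: int) -> int:
    """Round one side up to the next multiple of 32, clamped to [320, 8192]."""
    snapped = -(-side // STRIDE) * STRIDE  # ceiling division, then rescale
    return max(MIN_SIDE, min(MAX_SIDE, snapped))
```

For example, a 1920×1080 frame is not accepted as is because 1080 is not a multiple of 32; `snap_to_supported(1080)` gives 1088, which could serve as the padded or resized side length.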
Usage
import torch
from executorch.runtime import Runtime

# Load the model
with open("yolo26s-pose_dynamic_xnnpack.pte", "rb") as f:
    pte_buffer = f.read()

runtime = Runtime.get()
program = runtime.load_program(pte_buffer)
method = program.load_method("forward")

# Run inference with different input sizes
for h, w in [(640, 640), (1280, 1280), (2560, 1440)]:
    input_tensor = torch.randn(1, 3, h, w)
    output = method.execute([input_tensor])
    print(f"Input shape: {(h, w)}, Output shape: {output[0].shape}")
Model Architecture
YOLO26 is an end-to-end NMS-free object detector optimized for edge devices:
- End-to-end design (no NMS post-processing required)
- Up to 43% faster CPU inference than previous YOLO versions
- Optimized for mobile and edge deployment
Performance
Based on Ultralytics YOLO26 benchmarks:
| Metric | Value |
|---|---|
| Parameters | 9.7M |
| Training Input Size | 640×640 |
| Inference Input Size | 320-8192 px per side (multiples of 32) |
Tasks
This model performs pose estimation, detecting human keypoints.
Output includes bounding boxes and 17 COCO keypoints per person.
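As a rough illustration of how such an output could be unpacked, the sketch below splits detection rows into a box, a score, and 17 keypoints. The row layout is an assumption, not something this model card confirms: each row is taken to hold 4 box values, a score, a class id, and then 17 × (x, y, confidence). Check the actual output shape printed in the Usage section before relying on it. `Pose` and `decode_poses` are hypothetical names.

```python
from typing import List, NamedTuple, Sequence, Tuple

NUM_KEYPOINTS = 17  # COCO keypoints per person, as stated above
# ASSUMED row layout: box (4) + score (1) + class id (1) + 17 * (x, y, conf)
ROW_LEN = 6 + NUM_KEYPOINTS * 3


class Pose(NamedTuple):
    box: Tuple[float, ...]                       # assumed x1, y1, x2, y2
    score: float                                 # detection confidence
    keypoints: List[Tuple[float, float, float]]  # (x, y, confidence) each


def decode_poses(
    rows: Sequence[Sequence[float]], score_threshold: float = 0.25
) -> List[Pose]:
    """Split assumed-layout rows into poses, dropping low-score rows."""
    poses = []
    for row in rows:
        if len(row) != ROW_LEN:
            raise ValueError(f"expected {ROW_LEN} values per row, got {len(row)}")
        score = float(row[4])
        if score < score_threshold:
            continue
        keypoints = [
            (float(row[6 + 3 * k]), float(row[7 + 3 * k]), float(row[8 + 3 * k]))
            for k in range(NUM_KEYPOINTS)
        ]
        poses.append(Pose(tuple(float(v) for v in row[:4]), score, keypoints))
    return poses
```

If the first output tensor does have shape (1, N, 57), the rows could be obtained with `output[0][0].tolist()`. The 0.25 threshold is an arbitrary illustrative default.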
Troubleshooting
Low confidence / incorrect outputs with non-contiguous inputs
If your outputs look wrong, make sure the input tensor passed to ExecuTorch is contiguous. For object-detection models, a typical symptom is every confidence capped at around 0.20 (20%) and no detections.
Example:
import torch
# img_hwc: float32 HWC image (e.g. RGB) in [0, 1]
x = torch.from_numpy(img_hwc).permute(2, 0, 1).unsqueeze(0) # NCHW (often non-contiguous)
x = x.contiguous() # IMPORTANT
outputs = method.execute([x])
Detection symptom example (before fix):
Confidence range: [0.0004, 0.2012]
Detections: 0
After fix (.contiguous()):
Confidence range: [0.0001, 0.9589]
Detections: 12
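To see why `permute` tends to produce a non-contiguous tensor, the strides can be worked out by hand. The sketch below is plain Python for illustration only and does not call PyTorch or ExecuTorch; the helper names are hypothetical, and the 480×640 image size is an arbitrary example. Permuting reorders shape and strides without moving data, so the result no longer matches the row-major strides of its new shape.

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (C-order) strides, in elements, for the given shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def permute(values: Sequence[int], order: Sequence[int]) -> Tuple[int, ...]:
    """Reorder a shape or stride tuple the way a permute reorders axes."""
    return tuple(values[i] for i in order)


def is_contiguous(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """Simplified check: strides must equal the row-major strides.

    Real frameworks also ignore strides of size-1 dimensions.
    """
    return tuple(strides) == contiguous_strides(shape)


hwc_shape = (480, 640, 3)                       # example HWC image
hwc_strides = contiguous_strides(hwc_shape)     # (1920, 3, 1)
chw_shape = permute(hwc_shape, (2, 0, 1))       # (3, 480, 640)
chw_strides = permute(hwc_strides, (2, 0, 1))   # (1, 1920, 3)

print(is_contiguous(chw_shape, chw_strides))    # False: the layout is still HWC in memory
```

Calling `.contiguous()` copies the data so that the strides match the new shape, which is the layout the fix above restores.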
License
This model is released under the AGPL-3.0 license. See Ultralytics YOLO26 for more details.
Credits
- Base Model: Ultralytics YOLO26
- Export Framework: ExecuTorch
- Backend: XNNPACK
Export Details
- ExecuTorch Version: Latest main branch
- Export Date: 2025-02-06
- Dynamic Shapes: Enabled (height/width: 320-8192, multiples of 32)