---
title: RTMO Checkpoint Tester
emoji: ๐
colorFrom: pink
colorTo: green
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: RTMO PyTorch Checkpoint Tester
---

# RTMO PyTorch Checkpoint Tester

This Hugging Face Space provides a real-time 2D multi-person pose estimation demo using the RTMO model from OpenMMLab, accelerated with ZeroGPU. It supports both image and video inputs.
## Features

- **Remote Checkpoint Selection**: Choose from multiple pre-trained variants (COCO, BODY7, CrowdPose, retrainable RTMO-s) via a dropdown.
- **Custom Checkpoint Upload**: Upload your own `.pth` file; the application auto-detects RTMO-t/s/m/l variants.
- **Image Input**: Upload images for single-frame pose estimation.
- **Video Input**: Upload video files (e.g., `.mp4`, `.mov`, `.avi`, `.mkv`, `.webm`) to run pose estimation on video sequences and view the annotated output.
- **Threshold Adjustment**: Fine-tune the **Bounding Box Threshold** and **NMS Threshold** sliders to refine detections.
- **Example Images**: Three license-free images of people are included for quick testing via the **Examples** panel.
- **ZeroGPU Acceleration**: Uses the `@spaces.GPU()` decorator for GPU inference on Hugging Face Spaces.
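The ZeroGPU pattern mentioned above can be sketched as follows. This is a minimal illustration, not the app's actual code: the no-op fallback decorator is an assumption added so the same script can run locally without the `spaces` package.

```python
# Sketch of the ZeroGPU decorator pattern (assumption: not app.py's exact code).
# On Hugging Face Spaces, spaces.GPU() schedules the call on a ZeroGPU worker;
# the fallback below is a no-op so the code also runs where `spaces` is absent.
try:
    import spaces
    gpu = spaces.GPU
except ImportError:
    def gpu(*args, **kwargs):
        def decorator(fn):
            return fn  # no-op: run on whatever device is available locally
        return decorator

@gpu()  # on Spaces, this requests GPU execution for each call
def predict(image_path: str) -> str:
    # ... run pose inference on the input here ...
    return image_path
```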
## Usage

1. **Upload Image**: Drag and drop or select an image in the **Upload Image** component (or choose one from **Examples**).
2. **Upload Video**: Drag and drop or select a video file in the **Upload Video** component.
3. **Select Remote Checkpoint**: Pick a preloaded variant from the dropdown menu.
4. **(Optional) Upload Your Own Checkpoint**: Provide a `.pth` file to override the remote selection; the model variant is detected automatically.
5. **Adjust Thresholds**: Set the **Bounding Box Threshold** (`bbox_thr`) and **NMS Threshold** (`nms_thr`) to control detection confidence and suppression behavior.
6. **Run Inference**: Click **Run Inference**.
7. **View Results**:
   - For images, the annotated image appears in the **Annotated Image** panel.
   - For videos, the annotated video appears in the **Annotated Video** panel.
   The name of the active checkpoint is shown below the output.
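The two thresholds in step 5 behave roughly as follows. This is a generic sketch of score filtering plus greedy IoU-based NMS, not MMPose's internal implementation:

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def filter_detections(boxes, scores, bbox_thr=0.3, nms_thr=0.65):
    """Keep boxes scoring >= bbox_thr, then greedily suppress any box whose
    IoU with an already-kept, higher-scoring box exceeds nms_thr."""
    order = sorted((i for i, s in enumerate(scores) if s >= bbox_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_thr for j in keep):
            keep.append(i)
    return keep
```

Raising `bbox_thr` drops low-confidence people; lowering `nms_thr` suppresses overlapping duplicate detections more aggressively.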
## Remote Checkpoints

The following variants are available out of the box:

- `rtmo-s_8xb32-600e_coco`
- `rtmo-m_16xb16-600e_coco`
- `rtmo-l_16xb16-600e_coco`
- `rtmo-t_8xb32-600e_body7`
- `rtmo-s_8xb32-600e_body7`
- `rtmo-m_16xb16-600e_body7`
- `rtmo-l_16xb16-600e_body7`
- `rtmo-s_8xb32-700e_crowdpose`
- `rtmo-m_16xb16-700e_crowdpose`
- `rtmo-l_16xb16-700e_crowdpose`
- `rtmo-s_coco_retrainable` (from Hugging Face)
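The keys above appear to follow the MMEngine config naming convention `rtmo-{variant}_{gpus}xb{batch}-{epochs}e_{dataset}` (e.g. `8xb32` reads as 8 GPUs × batch size 32). A small parser under that reading, which is an inference from the names rather than anything the app documents:

```python
import re

# Inferred pattern for the checkpoint keys listed above; the
# `rtmo-s_coco_retrainable` entry does not follow it and parses as None.
KEY_RE = re.compile(
    r"rtmo-(?P<variant>[tsml])_(?P<gpus>\d+)xb(?P<batch>\d+)"
    r"-(?P<epochs>\d+)e_(?P<dataset>\w+)"
)

def parse_key(key: str):
    """Split a checkpoint key into its apparent training-config fields."""
    m = KEY_RE.fullmatch(key)
    if not m:
        return None
    d = m.groupdict()
    return {"variant": d["variant"], "gpus": int(d["gpus"]),
            "batch_per_gpu": int(d["batch"]), "epochs": int(d["epochs"]),
            "dataset": d["dataset"]}
```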
## Implementation Details

- **GPU Decorator**: `@spaces.GPU()` marks the `predict` function for GPU execution under ZeroGPU.
- **Inference API**: Uses `MMPoseInferencer` from MMPose with `pose2d`, `pose2d_weights`, and category `[0]` (person) for detection.
- **Monkey-Patch**: Applies a regex patch to bypass `mmdet`'s MMCV version assertion for compatibility.
- **Variant Detection**: Inspects the `backbone.stem.conv.conv.weight` channels in the checkpoint to select the correct RTMO variant.
- **Checkpoint Management**: Remote files are downloaded to `/tmp/{key}.pth` on demand; uploaded checkpoints use their provided local path.
- **Image & Video Support**: The `predict` function handles both image and video inputs automatically.
- **Output**: Visualization images and videos are saved to `/tmp/vis` and displayed in the UI panels.
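The variant-detection idea can be sketched as below. The channel-to-variant mapping here uses placeholder numbers (assumptions for illustration), not the real stem widths the app checks; the point is only that each RTMO size has a distinct stem convolution width in the checkpoint's state dict.

```python
# Sketch of variant detection from the stem conv's output-channel count.
# The widths in this table are hypothetical placeholders, NOT the values
# app.py actually uses.
STEM_CHANNELS_TO_VARIANT = {
    16: "rtmo-t", 24: "rtmo-s", 32: "rtmo-m", 48: "rtmo-l",  # illustrative
}

def detect_variant(state_dict):
    """Map the stem conv weight's first dimension to an RTMO variant name."""
    weight = state_dict.get("backbone.stem.conv.conv.weight")
    if weight is None:
        return None  # key absent: not a recognizable RTMO checkpoint
    out_channels = weight.shape[0]  # conv weights are (out, in, kH, kW)
    return STEM_CHANNELS_TO_VARIANT.get(out_channels)
```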
## Files

- **app.py**: Main Gradio application script.
- **requirements.txt**: Python dependencies, including MMCV and MMPose.
- **README.md**: This documentation file.