---
license: apache-2.0
tags:
- pytorch
- keypoint-detection
- human-pose-estimation
- heatmap-regression
- computer-vision
- detr
- coco
model-index:
- name: detr-pose-coco50
  results:
  - task:
      type: pose-estimation
      name: Human Pose Estimation
    dataset:
      type: COCO
      name: COCO 2017 (50-person subset)
    metrics:
    - type: MSELoss
      value: ~0.02
      name: Heatmap MSE
---
# **DETR + Keypoint Estimation (COCO Subset)**

Author: [@Koushik](https://huggingface.co/Koushim)

---
### Model Overview

This project combines:

* [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) as the object detector
* A custom PyTorch keypoint head
* Training on a 500-person subset of [COCO 2017 Keypoints](https://cocodataset.org/#keypoints-2020)

The system is top-down: DETR first detects people, then the keypoint head predicts 17 COCO-style keypoints for each person crop using heatmap regression.

---
### Files Included

| File                            | Description                                |
| ------------------------------- | ------------------------------------------ |
| `pytorch_model.bin`             | Trained PyTorch model weights              |
| `05_detr_pose_coco_colab.ipynb` | Full Colab notebook (training + inference) |
| `config.json`                   | Basic model metadata                       |
| `README.md`                     | Project description                        |

---
### Dataset

* **Subset**: 500 images from COCO val2017 that contain visible people
* **Annotations**: 17 keypoints per person
* **Source**: [COCO Keypoints](https://cocodataset.org/#keypoints-2020)
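
For reference, COCO stores each person's keypoints as a flat list of 51 numbers: 17 `(x, y, v)` triplets, where `v` is 0 (not labeled), 1 (labeled but not visible), or 2 (labeled and visible). The sketch below unpacks that list. The helper name `parse_coco_keypoints` and the sample values are illustrative assumptions, not taken from the notebook.

```python
# The 17 COCO keypoint names, in annotation order.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def parse_coco_keypoints(flat):
    """Hypothetical helper: turn the flat [x1, y1, v1, ...] list into a dict."""
    if len(flat) != 3 * len(COCO_KEYPOINT_NAMES):
        raise ValueError(f"expected 51 values, got {len(flat)}")
    parsed = {}
    for i, name in enumerate(COCO_KEYPOINT_NAMES):
        x, y, v = flat[3 * i: 3 * i + 3]
        parsed[name] = {"x": x, "y": y, "visibility": v}
    return parsed


# Made-up annotation: only the nose and left eye are labeled.
example = [120, 80, 2, 125, 75, 1] + [0, 0, 0] * 15
keypoints = parse_coco_keypoints(example)
labeled = [name for name, kp in keypoints.items() if kp["visibility"] > 0]
print(labeled)
```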
---

### Architecture

```text
[ Input Image ]
       │
       ▼
[ DETR (Person BBox) ]
       │
       ▼
[ Crop + Resize (256×256) ]
       │
       ▼
[ CNN Keypoint Head ]
       │
       ▼
[ 17 Heatmaps (Keypoints) ]
```
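
Heatmap regression trains the head to output one 2D map per keypoint, peaked at the keypoint's location, and compares it against a target map with a mean squared error. The sketch below builds such targets with NumPy and computes the MSE for an all-zero prediction. The 64×64 heatmap size, the Gaussian `sigma`, and the function names are assumptions for illustration; the notebook may use different values.

```python
import numpy as np


def gaussian_heatmap(cx, cy, size=64, sigma=2.0):
    """Return a (size, size) map with a Gaussian peak of height 1 at (cx, cy)."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def build_targets(keypoints, size=64, sigma=2.0):
    """keypoints: rows of (x, y, v) in heatmap coordinates; v == 0 gives an empty map."""
    targets = np.zeros((len(keypoints), size, size), dtype=np.float32)
    for k, (x, y, v) in enumerate(keypoints):
        if v > 0:
            targets[k] = gaussian_heatmap(x, y, size, sigma)
    return targets


# Made-up keypoints: the first is labeled at (10, 20), the rest are unlabeled.
kps = [(10, 20, 2)] + [(0, 0, 0)] * 16
targets = build_targets(kps)

# MSE between an all-zero prediction and the targets.
pred = np.zeros_like(targets)
mse = float(np.mean((pred - targets) ** 2))
print(targets.shape, mse)
```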
---

### Quick Start

```python
import torch
# `model.py` is not among the files listed above; this assumes you have
# saved the notebook's KeypointHead class there.
from model import KeypointHead

model = KeypointHead()
# map_location lets the weights load on machines without a GPU
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()
```

---
### Inference Demo

```python
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

img = Image.open('sample.jpg').convert('RGB')
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
detector = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

inputs = processor(images=img, return_tensors="pt")
with torch.no_grad():
    outputs = detector(**inputs)
results = processor.post_process_object_detection(outputs, target_sizes=[img.size[::-1]], threshold=0.8)[0]

# DETR detects every COCO class, so keep only the boxes labeled "person"
person_boxes = [
    box for box, label in zip(results['boxes'], results['labels'])
    if detector.config.id2label[label.item()] == 'person'
]

# Use person_boxes[0] to crop the person and resize the crop to 256x256
# Feed the crop into the keypoint model to get 17 heatmaps
```
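
Going from a detected box to keypoints in the original image involves a coordinate round trip: the person box is cropped and resized, the head predicts heatmaps on the crop, and each heatmap peak is then mapped back into image pixels. The sketch below covers that last step. It assumes the head's output has been converted to an array of shape `(17, H, W)` and that the crop was resized without preserving aspect ratio; `decode_keypoints` is a hypothetical helper, not part of the released code.

```python
import numpy as np


def decode_keypoints(heatmaps, box):
    """Map per-keypoint heatmap peaks to original-image (x, y, score).

    heatmaps: array of shape (17, H, W), one map per keypoint.
    box: (x_min, y_min, x_max, y_max) of the person crop in image pixels.
    """
    num_kp, h, w = heatmaps.shape
    x_min, y_min, x_max, y_max = box
    flat = heatmaps.reshape(num_kp, -1)
    rows, cols = np.unravel_index(flat.argmax(axis=1), (h, w))
    # Use the centre of the peak cell, then scale to the box size.
    xs = x_min + (cols + 0.5) / w * (x_max - x_min)
    ys = y_min + (rows + 0.5) / h * (y_max - y_min)
    scores = flat.max(axis=1)
    return np.stack([xs, ys, scores], axis=1)


# Made-up input: 17 empty 64x64 maps with one peak set for keypoint 0.
heatmaps = np.zeros((17, 64, 64), dtype=np.float32)
heatmaps[0, 32, 16] = 0.9
keypoints = decode_keypoints(heatmaps, box=(100.0, 50.0, 228.0, 306.0))
print(keypoints[0])
```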
---

### Training (optional)

To fine-tune on your own dataset:

* Convert your data to COCO format
* Open the provided notebook (`05_detr_pose_coco_colab.ipynb`)
* Update the dataset paths and retrain

---
### Credit

* [Hugging Face Transformers](https://github.com/huggingface/transformers)
* [COCO Dataset](https://cocodataset.org/)
* [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50)