---
license: apache-2.0
tags:
- pytorch
- keypoint-detection
- human-pose-estimation
- heatmap-regression
- computer-vision
- detr
- coco
model-index:
- name: detr-pose-coco50
  results:
  - task:
      type: pose-estimation
      name: Human Pose Estimation
    dataset:
      type: COCO
      name: COCO 2017 (50-person subset)
    metrics:
    - type: MSELoss
      value: ~0.02
      name: Heatmap MSE
---
# 📌 **DETR + Keypoint Estimation (COCO Subset)**
Author: [@Koushik](https://huggingface.co/Koushim)
---
### 🧠 Model Overview
This project combines:
* 🤖 [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) (object detector)
* 🧱 Custom PyTorch keypoint head
* 📊 Trained on a 500-person subset of [COCO 2017 Keypoints](https://cocodataset.org/#keypoints-2020)

The pipeline is top-down: DETR first detects each person, then the keypoint head predicts 17 COCO-style keypoints per person using heatmap regression.
---
### 📂 Files Included
| File | Description |
| ------------------------------- | ------------------------------------------ |
| `pytorch_model.bin` | Trained PyTorch model weights |
| `05_detr_pose_coco_colab.ipynb` | Full Colab notebook (training + inference) |
| `config.json` | Basic model metadata |
| `README.md` | Project description |
---
### 📚 Dataset
* **Subset**: 500 images from COCO val2017 with visible persons
* **Annotations**: 17 keypoints per person
* **Source**: [COCO Keypoints](https://cocodataset.org/#keypoints-2020)
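
For reference, COCO stores each person's keypoints as a flat list of 51 numbers, `[x1, y1, v1, ..., x17, y17, v17]`, where `v` is 0 (not labeled), 1 (labeled but not visible), or 2 (labeled and visible). A minimal sketch of unpacking that list follows. The helper name `split_keypoints` and the sample values are illustrative and are not taken from the notebook.

```python
# COCO keypoint order, as defined by the dataset's "person" category.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def split_keypoints(flat):
    """Turn COCO's flat [x, y, v, ...] list into {name: (x, y, v)}."""
    if len(flat) != 3 * len(COCO_KEYPOINT_NAMES):
        raise ValueError(f"expected 51 values, got {len(flat)}")
    triples = zip(flat[0::3], flat[1::3], flat[2::3])
    return dict(zip(COCO_KEYPOINT_NAMES, triples))


# Made-up annotation: only the nose and left eye are labeled.
sample = [120, 80, 2, 125, 75, 1] + [0, 0, 0] * 15
keypoints = split_keypoints(sample)
labeled = [name for name, (_, _, v) in keypoints.items() if v > 0]
print(labeled)  # ['nose', 'left_eye']
```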
---
### 🏗️ Architecture
```text
[ Input Image ]
       │
       ▼
[ DETR (Person BBox) ]
       │
       ▼
[ Crop + Resize (256×256) ]
       │
       ▼
[ CNN Keypoint Head ]
       │
       ▼
[ 17 Heatmaps (Keypoints) ]
```
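
Heatmap regression trains the head to output one 2D map per keypoint, peaked at the joint location. A common way to build such training targets is to place a Gaussian on each labeled keypoint. The sketch below assumes a 64×64 heatmap and `sigma=2.0`; neither value is stated in this card, and `make_target_heatmaps` is a hypothetical helper rather than the notebook's code.

```python
import numpy as np


def make_target_heatmaps(keypoints, crop_size=256, heatmap_size=64, sigma=2.0):
    """Build one Gaussian heatmap per keypoint.

    keypoints: array of shape (K, 3) with (x, y, v) in crop pixel coordinates.
    heatmap_size and sigma are assumed values, not taken from the notebook.
    """
    keypoints = np.asarray(keypoints, dtype=np.float32)
    scale = heatmap_size / crop_size  # crop pixels -> heatmap cells
    ys, xs = np.mgrid[0:heatmap_size, 0:heatmap_size].astype(np.float32)

    heatmaps = np.zeros((len(keypoints), heatmap_size, heatmap_size), dtype=np.float32)
    for k, (x, y, v) in enumerate(keypoints):
        if v == 0:  # unlabeled keypoint: leave its map empty
            continue
        cx, cy = x * scale, y * scale
        heatmaps[k] = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return heatmaps


kps = np.zeros((17, 3), dtype=np.float32)
kps[0] = (128, 64, 2)  # nose at crop pixel (x=128, y=64)
targets = make_target_heatmaps(kps)
print(targets.shape)  # (17, 64, 64)
```

The predicted heatmaps can then be compared with these targets using a mean squared error, which matches the Heatmap MSE metric reported in the metadata.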
---
### 🚀 Quick Start
```python
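import torch

# model.py is not among the listed files; this assumes KeypointHead
# has been copied there, e.g. from the notebook.
from model import KeypointHead

model = KeypointHead()
# map_location="cpu" lets the checkpoint load on machines without a GPU.
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()  # switch to inference mode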
```
---
### 🧪 Inference Demo
```python
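import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

img = Image.open("sample.jpg").convert("RGB")  # ensure 3-channel input
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
detector = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
inputs = processor(images=img, return_tensors="pt")
with torch.no_grad():
    outputs = detector(**inputs)
results = processor.post_process_object_detection(outputs, target_sizes=[img.size[::-1]], threshold=0.8)[0]

# Keep only "person" detections; DETR also returns other COCO classes.
person_boxes = [
    box for box, label in zip(results["boxes"], results["labels"])
    if detector.config.id2label[label.item()] == "person"
]
# Use person_boxes[0] (x_min, y_min, x_max, y_max) to crop the person
# Feed the resized crop into model(...) to get 17 heatmaps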
```
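
The last two steps of the demo leave you with 17 heatmaps for one person crop. One way to turn them into image coordinates is to take each map's argmax and rescale it through the person box. This is a sketch under assumptions: `decode_heatmaps` is a hypothetical helper, the box is in `(x_min, y_min, x_max, y_max)` form, and the crop is assumed to be the box resized without padding.

```python
import numpy as np


def decode_heatmaps(heatmaps, box):
    """Map per-keypoint heatmap peaks back to image coordinates.

    heatmaps: array of shape (K, H, W) predicted for one person crop.
    box: (x_min, y_min, x_max, y_max) of that person in the original image.
    Returns an array of shape (K, 3) holding (x, y, peak_score).
    """
    heatmaps = np.asarray(heatmaps, dtype=np.float32)
    num_kps, height, width = heatmaps.shape
    x_min, y_min, x_max, y_max = box

    flat = heatmaps.reshape(num_kps, -1)
    rows, cols = np.unravel_index(flat.argmax(axis=1), (height, width))
    scores = flat.max(axis=1)

    # Use the cell center (+0.5) and scale from heatmap cells to box pixels.
    xs = x_min + (cols + 0.5) / width * (x_max - x_min)
    ys = y_min + (rows + 0.5) / height * (y_max - y_min)
    return np.stack([xs, ys, scores], axis=1)


# Fake prediction: every keypoint peaks at heatmap cell (row=16, col=32).
fake = np.zeros((17, 64, 64), dtype=np.float32)
fake[:, 16, 32] = 1.0
coords = decode_heatmaps(fake, box=(100.0, 50.0, 228.0, 306.0))
print(coords[0])
```

With a real prediction, pass `heatmaps.squeeze(0).cpu().numpy()` and `person_boxes[0].tolist()`, and consider discarding keypoints whose peak score is low.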
---
### 🧠 Training (optional)
To fine-tune on your own dataset:
* Convert your data to COCO format
* Use the notebook provided (`05_detr_pose_coco_colab.ipynb`)
* Update the dataset paths in the notebook and re-train
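
The first step, converting to COCO format, mostly means producing one annotation dict per person. A minimal sketch follows. `make_person_annotation` is a hypothetical helper, the ids and coordinates are made up, and `area` is approximated by the box area, which is an assumption because COCO derives it from the segmentation mask.

```python
import json


def make_person_annotation(ann_id, image_id, bbox_xywh, keypoints_xyv):
    """Build a COCO-style keypoint annotation for one person.

    bbox_xywh: (x, y, width, height) in pixels.
    keypoints_xyv: 17 (x, y, v) triples; v is 0, 1, or 2.
    """
    if len(keypoints_xyv) != 17:
        raise ValueError("COCO person annotations need exactly 17 keypoints")
    x, y, w, h = bbox_xywh
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,  # "person" in COCO
        "bbox": [x, y, w, h],
        "area": w * h,  # approximation, see the note above
        "iscrowd": 0,
        "num_keypoints": sum(1 for _, _, v in keypoints_xyv if v > 0),
        "keypoints": [value for triple in keypoints_xyv for value in triple],
    }


kps = [(120, 80, 2), (125, 75, 1)] + [(0, 0, 0)] * 15
annotation = make_person_annotation(1, 42, (100, 50, 128, 256), kps)
print(annotation["num_keypoints"])  # 2
coco_json = json.dumps({"annotations": [annotation]})
```

A complete COCO file also needs `images` and `categories` lists; the `categories` entry for `person` carries the keypoint names and skeleton.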
---
### ✨ Credit
* [Hugging Face Transformers](https://github.com/huggingface/transformers)
* [COCO Dataset](https://cocodataset.org/)
* [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50)