RF-DETR (Medium)
RF-DETR is a real-time detection transformer family introduced in *RF-DETR: Neural Architecture Search for Real-Time Detection Transformers* by Robinson et al. It was integrated into 🤗 Transformers via PR #36895 and originally contributed by stevenbucaille.
Model description
RF-DETR is an end-to-end object detection model that combines ideas from LW-DETR and Deformable DETR: a DINOv2-with-registers style ViT backbone (with an RF-DETR windowing pattern for efficient attention), a multi-scale projector between encoder and decoder, and a multi-scale deformable DETR decoder for fast convergence and strong accuracy–latency tradeoffs.
Key Architectural Details:
- Backbone: DINOv2-with-registers style ViT with RF-DETR windowed / full attention alternation (instead of a purely convolutional encoder).
- Multi-scale fusion: RF-DETR multi-scale projector (C2f-style blocks in the LW-DETR lineage) to aggregate multi-level backbone features before the decoder.
- Decoder: Deformable DETR-style decoder with multi-scale deformable cross-attention; depth and input resolution vary by checkpoint (NAS frontier).
- Queries: DETR-style object queries with bipartite matching and auxiliary decoder losses for training stability.
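To make the windowed / full attention alternation concrete, here is a minimal illustrative sketch (not RF-DETR's actual implementation; the grid and window sizes are made up) of how a 2D grid of ViT patch tokens can be partitioned into non-overlapping windows, so that attention in a "windowed" layer is computed within each window rather than over the full token sequence:

```python
# Illustrative sketch (not the actual RF-DETR code): partition a 2D grid of
# ViT patch tokens into non-overlapping windows, so attention can be
# restricted to each window instead of the full sequence.

def window_partition(grid_h, grid_w, window):
    """Return, for each window, the flat token indices it contains.

    Assumes grid_h and grid_w are divisible by `window` (real code pads).
    """
    windows = []
    for wy in range(0, grid_h, window):
        for wx in range(0, grid_w, window):
            idx = [(wy + dy) * grid_w + (wx + dx)
                   for dy in range(window) for dx in range(window)]
            windows.append(idx)
    return windows

# 4x4 patch grid, 2x2 windows -> 4 windows of 4 tokens each
wins = window_partition(4, 4, 2)
print(len(wins), wins[0])  # 4 [0, 1, 4, 5]
```

A "full" attention layer simply skips this partitioning and attends over all tokens; alternating the two keeps the cost of most layers proportional to the window size rather than the whole image.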
Training Details:
- Detection losses: classification plus bounding-box L1 and GIoU, with auxiliary losses on intermediate decoder layers.
- Group DETR: parallel decoder copies during training for faster convergence (same high-level idea as LW-DETR's Group DETR).
- NAS (family-level): the RF-DETR paper uses weight-sharing neural architecture search over practical accuracy–latency knobs after adapting a shared backbone on the target dataset, so many checkpoints correspond to different subnets without full independent retrains for every point on the frontier.
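The bipartite matching step above pairs each ground-truth box with a distinct object query before the losses are computed. A toy sketch of the idea (illustrative only: real DETR-style code uses the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`, over a cost that mixes class probability, box L1, and GIoU terms; here we brute-force a tiny example with an L1 + (1 − IoU) cost):

```python
# Toy sketch of DETR-style bipartite matching (cost terms simplified).
from itertools import permutations

def iou(a, b):
    # boxes as [x0, y0, x1, y1]
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def match(pred_boxes, gt_boxes):
    """Assign each ground-truth box to a distinct prediction,
    minimizing an L1-distance plus (1 - IoU) cost."""
    def cost(p, g):
        l1 = sum(abs(pi - gi) for pi, gi in zip(p, g))
        return l1 + (1.0 - iou(p, g))
    best = min(
        permutations(range(len(pred_boxes)), len(gt_boxes)),
        key=lambda perm: sum(cost(pred_boxes[i], gt_boxes[j])
                             for j, i in enumerate(perm)),
    )
    return list(best)  # best[j] = index of prediction matched to gt j

preds = [[0, 0, 1, 1], [10, 10, 12, 12], [5, 5, 6, 6]]
gts = [[10, 10, 12, 12], [0, 0, 1, 1]]
print(match(preds, gts))  # [1, 0]
```

Only the matched prediction is penalized toward each ground truth, which is what lets DETR-style models dispense with NMS; the auxiliary decoder losses repeat this matching at every intermediate layer.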
How to use
You can use the raw model for object detection. See the model hub for all available RF-DETR checkpoints.
Here is how to use this model:
```python
from transformers import AutoImageProcessor, RfDetrForObjectDetection
import torch
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("stevenbucaille/rf-detr-medium")
model = RfDetrForObjectDetection.from_pretrained("stevenbucaille/rf-detr-medium")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# convert outputs (bounding boxes and class logits) to COCO API format
# and only keep detections with score > 0.35
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.35)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
```
This should output:
```
Detected remote with confidence 0.988 at location [40.11, 73.16, 175.23, 118.2]
Detected cat with confidence 0.988 at location [347.22, 23.4, 639.47, 374.62]
Detected cat with confidence 0.987 at location [7.72, 55.88, 316.65, 473.55]
Detected remote with confidence 0.98 at location [334.08, 76.82, 370.65, 188.08]
Detected couch with confidence 0.414 at location [1.54, 0.42, 639.09, 475.48]
Detected remote with confidence 0.345 at location [261.15, 54.76, 290.15, 78.09]
```
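Under the hood, post-processing converts the model's normalized center-format boxes back to absolute pixel coordinates for the original image. A minimal sketch of that conversion (score computation and thresholding are omitted; the exact details live in the processor):

```python
# Rough sketch of the box conversion inside DETR-style post-processing:
# a normalized [cx, cy, w, h] box becomes absolute [x0, y0, x1, y1]
# pixel coordinates for the original image size.

def cxcywh_to_xyxy(box, img_w, img_h):
    cx, cy, w, h = box
    return [
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    ]

# a box centered in a 640x480 image, spanning half its width and height
print(cxcywh_to_xyxy([0.5, 0.5, 0.5, 0.5], 640, 480))  # [160.0, 120.0, 480.0, 360.0]
```

This is why `target_sizes` in the example above is the original image size in `(height, width)` order, i.e. `image.size[::-1]`.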
Training data
These checkpoints use the standard COCO 2017 object detection label space (80 categories), as reflected in config.id2label.
BibTeX entry and citation info
```bibtex
@misc{robinson2026rfdetrneuralarchitecturesearch,
  title={RF-DETR: Neural Architecture Search for Real-Time Detection Transformers},
  author={Isaac Robinson and Peter Robicheaux and Matvei Popov and Deva Ramanan and Neehar Peri},
  year={2026},
  eprint={2511.09554},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.09554},
}
```