---
license: apache-2.0
base_model: depth-anything/Depth-Anything-V2-Small
tags:
- robotics
- edge-deployment
- anima
- forge
- depth-estimation
- monocular-depth
- safetensors
- vision
- ros2
- jetson
- real-time
library_name: transformers
pipeline_tag: depth-estimation
model-index:
- name: depth-anything-v2-small
  results:
  - task:
      type: depth-estimation
    metrics:
    - name: Model Size (MB)
      type: model_size
      value: 95
---

# Depth Anything V2 Small — SafeTensors

> Depth Anything V2 (Small, ViT-S backbone) converted to SafeTensors for real-time robotic depth estimation. At just **95 MB**, this is one of the lightest production-quality monocular depth models available — a strong fit for edge devices like the Jetson Nano.

This model is part of the **[RobotFlowLabs](https://huggingface.co/robotflowlabs)** model library, built for the **ANIMA** agentic robotics platform.

## Why This Model Exists

Depth estimation needs to run alongside segmentation, feature-extraction, and action models — all on the same edge GPU. At 95 MB, Depth Anything V2 Small is small enough to fit in any perception stack while still producing high-quality relative depth maps. The weights were converted from the raw `.pth` checkpoint to SafeTensors for safe, zero-copy loading.
## Model Details

| Property | Value |
|----------|-------|
| **Architecture** | DPT head + ViT-Small encoder |
| **Parameters** | 24.8M |
| **Encoder** | ViT-S/14 (DINOv2-based) |
| **Input Resolution** | Flexible (recommended 518×518) |
| **Output** | Dense relative depth map |
| **Original Model** | [`depth-anything/Depth-Anything-V2-Small`](https://huggingface.co/depth-anything/Depth-Anything-V2-Small) |
| **License** | Apache-2.0 |

## Quick Start

```python
import cv2
from safetensors.torch import load_file
from depth_anything_v2.dpt import DepthAnythingV2

# ViT-S configuration matching the Small checkpoint
model = DepthAnythingV2(encoder='vits', features=64, out_channels=[48, 96, 192, 384])
model.load_state_dict(load_file("model.safetensors"))
model.to("cuda").eval()

image = cv2.imread("scene.jpg")   # BGR HxWx3 array, as infer_image expects
depth = model.infer_image(image)  # HxW relative depth map
```

## Use Cases in ANIMA

- **Real-Time Obstacle Avoidance** — fast depth estimation for navigation at camera framerate
- **Grasp Distance** — quick depth estimates for reach planning
- **Mobile Robots** — fits on Jetson Nano-class devices alongside other models
- **Multi-Camera Setups** — small enough to run one instance per camera

## Depth Anything V2 Family

| Model | Params | Size | Best For |
|-------|--------|------|----------|
| [depth-anything-v2-large](https://huggingface.co/robotflowlabs/depth-anything-v2-large) | 335M | 1.3 GB | Highest-quality depth |
| **[depth-anything-v2-small](https://huggingface.co/robotflowlabs/depth-anything-v2-small)** | **24.8M** | **95 MB** | **Real-time edge deployment** |

## Limitations

- Relative depth only — not metric (needs calibration for absolute distances)
- Lower accuracy than the Large variant on complex scenes
- Single-frame estimation — no temporal consistency

## Attribution

- **Original Model**: [`depth-anything/Depth-Anything-V2-Small`](https://huggingface.co/depth-anything/Depth-Anything-V2-Small) by TUM & HKU
- **License**: [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Paper**: [Depth Anything V2](https://arxiv.org/abs/2406.09414) — Yang et al., 2024
- **Converted by**: [RobotFlowLabs](https://huggingface.co/robotflowlabs) using [FORGE](https://github.com/robotflowlabs/forge)

## Citation

```bibtex
@article{yang2024depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2406.09414},
  year={2024}
}
```

---
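The relative-depth limitation noted above can often be worked around with a few known distances (e.g. from a rangefinder or known-size marker): fit a scale and shift mapping relative values to meters. A minimal sketch with illustrative variable names — note that Depth Anything's raw output is disparity-like, so real pipelines may prefer to fit in inverse-depth space:

```python
# Recover an approximately metric depth map from a relative one by
# least-squares fitting metric = a * relative + b against known points.
import numpy as np

def calibrate(relative, points, distances):
    """Fit a linear map from relative depth to meters.

    points: list of (row, col) pixel coordinates
    distances: measured metric distances (meters) at those pixels
    """
    samples = np.array([relative[r, c] for r, c in points])
    A = np.stack([samples, np.ones_like(samples)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, np.array(distances), rcond=None)
    return a * relative + b

# Tiny worked example: two known distances determine scale and shift.
relative = np.array([[0.2, 0.4], [0.6, 0.8]])
metric = calibrate(relative, [(0, 0), (1, 1)], [1.0, 4.0])
print(metric[1, 1])  # ~4.0
```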
Built with FORGE by RobotFlowLabs
Optimizing foundation models for real robots.