---
license: apache-2.0
base_model: depth-anything/Depth-Anything-V2-Small
tags:
- robotics
- edge-deployment
- anima
- forge
- depth-estimation
- monocular-depth
- safetensors
- vision
- ros2
- jetson
- real-time
library_name: transformers
pipeline_tag: depth-estimation
model-index:
- name: depth-anything-v2-small
  results:
  - task:
      type: depth-estimation
    metrics:
    - name: Model Size (MB)
      type: model_size
      value: 95
---

# Depth Anything V2 Small — SafeTensors

> Depth Anything V2 (Small, ViT-S backbone) converted to SafeTensors for real-time robotic depth estimation. At just **95 MB**, this is one of the lightest production-quality monocular depth models available — a strong fit for edge devices like the Jetson Nano.

This model is part of the **[RobotFlowLabs](https://huggingface.co/robotflowlabs)** model library, built for the **ANIMA** agentic robotics platform.

## Why This Model Exists

Depth estimation needs to run alongside segmentation, feature-extraction, and action models — all on the same edge GPU. At 95 MB, Depth Anything V2 Small is small enough to fit in any perception stack while still producing high-quality relative depth maps. The weights were converted from the raw `.pth` checkpoint to SafeTensors for safe, zero-copy loading.
## Model Details

| Property | Value |
|----------|-------|
| **Architecture** | DPT head + ViT-Small encoder |
| **Parameters** | 24.8M |
| **Encoder** | ViT-S/14 (DINOv2-based) |
| **Input Resolution** | Flexible (recommended 518×518) |
| **Output** | Dense relative depth map |
| **Original Model** | [`depth-anything/Depth-Anything-V2-Small`](https://huggingface.co/depth-anything/Depth-Anything-V2-Small) |
| **License** | Apache-2.0 |

## Quick Start

```python
import cv2
from safetensors.torch import load_file
from depth_anything_v2.dpt import DepthAnythingV2

# ViT-S configuration matching the Small checkpoint
model = DepthAnythingV2(encoder='vits', features=64, out_channels=[48, 96, 192, 384])
model.load_state_dict(load_file("model.safetensors"))
model.to("cuda").eval()

image = cv2.imread("scene.jpg")   # BGR HxWx3 array, as infer_image expects
depth = model.infer_image(image)  # HxW relative depth map
```

## Use Cases in ANIMA

- **Real-Time Obstacle Avoidance** — fast depth estimation for navigation at camera framerate
- **Grasp Distance** — quick depth estimates for reach planning
- **Mobile Robots** — fits on Jetson Nano-class devices alongside other models
- **Multi-Camera Setups** — small enough to run one instance per camera

## Depth Anything V2 Family

| Model | Params | Size | Best For |
|-------|--------|------|----------|
| [depth-anything-v2-large](https://huggingface.co/robotflowlabs/depth-anything-v2-large) | 335M | 1.3 GB | Highest-quality depth |
| **[depth-anything-v2-small](https://huggingface.co/robotflowlabs/depth-anything-v2-small)** | **24.8M** | **95 MB** | **Real-time edge deployment** |

## Limitations

- Relative depth only — not metric (needs calibration for absolute distances)
- Lower accuracy than the Large variant on complex scenes
- Single-frame estimation — no temporal consistency

## Attribution

- **Original Model**: [`depth-anything/Depth-Anything-V2-Small`](https://huggingface.co/depth-anything/Depth-Anything-V2-Small) by TUM & HKU
- **License**: [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Paper**: [Depth Anything V2](https://arxiv.org/abs/2406.09414) — Yang et al., 2024
- **Converted by**: [RobotFlowLabs](https://huggingface.co/robotflowlabs) using [FORGE](https://github.com/robotflowlabs/forge)

## Citation

```bibtex
@article{yang2024depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2406.09414},
  year={2024}
}
```

---
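The relative-depth limitation noted above can often be worked around with a few known distances (e.g. from a rangefinder or known-size marker): fit a scale and shift mapping relative values to meters. A minimal sketch with illustrative variable names — note that Depth Anything's raw output is disparity-like, so real pipelines may prefer to fit in inverse-depth space:

```python
# Recover an approximately metric depth map from a relative one by
# least-squares fitting metric = a * relative + b against known points.
import numpy as np

def calibrate(relative, points, distances):
    """Fit a linear map from relative depth to meters.

    points: list of (row, col) pixel coordinates
    distances: measured metric distances (meters) at those pixels
    """
    samples = np.array([relative[r, c] for r, c in points])
    A = np.stack([samples, np.ones_like(samples)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, np.array(distances), rcond=None)
    return a * relative + b

# Tiny worked example: two known distances determine scale and shift.
relative = np.array([[0.2, 0.4], [0.6, 0.8]])
metric = calibrate(relative, [(0, 0), (1, 1)], [1.0, 4.0])
print(metric[1, 1])  # ~4.0
```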
Built with FORGE by RobotFlowLabs
Optimizing foundation models for real robots.