|
|
<div align="center">

# Awesome Depth Anything 3

**Optimized fork of Depth Anything 3 with production-ready features**
|
|
|
|
|
[PyPI](https://pypi.org/project/awesome-depth-anything-3/) ·
[Python](https://www.python.org/) ·
[License](LICENSE) ·
[CI](https://github.com/Aedelon/awesome-depth-anything-3/actions) ·
[Colab Tutorial](https://colab.research.google.com/github/Aedelon/awesome-depth-anything-3/blob/main/notebooks/da3_tutorial.ipynb) ·
[Hugging Face Space](https://huggingface.co/spaces/Aedelon/awesome-depth-anything-3)
|
|
|
|
|
[Demo](https://huggingface.co/spaces/Aedelon/awesome-depth-anything-3) · [Tutorial](notebooks/da3_tutorial.ipynb) · [Benchmarks](BENCHMARKS.md) · [Original Paper](https://arxiv.org/abs/2511.10647) |
|
|
|
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
> **This is an optimized fork** of [Depth Anything 3](https://github.com/ByteDance-Seed/Depth-Anything-3) by ByteDance. |
|
|
> All credit for the model architecture, training, and research goes to the original authors (see [Credits](#credits--acknowledgments)).
|
|
> This fork focuses on **production optimization, developer experience, and ease of deployment**. |
|
|
|
|
|
|
|
|
|
|
|
## What This Fork Adds

| Feature | Description |
|
|
|---------|-------------| |
|
|
| **Model Caching** | ~200x faster model loading after first use | |
|
|
| **Adaptive Batching** | Automatic batch size optimization based on GPU memory | |
|
|
| **PyPI Package** | `pip install awesome-depth-anything-3` | |
|
|
| **CLI Improvements** | Batch processing options, better error handling | |
|
|
| **Apple Silicon Optimized** | Smart CPU/GPU preprocessing for best MPS performance | |
|
|
| **Comprehensive Benchmarks** | Detailed performance analysis across devices | |
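
A minimal timing sketch of the caching behavior, assuming the cache is transparent behind `from_pretrained` (whether it lives in-process or on disk is an implementation detail of this fork):

```python
import time

from depth_anything_3.api import DepthAnything3

# First load downloads and deserializes the weights (cold).
t0 = time.perf_counter()
model = DepthAnything3.from_pretrained("depth-anything/DA3-LARGE")
print(f"cold load:   {time.perf_counter() - t0:.2f} s")

# A repeated load of the same model should hit this fork's cache (warm).
t0 = time.perf_counter()
model = DepthAnything3.from_pretrained("depth-anything/DA3-LARGE")
print(f"cached load: {time.perf_counter() - t0:.3f} s")
```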
|
|
|
|
|
|
|
|
|
|
|
## Performance Highlights

| Metric | Upstream | This Fork | Improvement |
|
|
|--------|----------|-----------|-------------| |
|
|
| Cached model load | ~1s | ~5ms | **200x faster** | |
|
|
| Batch 4 inference (MPS) | 3.32 img/s | 3.78 img/s | **1.14x faster** | |
|
|
| Cold model load | 1.28s | 0.77s | **1.7x faster** | |
|
|
|
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
|
|
|
|
|
|
<h3>Recovering the Visual Space from Any Views</h3> |
|
|
|
|
|
[**Haotong Lin**](https://haotongl.github.io/)<sup>*</sup> · [**Sili Chen**](https://github.com/SiliChen321)<sup>*</sup> · [**Jun Hao Liew**](https://liewjunhao.github.io/)<sup>*</sup> · [**Donny Y. Chen**](https://donydchen.github.io)<sup>*</sup> · [**Zhenyu Li**](https://zhyever.github.io/) · [**Guang Shi**](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [**Jiashi Feng**](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) |
|
|
<br> |
|
|
[**Bingyi Kang**](https://bingykang.github.io/)<sup>*†</sup> |
|
|
|
|
|
<sup>*</sup>Equal contribution · <sup>†</sup>Project lead
|
|
|
|
|
<a href="https://arxiv.org/abs/2511.10647"><img src='https://img.shields.io/badge/arXiv-Depth Anything 3-red' alt='Paper PDF'></a> |
|
|
<a href='https://depth-anything-3.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything 3-green' alt='Project Page'></a> |
|
|
<a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-3'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Official Demo-blue'></a> |
|
|
|
|
|
</div> |
|
|
|
|
|
This work presents **Depth Anything 3 (DA3)**, a model that predicts spatially consistent geometry from |
|
|
arbitrary visual inputs, with or without known camera poses. |
|
|
In pursuit of minimal modeling, DA3 rests on two key insights:

- 💎 A **single plain transformer** (e.g., a vanilla DINO encoder) is sufficient as a backbone, without architectural specialization;
- ✨ A singular **depth-ray representation** obviates the need for complex multi-task learning.
|
|
|
|
|
🏆 DA3 significantly outperforms |
|
|
[DA2](https://github.com/DepthAnything/Depth-Anything-V2) for monocular depth estimation, |
|
|
and [VGGT](https://github.com/facebookresearch/vggt) for multi-view depth estimation and pose estimation. |
|
|
All models are trained exclusively on **public academic datasets**. |
|
|
|
|
|
<!-- <p align="center"> |
|
|
<img src="assets/images/da3_teaser.png" alt="Depth Anything 3" width="100%"> |
|
|
</p> --> |
|
|
<p align="center"> |
|
|
<img src="assets/images/demo320-2.gif" alt="Depth Anything 3 - Left" width="70%"> |
|
|
</p> |
|
|
<p align="center"> |
|
|
<img src="assets/images/da3_radar.png" alt="Depth Anything 3" width="100%"> |
|
|
</p> |
|
|
|
|
|
|
|
|
|
|
|
## News

- **30-11-2025:** Add the [`use_ray_pose`](#use-ray-pose) option for more accurate, ray-head-based pose estimation.
- **25-11-2025:** Add the [Awesome DA3 Projects](#awesome-da3-projects) section.
- **14-11-2025:** Paper, project page, code, and models are all released.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Models

We release three series of models, each tailored for specific use cases in visual geometry.
|
|
|
|
|
- 🌟 **DA3 Main Series** (`DA3-Giant`, `DA3-Large`, `DA3-Base`, `DA3-Small`): our flagship foundation models, trained with a unified depth-ray representation. By varying the input configuration, a single model can perform a wide range of tasks (see the sketch after this list):
|
|
+ 🌊 **Monocular Depth Estimation**: Predicts a depth map from a single RGB image. |
|
|
+ 🌊 **Multi-View Depth Estimation**: Generates consistent depth maps from multiple images for high-quality fusion. |
|
|
+ 🎯 **Pose-Conditioned Depth Estimation**: Achieves superior depth consistency when camera poses are provided as input. |
|
|
+ 📷 **Camera Pose Estimation**: Estimates camera extrinsics and intrinsics from one or more images. |
|
|
+ 🟡 **3D Gaussian Estimation**: Directly predicts 3D Gaussians, enabling high-fidelity novel view synthesis. |
|
|
|
|
|
- 📐 **DA3 Metric Series** (`DA3Metric-Large`): a specialized model fine-tuned for metric depth estimation in monocular settings, ideal for applications requiring real-world scale.
|
|
|
|
|
- 🔍 **DA3 Monocular Series** (`DA3Mono-Large`): a dedicated model for high-quality relative monocular depth estimation. Unlike disparity-based models (e.g., [Depth Anything 2](https://github.com/DepthAnything/Depth-Anything-V2)), it directly predicts depth, resulting in superior geometric accuracy.
|
|
|
|
|
🔗 Leveraging these models, we also developed a **nested series** (`DA3Nested-Giant-Large`), which combines an any-view giant model with a metric model to reconstruct visual geometry at real-world metric scale.
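
The task performed by a main-series model is selected purely by the input you pass. A short sketch using the `inference` API shown in the Quick Start below (the frame file names are hypothetical):

```python
from depth_anything_3.api import DepthAnything3

model = DepthAnything3.from_pretrained("depth-anything/DA3-LARGE")

# Monocular depth estimation: one image in, one depth map out.
mono = model.inference(["assets/examples/SOH/0001.png"])  # hypothetical frame name

# Multi-view depth + pose estimation: several views of the same scene.
multi = model.inference([
    "assets/examples/SOH/0001.png",  # hypothetical frame names
    "assets/examples/SOH/0002.png",
])
```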
|
|
|
|
|
|
|
|
## Toolkit

Our repository is designed as a powerful, user-friendly toolkit for both practical application and future research.
|
|
- 🎨 **Interactive Web UI & Gallery**: Visualize model outputs and compare results with an easy-to-use Gradio-based web interface. |
|
|
- ⚡ **Flexible Command-Line Interface (CLI)**: Powerful and scriptable CLI for batch processing and integration into custom workflows. |
|
|
- 💾 **Multiple Export Formats**: Save your results in various formats, including `glb`, `npz`, depth images, `ply`, and 3DGS videos, to seamlessly connect with other tools.
|
|
- 🔧 **Extensible and Modular Design**: The codebase is structured to facilitate future research and the integration of new models or functionalities. |
|
|
|
|
|
|
|
|
<!-- |
|
|
We introduce a new benchmark to rigorously evaluate geometry prediction models on three key tasks: pose estimation, 3D reconstruction, and visual rendering (novel view synthesis) quality. |
|
|
|
|
|
- 🔄 **Broad Model Compatibility**: Our benchmark is designed to be versatile, supporting the evaluation of various models, including both monocular and multi-view depth estimation approaches. |
|
|
- 🔬 **Robust Evaluation Pipeline**: We provide a standardized pipeline featuring RANSAC-based pose alignment, TSDF fusion for dense reconstruction, and a principled view selection strategy for novel view synthesis. |
|
|
- 📊 **Standardized Metrics**: Performance is measured using established metrics: AUC for pose accuracy, F1-score and Chamfer Distance for reconstruction, and PSNR/SSIM/LPIPS for rendering quality. |
|
|
- 🌍 **Diverse and Challenging Datasets**: The benchmark spans a wide range of scenes from datasets like HiRoom, ETH3D, DTU, 7Scenes, ScanNet++, DL3DV, Tanks and Temples, and MegaDepth. --> |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Installation

```bash
# Base package
pip install awesome-depth-anything-3

# With the Gradio web app
pip install awesome-depth-anything-3[app]

# With CUDA extras
pip install awesome-depth-anything-3[cuda]

# Everything
pip install awesome-depth-anything-3[all]
```
|
|
|
|
|
<details> |
|
|
<summary><b>Development installation</b></summary> |
|
|
|
|
|
```bash |
|
|
git clone https://github.com/Aedelon/awesome-depth-anything-3.git |
|
|
cd awesome-depth-anything-3 |
|
|
pip install -e ".[dev]" |
|
|
|
|
|
|
|
|
# Optional: gsplat (pinned commit), needed for 3D Gaussian export
pip install --no-build-isolation git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf
|
|
``` |
|
|
</details> |
|
|
|
|
|
For detailed model information, please refer to the [Model Cards](#model-cards) section below.
|
|
|
|
|
|
|
|
|
|
|
## Quick Start

```python
|
|
import glob, os, torch |
|
|
from depth_anything_3.api import DepthAnything3 |
|
|
device = torch.device("cuda") |
|
|
model = DepthAnything3.from_pretrained("depth-anything/DA3NESTED-GIANT-LARGE") |
|
|
model = model.to(device=device) |
|
|
example_path = "assets/examples/SOH" |
|
|
images = sorted(glob.glob(os.path.join(example_path, "*.png"))) |
|
|
prediction = model.inference(images)

print(prediction.processed_images.shape)  # preprocessed input images
print(prediction.depth.shape)             # per-view depth maps
print(prediction.conf.shape)              # per-pixel confidence
print(prediction.extrinsics.shape)        # estimated camera extrinsics
print(prediction.intrinsics.shape)        # estimated camera intrinsics
|
|
``` |
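
To persist the raw outputs without going through the CLI exporters, a plain NumPy dump is enough. A sketch assuming the prediction fields are NumPy-compatible arrays, as the printed shapes above suggest (the output file name is arbitrary):

```python
import numpy as np

# Save the prediction arrays for later post-processing.
np.savez(
    "soh_prediction.npz",
    depth=prediction.depth,
    conf=prediction.conf,
    extrinsics=prediction.extrinsics,
    intrinsics=prediction.intrinsics,
)
```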
|
|
|
|
|
### Command-line examples

```bash
# Model weights (downloaded automatically from Hugging Face)
export MODEL_DIR=depth-anything/DA3NESTED-GIANT-LARGE

# Output gallery directory
export GALLERY_DIR=workspace/gallery
mkdir -p $GALLERY_DIR

# Start the backend, then process an image directory through it
da3 backend --model-dir ${MODEL_DIR} --gallery-dir ${GALLERY_DIR}
da3 auto assets/examples/SOH \
  --export-format glb \
  --export-dir ${GALLERY_DIR}/TEST_BACKEND/SOH \
  --use-backend

# Process a video, exporting geometry plus feature visualizations
da3 video assets/examples/robot_unitree.mp4 \
  --fps 15 \
  --use-backend \
  --export-dir ${GALLERY_DIR}/TEST_BACKEND/robo \
  --export-format glb-feat_vis \
  --feat-vis-fps 15 \
  --process-res-method lower_bound_resize \
  --export-feat "11,21,31"

# Or run a one-off job without the backend
da3 auto assets/examples/SOH \
  --export-format glb \
  --export-dir ${GALLERY_DIR}/TEST_CLI/SOH \
  --model-dir ${MODEL_DIR}
```
|
|
|
|
|
## Customizing the Model

The model architecture is defined in [`DepthAnything3Net`](src/depth_anything_3/model/da3.py) and specified with a YAML config file located at [`src/depth_anything_3/configs`](src/depth_anything_3/configs). Input and output processing are handled by [`DepthAnything3`](src/depth_anything_3/api.py). To customize the model architecture, simply create a new config file (*e.g.*, `path/to/new/config`) such as:
|
|
|
|
|
```yaml
__object__:
  path: depth_anything_3.model.da3
  name: DepthAnything3Net
  args: as_params

net:
  __object__:
    path: depth_anything_3.model.dinov2.dinov2
    name: DinoV2
    args: as_params

  name: vitb
  out_layers: [5, 7, 9, 11]
  alt_start: 4
  qknorm_start: 4
  rope_start: 4
  cat_token: True

head:
  __object__:
    path: depth_anything_3.model.dualdpt
    name: DualDPT
    args: as_params

  dim_in: &head_dim_in 1536
  output_dim: 2
  features: &head_features 128
  out_channels: &head_out_channels [96, 192, 384, 768]
```
|
|
|
|
|
Then, the model can be created with the following code snippet. |
|
|
```python
from depth_anything_3.cfg import create_object, load_config

# Instantiate the customized network from the new config file.
model = create_object(load_config("path/to/new/config"))
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Documentation

- 🖥️ [Command Line Interface](docs/CLI.md)
|
|
- 📑 [Python API](docs/API.md) |
|
|
<!-- - 🏁 [Visual Geometry Benchmark](docs/BENCHMARK.md) --> |
|
|
|
|
|
|
|
|
|
|
|
## Model Cards

Generally, you should observe that DA3-LARGE achieves results comparable to VGGT.

The Nested series uses an any-view model to estimate pose and depth, plus a monocular metric depth estimator for scaling.
|
|
|
|
|
| 🗃️ Model Name | 📏 Params | 📊 Rel. Depth | 📷 Pose Est. | 🧭 Pose Cond. | 🎨 GS | 📐 Met. Depth | ☁️ Sky Seg | 📄 License | |
|
|
|-------------------------------|-----------|---------------|--------------|---------------|-------|---------------|-----------|----------------| |
|
|
| **Nested** | | | | | | | | | |
|
|
| [DA3NESTED-GIANT-LARGE](https://huggingface.co/depth-anything/DA3NESTED-GIANT-LARGE) | 1.40B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | CC BY-NC 4.0 | |
|
|
| **Any-view Model** | | | | | | | | | |
|
|
| [DA3-GIANT](https://huggingface.co/depth-anything/DA3-GIANT) | 1.15B | ✅ | ✅ | ✅ | ✅ | | | CC BY-NC 4.0 | |
|
|
| [DA3-LARGE](https://huggingface.co/depth-anything/DA3-LARGE) | 0.35B | ✅ | ✅ | ✅ | | | | CC BY-NC 4.0 | |
|
|
| [DA3-BASE](https://huggingface.co/depth-anything/DA3-BASE) | 0.12B | ✅ | ✅ | ✅ | | | | Apache 2.0 | |
|
|
| [DA3-SMALL](https://huggingface.co/depth-anything/DA3-SMALL) | 0.08B | ✅ | ✅ | ✅ | | | | Apache 2.0 | |
|
|
| | | | | | | | | | |
|
|
| **Monocular Metric Depth** | | | | | | | | | |
|
|
| [DA3METRIC-LARGE](https://huggingface.co/depth-anything/DA3METRIC-LARGE) | 0.35B | ✅ | | | | ✅ | ✅ | Apache 2.0 | |
|
|
| | | | | | | | | | |
|
|
| **Monocular Depth** | | | | | | | | | |
|
|
| [DA3MONO-LARGE](https://huggingface.co/depth-anything/DA3MONO-LARGE) | 0.35B | ✅ | | | | | ✅ | Apache 2.0 | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Benchmarks

Inference throughput measured on Apple Silicon (MPS) with PyTorch 2.9.0. For detailed benchmarks, see [BENCHMARKS.md](BENCHMARKS.md).
|
|
|
|
|
|
|
|
|
|
|
| Model | Latency (per image) | Throughput |
|
|
|-------|---------|------------| |
|
|
| DA3-Small | 46 ms | **22 img/s** | |
|
|
| DA3-Base | 93 ms | **11 img/s** | |
|
|
| DA3-Large | 265 ms | **3.8 img/s** | |
|
|
| DA3-Giant | 618 ms | **1.6 img/s** | |
|
|
|
|
|
|
|
|
|
|
|
Device comparison (DA3-Large):

| Device | Throughput | vs CPU |
|
|
|--------|------------|--------| |
|
|
| CPU | 0.3 img/s | 1.0x | |
|
|
| Apple Silicon (MPS) | 3.8 img/s | **13x** | |
|
|
| NVIDIA L4 (CUDA) | 10.3 img/s | **34x** | |
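
To get a comparable number on your own hardware, a rough timing sketch using the public API (the example frame path is hypothetical; swap `"mps"` for `"cuda"` or `"cpu"` as appropriate):

```python
import time

import torch
from depth_anything_3.api import DepthAnything3

device = torch.device("mps")  # or "cuda" / "cpu"
model = DepthAnything3.from_pretrained("depth-anything/DA3-LARGE").to(device=device)

images = ["assets/examples/SOH/0001.png"]  # hypothetical example frame
model.inference(images)  # warm-up pass

t0 = time.perf_counter()
model.inference(images)
print(f"latency: {(time.perf_counter() - t0) * 1e3:.0f} ms per image")
```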
|
|
|
|
|
|
|
|
|
|
|
```python
import glob

from depth_anything_3.api import DepthAnything3

model = DepthAnything3.from_pretrained("depth-anything/DA3-LARGE")
image_paths = sorted(glob.glob("path/to/images/*.png"))

# Adaptive batching: the batch size is chosen automatically from available GPU memory
results = model.batch_inference(
    images=image_paths,
    batch_size="auto",
    target_memory_utilization=0.85,
)

# Or use a fixed batch size
results = model.batch_inference(
    images=image_paths,
    batch_size=4,
)
```
|
|
|
|
|
> See [BENCHMARKS.md](BENCHMARKS.md) for comprehensive benchmarks including preprocessing, attention mechanisms, and adaptive batching strategies. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Notes

- **Monocular Metric Depth**: To obtain metric depth in meters from `DA3METRIC-LARGE`, use `metric_depth = focal * net_output / 300.`, where `focal` is the focal length in pixels (typically the average of fx and fy from the camera intrinsic matrix K); see the sketch after these notes. Note that the output of `DA3NESTED-GIANT-LARGE` is already in meters.
|
|
|
|
|
- <a id="use-ray-pose"></a>**Ray Head (`use_ray_pose`)**: The API and CLI support a `use_ray_pose` argument, which derives the camera pose from the ray head. This is generally slightly slower but more accurate; the default is `False` for faster inference.
|
|
<details> |
|
|
<summary>AUC3 Results for DA3NESTED-GIANT-LARGE</summary> |
|
|
|
|
|
| Model | HiRoom | ETH3D | DTU | 7Scenes | ScanNet++ | |
|
|
|-------|------|-------|-----|---------|-----------| |
|
|
| `ray_head` | 84.4 | 52.6 | 93.9 | 29.5 | 89.4 | |
|
|
| `cam_head` | 80.3 | 48.4 | 94.1 | 28.5 | 85.0 | |
|
|
|
|
|
</details> |
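
A minimal sketch of the metric conversion above, with placeholder values standing in for a real camera calibration and the network's raw output:

```python
import numpy as np

# Hypothetical intrinsics; in practice take fx, fy from your camera matrix K.
fx, fy = 1000.0, 1000.0
focal = (fx + fy) / 2.0  # focal length in pixels

# Placeholder for the raw depth output of DA3METRIC-LARGE.
net_output = np.ones((720, 1280), dtype=np.float32)

metric_depth = focal * net_output / 300.0  # depth in meters
```

Similarly, the ray head can be requested per call, e.g. `model.inference(images, use_ray_pose=True)`, assuming the argument is forwarded through `inference` as the note above describes.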
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Troubleshooting

- **Older GPUs without xFormers support**: see the related issue in the upstream repository for workarounds.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Awesome DA3 Projects

A community-curated list of Depth Anything 3 integrations across 3D tools, creative pipelines, robotics, and web/VR viewers. You are welcome to submit your DA3-based project via PR; we will review and feature it if applicable.
|
|
|
|
|
- [DA3-blender](https://github.com/xy-gao/DA3-blender): Blender addon for DA3-based 3D reconstruction from a set of images. |
|
|
|
|
|
- [ComfyUI-DepthAnythingV3](https://github.com/PozzettiAndrea/ComfyUI-DepthAnythingV3): ComfyUI nodes for Depth Anything 3, supporting single/multi-view and video-consistent depth with optional point-cloud export.
|
|
|
|
|
- [DA3-ROS2-Wrapper](https://github.com/GerdsenAI/GerdsenAI-Depth-Anything-3-ROS2-Wrapper): Real-time DA3 depth in ROS2 with multi-camera support. |
|
|
|
|
|
- [VideoDepthViewer3D](https://github.com/amariichi/VideoDepthViewer3D): Streaming videos with DA3 metric depth to a Three.js/WebXR 3D viewer for VR/stereo playback. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Credits & Acknowledgments

This package is built on top of **Depth Anything 3**, created by the ByteDance Seed team:
|
|
|
|
|
- [Haotong Lin](https://haotongl.github.io/), [Sili Chen](https://github.com/SiliChen321), [Jun Hao Liew](https://liewjunhao.github.io/), [Donny Y. Chen](https://donydchen.github.io), [Zhenyu Li](https://zhyever.github.io/), [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ), [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ), [Bingyi Kang](https://bingykang.github.io/) |
|
|
|
|
|
All model weights, architecture, and core algorithms are their work. This fork only adds production optimizations and deployment tooling. |
|
|
|
|
|
|
|
|
|
|
|
This optimized fork is maintained by [Delanoe Pirard (Aedelon)](https://github.com/Aedelon). |
|
|
|
|
|
Fork-specific contributions:
|
|
- Model caching system |
|
|
- Adaptive batching |
|
|
- Apple Silicon (MPS) optimizations |
|
|
- PyPI packaging and CI/CD |
|
|
- Comprehensive benchmarking |
|
|
|
|
|
|
|
|
|
|
|
## Citation

If you use Depth Anything 3 in your research, please cite the original paper:
|
|
|
|
|
```bibtex |
|
|
@article{depthanything3, |
|
|
title={Depth Anything 3: Recovering the visual space from any views}, |
|
|
author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang}, |
|
|
journal={arXiv preprint arXiv:2511.10647}, |
|
|
year={2025} |
|
|
} |
|
|
``` |
|
|
|
|
|
If you specifically use features from this fork (caching, batching, MPS optimizations), you may additionally reference: |
|
|
|
|
|
``` |
|
|
awesome-depth-anything-3: https://github.com/Aedelon/awesome-depth-anything-3 |
|
|
``` |
|
|
|