# 🌊 Depth Anything 3: From Images to 3D in Seconds

<div align="center">

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Aedelon/awesome-depth-anything-3/blob/main/notebooks/da3_tutorial.ipynb)
[![GitHub Stars](https://img.shields.io/github/stars/Aedelon/awesome-depth-anything-3?style=social)](https://github.com/Aedelon/awesome-depth-anything-3)
[![PyPI](https://img.shields.io/pypi/v/awesome-depth-anything-3)](https://pypi.org/project/awesome-depth-anything-3/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

**State-of-the-art monocular depth estimation + 3D reconstruction**

</div>

---

### What you'll get:

| Input | Output |
|-------|--------|
| 📸 Single image | 🌊 Metric depth map |
| 🎬 Video / Multi-view | ☁️ 3D Point Cloud + Camera poses |
| 🖼️ Any scene | 📦 Downloadable GLB file |

---

### ⚡ Quick Start

1. **Runtime → Change runtime type → T4 GPU** (free tier works!)
2. **Run all cells** (Ctrl+F9) or click ▶️ on each cell
3. **Upload your images** in Section 4
4. **Download your 3D model** (.glb file)

⏱️ **Total time: ~5 minutes** (including model download)

In [None]:
#@title 🚀 **1. Install** (run this first!) { display-mode: "form" }
#@markdown > ⏱️ Takes ~2 minutes on first run

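# Quiet install. A `%%capture` cell magic must be the first line of a cell,
# and it would also hide the status panel displayed at the end of this cell.
!pip install -q awesome-depth-anything-3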

# Verify installation
import torch
from IPython.display import HTML, display

device = "cuda" if torch.cuda.is_available() else "cpu"
gpu_name = torch.cuda.get_device_name(0) if device == "cuda" else "None"
vram = torch.cuda.get_device_properties(0).total_memory / 1e9 if device == "cuda" else 0

if device == "cuda":
    status = f'''
    <div style="background: linear-gradient(135deg, #10B981, #059669); padding: 20px; border-radius: 12px; color: white; font-family: system-ui;">
        <h3 style="margin: 0 0 10px 0;">✅ Ready to go!</h3>
        <p style="margin: 5px 0;"><b>GPU:</b> {gpu_name}</p>
        <p style="margin: 5px 0;"><b>VRAM:</b> {vram:.1f} GB</p>
        <p style="margin: 5px 0;"><b>PyTorch:</b> {torch.__version__}</p>
    </div>
    '''
else:
    status = '''
    <div style="background: linear-gradient(135deg, #EF4444, #DC2626); padding: 20px; border-radius: 12px; color: white; font-family: system-ui;">
        <h3 style="margin: 0 0 10px 0;">⚠️ No GPU detected!</h3>
        <p style="margin: 5px 0;">Go to <b>Runtime → Change runtime type → GPU</b></p>
        <p style="margin: 5px 0;">Then restart the notebook.</p>
    </div>
    '''

display(HTML(status))

In [None]:
#@title 🧠 **2. Load Model** { display-mode: "form" }
#@markdown Choose model size:
model_size = "DA3-LARGE" #@param ["DA3-SMALL", "DA3-BASE", "DA3-LARGE", "DA3-GIANT", "DA3NESTED-GIANT-LARGE"]
#@markdown ---
#@markdown | Model | Speed | Quality | VRAM |
#@markdown |-------|-------|---------|------|
#@markdown | SMALL | ⚡⚡⚡ | ★★☆ | 4GB |
#@markdown | BASE | ⚡⚡ | ★★★ | 6GB |
#@markdown | LARGE | ⚡ | ★★★★ | 8GB |
#@markdown | GIANT | 🐢 | ★★★★★ | 12GB |
#@markdown | NESTED | 🐢 | ★★★★★+ | 16GB |

from depth_anything_3.api import DepthAnything3
import time

print(f"📥 Loading {model_size}...")
start = time.time()

model = DepthAnything3.from_pretrained(f"depth-anything/{model_size}")
model = model.to(device).eval()

print(f"✅ Model loaded in {time.time()-start:.1f}s")
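
The VRAM column in the table above can be read as a rough selection rule. The sketch below picks the largest model whose listed requirement fits in the available memory. `VRAM_REQUIREMENTS_GB`, `pick_model_size`, and the 1 GB headroom are hypothetical illustrations built from the table's approximate figures; they are not part of the `awesome-depth-anything-3` package.

```python
# Approximate requirements copied from the table above, smallest to largest
VRAM_REQUIREMENTS_GB = [
    ("DA3-SMALL", 4),
    ("DA3-BASE", 6),
    ("DA3-LARGE", 8),
    ("DA3-GIANT", 12),
    ("DA3NESTED-GIANT-LARGE", 16),
]

def pick_model_size(vram_gb, headroom_gb=1.0):
    """Return the largest model that fits in vram_gb, or None if none fits.

    headroom_gb is an assumed safety margin for activations and other
    processes; tune it for your own workload.
    """
    usable = vram_gb - headroom_gb
    best = None
    for name, required in VRAM_REQUIREMENTS_GB:
        if required <= usable:
            best = name
    return best

suggested = pick_model_size(15.8)  # e.g. the value of `vram` from the install cell
```

On a CPU-only runtime (`vram == 0`) the helper returns `None`, which can serve as a cue to fall back to the smallest model and expect slow inference.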

In [None]:
#@title 🖼️ **3. Try with Sample Image** { display-mode: "form" }
#@markdown Run depth estimation on a sample image

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import urllib.request
import os

# Download sample
os.makedirs("samples", exist_ok=True)
url = "https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=1280"
urllib.request.urlretrieve(url, "samples/mountain.jpg")

# Run inference
result = model.inference(["samples/mountain.jpg"])

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].imshow(result.processed_images[0])
axes[0].set_title("📸 Input", fontsize=14, fontweight='bold')
axes[0].axis("off")

depth = result.depth[0]
im = axes[1].imshow(depth, cmap='Spectral_r')
axes[1].set_title(f"🌊 Depth (range: {depth.min():.1f}m - {depth.max():.1f}m)", fontsize=14, fontweight='bold')
axes[1].axis("off")
plt.colorbar(im, ax=axes[1], fraction=0.046, pad=0.04, label='Depth (m)')

plt.tight_layout()
plt.show()

print("\n📊 Output shapes:")
print(f"   Depth: {result.depth.shape}")
print(f"   Confidence: {result.conf.shape}")
print(f"   Camera intrinsics: {result.intrinsics.shape}")
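
To save a depth map as an ordinary grayscale image rather than a matplotlib figure, the float values need to be mapped to 8 bits first. A minimal sketch, assuming `result.depth[0]` is a 2D float array as used above; `depth_to_uint8` and its percentile defaults are hypothetical choices, and a synthetic array stands in for real model output.

```python
import numpy as np

def depth_to_uint8(depth, low_pct=2.0, high_pct=98.0):
    """Map a float depth map to uint8, clipping outliers at the given percentiles."""
    d = np.asarray(depth, dtype=np.float32)
    lo, hi = np.percentile(d, [low_pct, high_pct])
    if hi <= lo:
        # Constant (or degenerate) depth map: nothing to stretch
        return np.zeros(d.shape, dtype=np.uint8)
    scaled = np.clip((d - lo) / (hi - lo), 0.0, 1.0)
    return (scaled * 255.0).round().astype(np.uint8)

# Synthetic stand-in for result.depth[0]
demo_depth = np.linspace(1.0, 50.0, 12, dtype=np.float32).reshape(3, 4)
demo_gray = depth_to_uint8(demo_depth)
```

The result can be passed to `PIL.Image.fromarray(demo_gray)` and saved as PNG. Percentile clipping keeps a few very distant pixels (sky, for instance) from flattening the contrast of everything else.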

---

## 📤 4. Use Your Own Images

Upload your images and get a 3D point cloud!

In [None]:
#@title 📁 **Upload Images** { display-mode: "form" }
#@markdown Upload **2-50 images** of the same scene from different angles.
#@markdown 
#@markdown 💡 **Tips for best results:**
#@markdown - Move the camera, not the objects
#@markdown - 30-50% overlap between consecutive images
#@markdown - Avoid motion blur
#@markdown - Good lighting helps!

from google.colab import files
import shutil

# Clean up previous uploads
upload_dir = "my_images"
if os.path.exists(upload_dir):
    shutil.rmtree(upload_dir)
os.makedirs(upload_dir, exist_ok=True)

print("📤 Select your images...")
uploaded = files.upload()

# Save uploaded files
for filename, data in uploaded.items():
    with open(f"{upload_dir}/{filename}", 'wb') as f:
        f.write(data)

image_files = sorted([f"{upload_dir}/{f}" for f in os.listdir(upload_dir) 
                      if f.lower().endswith(('.jpg', '.jpeg', '.png', '.webp'))])

print(f"\n✅ Uploaded {len(image_files)} images")

# Preview
n_preview = min(6, len(image_files))
fig, axes = plt.subplots(1, n_preview, figsize=(3*n_preview, 3))
if n_preview == 1:
    axes = [axes]
for i, img_path in enumerate(image_files[:n_preview]):
    img = Image.open(img_path)
    axes[i].imshow(img)
    axes[i].set_title(f"#{i+1}", fontsize=10)
    axes[i].axis("off")
if len(image_files) > n_preview:
    print(f"   (showing first {n_preview} of {len(image_files)})")
plt.tight_layout()
plt.show()
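
Note that `sorted` orders file names lexicographically, so `img10.jpg` comes before `img2.jpg`. If capture order matters to you (for instance when reading a camera trajectory plot), a natural sort key is one option. `natural_key` below is a hypothetical helper, not part of the package.

```python
import re

def natural_key(path):
    """Sort key that compares runs of digits numerically.

    re.split with a capturing group always alternates text, digits, text, ...
    so keys from different names compare str with str and int with int.
    """
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", path)]

demo_names = ["my_images/img10.jpg", "my_images/img2.jpg", "my_images/img1.jpg"]
demo_sorted = sorted(demo_names, key=natural_key)
```

To use it in the cell above, pass `key=natural_key` to the existing `sorted(...)` call.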

In [None]:
#@title ⚡ **Run 3D Reconstruction** { display-mode: "form" }
#@markdown This will:
#@markdown 1. Estimate depth for each image
#@markdown 2. Compute camera poses
#@markdown 3. Generate a 3D point cloud
#@markdown 4. Export to GLB format

from depth_anything_3.utils.export.glb import export_to_glb
import time

print(f"🔄 Processing {len(image_files)} images...")
start = time.time()

# Run inference
result = model.inference(
    image_files,
    process_res_method="upper_bound_resize",
)

inference_time = time.time() - start
print(f"✅ Inference done in {inference_time:.1f}s ({len(image_files)/inference_time:.1f} img/s)")

# Export to GLB
output_dir = "output_3d"
os.makedirs(output_dir, exist_ok=True)

print("📦 Generating 3D point cloud...")
export_to_glb(
    result,
    export_dir=output_dir,
    show_cameras=True,
    conf_thresh_percentile=20,  # Filter low-confidence points
    num_max_points=500_000,
)

print(f"\n✅ 3D model saved to {output_dir}/")
!ls -lh {output_dir}/
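
For intuition about how a depth map, camera matrix, and pose combine into a point cloud, here is a minimal NumPy sketch of pinhole back-projection with a percentile confidence filter. It is an illustration under stated assumptions (z-depth along the optical axis, a standard 3x3 `K`, world-to-camera extrinsics), not the implementation inside `export_to_glb`; `backproject_to_world` is a hypothetical helper and the inputs are synthetic.

```python
import numpy as np

def backproject_to_world(depth, K, ext_w2c, conf=None, conf_percentile=20.0):
    """Lift one depth map to world-space points (assumes z-depth and a pinhole K)."""
    h, w = depth.shape
    # Pixel grid in row-major order, matching depth.reshape(-1)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Rays with z = 1 in the camera frame: K^-1 @ [u, v, 1]
    rays = pixels @ np.linalg.inv(K).T
    points_cam = rays * depth.reshape(-1, 1)
    # World-to-camera is x_cam = R @ x_world + t, so x_world = R.T @ (x_cam - t)
    R, t = ext_w2c[:3, :3], ext_w2c[:3, 3]
    points_world = (points_cam - t) @ R
    if conf is not None:
        # Drop pixels whose confidence falls below the given percentile
        threshold = np.percentile(conf, conf_percentile)
        points_world = points_world[conf.reshape(-1) >= threshold]
    return points_world

# Tiny synthetic example: 2x2 depth map, camera translated by t = (0, 0, 1)
demo_depth = np.full((2, 2), 2.0)
demo_K = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
demo_ext = np.hstack([np.eye(3), np.array([[0.0], [0.0], [1.0]])])
demo_conf = np.array([[0.1, 0.9], [0.9, 0.9]])
demo_points = backproject_to_world(demo_depth, demo_K, demo_ext)
demo_filtered = backproject_to_world(demo_depth, demo_K, demo_ext, conf=demo_conf)
```

With real output you would call it per frame, e.g. with `result.depth[i]`, `result.intrinsics[i]`, `result.extrinsics[i]`, and `result.conf[i]`, and concatenate the results.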

In [None]:
#@title 📥 **Download Your 3D Model** { display-mode: "form" }
#@markdown Downloads a `.glb` file you can view in:
#@markdown - [glTF Viewer](https://gltf-viewer.donmccurdy.com/)
#@markdown - Blender
#@markdown - Windows 3D Viewer
#@markdown - Any 3D software

from google.colab import files

glb_file = f"{output_dir}/point_cloud.glb"
if os.path.exists(glb_file):
    files.download(glb_file)
    print("\n🎉 Download started!")
    print("\n👉 View your model: https://gltf-viewer.donmccurdy.com/")
else:
    print("❌ GLB file not found. Run the previous cell first.")

---

## 📊 5. Visualize Results

In [None]:
#@title 🌊 **View All Depth Maps** { display-mode: "form" }

n_images = len(result.depth)
cols = min(4, n_images)
rows = (n_images + cols - 1) // cols

fig, axes = plt.subplots(rows, cols, figsize=(4*cols, 4*rows))
axes = np.array(axes).flatten() if n_images > 1 else [axes]

for i in range(n_images):
    depth = result.depth[i]
    axes[i].imshow(depth, cmap='Spectral_r')
    axes[i].set_title(f"Frame {i+1}", fontsize=10)
    axes[i].axis("off")

# Hide unused subplots
for i in range(n_images, len(axes)):
    axes[i].axis("off")

plt.suptitle("🌊 Depth Maps", fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
#@title 📷 **View Camera Poses** { display-mode: "form" }
#@markdown Visualize estimated camera positions in 3D

from mpl_toolkits.mplot3d import Axes3D

# Extract camera positions from extrinsics
positions = []
for ext in result.extrinsics:
    # Extrinsic is world-to-camera, invert to get camera-to-world
    R = ext[:3, :3]
    t = ext[:3, 3]
    cam_pos = -R.T @ t  # Camera position in world coordinates
    positions.append(cam_pos)

positions = np.array(positions)

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')

# Plot camera positions
ax.scatter(positions[:, 0], positions[:, 1], positions[:, 2], 
           c=range(len(positions)), cmap='viridis', s=100, marker='o')

# Connect cameras with lines
ax.plot(positions[:, 0], positions[:, 1], positions[:, 2], 
        'b-', alpha=0.5, linewidth=1)

# Mark first and last
ax.scatter(*positions[0], c='green', s=200, marker='^', label='First')
ax.scatter(*positions[-1], c='red', s=200, marker='v', label='Last')

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
ax.set_title('📷 Camera Trajectory', fontsize=14, fontweight='bold')
ax.legend()

plt.tight_layout()
plt.show()

print(f"📍 {len(positions)} camera poses estimated")
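
The `-R.T @ t` step above follows from the world-to-camera convention `x_cam = R @ x_world + t`: the camera center is the world point that maps to the camera origin. A vectorized sketch that checks this on synthetic poses with known centers; `camera_centers` and `trajectory_length` are hypothetical helpers, and the `(N, 3, 4)` world-to-camera layout is assumed from the loop above.

```python
import numpy as np

def camera_centers(extrinsics_w2c):
    """Camera positions in world coordinates from (N, 3, 4) world-to-camera matrices."""
    ext = np.asarray(extrinsics_w2c, dtype=np.float64)
    R = ext[:, :3, :3]
    t = ext[:, :3, 3]
    # Batched -R.T @ t: sum over j of R[n, j, i] * t[n, j]
    return -np.einsum("nji,nj->ni", R, t)

def trajectory_length(centers):
    """Total path length along consecutive camera centers."""
    if len(centers) < 2:
        return 0.0
    return float(np.linalg.norm(np.diff(centers, axis=0), axis=1).sum())

# Build two synthetic poses from known centers, then recover the centers
rot_z90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
known_centers = np.array([[1.0, 2.0, 3.0], [4.0, 6.0, 3.0]])
demo_ext = np.stack([
    np.hstack([rot_z90, (-rot_z90 @ c).reshape(3, 1)]) for c in known_centers
])
recovered = camera_centers(demo_ext)
```

Keep in mind that the path length is only meaningful in whatever scale the estimated poses are expressed in.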

---

## 🎬 6. Process Video

In [None]:
#@title 🎬 **Upload Video** { display-mode: "form" }
#@markdown Upload a short video (< 30 seconds recommended)

fps_extract = 2 #@param {type:"slider", min:1, max:10, step:1}
#@markdown ↑ Frames per second to extract (lower = faster, higher = more detail)

from google.colab import files
import subprocess

print("📤 Select a video file...")
uploaded = files.upload()

video_file = list(uploaded.keys())[0]
frames_dir = "video_frames"

# Extract frames
if os.path.exists(frames_dir):
    shutil.rmtree(frames_dir)
os.makedirs(frames_dir, exist_ok=True)

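print(f"🎞️ Extracting frames at {fps_extract} FPS...")
# Global options go before the input; ffmpeg may ignore options placed after the output path
subprocess.run([
    "ffmpeg", "-hide_banner", "-loglevel", "error",
    "-i", video_file,
    "-vf", f"fps={fps_extract}",
    f"{frames_dir}/frame_%04d.jpg",
], check=True)  # Raise if ffmpeg fails instead of continuing with no frames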

video_images = sorted([f"{frames_dir}/{f}" for f in os.listdir(frames_dir)])
print(f"✅ Extracted {len(video_images)} frames")

# Preview
n_preview = min(8, len(video_images))
fig, axes = plt.subplots(1, n_preview, figsize=(2*n_preview, 2))
axes = np.atleast_1d(axes)  # plt.subplots returns a bare Axes when n_preview == 1
step = max(1, len(video_images) // n_preview)
for i, ax in enumerate(axes):
    idx = i * step
    if idx < len(video_images):
        ax.imshow(Image.open(video_images[idx]))
    ax.axis("off")
plt.suptitle(f"🎬 Video Frames ({len(video_images)} total)", fontsize=12)
plt.tight_layout()
plt.show()
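
A 30-second clip at the slider's maximum of 10 FPS yields roughly 300 frames, which may be more than a free-tier GPU handles comfortably. One option is to cap the frame list while keeping it evenly spread over the clip. `subsample_evenly` is a hypothetical helper; the cap of 50 borrows the upper bound suggested for image uploads in Section 4 and is an assumption, not a documented limit.

```python
def subsample_evenly(items, max_items):
    """Return at most max_items entries, evenly spaced, keeping first and last."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    n = len(items)
    if n <= max_items:
        return list(items)
    if max_items == 1:
        return [items[0]]
    # Integer arithmetic keeps indices strictly increasing and within bounds
    indices = [(i * (n - 1)) // (max_items - 1) for i in range(max_items)]
    return [items[i] for i in indices]

demo_frames = [f"video_frames/frame_{i:04d}.jpg" for i in range(1, 301)]
demo_subset = subsample_evenly(demo_frames, 50)
```

In this notebook that would look like `video_images = subsample_evenly(video_images, 50)` before running inference.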

In [None]:
#@title ⚡ **Process Video Frames** { display-mode: "form" }

print(f"🔄 Processing {len(video_images)} frames...")
start = time.time()

result_video = model.inference(
    video_images,
    process_res_method="upper_bound_resize",
)

elapsed = time.time() - start
print(f"✅ Done in {elapsed:.1f}s ({len(video_images)/elapsed:.1f} FPS)")

# Export
video_output = "video_3d"
os.makedirs(video_output, exist_ok=True)

export_to_glb(
    result_video,
    export_dir=video_output,
    show_cameras=True,
    conf_thresh_percentile=15,
    num_max_points=1_000_000,
)

print("\n📦 3D model saved!")
!ls -lh {video_output}/

In [None]:
#@title 📥 **Download Video 3D Model** { display-mode: "form" }

glb_file = f"{video_output}/point_cloud.glb"
if os.path.exists(glb_file):
    files.download(glb_file)
    print("🎉 Download started!")
else:
    print("❌ Run the previous cell first.")

---

## 🔧 7. Advanced: Python API

In [None]:
#@title 💻 **API Reference** { display-mode: "form" }
#@markdown Quick code snippets for common tasks

from IPython.display import Markdown

api_docs = '''
### Basic Usage

```python
from depth_anything_3.api import DepthAnything3

# Load model
model = DepthAnything3.from_pretrained("depth-anything/DA3-LARGE")
model = model.to("cuda").eval()

# Single image
result = model.inference(["image.jpg"])
depth = result.depth[0]  # Shape: (H, W)

# Multiple images
result = model.inference(["img1.jpg", "img2.jpg", "img3.jpg"])
depths = result.depth  # Shape: (N, H, W)
```

### Output Attributes

| Attribute | Shape | Description |
|-----------|-------|-------------|
| `depth` | `(N, H, W)` | Metric depth in meters |
| `conf` | `(N, H, W)` | Confidence [0-1] |
| `extrinsics` | `(N, 3, 4)` | Camera poses (world-to-cam) |
| `intrinsics` | `(N, 3, 3)` | Camera K matrix |
| `processed_images` | `(N, H, W, 3)` | Resized inputs (uint8) |

### Export to 3D

```python
from depth_anything_3.utils.export.glb import export_to_glb

export_to_glb(
    result,
    export_dir="output",
    show_cameras=True,          # Show camera frustums
    conf_thresh_percentile=20,  # Filter low confidence
    num_max_points=500_000,     # Max points in cloud
)
```

### CLI Usage

```bash
# Single image
da3 infer image.jpg -o output/

# Directory of images
da3 infer images/ -o output/ --model DA3-LARGE

# Video
da3 infer video.mp4 -o output/ --fps 2
```
'''

display(Markdown(api_docs))
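
The `intrinsics` attribute listed above can be turned into a field of view with the usual pinhole relation `fov = 2 * atan(size / (2 * f))`. A minimal sketch, assuming `K` holds focal lengths in pixels of the processed image at `K[0][0]` and `K[1][1]` and that the principal point is near the image center; `field_of_view_deg` is a hypothetical helper, and the matrix below is synthetic.

```python
import math

def field_of_view_deg(K, width, height):
    """Horizontal and vertical field of view in degrees for a pinhole camera."""
    fx = float(K[0][0])
    fy = float(K[1][1])
    fov_x = 2.0 * math.degrees(math.atan(width / (2.0 * fx)))
    fov_y = 2.0 * math.degrees(math.atan(height / (2.0 * fy)))
    return fov_x, fov_y

# Synthetic K for a 640x480 image with fx = fy = 320 px
demo_K = [[320.0, 0.0, 320.0],
          [0.0, 320.0, 240.0],
          [0.0, 0.0, 1.0]]
demo_fov = field_of_view_deg(demo_K, 640, 480)
```

With real output, `width` and `height` would come from `result.processed_images[i].shape`, since the intrinsics presumably refer to the resized inputs.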

---

## 💾 8. Save to Google Drive

In [None]:
#@title 💾 **Mount Google Drive** { display-mode: "form" }

from google.colab import drive
drive.mount('/content/drive')

drive_output = "/content/drive/MyDrive/DepthAnything3_Results"
os.makedirs(drive_output, exist_ok=True)
print(f"✅ Drive mounted. Results folder: {drive_output}")

In [None]:
#@title 💾 **Save Results to Drive** { display-mode: "form" }

import shutil
from datetime import datetime

# Create timestamped folder
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
save_dir = f"{drive_output}/{timestamp}"
os.makedirs(save_dir, exist_ok=True)

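# Copy all outputs, prefixing each file with its source folder so that
# same-named files from the image and video runs do not overwrite each other
for folder in ["output_3d", "video_3d"]:
    if os.path.exists(folder):
        for f in os.listdir(folder):
            src = f"{folder}/{f}"
            if os.path.isfile(src):
                shutil.copy(src, f"{save_dir}/{folder}_{f}")
                print(f"  ✓ {folder}_{f}")

print(f"\n✅ Saved to: {save_dir}")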

---

## 🙏 Credits & Links

<div align="center">

**Depth Anything 3** by ByteDance Research

[📄 Paper](https://arxiv.org/abs/2511.10647) • [🌐 Project](https://depth-anything-3.github.io) • [🤗 Models](https://huggingface.co/collections/depth-anything/depth-anything-3)

---

**awesome-depth-anything-3**: Optimized fork with batching, caching & CLI

[⭐ GitHub](https://github.com/Aedelon/awesome-depth-anything-3) • [📦 PyPI](https://pypi.org/project/awesome-depth-anything-3/)

---

Made with ❤️ by [Delanoe Pirard](https://github.com/Aedelon)

</div>