LAM3C

LAM3C is a self-supervised learning method trained on point clouds reconstructed from unlabeled indoor walkthrough videos. This repository provides pretrained Point Transformer V3 (PTv3) backbones for feature extraction and downstream 3D scene understanding.

  • LAM3C is not a raw-video model. The released checkpoints take point clouds as input, not videos.
  • The expected per-point input is 9D: XYZ + RGB + normals.
  • The backbone checkpoints are feature extractors. They do not include a task-specific segmentation head unless explicitly stated.

arXiv: 3D sans 3D Scans: Scalable Pre-training from Video-Generated Point Clouds (CVPR 2026)
GitHub: ryosuke-yamada/lam3c

What makes LAM3C different?

Most 3D self-supervised learning methods rely on real 3D scans, which are expensive to collect at scale. LAM3C instead learns from RoomTours, a large collection of point clouds reconstructed from unlabeled room-tour videos gathered from the web.

The method combines:

  • RoomTours, a scalable video-generated point cloud (VGPC) pre-training dataset
  • Point Transformer v3 backbones
  • a noise-robust self-supervised objective with:
    • Laplacian smoothing loss
    • noise consistency loss
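
As rough intuition for the Laplacian smoothing component, the sketch below penalizes each point's squared distance to the centroid of its k nearest neighbors. This is an illustrative NumPy toy, not the paper's exact loss formulation; the function name and brute-force k-NN are purely for demonstration.

```python
import numpy as np

def laplacian_smoothing_loss(points: np.ndarray, k: int = 8) -> float:
    """Mean squared distance from each point to its k-NN centroid.

    Illustrative only: LAM3C's actual objective is defined in the paper.
    """
    # Pairwise squared distances (N, N), brute force for clarity.
    diff = points[:, None, :] - points[None, :, :]
    dist2 = (diff ** 2).sum(-1)
    np.fill_diagonal(dist2, np.inf)           # exclude self-matches
    knn_idx = np.argsort(dist2, axis=1)[:, :k]
    centroids = points[knn_idx].mean(axis=1)  # (N, 3) neighborhood centroids
    return float(((points - centroids) ** 2).sum(-1).mean())

loss = laplacian_smoothing_loss(np.random.randn(256, 3).astype(np.float32))
```

A smooth, locally regular cloud sits close to its neighborhood centroids and yields a small value, which is the property such a term rewards during training.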

Available checkpoints

| Checkpoint | Backbone | Params | Training data | Intended use |
| --- | --- | --- | --- | --- |
| lam3c_roomtours49k_ptv3-base | PTv3-Base | 121M | RoomTours-49k | LAM3C pre-trained backbone |
| lam3c_roomtours49k_ptv3-large | PTv3-Large | 224M | RoomTours-49k | LAM3C pre-trained backbone |
| lam3c_ptv3-base_roomtours49k_probe-head_scannet | PTv3-Base (linear probing head) | - | ScanNet | LAM3C backbone with a linear-probing head trained on ScanNet |
| lam3c_ptv3-large_roomtours49k_probe-head_scannet | PTv3-Large (linear probing head) | - | ScanNet | LAM3C backbone with a linear-probing head trained on ScanNet |

Quickstart

Load a pretrained backbone

```python
import torch
from lam3c.model import PointTransformerV3

device = "cuda" if torch.cuda.is_available() else "cpu"

model = PointTransformerV3.from_pretrained("aist-cvrt/lam3c").to(device)
model.eval()
```

Extract point features

```python
# `transform` and the structure of `point` come from the repository's
# preprocessing utilities; see the code repository for the exact pipeline.
point = transform(point)  # prepares xyz / rgb / normals (9D per-point input)
with torch.no_grad():
    point = model(point)
features = point.feat     # per-point feature embeddings
```
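
The exact structure of `point` is defined by the repository's preprocessing pipeline. As an assumption-laden sketch only: PTv3 implementations in the Pointcept family typically consume a dictionary with `coord`, `feat`, `grid_size`, and `offset` entries, which could be assembled along these lines (every key, shape, and value below is an assumption to verify against the repo):

```python
import numpy as np

# Hypothetical input assembly for a single point cloud. The dict keys follow
# the Pointcept convention; LAM3C's actual loader may differ.
num_points = 2048
xyz = np.random.uniform(-1.0, 1.0, (num_points, 3)).astype(np.float32)
rgb = np.random.uniform(0.0, 1.0, (num_points, 3)).astype(np.float32)
normals = np.random.randn(num_points, 3).astype(np.float32)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)  # unit normals

point = {
    "coord": xyz,                                         # (N, 3) coordinates
    "feat": np.concatenate([xyz, rgb, normals], axis=1),  # (N, 9) XYZ + RGB + normals
    "grid_size": 0.02,                                    # voxel size (assumed value)
    "offset": np.array([num_points], dtype=np.int64),     # batch boundary for one cloud
}
```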

Use the backbone features for linear probing, full fine-tuning, or as initialization for segmentation heads. The exact preprocessing pipeline and checkpoint-specific loading utilities are provided in the code repository.
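
Linear probing in this context means freezing the backbone and fitting only a linear classifier on the extracted features. A minimal NumPy sketch of the idea, using a ridge-regularized least-squares probe on dummy frozen features (the feature dimension, class count, and regularizer are illustrative, not LAM3C's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
num_points, feat_dim, num_classes = 1000, 64, 20

# Stand-ins for frozen backbone features and per-point semantic labels.
feats = rng.standard_normal((num_points, feat_dim)).astype(np.float32)
labels = rng.integers(0, num_classes, num_points)

# Closed-form ridge fit of a single linear layer (the "probe"); the
# backbone itself receives no gradient updates.
onehot = np.eye(num_classes, dtype=np.float32)[labels]
lam = 1e-2
W = np.linalg.solve(feats.T @ feats + lam * np.eye(feat_dim, dtype=np.float32),
                    feats.T @ onehot)
pred = (feats @ W).argmax(axis=1)
train_acc = float((pred == labels).mean())
```

In practice the probe is usually trained with cross-entropy and SGD rather than a closed-form fit, but the frozen-backbone principle is the same.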

Intended uses

LAM3C is intended for:

  • self-supervised point cloud feature extraction
  • initialization for 3D semantic segmentation
  • initialization for 3D instance segmentation
  • representation learning research on indoor point clouds

Training data

The backbone checkpoints are pretrained on RoomTours, a dataset of 49,219 indoor scenes reconstructed from unlabeled indoor walkthrough videos. The paper reports that the authors' independently collected portion of RoomTours contains 3,462 videos from 19 countries, producing 15,921 indoor sequences after CLIP-based filtering and scene splitting.

RoomTours scenes are reconstructed with an off-the-shelf feed-forward 3D reconstruction model and then aligned in Z-up orientation and scale. The resulting point clouds use 9D input features: coordinates, colors, and normals.
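
As a toy illustration of the alignment step, the sketch below rotates a Y-up cloud into Z-up orientation and normalizes its scale. This is not the paper's pipeline; the rotation convention and unit-cube normalization are assumptions chosen for the example.

```python
import numpy as np

def to_z_up_unit_scale(points: np.ndarray) -> np.ndarray:
    """Rotate a Y-up point cloud to Z-up and normalize its scale.

    Illustrative only: RoomTours' actual alignment procedure is defined
    by the paper's reconstruction pipeline, not by this sketch.
    """
    # 90-degree rotation about X maps (x, y, z) -> (x, -z, y), so the
    # former up axis (y) becomes the new z axis.
    rot = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, -1.0],
                    [0.0, 1.0, 0.0]], dtype=points.dtype)
    pts = points @ rot.T
    pts = pts - pts.mean(axis=0, keepdims=True)  # center at the origin
    scale = np.abs(pts).max()
    return pts / max(float(scale), 1e-8)         # fit into a unit cube

cloud = np.random.randn(1024, 3).astype(np.float32)
aligned = to_z_up_unit_scale(cloud)
```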

The released LAM3C checkpoints do not use real 3D scans as pre-training inputs. However, the paper explicitly notes that the reconstruction model used to create RoomTours was itself trained with real point clouds.

Model details

  • Architecture: Point Transformer v3 backbone
  • Learning paradigm: self-supervised teacher-student clustering
  • Noise robustness: Laplacian smoothing + noise consistency
  • Default pre-training setup in the paper: 8 NVIDIA H200 GPUs, total batch size 16, AdamW, OneCycleLR, 145,600 iterations for the default PTv3-Base setting
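
For intuition about the OneCycleLR schedule mentioned above, the sketch below reproduces the common cosine-annealed one-cycle shape in plain Python: warm up from a low initial rate to the peak, then anneal to a small final rate. The hyperparameter values (`max_lr`, `pct_start`, the div factors) are illustrative defaults, not the paper's settings.

```python
import math

def one_cycle_lr(step, total_steps, max_lr=1e-3, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Cosine one-cycle schedule, mirroring the shape of
    torch.optim.lr_scheduler.OneCycleLR (illustrative parameters only)."""
    initial_lr = max_lr / div_factor
    final_lr = initial_lr / final_div_factor
    warmup_steps = int(pct_start * total_steps)
    if step < warmup_steps:  # cosine ramp from initial_lr up to max_lr
        t = step / max(warmup_steps, 1)
        return initial_lr + (max_lr - initial_lr) * 0.5 * (1 - math.cos(math.pi * t))
    # Cosine anneal from max_lr down to final_lr.
    t = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return final_lr + (max_lr - final_lr) * 0.5 * (1 + math.cos(math.pi * t))

# Sample the schedule over the paper's 145,600-iteration run length.
lrs = [one_cycle_lr(s, 145_600) for s in range(0, 145_600, 1000)]
```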

Evaluation

Selected results from the paper are shown below.

Semantic segmentation

| Checkpoint | ScanNet (LP) | ScanNet (Full-FT) | ScanNet200 (LP) | ScanNet200 (Full-FT) | ScanNet++ Val (LP) | ScanNet++ Val (Full-FT) | S3DIS Area 5 (LP) | S3DIS Area 5 (Full-FT) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| lam3c_roomtours49k_ptv3-base | 66.0 | 75.1 | 25.3 | 35.1 | 34.2 | 43.1 | 65.7 | 72.9 |
| lam3c_roomtours49k_ptv3-large* | 69.5 | 79.5 | 28.1 | 35.5 | 35.9 | 43.1 | 69.5 | 75.5 |

Instance segmentation

| Checkpoint | ScanNet (LP) | ScanNet (Full-FT) | ScanNet200 (LP) | ScanNet200 (Full-FT) | ScanNet++ Val (LP) | ScanNet++ Val (Full-FT) | S3DIS Area 5 (LP) | S3DIS Area 5 (Full-FT) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| lam3c_roomtours49k_ptv3-base | 25.1 | 39.7 | 8.3 | 19.6 | 11.3 | 20.5 | 21.6 | 45.7 |
| lam3c_roomtours49k_ptv3-large* | 28.6 | 41.7 | 9.5 | 21.9 | 12.1 | 21.1 | 27.8 | 47.2 |

* In the paper, the PTv3-Large variant uses 434k pre-training steps.

For full benchmark details and experimental settings, please refer to the paper.

Limitations

  • RoomTours is built from indoor walkthrough videos, including real-estate tours and apartment viewings. Performance may not transfer directly to cluttered, industrial, outdoor, or LiDAR-native domains.
  • Video-generated point clouds are noisy and incomplete. The paper shows examples with blurred object boundaries, doubled walls/floors, and weaker global geometric consistency than models pretrained on accurate real scans.
  • In very small-data downstream regimes, the paper reports drops attributable to the domain gap between real scans and video-generated point clouds.
  • Although the released checkpoints are pretrained without real 3D scans as inputs, the RoomTours reconstruction pipeline depends on a reconstruction model trained with real point clouds.

Citation

If you use LAM3C in your research, please cite:

@inproceedings{yamada2026lam3c,
  title={3D sans 3D Scans: Scalable Pre-training from Video-Generated Point Clouds},
  author={Yamada, Ryousuke and Ide, Kohsuke and Fukuhara, Yoshihiro and Kataoka, Hirokatsu and Puy, Gilles and Bursuc, Andrei and Asano, Yuki M.},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

License

The model weights are released under CC BY-NC 4.0. This license allows non-commercial reuse subject to attribution. Please review the license terms before using the model in products or services.
