Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling
Abstract
We present Upsample Anything, a lightweight test-time optimization (TTO) framework that restores low-resolution features to high-resolution, pixel-wise outputs without any training. Although Vision Foundation Models demonstrate strong generalization across diverse downstream tasks, their representations are typically downsampled by 14×/16× (e.g., ViT), which limits their direct use in pixel-level applications. Existing feature upsampling approaches depend on dataset-specific retraining or heavy implicit optimization, restricting scalability and generalization. Upsample Anything addresses these issues through a simple per-image optimization that learns an anisotropic Gaussian kernel combining spatial and range cues, effectively bridging Gaussian Splatting and Joint Bilateral Upsampling. The learned kernel acts as a universal, edge-aware operator that transfers seamlessly across architectures and modalities, enabling precise high-resolution reconstruction of features, depth, or probability maps. It runs in approximately 0.419 s per 224×224 image and achieves state-of-the-art performance on semantic segmentation, depth estimation, and both depth and probability map upsampling. Project page: https://seominseok0429.github.io/Upsample-Anything/
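To make the abstract's core idea concrete, here is a minimal sketch of joint-bilateral-style feature upsampling with an anisotropic (separate x/y bandwidth) spatial Gaussian and a range term on a high-resolution guidance image. This is an illustration of the general JBU/Gaussian-kernel mechanism the paper builds on, not the authors' implementation; the function name and the parameters `sigma_x`, `sigma_y`, `sigma_r`, and `radius` are assumptions (the paper optimizes such kernel parameters per image at test time rather than fixing them).

```python
import numpy as np

def jbu_anisotropic(feat_lr, guide_hr, sigma_x=1.0, sigma_y=1.0,
                    sigma_r=0.1, radius=2):
    """Edge-aware upsampling of a low-res feature map.

    feat_lr  : (h, w, c) low-resolution features.
    guide_hr : (H, W) or (H, W, 3) high-resolution guidance image in [0, 1].
    The kernel combines an anisotropic spatial Gaussian (separate x/y
    bandwidths, a hypothetical simplification of a full covariance) with a
    range Gaussian on guidance-colour differences.
    """
    h, w = feat_lr.shape[:2]
    H, W = guide_hr.shape[:2]
    if guide_hr.ndim == 2:
        guide_hr = guide_hr[..., None]
    sy, sx = H // h, W // w
    # Low-res guidance obtained by average-pooling the high-res guide.
    guide_lr = guide_hr[:h * sy, :w * sx].reshape(h, sy, w, sx, -1).mean(axis=(1, 3))

    out = np.zeros((H, W, feat_lr.shape[2]))
    for y in range(H):
        for x in range(W):
            # Continuous low-res coordinates of this high-res pixel.
            cy, cx = (y + 0.5) / sy - 0.5, (x + 0.5) / sx - 0.5
            y0, x0 = int(round(cy)), int(round(cx))
            wsum, acc = 0.0, 0.0
            for j in range(max(0, y0 - radius), min(h, y0 + radius + 1)):
                for i in range(max(0, x0 - radius), min(w, x0 + radius + 1)):
                    # Anisotropic spatial term.
                    ds = ((j - cy) / sigma_y) ** 2 + ((i - cx) / sigma_x) ** 2
                    # Range term: penalise guidance mismatch, preserving edges.
                    dr = np.sum((guide_hr[y, x] - guide_lr[j, i]) ** 2) / sigma_r ** 2
                    wgt = np.exp(-0.5 * (ds + dr))
                    wsum += wgt
                    acc = acc + wgt * feat_lr[j, i]
            out[y, x] = acc / max(wsum, 1e-12)
    return out
```

Under this reading, the test-time optimization amounts to fitting the per-pixel kernel parameters so that the splatted kernels reconstruct the image well, after which the same kernels upsample features, depth, or probability maps.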
Community
- This paper presents a remarkably simple yet highly effective test-time optimization framework for feature upsampling.
- The method is fully training-free and generalizes seamlessly across domains, tasks, and backbone architectures.
- Its per-pixel anisotropic Gaussian formulation offers strong edge preservation and superior spatial fidelity compared to prior methods.
- The approach is computationally lightweight, scalable to high resolutions, and consistently achieves state-of-the-art performance.
- Overall, this work provides a robust and universal upsampler that elegantly bridges JBU and Gaussian Splatting.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering (2025)
- AnyUp: Universal Feature Upsampling (2025)
- Gen-LangSplat: Generalized Language Gaussian Splatting with Pre-Trained Feature Compression (2025)
- MSLoRA: Multi-Scale Low-Rank Adaptation via Attention Reweighting (2025)
- One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models (2025)
- Another BRIXEL in the Wall: Towards Cheaper Dense Features (2025)
- Learned Adaptive Kernels for High-Fidelity Image Downscaling (2025)