Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling
Abstract
We present Upsample Anything, a lightweight test-time optimization (TTO) framework that restores low-resolution features to high-resolution, pixel-wise outputs without any training. Although Vision Foundation Models demonstrate strong generalization across diverse downstream tasks, their representations are typically downsampled by 14×/16× (e.g., ViT), which limits their direct use in pixel-level applications. Existing feature upsampling approaches depend on dataset-specific retraining or heavy implicit optimization, restricting scalability and generalization. Upsample Anything addresses these issues through a simple per-image optimization that learns an anisotropic Gaussian kernel combining spatial and range cues, effectively bridging Gaussian Splatting and Joint Bilateral Upsampling. The learned kernel acts as a universal, edge-aware operator that transfers seamlessly across architectures and modalities, enabling precise high-resolution reconstruction of features, depth, or probability maps. It runs in approximately 0.419 s per 224×224 image and achieves state-of-the-art performance on semantic segmentation, depth estimation, and both depth and probability map upsampling. Project page: https://seominseok0429.github.io/Upsample-Anything/
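To make the abstract's core idea concrete, here is a minimal sketch of joint-bilateral-style feature upsampling with an anisotropic (separate x/y bandwidth) spatial Gaussian and a range term on a high-resolution guidance image. This is an illustration of the general JBU/Gaussian-kernel mechanism the paper builds on, not the authors' implementation; the function name and the parameters `sigma_x`, `sigma_y`, `sigma_r`, and `radius` are assumptions (the paper optimizes such kernel parameters per image at test time rather than fixing them).

```python
import numpy as np

def jbu_anisotropic(feat_lr, guide_hr, sigma_x=1.0, sigma_y=1.0,
                    sigma_r=0.1, radius=2):
    """Edge-aware upsampling of a low-res feature map.

    feat_lr  : (h, w, c) low-resolution features.
    guide_hr : (H, W) or (H, W, 3) high-resolution guidance image in [0, 1].
    The kernel combines an anisotropic spatial Gaussian (separate x/y
    bandwidths, a hypothetical simplification of a full covariance) with a
    range Gaussian on guidance-colour differences.
    """
    h, w = feat_lr.shape[:2]
    H, W = guide_hr.shape[:2]
    if guide_hr.ndim == 2:
        guide_hr = guide_hr[..., None]
    sy, sx = H // h, W // w
    # Low-res guidance obtained by average-pooling the high-res guide.
    guide_lr = guide_hr[:h * sy, :w * sx].reshape(h, sy, w, sx, -1).mean(axis=(1, 3))

    out = np.zeros((H, W, feat_lr.shape[2]))
    for y in range(H):
        for x in range(W):
            # Continuous low-res coordinates of this high-res pixel.
            cy, cx = (y + 0.5) / sy - 0.5, (x + 0.5) / sx - 0.5
            y0, x0 = int(round(cy)), int(round(cx))
            wsum, acc = 0.0, 0.0
            for j in range(max(0, y0 - radius), min(h, y0 + radius + 1)):
                for i in range(max(0, x0 - radius), min(w, x0 + radius + 1)):
                    # Anisotropic spatial term.
                    ds = ((j - cy) / sigma_y) ** 2 + ((i - cx) / sigma_x) ** 2
                    # Range term: penalise guidance mismatch, preserving edges.
                    dr = np.sum((guide_hr[y, x] - guide_lr[j, i]) ** 2) / sigma_r ** 2
                    wgt = np.exp(-0.5 * (ds + dr))
                    wsum += wgt
                    acc = acc + wgt * feat_lr[j, i]
            out[y, x] = acc / max(wsum, 1e-12)
    return out
```

Under this reading, the test-time optimization amounts to fitting the per-pixel kernel parameters so that the splatted kernels reconstruct the image well, after which the same kernels upsample features, depth, or probability maps.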
Community
- This paper presents a remarkably simple yet highly effective test-time optimization framework for feature upsampling.
- The method is fully training-free and generalizes seamlessly across domains, tasks, and backbone architectures.
- Its per-pixel anisotropic Gaussian formulation offers strong edge preservation and superior spatial fidelity compared to prior methods.
- The approach is computationally lightweight, scalable to high resolutions, and consistently achieves state-of-the-art performance.
- Overall, this work provides a robust and universal upsampler that elegantly bridges JBU and Gaussian Splatting.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering (2025)
- AnyUp: Universal Feature Upsampling (2025)
- Gen-LangSplat: Generalized Language Gaussian Splatting with Pre-Trained Feature Compression (2025)
- MSLoRA: Multi-Scale Low-Rank Adaptation via Attention Reweighting (2025)
- One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models (2025)
- Another BRIXEL in the Wall: Towards Cheaper Dense Features (2025)
- Learned Adaptive Kernels for High-Fidelity Image Downscaling (2025)