Commit fa00a21 (verified, 0 parents) by AniAggarwal: Initial commit.
.gitattributes ADDED
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Gigi_2_448.png filter=lfs diff=lfs merge=lfs -text
Gigi_2_448.png_uplift_dinov2-s14-4-PCA.png filter=lfs diff=lfs merge=lfs -text
Gigi_2_448.png ADDED

Git LFS Details

  • SHA256: 6e65f5c45baa89361fc61ed2656e2cbd16ebb6dc49c14c8892e57c62546494e0
  • Pointer size: 131 Bytes
  • Size of remote file: 341 kB
Gigi_2_448.png_uplift_dinov2-s14-4-PCA.png ADDED

Git LFS Details

  • SHA256: f03e5015988d6b9a65180448f45d8f8a2185f46a51cbf9182c95b16963421071
  • Pointer size: 131 Bytes
  • Size of remote file: 157 kB
Gigi_2_448.png_uplift_dinov2-s14-base-feature-PCA.png ADDED
README.md ADDED
---
license: mit
library_name: pytorch
tags:
- feature-upsampling
- pixel-dense-features
- computer-vision
- dinov2
- vision-transformer
- uplift
datasets:
- ILSVRC/imagenet-1k
---

# UPLiFT for DINOv2-S/14

| Input Image | Base DINOv2 Features | UPLiFT Upsampled Features |
|:-----------:|:--------------------:|:-------------------------:|
| ![Input](Gigi_2_448.png) | ![Base Features](Gigi_2_448.png_uplift_dinov2-s14-base-feature-PCA.png) | ![UPLiFT Features](Gigi_2_448.png_uplift_dinov2-s14-4-PCA.png) |

This is the official pretrained **UPLiFT** (Efficient Pixel-Dense Feature Upsampling with Local Attenders) model for the **DINOv2-S/14** backbone.

UPLiFT is a lightweight method to upscale features from pretrained vision backbones to create pixel-dense feature maps. It uses Local Attenders to efficiently upsample low-resolution backbone features while preserving semantic information.

## Model Details

| Property | Value |
|----------|-------|
| **Backbone** | DINOv2-S/14 (`vit_small_patch14_dinov2.lvd142m`) |
| **Backbone Channels** | 384 |
| **Patch Size** | 14 |
| **Upsampling Factor** | 2x per iteration |
| **Local Attender Size** | N=17 |
| **Training Dataset** | ImageNet |
| **Training Image Size** | 448x448 |
| **License** | MIT |

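A quick sanity check on the numbers above (a back-of-the-envelope sketch assuming the listed defaults: 448x448 input, patch size 14, 2x upsampling per iteration, 4 iterations):

```python
# Resolution arithmetic for UPLiFT on DINOv2-S/14 (document defaults).
image_size = 448      # training image size
patch_size = 14       # DINOv2 patch size
iters = 4             # default number of upsampling iterations
factor = 2            # 2x upsampling per iteration

base_grid = image_size // patch_size      # backbone token grid: 32x32
upsampled = base_grid * factor ** iters   # after 4 iterations: 512x512

print(base_grid, upsampled)  # 32 512
```

Note that the 512x512 result slightly exceeds the 448x448 input; see the Limitations section for the recommended correction.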
## Links

- **Paper**: [Coming Soon]
- **GitHub**: [https://github.com/mwalmer-umd/UPLiFT](https://github.com/mwalmer-umd/UPLiFT)
- **Project Website**: [https://www.cs.umd.edu/~mwalmer/uplift/](https://www.cs.umd.edu/~mwalmer/uplift/)

## Installation

```bash
pip install 'uplift[vit] @ git+https://github.com/mwalmer-umd/UPLiFT.git'
```

## Quick Start

```python
import torch
from PIL import Image

# Load model (weights auto-download from HuggingFace)
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14')

# Run inference
image = Image.open('your_image.jpg')
features = model(image)  # Returns pixel-dense features
```

## Usage Options

### Adjust Upsampling Iterations

Control the number of iterative upsampling steps (default: 4):

```python
# Fewer iterations = lower memory usage
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14', iters=2)
```

### Raw UPLiFT Model (Without Backbone)

Load only the UPLiFT upsampling module without the DINOv2 backbone:

```python
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14',
                       include_extractor=False)
```

### Return Base Features

Get both upsampled and original backbone features:

```python
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14',
                       return_base_feat=True)
upsampled_features, base_features = model(image)
```

## Architecture

UPLiFT consists of:

1. **Encoder**: Processes the input image with a series of convolutional blocks, producing dense representations that guide feature upsampling
2. **Decoder**: Upsamples features using transposed convolutions with bilinear residual connections
3. **Local Attender**: A local-neighborhood attention pooling module that keeps the upsampled features semantically consistent with the original features

The model uses encoder sharing: a single encoder pass serves all upsampling iterations, which keeps inference efficient.
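The loop structure described above can be illustrated with a toy sketch. Everything here is an illustrative stand-in, not the actual UPLiFT implementation: the module names, layer choices, and channel counts are invented for the example, and a simple transposed convolution replaces the Local Attender. What it does show is the encoder-sharing pattern: the encoder runs once, and its output guides every 2x decoder step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyIterativeUpsampler(nn.Module):
    """Illustrative stand-in for UPLiFT's loop structure (NOT the real model)."""
    def __init__(self, feat_ch=64, guide_ch=32):
        super().__init__()
        # Encoder: run ONCE per image to produce dense guidance features.
        self.encoder = nn.Conv2d(3, guide_ch, kernel_size=3, padding=1)
        # Decoder: one learned 2x upsampling step, reused every iteration.
        self.decoder = nn.ConvTranspose2d(feat_ch + guide_ch, feat_ch,
                                          kernel_size=2, stride=2)

    def forward(self, image, feats, iters=4):
        guide = self.encoder(image)  # single shared encoder pass
        for _ in range(iters):
            # Match the guidance map to the current feature resolution.
            g = F.interpolate(guide, size=feats.shape[-2:],
                              mode='bilinear', align_corners=False)
            up = self.decoder(torch.cat([feats, g], dim=1))
            # Bilinear residual connection around the learned step.
            feats = up + F.interpolate(feats, scale_factor=2,
                                       mode='bilinear', align_corners=False)
        return feats

image = torch.randn(1, 3, 224, 224)  # 224x224 -> 16x16 DINOv2-S/14 token grid
feats = torch.randn(1, 64, 16, 16)   # toy channel count (real backbone: 384)
out = ToyIterativeUpsampler()(image, feats)
print(out.shape)  # torch.Size([1, 64, 256, 256])
```

Each pass through the loop doubles the spatial resolution, so four iterations turn the 16x16 token grid into a 256x256 map while the encoder runs only once.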

## Intended Use

This model is designed for:

- Creating pixel-dense feature maps from DINOv2 features
- Dense prediction tasks (semantic segmentation, depth estimation, etc.)
- Feature visualization and analysis
- Research on vision foundation models

## Limitations

- Optimized specifically for DINOv2-S/14 features; may not generalize to other backbones without retraining
- Performance depends on the quality of the underlying DINOv2 features
- Higher iteration counts increase computation time
- DINOv2 uses a patch size of 14, so 14x upsampling is required to make pixel-dense features. UPLiFT with 4 iterations performs 16x upsampling, slightly over-sampling the features. If exactly pixel-dense features are required, we recommend downsampling these over-sampled features to the correct size with bilinear or bicubic interpolation.

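The correction recommended in the last bullet is a single call. A minimal sketch using `torch.nn.functional.interpolate`, assuming a 448x448 input and the default 4 iterations (so a 512x512 UPLiFT output):

```python
import torch
import torch.nn.functional as F

# UPLiFT output for a 448x448 image: 32x32 tokens upsampled 16x -> 512x512.
features = torch.randn(1, 384, 512, 512)

# Downsample to exactly pixel-dense resolution (448x448, i.e. 14x per patch).
pixel_dense = F.interpolate(features, size=(448, 448),
                            mode='bilinear', align_corners=False)
print(pixel_dense.shape)  # torch.Size([1, 384, 448, 448])
```

Swap `mode='bilinear'` for `mode='bicubic'` if you prefer bicubic interpolation, as the bullet above also suggests.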

## Citation

If you use UPLiFT in your research, please cite our paper.

[citation coming soon]

## Acknowledgements

This work builds upon:
- [DINOv2](https://github.com/facebookresearch/dinov2) by Meta AI
- [timm](https://github.com/huggingface/pytorch-image-models) for model loading
uplift_dinov2-s14.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:eb6ecab99ec684c5dff3a9319f9a5811755c45b59b1ad74f513b762404467031
size 3170708