Commit fa00a21 (verified, 0 parents) by AniAggarwal: Initial commit.
.gitattributes ADDED
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Gigi_2_448.png filter=lfs diff=lfs merge=lfs -text
Gigi_2_448.png_uplift_dinov2-s14-4-PCA.png filter=lfs diff=lfs merge=lfs -text
Gigi_2_448.png ADDED

Git LFS Details

  • SHA256: 6e65f5c45baa89361fc61ed2656e2cbd16ebb6dc49c14c8892e57c62546494e0
  • Pointer size: 131 Bytes
  • Size of remote file: 341 kB
Gigi_2_448.png_uplift_dinov2-s14-4-PCA.png ADDED

Git LFS Details

  • SHA256: f03e5015988d6b9a65180448f45d8f8a2185f46a51cbf9182c95b16963421071
  • Pointer size: 131 Bytes
  • Size of remote file: 157 kB
Gigi_2_448.png_uplift_dinov2-s14-base-feature-PCA.png ADDED
README.md ADDED
---
license: mit
library_name: pytorch
tags:
- feature-upsampling
- pixel-dense-features
- computer-vision
- dinov2
- vision-transformer
- uplift
datasets:
- ILSVRC/imagenet-1k
---

# UPLiFT for DINOv2-S/14

| Input Image | Base DINOv2 Features | UPLiFT Upsampled Features |
|:-----------:|:--------------------:|:-------------------------:|
| ![Input](Gigi_2_448.png) | ![Base Features](Gigi_2_448.png_uplift_dinov2-s14-base-feature-PCA.png) | ![UPLiFT Features](Gigi_2_448.png_uplift_dinov2-s14-4-PCA.png) |

This is the official pretrained **UPLiFT** (Efficient Pixel-Dense Feature Upsampling with Local Attenders) model for the **DINOv2-S/14** backbone.

UPLiFT is a lightweight method to upscale features from pretrained vision backbones to create pixel-dense feature maps. It uses Local Attenders to efficiently upsample low-resolution backbone features while preserving semantic information.

## Model Details

| Property | Value |
|----------|-------|
| **Backbone** | DINOv2-S/14 (`vit_small_patch14_dinov2.lvd142m`) |
| **Backbone Channels** | 384 |
| **Patch Size** | 14 |
| **Upsampling Factor** | 2x per iteration |
| **Local Attender Size** | N=17 |
| **Training Dataset** | ImageNet |
| **Training Image Size** | 448x448 |
| **License** | MIT |

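A quick sanity check on the numbers above (a back-of-the-envelope sketch assuming the listed defaults: 448x448 input, patch size 14, 2x upsampling per iteration, 4 iterations):

```python
# Resolution arithmetic for UPLiFT on DINOv2-S/14 (document defaults).
image_size = 448      # training image size
patch_size = 14       # DINOv2 patch size
iters = 4             # default number of upsampling iterations
factor = 2            # 2x upsampling per iteration

base_grid = image_size // patch_size      # backbone token grid: 32x32
upsampled = base_grid * factor ** iters   # after 4 iterations: 512x512

print(base_grid, upsampled)  # 32 512
```

Note that the 512x512 result slightly exceeds the 448x448 input; see the Limitations section for the recommended correction.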
## Links

- **Paper**: [Coming Soon]
- **GitHub**: [https://github.com/mwalmer-umd/UPLiFT](https://github.com/mwalmer-umd/UPLiFT)
- **Project Website**: [https://www.cs.umd.edu/~mwalmer/uplift/](https://www.cs.umd.edu/~mwalmer/uplift/)

## Installation

```bash
pip install 'uplift[vit] @ git+https://github.com/mwalmer-umd/UPLiFT.git'
```

## Quick Start

```python
import torch
from PIL import Image

# Load model (weights auto-download from HuggingFace)
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14')

# Run inference
image = Image.open('your_image.jpg')
features = model(image)  # Returns pixel-dense features
```

## Usage Options

### Adjust Upsampling Iterations

Control the number of iterative upsampling steps (default: 4):

```python
# Fewer iterations = lower memory usage
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14', iters=2)
```

### Raw UPLiFT Model (Without Backbone)

Load only the UPLiFT upsampling module without the DINOv2 backbone:

```python
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14',
                       include_extractor=False)
```

### Return Base Features

Get both upsampled and original backbone features:

```python
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14',
                       return_base_feat=True)
upsampled_features, base_features = model(image)
```

## Architecture

UPLiFT consists of:

1. **Encoder**: Processes the input image with a series of convolutional blocks, producing dense representations that guide feature upsampling
2. **Decoder**: Upsamples features using transposed convolutions with bilinear residual connections
3. **Local Attender**: A local-neighborhood attention pooling module that keeps the upsampled features semantically consistent with the original features

The model uses encoder sharing: a single encoder pass serves all upsampling iterations, which keeps inference efficient.
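The loop structure described above can be illustrated with a toy sketch. Everything here is an illustrative stand-in, not the actual UPLiFT implementation: the module names, layer choices, and channel counts are invented for the example, and a simple transposed convolution replaces the Local Attender. What it does show is the encoder-sharing pattern: the encoder runs once, and its output guides every 2x decoder step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyIterativeUpsampler(nn.Module):
    """Illustrative stand-in for UPLiFT's loop structure (NOT the real model)."""
    def __init__(self, feat_ch=64, guide_ch=32):
        super().__init__()
        # Encoder: run ONCE per image to produce dense guidance features.
        self.encoder = nn.Conv2d(3, guide_ch, kernel_size=3, padding=1)
        # Decoder: one learned 2x upsampling step, reused every iteration.
        self.decoder = nn.ConvTranspose2d(feat_ch + guide_ch, feat_ch,
                                          kernel_size=2, stride=2)

    def forward(self, image, feats, iters=4):
        guide = self.encoder(image)  # single shared encoder pass
        for _ in range(iters):
            # Match the guidance map to the current feature resolution.
            g = F.interpolate(guide, size=feats.shape[-2:],
                              mode='bilinear', align_corners=False)
            up = self.decoder(torch.cat([feats, g], dim=1))
            # Bilinear residual connection around the learned step.
            feats = up + F.interpolate(feats, scale_factor=2,
                                       mode='bilinear', align_corners=False)
        return feats

image = torch.randn(1, 3, 224, 224)  # 224x224 -> 16x16 DINOv2-S/14 token grid
feats = torch.randn(1, 64, 16, 16)   # toy channel count (real backbone: 384)
out = ToyIterativeUpsampler()(image, feats)
print(out.shape)  # torch.Size([1, 64, 256, 256])
```

Each pass through the loop doubles the spatial resolution, so four iterations turn the 16x16 token grid into a 256x256 map while the encoder runs only once.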

## Intended Use

This model is designed for:

- Creating pixel-dense feature maps from DINOv2 features
- Dense prediction tasks (semantic segmentation, depth estimation, etc.)
- Feature visualization and analysis
- Research on vision foundation models

## Limitations

- Optimized specifically for DINOv2-S/14 features; may not generalize to other backbones without retraining
- Performance depends on the quality of the underlying DINOv2 features
- Higher iteration counts increase computation time
- DINOv2 uses a patch size of 14, so 14x upsampling is required to make pixel-dense features. UPLiFT with 4 iterations performs 16x upsampling, slightly over-sampling the features. If exactly pixel-dense features are required, we recommend downsampling these over-sampled features to the correct size with bilinear or bicubic interpolation.

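The correction recommended in the last bullet is a single call. A minimal sketch using `torch.nn.functional.interpolate`, assuming a 448x448 input and the default 4 iterations (so a 512x512 UPLiFT output):

```python
import torch
import torch.nn.functional as F

# UPLiFT output for a 448x448 image: 32x32 tokens upsampled 16x -> 512x512.
features = torch.randn(1, 384, 512, 512)

# Downsample to exactly pixel-dense resolution (448x448, i.e. 14x per patch).
pixel_dense = F.interpolate(features, size=(448, 448),
                            mode='bilinear', align_corners=False)
print(pixel_dense.shape)  # torch.Size([1, 384, 448, 448])
```

Swap `mode='bilinear'` for `mode='bicubic'` if you prefer bicubic interpolation, as the bullet above also suggests.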

## Citation

If you use UPLiFT in your research, please cite our paper.

[citation coming soon]

## Acknowledgements

This work builds upon:
- [DINOv2](https://github.com/facebookresearch/dinov2) by Meta AI
- [timm](https://github.com/huggingface/pytorch-image-models) for model loading
uplift_dinov2-s14.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:eb6ecab99ec684c5dff3a9319f9a5811755c45b59b1ad74f513b762404467031
size 3170708