lvjiameng committed (verified)
Commit: fbe038c · Parent: 166abe8

Upload 3 files
huge/logs/events.out.tfevents.1740460581.zjlab-23.1144.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6fb1cb23ab241058b3731fbe3b9685a71949f8d38fef69cc66ef0ff4e6d58a04
+ size 71558
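The three `+` lines above are a standard Git LFS pointer: the actual event file lives in LFS storage, and the repository only tracks its spec version, SHA-256 object id, and byte size. A minimal sketch of parsing such a pointer into its fields (the `parse_lfs_pointer` helper is hypothetical, not part of this repo):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its space-separated key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The pointer content added in this commit, verbatim
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6fb1cb23ab241058b3731fbe3b9685a71949f8d38fef69cc66ef0ff4e6d58a04
size 71558"""

info = parse_lfs_pointer(pointer)
```

Fetching the real object requires `git lfs pull` (or the Hub's resolve endpoint); the pointer alone is only 3 lines regardless of the object's size.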
huge/model.txt ADDED
@@ -0,0 +1,540 @@
+ MaskedAutoEncoderViT(
+   (patch_embed): PatchEmbed(
+     (proj): Conv2d(3, 1536, kernel_size=(14, 14), stride=(14, 14))
+     (norm): Identity()
+   )
+   (blocks): ModuleList(
+     (0-23): 24 x Block(
+       (norm1): LayerNorm((1536,), eps=1e-06, elementwise_affine=True)
+       (attn): Attention(
+         (qkv): Linear(in_features=1536, out_features=4608, bias=True)
+         (attn_drop): Dropout(p=0.0, inplace=False)
+         (proj): Linear(in_features=1536, out_features=1536, bias=True)
+         (proj_drop): Dropout(p=0.0, inplace=False)
+       )
+       (ls1): Identity()
+       (drop_path1): Identity()
+       (norm2): LayerNorm((1536,), eps=1e-06, elementwise_affine=True)
+       (mlp): Mlp(
+         (fc1): Linear(in_features=1536, out_features=6144, bias=True)
+         (act): GELU(approximate=none)
+         (drop1): Dropout(p=0.0, inplace=False)
+         (fc2): Linear(in_features=6144, out_features=1536, bias=True)
+         (drop2): Dropout(p=0.0, inplace=False)
+       )
+       (ls2): Identity()
+       (drop_path2): Identity()
+     )
+   )
+   (norm): LayerNorm((1536,), eps=1e-06, elementwise_affine=True)
+   (decoder_embed): Linear(in_features=1536, out_features=512, bias=True)
+   (decoder_blocks): ModuleList(
+     (0): Block(
+       (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
+       (attn): Attention(
+         (qkv): Linear(in_features=512, out_features=1536, bias=True)
+         (attn_drop): Dropout(p=0.0, inplace=False)
+         (proj): Linear(in_features=512, out_features=512, bias=True)
+         (proj_drop): Dropout(p=0.0, inplace=False)
+       )
+       (ls1): Identity()
+       (drop_path1): Identity()
+       (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
+       (mlp): Mlp(
+         (fc1): Linear(in_features=512, out_features=2048, bias=True)
+         (act): GELU(approximate=none)
+         (drop1): Dropout(p=0.0, inplace=False)
+         (fc2): Linear(in_features=2048, out_features=512, bias=True)
+         (drop2): Dropout(p=0.0, inplace=False)
+       )
+       (ls2): Identity()
+       (drop_path2): Identity()
+     )
+   )
+   (decoder_norm): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
+   (decoder_pred): Linear(in_features=512, out_features=588, bias=True)
+ )
+ Namespace(batch_size=64, blr=0.0002, dataset='/mnt/wangz/dataset/astroimgnet', device='cuda', dist_backend='nccl', dist_url='env://', distributed=True, epochs=800, gpu=0, lr=0.0032, mask_ratio=0.75, min_lr=0.0, model_name='mae_vit_huge_patch14', norm_pix_loss=True, rank=0, resume='./run/huge-data1m-epochs800-mask0.75-0.01-3/weights/best/epoch_363_loss_0.7427/ckpt.pth', save_dir='./run/huge-data1m-epochs800-mask0.75-0.01-4', start_epoch=0, use_amp=True, warmup_epochs=40, weight_decay=0.05, workers=32, world_size=64)
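The Namespace records both `blr=0.0002` and `lr=0.0032`. These are consistent with the linear learning-rate scaling rule commonly used for MAE pre-training, `lr = blr * effective_batch_size / 256`, given `batch_size=64` per process across `world_size=64` processes. A minimal check, assuming that scaling rule is what produced the logged `lr`:

```python
# Linear LR scaling rule (assumed): lr = blr * eff_batch_size / 256
blr = 0.0002                 # base learning rate from the Namespace
batch_size = 64              # per-process batch size
world_size = 64              # number of distributed processes
eff_batch_size = batch_size * world_size  # 64 * 64 = 4096
lr = blr * eff_batch_size / 256
print(lr)  # matches the Namespace value lr=0.0032
```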
huge/weights/best/epoch_778_loss_0.7222/ckpt.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d4931630ae9744cc3b8227ba6c29cf88ccbfb63a474a3c4374251834fa319f36
+ size 8223807155
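For scale, the encoder shapes printed in `model.txt` pin down the per-block parameter count, and a rough accounting gets close to the 8,223,807,155-byte checkpoint. The exact contents of `ckpt.pth` are an assumption here: an fp32 model state dict plus AdamW's two fp32 moment buffers (three 4-byte copies per parameter) lands in the right ballpark:

```python
# Parameter count of one encoder Block, from the printed layer shapes
d, qkv_out, mlp_hidden = 1536, 4608, 6144
qkv   = d * qkv_out + qkv_out          # Linear(1536 -> 4608), with bias
proj  = d * d + d                      # Linear(1536 -> 1536), with bias
fc1   = d * mlp_hidden + mlp_hidden    # Linear(1536 -> 6144), with bias
fc2   = mlp_hidden * d + d             # Linear(6144 -> 1536), with bias
norms = 2 * (2 * d)                    # two LayerNorms, weight + bias each
per_block = qkv + proj + fc1 + fc2 + norms   # 28,331,520 parameters

encoder_blocks = 24 * per_block              # ~680M parameters in the 24 blocks
# Assumed breakdown: fp32 weights + AdamW exp_avg + exp_avg_sq = 3 copies, 4 B each
approx_bytes = encoder_blocks * 3 * 4        # ~8.16e9 B, near the 8.22 GB file
```

The small remainder (patch embedding, positional embeddings, the 512-dim decoder, and non-tensor checkpoint metadata) plausibly accounts for the gap to the recorded size.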