# COP-GEN Variational Autoencoders
This repository contains the suite of modality-specific KL-regularised VAEs used in COP-GEN. Each VAE encodes a distinct Copernicus modality (or band group) into a shared 8-channel latent space. These VAEs are prerequisites for both COP-GEN training and inference: the diffusion backbone operates on the latents they produce.
## Model Details
- **Developed by:** Miguel Espinosa, Eva Gmelich Meijling, Valerio Marsocci, Elliot J. Crowley, Mikolaj Czerkawski
- **Model type:** KL-regularised Variational Autoencoder (per-modality)
- **License:** CC-BY-4.0
- **Paper:** [arXiv:2603.03239](https://arxiv.org/abs/2603.03239)
- **Repository:** [github.com/miquel-espinosa/COP-GEN](https://github.com/miquel-espinosa/COP-GEN)
## Included VAEs
| Modality | Bands | Input Resolution | Latent Channels |
|---|---|---|---|
| DEM | DEM | 64×64 | 8 |
| LULC | LULC | 192×192 | 8 |
| S1RTC | VV, VH | 192×192 | 8 |
| S2L1C | B02, B03, B04, B08 | 192×192 | 8 |
| S2L1C | B05, B06, B07, B8A, B11, B12 | 96×96 | 8 |
| S2L1C | B01, B09, B10 | 32×32 | 8 |
| S2L1C | Cloud mask | 192×192 | 8 |
| S2L2A | B02, B03, B04, B08 | 192×192 | 8 |
| S2L2A | B05, B06, B07, B8A, B11, B12 | 96×96 | 8 |
| S2L2A | B01, B09 | 32×32 | 8 |
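For orientation, each entry above maps to a latent tensor with 8 channels over a spatially downsampled grid. This card does not state the downsampling factor; the sketch below assumes `f=8` (a common choice for KL VAEs, consistent with the `H/f, W/f` comment in the usage example further down) and is illustrative only.

```python
def latent_shape(height, width, channels=8, f=8):
    """Shape of the latent tensor for one image: (channels, H/f, W/f).

    f=8 is an ASSUMED downsampling factor, not confirmed by this card."""
    return (channels, height // f, width // f)

# Example shapes for three of the modalities in the table above
for name, size in [("S1RTC", 192), ("DEM", 64), ("S2L2A B05-B12 group", 96)]:
    print(name, latent_shape(size, size))
```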
## How to Get Started

Download all VAEs into the expected directory:

```sh
git clone https://huggingface.co/mespinosami/copgen-vaes ./models/vae
rm -rf ./models/vae/.git ./models/vae/.gitattributes
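After downloading, a quick sanity check is to list the EMA checkpoints (the filenames follow the `model-<step>-ema.pt` pattern described below). This helper is illustrative and not part of the repository:

```python
from pathlib import Path

def find_ema_checkpoints(root):
    """List EMA checkpoints (model-<step>-ema.pt) under a VAE directory tree.

    Illustrative helper, not part of the repository."""
    return sorted(str(p) for p in Path(root).rglob("model-*-ema.pt"))
```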
Each VAE checkpoint is stored as `model-<step>-ema.pt` alongside its config file. The EMA weights have already been extracted into the format expected for inference. To use a VAE directly:
```python
from libs.vae import load_vae

vae = load_vae(
    config_path="configs/vae/final/S2L2A/copgen_ae_kl_192x192_S2L2A_B4_3_2_8_latent_8.yaml",
    checkpoint_path="models/vae/S2L2A_192x192_B4_3_2_8_latent_8/model-50-ema.pt",
)
latents = vae.encode(image_tensor)  # (B, 8, H/f, W/f)
recon = vae.decode(latents)
```
See the GitHub README for full encoding instructions for each modality.
## Training Details
Each VAE is trained independently on its respective modality. Inputs are normalised to [-1, 1] using precomputed per-modality min-max statistics, which are included in the config files; Sentinel-2 data additionally uses a fixed scale factor of 1/1000. Training uses the `accelerate` launcher and supports both single- and multi-GPU setups.
```sh
# Example: train the S2L2A RGB+NIR VAE
accelerate launch --num_processes 1 train_vae.py \
    --cfg configs/vae/final/S2L2A/copgen_ae_kl_192x192_S2L2A_B4_3_2_8_latent_8.yaml \
    --data_dir ./data/majorTOM/edinburgh/Core-S2L2A
```
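The min-max normalisation described above can be sketched in a few lines. This is a pure-Python illustration, not the repository's implementation: `lo`/`hi` stand in for the per-modality statistics stored in the config files, and it assumes the 1/1000 scale factor is applied before normalisation.

```python
def normalise_minmax(x, lo, hi):
    """Map a value from [lo, hi] to [-1, 1], clipping out-of-range inputs.

    lo/hi stand in for the per-modality min-max statistics (an assumption)."""
    x = min(max(x, lo), hi)
    return 2.0 * (x - lo) / (hi - lo) - 1.0

# Sentinel-2 digital numbers are scaled by the fixed factor of 1/1000 first
for dn in (0.0, 500.0, 1000.0):
    print(normalise_minmax(dn / 1000.0, 0.0, 1.0))  # prints -1.0, 0.0, 1.0
```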
After training, extract the EMA weights before using the checkpoint with COP-GEN:

```sh
python3 scripts/extract_ema_convert_model.py models/vae/<modality>/<config>/model-*.pt
```
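The extracted checkpoint holds exponentially averaged weights. The averaging rule itself is the standard EMA update, shown here as a minimal sketch over plain dicts; this is not the repository's extraction code, which operates on full PyTorch state dicts:

```python
def ema_update(ema_params, model_params, decay=0.999):
    """One EMA step per parameter: ema <- decay * ema + (1 - decay) * model."""
    return {k: decay * v + (1.0 - decay) * model_params[k]
            for k, v in ema_params.items()}

# Toy example: the EMA drifts toward the current weights at a rate set by decay
ema = {"w": 0.0}
for _ in range(3):
    ema = ema_update(ema, {"w": 1.0}, decay=0.5)
print(ema["w"])  # 0.875
```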
## Relationship to COP-GEN
These VAEs are used in two ways:

- **Offline encoding:** the full training dataset is encoded into LMDB latent stores, on which the COP-GEN diffusion backbone then trains.
- **Inference decoding:** at generation time, COP-GEN produces latents that each VAE decodes back into pixel-space imagery.
The COP-GEN diffusion model is available at [mespinosami/copgen-base](https://huggingface.co/mespinosami/copgen-base).
## Citation

```bibtex
@article{copgen2026,
  title   = {COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data},
  author  = {Espinosa, Miguel and Gmelich Meijling, Eva and Marsocci, Valerio and Crowley, Elliot J. and Czerkawski, Mikolaj},
  year    = {2026},
  journal = {arXiv preprint arXiv:2603.03239},
  url     = {https://arxiv.org/abs/2603.03239},
}
```

