# COP-GEN Variational Autoencoders
This repository contains the suite of modality-specific KL-regularised VAEs used in COP-GEN. Each VAE encodes a distinct Copernicus modality (or band group) into a shared 8-channel latent space. These VAEs are prerequisites for both COP-GEN training and inference: the diffusion backbone operates on the latents they produce.
## Model Details
- **Developed by:** Miguel Espinosa, Eva Gmelich Meijling, Valerio Marsocci, Elliot J. Crowley, Mikolaj Czerkawski
- **Model type:** KL-regularised Variational Autoencoder (per-modality)
- **License:** CC-BY-4.0
- **Paper:** [arXiv:2603.03239](https://arxiv.org/abs/2603.03239)
- **Repository:** [github.com/miquel-espinosa/COP-GEN](https://github.com/miquel-espinosa/COP-GEN)
## Included VAEs
| Modality | Bands | Input Resolution | Latent Channels |
|---|---|---|---|
| DEM | DEM | 64×64 | 8 |
| LULC | LULC | 192×192 | 8 |
| S1RTC | VV, VH | 192×192 | 8 |
| S2L1C | B02, B03, B04, B08 | 192×192 | 8 |
| S2L1C | B05, B06, B07, B8A, B11, B12 | 96×96 | 8 |
| S2L1C | B01, B09, B10 | 32×32 | 8 |
| S2L1C | Cloud mask | 192×192 | 8 |
| S2L2A | B02, B03, B04, B08 | 192×192 | 8 |
| S2L2A | B05, B06, B07, B8A, B11, B12 | 96×96 | 8 |
| S2L2A | B01, B09 | 32×32 | 8 |
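For orientation, each entry above maps to a latent tensor with 8 channels over a spatially downsampled grid. This card does not state the downsampling factor; the sketch below assumes `f=8` (a common choice for KL VAEs, consistent with the `H/f, W/f` comment in the usage example further down) and is illustrative only.

```python
def latent_shape(height, width, channels=8, f=8):
    """Shape of the latent tensor for one image: (channels, H/f, W/f).

    f=8 is an ASSUMED downsampling factor, not confirmed by this card."""
    return (channels, height // f, width // f)

# Example shapes for three of the modalities in the table above
for name, size in [("S1RTC", 192), ("DEM", 64), ("S2L2A B05-B12 group", 96)]:
    print(name, latent_shape(size, size))
```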
## How to Get Started

Download all VAEs into the expected directory:

```sh
git clone https://huggingface.co/mespinosami/copgen-vaes ./models/vae
rm -rf ./models/vae/.git ./models/vae/.gitattributes
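After downloading, a quick sanity check is to list the EMA checkpoints (the filenames follow the `model-<step>-ema.pt` pattern described below). This helper is illustrative and not part of the repository:

```python
from pathlib import Path

def find_ema_checkpoints(root):
    """List EMA checkpoints (model-<step>-ema.pt) under a VAE directory tree.

    Illustrative helper, not part of the repository."""
    return sorted(str(p) for p in Path(root).rglob("model-*-ema.pt"))
```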
Each VAE checkpoint is stored as `model-<step>-ema.pt` alongside its config file. The EMA weights have already been extracted into the format expected for inference. To use a VAE directly:
```python
from libs.vae import load_vae

vae = load_vae(
    config_path="configs/vae/final/S2L2A/copgen_ae_kl_192x192_S2L2A_B4_3_2_8_latent_8.yaml",
    checkpoint_path="models/vae/S2L2A_192x192_B4_3_2_8_latent_8/model-50-ema.pt",
)
latents = vae.encode(image_tensor)  # (B, 8, H/f, W/f)
recon = vae.decode(latents)
```
See the GitHub README for full encoding instructions for each modality.
## Training Details
Each VAE is trained independently on its respective modality. Inputs are normalised to [-1, 1] using precomputed per-modality min-max statistics, which are included in the config files; Sentinel-2 data additionally uses a fixed scale factor of 1/1000. Training uses the `accelerate` launcher and supports both single- and multi-GPU setups.
```sh
# Example: train the S2L2A RGB+NIR VAE
accelerate launch --num_processes 1 train_vae.py \
    --cfg configs/vae/final/S2L2A/copgen_ae_kl_192x192_S2L2A_B4_3_2_8_latent_8.yaml \
    --data_dir ./data/majorTOM/edinburgh/Core-S2L2A
```
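The min-max normalisation described above can be sketched in a few lines. This is a pure-Python illustration, not the repository's implementation: `lo`/`hi` stand in for the per-modality statistics stored in the config files, and it assumes the 1/1000 scale factor is applied before normalisation.

```python
def normalise_minmax(x, lo, hi):
    """Map a value from [lo, hi] to [-1, 1], clipping out-of-range inputs.

    lo/hi stand in for the per-modality min-max statistics (an assumption)."""
    x = min(max(x, lo), hi)
    return 2.0 * (x - lo) / (hi - lo) - 1.0

# Sentinel-2 digital numbers are scaled by the fixed factor of 1/1000 first
for dn in (0.0, 500.0, 1000.0):
    print(normalise_minmax(dn / 1000.0, 0.0, 1.0))  # prints -1.0, 0.0, 1.0
```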
After training, extract the EMA weights before using the checkpoint with COP-GEN:

```sh
python3 scripts/extract_ema_convert_model.py models/vae/<modality>/<config>/model-*.pt
```
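The extracted checkpoint holds exponentially averaged weights. The averaging rule itself is the standard EMA update, shown here as a minimal sketch over plain dicts; this is not the repository's extraction code, which operates on full PyTorch state dicts:

```python
def ema_update(ema_params, model_params, decay=0.999):
    """One EMA step per parameter: ema <- decay * ema + (1 - decay) * model."""
    return {k: decay * v + (1.0 - decay) * model_params[k]
            for k, v in ema_params.items()}

# Toy example: the EMA drifts toward the current weights at a rate set by decay
ema = {"w": 0.0}
for _ in range(3):
    ema = ema_update(ema, {"w": 1.0}, decay=0.5)
print(ema["w"])  # 0.875
```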
## Relationship to COP-GEN
These VAEs are used in two ways:

- **Offline encoding:** the full training dataset is encoded into LMDB latent stores, on which the COP-GEN diffusion backbone then trains.
- **Inference decoding:** at generation time, COP-GEN produces latents that each VAE decodes back into pixel-space imagery.
The COP-GEN diffusion model is available at [mespinosami/copgen-base](https://huggingface.co/mespinosami/copgen-base).
## Citation

```bibtex
@article{copgen2026,
  title   = {COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data},
  author  = {Espinosa, Miguel and Gmelich Meijling, Eva and Marsocci, Valerio and Crowley, Elliot J. and Czerkawski, Mikolaj},
  year    = {2026},
  journal = {arXiv preprint arXiv:2603.03239},
  url     = {https://arxiv.org/abs/2603.03239},
}
```

