
COP-GEN Variational Autoencoders


This repository contains the suite of modality-specific KL-regularised VAEs used in COP-GEN. Each VAE encodes a distinct Copernicus modality (or band group) into a shared 8-channel latent space. The VAEs are prerequisites for both COP-GEN training and inference, since the diffusion backbone operates on the latents they produce.

Model Details

  • Developed by: Miguel Espinosa, Eva Gmelich Meijling, Valerio Marsocci, Elliot J. Crowley, Mikolaj Czerkawski
  • Model type: KL-regularised Variational Autoencoder (per-modality)
  • License: CC-BY-4.0
  • Paper: arXiv:2603.03239
  • Repository: github.com/miquel-espinosa/COP-GEN

Included VAEs

| Modality | Bands | Input Resolution (px) | Latent Channels |
|----------|-------|-----------------------|-----------------|
| DEM      | DEM                          | 64×64   | 8 |
| LULC     | LULC                         | 192×192 | 8 |
| S1RTC    | VV, VH                       | 192×192 | 8 |
| S2L1C    | B02, B03, B04, B08           | 192×192 | 8 |
| S2L1C    | B05, B06, B07, B8A, B11, B12 | 96×96   | 8 |
| S2L1C    | B01, B09, B10                | 32×32   | 8 |
| S2L1C    | Cloud mask                   | 192×192 | 8 |
| S2L2A    | B02, B03, B04, B08           | 192×192 | 8 |
| S2L2A    | B05, B06, B07, B8A, B11, B12 | 96×96   | 8 |
| S2L2A    | B01, B09                     | 32×32   | 8 |


How to Get Started

Download all VAEs into the expected directory:

git clone https://huggingface.co/mespinosami/copgen-vaes ./models/vae
rm -rf ./models/vae/.git ./models/vae/.gitattributes
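
Alternatively, the huggingface_hub Python package can fetch the same files (a sketch; local_dir mirrors the ./models/vae layout produced by the commands above):

from huggingface_hub import snapshot_download

# Download every VAE checkpoint and config into the directory COP-GEN expects
snapshot_download(repo_id="mespinosami/copgen-vaes", local_dir="./models/vae")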

Each VAE checkpoint is stored as model-<step>-ema.pt alongside its config file. The EMA weights have already been extracted into the correct format for inference. To use a VAE directly:

from libs.vae import load_vae

vae = load_vae(
    config_path="configs/vae/final/S2L2A/copgen_ae_kl_192x192_S2L2A_B4_3_2_8_latent_8.yaml",
    checkpoint_path="models/vae/S2L2A_192x192_B4_3_2_8_latent_8/model-50-ema.pt"
)

latents = vae.encode(image_tensor)   # (B, 8, H/f, W/f), f = spatial downsampling factor
recon   = vae.decode(latents)        # back to pixel space, (B, C, H, W)
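
As a quick smoke test of a loaded VAE (a sketch assuming the S2L2A RGB+NIR model above, which takes 4-band 192×192 inputs in [-1, 1]; the random tensor is a stand-in for a real tile):

import torch

# Dummy batch: 1 tile, 4 bands (B02, B03, B04, B08), 192×192 px, values in [-1, 1]
image_tensor = torch.rand(1, 4, 192, 192) * 2 - 1

with torch.no_grad():
    latents = vae.encode(image_tensor)
    recon = vae.decode(latents)

print(latents.shape, recon.shape)  # 8 latent channels; recon matches the input shape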

See the GitHub README for full encoding instructions for each modality.

Training Details

Each VAE is trained independently on its respective modality. Inputs are normalised to [-1, 1] using precomputed per-modality min-max statistics (included in the config files). Sentinel-2 data uses a fixed scale factor of 1/1000. Training uses the accelerate launcher and supports single- and multi-GPU setups.
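
A minimal sketch of that preprocessing (illustrative only; the actual vmin/vmax statistics live in the config files, and raw_s2 here is a hypothetical raw-value tensor):

import torch

def normalise_minmax(x: torch.Tensor, vmin: float, vmax: float) -> torch.Tensor:
    # Clamp to the per-modality range, then map linearly onto [-1, 1]
    x = x.clamp(vmin, vmax)
    return 2.0 * (x - vmin) / (vmax - vmin) - 1.0

s2_scaled = raw_s2 / 1000.0                       # Sentinel-2: fixed 1/1000 scale factor first
s2_input = normalise_minmax(s2_scaled, 0.0, 3.0)  # placeholder min-max bounds, not the real stats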

# Example: train the S2L2A RGB+NIR VAE
accelerate launch --num_processes 1 train_vae.py \
    --cfg configs/vae/final/S2L2A/copgen_ae_kl_192x192_S2L2A_B4_3_2_8_latent_8.yaml \
    --data_dir ./data/majorTOM/edinburgh/Core-S2L2A
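
A multi-GPU run only changes the launcher flags (standard accelerate usage; the process count below is an example):

# Example: the same run across 4 GPUs
accelerate launch --multi_gpu --num_processes 4 train_vae.py \
    --cfg configs/vae/final/S2L2A/copgen_ae_kl_192x192_S2L2A_B4_3_2_8_latent_8.yaml \
    --data_dir ./data/majorTOM/edinburgh/Core-S2L2A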

After training, extract the EMA weights before use with COP-GEN:

python3 scripts/extract_ema_convert_model.py models/vae/<modality>/<config>/model-*.pt
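
To process every trained VAE at once, a plain shell loop over the config directories works (assuming the models/vae/<modality>/<config>/ layout used above):

for cfg_dir in models/vae/*/*/; do
    python3 scripts/extract_ema_convert_model.py "${cfg_dir}"model-*.pt
done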

Relationship to COP-GEN

These VAEs are used in two ways:

  1. Offline encoding — the full training dataset is encoded into LMDB latent stores, which the COP-GEN diffusion backbone then trains on (see the sketch after this list).
  2. Inference decoding — at generation time, COP-GEN produces latents that each VAE decodes back into pixel-space imagery.
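
A minimal sketch of the offline-encoding step (the store path and key layout are illustrative, not COP-GEN's actual LMDB schema):

import io
import lmdb
import torch

env = lmdb.open("./data/latents/S2L2A", map_size=1 << 40)  # generous address space for the store
with env.begin(write=True) as txn:
    for i, tile in enumerate(tiles):           # tiles: iterable of (C, H, W) tensors in [-1, 1]
        with torch.no_grad():
            z = vae.encode(tile.unsqueeze(0))  # (1, 8, h, w) latent
        buf = io.BytesIO()
        torch.save(z.squeeze(0).cpu(), buf)    # serialise the latent tensor to bytes
        txn.put(f"{i:08d}".encode(), buf.getvalue())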

The COP-GEN diffusion model is available at mespinosami/copgen-base.

Citation

@article{copgen2026,
    title   = {COP-GEN: Latent Diffusion Transformer for Copernicus Earth
               Observation Data},
    author  = {Espinosa, Miguel and Gmelich Meijling, Eva and Marsocci,
               Valerio and Crowley, Elliot J. and Czerkawski, Mikolaj},
    year    = {2026},
    journal = {arXiv preprint arXiv:2603.03239},
    url     = {https://arxiv.org/abs/2603.03239},
}