---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
language:
- en
tags:
- diffusers
- matfuse
- pbr
- material-generation
- svbrdf
- text-to-image
---

# MatFuse — Controllable Material Generation with Diffusion Models

MatFuse generates tileable PBR material maps (diffuse, normal, roughness, specular) from text, reference images, sketches, and/or color palettes.

> **Paper:** [MatFuse: Controllable Material Generation with Diffusion Models](https://arxiv.org/abs/2308.11408) — CVPR 2024
> **Project page:**

## Quick Start

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "gvecchio/MatFuse",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

result = pipe(
    text="red brick wall",
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(42),
)

result["diffuse"][0].save("diffuse.png")
result["normal"][0].save("normal.png")
result["roughness"][0].save("roughness.png")
result["specular"][0].save("specular.png")
```

## Conditioning Inputs

All conditions are **optional** and freely composable:

| Input | Type | Description |
|-------|------|-------------|
| `text` | `str` | Text description of the material |
| `image` | `PIL.Image` | Reference image for style/appearance |
| `sketch` | `PIL.Image` (grayscale) | Binary edge map for structure |
| `palette` | `list[tuple]` | Up to 5 RGB colour tuples (0–255) |

```python
from PIL import Image

result = pipe(
    image=Image.open("reference.png"),
    text="rough stone texture",
    palette=[(120, 80, 60), (90, 60, 40), (150, 110, 80), (70, 50, 30), (180, 140, 100)],
    num_inference_steps=50,
    guidance_scale=4.0,
)
```

## Architecture

| Component | Class | Key parameters |
|-----------|-------|----------------|
| **UNet** | `UNet2DConditionModel` | in=16, out=12, blocks=[256,512,1024], cross_attn=512 |
| **VAE** | `MatFuseVQModel` (custom) | 4 encoders + 4 VQ codebooks (4096×3), shared decoder, f=8 |
| **Scheduler** | `DDIMScheduler` | β 0.0015–0.0195, scaled_linear, ε-prediction |
| **Conditioning** | `MultiConditionEncoder` (custom) | CLIP ViT-B/16 · sentence-transformers · palette MLP · sketch CNN |

## 📜 Citation

```bibtex
@inproceedings{vecchio2024matfuse,
  author    = {Vecchio, Giuseppe and Sortino, Renato and Palazzo, Simone and Spampinato, Concetto},
  title     = {MatFuse: Controllable Material Generation with Diffusion Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
  pages     = {4429-4438}
}
```

## License

This project is licensed under the MIT License.