---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
language:
- en
tags:
- diffusers
- matfuse
- pbr
- material-generation
- svbrdf
- text-to-image
---


# MatFuse — Controllable Material Generation with Diffusion Models

MatFuse generates tileable PBR material maps (diffuse, normal, roughness,
specular) from text, reference images, sketches, and/or color palettes.

> **Paper:** [MatFuse: Controllable Material Generation with Diffusion Models](https://arxiv.org/abs/2308.11408) — CVPR 2024
> **Project page:** <https://gvecchio.com/matfuse/>

## Quick Start

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "gvecchio/MatFuse",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

result = pipe(
    text="red brick wall",
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(42),
)

result["diffuse"][0].save("diffuse.png")
result["normal"][0].save("normal.png")
result["roughness"][0].save("roughness.png")
result["specular"][0].save("specular.png")
```
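
Because the maps are meant to be tileable, one quick visual check is to repeat a map in a grid and look for seams at the tile boundaries. The helper below is a hypothetical utility that uses only Pillow and is not part of the MatFuse pipeline. It assumes each output map is a `PIL.Image.Image`, which the `.save()` calls above suggest.

```python
from PIL import Image


def tile_preview(texture: Image.Image, nx: int = 2, ny: int = 2) -> Image.Image:
    """Repeat `texture` nx times horizontally and ny times vertically.

    Hypothetical helper: any seams appear along the internal tile boundaries.
    """
    w, h = texture.size
    canvas = Image.new(texture.mode, (w * nx, h * ny))
    for j in range(ny):
        for i in range(nx):
            # Paste with a 2-tuple box, which places the upper-left corner at (x, y).
            canvas.paste(texture, (i * w, j * h))
    return canvas


# Usage with the Quick Start output (assumes `result` from the pipeline call above):
# tile_preview(result["diffuse"][0]).save("diffuse_tiled.png")
```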

## Conditioning Inputs

All conditions are **optional**, and any subset of them can be passed in the same call:

| Input | Type | Description |
|-------|------|-------------|
| `text` | `str` | Text description of the material |
| `image` | `PIL.Image.Image` | Reference image for style/appearance |
| `sketch` | `PIL.Image.Image` (grayscale) | Binary edge map for structure |
| `palette` | `list[tuple[int, int, int]]` | Up to 5 RGB color tuples, each channel in 0–255 |
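
Before you call the pipeline, it can help to normalize inputs to the types listed in the table. The sketch below is a hypothetical pre-processing step and not part of the MatFuse API. `check_palette` enforces the "at most 5 RGB tuples, channels in 0–255" constraint, and `binarize_sketch` thresholds a drawing into a grayscale image that contains only 0 and 255. The threshold of 128 and the edge polarity (whether black or white marks an edge) are assumptions. Check the pipeline code to see what it actually expects.

```python
from PIL import Image


def check_palette(palette):
    """Hypothetical validator: at most 5 RGB tuples, each channel an int in 0-255."""
    if len(palette) > 5:
        raise ValueError(f"expected at most 5 colors, got {len(palette)}")
    cleaned = []
    for color in palette:
        if len(color) != 3 or not all(isinstance(c, int) and 0 <= c <= 255 for c in color):
            raise ValueError(f"invalid RGB tuple: {color!r}")
        cleaned.append(tuple(color))
    return cleaned


def binarize_sketch(sketch: Image.Image, threshold: int = 128) -> Image.Image:
    """Hypothetical helper: convert to grayscale ("L") and map every pixel to 0 or 255."""
    gray = sketch.convert("L")
    # point() builds a 256-entry lookup table from the function for "L" images.
    return gray.point(lambda v: 255 if v >= threshold else 0)
```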

```python
from PIL import Image

result = pipe(
    image=Image.open("reference.png"),
    text="rough stone texture",
    palette=[(120, 80, 60), (90, 60, 40), (150, 110, 80), (70, 50, 30), (180, 140, 100)],
    num_inference_steps=50,
    guidance_scale=4.0,
)
```
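
The palette in the example above is written out by hand. One way to derive it from the reference image is Pillow's median-cut quantization. `extract_palette` is a hypothetical helper and not part of the pipeline. The original work may extract palettes differently.

```python
from PIL import Image


def extract_palette(image: Image.Image, n_colors: int = 5) -> list[tuple[int, int, int]]:
    """Hypothetical helper: up to `n_colors` dominant RGB colors, most frequent first."""
    rgb = image.convert("RGB")
    # For RGB input, Pillow's default quantize method is median cut.
    quantized = rgb.quantize(colors=n_colors)
    flat = quantized.getpalette()  # flat list: [r0, g0, b0, r1, g1, b1, ...]
    # For a "P" image, getcolors() returns (pixel_count, palette_index) pairs.
    counts = quantized.getcolors(maxcolors=256) or []
    counts.sort(reverse=True)  # most frequent palette entries first
    return [tuple(flat[3 * idx : 3 * idx + 3]) for _, idx in counts[:n_colors]]


# Usage (assumes `pipe` from the Quick Start):
# ref = Image.open("reference.png")
# result = pipe(image=ref, palette=extract_palette(ref), num_inference_steps=50)
```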

## Architecture

| Component | Class | Key parameters |
|-----------|-------|----------------|
| **UNet** | `UNet2DConditionModel` | `in_channels=16`, `out_channels=12`, `block_out_channels=(256, 512, 1024)`, `cross_attention_dim=512` |
| **VAE** | `MatFuseVQModel` (custom) | 4 encoders + 4 VQ codebooks (4096×3), shared decoder, f=8 |
| **Scheduler** | `DDIMScheduler` | `beta_start=0.0015`, `beta_end=0.0195`, `beta_schedule="scaled_linear"`, `prediction_type="epsilon"` |
| **Conditioning** | `MultiConditionEncoder` (custom) | CLIP ViT-B/16 · sentence-transformers · palette MLP · sketch CNN |
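
The following worked example makes the VAE and scheduler rows concrete. It relies on several assumptions. The UNet's 12 output channels are read as 4 maps × 3 latent channels each, which matches the four 4096×3 codebooks. `f=8` is treated as the spatial downsampling factor. The 1000 training timesteps are the diffusers `DDIMScheduler` default (`num_train_timesteps`) and are not stated in this card. The 4 additional UNet input channels are not explained here, so they are not modeled. The `scaled_linear` formula follows diffusers' definition: betas are spaced linearly in square-root space and then squared. All function names are hypothetical.

```python
import math


def latent_shape(height: int, width: int, f: int = 8, n_maps: int = 4, ch_per_map: int = 3):
    """Latent (C, H, W) for one material, assuming per-map latents are concatenated."""
    if height % f or width % f:
        raise ValueError("height and width must be divisible by the downsampling factor f")
    return (n_maps * ch_per_map, height // f, width // f)


def scaled_linear_betas(beta_start: float = 0.0015, beta_end: float = 0.0195,
                        num_train_timesteps: int = 1000):
    """'scaled_linear' schedule: linear in sqrt(beta), then squared."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]


def alphas_cumprod(betas):
    """Cumulative product of (1 - beta_t), i.e. alpha-bar_t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


# A 256x256 material is used purely for illustration.
print(latent_shape(256, 256))  # (12, 32, 32)
betas = scaled_linear_betas()
abar = alphas_cumprod(betas)
print(f"beta_1={betas[0]:.4f}, beta_T={betas[-1]:.4f}, alpha_bar_T={abar[-1]:.4f}")
```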

## Citation

```bibtex
@inproceedings{vecchio2024matfuse,
  author    = {Vecchio, Giuseppe and Sortino, Renato and Palazzo, Simone and Spampinato, Concetto},
  title     = {MatFuse: Controllable Material Generation with Diffusion Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
  pages     = {4429--4438}
}
```

## License

This project is licensed under the MIT License.