VibeVoice-1.5B

VibeVoice-1.5B is a text-to-speech (TTS) model hosted on Hugging Face. This repository provides scripts and examples to synthesize speech from text using pre-trained checkpoints.

Repository

Hugging Face model page: technicalheist/vibevoice-1.5b (https://huggingface.co/technicalheist/vibevoice-1.5b)

Requirements

  • Python 3.8+
  • PyTorch (with CUDA support recommended)
  • Transformers
  • FFmpeg (for audio processing)
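Before installing, you can check which of these requirements are already present. The snippet below is a minimal sketch and is not part of the repository. It only inspects the current environment in four ways:

  • It uses importlib to test whether torch and transformers are importable, without importing them.
  • It asks PyTorch whether CUDA is available, but only when PyTorch is installed.
  • It looks for an ffmpeg binary on PATH.
  • It compares the running Python version against the 3.8 minimum listed above.

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict:
    """Report which of the listed requirements are present (sketch; not part of the repo)."""
    report = {
        # Python 3.8+ is the documented minimum.
        "python_ok": sys.version_info >= (3, 8),
        # find_spec returns None when a package cannot be found; it does not import it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "transformers_installed": importlib.util.find_spec("transformers") is not None,
        # FFmpeg is a system binary, so look it up on PATH instead of importing it.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported only when present, so the check itself never fails

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```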

Installation

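Clone the repository and install dependencies. The commands below use Jupyter/Colab notebook syntax: a leading ! runs a shell command, %cd changes the working directory, and the paths assume Colab's /content working directory. In a regular terminal, drop the ! prefix, replace %cd with cd, and adjust the paths.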

# Clone the repository
!git clone https://huggingface.co/technicalheist/vibevoice-1.5b

# Change directory
%cd /content/vibevoice-1.5b

# Install in editable mode
!pip install -e .

# Install ffmpeg for audio handling
!apt update && apt install ffmpeg -y

Usage

Run inference using the provided demo script:

!python /content/vibevoice-1.5b/demo/inference_from_file.py \
  --model_path /content/vibevoice-1.5b \
  --txt_path /content/vibevoice-1.5b/demo/text_examples/1p_abs.txt \
  --speaker_names Alice
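Outside a notebook, the same demo command can be launched from Python. This sketch assumes the repository was cloned to /content/vibevoice-1.5b as in the Installation steps. build_inference_command and run_inference are hypothetical helper names, and only the three flags shown above are passed.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def build_inference_command(repo_dir: str, txt_path: str, speaker_names: Sequence[str]) -> List[str]:
    """Assemble the demo invocation as an argument list (hypothetical helper)."""
    if not speaker_names:
        raise ValueError("at least one speaker name is required")
    script = Path(repo_dir) / "demo" / "inference_from_file.py"
    return [
        sys.executable,  # the interpreter running this code, not whatever "python" resolves to
        str(script),
        "--model_path", str(repo_dir),
        "--txt_path", str(txt_path),
        "--speaker_names", *speaker_names,  # each name becomes a separate argument
    ]


def run_inference(repo_dir: str, txt_path: str, speaker_names: Sequence[str]) -> None:
    # check=True raises CalledProcessError if the demo script exits with a non-zero status.
    subprocess.run(build_inference_command(repo_dir, txt_path, speaker_names), check=True)
```

For example, run_inference("/content/vibevoice-1.5b", "/content/vibevoice-1.5b/demo/text_examples/1p_abs.txt", ["Alice"]) mirrors the command above. Passing arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.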

Arguments

  • --model_path: Local path to the model directory, or a Hugging Face repo ID.
  • --txt_path: Path to a text file containing the input text.
  • --speaker_names: Space-separated names of the speakers to use for synthesis. Multiple speakers are supported; give one name per speaker, as in the two-speaker example below.
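Because --speaker_names accepts several values, a command-line parser has to collect every value after the flag into a list. The sketch below shows one common way to do that with argparse, using nargs="+". It is a hypothetical parser that covers only the three documented flags. It is not the demo script's actual argument definition, which may include further options and defaults.

```python
import argparse
from typing import List, Optional


def make_parser() -> argparse.ArgumentParser:
    # Hypothetical parser covering only the three flags documented above.
    parser = argparse.ArgumentParser(description="Sketch of the demo's documented CLI")
    parser.add_argument("--model_path", required=True,
                        help="Local model directory or Hugging Face repo ID")
    parser.add_argument("--txt_path", required=True,
                        help="Text file containing the input text")
    # nargs="+" gathers one or more space-separated values into a list,
    # so "--speaker_names Alice Frank" yields two speakers.
    parser.add_argument("--speaker_names", nargs="+", required=True,
                        help="One name per speaker")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return make_parser().parse_args(argv)
```

With this parser, parse(["--model_path", "/content/vibevoice-1.5b", "--txt_path", "2p_music.txt", "--speaker_names", "Alice", "Frank"]).speaker_names is ["Alice", "Frank"]. A single name still produces a one-element list.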

Example with multiple speakers

!python /content/vibevoice-1.5b/demo/inference_from_file.py \
  --model_path /content/vibevoice-1.5b \
  --txt_path /content/vibevoice-1.5b/demo/text_examples/2p_music.txt \
  --speaker_names Alice Frank
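To write your own multi-speaker input, mirror the layout of the bundled example files. The helper below rests on two assumptions:

  • Each turn sits on its own line in the form "Speaker N: text".
  • The names passed to --speaker_names are assigned to Speaker 1, Speaker 2, and so on, in order.

Both the format and the mapping are assumptions. Compare them against demo/text_examples/2p_music.txt before relying on them. write_dialogue is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Tuple


def write_dialogue(path: str, turns: List[Tuple[int, str]]) -> Path:
    """Write turns as 'Speaker N: text' lines (format assumed; verify against the bundled examples)."""
    lines = []
    for speaker, text in turns:
        if speaker < 1:
            raise ValueError("speaker numbers start at 1")
        # Collapse internal whitespace and newlines so each turn stays on one line.
        clean = " ".join(text.split())
        lines.append(f"Speaker {speaker}: {clean}")
    out = Path(path)
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out
```

For example, write_dialogue("my_dialogue.txt", [(1, "Welcome back to the show."), (2, "Thanks for having me.")]) writes a two-turn file. You could then pass it with --txt_path my_dialogue.txt --speaker_names Alice Frank.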

Google Colab Notebook

A ready-to-use Google Colab notebook for quick experimentation is linked from the "Open in Colab" badge on the model page.

Output

  • Generated audio files are saved to the output directory configured in the demo script.
  • Output format: WAV (.wav) by default.
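After a run finishes, you can confirm the results by listing the .wav files and their durations. This sketch uses only Python's standard wave module, which reads integer PCM WAV files. If the script writes another WAV encoding, such as 32-bit float, wave.open raises wave.Error and a library such as soundfile would be needed instead. wav_durations is a hypothetical helper; point it at the output directory your run used.

```python
import wave
from pathlib import Path
from typing import Dict


def wav_durations(output_dir: str) -> Dict[str, float]:
    """Map each .wav file under output_dir (searched recursively) to its duration in seconds."""
    root = Path(output_dir)
    durations: Dict[str, float] = {}
    for path in sorted(root.rglob("*.wav")):
        # wave only understands integer PCM WAV; other encodings raise wave.Error.
        with wave.open(str(path), "rb") as wav:
            # duration = total frames / frames per second
            durations[str(path.relative_to(root))] = wav.getnframes() / wav.getframerate()
    return durations
```

For example, print(wav_durations("path/to/output_dir")) lists every generated file with its length in seconds. This gives a quick sanity check that no file is empty or truncated.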

License

Check the license terms on the model page before use.

Model details

  • Weights format: Safetensors
  • Model size: 3B params (as reported on the Hugging Face model page)
  • Tensor type: BF16
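The listed size and tensor type give a rough estimate of the memory needed just to hold the weights: parameters × bytes per parameter, where BF16 uses 2 bytes. The sketch below does that arithmetic. Keep in mind that "3B" is a rounded figure. Runtime overhead such as activations and the CUDA context comes on top, so treat the result as an estimate only. weight_memory is a hypothetical helper name.

```python
from typing import Tuple


def weight_memory(num_params: float, bytes_per_param: int = 2) -> Tuple[float, float]:
    """Memory for the weights alone, returned as (decimal GB, binary GiB). BF16 = 2 bytes/param."""
    total_bytes = num_params * bytes_per_param
    return total_bytes / 1e9, total_bytes / 2**30
```

For 3B parameters in BF16, weight_memory(3e9) gives 6.0 GB, or about 5.59 GiB, before any runtime overhead.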