Harmony Parler TTS

A fine-tuned speech synthesis model built to enhance the quality and expressiveness of the original Parler TTS model, incorporating stylistic characteristics inspired by Gemini voice models.


🧠 Model Description

Harmony Parler TTS is a high-quality text-to-speech model that improves the naturalness, clarity, and expressiveness of the original Parler TTS baseline. It has been trained on a curated dataset specifically formatted for Parler TTS training, resulting in better output fidelity and voice quality suitable for a wide range of spoken audio applications.

This model is ideal for:

  • Educational content for children
  • Conversational AI voices
  • Audiobook narration
  • Assistive applications
  • Voice-enabled interfaces

📚 Training Dataset

This model was fine-tuned using the following dataset:

🔗 https://huggingface.co/datasets/SeifElden2342532/parler-tts-dataset-format

The dataset contains well-formatted paired text and audio suitable for TTS model training, ensuring robust performance and clear speech generation. Training was done to refine the acoustic quality, pronunciation, and prosody of the base Parler TTS voices.
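As a rough illustration of that pairing, a Parler-TTS-style training record bundles a transcript, a voice description, and a reference recording. The field names below ("text", "description", "audio") are assumptions for illustration and may differ from the dataset's actual schema:

```python
# Hypothetical sketch of a Parler-TTS-style training record.
# Field names are assumptions, not the dataset's guaranteed schema.
def make_record(text, description, audio_path):
    """Bundle one transcript/description/audio triple."""
    return {
        "text": text,                 # what the voice should say
        "description": description,   # how the voice should sound
        "audio": audio_path,          # path to the reference recording
    }

record = make_record(
    "Good morning, Mr. Sun!",
    "A cheerful child-like voice, medium speed, clear audio",
    "clips/sprout_001.wav",
)
print(sorted(record.keys()))  # → ['audio', 'description', 'text']
```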


πŸ› οΈ Training Code & Repository

The training scripts and full fine-tuning workflow are available on GitHub:

🔗 https://github.com/SeifEldenOsama/Harmony_parler_TTS

This repository contains:

  • Training scripts (e.g., fine-tuning loops)
  • Logging and evaluation code
  • Dataset processing utilities
  • Configuration files used for the final model
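The actual utilities live in the repository above; purely as a generic illustration (not the repository's real code), TTS dataset preprocessing often normalizes transcripts before tokenization, along these lines:

```python
import re

# Generic text-normalization sketch for TTS preprocessing.
# Illustrative only — not code from the Harmony repository.
def normalize_transcript(text):
    """Collapse whitespace and strip symbols a TTS model rarely needs."""
    text = text.strip()
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    text = re.sub(r"[^\w\s.,!?'-]", "", text)  # drop unusual symbols
    return text

print(normalize_transcript("Hello,   world!!  \n"))  # → Hello, world!!
```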

⚑ Usage Example

🧩 Install Requirements

pip install git+https://github.com/huggingface/parler-tts.git
pip install torch transformers datasets soundfile
pip install git+https://github.com/huggingface/accelerate

πŸ—£οΈ How to Use

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Run on GPU when available; CPU generation works but is slower.
device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "SeifElden2342532/Harmony_Parler_TTS"
model = ParlerTTSForConditionalGeneration.from_pretrained(model_id).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The description controls how the voice sounds; the prompt is what it says.
description = "A calm male voice with medium speed and clear audio"
prompt_text = "This is the text I want the model to say."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt_text, return_tensors="pt").input_ids.to(device)

# Generate the waveform and move it to the CPU as a flat array.
generated = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio = generated.cpu().numpy().squeeze()

# Write a WAV file at the model's native sampling rate.
sf.write("out.wav", audio, model.config.sampling_rate)
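For multi-sentence scripts, one simple approach (a plain-Python sketch, assuming each generated clip is a flat sequence of float samples at the same sampling rate) is to join the clips with short silences before writing a single file:

```python
# Sketch: join several generated audio clips with short silences.
# Assumes each clip is a flat sequence of float samples, e.g. the
# squeezed output of model.generate shown above.
def join_clips(clips, sampling_rate, gap_seconds=0.3):
    """Concatenate clips, inserting gap_seconds of silence between them."""
    silence = [0.0] * int(sampling_rate * gap_seconds)
    joined = []
    for i, clip in enumerate(clips):
        if i > 0:
            joined.extend(silence)
        joined.extend(clip)
    return joined
```

The result can be passed to sf.write the same way as a single clip, e.g. sf.write("story.wav", join_clips(all_clips, model.config.sampling_rate), model.config.sampling_rate).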

Example Script

Sprout: Good morning, Mr. Sun! Your light feels so warm on my teeny tiny leaves today!

Sun: Good morning, little one. I have come to help you grow tall and reach for the blue sky.

Sprout: Oh, thank you! I feel so strong! One day, I will be the tallest tree in the whole forest!

Sun: Patience, little sprout. With a bit of my light and a drop of rain, you will surely get there.
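To synthesize a dialogue like this with distinct voices, one sketch is to split the script into (speaker, text) pairs and map each speaker to its own voice description. The descriptions below are invented examples, not tuned presets:

```python
# Sketch: turn a "Speaker: line" script into (description, text) pairs
# that can be fed to the model one line at a time. The voice
# descriptions are invented examples, not tuned presets.
VOICES = {
    "Sprout": "A bright, high-pitched child-like voice, clear audio",
    "Sun": "A warm, calm male voice with slow speed and clear audio",
}

def parse_script(script):
    """Return (voice_description, text) for each 'Speaker: text' line."""
    pairs = []
    for line in script.strip().splitlines():
        if ":" not in line:
            continue  # skip blank or non-dialogue lines
        speaker, text = line.split(":", 1)
        pairs.append((VOICES[speaker.strip()], text.strip()))
    return pairs

script = """
Sprout: Good morning, Mr. Sun!
Sun: Good morning, little one.
"""
for description, text in parse_script(script):
    print(description, "->", text)
```

Each (description, text) pair can then be tokenized and passed to model.generate as in the usage example, and the resulting clips written out or joined into one file.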

Audio samples on the model page compare the base Parler TTS Mini voice with Harmony TTS.

Disclaimer

This model is part of the 3YNO project, an educational application for dyslexic and visual learners. Users upload a book, research paper, or other text; the application extracts the scientific content, turns it into a story or narrative, creates characters, and then uses artificial intelligence to produce an explanatory video.

Note: The model is still under development; work is ongoing to better capture emotions from the text description.

Model size: 0.9B parameters (F32, Safetensors)