Harmony Parler TTS
A fine-tuned speech synthesis model built to enhance the quality and expressiveness of the original Parler TTS model, incorporating stylistic characteristics inspired by Gemini voice models.
Model Description
Harmony Parler TTS is a high-quality text-to-speech model that improves the naturalness, clarity, and expressiveness of the original Parler TTS baseline. It has been trained on a curated dataset specifically formatted for Parler TTS training, resulting in better output fidelity and voice quality suitable for a wide range of spoken audio applications.
This model is ideal for:
- Educational content for children
- Conversational AI voices
- Audiobook narration
- Assistive applications
- Voice-enabled interfaces
Training Dataset
This model was fine-tuned using the following dataset:
π https://huggingface.co/datasets/SeifElden2342532/parler-tts-dataset-format
The dataset contains well-formatted paired text and audio suitable for TTS model training, ensuring robust performance and clear speech generation. Training focused on refining the acoustic quality, pronunciation, and prosody of the base Parler TTS voices.
Training Code & Repository
The training scripts and full fine-tuning workflow are available on GitHub:
https://github.com/SeifEldenOsama/Harmony_parler_TTS
This repository contains:
- Training scripts (e.g., fine-tuning loops)
- Logging and evaluation code
- Dataset processing utilities
- Configuration files used for the final model
Usage Example
Install Requirements
```shell
pip install git+https://github.com/huggingface/parler-tts.git
pip install torch transformers datasets
pip install git+https://github.com/huggingface/accelerate
```
How to Use
```python
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "SeifElden2342532/Harmony_Parler_TTS"

model = ParlerTTSForConditionalGeneration.from_pretrained(model_id).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The description controls the voice style; the prompt is the text to be spoken.
description = "A calm male voice with medium speed and clear audio"
prompt_text = "This is the text I want the model to say."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt_text, return_tensors="pt").input_ids.to(device)

# Generate the waveform and save it at the model's native sampling rate.
generated = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio = generated.cpu().numpy().squeeze()
sf.write("out.wav", audio, model.config.sampling_rate)
```
Example Script
Sprout: Good morning, Mr. Sun! Your light feels so warm on my teeny tiny leaves today!
Sun: Good morning, little one. I have come to help you grow tall and reach for the blue sky.
Sprout: Oh, thank you! I feel so strong! One day, I will be the tallest tree in the whole forest!
Sun: Patience, little sprout. With a bit of my light and a drop of rain, you will surely get there.
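A dialogue like the one above can be synthesized line by line, giving each character its own voice description and concatenating the clips. This is a minimal sketch assuming the model and tokenizer are loaded as in the usage example; the `parse_script` helper and the voice descriptions are illustrative, not part of the model's API:

```python
import numpy as np

def parse_script(script: str):
    """Split 'Speaker: line' entries into (speaker, text) pairs."""
    pairs = []
    for line in script.strip().splitlines():
        speaker, _, text = line.partition(":")
        pairs.append((speaker.strip(), text.strip()))
    return pairs

# Illustrative per-character voice descriptions.
VOICES = {
    "Sprout": "A bright, cheerful child-like voice with clear audio",
    "Sun": "A warm, calm male voice speaking slowly with clear audio",
}

def synthesize_dialogue(model, tokenizer, script: str, device: str):
    """Generate one clip per line of dialogue and concatenate the waveforms."""
    clips = []
    for speaker, text in parse_script(script):
        desc_ids = tokenizer(VOICES[speaker], return_tensors="pt").input_ids.to(device)
        prompt_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
        audio = model.generate(input_ids=desc_ids, prompt_input_ids=prompt_ids)
        clips.append(audio.cpu().numpy().squeeze())
    return np.concatenate(clips)
```

The resulting array can then be saved with `sf.write("dialogue.wav", audio, model.config.sampling_rate)` as in the usage example.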
Audio comparison samples: Parler TTS Mini (baseline) vs. Harmony TTS (this model).
Disclaimer
This model is part of the 3YNO project, an educational application for dyslexic and visual learners. Users upload a book, research paper, or other text; the application extracts the scientific content, converts it into a story or narrative, creates characters, and then produces an explanatory video using artificial intelligence.
Note: the model is still under development; work is ongoing to better capture emotions from the text description.