license: apache-2.0

TruthShield VoiceGen

Multi-Speaker, Multilingual TTS with Accent & Style Transfer

Overview

TruthShield VoiceGen is a text-to-speech system that supports 11 languages and offers voice cloning, accent transfer, and style control. It follows a safety-first design: generated audio is checked with ECAPA-TDNN forensic speaker verification.

Features

  • 🌍 11 Languages: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
  • 🎀 Voice Cloning: Clone voices from short reference audio
  • 🗣️ Accent Transfer: Transfer accents while preserving content
  • 🎭 Style Control: Adjust speaking style and emotion
  • 🛡️ Safety Verification: ECAPA-TDNN forensic verification

Quick Start

Installation

git clone https://github.com/truthshield/voicegen.git
cd voicegen
pip install -r requirements.txt

Run Server

uvicorn server:app --host 0.0.0.0 --port 8080

API Usage

curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
  -F "speaker_wav=@reference.wav" \
  --output output.wav

API Specification

Endpoint: GET /Get_Inference

Parameter    Type   Required  Description
text         query  Yes       Text to synthesize
lang         query  Yes       Language code
speaker_wav  file   Yes       Reference speaker audio (WAV)

Supported Languages

bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu
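
As a rough illustration of how a client might assemble the request URL from the parameters above, the Python sketch below checks lang against the supported list and percent-encodes text. The helper name build_inference_url, the BASE_URL constant, and the case-insensitive handling of lang are assumptions made for illustration and are not part of the documented API. The host and port are taken from the Quick Start example.

```python
from urllib.parse import quote, urlencode

# Language identifiers listed under "Supported Languages".
SUPPORTED_LANGS = frozenset({
    "bhojpuri", "bengali", "english", "gujarati", "hindi", "chhattisgarhi",
    "kannada", "magahi", "maithili", "marathi", "telugu",
})

# Assumption: the server runs locally, as in the Quick Start example.
BASE_URL = "http://localhost:8080"


def build_inference_url(text: str, lang: str, base_url: str = BASE_URL) -> str:
    """Return the GET /Get_Inference URL for the given text and language."""
    # Assumption: the server expects lowercase identifiers, so normalise here.
    normalized = lang.strip().lower()
    if normalized not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {lang!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"text": text, "lang": normalized}, quote_via=quote)
    return f"{base_url}/Get_Inference?{query}"


if __name__ == "__main__":
    print(build_inference_url("hello world", "english"))
```

The reference audio still has to be attached as the speaker_wav file field, as in the curl example under API Usage.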

Response Headers

  • X-Model-Version: Model version string
  • X-Speaker-Similarity: Voice similarity score
  • X-Safety-Verified: Safety verification status
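
A client will usually want these headers in typed form. The sketch below is one way to do that in Python. The value formats are not specified here, so it assumes that X-Speaker-Similarity carries a decimal number and that X-Safety-Verified carries a boolean-like string such as "true". The InferenceMetadata class and the parse_inference_headers function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class InferenceMetadata:
    model_version: Optional[str]
    speaker_similarity: Optional[float]
    safety_verified: Optional[bool]


def parse_inference_headers(headers: Mapping[str, str]) -> InferenceMetadata:
    """Extract the three documented response headers, if present."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # Assumption: the similarity score is sent as a decimal number string.
    raw_similarity = lowered.get("x-speaker-similarity")
    try:
        similarity = float(raw_similarity) if raw_similarity is not None else None
    except ValueError:
        similarity = None  # unparseable value: treat as unknown

    # Assumption: the verification status is a boolean-like string.
    raw_verified = lowered.get("x-safety-verified")
    verified = None
    if raw_verified is not None:
        verified = raw_verified.lower() in {"true", "1", "yes"}

    return InferenceMetadata(
        model_version=lowered.get("x-model-version"),
        speaker_similarity=similarity,
        safety_verified=verified,
    )
```

Missing headers come back as None, so a caller can tell "not verified" apart from "status not reported".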

Architecture

┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
│  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
└──────────┘   └──────────┘   └──────────┘   └────┬─────┘
                                                  │
┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
│  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
│  Output  │   │  + Headers   │   │                      │
└──────────┘   └──────────────┘   └──────────────────────┘

Safety Layer

All generated audio passes through ECAPA-TDNN speaker verification:

  1. Extract speaker embeddings from reference
  2. Generate audio using VITS
  3. Extract embeddings from generated audio
  4. Compute similarity score
  5. Apply threshold (0.85) for verification
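
Steps 4 and 5 can be sketched in a few lines of Python. The similarity metric is not named above. This example assumes cosine similarity, a common choice for comparing ECAPA-TDNN embeddings, and assumes that a score at or above the threshold counts as verified. The embeddings are plain lists of floats standing in for real model output, and the function names are hypothetical.

```python
import math
from typing import Sequence

# Threshold stated in step 5.
VERIFICATION_THRESHOLD = 0.85


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_verified(
    reference_embedding: Sequence[float],
    generated_embedding: Sequence[float],
    threshold: float = VERIFICATION_THRESHOLD,
) -> bool:
    """Assumption: verification passes when similarity >= threshold."""
    return cosine_similarity(reference_embedding, generated_embedding) >= threshold


if __name__ == "__main__":
    reference = [0.20, 0.70, 0.10]   # placeholder for the reference embedding
    generated = [0.25, 0.65, 0.12]   # placeholder for the generated-audio embedding
    print(is_verified(reference, generated))
```

In a real pipeline the two vectors would come from the ECAPA-TDNN encoder in steps 1 and 3, and the resulting score would presumably be what the X-Speaker-Similarity header reports.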

Datasets

See datasets.csv for training data sources.

License

Apache 2.0

Citation

@misc{truthshield2024voicegen,
  title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
  author={TruthShield Team},
  year={2024}
}