license: apache-2.0
language:
  - en
  - hi
  - gu
  - bn
  - kn
  - mr
  - bho
  - mag
  - mai
  - te
  - chh
datasets:
  - TruthShieldAI/TruthShieldVoiceGen
base_model: coqui-ai/TTS-VITS
pipeline_tag: text-to-speech
library_name: TTS
tags:
  - tts
  - multi-speaker
  - multilingual
  - accent-transfer
  - style-transfer
  - voice-cloning
  - india-languages

TruthShield VoiceGen

Multi-Speaker, Multilingual TTS with Accent & Style Transfer

Overview

TruthShield VoiceGen is a text-to-speech system that supports 11 languages and offers voice cloning, accent transfer, and style control. It follows a safety-first design: generated audio is checked with forensic speaker verification.

Features

  • 🌍 11 Languages: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
  • 🎤 Voice Cloning: Clone voices from short reference audio
  • 🗣️ Accent Transfer: Transfer accents while preserving content
  • 🎭 Style Control: Adjust speaking style and emotion
  • 🛡️ Safety Verification: ECAPA-TDNN forensic verification

Quick Start

Installation

git clone https://github.com/truthshield/voicegen.git
cd voicegen
pip install -r requirements.txt

Run Server

uvicorn server:app --host 0.0.0.0 --port 8080

API Usage

curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
  -F "speaker_wav=@reference.wav" \
  --output output.wav
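
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call above: `text` and `lang` go in the query string, and the reference audio is sent as a multipart form field named `speaker_wav`. The helper names (`build_url`, `build_multipart`, `synthesize`) and the `reference.wav` filename are illustrative and not part of the project; the base URL assumes the server is running locally on port 8080.

```python
import urllib.parse
import urllib.request
import uuid

BASE_URL = "http://localhost:8080"  # assumption: local server on port 8080


def build_url(text, lang, base_url=BASE_URL):
    """Return the /Get_Inference URL with text and lang as query parameters."""
    query = urllib.parse.urlencode(
        {"text": text, "lang": lang}, quote_via=urllib.parse.quote
    )
    return f"{base_url}/Get_Inference?{query}"


def build_multipart(field, filename, data):
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def synthesize(text, lang, speaker_wav_path, out_path="output.wav"):
    """Send the request, save the returned audio, and return the response headers."""
    with open(speaker_wav_path, "rb") as f:
        body, content_type = build_multipart("speaker_wav", "reference.wav", f.read())
    request = urllib.request.Request(
        build_url(text, lang),
        data=body,
        headers={"Content-Type": content_type},
        method="GET",  # the endpoint is documented as GET, as in the curl call
    )
    with urllib.request.urlopen(request) as response:
        headers = dict(response.headers.items())
        with open(out_path, "wb") as out:
            out.write(response.read())
    return headers
```

Calling `synthesize("hello world", "english", "reference.wav")` would write the returned audio to `output.wav` and return the response headers as a dictionary.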

API Specification

Endpoint: GET /Get_Inference

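| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| text | query | Yes | Text to synthesize |
| lang | query | Yes | Language name, e.g. english (see Supported Languages) |
| speaker_wav | file | Yes | Reference speaker audio (WAV) |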

Supported Languages

bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu
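
A client can check the `lang` value against this list before sending a request. The following is a minimal sketch; `normalize_lang` is a hypothetical helper, and lowercasing the input is a client-side convenience, since this card does not say whether the server treats names case-sensitively.

```python
# Language names accepted by the lang parameter, as listed above.
SUPPORTED_LANGUAGES = frozenset({
    "bhojpuri", "bengali", "english", "gujarati", "hindi", "chhattisgarhi",
    "kannada", "magahi", "maithili", "marathi", "telugu",
})


def normalize_lang(lang):
    """Return the lowercase language name, or raise ValueError if unsupported."""
    name = lang.strip().lower()
    if name not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {lang!r}; expected one of: {supported}")
    return name
```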

Response Headers

  • X-Model-Version: Model version string
  • X-Speaker-Similarity: Voice similarity score
  • X-Safety-Verified: Safety verification status
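
These headers can be read on the client side alongside the audio. The value formats are not documented here, so the sketch below makes two assumptions: `X-Speaker-Similarity` is a decimal number, and `X-Safety-Verified` is a string such as `true` or `false`. `parse_voicegen_headers` is a hypothetical helper name.

```python
def parse_voicegen_headers(headers):
    """Convert the documented response headers into typed Python values."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    similarity = lowered.get("x-speaker-similarity")
    verified = lowered.get("x-safety-verified")
    return {
        "model_version": lowered.get("x-model-version"),
        # Assumption: the similarity score is sent as a decimal string.
        "speaker_similarity": float(similarity) if similarity is not None else None,
        # Assumption: a truthy status is sent as "true", "1", or "yes".
        "safety_verified": (
            verified.strip().lower() in {"true", "1", "yes"}
            if verified is not None
            else None
        ),
    }
```

Missing headers map to `None`. A non-numeric similarity value raises `ValueError`, which makes a mismatch with these assumptions visible early.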

Architecture

┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
│  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
└──────────┘   └──────────┘   └──────────┘   └────┬─────┘
                                                  │
┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
│  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
│  Output  │   │  + Headers   │   │                      │
└──────────┘   └──────────────┘   └──────────────────────┘

Safety Layer

All generated audio passes through ECAPA-TDNN speaker verification:

  1. Extract speaker embeddings from the reference audio
  2. Generate audio using VITS
  3. Extract speaker embeddings from the generated audio
  4. Compute the similarity score between the two embeddings
  5. Compare the score against the verification threshold (0.85)
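
Steps 4 and 5 can be sketched in a few lines. This example assumes the similarity score is cosine similarity and that a score at or above the threshold counts as verified; the card states only the 0.85 value. Embeddings are plain lists of floats here, and extracting them with ECAPA-TDNN is outside the sketch. The function names are illustrative.

```python
import math

THRESHOLD = 0.85  # verification threshold stated above


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_verified(reference_embedding, generated_embedding, threshold=THRESHOLD):
    """Assumption: a score at or above the threshold passes verification."""
    return cosine_similarity(reference_embedding, generated_embedding) >= threshold
```

For example, `is_verified([1.0, 0.0], [1.0, 0.5])` passes (similarity about 0.89), while `is_verified([1.0, 0.0], [1.0, 1.0])` does not (similarity about 0.71).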

Datasets

See datasets.csv for training data sources.

License

Apache 2.0

Citation

@misc{truthshield2024voicegen,
  title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
  author={TruthShield Team},
  year={2024}
}