Speech synthesis

Speech synthesis is the artificial production of human speech. A computer system built for this purpose is called a speech synthesizer (or speech computer) and can be implemented in software or hardware. A text-to-speech (TTS) system converts ordinary written text into speech; other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech. The reverse process is speech recognition.
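
As a rough illustration of the text-to-speech step, the sketch below wraps the eSpeak NG command-line tool listed under "Not using AI". It assumes the `espeak-ng` binary is installed and on the PATH. The function names `build_espeak_command` and `synthesize` are made up for this example and are not part of eSpeak NG; the flags used are `-v` (voice), `-s` (speed in words per minute) and `-w` (write a WAV file).

```python
import shutil
import subprocess


def build_espeak_command(text, wav_path, voice="en", words_per_minute=175):
    """Return the argument list for an eSpeak NG call that writes a WAV file.

    Hypothetical helper for illustration; only the flags come from eSpeak NG.
    """
    return [
        "espeak-ng",
        "-v", voice,                  # voice / language code
        "-s", str(words_per_minute),  # speaking rate in words per minute
        "-w", wav_path,               # write audio to this file instead of playing it
        text,
    ]


def synthesize(text, wav_path, **options):
    """Run eSpeak NG if it is installed; return False when the binary is missing."""
    if shutil.which("espeak-ng") is None:
        return False  # assumption: treat a missing binary as "nothing synthesized"
    subprocess.run(build_espeak_command(text, wav_path, **options), check=True)
    return True
```

For instance, `synthesize("Hello, world", "hello.wav")` would write `hello.wav` when eSpeak NG is available and return `False` otherwise. The neural engines listed below expose their own interfaces, so this wrapper does not carry over to them.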

Examples

Using AI

- Coqui: A startup providing open speech tech for everyone ⭐
- ElevenLabs: Explore the most advanced text to speech and voice cloning software ever ⭐
- FakeYou: Use FakeYou deep fake tech to say stuff with your favorite characters ⭐
- Mimic 3: A fast local neural text to speech engine for Mycroft ⭐
- Sonantic: Deliver compelling, lifelike performances with fully expressive AI-generated voices
- SpeechBrain: A PyTorch-based Speech Toolkit
- TorToiSe: A multi-voice TTS system trained with an emphasis on quality ⭐
- [2020] Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
- [2020] HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
- [2020] Tacotron: A TensorFlow implementation of Google’s Tacotron speech synthesis with pre-trained model
  - [2020] Mimic 2: Text to Speech engine based on the Tacotron architecture
- [2020] Tacotron 2: PyTorch implementation with faster-than-realtime inference
- [2019] MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
  - [2020] Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

Not using AI

- eSpeak NG: An open source speech synthesizer
- Festival: Offers a general framework for building speech synthesis systems
  - Flite: A small fast portable speech synthesis system
    - Mimic 1: Mycroft’s TTS engine, based on CMU’s Flite (Festival Lite)

Resources

- [2023] Ask HN: Are there any good open source text-to-speech tools? ⭐