Speech synthesis

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.

Read more on Wikipedia.

Examples

Using AI
  • Coqui: A startup providing open speech tech for everyone ⭐
  • ElevenLabs: Text-to-speech and voice cloning software ⭐
  • FakeYou: Deepfake text-to-speech in the voices of popular characters ⭐
  • Mimic 3: A fast, local, neural text-to-speech engine for Mycroft ⭐
  • Sonantic: Deliver compelling, lifelike performances with fully expressive AI-generated voices
  • SpeechBrain: A PyTorch-based Speech Toolkit
  • TorToiSe: A multi-voice TTS system trained with an emphasis on quality ⭐
  • [2020] Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
  • [2020] HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
  • [2020] Tacotron: A TensorFlow implementation of Google’s Tacotron speech synthesis with a pre-trained model
    • [2020] Mimic 2: Text-to-speech engine based on the Tacotron architecture
  • [2020] Tacotron 2: PyTorch implementation with faster-than-realtime inference
  • [2019] MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
    • [2020] Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

Not using AI
  • eSpeak NG: An open source speech synthesizer
  • Festival: Offers a general framework for building speech synthesis systems
    • Flite: A small fast portable speech synthesis system
      • Mimic 1: Mycroft’s TTS engine, based on CMU’s Flite (Festival Lite)
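eSpeak NG, listed above, can be scripted through its `espeak-ng` command-line tool. The following is a minimal Python sketch, assuming `espeak-ng` is installed and on `PATH`. The helper names (`build_espeak_command`, `synthesize_to_wav`, `text_to_ipa`) are hypothetical. It covers full text-to-speech (text to a WAV file) and the phonetic transcription eSpeak NG derives from the text. The flags used (`-v` voice, `-s` speed in words per minute, `-w` output WAV file, `-q` no audio, `--ipa` print phonemes in IPA) are eSpeak NG options, but check `espeak-ng --help` for your installed version.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path

ESPEAK = "espeak-ng"  # assumed name of the eSpeak NG executable on PATH


def build_espeak_command(text: str, wav_path: str | None = None,
                         voice: str = "en", speed_wpm: int = 175,
                         ipa: bool = False) -> list[str]:
    """Return the argument list for one espeak-ng invocation (no side effects)."""
    if ipa and wav_path is not None:
        raise ValueError("choose either IPA output or a WAV file, not both")
    # -v selects the voice/language, -s sets the speed in words per minute
    cmd = [ESPEAK, "-v", voice, "-s", str(speed_wpm)]
    if ipa:
        # -q suppresses audio; --ipa prints the phoneme transcription in IPA
        cmd += ["-q", "--ipa"]
    elif wav_path is not None:
        # -w writes the synthesized speech to a WAV file instead of the speaker
        cmd += ["-w", wav_path]
    cmd.append(text)  # the text to speak is passed as the final argument
    return cmd


def _require_espeak() -> None:
    """Fail early with a clear message if the binary is missing."""
    if shutil.which(ESPEAK) is None:
        raise RuntimeError(f"{ESPEAK} not found on PATH")


def synthesize_to_wav(text: str, wav_path: str, voice: str = "en") -> Path:
    """Synthesize `text` into a WAV file and return its path."""
    _require_espeak()
    subprocess.run(build_espeak_command(text, wav_path=wav_path, voice=voice),
                   check=True)
    return Path(wav_path)


def text_to_ipa(text: str, voice: str = "en") -> str:
    """Return eSpeak NG's IPA transcription of `text` (no audio is produced)."""
    _require_espeak()
    result = subprocess.run(build_espeak_command(text, voice=voice, ipa=True),
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

For example, `synthesize_to_wav("Hello world", "hello.wav")` writes a WAV file, and `text_to_ipa("Hello world")` returns the IPA string that eSpeak NG prints; the exact transcription depends on the voice and the eSpeak NG version. Keeping command construction in a pure function makes it testable without the binary installed.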

Resources