
Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies for recognizing spoken language and converting it into text by computer. Its main benefit is searchability: once speech is text, it can be indexed and queried. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech to text (STT). It draws on knowledge and research from computer science, linguistics, and computer engineering. The reverse process is speech synthesis.

Read more on Wikipedia.

Examples

  • ocotillo: Performant and accurate speech recognition built on PyTorch
  • Project Shasta: AI-powered audio recording and editing, all on the web
  • SpeechBrain: A PyTorch-based Speech Toolkit
  • Whisper: A general-purpose speech recognition model ⭐
    • Faster Whisper: Faster Whisper transcription with CTranslate2
    • whisper.cpp: Port of OpenAI’s Whisper model in C/C++
    • WhisperX: Timestamp-Accurate Automatic Speech Recognition
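
To make the searchability benefit concrete, here is a minimal sketch that assumes the `openai-whisper` Python package (and its `ffmpeg` dependency) is installed. The helper names `transcribe_segments`, `search_segments`, and `format_hit` are hypothetical, chosen only for illustration; the `"base"` model size and any audio path are placeholders as well.

```python
from typing import Dict, List


def transcribe_segments(audio_path: str, model_name: str = "base") -> List[Dict]:
    """Transcribe an audio file with Whisper and return its timestamped segments."""
    # Imported lazily so the search helpers below work without the package.
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    # Each segment is a dict that includes "start", "end" (seconds) and "text".
    return result["segments"]


def search_segments(segments: List[Dict], phrase: str) -> List[Dict]:
    """Return the segments whose text contains `phrase`, ignoring case."""
    needle = phrase.casefold()
    return [seg for seg in segments if needle in seg["text"].casefold()]


def format_hit(segment: Dict) -> str:
    """Render a segment as '[MM:SS] text' for display."""
    minutes, seconds = divmod(int(segment["start"]), 60)
    return f"[{minutes:02d}:{seconds:02d}] {segment['text'].strip()}"
```

A typical use would be `segments = transcribe_segments("meeting.mp3")` followed by `search_segments(segments, "budget")`, printing each result with `format_hit`. The search is a plain substring match; a real application would more likely feed the transcript into a full-text index.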

Resources

Video