Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies for recognizing spoken language and converting it into text by computer. A main benefit is that the resulting text is searchable. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech to text (STT). It draws on knowledge and research from computer science, linguistics, and computer engineering. The reverse process is speech synthesis.

Examples

ocotillo: Performant and accurate speech recognition built on PyTorch
Project Shasta: AI-powered audio recording and editing, all in the web
SpeechBrain: A PyTorch-based Speech Toolkit
Whisper: A general-purpose speech recognition model ⭐
- Faster Whisper: Faster Whisper transcription with CTranslate2
- whisper.cpp: Port of OpenAI’s Whisper model in C/C++
- WhisperX: Timestamp-Accurate Automatic Speech Recognition
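
As a rough illustration of the searchability benefit mentioned above, the sketch below transcribes a file with the `openai-whisper` Python package and then looks up a phrase in the timestamped segments. The helper names `transcribe` and `find_phrase` and the sample segments are hypothetical and written only for this example. The search helper assumes each segment is a dict with `start`, `end`, and `text` keys, which is the shape Whisper returns under `result["segments"]`.

```python
from typing import Iterable, List, Tuple


def transcribe(audio_path: str, model_name: str = "base") -> dict:
    """Transcribe an audio file with the openai-whisper package.

    Requires `pip install openai-whisper` and ffmpeg; the model weights
    are downloaded on first use.
    """
    import whisper  # imported lazily so find_phrase works without it

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)


def find_phrase(segments: Iterable[dict], phrase: str) -> List[Tuple[float, float, str]]:
    """Return (start, end, text) for each segment whose text contains phrase.

    Matching is a simple case-insensitive substring test.
    """
    needle = phrase.casefold()
    return [
        (seg["start"], seg["end"], seg["text"].strip())
        for seg in segments
        if needle in seg["text"].casefold()
    ]


# Hand-written segments in the same shape as result["segments"].
# This is illustrative data, not real model output.
example_segments = [
    {"start": 0.0, "end": 2.5, "text": " Welcome to the weekly meeting."},
    {"start": 2.5, "end": 6.0, "text": " First, the budget report."},
    {"start": 6.0, "end": 9.0, "text": " The Budget is on track."},
]

hits = find_phrase(example_segments, "budget")
for start, end, text in hits:
    print(f"{start:6.1f}s - {end:6.1f}s  {text}")
```

With a real recording, the same lookup would be `find_phrase(transcribe("meeting.mp3")["segments"], "budget")`, where `meeting.mp3` is a placeholder file name.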

Resources

Video

[2022] Open AI Whisper - Open Source Translation and Transcription