A PyTorch-based Speech Toolkit
Updated Dec 20, 2024 - Python
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, and speaker embedding
Reading list for research topics in multimodal machine learning
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Foundation Architecture for (M)LLMs
WaveNet vocoder
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
PyTorch implementation of convolutional neural network-based text-to-speech synthesis models
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
AI-powered speech denoising and enhancement
Controllable and fast Text-to-Speech for over 7000 languages!
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
SincNet is a neural architecture for efficiently processing raw audio samples.
Open-source audio annotation tool for humans
General Speech Restoration
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
A neural network for end-to-end speech denoising
Speech, Language, Audio, Music Processing with Large Language Model