Speaker Splitter

A Python tool to separate audio files by speaker using diarization data. This tool takes a WAV audio file and a JSON file containing speaker timestamps, and creates individual WAV files for each speaker, maintaining the original timing and replacing other speakers' segments with silence.

Features

  • Separates multi-speaker audio into individual speaker files
  • Preserves original timing and audio quality
  • Handles timestamps in HH:MM:SS,MMM format
  • Creates silence during non-speaking segments
  • Supports multiple speakers
  • Simple command-line interface

Prerequisites

  • Python 3.11 or higher
  • pydub library for audio processing
  • FFmpeg (required by pydub)

Installation

  1. Clone the repository:
git clone https://github.com/mmaudet/audio-speaker-separator.git
cd audio-speaker-separator
  2. Create and activate a Conda environment:
conda create -n audio-splitter python=3.11
conda activate audio-splitter
  3. Install required Python packages:
pip install pydub
  4. Install FFmpeg:
  • Ubuntu/Debian:
    sudo apt-get install ffmpeg
  • macOS:
    brew install ffmpeg
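
To confirm that pydub and FFmpeg are wired up correctly before processing real audio, you can run a quick check in Python (a minimal sketch; this is not part of the tool itself):

# Quick sanity check: confirm pydub is importable and can locate FFmpeg.
from pydub import AudioSegment
from pydub.utils import which

print("ffmpeg found at:", which("ffmpeg"))  # None means FFmpeg is not on the PATH
print(len(AudioSegment.silent(duration=500)), "ms of silence created in memory")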

Usage

The basic command format is:

python speaker_splitter.py input.wav diarization.json

Input Files

  1. Audio file (input.wav):

    • Must be in WAV format
    • Contains the multi-speaker audio to be separated
  2. JSON file (diarization.json):

    • Contains speaker segments with timestamps
    • Can be generated using various diarization tools such as:
      • WhisperX: an open-source tool that combines Whisper with speaker diarization
      • LinTO: an enterprise-grade conversational AI platform developed by LINAGORA
    • Format example:
{
  "segments": [
    {
      "speaker": "SPEAKER_00",
      "start": "00:01:57,000",
      "end": "00:01:59,000",
      "text": "Example text"
    },
    {
      "speaker": "SPEAKER_01",
      "start": "00:02:14,000",
      "end": "00:02:20,000",
      "text": "Another example"
    }
  ]
}
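
Timestamps in this format map directly to millisecond offsets, which is what pydub uses for slicing audio. A minimal parsing sketch (the helper name timestamp_to_ms is illustrative, not necessarily the name used in speaker_splitter.py):

# Convert an "HH:MM:SS,MMM" timestamp into a millisecond offset.
def timestamp_to_ms(ts: str) -> int:
    hms, millis = ts.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)

print(timestamp_to_ms("00:01:57,000"))  # 117000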

Generating the JSON File

  1. Using WhisperX:
# Install WhisperX
pip install whisperx

# Run diarization
whisperx audio.wav --diarize
  2. Using LinTO:
    • Visit LinTO Platform
    • Upload your audio file
    • Use the transcription and diarization service
    • Export the results in JSON format

Both tools provide accurate speaker diarization and transcription, with the JSON output being compatible with this tool.

Output

The script generates separate WAV files for each speaker:

  • output-audio-SPEAKER_00.wav
  • output-audio-SPEAKER_01.wav
  • etc.

Each output file:

  • Has the same duration as the input file
  • Contains only the specified speaker's segments
  • Contains silence during other speakers' segments
  • Maintains original timing and audio quality
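
The sketch below shows one way such output can be produced with pydub: start from a silent track the same length as the input and overlay each speaker's segments at their original offsets. It is a minimal illustration under these assumptions, not the exact implementation in speaker_splitter.py:

import json
from pydub import AudioSegment

# Illustrative helper: convert "HH:MM:SS,MMM" to milliseconds.
def timestamp_to_ms(ts: str) -> int:
    hms, millis = ts.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)

audio = AudioSegment.from_wav("input.wav")
with open("diarization.json") as f:
    segments = json.load(f)["segments"]

for speaker in sorted({seg["speaker"] for seg in segments}):
    # Start from a silent track so non-speaking regions stay silent.
    track = AudioSegment.silent(duration=len(audio), frame_rate=audio.frame_rate)
    for seg in segments:
        if seg["speaker"] == speaker:
            start, end = timestamp_to_ms(seg["start"]), timestamp_to_ms(seg["end"])
            # Overlay this speaker's audio at its original position.
            track = track.overlay(audio[start:end], position=start)
    track.export(f"output-audio-{speaker}.wav", format="wav")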

Error Handling

The script includes error handling for common issues:

  • Invalid input files
  • Incorrect JSON format
  • Missing audio file
  • Invalid timestamps
  • Python version verification
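
For instance, the Python version check mentioned above can be done with a simple guard at the top of the script (a hedged sketch; the exact checks and messages in speaker_splitter.py may differ):

import sys

# Refuse to run on interpreters older than the documented minimum (Python 3.11).
if sys.version_info < (3, 11):
    sys.exit("speaker_splitter requires Python 3.11 or higher")

# Fail early with a clear message when arguments are missing.
if len(sys.argv) != 3:
    sys.exit("usage: python speaker_splitter.py input.wav diarization.json")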

Future Developments (not yet planned)

  • WhisperX Integration: direct integration of WhisperX for a complete transcription and diarization workflow
  • Audio Format Support: add support for additional audio formats such as MP3, FLAC, etc.
  • Cross-fade between segments to reduce abrupt transitions
  • Simple web interface for file upload, processing, and real-time status
  • Speech overlap detection and handling
  • Process multiple files in batch
  • Docker container for easy deployment
  • Integration with LinTO platform

Contributing

Contributions are welcome! Feel free to:

  • Submit pull requests for any of the features listed above
  • Propose new features or improvements
  • Report bugs or issues
  • Share use cases and requirements

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). This license ensures that:

  • You can use, modify, and distribute the software
  • If you modify the software and provide it as a service over a network, you must make the source code available
  • Any derivative work must also be licensed under AGPL-3.0

See the LICENSE file for the full text of the license.

Acknowledgments

  • Uses pydub for audio processing
  • Inspired by the need for clean speaker separation in multi-speaker recordings
  • Thanks to the WhisperX and LinTO teams for providing excellent diarization tools
