This project converts text files into audiobooks using Text-to-Speech (TTS).
It processes an input file, tags dialogue with character names, generates character and metadata files for users to customize the output, and produces an .m4b audiobook file.
For an illustrative example of the process, see:

```
inputs/example.txt -> outputs/example/example_tagged.txt -> outputs/example/example.m4b
```
## Setup

If you don't have pipenv or Python 3.9 installed, please install them. Then:

```bash
git clone https://github.com/ebonsignori/tts-book-to-audio.git
cd tts-book-to-audio
pipenv install
```

See `.env.example` and rename it to `.env`, filling in the respective keys.
- This project uses a GitHub PAT (any permission level) set via `GITHUB_TOKEN` to access OpenAI's 4o model via GitHub Models.
- The `OPENAI_API_KEY` is only needed if you are using the `--tts-method openai` option. This is NOT a free API, however the resulting audio quality may be higher if you choose to use this method.
- The `ELEVENLABS_API_KEY` is only needed if you are using the `--tts-method elevenlabs` option. This is NOT a free API, however the resulting audio quality may be higher if you choose to use this method.
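Your `.env` might end up looking like the following sketch (placeholder values only; check `.env.example` in the repo for the authoritative list of keys):

```
GITHUB_TOKEN=<your-github-pat>
# Only needed for --tts-method openai
OPENAI_API_KEY=<your-openai-key>
# Only needed for --tts-method elevenlabs
ELEVENLABS_API_KEY=<your-elevenlabs-key>
```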
## Usage

```bash
pipenv run python src/main.py -i <input_book_name> [options]
```

Example:

```bash
pipenv run python src/main.py -i my_book.epub
```
### Options

- `-i`, `--input-file`: (Required) The name of the book file in the `inputs/` directory. Should include the file extension (e.g., `my_book.epub`). Supported formats: `.txt`, `.epub`, `.mobi`, `.pdf`.
- `--tts-method`, `-t`: (Optional) Text-to-Speech method to use. Choices are:
  - `local`: (default) Free and fast, but not as high quality as paid APIs.
  - `openai`: Requires an `OPENAI_API_KEY` in `.env`. Costs money to use the OpenAI TTS API.
  - `elevenlabs`: Requires an `ELEVENLABS_API_KEY` in `.env`. Costs money to use the ElevenLabs TTS API.
- `--steps`, `-s`: (Optional) Comma-separated list of processing steps to execute. If not provided, all steps will run. See Processing Steps.
- `--m4b-method`, `-m`: (Optional) Method to combine audio files into .m4b. Supported methods: `av`, `ffmpeg`.
- `-p`, `--write-processed-blocks`: (Optional) Write the intermediate text-processing blocks returned from the GPT to `output/<input_book_name>/processed_blocks/processed_#.txt`. Useful for debugging.
Note: Ensure the input file is placed inside the `inputs/` directory.
## Processing Steps

The conversion process is divided into four main steps. You can execute all steps at once or specify individual steps for manual intervention or customization.

### Step 1: Process Input File into Plaintext

- Converts the input book file into a plaintext file.
- Output: `outputs/<input_book_name>/<input_book_name>_plaintext.txt`
### Step 2: Tag Dialogues and Generate JSON Files

- Transforms the plaintext by surrounding dialogues with `<character_name>` tags.
- Generates `characters.json` with character names and their corresponding voices.
- Creates `metadata.json` for audiobook metadata customization.
- Outputs:
  - `outputs/<input_book_name>/<input_book_name>_tagged.txt`
  - `outputs/<input_book_name>/characters.json`
  - `outputs/<input_book_name>/metadata.json`
  - `output/<input_book_name>/processed_blocks/processed_#.txt` (if the `-p` flag is passed)
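Conceptually, this step wraps each piece of dialogue in tags naming its speaker. The minimal sketch below illustrates the idea only; the real pipeline attributes each quote to the right character using a GPT model, and its exact tag format may differ:

```python
import re


def tag_dialogue(text: str, speaker: str) -> str:
    """Wrap every double-quoted span in <speaker>...</speaker> tags.

    Illustrative only: the project uses a GPT model to decide which
    character spoke each quote; here every quote gets one placeholder name.
    """
    return re.sub(r'"[^"]*"', lambda m: f"<{speaker}>{m.group(0)}</{speaker}>", text)


print(tag_dialogue('She said, "Hello there." and left.', "alice"))
# -> She said, <alice>"Hello there."</alice> and left.
```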
### Step 3: Generate TTS Audio Files

- Converts the tagged text into audio files using the specified TTS method.
- Output: `outputs/<input_book_name>/audio_files/<file_number>.mp3`
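To pick a voice per line, a TTS step of this kind first splits the tagged text into (speaker, text) segments. The sketch below assumes the tag format described in Step 2 and attributes untagged text to a narrator; the project's actual parser may work differently:

```python
import re


def split_segments(tagged: str, default_speaker: str = "narrator"):
    """Split tagged text into (speaker, text) pairs.

    Text outside <name>...</name> tags is attributed to the default
    narrator voice. Illustrative sketch, not the project's implementation.
    """
    segments = []
    pos = 0
    for m in re.finditer(r"<(\w+)>(.*?)</\1>", tagged, re.DOTALL):
        narration = tagged[pos:m.start()].strip()
        if narration:
            segments.append((default_speaker, narration))
        segments.append((m.group(1), m.group(2)))
        pos = m.end()
    trailing = tagged[pos:].strip()
    if trailing:
        segments.append((default_speaker, trailing))
    return segments


print(split_segments('She said, <alice>"Hello."</alice> and left.'))
# -> [('narrator', 'She said,'), ('alice', '"Hello."'), ('narrator', 'and left.')]
```

Each segment could then be synthesized with the voice that `characters.json` assigns to its speaker.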
### Step 4: Combine Audio Files into an .m4b Audiobook

- Merges all generated audio files into a single .m4b file using the chosen method (`av` or `ffmpeg`).
- Output: `outputs/<input_book_name>/<input_book_name>.m4b`
To run specific steps, use the `-s` or `--steps` option followed by a comma-separated list of step numbers.

Example:

```bash
pipenv run python src/main.py -i my_book.epub -s 1,2
```

This command will execute Step 1 and Step 2 only.

Note: After running certain steps, you may manually edit the generated files (e.g., `characters.json`, `metadata.json`, or `_plaintext.txt`) before proceeding to the next steps.
## Example Output Structure

```
book-to-audio-converter/
├── inputs/
│   ├── my_book.epub
│   └── my_book.jpg
├── outputs/
│   └── my_book/
│       ├── my_book_plaintext.txt
│       ├── my_book_tagged.txt
│       ├── characters.json
│       ├── metadata.json
│       ├── audio_files/
│       │   ├── 1.mp3
│       │   ├── 2.mp3
│       │   └── ...
│       └── my_book.m4b
```
## Adding Local Voices

- Run `pipenv run python src/generate-voice-examples.py`
- Browse the resulting `local-voice-examples` directory and play audio files to hear each speaker's voice
- Adjust `vits_voice_mapping` and the gendered voices in `CONFIG.voice_identifiers` in the `src/config.py` file

For example, if you listened to `p237` in `local-voice-examples` and want to add it as another female voice option, append the following to `vits_voice_mapping`:
```python
"female_3": {
    "model": "tts_models/en/vctk/vits",
    "speaker": "p237"
},
```
Then in `CONFIG.voice_identifiers.female_voices`, add the new voice as an auto-map option so that it shows up in the auto-generated `characters.json`:

```python
"female_voices": ["female_1", "female_2", "female_3"],
```
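When editing these two structures by hand, it is easy to list a voice that has no entry in the mapping. A small sanity check along these lines can catch that (the dictionary shapes and the `female_1`/`female_2` entries are assumed for illustration, based on the snippets above):

```python
# Assumed shapes, mirroring the config snippets above.
vits_voice_mapping = {
    "female_1": {"model": "tts_models/en/vctk/vits", "speaker": "p225"},
    "female_2": {"model": "tts_models/en/vctk/vits", "speaker": "p228"},
    "female_3": {"model": "tts_models/en/vctk/vits", "speaker": "p237"},
}
female_voices = ["female_1", "female_2", "female_3"]

# Every voice offered for auto-mapping must exist in the mapping.
missing = [v for v in female_voices if v not in vits_voice_mapping]
assert not missing, f"voices listed but not mapped: {missing}"
print("all female voices are mapped")
```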
## License

This project is licensed under the MIT License.