v0.7.0
Highlights
Example Pipelines
torchaudio is expanding its support for models and end-to-end applications. Please file an issue on GitHub to provide feedback on them.
- Speech Recognition: Building on the addition of the Wav2Letter model for speech recognition in the last release, we added an example training pipeline for speech recognition that uses the LibriSpeech dataset.
- Text-to-Speech: With the goal of supporting text-to-speech applications, we added a vocoder based on the WaveRNN model. The WaveRNN model is based on the implementation from this repository. The original implementation was introduced in "Efficient Neural Audio Synthesis". We provide an example training pipeline in the example folder that uses the LibriTTS dataset added to torchaudio in this release.
- Source Separation: We also support source separation with the addition of the ConvTasNet model, based on the paper "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation." An example training pipeline is provided with the wsj0-mix dataset.
I/O Improvements
As you are likely aware from the last release, we are in the process of making sox_io the new default backend. It ships with new features such as TorchScript support and performance improvements. If you want to benefit from these features now, we encourage you to migrate. For more information, see issue #903.
Backwards Incompatible Changes
- Switched all %-based string formatting to str.format to adopt changes in PyTorch, leading to improved error messages for TorchScript (#850)
- Split sox_utils.list_formats() for read and write (#811)
- Made directory traversal order alphabetical and breadth-first, consistent across operating systems (#814)
- Changed GTZAN so that it only traverses filenames belonging to the dataset (#791)
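The alphabetical, breadth-first traversal mentioned above can be illustrated with a minimal sketch. This is not torchaudio's implementation; the function name `walk_files_sorted` and its parameters are hypothetical and exist only to show why sorting each directory listing makes the order independent of the operating system.

```python
import os
from collections import deque
from pathlib import Path
from typing import Iterator, Union


def walk_files_sorted(root: Union[str, Path], suffix: str) -> Iterator[str]:
    """Yield files under `root` ending in `suffix`, breadth-first and alphabetically.

    Hypothetical helper for illustration only; not part of torchaudio.
    """
    queue = deque([str(root)])
    while queue:
        current = queue.popleft()
        subdirs = []
        # os.listdir() order depends on the file system, so sort explicitly.
        for name in sorted(os.listdir(current)):
            path = os.path.join(current, name)
            if os.path.isdir(path):
                subdirs.append(path)
            elif name.endswith(suffix):
                yield path
        # Appending to the end of the queue means every directory at the
        # current depth is visited before any deeper one (breadth-first).
        queue.extend(subdirs)
```

With this ordering, files directly inside a directory are always yielded before files in its subdirectories, and sibling entries come out in the same sorted order on every platform.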
New Features
- Added ConvTasNet model (#920, #933) with pipeline (#894)
- Added canonical pipeline with wav2letter (#632)
- The WaveRNN model (#705, #797, #801, #810, #836) is available with a canonical pipeline (#749, #802, #831, #863)
- Added all 3 releases of tedlium dataset (#882, #934, #945, #895)
- Added VCTK_092 dataset (#812)
- Added LibriTTS (#790, #820)
- Added SPHERE support to sox_io backend (#871)
- Added torchscript sox effects (#760)
- Added a flag to change the interface of soundfile backend to the one identical to sox_io backend. (#922)
Improvements
- Added soundfile compatibility backend. (#922)
- Improved the speed of torchaudio.compliance.kaldi.fbank (#947)
- Improved the speed of phaser (#660)
- Added warning when a Mel filter is all zero (#914)
- Added pathlib.Path support to sox_io backend (#907)
- Simplified C++ registration with TORCH_LIBRARY (#840)
- Merged sox effect and sox_io C++ implementation (#779)
Internal
- CI: Added test to validate torchscript backward compatibility (#838)
- CI: Used mocked datasets to test CMUArctic (#829), CommonVoice (#827), Speech Commands (#824), LJSpeech (#826), LibriSpeech (#825), YESNO (#792, #832)
- CI: Made *nix unit test fail if C++ extension is not available (#847, #849)
- CI: Separated I/O in testing. (#813, #773, #783)
- CI: Added smoke tests to sox_io and sox_effects (#806)
- CI: Refactored test utilities (#805, #808, #809, #817, #822, #831)
- Doc: Added how to run tests (#843)
- Doc: Added 0.6.0 to version matrix in README (#833)
Bug Fixes
- Fixed device in interactive ASR example (#900)
- Fixed incorrect extension parsing (#885)
- Fixed dither with noise_shaping = True (#865)
- Ran unit tests with non-editable installation (#845), and set zip_safe = False to disable egg installation (#842)
- Sorted GTZAN dataset and used on-the-fly data in GTZAN test (#819)