Documentation WIP
mmcauliffe committed Jul 4, 2016
1 parent ddd4f81 commit 2afdcd0
Showing 9 changed files with 329 additions and 32 deletions.
4 changes: 3 additions & 1 deletion docs/source/conf.py
Original file line number Diff line number Diff line change
@@ -129,7 +129,9 @@
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
html_theme_options = {
    'page_width': 'auto',
}

# Add any paths that contain custom themes here, relative to this directory.
# html_theme_path = []
40 changes: 40 additions & 0 deletions docs/source/example.rst
@@ -0,0 +1,40 @@
.. _example:
.. _`LibriSpeech lexicon`: http://www.openslr.org/resources/11/librispeech-lexicon.txt

.. _`LibriSpeech data set`: https://www.dropbox.com/s/i08yunn7yqnbv0h/LibriSpeech.zip?dl=0

*******
Example
*******

This example for aligning the LibriSpeech test data set assumes that
the Montreal Forced Aligner has been downloaded and works.

Set up
======

1. Download the prepared LibriSpeech dataset (`LibriSpeech data set`_) and extract it somewhere on your computer.
2. Download the LibriSpeech lexicon (`LibriSpeech lexicon`_) and save it somewhere on your computer.


Alignment
=========

Aligning using pre-trained models
---------------------------------

Enter the following command into the terminal:

.. code-block:: bash

   bin/mfa_align --english /path/to/librispeech/dataset ~/Documents/aligned_librispeech

Aligning through training
-------------------------

Enter the following command into the terminal:

.. code-block:: bash

   bin/mfa_train_and_align /path/to/librispeech/dataset /path/to/librispeech/lexicon ~/Documents/aligned_librispeech

1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -11,6 +11,7 @@ Contents:
.. toctree::
   :maxdepth: 2

   introduction.rst
   installation.rst
   tutorial.rst
   commonerrors.rst
71 changes: 64 additions & 7 deletions docs/source/installation.rst
@@ -1,12 +1,69 @@
.. Montreal Forced Aligner documentation master file, created by
sphinx-quickstart on Wed Jun 15 13:27:38 2016.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
.. _installation:

.. _`Montreal Forced Aligner releases`: https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/releases

.. _`Kaldi GitHub repository`: https://github.com/kaldi-asr/kaldi

************
Installation
===================================================
************

All releases for the Montreal Forced Aligner are available on
`Montreal Forced Aligner releases`_.

Mac
===

1. Download the zip folder for Mac and unzip it anywhere
2. Open a terminal window
3. Navigate to the ``montreal-forced-aligner`` folder (``cd /path/to/montreal-forced-aligner``)
4. Test the commands ``bin/mfa_align`` and ``bin/mfa_train_and_align``
5. The above commands should print usage messages about the commands

Windows
=======

1. Download the zip folder for Windows and unzip it anywhere
2. Open a command window (Open the Start menu and search for ``cmd``)
3. Navigate to the ``montreal-forced-aligner`` folder (``cd C:\path\to\montreal-forced-aligner``).
   To copy the folder's path, hold Shift, right-click the folder, select
   "Copy as path", and paste it into the command prompt.
4. Test the commands ``bin\mfa_align`` and ``bin\mfa_train_and_align``
5. The above commands should print usage messages about the commands

Linux
=====

The Linux distributions were built on Ubuntu 14.04, and so may not work on
machines that have older versions of Linux system packages. If these instructions
do not work, then the executables will have to be built from source.

1. Download the zip folder for Linux and unzip it anywhere
2. Open a terminal window
3. Navigate to the ``montreal-forced-aligner`` folder (``cd /path/to/montreal-forced-aligner``)
4. Test the commands ``bin/mfa_align`` and ``bin/mfa_train_and_align``
5. The above commands should print usage messages about the commands

Building from source
====================

NB: These instructions require Python 3 (and you may have to replace
instances of ``python`` and ``pip`` with ``python3`` and ``pip3`` if Python 3 is
not your default Python) and assume Linux in the commands.

1. Get Kaldi compiled and working: `Kaldi GitHub repository`_
2. Download the source zip from the releases page.
3. Open a terminal and go to the unzipped folder (``cd /path/to/Montreal-Forced-Aligner``)
4. Run the ``thirdparty/kaldibinaries.py`` script, pointing it to where Kaldi was built (``python thirdparty/kaldibinaries.py /path/to/kaldi/root``)
5. Run ``pip install -r requirements.txt`` to install the requirements for the aligner.
6. Build the executables by running ``. freezing/freeze.sh``; this creates a ``montreal-forced-aligner`` folder in the ``dist/`` folder.
7. This folder should contain two executables, ``mfa_align`` and ``mfa_train_and_align``, which are the commands to use for alignment.

Download the montreal-forced-aligner folder, and save it wherever you want to run the aligner from. Nothing else is required to be able to run the aligner.
Files created when using the Montreal Forced Aligner
====================================================

The aligner will save data and logs for the models it trains in a new folder, Documents/MFA (which it creates). If a model for a corpus folder already exists in MFA, it will use the existing model if you try to align it again. (If this is not desired, delete or move the old model folder.)
The aligner will save data and logs for the models it trains in a new folder,
``Documents/MFA`` (which it creates in your user's home directory). If models for a corpus
already exist in ``Documents/MFA``, the aligner will reuse them when you align that corpus again.
(If this is not desired, delete or move the old model folder.)

73 changes: 73 additions & 0 deletions docs/source/introduction.rst
@@ -0,0 +1,73 @@
.. _introduction:

.. _`Kaldi homepage`: http://kaldi-asr.org/

.. _`HTK homepage`: http://htk.eng.cam.ac.uk/

.. _`Prosodylab-aligner homepage`: http://prosodylab.org/tools/aligner/

.. _`P2FA homepage`: https://www.ling.upenn.edu/phonetics/old_website_2015/p2fa/

.. _`FAVE-align homepage`: http://fave.ling.upenn.edu/FAAValign.html

.. _`MAUS homepage`: http://www.bas.uni-muenchen.de/Bas/BasMAUS.html

.. _`Praat homepage`: http://www.fon.hum.uva.nl/praat/

.. _`EasyAlign homepage`: http://latlcui.unige.ch/phonetique/easyalign.php

Introduction
============

What is forced alignment?
-------------------------

Forced alignment is a technique to take an orthographic transcription of
an audio file and generate a time-aligned version using a pronunciation
dictionary to look up phones for words.

Underlying technology
---------------------

The Montreal Forced Aligner uses the Kaldi ASR toolkit
(`Kaldi homepage`_) to perform forced alignment.
Kaldi is under active development and includes state-of-the-art algorithms
for automatic speech recognition tasks beyond forced alignment.

Relation to other forced alignment tools
----------------------------------------

Most tools for forced alignment used by linguists rely on the HMM Toolkit
(HTK; `HTK homepage`_), including:

* Prosodylab-aligner (`Prosodylab-aligner homepage`_)
* Penn Phonetics Forced Aligner (P2FA, `P2FA homepage`_)
* FAVE-align (`FAVE-align homepage`_)
* (Web) MAUS (`MAUS homepage`_)

Praat (`Praat homepage`_)
has a built-in aligner as well.
EasyAlign (`EasyAlign homepage`_)
is a Praat plug-in built to facilitate its use.




Contributors
------------

* Michael McAuliffe
* Michaela Socolof
* Sarah Mihuc
* Michael Wagner

Citation
--------

McAuliffe, Michael, Michaela Socolof, Sarah Mihuc, and Michael Wagner (2016).
Montreal Forced Aligner [Computer program]. Version 0.5,
retrieved 13 July 2016 from http://montrealcorpustools.github.io/Montreal-Forced-Aligner/.

Funding
-------

153 changes: 136 additions & 17 deletions docs/source/tutorial.rst
@@ -1,32 +1,151 @@
.. Montreal Forced Aligner documentation master file, created by
sphinx-quickstart on Wed Jun 15 13:27:38 2016.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
.. _tutorial:

.. _`LibriSpeech lexicon`: http://www.openslr.org/resources/11/librispeech-lexicon.txt

.. _`LibriSpeech corpus`: http://www.openslr.org/12/

.. _`CMU Pronouncing Dictionary`: http://www.speech.cs.cmu.edu/cgi-bin/cmudict

.. _`Prosodylab-aligner English dictionary`: https://github.com/prosodylab/Prosodylab-Aligner/blob/master/eng.dict

.. _`Prosodylab-aligner French dictionary`: https://github.com/prosodylab/prosodylab-alignermodels/blob/master/FrenchQuEu/fr-QuEu.dict

********
Tutorial
===================================================
********

There are two modes for the Montreal Forced Aligner:

1. Use a pretrained model to align a data set (``mfa_align``)

2. Align a data set using only that data set (``mfa_train_and_align``) and
optionally output the trained model for future use

The Montreal Forced Aligner supports two data formats:

1. Prosodylab-Aligner format (single channel sound files and corresponding orthographic
transcriptions in .lab files with speaker designations specified)

2. Textgrid format (mono/stereo sound files and corresponding TextGrids where
each speaker has a tier and each interval contains the orthographic
transcription)

Dictionaries
============

Dictionaries should be specified in the following format:

::

  WORDA PHONEA PHONEB
  WORDB PHONEB PHONEC

Each line consists of a word followed by its transcription, separated by white space.
Each phone should be separated by white space as well.
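
As an illustration of this format, the following minimal Python sketch reads
dictionary lines into a mapping from each word to its list of pronunciations.
The function name ``load_dictionary`` is hypothetical and not part of the
aligner; the sketch assumes only the whitespace-separated layout described above.

```python
from collections import defaultdict


def load_dictionary(lines):
    """Parse 'WORD PHONE PHONE ...' lines into {word: [[phone, ...], ...]}."""
    pronunciations = defaultdict(list)
    for line in lines:
        fields = line.split()  # split on any run of white space
        if len(fields) < 2:
            continue  # skip blank lines and words without a transcription
        word, phones = fields[0], fields[1:]
        # A word may appear on several lines, one per pronunciation variant.
        pronunciations[word].append(phones)
    return dict(pronunciations)


example = ["WORDA PHONEA PHONEB", "WORDB PHONEB PHONEC"]
dictionary = load_dictionary(example)
print(dictionary["WORDA"])
```

Storing a list of pronunciations per word is a design choice of this sketch, so
that repeated entries for the same word are kept rather than overwritten.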

A dictionary for English that has good coverage is the lexicon derived
from the LibriSpeech corpus (`LibriSpeech lexicon`_).
This lexicon uses the Arpabet transcription format (like the `CMU Pronouncing Dictionary`_).

There is an option when running the aligner for not using a dictionary (``--nodict``).
When run in this mode, the aligner will construct pronunciations for words
in the corpus based on their orthographies. In this mode, a dataset with an example transcription

::

  WORDA WORDB

for a sound file would have the following dictionary generated:

::

  WORDA W O R D A
  WORDB W O R D B
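
The mapping above can be sketched in a few lines of Python. This is only an
illustration of the idea (each character of a word becomes one "phone"); the
function name ``orthographic_dictionary`` is hypothetical, and how the aligner
itself treats case or punctuation in this mode is not covered here.

```python
def orthographic_dictionary(transcription):
    """Map each word to its characters, treating every character as a phone."""
    return {word: list(word) for word in transcription.split()}


# Print the entries in the same layout as a pronunciation dictionary.
for word, phones in orthographic_dictionary("WORDA WORDB").items():
    print(word, " ".join(phones))
```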

The Prosodylab-aligner has two preconstructed dictionaries as well, one
for English (`Prosodylab-aligner English dictionary`_)
and one for Quebec French (`Prosodylab-aligner French dictionary`_).

Data formats
============

Prosodylab-Aligner format
-------------------------

Things you need before you can align:

1. Every .wav sound file you are aligning must have a corresponding .lab file which contains the text transcription of that .wav file. The .wav and .lab files must have the same name. For example, if you have givrep_1027_2_1.wav, its transcription should be in givrep_1027_2_1.lab (which is just a text file with the .lab extension). If you have transcriptions in a tab-separated text file (or an Excel file which can be saved as one), you can generate .lab files from it using the relabel function of relabel_clean.py. The relabel_clean.py script is currently in the prosodylab.alignertools repository on GitHub.
1. Every .wav sound file you are aligning must have a corresponding .lab
file which contains the text transcription of that .wav file. The .wav and
.lab files must have the same name. For example, if you have ``givrep_1027_2_1.wav``,
its transcription should be in ``givrep_1027_2_1.lab`` (which is just a
text file with the .lab extension). If you have transcriptions in a
tab-separated text file (or an Excel file which can be saved as one),
you can generate .lab files from it using the relabel function of relabel_clean.py.
The relabel_clean.py script is currently in the prosodylab.alignertools repository on GitHub.

2. These .lab files do not have to be in the same case as the words in the dictionary
(i.e. all words are coerced to lower case), and punctuation is ignored.

3. You also need a pronunciation dictionary for the language you're
aligning. Our dictionaries for English and French are provided with
the old Prosodylab Aligner (French is in prosodylab.alignermodels).
You can also write your own dictionary or download others.
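
Before aligning, it can help to check the first requirement automatically. The
following Python sketch lists the .wav files that have no .lab file with the
same name. The function name ``missing_lab_files`` is hypothetical, and
searching subfolders recursively is an assumption of this sketch, not a
statement about how the aligner scans a corpus.

```python
from pathlib import Path


def missing_lab_files(corpus_dir):
    """Return the .wav files under corpus_dir that lack a same-named .lab file."""
    corpus = Path(corpus_dir)
    return sorted(
        wav for wav in corpus.rglob("*.wav")
        if not wav.with_suffix(".lab").exists()
    )


# Example usage (replace the path with your own corpus folder):
# for wav in missing_lab_files("/path/to/corpus"):
#     print("No transcription for", wav)
```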


2. These .lab files must be in the same format as the words in the dictionary (i.e. all capitalized for our dictionaries), and should ideally contain no punctuation. (The aligner deals with punctuation for you.) If your .lab files aren't in the correct format, you can use our relabel_clean.py script to clean your .lab files - this puts them into the correct format to work with our dictionaries.
TextGrid format
---------------

3. You also need a pronunciation dictionary for the language you're aligning. Our dictionaries for English and French are provided with the old Prosodylab Aligner (French is in prosodylab.alignermodels). You can also write your own dictionary or download others.


Running the aligner
===================

Align using pretrained models
-----------------------------

The Montreal Forced Aligner comes with pretrained models/dictionaries for:

- English - trained from the LibriSpeech data set (`LibriSpeech corpus`_)
- Quebec French

Steps to align:



Align using only the data set
-----------------------------

Steps to align:

1. Open a terminal and change directory to the ``montreal-forced-aligner`` folder.

2. type ./montreal-forced-aligner followed by the arguments described above in Usage. (On Mac/Unix, to save time typing out the path, you can drag a folder from Finder into Terminal and it will put the full path to that folder into your command.)
A template command:
./montreal-forced-aligner -s [#] [corpus-folder] [dictionary] [output-folder]
This command will train a new model and align the files in [corpus-folder] using the file [dictionary], and save the output TextGrids to [output-folder]. It will take the first [#] characters of the file name to be the speaker ID number.

An example command:
./montreal-forced-aligner -s 7 ~/2_French_training ~/French/fr-QuEu.dict ~/2_French_training -f -v
This command will train a new model and align the files in ~/2_French_training using the dictionary file ~/French/fr-QuEu.dict, and save the output TextGrids to ~/2_French_training. It will take the first 7 characters of the file name to be the speaker ID number. It will be fast (do half as many training iterations) and verbose (output more info to Terminal during training).
2. Type ``bin/mfa_train_and_align`` followed by the arguments described
above in Usage. (On Mac/Unix, to save time typing out the path, you
can drag a folder from Finder into Terminal and it will put the full
path to that folder into your command.)


A template command:

.. code-block:: bash

   bin/mfa_train_and_align -s [#] [corpus-folder] [dictionary] [output-folder]

This command will train a new model and align the files in [corpus-folder]
using the file [dictionary], and save the output TextGrids to [output-folder].
It will take the first [#] characters of the file name to be the speaker ID number.

An example command:

.. code-block:: bash

   bin/mfa_train_and_align -s 7 ~/2_French_training ~/French/fr-QuEu.dict ~/2_French_training -f -v

This command will train a new model and align the files in ``~/2_French_training``
using the dictionary file ``~/French/fr-QuEu.dict``, and save the output
TextGrids to ``~/2_French_training``. It will take the first 7 characters
of the file name to be the speaker ID number. It will be fast (do half
as many training iterations) and verbose (output more info to Terminal during training).

3. Once the aligner finishes, the resulting TextGrids will be in the
specified output directory. Training can take a couple of hours for large datasets.
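
The ``-s`` option described above takes the first characters of each file name
as the speaker ID. As a rough illustration of that rule (not the aligner's own
code; the function name ``speaker_id`` is hypothetical), the prefix can be
computed like this:

```python
from pathlib import Path


def speaker_id(file_name, num_characters):
    """Return the first num_characters characters of the file name, ignoring folders."""
    return Path(file_name).name[:num_characters]


# With ``-s 7``, the first seven characters of the name identify the speaker.
print(speaker_id("givrep_1027_2_1.wav", 7))
```

Checking a few file names this way before training can show whether the chosen
prefix length actually separates your speakers.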
9 changes: 9 additions & 0 deletions freezing/freeze.bat
@@ -0,0 +1,9 @@

pyinstaller --clean -y ^
--additional-hooks-dir=freezing/hooks ^
aligner/command_line/train_and_align.py

pyinstaller --clean -y ^
--additional-hooks-dir=freezing/hooks ^
aligner/command_line/align.py
