Documentation WIP

MontrealCorpusTools · Jul 4, 2016 · 2afdcd0
1 parent ddd4f81
commit 2afdcd0
Showing 9 changed files with 329 additions and 32 deletions.
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -129,7 +129,9 @@
 # further.  For a list of options available for each theme, see the
 # documentation.
 #
-# html_theme_options = {}
+html_theme_options = {
+    'page_width': 'auto',
+}
 
 # Add any paths that contain custom themes here, relative to this directory.
 # html_theme_path = []

diff --git a/docs/source/example.rst b/docs/source/example.rst
@@ -0,0 +1,40 @@
+.. _example:
+
+.. _`LibriSpeech lexicon`: http://www.openslr.org/resources/11/librispeech-lexicon.txt
+
+.. _`LibriSpeech data set`: https://www.dropbox.com/s/i08yunn7yqnbv0h/LibriSpeech.zip?dl=0
+
+*******
+Example
+*******
+
+This example for aligning the LibriSpeech test data set assumes that
+the Montreal Forced Aligner has been downloaded and works.
+
+Set up
+======
+
+1. Download the prepared LibriSpeech dataset (`LibriSpeech data set`_) and extract it somewhere on your computer.
+2. Download the LibriSpeech lexicon (`LibriSpeech lexicon`_) and save it somewhere on your computer.
+
+
+Alignment
+=========
+
+Aligning using pre-trained models
+---------------------------------
+
+Enter the following command into the terminal:
+
+.. code-block:: bash
+
+   bin/mfa_align --english /path/to/librispeech/dataset ~/Documents/aligned_librispeech
+
+Aligning through training
+-------------------------
+
+Enter the following command into the terminal:
+
+.. code-block:: bash
+
+   bin/mfa_train_and_align /path/to/librispeech/dataset /path/to/librispeech/lexicon ~/Documents/aligned_librispeech
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -11,6 +11,7 @@ Contents:
 .. toctree::
    :maxdepth: 2
 
+   introduction.rst
    installation.rst
    tutorial.rst
    commonerrors.rst

diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -1,12 +1,69 @@
-.. Montreal Forced Aligner documentation master file, created by
-   sphinx-quickstart on Wed Jun 15 13:27:38 2016.
-   You can adapt this file completely to your liking, but it should at least
-   contain the root `toctree` directive.
+.. _installation:
 
+.. _`Montreal Forced Aligner releases`: https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/releases
+
+.. _`Kaldi GitHub repository`: https://github.com/kaldi-asr/kaldi
+
+************
 Installation
-===================================================
+************
+
+All releases for the Montreal Forced Aligner are available on
+`Montreal Forced Aligner releases`_.
+
+Mac
+===
+
+1. Download the zip folder for Mac and unzip it anywhere
+2. Open a terminal window
+3. Navigate to the ``montreal-forced-aligner`` folder (``cd /path/to/montreal-forced-aligner``)
+4. Test the commands ``bin/mfa_align`` and ``bin/mfa_train_and_align``
+5. The above commands should print usage messages about the commands
+
+Windows
+=======
+
+1. Download the zip folder for Windows and unzip it anywhere
+2. Open a command window (Open the Start menu and search for ``cmd``)
+3. Navigate to the ``montreal-forced-aligner`` folder (``cd C:\path\to\montreal-forced-aligner``).
+   To copy the folder's path, hold Shift, right-click the folder, select
+   "Copy as path", and paste the result into the command prompt.
+4. Test the commands ``bin/mfa_align`` and ``bin/mfa_train_and_align``
+5. The above commands should print usage messages about the commands
+
+Linux
+=====
+
+The Linux distributions were built on Ubuntu 14.04, and so may not work on
+machines that have older versions of Linux system packages.  If these instructions
+do not work, then the executables will have to be built from source.
+
+1. Download the zip folder for Linux and unzip it anywhere
+2. Open a terminal window
+3. Navigate to the ``montreal-forced-aligner`` folder (``cd /path/to/montreal-forced-aligner``)
+4. Test the commands ``bin/mfa_align`` and ``bin/mfa_train_and_align``
+5. The above commands should print usage messages about the commands
+
+Building from source
+====================
+
+NB: These instructions require Python 3 (and you may have to replace
+instances of ``python`` and ``pip`` with ``python3`` and ``pip3`` if Python 3 is
+not your default Python) and assume Linux in the commands.
+
+1. Get Kaldi compiled and working: `Kaldi GitHub repository`_
+2. Download the source zip from the releases page.
+3. Open a terminal and go to the unzipped folder (``cd /path/to/Montreal-Forced-Aligner``)
+4. Run the ``thirdparty/kaldibinaries.py`` script and point it to where Kaldi was built (``python thirdparty/kaldibinaries.py /path/to/kaldi/root``)
+5. Run ``pip install -r requirements.txt`` to install the requirements for the aligner.
+6. Build the executables by running ``. freezing/freeze.sh``; this creates a ``montreal-forced-aligner`` folder in the ``dist/`` folder.
+7. This folder should contain two executables ``mfa_align`` and ``mfa_train_and_align`` that should be used for alignment.
 
-Download the montreal-forced-aligner folder, and save it wherever you want to run the aligner from.  Nothing else is required to be able to run the aligner.
+Files created when using the Montreal Forced Aligner
+====================================================
 
-The aligner will save data and logs for the models it trains in a new folder, Documents/MFA (which it creates).  If a model for a corpus folder already exists in MFA, it will use the existing model if you try to align it again.  (If this is not desired, delete or move the old model folder.)
+The aligner will save data and logs for the models it trains in a new folder,
+``Documents/MFA`` (which it creates in your user's home directory).  If a model for a corpus already
+exists in ``Documents/MFA``, the aligner will reuse it when you align that corpus again.
+(If this is not desired, delete or move the old model folder.)
 
diff --git a/docs/source/introduction.rst b/docs/source/introduction.rst
@@ -0,0 +1,73 @@
+.. _introduction:
+
+.. _`Kaldi homepage`: http://kaldi-asr.org/
+
+.. _`HTK homepage`: http://htk.eng.cam.ac.uk/
+
+.. _`Prosodylab-aligner homepage`: http://prosodylab.org/tools/aligner/
+
+.. _`P2FA homepage`: https://www.ling.upenn.edu/phonetics/old_website_2015/p2fa/
+
+.. _`FAVE-align homepage`: http://fave.ling.upenn.edu/FAAValign.html
+
+.. _`MAUS homepage`: http://www.bas.uni-muenchen.de/Bas/BasMAUS.html
+
+.. _`Praat homepage`: http://www.fon.hum.uva.nl/praat/
+
+.. _`EasyAlign homepage`: http://latlcui.unige.ch/phonetique/easyalign.php
+
+Introduction
+============
+
+What is forced alignment?
+-------------------------
+
+Forced alignment is a technique to take an orthographic transcription of
+an audio file and generate a time-aligned version using a pronunciation
+dictionary to look up phones for words.
+
+Underlying technology
+---------------------
+
+The Montreal Forced Aligner uses the Kaldi ASR toolkit
+(`Kaldi homepage`_) to perform forced alignment.
+Kaldi is under active development and includes state-of-the-art algorithms for tasks
+in automatic speech recognition beyond forced alignment.
+
+Relation to other forced alignment tools
+----------------------------------------
+
+Most tools for forced alignment used by linguists rely on the HMM Toolkit
+(HTK; `HTK homepage`_), including:
+
+* Prosodylab-aligner (`Prosodylab-aligner homepage`_)
+* Penn Phonetics Forced Aligner (P2FA, `P2FA homepage`_)
+* FAVE-align (`FAVE-align homepage`_)
+* (Web) MAUS (`MAUS homepage`_)
+
+Praat (`Praat homepage`_)
+has a built-in aligner as well.
+EasyAlign (`EasyAlign homepage`_)
+is a Praat plug-in built to facilitate its use.
+
+
+
+
+Contributors
+------------
+
+* Michael McAuliffe
+* Michaela Socolof
+* Sarah Mihuc
+* Michael Wagner
+
+Citation
+--------
+
+McAuliffe, Michael, Michaela Socolof, Sarah Mihuc, and Michael Wagner (2016).
+Montreal Forced Aligner [Computer program]. Version 0.5,
+retrieved 13 July 2016 from http://montrealcorpustools.github.io/Montreal-Forced-Aligner/.
+
+Funding
+-------
+
diff --git a/docs/source/tutorial.rst b/docs/source/tutorial.rst
@@ -1,32 +1,151 @@
-.. Montreal Forced Aligner documentation master file, created by
-   sphinx-quickstart on Wed Jun 15 13:27:38 2016.
-   You can adapt this file completely to your liking, but it should at least
-   contain the root `toctree` directive.
+.. _tutorial:
 
+.. _`LibriSpeech lexicon`: http://www.openslr.org/resources/11/librispeech-lexicon.txt
+
+.. _`LibriSpeech corpus`: http://www.openslr.org/12/
+
+.. _`CMU Pronouncing Dictionary`: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
+
+.. _`Prosodylab-aligner English dictionary`: https://github.com/prosodylab/Prosodylab-Aligner/blob/master/eng.dict
+
+.. _`Prosodylab-aligner French dictionary`: https://github.com/prosodylab/prosodylab-alignermodels/blob/master/FrenchQuEu/fr-QuEu.dict
+
+********
 Tutorial
-===================================================
+********
+
+There are two modes for the Montreal Forced Aligner:
+
+1. Use a pretrained model to align a data set (``mfa_align``)
+
+2. Align a data set using only that data set (``mfa_train_and_align``) and
+   optionally output the trained model for future use
+
+The Montreal Forced Aligner supports two data formats:
+
+1. Prosodylab-Aligner format (single channel sound files and corresponding orthographic
+   transcriptions in .lab files with speaker designations specified)
+
+2. TextGrid format (mono/stereo sound files and corresponding TextGrids where
+   each speaker has a tier and each interval contains the orthographic
+   transcription)
+
+Dictionaries
+============
+
+Dictionaries should be specified in the following format:
+
+::
+
+  WORDA PHONEA PHONEB
+  WORDB PHONEB PHONEC
+
+Each line is a word followed by its transcription, separated by white space.
+Each phone should be separated by white space as well.
+
+A dictionary for English that has good coverage is the lexicon derived
+from the LibriSpeech corpus (`LibriSpeech lexicon`_).
+This lexicon uses the Arpabet transcription format (like the `CMU Pronouncing Dictionary`_).
+
+There is an option when running the aligner for not using a dictionary (``--nodict``).
+When run in this mode, the aligner will construct pronunciations for words
+in the corpus based off their orthographies.  In this mode, a dataset with an example transcription
+
+::
+
+  WORDA WORDB
+
+for a sound file would have the following dictionary generated:
+
+::
+
+  WORDA W O R D A
+  WORDB W O R D B
+
+The Prosodylab-aligner has two preconstructed dictionaries as well, one
+for English (`Prosodylab-aligner English dictionary`_)
+and one for Quebec French (`Prosodylab-aligner French dictionary`_).
+
+Data formats
+============
+
+Prosodylab-Aligner format
+-------------------------
 
 Things you need before you can align:
 
-1. Every .wav sound file you are aligning must have a corresponding .lab file which contains the text transcription of that .wav file.  The .wav and .lab files must have the same name. For example, if you have givrep_1027_2_1.wav, its transcription should be in givrep_1027_2_1.lab (which is just a text file with the .lab extension). If you have transcriptions in a tab-separated text file (or an Excel file which can be saved as one), you can generate .lab files from it using the relabel function of relabel_clean.py. The relabel_clean.py script is currently in the prosodylab.alignertools repository on GitHub.
+1. Every .wav sound file you are aligning must have a corresponding .lab
+   file which contains the text transcription of that .wav file.  The .wav and
+   .lab files must have the same name. For example, if you have ``givrep_1027_2_1.wav``,
+   its transcription should be in ``givrep_1027_2_1.lab`` (which is just a
+   text file with the .lab extension). If you have transcriptions in a
+   tab-separated text file (or an Excel file which can be saved as one),
+   you can generate .lab files from it using the relabel function of relabel_clean.py.
+   The relabel_clean.py script is currently in the prosodylab.alignertools repository on GitHub.
+
+2. These .lab files do not have to be in the same case as the words in the dictionary
+   (i.e. all words are coerced to lower case), and punctuation is ignored.
+
+3. You also need a pronunciation dictionary for the language you're
+   aligning.  Our dictionaries for English and French are provided with
+   the old Prosodylab Aligner (French is in prosodylab.alignermodels).
+   You can also write your own dictionary or download others.
+
 
-2. These .lab files must be in the same format as the words in the dictionary (i.e. all capitalized for our dictionaries), and should ideally contain no punctuation.  (The aligner deals with punctuation for you.)  If your .lab files aren't in the correct format, you can use our relabel_clean.py script to clean your .lab files - this puts them into the correct format to work with our dictionaries.
+TextGrid format
+---------------
 
-3. You also need a pronunciation dictionary for the language you're aligning.  Our dictionaries for English and French are provided with the old Prosodylab Aligner (French is in prosodylab.alignermodels).  You can also write your own dictionary or download others.
+
+
+Running the aligner
+===================
+
+Align using pretrained models
+-----------------------------
+
+The Montreal Forced Aligner comes with pretrained models/dictionaries for:
+
+- English - trained from the LibriSpeech data set (`LibriSpeech corpus`_)
+- Quebec French
+
+Steps to align:
+
+
+
+Align using only the data set
+-----------------------------
 
 Steps to align:
 
 1. Open terminal, and change directory to montreal-forced-aligner.
 
-2. type ./montreal-forced-aligner followed by the arguments described above in Usage.  (On Mac/Unix, to save time typing out the path, you can drag a folder from Finder into Terminal and it will put the full path to that folder into your command.)
-    A template command:
-    ./montreal-forced-aligner -s [#] [corpus-folder] [dictionary] [output-folder]
-    This command will train a new model and align the files in [corpus-folder] using the file [dictionary], and save the output TextGrids to [output-folder].  It will take the first [#] characters of the file name to be the speaker ID number.
-
-    An example command: 
-    ./montreal-forced-aligner -s 7 ~/2_French_training ~/French/fr-QuEu.dict ~/2_French_training -f -v
-    This command will train a new model and align the files in ~/2_French_training using the dictionary file ~/French/fr-QuEu.dict, and save the output TextGrids to ~/2_French_training.  It will take the first 7 characters of the file name to be the speaker ID number.  It will be fast (do half as many training iterations) and verbose (output more info to Terminal during training).
+2. Type ``bin/mfa_train_and_align`` followed by the arguments described
+   above in Usage.  (On Mac/Unix, to save time typing out the path, you
+   can drag a folder from Finder into Terminal and it will put the full
+   path to that folder into your command.)
+
+
+A template command:
+
+.. code-block:: bash
+
+   bin/mfa_train_and_align -s [#] [corpus-folder] [dictionary] [output-folder]
+
+This command will train a new model and align the files in [corpus-folder]
+using the file [dictionary], and save the output TextGrids to [output-folder].
+It will take the first [#] characters of the file name to be the speaker ID number.
+
+An example command:
+
+.. code-block:: bash
 
-3. Once the aligner finishes, the resulting TextGrids will be in the specified output directory.  Training can take a couple hours for large datasets.
+   bin/mfa_train_and_align -s 7 ~/2_French_training ~/French/fr-QuEu.dict ~/2_French_training -f -v
 
+This command will train a new model and align the files in ``~/2_French_training``
+using the dictionary file ``~/French/fr-QuEu.dict``, and save the output
+TextGrids to ``~/2_French_training``.  It will take the first 7 characters
+of the file name to be the speaker ID number.  It will be fast (do half
+as many training iterations) and verbose (output more info to Terminal during training).
 
+3. Once the aligner finishes, the resulting TextGrids will be in the
+   specified output directory.  Training can take a couple of hours for large datasets.
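
The dictionary format and the ``--nodict`` behaviour documented in ``tutorial.rst`` above can be illustrated with a minimal Python sketch. It is not part of this commit or of the aligner's code: the function names ``parse_dictionary``, ``orthographic_dictionary`` and ``format_dictionary`` are hypothetical, and the sketch assumes that in ``--nodict`` mode each character of a word is treated as one "phone", as in the ``WORDA W O R D A`` example, with case left unchanged.

```python
def parse_dictionary(lines):
    """Map each word to a list of pronunciations (each a list of phones).

    Assumes the documented layout: a word, then its phones, all
    separated by white space. Blank lines are skipped.
    """
    pronunciations = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        word, phones = fields[0], fields[1:]
        # A word may appear on several lines with different pronunciations.
        pronunciations.setdefault(word, []).append(phones)
    return pronunciations


def orthographic_dictionary(transcription):
    """Spell each word out one character per 'phone' (hypothetical --nodict analogue)."""
    return {word: list(word) for word in transcription.split()}


def format_dictionary(entries):
    """Render {word: phones} back into the one-entry-per-line text format."""
    return "\n".join(
        word + " " + " ".join(phones) for word, phones in entries.items()
    )


if __name__ == "__main__":
    print(parse_dictionary(["WORDA PHONEA PHONEB", "WORDB PHONEB PHONEC"]))
    print(format_dictionary(orthographic_dictionary("WORDA WORDB")))
```

Running the script prints the parsed entries for the two-line example dictionary, then the dictionary generated from the transcription ``WORDA WORDB``. How the aligner itself handles case, punctuation, or repeated words is not shown in this diff, so the sketch leaves those aside.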
diff --git a/freezing/freeze.bat b/freezing/freeze.bat
@@ -0,0 +1,9 @@
+
+pyinstaller --clean -y ^
+--additional-hooks-dir=freezing/hooks ^
+aligner/command_line/train_and_align.py
+
+pyinstaller --clean -y ^
+--additional-hooks-dir=freezing/hooks ^
+aligner/command_line/align.py
+