Replies: 6 comments
>>> rbewoor
[August 20, 2019, 11:03pm]
Hi All,
I am trying to train and use a model for English from scratch on version 0.5.1. My aim is to train two models, one with and one without a language model. I would appreciate your help on several fronts, please. Sorry this is long, but I am trying to be as detailed as possible; also, being new to Linux and data science, I may be stating some very obvious things.
Thank you in advance for your help.
Part A) My Questions
Part B) Background info
Regards,
Rohit
Part A) My Questions
A1) When using a language model, either for training or inference, do I HAVE to specify both the lm_binary parameter AND the corresponding trie file? Can specifying only the lm_binary or only the trie work?
A2) Say I train two models on the same data. The first model is trained with an LM specified (built using the KenLM library on the vocabulary of the training transcripts, with the lm_binary and trie parameters set). The second model is trained without any LM parameters. Later I use each of these models for inference. Can I choose to use or not use a language model at the inference stage? Can a different language model be used during inference, or should one use the same LM as in training? Are there things to note when choosing an alternative LM, e.g. training with a 3-gram model but using a 4-gram model during inference?
A3) I am facing a problem when I try to use a different LM from the one used during training. My model is trained on only 1k data points. The LM was built using the same 1k transcripts as vocabulary, and a 4-gram lm_binary and trie were specified during training.
Inference works but is, understandably, very poor. Console output:
> (dpsp5v051basic) rohit@DE-W-0246802:~/dpspCODE/v051/DeepSpeech$ deepspeech \
> --model /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb \
> --alphabet /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/alphabetDir/alphabet-Set5First1050.txt \
> --lm /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/lm/lm4gram/vocabulary-Set5First1050_4gram.klm \
> --trie /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/trie/trie4gram/Set5First1050_4gram.trie \
> --audio /home/rohit/dpspTraining/data/wavFiles/wav33/test/File28.wav
> Loading model from file /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb
> TensorFlow: v1.13.1-10-g3e0cc53
> DeepSpeech: v0.5.1-0-g4b29b78
> Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
> 2019-08-01 16:11:02.155443: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
> 2019-08-01 16:11:02.179690: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
> 2019-08-01 16:11:02.179740: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
> 2019-08-01 16:11:02.179756: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
> 2019-08-01 16:11:02.179891: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
> Loaded model in 0.0283s.
> Loading language model from files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/lm/lm4gram/vocabulary-Set5First1050_4gram.klm /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/trie/trie4gram/Set5First1050_4gram.trie
> Loaded language model in 0.068s.
> Running inference.
> a on a in a is the
> Inference took 0.449s for 3.041s audio file.
Now I want to use an LM created from a larger vocabulary file of about 600k data points (transcripts), which in this case does include the 1k wav files used as training data. This comes from the validated.tsv file of the CommonVoice2 corpus. I have double-checked that the alphabet.txt for the first-1k vocabulary and the larger 600k vocabulary are identical. I have also created the lm_binary and trie files (allValidated_o4gram.klm, allValidated_o4gram.trie) as 4-gram versions, so the basic specs of this LM match the one used for training.
But when using the larger LM during inference I get the error 'Error: Trie file version mismatch (4 instead of expected 3). Update your trie file.' Is it still loading the larger LM? Did DeepSpeech actually pick it up and apply it correctly? How do I fix this error, please?
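As a sanity check on the claim that the two alphabet files are identical, here is a small self-contained sketch (the file paths are hypothetical placeholders, and this is not a DeepSpeech utility) that reports any lines that differ between two alphabet files:

```python
# Compare two DeepSpeech-style alphabet files line by line.
# Paths passed in are placeholders for illustration only.

def alphabet_diff(path_a, path_b):
    """Return a list of (line_no, a_line, b_line) tuples that differ."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        lines_a = fa.read().splitlines()
        lines_b = fb.read().splitlines()
    diffs = []
    for i in range(max(len(lines_a), len(lines_b))):
        a = lines_a[i] if i < len(lines_a) else "<missing>"
        b = lines_b[i] if i < len(lines_b) else "<missing>"
        if a != b:
            diffs.append((i + 1, a, b))
    return diffs
```

An empty result means the two files match byte-for-line; any tuple pinpoints the first place the alphabets diverge.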
Console output:
> (dpsp5v051basic) rohit@DE-W-0246802:~/dpspCODE/v051/DeepSpeech$ deepspeech \
> --model /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb \
> --alphabet /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/alphabetDir/alphabet-Set5First1050.txt \
> --lm /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/lm/lm4gram/vocabulary-allValidated_o4gram.klm \
> --trie /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/trie/trie4gram/allValidated_o4gram.trie \
> --audio /home/rohit/dpspTraining/data/wavFiles/wav33/test/File28.wav
> Loading model from file /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb
> TensorFlow: v1.13.1-10-g3e0cc53
> DeepSpeech: v0.5.1-0-g4b29b78
> Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
> 2019-08-01 16:11:58.305524: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
> 2019-08-01 16:11:58.322902: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
> 2019-08-01 16:11:58.322945: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
> 2019-08-01 16:11:58.322956: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
> 2019-08-01 16:11:58.323063: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
> Loaded model in 0.0199s.
> Loading language model from files /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/lm/lm4gram/vocabulary-allValidated_o4gram.klm /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/trie/trie4gram/allValidated_o4gram.trie
> Error: Trie file version mismatch (4 instead of expected 3). Update your trie file.
> Loaded language model in 0.00368s.
> Running inference.
> an on o tn o as te tee
> Inference took 1.893s for 3.041s audio file.
Note that the input audio is the same File28.wav, but the output transcript varies with the different LMs:
> a on a in a is the (smaller LM used in training and inference) vs
> an on o tn o as te tee (different, larger LM used for inference only)
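To put a number on the difference between two decodes, a minimal word-error-rate sketch can help (plain word-level Levenshtein distance; a generic illustration, not a DeepSpeech utility):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Classic dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

Scoring each decode against the known reference transcript of File28.wav would give a single comparable number per LM configuration, rather than eyeballing the two outputs.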
Part B) Background
B1) Environment and data:
- Ubuntu 18.04 LTS, no GPU, 32 GB RAM, DeepSpeech v0.5.1 git repo, mid-June 2019.
- Pruned the dataset to 629731 entries.
- Split into train:dev:test and created csv files.
- Audio clips up to 10 seconds.
- Set up an Anaconda environment with DeepSpeech v0.5.1.
- Ran util/taskcluster.py to create the generate_trie executable and other required setup:
> python util/taskcluster.py --target .
> python util/taskcluster.py --decoder
B2) Language model related. Commands used to create the 4-gram version:
> ./lmplz -o 4 \
> --text /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/vocabDir/vocabulary-Set3First10k.txt \
> --arpa /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/vocabDir/vocabulary-Set3First10k_4gram.arpa
> ./build_binary \
> /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/vocabDir/vocabulary-Set3First10k_4gram.arpa \
> /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/lm/lm4gram/vocabulary-Set3First10k_4gram.klm
> /home/rohit/dpspCODE/v051/DeepSpeech/generate_trie \
> /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/alphabetDir/alphabet-Set3First10k.txt \
> /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/lm/lm4gram/vocabulary-Set3First10k_4gram.klm \
> /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/trie/trie4gram/set3First10k_4gram.trie
These lm_binary and trie files were then used for training.
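The vocabulary file passed to lmplz is simply one transcript per line. As an illustration, here is a minimal sketch for extracting it from a training CSV (the paths are placeholders; the `transcript` column name is an assumption based on the v0.5.x training CSV layout):

```python
import csv

def write_vocabulary(csv_path, vocab_path, column="transcript"):
    """Write one transcript per line, the format KenLM's lmplz expects."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(vocab_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            dst.write(row[column].strip() + "\n")
```

The resulting text file would then be fed to `lmplz --text ...` as in the commands above.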
B3) Commands to start model training (training still in progress):
B3a) Model without a language model:
> python3 -u DeepSpeech.py \
> --train_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/train.csv \
> --dev_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/dev.csv \
> --test_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/test.csv \
> --train_batch_size 1 \
> --dev_batch_size 1 \
> --test_batch_size 1 \
> --n_hidden 2048 \
> --epoch 20 \
> --dropout_rate 0.15 \
> --learning_rate 0.0001 \
> --export_dir /home/rohit/dpspTraining/models/v051/model5-validFirst10k-noLM/savedModel \
> --checkpoint_dir /home/rohit/dpspTraining/models/v051/model5-validFirst10k-noLM/checkpointDir \
> --alphabet_config_path /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/alphabetDir/alphabet-Set3First10k.txt
B3b) Model with a language model:
> python3 -u DeepSpeech.py \
> --train_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/train.csv \
> --dev_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/dev.csv \
> --test_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/test.csv \
> --train_batch_size 1 \
> --dev_batch_size 1 \
> --test_batch_size 1 \
> --n_hidden 2048 \
> --epoch 20 \
> --dropout_rate 0.15 \
> --learning_rate 0.0001 \
> --export_dir /home/rohit/dpspTraining/models/v051/model6-validFirst10k-yesLM-4gram/savedModel \
> --checkpoint_dir /home/rohit/dpspTraining/models/v051/model6-validFirst10k-yesLM-4gram/checkpointDir \
> --decoder_library_path /home/rohit/dpspCODE/v051/DeepSpeech/native_client/libctc_decoder_with_kenlm.so \
> --alphabet_config_path /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/alphabetDir/alphabet-Set3First10k.txt \
> --lm_binary_path /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/lm/lm4gram/vocabulary-Set3First10k_4gram.klm \
> --lm_trie_path /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/trie/trie4gram/set3First10k_4gram.trie
Thank you for your time! Regards.
[This is an archived 'Using Deep Speech' discussion thread from discourse.mozilla.org/t/language-model-during-training-effect]