This repository has been archived by the owner on Feb 9, 2023. It is now read-only.
My primary goal was slightly different: I just wanted to provide a good, open-source Polish ASR. I experimented with Mozilla DeepSpeech, Kaldi, etc. There are several attempts out there, but well... they are overcomplicated and too specific for further research. I decided to build this little package from scratch.
OK, and back to the question. To make this package more general, I had to adjust my aim and provide an English model. I plan to train a model from scratch, but for now I adapted the English model from the Seq2Seq repository (the NVIDIA documentation is here, the configuration file with detailed information is here, and my model adaptation file is here; it should be compatible with what we have in this repo).
I do not want to get stuck with CTC-based models. In the coming months, I will build the second version of this package, where I will introduce a Transformer-based English ASR (I am quite fascinated by NLP in general; check out my new repo: Aspect Based Sentiment Analysis).
PS. The presented result is for the greedy decoder. In my opinion, the sophisticated decoding algorithms are old-fashioned and crude... aren't they? ;)
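For context, greedy CTC decoding just takes the most likely token at each frame, collapses consecutive repeats, and drops blanks. Here is a minimal sketch (the function name and the assumption that index 0 is the CTC blank are mine, not part of this repo):

```python
import numpy as np

def ctc_greedy_decode(log_probs: np.ndarray, blank: int = 0) -> list:
    """Greedy (best-path) CTC decoding.

    Assumes `log_probs` has shape (time, vocab) and that `blank`
    is the index of the CTC blank token (0 here, by assumption).
    """
    # Pick the most likely token at every time step.
    best_path = np.argmax(log_probs, axis=1)
    decoded = []
    previous = blank
    for token in best_path:
        # Collapse repeated tokens and skip blanks.
        if token != previous and token != blank:
            decoded.append(int(token))
        previous = token
    return decoded
```

For example, a frame-wise best path of `[1, 1, blank, 2, 2]` collapses to `[1, 2]`. A beam-search decoder with a language model would rescore alternative paths instead, which is what the "sophisticated" algorithms above refer to.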
Hi @rolczynski
I am experimenting with your code and would like to know how to reproduce the benchmark results from the table.
Is it the pipeline from the README, with 25 epochs and a batch size of 32? How many GPUs did you use (4x8, I guess)?
The dataset should be the full LibriSpeech, right?
Was data augmentation used?
Does the code support decoding on the whole dev-clean subset?