Add Phoneme labels and timestamps - take two #1377

madhephaestus · 2023-06-01T01:19:59Z

The first PR seems to have died. rutujaubale made the original effort to add the feature, and Nathravorn fixed the build in their branch. I am now opening a new PR to get this feature merged.

This PR replaces #528 and closes #687 with a solution.

Shallowmallow · 2023-07-20T14:14:31Z

Really nice. But it doesn't seem to work when you use alternatives? It would be really cool if it did :)

tobiasalanboyd · 2024-10-09T16:12:38Z

Hello! I am trying to make a version of test_microphone.py that recognizes phonemes rather than words or sentences. However, I am struggling to figure out what the Python equivalent of this call from test_phone_results.c would be:
vosk_recognizer_set_result_options(recognizer, "phones");
I thought it might be
SetResultOptions(rec, "phones")
but when I add this line I get a message that SetResultOptions is not defined.
Apologies if the answer is obvious; I am new to working with this type of code. Thank you in advance!

madhephaestus · 2024-10-09T17:39:50Z

It looks like the method would be SetResultOptions(self, options), so there is no need to pass in the recognizer instance: in the Python API it seems to be held as a private attribute of the object rather than passed as a parameter.
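To illustrate the difference between a bare function call and a method call, here is a minimal sketch. `FakeRecognizer` is a hypothetical stand-in written for this example, not the real vosk class; it only mirrors the `SetResultOptions(self, options)` shape described above.

```python
class FakeRecognizer:
    """Hypothetical stand-in for KaldiRecognizer, used only to show the call shape."""

    def __init__(self):
        # Assumption: the real binding keeps its native handle in a private attribute.
        self._handle = object()
        self.options = None

    def SetResultOptions(self, options):
        # A real binding would forward the handle and options to the C library here.
        self.options = options


rec = FakeRecognizer()

# Method call: Python supplies `self`, so only the options string is passed.
rec.SetResultOptions("phones")

# Bare call: there is no module-level function with this name, so Python
# raises NameError, which matches the "not defined" message.
try:
    SetResultOptions(rec, "phones")
    bare_call_error = None
except NameError as exc:
    bare_call_error = str(exc)
```

If the method call itself fails with an AttributeError on the real class, the installed package most likely does not include the method.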

tobiasalanboyd · 2024-10-09T18:03:44Z

Thanks for getting back to me! I have tried pretty much every variation on the above that I can think of, and am not sure if the issue is due to me being new to Python or if there's something else happening here.
All of the below examples were inserted below this line in my copy of test_microphone.py:
rec = KaldiRecognizer(model, args.samplerate)
Examples of what I have tried adding so far with no success:
SetResultOptions("phones")
SetResultOptions(rec, "phones")
SetResultOptions(rec._handle, "phones")
rec.SetResultOptions("phones")
rec._handle.SetResultOptions("phones")
rec.SetResultOptions(rec, "phones")
rec.SetResultOptions(rec._handle, "phones")
rec.SetResultOptions()
rec._handle.SetResultOptions()

If this is helpful to know, I am running the program in CMD with the following command:
C:\Users\myusername\vosk-api\python\example>py .\test_microphone_phon.py

EDIT: Before realizing that regular Vosk would not provide individual phonemes, I installed it via PyPI. Is it possible this is contributing to the difficulties?
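One way to test whether the installed package is the cause is to check if the `vosk` module that Python actually imports exposes the method at all. The sketch below assumes only that the package is importable as `vosk` and that the class is named `KaldiRecognizer`, as in the example script; `has_result_options` is a hypothetical helper name.

```python
import importlib


def has_result_options(module_name="vosk"):
    """Return True if module_name imports and its KaldiRecognizer has SetResultOptions."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module could not be imported in this interpreter.
        return False
    recognizer_cls = getattr(module, "KaldiRecognizer", None)
    # hasattr on None is simply False, so a missing class is handled too.
    return hasattr(recognizer_cls, "SetResultOptions")
```

Calling `has_result_options()` from the same interpreter used to run the script (here, `py`) and getting False would suggest that the imported copy predates this PR's Python binding. Printing `vosk.__file__` shows which installed copy is being loaded.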

321Proteus · 2024-12-03T20:24:06Z

Hello, I'm currently trying to build your version of Vosk, but I keep getting the same error as in #1082:

recognizer.cc: In member function ‘const char* Recognizer::PartialResult()’:
recognizer.cc:855:13: error: ‘WordAlignLatticePartial’ was not declared in this scope
  855 |             WordAlignLatticePartial(clat, *model_->trans_model_, *model_->winfo_, 0, &aligned_lat);
      |             ^~~~~~~~~~~~~~~~~~~~~~~

I'm using the AlphaCephei branch of Kaldi (with OpenFST 1.7.2, tried also 1.8.3 from Kaldi-ASR with the same result). Any idea what's going on?

nshmyrev · 2024-12-04T20:44:17Z

> I'm using the AlphaCephei branch of Kaldi

WordAlignLatticePartial is there. You are probably using an old version. Please recheck.

321Proteus · 2024-12-07T09:24:13Z

OK, I did it. I redownloaded the Dockerfile and ran it on my machine instead of building everything locally (normally I'd just build Kaldi and OpenFST using Docker, then copy them to local and build Vosk from there). Now everything compiles just fine!

rutujaubale and others added 13 commits November 8, 2021 13:23

Added script to compute phoneme labels and timestamps

df7b238

Removed some debugging changes

4f1cbbb

Added handling for sizeable silences without phone outputs and option to disable MBR to generate phone outputs

fb0d25d

Added an example script of how to set the recognition result options to output phone results

edf0dca

Renamed vosk_recognizer_set_result_opts to vosk_recognizer_set_result_options for better consistency

7c0b55b

Updated Makefile to include compiling test_phone_results.c

9f438fe

Merge phoneInfo fork

83ec83b

Include lat/lattice-functions-transition-model.h in recognizer.h

b118e2e

Remove obsolete KaldiRecognizer left over from merge

2075b2e

Add SetResultOptions to python bindings

42bd029

Change result_opts_ from cstring to std::string

91425f6

Fix result_opts_ not being taken into account when computing recognizer result

7ddf77d

Update README.md

ad3772a
