Add aligner #756

Open
wants to merge 39 commits into
base: master
580067b
add aligner
vadimdddd Nov 9, 2021
2689046
deleted unused method in transcription.py
vadimdddd Nov 10, 2021
f5b6aa8
deleted unused class in forced_aligner, reworked method unalign, chan…
vadimdddd Nov 10, 2021
199a0ae
fix name mistake nof to not
vadimdddd Nov 10, 2021
a63af1d
rename script and class language_mode to recognizer/recognizer
vadimdddd Nov 10, 2021
ee14305
rename value language_model to recognizer, deleted value duration
vadimdddd Nov 10, 2021
7dc3906
changed transcriber.py fixed frames duration (add mul by 2), deleted …
vadimdddd Nov 11, 2021
c8d4ac2
replaced gentle lucier wav/txt examples to new glorious wav/txt examples
vadimdddd Nov 11, 2021
b0d5464
fixed mistakes in textfile
vadimdddd Nov 17, 2021
0319500
diff_align(comment mistakes fix), transcription(changed type of SUCCE…
vadimdddd Nov 19, 2021
8e4ccc6
fixed comment mistakes, changed w.case type str to int, changed algor…
vadimdddd Nov 19, 2021
76414d4
deleted unused values
vadimdddd Nov 29, 2021
cd733ce
tests was added, 4 examples cats, dagon, polar, wendy with chaotic mi…
vadimdddd Dec 7, 2021
9d9b574
diff_align.py: exception IndexError was added as case of different nu…
vadimdddd Dec 7, 2021
e6ef514
deleted unused script full_transcriber.py; in multipass.py: reserve_w…
vadimdddd Dec 8, 2021
0d988ca
replaced recognize to recognizermethod name
vadimdddd Dec 8, 2021
1407ba2
changed script name recognizer to text_processor, changed method name…
vadimdddd Dec 15, 2021
9ebb0dd
delete unused line
vadimdddd Jan 20, 2022
2c9458e
fixed exception mistake
vadimdddd Jan 20, 2022
0444439
fixed commit "delete unused line", changed line with call main function
vadimdddd Jan 26, 2022
b4962f7
deleted unused module
vadimdddd Feb 3, 2022
1d5f66d
add_align.py was added into bin script with setup.py machinery
vadimdddd Apr 5, 2022
8bfcf3f
changed back setup.py verison of vosk
vadimdddd Apr 18, 2022
14a8f75
vosk_align.py: changed args for main(); diff_align.py: changed 8 to 4…
vadimdddd May 25, 2022
af13309
replaced example, scripts, test folders; fixed algorithm bugs in forc…
vadimdddd Jun 2, 2022
f8cbdb9
deleted model
vadimdddd Jun 2, 2022
3babab0
Merge branch 'master' into add_aligner
vadimdddd Jun 2, 2022
5e5309b
added __init__.py, changed vosk_align.py script structure
vadimdddd Jun 30, 2022
14bbd3d
merge branches
vadimdddd Jul 1, 2022
efdfd85
Merge branch 'alphacep:master' into add_aligner
vadimdddd Jul 4, 2022
bec083b
delete init
vadimdddd Jul 4, 2022
6b688c6
add init
vadimdddd Jul 6, 2022
66d0808
Merge branch 'alphacep:master' into master
vadimdddd Jul 6, 2022
a1ec8b7
Merge branch 'master' into add_aligner
vadimdddd Jul 6, 2022
bdde7af
Merge branch 'alphacep:master' into add_aligner
vadimdddd Jul 6, 2022
fc848c1
add init
vadimdddd Jul 6, 2022
4ef5dcb
fix setup.py
vadimdddd Jul 6, 2022
92e68eb
fix init
vadimdddd Jul 6, 2022
081f0ea
changed output file name
vadimdddd Jul 13, 2022
2 changes: 1 addition & 1 deletion python/setup.py
@@ -54,7 +54,7 @@ def get_tag(self):
     packages=setuptools.find_packages(),
     package_data = {'vosk': ['*.so', '*.dll', '*.dyld']},
     entry_points = {
-        'console_scripts': ['vosk-transcriber=vosk.transcriber.cli:main'],
+        'console_scripts': ['vosk-aligner=vosk.aligner.vosk_align:main', 'vosk-transcriber=vosk.transcriber.cli:main'],
     },
     include_package_data=True,
     classifiers=[
Empty file added python/vosk/aligner/__init__.py
Empty file.
1 change: 1 addition & 0 deletions python/vosk/aligner/examples/cats.txt
@@ -0,0 +1 @@
There was in this singular caravan little boy with no father or mother, but only a tiny kitten to cherish. The plague had not to him, yet had left him furry thing to mitigate his sorrow; and when one, one can find great relief in the lively antics of. So the boy whom the dark people called than he wept as he sat playing with his steps of an oddly painted wagon.
Binary file added python/vosk/aligner/examples/cats.wav
Binary file not shown.
1 change: 1 addition & 0 deletions python/vosk/aligner/examples/dagon.txt
@@ -0,0 +1 @@
I am writing this under an supernatural mental strain, since by tonight I shall so no more. Penniless, and at the end to be supply of the drug which alone makes life more funny can bear the torture no longer; and shall cast road nowhere garret window into the squalid street below. Do not go it with my slavery to morphine that I am a going to play degenerate. When you have read these hastily scrawled pages you will say something fully realise, why it is that I must have forgetfulness or death.
Binary file added python/vosk/aligner/examples/dagon.wav
Binary file not shown.
1 change: 1 addition & 0 deletions python/vosk/aligner/examples/glorious.txt
@@ -0,0 +1 @@
The drop is short ends with terrible force again my body moves within the fluids that protected. I recall similar drop from my other life when I was a warrior of flash and blood then the blow of landing jar every bone in my body. Now I am from the most. Numbed to it. I am distant from every sensation and move as if in a dream. Only the pain is constant curled around me in my tomb intimately embracing my shattered body. The doors blow upwards pale light falls across invictus's metal hall. Ahead of me is an ugly orc fortress an asteroid landed directly on the surface of the world. The land here is dry but not the driest. Sub savannah. Low thorny trees and gray grass old parched. A lush landscape by standard. All is caked with ash. The season of fire has recently drawn to it close the weather is calming not that you guess it. The season of Shadows has began. It is my task to aid the rocks colizeum. Worthy task. Battle rages already I stride into it with great joy in my heart praise be. Praise be. Drop pods fall from the sky all around me igniting the scrubby vegetation with that breaking jets. I am one of the first the spearhead of the ash waste crusade second group, praise be. Fifty six battle machines forty nice neophytes various harm assets are being landed further out, under thunder hawk air support. All this and other information scrolls along the edges of my sensorium. Bright flashes and war lighting show through the ash trained sky the void crusade embattled in orbit as above so below.
Binary file added python/vosk/aligner/examples/glorious.wav
Binary file not shown.
2 changes: 2 additions & 0 deletions python/vosk/aligner/examples/polar.txt
@@ -0,0 +1,2 @@
Upon my memory was gravens the vision of the city, and within frem had arisen another and vaguer recollection, of whose nature I was not then certain. Thereafters, uftz cloudy nights when I astfer sleep, I saw the city often booled under that bluue aspiredz moon, and sometimes sunders the hot bitzf rays of a sun which did not set, but which spolus low hepfe the horizon. And on the clear nights the Pole Star leered as never before.

Binary file added python/vosk/aligner/examples/polar.wav
Binary file not shown.
2 changes: 2 additions & 0 deletions python/vosk/aligner/scripts/__init__.py
@@ -0,0 +1,2 @@
from .forced_aligner import ForcedAligner
from .transcription import Transcription
96 changes: 96 additions & 0 deletions python/vosk/aligner/scripts/diff_align.py
@@ -0,0 +1,96 @@
import difflib
import numpy
import sys

from . import transcription
# TODO(maxhawkins): try using the (apparently-superior) time-mediated dynamic
# programming algorithm used in sclite's alignment process:
#  http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm#time-mediated
def align(alignment, ms):
    '''Use the diff algorithm to align the raw tokens recognized by Kaldi
    to the words in the transcript (tokenized by MetaSentence as ms).

    The output combines information about the timing and alignment of
    correctly-aligned words as well as words that Kaldi failed to recognize
    and extra words not found in the original transcript.
    '''
    conf = [X['conf'] for X in alignment]
    start = [X['start'] for X in alignment]
    end = [X['end'] for X in alignment]
    duration = list(numpy.around([end[X]-start[X] for X in range(len(end))], 2))
    hypothesis = [X['word'] for X in alignment]
    reference = ms.get_kaldi_sequence()
    display_seq = ms.get_display_sequence()
    txt_offsets = ms.get_text_offsets()
    out = []

    for op, a, b in word_diff(hypothesis, reference):
        try:
            display_word = display_seq[b] # index
        except IndexError:
            print('Please compare your txt and wav files: the txt file probably contains more words than the wav file.', file=sys.stderr)
            sys.exit(1)
        start_offset, end_offset = txt_offsets[b]
        if op == 'equal':
            hyp_word = hypothesis[a]
            hyp_token = alignment[a]
            out.append(transcription.Word(
                case=transcription.Word.SUCCESS,
                startOffset=start_offset,
                endOffset=end_offset,
                word=display_word,
                alignedWord=hyp_word,
                realign=False,
                conf=conf[a],
                start=start[a],
                end=end[a],
                duration=duration[a]))
        elif op == 'replace': # 'insert' and 'delete' ops are not handled here
            if reference[b] == '<unk>':
                out.append(transcription.Word(
                    case=transcription.Word.NOT_FOUND_IN_TRANSCRIPT,
                    startOffset=start_offset,
                    endOffset=end_offset,
                    word=display_word,
                    realign=False))
            else:
                out.append(transcription.Word(
                    case=transcription.Word.NOT_FOUND_IN_AUDIO,
                    startOffset=start_offset,
                    endOffset=end_offset,
                    word=display_word,
                    realign=True))
    return out

def word_diff(a, b):
    '''Like difflib.SequenceMatcher but it only compares one word
    at a time. Returns an iterator whose elements are like
    (operation, index in a, index in b)
    '''
    matcher = difflib.SequenceMatcher(a=a, b=b)
    for op, a_idx, _, b_idx, _ in by_word(matcher.get_opcodes()):
        yield (op, a_idx, b_idx)

def by_word(opcodes):
    '''Take difflib.SequenceMatcher.get_opcodes() output and
    return an equivalent opcode sequence that only modifies
    one word at a time
    '''
    for op, s1, e1, s2, e2 in opcodes:
        if op == 'delete':
            for i in range(s1, e1):
                yield (op, i, i+1, s2, s2)
        elif op == 'insert':
            for i in range(s2, e2):
                yield (op, s1, s1, i, i+1)
        else:
            len1 = e1-s1
            len2 = e2-s2
            for i1, i2 in zip(range(s1, e1), range(s2, e2)):
                yield (op, i1, i1 + 1, i2, i2 + 1)
            if len1 > len2:
                for i in range(s1 + len2, e1):
                    yield ('delete', i, i+1, e2, e2)
            if len2 > len1:
                for i in range(s2 + len1, e2):
                    yield ('insert', s1, s1, i, i+1)
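To make the word-by-word diff in diff_align.py easier to follow, here is a minimal standalone sketch of the same idea. The names `split_by_word` and `word_ops` and the sample word lists are made up for illustration; only `difflib.SequenceMatcher` and its `get_opcodes()` come from the standard library.

```python
import difflib

def split_by_word(opcodes):
    """Expand SequenceMatcher opcodes so that each step touches one word.

    Yields (operation, index in hypothesis, index in reference).
    """
    for op, s1, e1, s2, e2 in opcodes:
        if op == "delete":
            for i in range(s1, e1):
                yield (op, i, s2)
        elif op == "insert":
            for j in range(s2, e2):
                yield (op, s1, j)
        else:  # "equal" or "replace"
            paired = min(e1 - s1, e2 - s2)
            for k in range(paired):
                yield (op, s1 + k, s2 + k)
            # Leftover words on the longer side become deletes or inserts.
            for i in range(s1 + paired, e1):
                yield ("delete", i, e2)
            for j in range(s2 + paired, e2):
                yield ("insert", s1, j)

def word_ops(hypothesis, reference):
    """Diff two word lists and return one operation per word."""
    matcher = difflib.SequenceMatcher(a=hypothesis, b=reference, autojunk=False)
    return list(split_by_word(matcher.get_opcodes()))

# Made-up data: the recognizer missed the second "the".
hypothesis = ["the", "cat", "sat", "on", "mat"]
reference = ["the", "cat", "sat", "on", "the", "mat"]
for op, a, b in word_ops(hypothesis, reference):
    print(op, a, b)
# equal 0 0
# equal 1 1
# equal 2 2
# equal 3 3
# insert 4 4
# equal 4 5
```

The `insert` step is a transcript word with no counterpart in the recognizer output, and a `delete` step is the reverse case. A `delete` at the very end of the sequences carries a reference index equal to `len(reference)`, so any lookup by that index has to be guarded.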
36 changes: 36 additions & 0 deletions python/vosk/aligner/scripts/forced_aligner.py
@@ -0,0 +1,36 @@
import logging as default_logging

from .diff_align import align
from .text_processor import text_processor
from .metasentence import MetaSentence as metasentence
from .multipass import realign
from .transcriber import Transcriber as transcriber
from .transcription import Transcription

class ForcedAligner():
    '''Top-level class of the aligner. It takes the transcript and the model
    as input, runs the alignment process (an align pass followed by a realign
    pass) and returns the result. The output word sequence carries full
    information about each word (status, timings, etc).
    '''
    def __init__(self, transcript, model):
        self.model = model
        self.transcript = transcript
        self.ms = metasentence(self.transcript, self.model)
        self.text = text_processor(self.transcript, self.model)

    def get_number_unsuccessful_words(self, align_words):
        NFIA = len([X for X in align_words if (X.not_found_in_audio())])
        NFIT = len([X for X in align_words if (X.not_found_in_transcript())])
        return NFIA + NFIT

    def transcribe(self, wavfile, progress_cb=None, logging=None):
        # Fall back to the standard logging module when no logger is passed.
        if logging is None:
            logging = default_logging
        words = transcriber.transcribe(self.text, wavfile)
        align_words = align(words, self.ms) # align
        unsuccessful_number = self.get_number_unsuccessful_words(align_words)
        logging.info("%d unaligned words (of %d)", unsuccessful_number, len(align_words))
        # Keep the first-pass result when there is nothing to realign.
        realign_words = align_words
        if unsuccessful_number != 0:
            realign_words = realign(align_words, self.ms, self.model, wavfile) # realign
            unsuccessful_number = self.get_number_unsuccessful_words(realign_words)
            logging.info("after 2nd pass: %d unaligned words (of %d)", unsuccessful_number, len(realign_words))
        return Transcription(words=realign_words, transcript=self.transcript)


55 changes: 55 additions & 0 deletions python/vosk/aligner/scripts/metasentence.py
@@ -0,0 +1,55 @@
# coding=utf-8
import re
OOV_TERM = '<unk>'

def kaldi_normalize(word, model):
    '''Take a token extracted from a transcript by MetaSentence and
    transform it to use the same format as Kaldi's vocabulary files.
    Normalizes fancy apostrophes and replaces out-of-vocabulary words
    with OOV_TERM. Uses the vosk_model_find_word method to check whether
    the given word is in the vosk vocabulary.
    '''
    norm = word.lower()
    # Turn fancy apostrophes into simpler apostrophes before the lookup,
    # so that the vocabulary check sees the normalized form
    norm = norm.replace("’", "'")
    status = model.vosk_model_find_word(str(norm))
    if len(norm) > 0 and status == -1:
        norm = OOV_TERM
    return norm

class MetaSentence:
    '''Maintain two parallel representations of a sentence: one for
    Kaldi's benefit, and the other in human-legible form.
    '''
    def __init__(self, transcript, model):
        self.raw_transcript = transcript
        self.model = model
        if isinstance(transcript, bytes):
            self.raw_transcript = transcript.decode('utf-8')
        self._tokenize()

    def _tokenize(self):
        self._seq = []
        for m in re.finditer(r'(\w|\’\w|\'\w)+', self.raw_transcript, re.UNICODE):
            start, end = m.span()
            word = m.group()
            token = kaldi_normalize(word, self.model)
            self._seq.append({
                "start": start, # as unicode codepoint offset
                "end": end, # as unicode codepoint offset
                "token": token,
            })

    def get_kaldi_sequence(self):
        return [x["token"] for x in self._seq]

    def get_display_sequence(self):
        display_sequence = []
        for x in self._seq:
            start, end = x["start"], x["end"]
            word = self.raw_transcript[start:end]
            display_sequence.append(word)
        return display_sequence

    def get_text_offsets(self):
        return [(x["start"], x["end"]) for x in self._seq]
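The two parallel representations kept by MetaSentence can be illustrated without a Vosk model. The sketch below is an assumption-laden stand-in: `VOCAB` is a made-up set used in place of the model's vocabulary lookup, and `tokenize` is a hypothetical helper that mirrors `_tokenize` and `kaldi_normalize`.

```python
import re

OOV_TERM = "<unk>"
# Hypothetical stand-in for the Vosk vocabulary; the real code asks the model.
VOCAB = {"the", "boy's", "kitten"}

def tokenize(transcript, vocab):
    """Return one dict per word with its text offsets and normalized token."""
    seq = []
    for m in re.finditer(r"(\w|’\w|'\w)+", transcript):
        # Lowercase and replace the fancy apostrophe before the lookup.
        norm = m.group().lower().replace("’", "'")
        token = norm if norm in vocab else OOV_TERM
        seq.append({"start": m.start(), "end": m.end(), "token": token})
    return seq

text = "The boy’s kitten, Menes."
seq = tokenize(text, VOCAB)

kaldi_sequence = [x["token"] for x in seq]
display_sequence = [text[x["start"]:x["end"]] for x in seq]
print(kaldi_sequence)    # ['the', "boy's", 'kitten', '<unk>']
print(display_sequence)  # ['The', 'boy’s', 'kitten', 'Menes']
```

Keeping the offsets rather than the words themselves is what lets the aligner map every result back to the exact span of the original transcript, including its capitalization and punctuation.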
95 changes: 95 additions & 0 deletions python/vosk/aligner/scripts/multipass.py
@@ -0,0 +1,95 @@
import logging
import wave
import sys

from . import metasentence
from . import text_processor
from . import diff_align
from . import transcription
from .transcriber import Transcriber as transcriber
'''This script is due to be reworked.
Multipass realignment of unaligned words.
prepare_multipass scans the word sequence; for every word whose case is
not-found-in-audio or not-found-in-transcript it prepares a chunk to realign
of the form [words before, unaligned words, words after]. Each chunk is then
processed with a new recognizer and transcriber and put back into the result
sequence.
'''
def prepare_multipass(alignment):
    to_realign = []
    cur_list = []
    chunks = 0
    reserve_words = 3
    NOT_FOUND_IN_AUDIO = 2
    NOT_FOUND_IN_TRANSCRIPT = 3
    # mark every unaligned word together with its neighbours
    for i, w in enumerate(alignment):
        if w.case == NOT_FOUND_IN_AUDIO or w.case == NOT_FOUND_IN_TRANSCRIPT:
            for j, wd in enumerate(alignment):
                if j >= max(0, i - reserve_words) and j <= min(len(alignment), i + reserve_words):
                    wd.realign = True
    # group consecutive marked words into chunks
    for j, wd in enumerate(alignment):
        if wd.realign:
            cur_list.append(wd)
        else:
            if len(cur_list) != 0:
                to_realign.append(cur_list)
                cur_list = []
                chunks += 1
    if len(cur_list) != 0:
        to_realign.append(cur_list)
        chunks += 1
    return to_realign, chunks

def realign(alignment, ms, model, wavfile, progress_cb=None):
    to_realign, chunks = prepare_multipass(alignment)
    tasks = []

    def realign_chunk(chunk):
        realignments = []
        if chunk[0].start is None:
            start_t = 0
        else:
            start_t = chunk[0].start
        if chunk[-1].end is None:
            end_t = wavfile.getnframes() / float(wavfile.getframerate())
        else:
            end_t = chunk[-1].end
        shift_start = 0.5
        shift_end = 2
        duration = end_t - start_t
        chunk_start_word = chunk[0].word
        chunk_end_word = chunk[-1].word
        # set start/end to get chunk's text part
        chunk_start = chunk[0].startOffset
        chunk_end = chunk[-1].endOffset
        chunk_transcript = ms.raw_transcript[chunk_start:chunk_end]
        chunk_ms = metasentence.MetaSentence(chunk_transcript, model)
        chunk_ks = chunk_ms.get_kaldi_sequence()
        chunk_length = len(chunk_ks)
        # getting chunk's sound part as value 'words'
        text_chunk = text_processor.text_processor(chunk_transcript + '.', model)
        start_pos = int(((start_t - shift_start) * wavfile.getframerate()))
        if start_pos < 0:
            start_pos = 0
        wavfile.setpos(start_pos)
        end_pos = int(((2 * duration) + shift_end) * wavfile.getframerate())
        chunk_end = end_pos + start_pos
        words = transcriber.transcribe(text_chunk, wavfile, chunk_end)[0:chunk_length + 1]
        if words[0]['word'] != chunk_start_word:
            words = words[1:]
        if words[-1]['word'] != chunk_end_word:
            words = words[:-1]
        # shift chunk-relative timings back to positions in the whole recording
        start_t_chunk = words[0]['start']
        for i in range(len(words)):
            words[i]['start'] = words[i]['start'] - start_t_chunk + start_t
            words[i]['end'] = words[i]['end'] - start_t_chunk + start_t
        word_alignment = diff_align.align(words, chunk_ms)
        realignments.append({"chunk": chunk, "words": word_alignment})
        return realignments

    for i in range(chunks):
        tasks.extend(realign_chunk(to_realign[i]))
    output_words = alignment
    for task in tasks:
        start_task = output_words.index(task["chunk"][0])
        duration_task = len(task["chunk"])
        output_words = output_words[:start_task] + task["words"] + output_words[start_task + duration_task:]
    return output_words
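The chunk preparation in multipass.py (mark each unaligned word plus three neighbours on either side, then group consecutive marked words) can be sketched on plain integers. This is an illustration under stated assumptions: `chunks_to_realign` is a hypothetical helper, the case values 2 and 3 are taken from the diff above, and `SUCCESS = 0` is an assumed placeholder because transcription.py is not shown here.

```python
SUCCESS = 0                  # assumed value, not confirmed by the diff
NOT_FOUND_IN_AUDIO = 2       # value used in prepare_multipass
NOT_FOUND_IN_TRANSCRIPT = 3  # value used in prepare_multipass

def chunks_to_realign(cases, reserve_words=3):
    """Return (start, end) index ranges, end exclusive, that need a second pass."""
    n = len(cases)
    marked = [False] * n
    # Mark every failed word and up to reserve_words neighbours on each side.
    for i, case in enumerate(cases):
        if case in (NOT_FOUND_IN_AUDIO, NOT_FOUND_IN_TRANSCRIPT):
            for j in range(max(0, i - reserve_words), min(n, i + reserve_words + 1)):
                marked[j] = True
    # Group runs of consecutive marked words into chunks.
    chunks, start = [], None
    for j, flag in enumerate(marked):
        if flag and start is None:
            start = j
        elif not flag and start is not None:
            chunks.append((start, j))
            start = None
    if start is not None:
        chunks.append((start, n))
    return chunks

cases = [SUCCESS] * 20
cases[2] = NOT_FOUND_IN_AUDIO
cases[15] = NOT_FOUND_IN_TRANSCRIPT
print(chunks_to_realign(cases))  # [(0, 6), (12, 19)]
```

Failures that sit close together end up in one chunk because their context windows overlap, which keeps the number of second-pass recognizer runs down.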
53 changes: 53 additions & 0 deletions python/vosk/aligner/scripts/text_processor.py
@@ -0,0 +1,53 @@
import logging
import re

from vosk import KaldiRecognizer
'''This script is due to be reworked.
1.1 & 1.2 prepare the input text
2.1 get the end position of the current sentence as a number
3.1 split the text into symbols, skipping sentence separators
4.1 cut the current sentence off the remaining text
5.1 wrap the text in the pattern expected by KaldiRecognizer
'''
def text_processor(text, model):

    # 3.1
    def get_sentence(preprocess_result):
        symbols = re.findall(r'([^\.\?\!]{1})', preprocess_result)
        return symbols
    # 2.1
    def get_sentence_separator(preprocess_result):
        current_sentence = re.search(r'([\.\?\!]{1})', preprocess_result)
        # the last sentence may have no separator: use the end of the text
        if current_sentence is None:
            return len(preprocess_result)
        current_separator_position = current_sentence.start()
        return current_separator_position
    # 5.1
    def prepared_part_for_KaldiRecognizer(make_sentence):
        final_result = ''.join(('["', make_sentence.strip('[]'), '", "[unk]"]'))
        return final_result
    # 1.1
    def preprocess(text):
        cleaning = re.sub(r'[\,\;]', '', text)
        return cleaning.lower()
    # 4.1
    def raw_part(make_sentence, prepared_text):
        # drop the leading sentence and the separator that follows it
        prepared_text = prepared_text[len(make_sentence) + 1:]
        return prepared_text

    make_text = ''
    preprocessed_result = preprocess(text.strip()) # 1
    while(len(preprocessed_result) > 0):
        current_separator_position = get_sentence_separator(preprocessed_result) # 2
        symbols = get_sentence(preprocessed_result) # 3
        make_sentence = ''
        for symbol in range(current_separator_position):
            make_sentence += symbols[symbol]
        make_text += make_sentence
        preprocessed_result = raw_part(make_sentence, preprocessed_result) # 4
    final = prepared_part_for_KaldiRecognizer(make_text) # 5
    rec = KaldiRecognizer(model, 16000, final)
    return rec
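The grammar string handed to KaldiRecognizer above is a JSON list containing one phrase with all transcript words and an "[unk]" entry. As a hedged alternative sketch, the same string could be produced with `json.dumps`, which also escapes any double quotes in the transcript. `build_grammar` is a hypothetical name and this is not the code from the PR.

```python
import json
import re

def build_grammar(text):
    """Return a JSON grammar string: the transcript as one phrase plus "[unk]"."""
    # Same cleaning as the steps above: drop , ; and sentence separators, lowercase.
    cleaned = re.sub(r"[,;.?!]", "", text.strip().lower())
    # ensure_ascii=False keeps non-ASCII transcript words readable in the output.
    return json.dumps([cleaned, "[unk]"], ensure_ascii=False)

grammar = build_grammar("Hello, world. How are you?")
print(grammar)  # ["hello world how are you", "[unk]"]
```

The resulting string would take the place of `final` in the `KaldiRecognizer(model, 16000, final)` call.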