window size -› seg_len #6

Closed
G874713346 opened this issue May 12, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@G874713346

How can I make the window size be drawn from a uniform distribution within [240 ms, 1600 ms] during training?

There are two problems in dvector.py in your source code. First, the condition if utterance.size(1) <= self.seg_len: should compare dimension 0 instead. Dimension 1 is 40, which is always smaller than seg_len=160, so the condition is always true and the sliding-window unfold below it is never reached. Second, the output shape of unfold is [batch_size, 40, seg_len], while the input to AttentivePooledLSTMDvector should have shape [batch_size, seg_len, 40], that is, size(-1) must be 40.
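To make the two points concrete, here is a minimal sketch of a sliding-window split with both corrections applied. It is not the repository's code: the function name sliding_segments and the hop_len parameter are hypothetical, and the utterance is assumed to be a 2-D tensor of shape [n_frames, 40]. Only seg_len=160 and the feature size 40 come from the discussion above.

```python
import torch


def sliding_segments(utterance: torch.Tensor, seg_len: int = 160, hop_len: int = 80) -> torch.Tensor:
    """Split a [n_frames, n_mels] utterance into [n_segments, seg_len, n_mels].

    hop_len is a hypothetical parameter chosen for illustration.
    """
    if utterance.dim() != 2:
        raise ValueError("expected a 2-D tensor of shape [n_frames, n_mels]")

    # Compare the time axis (dimension 0), not the feature axis (dimension 1).
    if utterance.size(0) <= seg_len:
        # Too short to slide over: return the whole utterance as one segment.
        return utterance.unsqueeze(0)

    # unfold appends the window axis last: [n_segments, n_mels, seg_len].
    segments = utterance.unfold(0, seg_len, hop_len)

    # Swap the last two axes so that size(-1) is the feature size again.
    return segments.transpose(1, 2)


if __name__ == "__main__":
    mel = torch.randn(400, 40)
    print(sliding_segments(mel).shape)
```

With a 400-frame input, seg_len=160, and hop_len=80, the result holds 4 segments of shape [160, 40]. Frames left over after the last full window are dropped by unfold.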

As for drawing seg_len from a uniform distribution, can I simply sample a uniformly distributed seg_len for each utterance as I iterate over them?

I hope you can give me an answer, thank you!

@yistLin
Owner

yistLin commented May 12, 2021

Thank you for pointing out this important issue! You are right, and I think the last time I modified this part I didn't test it thoroughly...

I'll fix this ASAP!

As for a uniform distribution of seg_len during training, I haven't implemented that yet. The function embed_utterance is only used at test time. You can take a look at the __getitem__ function in ge2e_dataset.py (lines 53-55) and sample the length there.
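As a rough sketch of what sampling the length could look like, the helpers below draw a segment length uniformly from [240 ms, 1600 ms] and pick a random crop of that length. This is not code from ge2e_dataset.py: the names sample_seg_len and random_crop are hypothetical, and the 10 ms frame shift is an assumption (it would make 1600 ms equal to 160 frames), so adjust it to match the actual preprocessing.

```python
import random

FRAME_SHIFT_MS = 10  # assumption: one feature frame every 10 ms


def sample_seg_len(min_ms=240, max_ms=1600, frame_shift_ms=FRAME_SHIFT_MS, rng=random):
    """Draw a segment length in frames, uniform over [min_ms, max_ms]."""
    min_frames = min_ms // frame_shift_ms
    max_frames = max_ms // frame_shift_ms
    # randint includes both endpoints.
    return rng.randint(min_frames, max_frames)


def random_crop(n_frames, seg_len, rng=random):
    """Return (start, end) frame indices of a random crop of seg_len frames."""
    if n_frames < seg_len:
        raise ValueError("utterance is shorter than the requested segment")
    start = rng.randint(0, n_frames - seg_len)
    return start, start + seg_len


if __name__ == "__main__":
    seg_len = sample_seg_len()
    start, end = random_crop(n_frames=500, seg_len=seg_len)
    print(seg_len, start, end)
```

One thing to keep in mind: if the segments of a batch are stacked into a single tensor, they all need the same length, so the length may have to be drawn once per batch rather than once per item.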

@yistLin yistLin added the bug Something isn't working label May 12, 2021
@yistLin
Owner

yistLin commented May 12, 2021

The unfolding problem has been fixed.

@yistLin yistLin closed this as completed May 13, 2021
@G874713346
Author

Thank you for your answer. I have another question: since the seg_len condition for the sliding window was wrong, did the model dvector.pt in the example actually use the sliding window? Or does it extract a d-vector directly instead of averaging d-vectors over sliding windows?

@yistLin
Owner

yistLin commented May 13, 2021

I'm not quite sure what dvector.pt in the example refers to. If you mean the released JIT-compiled dvector-step250000.pt, then yes: it has been recompiled and now uses the sliding window to extract audio segments.
