long form inference #11

Open
eschmidbauer opened this issue Oct 9, 2024 · 2 comments

Comments

@eschmidbauer

Is long-form inference possible with whisper_trt?
I ran inference on a 4m16s audio clip, and it appeared to transcribe only 30s. Here is my script:

from whisper_trt import load_trt_model

model = load_trt_model("small.en")
result = model.transcribe("test.wav")
@jaybdub
Contributor

jaybdub commented Oct 16, 2024

Hi @eschmidbauer ,

It should be possible, but it seems we'll need to modify the transcribe function:

if int(mel.shape[2]) > whisper.audio.N_FRAMES:

Currently, it runs on a single 30s window.

John

@eschmidbauer
Author

It would be great to demonstrate long-form inference here, perhaps by using a sliding window.
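
Until transcribe handles this itself, one workaround might be to split the audio into 30s windows and transcribe each one separately. The following is only a minimal sketch, not the library's method: `split_into_windows` and `transcribe_long` are hypothetical helper names, the windows do not overlap (a simplification of a true sliding window), and it assumes `transcribe` accepts an array of 16 kHz samples and returns a dict with a "text" key.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz mono audio
WINDOW_SECONDS = 30    # length of the single window transcribe currently handles


def split_into_windows(audio: Sequence, window: int = SAMPLE_RATE * WINDOW_SECONDS) -> List[Sequence]:
    """Split samples into consecutive, non-overlapping windows of at most `window` samples."""
    if window <= 0:
        raise ValueError("window must be a positive number of samples")
    return [audio[start:start + window] for start in range(0, len(audio), window)]


def transcribe_long(audio: Sequence,
                    transcribe_fn: Callable[[Sequence], dict],
                    window: int = SAMPLE_RATE * WINDOW_SECONDS) -> str:
    """Transcribe each window with `transcribe_fn` and join the non-empty texts with spaces."""
    texts = []
    for chunk in split_into_windows(audio, window):
        # Assumption: transcribe_fn returns a dict with a "text" key.
        text = transcribe_fn(chunk)["text"].strip()
        if text:
            texts.append(text)
    return " ".join(texts)


# Hypothetical usage with whisper_trt (requires the package and a CUDA device):
#   import whisper
#   from whisper_trt import load_trt_model
#   model = load_trt_model("small.en")
#   audio = whisper.audio.load_audio("test.wav")  # float32 NumPy array at 16 kHz
#   print(transcribe_long(audio, model.transcribe))
```

With this approach, a 4m16s clip would be processed as nine windows: eight full 30s windows and a final 16s one. Cutting at fixed boundaries can split a word between two windows, so overlapping windows with some logic to merge the duplicated text would likely give cleaner results.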
