Why is the output different from Vibe's? #9
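Have you used any audio normalization? It is almost as important a step as the models themselves. Unfortunately, when I reviewed the papers and original repos (pyannote, whisper, etc.), I could not find a "general" normalization method that is recommended for best results; what I did was mostly experimental. Note that Vibe normalizes its input like this:

```rust
use std::path::PathBuf;
use std::process::{Command, Stdio};

use anyhow::{bail, Context, Result};

pub fn normalize(input: PathBuf, output: PathBuf) -> Result<()> {
    // `find_ffmpeg_path` is a Vibe helper (not shown here) that locates the ffmpeg binary.
    let ffmpeg_path = find_ffmpeg_path().context("ffmpeg not found")?;
    tracing::debug!("ffmpeg path is {}", ffmpeg_path.display());
    let mut cmd = Command::new(ffmpeg_path);
    let cmd = cmd.stderr(Stdio::piped()).args([
        "-i",
        input.to_str().context("tostr")?,
        "-ar", "16000",      // resample to 16 kHz
        "-ac", "1",          // downmix to mono
        "-c:a", "pcm_s16le", // 16-bit signed little-endian PCM
        "-af",               // normalize loudness (EBU R128):
        "loudnorm=I=-16:TP=-1.5:LRA=11", // -16 LUFS integrated, -1.5 dBTP true peak, 11 LU range
        output.to_str().context("tostr")?,
        "-hide_banner",
        "-y",
        "-loglevel",
        "error",
    ]);
    // The quoted snippet ends above; running ffmpeg and checking its exit
    // status here is an assumed completion, not necessarily Vibe's exact code.
    let out = cmd.output().context("failed to run ffmpeg")?;
    if !out.status.success() {
        bail!("ffmpeg failed: {}", String::from_utf8_lossy(&out.stderr));
    }
    Ok(())
}
```

From what I have learned about speaker identification and the whisper pipeline, audio normalization plays a crucial part. I don't know what the best normalization is; it's mostly experimental, and different normalizations can give different results in different situations. This is especially noticeable with parallel (overlapping) speech. For my own tests, though, I generally use GStreamer's audio normalization, which works really nicely: https://github.com/sdroege/gstreamer-rs/tree/main/gstreamer-audio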
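To make "different normalizations give different results" concrete, here is a minimal, std-only Rust sketch of two simple schemes on 16-bit PCM samples: peak normalization (scale so the loudest sample hits a target level) and RMS normalization (scale so the average level hits a target, with the gain capped so peaks stay under a ceiling). This is an illustration under stated assumptions, not what Vibe does: the `loudnorm` filter in Vibe's command implements EBU R128 loudness normalization (K-weighted, gated loudness in LUFS), which these functions do not. The function names and dBFS targets are hypothetical.

```rust
/// Full-scale reference for mapping 16-bit samples to roughly [-1, 1].
const FULL_SCALE: f64 = i16::MAX as f64;

/// Convert a level in dBFS to a linear amplitude (0 dBFS = 1.0).
fn db_to_linear(db: f64) -> f64 {
    10f64.powf(db / 20.0)
}

/// Largest absolute sample value, relative to full scale.
fn peak(samples: &[i16]) -> f64 {
    samples
        .iter()
        .map(|&s| (s as f64 / FULL_SCALE).abs())
        .fold(0.0, f64::max)
}

/// Root-mean-square level, relative to full scale.
fn rms(samples: &[i16]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = samples
        .iter()
        .map(|&s| {
            let x = s as f64 / FULL_SCALE;
            x * x
        })
        .sum();
    (sum_sq / samples.len() as f64).sqrt()
}

/// Multiply every sample by `gain`, rounding and clamping to the i16 range.
fn apply_gain(samples: &[i16], gain: f64) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| (s as f64 * gain).round().clamp(i16::MIN as f64, i16::MAX as f64) as i16)
        .collect()
}

/// Hypothetical peak normalization: the loudest sample lands at `target_dbfs`.
fn normalize_peak(samples: &[i16], target_dbfs: f64) -> Vec<i16> {
    let p = peak(samples);
    if p == 0.0 {
        return samples.to_vec(); // silence or empty input: nothing to scale
    }
    apply_gain(samples, db_to_linear(target_dbfs) / p)
}

/// Hypothetical RMS normalization: the RMS level lands at `target_dbfs`,
/// but the gain is capped so the loudest sample stays at or below `ceiling_dbfs`.
fn normalize_rms(samples: &[i16], target_dbfs: f64, ceiling_dbfs: f64) -> Vec<i16> {
    let (r, p) = (rms(samples), peak(samples));
    if r == 0.0 {
        return samples.to_vec(); // all-zero input: nothing to scale
    }
    let mut gain = db_to_linear(target_dbfs) / r;
    let ceiling = db_to_linear(ceiling_dbfs);
    if p * gain > ceiling {
        gain = ceiling / p; // the peak limit wins over the RMS target
    }
    apply_gain(samples, gain)
}
```

Peak normalization's gain depends only on the single loudest sample, so one loud burst (a cough, a door) keeps quiet speech quiet. RMS normalization targets the average level instead, which can lift quiet speech further but needs the peak cap to avoid clipping. The same recording can therefore reach the models at noticeably different levels depending on the method.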
Strange. Which one is more accurate?
Vibe is more accurate.
@thewh1teagle Any idea what the difference is?
I compared the output of this library with the output of the Vibe application (which uses this library). Why are the results different?
[Screenshots: Vibe output; this library's output]