
Turn off the microfrontend autogain and noise_suppression? #279

Closed
StuartIanNaylor opened this issue Jan 10, 2025 · 2 comments

@StuartIanNaylor

https://github.com/google-ai-edge/LiteRT/tree/main/tflite/experimental/microfrontend

I'm not sure whether the config YAML actually disables the microfrontend's auto-gain and noise suppression, or whether they are training parameters that were already disabled.
The XMOS algorithms already provide AGC and NS, so if the microfrontend versions are active you likely have duplicated processing, and the microfrontend implementations are likely the inferior ones:

```yaml
noise_suppression_level: 0
auto_gain: 0 dbfs
```

https://github.com/google-ai-edge/LiteRT/blob/main/tflite/experimental/microfrontend/python/ops/audio_microfrontend_op.py

```python
def audio_microfrontend(audio,
                        sample_rate=16000,
                        window_size=25,
                        window_step=10,
                        num_channels=32,
                        upper_band_limit=7500.0,
                        lower_band_limit=125.0,
                        smoothing_bits=10,
                        even_smoothing=0.025,
                        odd_smoothing=0.06,
                        min_signal_remaining=0.05,
                        enable_pcan=True,
                        pcan_strength=0.95,
                        pcan_offset=80.0,
                        gain_bits=21,
                        enable_log=True,
                        scale_shift=6,
                        left_context=0,
                        right_context=0,
                        frame_stride=1,
                        zero_padding=False,
                        out_scale=1,
                        out_type=dtypes.uint16):
```

Gain control and noise suppression appear to be enabled by default in the Python microfrontend used for training, so presumably the same defaults apply in https://github.com/google-ai-edge/LiteRT/blob/main/tflite/experimental/microfrontend/ops/audio_microfrontend_op.cc.
I don't know for certain, but both should likely be disabled when using the XMOS processing, with datasets normalised to XMOS output levels.
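For context on what those defaults do: the microfrontend's noise reduction keeps a smoothed per-channel noise estimate (`even_smoothing` / `odd_smoothing` are the update rates for even- and odd-numbered channels), subtracts it from each frame, and floors the result at `min_signal_remaining` times the signal. The sketch below is a simplified floating-point illustration of that idea only; the real LiteRT op is a fixed-point C implementation, so this is not the actual code:

```python
def microfrontend_noise_reduction(frames,
                                  even_smoothing=0.025,
                                  odd_smoothing=0.06,
                                  min_signal_remaining=0.05):
    """Illustrative float sketch of the microfrontend noise-reduction idea.

    frames: list of frames, each a list of per-channel filterbank energies.
    Returns frames with a running noise estimate subtracted, floored at
    min_signal_remaining * signal. Not the fixed-point LiteRT op itself.
    """
    num_channels = len(frames[0])
    estimate = [0.0] * num_channels  # running noise estimate per channel
    out = []
    for frame in frames:
        reduced = []
        for ch, signal in enumerate(frame):
            s = even_smoothing if ch % 2 == 0 else odd_smoothing
            # Slowly track the signal as the noise-floor estimate.
            estimate[ch] = (1.0 - s) * estimate[ch] + s * signal
            # Subtract the estimate, but never remove more than
            # (1 - min_signal_remaining) of the signal.
            reduced.append(max(signal - estimate[ch],
                               signal * min_signal_remaining))
        out.append(reduced)
    return out
```

On a constant input the estimate converges to the signal, so the output settles at the `min_signal_remaining` floor; that is why the defaults leave about 5% of a steady tone.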

@kahrendt (Contributor)

The mWW models do not work well if you just turn off the AGC and NS, as they were trained with them on. Ideally, we would train on audio processed by XMOS's algorithms, but there doesn't seem to be a straightforward way to do this.

We tested several different versions of the audio fed into mWW: AEC only, AEC + Noise Suppression (NS), AEC + NS + Interference Cancellation (IC), and AEC + NS + IC + Automatic Gain Control (AGC). We had the best results with AEC + NS + IC, so that's what the firmware uses. Duplicating the AGC increased the number of false activations, while the XMOS NS and IC algorithms don't seem to interfere with the microfrontend's NS algorithm.

@kahrendt closed this as not planned, Jan 17, 2025
@StuartIanNaylor (Author) commented Jan 17, 2025

I am saying to turn off AGC and NS in training, so that you can also turn off AGC and NS on the audio fed into mWW.
The TF4Micro NS is very likely more rudimentary than the XMOS NS/voice extraction, but it sits after the XMOS processing in the audio stream, so it degrades the XMOS output fed into the TF4Micro NS algorithm...

You tested several different versions of the audio fed into mWW and never changed the training?!
AGC is just automatic gain control; recording via a constant Wyoming broadcast, with no mWW running, would give you a dataset for assessing what levels the AGC produces at 0.3-3 m, which you could then duplicate in a training dataset.
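One way to sketch that assessment: compute the RMS level of captured PCM in dBFS and compare it against the training dataset's levels. A minimal sketch, assuming 16-bit samples and a simple full-scale RMS reference (framing and windowing are left out):

```python
import math

def rms_dbfs(samples, full_scale=32768.0):
    """RMS level of PCM samples in dBFS (0 dBFS = full-scale RMS).

    samples: iterable of sample values, e.g. 16-bit ints from a capture.
    full_scale: assumed full-scale magnitude for 16-bit audio.
    """
    samples = list(samples)
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(rms / full_scale)
```

Running this over recordings made at various distances would show what output level the AGC converges to, which a dataset normalisation step could then target.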
It would be really good if the XMOS USB Audio Class libraries were enabled, but unfortunately only the DFU driver is.
Then it would be plug-and-play from Pi to PC, but a straight-to-Wyoming constant-broadcast setup could be used to analyse the AGC instead.
The need for the TF4Micro NS at runtime is likely because it produces artifacts that were introduced during training...

It's not just that you are wasting the purchased-in speech enhancement by forcing the audio through basic algorithms designed to run on a selection of low-end microcontrollers; you are also wasting all those ops on the ESP32-S3, since they duplicate what the purchased-in XMOS already does...
'closed this as not planned'?!? OK...
