
Version 1.1.0 has onnxruntime thread affinity crash #1169

Closed
Appfinity-development opened this issue Nov 23, 2024 · 9 comments

Appfinity-development commented Nov 23, 2024

Updated from 1.0.3 to 1.1.0; now an onnxruntime thread affinity crash occurs every time. Both versions run on an NVIDIA A40 with 4 CPU cores, 48 GB VRAM and 16 GB RAM (on a private Replicate server), so it shouldn't be a hardware issue. Our model config:

self.whisper_model = WhisperModel(
    "large-v2",
    device="cuda",
    compute_type="float16",
    cpu_threads=4,
    num_workers=1,
)

...

options = dict(
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=1000),
    initial_prompt=prompt,
    word_timestamps=True,
    language=language,
    log_progress=True,
    hotwords=prompt,
)

segments, transcript_info = self.whisper_model.transcribe(audio=audio_file, **options)
        

Also tried this:

import os
os.environ["ORT_DISABLE_CPU_AFFINITY"] = "1"
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["OPENBLAS_NUM_THREADS"] = "4"
os.environ["MKL_NUM_THREADS"] = "4"
os.environ["VECLIB_MAXIMUM_THREADS"] = "4"
os.environ["NUMEXPR_NUM_THREADS"] = "4"

But to no avail. Any suggestions? The crash log is below.

Loading large-v2 model...
Done loading large-v2 model, took: 75.503 seconds
Starting transcribing
INFO:faster_whisper:Processing audio with duration 03:25.706
2024-11-22 19:33:53.322733977 [E:onnxruntime:Default, env.cc:234 ThreadMain] pthread_setaffinity_np failed for thread: 785, index: 1, mask: {2, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
INFO:faster_whisper:VAD filter removed 00:19.722 of audio
DEBUG:faster_whisper:VAD filter kept the following audio segments: [00:00.048 -> 01:07.440], [01:07.984 -> 03:06.576]
0%| | 0/185.98 [00:00<?, ?seconds/s]DEBUG:faster_whisper:Processing segment at 00:00.000
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/cog/server/runner.py", line 417, in _handle_done
    f.result()
  File "/root/.pyenv/versions/3.10.15/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/root/.pyenv/versions/3.10.15/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
cog.server.exceptions.FatalWorkerException: Prediction failed for an unknown reason. It might have run out of memory? (exitcode -6)

The cog.yaml with dependencies looks like this:

build:
  gpu: true
  system_packages:
    - "ffmpeg"
    - "libmagic1"
  python_version: "3.10"
  python_packages:
    # Core ML packages
    - "torch==2.3.0"
    - "torchaudio==2.3.0"
    - "faster-whisper==1.1.0"
    - "pyannote-audio==3.3.1"
    - "onnxruntime"

    # API and utility packages
    - "requests==2.31.0"
    - "firebase-admin==6.4.0"
    - "google-generativeai==0.3.2"
    - "babel==2.14.0"
    - "openai==1.12.0"
    - "supabase==2.10.0"
    - "kalyke-apns==1.0.3"
    - "numpy<2.0.0"

  run:
    - "pip install --upgrade pip"
    - "echo env is ready!"

predict: "predict.py:Predictor"

Also tried removing the onnxruntime dependency or pinning it to a specific GPU version, but nothing fixes the issue. Anyone with ideas (@MahmoudAshraf97)?

If the CPU is used as the device for WhisperModel, the onnxruntime error still shows up in the logs, but there is no crash and transcription finishes successfully.
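For reference, this is roughly the CPU configuration that completes without crashing (a sketch; compute_type is an assumption here, since float16 is a GPU compute type and int8 or float32 are the usual CPU choices):

from faster_whisper import WhisperModel

# Workaround sketch: same model, but on CPU. The affinity message still shows up
# in the logs, yet transcription finishes successfully.
whisper_model = WhisperModel(
    "large-v2",
    device="cpu",
    compute_type="int8",  # assumption: int8 (or float32) instead of float16 on CPU
    cpu_threads=4,
    num_workers=1,
)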

Appfinity-development changed the title from "Version 1.1.0 has onxyruntime crash" to "Version 1.1.0 has onnxruntime thread affinity crash" on Nov 23, 2024.
MahmoudAshraf97 (Collaborator) commented:

Appfinity-development (Author) commented Nov 23, 2024

Which API is available to set SileroVADModel SessionOptions parameters?

Purfview (Contributor) commented Nov 23, 2024

Which API is available to set SileroVADModel SessionOptions parameters?

Just change it in vad.py to:

        opts.inter_op_num_threads = 1
        opts.intra_op_num_threads = 1

Appfinity-development (Author) commented:

I'm running the code in a Docker environment that just pulls the faster_whisper package from PyPI, so local changes I make to the package in PyCharm won't propagate to the Replicate server. The only two options I see are monkey patching or forking the whole lib, neither of which I'm really keen on doing.

Or am I missing a third option?

MahmoudAshraf97 (Collaborator) commented:

No third option currently; I just want you to test the fix first before we actually take any steps to fix it.

Appfinity-development (Author) commented Nov 28, 2024

Tried monkey patching; this does remove the onnxruntime error, but the OOM error still persisted. It turned out that ctranslate2 version 4.5.0 was incompatible with the cog Docker env on Replicate. After downgrading to 4.4.0 it worked again. I did keep the monkey patch, though, since it keeps the logs clean, and the error seems like something that should be addressed in 1.1.1.

I'm now using large-v2 with the BatchedInferencePipeline, which speeds up processing by around 2x. Very nice for the same model.
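For anyone curious, a minimal sketch of that setup (based on the faster-whisper 1.1.0 API; batch_size=16 is just an assumption to tune against available VRAM):

from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# Batched transcription; segments are still yielded lazily as a generator.
segments, info = batched_model.transcribe("audio.wav", batch_size=16)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")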

These are my current packages, in case someone else runs into the issue:

    - "torch==2.3.0"
    - "torchaudio==2.3.0"
    - "faster-whisper==1.1.0"
    - "pyannote-audio==3.3.2"
    - "ctranslate2==4.4.0"

monkey patch:

import faster_whisper.vad
from faster_whisper.vad import SileroVADModel

# to prevent "Invalid argument. Specify the number of threads explicitly so the affinity is not set" onnxruntime error

class PatchedSileroVADModel(SileroVADModel):
    def __init__(self, encoder_path, decoder_path):
        try:
            import onnxruntime
        except ImportError as e:
            raise RuntimeError(
                "Applying the VAD filter requires the onnxruntime package"
            ) from e

        # Custom modification for SessionOptions
        opts = onnxruntime.SessionOptions()
        opts.inter_op_num_threads = 4
        opts.intra_op_num_threads = 4
        opts.log_severity_level = 3

        # Initialize sessions with modified options
        self.encoder_session = onnxruntime.InferenceSession(
            encoder_path,
            providers=["CPUExecutionProvider"],
            sess_options=opts,
        )
        self.decoder_session = onnxruntime.InferenceSession(
            decoder_path,
            providers=["CPUExecutionProvider"],
            sess_options=opts,
        )

faster_whisper.vad.SileroVADModel = PatchedSileroVADModel
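To apply it, the patch just has to be imported before the first transcribe() call with vad_filter=True, so the patched class is already in place when faster_whisper loads the VAD model. A usage sketch, assuming the code above is saved as patch_vad.py (hypothetical module name):

import patch_vad  # noqa: F401  hypothetical module containing the monkey patch above

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16", cpu_threads=4)
segments, info = model.transcribe("audio.wav", vad_filter=True, word_timestamps=True)
for segment in segments:
    print(segment.text)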

Purfview (Contributor) commented:

I think it should be

        opts.inter_op_num_threads = 1
        opts.intra_op_num_threads = 1

MahmoudAshraf97 (Collaborator) commented:

I think it should be

        opts.inter_op_num_threads = 1
        opts.intra_op_num_threads = 1

The error he's mentioning is only caused when the value is 0, since that means onnx must infer the actual number of threads, and it fails to do so. Any fixed number should fix the error; setting it to 1 is the safest but not the fastest.
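In plain onnxruntime terms, that boils down to this (a minimal sketch; the model path is a placeholder):

import onnxruntime

opts = onnxruntime.SessionOptions()
# Leaving these at 0 tells onnxruntime to infer the thread count and set CPU
# affinity itself, which is what fails inside a restricted container with
# "pthread_setaffinity_np ... error code: 22". Any explicit value avoids it;
# 1 is the safest choice, larger values may be faster.
opts.inter_op_num_threads = 1
opts.intra_op_num_threads = 1

session = onnxruntime.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CPUExecutionProvider"],
    sess_options=opts,
)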

Also, the VAD encoder now benefits from GPU acceleration, if anyone needs it.

Purfview added a commit to Purfview/faster-whisper that referenced this issue Dec 10, 2024
Reported problems:
SYSTRAN#1193
SYSTRAN#1169

The VAD implementation consumes humongous amounts of memory [the original Silero doesn't have this problem].

This PR should fix the OOM problem.
An alternative solution could be removing 'lru_cache'.

Purfview (Contributor) commented:

@Appfinity-development
Try this fix -> #1198

MahmoudAshraf97 pushed a commit to Purfview/faster-whisper that referenced this issue Dec 12, 2024