Faster video inference script. #650
Conversation
Tested. This really works! Thanks!! Test results (480p, upscale parameter 2): 25% faster for me!
May I ask a question?
I am running the animevideov3 model without FP32 and the outputs are correct.
Sorry, I tried with and without fp32 and there's no difference: the output is entirely white either way.
I'm simply running python inference_realesrgan_video_fast.py with or without --fp32 on the general x4v3 model (the tiny denoise one; the master branch works fine with fp32).
This command is working fine on my machine: python inference_realesrgan_video_fast.py --model_name=realesr-general-x4v3 -i "videos\2022-12-24 17-53-30.mp4" -s 2 Did I understand your input correctly?
I think you are right. I did it with -dn 0; I will try again without it.
It also works here with -dn 0.
So I guess I need to debug into it... Wait, I did use the no-nb_frames video patch, and my input is a webm file (the original script still works, though). Okay, testing with demo.mp4, I find that the fp16 output has some detail while the fp32 output is just color blocks...
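A quick way to narrow down this kind of precision bug is to push a single frame through the network in both precisions and compare the outputs directly. A minimal sketch, assuming a loaded torch super-resolution model and a BGR uint8 frame (the function and variable names here are illustrative, not code from this PR):

```python
import numpy as np
import torch

@torch.no_grad()
def compare_precisions(model, frame_bgr):
    """Run one frame through the model in fp32 and fp16 and report the divergence."""
    # HWC uint8 BGR -> NCHW float RGB in [0, 1]
    img = frame_bgr[:, :, ::-1].astype(np.float32) / 255.0
    x = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).cuda()

    out32 = model.float()(x.float()).clamp(0, 1)
    out16 = model.half()(x.half()).float().clamp(0, 1)

    diff = (out32 - out16).abs()
    print(f'max abs diff: {diff.max().item():.4f}, mean abs diff: {diff.mean().item():.6f}')
    # Normal fp16 rounding noise is tiny; large structured differences
    # (solid color blocks, all-white frames) point to a weight-loading or
    # normalization bug in one of the precision paths rather than rounding.
    return out32, out16
```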
FYI, I observe a further speedup with the channels_last memory format. Might be worth a shot? I'm not sure how easy it would be to integrate, though.
Thanks for your comments! Which model are you using? On my side, using channels_last seems to reduce performance by half.
"Official" Real-ESRGAN x4. I suspect the channels_last / channels_first gain will vary by device? Without channels_last, I get about a 1.5x speedup on an A4000.
pytorch/pytorch#92542
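For anyone who wants to try it: switching to channels_last in PyTorch only takes converting both the model and the input tensors, and as the mixed results above show, the gain (or loss) depends on the GPU and model. A minimal sketch, using a toy stand-in network rather than the real Real-ESRGAN weights:

```python
import torch
import torch.nn as nn

# Toy stand-in for the SR network; in practice the real weights would be loaded.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 3, 3, padding=1),
).cuda().half()

# Convert the model weights to the channels_last (NHWC) memory layout.
model = model.to(memory_format=torch.channels_last)

# Inputs must use the same memory format to benefit.
x = torch.rand(4, 3, 480, 854, device='cuda', dtype=torch.half)
x = x.to(memory_format=torch.channels_last)

with torch.no_grad():
    # With NHWC tensors, cuDNN can select NHWC kernels, which are often
    # faster for fp16 convolutions on tensor-core GPUs -- but not always.
    out = model(x)
```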
You could also add an option to change the default libx264 encoder to h264_nvenc for ffmpeg, which would give an additional performance boost. It requires ffmpeg compiled with CUDA support, hence keeping it as an option.
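The change itself would be small: only the -c:v value in the ffmpeg arguments differs between CPU and GPU encoding. A hypothetical sketch, assuming upscaled frames are piped to ffmpeg as raw BGR bytes (the --encoder flag, frame size, and frame source are illustrative, not the PR's actual code):

```python
import argparse
import subprocess

parser = argparse.ArgumentParser()
parser.add_argument('--encoder', default='libx264',
                    help="ffmpeg video encoder; try 'h264_nvenc' with an NVENC-capable ffmpeg build")
args = parser.parse_args()

# Pipe raw upscaled frames to ffmpeg; only the encoder name changes.
cmd = [
    'ffmpeg', '-y',
    '-f', 'rawvideo', '-pix_fmt', 'bgr24',
    '-s', '3840x2160', '-r', '30',   # output frame size and fps (illustrative)
    '-i', 'pipe:',
    '-c:v', args.encoder,
    '-pix_fmt', 'yuv420p',
    'output.mp4',
]
writer = subprocess.Popen(cmd, stdin=subprocess.PIPE)
# for frame in upscaled_frames:           # hypothetical frame source
#     writer.stdin.write(frame.tobytes())
# writer.stdin.close()
# writer.wait()
```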
How to use this on images instead of videos?
Insane performance increase! This took the library from unrealistic on my 4070 to nearly realtime: a 25 fps source video went from 6.1 fps to 21.5 fps! Beautiful, mate!
Changes:
- Batched inference (--batch parameter, default is 4). Pushes CUDA GPU utilization to 100%. (A minimal sketch of the batching idea follows below.)

The metrics above are measured on a 1920x1080 30 fps anime video. On an AMD R9-5900HX CPU (8 cores, 16 threads) and a 3080 LP (16 GB), with FP16, the processing rate goes from 0.8 fps to 4.6 fps with the optimizations (a 575% speed-up!), using about 7.6 GB of VRAM. You also get 4.4 fps (a 550% speed-up) at batch size 2, which requires only about 4.4 GB of VRAM.
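The core of the batching change is collecting several decoded frames into one tensor and running the network once per batch instead of once per frame. A minimal sketch of the idea, assuming a torch model and an iterable of HWC uint8 BGR frames (the helper names are illustrative, not the PR's actual code):

```python
import numpy as np
import torch

@torch.no_grad()
def upscale_batched(model, frames, batch_size=4, half=True):
    """Yield upscaled frames, running the model on batch_size frames at a time."""
    buf = []
    for frame in frames:
        buf.append(frame)
        if len(buf) == batch_size:
            yield from _run(model, buf, half)
            buf = []
    if buf:  # flush the final partial batch
        yield from _run(model, buf, half)

def _run(model, buf, half):
    # Stack HWC uint8 BGR frames -> NCHW float RGB batch in [0, 1]
    batch = np.stack([f[:, :, ::-1] for f in buf]).astype(np.float32) / 255.0
    x = torch.from_numpy(batch).permute(0, 3, 1, 2).cuda()
    x = x.half() if half else x
    out = model(x).float().clamp(0, 1)
    # Back to HWC uint8 BGR for the video writer
    out = (out.permute(0, 2, 3, 1).cpu().numpy() * 255.0).round().astype(np.uint8)
    for img in out:
        yield img[:, :, ::-1]
```

Batching amortizes kernel-launch and Python overhead across frames, which is why GPU utilization climbs; VRAM use grows roughly linearly with batch size, matching the 4.4 GB vs 7.6 GB figures above.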
The script is not yet extensively tested (I'm not sure how best to test it and would appreciate some advice), and it does not support extracting frames first, face enhancement, or alpha/grayscale images. Frame extraction and face enhancement go through very different workflows, so these optimizations may not apply to them. Alpha and grayscale should not be an issue for almost all videos to be processed.
See #619, #634, #531.