Optimizing functions in alg.c #1496
Replies: 6 comments
-
Hi, could you please also review the following attempts to optimize NEON and SSE made by @notorca? Maybe you could benefit from and improve those. I've been using the NEON optimizations in my motion fork for a few years, and they work great.
-
@tosiara Thank you for pointing this out.
I understand MrDave's concerns with respect to maintainability.
My insights (so far) during this whole exercise are:
Yes, I see benefits and improvements. notorca's branch would be the starting point for that.
-
@tosiara I noticed that the fork from notorca is gone.
-
@tosiara Hmmm, notorca's work can be found in your fork of course ... sorry for interrupting you.
-
Hi guys! I’m not using this project anymore, but ping me if you need any help.
-
Will do, thanks for the good work!
-
I optimized some functions in motion's alg.c file using Intel SSE2.
Why? Mainly for fun and learning, also hoping that I could squeeze more out of old hardware.
That last reason was abandoned as soon as I realized that most of the CPU utilization was caused by encoding and decoding of video.
If there is any interest, let me know and read on ...
It's in a "works for me" stage.
I use it 24x7 on a 720p stream and two 480p streams.
I optimized the top 5 functions that showed up in gprof at the time I started.
Below are the current results (roughly):
[image: results table]
Somehow gprof could not profile the dilate function.
So I took the numbers from despeckle, which also includes labeling.
The last column shows the speedup for the total amount of time motion was busy while processing my test clip (an approximation of CPU time spent). Note: this column covers motion code only; gprof did not profile the linked libraries (encoding, decoding, etc.), which consume the most CPU time in the motion application.
I used a (real-life) test clip: 7 minutes, 720p, 25 fps, in which a couple of events take place.
I used standard despeckle (EedDl)
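For readers unfamiliar with this kind of rewrite, here is a minimal sketch of what an SSE2 version of a frame-differencing loop can look like. It is not the code from my branch or from motion's alg.c: the function names `diff_count_scalar` and `diff_count_sse2`, the plain 8-bit luma buffers, and the single threshold parameter are all assumptions made for illustration. It only counts pixels whose absolute difference against a reference frame exceeds the threshold, processing 16 pixels per iteration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Count set bits; a 16-bit mask is all we need here. */
static int popcount16(unsigned v)
{
    int c = 0;
    while (v) {
        v &= v - 1;  /* clear the lowest set bit */
        c++;
    }
    return c;
}

/* Scalar reference (hypothetical): pixels with |ref - cur| > threshold. */
static int diff_count_scalar(const uint8_t *ref, const uint8_t *cur,
                             size_t n, uint8_t threshold)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        int d = (int)ref[i] - (int)cur[i];
        if (d < 0)
            d = -d;
        if (d > threshold)
            count++;
    }
    return count;
}

/* SSE2 variant (hypothetical): same result, 16 pixels per iteration. */
static int diff_count_sse2(const uint8_t *ref, const uint8_t *cur,
                           size_t n, uint8_t threshold)
{
    const __m128i thr = _mm_set1_epi8((char)threshold);
    const __m128i zero = _mm_setzero_si128();
    int count = 0;
    size_t i = 0;

    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(ref + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(cur + i));
        /* |a - b| for unsigned bytes: one saturating difference is always 0. */
        __m128i d = _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
        /* Unsigned d > thr  <=>  saturating (d - thr) is non-zero. */
        __m128i over = _mm_subs_epu8(d, thr);
        __m128i not_over = _mm_cmpeq_epi8(over, zero);
        /* One mask bit per byte; set bits mark pixels at or below threshold. */
        unsigned mask = (unsigned)_mm_movemask_epi8(not_over);
        count += 16 - popcount16(mask);
    }

    /* Scalar tail for the remaining n % 16 pixels. */
    count += diff_count_scalar(ref + i, cur + i, n - i, threshold);
    return count;
}
```

Unaligned loads are used so the sketch works on any buffer; a real implementation would probably also write the thresholded difference image rather than only counting.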
I'm now planning: