Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Floating point vs Fixed Point #31

Open
hpmax opened this issue Jul 19, 2020 · 6 comments
Open

Floating point vs Fixed Point #31

hpmax opened this issue Jul 19, 2020 · 6 comments

Comments

@hpmax
Copy link
Contributor

hpmax commented Jul 19, 2020

Looking through the code, it appears there's a lot of highly repetitive code (like for synthesizing the waveform) which uses floating point numbers, for things that could probably work just as well for fixed point. Hopefully that should make it run faster and with less power consumption. Anyone have comments on this?

@kedder
Copy link
Member

kedder commented Jul 23, 2020

I don't think it is worth changing without some real measurements. Would you be able to detect any change in power consumption if you remove that repetitive code completely?

@hpmax
Copy link
Contributor Author

hpmax commented Jul 23, 2020

I believe I properly optimized waveform synthesis while retaining floating point. Audio seemed unaffected, unfortunately so did the reported CPU utilization. So I'm not optimistic that switching to fixed point would yield much further improvement. Even so, I might be wrong and see no downside to my changes.

I'll branch it and people can take a look and see if the changes look better. With one minor exception, all changes were in audiovario.c I did introduce two number variables (pulse_risei, and pulse_falli) which are the pre-computed reciprocals of pulse_rise, and pulse_fall. I'm assuming floating point multiply is preferable over floating point divide. If not, that change was pointless.

@hpmax
Copy link
Contributor Author

hpmax commented Jul 23, 2020

I set up a pull request on the code changes I made. Feel free to look it over.

@hpmax
Copy link
Contributor Author

hpmax commented Oct 11, 2020

Okay, I've done some actual measurements on it. First off, I can tell you floating point divides are awful, they are about 5x worse than floating point multiplies. Interestingly, I can't tell floating point multiplies are worse than floating point add/subtract. But floating point if's don't seem too great. Besides that, it's surprisingly hard to benchmark this stuff, because the compiler is optimizing out a lot of my benchmarking code.

I've spent some time optimizing the code. I got rid of all the divides (from the inner loop, there are still some outside it), and reduced a lot of the redundant math. Based on my current estimates:

The current code (the one that I just optimized for the click/pop suppression) is using approximately 0.22% of the CPU.
Unreleased optimized code is using approximately 0.16% of the CPU.
Unreleased optimized code with conversion to a fixed-point inner loop (the one that gets executed once per 22.6us of real time), is utilizing a bit under 0.075% of the CPU.

The unreleased optimized floating point code is tested and ready to go and should be pretty safe. It's just multiplying by reciprocals instead of dividing, and pre-calculating and storing values which are commonly used. The fixed point code is not tested -- however, I don't anticipate it taking too long to get get working.

Is this worth pursuing?

@kedder
Copy link
Member

kedder commented Oct 11, 2020

I feel like it is not worth it, unless it provides some benefits (like performance or better code readability / maintainability). It doesn't look like anyone could notice the performance improvement. So unless it makes code easier to understand, I'd not touch it.

@hpmax
Copy link
Contributor Author

hpmax commented Oct 11, 2020

My "unreleased optimized code" is a little harder to read, but I added documentation which I think should more than make up for it. The main issue is I added a few variables (for example pulse_risei in addition to pulse_rise, where pulse_risei = 1/pulse_rise).

The fixed point conversion is limited to the inner loop where it makes the most difference which means I have a few parallel integer variables. and add in some *32768, <<15, or >>15...

Fixed Point:

int deltaphase_int = (int) round(deltaphase*2<<15);
int deltapulse_int = (int) round(deltapulse*2<<15);
for (j<0;j<safeguard;++j) {
pcm_buffer[j]=pulse_syn(phase_ptr,pulse_phase_ptr);
phase_ptr+=deltaphase_int;
if (phase_ptr>4*32768) phase_ptr-=4*32768;
pulse_phase_ptr+=deltapulse_int;
if (pulse_phase_ptr>2*m_pi*32768) pulse_phase_ptr-=2*m_pi*32768
}

Floating Point:

for (j<0;j<safeguard;++j) {
pcm_buffer[j]=pulse_syn(phase_ptr,pulse_phase_ptr);
phase_ptr+=deltaphase;
if (phase_ptr>4) phase_ptr-=4;
pulse_phase_ptr+=deltapulse;
if (pulse_phase_ptr>2*m_pi) pulse_phase_ptr-=2*m_pi;
}

I'd love to be able to say I optimized variod by a factor of 4 or 5 (which is probably the difference between where I started and the fixed point code), but realistically the differene between the optimized fixed point and floating point is only .1% of the CPU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants