grammar / wording
LeonEricsson committed Mar 28, 2024
1 parent 71be14a commit d61f6b0
Showing 1 changed file with 11 additions and 7 deletions.
18 changes: 11 additions & 7 deletions blog/2024-03-21-floatingpoints.md
@@ -11,31 +11,35 @@ type: blog
Floating point numbers were something I just took at face value for a long time. I tried wrapping my head around them, but I always rejected the mathematical notation and things never stuck

$$
(-1)^S * 1.M * 2^{(E-127)}
(-1)^S * 1.M * 2^{(E-127)}.
$$

Some time ago however, floating points were explained to me in a way that just made sense, and eventually even the formula above was clear. So if you're like me and you put off understanding floating points, hopefully this will be as eye-opening to you as it was to me.
But some time ago, I read an explanation that just made sense, things clicked, and eventually the formula above no longer terrified me. If you're like me, and you've put off understanding floating points, today might be the day that things click for you too!

<br/><br/>
## An attempt at an intuitive explanation
I'd like to get to fewer bit representations later, but let's start off with the classic 32-bit floating point, as defined by IEEE 754, but using our own terminology:
## An attempt at an intuitive explanation of FP32
Let's start off with the classic 32-bit floating point, as defined by IEEE 754. You've probably got the words exponent, mantissa and significand stored somewhere in memory, hoping not to have to think about them again, but let's ignore those for now and imagine a 32-bit floating point defined like this:

![](/images/fp32.png)

A floating point is represented by a sign bit S, 8 window bits and 23 offset bits. The window defines between which consecutive powers of two a number lies: [$2^1$, $2^2$], [$2^2$, $2^3$] and so on up to [$2^{127}$, $2^{128}$]. The offset bits divide each of these windows into $2^{23} = 8388608$ *frames*, which let you approximate a floating point number.
Our floating point is represented by a sign bit S, 8 window bits and 23 offset bits. Now, initially it might look like I've just slapped new words on already complicated things, but bear with me. The first bit, the sign bit, is the easiest: when it's 0 your number is positive, and when it's 1 the number is negative. The window defines between which consecutive powers of two a number lies: [$2^1$, $2^2$], [$2^2$, $2^3$] and so on up to [$2^{127}$, $2^{128}$]. Example: to represent 1000, you find the window [$2^9$, $2^{10}$] = [512, 1024]. Finally, the offset bits divide each of these windows into $2^{23} = 8388608$ *frames*, which let you approximate a floating point number.

You might have already noticed that the windows grow exponentially in size, while the number of frames remains constant. The consequence is that our precision is reduced for larger numbers: each frame in a bigger window covers a wider stretch of the number line.
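
To put some numbers on that, here is a minimal Python sketch. The helper names `window_of` and `frame_width` are made up for this illustration, and the code assumes a positive, normal (non-zero, non-denormal) input. It finds the window a number falls in and how wide a single frame is inside that window.

```python
import math

OFFSET_BITS = 23  # number of offset (mantissa) bits in FP32


def window_of(x):
    """Return (start, end) of the power-of-two window containing positive x."""
    # math.frexp gives x = m * 2**e with 0.5 <= m < 1,
    # so x lies in [2**(e - 1), 2**e).
    _, e = math.frexp(x)
    return 2.0 ** (e - 1), 2.0 ** e


def frame_width(x):
    """Return the width of one frame in the window containing positive x."""
    start, end = window_of(x)
    return (end - start) / 2 ** OFFSET_BITS


print(window_of(1000))    # (512.0, 1024.0)
print(frame_width(1000))  # 2**-14, roughly 0.00006
print(frame_width(1e9))   # 64.0
```

Around one billion the frames are 64 apart, so FP32 cannot tell 1,000,000,000 from 1,000,000,001.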



### An example
How do we represent the number $4.3$ for example?
I think the best way to learn is through an example. *How do we represent the number $4.3$?*

- The number is positive so our sign bit is 0.
- Which window is $4.3$ in? It lands between $2^2 = 4$ and $2^3 = 8$, and therefore our window bits should represent $2^2$ (think window start + offset = number).
- Finally, we need to find the frame that is the closest approximation of our desired number. We find that the offset ratio is $\frac{4.3 - 4}{8-4} = 0.075$, which when translated to our offset range gives $2^{23} * 0.075 = 629145.6$. Notice that this isn't a whole number, but we said that we divided our range into exactly $8388608$ frames. This means we have to round up $629145.6$ to $629146$, and this represents our precision error!
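
The three steps above can be sketched in Python. This is a rough illustration rather than a full IEEE 754 encoder: `encode_fp32` is a hypothetical helper that only handles non-zero normal numbers, and it assumes the window is stored with the standard FP32 bias of 127. The result is compared against the bit pattern Python's own `struct` module produces.

```python
import math
import struct

OFFSET_BITS = 23  # offset (mantissa) bits
BIAS = 127        # FP32 stores the window exponent plus 127


def encode_fp32(x):
    """Build the 32-bit pattern for a non-zero normal x from window + offset."""
    sign = 1 if x < 0 else 0
    magnitude = abs(x)
    # frexp: magnitude = m * 2**e with 0.5 <= m < 1, so the window starts at 2**(e - 1)
    _, e = math.frexp(magnitude)
    window = e - 1
    start = 2.0 ** window
    # The window [start, 2 * start) has width start, so this ratio lies in [0, 1)
    ratio = (magnitude - start) / start
    frame = round(ratio * 2 ** OFFSET_BITS)
    # Adding (rather than OR-ing) lets a frame that rounds up to 2**23
    # carry over into the next window.
    return (sign << 31) + ((window + BIAS) << OFFSET_BITS) + frame


bits = encode_fp32(4.3)
reference = struct.unpack(">I", struct.pack(">f", 4.3))[0]
print(hex(bits))          # 0x4089999a
print(bits == reference)  # True
```

The low 23 bits of that pattern are exactly the frame number $629146$ from the last step.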

Going back to the original formula, we get
Did we get things right? Well, let's go back to the original formula.

- The sign bit, S, is the same
- The mantissa, $M$, is our offset.
- Our window is the exponent, $E$. It is stored with a bias of 127, so the window $2^2$ is stored as $E = 129$.
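
As a sanity check, the formula can be evaluated directly in Python. This is only a sketch: `decode_fp32` is a hypothetical helper, and the inputs are the values from the $4.3$ example (sign 0, frame $629146$, and a stored exponent of $129$, which is the window $2$ plus the FP32 bias of 127).

```python
import struct


def decode_fp32(sign, exponent, mantissa):
    """Evaluate (-1)^S * 1.M * 2^(E - 127) for a normal FP32 number."""
    # "1.M" means 1 plus the offset ratio, i.e. mantissa / 2**23
    return (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)


value = decode_fp32(0, 129, 629146)
# Round-trip 4.3 through a real 32-bit float for comparison
as_fp32 = struct.unpack(">f", struct.pack(">f", 4.3))[0]
print(value)
print(value == as_fp32)  # True
```

The printed value is very slightly above $4.3$, which is the precision error from rounding the frame.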

$$
(-1)^0 * 1.075 * 2^{(2)} = 4.3