Friday, April 19, 2013

Floating Points - an explanation for non computer scientists

!!! Attention: this article (and in fact my whole blog) have moved to site: !!!

This article is not intended for very proficient programmers or computer scientists, but offers more of an introductory and high level explanation of floating point variables to the hobbyist programmer or high school student who likes to explore computers.

Let's look at Integers first

One of the first variable types you learnt were probably integer variables. With integer variables we can represent integers (i.e. 0, 1, 2, ..., -1, -2, ...). The numbers are represented accurately on the computer, meaning the exact mathematical value can be stored in the computer. However, note that a computer cannot represent the complete infinite set of integers (..., -2, -1, 0, 1, 2, ...) but only a finite subset of it, because a computer's memory is always finite. In C, the two limit values are stored in the constants INT_MIN and INT_MAX.

Getting Real

The limitations of integers are apparent when doing simple divisions like 1 / 3. For code like "int x = 1 / 3" the result would just be rounded down and we'd get 0 for x. This is why computer architects and mathematicians in the early days of computing thought of ways to represent real numbers in the computer. The mathematical set they were interested to do calculations with where the real numbers (containing rational numbers like 1/3 and irrational numbers like pi). The fundamental problem is that the real numbers are an infinite set, but the computer only has finite memory.

Infinity != Infinity ?

Note that with the set of integers, the problem of infinity was solved by specifying two limit values INT_MIN and INT_MAX, such that only the finite subset between those limits can be represented. Now, why can't we just take this approach over to the real numbers? The problem is that the real numbers present a "larger" infinite set than the integers. This was shown in Cantor's famous and lovely proof that the real numbers are an uncountable set whereas integers (and even rational numbers) are a countable set. Anyway, I'm deriving... (I just find these mathematical subtleties very lovely). If you haven't learnt about these concepts before, don't worry. The essential thing we're looking at here is how to represent real numbers in the best possible way on a computer.

Say we set limits just as with integers, a REAL_MIN and REAL_MAX value, so that the computer can accurately represent all numbers in there. Does that work? Unfortunately no. The reason is that even within this limit there is still an infinite number of real numbers. That is the essential difference to integers! Thus even in a very small interval, say 0.0 and 1.0, there are infinitely many real numbers, and the computer with its limited storage can only represent a finite subset of numbers in that interval.

So for real numbers we need to take two measures in order to represent them (or at least their approximation) on a computer:
  • Just like with integers, we need to have two limits values. In the case of C they're called FLT_MIN and FLT_MAX. (Their exact values depend on how the computer internally stores real numbers and doesn't concern us here.)
  • Moreover, we cannot represent all numbers within our interval, since there are still infinitely many numbers, but only a chosen subset of those numbers.

Fixed point numbers

In the early days of computer architecture, when researchers were devising ways to represent these decimal numbers, there was an approach known as Fixed Point Numbers. In this approach the decimal point is fixed in the computer representation. So for example for the number 0.1 the computer would know that in the internal representation the decimal point is always at its original position. Furthermore it will remember that the value behind the decimal point is "1". In the case of 0.00001 what would the computer have to remember? Since the decimal point is fixed, the computer has to remember all the digits behind the decimal point, namely "00001".

Now, floating points!

What made floating point numbers different was that they said: We don't set the decimal point at the fixed position where we usually write it in the decimal number, but instead let it "float" around in our number to a position so that we can save the information we need in less memory than the fixed point approach. So in the previous example, for 0.00001, we would let our decimal point float 4 digits to the right so that we get 00000.1 or 0.1 Now this way we only need to save "1" as the value behind the floating point. This is much better than "00001", right? (Since it requires less memory). However, in addition to that we also need to save the amount we moved our decimal point around, so that the original decimal number can be accurately reconstructed. In this case we would have to somehow encode that the decimal point was moved 4 decimals to the right. Even with the additional information, we're still requiring less space to encode the number 0.00001 with a floating point instead of a fixed point number. This is essentially the biggest benefit of floating point numbers, and is also the reason why we use them (instead of fixed points or other representation) in most mathematical computations.


This was a high level introduction to floating points with a bit of history and math. I've left out many complicated technical details on purpose, and as I said, this article is not targeted towards computer scientists, but rather towards people who have done a bit of programming and want to deepen their knowledge.

Addional Resources

Recommended reading for computer scientists: What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Cantor's diagonal argument to show that real numbers are uncountable on Wikipedia.

Interesting classical paper by von Neumann that was arguing against floating points:
Preliminary discussion of the logical design of an electronic computing instrument

Let me know what you think of this article and where you see room for improvement. I only just started this blog, so feedback of any kind is appreciated!

Related Articles:
The Peculiarities of Floating Point comparisons. (to be written...)
Should I study Computer Science?

No comments:

Post a Comment