The Math Package

The 'math package' is a reasonably discrete block of code that provides floating-point arithmetic capability for the rest of BASIC. It also includes some math functions, such as SQR (square root) and is probably the hardest part of BASIC to understand. There are three general reasons for this :

You may have forgotten a lot of the maths you learnt at school. This certainly applied to me : when I began working on this section from first principles, I quickly found myself floundering at the very idea of binary fractions.
Unless you're a numerical analyst who has reason to distrust conventional hardware/software floating-point support, you probably never needed to think about how floating point worked before now. Modern processors, compilers, and runtime libraries took the pain away years ago, and quite right too.
Floating point is hard to code. Consider this : Bill Gates was one of the brightest kids in America at the time, but he and his equally brainy pal Paul Allen ended up having to hire a third wunderkind, Monte Davidoff, just to do floating point. They needed a specialist to do specialist work, and Monte had done it before.

Maths Refresher

The Basics of Bases

Consider an everyday decimal number such as 317.25. The digits that make up this and every other decimal number represent multiples of powers of ten, all added together:

10²	10¹	10⁰	.	10⁻¹	10⁻²
3	1	7	.	2	5

So writing 317.25 is basically just shorthand for 3*10² + 1*10¹ + 7*10⁰ + 2*10⁻¹ + 5*10⁻². The shorthand form is far more readable, and that's why everybody uses it. At the risk of labouring the point, the table below should clarify this.

Digit Position	Digit Value for Position	Decimal Number	Digit value for this number
2	100	317.25	3 * 100	=	300
1	10	317.25	1 * 10	=	10
0	1	317.25	7 * 1	=	7
-1	1/10	317.25	2 * .1	=	.2
-2	1/100	317.25	5 * .01	=	.05
Total:				=	317.25

Now consider the same number in binary (base two). The decimal number 317.25, expressed in binary, is :

2⁸	2⁷	2⁶	2⁵	2⁴	2³	2²	2¹	2⁰	.	2⁻¹	2⁻²
1	0	0	1	1	1	1	0	1	.	0	1

And here's a table like the decimal one above, which should make it completely clear (remember 'bit' is short for 'binary digit') :

Bit Position	Bit Value for Position	Binary Number	Bit value for this number
8	256	100111101.01	1 * 256	=	256
7	128	100111101.01	0 * 128	=	0
6	64	100111101.01	0 * 64	=	0
5	32	100111101.01	1 * 32	=	32
4	16	100111101.01	1 * 16	=	16
3	8	100111101.01	1 * 8	=	8
2	4	100111101.01	1 * 4	=	4
1	2	100111101.01	0 * 2	=	0
0	1	100111101.01	1 * 1	=	1
-1	1/2	100111101.01	0 * 1/2	=	0
-2	1/4	100111101.01	1 * 1/4	=	0.25
Total:				=	317.25
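
To check tables like the two above by machine, here is a minimal Python sketch. The function name positional_value is my own invention for illustration; it simply adds up each digit multiplied by the power of the base for its position, using exact fractions so nothing gets rounded along the way.

```python
from fractions import Fraction

def positional_value(text: str, base: int) -> Fraction:
    """Evaluate a digit string such as '317.25' or '100111101.01' in the given base."""
    whole, _, frac = text.partition(".")
    total = Fraction(0)
    # Digits left of the point carry base**0, base**1, ... reading right to left.
    for position, digit in enumerate(reversed(whole)):
        total += int(digit, base) * Fraction(base) ** position
    # Digits right of the point carry base**-1, base**-2, ... reading left to right.
    for position, digit in enumerate(frac, start=1):
        total += int(digit, base) * Fraction(base) ** -position
    return total

print(float(positional_value("317.25", 10)))       # 317.25
print(float(positional_value("100111101.01", 2)))  # 317.25
```

Both calls arrive at the same value, which is the whole point : the number is the same, only the shorthand differs.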

Mantissas, Exponents, and Scientific Notation

Now let's think about decimal numbers again. Another way of representing the number 317.25 is like this : 3.1725 * 10². Yes, we've split one number into two numbers - we've extracted the number's magnitude and written it separately. Why is this useful? Well, consider a very small number such as 0.00000000000588. Looking at it now, precisely how small is that? That's a lot of zeros to work through. Also, let's pretend we're using very small numbers like this one in a pen-and-paper calculation - something like 0.00000000000588 + 0.000000000000291. You'd better be sure you don't miss out a zero when you're working the problem through, or your answer will be off by a factor of 10. It's much easier to have those numbers represented as 5.88 * 10⁻¹² and 2.91 * 10⁻¹³ (yes, the second number had an extra zero - did you spot that?). The same principle applies for very large numbers like 100000000 - it's just easier and less prone to human error to keep the magnitudes separated out when working with such numbers.

It's the smallest of small steps to get from this form of number notation to proper scientific notation. The only difference is how the magnitude is written - in scientific notation we lose the magnitude's base and only write its exponent part, thusly : 3.1725 E 2. The part that's left of the E, the 3.1725, is called the mantissa. The bit to the right of the E is the exponent.
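
If you'd like to see the mantissa/exponent split done mechanically, the following is a small sketch using Python's standard decimal module. The helper name to_scientific is hypothetical; adjusted() and scaleb() are real Decimal methods that give the power of ten of the leading digit and shift the decimal point respectively.

```python
from decimal import Decimal

def to_scientific(number: Decimal) -> tuple[Decimal, int]:
    """Split a decimal number into (mantissa, exponent) with one digit before the point."""
    exponent = number.adjusted()          # power of ten of the leading digit
    mantissa = number.scaleb(-exponent)   # move the point so one digit is left of it
    return mantissa, exponent

print(to_scientific(Decimal("317.25")))            # mantissa 3.1725, exponent 2
print(to_scientific(Decimal("0.00000000000588")))  # mantissa 5.88, exponent -12

# The pen-and-paper sum from the text, done without losing count of the zeros.
total = Decimal("0.00000000000588") + Decimal("0.000000000000291")
print(to_scientific(total))                        # mantissa 6.171, exponent -12
```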

Mantissas and Exponents in Binary

Let's go back to considering 317.25 in binary : 100111101.01. Using scientific notation, this is 1.0011110101 E 1000. Remember that both mantissa and exponent are written in binary, so that exponent value 1000 is a binary number, 8 in decimal.
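
Here is a rough Python sketch that produces the same binary scientific notation for any positive number. The function name to_binary_scientific is made up for this page; math.frexp is a real standard-library call that splits a float into a mantissa between 0.5 and 1 and a power of two, which I then shift by one place to get the "1.something" form used above. It assumes a positive, finite input.

```python
import math

def to_binary_scientific(x: float) -> str:
    """Format a positive float as '1.bbbb E eeee' with mantissa and exponent in binary."""
    mantissa, exponent = math.frexp(x)            # x = mantissa * 2**exponent, 0.5 <= mantissa < 1
    mantissa, exponent = mantissa * 2, exponent - 1   # shift so that 1 <= mantissa < 2
    fraction = mantissa - 1
    bits = []
    while fraction:                               # peel off one binary place per pass
        fraction *= 2
        bit = int(fraction)
        bits.append(str(bit))
        fraction -= bit
    return f"1.{''.join(bits) or '0'} E {exponent:b}"

print(to_binary_scientific(317.25))   # 1.0011110101 E 1000
```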

Why floating point?

Consider the eternal problem of having a finite amount of computer memory. Not having infinite RAM means we cannot represent an infinite range of numbers. If we have eight bits of memory, we can represent the integers from 0 to 255 only. If we have sixteen, we can extend our range to 0 to 65535, and so on. The more bits we can play with, the larger the range of numbers we can represent. With fractional numbers there is a second problem : precision. Many fractions recur : e.g. one third in decimal is 0.33333 recurring. Likewise, one tenth is 0.1 in decimal but 0.0001100110011 in binary, with the last four bits (0011) recurring.
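
You can watch one tenth recur in binary with a few lines of Python. This is only an illustrative sketch (binary_fraction_bits is a name I've invented) : it doubles the fraction repeatedly and records whether each doubling carries past the binary point.

```python
from fractions import Fraction

def binary_fraction_bits(value: Fraction, count: int) -> str:
    """Return the first `count` bits after the binary point of a value in [0, 1)."""
    bits = []
    for _ in range(count):
        value *= 2
        bit = int(value)        # 1 if the doubling carried past the point, else 0
        bits.append(str(bit))
        value -= bit
    return "".join(bits)

print("0." + binary_fraction_bits(Fraction(1, 10), 13))   # 0.0001100110011
print("0." + binary_fraction_bits(Fraction(1, 4), 2))     # 0.01
```

Ask for more bits of one tenth and the 0011 pattern just keeps going, so any fixed number of bits has to cut it off somewhere.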

So any method we choose for storing fractional numbers has to take these two problems into consideration. Bearing this in mind, consider the two possible approaches for storing fractional numbers :

Fixed point. Store the integer part in one field, and the fractional part in another field. It's called fixed point representation since the point (binary or decimal) is always in the same place - between the integer and fractional fields.
Floating point. Store the mantissa in one field, and the exponent in another field. This way, the point wouldn't be fixed into place - it could be anywhere, as determined by the binary exponent. It would, in fact, be a floating point.

Why is floating point better than fixed point? Let's say we have 32 bits to play with. Let's use fixed point and assign 16 bits for the integer part and 16 for the fractional part. This allows a range of 0 to 65535.9999 or so, which isn't very good value, range-wise, for 32 bits. OK, let's increase the range - we'll change to using 20 bits for the integer and 12 for the fraction. This gives us a range of 0 to 1,048,575.999ish. Still not a huge range, and since we've only got 12 bits for the fraction we're losing precision - numbers stored this way will be rounded to the nearest 1/4096.
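
The arithmetic behind those two fixed-point layouts can be sketched in Python as follows. The function name fixed_point_limits is hypothetical, and the sketch assumes unsigned values, as the ranges quoted above do.

```python
from fractions import Fraction

def fixed_point_limits(integer_bits: int, fraction_bits: int) -> tuple[Fraction, Fraction]:
    """Return (largest value, smallest step) for an unsigned fixed-point layout."""
    step = Fraction(1, 2 ** fraction_bits)   # value of the lowest fraction bit
    largest = 2 ** integer_bits - step       # every bit set
    return largest, step

for layout in [(16, 16), (20, 12)]:
    largest, step = fixed_point_limits(*layout)
    print(layout, f"max {float(largest):.5f}", f"step 1/{step.denominator}")
# (16, 16) max 65535.99998 step 1/65536
# (20, 12) max 1048575.99976 step 1/4096
```

Moving four bits from the fraction to the integer part buys sixteen times the range at the cost of sixteen times coarser steps, and that trade-off is fixed for every number you store.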

Now let's try floating point instead. Let's assign a whopping 24 bits for the mantissa and 8 bits for the exponent. 8 bits doesn't sound like much, but this is an exponent after all - with these 8 bits we get an exponent range of -128 to +127, so magnitudes run from about 2⁻¹²⁸ to 2¹²⁷, which is roughly 10⁻³⁸ to 10³⁸. That's a nice big range! And we get 24 bits of precision too! It's clearly the better choice.

Floating point is not a perfect solution though... adding a very small number to a very large number is likely to produce an erroneous result, because the sum still has to fit into the same fixed number of mantissa bits and the low-order bits of the small number fall off the end. For example, go to the BASIC emulator and try PRINT 10000+.1. You get 10000.1 as expected. Now try PRINT 10000+.01 or PRINT 100000+.1. See?
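
To get a feel for why this happens, here is a Python sketch that squeezes a number into a 24-bit mantissa. It is an approximation under stated assumptions, not Altair BASIC's actual addition routine : the names round_to_mantissa and spacing are mine, the rounding rule is whatever Python's round() does, and it says nothing about how BASIC then prints the result.

```python
import math

def round_to_mantissa(x: float, bits: int = 24) -> float:
    """Round x to the nearest value whose mantissa fits in `bits` bits."""
    if x == 0:
        return 0.0
    mantissa, exponent = math.frexp(x)            # x = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    scaled = round(math.ldexp(mantissa, bits))    # keep `bits` bits, round the rest away
    return math.ldexp(scaled, exponent - bits)

def spacing(x: float, bits: int = 24) -> float:
    """Gap between neighbouring representable values near x."""
    _, exponent = math.frexp(x)
    return math.ldexp(1.0, exponent - bits)

print(round_to_mantissa(10000 + 0.01))   # 10000.009765625
print(round_to_mantissa(100000 + 0.1))   # 100000.1015625
print(spacing(10000.0))                  # 0.0009765625
print(spacing(100000.0))                 # 0.0078125
```

The gap between neighbouring representable values grows with the size of the number, so the bigger the large operand, the more of the small one gets lost.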

Normalisation

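Normalisation is the process of shifting the mantissa until it is at least 0.5 and less than 1 (that is, until the first bit after the binary point is a 1) and adjusting the exponent to compensate. For example, these binary numbers are unnormalised (the exponents after the E are written in binary too) :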

101.001
0.0001
0.011 E 101

After normalisation these same binary numbers become :

0.101001 E 11
0.1 E -11
0.11 E 100
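
The shifting can be sketched in Python like this. It is a minimal illustration with a made-up function name (normalise), using exact fractions rather than the bit-shifting a real 8080 routine would do.

```python
from fractions import Fraction

def normalise(mantissa: Fraction, exponent: int = 0) -> tuple[Fraction, int]:
    """Shift the mantissa into [0.5, 1) and adjust the power-of-two exponent to match."""
    if mantissa == 0:
        return mantissa, 0                 # zero has no leading 1 bit to shift into place
    while abs(mantissa) >= 1:              # too big : shift right, bump the exponent up
        mantissa /= 2
        exponent += 1
    while abs(mantissa) < Fraction(1, 2):  # too small : shift left, bump the exponent down
        mantissa *= 2
        exponent -= 1
    return mantissa, exponent

print(normalise(Fraction(41, 8)))      # binary 101.001     -> 41/64 (0.101001), exponent 3
print(normalise(Fraction(1, 16)))      # binary 0.0001      -> 1/2 (0.1), exponent -3
print(normalise(Fraction(3, 8), 5))    # binary 0.011 E 101 -> 3/4 (0.11), exponent 4
```

The three calls correspond to the three examples above, with the exponents 3, -3 and 4 being 11, -11 and 100 in binary.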


How Altair BASIC stored floating point numbers

There was no industry standard for floating-point number representation back in 1975, so Monte had to roll his own. He decided that 32 bits would allow an adequate range, and defined his floating-point number format like this :

Floating-point number representation in Altair BASIC

The 8-bit exponent field had a bias of 128. This just meant that the exponent was stored as 'exponent+128'.

Also, the mantissa was really 24 bits long, but squeezed into 23 bits. How did he gain that extra bit of precision? By treating zero as a special case, indicated by a stored exponent of zero. Any non-zero number, once normalised, will always have a mantissa with a leading 1. And since the first bit is always going to be 1, why bother storing it?
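
Putting the bias and the hidden leading bit together, here is a hedged Python sketch of packing and unpacking. Several things in it are my assumptions rather than facts from the description above : the function names pack_fields and unpack_fields are invented, the one bit left over from 8 + 23 is assumed to be a sign bit, the rounding rule is simply Python's round(), and the fields are returned as a tuple because the sketch makes no claim about how they are ordered in memory.

```python
import math

BIAS = 128  # stored exponent = exponent + 128

def pack_fields(x: float) -> tuple[int, int, int]:
    """Return (stored_exponent, sign, 23-bit stored mantissa) for x."""
    if x == 0:
        return 0, 0, 0                               # zero : stored exponent of zero
    mantissa, exponent = math.frexp(abs(x))          # normalised : 0.5 <= mantissa < 1
    mantissa24 = round(math.ldexp(mantissa, 24))     # 24-bit mantissa, top bit always 1
    if mantissa24 == 1 << 24:                        # rounding carried out of the top bit
        mantissa24 >>= 1
        exponent += 1
    stored_exponent = exponent + BIAS
    if not 1 <= stored_exponent <= 255:
        raise ValueError("exponent out of range for an 8-bit field")
    sign = 1 if x < 0 else 0                         # assumed sign bit, see lead-in
    return stored_exponent, sign, mantissa24 & 0x7FFFFF   # drop the leading 1

def unpack_fields(stored_exponent: int, sign: int, mantissa23: int) -> float:
    """Rebuild the value from the three fields."""
    if stored_exponent == 0:
        return 0.0
    mantissa24 = mantissa23 | 0x800000               # put the hidden leading 1 back
    value = math.ldexp(mantissa24, stored_exponent - BIAS - 24)
    return -value if sign else value

fields = pack_fields(317.25)
print(fields[0], fields[1], hex(fields[2]))   # 137 0 0x1ea000
print(unpack_fields(*fields))                 # 317.25
```

For 317.25 the normalised form is 0.10011110101 E 1001, so the exponent is 9 and the stored exponent is 9 + 128 = 137.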

The intermediate storage of unpacked fp numbers is undefined and seems to be generally done on the fly.
