# Floating Point

Encoding non-integer values requires the use of scientific or floating-point notation. Many floating-point systems exist, but one of the most common is the IEEE 754 standard.

A value encoded in floating-point format is composed of two major components: a mantissa and an exponent. The structure of these components is examined below for the IEEE 754 standard.

Floating-point values may be handled entirely in software, by means of a separate "math co-processor", or by complex instructions built into the main processor.

## Fractional Values

Fractions can be handled in three different ways:
• The Ratio of 2 Integer Values - This is the first form most people consider when they think of the term "fraction", e.g. 2/7. Except for some specially developed software packages or very specialized mathematics co-processors, this form is not supported by computers.
• Fixed-Point Values - We often refer to this form as "decimal fractions", although the base-10 (decimal) system really has nothing to do with it (the same positional notation can be, and sometimes is, used with binary and hexadecimal values). A (decimal) point separates the digit position with a weight of 1 from the digit positions to its right, whose weights are each 1/base (e.g. 1/10 for decimal) of the weight to their immediate left. A specific fixed-point value is always coded with the same number of digits to the right of the (decimal) point. Computers store this kind of "fraction" internally the same way they store integer values, dividing by the appropriate power of the base when combining it with other numbers or when performing I/O. For example, "dollar values" are often stored as fixed-point values with 2 digits to the right of the decimal point; in fact, the value is stored as an integer number of cents and only divided by 100 for output display.
• Floating-Point Values - Often when using numbers to represent "real world" measurements, we know that our values are not perfectly accurate. We say that the distance between two towns is a certain number of kilometers, where even if we are correct to the nearest kilometer, we most certainly are not correct to the nearest millimeter. Furthermore, we don't care! The value we have given is "good enough" for our purposes. Whether some distance is 27 kilometers or 270 kilometers is a much more important question than whether some other distance is 300,000 kilometers or 301,000 kilometers. Differences in "scale" are more important than differences in "precision" at the same "scale". Floating-point values represent numbers in a manner that recognizes this difference in importance: the "scale" is kept as a value separate from the (approximated) "precision" value. "Scientific notation", often used outside computer systems by engineers and scientists, is a special form of this. As an example, 301,000 would be written in scientific notation as 3.01E5; the precision, in this case the 3.01 portion, is normally stated with exactly one non-zero digit to the left of the decimal point, while the scale, in this case 5, indicates how many positions the decimal point must be moved to the right to match the intended "real world" measurement (negative scale values mean the decimal point moves to the left). As the "scale" value changes, the decimal point moves around, or "floats", within the "precision" value. In mathematical language, the "precision" value is called the "mantissa" and the "scale" value is called the "exponent".
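The fixed-point and scientific-notation ideas above can be sketched in Python (the price value is an invented example):

```python
# Fixed point: "dollar values" stored as an integer number of cents and
# divided by 100 only for output display, as described above.
price_in_cents = 1999                  # hypothetical stored value
print(f"${price_in_cents / 100:.2f}")  # $19.99

# Scientific notation: 301,000 becomes 3.01E5 -- the mantissa (3.01)
# carries the precision, the exponent (5) carries the scale.
print(f"{301000:.2e}")                 # 3.01e+05
```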

## IEEE 754 Standard

• Normalized Mantissa (Base 2) - in normal (or "normalized") form, a binary floating-point value is always represented as 1.xxxx times (2 to the power of some "exponent"), where xxxx is some sequence of 0's and 1's (there may also be a plus or minus sign in front of the 1.xxxx, but let's ignore that for now). The main point is that a normalized base-2 floating-point value always has a single 1 to the left of the (binary) point. Since it will always be the same, there is no need to encode it; it can be assumed without taking up any actual code bit space.
• Excess-127 Exponent (Base 2) - the IEEE 754 standard specifies that the "exponent" will be encoded as an 8-bit value, using the unsigned binary code of a value which is 127 more than (in "excess" of) the actual base 2 exponent required to represent the desired value.
• 32-bit Format - this format allows for approximately 7 decimal digits of precision and a scale of approximately ±38 decimal digits (the largest representable magnitude is about 3.4 x 10^38).
```
sign: 1 bit | exponent: 8 bits | mantissa: 23 bits
(sign bit = 1 for negative values)

For example:
43 4D 40 00 (hex)

re-written in binary:
0100 0011  0100 1101  0100 0000  0000 0000

re-grouped:
sign:         exponent (+127):     mantissa:
0             10000110             (1.)100110101000...
(positive)    134 (dec)

exponent: 134 - 127 = 7

moving the (binary) point 7 positions to the right:
1     1     0     0     1     1     0     1   .  0     1
128   64    32    16    8     4     2     1      1/2   1/4   (weights)

= +205.25
```
• 64-bit Format - the 64-bit ("double precision") format has the same structure as the 32-bit format except that it uses an 11-bit exponent encoded in excess-1023 notation and a 52-bit mantissa. This provides approximately 15 decimal digits of precision and a scale of approximately ±308 decimal digits.
• Special Values - note that some bit patterns are reserved for special values: encodings whose exponent field is all 0's or all 1's do not follow the normalized scheme above. For example, the value 0 is represented by an encoding with all 0's in both the exponent and mantissa fields, and an all-1's exponent field is used for special values such as infinity.
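The worked example above can be checked with a short Python sketch. The decoder below follows the field layout described in this section (it ignores the reserved special-value patterns), and `struct` re-interprets the same bytes using the machine's own IEEE 754 representation:

```python
import struct

def decode_ieee754_single(word):
    """Decode a 32-bit IEEE 754 pattern by hand (normalized values only)."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF        # 8-bit excess-127 exponent
    mantissa_bits = word & 0x7FFFFF       # 23 stored mantissa bits
    mantissa = 1 + mantissa_bits / 2**23  # re-attach the implied leading 1
    value = mantissa * 2.0 ** (exponent - 127)
    return -value if sign else value

# The example from the text: 43 4D 40 00 (hex)
print(decode_ieee754_single(0x434D4000))                  # 205.25
# Cross-check: let the machine interpret the same four bytes directly.
print(struct.unpack('>f', bytes.fromhex('434D4000'))[0])  # 205.25
```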

## Other Floating Point Forms

Many mainframe computers were designed prior to the establishment of the IEEE 754 standard and employ their own formats for floating-point encoding.
• IBM Mainframe - the IBM mainframe has three different floating-point forms: a 32-bit, a 64-bit, and a 128-bit form. Unlike the IEEE forms, the exponent field is the same length in all three: 7 bits (in excess-64 notation); the longer floating-point forms have increased precision but not increased scale. The encoded base is 16 (instead of 2), so the exponent is actually an indicator of how many times the (binary) point should be shifted 4 bits to the right (or to the left if the exponent is negative); this results in an effective scale of roughly ±75 decimal digits. Note also that because the base is 16, the leading digit of a normalized mantissa may be any value between 1 and 15; the leading digit must therefore be encoded explicitly in the floating-point form.
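The 32-bit IBM form can be sketched in Python following the field layout just described (sign bit, 7-bit excess-64 exponent, 24-bit fraction with no implied leading digit); the two test patterns below are hand-computed examples:

```python
def decode_ibm_hfp32(word):
    """Decode a 32-bit IBM hexadecimal (base-16) floating-point pattern."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 24) & 0x7F        # 7-bit excess-64 exponent
    fraction = (word & 0xFFFFFF) / 16**6  # 24-bit fraction, 0 <= f < 1
    value = fraction * 16.0 ** (exponent - 64)
    return -value if sign else value

# fraction = 1/16, exponent = 65 - 64 = 1: (1/16) * 16 = 1.0
print(decode_ibm_hfp32(0x41100000))  # 1.0
# The same 205.25 as the IEEE example: fraction 0xCD4000/16**6, exponent 66
print(decode_ibm_hfp32(0x42CD4000))  # 205.25
```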

## Implementation Methods

How floating point arithmetic is actually performed varies among different computer systems.
• Software - in small or older microcomputer systems, floating-point manipulation is (or was) done using software subroutines; most microcomputer languages still include these subroutines when you compile and link a program that uses floating-point values.
• Math Co-Processor - larger and more modern microcomputers include a second processor (either as a separate chip or built into the main processor); this processor has instructions, not found in the main processor itself, for performing floating-point (and often BCD) arithmetic directly with hardware circuits.
• Built-in Floating Point Instructions - inclusion of floating-point manipulation instructions within the main processor's own instruction set is normally found only in large mainframe computer systems.