Floating Point
Encoding non-integer values requires the use of scientific or floating-point notation. Many
floating-point systems exist, but one of the most common is the IEEE 754 standard.
A value encoded in floating-point format is composed of two major components: a mantissa and
an exponent. The actual structure of these components is examined below for the IEEE 754
standard.
Floating-point values may be handled completely through software, by means of an extra "math
co-processor", or by complex instructions built into the main processor.
Fractional Values
Fractions can be handled in 3 different ways:
- The Ratio of 2 Integer Values - This is the first form most people consider when they think
of the term "fraction" e.g. 2/7. Except for some specially developed software packages or very
specialized mathematics co-processors, this form is not supported by computers.
- Fixed-Point Values - We often refer to this form as "decimal fractions" although the base-10
(decimal) system really has nothing to do with it (the same positional notation could be, and
sometimes is, used with values represented in binary or hexadecimal). A (decimal) point is used
to separate the digit position with a weight of 1 from the digit positions to its right, which
represent values whose weights are 1/base (e.g. 1/10 for decimal) of the weight to their
immediate left. A specific "fixed point" value is always coded with the same number of digits
to the right of the (decimal) point. Computers store this kind of "fraction" internally in the
same way as they store integer values and divide by an appropriate power of the base when
combining the value with other numbers or when performing I/O. For example, "dollar values"
may often be stored as fixed point values with 2 digits to the right of the decimal point; in fact,
the value is stored as an integer number of cents and only divided by 100 for output display
(see the C sketch after this list).
- Floating-Point Values - Often when using numbers to represent "real world" measurements,
we know that our values are not perfectly accurate. We say that the distance between two
towns is a certain number of kilometers, where even if we are correct in our statement to the
nearest kilometer, we most certainly are not correct to the nearest millimeter. Furthermore, we
don't care! The value we have given is "good enough" for our purposes. Whether some
distance is 27 kilometers or 270 kilometers is a much more important question than whether
some other distance is 300,000 kilometers or 301,000 kilometers. Differences in "scale" are
more important than differences in "precision" at the same "scale". Floating-point values are a
method for representing values in a manner that recognizes this difference in importance.
"Scale" is considered as a separate value separate from some (approximated) "precision" value.
"Scientific notation" is a special form of this, often used (apart from computer systems) by
engineers and scientists. As an example, 301,000 would be represented in "scientific notation"
as 3.01E5; the precision, in this case the 301 portion, is normally stated as a value with exactly
one non-zero digit written to the left of the decimal point; the scale, in this case 5, indicates
how many positions the decimal point needs to be moved to the right in order to match the
intended "real world" measurement (negative scale values mean that the decimal point needs to
move to the left). As the "scale" value changes, the decimal point moves around or "floats"
within the "precision" value. In mathematical language, the "precision" value is called the
"mantissa" and the "scale" value is called the "exponent".
IEEE 754 Standard
- Normalized Mantissa (Base 2) - in a normal (or "normalized") form, a floating point value in
binary is always represented as 1.xxxx times (2 to the power of some "exponent") where xxxx
represents some sequence of 0's and 1's (actually, there might be a plus or minus sign in
front of the 1.xxxx as well, but let's ignore that for now). The main point here is that a
"normalized" base 2 floating point value will always have a single 1 to the left of the decimal
point. Since it will always be the same thing, there is no need to encode it; it can be assumed
without taking up any actual code bit space.
- Excess-127 Exponent (Base 2) - the IEEE 754 standard specifies that the "exponent" will be
encoded as an 8-bit value, using the unsigned binary code of a value which is 127 more than (in
"excess" of) the actual base 2 exponent required to represent the desired value.
- 32-bit Format - this format allows for approximately 7 decimal digits of precision and a scale
of approximately 40 digits.
1-bit sign | 8-bit exponent | 23-bit mantissa
(sign bit = 1 for negative values)
For example (the same decoding is carried out by the C sketch after this list):
43 4D 40 00 (hex)
re-written in binary:
0100 0011 0100 1101 0100 0000 0000 0000
re-grouped into sign, exponent, and mantissa fields:
sign:         exponent (+127):    mantissa:
0             10000110            (1.)100110101000...
(positive)    134 (dec)
actual exponent:
134 - 127 = 7
moving the decimal point 7 positions to the right:
1     1     0     0     1     1     0     1   .   0     1
128   64    32    16    8     4     2     1       1/2   1/4    (weights)
= +205.25
- 64-bit Format - the 64-bit ("double precision") format has the same structure as the 32-bit
format except that it uses an 11-bit exponent encoded in excess-1023 notation and a 52-bit
mantissa. This provides for approximately 15 decimal digits of precision and a scale of
approximately 300 digits (in decimal).
- Special Values - note that some patterns are reserved for special values. Floating point
encodings whose exponent field is all 0's or all 1's represent non-standard (special) values.
For example, the value 0 is represented by a floating point encoding with all 0's in both the
exponent and mantissa fields.
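The worked 32-bit example above can be checked with a short C sketch. Assuming a platform
where unsigned 32-bit integers are available and float is the IEEE 754 32-bit format (true on
most current systems, but an assumption nonetheless), the following code pulls the pattern
43 4D 40 00 apart into its sign, excess-127 exponent, and mantissa fields, then reinterprets
the same bits as a float to recover +205.25.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    /* The bit pattern from the worked example: 43 4D 40 00 (hex). */
    uint32_t bits = 0x434D4000u;

    uint32_t sign     = (bits >> 31) & 0x1;       /* 1 bit                */
    uint32_t exponent = (bits >> 23) & 0xFF;      /* 8 bits, excess-127   */
    uint32_t mantissa = bits & 0x7FFFFF;          /* 23 bits, hidden "1." */

    printf("sign = %u, exponent field = %u, actual exponent = %d\n",
           sign, exponent, (int)exponent - 127);
    printf("mantissa field = 0x%06X (interpreted as 1.mantissa)\n", mantissa);

    /* If float on this platform is IEEE 754 single precision,
       copying the raw bits into a float reproduces the decoded value. */
    float value;
    memcpy(&value, &bits, sizeof value);
    printf("value = %g\n", value);   /* expected: 205.25 */

    return 0;
}

The memcpy is used instead of a pointer cast to avoid undefined behaviour from type punning;
on a platform whose float is not IEEE 754, the last line would not print 205.25.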
Other Floating Point Forms
Many mainframe computers were designed prior to the establishment of the IEEE 754 standard
and employ their own formats for floating point encoding.
- IBM Mainframe - the IBM mainframe has three different floating point forms: a 32-bit, a
64-bit, and a 128-bit form. Unlike the IEEE forms, the exponent field is the same length for all
forms: 7 bits (in excess-64 notation); the longer floating point forms have increased
precision but not increased scale. The encoded base is 16 (instead of 2), so the exponent
actually indicates how many times the point should be shifted 4 bit positions (one hexadecimal
digit) to the right, or to the left if the exponent is negative; this results in an effective scale
of a little over 250 bit positions (roughly 75 decimal digits). Note also that because the base
is 16, the normalized precision may start with any hexadecimal digit between 1 and 15; the
leading digit must therefore be encoded explicitly in the floating point form.
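As a rough sketch of the IBM-style decoding described above (assuming the common 32-bit layout
of 1 sign bit, a 7-bit excess-64 exponent, and a 24-bit fraction interpreted as a base-16 value
just to the right of the point), the C function below converts such a pattern to a double. The
example bit pattern 0x42CD4000 is chosen to represent +205.25, i.e. 0.CD4 x 16^2; both the
function and the constant are illustrative and not taken from the notes above.

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Sketch: decode a 32-bit IBM-style hexadecimal floating point value.
   Layout assumed: 1 sign bit | 7-bit excess-64 exponent | 24-bit fraction. */
double ibm32_to_double(uint32_t bits)
{
    int    sign     = (bits >> 31) & 0x1;
    int    exponent = (int)((bits >> 24) & 0x7F) - 64;  /* remove the excess of 64 */
    double fraction = (bits & 0xFFFFFF) / 16777216.0;   /* divide by 16^6          */

    /* Each step of the exponent shifts the point one hex digit, i.e. 4 bits. */
    double value = ldexp(fraction, 4 * exponent);
    return sign ? -value : value;
}

int main(void)
{
    /* Illustrative pattern: 0.CD4 x 16^2 = +205.25 under the assumed layout. */
    printf("%g\n", ibm32_to_double(0x42CD4000u));
    return 0;
}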
Implementation Methods
How floating point arithmetic is actually performed varies among different computer systems.
- Software - in small or older microcomputer systems, floating point manipulation is/was done
using software subroutines; most microcomputer languages still include these subroutines
when you compile and link a program which uses floating point values.
- Math Co-Processor - larger and more modern microcomputers include a second processor
(either as a second chip or built-in to the main processor); this processor has instructions, not
found in the main processor, for performing floating point (and often BCD) arithmetic directly
with hardware circuits.
- Built-in Floating Point Instructions - inclusion of floating point manipulation instructions
within the main processor's instruction set is normally only found in large mainframe computer
systems.