===============================
The Big Picture on Bit Patterns
===============================
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Bit patterns have no inherent meaning.

They may represent signed integers, unsigned integers, floating point
numbers, characters, or even executable program instructions.

The instructions that operate on the bits give the bits meaning.
You write the programs that generate those instructions.

Example: The 32-bit pattern 00111111100000000000000000000000 (3F800000h)

If you interpret this bit pattern as:

1. unsigned integer -> 1065353216 decimal
2. sign/magnitude   -> 1065353216 decimal
3. two's complement -> 1065353216 decimal
4. IEEE 754 SP FP   -> 1.0 decimal
5. Four 7-bit ASCII characters (in 8-bit bytes):
      00111111 = 63 decimal  = '?' (question mark character)
      10000000 = 128 decimal = NOT ASCII (ASCII is only 0-127 decimal)
      00000000 = 0           = NUL (control character - not printable)
      00000000 = 0           = NUL (control character - not printable)
6. Four 8-bit, excess-127 integers:
      00111111 = 63  -->  63-127 = -64 decimal
      10000000 = 128 --> 128-127 = +1 decimal
      00000000 = 0   -->   0-127 = -127 decimal
      00000000 = 0   -->   0-127 = -127 decimal

Example: The 32-bit pattern 10111111100000000000000000000000 (BF800000h)

If you interpret this bit pattern as:

1. unsigned integer -> 3212836864 decimal
2. sign/magnitude   -> -1065353216 decimal
3. two's complement -> -1082130432 decimal
4. IEEE 754 SP FP   -> -1.0 decimal
5. Four 7-bit ASCII characters (in 8-bit bytes):
      10111111 = 191 decimal = NOT ASCII (ASCII is only 0-127 decimal)
      10000000 = 128 decimal = NOT ASCII (ASCII is only 0-127 decimal)
      00000000 = 0           = NUL (control character - not printable)
      00000000 = 0           = NUL (control character - not printable)
6. Four 8-bit, excess-127 integers:
      10111111 = 191 --> 191-127 = +64 decimal
      10000000 = 128 --> 128-127 = +1 decimal
      00000000 = 0   -->   0-127 = -127 decimal
      00000000 = 0   -->   0-127 = -127 decimal

If your program works correctly, you read out of memory the same type
of data as you store into memory, and everything you store has its own
separate memory location so that you don't overwrite anything.

When programs go bad, they may write one type of data into memory and
read it out as a different data type, causing program misbehaviour.
The worst type of overwriting occurs when character or numeric data
overwrites executable instructions, possibly causing the program to
surrender control to an attacker.

*** Numbers

Numbers represented in computers have a limited size (number of bits),
hence limited precision and limited range.

Numbers can be stored in many ways.  Two common ways are as integers or
as floating-point values.  Both precision and range are essentially the
same for integers, since integers have no exponent field.  Floating point
numbers have both a mantissa (for precision) and an exponent field (for
range); they usually trade away some bits of precision in favour of
greater range.  Though floating-point numbers almost always have a
greater range than integers, the range is not infinite.

To store a value accurately in a floating-point representation in a
computer, two things must work out: the number's value must lie within
the *range* of the floating-point representation (the exponent must fit),
and the value must not lose any *precision* (the mantissa must fit).

In practice, some precision is often lost when working with
floating-point numbers.  As a simple example, the decimal value 0.1
(1/10 or one tenth) cannot be accurately represented as a binary
floating-point number.  No finite sum of powers-of-two will ever equal
exactly 0.1, just as no finite sum of powers-of-ten will ever exactly
equal one-third (0.3333...).
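
The point about 0.1 can be checked directly.  The following is a minimal
Python sketch, assuming that Python's float is an IEEE 754 double-precision
value (true on common platforms).  The variable names (stored, ratio, total)
are only illustrative.  It prints the value actually stored for 0.1, confirms
that the stored value is a fraction with a power-of-two denominator, and then
adds ten tenths together.

```python
from decimal import Decimal
from fractions import Fraction

# Exact decimal expansion of the binary value stored for the literal 0.1.
# It is close to, but not equal to, one tenth.
stored = Decimal(0.1)
print(stored)

# The stored value is an exact fraction whose denominator is a power of
# two, so it can never be exactly 1/10.
ratio = Fraction(0.1)
print(ratio)
print(ratio.denominator == 2**55)

# Add ten tenths together; each addition may round, and the small
# errors do not cancel out.
total = 0.0
for _ in range(10):
    total += 0.1
print(total)
print(total == 1.0)
```

With double precision the final comparison prints False.  Programs that
compare floating-point results usually test whether two values are within a
small tolerance of each other instead of testing for exact equality.
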
If you write a computer program that adds together ten binary
floating-point tenths, the result typically will not equal exactly 1.0,
though it will come very close.

*** Characters

Characters may be represented in computers using any one of many
different standards.  Some standards allocate only one byte per
character (e.g. ASCII or Latin-1); other standards always use
multi-byte characters (e.g. UTF-32, an encoding of Unicode that uses
four bytes for every character); other standards use single- and
multi-byte characters depending on which character is being encoded
(e.g. UTF-8, another encoding of Unicode).

-- 
| Ian! D. Allen  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/