============================================================ Binary Integer Mathematics, unsigned, two's complement, etc. ============================================================ - Ian! D. Allen - idallen@idallen.ca - www.idallen.com How the computer adds integer numbers ------------------------------------- When the ALU inside the CPU adds one to any integer value, the same thing always happens. Basic ALU integer math doesn't care about sign or negative numbers. There is no concept, at the ALU level, of "signed" or "unsigned". It's all just bits being added, and the simple rules for binary addition apply. In four bits binary, the math (adding one to each value) looks like this: 0000 plus one is 0001 plus one is 0010 plus one is 0011 plus one is 0100 plus one is 0101 plus one is 0110 plus one is 0111 plus one is (also causes the overflow flag to be set in the CPU) 1000 plus one is 1001 plus one is 1010 plus one is 1011 plus one is 1100 plus one is 1101 plus one is 1110 plus one is 1111 plus one is (also causes carry flag to be set in the CPU) 0000 plus one is 0001 plus one is 0010 plus one is ...etc... Note how this list is really a ring or wheel, not a list. There is no real beginning or end to the repeating sequence of bit patterns; adding one just moves to the next bit pattern in the ring. Adding two moves two places around the ring, etc. Subtraction moves the other way. Four bits have only 2**4 (16) possible bit pattterns; that means they can represent only 16 different things. Eight bits can only represent 2**8 (256) different numbers, etc. Ten bits, 1024 numbers. Unsigned Numbers: If all 16 of these four-bit patterns represent positive numbers, the range of numbers is 0 to 15. The smallest number is zero with bit pattern 0000 and the largest number is 15 with bit pattern 1111. Adding one to 15 doesn't give 16, it wraps around to start over at zero (0000) again. Subtracting one from zero wraps around to give the bit pattern for 15 (1111). As long as the adding/subtracting stays within the numbers 0 (0000) to 15 (1111), the unsigned math is correct. Signed Numbers: If we choose to have some of these 16 four-bit patterns represent negative numbers, we can't use as many bit patterns for positive numbers. Most systems for negative numbers divide up the 16 patterns into half positive and half negative. If the top (leftmost, most significant) bit is on, we can decide that the bit pattern represents a negative number. We call this top bit the "sign" bit. If half the bit patterns have the sign bit on, only eight bit patterns are left to represent positive numbers, and the range of eight positive numbers is now only 0 to +7 (a total of eight numbers). The other eight bit patterns, with sign bit on, represent eight negative numbers, usually the numbers from -1 down to -8. Two's Complement Negative Numbers: Which bit patterns represent each of the negative numbers from -1 down to -8? If we subtract one from zero (0000), the result is 1111; so, 1111 must be the bit pattern for -1. If we subtract one from 1111 (-1) we get 1110; so, 1110 must be the bit pattern for -2, etc., down to -8 with bit pattern 1000. This is the "two's complement" system for negative numbers. Subtracting one from -8 (1000) doesn't give -9, it wraps around to give the bit pattern for +7 (0111). Adding one to +7 (0111) doesn't give +8, it wraps around to start over at -8 (1000) again. With the introduction of two's complement negative numbers, the smallest four-bit number is -8 with bit pattern 1000 and the largest number is +7 with bit pattern 0111. Adding one to -1 (1111) correctly gives zero (0000). As long as the adding/subtracting stays within the range of numbers -8 (1000) to +7 (0111), the signed math is correct. The wrapping-around is true for all two's complement number systems - adding one to the highest positive number wraps around to give the most negative number. The most negative number always has an absolute value one larger than the largest positive number, since zero is included with the positive numbers. Thus, if you know that the largest positive 8-bit two's complement value is +127, you immediately know that the smallest negative value must be -128. For four-bit unsigned numbers, the range of 16 bit patterns starts at zero (0000) and goes up to 15 (1111). Unsigned math that steps outside this range (goes below 0000 or above 1111) is wrong - the answer wraps around incorrectly. For four-bit signed two's complement numbers, the range of 16 bit patterns starts at -8 (1000) and goes up to +7 (0111). Signed math that steps outside this range (goes below 1000 or above 0111) is wrong - the answer wraps around incorrectly. Note how unsigned math treats the list of 16 consecutive bit patterns as going from 0 to 15 (0000 to 1111) and signed math treats the list as the 16 patterns going from -8 to +7 (1000 to 0111). For addition (or subtraction) of these 16 four-bit numbers, it doesn't matter if the bit pattern is considered signed or unsigned - when you say "x = x + 1" the ALU inside the CPU just adds one to the bits in memory and possibly sets some CPU flags. Signed, unsigned, same thing. The CPU knows nothing about negative numbers; it only does binary addition to bit patterns. Rule: When adding binary, octal, or hexadecimal bit patterns together, just do the binary math and express the result in the allowed number of bits, throwing away the carry if it doesn't fit. Examples of binary integer math: binary 4 bits: 0000 + 0111 = 0111 hexadecimal 4 bits: 0h + 7h = 7h binary 4 bits: 0100 + 0100 = 1000 hexadecimal 4 bits: 4h + 4h = 8h binary 4 bits: 1010 + 1010 = 0100 (no room for the carry in 4 bits) hexadecimal 4 bits: Ah + Ah = 4h (no room for the carry in 4 bits) binary 4 bits: 1100 + 1100 = 1000 (no room for the carry in 4 bits) hexadecimal 4 bits: Ch + Ch = 8h (no room for the carry in 4 bits) Hex math calculator: http://www.csgnetwork.com/hexaddsubcalc.html hex 16 bits: 1ABCh + 0001h = 1ABDh hex 16 bits: 1ABCh + 0009h = 1AC5h hex 16 bits: 1ABCh + 9C3Eh = B6FAh hex 16 bits: 1ABCh + FFFFh = 1ABBh (no room for carry in 16 bits) hex 16 bits: 1ABCh + F000h = 0ABCh (no room for carry in 16 bits) Status Flags for binary integer mathematics =========================================== The ALU in the CPU has two math status flags in it: the "carry" flag and the "overflow" flag. These flags are set or unset after an integer math operation. The flags have different meanings depending on whether you are treating the bit patterns as all-positive, or postitive and negative numbers. Carry Flag - for unsigned integer math -------------------------------------- The carry flag is set in the ALU whenever any binary mathematics causes a "carry" out of the top (leftmost, sign) bit, e.g. in four bit binry math: 1000+1000=0000 (and the carry flag is set). The carry flag indicates that the answer didn't fit in the number of bits we are using - there was "carry" that would have required one more bit to represent. If we treat the bits in a word as "unsigned" numbers (all positive, no negative), we must watch to see if our arithmetic causes the "carry" flag to be set, indicating the (unsigned) result is wrong. (We don't care about the overflow flag when doing unsigned math. The overflow flag is only relevant to signed numbers, not unsigned.) 4-bit example: 1000 + 1000 = 0000 (and the carry flag comes on in the CPU) --> In decimal: 8 + 8 = 0 (WRONG - answer doesn't fit in 4 bits) - if we were allowed 5 bits, the answer would be correct as 10000 (16) 4-bit example: 1111 + 0011 = 0010 (and the carry flag comes on in the CPU) --> In decimal: 15 + 3 = 2 (WRONG - answer doesn't fit in 4 bits) - if we were allowed 5 bits, the answer would be correct as 10010 (18) The carry flag indicates that the binary math answer needs more bits. Overflow Flag - for two's complement integer math ------------------------------------------------- The overflow flag is set by the CPU whenever the ALU adds two numbers together with the same sign bit, and the result has the opposite sign bit, e.g. in four bit binary math: 0100+0100=1000 (and the overflow flag is set). The ALU always sets the carry and sign bits when doing integer mathematics; your program must interpret these bits or ignore them. The sign bit and overflow flag only have meaning if we choose to interpret the bit patterns using the two's complement number system, which assigns half the bit patterns to negative numbers. If we treat the bits in a word as "two's complement" signed values, we must watch to see if our arithmetic causes the "overflow" flag to be set, indicating the (two's complement) result is wrong. (We don't care about the carry flag when doing signed, two's complement math. The carry flag is only relevant to unsigned numbers, not signed.) 4-bit example: 0100 + 0100 = 1000 (and the overflow flag comes on in the CPU) --> in decimal: 4 + 4 = -8 (WRONG ANSWER for two's complement) Some mathematics on bit patterns gives answers that are wrong both for unsigned and two's complement numbers, in which case *both* the carry and overflow flags will be set in the CPU: 4-bit example: 1010 + 1010 = 0100 (and both carry and overflow come on) --> in decimal (unsigned): 10 + 10 = 4 (WRONG) --> in decimal (two's complement): -6 + -6 = +4 (WRONG) References ---------- See also "odometer math" in: http://elearning.algonquincollege.com/coursemat/pincka/dat2343/lectures.f03/04-Unsigned-Binary-Encoding.htm http://elearning.algonquincollege.com/coursemat/pincka/dat2343/lectures.f03/06-Signed-Binary-Encoding.htm Another good reference, showing the number wheel: http://www.cs.nmsu.edu/~pfeiffer/classes/273/notes/binary.html http://www.cs.nmsu.edu/~pfeiffer/classes/273/notes/neg.html Converting hex to decimal using bit flipping and adding one: http://www.madsci.org/posts/archives/2000-02/950277263.Cs.r.html Base Converter: Convert numbers in any base up to 32: http://www.cut-the-knot.org/binary.shtml Hex (only) to decimal and binary converter, and vice-versa: http://www.easycalculation.com/hex-converter.php http://www.easycalculation.com/decimal-converter.php ========================= C Language Implementation ========================= In C language with 32-bit integers: signed int sx = 2000000000; signed int sy = 2000000001; signed int sz = sx + sy; unsigned int ux = 2000000000; unsigned int uy = 2000000001; unsigned int uz = ux + uy; printf("sz is 0x%x which could be %d or %u\n", sz, sz, sz); printf("uz is 0x%x which could be %d or %u\n", sz, uz, uz); Output (first "0x" output is in hexadecimal): sz is 0xEE6B2801 which could be -294967295 or 4000000001 uz is 0xEE6B2801 which could be -294967295 or 4000000001 The hex value 0xEE6B2801 when turned into 32-bit binary looks like this: 11101110011010110010100000000001 See - absolutely no difference in the two bit patterns in memory. The bit patterns in memory for signed and unsigned math are the same; because, there is no difference in what the ALU does. The ALU doesn't care. Adding/subtracting don't care about sign; it is all pure binary math on bit patterns. As you see above, once we have some bits in memory, we can tell our program to interpret the bits in either an unsigned manner ("%u" - all the bit patterns represent non-negative numbers) or in a signed manner ("%d" - about half of the bit patterns will be displayed as negative, with leading minus). Signed/unsigned doesn't matter in computer math. Where a signed/unsigned declaration plays a big part is in *comparing* numbers (and in bit-shifting; but, save that for another day). If we compare the bit pattern 0xEE6B2801 (above) with zero, is the bit pattern less than zero or much greater than zero? *That* answer depends on whether we declared the memory location (the variable) to be signed or unsigned. If we declare a 32-bit variable as unsigned, numeric comparisons treat all 4,294,967,296 bit patterns from zero up to 0xFFFFFFFF (11111111111111111111111111111111) as non-negative numbers (zero to 4,294,967,295 for 32-bit numbers). Comparisons (unsigned) will never say any of these patterns are less than zero. If we declare a 32-bit variable as signed, we are saying to our compiler that numeric comparisons using that variable will treat about half of those 4,294,967,296 bit patterns as negative numbers. Any bit patterns in signed variables that have the leftmost bit (the sign bit) set, will be interpreted and compared as "less than zero". More C code: /* both sz and uz contain the same bit pattern 0xEE6B2801 * signed sz interprets the bits as -294967295 * unsigned uz interprets the bits as 4000000001 */ if ( sz < 0 ) printf("sz is less than zero\n"); else printf("sz is not less than zero\n"); if ( uz < 0 ) printf("uz is less than zero\n"); else printf("uz is not less than zero\n"); Output: sz is less than zero uz is not less than zero Remember that sz and uz contain exactly the same bit pattern! Because we declared the sz variable to be signed, the code generated by the compiler to test whether the bit pattern in sz is less than zero tests the sign bit (leftmost bit) on the sz memory, notices that is is on ('1') and declares the bit pattern as "less than zero" (signed). All the bit patterns with the sign bit on will be interpreted as negative numbers. Because we declared the uz variable to be unsigned, the code generated by the compiler to test whether the bit pattern in uz is less than zero has no work to do at all - unsigned numbers are never less than zero. The compiler arranges to print: "uz is not less than zero" As a programmer, declaring variables as signed/unsigned is just a convenient way of having the compiler conspire with us to treat half the bit patterns as "less than zero". For math (adding/subtracting), the declaration of signed/unsigned doesn't make any difference. For mathematical comparisons (and bit-shifting), it does. -- | Ian! D. Allen - idallen@idallen.ca - Ottawa, Ontario, Canada | Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/ | College professor (Open Source / Linux) via: http://teaching.idallen.com/ | Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/