Byte Order - Big and Little Endian

Reference: http://en.wikipedia.org/wiki/Endianness
Long Discussion: http://www.ietf.org/rfc/ien/ien137.txt

Most numeric (scalar) values stored in computer memory or on storage media such as disks are more than one byte long, e.g. 2-byte integers, 8-byte double-precision floating-point numbers, etc.

This brings up the question of how to store multi-byte quantities on machines where each byte has its own address - which byte gets stored at the "first", lower, memory location, and which bytes follow in higher memory addresses?

Suppose one machine writes the two-byte integer 0x55FF to disk with the 0x55 (high byte) at the lower address and the 0xFF (low byte) at the higher address. If a different machine reads those bytes back treating 0xFF as the high byte and 0x55 as the low byte, it gets 0xFF55, and the two machines will not agree on the value of the integer!
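A quick way to see the disagreement is to decode the same two bytes both ways. This Python sketch uses the built-in int.from_bytes; the two-byte string simply stands in for what the first machine wrote to disk.

```python
# The two bytes as written by the first machine: high byte 0x55 at the lower address.
stored = bytes([0x55, 0xFF])

# Reading with the same convention (first byte is the high byte) recovers the value.
same_order = int.from_bytes(stored, "big")

# Reading with the opposite convention (first byte is the low byte) does not.
other_order = int.from_bytes(stored, "little")

print(hex(same_order))   # 0x55ff
print(hex(other_order))  # 0xff55
```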

Alas, there is no "right" ordering to store the bytes in multi-byte quantities. Hardware is built to handle the bytes in a particular order, and as long as compatible hardware reads the bytes in the same order, things are fine. We will look at two major types of byte ordering: Little-Endian and Big-Endian.

A quote from Bruce McKinney's Hardcore Visual Basic:

Endian refers to the order in which bytes are stored. The term is taken from a story in Gulliver’s Travels by Jonathan Swift about wars fought between those who thought eggs should be cracked on the Big End and those who insisted on the Little End. With chips, as with eggs, it doesn’t really matter as long as you know which end is up.

Little Endian

If the hardware is built so that the lowest, least significant byte of a multi-byte scalar is stored "first", at the lowest memory address, then the hardware is said to be "little-endian"; the "little" end of the integer gets stored first, and the next bytes get stored in higher (increasing) memory locations. Little-Endian byte order is "littlest end goes first (to the littlest address)".

Machines such as the Intel/AMD x86, Digital VAX, and Digital Alpha handle scalars in Little-Endian form.

Big Endian

If the hardware is built so that the highest, most significant byte of a multi-byte scalar is stored "first", at the lowest memory address, then the hardware is said to be "big-endian"; the "big" end of the integer gets stored first, and the next bytes get stored in higher (increasing) memory locations. Big-Endian byte order is "biggest end goes first (to the lowest address)".
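Both definitions can be written out with shifts and masks, independent of the byte order of whatever machine runs the code. This Python sketch uses two hypothetical helpers, store_little and store_big, which return the bytes of an unsigned integer in the order they would occupy increasing memory addresses (index 0 is the lowest address).

```python
def store_little(value: int, size: int) -> bytes:
    """Hypothetical helper: least significant byte first (at the lowest address)."""
    return bytes((value >> (8 * i)) & 0xFF for i in range(size))


def store_big(value: int, size: int) -> bytes:
    """Hypothetical helper: most significant byte first (at the lowest address)."""
    return bytes((value >> (8 * (size - 1 - i))) & 0xFF for i in range(size))


# Both agree with Python's built-in int.to_bytes for the same byte order.
assert store_little(0x55FF, 2) == (0x55FF).to_bytes(2, "little") == b"\xff\x55"
assert store_big(0x55FF, 2) == (0x55FF).to_bytes(2, "big") == b"\x55\xff"
```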

Machines such as IBM mainframes, the Motorola 680x0, Sun SPARC, PowerPC, and many other RISC machines handle scalars in Big-Endian form.
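Which order a particular machine uses can be checked at run time. In Python, sys.byteorder reports the host's native order; packing the integer 1 with the struct module's native "=" format shows the same thing from the bytes themselves, since the 0x01 lands at the lowest address only on a Little-Endian host.

```python
import struct
import sys

# Pack 1 as a 4-byte unsigned int in the machine's native byte order.
native = struct.pack("=I", 1)

# On a Little-Endian host the least significant byte (0x01) is at offset 0.
detected = "little" if native[0] == 1 else "big"

print(sys.byteorder, detected)  # e.g. "little little" on x86
```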

Four-byte Example

Consider the four-byte integer 0x44332211. The "little" end byte, the lowest or least significant byte, is 0x11, and the "big" end byte, the highest or most significant byte, is 0x44. The two storage patterns for the four bytes are:

Memory address	Big-Endian byte value	Little-Endian byte value
104	11	44
103	22	33
102	33	22
101	44	11
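The table can be regenerated mechanically. This sketch lays out 0x44332211 both ways with int.to_bytes, puts the first byte at address 101 as in the example, and lists rows from the highest address down to match the table.

```python
value = 0x44332211
base = 101  # address of the first (lowest) byte, as in the table

big = value.to_bytes(4, "big")        # 44 33 22 11
little = value.to_bytes(4, "little")  # 11 22 33 44

rows = []
for offset in reversed(range(4)):     # highest address first, like the table
    rows.append((base + offset, big[offset], little[offset]))

print("Memory address\tBig-Endian byte value\tLittle-Endian byte value")
for address, be, le in rows:
    print(f"{address}\t{be:02X}\t{le:02X}")
```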

If we were to look at a memory dump of those four bytes, what would it look like? A memory dump always displays adjacent memory bytes from left-to-right on the page, in increasing memory address order. Low memory addresses are on the left and the addresses go up one-by-one as we move to the right on a line in the dump.

If we display the memory dump of the number 0x44332211 stored in memory at address 101 in Big-Endian order, we see something like this:

   ADDRESS: ---------- MEMORY BYTES ---------- 
       100: 00 44 33 22 11 00 00 00 00 00 ...

In the Big-Endian storage order, the "big", most significant, byte 0x44 is stored "first", at the lowest memory location 101. The 0x44 displays to the left of the other bytes, which follow in ascending left-to-right memory address order: 102, 103, 104. The order of the bytes output in the dump "44 33 22 11" is the same order as the bytes in the integer when written as 0x44332211. This is a nice property of Big-Endian hardware - the bytes of multi-byte numbers are dumped on the page in the "correct" order.
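A dump line like the one above can be produced from a byte buffer. This sketch uses a bytearray to stand in for ten bytes of memory starting at address 100, stores 0x44332211 in Big-Endian order at address 101, and formats it with a hypothetical dump_line helper that prints bytes left to right in increasing address order.

```python
BASE = 100                  # address of memory[0] in this simulated dump
memory = bytearray(10)      # ten zeroed bytes standing in for RAM at 100..109

# Store 0x44332211 in Big-Endian order at address 101.
offset = 101 - BASE
memory[offset:offset + 4] = (0x44332211).to_bytes(4, "big")


def dump_line(mem: bytes, start: int) -> str:
    """Hypothetical helper: lowest address on the left, one hex byte per address."""
    return f"{start:>10}: " + " ".join(f"{b:02X}" for b in mem)


line = dump_line(memory, BASE)
print(line)  # "       100: 00 44 33 22 11 00 00 00 00 00"
```

Changing "big" to "little" in the store produces the Little-Endian dump instead, 00 11 22 33 44 00 00 00 00 00.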

If we display the memory dump of the same number 0x44332211 stored in memory at address 101 in Little-Endian order, we see something like this:

   ADDRESS: ---------- MEMORY BYTES ---------- 
       100: 00 11 22 33 44 00 00 00 00 00 ...

In the Little-Endian storage order, the "little", least significant, byte 0x11 is stored "first", at the lowest memory location 101. The 0x11 displays to the left of the other bytes, which follow in ascending left-to-right memory address order: 102, 103, 104. But the order of the bytes output in the dump "11 22 33 44" appears to be "backwards" to the way we read the original 0x44332211 number! This is an awkward thing about Little-Endian dumps - the bytes of multi-byte scalars are listed "backwards" on the page. The hardware knows how to pick up the bytes in the correct order, but as humans, we have to remember to reverse the bytes that we see in the Little-Endian dump before we reconstruct the scalar value. We see the four-byte integer in the Little-Endian dump output as "11 22 33 44" and we must reverse and reconstruct it as 0x44332211.

NOTE: Always reverse the order of bytes of multi-byte scalars that you see in a Little-Endian dump!
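Reversing by hand is easy to get wrong in a long dump. This sketch takes the text "11 22 33 44" exactly as it appears in the Little-Endian dump, reverses the bytes, and rebuilds the scalar; int.from_bytes with "little" does the same reversal implicitly.

```python
dump_text = "11 22 33 44"        # bytes as they appear in the Little-Endian dump
raw = bytes.fromhex(dump_text)   # fromhex ignores the spaces between bytes

# Manual approach: reverse the bytes, then read them most significant first.
manual = int.from_bytes(raw[::-1], "big")

# Equivalent: tell from_bytes the bytes are in Little-Endian order.
direct = int.from_bytes(raw, "little")

print(hex(manual), hex(direct))  # 0x44332211 0x44332211
```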

Endianness and Character Data

Single-byte character data such as ASCII and Latin-1 is not affected by Endianness. If you store any ASCII character string in memory, it always looks the same, no matter what the Endianness of the hardware, since each character is one byte long and the start character of the string is always stored at the lowest memory location. For the string "abcd", the "a" is stored "first", i.e.

Memory address	byte value	ASCII
104	64	d
103	63	c
102	62	b
101	61	a

If you dump character data from memory, it reads correctly left-to-right in the dump, too:

   ADDRESS: ---------- MEMORY BYTES ----------     --- ASCII CHARACTERS ---
       100: 00 61 62 63 64 00 00 00 00 00 ...      .abcd....

Thus, no matter what the actual Endianness of the hardware, single-byte character data is laid out the way Big-Endian data is: the left-most (first) character goes at the lowest address, so the string reads correctly in dump output.
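One way to convince yourself of this is to encode the string and look at the bytes. This sketch shows that "abcd" encodes to 61 62 63 64 in address order on any hardware; byte order only comes into play if those same four bytes are reinterpreted as a multi-byte integer.

```python
text = "abcd"
data = text.encode("ascii")   # one byte per character, first character first

print(data.hex(" "))          # 61 62 63 64, the same on any hardware

# Byte order matters only when the bytes are read back as one 4-byte number.
as_big = int.from_bytes(data, "big")        # 0x61626364
as_little = int.from_bytes(data, "little")  # 0x64636261
```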

None of this Endian-independence holds for multi-byte character encodings such as UTF-16 and UTF-32, where each character takes more than one byte to represent. To read such characters correctly you need to know the Endianness used to store them. For example, if you mis-read the two-byte UTF-16 character 0x0041 (the letter "A") with its bytes reversed, as 0x4100, you get a Chinese ideograph that stands for "calamity, disaster, evil, or misfortune". Watch out!
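The "A" example is easy to reproduce with Python's codecs: the same two bytes 00 41 decode to "A" when read as UTF-16 Big-Endian ("utf-16-be") but to the character U+4100 when read as UTF-16 Little-Endian ("utf-16-le").

```python
data = bytes([0x00, 0x41])        # UTF-16 code unit 0x0041 stored Big-Endian

right = data.decode("utf-16-be")  # read in the order it was written
wrong = data.decode("utf-16-le")  # read with the bytes swapped

print(right, hex(ord(right)))     # A 0x41
print(hex(ord(wrong)))            # 0x4100
```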

Conversions Between Endianness

You can see that a file full of Little-Endian four-byte integers (and/or two-byte UTF-16 characters) will not read correctly on a Big-Endian machine, since the Big-Endian hardware will pick up each of the four-byte numbers (and the two-byte characters) in the "wrong" byte order, and vice-versa.

To pass multi-byte scalar data from one machine to another may require that each scalar be individually "byte-swapped" - the order of the bytes in each scalar may need to be reversed so that the number can be correctly read by the other hardware. Note that doing this requires an intimate knowledge of exactly where the scalars (and UTF-16 characters) are in the input, and how big they are. You can't simply swap all sequences of four bytes, since not all bytes may belong to four-byte integers!
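Because the swap depends on knowing the layout, a converter is usually written against an explicit record format. This sketch assumes a hypothetical record made of one 4-byte unsigned integer followed by one 2-byte unsigned integer, and uses Python's struct module: "<" reads Little-Endian, ">" writes Big-Endian, and each field is swapped according to its own width.

```python
import struct

# Hypothetical record layout: unsigned 32-bit int, then unsigned 16-bit int.
LITTLE_RECORD = struct.Struct("<IH")  # "<" = Little-Endian, standard sizes, no padding
BIG_RECORD = struct.Struct(">IH")     # ">" = Big-Endian, same fields


def little_to_big(record: bytes) -> bytes:
    """Re-encode one record, swapping each field according to its own size."""
    fields = LITTLE_RECORD.unpack(record)
    return BIG_RECORD.pack(*fields)


little = LITTLE_RECORD.pack(0x44332211, 0x55FF)
big = little_to_big(little)
print(little.hex(" "))  # 11 22 33 44 ff 55
print(big.hex(" "))     # 44 33 22 11 55 ff
```

Reversing all six bytes at once would instead give 55 ff 44 33 22 11, which drags bytes of the 2-byte field into the 4-byte one.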

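So how does a program on one machine send an integer or other scalar to another machine without knowing the byte order of both machines? It can't, unless the two sides agree in advance. Either each program must know the other machine's byte order, or both must agree on a single byte order for the data in transit and convert to and from it.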
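The common agreement in practice is Big-Endian "network byte order", used by the Internet protocols (C programs convert with htonl/ntohl). This sketch uses struct's "!" prefix, which means network order, inside two hypothetical helpers, encode_u32 and decode_u32; each side only needs the agreed wire order, and struct handles the host's own order.

```python
import struct


def encode_u32(value: int) -> bytes:
    """Hypothetical sender helper: unsigned 32-bit int in network (Big-Endian) order."""
    return struct.pack("!I", value)


def decode_u32(data: bytes) -> int:
    """Hypothetical receiver helper: works whatever this machine's native order is."""
    (value,) = struct.unpack("!I", data)
    return value


wire = encode_u32(266)
print(wire.hex(" "))     # 00 00 01 0a
print(decode_u32(wire))  # 266
```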

Some programs try to finesse the problem by converting all scalar data into ASCII character strings, which are Endian-independent, e.g. rather than send the two-byte integer 0x010A (266 decimal), the program would send the three-byte ASCII string "266", since ASCII strings don't depend on byte ordering. The remote machine would convert from ASCII back to the native integer format.
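The ASCII approach is simple to sketch: the sender formats the number as decimal text, and the receiver parses the text back into its own native integer. Only single-byte characters cross the wire, so byte order never enters into it.

```python
value = 0x010A                        # 266 decimal

# Sender: format as decimal ASCII text, one byte per digit.
message = str(value).encode("ascii")  # b"266", three bytes

# Receiver: parse the text back into a native integer.
received = int(message.decode("ascii"))

print(message, received)              # b'266' 266
```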

-----------

Endian Wars

"This is an attempt to stop a war. I hope it is not too late and that somehow, magically perhaps, peace will prevail again." - Danny Cohen, 1980

Should multi-byte data types be stored with the least significant byte (LSB) at the lowest memory location (little-endian: 80x86, VAX, Alpha) or the highest memory location (big-endian: Motorola, PPC, SPARC, and just about everyone else)? In other words, if you grab the byte at the lowest address, do you grab the little-end of the number (LSB, little-endian) or the big end of the number (MSB, big-endian)?

Code Corner - Big-Endian Little-Endian - What's it all about

Dr. Bill's Notes on Little Endian vs. Big Endian

Little Endian vs. Big Endian

Here are some direct references to the original 1980 article by Danny Cohen that gave the name "endian" to the issue:

DAV's Endian FAQ

Big Endian vs. Little Endian

And here are grown-up people still arguing about the issue in August 2000, twenty years later:

Slashdot: More Linux on Merced info