Bits, bytes and basic data types

To understand how information is represented, stored and transmitted by computers, it's important to first understand how binary numbers work. Digital computers represent data using $0$s and $1$s, like switches with an on or off state. Binary numbers are numbers written using only the digits 0 and 1, and a bit is a single binary digit: it has exactly one of two states, on or off (1 or 0). For example, the binary number $1011$ has the decimal value $1\cdot2^3 + 0\cdot2^2 + 1\cdot2^1 + 1\cdot2^0 = 11$.
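
To make this concrete, here is a minimal C sketch (the value is an arbitrary example) that prints the individual bits of a byte, from most significant to least significant:

    #include <stdio.h>

    int main(void)
    {
        unsigned char value = 0x4D;   /* the bit pattern 01001101 */

        /* Walk the 8 bits from most to least significant. */
        for (int i = 7; i >= 0; i--)
            putchar(((value >> i) & 1) ? '1' : '0');
        putchar('\n');                /* prints: 01001101 */

        return 0;
    }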

A byte is the smallest addressable unit of memory on most computers and is defined on most computers as a set of 8 bits. In the early 1960s, a now widely supported 128-character set called the American Standard Code for Information Interchange (ASCII) [1] was adopted as a Federal Information Processing Standard; each of its characters can be encoded using only 7 bits. Around the same time, IBM produced the Extended Binary Coded Decimal Interchange Code (EBCDIC), an 8-bit encoding, for its new System/360 line of mainframe computers, which helped solidify the adoption of the 8-bit storage size [2].

There exist character encoding standards that expand on ASCII while remaining backward compatible with it. For example, UTF-8 encodes each character using 1 to 4 bytes, and every ASCII character keeps its original single-byte encoding.
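
To see the variable-width encoding in practice, the following sketch prints the raw bytes of a two-character string; the bytes of 'é' (U+00E9) are written out explicitly so the example does not depend on the source file's encoding:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* "Aé" in UTF-8: 'A' is 1 byte, 'é' is the 2-byte sequence 0xC3 0xA9. */
        const char *text = "A\xC3\xA9";

        for (size_t i = 0; i < strlen(text); i++)
            printf("byte %zu: 0x%02X\n", i, (unsigned char)text[i]);

        return 0;
    }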


Figure 1: A byte consists of 8 bits.

Common data type byte sizes (ISO C)

Computers need to store more than just text. Other common data types include integers, decimal numbers and more complex structures, and these generally require more than 8 bits to describe. For example, a floating point number, which is roughly a decimal number with 7 significant digits of precision, requires 4 bytes. By concatenating bytes together, more complex data can be stored and reconstituted. A file in a computer consists of a continuous stream of bytes, and decoding the bytes of a particular file requires knowledge of the file format. An ASCII text file is one of the easiest to decode: each byte represents one character of the text.
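
As a small sketch of this decoding idea, the following C program treats an array of raw bytes as ASCII text (the byte values are chosen here to spell a short greeting):

    #include <stdio.h>

    int main(void)
    {
        /* A "file" as a stream of bytes: the ASCII codes for "Hi!". */
        unsigned char stream[] = { 0x48, 0x69, 0x21 };

        /* In an ASCII file, each byte maps directly to one character. */
        for (size_t i = 0; i < sizeof stream; i++)
            putchar(stream[i]);
        putchar('\n');   /* prints: Hi! */

        return 0;
    }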

Here is a list of common C language data types and the byte sizes the ISO C standard guarantees (a short program for checking the sizes on a given machine follows the list):

  • byte - 1 byte (8 bits); not a built-in C type, conventionally defined as an unsigned char
  • char - 1 byte (8 bits)
  • short - at least 2 bytes (16 bits)
  • int - at least 2 bytes (16 bits); 4 bytes on most modern platforms
  • long - at least 4 bytes (32 bits)
  • float - 4 bytes (32 bits) on virtually all platforms
  • double - 8 bytes (64 bits) on virtually all platforms
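
Because the actual sizes vary by platform and compiler, a program can check them with the sizeof operator; a minimal sketch:

    #include <stdio.h>

    int main(void)
    {
        /* sizeof reports the size of each type, in bytes, on this platform. */
        printf("char:   %zu byte(s)\n", sizeof(char));
        printf("short:  %zu byte(s)\n", sizeof(short));
        printf("int:    %zu byte(s)\n", sizeof(int));
        printf("long:   %zu byte(s)\n", sizeof(long));
        printf("float:  %zu byte(s)\n", sizeof(float));
        printf("double: %zu byte(s)\n", sizeof(double));
        return 0;
    }

On a typical 64-bit Linux desktop this prints 1, 2, 4, 8, 4 and 8 respectively; note that 'long' is 8 bytes there, not the 4-byte minimum listed above.
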
The byte and char

Both a 'byte' and a 'char' are 8-bit data types, and in practice they are interchangeable in C, although the language itself defines 'char' but not 'byte'. A 'char' is effectively a byte whose value is interpreted as a character in the ASCII table.
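
A short sketch of this equivalence: the same 8-bit value can be read either as a number or as an ASCII character:

    #include <stdio.h>

    int main(void)
    {
        char letter = 'A';          /* stored as the ASCII code 65 */
        unsigned char byte = 65;    /* the same bit pattern */

        printf("%c = %d\n", letter, letter);   /* prints: A = 65 */
        printf("%d = %c\n", byte, byte);       /* prints: 65 = A */
        return 0;
    }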

The integer

In the C programming language, there are 3 basic integer data types: the 'short' (a short integer), the 'int' (an integer), and the 'long' (a long integer). A 'short' (short integer) is a 2-byte data type, and on 16-bit systems the 'int' (integer) was commonly defined to be the same as a 'short'. The decimal value of the 16-digit binary number formed by the 2 bytes gives the value of the integer. If the integer is unsigned, the 16 bits can describe $2^{16} = 65536$ distinct integers: a number from 0 to 65,535. If the integer is signed, the first (most significant) bit is used to define whether the number is positive or negative. This leaves 15 bits, which can represent $2^{15} = 32768$ values; in the usual two's complement representation the range runs from −32,768 to 32,767.

A 'long' (long integer) is a 4-byte data type. Like the 'short' and 'int', the binary value of the 32 bits provides the decimal value of the long. If signed, this gives a number from −2,147,483,648 to 2,147,483,647; if unsigned, a number from 0 to 4,294,967,295.
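
Rather than hard-coding these ranges, C programs can read them from the standard <limits.h> header, which defines the actual limits for the platform being compiled on; a minimal sketch:

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        printf("short: %d to %d\n", SHRT_MIN, SHRT_MAX);
        printf("int:   %d to %d\n", INT_MIN, INT_MAX);
        printf("long:  %ld to %ld\n", LONG_MIN, LONG_MAX);
        printf("unsigned short max: %u\n", (unsigned)USHRT_MAX);
        printf("unsigned long max:  %lu\n", ULONG_MAX);
        return 0;
    }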

On 16-bit architectures an integer and a short are identical, but the C standard only requires an integer to be at least 16 bits, so it may be represented using either 2 or 4 bytes (4 is the norm on modern machines). A code author can enforce the byte size of variables by using the 'short' or 'long' data types in code intended for multiple machine types, which can make the code more portable. Most applications don't need to worry about the byte size of the integer, and 'int' can be used without worry.
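
As an aside that goes slightly beyond the types listed above: since the C99 standard, the <stdint.h> header provides exact-width integer types, which are the usual modern way to enforce a byte size; a minimal sketch:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int16_t a = 32767;        /* exactly 2 bytes on any platform */
        int32_t b = 2147483647;   /* exactly 4 bytes on any platform */

        printf("a uses %zu bytes, b uses %zu bytes\n", sizeof a, sizeof b);
        return 0;
    }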

The float (single precision)

The float is a 4-byte data type which describes a decimal number with roughly 7 significant digits of precision (between 6 and 9, depending on the value) [3]. How each bit of the 4 bytes is used to represent the floating point number is illustrated in Figure 2. The first bit controls the sign of the number, the next 8 bits specify what is called the exponent, and the remaining 23 bits define the fraction.


Figure 2: IEEE 754 single-precision binary floating-point 4 byte format [4].

The value of a 32 bit float

For normalized numbers, the value of a 4-byte float is:
$value = (-1)^{sign}\left(1+\sum_{i=1}^{23}b_{23-i}2^{-i}\right)2^{e-127}$

In the example shown in Figure 2, the exponent field holds the 8-digit binary number 01111100, whose decimal value is 124, and the only set fraction bit is $b_{21}$. The value of the float is therefore:
$value = (-1)^0\left(1+b_{21}2^{-2}\right)2^{124-127} = (1+0.25)\,2^{-3} = 0.15625$
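
These fields can be inspected directly by copying the float's 4 bytes into an unsigned integer; a minimal sketch, assuming the platform uses IEEE 754 floats (true on virtually all modern hardware):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        float f = 0.15625f;
        uint32_t bits;

        /* Copy the 4 bytes of the float into an integer of the same size. */
        memcpy(&bits, &f, sizeof bits);

        uint32_t sign     = bits >> 31;            /* 1 bit   */
        uint32_t exponent = (bits >> 23) & 0xFF;   /* 8 bits  */
        uint32_t fraction = bits & 0x7FFFFF;       /* 23 bits */

        /* prints: sign=0 exponent=124 fraction=0x200000 */
        printf("sign=%u exponent=%u fraction=0x%X\n",
               (unsigned)sign, (unsigned)exponent, (unsigned)fraction);
        return 0;
    }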

The double (double precision float)

The double is an 8-byte data type which describes a decimal number with 15 to 17 significant digits of precision [5]. How each bit of the 8 bytes is used to represent the floating point number is illustrated in Figure 3. The first bit controls the sign of the number, the next 11 bits specify the exponent and the remaining 52 bits define the fraction.


Figure 3: IEEE 754 double-precision binary floating-point 8 byte format [6].

The value of a double can be determined similarly to the float from its 64 bits as:
$value = (-1)^{sign}\left(1+\sum_{i=1}^{52}b_{52-i}2^{-i}\right)2^{e-1023}$
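
The practical effect of the wider fraction is more significant digits; a small sketch comparing the same value stored as a float and as a double:

    #include <stdio.h>

    int main(void)
    {
        float  f = 1.0f / 3.0f;
        double d = 1.0 / 3.0;

        /* The float is accurate to about 7 digits, the double to about 16. */
        printf("float:  %.17f\n", f);   /* approx. 0.33333334326744080 */
        printf("double: %.17f\n", d);   /* approx. 0.33333333333333331 */
        return 0;
    }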

References

[1] ASCII table by Yuriy Arabskyy [CC-BY-SA-3.0], via Wikimedia Commons, 2013.
[2] Wikipedia: Byte, March 1st, 2014.
[3] Wikipedia: Single-precision floating-point format, March 8th, 2014.
[4] Float example by Fresheneesz [CC-BY-SA-3.0], via Wikimedia Commons, 2007.
[5] Wikipedia: Double-precision floating-point format, March 8th, 2014.
[6] Double example by Codekaizen [CC-BY-SA-3.0], via Wikimedia Commons, 2008.
