Key words: data abstraction in C, fundamental data types in C, little endian and big endian byte ordering, extended sign bit representation, 1’s and 2’s complement, IEEE 754, single precision floating point representation, storing negative floats
Topics at a glance:
- The simplicity of ‘C’ and its power – the art of data abstraction
- Storing Integers – ordering bytes and packing them together
- Storing negative numbers – just adding a negative sign ‘-’ is not enough
- Floating point numbers and IEEE 754 format
What is so interesting about the C programming language? Its simplicity and power. It works very close to the hardware. The direct manipulation of addresses through pointers is what makes C so powerful and flexible. We all know what pointers are and how useful they are for programmers. But ‘C’ provides various other features too, and it is imperative to know them to understand the philosophy of ‘C’.
C is a procedural programming language. There are essentially two aspects that a programmer should know when we talk about C’s realization of this programming paradigm.
Aspect 1: Data abstraction
Aspect 2: Execution model
Before going into further explanation, let me set some context right. Go through the points below.
- For all my analysis, I have taken a 32-bit machine, unless specified otherwise.
- 32 bits are grouped as 4 bytes, 1 byte being 8 bits.
- For representation of data, the hexadecimal number system is used instead of binary, for convenience.
For a quick reference go through the table below. It shows decimal, binary and hexadecimal values of numbers from 0 to 100.
Decimal | Binary | Hexadecimal | Decimal | Binary | Hexadecimal | Decimal | Binary | Hexadecimal | Decimal | Binary | Hexadecimal
0 | 0 | 0 | 25 | 11001 | 19 | 50 | 110010 | 32 | 75 | 1001011 | 4B
1 | 1 | 1 | 26 | 11010 | 1A | 51 | 110011 | 33 | 76 | 1001100 | 4C
2 | 10 | 2 | 27 | 11011 | 1B | 52 | 110100 | 34 | 77 | 1001101 | 4D
3 | 11 | 3 | 28 | 11100 | 1C | 53 | 110101 | 35 | 78 | 1001110 | 4E
4 | 100 | 4 | 29 | 11101 | 1D | 54 | 110110 | 36 | 79 | 1001111 | 4F
5 | 101 | 5 | 30 | 11110 | 1E | 55 | 110111 | 37 | 80 | 1010000 | 50
6 | 110 | 6 | 31 | 11111 | 1F | 56 | 111000 | 38 | 81 | 1010001 | 51
7 | 111 | 7 | 32 | 100000 | 20 | 57 | 111001 | 39 | 82 | 1010010 | 52
8 | 1000 | 8 | 33 | 100001 | 21 | 58 | 111010 | 3A | 83 | 1010011 | 53
9 | 1001 | 9 | 34 | 100010 | 22 | 59 | 111011 | 3B | 84 | 1010100 | 54
10 | 1010 | A | 35 | 100011 | 23 | 60 | 111100 | 3C | 85 | 1010101 | 55
11 | 1011 | B | 36 | 100100 | 24 | 61 | 111101 | 3D | 86 | 1010110 | 56
12 | 1100 | C | 37 | 100101 | 25 | 62 | 111110 | 3E | 87 | 1010111 | 57
13 | 1101 | D | 38 | 100110 | 26 | 63 | 111111 | 3F | 88 | 1011000 | 58
14 | 1110 | E | 39 | 100111 | 27 | 64 | 1000000 | 40 | 89 | 1011001 | 59
15 | 1111 | F | 40 | 101000 | 28 | 65 | 1000001 | 41 | 90 | 1011010 | 5A
16 | 10000 | 10 | 41 | 101001 | 29 | 66 | 1000010 | 42 | 91 | 1011011 | 5B
17 | 10001 | 11 | 42 | 101010 | 2A | 67 | 1000011 | 43 | 92 | 1011100 | 5C
18 | 10010 | 12 | 43 | 101011 | 2B | 68 | 1000100 | 44 | 93 | 1011101 | 5D
19 | 10011 | 13 | 44 | 101100 | 2C | 69 | 1000101 | 45 | 94 | 1011110 | 5E
20 | 10100 | 14 | 45 | 101101 | 2D | 70 | 1000110 | 46 | 95 | 1011111 | 5F
21 | 10101 | 15 | 46 | 101110 | 2E | 71 | 1000111 | 47 | 96 | 1100000 | 60
22 | 10110 | 16 | 47 | 101111 | 2F | 72 | 1001000 | 48 | 97 | 1100001 | 61
23 | 10111 | 17 | 48 | 110000 | 30 | 73 | 1001001 | 49 | 98 | 1100010 | 62
24 | 11000 | 18 | 49 | 110001 | 31 | 74 | 1001010 | 4A | 99 | 1100011 | 63
100 | 1100100 | 64
Now, let us start exploring the philosophical aspects of C.
Aspect 1: Data abstraction
Fundamental data types in C
C supports five fundamental data types.
- ‘char’: for characters such as ‘A’, ‘a’, ‘B’, or any 8-bit value
- ‘int’: for integers such as 0, 1, 2 …
- ‘float’: for real numbers such as 1.25, 0.123, 1000.505 …
- ‘double’: For double precision floating point numbers
- ‘void’: for nothing. (Of course, void* is a completely different story, which I am saving for a future discussion.)
Little endian and big endian byte ordering
Let us see how these various datatypes are abstracted in ‘C’, starting with integers. I will explain what is happening, from a storage perspective.
The following figure shows how an integer ‘5’ typically gets stored in little endian and big endian systems.
int i = 5; // assume 'i' is stored at an arbitrary address 0x1000
- 4 bytes are allocated for storing the integer
- These 4 bytes together form a named location in memory; the name is ‘i’
- In little endian systems, the least significant byte (LSB) is stored first, i.e. at the lowest address
- In big endian systems, the most significant byte (MSB) is stored first
- The endianness of a system determines the byte ordering
So, now you can probably guess how the following code is going to work.
int j = 0x12345678; // assume j is stored at address 0x2000
Let’s see how this is different for char data.
char c = 'A'; //assume c is stored at address 0x3000
Points to note for ‘char’
- Only one byte is allocated in memory for storing character
- This one byte of memory is named as ‘c’
- Regardless of endianness, ‘A’ (0x41 in ASCII) is stored at 0x3000 in both systems, because a single byte has no byte order
Extended sign bit representation
Next, let us see how negative numbers are stored.
int i_neg = -5; // assume i_neg is stored at address 0x4000
Before depicting how this is stored in memory, let me explain signed number representation in computers.
- The most significant bit (MSb), i.e. bit 31, is the sign bit. If it is set (‘1’), the number is negative. If it is clear (‘0’), the number is positive. (Note that 32-bit data occupies bits 31 down to 0.)
- The remaining 31 bits are split into two parts. Part 1, from bit 30 down to bit n, holds the sign bit extension. Part 2, from bit n-1 down to bit 0, holds the number’s 2’s complement form.
Let us go through the steps for converting ‘-5’ to its binary representation:
- Number 5 in binary is 101
- 1’s complement is 010
- Adding 1 to the 1’s complement to get 2’s complement, which is 011
- Sign bit is set to 1, as it is negative 5
- With sign bit extension it is, 1111 1111 1111 1111 1111 1111 1111 1011
Now, combining this information along with C’s style of storing, the number ‘-5’ i.e. Negative 5 will be stored as depicted below.
int i_neg = -5; //assume 'i_neg' is stored at address 0x4000
IEEE 754 : Single precision floating point representation
Now, let us examine how floating point numbers are stored. Floating point numbers are stored a little differently from normal integers.
Take, for example, the floating point number 5.125.
float f = 5.125F; // assume 'f' is stored at address 0x5000
We know that the binary of ‘5’ is 101. But how do we find the binary of 0.125? I will walk you through the algorithm for converting the fractional part of a number to binary.
Consider the following notation for a fractional number:
x.y
Here, ‘x’ is the integral part and ‘y’ is the fractional (decimal) part, which is written after the decimal point ‘.’
1. Multiply the fractional part by 2.
2. The integral part of the result is the next digit of the binary fraction.
3. Repeat steps 1 and 2 using only the fractional part of the result, until it becomes 0 or you have enough digits.
Let us go through the steps for finding the binary representation of 0.125 using the above algorithm:
Number | Result = Number x 2 | Left of decimal point of Result
0.125 | 0.250 | 0
0.250 | 0.500 | 0
0.500 | 1.000 | 1
Reading the digits in the last column from top to bottom, 0.125 is 0.001 in binary,
i.e. 5.125 is 101.001 in binary representation.
Now, let us see how ‘C’ stores and manipulates floating point numbers:
Most ‘C’ compilers on mainstream hardware use the well-known IEEE 754 format (see the figure) for representing floating point numbers, although the C standard itself does not mandate it.
Now, to represent in IEEE 754 format, we need three parts, as depicted in the figure.
- One sign bit: ‘0’ for positive floats, ‘1’ for negative floats
- Exponent – 8 bits (Biased exponent, derived from normalized mantissa)
- Normalized mantissa – 23 bits
Steps for getting exponent and mantissa parts:
In a normalized mantissa, exactly one bit, a ‘1’, is allowed to the left of the binary point.
Therefore, 5.125, which is 101.001 in binary, is rewritten as 1.01001 x 2^2 to get the normalized mantissa.
Note 1: 2^2 means 2 to the power of 2.
Note 2: Multiplying a number by 2^(n), where n is positive, shifts its bits n places to the left.
Note 3: Conversely, multiplying by 2^(-n) shifts the bits n places to the right.
What is a Biased Exponent?
Sometimes, normalizing the mantissa requires multiplying by 2 raised to a negative power. In the above example, we multiplied by 2^2. But what if the number were 0.0111001 instead? In that case, the normalized form is 1.11001 x 2^(-2). So, the raw exponent can be either positive or negative, and both cases must be represented using the 8 bits available. An 8-bit field holds the values 0 to 255. Instead of spending one of those bits on a sign, a bias of 127 is added to the raw exponent, so raw exponents from -127 to 128 map onto the stored values 0 to 255. The stored values 0 and 255 are reserved for special cases (zero and subnormal numbers, infinity and NaN), so normalized numbers use raw exponents from -126 to 127. That is the trick applied here.
To get biased exponent from raw exponent use the following formula:
Biased Exponent = Raw exponent (-ve or +ve) + 127
So, putting all these pieces together, we eventually arrive at the actual representation of 5.125
5.125 is 101.001, which is also 1.01001 x 2^2
Sign bit : 0 (as 5.125 is positive)
Biased Exponent : 2 + 127 = 129 = 10000001 (8 bits)
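Normalized mantissa part : 01001 (only the bits to the right of the binary point; the leading ‘1’ is implied and is not stored)
‘sign’ + ‘biased exponent’ + ‘normalized mantissa’ together is 01000000101001.
To get the full 32-bit number, append 0’s on the right until the mantissa field has 23 bits, i.e.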
01000000101001 + 000000000000000000 = 01000000101001000000000000000000
This is equal to 0x40A40000 in hexadecimal
All of this culminates in the way C stores a floating point number, as depicted below:
float f = 5.125F; // assume 'f' is stored at address 0x5000
Storing negative floats
Let us see how a negative float is represented in ‘C’. IEEE 754 has a sign bit; for negative floats, just set this bit to ‘1’. The exponent and the mantissa stay the same as for the positive value.
Let us consider a negative floating point number as below:
float f_neg = -5.125F; //assume 'f_neg' is stored at address 0x6000
This is how ‘C’ will store a negative float, -5.125F in memory:
Enjoyed this chapter? Let me know in the comments below. Thanks! 🙂