# Chapter 1: C’s Abstraction Mechanisms

Topics at a glance:

• The simplicity of ‘C’ and its power – the art of data abstraction
• Storing integers – ordering bytes and packing them together
• Storing negative numbers – prefixing a minus sign ‘-’ is not enough
• Floating point numbers and the IEEE 754 format

What is so interesting about the C programming language? Its simplicity and power. It works close to the hardware, and the direct manipulation of addresses through pointers is what makes C so powerful and flexible. We all know what pointers are and how useful they are to programmers. But ‘C’ provides various other features as well, and it is imperative to know them to understand the philosophy of ‘C’.

C is a procedural programming language. There are essentially two aspects a programmer should know about C’s realization of this programming paradigm.

Aspect 1: Data abstraction

Aspect 2: Execution model

Before going into further explanation, let me set some context. Go through the points below.

1. For all my analysis, I have assumed a 32-bit machine, unless specified otherwise.
2. 32 bits are grouped as 4 bytes, 1 byte being 8 bits.
3. For representing data, the hexadecimal number system is used instead of binary, for convenience.

For a quick reference, go through the table below. It shows the decimal, binary and hexadecimal values of the numbers from 0 to 100.

| Decimal | Binary | Hexadecimal | Decimal | Binary | Hexadecimal | Decimal | Binary | Hexadecimal | Decimal | Binary | Hexadecimal |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 25 | 11001 | 19 | 50 | 110010 | 32 | 75 | 1001011 | 4B |
| 1 | 1 | 1 | 26 | 11010 | 1A | 51 | 110011 | 33 | 76 | 1001100 | 4C |
| 2 | 10 | 2 | 27 | 11011 | 1B | 52 | 110100 | 34 | 77 | 1001101 | 4D |
| 3 | 11 | 3 | 28 | 11100 | 1C | 53 | 110101 | 35 | 78 | 1001110 | 4E |
| 4 | 100 | 4 | 29 | 11101 | 1D | 54 | 110110 | 36 | 79 | 1001111 | 4F |
| 5 | 101 | 5 | 30 | 11110 | 1E | 55 | 110111 | 37 | 80 | 1010000 | 50 |
| 6 | 110 | 6 | 31 | 11111 | 1F | 56 | 111000 | 38 | 81 | 1010001 | 51 |
| 7 | 111 | 7 | 32 | 100000 | 20 | 57 | 111001 | 39 | 82 | 1010010 | 52 |
| 8 | 1000 | 8 | 33 | 100001 | 21 | 58 | 111010 | 3A | 83 | 1010011 | 53 |
| 9 | 1001 | 9 | 34 | 100010 | 22 | 59 | 111011 | 3B | 84 | 1010100 | 54 |
| 10 | 1010 | A | 35 | 100011 | 23 | 60 | 111100 | 3C | 85 | 1010101 | 55 |
| 11 | 1011 | B | 36 | 100100 | 24 | 61 | 111101 | 3D | 86 | 1010110 | 56 |
| 12 | 1100 | C | 37 | 100101 | 25 | 62 | 111110 | 3E | 87 | 1010111 | 57 |
| 13 | 1101 | D | 38 | 100110 | 26 | 63 | 111111 | 3F | 88 | 1011000 | 58 |
| 14 | 1110 | E | 39 | 100111 | 27 | 64 | 1000000 | 40 | 89 | 1011001 | 59 |
| 15 | 1111 | F | 40 | 101000 | 28 | 65 | 1000001 | 41 | 90 | 1011010 | 5A |
| 16 | 10000 | 10 | 41 | 101001 | 29 | 66 | 1000010 | 42 | 91 | 1011011 | 5B |
| 17 | 10001 | 11 | 42 | 101010 | 2A | 67 | 1000011 | 43 | 92 | 1011100 | 5C |
| 18 | 10010 | 12 | 43 | 101011 | 2B | 68 | 1000100 | 44 | 93 | 1011101 | 5D |
| 19 | 10011 | 13 | 44 | 101100 | 2C | 69 | 1000101 | 45 | 94 | 1011110 | 5E |
| 20 | 10100 | 14 | 45 | 101101 | 2D | 70 | 1000110 | 46 | 95 | 1011111 | 5F |
| 21 | 10101 | 15 | 46 | 101110 | 2E | 71 | 1000111 | 47 | 96 | 1100000 | 60 |
| 22 | 10110 | 16 | 47 | 101111 | 2F | 72 | 1001000 | 48 | 97 | 1100001 | 61 |
| 23 | 10111 | 17 | 48 | 110000 | 30 | 73 | 1001001 | 49 | 98 | 1100010 | 62 |
| 24 | 11000 | 18 | 49 | 110001 | 31 | 74 | 1001010 | 4A | 99 | 1100011 | 63 |
|  |  |  |  |  |  |  |  |  | 100 | 1100100 | 64 |

Now, let us start exploring the philosophical aspects of C.

C supports 5 fundamental data types.

1. ‘char’: for characters such as ‘A’, ‘a’, ‘B’, or any other 8-bit value
2. ‘int’: for integers such as 0, 1, 2, …
3. ‘float’: for real numbers such as 1.25, 0.123, 1000.505, …
4. ‘double’: for double-precision floating point numbers
5. ‘void’: for nothing (of course, void* is a completely different story, which I am saving for a future discussion)

Let us see how these various data types are abstracted in ‘C’, starting with integers. I will explain what happens from a storage perspective.

The following figure shows how the integer 5 typically gets stored on little endian and big endian systems.

```c
int i = 5; // assume 'i' is stored at an arbitrary address 0x1000
```
• 4 bytes are allocated for storing the integer
• These 4 bytes together form a named location in memory; the name is ‘i’
• On little endian systems, the least significant byte (LSB) is stored first, at the lowest address
• On big endian systems, the most significant byte (MSB) is stored first
• The endianness of a system determines its byte ordering

So, now you can probably guess how the following code works.

```c
int j = 0x12345678; // assume 'j' is stored at address 0x2000
```

Let’s see how this differs for char data.

```c
char c = 'A'; // assume 'c' is stored at address 0x3000
```

Points to note for ‘char’:

• Only one byte is allocated in memory for storing the character
• This one byte of memory is named ‘c’
• Regardless of endianness, ‘A’ is stored at 0x3000 on both systems

Next, let us see how negative numbers are stored.

```c
int i_neg = -5; // assume 'i_neg' is stored at address 0x4000
```

Before depicting how this is stored in memory, let me explain signed number representation in computers.

• The most significant bit (MSb), i.e. bit 31, is the sign bit. If set (‘1’), the number is negative; if clear (‘0’), the number is positive. (Note that a 32-bit value occupies bits 31 down to 0.)
• The remaining 31 bits split into two parts: part 1, from bit 30 down to bit n, holds the sign extension; part 2, from bit n-1 down to bit 0, holds the number in 2’s complement format.

Let us go through the steps for converting -5 to its binary representation:

• The number 5 in binary is 101
• Its 1’s complement is 010
• Adding 1 to the 1’s complement gives the 2’s complement: 011
• The sign bit is set to 1, as the number is negative
• With sign extension, the result is 1111 1111 1111 1111 1111 1111 1111 1011

Now, combining this information with C’s style of storing data, the number -5 will be stored as depicted below.

```c
int i_neg = -5; // assume 'i_neg' is stored at address 0x4000
```

### IEEE 754 : Single precision floating point representation

Now, let us examine how floating point numbers are stored. They are stored a little differently from integers.

Take, for example, the floating point number 5.125.

```c
float f = 5.125; // assume 'f' is stored at address 0x5000
```

We know that the binary of 5 is 101. But how do we find the binary of 0.125? I will walk you through the algorithm for converting the fractional part of a number to binary.

Consider the following notation for a fractional number:

x.y

Here, ‘x’ is the integral part and ‘y’ is the fractional part, written after the decimal point ‘.’

1. Multiply the fractional part by 2.

2. The integral part of the result is the next digit of the binary fraction.

3. Repeat steps 1 and 2, using only the fractional part of the result, until it becomes zero (or enough digits have been produced).

Let us go through the steps, for finding the binary representation of 0.125.

Applying the algorithm to 0.125:

| Number | Result = Number × 2 | Digit left of the point |
|--------|---------------------|-------------------------|
| 0.125  | 0.250               | 0                       |
| 0.250  | 0.500               | 0                       |
| 0.500  | 1.000               | 1                       |

Therefore, 0.125 is 0.001 in binary.

That is, 5.125 is 101.001 in binary representation.

Now, let us see how ‘C’ stores and manipulates floating point numbers:

Any standard ‘C’ compiler, unless specified otherwise, uses the well-known IEEE 754 format (see the figure) for representing floating point numbers.

Now, to represent in IEEE 754 format, we need three parts, as depicted in the figure.

1. Sign bit (1 bit): ‘0’ for positive floats, ‘1’ for negative floats
2. Exponent (8 bits): a biased exponent, obtained while normalizing the mantissa
3. Normalized mantissa (23 bits)

Steps for getting exponent and mantissa parts:

In a normalized mantissa, only a single ‘1’ bit is allowed to the left of the binary point.

Therefore, rewriting 5.125, which is 101.001 in binary, as 1.01001 x 2^2 gives the normalized mantissa.

Note 1: 2^2 means 2 to the power of 2.

Note 2: Multiplying a number by 2^n, where n is positive, shifts its bits n places to the left.

Note 3: On the contrary, multiplying by 2^(-n) shifts the number n bits to the right.

What is a biased exponent?

Sometimes, while determining the normalized mantissa, it is necessary to multiply by 2 raised to a negative number. In the example above, we multiplied by 2^(+2). But what if the number were instead 0.0111001? In that case, to get a normalized mantissa, the number would be written as 1.11001 x 2^(-2). So, the raw exponent can be either positive or negative. The idea is to represent both positive and negative raw exponents using the 8 bits available: raw exponents from -127 to +128 are mapped onto the storage range 0 to 255. We cannot use a sign bit here for the 8-bit exponent; instead, a bias of 127 is added to the raw exponent, bringing every value into the range 0 to 255. I hope you see the trick they have applied here.

To get biased exponent from raw exponent use the following formula:

Biased exponent = raw exponent (negative or positive) + 127

So, putting all these pieces together, we eventually arrive at the actual representation of 5.125

5.125 is 101.001, which is also 1.01001 x 2^2

Sign bit : 0 (as 5.125 is positive)

Biased Exponent : 2 + 127 = 129 = 10000001 (8 bits)

Normalized mantissa part : 01001 (the leading ‘1.’ is implicit and is not stored)

‘sign’ + ‘biased exponent’ + ‘normalized mantissa’ together gives 01000000101001 (14 bits so far).

To get the full 32-bit number, append 0’s, i.e.

01000000101001 + 000000000000000000 = 01000000101001000000000000000000

This is equal to 0x40A40000 in hexadecimal

All of this culminates in the way C stores a floating point number, as depicted below:

```c
float f = 5.125F; // assume 'f' is stored at address 0x5000
```

#### Storing negative floats

Let us see how a negative float is represented in ‘C’. In IEEE 754, there is a sign bit; for negative floats, this bit is simply set to ‘1’.

Let us consider a negative floating point number as below:

```c
float f_neg = -5.125F; // assume 'f_neg' is stored at address 0x6000
```

This is how ‘C’ will store a negative float, -5.125F in memory:

Enjoyed this chapter? Let me know in the comments below. Thanks! 🙂
