Chapter 1 : C’s Abstraction Mechanisms

Key words: data abstraction in C, fundamental data types in C, little endian and big endian byte ordering, extended sign bit representation, 1’s and 2’s complement, IEEE 754, single precision floating point representation, storing negative floats

Topics at a glance:

The simplicity of ‘C’ and its power – the art of data abstraction
Storing Integers – ordering bytes and packing them together
Storing negative numbers – adding a negative sign ‘-’ is not enough
Floating point numbers and IEEE 754 format

What is so interesting about the C programming language? Its simplicity and power. It works very close to the hardware, and the direct manipulation of addresses through pointers is what makes C so powerful and flexible. Most programmers already know what pointers are and how useful they can be. But ‘C’ provides various other features as well, and knowing them is essential to understanding the philosophy of ‘C’.

C is a procedural programming language. There are essentially two aspects a programmer should know about how C realizes this programming paradigm.

Aspect 1: Data abstraction

Aspect 2: Execution model

Before going further, let me set the context. Go through the points below.

For all my analysis, I assume a 32-bit machine, unless specified otherwise.
32 bits are grouped as 4 bytes, 1 byte being 8 bits.
For representing data, the hexadecimal number system is used instead of binary, for convenience.

For a quick reference go through the table below. It shows decimal, binary and hexadecimal values of numbers from 0 to 100.

Decimal	Binary	Hexadecimal	Decimal	Binary	Hexadecimal	Decimal	Binary	Hexadecimal	Decimal	Binary	Hexadecimal
0	0	0	25	11001	19	50	110010	32	75	1001011	4B
1	1	1	26	11010	1A	51	110011	33	76	1001100	4C
2	10	2	27	11011	1B	52	110100	34	77	1001101	4D
3	11	3	28	11100	1C	53	110101	35	78	1001110	4E
4	100	4	29	11101	1D	54	110110	36	79	1001111	4F
5	101	5	30	11110	1E	55	110111	37	80	1010000	50
6	110	6	31	11111	1F	56	111000	38	81	1010001	51
7	111	7	32	100000	20	57	111001	39	82	1010010	52
8	1000	8	33	100001	21	58	111010	3A	83	1010011	53
9	1001	9	34	100010	22	59	111011	3B	84	1010100	54
10	1010	A	35	100011	23	60	111100	3C	85	1010101	55
11	1011	B	36	100100	24	61	111101	3D	86	1010110	56
12	1100	C	37	100101	25	62	111110	3E	87	1010111	57
13	1101	D	38	100110	26	63	111111	3F	88	1011000	58
14	1110	E	39	100111	27	64	1000000	40	89	1011001	59
15	1111	F	40	101000	28	65	1000001	41	90	1011010	5A
16	10000	10	41	101001	29	66	1000010	42	91	1011011	5B
17	10001	11	42	101010	2A	67	1000011	43	92	1011100	5C
18	10010	12	43	101011	2B	68	1000100	44	93	1011101	5D
19	10011	13	44	101100	2C	69	1000101	45	94	1011110	5E
20	10100	14	45	101101	2D	70	1000110	46	95	1011111	5F
21	10101	15	46	101110	2E	71	1000111	47	96	1100000	60
22	10110	16	47	101111	2F	72	1001000	48	97	1100001	61
23	10111	17	48	110000	30	73	1001001	49	98	1100010	62
24	11000	18	49	110001	31	74	1001010	4A	99	1100011	63
									100	1100100	64

Now, let us start exploring the philosophical aspects of C.

Aspect 1: Data abstraction

Fundamental data types in C

C supports 5 fundamental data types.

‘char’: for characters such as ‘A’, ‘a’, ‘B’, or any 8-bit value
‘int’: for integers such as 0, 1, 2 …
‘float’: for real numbers such as 1.25, 0.123, 1000.505 …
‘double’: for double precision floating point numbers
‘void’: for nothing. (of course void* is a completely different story, which I am saving for a future discussion.)

Little endian and big endian byte ordering

Let us see how these various datatypes are abstracted in ‘C’, starting with integers. I will explain what is happening, from a storage perspective.

Here is how an integer ‘5’ typically gets stored in little endian and big endian systems.

int i = 5; // assume 'i' is stored at an arbitrary address 0x1000

4 bytes are allocated for storing the integer.
These 4 bytes together are a named location in memory; the name is ‘i’.
In little endian systems, the least significant byte (LSB) comes first, at the lowest address: 0x1000 holds 05, and 0x1001 to 0x1003 hold 00.
In big endian systems, the most significant byte (MSB) comes first: 0x1000 to 0x1002 hold 00, and 0x1003 holds 05.
The endianness of a system determines the byte ordering.

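So, now you can probably guess how the following value is going to be stored.

int j = 0x12345678; // assume 'j' is stored at address 0x2000

In a little endian system, the bytes at 0x2000 to 0x2003 are 78 56 34 12. In a big endian system, they are 12 34 56 78.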

Let’s see how this is different for char data.

char c = 'A'; //assume c is stored at address 0x3000

Points to note for ‘char’

Only one byte is allocated in memory for storing a character.
This one byte of memory is named ‘c’.
Regardless of endianness, ‘A’ (0x41 in ASCII) is stored at 0x3000 in both systems, because a single byte has no byte order.

Extended sign bit representation

Next, let us see how negative numbers are stored.

int i_neg = -5; // assume 'i_neg' is stored at address 0x4000

Before depicting how this is stored in memory, let me explain signed number representation in computers.

The most significant bit (MSb), i.e. bit 31, is the sign bit. If it is set (‘1’), the number is negative. If it is clear (‘0’), the number is positive. (Note that 32-bit data is stored in bits ranging from bit 31 down to bit 0.)
The remaining 31 bits can be viewed as two parts. Part 1, from bit 30 down to bit n, holds copies of the sign bit (sign bit extension). Part 2, from bit n-1 down to bit 0, holds the number in 2’s complement format.

Let us go through the steps for converting ‘-5’ to its binary representation:

Number 5 in binary is 101.
Its 1’s complement (every bit inverted) is 010.
Adding 1 to the 1’s complement gives the 2’s complement, which is 011.
The sign bit is set to 1, as the number is negative 5.
With sign bit extension, the full 32-bit value is 1111 1111 1111 1111 1111 1111 1111 1011, which is 0xFFFFFFFB in hexadecimal.

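Now, combining this with the byte ordering described earlier, the number ‘-5’, i.e. negative 5, is stored as follows.

int i_neg = -5; // assume 'i_neg' is stored at address 0x4000

In a little endian system, the bytes at 0x4000 to 0x4003 are FB FF FF FF. In a big endian system, they are FF FF FF FB.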

IEEE 754 : Single precision floating point representation

Now, let us examine how floating point numbers are stored. Floating point numbers are stored a little differently from normal integers.

Take for example the floating point number 5.125.

float f = 5.125F; // assume 'f' is stored at address 0x5000

We know that the binary of ‘5’ is 101. But how do we find the binary of 0.125? I will walk you through the algorithm for converting the fractional part of a number to binary.

Consider the following fractional number notation I’ve used here:

x.y

Here, ‘x’ is the integral part and ‘y’ is the fractional part, which is written after the decimal point ‘.’

1. Multiply the fractional part by 2.

2. The integral part of the result is the next digit of the binary fraction (on the first pass, it is the first digit after the point).

3. Repeat steps 1 and 2 using only the fractional part of the result, until the fractional part becomes 0 or you have enough digits.

Let us go through the steps for finding the binary representation of 0.125 using the above algorithm:

Number	Result = Number x 2	Left of decimal point of Result
0.125	0.250	0
0.250	0.500	0
0.500	1.000	1

Reading the integral parts from top to bottom gives the bits 0, 0, 1. Therefore, 0.125 is 0.001 in binary,

i.e. 5.125 is 101.001 in binary representation.

Now, let us see how ‘C’ stores and manipulates floating point numbers:

Most ‘C’ compilers on common platforms use the well-known IEEE 754 format for representing floating point numbers, although the C standard itself does not require it.

Now, to represent a number in IEEE 754 single precision format, we need three parts.

One sign bit (bit 31): ‘0’ for positive floats, ‘1’ for negative floats
Exponent – 8 bits (bits 30 to 23; a biased exponent, obtained while normalizing the mantissa)
Normalized mantissa – 23 bits (bits 22 to 0)

Steps for getting exponent and mantissa parts:

In a normalized mantissa, only a single ‘1’ bit is allowed to the left of the binary point.

Therefore, we rewrite 5.125, which is 101.001 in binary, as 1.01001 x 2^2 to get the normalized mantissa.

Note 1: 2^2 means 2 to the power of 2.

Note 2: Multiplying a number by 2^(n), where n is positive, shifts the bits n places towards the left.

Note 3: On the contrary, if the multiplication is by 2^(-n), then the bits get shifted n places towards the right.

What is a Biased Exponent?

Sometimes, while determining the normalized mantissa, it is necessary to multiply by 2 raised to a negative power. In the above example, we multiplied by 2^(positive 2). But what if the number was instead 0.0111001? In that case, to get a normalized mantissa, the number is written as 1.11001 x 2^(-2). So, the raw exponent can be either positive or negative. The idea is to represent both positive and negative raw exponents using the 8 bits available. IEEE 754 does not use a sign bit for the exponent. Instead, a bias of 127 is added to the raw exponent, so that raw exponents from -127 to 128 map onto the range ‘0 – 255’: the negative raw exponents -127 to -1 become 0 to 126, and the raw exponents 0 to 128 become 127 to 255. (The biased values 0 and 255 are reserved for special cases such as zero, subnormal numbers, infinity and NaN, so ordinary numbers use raw exponents from -126 to 127.) I hope you got the trick they have applied here.

To get biased exponent from raw exponent use the following formula:

Biased Exponent = Raw exponent (-ve or +ve) + 127

So, putting all these pieces together, we eventually arrive at the actual representation of 5.125

5.125 is 101.001, which is also 1.01001 x 2^2

Sign bit : 0 (as 5.125 is positive)

Biased Exponent : 2 + 127 = 129 = 10000001 (8 bits)

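Normalized mantissa part : 01001 (only the bits after the binary point are stored; the leading ‘1’ is implied)

Writing ‘sign’, ‘biased exponent’ and ‘normalized mantissa’ one after another gives 0 10000001 01001, i.e. 01000000101001.

To get a full 32-bit number, pad the mantissa on the right with 0’s, i.e.

01000000101001 followed by 000000000000000000 = 01000000101001000000000000000000

This is equal to 0x40A40000 in hexadecimal.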

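All of this culminates in the way C stores a floating point number:

float f = 5.125F; // assume 'f' is stored at address 0x5000

In a little endian system, the bytes at 0x5000 to 0x5003 are 00 00 A4 40. In a big endian system, they are 40 A4 00 00.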

Storing negative floats

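Let us see how a negative float is represented in ‘C’. In IEEE 754, there is a sign bit. For negative floats, just set this bit to ‘1’; the exponent and the mantissa stay the same.

Let us consider a negative floating point number as below:

float f_neg = -5.125F; // assume 'f_neg' is stored at address 0x6000

This is how ‘C’ will store the negative float -5.125F in memory: setting the sign bit turns 0x40A40000 into 0xC0A40000, so the bytes at 0x6000 to 0x6003 are 00 00 A4 C0 in a little endian system and C0 A4 00 00 in a big endian system.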

Enjoyed this chapter? Let me know in the comments below. Thanks! 🙂

The C and C++ Club

Let's Learn Together! :)
