Chapter 1 : C’s Abstraction Mechanisms

Key words: data abstraction in C, fundamental data types in C, little endian and big endian byte ordering, extended sign bit representation, 1’s and 2’s complement, IEEE 754, single precision floating point representation, storing negative floats

Topics at a glance:

  • The simplicity of ‘C’ and its power – the art of data abstraction
  • Storing Integers – ordering bytes and packing them together
  • Storing negative numbers – adding a negative sign ‘-’ is not enough
  • Floating point numbers and IEEE 754 format

What is so interesting about the C programming language? Its simplicity and power. It works very close to the hardware. The direct manipulation of addresses through pointers is what makes C so powerful and flexible. We all know what pointers are and how useful they are for programmers. But ‘C’ provides various other features as well, and it is important to know them to understand the philosophy of ‘C’.

C is a procedural programming language. There are essentially two aspects that a programmer should know when we talk about C’s realization of this programming paradigm.

Aspect 1: Data abstraction

Aspect 2: Execution model

Before going further, let me set some context. Go through the points below.

  1. For all my analysis, I have assumed a 32-bit machine, unless specified otherwise.
  2. 32 bits are grouped as 4 bytes, 1 byte being 8 bits.
  3. For representing data, the hexadecimal number system is used instead of binary, for convenience.

For a quick reference go through the table below. It shows decimal, binary and hexadecimal values of numbers from 0 to 100.

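Decimal | Binary  | Hexadecimal | Decimal | Binary  | Hexadecimal | Decimal | Binary  | Hexadecimal | Decimal | Binary  | Hexadecimal
--------|---------|-------------|---------|---------|-------------|---------|---------|-------------|---------|---------|------------
0       | 0       | 0           | 25      | 11001   | 19          | 50      | 110010  | 32          | 75      | 1001011 | 4B
1       | 1       | 1           | 26      | 11010   | 1A          | 51      | 110011  | 33          | 76      | 1001100 | 4C
2       | 10      | 2           | 27      | 11011   | 1B          | 52      | 110100  | 34          | 77      | 1001101 | 4D
3       | 11      | 3           | 28      | 11100   | 1C          | 53      | 110101  | 35          | 78      | 1001110 | 4E
4       | 100     | 4           | 29      | 11101   | 1D          | 54      | 110110  | 36          | 79      | 1001111 | 4F
5       | 101     | 5           | 30      | 11110   | 1E          | 55      | 110111  | 37          | 80      | 1010000 | 50
6       | 110     | 6           | 31      | 11111   | 1F          | 56      | 111000  | 38          | 81      | 1010001 | 51
7       | 111     | 7           | 32      | 100000  | 20          | 57      | 111001  | 39          | 82      | 1010010 | 52
8       | 1000    | 8           | 33      | 100001  | 21          | 58      | 111010  | 3A          | 83      | 1010011 | 53
9       | 1001    | 9           | 34      | 100010  | 22          | 59      | 111011  | 3B          | 84      | 1010100 | 54
10      | 1010    | A           | 35      | 100011  | 23          | 60      | 111100  | 3C          | 85      | 1010101 | 55
11      | 1011    | B           | 36      | 100100  | 24          | 61      | 111101  | 3D          | 86      | 1010110 | 56
12      | 1100    | C           | 37      | 100101  | 25          | 62      | 111110  | 3E          | 87      | 1010111 | 57
13      | 1101    | D           | 38      | 100110  | 26          | 63      | 111111  | 3F          | 88      | 1011000 | 58
14      | 1110    | E           | 39      | 100111  | 27          | 64      | 1000000 | 40          | 89      | 1011001 | 59
15      | 1111    | F           | 40      | 101000  | 28          | 65      | 1000001 | 41          | 90      | 1011010 | 5A
16      | 10000   | 10          | 41      | 101001  | 29          | 66      | 1000010 | 42          | 91      | 1011011 | 5B
17      | 10001   | 11          | 42      | 101010  | 2A          | 67      | 1000011 | 43          | 92      | 1011100 | 5C
18      | 10010   | 12          | 43      | 101011  | 2B          | 68      | 1000100 | 44          | 93      | 1011101 | 5D
19      | 10011   | 13          | 44      | 101100  | 2C          | 69      | 1000101 | 45          | 94      | 1011110 | 5E
20      | 10100   | 14          | 45      | 101101  | 2D          | 70      | 1000110 | 46          | 95      | 1011111 | 5F
21      | 10101   | 15          | 46      | 101110  | 2E          | 71      | 1000111 | 47          | 96      | 1100000 | 60
22      | 10110   | 16          | 47      | 101111  | 2F          | 72      | 1001000 | 48          | 97      | 1100001 | 61
23      | 10111   | 17          | 48      | 110000  | 30          | 73      | 1001001 | 49          | 98      | 1100010 | 62
24      | 11000   | 18          | 49      | 110001  | 31          | 74      | 1001010 | 4A          | 99      | 1100011 | 63
100     | 1100100 | 64          |         |         |             |         |         |             |         |         |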

Now, let us start exploring the philosophical aspects of C.

C supports five fundamental data types.

  1. ‘char’: for characters such as ‘A’, ‘a’, ‘B’, or any 8-bit value
  2. ‘int’: for integers such as 0, 1, 2 …
  3. ‘float’: for real numbers such as 1.25, 0.123, 1000.505 …
  4. ‘double’: for double precision floating point numbers
  5. ‘void’: for nothing. (Of course, void* is a completely different story, which I am saving for a future discussion.)

Let us see how these various datatypes are abstracted in ‘C’, starting with integers. I will explain what is happening, from a storage perspective.

The following figure shows how an integer ‘5’ typically gets stored in little endian and big endian systems.

int i = 5; // assume 'i' is stored at an arbitrary address 0x1000
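  • 4 bytes are allocated for storing the integer
  • These 4 bytes together form a named location in memory; the name is ‘i’
  • In little endian systems, the least significant byte (LSB) is stored first, at the lowest address: 05 00 00 00 at addresses 0x1000 to 0x1003
  • In big endian systems, the most significant byte (MSB) is stored first: 00 00 00 05 at addresses 0x1000 to 0x1003
  • The endianness of a system determines the byte ordering.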

So, now you can probably guess how the following code is going to work.

int j = 0x12345678; // assume j is stored at address 0x2000

Let’s see how this is different for char data.

char c = 'A'; //assume c is stored at address 0x3000

Points to note for ‘char’

  • Only one byte is allocated in memory for storing the character
  • This one byte of memory is named ‘c’
  • Regardless of endianness, ‘A’ is stored at 0x3000 in both systems, because a single byte has no byte order

Next, let us see how negative numbers are stored.

int i_neg = -5; // assume i_neg is stored at address 0x4000

Before depicting how this is stored in memory, let me explain signed number representation in computers.

  • The most significant bit (MSb), i.e. bit 31, is the sign bit. If set (‘1’), the number is negative. If clear (‘0’), the number is positive. (Note that 32-bit data is stored in bits ranging from bit 31 down to bit 0.)
  • The remaining 31 bits are split into two parts. Part 1, from bit 30 down to bit n, holds the sign bit extension, i.e. copies of the sign bit. Part 2, from bit n-1 down to bit 0, holds the number in 2’s complement format. Here n is the number of bits in the magnitude (n = 3 for the number 5).

Let us go through the steps for converting ‘-5’ to its binary representation:

  • Number 5 in binary is 101
  • Its 1’s complement is 010 (every bit inverted)
  • Adding 1 to the 1’s complement gives the 2’s complement, which is 011
  • The sign bit is set to 1, as the number is negative 5
  • With sign bit extension, the full 32-bit pattern is 1111 1111 1111 1111 1111 1111 1111 1011, which is 0xFFFFFFFB in hexadecimal

Now, combining this information with C’s style of storing integers, the number ‘-5’, i.e. negative 5, will be stored as depicted below.

int i_neg = -5; //assume 'i_neg' is stored at address 0x4000

IEEE 754 : Single precision floating point representation

Now, let us examine how floating point numbers are stored. Floating point numbers are stored a little differently from normal integers.

Take, for example, the floating point number 5.125.

float f = 5.125F; // assume 'f' is stored at address 0x5000

We know that the binary of ‘5’ is 101. But how do we find the binary of 0.125? I will walk you through the algorithm for converting the fractional part of a number to binary.

Consider the following notation for a fractional number:

x.y

Here, ‘x’ is the integral part and ‘y’ is the fractional part, which is written after the decimal point ‘.’.

1. Multiply the fractional part by 2.

2. The integral part of the result is the next digit of the binary fraction (the first pass gives the first digit after the point).

3. Repeat steps 1 and 2 using only the fractional part of the result, until the fractional part becomes zero or you have enough digits.

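Let us go through the steps for finding the binary representation of 0.125 using the above algorithm.

Number | Result = Number x 2 | Left of decimal point of Result
-------|---------------------|--------------------------------
0.125  | 0.250               | 0
0.250  | 0.500               | 0
0.500  | 1.000               | 1

Reading the last column from top to bottom gives the digits after the point: 001.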

Therefore, 0.125 is 0.001 in binary

i.e. 5.125 is 101.001 in binary representation.

Now, let us see how ‘C’ stores and manipulates floating point numbers:

Most ‘C’ compilers, unless specified otherwise, use the well-known IEEE 754 format (see the figure) for representing floating point numbers, although the C standard itself does not require it.

Now, to represent a number in IEEE 754 single precision format, we need three parts, as depicted in the figure.

  1. One sign bit: ‘0’ for positive floats, ‘1’ for negative floats
  2. Exponent – 8 bits (biased exponent, obtained while normalizing the mantissa)
  3. Normalized mantissa – 23 bits (only the bits to the right of the binary point are stored)

Steps for getting exponent and mantissa parts:

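In a normalized mantissa, only a single ‘1’ bit is allowed to the left of the binary point.

Therefore, we rewrite 5.125, which is 101.001 in binary, as 1.01001 x 2^2 to get the normalized mantissa. Since the leading ‘1’ is always present, it is not stored; only the bits after the point (01001 here) go into the 23-bit mantissa field.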

Note 1: 2^2 means 2 to the power of 2.

Note 2: Multiplying a number by 2^(n), where n is positive, shifts its bits n places to the left.

Note 3: Conversely, multiplying by 2^(-n) shifts its bits n places to the right.

What is a Biased Exponent?

Sometimes, while determining the normalized mantissa, it is necessary to multiply by 2^(negative number). In the above example, we multiplied by 2^(positive 2). But what if the number were 0.0111001 instead? In that case, to get a normalized mantissa, the number is written as 1.11001 x 2^(-2). So, the raw exponent can be either positive or negative. The idea is to represent both positive and negative raw exponents using the 8 bits available. IEEE 754 does not use a sign bit or 2’s complement for the exponent field. Instead, a bias of 127 is added to the raw exponent, so the raw range -127 to 128 maps onto the stored range 0 to 255. The stored values 0 and 255 are reserved for special cases (zero and denormalized numbers, infinities and NaN), so normalized numbers use raw exponents from -126 to 127. I hope you see the trick that is applied here.

To get biased exponent from raw exponent use the following formula:

Biased Exponent = Raw exponent (-ve or +ve) + 127

So, putting all these pieces together, we eventually arrive at the actual representation of 5.125

5.125 is 101.001, which is also 1.01001 x 2^2

Sign bit : 0 (as 5.125 is positive)

Biased Exponent : 2 + 127 = 129 = 10000001 (8 bits)

Normalized mantissa part : 01001

‘sign’ + ‘biased exponent’ + ‘normalized mantissa’ together is 01000000101001.

To get the full 32-bit number, append 18 zeros so that the mantissa field fills its 23 bits, i.e.

01000000101001 + 000000000000000000 = 01000000101001000000000000000000

This is equal to 0x40A40000 in hexadecimal

All of this culminates in the way C stores a floating point number, as depicted below:

float f = 5.125F; // assume 'f' is stored at address 0x5000

Storing negative floats

Let us see how a negative float is represented in ‘C’. In IEEE 754, there is a sign bit. For negative floats, just set this bit to ‘1’; the exponent and mantissa stay the same as for the positive value.

Let us consider a negative floating point number as below:

float f_neg = -5.125F; //assume 'f_neg' is stored at address 0x6000

This is how ‘C’ will store the negative float -5.125F in memory: only the sign bit changes, so 0x40A40000 becomes 0xC0A40000.
