Chapter 2 : Address Abstraction

Key words: Address abstraction in C, arrays in C, array indexing operation, negative indexing operation on arrays, bound check issues with arrays, function pointers in C, function types in C.

Topics at a glance:

  • Address abstraction through arrays.
  • Array size: is it really a concern for the compiler, or only for the programmer?
  • Negative indexes on arrays. Seriously! Do they work?
  • Function pointers and how to use them correctly.

In this chapter, let us understand how composite data types are managed in ‘C’. ‘C’ supports the following composite datatypes:

  • Array
  • Union
  • Struct

A composite data type is an aggregate of one or more data types, which is why it is also known as an ‘aggregate data type’. Many combinations are possible: arrays of primitive data types, structures and unions of different primitive data types, and so on. Composite data types can also be built from other composite data types, such as structures of structures, arrays of structures, or a structure containing an array of another structure. The list goes on.

Address abstraction in C

We need to understand the notion of Address Abstraction in ‘C’. In a way, I am referring to pointers in ‘C’. But let us not start with pointers directly. Let’s get there slowly, beginning with array and its address abstraction mechanisms.

Arrays in C

An array is a collection of data of the same type, such as an array of integers, an array of floats or an array of structures. An array always stores its members in contiguous memory locations, i.e. one after the other.

Consider the following array of integers.

int array_of_int[5] = {1, 2, 3, 4, 5};

Let’s see how this is stored in an arbitrary memory location starting from 0x1000. NOTE: I will not be explaining how an integer itself is stored in memory. For that please refer to chapter 1. Here, the focus is on a collection of integers and not an integer alone.

Array indexing operation

I have shown you how an array of five integers is laid out in memory. Now, let’s see how to access the members of this array one by one. ‘C’ defines an indexing operation for accessing array members, i.e. we can index to a particular element location in an array.

array_of_int[0] will give ‘1’,
array_of_int[1] will give ‘2’,
array_of_int[4] will give ‘5’.

When we use the indexing operation on an array (i.e. array[index]), the programmer doesn’t need to manipulate addresses directly; the C compiler does that automatically.

Thus,

  • array_of_int[0] accesses the contents at location 0x1000
  • array_of_int[1] accesses the contents at location 0x1004
  • array_of_int[2] accesses the contents at location 0x1008
  • array_of_int[3] accesses the contents at location 0x100C
  • array_of_int[4] accesses the contents at location 0x1010

So, the question is: how does the compiler find the location when the indexing operation is used on an array? The answer becomes clear when you look at the attributes of an array. An array stores its members in contiguous address locations, i.e. one after the other. The next question is why array_of_int[1] is at location 0x1004 and not at 0x1001, and why array_of_int[2] is at 0x1008 and not at 0x1002. As I have explained earlier, ‘C’ knows very well how to store an integer. An int takes 4 bytes on a typical 32-bit machine. ‘array_of_int[N]’ means: access the member at index N of array_of_int, counting from 0. For an array of integers, the location is found with the formula below.

Location of array_of_int[N] = Location of the very first element of the array + (N x size of int). Note that I am talking about an array of integers here.

For instance, location of array_of_int[3] can be found as:

location of array_of_int[3] = 0x1000 + (3 x 4) = 0x1000 + 0xC (because, 12 in decimal is 0xC in hexadecimal)

Thus, location of array_of_int[3] is 0x100C.

Now, take another example: an array of chars, say array_of_char[5]. Assume that its first member is stored at 0x2000 and that we want the location of array_of_char[3].

location of array_of_char[3] = 0x2000 + (3 x 1) = 0x2000 + 0x3 = 0x2003; (size of char is 1 byte)

This doesn’t stop at primitive data types.

Assume there is a structure struct my_struct of size 16 bytes. Assume there is an array of my_struct as below:

struct my_struct array_of_my_struct[5];

Here, assuming that the first member is stored at 0x3000:

location of array_of_my_struct[3] = 0x3000 + (3 x 16) = 0x3000 + 0x30 (48 in decimal is 0x30 in hex)

Therefore, location of array_of_my_struct[3] = 0x3030

An array manipulates addresses without bothering the programmer much! That’s the simplicity of an array.

Some of you may wonder how the address of the first element is obtained. For that, let’s revisit the following code from the previous section.

int i = 0x5; // hexadecimal ‘5’

Let’s now understand what a variable is. In the above code, the variable name is ‘i’, and the memory block from 0x1000 to 0x1003 has been given the alias ‘i’. So, when the ‘C’ compiler sees a direct reference to this alias further in the code, it knows that it refers to the 4-byte memory block from 0x1000 to 0x1003. Variable names are always associated with addresses internally. By referring to a variable’s name, the programmer is actually referring to the specific memory location where ‘C’ has stored the data.

i.e. A variable’s name is actually an alias to a specific memory location.

But when it comes to an array, the programmer doesn’t declare a name for each location/member of the array.

Here, the array name is just ‘array_of_int’; the ‘[]’ tells the compiler that ‘array_of_int’ is an array.

int array_of_int[5] = {1, 2, 3, 4, 5};

The most important thing about an array name is this: unlike the name of a primitive variable, a struct or a union, an array’s name (or alias), when used in most expressions, stands ONLY for the location of the first element of the array. (The sizeof and & operators are the notable exceptions; they still see the whole array.)

In this example, it is not the whole memory block from 0x1000 to 0x1013 that is aliased as ‘array_of_int’. It is just the first member’s location, i.e. 0x1000, which stores an integer.

NOTE: In the case of structs, which are an example of packed data in ‘C’, we have to name the data members individually, and the entire block of packed data bytes is aliased by the struct variable’s name. This is not the case with arrays.

Bound check issues with arrays

Whenever a programmer refers to ‘array_of_int’, it gets associated with memory location 0x1000. The alias ‘array_of_int’ is only for the first member’s location, i.e. 0x1000, and not for the specific range of memory from 0x1000 to 0x1013. When the alias ‘array_of_int’ is combined with an index of ‘6’ or ‘7’, the compiler simply generates code to access the locations 0x1018 and 0x101C respectively. Syntactically it is correct, but semantically (behaviorally) it is PLAIN WRONG: the C standard calls such an out-of-bounds access undefined behavior.

The biggest problem with arrays is this: an array variable name DOES NOT CHECK memory bounds.

Again consider array_of_int stored at 0x1000.

Can you guess why array indexes always start from 0 in C? Remember the formula for accessing members through the array indexing operation? Just substitute 0 for the index in that formula to get the very first member of an array. You will get the location pointed to by the alias array_of_int, i.e. 0x1000.

i.e. the location of array_of_int[0] is the address associated with the alias array_of_int + ( 0 x 4 ), which is 0x1000. That is why the index HAS TO start from 0.

I would like to point out one more thing before concluding on arrays. We know very well that array[index] can be used to access the member at a specific location in the array. After going through the sections above, you have an idea of how addresses are manipulated using array names and the indexing operation.

This is what happens under the hood when you perform an indexing operation on an array:

  1. Array name gives the address of first member of the array i.e. start of array
  2. Index value tells the C compiler to access a specific location within the array
  3. Compiler generates the required amount of offset from start of array obtained in step 1 using the formula index multiplied by size of an array member
  4. Compiler finally adds the offset obtained in step 3 with start of array address obtained in step 1, to get the final address

The above four steps are what an indexing operation on an array name does. So, in simple terms, indexing is just this:

Origin of array + required offset; i.e. a simple addition.

Anybody who knows that A + B = B + A, can see how,

Origin + Offset = Offset + Origin

Now, for the above theorem, I can state the corollary also

i.e. array[index] = index[array]

The following code snippet produces the exact same results for both array[index] and index[array] forms.

#include <stdio.h>

int main()
{
	int arr_of_int[5] = {1, 2, 3, 4, 5};

	printf("arr_of_int[0] : %d\n", arr_of_int[0]);
	printf("arr_of_int[1] : %d\n", arr_of_int[1]);
	printf("arr_of_int[2] : %d\n", arr_of_int[2]);
	printf("arr_of_int[3] : %d\n", arr_of_int[3]);
	printf("arr_of_int[4] : %d\n", arr_of_int[4]);

	puts("-------------------------------------");

	printf("0[arr_of_int] : %d\n", 0[arr_of_int]);
	printf("1[arr_of_int] : %d\n", 1[arr_of_int]);
	printf("2[arr_of_int] : %d\n", 2[arr_of_int]);
	printf("3[arr_of_int] : %d\n", 3[arr_of_int]);
	printf("4[arr_of_int] : %d\n", 4[arr_of_int]);
	
	return 0;
}

Output of the above code:

arr_of_int[0] : 1
arr_of_int[1] : 2
arr_of_int[2] : 3
arr_of_int[3] : 4
arr_of_int[4] : 5
-------------------------------------
0[arr_of_int] : 1
1[arr_of_int] : 2
2[arr_of_int] : 3
3[arr_of_int] : 4
4[arr_of_int] : 5

Negative indexing operation on arrays

To add one more interesting thing for your enthusiastic mind, let me ask you a question: what happens if you use a negative index on an array, i.e. array[-index]? The ‘C’ compiler generates code to access the location given by start of array - required offset (note the minus sign). Of course, you would hardly ever need to do this, and it is only valid while the resulting location still lies inside an array. But understand that this is how a negative index works.

Let me explain negative indexing operation with an example.

Say,

int array[3] = {1, 2, 3};
int *pointer = &(array[1]);

pointer[-1] gives you ‘1’, which is actually array[0]. I hope you understand how it works.

I have not explained pointers yet. But just know that the indexing operation is defined in the C language for special variables that hold an address. Pointers are examples of such variables. If you have understood this chapter correctly, you will know that array names also behave like special variables that hold the address of the first member of the array. So, the indexing operation is valid not only for array names, but also for any valid “pointer to data” in C. There are also pointers in C to locations that hold something other than data: I am referring to function pointers. Just one piece of advice: don’t do indexing operations or address manipulation with function pointers. If you come across a function pointer in ‘C’, the only thing you should do with it is invoke the function through it.

Function pointers in C

Normal pointers are special variables in ‘C’ that can hold addresses of data, whereas function pointers are special variables that can hold addresses of functions. Let us walk through some code and see the different ways we can use a function pointer in C:

#include <stdio.h>

typedef void function_type(int a);
typedef void (*function_ptr_type)(int a);
void (*function_ptr)(int a); // not a type definition, but a global variable

void function1(int i)
{
	printf("i : %d\n", i);
}

int main()
{
	// &function1 is not required: a function name, much like
	// an array name, already gives the function's address
	function_type *f_ptr_1 = function1;
	f_ptr_1(1);

	function_ptr_type f_ptr_2 = function1;
	// or simply f_ptr_2(2); no need to dereference explicitly
	(*f_ptr_2)(2);
	
	// Here function_ptr is actually 
	// a function pointer variable 
	function_ptr = function1;			
	function_ptr(3);	
	
	return 0;
}

Result:

i : 1
i : 2
i : 3

Function Pointers and Function Types

  • To get a function’s address, use either ‘&function_name’ or simply ‘function_name’.
  • In the above code, none of the three invocations of function1() is direct; each goes through a function pointer.
  • When you invoke a function through a function pointer, both of the following forms are valid:
    • a_function_ptr()
    • (*a_function_ptr)()
  • ‘function_type’ is a type definition for a function. To use it, we need to define a pointer to ‘function_type’.
  • ‘function_ptr_type’ is a type definition for a function pointer. To use it, we need to define a variable of type ‘function_ptr_type’.
  • ‘function_ptr’ is NOT a type but a variable in global scope. You cannot declare another variable of type ‘function_ptr’.

When to use function pointer?

I will explain a simple use case of function pointers in C. Function pointers are widely used to implement callbacks in ‘C’. A callback is like a contract between two software entities. Party B, signing the contract, asks the other party A to invoke a particular function whenever a specific event happens. Party A invokes it like a regular function, but through the function pointer. That means party A receives the event, and party B is notified through the callback mechanism. When the function is executed, party B knows that the event has occurred. The same mechanism can also be used for delegation.

Enjoyed this chapter? Let me know in the comments below. Thanks! 🙂
