Chapter 4 : The Stacks In C

Key words: stacks in C, execution model of C, function stack frames in C, function calls in C, varargs and variadic functions in C, stack corruption

Topics at a glance:

  • Stacks and its mysteries
  • C’s execution model and its companionship with stacks
  • The famous stack frames of ‘C’
  • Functions returning big values
  • Parameter size promotion – The size-default and integers
  • Variadic function and stack manipulation
  • The labyrinth of stack corruption – do all roads lead to Rome or can we take another path?

Stacks in C

This chapter discusses about the much anticipated topic in C – the stacks. Stack is a memory that behaves in a Last In First Out (LIFO) manner, for temporary storage of data.

Why stacks are so important in C? What all purposes that stacks serve in C? Stacks exist right from the very start of a C program’s execution and is an integral part of C’s execution model. Stacks have far-reaching role in C language. That said, it’s not surprising to say that more than 50 % of bugs in C code can be directly related to stack issues, while the other half can be indirectly related to stack issues. Am not kidding. Let us see how all stacks are used in C.

As I said, stacks exist right from the beginning of a C program. Setting up stack is required even before C program starts execution. The basic steps that a typical C run time does are as follows:

  1. Clear all BSS regions to 0s
  2. Initialize data segment regions with values from ROM, where usually the executable image is stored to RAM, where the image is loaded and actually executed.
  3. Set up the stack pointer
  4. Jump to C’s first function 

In the above steps, initializing BSS, and Data segment is not mandated by the C language. But we do that for getting correct results from the program, or otherwise program will be useless. But the third step i.e. to set up the stack pointer is so crucial that without this program will not even execute. I will explain why stack pointer is so important even for a tiny little C code to execute correctly. But if you are so curious, just see how the assembly code generated by the compiler for your C program looks like. You will see a lot of stack operations here and there especially where function calls/returns are happening.

Let us see for what all purposes C uses stack mainly:

  • For storing a function’s local variables (a.k.a auto variables)
  • For passing parameters to a function
  • For passing the return address to a callee function ( invoked function )
  • For temporary storage used for computing intermediate expression’s results
  • For saving processor registers across function calls and returns

So, in short, C won’t work without stack. Is it now clear why point number 3 in C run time’s responsibilities is so important? i.e. to Set up stack pointer.

Execution model of C

C is a procedural programming language. Everything which could be executed in C should be part of a function. So, execution model of C simply means how the functions work, or rather how function-calls work. Let us understand in detail how C uses stack in implementing functions. For this, we need to first understand how stack frames are working. This is implemented differently from processor to processor. i.e. the way in which C compilers generate stack frames for Intel x86 is different from that of ARM 32. Processor manufacturers document properly function calling conventions to be followed by compilers for that processor.

I am taking a typical Intel x86 processor for explaining these.

Function stack frames in C

Let us understand the anatomy of a stack frame generated for a C function on x86.

Before that please note the following points:

  1. Stack grows downwards. i.e. From higher address to lower addresses
  2. For putting things to stack we use a push operation. Assume stack pointer is at 0x1000, if we push one data, data will be copied to 0x1000, and current stack pointer will become 0xFFC (i.e. 0x1000 – 0x4)
  3. If you want to take any data from stack, you perform a pop operation. Let’s assume that current stack pointer is at 0xFFC, if we perform pop operation, stack pointer is incremented to 0x1000, and data from 0x1000 is returned/popped out of stack
  4. Note that size of data moved in/out during stack operations such as push/pop depends on processor’s word length. I have taken Intel x86, on which word length is 4 bytes (32-bits).

Now, let us analyze a stack frame on a typical x86 for a C function as follows: 

int function(int param0, int param1)
{
	int var0; 
	int var1; 
	// do some computation here 
	return var0; 
} 

EBP: Base pointer register: Indicating the origin of current stack frame. Most of other locations are accessed using offsets to EBP register.

In C, each function during execution, will have a current stack frame starting at an address pointed by EBP register. The above figure shows what all information are stored in stack frame, and also, w.r.t EBP how to access those data from stack frame.

Function calls in C

Let us see the steps involved in functionA calling another functionB and finally returning back to functionA w.r.t how stack frames are manipulated by the C compiler.

Steps done in functionA, prior to calling functionB: Caller is functionA. Callee is functionB

  1. functionA pushes processor registers such as EAX, ECX and EDX (This step is optional and will be done only if registers EAX, ECX and EDX are to be retained across function call. It’s very much likely that these registers will get changed in callee function.)
  2. functionA pushes input parameter values required for functionB.
  3. functionA saves EIP (current instruction pointer) by pushing that into stack. This is the return address used by functionB to return back to functionA.

NOTE: Step 3 is done automatically by ‘call’ instruction.

Steps done in functionB or here the callee function before starting actual computation defined in function body

  1. functionB at first saves current EBP to stack by pushing it to stack.
  2. functionB sets EBP to ESP, which will be used as the new stack frame address for the function
  3. functionB allocates storage for local variables if any
  4. functionB allocates temporary storage in stack if required. (This step is optional and is required only if some complex computation is happening in the function that may produce intermediate results)
  5. functionB then saves EBX, ESI and EDI registers to stack. (This step is optional and is done on a need basis. Source and destination index registers are used when string operations are involved or in cases where there is movement of bulk data)
  6. functionB now starts the actual function execution. i.e code written inside function body, will start execution only here.

Steps done before functionB returns back to functionA

  1. functionB saves return value to EAX register (Intel convention)
  2. functionB restores values of EBX, ESI, EDI from stack if they got changed in course of execution of functionB. (This step is optional. Only done on a need basis)
  3. functionB doesn’t require local vars and temporary storage anymore. So it sets back stack pointer back to EBP.
  4. Pops ESP contents (which is actually functionA’s EBP value) to EBP. This operation brings back functionA’s stack frame.
  5. For returning back to functionA, functionB has to perform one last operation. functionB pops out the return address from stack and jumps to that address.

NOTE 1: Above steps 3 and 4 are done using ‘leave’ instruction of Intel x86.

NOTE 2: step 5 is done using ‘ret’ instruction

Steps done by functionA post return from functionB

  1. functionA doesn’t require the parameters pushed onto stack before calling functionB. So here it can get rid of these by setting the ESP accordingly.
  2. functionA saves the return value from EAX register
  3. functionA restores back EAX, EBX and ECX registers if required.

Below is the C code used for demonstrating stack frames:

int functionB(int i)
{
	int j;
	
	// do something - compute j from i
	
	return j;
}

int functionA(int x, int y)
{
	int a;
	int b;
	
	// do some computation
	
	// calls functionB()
	b  = functionB(a);
	
	return b;
}

Below is the stack frame generated for the sequence functionA calls functionB returns to functionA.

The below assembly code generated by GCC shows these operations in detail. I have added comments for properly describing what’s going on at each line. Text following semicolon ‘;’ is a comment.

_functionB:
    pushl    %ebp           ; saves EBP of caller function 
                            ; - i.e. EBP of functionA to stack
    movl    %esp, %ebp      ; set EBP with current ESP. 
                            ; i.e. new stack frame is set for functionB
    subl    $16, %esp       ; allocates 12 bytes temporary storage 
                            ; + 4 bytes for one local variable in stack
    movl    -4(%ebp), %eax  ; access local variable 'j' using ebp - 4 
                            ; and copy it to eax
    leave                   ; reverse the stack frame 
    ret                     ; return back to functionB 
_functionA:
    pushl    %ebp           ; saves EBP of caller function
                            ; - i.e. EBP of main to stack
    movl    %esp, %ebp      ; set EBP with current ESP. 
                            ; i.e. new current stack frame is set for functionA 
    subl    $20, %esp       ; allocates 12 bytes temporary storage 
                            ; + 8 bytes for two local variables in stack 
    movl    -4(%ebp), %eax  ; access local variable 'a' using ebp - 4 
                            ; and copy it to eax
    movl    %eax, (%esp)    ; move eax contents to esp. This is used as 
                            ; the input parameter in functionB
    call    _functionB      ; calls functionB
    movl    %eax, -8(%ebp)  ; post return from functionB, 
                            ; saves functionBs return value from eax to 
                            ; local variable 'b' @ ebp - 8
    movl    -8(%ebp), %eax  ; copies value of 'b' to eax as 
                            ; return value from functionA
    leave                   ; reverse the stack frame 
    ret                     ; return back to main 

Some extra points to notice here:

  1. Function parameters are pushed into the stack by the caller from right to left. You can see that for functionA stack frame, int y is at EBP + 12 (a higher address) and int x is at EBP +8. That means main function had pushed int y (right most parameter) to stack first and then int x.
  2. Temporary storage for both functions are fixed at 12 bytes. These are probably for saving EBX, ESI and EDI registers, each taking 4 bytes making a total of 12 bytes.
  3. As per x86 C function calling conventions, there are certain registers which are to be saved by caller function such as EAX, ECX, EDX, whereas some registers are callee’s responsibility to save. EBX, ESI, EDI are callee saved registers.

Miscellaneous things related to stacks:

Functions returning more than 4 bytes.  

C functions often return 4-byte data. But in certain situations you will have to return data which is more than 4 bytes from a function. A typical example is a function returning a struct data type. This is achievable only through stacks. We all know that return data less than and up to 4 bytes gets stored in EAX register as per Intel convention. But if the return data is more than word length (4 bytes), then EAX or CPU registers cannot be used directly for this.

What will happen under the hood when the below code is compiled:

typedef struct
{
	int i;
	int j;
	int k;
}my_struct;

my_struct func1(void)
{
	my_struct s = {1,2,3};
	return s;
}

If I have to invoke func1 as:

my_struct s0 = func1();

Compiler will automatically change the code (internally) to something as follows:

func1(&s0);

C will not support return value of more than 4 bytes directly. Here, C compiler changes the above function call to func1(&s0); For the programmer, even if func1() is a no argument function, compiler makes func1() to a single argument function. Once this happens, then the return value will get stored in the address given as the first parameter. As you already know parameters are passed in C, via stacks, without stack this trick will not work. Do you remember in C++ all non-static member functions get this parameter, popularly known as ‘this-pointer’ as their first parameter from the compiler automatically? So, this is not anything new. Compilers often change the way you have written the code but will make sure the semantics doesn’t change.

Parameter size promotion to integers:

Another direct effect of stack is on the way C passes function parameters. i.e w.r.t size of the data, all parameters with size less than or up to 4-bytes are promoted to integers i.e. 4-bytes data. Stack operations such as push/pop works only on data of CPU word length size.

Say, if your function is

void my_function(int I, char c, float f);

Then second parameter ‘c’ will be automatically up-sized to 4-bytes. Though it won’t lose any type attributes. It’s just allocated 4 bytes in stack memory. That’s all.

Other place of stack related size promotion is for function’s local variables. For this I will illustrate with an example right below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
void my_function(void)
{
	int i;
	char c;
	float f;
	
	printf("Address of i : 0x%X\n", &i);
	printf("Address of c : 0x%X\n", &c);
	printf("Address of f : 0x%X\n", &f);
	
	return;
}

Output is as below:

Address of i : 0x461CC3C
Address of c : 0x461CC3B
Address of f : 0x461CC34

The memory layout looks like below:

float f

0x461CC34 – 0x461CC37; 4 bytes

 

char c

0x461CC38 – 0x461CC3B; 4 bytes

But since char occupies only 1 byte,
only 3B is used when you refer to ‘c’

int i

0x461CC3C – 0x461CC3F; 4 bytes

 

 

Ideally char ‘c’ could have been allocated @ 0x461CC38 itself, but since its allocated in function stack, 4-byte address boundary is a must. If it is allocated in 1 byte, then integer ‘i’ should start from 0x461CC39, which will be okay w.r.t memory utilization. But stack operations, as I told works with 4-byte data i.e CPU word length. You might have noticed, that even though the system is Intel x86 little endian, char ‘c’ value is stored at MSB location 0x461CC3B and not LSB location of 0x461CC38. This is required not to lose information. Actually you need to understand that char is not type-casted to int, but is just allocated in 4-bytes location; i.e. without losing the type information. Only if it is converted to an int type, you need to worry about byte ordering or endianness. Three bytes memory from 0x461CC38 to 0x461CC3A are left as gaps. If there were other 3 char variables declared in the function, they could have been allocated in those gaps by the compiler. This is in a way almost similar to the structure padding.

My point here is, in general data stored in stacks, whether it’s function parameters, or function local variables, will be aligned to word boundaries of the processor.

Varargs and variadic functions in C

You all know very well what are varargs or variadic functions in C. The most famous of them is the printf(). Variadic functions unlike other functions take variable number of arguments. For quickly showing you the meaning:

void normal_function(int arg1, int arg2);
void variadic_function(int arg1, …);

So what does the mysterious ellipsis notation‘…’ tells the compiler?

The presence of ellipsis ‘…’ tells the compiler in fact some critical information.

First, it tells that the function is variadic function and not a normal function.

Second, it makes the compiler to enforce putting parameters into stack.

Third is the corollary of second point. It makes sure none of the arguments of variadic functions are put in CPU registers. Many processors these days provides a set of generic CPU registers which can be used efficiently to put arguments. Arguments on stack requires accessing the memory (RAM), whereas argument on registers are part of CPU (registers within CPU) and are much faster. So arguments put in registers in effect makes function calls much faster.

So putting ‘…’ makes compiler NOT TO put any arguments in registers. Or at least, the last named argument before the variable argument and the following variable list of arguments will not be put in registers, and instead will be put on stack.

For example,

void my_variadic_function(int arg1, int arg2, int arg3, …);

Here, ‘arg3’ is the last named argument before the variable argument list starts.

There is a reason for this. I will explain with the help of a code.

Before that, you all are aware of the famous stdargs.h header file that we use to include in our “.c” files if we need to use variadic function conventions. This file defines the following useful macros, for manipulating the variable list arguments.

Refer to: https://en.wikipedia.org/wiki/Stdarg.h

Macro name

Description

va_list                 

is not a macro but a type for iterating arguments

va_start

Start iterating arguments with a va_list

va_arg

Retrieve an argument

va_end

Free a va_list.

va_copy

Copy contents of one va_list to another

 

My advice, there is no need to by-heart these macros and their meaning, if you actually know the following two things:

  1. ‘…’ makes the compiler to put arguments on stack and not registers
  2. If you know how parameters are passed onto functions via stack, then we don’t need to use the stdargs macros.

Implementation of a variadic function without #include <stdargs.h>

Go through the code below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
#include <stdio.h>

int my_variadic_function(const int arg_count, ...)
{
	const int *const argument_stack_address = &arg_count;
	const int *arg_pointer = argument_stack_address + 1;
	
	for(int i = 0; i < arg_count; ++i)
	{
		printf("arg %d: %d\n", i, arg_pointer[i]);
	}
	
	return 0;
}

int main()
{	
	my_variadic_function(5, 10, 20, 30, 40, 50);
	
	return 0;
}

Result:

arg 0 : 10
arg 1 : 20
arg 2 : 30
arg 3 : 40
arg 4 : 50

So what’s happening here? Remember the stack frame and the way, parameters are pushed into stack from right to left order?

Let us understand what is this right to left order of parameters. Stacks grow downwards. i.e. from Higher address to lower address. Right most argument is pushed first making its place higher in the stack while left most (i.e. the very first) argument is pushed last making its place lower in the stack frame.

If you can get the address of the last named-argument, i.e. here it’s the first argument itself which is arg_count, start incrementing the address to get the other arguments from the stack. The first argument is at the lowest portion of parameter list in the stack and last argument is at the highest address region of stack w.r.t parameter list in stack frame.

So the logic is to get the lowest address and keep on incrementing the address to access the other parameters from the stack. The question now is where to stop navigating the stack frame is not mandated by C language. So put some logic in your code for stopping the stack navigation, otherwise you will end up corrupting the stack and can crash your program.

Here the logic that I have used is to mention the size of the variable argument list in the first named-argument ‘arg_count’. The code gets the address of ‘arg_count’ and from that address gets the addresses of the remaining parameters one by one and access those parameters from the stack. The above step iterates for arg_count times! Voila!!! So now you know a way for implementing your own variadic function without the stdargs.h. Processors such as Intel and ARM follows this convention. Make sure you understand the notion of the processor’s stack frame.

NOTE : There should be at least one named argument in the variadic function or else you cannot get the starting address of the parameter list, unless you write some inline assembly code for accessing the stack pointer. You may also need to understand the entire stack frame of the processor!

Stack Corruption

As C uses stack for almost everything related to its execution model, most of the run-time issues in C is most likely to be related to stack corruption.

Issues with use of heavyweight data in functions.

Heavyweight data means data with huge sizes such as structs, arrays, unions, big strings etc. Programmers often do the following:

  • Passing structs and unions as values instead of references
  • Defining auto variables (within function scope) of structs/unions/big sized arrays

Now you know that auto variables and function parameters are allocated in function’s stack frame and not in data segment. Declaring/instantiating heavyweight variables within function scope can consume lot of stack and is not recommended in general and in particular for embedded applications. Even in bigger platforms, with a reasonable amount of memory, can you imagine the stack abuse such functions do if they are recursive? Secondly, arrays are infamous for not checking memory bounds. Any rogue code can access locations outside memory bounds. If such arrays are allocated in stack, then their affect is much worse. Let’s see a simple code that shows how a rogue code can corrupt a function stack with a simple mistake.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
#include <stdio.h>

void dangerous_arrays(int arg1)
{
	int i = 1;
	int j = 2;
	int k = 3;
	int l = 4;
	
	char array[4] = {0, 0, 0, 0};
	
	printf("before stack corruption i : 0x%X\n", i);
	printf("before stack corruption j : 0x%X\n", j);
	printf("before stack corruption k : 0x%X\n", k);
	printf("before stack corruption l : 0x%X\n", l);
	
	printf("before stack corruption array[0] : 0x%X\n", 0xFF & array[0]);
	printf("before stack corruption array[1] : 0x%X\n", 0xFF & array[1]);
	printf("before stack corruption array[2] : 0x%X\n", 0xFF & array[2]);
	printf("before stack corruption array[3] : 0x%X\n", 0xFF & array[3]);
	
	for(int index = 0; index < 20; ++index)
	{
		array[index] = (char)(index * 10);
	}
	
	puts("---------------------------------------------------");
	
	printf("after stack corruption i : 0x%X\n", i);
	printf("after stack corruption j : 0x%X\n", j);
	printf("after stack corruption k : 0x%X\n", k);
	printf("after stack corruption l : 0x%X\n", l);
	
	puts("");
	
	for(int index = 0; index < 20; ++index)
	{
		printf("after stack corruption array[%d] : 0x%X\n", index, 0xFF & array[index]);
	}
	
	return;
}

int main()
{
	dangerous_arrays(0);
	
	return 0;
}

Result is as follows:

before stack corruption i : 0x1
before stack corruption j : 0x2
before stack corruption k : 0x3
before stack corruption l : 0x4
before stack corruption array[0] : 0x0
before stack corruption array[1] : 0x0
before stack corruption array[2] : 0x0
before stack corruption array[3] : 0x0
---------------------------------------------------
after stack corruption i : 0xBEB4AAA0
after stack corruption j : 0x968C8278
after stack corruption k : 0x6E645A50
after stack corruption l : 0x463C3228
after stack corruption array[0] : 0x0
after stack corruption array[1] : 0xA
after stack corruption array[2] : 0x14
after stack corruption array[3] : 0x1E
after stack corruption array[4] : 0x28
after stack corruption array[5] : 0x32
after stack corruption array[6] : 0x3C
after stack corruption array[7] : 0x46
after stack corruption array[8] : 0x50
after stack corruption array[9] : 0x5A
after stack corruption array[10] : 0x64
after stack corruption array[11] : 0x6E
after stack corruption array[12] : 0x78
after stack corruption array[13] : 0x82
after stack corruption array[14] : 0x8C
after stack corruption array[15] : 0x96
after stack corruption array[16] : 0xA0
after stack corruption array[17] : 0xAA
after stack corruption array[18] : 0xB4
after stack corruption array[19] : 0xBE

If you understand how local variables are stored in C, then you will understand that array[4] is actually the lsb of ‘int l’ and array[7] is the msb of ‘int l’ and so on. In simple terms, rogue array has encroached the memory of the neighboring variables. This is an example of stack corruption. Did you figure out the bug in the code? While setting the values of array, the condition I have given in for loop is

index < 20

It should have been index < 3

When using arrays of known size, here are some recommendations:

  • If within loops make sure you use stop condition based on sizeof(array)/sizeof(data).
  • Or use a MACRO/ constant data for storing the size of array and use that in the code

Safer use of arrays:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
char array[10]; // hardcoded size for array is 10

// safer array use option 1; using sizeof operator
for(int index = 0; index < size(array)/sizeof(char); ++index)
{
	array[index] = index * 10;
}

// safer array use option 2; using a constant size information 
const int size = 4;
char array[size];

for(int index = 0; index < size; ++index)
{
	array[index] = index * 10;
}

// safer array use option 3; using a macro for size information
#define SIZE 4
char array[SIZE];

for(int index = 0; index < SIZE; ++index)
{
	array[index] = index * 10;
}

//If passing arrays into functions, follow the convention I have shown below:

// the parameter declaration 'int array[ ]' explicitly conveys that its as array 
void do_some_operation(int array[], const int size) 
{
	for(int index = 0; index < size; ++index)
	{
		// do something using array here
	}
	
	return;
}

// I don't recommend the following at all.
void do_some_operation(int *array){...}
void do_some_operation(int *array, const int size){...}

The parameter declaration ‘int array[]’ is actually making use of pointers. It’s not pass by value. It’s still pass by reference.

Enjoyed the chapter? Let me know in the comments below. Thanks! 🙂

12 thoughts on “Chapter 4 : The Stacks In C

  1. Thank you for your sharing.
    I think there is a typo in the option 1 of “safer use of arrays”.
    The loop condition of line 4 should be “index < sizeof(arr)/sizeof(char)" instead of " index < size(arr)/sizeof(char)".

  2. the best explanation for the stack i read so far.
    may i suggest, an article on GOT-PLT, dynamic linking and lazy binding in c. it’s a really interesting topic without much info on it.

  3. In chapter-4-the-stacks-in-c : I would like suggest to add references to https://godbolt.org/ based stack examples that would give the viewer a clear understanding on the structure of call stack by seeing the real translation for themselves. It would help the viewer to understand as some of them may be from different architecture background ARM apart from x86.

Leave a Reply