Topics at a glance:
- The wonders of union:
- Vagaries of behavior
- Same memory, different perspectives
- Elegant use of unions for mitigating bit manipulation
- The struct data type and its mysterious bit-fields
Struct and Union
After arrays and their inherent address abstraction mechanisms, I’ll now turn my focus to unions and structures. They, too, are composite data types in C. Unions and structures can hold members of different data types, whereas all elements of an array must belong to the same data type. Let’s start with unions.
Unions are also known as super variables. I believe that in C, the union is the only data type defined by the language that exhibits such vagaries of behavior. At one time a union can behave as an integer; at another time, the same union can behave as a float, a char, or sometimes even an array or a structure. That is why I said unions exhibit behavioral changes, or, to put it another way, unions have multiple personalities. Let me demonstrate this with an example.
In the above example, once you assign 2.567F to member ‘f’, the integer value previously assigned to ‘i’ is overwritten: the bytes it occupied now hold the IEEE 754 floating-point representation of 2.567F. In other words, once you assign a valid value to any member of the union, from then on the union behaves as a variable of that member’s data type, until the next value is assigned. You will run into programming horrors if you then access any other member of the union, such as reading ‘i’ after assigning a float value to ‘f’, or reading ‘f’ after assigning a char value to ‘c’: you get the stored bytes reinterpreted as the other type, which is rarely what you intended. The main reason for this issue is actually the most important feature of a union.
All the members of a union occupy the same block of memory, regardless of how many members are declared inside it.
The below code demonstrates this. I have declared a union type as follows:
I have written the following code to manipulate this union:
The above code produced the following result when run:
As integer Byte 0 : 0x5 Byte 1 : 0x0 Byte 2 : 0x0 Byte 3 : 0x0
As float Byte 0 : 0x0 Byte 1 : 0x0 Byte 2 : 0xA4 Byte 3 : 0x40
As char Byte 0 : 0x41 Byte 1 : 0x0 Byte 2 : 0xA4 Byte 3 : 0x40
Where is the union 'u1' stored ?
Address of u1.i is 0x461CC44
Address of u1.f is 0x461CC44
Address of u1.c is 0x461CC44
- When the union is used as an integer, printing its contents shows how 0x5 is stored in 4 bytes of memory on a typical little-endian system (i.e., least significant byte first). (Refer to chapter 1 for more details on how integers are stored.)
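- When used as a float, it shows how 5.125F is stored in IEEE 754 single-precision format on a little-endian system. Since 5.125 = 1.01001₂ × 2², the sign bit is 0, the biased exponent is 129 (0x81), and the full bit pattern is 0x40A40000, which is stored as the bytes 00 00 A4 40. (Refer to chapter 1 for more details on how floats (IEEE 754) are stored.)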
- When used as a character, it shows how ‘A’ (whose ASCII value is 0x41) is stored in memory: byte 0 now holds 0x41.
- The results also show that once the union is used as a char, accessing it as 4-byte data causes undesirable effects. In the results for char, bytes 2 and 3 of the union still retain remnants of the older float value. This happens because all the members of the union are stored in exactly the same memory (here, 0x461CC44). (Strictly speaking, the C standard says those leftover bytes take unspecified values; with this compiler, as the output shows, they simply keep their old contents.)
- Unions are not self-managing. The programmer must keep careful track of the union’s current state and treat the union as the data type appropriate to that state. In the above example, after storing the character ‘A’ in the union, the code should not use the union as a 4-byte data type. If you want to change the type, assign a new value through a member of the suitable data type.
- The language simply gives you a facility to use the same memory for storing different types of data, but of course not at the same time. During execution, the union’s behavior changes, i.e., it takes the shape of the data type last assigned.
It is the programmer’s responsibility to use unions wisely!
Bit manipulation using union
Let me explain a typical use case where a union is the most appropriate choice for a programmer. Any guesses?
If you are an embedded programmer, it has probably already struck you. I am talking about manipulating the register values of a typical processor.
Bit-fields in C structs: A use-case
The use of union in conjunction with structure’s bit fields is a very powerful programming idiom in embedded world!
Please see the below declaration of a union.
Note: Bit ordering and alignment are implementation-defined and depend on the underlying platform. Understand the word attributes of your target platform, such as byte alignment and byte ordering, as well as your compiler’s support for bit-fields. Here, I have considered an Intel x86 32-bit CPU as the target platform, with the code compiled using gcc 7.4.0.
See how easily and naturally the code manipulates the register contents in terms of bytes, bits and nibbles!
The above code produces the following output:
Register value : 0
Register value : 84
Register value low : 4
Register value high : 8
Awesome, isn’t it?
Let’s understand how this works. But before that, let me explain the significance of this idiom in the embedded world.
Programmers usually set/clear bits, manipulate nibbles, etc. using the language’s bitwise operators, such as left shift <<, right shift >>, AND &, OR | and NOT ~, combined with hexadecimal or binary masks such as 0xFF. In embedded programming, updating register contents is a routine task. Let us see how a C program manipulates bits using the typical C bit-manipulation conventions.
Using our union approach, the same operations can be done in a more natural and simpler way.
Here is the output of the above code using the union approach:
b2 set, Register value : 0x4
b2 cleared, Register value : 0x0
b1 set, Register value : 0x2
nibble high is set to 0xE, Register value : 0xE2
Value set to 0xFF, Register value : 0xFF
Now, what do you say? Which approach is better? I would definitely choose the union approach over direct bit manipulation, as it is more intuitive.
A few points on C’s bit-fields:
- Bit-fields are defined in the C standard, so virtually all C compilers, including gcc, clang and MSVC, support them, and they are a very handy tool at times. Still, check your compiler’s documentation for how it lays out bit-fields and which base types (for example, uint8_t) it accepts for them.
- Byte ordering (endianness), bit ordering, alignment, etc. are platform/implementation dependent, as I mentioned above.
- Never try to take the address of a bit-field member. The C standard does not allow the unary & (address-of) operator to be applied to a bit-field, so a conforming compiler must at least issue a diagnostic, and compilers such as gcc reject it with an error.
So what is happening under the hood with the above union, register_8_bits? The answer is simple: a union in C guarantees that the compiler allocates the exact same memory region to all the members declared inside it. Here, the 8-bit value and the struct members bits and nibbles all share the same memory region. With bit-fields, the compiler guarantees that only the ‘N’ bits specified after ‘:’ are read or written when that specific struct member is used in the code, i.e., bit-fields let you refer to specific bit positions within a variable.
So if you need precise control over bit positions in your code, the language supports it directly.
Place the members aptly inside a union, combine them with struct bit-fields, and everything else fits into the puzzle!
Enjoyed this chapter? Let me know in the comments below. Thanks! 🙂