C++ Core Guidelines: Rules for Unions


A union is a special data type in which all members start at the same address. A union can hold only one of its members at a time; therefore, you can save memory. A tagged union is a union that keeps track of which member it currently holds.

 


Here are the four rules for unions.

Let's start with the most obvious rule.

C.180: Use unions to save memory

Because a union can hold only one of its members at a time, you can save memory. The union is at least as big as its biggest member.

 

union Value {
    int i;
    double d;
};

Value v = { 123 };           // now v holds an int
std::cout << v.i << '\n';    // write 123
v.d = 987.654;               // now v holds a double
std::cout << v.d << '\n';    // write 987.654

 

Value is a "naked" union. According to the next rule, you should not use it.

C.181: Avoid “naked” unions

"Naked" unions are very error-prone because you have to keep track of the currently held type yourself.

// nakedUnion.cpp

#include <iostream>

union Value {
    int i;
    double d;
};

int main(){
  
  std::cout << std::endl;

  Value v;
  v.d = 987.654;  // v holds a double
  std::cout << "v.d: " << v.d << std::endl;     
  std::cout << "v.i: " << v.i << std::endl;      // (1)

  std::cout << std::endl;

  v.i = 123;     // v holds an int
  std::cout << "v.i: " << v.i << std::endl;
  std::cout << "v.d: " << v.d << std::endl;      // (2)
  
  std::cout << std::endl;

}

 

The union first holds a double and then an int. If you read the double as an int (1) or the int as a double (2), you get undefined behaviour.


 To overcome this source of errors, you should use a tagged union.

C.182: Use anonymous unions to implement tagged unions

Implementing a tagged union yourself is quite demanding. If you are curious, have a look at rule C.182 in the guidelines. I will make it easy for myself and write about the new C++ standard instead.

With C++17, we get a tagged union: std::variant. std::variant is a type-safe union. Here is a first impression.

 

// variant.cpp

#include <variant>
#include <string>
 
int main(){

  std::variant<int, float> v, w;       // (1)
  v = 12;                              // v contains int
  int i = std::get<int>(v);            // (2)        
                                       
  w = std::get<int>(v);                // (3)
  w = std::get<0>(v);                  // same effect as the previous line
  w = v;                               // same effect as the previous line

                                       // (4)
  //  std::get<double>(v);             // error: no double in [int, float]
  //  std::get<3>(v);                  // error: valid index values are 0 and 1
 
  try{
    std::get<float>(w);                // w contains int, not float: will throw
  }
  catch (std::bad_variant_access&) {}
 
                                       // (5)
  std::variant<std::string> x("abc");  // converting constructors work when unambiguous
  x = "def";                           // converting assignment also works when unambiguous

}

 

In (1), I define the two variants v and w. Both can hold either an int or a float. Their initial value is 0, which is the default value of the first alternative, int. v becomes 12, and std::get<int>(v) returns the value by type (2). Line (3) and the following two lines show three ways to assign the value of v to w. But you have to keep a few rules in mind. You can ask for the value of a variant by type or by index. The type must be one of the alternatives and unique among them, and the index must be valid (4); otherwise, the program does not compile. If the type or index is valid but the variant currently holds a different alternative, you get a std::bad_variant_access exception. If the constructor call or assignment is unambiguous, a conversion takes place. This is why you can construct a std::variant<std::string> from a C-string or assign a new C-string to it (5).

C.183: Don’t use a union for type punning

First, what is type punning? Type punning means intentionally subverting the type system to treat an object of one type as if it had a different type. One typical way to do type punning in C++ is to read a union member other than the one that was last written.

What is wrong with the following function bad?

union Pun {
    int x;
    unsigned char c[sizeof(int)];
};

void bad(Pun& u)
{
    u.x = 'x';
    std::cout << u.c[0] << '\n';       // undefined behavior (1)
}

void if_you_must_pun(int& x)
{
    auto p = reinterpret_cast<unsigned char*>(&x);   // (2)
    std::cout << p[0] << '\n';                       // OK; better 
// ...
}

 

Expression (1) has two issues. First and foremost, it's undefined behaviour. Second, the type punning is hard to spot. So if you have to use type punning, do it with an explicit cast such as reinterpret_cast in (2). A reinterpret_cast at least makes the type punning easy to find afterwards.

What's next?

Admittedly, this final post on the rules for classes and class hierarchies was a little bit short. In the next post, I will write about the next major section: enumerations.

 

 

 


 

Comments   

0 #1 Balázs Benics 2017-11-20 20:58
Hi there,

I'm not sure with the last example. I don't think that it contains any UBs.
Let's see why:

At the 'void bad(Pun& u)' function
--------------------------------------
You activated the integer member of the 'u' union by assigning the ASCII value of the 'x' character.
After that, we access the union's value through a different (non-active) union member, which is UB in the strict sense. But if we assume the widely used language extension that most compilers offer, then we can access the value through a non-active union member.
(Keep in mind that ANY type can be accessed through a [signed/unsigned] char type, so that is also true for this line. If we accessed it through a type other than std::byte or char, then it would really be UB.)

The only problem I can see is that we don't know whether the system is big-endian or little-endian, so two outputs are possible, but neither of them is UB.

For the other function, the standard is clear, and it seems to be right as you wrote.
But the outcome still depends on endianness, I think.

But please check it, and tell me if I'm wrong.

Thank you in advance.

ps: btw nice article