templatesTypeTraits

The Type-Traits Library: Optimization

The type-traits library has two main goals: correctness and optimization. Today, I write about optimization.

templatesTypeTraits

This post is the last post in my miniseries about the type-traits library. I have already written the following posts:

Before I write about optimization in C++, I want to tell a short anecdote. I often have the following conversation with my students in my classes:

  • Me: Why do we have the feature ABC in C++?
  • Student: I don’t know.
  • Me: If you don’t have an answer, say performance. This always works in C++.

So, let me write about the type-traits library from the optimization perspective.

Optimization

The idea is quite straightforward and used in the Standard Template Library (STL). If the elements of a range are simple enough, the algorithms of the STL like std::copy, std::fill, or  std::equal are directly applied to memory. Instead of using std::copy to copy each element one by one, all is done in one big step. Internally, C functions like memcmp, memset, memcpy, or memmove are used. The small difference between memcpy and memmove is that memmove can deal with overlapping memory areas.

 

Rainer D 6 P2 500x500Modernes C++ Mentoring

Be part of my mentoring programs:

  • "Fundamentals for C++ Professionals" (open)
  • "Design Patterns and Architectural Patterns with C++" (open)
  • "C++20: Get the Details" (open)
  • "Concurrency with Modern C++" (starts March 2024)
  • Do you want to stay informed: Subscribe.

     

    The implementations of the algorithm std::copy, std::fill, or std::equal use a simple strategy.  std::copy is like a wrapper. This wrapper checks if the elements are simple enough. If so, the wrapper will delegate the work to the optimized copy function. If not, the conservative copy algorithm is used. This conservative one copies each element after the other. The functions of the type-traits library are heavily used to make the right decision.

    The graphic shows the general strategy:

    PerformanceDecision

     

    That was the theory, but here is the practice. Which strategy is used by std::fill?

    std::fill

    std::fill assigns each element in the range a value. The listing shows a GCC-inspired implementation of std::fill.

    // fillGCC.cpp
     
    #include <cstring>
    #include <chrono>
    #include <iostream>
    #include <type_traits>
    
    namespace my{
    
      template <typename I, typename T, bool b>
      void fill_impl(I first, I last, const T& val, const std::integral_constant<bool, b>&){
        while(first != last){
          *first = val;
          ++first;
        }
      }
    
      template <typename T>                                                              // (2)
      void fill_impl(T* first, T* last, const T& val, const std::true_type&){
        std::memset(first, val, last-first);
      }
    
      template <class I, class T>
      inline void fill(I first, I last, const T& val){
        typedef std::integral_constant<bool,std::is_trivially_copy_assignable<T>::value 
    && (sizeof(T) == 1)> boolType; // (1) fill_impl(first, last, val, boolType()); } } const int arraySize = 100'000'000; char charArray1[arraySize]= {0,}; char charArray2[arraySize]= {0,}; int main(){ std::cout << '\n'; auto begin = std::chrono::steady_clock::now(); my::fill(charArray1, charArray1 + arraySize,1); auto last = std::chrono::steady_clock::now() - begin; std::cout << "charArray1: " << std::chrono::duration<double>(last).count() << " seconds\n"; begin = std::chrono::steady_clock::now(); my::fill(charArray2, charArray2 + arraySize, static_cast<char>(1)); last= std::chrono::steady_clock::now() - begin; std::cout << "charArray2: " << std::chrono::duration<double>(last).count() << " seconds\n"; std::cout << '\n'; }

     

    Back to the code example. If the expression boolType() in line (1) is true, the optimized version of my::fill_impl in line 2 is used. This variant fills the memory of 100 million entries with the value 1.  sizeof(char) is 1.

    What about the performance of the program? I compiled the program without optimization to measure the non-optimized performance.

    fillGcc

    The optimized version in line (2) is about ten times faster. Interestingly, when I enable full optimization, both variants are equally fast because the compiler generates the same code for both variants. Also, the generic version (line (3)) uses memset: fillGCC.cpp with maximum optimization on Compiler Explorer.

    I presented an old GCC implementation  std::fill, because the newer ones are not so easy to read. Here are the essential parts of the GCC 6 implementation.

    std::fill

    // fill  
    // Specialization: for char types we can use memset.                   
    template<typename _Tp>
      inline typename
      __gnu_cxx::__enable_if<__is_byte<_Tp>::__value, void>::__type     // (1)
      __fill_a(_Tp* __first, _Tp* __last, const _Tp& __c)
      {
        const _Tp __tmp = __c;
        if (const size_t __len = __last - __first)
        __builtin_memset(__first, static_cast<unsigned char>(__tmp), __len);
      }
    

     

    The GCC 6 implementation uses SFINAE. The full specialization of the function template __fill_a use __builtin_memset. The crucial part in this implementation is line (1): __gnu_cxx::__enable_if<__is_byte<_Tp>::__value, void>::__type. Let me rewrite this expression in a human-readable way and use the official names.

    std::enable_if<std::is_byte<Tp>::value, void>::type
    

     

    The expression checks first if the template parameter TP is a byte:std::is_byte<T>::value. If this expression evaluates to true thanks to std::enable_if from the type-traits library SFINAE kicks in. SFINAE stands for Substitution Failure Is Not An Error and applies during overload resolution of a function template. If substituting the template parameter fails, the specialization is discarded from the overload set, but this failure causes no compiler error. This means in this concrete case: When the condition std::is_byte<T>::value returns false, this full specialization is discarded, and another version of __fill_a is used.

    What’s next?

    First, I take a Christmas break of two weeks. My next post will be published on 10th January 2022. I will write about constexpr functions because they have much in common with templates and become more potent with C++20.

    Second, for a long time, I would like to improve my professional teaching of C++. Therefore, I plan to start a mentoring program for C++. Soon I will publish more details about my idea.

     

     

     
     

     

     

    Thanks a lot to my Patreon Supporters: Matt Braun, Roman Postanciuc, Tobias Zindl, G Prvulovic, Reinhold Dröge, Abernitzke, Frank Grimm, Sakib, Broeserl, António Pina, Sergey Agafyin, Андрей Бурмистров, Jake, GS, Lawton Shoemake, Jozo Leko, John Breland, Venkat Nandam, Jose Francisco, Douglas Tinkham, Kuchlong Kuchlong, Robert Blanch, Truels Wissneth, Kris Kafka, Mario Luoni, Friedrich Huber, lennonli, Pramod Tikare Muralidhara, Peter Ware, Daniel Hufschläger, Alessandro Pezzato, Bob Perry, Satish Vangipuram, Andi Ireland, Richard Ohnemus, Michael Dunsky, Leo Goodstadt, John Wiederhirn, Yacob Cohen-Arazi, Florian Tischler, Robin Furness, Michael Young, Holger Detering, Bernd Mühlhaus, Stephen Kelley, Kyle Dean, Tusar Palauri, Dmitry Farberov, Juan Dent, George Liao, Daniel Ceperley, Jon T Hess, Stephen Totten, Wolfgang Fütterer, Matthias Grün, Phillip Diekmann, Ben Atakora, Ann Shatoff, Rob North, Bhavith C Achar, Marco Parri Empoli, moon, Philipp Lenk, Hobsbawm, and Charles-Jianye Chen.

    Thanks, in particular, to Jon Hess, Lakshman, Christian Wittenhorst, Sherhy Pyton, Dendi Suhubdy, Sudhakar Belagurusamy, Richard Sargeant, Rusty Fleming, John Nebel, Mipko, Alicja Kaminska, Slavko Radman, and David Poole.

    My special thanks to Embarcadero
    My special thanks to PVS-Studio
    My special thanks to Tipi.build 
    My special thanks to Take Up Code
    My special thanks to SHAVEDYAKS

    Seminars

    I’m happy to give online seminars or face-to-face seminars worldwide. Please call me if you have any questions.

    Standard Seminars (English/German)

    Here is a compilation of my standard seminars. These seminars are only meant to give you a first orientation.

    • C++ – The Core Language
    • C++ – The Standard Library
    • C++ – Compact
    • C++11 and C++14
    • Concurrency with Modern C++
    • Design Pattern and Architectural Pattern with C++
    • Embedded Programming with Modern C++
    • Generic Programming (Templates) with C++
    • Clean Code with Modern C++
    • C++20

    Online Seminars (German)

    Contact Me

    Modernes C++ Mentoring,

     

     

    0 replies

    Leave a Reply

    Want to join the discussion?
    Feel free to contribute!

    Leave a Reply

    Your email address will not be published. Required fields are marked *