C++ Core Guidelines: Rules for Strings

The C++ Core Guidelines use the term string for a sequence of characters. Consequently, the guidelines are about C-strings, C++-strings, the C++17 std::string_views, and std::bytes.

 

In this post, I will only loosely refer to the guidelines and ignore the strings that are part of the Guidelines Support Library, such as gsl::string_span, zstring, and czstring. For short, in this post I call a std::string a C++-string and a const char* a C-string.

Let me start with the first rule:

SL.str.1: Use std::string to own character sequences

Maybe you know another string that owns its character sequence: a C-string. Don’t use a C-string! Why? Because you have to take care of the memory management, the string termination character, and the string length yourself.

 

 


     

    // stringC.c
    
    #include <stdio.h>
    #include <string.h>
     
    int main( void ){
     
      char text[10];
     
      strcpy(text, "The Text is too long for text.");   // (1) the literal does not fit into text: buffer overflow
      printf("strlen(text): %zu\n", strlen(text));      // (2) strlen reads beyond the end of text
      printf("%s\n", text);
     
      text[sizeof(text)-1] = '\0';                      // cut the string to what fits into text
      printf("strlen(text): %zu\n", strlen(text));
     
      return 0;
    }
    

     

    The simple program stringC.c has undefined behavior in line (1) and in line (2). Compiling it with a rusty GCC 4.8 seems to work fine, but with undefined behavior that is luck, not a guarantee.

    The C++ variant does not have the same issues.

    // stringCpp.cpp
    
    #include <iostream>
    #include <string>
    
    int main(){
     
      std::string text{"The Text is not too long."};  
     
      std::cout << "text.size(): " << text.size() << std::endl;
      std::cout << text << std::endl;
     
      text +=" And can still grow!";
     
      std::cout << "text.size(): " << text.size() << std::endl;
      std::cout << text << std::endl;
     
    }
    

     

    The output of the program should not surprise you: text.size() grows from 25 to 45 once the second part is appended.

    In the case of a C++ string, I cannot make these errors because std::string takes care of the memory management and the termination character. Additionally, if you access the elements of the C++ string with the member function at instead of the index operator, an out-of-range index is detected: at throws a std::out_of_range exception instead of causing undefined behavior. You can read the details in my previous post: C++ Core Guidelines: Avoid Bounds Errors.
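To make the difference between the two element accesses concrete, here is a minimal sketch. The helper name charAtOrDefault is made up for this illustration; only std::string::at and std::out_of_range come from the standard library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of any library): returns the character at
// position pos, or fallback if pos is out of range.
// text.at(pos) checks the index and throws std::out_of_range for pos >= size(),
// whereas text[pos] with pos > size() would be undefined behavior.
char charAtOrDefault(const std::string& text, std::size_t pos, char fallback) {
    try {
        return text.at(pos);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

A call such as charAtOrDefault("hello", 5, '?') yields the fallback because the valid indices of a five-character string end at 4.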

    Do you know what was strange in C++, including C++11? There was no literal that directly creates a C++ string; you always had to start from a C-string. This is strange because we want to get rid of the C-string. This inconsistency is gone with C++14.

    SL.str.12: Use the s suffix for string literals meant to be standard-library strings 

    With C++14, we got C++-string literals. A C++-string literal is a C-string literal with the suffix s: "cStringLiteral"s.

    Let me show you an example that makes my point: C-string literals and C++-string literals are different.

     

    // stringLiteral.cpp
    
    #include <iostream>
    #include <string>
    #include <utility>
    
    int main(){
        
        using namespace std::string_literals;                         // (1)
    
        std::string hello = "hello";                                  // (2)
        
        auto firstPair = std::make_pair(hello, 5);
        auto secondPair = std::make_pair("hello", 15);                // (3)
        // auto secondPair = std::make_pair("hello"s, 15);            // (4)
        
        if (firstPair < secondPair) std::cout << "true" << std::endl; // (5)
        
    }
    

     

    It’s a pity; I must make the namespace std::string_literals available with a using directive in line (1) to use the C++-string literals. Line (2) is the critical line in the example. I use the C-string literal "hello" to create a C++ string. This is why the type of firstPair is (std::string, int), but the type of secondPair is (const char*, int). Ultimately, the comparison in line (5) does not compile because the two pairs have different types. The last line of the compiler’s error message points to exactly this mismatch.

    When I use the C++-string literal in line (4) instead of the C-string literal in line (3), the program compiles and prints true, as expected.
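The type difference described above can be checked at compile time. The following is a minimal sketch; the function name firstIsLess is hypothetical and only wraps the comparison from stringLiteral.cpp so that it can be called.

```cpp
#include <string>
#include <type_traits>
#include <utility>

using namespace std::string_literals;

// Hypothetical helper for illustration: compares a pair built from a
// std::string with a pair built from a C++-string literal.
bool firstIsLess() {
    std::string hello = "hello";
    auto firstPair  = std::make_pair(hello, 5);
    auto secondPair = std::make_pair("hello"s, 15);

    // With the s suffix, both pairs have the same type ...
    static_assert(std::is_same<decltype(firstPair), decltype(secondPair)>::value,
                  "both pairs are std::pair<std::string, int>");
    // ... whereas a plain C-string literal decays to const char* in make_pair.
    static_assert(std::is_same<decltype(std::make_pair("hello", 15)),
                               std::pair<const char*, int>>::value,
                  "a C-string literal yields std::pair<const char*, int>");

    // Pairs compare lexicographically: the strings are equal, and 5 < 15.
    return firstPair < secondPair;
}
```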

    C++-string literals were a C++14 feature. Let’s jump three years further. With C++17, we got std::string_view and std::byte. I have already written about std::string_view in particular. Therefore, I will only recap the most important facts.

    SL.str.2: Use std::string_view or gsl::string_span to refer to character sequences

    A std::string_view only refers to the character sequence. To say it more explicitly: a std::string_view does not own the character sequence. It represents a view of a sequence of characters. This sequence of characters can be a C++ string or a C-string. A std::string_view only needs two pieces of information: the pointer to the character sequence and its length. It supports the reading part of the interface of std::string. In addition, std::string_view has two modifying operations that std::string lacks: remove_prefix and remove_suffix. Both only shrink the view; the characters referred to stay untouched.
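Here is a small sketch of how remove_prefix and remove_suffix shrink a view without touching the underlying characters. The function trimSpaces is a hypothetical helper written for this illustration; it requires C++17.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: strips leading and trailing spaces by shrinking the
// view. No characters are copied or modified; only the view's pointer and
// length change.
std::string_view trimSpaces(std::string_view sv) {
    while (!sv.empty() && sv.front() == ' ') sv.remove_prefix(1);
    while (!sv.empty() && sv.back() == ' ')  sv.remove_suffix(1);
    return sv;
}
```

Because the returned view still refers to the caller's characters, it must not outlive the std::string or C-string it was created from.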

    Maybe you wonder: Why do we need a std::string_view? A std::string_view is cheap to copy and allocates no memory. My previous post C++17 – Avoid Copying with std::string_view shows the impressive performance numbers of a std::string_view.

    As I already mentioned, with C++17 we also got std::byte.

    SL.str.4: Use char* to refer to a single character and SL.str.5: Use std::byte to refer to byte values that do not necessarily represent characters

    If you don’t follow rule str.4 and use const char* as a C-string, you may end up with critical issues.

     

    #include <iostream>
    
    char arr[] = {'a', 'b', 'c'};   // no terminating '\0'
    
    void print(const char* p)
    {
        std::cout << p << '\n';     // expects a zero-terminated C-string
    }
    
    void use()
    {
        print(arr);   // run-time error; potentially very bad
    }
    

     

    arr decays to a pointer when used as an argument of the function print. The behavior is undefined because arr is not zero-terminated, so the output operator reads beyond the end of the array. You’re mistaken if you now think you can use std::byte as a character.
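One way to avoid the missing terminator is to pass the length along with the pointer, which is what a std::string_view does. The following is only a sketch under that assumption; the function render is hypothetical and writes into a string instead of std::cout so that the result is easy to check.

```cpp
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical replacement for print(const char*): the view carries the
// length, so the characters do not need a terminating '\0'.
std::string render(std::string_view sv) {
    std::ostringstream out;
    out << sv << '\n';   // writes exactly sv.size() characters
    return out.str();
}
```

For the array from the example, the call would be render(std::string_view(arr, sizeof arr)), which passes all three characters and nothing beyond them.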

    std::byte is a distinct type implementing the concept of a byte as specified in the C++ language definition. This means a byte is neither an integer nor a character and is, therefore, less open to programmer errors. Its job is to access object storage. Consequently, its interface consists only of operators for bitwise logical operations.

     

    namespace std { 
    
        template <class IntType> 
            constexpr byte operator<<(byte b, IntType shift); 
        template <class IntType> 
            constexpr byte operator>>(byte b, IntType shift); 
        constexpr byte operator|(byte l, byte r); 
        constexpr byte operator&(byte l, byte r); 
        constexpr byte operator~(byte b); 
        constexpr byte operator^(byte l, byte r); 
    
    } 
    

     

    You can use the function std::to_integer<IntegerType>(std::byte b) to convert a std::byte to an integer type and the call std::byte{integer} to do it the other way around. integer has to be a non-negative value not greater than std::numeric_limits<unsigned char>::max().
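A short sketch of both conversions together with the bitwise operators follows. The helper names setLowestBit and highNibble are made up for this illustration; std::byte and std::to_integer are declared in the header <cstddef> and require C++17.

```cpp
#include <cstddef>

// Hypothetical helper: sets the lowest bit and converts the result to int.
// Arithmetic such as b + 1 would not compile; only bitwise operators exist.
int setLowestBit(std::byte b) {
    std::byte result = b | std::byte{0x01};
    return std::to_integer<int>(result);
}

// Hypothetical helper: shifts the upper four bits into the lower four.
std::byte highNibble(std::byte b) {
    return b >> 4;
}
```

The explicit std::to_integer<int> call is the only way back to an integer here; a std::byte does not convert implicitly.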

    What’s next?

    I’m almost done with the rules for the standard library. Only a few rules about iostreams and the C standard library are left. So you know what I will write about in my next post.

     

     

     

     
