C++ Core Guidelines: Improved Performance with Iostreams

As simple as my title and the rules of the C++ Core Guidelines sound, getting more performance out of the iostreams is far from a no-brainer.

 


Okay, let me step back. Although I did a lot of tests, my numbers in this post are more controversial than I thought. If you have any ideas, improvements, or clarifications, please let me know, and I will add them to this post.

Here are the two performance-related rules from the guidelines for iostreams.

I assume you don't know std::ios_base::sync_with_stdio?

SL.io.10: Unless you use printf-family functions call ios_base::sync_with_stdio(false)

By default, operations on the C++ streams are synchronised with the C streams. This synchronisation happens after each input or output operation.

This makes it possible to mix C++ and C input and output operations, because operations on the C++ streams go unbuffered to the C streams. Also important from the concurrency perspective: synchronised C++ streams are thread-safe. All threads can write to the C++ streams without any additional synchronisation. The effect may be an interleaving of characters, but not a data race.
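Here is a minimal sketch of such mixing with the default synchronisation left on. The helper name printMixed is hypothetical and not part of my test programs. Because each C++ output operation goes straight to the C stream, the characters should appear in program order.

```cpp
#include <cstdio>
#include <iostream>

// Hypothetical helper: interleaves C++ and C output while synchronisation
// is still enabled (the default). Returns the character count reported by printf.
int printMixed() {
    std::cout << "1";                 // C++ stream
    int written = std::printf("2");   // C stream
    std::cout << "3";                 // C++ stream again
    std::cout << '\n';
    return written;
}
```

Calling printMixed() from main, without a previous call to sync_with_stdio(false), is the synchronised counterpart to the program syncWithStdio.cpp.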

When you call std::ios_base::sync_with_stdio(false), the synchronisation between C++ streams and C streams no longer happens, because the C++ streams may put their output into their own buffer. Because of the buffering, the input and output operations may become faster. You have to invoke std::ios_base::sync_with_stdio(false) before any input or output operation. If you call it later, the behaviour is implementation-defined.
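A minimal sketch of the call, wrapped in a hypothetical helper named disableStdioSync (my own name, not something from the guidelines). sync_with_stdio returns the synchronisation state that was active before the call, so the first call in a fresh program should report true.

```cpp
#include <iostream>

// Hypothetical helper: switches off the synchronisation with the C streams
// and returns the state that was active before the call.
bool disableStdioSync() {
    return std::ios_base::sync_with_stdio(false);
}

// Intended use: call it as the very first statement of main,
// before any input or output operation has taken place.
```

The return value is handy if you want to check whether some other part of the program has already changed the setting.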

I assume you noticed that I wrote "may" quite often. That is for a reason.

Interleaving of C++ Streams and C Streams

First, I want to know what happens when I execute the following program with various compilers.

// syncWithStdio.cpp

#include <iostream>
#include <cstdio>
 
int main(){
    
    std::ios::sync_with_stdio(false);

    std::cout << std::endl;
    
    std::cout << "1";
    std::printf("2");
    std::cout << "3";
    
    std::cout << std::endl;
    
}

 

To give you a better picture of the compilers I used, here are a few details about them.

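GCC 8.2

[Screenshot: compiler details (gcc)]

[Screenshot: program output (SyncWithStdioLinux)]

Clang 8.0

[Screenshot: compiler details (clang)]

[Screenshot: program output (SyncWithStdioClang)]

cl.exe 19.20

[Screenshot: compiler details (clexe)]

[Screenshot: program output (SyncWithStdioWin)]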

It seems that only with GCC the output is not synchronised. This observation does not hold for Clang or for cl.exe on Windows. A small performance test confirmed my first impression.

Performance with and without Synchronisation

Let me write a small program that writes to the console with and without synchronisation. Doing it without synchronisation should be faster.

  • Synchronised

 

// syncWithStdioPerformanceSync.cpp

#include <chrono>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

constexpr int iterations = 10;

std::ifstream openFile(const std::string& myFile){                  

  std::ifstream file(myFile, std::ios::in);
  if ( !file ){
    std::cerr << "Can't open file "+ myFile + "!" << std::endl;
    exit(EXIT_FAILURE);
  }
  return file;
  
}

std::string readFile(std::ifstream file){                        
    
    std::stringstream buffer;
    buffer << file.rdbuf();
    
    return buffer.str();
    
}

auto writeToConsole(const std::string& fileContent){
     
    auto start = std::chrono::steady_clock::now();
    for (auto c: fileContent) std::cout << c;
    std::chrono::duration<double> dur = std::chrono::steady_clock::now() - start;
    return dur;
}  

template <typename Function>
auto measureTime(std::size_t iter, Function&& f){
    std::chrono::duration<double> dur{};
    for (std::size_t i = 0; i < iter; ++i){
        dur += f();
    }
    return dur / iter;
}
    
int main(int argc, char* argv[]){
    
    std::cout << std::endl;
  
    // get the filename
    std::string myFile;
    if ( argc == 2 ){
        myFile= argv[1];
    }
    else{
        std::cerr << "Filename missing !" << std::endl;
        exit(EXIT_FAILURE);
    } 
  
    std::ifstream file = openFile(myFile);                                  // (1)
  
    std::string fileContent = readFile(std::move(file));                    // (2)
    auto averageWithSync = measureTime(iterations, [&fileContent]{ return writeToConsole(fileContent); }); // (3)
    
    std::cout << std::endl;
    
    std::cout << "With Synchronisation: " << averageWithSync.count() << " seconds" << std::endl;          // (4)
    
    std::cout << std::endl;
    
}

 

The program is quite easy to explain. I open a file (line 1), read its entire content into a string (line 2), and write it iterations times to the console (line 3). The writing is done in the function writeToConsole(fileContent).

In my concrete case, iterations is 10. At the end, I display the average time of the output operations (line 4).

  • Non-Synchronised

The non-synchronised version of the program is quite similar to the synchronised version. Only the main function changed a bit.

// syncWithStdioPerformanceWithoutSync.cpp

... 
 
int main(int argc, char* argv[]){
    
    std::ios::sync_with_stdio(false);    // (1)

    std::cout << std::endl;
  
    // get the filename
    std::string myFile;
    if ( argc == 2 ){
        myFile= argv[1];
    }
    else{
        std::cerr << "Filename missing !" << std::endl;
        exit(EXIT_FAILURE);
    } 
  
    std::ifstream file = openFile(myFile);
  
    std::string fileContent = readFile(std::move(file));
    
    auto averageWithoutSync = measureTime(iterations, [&fileContent]{ return writeToConsole(fileContent); });
    
    std::cout << std::endl;
    
    std::cout << "Without Synchronisation: " << averageWithoutSync.count() << " seconds" << std::endl;  
  
    std::cout << std::endl;
    
}

 

I just added line (1) to the main program. Now, I hope for a performance improvement.

I ran my performance test with a small program as input, but also with a bigger text file (600,000 characters). The bigger file gave me no new insight; therefore, I skipped it.

>> syncWithStdioPerformanceSync syncWithStdioPerformanceSync.cpp
>> syncWithStdioPerformanceWithoutSync syncWithStdioPerformanceSync.cpp

GCC

[Screenshot: syncWithStdioPerformanceCppGcc]

Clang

[Screenshot: syncWithStdioPerformanceCppClang]

cl.exe

[Screenshot: syncWithStdioPerformanceCppWin]

 

The results puzzled me because of Windows.

  • With GCC, I had a performance improvement of about 70% in the non-synchronised variant.
  • Neither Clang nor cl.exe showed any performance improvement. It seems that the non-synchronised input and output operations are still synchronised. My numbers confirmed my observation from the program syncWithStdio.cpp.
  • Only for the record: did you notice how slow the console on Windows is?

Of course, I'm guilty. I almost always break the next rule.

SL.io.50: Avoid endl

Why should you avoid std::endl? Or, to put it differently: what is the difference between the manipulator std::endl and '\n'?

  • std::endl: writes a newline and flushes the output buffer.
  • '\n': writes a newline.
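The difference can be made visible without a console. The following is a minimal sketch, assuming a hypothetical string buffer CountingBuf that counts how often it is asked to flush; both names, CountingBuf and countFlushes, are my own inventions for this illustration.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>

// Hypothetical helper: a string buffer that counts how often it is flushed.
class CountingBuf : public std::stringbuf {
public:
    std::size_t flushes = 0;
protected:
    int sync() override {              // called by std::ostream::flush()
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Writes `lines` lines, each ended with std::endl or with '\n',
// and returns how many flushes the buffer has seen.
std::size_t countFlushes(std::size_t lines, bool useEndl) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (std::size_t i = 0; i < lines; ++i) {
        out << "line";
        if (useEndl) out << std::endl; // newline plus flush
        else out << '\n';              // newline only
    }
    return buf.flushes;
}
```

With std::endl, the buffer sees one flush per line; with '\n', it sees none. How expensive each flush is depends on the real target of the stream, which this sketch does not measure.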

Flushing the buffer is an expensive operation and should, therefore, be avoided. If necessary, the buffer is flushed automatically. Honestly, I was curious to see the numbers. To make it as bad as possible, here is my program, which puts a line break (line 3) after each character.

// syncWithStdioPerformanceEndl.cpp

#include <chrono>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

constexpr int iterations = 500;                                                    // (1)

std::ifstream openFile(const std::string& myFile){                  

  std::ifstream file(myFile, std::ios::in);
  if ( !file ){
    std::cerr << "Can't open file "+ myFile + "!" << std::endl;
    exit(EXIT_FAILURE);
  }
  return file;
  
}

std::string readFile(std::ifstream file){                        
    
    std::stringstream buffer;
    buffer << file.rdbuf();
    
    return buffer.str();
    
}

template <typename End>
auto writeToConsole(const std::string& fileContent, End end){
     
    auto start = std::chrono::steady_clock::now();
    for (auto c: fileContent) std::cout << c << end;                                 // (3)
    std::chrono::duration<double> dur = std::chrono::steady_clock::now() - start;
    return dur;
}  

template <typename Function>
auto measureTime(std::size_t iter, Function&& f){
    std::chrono::duration<double> dur{};
    for (std::size_t i = 0; i < iter; ++i){
        dur += f();
    }
    return dur / iter;
}
    
int main(int argc, char* argv[]){

    std::cout << std::endl;
  
    // get the filename
    std::string myFile;
    if ( argc == 2 ){
        myFile= argv[1];
    }
    else{
        std::cerr << "Filename missing !" << std::endl;
        exit(EXIT_FAILURE);
    } 
  
    std::ifstream file = openFile(myFile);
  
    std::string fileContent = readFile(std::move(file));
    
    auto averageWithFlush = measureTime(iterations, 
                                        [&fileContent]{ return writeToConsole(fileContent, std::endl<char, std::char_traits<char>>); }); // (2)
    auto averageWithoutFlush = measureTime(iterations, [&fileContent]{ return writeToConsole(fileContent, '\n'); });                     // (3)
    
    std::cout << std::endl;
    std::cout << "With flush(std::endl) " << averageWithFlush.count() << " seconds" << std::endl;  
    std::cout << "Without flush(\\n): " << averageWithoutFlush.count() << " seconds" << std::endl;  
    std::cout << "With Flush/Without Flush: " << averageWithFlush/averageWithoutFlush << std::endl;
    
    std::cout << std::endl;
    
}

 

In the first case, I did it with std::endl (line 2); in the second case, I did it with '\n' (line 3). The program is quite similar to the previous one. The big difference is that I made 500 iterations (line 1). Why? I was astonished by the variation in the numbers. With only a few iterations, I could not notice any clear difference. Sometimes, std::endl was two times faster than '\n'; sometimes, std::endl was four times slower. I got similar behaviour with cl.exe and with GCC. I also tried other versions of GCC and cl.exe. Honestly, this was not what I expected.

When I did it with 500 iterations, I got the expected winner: '\n' seems to be 10% - 20% faster than std::endl. Once more: only 10% - 20% faster.

GCC

[Screenshot: syncWithStdioPerformanceCppLinuxEndl]

cl.exe

[Screenshot: syncWithStdioPerformanceCppWinEndl]

 

My Small Conclusion

I want to draw a small conclusion from my performance tests.

  • std::ios_base::sync_with_stdio(false) can make a big difference on your platform, but you lose your thread-safety guarantee.
  • std::endl is not as bad as its reputation. I will not change my habit.
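If you switch off the synchronisation and still want to write from several threads, you have to protect the stream yourself. Here is a minimal sketch, assuming a single global mutex; the names outputMutex, lockedPrint, and lockedPrintDemo are hypothetical and only serve this illustration.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical global mutex that guards the shared output stream.
std::mutex outputMutex;

// Writes one complete line while holding the mutex, so that
// concurrent callers cannot write to the stream at the same time.
void lockedPrint(std::ostream& out, const std::string& line) {
    std::lock_guard<std::mutex> guard(outputMutex);
    out << line << '\n';
}

// Example use against an in-memory stream; with std::cout the call looks the same.
std::string lockedPrintDemo() {
    std::ostringstream out;
    lockedPrint(out, "first");
    lockedPrint(out, "second");
    return out.str();
}
```

In a multithreaded program, each thread would call lockedPrint(std::cout, ...) instead of writing to std::cout directly. The lock has its own cost, so it may eat up part of the gain from the missing synchronisation.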

What's next?

Only one rule exists for the sections regex, chrono, and the C standard library. As you can see, I have to improvise in my next post.

 

 

 
