Yesterday I attended a C++ meetup that the Avast folks regularly organize in their Prague offices. There was a great talk by Jakub Beránek, a talented CS student from Ostrava, about observing the performance effects that the internal CPU architecture has on your code.
In this awesomely organized and documented repo, there are C++ benchmark programs that demonstrate things like the following:
- Branch misprediction: if the CPU is able to predict the result of an `if` condition correctly, the code runs much faster. This is also the topic of the most upvoted Stack Overflow question ever.
- False sharing: if two threads write to memory locations that are close enough to be in one cache line, every write invalidates the other core's copy of that line, so the line keeps bouncing between the cores' caches and the program slows down a lot. This is one of the reasons why immutable data structures can produce faster programs, even if they do more memory allocations and copying.
- Bandwidth saturation: there is a limit to how fast data can be transferred between memory and the CPU. Spawning more processes and threads, even if they don’t share memory at all, can eventually saturate the bandwidth and slow your program down.
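To make the first bullet concrete, here is a minimal sketch of the classic sorted-versus-unsorted experiment. It is not taken from the repo mentioned above; the names `sum_above`, `make_data`, `time_sum`, `Timing` and `compare_sorted_vs_unsorted` are made up for illustration, and the threshold of 128 over values in [0, 255] is simply an assumption that makes the branch go either way about half the time.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sums every element that is >= threshold. The `if` is the branch under test.
std::int64_t sum_above(const std::vector<int>& data, int threshold) {
    std::int64_t sum = 0;
    for (int v : data) {
        if (v >= threshold) {
            sum += v;
        }
    }
    return sum;
}

// Returns n pseudo-random values in [0, 255]; the fixed seed keeps runs repeatable.
std::vector<int> make_data(std::size_t n, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> data(n);
    for (int& v : data) {
        v = dist(rng);
    }
    return data;
}

// Times one call of sum_above; returns elapsed seconds and stores the sum in `out`.
double time_sum(const std::vector<int>& data, int threshold, std::int64_t& out) {
    auto start = std::chrono::steady_clock::now();
    out = sum_above(data, threshold);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Hypothetical result type: seconds spent on the shuffled and on the sorted input.
struct Timing {
    double unsorted_s;
    double sorted_s;
};

// Runs the same loop over the same values, first in random order, then sorted.
Timing compare_sorted_vs_unsorted(std::size_t n, int threshold = 128) {
    std::vector<int> data = make_data(n);
    std::int64_t unsorted_sum = 0;
    std::int64_t sorted_sum = 0;
    double unsorted_s = time_sum(data, threshold, unsorted_sum);
    std::sort(data.begin(), data.end());
    double sorted_s = time_sum(data, threshold, sorted_sum);
    assert(unsorted_sum == sorted_sum);  // sorting must not change the result
    return {unsorted_s, sorted_s};
}
```

Calling `compare_sorted_vs_unsorted` with a few million elements and printing both fields is enough to try it out. Whether the sorted run actually comes out faster depends on the compiler and flags: with optimizations on, the branch may be turned into a conditional move or vectorized, in which case the difference can disappear entirely.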
And there are many more examples like that. Did you know that CPU caches are basically hash tables keyed by memory address, and that your program can degrade their performance if it manages to generate hash collisions?
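The hash table analogy can be sketched with a toy model of a set-associative cache. The geometry below (64-byte lines, 64 sets, 8 ways, which adds up to a 32 KiB cache) is an assumption picked for illustration, and `CacheGeometry`, `set_index` and `distinct_sets` are hypothetical names. Real CPUs may use more complicated index functions, especially for the last-level cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Assumed geometry of a hypothetical L1 data cache: 64 sets * 8 ways * 64 B = 32 KiB.
struct CacheGeometry {
    std::size_t line_bytes = 64;
    std::size_t ways = 8;
    std::size_t sets = 64;
};

// The "hash function" of this model: drop the offset within the line,
// then keep only the low bits that select the set (the "bucket").
std::size_t set_index(std::uintptr_t address, const CacheGeometry& g) {
    assert(g.line_bytes > 0 && g.sets > 0);
    return (address / g.line_bytes) % g.sets;
}

// Counts how many distinct sets are touched by `count` accesses that are
// `stride_bytes` apart, starting at `base`.
std::size_t distinct_sets(std::uintptr_t base, std::size_t stride_bytes,
                          std::size_t count, const CacheGeometry& g) {
    std::set<std::size_t> seen;
    for (std::size_t i = 0; i < count; ++i) {
        seen.insert(set_index(base + i * stride_bytes, g));
    }
    return seen.size();
}
```

In this model, 16 accesses with a stride of 4096 bytes (64 sets times 64 bytes) all land in the same set, which only has room for 8 lines, so they keep evicting each other. Changing the stride to 4160 bytes spreads the same 16 accesses over 16 different sets. That is the "hash collision" in question: a power-of-two stride that happens to match the cache geometry.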
The performance nerd in me is so happy.