Hardware Effects

Yesterday I attended a C++ meetup that the Avast folks regularly organize in their Prague offices. There was a great talk by Jakub Beránek, a talented CS student from Ostrava, about observing the performance effects that the internal CPU architecture has on your code.

In this awesomely organized and documented repo, there are C++ benchmark programs that demonstrate things like the following:

  • Branch misprediction: if the CPU is able to predict the result of an if condition correctly, the code runs much faster. This is also the topic of the most upvoted Stack Overflow question ever.
  • False sharing: if two threads write to memory locations that are close enough to be in one cache line, every write invalidates the other core's copy of that line, so the line keeps bouncing between the cores' caches and the program slows down a lot. This is one of the reasons why immutable data structures can produce faster programs, even if they do more memory allocations and copying.
  • Bandwidth saturation: there is a limit on how fast data can be transferred between memory and the CPU. Spawning more processes and threads, even if they don’t share memory at all, can eventually saturate the bandwidth and slow your program down.
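To make the first bullet concrete, here is a minimal sketch of the classic sorted-versus-shuffled experiment. It is not taken from the repo; the function names (sum_at_least, make_random_bytes, time_sum_ms, measure), the seed and the threshold of 128 are all my own assumptions. The idea is that the same loop sees the same values either way, but on sorted input the if flips only once, so the predictor is almost always right.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sums every element >= threshold. The `if` is the branch under test.
std::int64_t sum_at_least(const std::vector<int>& data, int threshold) {
    std::int64_t sum = 0;
    for (int value : data) {
        if (value >= threshold) {
            sum += value;
        }
    }
    return sum;
}

// Fills a vector with pseudo-random values in [0, 255] from a fixed seed
// (the seed and the range are arbitrary choices for this sketch).
std::vector<int> make_random_bytes(std::size_t count, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> data(count);
    for (int& value : data) {
        value = dist(rng);
    }
    return data;
}

// Times a single call of sum_at_least and returns elapsed milliseconds.
double time_sum_ms(const std::vector<int>& data, int threshold,
                   std::int64_t& result) {
    auto start = std::chrono::steady_clock::now();
    result = sum_at_least(data, threshold);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct Timings {
    double shuffled_ms;
    double sorted_ms;
};

// Runs the same loop over shuffled and sorted copies of the same data.
Timings measure(std::size_t count) {
    std::vector<int> shuffled = make_random_bytes(count, 42);
    std::vector<int> sorted = shuffled;
    std::sort(sorted.begin(), sorted.end());

    std::int64_t shuffled_sum = 0;
    std::int64_t sorted_sum = 0;
    Timings t{};
    t.shuffled_ms = time_sum_ms(shuffled, 128, shuffled_sum);
    t.sorted_ms = time_sum_ms(sorted, 128, sorted_sum);
    assert(shuffled_sum == sorted_sum);  // same elements, same total
    return t;
}
```

Calling measure with a few million elements from main and printing both fields should show the gap, but only if the compiler keeps the branch. With optimizations on, it may replace the if with a conditional move or vectorize the loop, and then the two timings come out about the same. A single timed run is also noisy, so repeat it a few times before believing the numbers.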

And there are many more examples like that. Did you know that CPU caches are basically hash tables and that your program can degrade their performance if it manages to generate hash collisions?
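The hash table analogy can be sketched in a few lines. This is a simplified model, not how any particular CPU is documented to work: the geometry (32 KiB, 8-way, 64-byte lines) is a hypothetical example, the names CacheGeometry, set_index and conflict_stride are mine, and some real caches index by physical address or use more complicated index functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical cache geometry, chosen for illustration only:
// 32 KiB, 8-way set associative, 64-byte lines.
struct CacheGeometry {
    std::size_t size_bytes = 32 * 1024;
    std::size_t ways = 8;
    std::size_t line_bytes = 64;

    // Number of sets, i.e. the number of "buckets" in the hash table.
    std::size_t set_count() const {
        assert(ways > 0 && line_bytes > 0);
        return size_bytes / (ways * line_bytes);
    }
};

// The "hash function" of this simplified model: drop the offset within the
// line, then take the line number modulo the number of sets.
std::size_t set_index(std::uintptr_t address, const CacheGeometry& cache) {
    return (address / cache.line_bytes) % cache.set_count();
}

// Distance in bytes between addresses that land in the same set.
std::size_t conflict_stride(const CacheGeometry& cache) {
    return cache.set_count() * cache.line_bytes;
}
```

With this geometry there are 64 sets and the conflict stride is 4096 bytes. A loop that touches nine or more addresses spaced 4096 bytes apart keeps hitting one set that only has room for eight lines, so lines get evicted even though the rest of the cache sits empty. That is the "hash collision" case.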

The performance nerd in me is so happy.

Optimizing JavaScript code for JITs

Here is an awesome article about optimizing JavaScript code for JIT compilers. It describes techniques for finding out where V8 spends its time, and most of them are surprisingly accessible even to someone with little optimizing-compiler experience.

Then Shu-yu Guo retweeted it with a link to their paper about Optimization Coaching in devtools. There is an experimental and unfinished feature (hidden under the devtools.performance.ui.show-jit-optimizations pref) in Firefox that implements these ideas.