If you have ever skimmed Linux kernel code, you may have noticed that conditional expressions are often marked as either 'likely' or 'unlikely':
if (likely (i == 0)) {
    ...
}
What is it good for?
These macros expand to GCC's __builtin_expect built-in:
#define likely(x) __builtin_expect ((x), 1)
#define unlikely(x) __builtin_expect ((x), 0)
These in turn tell the compiler which branch of the program should be fast and which one may be allowed to execute more slowly.
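Here's a minimal self-contained sketch of how they are used outside the kernel (assuming GCC or Clang; the function is made up for illustration). Note that the kernel's actual definitions use __builtin_expect (!!(x), 1) to normalise the condition to 0 or 1:

#include <stdio.h>

#define likely(x) __builtin_expect (!!(x), 1)
#define unlikely(x) __builtin_expect (!!(x), 0)

int process_request (int err)
{
    if (unlikely (err)) {
        /* Cold path: the compiler may move this out of line. */
        fprintf (stderr, "request failed: %d\n", err);
        return -1;
    }
    /* Hot path: falls through with no jump taken. */
    return 0;
}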
To understand how it works we need to understand a few details about modern CPUs. Most importantly, that accessing memory is extremely slow compared to other CPU operations. An operation requiring memory access can literally take 100 times longer to execute than an operation using only CPU registers.
To mitigate the problem, CPU designers have come up with the concept of CPU caches. The idea is that the part of memory currently in use is copied from main memory into the CPU itself. That way you gain an order of magnitude improvement in performance.
The obvious problem is that the size of the CPU cache is limited. You can't copy the entire memory into it. So, the CPU has to guess which memory is going to be used in the near future and load it into the cache. The algorithms for doing that can be as simple as 'memory that was used most recently is the most likely to be used in the future' or more sophisticated, like 'if the program is using memory at address X now, it is likely to use memory at address X+1 shortly'.
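To get a feel for the 'address X+1' heuristic, here's a hypothetical sketch that is easy to benchmark yourself: summing a matrix row by row walks memory sequentially and plays along with the prefetcher, while summing it column by column keeps jumping around and defeats it:

#define N 1024

static int m [N][N];

long sum_by_rows (void)
{
    long s = 0;
    int i, j;
    /* Accesses addresses X, X+4, X+8, ... -- easy to prefetch. */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            s += m [i][j];
    return s;
}

long sum_by_columns (void)
{
    long s = 0;
    int i, j;
    /* Each access jumps N * sizeof (int) bytes -- frequent cache misses. */
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            s += m [i][j];
    return s;
}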
Whatever the case, a failure to predict which memory is going to be used results in a so-called "cache miss", and an operation that would otherwise take 1 nanosecond suddenly takes 30 nanoseconds. Consequently, most of the micro-optimisation work done today deals with avoiding cache misses.
Another piece of the puzzle is the fact that the algorithms themselves are stored as data. The machine instructions reside in memory and have to be accessed the same way as the data in text documents, images, databases etc. What that means is that if a program jumps to an address that is not currently in the CPU cache, execution stalls for, say, 30 nanoseconds, until the next instruction is loaded from memory.
Now, what if the jump is conditional? Obviously, one path simply continues executing the subsequent instructions, which happen to be already loaded in the CPU cache. The other path jumps to a different memory location and thus risks a cache miss and the associated performance penalty.
In short, when there's a conditional jump (an if-then-else statement in high-level languages) there will necessarily be a "slow" path and a "fast" path. And it's up to you, as a programmer, to decide which code to put on the fast path and which should go on the slow path.
Imagine this piece of code:
if (nuclear_reactor_leaking)
    goto emergency;
/* Do routine maintenance here. */
...
emergency:
/* Start emergency procedure here. */
...
When the nuclear reactor is leaking, the execution jumps to a different memory location and is likely to experience a cache miss. Thus, handling of the emergency condition is delayed by 30 nanoseconds.
On the other hand, if the nuclear reactor is working as expected, the routine maintenance executes extremely fast, with no cache misses.
Obviously, this is not the desired behaviour. We should re-write the above code in the following manner:
if (!nuclear_reactor_leaking)
    goto maintenance;
/* Start emergency procedure here. */
...
maintenance:
/* Do routine maintenance here. */
...
However, even in this case the compiler is free to re-order the layout of the code in memory. Such re-ordering is often done during the optimisation phase of compilation. This is where the likely() and unlikely() macros come to help: they instruct the compiler on how to lay out the branches in memory. We can re-write the above program this way:
if (likely (nuclear_reactor_leaking)) {
    /* Start emergency procedure here. */
    ...
}
else {
    /* Do routine maintenance here. */
    ...
}
So far so good.
However, have a look at the last code sample once again. Have you noticed anything strange?
Well, nuclear reactor leakage is not likely. Just the opposite. It is extremely rare. And yet we are supposed to mark it as "likely". Yuck!
And this is the point I want to make in this article: likely() and unlikely() macros have stupid names. Actually, the names are seriously misleading.
I've seen people reasoning this way: "This path is going to be executed in 90% of cases, so let's mark it as likely." While the reasoning may work well in most cases, applying it to our nuclear reactor example means doing the exact opposite of what's needed.
I've even seen people reason like this: "This branch is short and it will execute quickly. Let's mark it as likely." Once again, this may work well in most cases, but won't work for the nuclear reactor example. However short and efficient the maintenance procedure is, it should never be marked as likely. It's always the emergency procedure that should be tagged as likely, irrespective of how long and slow it happens to be.
In short, when you are thinking about whether to mark a branch as likely or unlikely, you should not ask about its existing performance characteristics (How often does it get called? How long does it take to execute? Etc.) but rather ask yourself about your intended performance characteristics (Do I want this branch to be fast? Can I tolerate a little delay in this branch? Etc.)
The rule of thumb is: mark the branch that you want to execute quickly as "likely" and the other branch as "unlikely".
You can even formalise this rule in the following way:
#define accelerate(x) likely(x)
#define decelerate(x) unlikely(x)
And our nuclear reactor example would now look like this:
if (accelerate (nuclear_reactor_leaking)) {
    /* Start emergency procedure here. */
    ...
}
else {
    /* Do routine maintenance here. */
    ...
}
Makes more sense now, doesn't it?
Martin Sústrik, June 22nd, 2012
…and now every condition is tagged with accelerate() in the hope of making it execute faster. :)
How about prioritize() ? Sounds appropriate. Not sure what the opposite should be though.
The trouble is the if statement without an else clause. Then the likely/unlikely doesn't even make sense. Maybe a construct that enforces the if-then-else structure would help? fast_if (x) … else_slow … (the point would be to make compilation fail if the else_slow clause were not present)
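A rough sketch of such a construct, building on the macros above (hypothetical): fast_if opens a do-statement that only else_slow closes, so leaving the else clause out fails to compile:

#define fast_if(x) if (likely (x)) { do
#define else_slow while (0); } else

fast_if (nuclear_reactor_leaking) {
    /* Start emergency procedure here. */
} else_slow {
    /* Do routine maintenance here. */
}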
Your reasoning is wrong.
if( unlikely( … ) ) { … } makes perfect sense.
The if branch can be scheduled at the end of the function. Either it returns, calls an unwind thunk, or jumps back to after the conditional check.
Oops. You are right.
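For illustration, a sketch of what such a layout looks like at the source level (the function and type names are made up):

int read_packet (struct conn *c)
{
    if (unlikely (c->error))
        goto err;           /* cold: taken only on error */
    return process (c);     /* hot: straight-line fall-through */
err:
    /* The compiler is free to place this block after the normal
       return path, out of the way of the hot code. */
    log_error (c);
    return -1;
}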
You are missing (or maybe purposely hiding) a fundamental issue that these macros try to solve: branch prediction. Modern CPUs use pipelining. When the CPU predicts the wrong branch it must flush its pipeline. That's pretty bad in terms of performance. With these macros the compiler can add hints for the CPU.
The whole cache thing is mostly hidden by instruction cache prefetching done by the CPU itself. Except if you write very sparse code…
Yes, but does that break the argument above? The goal is not to tag the code as "probable" or "improbable", but rather as "I want this branch to be faster than the alternative branch".
The most probable path may well not be on the critical path, and thus its performance is irrelevant. The least probable one, on the other hand, can be directly on the critical path and should be optimised as much as possible.
Sure, I'm not saying otherwise, and that is a good argument.
I was just pointing out that the underlying performance issue is more related to branch mispredictions than cache misses.
I'm not so sure.
I think the point of using __builtin_expect is branch prediction, as stated above by jclairem.
And the problem here is not only the cache misses the code can cause, but also the cycles wasted executing the branch marked as likely (or accelerated) when it is not the most probable branch. Even though you might want to optimize an unlikely branch because its execution time is critical, if you mark it as "likely" the CPU will start processing that branch every time (or most of the time) the function is called. The speculative execution will then have to be thrown away and the CPU cycles will be wasted. The most probable branch (the one marked as "unlikely" to optimize the other) will then have to be executed after all, and so the pipeline is slowed down by the branch misprediction.
Furthermore, the CPU has a branch prediction buffer (or something like that) so, if this function is called often, it will "know" that the most probable branch (marked as "unlikely") is, in fact, the "likely" branch, making the CPU hint totally useless. In case it is not called so often, a branch misprediction will occur for every call.
My apologies for the confusion created by the words [un]likely and most/less probable. It wasn't that easy to put. :P
You are right that CPU cycles will be wasted. However, the point is that if you are not on a critical path you don't care. No time-sensitive computation is going on anyway.
As for the prediction buffer, I guess there's nothing you can do about it, but even in such case the cache miss argument holds.
I think that to avoid cache misses you can just put the code right after the if condition. The compiler will usually keep the code in that order (first the if block, then the else block; it is still free to reorder individual instructions), and so the "likely" hint will have no effect whatsoever.
I'm not an expert, I'm just thinking out loud. :)
My point here is that I am not so sure about using the __builtin_expect hint for this kind of optimization. But, of course, I might be wrong! ;)
In general you wouldn't want to accelerate a reactor leak more than you consider it to be likely…
Just saying :).
To agree with others, the issue here is the branch prediction penalty, especially on CPUs with speculative execution (Pentium Pro onwards for x86). These can happily execute a whole sequence of operations speculatively (that is, based on the expectation that a specific branch will be taken) without committing anything back, like a write to memory.
Speculation is a key way to get performance even in code with for loops and branches. Instruction cache (I-Cache) misses are generally irrelevant, at least for short if/then/else clauses, because cache line length means that both paths are likely to be nearby.
CPUs usually default to the sequential code (the if () { here } clause), but maintain a Branch Target Buffer, which remembers the last outcome of a branch and takes that one again. They hash the address, so there's always a risk of a misprediction when branches are met again, but generally it works well. Worth noting is that in a for () loop they invariably mispredict the final outcome, when the loop finishes, because on every previous iteration they took the branch.
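For instance, in a trivial loop like this one (a made-up sketch), the backward branch is taken on every iteration but the last, so a predictor that simply repeats the previous outcome mispredicts exactly once, at loop exit:

long sum (const int *a, int n)
{
    long total = 0;
    int i;
    for (i = 0; i < n; i++)
        total += a [i];    /* branch taken n-1 times, then mispredicted
                              once when the loop finally exits */
    return total;
}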
The likely() clause is presumably adding some hint to the CPU to say "this is the branch that's taken, if you don't have any memory of the previous outcome of this branch". Where it could be useful is with unlikely() and debug statements:
if (unlikely(level==DEBUG)) logDebug("core temperature: %s", reactor->temp);
when debug is disabled, the CPU will skip building the string, so the default production path doesn't waste cycles, not even on the first pass through the code.
For further reading, I'd recommend Hennessy and Patterson, Computer Architecture: A Quantitative Approach, particularly for its coverage of CPU concurrency in general; you'll never look at the volatile keyword in the same light ever again.
Unless of course the debug level can be altered at runtime, in which case marking all the debug statements as unlikely would be a very bad idea indeed.
I think that you missed the point of those built-ins. Their names are perfectly valid - all they do is give the compiler a hint as to whether it should expect the branch to be taken or not. It is done solely for branch prediction (as stated in the docs - https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html).
The problematic thing is that this is often hard to predict, and with the built-in hints you can actually make things worse if you predict wrong. But in tight loops where you have some inside knowledge about the branches it can help.
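For instance, a hypothetical sketch: a tight parsing loop where the data format guarantees that the 0xFF escape byte is rare, so the hint matches reality:

#include <stddef.h>

#define unlikely(x) __builtin_expect ((x), 0)

unsigned checksum (const unsigned char *buf, size_t n)
{
    unsigned sum = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        if (unlikely (buf [i] == 0xFF)) {
            i++;            /* rare escape byte: skip the byte it escapes */
            continue;
        }
        sum += buf [i];
    }
    return sum;
}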