for (int i = 0; i < 1000; i++) { small_numbers[smlen] = numbers[i]; smlen += (nu...

edelind · 2026-06-05T07:59:45 1780646385

Here is another perspective:

- the first one (branchless) uses the condition to SAVE the correct value (< 500): it tentatively writes every current value to index smlen, always overwriting the previous tentative value, and effectively saves it (by moving smlen forward by one) only when the value is right (a small number). Downside of this simple function: the slot just past the last saved value may hold a number that is 500 or bigger

- the second one uses the condition to ADD the value, only when it is 100% sure it is a correct small number

bhaak · 2026-06-05T08:20:23 1780647623

They don't. After running, for the values in small_numbers from 0 to smlen-1 they are equivalent.

But if the last value of numbers[] is not smaller than 500, small_numbers[smlen] will contain that value for the first version whereas the second version does not write to small_numbers[smlen].
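A minimal sketch of the two versions side by side, to make the difference concrete. The loop at the top of the thread is truncated, so the `smlen += (numbers[i] < 500)` line is reconstructed from the discussion, and the function names and signatures are hypothetical:

```c
#include <stddef.h>

/* Branchless form (reconstructed, see assumption above): always store,
   then advance the index only when the value is small. */
size_t filter_branchless(const int *numbers, size_t n, int *small_numbers) {
    size_t smlen = 0;
    for (size_t i = 0; i < n; i++) {
        small_numbers[smlen] = numbers[i];   /* unconditional store */
        smlen += (numbers[i] < 500);         /* adds 1 or 0 */
    }
    return smlen;
}

/* Branchy form: store only when the value is known to be small. */
size_t filter_branchy(const int *numbers, size_t n, int *small_numbers) {
    size_t smlen = 0;
    for (size_t i = 0; i < n; i++) {
        if (numbers[i] < 500) {
            small_numbers[smlen++] = numbers[i];
        }
    }
    return smlen;
}
```

With the input {10, 900, 20, 700} both return 2 and agree on indices 0 and 1; only the branchless form leaves 700 in small_numbers[2].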

addaon · 2026-06-05T06:21:29 1780640489

> "these two loops compute the same value"

At what sequence point? The branchless version writes to small_numbers[smlen], for any given value of smlen, potentially more than once; so there are observable points in time during the loop where the behavior is different. But after the loop, both contain the final write to small_numbers[i] for all 0 <= i < smlen; and the transient writes don't change observed external behavior, and are apparently cheaper than fewer but conditional writes.

kleiba2 · 2026-06-05T11:30:22 1780659022

I think the small_numbers array would differ after the end of the loop if, for instance, numbers contained only numbers >= 500. Am I wrong?

Ukv · 2026-06-05T11:46:58 1780660018

smlen would be 0 for both if there are no small numbers, so end result of both is an empty array.

For the first version small_numbers[0] will contain an arbitrary value at the end, and for the second version it happens to contain the last number read, but that address is outside of the 0-length array being returned.

davrosthedalek · 2026-06-05T11:44:47 1780659887

The point is that you should look only at the first smlen entries, which would be 0 for this case.

peterfirefly · 2026-06-07T13:45:23 1780839923

The latter only adds small numbers to the small_numbers[] array.

The former preliminarily adds all numbers to the array but only keeps the small ones.

As long as you don't look at the small_numbers[smlen] element after the loop, they behave identically.

teo_zero · 2026-06-05T06:22:37 1780640557

Writing to array[n] and not incrementing n means that the value just written is outside the "useful" range (from 0 to n-1) and will not be considered (it will be overwritten the next iteration).

zelphirkalt · 2026-06-05T06:35:03 1780641303

I am rather thinking, if one is so much faster, and they are truly equal, why is the compiler too stupid to convert one into the other?

br121 · 2026-06-05T08:47:09 1780649229

It doesn't convert bogosort into heapsort either, despite the second being much faster than the first. I'm guessing that it's not that easy going from one to the other because the only thing they have in common is the output (and only after you have checked the last value), so if the transformation is not hard-coded into the compiler, the odds of it randomly discovering the optimization are close to zero.

zelphirkalt · 2026-06-05T09:00:54 1780650054

Yeah, I would expect such transformations to be implemented as optimizations. Just like maybe (the admittedly simpler):

    (+ ((lambda () 1)) ((lambda () 1))) -> (+ 1 1)

A syntactical transformation, where it is possible as an equivalent transformation.

I may be overlooking special cases, but I thought the compiler was smart enough to infer that the array elements are integers and that `<` will result in a boolean, which is just `0` or `1`, and to understand that having only the `if` without an `else` branch is equivalent in this case. Guess I was wrong and the compiler is not sophisticated in this specific way.

djray · 2026-06-05T23:38:23 1780702703

They aren't equal, as the faster version does an unconditional memory write instead of only writing to the array if the condition is satisfied. The compiler is strictly forbidden from turning a conditional write into an unconditional one.
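One way to see why a compiler generally cannot invent such a store: the destination may only be valid when the condition holds. A small sketch, with a hypothetical helper name, where the caller is allowed to pass NULL whenever the value is not small:

```c
#include <stddef.h>

/* Hypothetical helper: dst is only required to be a valid pointer
   when value < 500. The store happens only under that condition. */
void store_if_small(int *dst, int value) {
    if (value < 500) {
        *dst = value;   /* conditional store */
    }
}
```

Calling store_if_small(NULL, 900) is well defined here because no store is executed. If the store were hoisted out of the `if`, the same call would dereference a null pointer. Stores that another thread could observe raise a similar concern.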

peterfirefly · 2026-06-07T13:42:38 1780839758

The Linux kernel had problems years ago when gcc started to do exactly that in certain cases (because it screwed things up with task switches, interrupts, and SMP). It fairly quickly afterwards either stopped doing it entirely or got a switch that would stop it from doing it. Don't remember which.

flohofwoe · 2026-06-05T07:36:26 1780644986

The two code snippets do different things, apples and oranges... e.g. the array modification in the second example needs to move in front of the if for the two snippets to behave identically. I bet then the compiler output is the same with -O1 or higher.

PS: e.g. note how bla() (first code snippet) and blob() (fixed second code snippet) have identical output (both are turned into the same 'branchless' code via a conditional 'setl' instruction), but the blub() function (original second code snippet) differs because that function has different behaviour:

https://www.godbolt.org/z/h9Kfbn5bc

TL;DR: most 'branchless advice' that only tinkers with language features (like "x = a ? b : c" instead of an if) is useless because to the optimizer passes both are the same thing (a condition).

When there's a difference in the generated code then it's usually a bug and the before-after code are not actually equivalent (like in the code examples above).
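For reference, a sketch of what "moving the array modification in front of the if" could look like. This is not the code behind the godbolt link; the function name and signature are assumptions made for illustration:

```c
#include <stddef.h>

/* Store first, then branch only on the increment. The memory effects
   now match the branchless form: every element is written to
   small_numbers[smlen], and smlen advances only for small values. */
size_t filter_store_then_branch(const int *numbers, size_t n,
                                int *small_numbers) {
    size_t smlen = 0;
    for (size_t i = 0; i < n; i++) {
        small_numbers[smlen] = numbers[i];   /* unconditional store */
        if (numbers[i] < 500) {
            smlen++;                         /* conditional increment */
        }
    }
    return smlen;
}
```

Since the store no longer depends on the condition, the only thing left under the `if` is an increment, which an optimizer is free to express with or without a branch.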

peterfirefly · 2026-06-07T13:56:50 1780840610

I played a bit with your link. It depends on the compiler. x86_64 clang trunk indeed compiles the original first and your fixed second form to the exact same code. I tried a couple of msvc and gcc versions and they did not but they all made them both branchless.

gblargg · 2026-06-05T05:51:57 1780638717

It only increments if the number was less than 500, effectively just saving the ones less than 500.

jcul · 2026-06-05T06:43:17 1780641797

First version has a side effect of writing to small_numbers[0] always.

The compiler probably can't optimize that in the second version.

If it wrote unconditionally and incremented only in the if then I'd guess they would compile to the same thing.

defrost · 2026-06-05T05:53:34 1780638814

numbers[i] < 500

is a conditional (true or false) that evaluates to 1 or 0 (in C)

Therefore smlen has either a 0 or a 1 added to its value, which is equivalent to only adding 1 if true.
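A tiny sketch of that idea in isolation (the function name is made up for illustration): in C a relational expression has type int and value 0 or 1, so it can be added directly to a counter.

```c
/* Counts the values below 500 without an explicit if:
   (numbers[i] < 500) is an int that is either 0 or 1. */
int count_small(const int *numbers, int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        count += (numbers[i] < 500);
    }
    return count;
}
```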