Skip to content

Hackers Delight

Branch-free code

In many computers, branch slow down instruction fetching and inhibit executing instructions in parallel

  • Example
if (data[c] >= 128)
{
    sum += data[c];
}

the above code can be replace with

int t = (data[c] - 128) >> 31;
sum += ~t & data[c];

int t will get the sign of data[c] = 128, since in int, the signed bit is stored at 32 bit

assembly comparison : https://godbolt.org/z/czrE4PhrY

  • Turn off the rightmost bit
x & (x-1)
  • Turn on the rightmost bit
x | (x+1)
  • Turn off the trailing 1’s

can be tested if a unsigned integer is form of 2^n-1, 0 or all 1’s

x & (x+1)
  • Turn on the trailing 0’s in at word
x | (x-1)
  • create a single 1 bit at the position of the rightmost 0-bit in x
~x & (x+1)
  • create a single 0-bit at the position of the rightmost 1-bit in x
~x | (x-1)
  • create a word with 1’s at the position of the trailing 0’s in x
~x & (x-1)
~(x | -x)
(x & -x) -1
  • isolate the rightmost 1-bit
x & (-x)

To determine whether or not a given function can be implemented with a sequence of add ’s, subtract’s, and ’s, or’s, and not’s

THEOREM. A function mapping words to words can be implemented with word-parallel add, subtract, and, or, and not instructions if and only if each bit of the result depends only on bits at and to the right of each input operand .

  • Absolute value
y = x >> 31
x - (2x & y)