K&R C Exercise 2-9: Count Set Bits Using x &= (x-1)

K&R C Exercise 2-9 — Faster bitcount Using x &= (x-1)

Exercise 2-9: In a two’s complement number system, x &= (x-1) deletes the rightmost 1-bit in x. Explain why. Use this observation to write a faster version of bitcount.

Why Does x &= (x-1) Delete the Rightmost 1-Bit?

This is the conceptual heart of the exercise, and it follows directly from how two’s complement subtraction works.

Consider any unsigned integer x with its rightmost 1-bit at some position k:

x     = ...abc 1 0000   (bit k is 1; bits below k are all 0)

When you subtract 1, a borrow propagates through all the trailing zeros up to bit k:

x - 1 = ...abc 0 1111   (bit k becomes 0; bits below k become all 1)

Now AND the two values:

x       = ...abc 1 0000
x - 1   = ...abc 0 1111
x&(x-1) = ...abc 0 0000

Bits above k (the abc part) are the same in both, so they survive. Bit k is 1 in x and 0 in x-1, so AND gives 0 — the rightmost 1-bit is cleared. Bits below k are 0 in x and 1 in x-1, so AND gives 0 — but they were already 0 in x, so nothing changes there. Net effect: exactly the rightmost 1-bit of x is deleted.

Concrete trace with x = 176 = 0b10110000

Iteration 1:
  x         = 1011 0000   (176)
  x - 1     = 1010 1111   (175)
  x &= x-1  = 1010 0000   ← bit 4 cleared  (160)

Iteration 2:
  x         = 1010 0000   (160)
  x - 1     = 1001 1111   (159)
  x &= x-1  = 1000 0000   ← bit 5 cleared  (128)

Iteration 3:
  x         = 1000 0000   (128)
  x - 1     = 0111 1111   (127)
  x &= x-1  = 0000 0000   ← bit 7 cleared  (0)

x == 0, loop ends.  Count = 3.

The loop ran 3 times because 176 has exactly 3 set bits. Compare that to the naive version, which would loop 8 times (once per bit position) on an 8-bit value, or 32 times on a 32-bit system. The faster version scales with the number of set bits, not with the word width.

C Source Code

/* K&R Exercise 2-9 — faster bitcount using x &= (x-1)
 * Compile: gcc -ansi -Wall ex2-9.c -o ex2-9 */
#include <stdio.h>

int bitcount(unsigned x);
int bitcount_naive(unsigned x);  /* original K&R version for comparison */

int main(void)
{
    printf("bitcount(0)   = %d\n", bitcount(0));
    printf("bitcount(1)   = %d\n", bitcount(1));
    printf("bitcount(7)   = %d\n", bitcount(7));
    printf("bitcount(176) = %d\n", bitcount(176));  /* 10110000 = 3 bits */
    printf("bitcount(~0U) = %d\n", bitcount(~0U));
    return 0;
}

/* Fast: loops once per set bit */
int bitcount(unsigned x)
{
    int b;
    for (b = 0; x != 0; ++b)
        x &= x - 1;
    return b;
}

/* Naive K&R original: loops once per bit position */
int bitcount_naive(unsigned x)
{
    int b;
    for (b = 0; x != 0; x >>= 1)
        if (x & 01)
            ++b;
    return b;
}

Compile and Run

gcc -ansi -Wall ex2-9.c -o ex2-9 && ./ex2-9

Sample Output

bitcount(0)   = 0
bitcount(1)   = 1
bitcount(7)   = 3
bitcount(176) = 3
bitcount(~0U) = 32

bitcount(7) returns 3 because 7 = 0b111 has three set bits. bitcount(~0U) returns 32 on a typical 32-bit system because all bits are set. The fast version loops exactly as many times as the popcount; the naive version always loops 32 times regardless of how many bits are set.

What This Exercise Teaches

  • Two’s complement borrow mechanism — subtracting 1 borrows from the rightmost 1-bit. Understanding this arithmetic makes many bit-manipulation tricks click into place.
  • Kernighan’s bit-count trick — attributed to Brian Kernighan (one of the K&R authors), this technique is so well-known it appears as a footnote in many algorithms textbooks. The same trick is used in hardware population-count circuits.
  • O(popcount) vs O(wordsize) — for sparse bit patterns (few 1-bits), the fast version is dramatically faster. For dense patterns (many 1-bits, e.g. ~0U), both versions do comparable work. Knowing when an optimisation applies is as important as knowing the optimisation itself.
  • Loop that modifies its own termination conditionx &= x-1 inside the loop body changes x each iteration. The loop counter is not a separate index but rather the count of times the body executed — a functional loop pattern.
  • Modern relevance — today’s compilers and CPUs provide a built-in popcount (__builtin_popcount in GCC, the POPCNT x86 instruction). The exercise teaches the underlying principle so you understand why the hardware instruction exists and what it computes.

Set Up Your C Environment

You can compile and run this code on any system with GCC. If you haven’t set up C yet for Chapter 2:

Book: The C Programming Language, 2nd Ed — Kernighan & Ritchie


← Exercise 2-8
 | 
Chapter 2 Solutions
 | 
Exercise 2-10 →

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>