K&R C Exercise 1-21: entab — Replace Spaces with Tabs

Exercise 1-21. Write a program entab that replaces strings of blanks by the minimum number of tabs and blanks to achieve the same spacing. Use the same tab stops as for detab. When either a tab or a single blank would suffice to reach a tab stop, which should be given preference?

Approach

entab is the inverse of detab (Exercise 1-20): instead of expanding tabs into spaces, it compresses runs of spaces back into tabs wherever possible. The key insight is that you cannot decide how to output a space at the moment you read it. If the run it belongs to ends before the next tab stop, the spaces must stay as spaces; if the run reaches a tab stop, every space buffered so far collapses into a single tab. This means you must hold spaces back in a counter and decide later.

The algorithm tracks two things: the current column (col) and the count of buffered spaces (spaces). For every space read, increment both. If the updated column is a multiple of TABSTOP, the run has just reached a tab stop: emit one \t and reset the space counter to zero, discarding all buffered spaces, because a single tab character reaches the same position. For any other character, flush whatever buffered spaces remain as literal spaces, then emit the character.

The preference question is answered by the code itself: when a single blank would reach a tab stop, the program still emits a tab, not a blank. A tab is one byte and a blank is also one byte, so neither output is shorter. The exercise text does not state an answer. This solution prefers the tab because it lets every run that reaches a tab stop be handled the same way, with no special case for a run of length one.

Solution

/* entab: replace strings of blanks with tabs and blanks */
/* Compile: gcc -ansi -Wall entab.c -o entab              */
#include <stdio.h>

#define TABSTOP 8

int main(void)
{
    int c, col, spaces;

    col    = 0;
    spaces = 0;

    while ((c = getchar()) != EOF) {
        if (c == ' ') {
            ++spaces;
            ++col;
            /* reached a tab stop: one tab covers all buffered spaces */
            if (col % TABSTOP == 0) {
                putchar('\t');
                spaces = 0;
            }
        } else {
            /* flush spaces that did not reach a tab stop */
            while (spaces > 0) {
                putchar(' ');
                --spaces;
            }
            if (c == '\n') {
                putchar(c);
                col = 0;          /* newline resets column to zero */
            } else if (c == '\t') {
                putchar(c);
                /* advance col to next tab stop */
                col = col + (TABSTOP - col % TABSTOP);
            } else {
                putchar(c);
                ++col;
            }
        }
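    }
    /* input ended without a newline: write out blanks still pending */
    while (spaces > 0) {
        putchar(' ');
        --spaces;
    }
    return 0;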
}

How It Works — Step by Step

Consider input where TABSTOP is 8 (the Unix default).

| Scenario | Input (· = space) | Output | Why |
| --- | --- | --- | --- |
| 8 leading spaces | ········hello | \thello | col reaches 8 on the 8th space, so one tab replaces all eight spaces |
| 3 spaces after “hello” | hello···world | hello\tworld | col is 5 after “hello”; the 3rd space brings col to 8, so a tab is emitted and spaces resets to 0 |
| 4 spaces after “hello” | hello····world | hello\t·world | tab emitted at col=8; the 4th space stays buffered and is flushed when ‘w’ arrives |
| 1 space at col 7 | abcdefg·x | abcdefg\tx | col is 7 after “abcdefg”; the single space brings col to 8, so a tab is emitted instead of the blank |
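The scenarios above can be checked mechanically. The sketch below is a hypothetical helper, entab_str, that applies the same col and spaces logic to a string instead of standard input, so each input can be compared with its expected output. The function name and the buffer-based interface are assumptions made for illustration; it also writes out any blanks still pending when the string ends.

```c
#include <stddef.h>

#define TABSTOP 8

/* Hypothetical helper (not part of the original program): applies the
 * entab logic to the string `in` and stores the result in `out`.
 * `out` needs room for strlen(in) + 1 bytes, because the output is never
 * longer than the input. Returns the number of bytes written. */
size_t entab_str(const char *in, char *out)
{
    size_t n = 0;
    int col = 0, spaces = 0;

    for (; *in != '\0'; ++in) {
        if (*in == ' ') {
            ++spaces;
            ++col;
            if (col % TABSTOP == 0) {   /* run reached a tab stop */
                out[n++] = '\t';
                spaces = 0;
            }
        } else {
            while (spaces > 0) {        /* run fell short: keep the blanks */
                out[n++] = ' ';
                --spaces;
            }
            out[n++] = *in;
            if (*in == '\n')
                col = 0;
            else if (*in == '\t')
                col += TABSTOP - col % TABSTOP;
            else
                ++col;
        }
    }
    while (spaces > 0) {                /* blanks pending at end of string */
        out[n++] = ' ';
        --spaces;
    }
    out[n] = '\0';
    return n;
}
```

Calling entab_str("hello    world", buf) with a large enough buf should leave "hello\t world" in buf, matching the third row of the table.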

The Preference Question Answered

K&R asks: when one tab or one blank would both reach the next tab stop, which wins? In this solution, the tab. At column 7 the next tab stop is column 8. One space arrives, col becomes 8, and col % TABSTOP == 0 is true, so the code emits \t and discards the buffered space. The output contains one tab character instead of one space character, so the byte count is the same either way. The tab wins here because the code treats every run that reaches a tab stop uniformly, with no special case for a run of length one.
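For comparison, the opposite policy takes only one extra test. The sketch below is a hypothetical string-based variant, entab_prefer_blank, that writes a blank when the run reaching the tab stop is exactly one space long. The function name and interface are assumptions for illustration, not part of the solution above.

```c
#include <stddef.h>

#define TABSTOP 8

/* Hypothetical variant: same bookkeeping as the tab-preferring logic,
 * except that a lone blank landing on a tab stop is kept as a blank.
 * `out` needs room for strlen(in) + 1 bytes. */
size_t entab_prefer_blank(const char *in, char *out)
{
    size_t n = 0;
    int col = 0, spaces = 0;

    for (; *in != '\0'; ++in) {
        if (*in == ' ') {
            ++spaces;
            ++col;
            if (col % TABSTOP == 0) {
                /* spaces == 1: the run in this tab cell is a single blank */
                out[n++] = (spaces == 1) ? ' ' : '\t';
                spaces = 0;
            }
        } else {
            while (spaces > 0) {
                out[n++] = ' ';
                --spaces;
            }
            out[n++] = *in;
            if (*in == '\n')
                col = 0;
            else if (*in == '\t')
                col += TABSTOP - col % TABSTOP;
            else
                ++col;
        }
    }
    while (spaces > 0) {
        out[n++] = ' ';
        --spaces;
    }
    out[n] = '\0';
    return n;
}
```

With this variant "abcdefg x" comes back unchanged, while "abcdef  x" (two blanks) still becomes "abcdef\tx". The cost is one extra comparison; the benefit is that a lone blank keeps its width if the file is later viewed with different tab stops.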

Compile and Run

gcc -ansi -Wall entab.c -o entab

Test with a string of spaces:

printf "        hello   world\n" | ./entab | cat -A

The -A option of GNU cat shows each tab as ^I and marks each end of line with $, making the whitespace visible.

Expected output:

^Ihello^Iworld$
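If your cat does not support -A, the same view is easy to produce in C. The sketch below is a hypothetical helper, show_whitespace, that handles only the two cases used in this post (tabs and newlines); it does not reproduce the rest of what cat -A does with other non-printing characters.

```c
#include <stddef.h>

/* Hypothetical helper: copy `in` to `out`, writing each tab as "^I" and
 * putting '$' in front of each newline.
 * `out` needs room for up to 2 * strlen(in) + 1 bytes.
 * Returns the number of bytes written. */
size_t show_whitespace(const char *in, char *out)
{
    size_t n = 0;

    for (; *in != '\0'; ++in) {
        if (*in == '\t') {
            out[n++] = '^';
            out[n++] = 'I';
        } else if (*in == '\n') {
            out[n++] = '$';
            out[n++] = '\n';
        } else {
            out[n++] = *in;
        }
    }
    out[n] = '\0';
    return n;
}
```

Given the string "\thello\tworld\n", the helper should produce "^Ihello^Iworld$" followed by a newline, the same text as the expected output above.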

Verify the Inverse Relationship with detab

If you have the detab binary from Exercise 1-20, you can check that detab undoes what entab did:

printf "        hello   world\n" | ./entab | ./detab

Output:

        hello   world

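The original spacing is recovered exactly. This round trip is a useful sanity check rather than a proof: entab compresses, detab expands, and the visual layout is unchanged. The original bytes come back only when the input contained no tabs to begin with, because detab expands those as well.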
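The same check can be sketched inside a single C file. The code below assumes detab expands each tab to blanks up to the next multiple of TABSTOP, which is what Exercise 1-20 asks for. The function names, the string-based interface, and the MAXLINE limit are all hypothetical choices for this sketch.

```c
#include <stddef.h>
#include <string.h>

#define TABSTOP 8
#define MAXLINE 256   /* assumed input limit for this sketch */

/* Assumed detab behavior: expand each tab to blanks up to the next stop.
 * `out` needs room for up to TABSTOP * strlen(in) + 1 bytes. */
static void detab_str(const char *in, char *out)
{
    size_t n = 0;
    int col = 0;

    for (; *in != '\0'; ++in) {
        if (*in == '\t') {
            do {
                out[n++] = ' ';
                ++col;
            } while (col % TABSTOP != 0);
        } else {
            out[n++] = *in;
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    out[n] = '\0';
}

/* String version of the entab logic; `out` needs strlen(in) + 1 bytes. */
static void entab_str(const char *in, char *out)
{
    size_t n = 0;
    int col = 0, spaces = 0;

    for (; *in != '\0'; ++in) {
        if (*in == ' ') {
            ++spaces;
            ++col;
            if (col % TABSTOP == 0) {
                out[n++] = '\t';
                spaces = 0;
            }
        } else {
            while (spaces > 0) {
                out[n++] = ' ';
                --spaces;
            }
            out[n++] = *in;
            if (*in == '\n')
                col = 0;
            else if (*in == '\t')
                col += TABSTOP - col % TABSTOP;
            else
                ++col;
        }
    }
    while (spaces > 0) {
        out[n++] = ' ';
        --spaces;
    }
    out[n] = '\0';
}

/* Returns 1 if entab leaves the expanded layout of `s` unchanged,
 * 0 if it changes it or if `s` is too long for the buffers. */
int layout_preserved(const char *s)
{
    char packed[MAXLINE];
    char before[MAXLINE * TABSTOP], after[MAXLINE * TABSTOP];

    if (strlen(s) >= MAXLINE)
        return 0;
    entab_str(s, packed);
    detab_str(s, before);       /* layout of the original input */
    detab_str(packed, after);   /* layout after entab */
    return strcmp(before, after) == 0;
}
```

Comparing the two expanded forms, rather than comparing the output with the raw input, keeps the check meaningful for input that already contains tabs.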

Sample Output

$ printf "        hello   world\n" | ./entab | cat -A
^Ihello^Iworld$

$ printf "hello    world\n" | ./entab | cat -A
hello^I world$

$ printf "x y z\n" | ./entab | cat -A
x y z$

The third example is unchanged because no run of spaces reaches a tab stop; single isolated blanks are flushed as-is.

What This Exercise Teaches

  • Deferred output (buffering): You cannot emit a space until you know whether its run reaches a tab stop. Accumulating spaces in a counter and flushing them later is a fundamental streaming pattern in C.
  • Column tracking with modular arithmetic: col % TABSTOP == 0 is the usual idiom for detecting tab-stop boundaries. It appears in both detab and entab and is useful in any tool that works with fixed-width tab stops.
  • Inverse program pairs: entab and detab form a compression/expansion pair. Designing them as a pair is good engineering: you can test one against the other and use them together in pipelines.
  • Handling existing tabs in the input: The solution above also handles tab characters in the input by advancing col to the next tab stop rather than treating them as single-column characters. This keeps column tracking correct when the input is a mix of spaces and tabs.
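The two pieces of column arithmetic mentioned in this list can be isolated as small functions. This is a minimal sketch; the names next_tab_stop and at_tab_stop are hypothetical and do not appear in the solution, which writes the expressions inline.

```c
#define TABSTOP 8

/* Column of the first tab stop strictly to the right of `col`.
 * Columns are counted from 0, as in the solution. */
int next_tab_stop(int col)
{
    return col + (TABSTOP - col % TABSTOP);
}

/* Nonzero when `col` sits exactly on a tab stop. */
int at_tab_stop(int col)
{
    return col % TABSTOP == 0;
}
```

For example, next_tab_stop(5) is 8 and next_tab_stop(8) is 16: a tab read at a tab stop still advances a full TABSTOP columns, which is why the solution adds TABSTOP - col % TABSTOP rather than rounding col up.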

Set Up Your C Environment

To compile and run this solution, you need GCC or another C compiler installed.

Book:

The C Programming Language, 2nd Ed — Kernighan & Ritchie