Exercise 1-21. Write a program entab that replaces strings of blanks by the minimum number of tabs and blanks to achieve the same spacing. Use the same tab stops as for detab. When either a tab or a single blank would suffice to reach a tab stop, which should be given preference?
Approach
entab is the inverse of detab (Exercise 1-20): instead of expanding tabs into spaces, it compresses runs of spaces back into tabs wherever possible. The key insight is that you cannot decide what to emit for a space until you see what comes after it. If a run of spaces reaches a tab stop, everything from the start of the run up to that stop can be replaced by a single tab; spaces that end before the next tab stop must stay as spaces. This means you must buffer the spaces and flush them later.
The algorithm tracks two things: the current column (col) and the count of buffered spaces (spaces). For every space read, increment both. If the updated column is a multiple of TABSTOP, the run has just reached a tab stop: emit one \t and reset the space counter to zero, discarding all buffered spaces because a single tab reaches the same visual position. For any other character, flush whatever buffered spaces remain as literal spaces, then emit the character; a newline resets col to zero, and an input tab advances col to the next tab stop.
The preference question is answered by the code itself: when a single blank would reach a tab stop, the program still emits a tab, not a blank. A tab and a blank are each one byte, so the requirement of the minimum number of tabs and blanks does not break the tie. This solution prefers the tab because it communicates the intent more clearly.
Solution
```c
/* entab: replace strings of blanks with tabs and blanks */
/* Compile: gcc -ansi -Wall entab.c -o entab */
#include <stdio.h>

#define TABSTOP 8

int main(void)
{
    int c, col, spaces;

    col = 0;
    spaces = 0;
    while ((c = getchar()) != EOF) {
        if (c == ' ') {
            ++spaces;
            ++col;
            /* reached a tab stop: one tab covers all buffered spaces */
            if (col % TABSTOP == 0) {
                putchar('\t');
                spaces = 0;
            }
        } else {
            /* flush spaces that did not reach a tab stop */
            while (spaces > 0) {
                putchar(' ');
                --spaces;
            }
            if (c == '\n') {
                putchar(c);
                col = 0;    /* newline resets column to zero */
            } else if (c == '\t') {
                putchar(c);
                /* advance col to next tab stop */
                col = col + (TABSTOP - col % TABSTOP);
            } else {
                putchar(c);
                ++col;
            }
        }
    }
    /* input may end without a newline: flush any trailing spaces */
    while (spaces > 0) {
        putchar(' ');
        --spaces;
    }
    return 0;
}
```
How It Works — Step by Step
Consider input where TABSTOP is 8 (the Unix default). Columns are counted from 0, so tab stops fall at columns 8, 16, 24, and so on.

| Scenario | Input (· = space) | Output | Why |
|---|---|---|---|
| 8 leading spaces | ········hello | \thello | col reaches 8 on the 8th space → one tab replaces all eight spaces |
| 3 spaces after “hello” | hello···world | hello\tworld | col is 5 after “hello”; the 3rd space brings col to 8 → tab; spaces reset to 0 |
| 4 spaces after “hello” | hello····world | hello\t·world | tab emitted at col=8; the 4th space stays buffered and is flushed when ‘w’ arrives |
| 1 space at col 7 | abcdefg·x | abcdefg\tx | “abcdefg” leaves col=7; one more space → col=8 → tab preferred over the single blank |
The Preference Question Answered
K&R asks: when one tab or one blank would both reach the next tab stop, which wins? In this solution, the tab. Here is the trace: at column 7 the next tab stop is column 8. One space arrives, col becomes 8, and col % TABSTOP == 0 is true, so the code emits \t and discards the buffered space. The output is one tab where the input had one blank. Both are a single character, so the tab is not more compact; it wins as a convention, because it conveys the intent of reaching a tab stop and needs no special case in the code.
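To check the table rows and both answers to the preference question without a terminal, here is a sketch of the same loop working on C strings instead of getchar/putchar. The function entab_str and its prefer_tab flag are hypothetical additions, not part of K&R or the solution above: a nonzero prefer_tab reproduces the solution's behavior, while zero keeps a lone blank that lands on a tab stop as a blank. For brevity it assumes the input contains no tabs.

```c
#include <assert.h>
#include <string.h>

#define TABSTOP 8

/* Hypothetical helper: entab the NUL-terminated string `in` into `out`.
 * `out` needs room for strlen(in) + 1 bytes (tab-free input never grows).
 * prefer_tab picks the policy when a single blank reaches a tab stop:
 * nonzero emits '\t' (like the solution), zero emits ' '.
 * Assumes `in` contains no tabs. Returns `out`. */
char *entab_str(const char *in, char *out, int prefer_tab)
{
    int col = 0, spaces = 0;
    char *p = out;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; ++in) {
        if (*in == ' ') {
            ++spaces;
            ++col;
            if (col % TABSTOP == 0) {
                /* a lone blank at a stop stays a blank unless tabs are preferred */
                *p++ = (spaces == 1 && !prefer_tab) ? ' ' : '\t';
                spaces = 0;
            }
        } else {
            memset(p, ' ', (size_t) spaces);    /* flush blanks that missed a stop */
            p += spaces;
            spaces = 0;
            *p++ = *in;
            col = (*in == '\n') ? 0 : col + 1;  /* newline resets the column */
        }
    }
    memset(p, ' ', (size_t) spaces);            /* trailing blanks at end of string */
    p += spaces;
    *p = '\0';
    return out;
}
```

For example, entab_str("abcdefg x", buf, 1) yields "abcdefg\tx", while passing 0 for prefer_tab leaves the string unchanged. Runs of two or more blanks that reach a stop become a tab under either policy.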
Compile and Run
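```sh
gcc -ansi -Wall entab.c -o entab
```
Test with a string of spaces:
```sh
printf "        hello   world\n" | ./entab | cat -A
```
The cat -A flag (GNU coreutils) shows tabs as ^I and end-of-line as $, making the output visible. BSD cat on macOS does not accept -A; cat -et prints the same markers there.
Expected output:
```
^Ihello^Iworld$
```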
Verify the Inverse Relationship with detab
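If you have the detab binary from Exercise 1-20 (built with the same 8-column tab stops), you can check that detab undoes entab:
```sh
printf "        hello   world\n" | ./entab | ./detab
```
Output:
```
        hello   world
```
The original spacing is recovered exactly: entab compresses, detab expands, and the visual layout is unchanged. This is a useful check rather than a proof, and it works in one direction only. For input without tabs, detab(entab(x)) reproduces x byte for byte, but entab(detab(x)) can differ from x; for example, a single blank at column 7 is turned into a tab.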
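The same roundtrip can be checked in-process without a second binary. The sketch below is a hypothetical string version of detab (the name detab_str is an assumption, not taken from the Exercise 1-20 solution); expanding entab's expected output from the pipeline above should restore the eight-blank and three-blank gaps.

```c
#include <assert.h>
#include <string.h>

#define TABSTOP 8

/* Hypothetical string version of detab: expand each tab into the blanks
 * needed to reach the next multiple of TABSTOP (columns counted from 0).
 * `out` must hold the expanded text: at most TABSTOP bytes per input
 * byte, plus the terminating NUL. Returns `out`. */
char *detab_str(const char *in, char *out)
{
    int col = 0;
    char *p = out;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; ++in) {
        if (*in == '\t') {
            int n = TABSTOP - col % TABSTOP;    /* blanks to the next stop */
            memset(p, ' ', (size_t) n);
            p += n;
            col += n;
        } else {
            *p++ = *in;
            col = (*in == '\n') ? 0 : col + 1;  /* newline resets the column */
        }
    }
    *p = '\0';
    return out;
}
```

Calling detab_str("\thello\tworld", buf) gives back "        hello   world", the input of the pipeline above, because the first tab expands to 8 blanks and the second, starting at column 13, expands to 3.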
Sample Output
```
$ printf "        hello   world\n" | ./entab | cat -A
^Ihello^Iworld$
$ printf "hello    world\n" | ./entab | cat -A
hello^I world$
$ printf "x y z\n" | ./entab | cat -A
x y z$
```
The third example is unchanged because no run of spaces reaches a tab stop; single isolated blanks are flushed as-is.
What This Exercise Teaches
- Deferred output (buffering): You cannot emit a blank until you know whether its run reaches a tab stop. Accumulating spaces in a counter and flushing them later is a fundamental streaming pattern in C.
- Column tracking with modular arithmetic: col % TABSTOP == 0 is the canonical idiom for detecting a tab-stop boundary, and TABSTOP - col % TABSTOP gives the distance to the next one. These idioms appear in detab, entab, and any tool that must respect fixed-width column grids.
- Inverse program pairs: entab and detab form a compression/expansion pair. Designing them as a matched pair is good engineering: you can test one against the other and use them together in pipelines.
- Handling existing tabs in the input: The solution above also handles tab characters in the input by advancing col to the next tab stop rather than treating them as single-column characters. This keeps column tracking correct when the input is a mix of spaces and tabs.
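The last point has a consequence the solution does not exploit. When an input tab arrives while blanks are buffered, those blanks have not reached a stop (otherwise they would already have become a tab), so the tab lands on the same stop with or without them. The solution flushes them anyway, which is correct but not minimal. The sketch below checks that arithmetic; next_tab_stop and blanks_redundant_before_tab are hypothetical helper names, not part of the solution.

```c
#include <assert.h>

#define TABSTOP 8

/* Column reached by a tab typed at column col (columns counted from 0).
 * This is the expression the solution uses to advance col past an input tab. */
int next_tab_stop(int col)
{
    assert(col >= 0);
    return col + (TABSTOP - col % TABSTOP);
}

/* Nonzero if n blanks starting at column col, followed by a tab, land on
 * the same stop as a tab typed at col directly, i.e. the blanks add
 * nothing to the layout and could be dropped. */
int blanks_redundant_before_tab(int col, int n)
{
    assert(col >= 0 && n >= 0);
    return next_tab_stop(col) == next_tab_stop(col + n);
}
```

A buffered run such as 3 blanks starting at column 2 is redundant before a tab (both reach column 8). A run of 3 blanks starting at column 5 would not be, but the solution never buffers that run, because it reaches column 8 and is replaced by a tab first. In the solution, the refinement would mean discarding the buffered blanks (spaces = 0) instead of flushing them when c == '\t'.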
Set Up Your C Environment
To compile and run this solution, you need GCC installed. If you haven’t set up C on your machine yet:
- Complete C Development Environment Setup
- Install GCC on Windows 11
- Install GCC on macOS
- Install GCC on Ubuntu/Linux
- VS Code for C Programming — recommended editor
Book:
The C Programming Language, 2nd Ed — Kernighan & Ritchie