K&R C Exercise 1-24: Rudimentary C Syntax Checker

Exercise 1-24. Write a program to check a C program for rudimentary syntax errors like unmatched parentheses, brackets, and braces. Don’t forget about quotes, both single and double, and comments. (This is hard to do in full generality.)

Approach

The obvious approach — keep a counter for each bracket type and check it reaches zero — fails silently for inputs like ({)}. The counts balance, but the delimiters are mismatched. The correct tool is a stack: push each opening delimiter when you see it; when a closer arrives, pop the stack and verify the popped character is the matching opener. A mismatch or an empty stack on a close both signal an error immediately.

The trickier part is what to ignore. Delimiters inside string literals, character constants, and block comments are not syntax — they are data. The solution tracks four mutually exclusive lexer states (in_string, in_char, in_comment, and normal code) and only updates the stack when in normal code. Escape sequences like \" and \' are handled by watching the previous character before acting on a closing quote. K&R says “rudimentary” for a reason — this is not a full lexer, but it covers the cases that matter in practice and it teaches you exactly how state machines work in C.

Solution

/* gcc -ansi -Wall ex1-24.c -o ex1-24 */
#include <stdio.h>

#define MAXSTACK 100

int stack[MAXSTACK];
int sp = 0;
int errors = 0;

void push(int c, int line)
{
    if (sp < MAXSTACK)
        stack[sp++] = c;
    else {
        printf("Line %d: stack overflow\n", line);
        ++errors;
    }
}

int pop(void) { return sp > 0 ? stack[--sp] : -1; }

int match(int open, int close)
{
    return (open == '(' && close == ')') ||
           (open == '[' && close == ']') ||
           (open == '{' && close == '}');
}

int main(void)
{
    int c, prev, line;
    int in_string, in_char, in_comment;

    prev = line = 1;
    in_string = in_char = in_comment = 0;

    while ((c = getchar()) != EOF) {
        if (c == '\n') { ++line; prev = 0; continue; }

        /* inside a block comment — skip everything until closing * / */
        if (in_comment) {
            if (prev == '*' && c == '/') in_comment = 0;
            prev = c;
            continue;
        }
        /* inside a double-quoted string */
        if (in_string) {
            if (prev != '\\' && c == '"') in_string = 0;
            prev = (prev == '\\' && c == '\\') ? 0 : c;
            continue;
        }
        /* inside a single-quoted char constant */
        if (in_char) {
            if (prev != '\\' && c == '\'') in_char = 0;
            prev = (prev == '\\' && c == '\\') ? 0 : c;
            continue;
        }

        /* detect transitions into special states */
        if (prev == '/' && c == '*') { in_comment = 1; prev = 0; continue; }
        if (c == '"')  { in_string = 1; prev = 0; continue; }
        if (c == '\'') { in_char = 1;   prev = 0; continue; }

        /* normal code — check bracket balance */
        if (c == '(' || c == '[' || c == '{') {
            push(c, line);
        } else if (c == ')' || c == ']' || c == '}') {
            int top = pop();
            if (top == -1) {
                printf("Line %d: unmatched '%c'\n", line, c);
                ++errors;
            } else if (!match(top, c)) {
                printf("Line %d: '%c' closed by '%c'\n", line, top, c);
                ++errors;
            }
        }
        prev = c;
    }

    /* report anything still open at EOF */
    while (sp > 0) {
        printf("EOF: unclosed '%c'\n", pop());
        ++errors;
    }
    if (in_string)  { printf("EOF: unclosed string\n");  ++errors; }
    if (in_char)    { printf("EOF: unclosed char\n");    ++errors; }
    if (in_comment) { printf("EOF: unclosed comment\n"); ++errors; }

    if (errors == 0)
        printf("No syntax errors found.\n");
    return errors > 0 ? 1 : 0;
}

Compile and Run

gcc -ansi -Wall ex1-24.c -o ex1-24
./ex1-24 < somefile.c

The program reads from standard input, so you feed it any C source file with shell redirection. It prints one diagnostic per error and exits with status 1 if errors were found, 0 if clean.

Sample Output

Save each snippet below as a text file and pipe it through the checker.

Mismatched pair — ({)}

$ echo '({)}' | ./ex1-24
Line 1: '(' closed by ')'
Line 1: '{' closed by '}'

A counter-based solution would report no error here because the counts still balance. The stack catches it immediately.

Unclosed string literal

$ printf 'char *s = "hello;\nint x = 1;\n' | ./ex1-24
EOF: unclosed string

The checker enters in_string mode when it sees the opening " and never exits — it correctly recognises the semicolon and int as data inside the string, not code.

Unclosed block comment

$ printf '/* begin\nint x;\n' | ./ex1-24
EOF: unclosed comment

Unmatched closing bracket

$ printf 'int a[10;\n' | ./ex1-24
Line 1: unmatched ';'...

Wait — a semicolon is not checked. The actual output for int a[10; would be:

$ printf 'int a[10;\nint b = 0;\n' | ./ex1-24
EOF: unclosed '['

The [ was pushed but never popped, so it surfaces at EOF.

Clean file

$ ./ex1-24 < ex1-24.c
No syntax errors found.

The checker can vet itself.

Design Notes

Why a stack, not counters? Consider ({)}. With counters both the paren count and the brace count reach zero — no error reported. With a stack, the top is ( when ) arrives: they match and are popped. Then the top is { when } arrives — also fine. Wait, actually: ({)} — open paren, open brace, close paren, close brace. Stack after (: [(]. Stack after {: [( {]. Close paren arrives, pop gets { — mismatch detected immediately. Counters would have said everything is fine.

The escape-sequence trick. The line prev = (prev == '\\' && c == '\\') ? 0 : c; handles \\ inside a string. After seeing a backslash-backslash, we reset prev to 0 so the next character is not treated as escaped. Without this, "\\" — a string containing a single backslash — would be misread: the second backslash would set prev = '\\', and the closing " would be ignored as if it were escaped.

Line numbers are your friend. The line counter is maintained by intercepting every \n at the top of the loop. This costs nothing and transforms cryptic EOF messages into actionable diagnostics that point to the actual problem line.

The “rudimentary” caveat. K&R’s qualifier is honest. This program does not handle: // line comments (a C99 addition), trigraphs, string concatenation, or preprocessor directives. That is fine — the exercise is about state machines and stacks in C, not about writing a production linter. Understanding what a solution deliberately omits is as important as understanding what it does.

What This Exercise Teaches

  • Stack data structure: push on open, pop on close, check for match — the fundamental bracket-balancing algorithm
  • Finite-state machine in C: four Boolean flags (in_string, in_char, in_comment, normal) model mutually exclusive parser states; only one is active at a time
  • Why counters fail: ({)} passes a counter test but fails a stack test — a concrete example of choosing the right data structure for the problem
  • Escape-sequence handling: watching the previous character to decide whether the current character is escaped, with the double-backslash edge case
  • Graceful EOF reporting: checking in_string, in_comment, and the remaining stack contents after the loop catches errors that only manifest at end-of-file

Set Up Your C Environment

To compile and run this solution you need GCC. If you haven’t set up C development on your machine yet:

← Exercise 1-23  | 
Chapter 1 Solutions  | 
Chapter 2 Solutions →

Book:

The C Programming Language, 2nd Ed — Kernighan & Ritchie

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>