K&R C Programs Exercise 6-4

Exercise 6-4. Write a program that prints the distinct words in its input sorted into decreasing order of frequency of occurrence. Precede each word by its count.

Count words using K&R’s binary tree (Section 6.5), then collect all nodes into an array and sort by count descending using qsort. The key insight: the tree is ideal for counting (O(log n) per insert) but not for sorting by count — converting to an array for qsort is cleaner than rebuilding the tree.

Solution

/* K&R Exercise 6-4 — words sorted by decreasing frequency
 * Compile: gcc -ansi -Wall ex6-4.c -o ex6-4
 * Usage:   ./ex6-4 < input.txt */
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

#define MAXWORD  100
#define MAXNODES 10000   /* max distinct words */

struct tnode {
    char         *word;
    int           count;
    struct tnode *left, *right;
};

static char *my_strdup(const char *s)
{
    char *p = (char *)malloc(strlen(s) + 1);
    if (p) strcpy(p, s);
    return p;
}

static struct tnode *addtree(struct tnode *p, const char *w)
{
    int cond;
    if (p == NULL) {
        p = (struct tnode *)malloc(sizeof(struct tnode));
        p->word  = my_strdup(w);
        p->count = 1;
        p->left  = p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0) {
        p->count++;
    } else if (cond < 0) {
        p->left  = addtree(p->left,  w);
    } else {
        p->right = addtree(p->right, w);
    }
    return p;
}

/* Collect all tree nodes into a flat array via in-order walk */
static struct tnode *arr[MAXNODES];
static int           narr = 0;

static void collect(struct tnode *p)
{
    if (p == NULL) return;
    collect(p->left);
    if (narr < MAXNODES)
        arr[narr++] = p;
    collect(p->right);
}

/* Comparator: sort by count descending; break ties alphabetically */
static int by_count(const void *a, const void *b)
{
    const struct tnode *ta = *(const struct tnode **)a;
    const struct tnode *tb = *(const struct tnode **)b;
    if (tb->count != ta->count)
        return tb->count - ta->count;   /* higher count first */
    return strcmp(ta->word, tb->word);  /* alphabetical on ties */
}

int main(void)
{
    struct tnode *root = NULL;
    char word[MAXWORD];
    int  c;

    /* Simple word reader: extract runs of alphabetic characters */
    while ((c = getchar()) != EOF) {
        if (isalpha(c)) {
            char *w = word;
            int   lim = MAXWORD - 1;
            *w++ = (char)tolower(c);
            while (lim-- > 0 && isalpha(c = getchar()))
                *w++ = (char)tolower(c);
            *w = '\0';
            root = addtree(root, word);
        }
    }

    collect(root);
    qsort(arr, narr, sizeof(arr[0]), by_count);

    {
        int i;
        for (i = 0; i < narr; i++)
            printf("%6d  %s\n", arr[i]->count, arr[i]->word);
    }
    return 0;
}

Compile and Test

gcc -ansi -Wall ex6-4.c -o ex6-4

printf "to be or not to be that is the question to be\n" | ./ex6-4

Sample Output

     3  be
     3  to
     1  is
     1  not
     1  or
     1  question
     1  that
     1  the

be and to each appear 3 times (tied; sorted alphabetically). All remaining words appear once (also sorted alphabetically within the tie).

Design Notes

  • Why collect into an array? A BST is keyed by word, so its in-order traversal gives alphabetical order — not frequency order. Trying to reorder a BST by a different key is more work than flattening it and sorting the array.
  • Tie-breaking — without the alphabetical tiebreaker, output order within tied groups is undefined (depends on tree shape). Explicit alphabetical tie-breaking makes output deterministic.
  • MAXNODES limit — the flat array is pre-allocated. An alternative is to realloc dynamically; for most inputs 10,000 distinct words is sufficient.

What This Exercise Teaches

  • Two-phase algorithm — count with a BST (fast insertion), then sort the flat array (easy frequency sort); each data structure does what it is best at
  • Array of pointers to structs with qsort — the comparator receives const void * pointers to elements of the array; each element is a struct tnode *, so the comparator casts to const struct tnode ** and dereferences once
  • Compound sort keyqsort comparators can implement multi-level sorting by testing secondary keys when the primary key is equal

Set Up Your C Environment

← Exercise 6-3  | 
Chapter 6 Solutions  | 
Exercise 6-5 →

Book: The C Programming Language, 2nd Ed — Kernighan & Ritchie

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>