Exercise 6-4. Write a program that prints the distinct words in its input sorted into decreasing order of frequency of occurrence. Precede each word by its count.
Count words using K&R’s binary tree (Section 6.5), then collect all nodes into an array and sort by count descending using qsort. The key insight: the tree is ideal for counting (O(log n) per insert) but not for sorting by count — converting to an array for qsort is cleaner than rebuilding the tree.
Solution
/* K&R Exercise 6-4 — words sorted by decreasing frequency
* Compile: gcc -ansi -Wall ex6-4.c -o ex6-4
* Usage: ./ex6-4 < input.txt */
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
#define MAXWORD 100
#define MAXNODES 10000 /* max distinct words */
struct tnode {
char *word;
int count;
struct tnode *left, *right;
};
static char *my_strdup(const char *s)
{
char *p = (char *)malloc(strlen(s) + 1);
if (p) strcpy(p, s);
return p;
}
static struct tnode *addtree(struct tnode *p, const char *w)
{
int cond;
if (p == NULL) {
p = (struct tnode *)malloc(sizeof(struct tnode));
p->word = my_strdup(w);
p->count = 1;
p->left = p->right = NULL;
} else if ((cond = strcmp(w, p->word)) == 0) {
p->count++;
} else if (cond < 0) {
p->left = addtree(p->left, w);
} else {
p->right = addtree(p->right, w);
}
return p;
}
/* Collect all tree nodes into a flat array via in-order walk */
static struct tnode *arr[MAXNODES];
static int narr = 0;
static void collect(struct tnode *p)
{
if (p == NULL) return;
collect(p->left);
if (narr < MAXNODES)
arr[narr++] = p;
collect(p->right);
}
/* Comparator: sort by count descending; break ties alphabetically */
static int by_count(const void *a, const void *b)
{
const struct tnode *ta = *(const struct tnode **)a;
const struct tnode *tb = *(const struct tnode **)b;
if (tb->count != ta->count)
return tb->count - ta->count; /* higher count first */
return strcmp(ta->word, tb->word); /* alphabetical on ties */
}
int main(void)
{
struct tnode *root = NULL;
char word[MAXWORD];
int c;
/* Simple word reader: extract runs of alphabetic characters */
while ((c = getchar()) != EOF) {
if (isalpha(c)) {
char *w = word;
int lim = MAXWORD - 1;
*w++ = (char)tolower(c);
while (lim-- > 0 && isalpha(c = getchar()))
*w++ = (char)tolower(c);
*w = '\0';
root = addtree(root, word);
}
}
collect(root);
qsort(arr, narr, sizeof(arr[0]), by_count);
{
int i;
for (i = 0; i < narr; i++)
printf("%6d %s\n", arr[i]->count, arr[i]->word);
}
return 0;
}
Compile and Test
gcc -ansi -Wall ex6-4.c -o ex6-4
printf "to be or not to be that is the question to be\n" | ./ex6-4
Sample Output
3 be
3 to
1 is
1 not
1 or
1 question
1 that
1 the
be and to each appear 3 times (tied; sorted alphabetically). All remaining words appear once (also sorted alphabetically within the tie).
Design Notes
- Why collect into an array? A BST is keyed by word, so its in-order traversal gives alphabetical order — not frequency order. Trying to reorder a BST by a different key is more work than flattening it and sorting the array.
- Tie-breaking — without the alphabetical tiebreaker, output order within tied groups is undefined (depends on tree shape). Explicit alphabetical tie-breaking makes output deterministic.
MAXNODESlimit — the flat array is pre-allocated. An alternative is toreallocdynamically; for most inputs 10,000 distinct words is sufficient.
What This Exercise Teaches
- Two-phase algorithm — count with a BST (fast insertion), then sort the flat array (easy frequency sort); each data structure does what it is best at
- Array of pointers to structs with
qsort— the comparator receivesconst void *pointers to elements of the array; each element is astruct tnode *, so the comparator casts toconst struct tnode **and dereferences once - Compound sort key —
qsortcomparators can implement multi-level sorting by testing secondary keys when the primary key is equal
Set Up Your C Environment
← Exercise 6-3 |
Chapter 6 Solutions |
Exercise 6-5 →
Book: The C Programming Language, 2nd Ed — Kernighan & Ritchie