1.1 Putting the software component idea to work

In his address, McIlroy listed a number of software categories that would be suitable for components: mathematical functions, input-output conversion, geometry, text processing and storage management (McIlroy 1969:144). As Persson (2002:31) points out, the C standard library, which McIlroy was instrumental in creating during his time at Bell Laboratories, had routines covering all of the original categories except geometry.

McIlroy also invented the pipeline mechanism (Ritchie 1980). A pipeline can be seen as a number of software components working in tandem, each serving a very specific purpose and each unaware of the inner workings of the other components in the pipeline. A user or a script strings such components together to build complete programs. The pipeline in the Unix operating system was McIlroy’s first application of the concept.

More formally, a pipeline is an ordered collection of software elements, each of which consumes the data produced by the element directly preceding it and produces new data in turn (the first element consumes no data). A pipeline can have an arbitrary number of elements, with data flowing from the first element to the last. In Unix, each element is a stand-alone command-line program.
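
Expressed in shell notation, a three-element pipeline thus takes the general form below, where the names are mere placeholders rather than actual programs: the first element produces data, the middle element transforms it, and the last element consumes it.

producer | transformer | consumer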

For example, consider the following program that calculates the number of files and directories in the current directory. It consists of the following input to a Unix shell:1

ls | wc -l

The ls program lists the entries (files and directories) in the current directory. Instead of displaying this list on-screen, the pipe symbol (“|”) causes it to be redirected to the wc program, in effect gluing the two programs together. Given the -l argument, the wc program counts the number of lines in its input. As wc is the last program in the pipeline, its output appears on-screen by default. In effect, a program that displays the number of files and directories in a directory has been constructed from two components, with very little effort.
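
Because the components are independent, they can be freely recombined. As a small illustrative variation (assuming, as in footnote 1, a modern GNU/Linux system), inserting the grep program between ls and wc restricts the count to directories only: in the long-format output produced by ls -l, lines describing directories begin with the letter d, which grep '^d' selects.

ls -l | grep '^d' | wc -l

The notable point is that a third component slots into the pipeline without any change to the other two.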

Pipelines can be quite complex; consider the following program that identifies the largest file in the current directory:

ls -s | sort -n | tail -1 | awk '{print $2}'

The familiar ls program is given an -s argument, prompting it to produce a list of all entries in the current directory, with file sizes (in blocks) in the first column and file names in the second column. The sort program is given an -n argument, instructing it to sort its input numerically, resulting in a list with the smallest file at the top and the largest at the bottom. The tail program extracts the last n lines of its input (here n is one, conveyed by the -1 argument). What remains is a single line with the size of the largest file in the first column and its name in the second. Finally, the awk program2 extracts the contents of the second column and displays them on-screen. In effect, a program has been constructed that identifies the name of the largest file in the current directory, again with the help of reusable components in the guise of standard command-line programs.
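
A practical consequence of this design, worth a brief aside, is that such pipelines can be constructed and inspected incrementally: running a prefix of the pipeline reveals the intermediate data that the next component will receive. The two commands below (a sketch under the same GNU/Linux assumptions as above; note also that the final awk step presumes file names without whitespace) display first the sorted size list and then the single line handed to awk.

ls -s | sort -n
ls -s | sort -n | tail -1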

The holy grail of component-oriented programming is to achieve the same ease of use when creating much larger programs consisting of a wide variety of components.

Footnotes

  1. While these pipeline examples are meant to illustrate a concept in use since the 1970s, they have been tested on a system running a modern Linux distribution and may thus not work on a vintage Unix system.
  2. AWK is a programming language designed to process text. Its programs are interpreted by the awk program.