Death to Word Processing

The age of programs is drawing to a close. What's coming is the age of data.

In 1985, I was in the last months of my computer science degree, grumpily taking the required full course in operating systems. It was a tough course, and I frankly didn't care much about the subject matter: techniques for semaphore-passing, avoidance of race conditions, calculations for virtual memory working sets and statistical distributions of queue lengths, yada-yada-yada.

As I've said before in editorials, I'm a bona fide user, with all the interest in computer plumbing that a dancer has in shoe-making techniques. So when it came time for the Big Paper that would count for a good chunk of our mark, I couldn't think of a topic that I cared enough about to spend a few score hours researching.

I figured out a clever dodge: I wrote a paper on the User Interface Design of Unix. This was years before NeWS and X Windows, and there were only a few proprietary graphical interfaces to Unix; my paper was entirely about the command line. I got to research the human factor, not how the computer worked.

I got some good mileage out of popular articles that trashed Unix for absurd program names like "cat" and "grep" and its general incomprehensibility to anyone who doesn't like memorizing man pages of arguments and side-effects.

Then I ran into an article (sorry, I've forgotten the reference and lost the paper; gimme a break, it's been 12 years) that snapped the whole topic into focus for me.

The article pointed out that virtually all previous operating systems had been based around the programs; they were full of tools for creating executable binaries and loading those into RAM for running. Handling of the data files was an afterthought. When Unix was invented, the vast majority of operating systems did not have hierarchical file systems - all your files ended up in one directory. (Need to organize? Put groups of them on different tapes.)

The message of the article was that Unix, at some sacrifice in the details of user-friendliness (like comprehensible commands that don't have 43 "-" options), had kept a simple, clear focus on one overwhelmingly important design decision: it was centred on the data. True, it was still JAPL - Just Another Program Loader - in that the first "word" in any command had to be an executable file; but the operating system treated everything as Just Data. Just files. The hierarchical directories were files, files that you could open and mess with if you knew how they were structured; the devices were treated as files, which was a real mind-bender. Wilder yet, every program could be treated as a file - output from other programs could be directed into its input with the same syntax as directing the output into a file. In Unix, everything was treated the same - as a file, as an item of data.
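For the plumbing-minded, here is a minimal sketch of that idea, written by way of illustration rather than taken from the article. It assumes a Unix-like system with the standard "sort" program on the path; the names write_report and demo are made up for the example. One function writes the same three lines to an ordinary file, to a device, and to another program, and never needs to know which is which.

```python
import os
import subprocess
import tempfile


def write_report(stream):
    """Write the same bytes to whatever file-like object we are handed."""
    stream.write(b"cherry\napple\nbanana\n")


def demo():
    # 1. An ordinary file on disk.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        write_report(f)
        path = f.name
    with open(path, "rb") as f:
        on_disk = f.read()
    os.remove(path)

    # 2. A device, opened by name like any other file.
    with open(os.devnull, "wb") as dev:
        write_report(dev)

    # 3. Another program: its standard input is just one more stream.
    #    Assumes a `sort` executable is available on the PATH.
    proc = subprocess.Popen(
        ["sort"], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    write_report(proc.stdin)
    proc.stdin.close()          # signal end-of-input so sort can finish
    sorted_out = proc.stdout.read()
    proc.stdout.close()
    proc.wait()

    return on_disk, sorted_out


if __name__ == "__main__":
    on_disk, sorted_out = demo()
    print(on_disk)
    print(sorted_out)
```

The point is in what write_report does not contain: no special case for disk, device or program. All three destinations answer to the same "write" call.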

This distinction, the article contended, totally outweighed all of Unix's admittedly awful sins of bad user interface; it was the first operating system to be structured from the ground up to treat data as the central concern, rather than computer programs. Users (as opposed to programmers) were not much concerned with programs - most only used a few. What they did spend their time with was a lot of data files. Whole trees of directories of files, files to be split up, files to be appended together, files to be linked together and indexed in various ways. It went on to give examples of how Unix made such tasks easier.

The popularity of this concept may be seen in how much it has been imitated: DOS and its various Microsoft descendants, OS/2, the Mac OS - almost every popular operating system copies the data-centric approach, if not the trivialities like the exact program names.

In salute to this Great Truth, I say "Death to Word Processing". For several months now, I have successfully (and without sacrifice) avoided all use of word processors that trap my words into proprietary formats. I do all writing at work and home with HTML editors. I carefully avoid any "features" that some HTML programs offer that create "HTML" files with proprietary extensions.

As Tim Berners-Lee has said, "Why put 'best viewed with Netscape 3' on your Web page? Who wants to go back to the days of proprietary formats? Just put 'this page takes advantage of HTML 3.2' and leave it at that".

The latest additions to HTML have answered all but the most exacting needs for page description; with style sheets and dynamic HTML, you can create just about any graphic/text combination most people could want. Indeed, with Java and JavaScript, the distinction between a "document" (data) and a computer program has blurred in a way no word processor ever managed. With the advent of XML, vast horizons are opening up for "documents" to be complex data structures, with database functionality and further intermingling of instructions and information. So I may soon hope to add "Death to Spreadsheets" and "Death to Database Programs" to my war cry. (Come to think of it, I can almost say "Death to Presentation Graphics Programs" now!)

The people who write computer programs for a living have been trying to keep us in a program-centred world for the whole quarter-century that Unix and its children have been pushing the data-centred view. Even today, the "default directory" that the file browser opens to when you want to save your work is usually the program's own directory, though nobody but an idiot keeps their data files there.

Microsoft makes unabashed efforts to change file formats with every version and provides poor or no tools for saving into previous formats (Word '97, every version of Access) so that you are pulled into upgrading the whole office once somebody upgrades. They want to control your access to your own data by providing the only program that will read it perfectly.

Well, they can run, but they can't hide. The Age of Data is almost upon us. The IBM world of proprietary hardware-and-software bundles gave way to the Microsoft world, where hardware is a low-profit commodity and all of it has to run the same software base. That world in turn is about to lose out to the Web world, where all software programs have to be able to work on standard, non-proprietary data structures that are the new centre of the customer's interest.

Speed the day.

Roy Brander
