Hierarchy of Information
All information can be divided, from an individual's perspective, into four levels (in order of decreasing frequency of access and increasing size): cache, bookshelf, archive, and universe. Every sort of information, from names and addresses to books, documents (physical and electronic), and web bookmarks, fits this scheme with minimal shoehorning. Unfortunately, software often neglects one of these levels, making it difficult for us to find or use information in the ways we are used to. Let's examine each of these levels in turn.
The cache is a collection of our most frequently accessed items of information, usually between five and a dozen items. These are the things we're using at any given time, and include such examples as the reports on our desk, the documents in our Windows toolbar, the programs on our desktop (or our quick launch shortcuts), the phone numbers in our speed dial, and the books on our bedside table. We do not usually select these items consciously, but guide their accumulation in the course of our activity. Their usefulness is diminished when they must be chosen deliberately, which prevents them from changing often enough to stay relevant, or when they are selected automatically, which makes them merely a listing of the last things we have touched rather than the ones we still need. The Windows 95 version of the Start menu suffered from the first flaw, the My Recent Documents collection from the second. The surface of our desk, by contrast, accumulates the papers we're working with without retaining the newspaper we read in the morning or the bill we've paid and disposed of. Still, a growing amount of software is designed with support for this level of information, and we seem to have acknowledged its importance.
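One way to picture a cache that sits between "chosen by hand" and "whatever was touched last" is a score that rises with each use and fades over time. What follows is only a rough sketch of that idea, not a description of how any existing software works. The class name WorkingSet, the half_life parameter, and the default capacity of seven are all assumptions made up for illustration.

```python
class WorkingSet:
    """Hypothetical cache that ranks items by a use count that decays over time."""

    def __init__(self, capacity=7, half_life=3.0):
        self.capacity = capacity    # "between five and a dozen items" (assumed default)
        self.half_life = half_life  # days after which one use counts half as much (assumed)
        self.scores = {}            # item -> (score, day of last update)

    def _decayed(self, item, now):
        # Score the item would have today, given when it was last updated.
        score, then = self.scores.get(item, (0.0, now))
        return score * 0.5 ** ((now - then) / self.half_life)

    def touch(self, item, now):
        # Using an item adds one point on top of its decayed score.
        self.scores[item] = (self._decayed(item, now) + 1.0, now)

    def discard(self, item):
        # The bill we've paid and disposed of leaves the cache at once.
        self.scores.pop(item, None)

    def items(self, now):
        # Highest scores first, cut off at the cache size.
        ranked = sorted(self.scores, key=lambda i: self._decayed(i, now), reverse=True)
        return ranked[:self.capacity]


desk = WorkingSet(capacity=1)
for day in (0, 1, 2):
    desk.touch("report", day)        # used every day
desk.touch("newspaper", 2.5)         # touched most recently, but only once
print(desk.items(now=3))
```

With these made-up numbers, the report stays on the desk even though the newspaper was touched more recently, which is the behaviour a plain recent-items list cannot give.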
The bookshelf holds things you might not need today, but want to be able to see at a glance and as a whole. Though organized, it is not hierarchically arranged and doesn't require explicit categorization. This is the level of information that computers support least. If something's not in your cache, you're forced to hunt through nested folders to find it, a long and frustrating process. One place this level does appear is in the inboxes of many people's email programs. Every email appears in a single list, already organized by date and sender, without any effort by the user. Another example is the desktop of many computer users, littered with the programs and documents they use often or have just downloaded. Rather than viewing this as an abuse of the desktop, we need to recognize the need for this type of information view, and support it. Why, for example, should every Microsoft Word document bear the same icon, regardless of its size, content, or use? How can we add more visual coherence and information to the desktop? A third example of a bookshelf, better designed than the desktop, is Google News, with all the day's top stories on one page, arranged in categories but viewable as a whole.
The archive is any collection of information controlled by a single entity. At this level of complexity, the information requires explicit organization and dedicated effort. Formal search tools become useful. Examples include all the files (or just the documents) on your computer, the Library of Congress, or the entire website of the New York Times. Archives are the traditional computer model of information and the best supported. Computers don't, however, aid in the transition from a bookshelf to an archive, and anyone who's attempted the task knows that it involves hours of renaming files, creating folders, and entering metadata. If the computer understood bookshelves, it could formalize the implicit categorizations that underlie them and automatically generate an archive. This need not happen all at once. Items on a bookshelf that were often used together could be grouped, and the user could then name the group or add other items to it. Until the information became an archive, however, the groups would be displayed on a single page and would not require the same hierarchical navigation as an archive or a file system.
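The grouping step could be sketched roughly like this. It assumes the computer keeps a log of "sessions" (sets of items that were open at the same time), which is my own invention for the example. The function name suggest_groups and the min_together threshold are likewise hypothetical. Items that show up together often enough are linked, and linked items are merged into unnamed groups for the user to name or adjust.

```python
from collections import Counter
from itertools import combinations


def suggest_groups(sessions, min_together=2):
    """Group items that appeared in the same session at least min_together times."""
    # Count how often each pair of items was in use at the same time.
    pair_counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            pair_counts[(a, b)] += 1

    # Union-find: every item starts alone; frequently paired items are merged.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the chain as we walk it
            x = parent[x]
        return x

    for (a, b), count in pair_counts.items():
        if count >= min_together:
            parent[find(a)] = find(b)

    # Collect the merged items into lists, one per group.
    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(group) for group in groups.values())


sessions = [
    ["budget.xls", "report.doc"],
    ["budget.xls", "report.doc", "notes.txt"],
    ["photo.jpg", "notes.txt"],
    ["photo.jpg", "song.mp3"],
    ["photo.jpg", "song.mp3"],
]
print(suggest_groups(sessions))
# [['budget.xls', 'report.doc'], ['photo.jpg', 'song.mp3']]
```

Note that notes.txt stays ungrouped because it never appears with the same item twice. Nothing is renamed or moved into folders here. The groups are only suggestions that could be drawn on the same single page.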
The universe is all information in an area, controlled by a variety of authorities, and accessed in diverse ways. This is what Google is so good at helping you search. But there's still a lot of information that goes unindexed. Search for the deep web or see this Salon article for more information. Also, note that today's software does little to integrate the information universe.
That was much more than I had intended to write. This post started as a single thought that required a few hours to record. Does it make sense? Are there any good examples that I missed? Any counterexamples? Anyone who's done real research or writing on this topic?