Archives for May 2004

Tiger Map Server: dynamically generate road maps of the U.S. (and a Slashdot thread on related software)

dodgeball.com :: location-based social software for mobile devices

My Favorite Dialog Box Message

29 May 2004

From Visual Studio 2003:

Some time ago, you retrieved data from the database with the query or view whose name is table name. The returned rows appear in the Results pane and the database server continues to retain the result set in its local memory, which consumes valuable server resources. Because you have not recently used the Results pane, it will be automatically cleared in one minute. This will empty the results pane, discard unsaved changes, and free the resources on the database server. Then, if you want to reestablish the result set, you can rerun the query or view.

Do you want to prolong your work with the Results pane and continue to consume resources on the database server?

And you only get a minute to read it.

Determining your location from cell-phone camera photos.

Wired Hackers & Painters: can a programming language let us sketch a rough draft of code?

Generate RSS feeds from a Google news search.

Feedster: a search engine for RSS feeds.

XHTML Friends Network: a standard for encoding personal relationships in hyperlinks. (cf. FOAF)

Hierarchy of Information

15 May 2004

All information can be divided, from an individual's perspective, into four levels (in order of decreasing access and increasing size): cache, bookshelf, archive, and universe. Every sort of information – names and addresses, books, documents (physical and electronic), web bookmarks, etc. – fits this scheme with minimal shoehorning. Unfortunately, software often neglects support for one of these categories, making it difficult for us to find or use information in the ways we are used to. Let's examine each of these levels in turn.

The cache is a collection of our most frequently accessed items of information, usually between five and a dozen items. These are the things we're using at any given time, and include such examples as the reports on our desk, the documents in our Windows toolbar, the programs on our desktop (or our quick launch shortcuts), the phone numbers in our speed dial, the books on our bedside table. We do not usually select these items consciously, but guide their accumulation in the course of our activity. Their usefulness is diminished when they must be chosen deliberately, which prevents them from changing often enough to stay relevant, or when they are selected automatically, which makes them only a listing of the last things we have touched and not the ones we still need. The Windows 95 version of the Start menu suffered from the first flaw, the My Recent Documents collection from the second. The surface of our desk, by contrast, accumulates the papers we're working with, without retaining the newspaper we read in the morning or the bill we've paid and disposed of. Still, increasing amounts of software are designed with support for this level of information, and we seem to have acknowledged its importance.

The bookshelf holds things you might not need today, but want to be able to see at a glance and as a whole. Though organized, it is not hierarchically arranged and doesn't require explicit categorization. This is the level of information that computers support least. If something's not in your cache, you're forced to hunt through nested folders to find it, a long and frustrating process. One place this level does appear is in the inboxes of many people's email programs. Every email appears in a single list, already organized by date and sender, without any effort by the user. Another example is the desktop of many computer users, littered with the programs and documents they use often or have just downloaded. Rather than viewing this as an abuse of the desktop, we need to recognize the need for this type of information view, and support it. Why, for example, should every Microsoft Word document bear the same icon, regardless of its size, content, or use? How can we add more visual coherence and information to the desktop? A third example of a bookshelf, better designed than the desktop, is Google News, with all the day's top stories on one page, arranged in categories but viewable as a whole.

The archive is any collection of information controlled by a single entity. At this level of complexity, the information requires explicit organization and devoted activity. Formal search tools become useful. Examples include all the files (or just the documents) on your computer, the Library of Congress, or the entire website of the New York Times. Archives are the traditional computer model of information and the best supported. Computers don't, however, aid in the transition from a bookshelf to an archive, and anyone who's attempted the task knows that it involves hours of renaming files, creating folders, and entering metadata. If the computer understood bookshelves, it could formalize the implicit categorizations that underlie them and automatically generate an archive. This need not happen all at once. Items on a bookshelf that were often used together could be grouped, and the user could then name the group or add other items to it. Until the information became an archive, however, the groups would be displayed in a single page and not require the same hierarchical navigation as an archive or a file system.

The universe is all information in an area, controlled by a variety of authorities, and accessed in diverse ways. This is what Google is so good at helping you search. But there's still a lot of information that goes unindexed. Search for the deep web or see this Salon article for more information. Also, note that today's software does little to integrate the information universe.

That was much more than I had intended to write. This post started as a single thought that required a few hours to record. Does it make sense? Are there any good examples that I missed? Any counterexamples? Anyone who's done real research or writing on this topic?

Thoughts on C#

01 May 2004

I've been programming in C# for the past few weeks at work, and I wanted to record some of my thoughts. It's a well-designed language overall, and the .NET Framework has been a useful library. If only the IDE weren't a constant source of frustration and anger.

Here, then, are some of the main differences between C# and C++ (the language I know best).

  • C# is a higher-level language that runs on a managed runtime (it is compiled to intermediate code and then JIT-compiled, rather than compiled straight to native code). In C++, you play tricks with your data structures by reading and writing their memory directly, using pointers. In C#, these are replaced by reflection, a more powerful, but slower, method of instantiating runtime-typed objects or retrieving a listing of an object's methods or fields. My C# code loads a lot of objects from strings in configuration files, making it easier to change my program's behavior without recompiling. It can also help separate infrastructure code from application logic.
  • C# has garbage collection but makes it difficult to deterministically dispose of objects or resources. This means I don't have to worry (much) about memory leaks, but I could easily avoid them in C++ by using a smart pointer like boost's shared_ptr. The lack of C++ style destructors is a pain, though there's some good reasoning behind it. (Remember, there's nothing in C++ to stop you from returning a pointer to a local variable.) I still haven't found a use for the Finalize method, since you can't seem to assume anything about the program's state when it's called, but the Dispose method and using statement are a useful substitute (and an interesting mix of language keywords and specific classes; more on this below).
  • C# sometimes conflates the underlying language with your code. Certain keywords act on particular interfaces, as in the example above and foreach with the IEnumerable interface. Also, the compiler automatically generates certain methods, like Begin- and EndInvoke on delegates. This is very convenient but feels improper. C++ does a bit of this with operator overloading, but C# goes far beyond that.
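To make the point about deterministic cleanup concrete, here is a minimal C++ sketch of what destructors plus a reference-counted smart pointer give you. It uses std::shared_ptr, the standardized descendant of boost's shared_ptr; the Connection class, its log, and run_scoped_example are hypothetical names made up for illustration, not anything from my work code.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource: records when it is opened and closed so the
// order of events can be inspected afterwards.
class Connection {
public:
    Connection(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("open " + name_);
    }
    // The destructor is the cleanup hook; it runs as soon as the last owner goes away.
    ~Connection() { log_.push_back("close " + name_); }

    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Returns the sequence of events produced by one scoped use of a Connection.
std::vector<std::string> run_scoped_example() {
    std::vector<std::string> log;
    {
        auto conn = std::make_shared<Connection>("db", log);
        auto alias = conn;  // a second owner of the same object
        log.push_back("use count " + std::to_string(conn.use_count()));
    }  // both owners leave scope here, so the destructor runs at this brace
    log.push_back("after scope");
    return log;
}
```

Calling run_scoped_example() yields "open db", "use count 2", "close db", "after scope", in that order: the close happens at the closing brace, not whenever a collector gets around to it. The using statement with Dispose gets you roughly the same guarantee in C#, but only for the scope you wrap explicitly.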

There are more differences, but I'd rather talk about the problems with C#. Here's a partial list.

  • The documentation sucks. It's impossible to find useful information among all the crap about DirectX, COM, and the N other technologies you don't use. When you do find the class or function you care about, it doesn't tell you what you need to know, or else it's wrong. For example, TcpClient.Close() doesn't disconnect your socket, contrary to its description (you have to use TcpClient.GetStream().Close() instead). And Thread.Abort() doesn't simply kill the thread it's called on (it raises a ThreadAbortException in that thread, which the thread can catch and even cancel with Thread.ResetAbort()). And many classes are described only as “internal to the .NET Framework”, including useful things like the IXmlSerializable interface that lets you customize the XML serialization of a class. And you have to follow extra links to find out basic information, like the namespace of a class or the parameters of a function. And the Longhorn documentation crashes Internet Explorer (but then, so does Microsoft Project). And so on. It's inexcusable.
  • The lack of flexibility of certain classes. The System.Configuration namespace includes some convenient classes for reading from a configuration file, as long as you don't need to specify the file. In fact, .NET's configuration works badly in general, and doesn't even do the one thing it's supposed to: allow your program to always find its settings. Another restrictive class is RemotingConfiguration. Channels cannot be unloaded, and Configure can only be called once. Apparently, the recommended way to stop listening on a port (with remoting) is to restart your process.
  • The string.Format function. It may be less kludgy than C++'s stream operators, but it's also a return to the days of C's unchecked function parameters. In this case, the runtime stops you from mangling your memory, but will happily throw exceptions at very inconvenient times. C++'s output routines can generally be verified by the compiler, an important safeguard.
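As a rough illustration of the last point, here is a small C++ sketch of stream-based formatting. The names describe_port and Opaque are hypothetical, invented for this example. The idea is that every value written with << has to match an overload the compiler can find, so a bad argument is rejected at build time rather than surfacing as an exception while the program runs.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds a status line with stream operators.
// Each << is resolved by the compiler against the argument's type.
std::string describe_port(const std::string& host, int port) {
    std::ostringstream out;
    out << "listening on " << host << ":" << port;
    return out.str();
}

// A type with no operator<< defined for it.
struct Opaque {};

// Uncommenting the function below fails to compile, because there is no
// operator<< that accepts an Opaque. The mistake never reaches runtime.
//
// std::string broken() {
//     std::ostringstream out;
//     out << Opaque{};
//     return out.str();
// }
```

For instance, describe_port("localhost", 8080) returns "listening on localhost:8080". The comparable mistake with string.Format, such as a format string that refers to {1} when only one argument is passed, compiles fine and throws a FormatException when the line executes.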

I have a few other minor quibbles with C#, but its flaws pale in comparison to those of Visual Studio. I don't understand how the same company can create such a well-done language and such a poor tool for using it. It's an insult to the designers of C# and those of us trying to use it. But don't worry, as we say at work with each newly-discovered or re-encountered flaw, it'll all be better in Longhorn.