May 15, 2005

Laws and Questions about Online Variations

by Nick Montfort, 8:02 pm

I heard about GTxA commenter Raph Koster’s “The Laws of Online World Design” recently on ifMUD. It’s a provocative and thoughtful list of principles, some of which were evident back in the days of Habitat. While the page itself is not new – an Internet Archive search shows the page has been around at that location since 2000 – and there are no arguments offered for why these laws obtain, the page is still well worth reading, and has several thoughts that apply to one-player games as well.

Having used the Internet Archive to check the date this page was first posted, I also fetched the May 11, 2000 version of the page (the earliest one) and then ran diff on this old page and the current HTML. Which leads me to wonder…

Although nothing in the current HTML of the page indicates it, the diff shows that Dundee’s Law, Ananda Dawnsinger’s Law, Rickey’s Law, Darklock’s First Law, Corollary to Darklock’s First Law, and Darklock’s Second Law have been added; you would see the same additions by running diff on plain text renderings of the HTML. Diffing the HTML itself reveals other changes, though: line breaks have been eliminated here and there, and a jpeg bullet image is linked instead of a gif image.
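For anyone who wants to try this sort of comparison without the command-line diff, here is a minimal Python sketch using only the standard library. The two HTML snippets and the names in them ("Law A", "bullet.gif", and so on) are made up for illustration and are not taken from the actual page; the helper names are likewise hypothetical. It compares the raw HTML line by line and then compares plain text renderings, which is roughly the distinction drawn above.

```python
import difflib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the non-empty text nodes of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def plain_text_lines(html):
    """Return a crude plain text rendering: one entry per text node."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def added_lines(old_lines, new_lines):
    """Return lines that appear in new_lines as insertions or replacements."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return added


# Hypothetical stand-ins for an archived version and the current version.
old_html = """<ul>
<li><img src="bullet.gif"> Law A</li>
<li><img src="bullet.gif"> Law B</li>
</ul>"""

new_html = """<ul>
<li><img src="bullet.jpg"> Law A</li>
<li><img src="bullet.jpg"> Law B</li>
<li><img src="bullet.jpg"> Law C</li>
</ul>"""

# Diffing the markup flags every line whose bullet image changed.
html_added = added_lines(old_html.splitlines(), new_html.splitlines())

# Diffing the text renderings isolates the newly added entry.
text_added = added_lines(plain_text_lines(old_html), plain_text_lines(new_html))

print(len(html_added), "changed or added HTML lines")
print("New text:", text_added)
```

The text-only comparison hides markup churn such as the swapped bullet image, while the HTML comparison keeps it, so which one is more useful depends on whether the editorial changes or the production changes are of interest.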

Well, these particular differences aren’t startling or exciting, but they led me to wonder about how Web pages (and sites overall) are altered over time. What changes about them, and why? There are plenty of cases where the White House has erased online government records to make the current administration look better (e.g., the list of coalition members, as reported later by Reuters), but what are the less sinister and more subtle changes that are made? Has any humanistic work been done on discerning and interpreting these changes from outside the site, using the Internet Archive or one’s own saved versions of pages? Such changes are beginning to be understood in the framework of machine learning and computer science – Google certainly takes changes into account, determining how often to refetch a page based on how frequently it changes, for instance. But it will also be interesting to learn how such alterations reveal editorial activity and human intentions and actions. There seems to be room for large-scale work as well as close studies of specific documents.

So far, I’ve found one recent article that bears on this question, in the journal Information Research: “A longitudinal study of Web pages continued: a consideration of document persistence” by Wallace Koehler. This article compares samples of documents from between 1996 and 2003 to quantify Web page changes over time, looking at file size and links rather than all of the HTML. The conclusions are interesting:

…the half-lives of Web resources in different disciplines, domains, and fields differ. … not only are legal, scholarly, and educational electronic citations reported to have limited lifecycles not dissimilar to Web resources in general, but there is also variability among the disciplines.
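As a rough illustration of what a "half-life" means here, the following Python sketch estimates one from a single observation, under the loudly stated assumption that pages disappear at a constant exponential rate. That assumption, the function name, and the sample numbers are all hypothetical; this is not a description of how Koehler's study computed its figures.

```python
import math


def half_life(elapsed, surviving_fraction):
    """Estimate a half-life, assuming simple exponential decay.

    elapsed: time between the first and last observation (any unit).
    surviving_fraction: share of the original sample still reachable,
        strictly between 0 and 1.
    Returns the half-life in the same unit as `elapsed`.
    """
    if not 0 < surviving_fraction < 1:
        raise ValueError("surviving_fraction must be strictly between 0 and 1")
    # If N(t) = N0 * 0.5 ** (t / h), then h = t * ln(2) / ln(N0 / N(t)).
    return elapsed * math.log(2) / -math.log(surviving_fraction)


# Made-up example: a quarter of a sample survives after 4 years,
# which corresponds to two halvings.
example = half_life(4, 0.25)
print(example)
```

Comparing such estimates across disciplines or domains is one simple way to picture the variability the quotation describes, though real samples need not decay this regularly.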

The focus of this article is on “linkrot” and preservation issues, though – I’m also interested in finding out what the implications of such longitudinal changes are for textual studies and editorial practice, not just for library science and descriptive bibliography. I know Matt Kirschenbaum’s excellent “Editing the Interface: Textual Studies and First Generation Electronic Objects” (TEXT: An Interdisciplinary Annual of Textual Studies 14 (2002): 15-51) deals with this issue, but unfortunately Matt’s article suffers from the ultimate linkrot – it was never placed online. I’ll have to fish it out of my files and re-read it at some point…