February 29, 2008

EP 6.4: Statistical AI

by Noah Wardrip-Fruin · 6:10 am

We can see, in Minstrel, symptoms of a much larger problem, one that Turner, alone, could have done little to address. By the late 1980s it was clear that AI systems in general were not living up to the expectations that had been created over the three previous decades. Many successful systems had been built — by both “neats” and “scruffies” — but all of these worked on very small sets of data. Based on these successes, significant funding had been dedicated to attempting to scale up to larger, more real-world amounts of data. But these attempts failed, perhaps most spectacularly in the once high-flying area of “expert systems.” The methods of AI had produced, rather than operational simulations of intelligence, a panoply of idiosyncratic encodings of researchers’ beliefs about parts of human intelligence — without any means of compensating for the non-simulation of the rest of human intelligence. Guy Steele and Richard Gabriel, in their history of the Lisp programming language (1993, 30), note that by 1988 the term “AI winter” had been introduced to describe the growing backlash and resulting loss of funding for many AI projects. In this vacuum, assisted by steadily increasing processing power, a new form of AI began to rise in prominence.

As some readers of this book no doubt remember, in the early 1990s the World Wide Web was largely human-organized. The “What’s New” page at NCSA provided regular links to new web resources (much like one of today’s blogs) and Yahoo — still on servers at Stanford University — maintained a growing hierarchical directory, to which pages were added by hand. Many individuals maintained hotlists, or pages of links, that were the definitive sources for finding the web’s best information on certain topics. Much as pre-Dewey librarians had done before them, and AI researchers more recently, these editors of web resources organized knowledge according to their own ideas (often no more than intuitions) of what belonged with what. Web page “meta tags” helped authors express what they felt were the relevant keywords for their contributions, and page titles were a very important factor in determining which results came back from a search. Generally, finding the resources with “cool shades” next to them in the Yahoo directory was a better route to information than using the clumsy search engines.

But this began to change, and quickly, as search engines improved. This improvement came not by engines coming to better “understand” the pages they indexed (better interpreting document contents, as traditional AI researchers had attempted) but by counting. For example, the “PageRank” algorithm that rocketed Google to success operates by counting the links that interconnect web pages. As the Google website explains:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.” (Google)

Of course, describing this process as “democratic” is a rhetorical choice on the part of Google’s marketing department. When the web’s most “important” pages are almost all owned by media and technology conglomerates, the power to bless viewpoints on widely-discussed topics (e.g., national politics) with attention via PageRank lies in much the same hands as the power to grant attention does in non-web media. If democracy were the goal, one can imagine weighting Google’s statistical model somewhat differently. Nevertheless, even if it is possible to debate whether PageRank is democratic, there is no debating that it is statistical. It is based on counting information about pages (links in and out) and interpreting the resulting numbers. We now take for granted the ability of a computer program to find useful documents on a subject that interests us — though it would have seemed an amazing feat of AI not that many years ago.
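The counting at PageRank’s core can be captured in a few lines. What follows is a minimal sketch of the standard power-iteration formulation, with an invented four-page link graph and the damping factor of 0.85 from the original PageRank paper; it illustrates the technique, and is in no way Google’s production system.

```python
# Minimal PageRank sketch: pages are ranked purely by counting links,
# with no "understanding" of their contents. The four-page link graph
# is invented; the damping factor 0.85 follows the original paper.
links = {
    "A": ["B", "C"],  # page A links to pages B and C
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}  # start all pages equal
damping = 0.85

for _ in range(50):  # power iteration: repeat until ranks stabilize
    new_rank = {}
    for p in pages:
        # Sum the "votes" from every page that links to p. Each vote is
        # weighted by the voter's own rank (links from "important" pages
        # count for more) and divided among all the links the voter casts.
        votes = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new_rank[p] = (1 - damping) / len(pages) + damping * votes
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Note that nothing here inspects what any page says; the entire computation is arithmetic over the link structure.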

The success of another high-profile web business is also partially tied to a statistical AI technique: in this case, Amazon’s recommendation system. Amazon doesn’t decide which books and movies to recommend to its customers by deeply analyzing the content of each item it sells and “noticing” similarities between them. Nor does it simply follow rules that dictate recommending, for example, other books by the same author. Instead, it looks at the numbers: the numbers of people who bought an item and also bought another item, or who looked at a product page and also looked at another product page (Linden, Smith, and York, 2003). It is these types of statistical correlations that let Amazon know to (as of this writing) recommend the first season of the television show Veronica Mars to those who visit the page for the show Wonderfalls — even though there is no overlap in the shows’ creators or stars, one is set in upstate New York and the other in Southern California, and the main character of one is a college-educated woman who works in a gift shop while the other is a mystery-solving high school student. Companies like Amazon build such statistical models not only in an attempt to improve sales of these items, but also because this type of information about products and their purchasers is a saleable commodity in itself.
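In the spirit of the item-to-item collaborative filtering that Linden, Smith, and York describe, here is a minimal sketch of this kind of counting. The purchase histories are invented, and the real system uses more refined similarity measures over vastly more data; the sketch only illustrates the principle that correlation, not content analysis, drives the recommendations.

```python
from collections import defaultdict
from itertools import combinations

# Invented purchase histories: each set is one customer's items.
purchases = [
    {"Veronica Mars S1", "Wonderfalls", "Dead Like Me"},
    {"Wonderfalls", "Veronica Mars S1"},
    {"Wonderfalls", "Firefly"},
    {"Veronica Mars S1", "Firefly"},
]

# Count how often each pair of items was bought by the same person.
co_counts = defaultdict(int)
for basket in purchases:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def recommend(item, k=2):
    """Items most often bought together with `item`: correlation, not content."""
    scores = {b: n for (a, b), n in co_counts.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("Wonderfalls"))  # e.g., ['Veronica Mars S1', 'Firefly']
```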

Looking beyond applications such as Amazon’s, statistical AI techniques can also engage with human expressions, not just recognize what correlates with them. Statistical techniques have improved computer speech recognition to the point where automatic dictation and automated phone systems (with their frustrations) are a practical reality. And statistical techniques don’t just help computers recognize what words people speak; they can also analyze patterns of words. The most successful email spam filters, for example, are built using statistical AI techniques.
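Many of the best-known spam filters are naive Bayes classifiers: they count how often each word appears in known spam and known legitimate mail, then score new messages by combining those counts as probabilities. A minimal sketch, with an invented four-message training set and add-one smoothing (real filters train on thousands of messages):

```python
import math
from collections import Counter

# Invented four-message training set; real filters train on thousands.
spam = ["cheap pills buy now", "buy cheap watches now"]
ham = ["meeting notes attached", "lunch notes for the meeting"]

spam_words = Counter(w for msg in spam for w in msg.split())
ham_words = Counter(w for msg in ham for w in msg.split())
vocab = set(spam_words) | set(ham_words)

def log_prob(words, counts, n_msgs):
    # Naive Bayes: log prior plus summed log-likelihoods, with add-one
    # smoothing so unseen words don't zero out the whole probability.
    lp = math.log(n_msgs / (len(spam) + len(ham)))
    total = sum(counts.values())
    for w in words:
        lp += math.log((counts[w] + 1) / (total + len(vocab)))
    return lp

def classify(message):
    words = message.split()
    spam_score = log_prob(words, spam_words, len(spam))
    ham_score = log_prob(words, ham_words, len(ham))
    return "spam" if spam_score > ham_score else "ham"

print(classify("buy pills now"))        # spam
print(classify("notes for the lunch"))  # ham
```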

The rise of statistical AI presents a significant challenge to those who believe that AI systems should be structured based on theories of human intelligence. The successes of statistical techniques grow each day — and yet no one believes that I choose which media to recommend to friends using the types of counting involved in Amazon’s recommendation system. At the same time, statistical techniques have severe limitations, which are not always acknowledged.

Limits of statistical AI

In the U.S., our most substantial public conversation about the limits of statistical artificial intelligence arose in response to the Total Information Awareness (TIA) initiative. Though often discussed as a response to the terrorist attacks of September 11th, 2001, the TIA effort was first proposed by the Defense Advanced Research Projects Agency two years before — and it was the continuation of a vision with a significantly longer history. But it was not until after the 2001 attacks that TIA gained serious momentum of its own, after acquiring major funding and a high-profile director: John Poindexter, best known for less-than-truthful testimony to Congress about the Iran-Contra scandal that plagued President Reagan’s administration.

The TIA vision is an example of dystopian magical thinking. It is dystopian in that, rather than identifying potential criminals and placing them under surveillance, it imagines putting every citizen under surveillance. It is magical thinking in that it imagines that statistical AI techniques are capable of sifting through the vast flood of information this would produce in order to identify good leads for anti-terrorism investigators. This demonstrates a fundamental misunderstanding of how these techniques operate. One of the world’s most respected computing organizations — the Association for Computing Machinery — put it this way in a letter to the Senate Armed Services Committee:

Any type of statistical analysis inevitably results in some number of false positives — in this case incorrectly labeling someone as a potential terrorist. As the entire population would be subjected to TIA surveillance, even a small percentage of false positives would result in a large number of law-abiding Americans being mistakenly labeled.

For example, suppose the system has a 99.9% accuracy rate. We believe that having only 0.1% of records being misclassified as belonging to potential terrorists would be an unachievable goal in practice. However, if records for everyone in the U.S. were processed monthly, even this unlikely low rate of false positives could result in as many as 3 million citizens being wrongly identified each year. More realistic assumptions about the percentage of false positives would drive the number even higher. (Simons and Spafford, 2003)
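The arithmetic behind this warning is straightforward to reproduce. A back-of-the-envelope calculation, assuming (my figures, for illustration, not the letter’s exact ones) a U.S. population of roughly 290 million and monthly processing:

```python
# Back-of-the-envelope version of the ACM letter's estimate.
# Assumed figures (mine, for illustration): ~290 million U.S. residents,
# records processed monthly, and the letter's "unachievable" best-case
# false-positive rate of 0.1%.
population = 290_000_000
months_per_year = 12
false_positive_rate = 0.001

false_alarms = population * months_per_year * false_positive_rate
print(f"{false_alarms:,.0f} false alarms per year")  # 3,480,000

# Even counting each person only once, millions of law-abiding people
# would be wrongly labeled -- and a realistic false-positive rate
# would push the numbers far higher.
```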

False positives are fine in Google link lists and Amazon purchase recommendations, but in other areas their cost is much too high. In the area of surveillance, not only would they improperly cast suspicion on many people — the number of false leads would also be so high as to overwhelm any group tasked with investigating them. Further, the problem of false positives is only one of the serious limitations of statistical techniques. Another is the need for large amounts of well-defined data in order to develop the system’s model. The amount of information available on terrorist activity is very small by comparison with that used to build effective statistical systems (e.g., Google’s collection of web pages containing links) — and the patterns sought (e.g., determining whether a conversation is somehow “suspicious”) are extremely ill-defined compared with statistical AI’s success stories (e.g., determining which pages a web page links to). Security expert Bruce Schneier estimates that even an “unrealistically accurate” TIA system would generate 1 billion false alarms for every real plot identified (2006). Further, there is the problem of false negatives. Even with all these billions of alarms, the system would fail to go off at times when it should.

In 2003, after public outcry, Poindexter resigned, and it seemed that TIA had been dealt a death blow when its funding was removed. Then, in 2005, The New York Times reported that the Bush administration had authorized a far-reaching NSA surveillance program that included warrantless electronic eavesdropping on telephone calls and e-mails of U.S. citizens, and a series of further revelations followed (Williams, 2006). If not TIA in name, this was clearly the same dystopian vision at work. Further, other manifestations of magical thinking continue to generate proposals for utterly inappropriate attempts to employ statistical techniques. In the case of surveillance, even if ineffective for their stated purpose, these efforts still produce databases and monitoring systems, each vulnerable to compromise and improper use.

More generally, to understand what statistical techniques can accomplish, we need to move beyond general principles. Though understanding the basic fact of false positives provides some basis for evaluating proposals for statistical techniques (the cost of false positives must be low), it is also important to foster an understanding of how statistical models are developed and what they represent. As I argued in the Introduction, I believe presenting legible examples, especially from the area of media authoring, can help in this effort. And, of course, such examples can also help one consider when statistical techniques might be useful to media authors.

N-grams

A statistical model used in a system for understanding or generating human language is often called a “language model.” As with most statistical models, the primary use of a language model is to predict the future (linguistic) behavior of a system based on a sample of its past (linguistic) behavior. Constructing such a model requires two efforts. The first is deciding what to pay attention to: what “features” of the past behavior will be treated statistically. The second is deciding how to make those features part of an operating statistical model.

One of the most popular approaches for building a language model is the concept of “n-grams” (also known as “Markov chains”). An n-gram language model is built by looking at a body of linguistic data and seeing what elements follow other elements. Looking at the letters in a body of English, q would quite often be followed by u. Looking at the word level, you would more often be followed by are than by banana. Looking at sets of two elements means examining “bigrams,” while sets of three are “trigrams.”

As this probably makes clear, in an n-gram model the features are often the word or two that appear before the current one. A simple model employing these features assigns each possible next word a likelihood based on how often it follows the preceding words within the sample data. In actuality, the models tend to be somewhat more complicated. This is in part because, even after training on millions of words, new pairs and triplets of words will appear regularly, even in data from the same source (say, the same newspaper from which the model was developed). This requires that the model be “smoothed” so that newly-appearing bigrams and trigrams won’t be assigned zero probabilities.
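As a concrete illustration, here is a minimal sketch of a word-level bigram model with add-one (“Laplace”) smoothing, the simplest smoothing scheme. The corpus is invented and tiny; real language models train on far more data with subtler smoothing.

```python
from collections import Counter, defaultdict

# Tiny invented corpus; a real model trains on millions of words.
corpus = "you are here . you are not a banana . the banana is here .".split()

unigram_counts = Counter(corpus)
bigram_counts = defaultdict(Counter)
for prev, cur in zip(corpus, corpus[1:]):
    bigram_counts[prev][cur] += 1

vocab = set(corpus)

def prob(cur, prev):
    # Add-one ("Laplace") smoothing: every bigram, seen or unseen, gets
    # a nonzero probability, so new word pairs don't break the model.
    return (bigram_counts[prev][cur] + 1) / (unigram_counts[prev] + len(vocab))

print(prob("are", "you"))     # high: "you are" appears twice
print(prob("banana", "you"))  # low, but not zero, thanks to smoothing
```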

Such models have many potential uses. For example, dictation software may find certain words in a sentence ambiguous. But an n-gram model can use the words that precede and follow ambiguous words to improve the chances of correct recognition. Certainly “ate the banana” is a much more likely trigram than “ate the bandana.”
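A sketch of that disambiguation, using trigram counts over an invented toy corpus to compare the two possible hearings:

```python
from collections import Counter

# Toy disambiguation: which trigram is likelier given the context?
corpus = ("he ate the banana . she ate the banana . "
          "he wore the bandana . she ate the apple .").split()

trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
contexts = Counter(zip(corpus, corpus[1:]))

def p_next(w1, w2, w3):
    # Conditional probability of w3 following "w1 w2" in the corpus.
    return trigrams[(w1, w2, w3)] / contexts[(w1, w2)]

print(p_next("ate", "the", "banana"))   # 2/3: the likelier hearing
print(p_next("ate", "the", "bandana"))  # 0/3: reject this hypothesis
```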

On the other hand, an n-gram collection doesn’t “understand” anything about the language it models. It would work just the same way on a collection of musical notes, or any other serially-ordered data. And while, as a model, it does well at capturing coherence from word-to-word, it does poorly at capturing any higher-level structure. This becomes quite apparent in the various attempts to use n-grams to generate language, rather than simply understand it. The foundational example, here, is Claude Shannon’s n-gram generated sentence from his field-defining essay, “A Mathematical Theory of Communication” (1948):

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.

This passage is more coherent than a random series of words, because each word follows quite naturally from the one preceding. This is no surprise, given that it was produced by flipping through a book, writing down a word, flipping through until that word was found again, writing down the word immediately following it, then flipping through until the just-written word was found, and so on — a word-level bigram. Longer chain lengths, of course, create longer stretches of local coherence. But the structure of sentences, paragraphs, and ideas is completely absent from the n-gram model.
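Shannon’s book-flipping procedure is easy to mechanize. Below is a minimal sketch of word-level bigram generation, seeded, fittingly, with the words of Shannon’s own sentence; as with his result, the output is locally coherent but has no higher-level structure.

```python
import random
from collections import Counter, defaultdict

# Shannon's book-flipping procedure, mechanized: after emitting a word,
# draw the next word according to how often it followed that word.
text = ("the head and in frontal attack on an english writer that the "
        "character of this point is therefore another method for the "
        "letters that the time of who ever told the problem for an "
        "unexpected").split()

followers = defaultdict(Counter)
for prev, cur in zip(text, text[1:]):
    followers[prev][cur] += 1

word = random.choice(text)
output = [word]
for _ in range(15):
    options = followers[word]
    if not options:  # dead end: the word was only seen at the end
        word = random.choice(text)
    else:
        word = random.choices(list(options), weights=list(options.values()))[0]
    output.append(word)

print(" ".join(output))  # locally coherent, globally structureless
```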

In other words, a statistical model is nothing magical. Like Google’s capturing of links from web pages, statistical models such as n-grams must be focused on well-defined features in consistent pools of data. Then, even once millions of patterns are captured, in any activity with significant variety (like human language), the model must be smoothed to account for the fact that previously-unseen patterns will continually appear. Even once this is accomplished, the model will only be appropriate for limited tasks — it will not be a robust artificial intelligence able to predict, identify, generate, and otherwise stand in for human knowledge of what it models.

The Restaurant Game

After moving to MIT, Jeff Orkin, who led the AI development for F.E.A.R., launched a project with an interesting goal — pursuing an idea from traditional AI using the tools of statistical AI. Specifically, his project The Restaurant Game seeks to develop what he characterizes as a modern version of the scruffy idea of a “restaurant script” (Orkin, 2007, 22). His method is to statistically analyze the behavior of thousands of players in a freeform restaurant environment and, thereby, learn a much more flexible and detailed script of restaurant behavior than it would be possible to author by hand.

Orkin calls his effort a “game,” a term that resonates in two senses. First, he plans to use the results of his work to develop a game — and all those who interact with the freeform environment will be credited as designers of the eventual game. Second, by presenting the experience as a game, he is placing The Restaurant Game into the same category as a number of successful efforts to use simple game structures to encourage large numbers of people online to collaboratively develop data that is intractable for machines but too laborious for individual humans.

The best known of these data-collection games is The ESP Game, developed by Luis von Ahn and collaborators at Carnegie Mellon University (von Ahn et al., 2005). This game shows the same image, simultaneously, to two audience members who don’t know each other’s identity and cannot directly communicate. These two people each type words, as quickly as they can, that might describe the contents of the image, avoiding any listed forbidden words. As soon as one player types a word that the other player has already typed — creating a match in their descriptions — they score points and the next image appears, until the game time expires.

People find The ESP Game surprisingly fun, with some experiencing an exciting feeling of connection at the moments when they choose the same word as a partner. This has provided the motivation for people to generate quite a bit of data through such games. The data may be used to label online images (to increase accessibility for the visually impaired) or toward more computationally challenging ends (e.g., as well-formed training data for statistical approaches to computer vision). Quite a few other games along these lines exist, and they differ in their particulars, but all of them use simple scenarios to generate very specific data points.

By comparison, The Restaurant Game is immensely free-form and ambitious. Each person who signs in plays a male customer or female waitress. Players can chat freely with each other — not trying to guess the same word, or taking part in any other game-like activity, but simply playing their roles. The environment has many objects with which the players can interact, including food and drink that can be consumed, a register that can be used to pay bills (but can also be picked up and perhaps stolen), menus, furniture, flowers on the tables, a variety of kitchen implements, and so on. There are also two non-player characters: a bartender who provides drinks and a cook who provides food.

Compared with the real world, The Restaurant Game is certainly a microworld — but it is also a vastly richer environment than that provided in previous data-collection games. Because it is a microworld, fewer actions are possible and those actions are unambiguous. The system knows exactly when one player passes another a menu, can tell exactly what words one player says to another, and can always tell whether all the food was gone before the waitress picked up the plate. On the other hand, the diversity of possible actions and sequences of actions makes the data collected by The Restaurant Game a challenge to interpret statistically.

Orkin’s approach is to focus on the physical actions that take place, looking at the ordering of actions and the relationships between them. Actions are clustered based on their supports, an approach that discovers certain similarities (all foods are eaten the same way) as well as preconditions built into the microworld (eating requires the presence of food) and also social expectations (almost all players eat while seated). Orkin’s analysis prunes away the most infrequent actions and sequences, building up what he calls a “plan network” — a structure representing the normal pathways of actions in The Restaurant Game’s microworld. The language from player interactions is arranged, in the model, in terms of the physical actions between which it took place.
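To make the shape of such a structure concrete, here is a loose sketch under invented data and an invented threshold (an illustration of the general technique, not Orkin’s actual code): count the action-to-action transitions observed across sessions, prune the rare ones, and keep the survivors as a directed graph of “normal” pathways.

```python
from collections import Counter

# Invented logs of per-session action sequences; Orkin's real data
# comes from thousands of Restaurant Game play sessions.
sessions = [
    ["sit", "read_menu", "order", "eat", "pay", "leave"],
    ["sit", "read_menu", "order", "eat", "pay", "leave"],
    ["sit", "order", "eat", "pay", "leave"],
    ["sit", "read_menu", "order", "eat", "leave"],  # dine-and-dash
]

transitions = Counter()
for actions in sessions:
    for a, b in zip(actions, actions[1:]):
        transitions[(a, b)] += 1

# Prune infrequent transitions, keeping only the "normal" pathways.
MIN_COUNT = 2
plan_network = {pair for pair, n in transitions.items() if n >= MIN_COUNT}

def is_typical(actions):
    """Flag a session if any step leaves the learned pathways."""
    return all((a, b) in plan_network for a, b in zip(actions, actions[1:]))

print(is_typical(["sit", "read_menu", "order", "eat", "pay", "leave"]))  # True
print(is_typical(["sit", "eat", "order", "pay", "leave"]))               # False
```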

The resulting plan networks have the strengths and failures one would expect. Some of these are very specific to the microworld. Orkin demonstrates this with a visualization of the waitress plan network, which captures the set of interface steps required to use the cash register to pay a bill — and also captures the path produced by a common player confusion about the register’s operations.

Other successes and failures are those common to statistical models. These were revealed in Orkin’s evaluation of the system, which used n-gram modeling as “a means of estimating the likelihood of new gameplay sessions, and quantitatively evaluating how well the learned model of restaurant behavior and language correlates with humans’ ” (69). Here we actually see a microworld version of the ideas behind Total Information Awareness — examining data from thousands of games, Orkin worked to build a statistical profile of the behavior, and then attempted to build a system that would automatically detect unusual behavior.
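A minimal sketch of this kind of evaluation, again with invented data and an invented cutoff: score each new session by its average log-likelihood under transition statistics learned from past sessions, and flag low scorers as unusual. (Orkin’s models also cover the language exchanged, which this sketch omits.)

```python
import math
from collections import Counter

# Toy sessions standing in for the thousands of logged games.
training = [
    ["sit", "order", "eat", "pay", "leave"],
    ["sit", "order", "eat", "pay", "leave"],
    ["sit", "order", "eat", "leave"],
]
counts = Counter()
totals = Counter()
vocab = {a for s in training for a in s}
for s in training:
    for a, b in zip(s, s[1:]):
        counts[(a, b)] += 1
        totals[a] += 1

def avg_log_likelihood(session):
    steps = list(zip(session, session[1:]))
    lp = 0.0
    for a, b in steps:
        # Add-one smoothing keeps unseen transitions from scoring -inf.
        lp += math.log((counts[(a, b)] + 1) / (totals[a] + len(vocab)))
    return lp / len(steps)

THRESHOLD = -1.5  # invented cutoff; would be tuned on held-out sessions
for s in (["sit", "order", "eat", "pay", "leave"],
          ["eat", "sit", "leave", "order", "pay"]):
    score = avg_log_likelihood(s)
    print(s, round(score, 2), "unusual" if score < THRESHOLD else "typical")
```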

This basic approach builds on some techniques that have proven very powerful in statistical AI, and Orkin reports some successes with it. For example, Orkin reports a play session in which a very typical set of actions took place. However, the system, correctly, rated the session as highly unlikely based on the language — the customer was trying to leave without paying and convince the waitress to come home with him. On the other hand, the system also responded in the same way to any conversation containing unusual vocabulary, even in games that humans rated as typical. As discussed earlier, this is the classic problem of false positives. Further, Orkin provides examples of sequences of physical actions — such as ordering beer and pie until they fill the restaurant, or sitting down, playfully refusing to order, and leaving — that the current version of the plan network is unable to detect as unusual, but are immediately strange to any human evaluator. This is the classic problem of false negatives.

Orkin had originally announced that a game created using the results from The Restaurant Game would be entered into the 2008 Independent Games Festival. But he had to let the deadline pass. In his thesis he writes that this provided a lesson in the “danger of promising the public a deliverable based on novel research by a specific date” (42). More generally, it provides a lesson about statistical AI. As an authoring tool, it requires large amounts of data that may not be available. Even if the significant effort needed to collect this data is expended, as with The Restaurant Game, for behavior with wide variation the results will be unpredictable and incomplete. Nevertheless, perhaps Orkin’s next step will find a way to make interacting with this model a fun, playful experience. In the meantime, the most significant contribution of The Restaurant Game is how it demonstrates that even in a microworld, stripped of most of the complexity of our own, statistical models produce many more false positives and false negatives than a TIA-style system could tolerate.