September 17, 2004

Games and Natural Language Understanding

by Michael Mateas, 1:40 pm

In contemporary commercial game design, natural language interaction is avoided like the plague. If the player needs to “talk” to characters in the world, designers typically employ menus (either dialog trees containing explicit dialog, or flat dialog action menus containing actions such as flirt, insult, etc.) or simply can the entire conversation by providing a talk command. Barring occasional experiments with limited speech recognition (e.g. Lifeline, Seaman, Babyz), developers are skeptical of natural language understanding (NLU), remembering the frustrations of the well-known parser failures of text-based interactive fiction, and noting that NLU requires human-level AI to solve in the general case.

Ultimately, however, in order to create adult experiences containing rich characters addressing complex themes, games will have to use language, and thus will have to tackle NLU. Players will want and need to communicate a large set of possible meanings to the characters (and of course the characters, as well as the large scale structure of the game, should be responsive to those meanings). Any explicit choice approach to conveying this large range of meanings (e.g. dialog menus, discourse act menus, constructive interfaces that let you put together sentences out of parts) introduces a number of problems, including foregrounding the boundaries of the experience (the player immediately sees the full range of possibilities), making all choices appear equally salient, and making action selection unwieldy (and potentially unmanageable).

NLU interfaces avoid foregrounding the game boundaries because, rather than displaying an explicit list of all the possible meanings (game verbs) the player can perform, the player instead expresses what they want, in their own words, “selecting” an utterance out of the pool of all possible natural language utterances. Ideally, through gameplay, the player discovers the action potentials of the game; the space of possibility feels wide-open, and, in the best case, the illusion is created of a space of possibility much larger than is actually supported by the game. Additionally, when characters respond to the specific language the player writes, as opposed to responding to a menu choice, it makes the characters seem more alive, as if they really understand you. Of course the strategy of creating these illusions is risky. When the NLU fails to recognize an input, or the design fails to plausibly support an action recognized by the NLU, the illusion comes crashing down; rather than feeling wide-open, the possibility space instead feels finite but unknown (frustrating), and the characters may seem even more mechanical than if NLU wasn’t used. Thus, using NLU in games involves not only clever technology, but also clever design strategies that can mask NLU failures.

And you never want to say “I don’t understand”.
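To make the masking idea concrete, here is a toy sketch (the names and keyword tables are hypothetical, and nothing like Façade’s actual recognizer): free text maps to a discourse act, and when recognition fails, the character deflects in character instead of admitting failure:

```python
import random

# Toy keyword-to-discourse-act table; a real recognizer would be far richer.
DISCOURSE_ACTS = {
    "praise": ["great", "wonderful", "love", "nice"],
    "criticize": ["awful", "hate", "terrible", "stupid"],
    "greet": ["hello", "hi", "hey"],
}

# In-character deflections that absorb unrecognized input without
# breaking the illusion of understanding.
DEFLECTIONS = [
    "Hmm... anyway, let me show you the view from the balcony.",
    "Ha. You know, Grace was just saying something like that.",
    "Wait, hold that thought, I need to check on the drinks.",
]

def recognize(utterance):
    """Return the first discourse act whose keywords appear in the input."""
    words = utterance.lower().split()
    for act, keywords in DISCOURSE_ACTS.items():
        if any(k in words for k in keywords):
            return act
    return None

def respond(utterance):
    act = recognize(utterance)
    if act is None:
        # Never say "I don't understand": deflect and move the scene on.
        return random.choice(DEFLECTIONS)
    return "<react to discourse act: %s>" % act

print(respond("hello there"))         # recognized: greet
print(respond("quantum flapdoodle"))  # unrecognized: deflect
```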

Fable, Peter Molyneux’s new game, illustrates the issues of equal saliency and large hierarchical menus. (Note that these comments are based on watching Peter demo Fable at GDC; I’d love to hear comments about the action interface from those who’ve actually played Fable.) Fable makes available to the player a large number of possible actions, including discourse acts. The player navigates hierarchical menus in order to select from this large set (the player can bind controller buttons to frequently used choices). In a laudable effort to increase player agency, and in the design style of open-world games, the large number of choices is available at all times. So, for example, the player can choose to fart at all possible moments during the game (at the two different presentations of Fable I saw, Peter took great pleasure in demonstrating farting, commenting on the British propensity towards fart humor; I have repeated the “fart demo” story a number of times, including here, revealing my own unnatural interest in farting). Now, though it is certainly possible to fart at all possible moments, it is often not the most salient or important player action in a given gameplay situation. But, in an explicit choice scheme, the fart action always looks as important, or as salient, an option as, say, giving another character an order, handing a character an object, and so forth. In a sense, the interface is constantly reminding you “you could fart now, you could fart now, you could fart now”, and of course reminding you of many other actions as well, most of which aren’t particularly important in any given gameplay situation. An NLU interface supports a large potential action space, allowing all actions at all times, without falsely presenting the actions as all being equally important at all times. Where an explicit choice scheme will tend to encourage a “poking” gameplay style, where the player pokes at the world by picking actions off the menu (e.g. “what happens if I fart here … how about here … how about here …”), an NLU interface encourages the player to consider “What do I want to do next?” (and if the player wants to poke, they can still poke).

Unfortunately, from an input perspective, while large hierarchical menus are ungainly to use, there is currently no good interface for inputting natural language either. Within real-time games, there are difficulties integrating the pace of typing with the pace of real-time action. In Façade, our strategy is to use “pseudo-real-time” for conversation; the characters talk at a normal speed, but when the player types, the characters act as if the player is talking at conversational speed, pausing and looking at the player attentively, even though the length of time the player may be typing is much longer than typical conversational time. And real-time issues aside, a keyboard seems inappropriate as an input device (e.g. typos, many players can’t type quickly, and it’s difficult to pay attention to both typing and what is going on in the game world), but speech recognition is nowhere near good enough yet.
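As a rough sketch of how such pacing might be structured (a hypothetical API, not Façade’s actual engine), the core idea is to freeze only the dialog clock while the player composes an utterance, leaving gaze and gesture running in real time:

```python
class ConversationPacer:
    """Freezes character dialog, but not ambient behavior, while the
    player types. Character methods here are illustrative placeholders."""

    def __init__(self, characters):
        self.characters = characters
        self.player_typing = False

    def on_key_press(self):
        # First keystroke of an utterance: characters pause and attend,
        # as if the player were speaking at conversational speed.
        if not self.player_typing:
            self.player_typing = True
            for c in self.characters:
                c.pause_dialog()
                c.play_idle("attentive_listen")

    def on_utterance_committed(self, text):
        # Player hit return: resume normal pacing, hand the text to the NLU.
        self.player_typing = False
        for c in self.characters:
            c.resume_dialog()
        return text

    def tick(self, dt):
        # Gaze and gesture keep running in real time; only the dialog
        # clock is frozen while the player composes an utterance.
        for c in self.characters:
            c.update(dt, dialog_frozen=self.player_typing)
```

The design point is that attentive behavior, not dead silence, fills the typing gap, so the time distortion reads as listening rather than as a pause in the simulation.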

However, despite the technological, game design and input device difficulties of using natural language, NLU will be an important approach in the future of gaming, and in fact will be necessary to create deeply interactive experiences that, in a sophisticated way, address the human condition. Making NLU work in games is an example of non-incremental game research; if we don’t start building experimental games with NLU now, we’ll never have NLU for commercial games.

13 Responses to “Games and Natural Language Understanding”


  1. B. Rickman Says:

    I am skeptical (as always) that NLU will be able to avoid the foregrounding of boundaries to any great degree — one does, after all, spend a certain amount of time “foregrounding the boundaries” whenever one is in an unusual or unexpected situation. If I go to a Korean restaurant, my interactions with the waiter are limited by the things I think they will understand, in the extremes either speaking normal English or pointing at the menu and making hand gestures. Of course this isn’t a common, everyday situation, but neither is a user’s interaction with a new application.

  2. HeatherL Says:

    Yes. Absolutely. Fantastic.

    I am dying to get my hands on Fable, but I promised myself I would finish Beyond Good and Evil first. :)

One of my favorite games ever, Quest For Glory 1: So You Want to Be a Hero, used a simple NLU approach. It was a LucasArts/Sierra-style graphical adventure game, but used a text field at the bottom of the screen where players could type actions in. The game was later remade with Sierra’s more-or-less standard point-and-click-only interface (you clicked a hand icon, for example, then clicked on something on the screen to pick it up, touch it, etc.).

Even though in the original version of the game I would occasionally get a little flustered at not asking a question the right way and getting a standard “huh?” response from the NPC, it felt far superior to the point-and-click version in terms of immersion. They did a pretty good job with it, and I didn’t wind up with a “huh?” response that often. And yes, it did feel much more like I was exploring and uncovering a world than with a point-and-click menu interface for conversations.

I think this is a fantastic area of research, but I see commercial adoption of it as very, very far in the future. I really think that what will have to happen for a commercial game to adopt NLU is for there to be a readily available, proven, third-party package that developers can drop into their project and tweak infinitely with small amounts of effort on their part. At the moment, it is intimidating because it is very high-risk: a lot of time and money (man-hours) for something that may, if done well, make their game more appealing to the market or, if not done well, be a complete disaster.

  3. Nicolas Szilas Says:

    I would like to argue in favor of menu-based interfaces…

I understand the position that menu-based interfaces “falsely [present] the actions as all being equally important at all times” and that “an NLU interface encourages the player to consider ‘What do I want to do next?’”. However, I believe that a menu-based interface could also promote such behavior. If you have a real story, and if the available actions are “narrative” (“farting” might not be the most narrative action…), then the player will anticipate what the interface provides, and FIRST choose what to do, THEN find it on the interface.

Within the IDtension project, I am building a narrative engine able to carry out complex language actions (like dissuading a character from performing a given task). For testing purposes, I have chosen the most trivial menu-based interface: the user gets all possible actions in a list. I noticed, in myself and others, that at the beginning the user reads all the choices, but later thinks of an action first and then finds it on the interface.

It might be the case, paradoxically, that this good behavior was induced by the poor quality of this specific menu-based interface! But whatever the menu-based interface, the proposed actions will never be more salient than the actions “afforded” by the narrative itself and the previous interactions. So the challenge of menu-based interfaces also lies in the way the story, and the engine behind it, make it possible for the user to anticipate possible actions.

I have developed this idea of anticipation in a recent paper, presented at TIDSE’04 (I will make this paper available on my website soon). This topic deserves its own thread!!

Another reason why menu-based interfaces should not be discarded too quickly is that an “explicit choice approach” does not necessarily “[make] all choices appear equally salient”, at least in principle. I can imagine, for example, a two-level interface, where the most salient actions would be proposed in the foreground, and all the rest would be left in the background. I am not promoting this approach (it raises other issues, see the paper mentioned above), but this example shows that the problem could also be avoided through an innovative design of the interface.
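    To make the two-level idea concrete, here is a rough sketch (hypothetical action names and salience scores, not from IDtension): rank the available actions by narrative salience, foreground the top few, and keep the rest reachable behind a “more…” submenu:

    ```python
    # A sketch of a two-level salience menu. All actions stay available;
    # only the most narratively salient ones are presented up front.

    def split_menu(actions, salience, foreground_size=4):
        """actions: list of action ids; salience: dict action -> score."""
        ranked = sorted(actions, key=lambda a: salience.get(a, 0.0),
                        reverse=True)
        return ranked[:foreground_size], ranked[foreground_size:]

    actions = ["dissuade_guard", "give_sword", "ask_about_quest", "flirt",
               "insult", "fart", "order_attack"]
    salience = {"dissuade_guard": 0.9, "ask_about_quest": 0.8,
                "give_sword": 0.6, "order_attack": 0.5}  # rest default to 0

    foreground, background = split_menu(actions, salience)
    print("Menu:", foreground)     # most narratively salient right now
    print("More...:", background)  # still reachable, just backgrounded
    ```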

To sum up, thank you, Michael, for raising the issue of “equal saliency”… but this issue might find solutions in the framework of “explicit choice interfaces”. So both NLU interfaces and explicit interfaces should be investigated for language-based games.

  4. andrew Says:

I’m joining this conversation a bit late; work has been extremely busy for me the past two weeks and I’m just now coming up for air.

First I’d like to point to a similar plea to integrate natural language into games, which we made in the introduction of our 2003 GDC paper (pdf). Also, in a post just before this one we were discussing similar issues in the context of text-based IF, relevant to this thread.

    Another way of describing a preference for open-ended text input versus menu interfaces is that the former is more natural. For interactive characters/drama/fiction, I’m all for natural, intuitive interfaces. I find them more immersive and conducive to suspension of disbelief, as well as more elegant, minimal and efficient. Interface is so important to the overall quality of an interactive experience.

Michael describes how NLU failures are a risk with open-ended text input. Another risk is the potential of an overly-extreme mapping of language input into too few player moves. If the millions of possible natural language inputs all just get funneled into a very small number of player discourse acts, then the interface may feel muddy and unclear. That is, if too many significantly different things I type end up causing the same response from the system, because there are too few responses authored, it becomes hard for players to tell what kind of effect they’re having. The player observes that they caused something to happen, but they don’t know why; consequently their sense of agency is diluted. Whereas if they were explicitly presented with only a few discourse acts in a short menu list, this diluted sense of agency wouldn’t occur.

I think it’s going too far to say “there is currently no good interface for inputting natural language”. IMO the interface we developed for Façade actually works pretty well in this regard. It’s close to real-time, and similar to live text chatting, which is pretty natural as computer interfaces go. I think the keyboard is actually a reasonable input device for language; if you’re a careful typist and decent speller, typos aren’t much of a problem. (A system could easily do real-time spell checking too, although we didn’t have time to implement this in Façade.) But, I’m biased; we’ll have to wait for public opinion on this to know if this goodness claim is accurate.

I agree with Heather that for developers to begin adding natural language to their games, they will need a robust conversational NLU middleware solution that is easy to plug into their existing game engines, that comes with a really good, broad and robust default library of concepts it can understand, and that is easy to extend and customize to a specific domain. I think this is do-able.

Nicolas, it’s true that a menu of choices could be pruned or sorted to only offer or highlight choices that have narrative relevance. Another interactive story practitioner, Chris Crawford, strongly advocates such an approach. There are two main disadvantages I can think of with that approach, which perhaps you mention in your paper (I haven’t read it yet; I have the TIDSE04 proceedings, but my copy isn’t on hand as I write this):

    One, even with a long menu list of relevant things to say, there’s probably no way it will have exactly what I want to say on it. And the longer the list is, to increase the chances that it will contain what I want to say, the more cumbersome it will be to use. Also, as a player, I won’t always want to say relevant things; I’ll sometimes want to act rebellious or crazy, to push past the boundaries, to try to break it. Take that freedom away from me, and I’ll tend to feel straitjacketed. (Anyone read Notes from Underground?)

    Two, an interface that keeps changing on me is probably going to be confusing. If the list of things to say is always in flux, it’s a lot of work for the player to keep up with it. Players are usually happier when the interface is consistent and predictable.

In an attempt to think outside the box for a moment, here are a couple of alternative ideas to consider.

    – Is language an unavoidable requirement for interactive experiences to more deeply address the human condition? For example, could there be a purely physical gesture-based interface, as part of an interactive Buster Keaton-style “silent” drama, or perhaps a wordless interactive comic, that still offers players deeper, meaningful expression?

– Could one design an abstract language interface, for example, allowing players to express themselves in non-literal but still meaningful verbal sounds and utterances, a la the Sims’ gibberish speech, also reminiscent of the “wahh wahh wahh” of the parent and teacher characters in the Charlie Brown TV cartoons? A sort of “gestural speaking”? (This is different from Crawford’s notion of creole-like “dramatic sublanguages” for interactive story.)

    —-

    A few possibly useful additional links:

    Just this week, ALICE won this year’s Loebner Prize competition.

    Mark Marino is continuing the discussion on framing chatterbots in new ways, at the post Unconscious Thinking.

    A version of Rob Zubek’s The Breakup Conversation is now available for download. I haven’t had a chance to play it yet, but hope to soon.

Our recent Façade NLU paper is now online (which, by the way, won Best Paper at TIDSE04). We also just put online an expanded, book-chapter version of an earlier ABL paper (pdf).

  5. Rob Says:

    I’m joining the thread a bit late as well, but as you know, it’s a topic near and dear to my heart. :)

I agree that NL interactions are much more natural than menu interfaces – it’s been my experience as well. And I second the criticisms raised so far, especially about the asymmetry between what the player may want to talk about and what the system actually understands.

To these concerns, I’d also like to add one more, from the design side. I’ve become increasingly concerned with the problems caused by the layer of abstraction between the surface text and the agent AI – such as a layer of speech acts or other intermediate representations. Not that I can think of a better solution: authoring anything complex without such an abstraction would be completely infeasible.

But intermediation reduces transparency. Errors can now occur at either stage of intermediation: translating surface text into communicative acts, and then using those acts to ‘understand’ the given situation. And when things go wrong, the player gets very little information about what happened.

Menu interfaces ‘fix’ this and other problems by removing the first layer – the player operates directly on the level of communicative acts, filtered a priori based on relevance to the situation.

    Still, removal of a naturalistic interface is a drastic step to take just to increase transparency. Especially since humans clearly make the same kinds of errors, and yet we manage to communicate just fine without resorting to pop-up menus. :)

My bet is on error-recovery: that we have really good recovery and feedback mechanisms. Even if the information is missing or incorrect, we can figure out what it should be based on the situation, and proceed from there, giving the other party a lot of feedback along the way. Can we build artificial agents that have enough knowledge to recover in human-like ways?

    I know, it’s playing perilously close to the common sense tarpit. :) But my intuition is that, for limited domains such as games, the answer will turn out to be affirmative.
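    As a purely illustrative sketch of what such recovery might look like in a limited domain (the thresholds and act names here are made up): when recognition confidence is low, lean on what the current scene makes likely, and echo the guess back as in-character feedback the player can confirm or correct:

    ```python
    # Confidence-gated interpretation with situational fallback.

    def interpret(act, confidence, expected_acts):
        """expected_acts: acts the current scene makes likely, best first."""
        if confidence > 0.8:
            return act, None                  # confident: just proceed
        if confidence > 0.4 and act is not None:
            # Tentative: proceed, but echo the interpretation as feedback
            # so the player can push back if we guessed wrong.
            return act, "So you're saying... %s?" % act.replace("_", " ")
        # Recognition failed: fall back on situational expectation.
        guess = expected_acts[0]
        return guess, "Wait, is this about %s?" % guess.replace("_", " ")

    # Scene: an argument about the broken engagement.
    print(interpret("blame_partner", 0.9, ["blame_partner", "apologize"]))
    print(interpret(None, 0.1, ["blame_partner", "apologize"]))
    ```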

  6. Nicolas Szilas Says:

About the “menu of choice” solution: I would like to clarify that this is not a solution I especially recommend… (a “menu of choice” is only one possibility among all menu-based interfaces). The disadvantages you mention, Andrew, are real, and I would like to comment on them:
– “the longer the list is, to increase the chances that it will contain what I want to say, the more cumbersome it will be to use”: this is a main challenge for these interfaces: making it practical to choose among dozens of possible actions… In the paper mentioned above, I call this the “choice problem”. Btw, I made the paper available at:
    http://www.idtension.com (click on “Publications”). This paper tries to discuss these issues with a certain formal approach.
– “an interface that keeps changing on me is probably going to be confusing”. Absolutely. Being able to anticipate the proposed choices is mandatory; otherwise the player cannot properly construct his/her own “possible worlds”, as s/he does for any narrative.

Rob explains that menu-based interfaces suppress the surface-text layer, which makes them less natural. I agree. These interfaces are also more ludic (playful), in the sense that the player is playing with explicit possibilities rather than living/exploring/improvising new situations (or, in Roger Caillois’s terms, it is more “ludus” than “paidia”). Maybe that is not so bad, because the player properly manages the set of possibilities and plays with it. So I wonder if it would be desirable and possible to combine the two kinds of interfaces: an NLU interface which would make the player aware of the speech acts s/he is allowed to produce… maybe by showing the speech acts… I know this idea might sound strange, because it makes the NLU interface less natural, but it might solve problems of classical NLU interfaces.

To answer Rob’s question (“Can we build artificial agents that have enough knowledge to recover in human-like ways?”): I believe this has been studied in research on Human-Computer Dialog. I suspect that the practical results are limited, given that games are not such a limited domain compared to the very narrow domains in which these dialog systems are used (tourist information, train ticket reservation, etc.). I guess this is another example of “narrow and deep” research which is not easily applicable to games…

  7. andrew Says:

    A follow-up to my musing about a physical gesture interface — a blurb on Gamasutra mentions an upcoming CNN article about the future of games, in which Will Wright “argues that micro-gestures of hands and fingers, much as used in the current mouse, make for much more interesting game control, and suggests: ‘I think this new spatial/gesture language represents the most probable future of input devices.'”

  8. Ian Bogost Says:

    I’m going to play devil’s advocate a bit here.

    The positions you (Michael and Andrew) put forward here — while relevant and worthwhile — are clearly utterances from deep within your specific design paradigm, namely that of the particular kind of interactive drama to which you have devoted significant time and effort (and to much productive gain, I might add).

    Let me draw attention to two examples that illustrate how your arguments are grounded at least as much in design philosophy as in technology research:

    Michael:
    [NLU] will be necessary to create deeply interactive experiences that, in a sophisticated way, address the human condition.

    Andrew:
    Another way of describing a preference for open-ended text input versus menu interfaces is that the former is more natural.

The assumption at work here, namely that natural language interfaces are somehow closer to human understanding than other kinds of interfaces, is by no means a foregone conclusion.

    I could start spilling obtuse and decidedly non-“natural” sounding Derrida all over this argument (what about poetic language?), but instead I’ll “gesture” toward my own design philosophy which, while not opposed to NLU, is certainly more focused on representative expression over semantic expression. I have less interest in creating mimetic representations of human interactions than symbolic ones. In such cases, the hard work of human understanding gets left to the human, not by virtue of some missing cog, but by design. Are the (apparent) failings of Fable to be blamed on an ill-conceived input system, or an ill-conceived design methodology (or something else)? I haven’t even seen the game yet so I won’t venture a guess.

    Andrew:
    Could one design an abstract language interface, for example, allowing players to express themselves in non-literal but still meaningful verbal sounds and utterances, a la the Sims… ?

The objection I’d raise to this interesting challenge surrounds its insistence on verbal sounds and utterances, instead of representational, encapsulated ones. Perhaps it’s my lyric poetry background talking, but I don’t think that we will necessarily find rich computational expression at the corner of Morpheme St. and Phoneme Rd.

    I applaud the excellent use of NLU in Facade. I see many viable uses for it. I find myself personally interested in using it, especially now that Michael and I can collaborate so readily. I understand and appreciate Michael’s position on non-incremental games research. But, sorry guys, I take issue with the thesis that all paths from here to sophisticated representations of human expression pass through a language parser.

  9. andrew Says:

Ian writes, “The assumption at work here, namely that natural language interfaces are somehow closer to human understanding than other kinds of interfaces, is by no means a foregone conclusion.”

    Well, I wouldn’t want to make a hard and fast claim that natural language is the most effective interface for interactive experiences about “the human condition” (a term I think we all wish could be improved upon) — but, NL seems a primary one to work towards, from a designer’s point of view, ignoring for a moment the technical hurdles. That is, if you’re going to create an interactive experience about people as individuals, it makes sense to create an interface to allow players to interact as individuals normally do, with language, face to face.

    However, if you’re going to create an interactive experience about people as groups or societies, such as politically-oriented games, or games where you play the role of a god overseeing a community (e.g., the Sims, Black and White), then a higher-level interface is more appropriate. Political decision makers and gods make high level decisions, they don’t primarily have conversations with individuals. (Perhaps we wish our politicians and gods had more conversations with us individuals, but they don’t.)

It’s not specific enough to say “create adult experiences containing rich characters addressing complex themes” — we need to further say how “zoomed in” on the people these interactive experiences are meant to be. So, Ian, I think you’re right: there are paths to achieve the broad concept of “sophisticated representations of human expression” without NL.

    On the idea of “gestural speaking” — it’s intended to be a solution for individual-to-individual communication, easier to implement than full-on natural language, yet still meaningful and effective.

    —-

Rob wrote, “But intermediation reduces transparency. Errors can now occur at either stage of intermediation: translating surface text into communicative acts, and then using those acts to ‘understand’ the given situation. And when things go wrong, the player gets very little information about what happened.”

Well, it seems to me that overly simplistic or mistaken interpretations of surface-level communication happen in people all the time. I don’t have a problem with AIs building intermediate, simplified/abstracted interpretations of the player’s utterances, with errors in those abstractions. But you’re right that the system could give much more feedback about exactly what its interpretation is. In an attempt to de-muddify one of the messier parts of our drama, before reacting, we have our characters first say a line back to the player stating how they interpreted what the player just said. E.g.,

    PLAYER
    You’re so pessimistic!

    GRACE
    Rob, you — you think I’m depressed? Well, yes, considering all I’ve been through in this marriage, it’s no wonder I can’t get out of bed.

    At this point in the drama, from her pool of available behaviors, Grace has one designed to respond to the discourse act “player thinks grace is depressed”. But we don’t have a response specifically about being pessimistic. We’ve necessarily mapped pessimistic (and several other related, plausible concepts, such as “stressed”) into the concept of depressed. This is imperfect. But Grace’s initial line at least tells the player how she has abstracted/interpreted what they said. And in fact, this act of abstraction/interpretation could be seen as interesting content, because after all, it’s behavior — it says something about Grace that she considers pessimism a depressing idea. One could imagine doing a different mapping for a different character.
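    In code form, the mapping-plus-reflection move might look something like this (toy data and names, not Façade’s actual representations):

    ```python
    # Character-specific funneling: the mapping itself is content; it says
    # something about Grace that "pessimistic" reads to her as "depressed".
    GRACE_CONCEPT_MAP = {
        "pessimistic": "player_thinks_grace_is_depressed",
        "stressed":    "player_thinks_grace_is_depressed",
        "depressed":   "player_thinks_grace_is_depressed",
    }

    REFLECTION_LINES = {
        "player_thinks_grace_is_depressed":
            "You -- you think I'm depressed?",
    }

    def react(surface_concept, player_name):
        act = GRACE_CONCEPT_MAP.get(surface_concept)
        if act is None:
            return None  # fall through to a deflection strategy instead
        # State the interpretation first, so the player sees the mapping...
        line = "%s, %s" % (player_name, REFLECTION_LINES[act])
        # ...then trigger the behavior authored for the abstracted act.
        return line, act

    print(react("pessimistic", "Rob"))
    ```

    A different character would get a different concept map, so the same funneling machinery doubles as characterization.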

    Your suggestion of more error-recovery is also very important, of course.

Nicolas, your thought of always showing the system’s underlying interpretations, as a way to clarify what’s going on, would be kind of interesting. I could imagine an interactive drama where you have the ability to mind-read the characters’ thoughts, to fully see how they’re interpreting and reasoning over what you’re saying to them. It could be fun in the same way that the a-life game Creatures allowed players to see inside the metabolism and (simple) neural net of fantasy virtual pets.

    —-

    On the topic of NL, I’ll throw in a link to what looks like a book worth checking out from MIT Press: “In Ontological Semantics, Sergei Nirenburg and Victor Raskin introduce a comprehensive approach to the treatment of text meaning by computer. Arguing that being able to use meaning is crucial to the success of natural language processing (NLP) applications, they depart from the ad hoc approach to meaning taken by much of the NLP community and propose theory-based semantic methods. …”

    Also, this new book looks good (slightly off topic, but worth a mention): The Turing Test: Verbal Behavior as the Hallmark of Intelligence, edited by Stuart Shieber.

  10. michael Says:

    Andrew, I like the distinction of a “zoomed-in” experience to distinguish when NLU may be particularly appropriate! Yes, for god and sandbox games, which can address the human condition, natural language is not necessarily the way to go. But I was so strong in my original post because currently, in the game industry, there’s a belief that natural language is generally inappropriate for any kind of game, a belief that will hold back the development of interactive drama (which is a “zoomed-in” experience) as a genre.

    I bought The Turing Test by Stuart Shieber a while ago. It’s a nice essay compilation of philosophical responses to the adequacy of the Turing Test as a measure of (really a definition of) intelligence.

    I haven’t read Ontological Semantics, but the reference to “theory-based semantic methods” immediately makes me nervous about the applicability of the methods to broad, complex domains.
Based on my knowledge of other formal approaches to NLP, “theory-based” generally means either that you can prove nice theorems about the formalism, or that the formalism is strongly based on some cognitive theory of language processing, or that the theory, at a high level, claims to capture the totality, the essence, of language. But such formalisms, when implemented as computer programs, tend to only work in small, clean micro-domains (e.g. blocks world, or the world of making airline reservations), not in broad, scruffy, complex domains (e.g. the world of office politics, or the world of a marriage falling apart).

  11. Ian Bogost Says:

    Michael and Andrew — Don’t get me wrong. I don’t even really disagree with you. The idea that natural language is inappropriate for any kind of game is a silly sentiment, clearly motivated by more-of-the-same market goals more than innovation. And I was serious about my personal interest and belief in the future of NLU. Just a couple weeks ago, Michael and I were looking at Animal Crossing on my home GC and noting that one of the design flaws of that game has to do with its failure to implement any sort of NLU for the written missives that are in some ways central to that game experience.

    What I’d push back on is the idea that NLU is the necessary apotheosis of an innovation strategy for such experiences. Certainly it is one extremely viable strategy, and an especially viable one (perhaps indeed a requisite one) for the kind of interactive drama you are interested in.

However, I do not think that game experiences that address the human condition without NLU are necessarily limited to god and sandbox games. I think we may be hitting a genre wall here; to my ears, your claims sound a bit like a filmmaker telling a poet that his form is incapable of creating credible representations of complex human experience. I know this isn’t what you intend, but it seemed a worthwhile point to bring up in this casual forum. Perhaps it’s just a question of implementation strategy, and in order to even get to those problems, I agree that we need to commit to the technology challenge of NLU as a long-term effort.

  12. Grand Text Auto » Pictures from the Phront Says:

    […] ons (which I may blog about in more detail at a later time), included the pros and cons of natural language vs. artificial languages, discrete space (s […]

  13. Grand Text Auto » New Interactive Drama in the Works (Part 3): NLU Interfaces Says:

    […] d or ignored by the game. But there are many drawbacks with context-dependent menus, that we’ve blogged about before. A major one is that the p […]
