September 17, 2004
In contemporary commercial game design, natural language interaction is avoided like the plague. If the player needs to “talk” to characters in the world, designers typically employ menus (either dialog trees containing explicit dialog, or flat dialog action menus containing actions such as flirt, insult, etc.) or simply can the entire conversation behind a single talk command. Barring occasional experiments with limited speech recognition (e.g. Lifeline, Seaman, Babyz), developers are skeptical of natural language understanding (NLU), remembering the frustrations of the well-known parser failures of text-based interactive fiction, and noting that solving NLU in the general case requires human-level AI.
Ultimately, however, in order to create adult experiences containing rich characters addressing complex themes, games will have to use language, and thus will have to tackle NLU. Players will want and need to communicate a large set of possible meanings to the characters (and of course the characters, as well as the large scale structure of the game, should be responsive to those meanings). Any explicit choice approach to conveying this large range of meanings (e.g. dialog menus, discourse act menus, constructive interfaces that let you put together sentences out of parts) introduces a number of problems, including foregrounding the boundaries of the experience (the player immediately sees the full range of possibilities), making all choices appear equally salient, and making action selection unwieldy (and potentially unmanageable).
NLU interfaces avoid foregrounding the game boundaries because, rather than displaying an explicit list of all the possible meanings (game verbs) the player can perform, the player instead expresses what they want, in their own words, “selecting” an utterance out of the pool of all possible natural language utterances. Ideally, through gameplay, the player discovers the action potentials of the game; the space of possibility feels wide-open, and, in the best case, the illusion is created of a space of possibility much larger than is actually supported by the game. Additionally, when characters respond to the specific language the player writes, as opposed to responding to a menu choice, it makes the characters seem more alive, as if they really understand you. Of course, the strategy of creating these illusions is risky. When the NLU fails to recognize an input, or the design fails to plausibly support an action recognized by the NLU, the illusion comes crashing down; rather than feeling wide-open, the possibility space instead feels finite but unknown (frustrating), and the characters may seem even more mechanical than if NLU weren’t used. Thus, using NLU in games involves not only clever technology, but also clever design that can mask NLU failures, using strategies such as:
- situations in which characters may plausibly ignore some player utterances
- action trajectories and through-lines in which action can plausibly continue when an utterance is incompletely recognized
- hierarchical game verbs such that, when the NLU fails to recognize an utterance as a specific game verb, there are more general or generic verbs it may succeed in recognizing, where the character and game world responses to these more generic verbs still maintain plausibility and interest
And you never want the characters to say “I don’t understand”.
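The hierarchical game verb strategy can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the verb names, the keyword sets, and the keyword-matching "NLU" are all hypothetical stand-ins, not how Façade or any shipped game recognizes utterances. The point is just the fallback shape: the recognizer descends to the most specific verb it can match and otherwise settles for a more generic one, so there is always something for the characters to respond to.

```python
import re
from dataclasses import dataclass, field


@dataclass
class GameVerb:
    """A node in a (hypothetical) hierarchy of game verbs."""
    name: str
    keywords: frozenset = frozenset()
    children: list = field(default_factory=list)


# Hypothetical hierarchy: the root is the most generic verb and always matches.
SAY_SOMETHING = GameVerb("say_something", children=[
    GameVerb("criticize", frozenset({"hate", "awful", "terrible", "ugly"}), [
        GameVerb("criticize_decor", frozenset({"couch", "painting", "room"})),
    ]),
    GameVerb("praise", frozenset({"love", "nice", "great", "beautiful"}), [
        GameVerb("praise_decor", frozenset({"couch", "painting", "room"})),
    ]),
])


def descend(verb, words):
    """Return the deepest verb under `verb` whose keywords appear in `words`."""
    for child in verb.children:
        if child.keywords & words:
            return descend(child, words)
    # No more specific verb matched, so fall back to this one.
    return verb


def recognize(utterance, root=SAY_SOMETHING):
    """Map a typed utterance to the name of the most specific matching verb."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    return descend(root, words).name


print(recognize("I hate that painting"))    # criticize_decor
print(recognize("This is just awful"))      # criticize
print(recognize("So, about the weather"))   # say_something
```

The third utterance is the interesting case: nothing specific is recognized, but the game still gets a generic verb, and the design burden shifts to writing a character response to `say_something` that stays plausible and interesting.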
Fable, Peter Molyneux’s new game, illustrates the issues of equal saliency and large hierarchical menus. (Note that these comments are based on watching Peter demo Fable at GDC; I’d love to hear comments about the action interface from those who’ve actually played Fable.) Fable makes available to the player a large number of possible actions, including discourse acts. The player navigates hierarchical menus in order to select from this large set (the player can bind controller buttons to frequently used choices). In a laudable effort to increase player agency, and in the design style of open-world games, the large number of choices is available at all times. So, for example, the player can choose to fart at all possible moments during the game (at the two different presentations of Fable I saw, Peter took great pleasure in demonstrating farting, commenting on the British propensity towards fart humor; I have repeated the “fart demo” story a number of times, including here, revealing my own unnatural interest in farting). Now, though it is certainly possible to fart at all possible moments, in many gameplay situations it is not the most salient or important player action. But, in an explicit choice scheme, the fart action always looks as important, or as salient, an option as, say, giving another character an order, handing a character an object, and so forth. In a sense, the interface is constantly reminding you “you could fart now, you could fart now, you could fart now”, and of course reminding you of many other actions as well, most of which aren’t particularly important in any given gameplay situation. An NLU interface supports a large potential action space, allowing all actions at all times, without falsely presenting the actions as all being equally important at all times. Where an explicit choice scheme will tend to encourage a “poking” gameplay style, where the player pokes at the world by picking actions off the menu (e.g. “what happens if I fart here … how about here … how about here …”), an NLU interface encourages the player to consider “What do I want to do next?” (and if the player wants to poke, they can still poke).
Unfortunately, from an input perspective, while large hierarchical menus are ungainly to use, there is currently no good interface for inputting natural language either. Within real-time games, there are difficulties integrating the pace of typing with the pace of real-time action. In Façade, our strategy is to use “pseudo-real-time” for conversation; the characters talk at a normal speed, but when the player types, the characters act as if the player is talking at conversational speed, pausing and looking at the player attentively, even though the length of time the player may be typing is much longer than typical conversational time. And real-time issues aside, a keyboard seems inappropriate as an input device (typos are common, many players can’t type quickly, and it is difficult to pay attention to both typing and what is going on in the game world), but speech recognition is nowhere near good enough yet.
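One way to picture the “pseudo-real-time” idea is as a story clock that is decoupled from the wall clock during the player’s turn. The sketch below is a toy illustration, not Façade’s implementation; the class name, the method names, and the speaking rate of 2.5 words per second are all assumptions made for the example. Character speech advances story time by its real duration, while a typed utterance is charged only the time it would have taken to say aloud, however long the typing actually took.

```python
class ConversationClock:
    """Toy story clock for pseudo-real-time conversation (hypothetical design)."""

    def __init__(self, words_per_second=2.5):
        # Assumed conversational speaking rate, used to price the player's turn.
        self.words_per_second = words_per_second
        self.story_time = 0.0

    def character_speaks(self, duration_seconds):
        """Character dialog plays at normal speed, so story time tracks it."""
        self.story_time += duration_seconds

    def player_types(self, utterance, wall_clock_seconds):
        """Charge the player's turn as if it had been spoken aloud.

        `wall_clock_seconds` (how long the typing really took) is deliberately
        ignored: the characters simply wait attentively in the meantime.
        """
        spoken_seconds = len(utterance.split()) / self.words_per_second
        self.story_time += spoken_seconds
        return spoken_seconds


clock = ConversationClock()
clock.character_speaks(3.0)
clock.player_types("I think we should talk", wall_clock_seconds=20.0)
print(clock.story_time)  # 5.0
```

Here twenty seconds of typing cost only two seconds of story time, which is the gap the attentive pause has to cover on screen.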
However, despite the technological, game design and input device difficulties of using natural language, NLU will be an important approach in the future of gaming, and in fact will be necessary to create deeply interactive experiences that, in a sophisticated way, address the human condition. Making NLU work in games is an example of non-incremental game research; if we don’t start building experimental games with NLU now, we’ll never have NLU for commercial games.