February 27, 2008
The finite-state machine is much like the quest flag or the dialogue tree. Each is conceptually simple, easy to implement, places low demand on system resources, and — over a certain level of complexity — becomes difficult to author and prone to breakdown. A quick look at the structure of FSMs shows the reasons for this.1
An FSM is composed of states and rules for transitioning between states. For example, an FSM could describe how to handle a telephone. In the initial state, the phone is sitting on the table. When the phone rings, the FSM rules dictate a transition to picking the phone up and saying “Hello.” If the caller asks for the character who answered, the rules could say to transition to a conversation state. If the caller asks for the character’s sister, the transition could be to calling the sister’s name aloud. When the conversation is over (if the call is for the character who answered the phone) or when the sister says “I’m coming” (if the call is for her) the phone goes back on the table.
As promised, all of this is quite simple. But what happens when the situation becomes more complex? We might add caller ID, after which the character needs to check who the call is from before answering (and sometimes not answer). We might add call waiting, transitioning to the “Hello” state from the conversation state (rather than from the phone on the table state) and then running a second conversation that, when it ends, returns to the previous conversation state (rather than putting the phone down on the table). We might add a check to see if the building is on fire, in which case the character asks for help from anyone who calls (or perhaps only from local callers). We might add a check to see if the character is deep in conversation with another physically-present character, in which case the ringing phone leads them to say, “Let’em leave a message” (rather than transitioning to the “Hello” state or the caller ID check). Now imagine all of the character’s behavior organized into a giant, interconnected mass — covering phone conversations, in-person conversations, eating and drinking, sleeping, working, playing games, reading, visiting museums, getting into traffic accidents, all the possible transitions between them, and all the possible combinations (playing a game while drinking, getting into an accident while talking). While authoring such an FSM would be possible, in practice the complexity would be a huge challenge to manage, and any change would threaten to produce unexpected results.
For this reason, game behavior authors have sought to manage FSM complexity in different ways. Jeff Orkin, in his work on the game No One Lives Forever 2 (Hubbard et al, 2002), gave characters sets of goals that compete for activation (Orkin, 2006). Each goal has its own FSM, of a much more manageable size than an FSM for controlling the character’s entire behavior, and the rules for transitioning between these goal-specific FSMs are decoupled from the FSMs themselves (instead handled at the level that determines the current highest-priority goal). Orkin describes the results for non-player characters (referring to them, in game industry parlance, as “A.I.”).
[E]ach goal contained an embedded FSM. There was no way to separate the goal from the plan used to satisfy that goal. . . .
Characters in NOLF2 were surrounded by objects in the environment that they could interact with. For example someone could sit down at a desk and do some work. The problem was that only the Work goal knew that the A.I. was in a sitting posture, interacting with the desk. When we shot the A.I., we wanted him to slump naturally over the desk. Instead, he would finish his work, stand up, push in his chair, and then fall to the floor. This was because there was no information sharing between goals, so each goal had to exit cleanly, and get the A.I. back into some default state where he could cleanly enter the next goal.
Of course, adding more states and transitions to the Work FSM could have addressed this. But pursuing such a path in general would have raised the same complexity issues that motivated the decision to compartmentalize the NOLF2 FSM for each goal. The problem lies in the compartmentalization strategy itself as a method for addressing the complexity of character behavior.
Ken Perlin, who pioneered computer graphics based on procedural textures and animations, describes the consequences of a similar problem in a rather different sort of game — The Sims (Wright et al, 2000):
Playing The Sims is lots of fun, but one thing conspicuously lacking from the experience is any compelling feeling that the characters are real. Much of this lack comes from The Sims’s reliance on sequences of linear animation to convey the behavior of its characters. For example, if the player indicates to a Sims character that the character should feed her baby, then the character will run a canned animation to walk over to the baby’s bassinet, pick up the baby, and make feeding movements. If the player then tells her to play with the baby, she will put the baby down, return to a previous position, then begin the animation to approach the bassinet, again pick up the baby, and start to play. One result of this mechanical behavior is that there is no real possibility of willing suspension of disbelief on the part of the player as to the reality of the character. (2004)
In The Sims the problem comes from something other than the logic of FSMs. Rather than animations driven by FSMs that are segregated according to character goals, in The Sims actions and their animations are compartmentalized via smart objects (e.g., a “shower object” contains information about its impact on the Sim and how it is used, including a pointer to the animation involved). This is another approach to managing complexity using a simple structure with limited points of interconnection — one which I will discuss further in a later chapter. This form of compartmentalization made it possible to manage the complexity created by the massive number of expansion packs for The Sims (each of which introduced many new objects) but also, as Perlin observed, resulted in breakdowns similar to those seen in NOLF2.
Avoiding these compartmentalization-driven breakdowns requires an approach to managing complexity that doesn’t isolate each action from the rest of the world. As it happens, I have already outlined such an approach: Tale-Spin’s. Each planbox in Tale-Spin is able to access a shared memory space — and a character only attempts to alter the world to suit planbox preconditions if those conditions aren’t already met. So, for example, a Tale-Spin planbox for playing with a baby might have a precondition of holding the baby. If the Tale-Spin character is already feeding the baby, rather than returning to a neutral state (which involves putting down the very same baby), the planbox would check memory, see the precondition was already met, and move on to the next step.
Tale-Spin was far from alone in employing planning based on actions, with preconditions checked against a central memory, intended to move the world from the current state to a goal state. This was already a widely practiced AI technique in the 1970s, one that “scruffy” AI took in a particular direction, rather than an approach initiated by Schank and Abelson. An earlier form, developed at the Stanford Research Institute, formed the basis for Orkin’s response to the character behavior and authoring problems of NOLF2.
Strips and F.E.A.R.
In 1968, Douglas Engelbart’s Augmentation Research Center — at the Stanford Research Institute (SRI) — stunned the computing world with a demonstration that combined the first public showings of the mouse, hypermedia, and teleconferencing (Engelbart and English, 1968). The research that followed this “mother of all demos” laid the foundations for the modern computing environment. Meanwhile, Engelbart’s colleagues in SRI’s Artificial Intelligence Center were pursuing research that, while it shaped the agenda for decades of work in AI, isn’t nearly as well known or recognizable in our daily lives. This project involved an aptly-named robot called “Shakey” — a small wheeled cart, carrying a boxy set of control systems, topped by a tower holding range finders, a television camera, and a radio link antenna. An SRI promotional film from that time shows Shakey moving around a brightly-lit space of right-angled walls populated by two shapes of blocks, its tower bouncing from the sudden stops at the end of each action, while Dave Brubeck’s “Take Five” repeats in the background (Hart, Nilsson, and Wilber, 1972).
Shakey’s antenna allowed it to connect with what, for the time, was a large computer. This computer controlled Shakey using a program called Planex. This program, in turn, guided Shakey through the execution of plans developed by another program called Strips (for Stanford Research Institute Problem Solver). Strips carried out laborious planning processes developed using stored knowledge about Shakey’s physical environment. Planex used a basic representation of the plan and the reasoning behind it to attempt to guide Shakey to the goal state, addressing problems along the way. These problems included finding doors blocked, blocks moved, and registration mismatches between the actual world and the world model in memory (the possible range of which increased with every movement).2
Rather than something like Schank’s conceptual dependency expressions, the Strips system stored the state of the world using first-order predicate calculus. This is a type of formal logical representation that, among other things, allowed Strips to use theorem proving in its planning operations — for example, to see if an action’s preconditions had already been met. Rather than searching exhaustively for a set of actions that could move the world from the current state to the goal state, Strips instead — following a “means-end analysis” strategy widely used since the seminal General Problem Solver (1957) — worked backwards from a goal state. This backwards planning movement depended on a library of actions, each classified according to its preconditions and effects. Strips identified actions with effects that could move the world to the goal state, the preconditions of those actions were identified as subgoals, other actions were found to achieve the subgoals via their effects, the preconditions for those actions were identified, and so on. As one of the system’s architects, Richard Fikes, puts it:
When STRIPS finds an acceptable plan, it has computed the world model anticipated before and after each action in the plan, has proven that the preconditions of each action in the plan are true in the model anticipated at the time of the action’s execution, and has proven that the task statement is satisfied in the model anticipated after completion of the plan’s execution. (1971)
The basics of this proof, especially the preconditions and effects of each action, were the materials used by Planex to guide the movements of Shakey. But the Strips model of planning has also been adopted by very different systems, in which the roles of Planex and Shakey disappear or are transformed significantly. An example is Orkin’s system for character behavior that took the next step after NOLF2: the AI system for F.E.A.R.
F.E.A.R. (Hubbard et al, 2005) is a first-person shooter in which the player is a new member of an elite unit — “First Encounter Assault Recon” — assigned to deal with unusual threats. Like non-player characters in NOLF2, the AI enemies in F.E.A.R. have goals that compete for activation, such as KillEnemy, Dodge, and Goto. But rather than each goal having an embedded FSM, each goal can be reached by sequencing actions. As in Strips, these actions have preconditions and effects, and the effects satisfy goals (e.g., AttackFromCover and AttackMelee both satisfy the KillEnemy goal).
This may seem like a simple change, but it has a powerful impact. Because different actions can be chosen to satisfy goals — and the state of the world (through preconditions) can influence the choice of action — a character at a desk could choose an action to satisfy a Death goal that is appropriate to someone sitting. Similarly, the fact that actions are not compartmentalized makes it easy to author new logical conditions on actions. Orkin describes an example of this:
Late in development of NOLF2, we added the requirement that A.I. would turn on lights whenever entering a dark room. In our old system, this required us to revisit the state machine inside every goal and figure out how to insert this behavior. This was both a headache, and a risky thing to do so late in development. With the F.E.A.R. planning system, adding this behavior would have been much easier, as we could have just added a TurnOnLights action with a LightsOn effect, and added a LightsOn precondition to the Goto action. This would affect every goal that was satisfied by using the Goto action.
Orkin also made certain changes to the Strips model. One involves costs for actions. Rather than simply choosing the shortest chain of actions toward a goal, F.E.A.R. chooses the lowest cost. Making the generic Attack action more costly than the two actions GotoNode and AttackFromCover causes AI enemies that have available cover prefer the safer option (Orkin, 2005).
F.E.A.R. also differs from Strips in the presence of squad behaviors. These look for AI characters to fill slots in coordinated plans. If they preconditions are met, the AI characters are given goals that cause them to fulfill their roles — for example, one giving support fire while the others advance to cover that is closer to the player. In addition, and this is where F.E.A.R. shines as an authoring effort, squad dialogue behaviors are used to address the Tale-Spin effect.
Rather than a character behavior system that only communicates to the player through animation (leaving open the possibility that the player will assume the system is as simple as that employed in other games) F.E.A.R.’s enemy squads are in verbal communication with each other in a manner that exposes some of the internal state and workings of the system. When I played F.E.A.R.,3 I knew the AI characters had lost track of my position when one asked, “Do you see him?” and a another replied, “Quiet down!” When I’d hidden particularly well I could hear them ordered, first, to search the area — and, eventually, to return to their posts. Dialogue also reveals planning-developed decisions about heading for cover, advancing, and other activities. It even explains non-activity, as Orkin describes:
If an A.I. taking fire fails to reposition, he appears less intelligent. We can use dialogue to explain that he knows he needs to reposition, but is unaware of a better tactical position. The A.I. says “I’ve got nowhere to go!” (2006)
In my experience, Orkin’s adaptation of 35 year old AI techniques was a great success. By comparison, the AI enemies in even the best games released around the same period (such as Half-Life 2, 2004) feel like part of a shooting gallery — more moving parts of the landscape than characters. That this was accomplished with limited development resources demonstrates the power of even quite dated AI approaches for modern digital media authoring.
But the experience appears to have left Orkin looking for something more. After F.E.A.R. he left industry for graduate study at MIT’s Media Lab. In his 2006 paper on F.E.A.R. he describes the goal of his MIT research group as creating “robots and characters that can use language to communicate the way people do.” And it’s very unlikely that people operate in a manner at all like Strips or F.E.A.R.
Critiques of Strips and F.E.A.R.
In his dissertation’s discussion of Tale-Spin, James Meehan addresses Strips as an important piece of prior work in AI. But he also critiques it severely, arguing that it is a very unlikely model for human cognition. For example, Meehan points to the way that Strips stores information about the world using first-order predicate calculus. This represents all facts about the world as formal logical expressions, with “reasoning” about those facts performed using theorem proving techniques. Meehan and other scruffy researchers argued strongly — with some experimental data for support — that humans don’t remember all facts about the world in the same way or reason about them all using the same approach. Instead, scruffy researchers argued that human intelligence operates in different ways within different problem domains.
Even if some might disagree with Meehan’s critique of Strips, few would argue that F.E.A.R.’s planning uses a model that matches human memory. In F.E.A.R., rather than first-order predicate calculus, plans consult a model of the world composed of a fixed-size array (essentially, a data box divided into a pre-determined number of smaller compartments, all accessible at any time). This makes checking preconditions much faster, but has no psychological credibility.
Well after Meehan’s work, the 1980s and 90s brought fundamental critiques of the Strips approach from new directions. Phil Agre, in Computation and Human Experience (1997), provides a careful overview of the functions of (and relationship between) Strips and Planex before launching into a discussion of the recurring patterns of technical difficulty that will plague systems of this sort. The problems arise from the fundamental distinction between making plans and following plans — a recapitulation of the mind/body split of Descartes, and another kind of action compartmentalization — which is reified in the distinction between Strips and Planex but in the background of almost all work on planning.
Agre points to 1980s work of David Chapman, demonstrating that “plan construction is technically tractable in simple, deterministic worlds, but any non-trivial form of complexity or uncertainty in the world will require an impractical search” (156). In other words, traditional planning only works in microworlds, such as the Aesop’s fables world of Tale-Spin or the simple geometric spaces of Shakey. Agre proposes that, rather than being based on plans, “activity in worlds of realistic complexity is inherently a matter of improvisation” (156).
How might a Strips-like system be more improvisational? It seems clear that Agre would remove the boundary between Strips and Planex. He also quotes Strips authors Fikes, Hart, and Nilsson as follows:
Many of these problems of plan execution would disappear if our system generated a whole new plan after each execution step. Obviously, such a strategy would be too costly. (Agre, 1997, 151)
But it might not be too costly with modern computational power and a simplified world representation. In fact, improvisation of this sort is close to how planning operates in F.E.A.R. Plans are very short term — with each “goal” essentially being the next compound action (e.g., getting to cover, using the lowest-cost set of one or more actions to do so). If the plan is shown to be flawed in the course of execution (e.g., by the player invalidating the cover) F.E.A.R. doesn’t attempt to recover the plan via Planex-style compensations. Instead, a new plan is generated.4 F.E.A.R. can do this — and do it in real time, even with most of the available computational power claimed by game graphics — precisely because it is a microworld. The approach wouldn’t work in our everyday world of complexity and contingency. But, of course, all games are authored microworlds.
Given this discussion, we might still critique F.E.A.R. For example, its ambitions — improving the combat-centric behavior found in first-person shooters — are rather limited, in the broad spectrum of digital media. For these sorts of characters to play fictional roles beyond combat would require much more complexity. Specifically, these characters would need to be able to engage in multiple behaviors simultaneously and work toward both long-term and short-term goals. I will discuss this further in a later chapter, in connection with the characters of the Oz Project.
Yet the fact remains that F.E.A.R. demonstrates the power of traditional AI approaches as authoring tools. It also shows that many of the most damning critiques of these traditional approaches fall away when they are employed as authoring tools for games on modern computing hardware, rather than as research into systems for intelligent action in the everyday world. The only critique that remains — which, in fact, gains even more traction — is that these approaches lack validity as models of human cognition. As discussed in an earlier chapter, I believe this matters only to the extent that insights into human intelligence are what we hope to gain from our work with computational models. And this was precisely the goal of a major story system that followed Tale-Spin.
1The basic FSM structure can be implemented in a number of ways, some more efficient than others, in a manner than embodies the same operational logic.
2Something like the Strips/Planex distinction between plan formulation and execution was also found in Tale-Spin. For example, MTrans and PTrans had “action module” versions (Do-MTrans and Do-PTrans) that did additional work not contained in the versions used for planning (Meehan, 1976, 43). As Meehan puts it, “You don’t worry about runtime preconditions until you’re executing the plan. If I’m planning to get a Coke out of the machine upstairs, I worry about having enough money, but I don’t worry about walking up the stairs until I’m at the stairs” (41).
3I played the Xbox 360 version (Hubbard et al, 2006).
4In fact, Orkin writes of F.E.A.R.’s approach, “the biggest benefit comes from the ability to re-plan.”