March 14, 2008
The Oz Project at Carnegie Mellon University — led by Joe Bates from its inception in the late 1980s — has an unusual distinction. While Tale-Spin and Universe could be considered outliers among software systems for the fact that both are widely credited with outputs they did not produce, the Oz Project may be the only computer science research project most famous for an experiment that did not require computers. This was an experiment in interactive drama, carried out with a human director and actors, but aimed at understanding the requirements for software-driven systems (Kelso, Weyhrauch, and Bates, 1993).
The dream of interactive drama is often traced back to Brenda Laurel’s 1986 dissertation, which in turn draws on a series of memos and articles she wrote, starting in 1981, while working at Atari (first as a software marketing manager, then as a research scientist).6 Atari’s research lab had risen dramatically, only to be disbanded after the video game market crash of the 1980s. Laurel’s dissertation was positioned as a feasibility study for a particular version of one of the “grand ideas” that had motivated the lab’s work under the leadership of computer pioneer Alan Kay. Writ large, the idea is that of a fantasy amplifier. Laurel’s more focused version was cast as a system for first-person interaction (as a “user-character”) with system characters and a world model, both under the control of a “playwright” expert system.
Though the screenwriters of Star Trek: The Next Generation would later provide vivid dramatizations of interaction with such a system, in the form of the “holodeck,” the Oz Project experiment is famous because it may offer some insight into what unscripted interaction with such a system would actually feel like. The experiment’s setting was far from the lush renderings of the holodeck: a studio theater, with boxes, chairs, and a table as the only set. These were meant to suggest a bus station, populated by three characters, each played by a trained actor wearing wireless headphones: an unhelpful clerk, a nearly blind passenger, and a larcenous young man (described as a “punk”). Through the headphones, an off-stage director, assisted by a “plot graph” of possible events, could give directions to the actors to maintain the overall flow of the drama.
In each run of the experiment, an interactor was instructed to buy a bus ticket to attend a relative’s funeral. But this was classic misdirection. The clerk delayed the ticket-buying, the other passenger distracted the interactor with a request for help, and meanwhile the young man’s demands for money escalated from panhandling to robbery at knifepoint. As things came to a climax, the young man cut the clerk’s phone line and the clerk offered the interactor a gun. In the most famous run-through, the interactor fired the gun into the air from behind the young man, at which point the actor playing that character, without waiting for a cue from the director, dramatically fell to the ground.
Interactors found the experience incredibly engaging, with one reporting that it “escalate[d] from normal, to annoying, to out of control in a matter of seconds.” Those observing the experiment from the outside, on the other hand, felt that the action lagged, and even reported boredom to the point of losing track of events. From this, Bates and his colleagues drew two conclusions. First, and most obviously, the pacing that engages an interactor immersed in an experience is quite different from the pacing that would work in traditional media. Second, and more theoretically, they took their success at engaging interactors as confirmation of their basic design philosophy for interactive drama: placing the audience member as an interacting character within the drama (an interactor), creating expressive and relatively independent non-player characters within the same environment, and guiding the higher-level actions of the non-player characters through the interventions of a drama manager tasked with adjusting pacing for the interactor and guiding the story to a successful conclusion.
In practice, however, one element of this design philosophy received the lion’s share of attention in the Oz Project’s system-building activities: the creation of interactive characters. This was the focus of their early-1990s graphical system (The Edge of Intention, featuring animated “Woggles” reminiscent of Dr. Seuss, figure 1.5) and of their earlier textual system (Lyotard, which involved building a relationship with a standoffish simulated housecat). It was also the focus of the web-based system with which their work re-emerged in 2000 (OttoAndIris.com, which involved playing games with anthropomorphized letters), after most project members had left CMU in the late 1990s to found the company Zoesis. Unlike the short-lived, combat-focused characters of F.E.A.R., or the micro-managed, attention-deficit characters of The Sims, these characters needed both long-term and short-term behaviors, with some behaviors supporting others, multiple behaviors able to take place simultaneously, and some reasonable method of choosing what to do at any given time. The Oz Project’s system for authoring such character behavior was called Hap.
Figure 1.5: The Woggles in The Edge of Intention.
The Oz work in this area was developed in the context of ongoing computer science work on intelligent agents, a branch of AI encompassing a wide variety of software whose behavior can be seen as that of a semi-independent entity. More specifically, during the years of the Oz Project a number of research labs were working on “believable” agents, designed with media applications in mind. Some groups focused first on the internal structure of behavior and activity (working outward toward animation), while others focused first on new animation techniques that would enable more expressive characters (working inward toward higher-level behavior). At the time, I was working at New York University, in the lab where Ken Perlin (in collaboration with Athomas Goldberg and others) led an animation-originated interactive character project called Improv (Perlin, 1995; Perlin and Goldberg, 1996).
Despite their differing directions of movement, the Improv system shared with Hap a focus on careful hand-authoring of each character. Other systems were more invested in various sorts of correctness and generality.7 Improv’s characters felt remarkably alive compared with other real-time rendered characters of the period. This impression came from tuning the probabilities and rates of animations, layering animations, and smoothly transitioning between them, for actions such as blinking, looking around the room, gesturing, approaching or avoiding moving characters and objects (perhaps controlled by the audience), and speaking (with each other or the audience). When combined with something to move the scenario forward, the results were engaging animated experiences, ranging from a human choreographer directing a graphical dancer’s movements in response to improvised music (in the 1994 Danse Interactif, figure 1.6) to a set of probabilities tuned to produce a slightly different version of a linear sequence of events every time (in 1998’s browser-based Sid and the Penguins). The technology was spun off into a company in 1999, headed by Goldberg, aimed at bringing these expressive animation techniques to a wider audience. The project as a whole produced convincing demonstrations of engaging characters performing flexible behavior, but never the more autonomous actors imagined in the dream of interactive drama.
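The three techniques named above (tuned rates, layering, and smooth transitions) can be illustrated in a few lines. This is a generic sketch, not Improv’s code. It assumes poses are plain dictionaries mapping joint names to angles, and the names `smoothstep`, `crossfade`, `layer`, and `should_blink`, along with the blink rate, are hypothetical.

```python
import random

def smoothstep(t: float) -> float:
    """Ease-in/ease-out ramp from 0 to 1, so a transition starts and ends gently."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def crossfade(pose_a: dict, pose_b: dict, elapsed: float, duration: float) -> dict:
    """Smooth transition: blend two poses (joint name -> angle) over `duration` seconds."""
    w = smoothstep(elapsed / duration)
    return {joint: (1.0 - w) * pose_a[joint] + w * pose_b[joint] for joint in pose_a}

def layer(base: dict, overlay: dict, joints: set) -> dict:
    """Layering: the overlay animation drives only `joints`; the base drives the rest."""
    return {joint: overlay[joint] if joint in joints else base[joint] for joint in base}

def should_blink(rng: random.Random, dt: float, blinks_per_second: float = 0.3) -> bool:
    """Tuned rate: chance of starting a blink during a frame lasting `dt` seconds."""
    return rng.random() < blinks_per_second * dt

# Halfway through a transition into a wave, with a head turn layered on top.
standing = {"head": 0.0, "arm": 10.0}
waving = {"head": 0.0, "arm": 90.0}
looking = {"head": 40.0, "arm": 0.0}
frame = layer(crossfade(standing, waving, elapsed=0.25, duration=0.5), looking, {"head"})
print(frame)  # {'head': 40.0, 'arm': 50.0}
```

Because each technique is a small, separate function, an author can tune one (say, the blink rate) without disturbing the others. This is one plausible reading of why hand-tuning was practical in this style of system.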
Figure 1.6: The live choreography interface for Danse Interactif.
Hap, on the other hand, was designed to let another system handle animation — or have no animation at all, as in the Lyotard project. Instead of visual appearance, it was focused on characters that could act appropriately and autonomously in a fictional world, and do this based on goals and behaviors crafted by authors for that character. These goals and behaviors may sound similar to the building blocks of Strips plans, but the lead designer of Hap, Bryan Loyall, draws a sharp distinction:
Unlike some notions of goals in artificial intelligence, Hap goals have no meaning in and of themselves. They are not planned for, and there is no grammar or logic in which the system can reason about them. Instead they are given meaning through the agent builder’s vision of what they mean, and the behaviors written to express that vision. (1997, 36)
Instead of Strips-style reasoning, in a manner similar to the planboxes of Tale-Spin, each goal has an author-defined set of possible behaviors that may satisfy it. These behaviors may be made up of sequential steps, or steps that can be pursued in any order (including in parallel). For example, the steps in a “wink” behavior are sequential — an exaggerated close of the eye, and a pause, followed by opening the eye. But the steps in a “gather what I need before leaving the house” behavior can be ordered depending on circumstances — the car keys can be picked up before or after the jacket.
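The distinction between sequential and unordered steps can be shown with a small data structure. This is a hypothetical Python sketch, not Hap’s own behavior language. The `Behavior` class, the `LIBRARY` mapping, and the urgency function standing in for “circumstances” are all assumptions made for illustration, and parallel pursuit of steps is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Behavior:
    """One author-written way of satisfying a goal (hypothetical, not Hap syntax)."""
    name: str
    steps: List[str]
    sequential: bool  # True: keep the author's order; False: any order will do

    def order_steps(self, urgency: Callable[[str], float]) -> List[str]:
        """Return the order in which this behavior's steps will be pursued."""
        if self.sequential:
            return list(self.steps)
        # Unordered steps: let current circumstances (an urgency score) decide.
        # sorted() is stable, so ties keep the author's order.
        return sorted(self.steps, key=urgency, reverse=True)

# Each goal maps to the author-defined behaviors that may satisfy it.
LIBRARY: Dict[str, List[Behavior]] = {
    "wink": [Behavior("exaggerated-wink", ["close eye", "pause", "open eye"], sequential=True)],
    "leave-house": [Behavior("gather-items", ["pick up keys", "put on jacket"], sequential=False)],
}

def jacket_is_closer(step: str) -> float:
    """Hypothetical circumstance: the jacket happens to be within reach first."""
    return 1.0 if "jacket" in step else 0.0

print(LIBRARY["wink"][0].order_steps(jacket_is_closer))         # ['close eye', 'pause', 'open eye']
print(LIBRARY["leave-house"][0].order_steps(jacket_is_closer))  # ['put on jacket', 'pick up keys']
```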
As suggested by the differing levels of specificity between winking and item-gathering, Hap behaviors can launch sub-goals and sub-behaviors, finally grounded when they produce a set of primitive mental actions and physical actions (the physical actions making the connections to the surface presentation of the Hap character). All of this is organized in a current behavior tree, which at each step either expands something (a behavior or sub-goal) or executes something (a mental or physical action).
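The expand-or-execute cycle can be sketched as a loop over a growing tree. This is a simplified, hypothetical model and not Hap’s implementation. Each goal has exactly one behavior, steps run strictly in order, and labels beginning with `mental:` or `physical:` are treated as primitive actions. The `Node`, `step`, and `run` names and the library contents are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical behavior library: each goal lists the steps of the one behavior chosen for it.
# Labels starting with "mental:" or "physical:" are primitive actions; others are sub-goals.
LIBRARY = {
    "go-to-work": ["gather-items", "physical:walk to car"],
    "gather-items": ["physical:pick up keys", "physical:put on jacket"],
}

@dataclass
class Node:
    label: str
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)
    done: bool = False

def is_primitive(label: str) -> bool:
    return label.startswith(("mental:", "physical:"))

def next_leaf(node: Node) -> Optional[Node]:
    """Depth-first: the first unfinished node that has no unfinished children."""
    if node.done:
        return None
    for child in node.children:
        leaf = next_leaf(child)
        if leaf is not None:
            return leaf
    return node

def finish(node: Node, trace: List[str]) -> None:
    """Mark a node done; a goal succeeds once every step of its behavior is done."""
    node.done = True
    parent = node.parent
    if parent is not None and all(child.done for child in parent.children):
        trace.append("succeed " + parent.label)
        finish(parent, trace)

def step(root: Node, trace: List[str]) -> bool:
    """One cycle: either expand a goal into its behavior or execute a primitive action."""
    node = next_leaf(root)
    if node is None:
        return False  # the whole tree has finished
    if is_primitive(node.label):
        trace.append("execute " + node.label)
        finish(node, trace)
    else:
        trace.append("expand " + node.label)
        node.children = [Node(label, parent=node) for label in LIBRARY[node.label]]
    return True

def run(goal: str) -> List[str]:
    root, trace = Node(goal), []
    while step(root, trace):
        pass
    return trace

print(run("go-to-work"))
```

Running `run("go-to-work")` expands the top goal, then the gathering sub-goal, executes the two gathering actions, and only then walks to the car. The tree grows only as far as the character has actually committed.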
Goals, actions, and behaviors can succeed, fail, or abort, based on what happens in the world. These outcomes propagate up and down the tree, so that a decision to abort going to work would remove sub-behaviors such as gathering objects for the trip. Similarly, failing to find the keys might fail the higher goal of going to work, or at least of getting there in the character’s car, depending on how the behavior is constructed. Loyall also presents further authorial tools: preconditions that restrict the situations in which a behavior can be chosen, more specific behaviors for particular circumstances, and “demons” that wait for some condition in the world to become true before executing behaviors. Finally, an algorithm selects which part of the tree to act on at each step, giving precedence to higher-priority actions and otherwise preferring to keep expanding and executing the same part of the tree.
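The failure half of this propagation can be sketched with a deliberately simplified, blocking recursion. This is not how Hap works internally: Hap keeps an active tree and interleaves steps, and the sketch omits aborts, parallel steps, preconditions, and demons. The `pursue` function, the library, and the world dictionary are hypothetical. The sketch shows how a failed step fails its behavior, how the goal then falls back to another author-written behavior, and how the goal itself fails when no behavior remains.

```python
from typing import Dict, List

# Hypothetical library: each goal has alternative behaviors, tried in the author's order.
# Each behavior is a list of steps; "physical:" steps are primitive actions.
LIBRARY: Dict[str, List[List[str]]] = {
    "go-to-work": [["drive-to-work"], ["take-bus"]],
    "drive-to-work": [["physical:find keys", "physical:drive"]],
    "take-bus": [["physical:walk to stop", "physical:ride bus"]],
}

def pursue(label: str, world: Dict[str, bool], trace: List[str], library=LIBRARY) -> bool:
    """Return True if the goal or action succeeds, False if it fails."""
    if label.startswith("physical:"):
        ok = world.get(label, True)  # the world decides whether an action succeeds
        trace.append(("succeed " if ok else "fail ") + label)
        return ok
    for behavior in library.get(label, []):
        # all() stops at the first failed step, abandoning the rest of this behavior.
        if all(pursue(step, world, trace, library) for step in behavior):
            trace.append("succeed " + label)
            return True
    trace.append("fail " + label)  # no remaining behavior could satisfy the goal
    return False

trace: List[str] = []
pursue("go-to-work", {"physical:find keys": False}, trace)
print(trace)
```

With the keys missing, driving fails and the character takes the bus instead. Removing the bus alternative from the library makes the same missing keys fail “go-to-work” outright, which mirrors the point that the outcome depends on how the behavior is constructed.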
Hap is a powerful tool for defining character behavior. Oz characters such as the Woggles were able to move around, engage in solo behaviors, interact with each other, and interact with the audience. In some ways they were quite successful, widely exhibited and discussed — even if Bates, remarkably candidly, also reports that one of their most memorable behaviors occurred as an outgrowth of system errors.8
However, in this first attempt to use Hap to create a graphical experience, a larger problem also presented itself. Phoebe Sengers, an Oz Project PhD student, reports this in her dissertation’s description of building the Woggles:
Following the Hap design strategy, we first built a set of high-level behaviors such as sleeping, dancing, playing follow-the-leader, moping, and fighting. Each of these behaviors was reasonably straightforward to implement, including substantial variation, emotional expression, and social interaction. Watching the agents engage in any of these behaviors was a pleasure. Then came the fatal moment when the behaviors were to be combined into the complete agent. This was a nightmare. Just combining the behaviors in the straightforward way led to all kinds of problems . . . (1998, 38)
Some of the problems might have arisen from a bug in the way conflicting goals were handled. For example, the Woggles would try to engage in two incompatible behaviors simultaneously, such as “fight” and “sleep.” But there is another set of problems that Sengers diagnosed as closer to the core of the Oz approach to characters — a problem that also plagued the agent-based projects of many contemporary groups, such as the simulated dog (Silas) and computer game-playing penguin (Pengi) built by groups at MIT.
The Woggles would jump from behavior to behavior, never settling on one long enough for the audience to comprehend it. They would get stuck in loops, switching back and forth between a pair of behaviors. In this way, the problem of compartmentalized actions (discussed earlier in relation to FSMs) reared its head in the context of advanced behavior systems. Under deadline pressure, the Oz group sacrificed a key feature: Woggles that could interleave behaviors from multiple high-level goals. Sengers describes the field-wide problem this way:
[A]lternative AI agents generally have a set of black-boxed behaviors. Following the action-selection paradigm, agents continuously redecide which behavior is most appropriate. As a consequence, they tend to jump around from behavior to behavior according to which one is currently the best. What this means is that the overall character of behavior of the agent ends up being somewhat deficient; generally speaking, its behavior consists of short dalliances in individual, shallow high-level behaviors with abrupt changes between behaviors. It is this overall defective nature of agent behavior, caused by under-integration of behavioral units, that I term schizophrenia. (40)
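A toy model makes this dithering concrete. In this hypothetical sketch (the drives, numbers, and names are invented and are not drawn from any of the systems discussed), an agent re-selects the highest-scoring behavior on every tick. Performing a behavior briefly satisfies its drive while the neglected drive keeps growing. Each behavior is therefore dropped as soon as it starts to work, and the agent flips between the pair indefinitely.

```python
from typing import Dict, List

def select(scores: Dict[str, float]) -> str:
    """Pure action selection: always pick whichever behavior currently scores highest."""
    return max(scores, key=scores.get)

def simulate(ticks: int, hunger: float = 5.0, fatigue: float = 5.2) -> List[str]:
    history = []
    for _ in range(ticks):
        choice = select({"eat": hunger, "sleep": fatigue})
        history.append(choice)
        # One tick of a behavior relieves its own drive a little; the neglected drive grows.
        if choice == "eat":
            hunger, fatigue = hunger - 1.0, fatigue + 0.5
        else:
            hunger, fatigue = hunger + 0.5, fatigue - 1.0
    return history

history = simulate(8)
switches = sum(a != b for a, b in zip(history, history[1:]))
print(history)   # ['sleep', 'eat', 'sleep', 'eat', 'sleep', 'eat', 'sleep', 'eat']
print(switches)  # 7
```

Each individual choice is locally “correct,” yet the overall pattern is exactly the short, abrupt dalliances Sengers describes.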
Improv animations were able to avoid this problem for a number of reasons. First, visual presentation of behavior transition and combination was a major area of system research and demonstration, so abrupt transitions were rarely in evidence. Even closer to the heart of the matter, Improv projects were, generally, either small-scale behavior demonstrations or built with a narrative through-line embedded in their scripts. More autonomous agents required a different solution. Sengers pursued this through a change of perspective.
For Sengers, the problem with Hap characters grows out of their continuities with traditional AI. The issue is not that a behavior-based architecture is inappropriate. Rather, it is that behaviors tend to be authored from the character’s point of view (what should the character be able to do) and selected from the character’s point of view (what is the right thing to do in this situation). This may sound like a computer model for what the live human actors provided in the bus stop experiment. But the work of an actor is to take actions that will communicate, not those that are correct. The difference turns out to be crucial.
Responding to her dissatisfaction with standard Hap characters, Sengers built a set of extensions called The Expressivator. An inspiration for the work was the field of narrative psychology, which, rather than seeing patients as sets of disconnected symptoms, attempts a reintegration through narrative. Similarly, The Expressivator attempts “symptom control” for schizophrenic agents by building a system focused on narratively understandable behavior.
This is necessary, in part, because of the differences between characters like the Woggles and characters like the Sims. A player’s Sims are narratively understandable because of elements like the “Needs” display. Watching a Sim’s Bladder meter drop too low, the player thinks, “She’ll need to use the restroom before long.” When the Sim abruptly stops an earlier behavior and heads to the restroom, this is understood as the fulfillment of something that has been building over time, rather than a mysterious switch. Further, behaviors in The Sims are relatively time-consuming, and each is carefully animated as a performance. These lengthy behaviors make it impossible for Sims to appear to “dither” like looping Woggles. Lengthy behaviors also necessitate a visualized queue of current and upcoming behaviors that the player can easily cancel, even mid-execution, so that Sims can react quickly to events such as ringing phones or kitchen fires.
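The two mechanisms described here, a visible need meter and a player-cancelable queue, can be sketched together. This is a hypothetical illustration, not The Sims’ actual code. The `Sim` class, the decay rate, the threshold, and the action names are invented for the example.

```python
from collections import deque
from typing import List

class Sim:
    """Hypothetical sketch: one visible need meter plus a player-visible action queue.
    The decay rate and threshold are invented for illustration."""

    def __init__(self, bladder: float = 100.0):
        self.bladder = bladder   # 100 = comfortable, 0 = emergency
        self.queue = deque()     # queue[0] is the action currently being performed

    def visible_queue(self) -> List[str]:
        return list(self.queue)

    def enqueue(self, action: str, front: bool = False) -> None:
        if front:
            self.queue.appendleft(action)
        else:
            self.queue.append(action)

    def cancel(self, index: int) -> None:
        """The player cancels any queued action, including the one in progress."""
        del self.queue[index]

    def tick(self, decay: float = 10.0, threshold: float = 20.0) -> None:
        self.bladder = max(0.0, self.bladder - decay)
        # The player has watched the meter fall, so jumping to the restroom reads
        # as the payoff of a visible build-up rather than a mysterious switch.
        if self.bladder < threshold and "use restroom" not in self.queue:
            self.queue.appendleft("use restroom")

sim = Sim(bladder=35.0)
sim.enqueue("watch TV")
sim.enqueue("cook dinner")
sim.cancel(0)                            # phone rings: the player cancels TV mid-show
sim.enqueue("answer phone", front=True)
print(sim.visible_queue())               # ['answer phone', 'cook dinner']
sim.tick()                               # meter at 25: nothing changes yet
sim.tick()                               # meter at 15: the restroom visit jumps ahead
print(sim.visible_queue())               # ['use restroom', 'answer phone', 'cook dinner']
```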
Sengers sought a way to bring some of the narrative understandability of characters from traditional media (who don’t use meters to display the changing state of their needs) to behavior-based digital characters. To accomplish this, The Expressivator organizes character action around authorial intentions for how it will be interpreted. Rather than goals, behaviors, and actions, the system is composed of signifiers and signs. Since character actions take time for the audience to interpret, signifiers and signs are “posted” when the audience is likely to have seen them. Then these are part of the world history, available for use in deciding on the next signifiers and signs. When moving from one behavior to another, each character attempts to communicate why the change is being made, using a set of specifically designed transition behaviors. These, in turn, are a special case of the more general category of meta-level controls, designed to allow authors to express relationships between formerly black-boxed character actions.
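Two of these ideas, posting signs only once the audience has probably seen them and inserting transition behaviors that explain a change, can be sketched in a few lines. This is a hypothetical illustration, not Sengers’ implementation. The `Character` class, the fixed reading time, the transition table, and the rule that skips a transition whose reason the audience has already seen are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical transition behaviors: (old behavior, new behavior) ->
# (the sign the transition communicates, the short performance that communicates it).
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("explore", "flee"): ("noticed threat", "freeze and stare at the threat"),
    ("flee", "mope"): ("gave up escaping", "glance back and slump"),
}

@dataclass
class Character:
    behavior: str
    history: List[str] = field(default_factory=list)  # signs the audience has probably seen

    def post_sign(self, sign: str, shown_at: float, now: float, read_time: float = 1.5) -> bool:
        """Record a sign in the world history only after the audience has had time to see it."""
        if now - shown_at >= read_time and sign not in self.history:
            self.history.append(sign)
            return True
        return False

    def switch_to(self, new_behavior: str) -> List[str]:
        """Change behavior, first performing a transition that explains the change,
        unless the audience has already seen the sign that explains it."""
        performance = []
        entry = TRANSITIONS.get((self.behavior, new_behavior))
        if entry is not None and entry[0] not in self.history:
            performance.append(entry[1])
        performance.append(new_behavior)
        self.behavior = new_behavior
        return performance

agent = Character("explore")
print(agent.switch_to("flee"))  # ['freeze and stare at the threat', 'flee']
```

Consulting the posted history lets the character avoid over-explaining, while the transition table gives the author a place to state how two otherwise black-boxed behaviors relate.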
In building The Expressivator (and her sample media work, the Industrial Graveyard) Sengers demonstrates an interesting middle path. Her focus on audience interpretation in narrative terms, especially in designing transitions aimed at communicating the reasons for behavior changes, begins to create a system with some of the strengths that made Improv’s more scripted animations engaging over time. At the same time, the characters have the flexibility and autonomy of behavior-focused designs. What is missing, however, is the reason that the Oz Project needed this flexibility in the first place — for characters to play a role while also following the directions of an off-stage drama manager. The first full-scale attempt at this was initiated several years later, by a game industry veteran and the Oz Project’s last graduate student.
6This places her work at Atari and on her dissertation in roughly the same time period as Universe and the first phase of Minstrel. Some of Laurel’s dissertation was later adapted for her book Computers as Theater (1991).
7From the animation direction, work led by researchers such as Norm Badler and Jessica Hodgins focused on biomechanically correct animation, rather than expressive animation (Badler, Phillips, and Webber, 1993; Hodgins, Wooten, Brogan, and O’Brien, 1995). From the behavior side, researchers such as Barbara Hayes-Roth sought to find the right general-purpose “social-psychological model” for interactive characters, while Bruce Blumberg was among those building non-human characters based on ethological models (e.g., dog behavior) (Rousseau and Hayes-Roth, 1998; Blumberg, 1997).
8Bates reports of a particular Woggle:
“Due to a programming error, Shrimp occasionally peppers his ongoing behavior with what seems to be a nervous tick causing him repeatedly to hit his head against the ground. This attracts people’s attention immediately, but to our surprise they build psychological theories, always incorrect, about Shrimp’s mental state and seem to find him much more interesting and alive for having this behavior.” (1994, 124)
Years later, the Zoesis company created a new implementation of the agent architecture. While testing OttoAndIris.com, they found something similar. As Loyall reports:
“In an early version of the system, kids testing it drew pictures afterwards of Otto as a ‘crybaby,’ and kept talking about the time he refused to sing. The refusal was a bug that caused part of Otto’s mind to completely freeze. We thought the bug had ruined the test, but to the kids it showed Otto’s strong will and made him seem more alive.” (2004, 7)
These behaviors succeed, of course, because they are great opportunities for the Eliza effect. But an entire system composed along their lines would have been a dismal failure. They succeeded by being distinctive within a larger context of behavior that seemed ordinary and appropriate.