June 28, 2004
I recently returned from TIDSE 2004 (Technologies for Interactive Digital Storytelling and Entertainment). Here’s the first part of my trip report from this conference.
Norman Badler (keynote)
Norman Badler opened the conference with a keynote titled Embodied Agents and Meaningful Motion. He began by describing the gulf between the world of computer graphics and the world of artificial intelligence, noting that virtual humans must integrate techniques from both fields in order to support compelling interactions between real and virtual people. He described his own work on the EMOTE motion-quality model, based on Laban Movement Analysis, which procedurally modifies the affective quality of humanoid motion given 8 high-level parameters. The best part of the talk was his list of the myths that many virtual humans researchers are guilty of believing (he noted that he didn’t mean to single out any one researcher, and that he himself has been guilty of believing many of them).
Here are a few of my favorites:
- Movement is just physics or math. Expressive movement requires more than physically modeling a body.
- A moving agent is an animated agent. Just being able to move an agent around doesn’t mean that you’ve actually created an animated, or life-like, agent.
- Arms are for gesturing, feet are for walking, heads are for talking, and a rigid torso connects them all. Affect is conveyed continuously by the entire body position, not just by the face or by discrete gestures.
- Actors move the same way regular people do. Actors perform exaggerated and/or stylized actions for expressive effect.
- FACS is an animation system (FACS is actually a psychological facial-expression notation system). Work based on FACS tends to switch facial expressions, such as smiles, on and off mechanically. Irfan Essa (now at Georgia Tech) did work at MIT showing that the muscle-excitation curves associated with facial expressions are much more complex than simple smooth on/off curves.
- Eyes should stare straight at targets. This creates a cold, mechanical stare.
- Heads move on neck stalks.
- Facial expressions end at the lower jaw. Head and face movements actually extend into the upper chest (related to the compartmentalized view of human movement mentioned above).
- Cognition is someone else’s problem.
He mentioned the work of his colleague Barry Silverman at Penn, who has built a dynamical system modeling the physiological state of agents as a set of interconnected physiological reservoirs (injury, hunger, etc.). The model determines the actions of agents simulated in Unreal Tournament.
More virtual human work needs to happen that integrates many different pieces, instead of focusing on individual subproblems. But it’s hard to get the individual pieces right and hard to figure out how to integrate them.
The audience discussion after the talk tended to revolve around realism, whether realism is the same thing as expressiveness, and whether there are different kinds of realism (e.g. emotional realism vs. physical realism). Regarding cartoon characters (non-people, e.g. Shrek) vs. people, Norm argues that virtual people are intrinsically harder to get right, because of the set of expectations we have for people, though the kind of artistic abstraction used by cartoon characters may be applicable to virtual humans. Norm himself very much comes out of the world of observations of naturalistic (non-performative, non-abstracted) human motion, though as he pointed out himself in his list of myths, actors move differently than people do in everyday life.
Of course his myths made me feel good about our work on Façade, as we’ve tried to build a complete system integrating high-level decision-making, plot, and detailed bodily movement, all with expression as the foremost goal.
Michael Mateas & Andrew Stern
I presented the paper Andrew and I wrote describing our natural language understanding architecture for Façade, focusing on our approach for mapping surface text to discourse acts rather than on our discourse management architecture. Our paper won the best paper award at the conference. A very nice surprise! To motivate our architecture, I described why you’d even want to try creating a natural language interface for interactive drama, given that it’s an AI-complete problem dooming you to some (or quite a few) understanding failures. This discussion turned out to be relevant for the next speaker, who presented a theoretical analysis of the design tradeoffs in interactive drama interfaces. I’ll post more later about why you’d want to build a natural language interface at all.
Nicholas described a theoretical analysis of interface issues in interactive drama by exploring the functional relationships between two sets, the set of physically possible actions afforded by the interface (e.g. clicking menu items, typing text) and the set of meanings relevant to the drama. Dialog menus introduce a total function between the two sets, in which there is a one-to-one relationship between physically possible interface actions and meanings. Natural language understanding introduces a partial function in which some physical actions (typed text) have no resulting meaning. Filtering interfaces change the set of physical actions (e.g. contextual menus) as a function of the current context of the drama. He described the design tradeoffs in using the various types of mappings.
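Nicholas’s three mapping types are easy to make concrete. Here’s a minimal sketch (all the names and toy meanings are mine, not from his paper) of a menu as a total function, NLU as a partial function, and a filtering interface as a context-dependent action set:

```python
from typing import Optional

# Dialog menu: a total function -- every physically possible interface
# action maps to exactly one dramatic meaning.
MENU = {"option_1": "agree", "option_2": "disagree", "option_3": "change_topic"}

def menu_meaning(action: str) -> str:
    return MENU[action]  # defined for every menu action

# Natural language understanding: a partial function -- some physical
# actions (typed text) map to no meaning at all.
def nlu_meaning(text: str) -> Optional[str]:
    patterns = {"yes": "agree", "no": "disagree"}
    for keyword, meaning in patterns.items():
        if keyword in text.lower():
            return meaning
    return None  # understanding failure

# Filtering interface: the set of physically possible actions itself
# changes as a function of the current dramatic context.
def available_actions(context: str) -> list:
    if context == "argument":
        return ["option_1", "option_2"]  # the menu shrinks mid-argument
    return list(MENU)
```

The tradeoffs he discussed fall right out of this picture: the total function never fails but exposes the whole meaning set to the player, while the partial function hides it at the cost of some typed inputs mapping to nothing.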
Drawing on semiotic descriptions of narrative interpretation, Nicholas described the importance of players being able to anticipate future action as part of the process of interpreting the drama. Thus, he concluded that filtering interfaces, which dynamically change the set of player actions available as a function of context, are inappropriate for interactive drama.
He concluded by describing a very interesting design problem in real-time interactive drama: how do you deal with the extended duration of player dramatic actions? It takes time for the player to do things in the world (e.g. respond to another character, either by inputting natural language or by selecting a choice from a large hierarchical menu), yet the drama is moving forward in real time. What do you do while the player takes action? One possibility is to freeze time, but Nicholas argued that this breaks immersion in the story. Another possibility is semi-autonomous avatars, which continue to do some actions in the world while the player is constructing their next action. But semi-autonomous avatars potentially interfere with the player’s sense of agency. He proposed a third possibility: temporal ellipses. In this approach, after the player makes an action, time jumps ahead an amount consistent with the time the player took to select the action. In Façade we use a pseudo-real-time approach, in which the characters perform in real time but, while the player is typing, pause to listen, nodding and so forth, as if the player were speaking in real time, even though it takes the player much longer to type a response than it would to speak the same line. In our play tests we’ve never had someone complain that the drama seems too slow, or that the characters listening for an extra-long time while the player types breaks immersion. On the contrary, most first-time players complain that the action happens much too fast, that they feel overwhelmed by the drama and unable to jump in as much as they’d like.
It was nice to see Nicholas again. I met him at the 1999 Narrative Intelligence Symposium, and most recently saw him at Cosign last September.
Ido described the Art-e-fact system, which uses several virtual characters to present artwork in museums, where the different virtual characters correspond to different points of view on the art. A big aim of their architecture is to support authoring by non-computer scientists, allowing museum specialists to author virtual characters for changing museum displays. They have developed an authoring interface that supports authoring characters as explicit transition networks, maintaining a bit of state to allow conditions on transitions. They use AIML for surface text processing, both to recognize discourse acts and, in its more standard form, to map directly from player utterance to response in support of chat. They are currently working on a rule-based story engine as a successor to the transition network approach, though they are concerned that the rule-based approach won’t support end-user authoring. In the discussion period I asked how much they use AIML’s recursive features, which are not as expressive as the support for recursion in Façade’s NLU rule language (Façade’s rule language supports shallow and deep rules living side-by-side, so that authors can create a mixture of template-based rules and deeper, robust, noise-tolerant parsing rules). They currently make little use of AIML’s recursive features – they found them difficult to use and found that they yield poor results.
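To make the authoring model concrete: a transition-network character is just a current node, a little persistent state, and transitions guarded by conditions on that state. A minimal sketch (class and method names are my own, not Art-e-fact’s):

```python
class TransitionNetwork:
    """A character authored as an explicit transition network, with a bit
    of persistent state that transition conditions can test."""

    def __init__(self, start):
        self.node = start
        self.state = {}   # small per-character memory
        self.edges = []   # (source, target, condition) triples

    def add_edge(self, src, dst, condition=lambda state: True):
        self.edges.append((src, dst, condition))

    def step(self):
        # Fire the first transition out of the current node whose
        # condition holds; stay put if none fires.
        for src, dst, cond in self.edges:
            if src == self.node and cond(self.state):
                self.node = dst
                return self.node
        return self.node
```

For example, a museum guide might move from a greeting node to presenting the artwork only once its state records that the visitor has been greeted.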
Isabel Machado, Paul Brna, Ana Paiva
Isabel described a director agent that helps add story structure to a multi-user children’s play world. The goal is to support children in constructing emergent fairytale narratives (narrative play). The project uses a variable story schema based on Propp, where Propp’s story functions are taken as plot points. During the questions there was some discussion of what it means to “use Propp”, given that Propp’s morphology is grossly underspecified. She, and a couple of other presenters who also make use of Propp, tend to use his story functions as a high-level guide or inspiration for writing the concrete plot points of their interactive stories.
Daniel Roberts, Mark Wright
Daniel’s work was motivated by the goal of reusing animation content from a children’s cartoon series while avoiding the hard problems of narrative modeling. They take an object-oriented approach to prompted play, where narratives are associated with props. For example, if the character Bing is placed in the environment with a fruit tree and a trampoline, a story associated with the objects might be: Bing is hungry but is too short to reach the fruit, Bing uses the trampoline to jump up to the fruit, Bing eats the fruit and is satisfied. The system allows the children to play freely with the objects, but occasionally prompts them (gives the children hints) to move the story forward. During the questions there was some discussion about whether this kind of experience counts as a narrative, or is rather puzzle-based problem solving. As many GTxA readers know, the question “but is this a narrative?” often arises at interactive narrative conferences, as the word “narrative” covers a wide range of structures, from rather constrained, tightly defined structures such as “a dramatic story structured in an Aristotelian arc, with a clear rising tension, crisis, climax and resolution” (e.g. Façade), to more open-ended structures such as “a causal sequence of events” or “a master narrative”.
Federico Peinado, Pablo Gervás
Federico turns to the rules of thumb employed by game masters of role-playing games (both tabletop and live action) for principles for constructing interactive narrative. From such “how to” guides he has identified a number of player types (e.g. butt-kickers, specialists, storytellers, etc.) as well as the following high-level improvisation algorithm for drama managers (game masters):
1. Create possibilities for the current story situation:
   - Imagine the most obvious result.
   - Imagine the most challenging result.
   - Imagine the most surprising result.
   - Imagine the most pleasing result.
2. Pick one at random.
3. Imagine the consequences; if they conflict with the plot, pick another.
He’s using this algorithm to structure the design of a case-based approach to drama management. Story cases are annotated with features such as surprisingness, challengingness and so forth to support step 1 of the algorithm. Additional author-specified global constraints (e.g. the first part of the story must have low tension, the second part high tension) force case reselection if the selected case violates them (step 3 of the algorithm). The real power of case-based reasoning lies in the adaptation of cases to novel situations. Currently, in his approach, adaptation is used only to bind unbound terms in the case to entities in the current situation (e.g. binding roles in the selected case to the present players), but he’s interested in exploring more radical adaptations in the future. I think case-based approaches to drama management are a very fruitful direction to explore.
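The improvisation loop maps naturally onto case retrieval. A minimal sketch, with a hypothetical case representation of my own (the feature annotations stand in for his surprisingness/challengingness labels):

```python
import random

# Story cases annotated with the quality of their result and a tension
# level that author-specified global constraints can test.
CASES = [
    {"name": "rescue",   "quality": "pleasing",    "tension": "low"},
    {"name": "betrayal", "quality": "surprising",  "tension": "high"},
    {"name": "duel",     "quality": "challenging", "tension": "high"},
    {"name": "chat",     "quality": "obvious",     "tension": "low"},
]

def select_case(constraint, rng=random):
    # Step 1: gather the possibilities for the current situation.
    candidates = list(CASES)
    # Step 2: pick one at random; step 3: reject cases whose consequences
    # violate the global constraint, and pick another.
    rng.shuffle(candidates)
    for case in candidates:
        if constraint(case):
            return case
    return None  # no case satisfies the constraint

# e.g. the author demands low tension in the first part of the story:
first_act = select_case(lambda c: c["tension"] == "low")
```

Adaptation would then bind the chosen case’s open roles to the present players before execution.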
Shachindra performed a literature survey to determine whether the term “presence”, as currently understood in VR, is an adequate concept for capturing the narrativity of an environment. He described a number of definitions of presence appearing in the presence literature, the richest and most interesting being the ecological view. In the ecological view, presence arises from four factors:
- Action support: environmental affordances
Borrowing from the psychologist Gerrig’s view of transportation as a journey, Shachindra proposes extending the ecological approach to presence to include a notion of virtual identity. For Gerrig, transportation is the phenomenon of a traveler being transported to a fictional world, in which they have a role, by some identity-shaping mediation (a means of transportation), as a result of performing certain actions.
Gabriela Tully and Susan Turner
Susan described an experiment designed to explore issues of interactivity and immersion in a branching node-based interactive movie. The movie, Emobra, is a story of a couple’s relationship. The conventional approach to interactive movies is to use explicit menus at branch points. Such explicit menus, however, break immersion. So, in Emobra, they tried a couple of alternative cinematic indications of interaction points, including a zooming effect and a black-and-white effect. Their general question was whether viewers, told only that the experience was interactive in some way, would discover the interaction points and how they could interact. Their experiment confirmed that viewers were able to discover the interaction points. It would be interesting to compare an explicit menu-based version of Emobra to the implicit interaction point version to see if viewers also find the cinematic effect approach to indicating interaction points more immersive than explicit menus.
Stéphane Sanchez, Olivier Balet, Yves Duthen, Hervé Luga
They described a virtual character architecture (a reactive planning architecture, like the ABL language we use in Façade) into which they are integrating machine learning in the form of genetic classifier systems. The genetic classifiers learn to sequence behaviors given the current sensed state of the world. In their example they showed a Sims-like world in which a character learned a collection of genetic classifiers that allowed it to cook food. This was interesting to me, as it’s related to the work I’m doing integrating reinforcement learning into ABL.
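As I understand classifier systems, the learned knowledge is a population of condition→action rules whose strengths are adjusted by reinforcement (a genetic algorithm occasionally breeds new rules, which I omit here). A toy sketch with my own stand-in rules for the cooking example:

```python
import random

# Classifiers: condition -> action rules with strengths that reinforcement
# adjusts; stronger classifiers are more likely to fire.
classifiers = [
    {"cond": "hungry",     "action": "go_to_kitchen", "strength": 1.0},
    {"cond": "in_kitchen", "action": "cook_food",     "strength": 1.0},
    {"cond": "food_ready", "action": "eat",           "strength": 1.0},
]

def act(state, rng=random):
    """Pick among the classifiers matching the sensed state, weighted by
    strength (roulette-wheel selection)."""
    matching = [c for c in classifiers if c["cond"] == state]
    weights = [c["strength"] for c in matching]
    return rng.choices(matching, weights=weights)[0]

def reinforce(classifier, reward, rate=0.2):
    """Nudge a fired classifier's strength toward the received reward."""
    classifier["strength"] += rate * (reward - classifier["strength"])
```

Over many episodes, reinforcement concentrates strength on the classifier sequence that gets the character fed.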
Stefan Göbel, Oliver Schneider, Ido Iurgel, Christian Knöpfle, Alexander Rettig
Oliver presented the virtual human platform, an architecture used for both the Art-e-fact project (above) and the augmented reality project Geist. In Geist, a user wanders around the old castle in Heidelberg and sees, superimposed on the landscape, various characters (ghosts) talking about the Thirty Years’ War. One of their goals was to allow the user to visit the various locations in any order, yet have the stories told by the ghosts still form a coherent narrative. It reminds me of a project one of my students did in my interactive narrative class last semester: storyboarding a narrative structure for an augmented reality tour of Oakland Cemetery in Atlanta, with the goal of creating a coherent narrative regardless of the order in which users visit the sites.
Mariët Theune, Sander Rensen, Rieks op den Akker, Dirk Heylen, Anton Nijholt
Mariët presented a story generation system in which a plot is specified as a set of episodes, where each episode specifies a setting, a collection of characters, and a set of goals the characters must achieve in that episode. In a Tale-spin-like manner, stories are generated as the characters select actions to achieve goals. Their model goes beyond Tale-spin in that the characters use the OCC cognitive appraisal model of emotion (the same model of emotion used in the Oz Project), with the characters achieving emotion-expression goals in addition to story goals. Also, the characters must ask the plot manager for permission to carry out actions selected in pursuit of their goals; the plot manager thus has the opportunity to constrain actions so as to achieve greater plot coherence than the Tale-spin model can (one of the big lessons of Tale-spin was that a collection of characters autonomously pursuing goals doesn’t necessarily, and in fact often doesn’t, create a story). Currently the stories produced are Tale-spin-like, but the architecture has a lot of room to grow. One of their primary interests is natural language generation with pragmatic variation (generating language with a specific desired tone or style); the story generator, complete with character emotion model, is there to give them interesting text to generate. NLG with pragmatic variation is important for the future of generative story systems; I was glad to see someone working in this area at the conference.
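The permission-asking loop is the interesting architectural move, so here’s a minimal sketch of it (all class and action names are my own, not theirs): characters propose actions in pursuit of episode goals, and the plot manager may veto any proposal for the sake of plot coherence.

```python
class PlotManager:
    """Vetoes character actions that would hurt plot coherence; here the
    policy is just a forbidden-action set."""

    def __init__(self, forbidden):
        self.forbidden = set(forbidden)

    def permit(self, action):
        return action not in self.forbidden


def run_episode(goals, proposals, manager):
    """Characters work through `proposals`, a list of (action, goal served)
    pairs; vetoed actions are dropped. Returns the story (actions performed)
    and whether the episode's goals were all achieved."""
    story, achieved = [], set()
    for action, goal in proposals:
        if manager.permit(action):
            story.append(action)
            achieved.add(goal)
    return story, achieved >= set(goals)
```

A real plot manager would consult plot state rather than a fixed blacklist, but the control relationship (autonomous goal pursuit below, story-level veto above) is the Tale-spin lesson made concrete.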