June 20, 2007
Generating Narrative Variation in Interactive Fiction
University of Pennsylvania, Computer & Information Science
Mitchell P. Marcus & Gerald Prince, Advisors
Committee: Aravind Joshi, Mark Liberman, Fernando Pereira, Marie-Laure Ryan
(This is a distilled version of my dissertation defense from this morning, for Grand Text Auto.)
The main question I’m considering today (after working on it for a few years) is:
How can an automatic, text-generating narrator tell the same events in different ways?
The context for this question is interactive fiction (IF). There are two parts to the answer:
- Develop a formal theory of narrative variation for IF
- Implement it in an IF and text-generation architecture
If it were possible to do everything perfectly the first time, it would just be a matter of developing the correct theory and then implementing it. As it happens, in the actual project I’ve undertaken, the first attempt at a theory and implementation informs some changes and another attempt, and so on…
Today I’ll consider the theory and the implementation of a new interactive fiction system, nn. The name “nn” is meant to suggest the fundamental distinction between the narrated (events and existents; content) and the narrating (the telling of these).
Types of narrative variation
The distinction between the content of a story (what events happened in it; what existents are part of the story world) and the particular way it is expressed or told is an essential one, one of the foundations of the discipline called narratology that has existed for about 40 years.
Matt Madden made use of this distinction when he began his book 99 Ways to Tell a Story with this comic narrative, one in which the events are rather uninteresting … (These images are all taken from Madden’s site for the book; higher quality versions of all of these are available there.)
He continued the book with 98 variations on this comic, providing different tellings of the same underlying content. Here he has told the same events in one panel, on the left, and in 30 panels, on the right.
While creating a lot of variations is fun, it’s helpful to be systematic when trying to determine the essential types of variation. Probably the first really thorough consideration of how narrative can vary was Gérard Genette’s Narrative Discourse: An Essay in Method (1980; originally in French, 1972). By analogy to grammar, Genette described how narrative can have tense (pertaining to the order of events, the speed with which they are told, and the frequency with which one or more are told), mood (distance vs. immediacy, focalization or perspective), and voice (pertaining to the time of narrating and other qualities of the narrator and narratee, the one to whom the story is addressed).
In the second telling of this story here, represented at the bottom, order, speed, and frequency have all been changed. The last two events are told out of chronological order: “The jester laughed, after the clown usurped the throne.” The second event is passed over entirely – narrated at infinite speed. Events can also be narrated more slowly, so that there is twice as much text representing them, for instance. Finally, the first and third events are told at once – “the king and queen died” – with a single narration, in a type of frequency that is called iterative.
In discussing focalization and distance, it’s helpful to return to Matt Madden’s comics. In the one on the left, the perspective has changed so that we’re looking out as if through the main character’s eyes: a sort of “first-person” view. In a textual narrative, a character who focalizes regulates the narrative information. We know only what that character knows and perceives. Note that such a character doesn’t have to appear in the text as “I”; a purely third-person narrative can still have a focalizer. On the right, a few things have changed – an extra narrative level where a character in the story himself tells a story, a shift to past tense – but what I want to point out is that there is no visual representation of events anymore; they are now told in this character’s “speech,” textually, making them less immediate and more distant.
Finally, there’s time of narrating and the narrator and narratee. The narrator can be placed, temporally, subsequent to events (so that narration is in the past tense, which is typical), simultaneous with events (as with a cell phone user saying exactly what he or she is doing while walking down the street), or even previous to events, as in this unusual and somewhat prophetic example. Furthermore, the narrator (who tells the story) can be signaled more or less prominently, and can be made into one of the characters in the narrative. The same goes for the narratee. In the bottom text, the queen is the narratee and the jester the narrator.
nn can produce narrative variation in all of these categories, but I’ll discuss just two things today: Order (which I’ll consider jointly with time of narrating) and focalization.
Interactive fiction’s promise
Before going on to discuss these and the nn architecture in detail, a few words about why interactive fiction is so interesting and is worth research effort. The idea of this project is to combine the underlying simulated world of IF, exemplified here by Adventure, with variable narration, allowing different expression. Raymond Queneau’s Exercises in Style, the book that was the basis for Matt Madden’s comics, is the exemplary work in this case. Bringing these together … should lead to a real benefit for the literary and gaming arts, and maybe offer some benefits beyond that.
IF has some virtues that are particularly nice for computational linguistics research. It is:
- A limited domain or simulated “microworld”
- A dialog system: Natural language out & in
- A computer game, providing fun entertainment
- A form of aesthetic expression, literary art
Over 30 years, interactive fiction has moved through three productive eras, shown here.
Independent games haven’t been slackers compared to their commercial cousins, as even the briefest look at a few examples can attest.
My vision is for a fourth era of IF, one in which interactive narrating joins interactive fiction. It wouldn’t preclude other independent IF production, but it would bring IF more fully into the literary life of our world and use computation in new ways to do some of the important work of literature and art. If IF does become a more prominent part of our cultural life, we could expect to see landmarks like these.
The architecture of nn
The current IF system has two well-developed modules: The parser, to handle input, and the world model, to simulate the world. The basic “game” that an IF author writes is an instantiation of classes provided by the world model. From my standpoint, and with my interest in being able to vary the narrative, this is highly simplified, but what this architecture does in providing a simulated world and a form of natural-language understanding is very useful, and it’s the basis for my work.
I have proposed and implemented a more elaborate architecture, but one in which each module corresponds fairly clearly to a function that the IF system needs to carry out. Some modules aren’t involved in processing commands: the Clarifier handles unrecognized inputs, and the Joker deals with directives at the game level, such as “quit,” “undo,” and “restore.” The modules that are part of command processing are:
- The Preparer, a simple tokenizer
- The Recognizer, basically a parser, based on a semantic grammar and using a discourse model
- The Simulator, a language-independent subsystem for determining what events happen and what the next state of the world will be
- The Narrator, which uses the world models, the plan for narrating, and the discourse model to produce the appropriate telling of events and description of existents
- The Presenter, which formats the output string for the output device, say a terminal window or Web page
The whole system is needed for interactive operation, but the critical components from a research standpoint are the Narrator module, the plan for narrating, and the world models.
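To make the flow through these modules concrete, here is a minimal, hypothetical sketch of command processing in Python (nn’s implementation language is not specified in this summary; every function body below is an illustrative stub, far simpler than the real modules):

```python
def preparer(text):
    """Preparer: a simple tokenizer."""
    return text.lower().split()

def recognizer(tokens):
    """Recognizer: stands in for the semantic-grammar parse with a discourse model."""
    return {"verb": tokens[0], "object": tokens[1] if len(tokens) > 1 else None}

def simulator(command, world_history):
    """Simulator: language-independent; decides what events happen next."""
    event = {"agent": "@adventurer", "action": command["verb"],
             "object": command["object"]}
    world_history.append(event)  # the world model keeps history
    return [event]

def narrator(events, plan):
    """Narrator: tells the events according to the plan for narrating."""
    tense = plan.get("tense", "past")
    forms = {"past": {"take": "took"}, "present": {"take": "takes"}}[tense]
    return [f"The adventurer {forms.get(e['action'], e['action'])} the {e['object']}."
            for e in events]

def presenter(sentences):
    """Presenter: formats the reply for, say, a terminal window."""
    return "\n".join(sentences)

world_history = []
plan = {"tense": "past"}
reply = presenter(narrator(simulator(recognizer(preparer("take lamp")),
                                     world_history), plan))
print(reply)  # -> The adventurer took the lamp.
```

Changing only the plan for narrating (here, just a tense setting) changes the telling without touching the simulated world, which is the point of separating these modules.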
The Narrator itself has an internal architecture: a three-stage pipeline. The first stage is the Reply Planner, where events, existents, and the plan for narrating are converted into a reply structure, an ordered tree of proposed expressions representing what is to be expressed and in what order. This is the content selection and structuring stage. In the next stage, the Microplanner, that tree is converted into a list of paragraph proposals, each of which has a list of sentence proposals. A formalism called “string-with-slots” is used; it is reasonably easy to author and, while not highly abstract, still allows variation in tense, aspect, and number. Aggregation occurs here, and all grammatical specifics are decided upon and connected to the strings-with-slots. Finally, the Realizer converts the abstract paragraph proposals into the final output strings.
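One plausible reading of the “string-with-slots” idea can be sketched as follows (the template syntax and the tiny verb-form table are my own illustration, not nn’s actual formalism):

```python
def realize(template, slots, tense="past", number="singular"):
    """Fill a string-with-slots template, inflecting the verb slot
    according to the grammatical specifics decided in microplanning."""
    verb_forms = {
        ("wave", "past", "singular"): "waved",
        ("wave", "present", "singular"): "waves",
        ("wave", "present", "plural"): "wave",
    }
    filled = dict(slots)
    filled["verb"] = verb_forms[(slots["verb"], tense, number)]
    return template.format(**filled)

template = "{subject} {verb} at {object}"
slots = {"subject": "the pirate", "verb": "wave", "object": "the adventurer"}

print(realize(template, slots, tense="past"))     # the pirate waved at the adventurer
print(realize(template, slots, tense="present"))  # the pirate waves at the adventurer
```

The same proposal yields different surface strings as tense, aspect, or number change, without re-authoring the sentence.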
The Narrator has to rearrange events when they are to be told in a non-chronological order. Here’s a simple story with six events in it. The events can be told out of order: 341256 instead of 123456. But a simple sequence doesn’t tell us much about the order. Genette identified many categories of order, and this sequence could fall into several of them. If it came about because we decided to narrate based on the location of events (kitchen, garden, dining room), it would be an example of narration by category, or syllepsis. If we had simply mixed the events up at random and 341256 was the result, it would be a jumbled narration, what Genette calls achrony. Finally, if we flashed back from the main sequence (3456) to tell about what had happened the previous day (12), this would be an instance of analepsis. These are three different types of order, but all can produce the sequence 341256. Clearly that sequence by itself will not be enough as input to a text generator, which has to know more about how to narrate.
Instead of using a simple sequence, these ordered trees – reply structures – represent the order of events. For syllepsis, the first level of the tree has the three categories, and beneath each one are the corresponding events, in the order in which they are to be narrated. Achrony is a very flat structure with one jumbled level. And analepsis has the main sequence with the earlier sequence embedded in it.
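The three reply structures described above can be sketched as ordered trees in which leaves are event numbers and internal nodes carry a label saying why their children appear in that order (the tuple encoding here is purely illustrative):

```python
# Three reply structures for the sequence 3 4 1 2 5 6 (hypothetical encoding).
syllepsis = ("by-category", [
    ("kitchen", [3, 4]),
    ("garden", [1, 2]),
    ("dining room", [5, 6]),
])

achrony = ("jumbled", [3, 4, 1, 2, 5, 6])  # one flat, mixed-up level

analepsis = ("main", [3, 4, ("flashback", [1, 2]), 5, 6])  # embedded earlier events

def leaves(node):
    """Flatten a reply structure into the order of narration."""
    if isinstance(node, int):
        return [node]
    _, children = node
    return [e for child in children for e in leaves(child)]

# All three trees yield the same surface sequence, but the internal
# nodes record *why* the events appear in that order.
assert leaves(syllepsis) == leaves(achrony) == leaves(analepsis) == [3, 4, 1, 2, 5, 6]
```

This is exactly the distinction the flat sequence 341256 loses: the generator needs the internal nodes, not just the leaf order.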
Intuitively, this representation is more meaningful, but what exactly is stored in the internal nodes to make generation of narrative possible? The information we need is something that, in combination with the time at which the events happened, can let us decide the tense that should be used to narrate. Hans Reichenbach’s theory of tense explains that what we need is reference time and speech time; from the time of reference, speech, and event we can determine the appropriate tense. So for analepsis, we can set reference time to follow events (E=R), set speech time to max (E=R<S), and we have the simple past for all events on the first level. But then when we get to the internal node that is the parent of the analepsis sequence, here we let the values of R and S stay as they are at that point: R=4, S=max. Then, when narrating the analepsis, E<R<S, and the anterior past (past perfect) is generated. We can also add some cue words (“Yesterday,” “Anyway…”) at the beginning and end of this sequence easily, since the structure shows us the right place for them. Achrony can be generated as well if we want to “privilege confusion.” Just set R to follow and S to follow, so that the present tense is generated (unhelpfully) in all cases. This is the representation nn uses and the way reordering is done.
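The Reichenbach-style tense selection just described can be sketched as a small function over event time E, reference time R, and speech time S. This is a hedged sketch covering only the three configurations discussed here, not a full theory of tense:

```python
def tense(E, R, S):
    """Pick a tense from the relative positions of event (E),
    reference (R), and speech (S) times."""
    if R < S and E < R:
        return "anterior past"   # past perfect: "had died"
    if R < S and E == R:
        return "simple past"     # "died"
    if E == R == S:
        return "present"
    raise ValueError("configuration not covered in this sketch")

MAX = float("inf")

# Main sequence: R follows each event (E = R), S at max -> simple past.
print(tense(E=3, R=3, S=MAX))   # simple past

# Inside the analepsis, R and S stay where they were (R = 4, S = max),
# so the flashback events have E < R < S -> anterior past.
print(tense(E=1, R=4, S=MAX))   # anterior past

# Achrony, "privileging confusion": R and S follow the event -> present.
print(tense(E=2, R=2, S=2))     # present
```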
The world models also have an internal architecture: they consist of an IF Actual World and a Focalizer World for each actor, following the ideas in Marie-Laure Ryan (1991). The IF Actual World is complete, correct, current, and used for simulation. Each Focalizer World is (usually) incomplete, possibly wrong, has history, and is used for narrating. History is necessary because when retelling some events that happened 100 turns ago, it’s necessary to remember what the world was like at that point: who else was in the room, for instance?
Here’s an example of a focalizer’s incomplete knowledge. On the left, the player character stays in the middle of the plaza and can only see that the trash collector picked up “something” – the event can be seen, but not what is being picked up. If the PC had walked over to where the trash collector is, it’s clear that this is “a candy wrapper.” A model of visibility, prominence, and vantage is used to determine what can be seen. If something can’t be seen by a focalizer, a blank existent is included in its place in that Focalized World.
Here, knowledge of the world has been initialized differently. Walking out of the building, the standard adventurer knows only where she just came from – that the building’s interior is to the east. The “know-it-all” adventurer, on the other hand, has a Focalizer World with a full copy of all locations. As a result, all four exits are known and are indicated. So, characters can not only perceive things differently; they can also know more or less about them to begin with.
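Both effects – blank existents for what can’t be made out, and differently initialized knowledge – can be sketched against a single complete world model. The world contents, the distance threshold, and the dictionary encoding below are all illustrative stand-ins for nn’s richer visibility/prominence/vantage model:

```python
# One complete, correct, current IF Actual World, used for simulation.
actual_world = {
    "plaza": {"exits": {"north": "road"}, "items": ["candy wrapper"]},
    "road": {"exits": {"south": "plaza"}, "items": []},
}

def perceive(item, distance):
    """Crude stand-in for the visibility/prominence/vantage model:
    items too far away are replaced by a blank existent."""
    return item if distance <= 1 else "something"

# From the middle of the plaza, the focalizer records only a blank
# existent; up close, the item is identified.
far_view = [perceive(i, distance=3) for i in actual_world["plaza"]["items"]]
near_view = [perceive(i, distance=1) for i in actual_world["plaza"]["items"]]
print(far_view)   # ['something']
print(near_view)  # ['candy wrapper']

# Focalizer Worlds can also be initialized differently: the standard
# adventurer knows only where she just came from, while the
# "know-it-all" starts with a full copy of all locations.
standard_fw = {"plaza": {"exits": {"north": "road"}}}            # partial
know_it_all_fw = {name: dict(loc) for name, loc in actual_world.items()}  # complete
print(sorted(know_it_all_fw))  # ['plaza', 'road']
```

Narration then draws on the focalizer’s world, not the actual one, which is how the two adventurers come to report different exits from the same building.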
The same ten turns underlie both of these narratives, these recountings. On the left, the focalizer is the adventurer, who is doing lots of exciting things. On the right, the focalizer is the pirate (also addressed with “you”) who just stands at the end of the road and waves, having a much less interesting perspective on things. These different perspectives show the fundamental use of the Focalizer Worlds.
Creative work, evaluation
Two short creative pieces – more stand-alone demos than real games – were created to see how higher-level narrative effects can be composed out of simpler ones. Additionally, a pilot evaluation was undertaken in which annotators rated 14 IF output texts on naturalness, identified events in them, and identified the chronology of events. There were some good comments from annotators that will inform future improvements, and the evaluation offered several ideas about how to conduct a full-scale evaluation.
To name just a handful of the many potential future projects:
- A public release of nn for IF authors & researchers
- A non-IF system to teach narratology
- Adding drama management
- Adding awareness of the narrating
- Use with story generation
- Multi-lingual input & output
- Developing subjectivity (a grandiose goal!)
- Use in other applications
Summary of advances
In a nutshell, the project has managed, so far:
- A narratological theory of IF
- An IF architecture abstracting expression from content
- Narrative variation formalized for computational use
- An automated narrator
- A full working IF system, nn
- The implementation of standard IF (Adventure, Cloak)
- New creative demos
- A pilot evaluation