March 10, 2008

EP 7.5: Expressive Language Generation

by Noah Wardrip-Fruin · 6:21 am

From one perspective, the challenge faced by Terminal Time is the primary focus of the entire computer science research area of “natural language generation” (NLG). This work focuses on how to take a set of material (such as a story structure, a weather summary, or current traffic information) and communicate it to an audience in a human language such as English. On the other hand, very little NLG research has taken on the specific version of this challenge relevant for Terminal Time (and digital media more generally): shaping this communication so that the specific language chosen has the appropriate tone and nuance, in addition to communicating the correct information. Given this, digital media (such as games) have generally chosen very different approaches from NLG researchers for the central task of getting linguistic competence into software systems.

The approach of most games, as I discussed earlier in the context of dialogue trees, is to simply have a human author write large chunks of text — these chunks then embody the author’s linguistic competence. At this extreme, the computer need not know anything about the content of what the author has written, because the computer will never need to do anything but output one of these chunks at the appropriate moment. In games the human-written chunks of text are often read aloud by voice actors and the resulting sound files triggered by the system when the appropriate moment is determined by non-linguistic processes within the game.

The opposite extreme would be to attempt to put all the linguistic competence into the software. Using artificial intelligence techniques, the software would determine (at the level of meaning) what messages need to be conveyed to the audience. Then, using general knowledge about human language, a body of knowledge about the specific language in which the messages are to be conveyed, topic-specific knowledge of how ideas in this domain are typically expressed, and some sort of mechanism for defining and choosing between the different options deducible from this knowledge, the software would produce a chunk of text fully customized for that situation. No trace of a message, except as abstract knowledge, would exist before it was assembled. No operational system such as this exists, because many non-trivial research questions would need to be answered before one could be constructed. But this kind of ambition is part of what motivated, for example, the work on translating CD expressions to natural-language sentences in Roger Schank’s lab, which formed the basis for Meehan’s Mumble — a more ambitious version of which was Neil Goldman’s earlier Babel (Goldman, 1975).

NLG templates

Structure-oriented NLG systems fall between the extremes outlined above. The simplest systems, perhaps too simple to be considered true NLG, are template-driven systems. These systems have a structure, a template, in which certain pieces of content are left to be determined (they are “open slots” in the template). Also, aspects of the template may vary in simple ways.

The best-known template systems, in everyday life, are letter-generating systems. These are used for everything from broad-based political fundraising to very specifically individual (and yet consistently structured) professional communications of doctors and lawyers. These systems may simply fill in the name of the recipient, and send otherwise-identical letters to every address receiving a certain type of letter, or they may insert or omit a wide variety of paragraphs, sentences, and even words to match the data the system knows about a recipient.
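
To make the template idea concrete, here is a minimal sketch in Python of a letter template with an open slot and conditionally included paragraphs. The names and data are invented for illustration; this is not drawn from any actual letter-generating product.

def generate_letter(recipient):
    # The template's fixed text, with an "open slot" filled from recipient data.
    opening = f"Dear {recipient['name']},"
    body = ["Thank you for your continued support."]
    # Paragraphs are inserted or omitted to match what the system knows
    # about the recipient, as in fundraising or professional letters.
    if recipient.get("donated_last_year"):
        body.append("Your gift last year made a real difference.")
    else:
        body.append("We hope you will consider making a first gift this year.")
    closing = "Sincerely,\nThe Campaign Team"
    return "\n\n".join([opening] + body + [closing])

print(generate_letter({"name": "Alex Morgan", "donated_last_year": True}))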

As with the “chunks of text” approach, most of the linguistic structure in a template system comes from human authoring that is expressed as completed text, rather than as structures expressed in computable form. This makes template-driven systems easier to construct than more complicated NLG systems, but it also provides less flexibility. Changing a text’s tense, for example, would probably be accomplished through providing an alternate version of the template. Many NLG systems of the more complex varieties, on the other hand, would have little trouble generating past or present tense messages based on encoded knowledge of how tense functions in the language in question.

Departing from writing

Moving away from template-driven approaches, into the area of “true” structure-oriented NLG, one is also moving further from writing. This is true in two senses. The information one supplies to the system is further from writing. Also, given the difference in the information one supplies, it becomes harder to employ the techniques traditionally used by authors to shape the surface output of the system — revision becomes something quite different from traditional textual editing. These facts are likely part of the reason that NLG techniques more complicated than templates have rarely been used by writers.

In their article “Building Applied Natural Language Generation Systems” (1997) Ehud Reiter and Robert Dale outline six basic kinds of activity for NLG systems, using the example of a system that answers questions about rail travel. Content determination is the process of getting the semantic input that the NLG system will turn into text, and creating a set of messages that will be used by the further steps (in the example: the next train on the route, when it leaves, and how many trains a day travel that route). Discourse planning structures the messages, usually into one of computer science’s ubiquitous trees (in the example, the identity of the next train and its time of departure become the leaves of a “next train information” node — with an “elaboration” relation between them — which is linked to the “number of trains” information by a “sequence” relationship expressed at the root of the tree). Sentence aggregation, in turn, is the process of deciding which messages should be grouped into sentences, often leaning on the tree data (in the example, it is pointed out that the two leaves might be combined so that the next train’s name and departure time would be in one sentence). Lexicalization is the activity that determines the words and phrases that will be used to express particular concepts and relations. Lexicalization is particularly important for systems that output in multiple languages, but can also be a good place to explicitly provide variation (to prevent monotony in output) or make choices about word usage (in the example, it is suggested that depart is perhaps more formal than leave). Referring expression generation is in some ways closely related to lexicalization, in that it is the selection of words or phrases to refer to entities in the messages. However, it is more focused on context, particularly the context of text generated thus far (in the example, this is the question of when expressions like “it,” “this train,” and “the Glasgow train” are appropriate for referring to a previously-mentioned train). Linguistic realization is the application of rules of grammar to form the output from the previous processes into text that is correct syntactically, morphologically, and orthographically. (In the example, the system produces the sentence “There are 20 trains each day from Aberdeen to Glasgow.” The syntactic component of the realizer added “from” and “to” in order to mark the train’s source and destination, the morphological component produced the plural “trains” from the root “train,” and the orthographic component capitalized the initial word and added a period at the end.)

As a practical matter, NLG systems do not generally have six different components for these six different activities. Reiter and Dale suggest that the most common architecture actually consists of a three-stage pipeline: text planning (content determination and discourse planning), sentence planning (sentence aggregation, lexicalization, and referring expression generation), and linguistic realization (syntactic, morphological, and orthographic processing). Reiter and Dale’s account here is in agreement with that of other NLG authorities, such as Eduard Hovy, whose summary in The MIT Encyclopedia of the Cognitive Sciences names these stages using the almost-identical terms “text planning,” “sentence planning,” and “sentence realization” (Wilson and Keil, 1999).
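
As a rough illustration of this pipeline, here is a compressed sketch in Python of the rail-travel example. The data structures and function names are my own simplifications, and the particular train name and time are illustrative; this is not Reiter and Dale’s architecture.

def text_planning():
    # Content determination and discourse planning: messages arranged in a tree,
    # with "sequence" at the root and "elaboration" linking the two leaves.
    return {
        "relation": "sequence",
        "children": [
            {"msg": "train-count", "count": 20, "source": "Aberdeen", "dest": "Glasgow"},
            {"relation": "elaboration",
             "children": [{"msg": "next-train", "name": "Caledonian Express"},
                          {"msg": "departure", "time": "10 am"}]},
        ],
    }

def sentence_planning(tree):
    # Aggregation: the elaboration pair is combined into one sentence.
    # Lexicalization: "leave" is chosen over the more formal "depart".
    # Referring expression generation: the second sentence will refer with "it".
    count = tree["children"][0]
    name, time = tree["children"][1]["children"]
    return [("count", count),
            ("next", {"name": name["name"], "time": time["time"], "verb": "leave"})]

def linguistic_realization(plans):
    # Syntactic, morphological ("train" -> "trains", "leave" -> "leaves"),
    # and orthographic processing.
    out = []
    for kind, p in plans:
        if kind == "count":
            out.append(f"There are {p['count']} trains each day "
                       f"from {p['source']} to {p['dest']}.")
        else:
            out.append(f"The next is the {p['name']}; it {p['verb']}s at {p['time']}.")
    return " ".join(out)

print(linguistic_realization(sentence_planning(text_planning())))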

This overview underscores another likely reason that few writers — and, in fact, few digital media practitioners from any background — have made use of traditional NLG techniques. As one sees in the design of systems from Tale-Spin/Mumble to the present, these systems are generally constructed with the assumption that messages precede language. Their architectures are developed so that well-defined messages can be “realized” as texts. And yet creative writing, for many writers, is the practice of what could not be said any other way. Its messages do not precede its text, but emerge through the specifics of its language.

This authoring dilemma is approached in different manners by Terminal Time, Brutus, and systems that engage more directly with the NLG tradition.

Terminal Time templates

Looking at the NLG approach of Terminal Time, one can see that the system succeeds at producing satisfying media experiences, in part, because it is a flexible container for text that can be crafted with traditional writing skills. Its approach is essentially that of template-oriented NLG, though the design of the underlying system is a more general and powerful one, which could be further exploited. As it is, however, the Terminal Time authors generally craft the narration for historical events at the sentence level.

In any particular Terminal Time performance, the current shape of the audience-directed ideology model will determine which events will be portrayed and how they will be spun. The text for each historical event must describe the event appropriately, both giving its outlines and shifting the description according to current rhetorical goals. Further, because Terminal Time is designed to be shown to the same audience twice, in quick succession, it is useful if the same event (motivated by the same rhetorical goals) can be narrated in non-repetitive ways. For example, here is an NLG rule for an introductory Terminal Time narration focused on issues of class:

(def-nlgrule :name :generate
  :context :feudal-agrarian-first-period-intro
  :body (:seq
    (:terminal
      (:string “1000 years ago, Europe was emerging from the Dark Ages.”)
      (:keywords (daily-life europe european peasant)))
    (:terminal
      (:string “As the new millennium dawned, the seeds had been sewn for a just and humane social order.”)
      (:keywords (farming daily-life europe)))
    (:rule :detail)
    (:terminal
      (:string “The new millennium’s fragile seeds of economic freedom were cultivated in the feudal agrarian economies.”)
      (:keywords (daily-life farming europe)))
    (:rule :conclusion)))

(Mateas, 2007)

The three sentences preceded by “:string” provide the spine of the narration. They are spoken, in order, by the synthesized voice of the narrator — and accompanied by video clips tagged with the keywords specified on the next line. Variation is provided by the two lines that begin “:rule” — one for additional detail, the other for the conclusion. The possible detail sentences that Terminal Time can drop in range from “Farmers and small tradesmen worked together in congenial local barter economies” to “Wealth was only mildly concentrated in these farming communities when one compares them to the capitalist gluttony of 20th century America” (with their accompanying video keywords). One of the possible conclusions reads: “The worker owed his labor to his land’s owner, but could not be ejected from his home.” A different ideology model, on the other hand, would result in the selection of a somewhat different description of these circumstances, such as the one that can conclude: “The worker owed his labor to his land’s owner, and in return earned the right to his home.”
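
To see the shape of the control structure, here is a toy sketch in Python of how a :seq rule like the one above might be expanded. The real Terminal Time is written in Lisp and selects among candidate sentences through inferential queries against its ideology model; this sketch substitutes a random choice, abbreviates the rule body, and invents the keywords attached to the detail and conclusion sentences.

import random

RULES = {
    ":detail": [
        ("Farmers and small tradesmen worked together in congenial local barter economies.",
         ["farming", "daily-life"]),
        ("Wealth was only mildly concentrated in these farming communities when one compares "
         "them to the capitalist gluttony of 20th century America.",
         ["farming", "america"]),
    ],
    ":conclusion": [
        ("The worker owed his labor to his land's owner, but could not be ejected from his home.",
         ["peasant", "daily-life"]),
    ],
}

def expand(body):
    narration = []
    for item in body:
        if item[0] == ":terminal":
            narration.append(item[1])                       # (string, keywords) pair
        elif item[0] == ":rule":
            narration.append(random.choice(RULES[item[1]]))  # point of variation
    return narration

# Abbreviated from the rule quoted above: one spine sentence plus two variation points.
intro = [
    (":terminal", ("1000 years ago, Europe was emerging from the Dark Ages.",
                   ["daily-life", "europe", "european", "peasant"])),
    (":rule", ":detail"),
    (":rule", ":conclusion"),
]

for sentence, keywords in expand(intro):
    print(sentence, keywords)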

In narrating the agrarian past of Europe, Terminal Time essentially has parallel structures for the narration, depending on the current shape of its ideological model. However, Terminal Time’s authors chose to intertwine the differing narrations of other events. For example, this is the NLG rule for introductory narration about the space age:

(def-nlgrule
  :name :generate-rocket-science-intro
  :context %RocketScience
  :body (:seq
    (:terminal
      (:string “Beginning in (%date 1957) with the launch of the Russian satellite Sputnik atop an Intercontinental Ballistic Missile, human beings left the planet earth and began exploring space.”)
      (:keywords (satellites rockets communism outer-space)))
    (:terminal
      (:string (“This”
        ((%rhet-goal :show-the-hollowness-of-science)
          “mechanistic”)
        “hymn to science reached its crescendo with the moon landing in (%date 1969).”))
      (:keywords (moon rockets america outer-space technological-instruments astronomy)))))


In this rule, the word “mechanistic” is dropped into the second sentence if this event is being spun to support the rhetorical goal “show-the-hollowness-of-science.” If it is, a further rule will produce a following sentence that reads: “Yet this demonstration of the power of mortal man to overcome nature left the world simply more empty and threatened.” If, on the other hand, the active rhetorical goal is “show-success-of-science,” then the next sentence is “Besides being a demonstration of the power of man’s mind to overcome nature, space exploration unambiguously proved that above our heads there is no heaven, only the vastness of empty space.” It is techniques such as these, breaking the narration down to the sentence and word level, that allow Terminal Time to tell its events from an appropriate ideological perspective, keep the data authoring task manageable, and — crucially — interleave the narration of different events in a way that exposes the shape of its current ideological model.

This is effective, but could have been taken farther. Each of the rhetorical goal checks performed by Terminal Time is actually a full inferential query. The power in this approach would have also allowed narration to vary based on a wide variety of other parameters. And the variation, of course, could have been at a much more fine-grained level. But this is not a way in which writers are accustomed to working, and it is not immediately obvious how one would structure such an approach to authoring. A technique from Brutus illustrates one possibility for such an approach.

Literary augmented grammars

Just as Brutus uses story grammars, it also uses paragraph and sentence grammars. These are relatively standard linguistic tools, but in Brutus they are augmented with hand-crafted literary knowledge and named “literary augmented grammars” (LAGs). One particular LAG is the “independent parallel setting description” (INDPSD), which created this familiar sentence: “He loves the studious youth, ivy-covered clocktowers and its sturdy brick.” LAGs are an example of using hand-authored data to generate sentences with literary nuance — making for better text as well as more ability to reflect the underlying system in surface language. A closer look at the INDPSD reveals the strategies employed.

Bringsjord and Ferrucci present the INDPSD grammar three times in the course of explaining the version used in the “setting_description” from the story of Dave Striver. The first presentation is of a relatively standard, if somewhat specialized, grammar:

• INDPSD → SETTING verb FP
• FP → ‘its’ FEATURE FP | ‘its’ FEATURE
• SETTING → noun_phrase
• FEATURE → noun_phrase
(Bringsjord and Ferrucci, 2000, 181)

In this grammar uppercase words are non-terminals, words in single quotes are “literals” (used in the sentence exactly as they appear), and lowercase words are terminals (to be selected/constructed from the lexicon). The “|” symbol, which did not appear in the story grammar, can be read as “or.” In this case, it indicates that an INDPSD can contain one or more FPs — because an FP can be rewritten either as “ ‘its’ FEATURE” or as “ ‘its’ FEATURE” followed by another FP. This creates the parallelism that gives the INDPSD its name.

However, this grammar clearly is not enough to create a sentence such as the example above. Where is Brutus to find the remaining information? While, as I mentioned earlier, there are large bodies of “common sense” knowledge that have been encoded into structures such as the in-process Cyc Ontology, common sense knowledge is not the same as literary knowledge.

In order to address this, Brutus includes hand-encoded information about the “iconic features” of different objects, from a literary point of view. These iconic features are both positive and negative, to assist in portraying the object from differing points of view. Here, for example, is the listing of iconic features for universities:

frame university is a object
  default positive_iconic_features is
    {clocktowers, brick, ivy, youth, architecture, books, knowledge, scholar, sports} and
  default negative_iconic_features is
    {tests, competition, ‘intellectual snobbery’}.

(184)

Obviously, this sort of literary knowledge is such that it would differ from story to story. In fact, it might differ from character to character within the same story. For example, a character who is an athlete and another who is her professor might disagree strongly about whether “sports” are a positive or negative iconic feature of the university. Nevertheless, Bringsjord and Ferrucci’s approach is an intriguing one.

Of course, this is still not enough information to produce a sentence such as the output seen from INDPSDs. Another level of knowledge is required, which Bringsjord and Ferrucci call “literary modifiers.” Here are the iconic features and literary modifiers for ivy:

frame ivy is a object
default positive_iconic_features is {leaves, vines} and
default negative_iconic_features is {poison} and
default negative_literary_modifiers is {poisonness, tangled} and
default positive_literary_modifiers is {spreading, green, lush}.

(184)

However, this still is not enough to ensure the creation of sentences such as those seen from INDPSDs. The grammar needs to be further augmented, with more consistency of structure and more direction about how to fill out aspects of the grammar. For this purpose Brutus contains what its authors call “literary constraints.” Here is the second version of the INDPSD, incorporating a set of such constraints:

• INDPSD → SETTING verb (isa possessive_verb) FP(n=3)
• FP → ‘its’ FEATURE FP | ‘its’ FEATURE
• SETTING → noun_phrase (has_role setting)
• FEATURE → noun_phrase (isa iconic_feature_of SETTING)
(185)

In the grammar above, the literary constraints are the elements that appear in parentheses. For example, the “(n=3)” in the first line enforces the occurrence of three FPs in the sentence — creating the three-part parallelism seen in examples of the INDPSD. This is simply a structural rule. But elements such as the last line’s “(isa iconic_feature_of SETTING)” create the connections between the grammar and literary knowledge.

This, however, still doesn’t suffice to create sentences such as those generated by INDPSDs. Specifically, the example above begins with “He” — identifying Dave Striver. Throughout Artificial Intelligence and Literary Creativity Bringsjord and Ferrucci emphasize the importance of simulating consciousness at the level of language, of describing things from the point of view of characters. This brings us to the final version of this LAG:

• POV → Agent (is a person) Verb (is a PC Verb) FirstFP
• FirstFP → Setting FEATURE FP
• FP → its FEATURE FP | ‘.’
• FEATURE → noun_phrase (is a feature of SETTING)
• SETTING → noun_phrase (is a setting)
(188)

Some of the previously introduced constraints seem to have been removed in order to make it easier to see what is new in this version. In particular, Bringsjord and Ferrucci draw attention to the “(is a PC Verb)” constraint. They have hand-encoded a special set of verbs as “pc_verbs” which “include verbs for feeling, thinking, understanding, wanting, etc.” (187). Presumably there is also a way in which the LAG restricts this so that we get a pc_verb that reflects the point of view of the Agent, but this is not specified.
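
To make the progression concrete, here is a hypothetical sketch in Python of how a constrained grammar of this kind might be filled out from iconic features and literary modifiers. It is not Bringsjord and Ferrucci’s code — the data is an abbreviated, invented version of their frames — but it shows how the literary constraints connect the grammar’s slots to hand-encoded literary knowledge.

ICONIC_FEATURES = {
    "university": {
        "positive": ["clocktowers", "brick", "youth"],
        "negative": ["tests", "competition", "intellectual snobbery"],
    },
}
LITERARY_MODIFIERS = {"clocktowers": "ivy-covered", "brick": "sturdy", "youth": "studious"}

def indpsd(agent, setting, valence="positive", n=3):
    # (isa iconic_feature_of SETTING): FEATURE slots are drawn from the setting's
    # iconic features; (n=3) enforces the three-part parallelism.
    features = ICONIC_FEATURES[setting][valence][:n]
    phrases = []
    for feature in features:
        modifier = LITERARY_MODIFIERS.get(feature)
        phrases.append(f"its {modifier} {feature}" if modifier else f"its {feature}")
    # (is a person) and (is a PC Verb): the sentence opens from a character's
    # point of view, with a verb of feeling or thinking.
    return f"{agent} loves the {setting}, " + ", ".join(phrases) + "."

print(indpsd("He", "university"))
# He loves the university, its ivy-covered clocktowers, its sturdy brick, its studious youth.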

With these techniques, it is clear how LAGs enabled Bringsjord and Ferrucci to author data for Brutus1 that would result in more successful literary output than found in previous story generation systems such as Tale-Spin/Mumble and Minstrel. LAGs offer an interesting tradeoff in terms of expressing the underlying model and human authorability. Other techniques, from areas nearby the story generation field, offer further interesting approaches.

Two other approaches

Eduard Hovy’s system Pauline is undoubtedly a high-water mark for expressing an underlying model through variations in surface text. This project emerges clearly from the “scruffy” tradition — developed as Hovy’s dissertation work at Yale (defended in 1987) with Schank as his advisor and Abelson as a committee member. Given this context, it should be no surprise that Pauline uses something similar to Schank’s “conceptual dependency” expressions as its underlying representation structure.

Pauline doesn’t generate stories, but instead can tell many variations of the same story, based on a model of the situation’s “pragmatic constraints.” While the system was employed to generate texts recounting a hypothetical primary election between Carter and Kennedy, as well as differing versions of the output of a program modeling judicial sentencing, it is best known for its variations on a story about a moment in the student/administration conflict over Yale’s investments in apartheid-era South Africa. Authoring the data for this story required representing “about 75 elements denoting the events, actors, locations and props, and . . . about 50 elements denoting the relationships (temporal, intergoal, causal, etc.) that hold among them” (Hovy, 1987, 696). In addition, the system required information about how to translate this logical information into text that would have the appropriate nuance (e.g., “MTRANS indexes to a great many verbs and phrases, among which are ‘give permission’, ‘allow’, ‘announce’, and ‘say’ ” (703)). This was a non-trivial authoring effort, in a form quite removed from traditional writing, for generating variations on a story only one paragraph in length.

But the variations are quite impressive. More than 100 are possible, each seeking to express a different model of the pragmatic situation — such as whether the speaker agrees or disagrees with the audience, is trying to slant the account in favor of one group or another, is communicating in haste or deliberately, and so on. These can influence what topics are included in the story, how the discussion of the topics is organized, and what phrases and words are used. For example, if the situation is one in which the speaker wishes to decrease interpersonal distance with their audience, choosing an appropriate level of informality is important. Hovy provides this example of using clause position, verb formality, ellipsis, adjective inclusion, and conjunction to communicate the same portions of the story in a “highfalutin” and “colloquial” manner:

In early April a shantytown — named Winnie Mandela City — was erected by several students on Beinecke Plaza, so that Yale University would divest from companies doing business in South Africa.

Students put a shantytown, Winnie Mandela City, up on Beinecke Plaza in early April. The students wanted Yale University to pull their money out of companies doing business in South Africa. (704)
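
The following toy sketch in Python shows only the flavor of this kind of choice — a single register parameter driving verb choice, voice, and sentence segmentation. Pauline itself works from conceptual-dependency-style representations and a much richer set of interacting rhetorical goals; the lexicon entries and structure here are invented.

LEXICON = {
    "build":  {"formal": "erected",     "colloquial": "put up"},
    "divest": {"formal": "divest from", "colloquial": "pull their money out of"},
}

def narrate(register="formal"):
    build, divest = LEXICON["build"][register], LEXICON["divest"][register]
    if register == "formal":
        # Passive voice, a subordinated purpose clause, one long sentence.
        return (f"In early April a shantytown was {build} by several students on "
                f"Beinecke Plaza, so that Yale University would {divest} companies "
                "doing business in South Africa.")
    # Active voice, two short sentences.
    return (f"Students {build} a shantytown on Beinecke Plaza in early April. "
            f"They wanted Yale University to {divest} companies doing business "
            "in South Africa.")

print(narrate("formal"))
print(narrate("colloquial"))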

Nick Montfort’s system nn, on the other hand, is explicitly designed as an authoring tool for interactive fictions in the Infocom tradition. As such, authorability must be one of its highest-priority goals. It aims to let authors create extended interactive fiction experiences, defining both the story and the language involved, while still taking advantage of elements of the system that allow for automatic variation in the narration.

Though not working in the scruffy tradition,8 Montfort does employ the basic acts from conceptual dependency theory in the simulation engine for nn. However, these are not positioned as the primary site for nn authoring (many rooms, things, and actors will function largely on appropriate defaults). Instead, nn invites authors to focus on crafting texts, in a manner as close as possible to traditional writing while still providing appropriate hooks for nn’s narrative transformation processes. Montfort provides an example that re-engineers the opening text from one of his own interactive fictions, Winchester’s Nightmare (1999):

Sarah Winchester has forgotten being awake. It is night, or predawn morning, and moonless. She is on a sandy strand extending north and south from here. The sea is before her to the east.

Using nn, this can be transformed into the standard second-person address of much interactive fiction:

You have forgotten being awake. It is night, or predawn morning, and moonless. You are on a sandy strand extending north and south from here. The sea is before you to the east. (2007, 119)

Or nn can make other shifts, such as creating first-person narration that takes place at a time previous to the events:

I will have forgotten being awake. It will be night, or predawn morning, and moonless. I will be on a sandy strand extending north and south from there. The sea will be before me to the east. (120)

The system is also capable of much higher-level transformations of narrative, from event ordering to the speed of narration, building on the structures of classic narratology. But probably the biggest challenge it presents to authors is to write a significant body of text in the “string-with-slots” representation it uses for building sentences. Here is the one for the sample text:

S_FC V_forget_PERF being awake, it V_be_S night, or predawn morning, and moonless, S_FC_PN V_be on a sandy strand extending north and south from D_HERE, the sea V_be_S before O_FC_PN to the east (119)

Each of the capitalized elements may be manipulated by nn: subjects begin with “S_”, verbs with “V_”, and objects with “O_”. The first of these, “S_FC”, refers to the subject who is the focalizing character. While this is not a style of writing likely to come naturally to many writers, it would certainly be easier to reverse-engineer traditional writing into this form (as Montfort does in this example) than into that employed in Pauline or Brutus, and the results are significantly more flexible (though along a different dimension) than those in most of Terminal Time’s text. When the tradeoff is worthwhile is open to question.
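
As a rough sketch of what realization from such a representation involves, here is a toy Python version covering only the last two sentences of the sample. The function and data names are mine, not nn’s, and nn’s actual machinery handles far more (perfective aspect, focalization, event ordering, and so on); the point is only that the author writes one underlying string and the system varies person, tense, and deixis.

SUBJECT = {"first": "I", "second": "you", "third": "she"}
OBJECT  = {"first": "me", "second": "you", "third": "her"}

def be(tense, person):
    # Conjugate "be" for the narration's tense and the focalizer's person.
    if tense == "future":
        return "will be"
    return {"first": "am", "second": "are", "third": "is"}[person]

def realize(person="third", tense="present", here=True):
    s, o = SUBJECT[person], OBJECT[person]
    deixis = "here" if here else "there"
    return (f"{s.capitalize()} {be(tense, person)} on a sandy strand extending "
            f"north and south from {deixis}. The sea {be(tense, 'third')} before "
            f"{o} to the east.")

print(realize(person="second"))
# You are on a sandy strand extending north and south from here. The sea is before you to the east.
print(realize(person="first", tense="future", here=False))
# I will be on a sandy strand extending north and south from there. The sea will be before me to the east.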

And the ground may shift. As of this writing, nn is still in active development and has not yet had a public release. Montfort writes of a possible tool to “semi-automatically create these representations from ordinary texts under user supervision” (149–150). Such work may open a new chapter in the history of interactive fiction — in which narrative variation plays as important a role as it does in print fiction. It may also provide a productive example for those developing systems for the surface output of other sorts of flexible fictions, so that authors may specify means of expression for underlying models of aspects other than narration. This book, on the other hand, now moves toward systems that use modalities other than textual narration in their expression.

Notes

8. The first version of nn was developed in Montfort’s dissertation work at the University of Pennsylvania, under Mitchell P. Marcus, who trained at MIT.

4 Responses to “EP 7.5: Expressive Language Generation”


  1. nick Says:

    Noah, thanks for discussing nn here – you have done it well and I think have positioned nn among other expressive text generators. My goal was to develop a formalism that would maximize the flexibility of text output while also actually being usable by authors. This meant that a lot of lovely and more or less pure models of thought had to be passed over in exchange for junk like “S_FC V_forget_PERF being awake.” Still, I’m hopeful that the result will be usable by IF authors and by others interested in computer-generated narrative.

  2. noah Says:

    I’m glad you think the discussion of nn does what it should. I actually think your “string-with-slots” representation is an exciting starting place. It’s already usable, and it’s also a place from which we can imagine things like authoring tools to make such constructions easier, similar formalisms for varying different aspects of the text, and so on.

  3. Richard Evans Says:

    Minor correction: I think you mean “Whether”, not “When”, in the final sentence.

  4. noah Says:

    Actually, the meaning I’m after is probably “In which cases” — sometimes the tradeoff makes sense, but in other cases it doesn’t. I should probably rephrase this (or, perhaps, expand on the point a little).
