Model of Dialog
-- Prediction of the Next Topic Based on Utterance Motivation --

Riichiro MIZOGUCHI and Yoichi YAMASHITA

The Institute of Scientific and Industrial Research, Osaka University
8-1, Mihogaoka, Ibaraki, Osaka, 567 JAPAN
e-mail: miz@ei.sanken.osaka-u.ac.jp

Prediction of a user's next utterance is an important technique for understanding spoken dialogs with medium- or large-size vocabularies. Various kinds of dialog knowledge, such as utterance pairs and topics, are indispensable for predicting utterances. Last year we proposed a basic mechanism for utterance prediction based on a topic transition model named TPN (topic packet network), and the topic information proved very useful for predicting utterances. TPN is a static model that represents topic transitions as a kind of network: topic packets (TPs) bundle several topics and are linked to each other a priori. However, the topic of the utterances changes dynamically as a dialog goes on, and it is not easy to describe all possible topic transitions in a static network, especially for dialog tasks with large vocabularies. A dynamic mechanism is better suited to modeling topic transitions. This year, we have studied a new mechanism that enumerates possible topics of the next utterance by introducing a model of utterance motivation.
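To make the contrast with the dynamic approach concrete, the static TPN idea can be sketched as a fixed graph of topic packets. This is a minimal sketch under stated assumptions: the class `TopicPacketNetwork`, the packet names, and the rule "predict the topics of the current packet plus its linked packets" are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch of a static topic packet network (TPN).
# ASSUMPTION: packet names, topics, links, and the prediction rule are
# illustrative; they are not taken from the authors' implementation.
from typing import Dict, List, Set


class TopicPacketNetwork:
    def __init__(self) -> None:
        self.packets: Dict[str, Set[str]] = {}  # packet name -> bundled topics
        self.links: Dict[str, Set[str]] = {}    # packet name -> linked packets

    def add_packet(self, name: str, topics: List[str]) -> None:
        self.packets[name] = set(topics)
        self.links.setdefault(name, set())

    def link(self, src: str, dst: str) -> None:
        # Links are fixed in advance, which is what makes the model static.
        self.links[src].add(dst)

    def predict_topics(self, current: str) -> Set[str]:
        # Candidate topics: those in the current packet plus those in
        # every packet linked from it.
        candidates = set(self.packets[current])
        for nxt in self.links[current]:
            candidates |= self.packets[nxt]
        return candidates


# Usage with hypothetical trip-reservation packets.
tpn = TopicPacketNetwork()
tpn.add_packet("departure", ["date", "time"])
tpn.add_packet("fare", ["price", "discount"])
tpn.link("departure", "fare")
print(sorted(tpn.predict_topics("departure")))
```

Any transition that was not wired in with `link()` can never be predicted, which is the rigidity a dynamic, motivation-based model aims to avoid.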
In a goal-oriented dialog, we converse in order to exchange focused information, and goals are achieved step by step through repeated interactions. In such a dialog, a speaker usually has a definite motivation for making each utterance, and this motivation is closely connected with the meaning of the utterance. A model of utterance motivation is therefore expected to predict topics in a dialog flexibly.
We analyzed several dialogs on two tasks, route direction and trip reservation, in order to investigate utterance motivations. Since we assumed that the stimulus and the response in an utterance pair share the same topic, only the stimuli were analyzed. For example, 'to make a comparison because an attribute has multiple values' is a motivation. Motivations can be divided into two levels: communication and problem solving. Motivations of communication are triggered by the state of information exchange; the state 'an attribute has multiple values' belongs to this level. Motivations of problem solving concern how the derived information is used in the problem-solving process; 'to make a comparison' is an example at this level.
The motivations of communication are classified into 8 categories as follows.

a. A value is unknown
b. A value is ambiguous
c. An attribute has multiple values
d. An attribute or an action is unknown
e. Attributes or actions are exhausted
f. An object or an action sequence is ambiguous
g. To confirm something uncertain
h. An object or an action sequence is unknown


The dialog manager needs knowledge about the dialog domain to keep track of the information transferred in a dialog. It predicts the motivations of communication from the state of transmission of each piece of information. We introduced two kinds of information packets to organize the domain knowledge: an 'action sequence' represents a sequence of actions executed to achieve a goal, and an 'object' describes any other assembly of information.
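The idea of predicting communication motivations from the transmission state of each piece of information can be sketched as rules over the slots of an information packet. This is a minimal sketch under assumptions: the `Slot` and `InfoPacket` classes, the `confirmed` flag, and the slot-state rules are hypothetical, and only categories a, c, and g are covered.

```python
# Minimal sketch: information packets and communication-motivation detection.
# ASSUMPTION: the Slot/InfoPacket classes and the state-to-motivation rules
# are hypothetical; only categories a, c, and g are covered.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Slot:
    values: List[str] = field(default_factory=list)  # values transferred so far
    confirmed: bool = True  # False when the transferred value is uncertain


@dataclass
class InfoPacket:
    name: str
    kind: str  # "object" or "action_sequence"
    slots: Dict[str, Slot] = field(default_factory=dict)


def communication_motivations(packet: InfoPacket) -> Dict[str, str]:
    """Map each slot to the communication motivation suggested by its state."""
    result: Dict[str, str] = {}
    for name, slot in packet.slots.items():
        if not slot.values:
            result[name] = "a"  # a. A value is unknown
        elif len(slot.values) > 1:
            result[name] = "c"  # c. An attribute has multiple values
        elif not slot.confirmed:
            result[name] = "g"  # g. To confirm something uncertain
    return result


# Usage with a hypothetical 'object' packet.
train = InfoPacket(
    name="train",
    kind="object",
    slots={
        "departure_time": Slot(["9:00", "10:30"]),
        "destination": Slot(["Kyoto"]),
        "seat": Slot(),
        "platform": Slot(["3"], confirmed=False),
    },
)
print(communication_motivations(train))
```

A slot holding exactly one confirmed value triggers no motivation, since nothing about it remains to be exchanged under these assumed rules.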
The motivations of problem solving are classified into 10 categories as follows.

A. To compare
B. To select
C. To sort
D. To know/inform the reason
E. To know/inform the condition
F. To know/inform the related information
G. To correct
H. To satisfy the constraints
I. To know/inform the goal
J. To know/inform the completion


Motivations at this level are invoked by the user's purpose in the problem-solving process. User modeling is therefore a very important technique for predicting motivations of problem solving; this is left for future work.
The motivation of an utterance is modeled as a combination of the two levels of motivation described above. We investigated the frequency of combined motivations in 6 simulated dialogs, divided into two sets covering the route direction task and the trip reservation task, respectively. Not all combinations of motivations occurred, and the frequency distributions differed only slightly between the two tasks.
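Combining the two levels yields at most 8 × 10 = 80 labels such as 'cA', and the frequency investigation amounts to counting these labels over annotated stimuli. The sketch below assumes a hypothetical annotation format (pairs of category letters); the `toy` list is illustrative input only, not data from the study.

```python
# Minimal sketch: combined motivations and their frequencies.
# ASSUMPTION: the annotation format (pairs of category letters) is hypothetical,
# and the sample annotations below are toy input, not data from the study.
from collections import Counter
from typing import Iterable, Tuple

COMMUNICATION = set("abcdefgh")      # categories a-h
PROBLEM_SOLVING = set("ABCDEFGHIJ")  # categories A-J


def combine(comm: str, ps: str) -> str:
    """Join the two levels into a label such as 'cA'."""
    if comm not in COMMUNICATION or ps not in PROBLEM_SOLVING:
        raise ValueError(f"unknown motivation pair: {comm!r}, {ps!r}")
    return comm + ps


def combination_frequencies(annotations: Iterable[Tuple[str, str]]) -> Counter:
    """Count combined motivations over annotated stimulus utterances."""
    return Counter(combine(c, p) for c, p in annotations)


toy = [("c", "A"), ("a", "F"), ("c", "A"), ("g", "J")]
freq = combination_frequencies(toy)
print(freq.most_common())
print(len(COMMUNICATION) * len(PROBLEM_SOLVING), "possible combinations")
```

Because `Counter` returns 0 for unseen keys, combinations that never occur can be listed directly by checking which of the 80 labels have a zero count.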
A topic prediction mechanism is defined for each utterance motivation. For example, assume that a slot of an information packet contains multiple values and that 'to compare' (A) is predicted as the motivation of problem solving. From the state of communication, motivation 'c' (an attribute has multiple values) is predicted. The combination of 'c' and 'A' is therefore identified as the motivation of the next utterance. The topic prediction pattern for the motivation 'cA' enumerates the topics in the information packet subordinate to the slot that contains multiple values.
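The 'cA' pattern can be sketched as follows: find each slot holding multiple values and return the topics of the packet subordinate to it. This is a minimal sketch under assumptions: the `Packet` structure, its `subordinate` mapping, and the example slot names are hypothetical, and only the 'cA' pattern is implemented.

```python
# Minimal sketch of the 'cA' topic prediction pattern.
# ASSUMPTION: the Packet structure and slot names are hypothetical; only the
# 'cA' pattern is implemented.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Packet:
    name: str
    topics: List[str]  # slot names, treated as candidate topics
    values: Dict[str, List[str]] = field(default_factory=dict)
    subordinate: Dict[str, "Packet"] = field(default_factory=dict)  # slot -> packet it governs


def predict_topics(packet: Packet, comm: str, ps: str) -> List[str]:
    """Enumerate candidate topics of the next utterance for motivation comm+ps."""
    if comm + ps != "cA":
        raise NotImplementedError("only the 'cA' pattern is sketched here")
    topics: List[str] = []
    for slot in packet.topics:
        # 'c': this slot holds multiple values; 'A': the user will compare them,
        # so the topics of the subordinate packet become candidates.
        if len(packet.values.get(slot, [])) > 1 and slot in packet.subordinate:
            topics.extend(packet.subordinate[slot].topics)
    return topics


# Usage: two candidate routes are known, so their attributes are predicted.
route_detail = Packet("route_detail", ["travel_time", "fare", "transfers"])
trip = Packet(
    "trip",
    ["route", "date"],
    values={"route": ["route 1", "route 2"], "date": ["May 3"]},
    subordinate={"route": route_detail},
)
print(predict_topics(trip, "c", "A"))
```

Unlike the TPN sketch, nothing here is wired in advance: the candidate topics follow from the current state of the packet and the identified motivation.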
The earlier mechanism based on the TPN model could not adapt flexibly to dynamic changes in dialogs. A model of utterance motivation enables utterance prediction that adapts to the situation in a dialog. Evaluation of the proposed mechanism remains future work.

Keywords: utterance prediction, utterance motivation, topic transition model, spoken dialog recognition