Language Analysis for Dialogue Understanding
Yuji MATSUMOTO and Osamu IMAICHI
Graduate School of Information Science
Nara Institute of Science and Technology
8916-5 Takayama, Ikoma, Nara 630-01, Japan
e-mail: matsu@is.aist-nara.ac.jp
Dialogue may include a number of grammatically incorrect but
semantically understandable sentences, which dialogue systems must
therefore be able to handle. Speakers often omit words, change the
order of phrases, or make careless errors such as agreement
inconsistencies, misspellings, or extra words.
Dialogue systems have to handle not only grammatically
well-formed inputs but also grammatically ill-formed ones.
That is, a robust parsing system should not simply reject
ill-formed inputs but process them anyway.
Many systems use context-free grammars (CFGs) to describe the
well-formedness of input sentences because of their simplicity
and tractability; another reason to use CFGs is that efficient
parsing algorithms exist for them. However, some linguistic
phenomena, such as word order variation in free word order
languages, are hard to describe in full detail with CFGs. To cope
with this problem, recent grammar formalisms describe grammars
with various kinds of constraints instead of a single kind of rule
as in a CFG. In particular, in unification-based grammar
formalisms, grammar rules are described using the notion of
unification.
Moreover, since utterances usually do not contain sufficient
information for understanding the speaker's intention, they have
to be interpreted within their context.
We have developed a new method for dealing with grammatically
ill-formed inputs. We regard grammatical ill-formedness as a
violation of the constraints on grammar rules, that is, as the
failure of some unification operation. In the ordinary definition
of unification, the result of a unification operation is either
success or failure: if the feature structures to be unified include
inconsistent information, the unification operation simply fails.
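To make the all-or-nothing behaviour concrete, the following is a minimal sketch of ordinary feature-structure unification, assuming feature structures are modelled as nested Python dicts with atomic string values. Structure sharing (reentrancy), which real unification-based grammars rely on, is deliberately omitted; the function name `unify` is hypothetical.

```python
from typing import Any, Optional

# Assumed representation: a feature structure is a nested dict whose leaves
# are atomic values (strings). Reentrancy is not modelled in this sketch.
FS = dict[str, Any]


def unify(a: Any, b: Any) -> Optional[Any]:
    """Ordinary unification: return the merged structure, or None on any clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result: FS = dict(a)  # shallow copy; nested merges build new dicts
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is None:
                    return None  # one clash anywhere makes the whole operation fail
                result[feature] = merged
            else:
                result[feature] = b_value
        return result
    # Atomic values (or an atom against a complex structure) must be identical.
    return a if a == b else None
```

For example, `unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}})` returns `None`: the caller learns only that unification failed, not where, why, or what the result would have been otherwise.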
The recovery process is invoked only when the normal parsing
process fails to find a complete parse.
In order to perform a recovery process efficiently, it is necessary
(1) not to search repeatedly for a result that has already failed,
(2) to know what can be obtained if the failure is recovered, and
(3) to know the degree of the failure.
Since ordinary unification simply ends in failure, conditions (1)
to (3) above cannot be attained with the ordinary unification
operation; the results of failures must be kept. To attain
conditions (1) through (3), we extend the unification operation to
handle feature structures that include inconsistency. We call this
extension cost-based unification. It always succeeds, even if the
two feature structures include inconsistent information: when an
inconsistency is detected between feature structures, a cost is
assigned to the resulting feature structure according to the
degree of inconsistency. We also introduce the notion of reward,
which reflects the goodness of the result.
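A minimal sketch of the idea follows, using the same nested-dict representation as ordinary unification; it is an illustration, not the authors' implementation. The `Conflict` record, the unit `clash_cost` per atomic clash, and the reward of 1 per consistently unified shared atomic feature are all hypothetical choices, since the cost and reward functions are not specified here. Keeping both clashing values is what lets a recovery step see what would be obtained if the failure were repaired (condition 2), while the accumulated cost measures the degree of failure (condition 3).

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Conflict:
    """Hypothetical record that keeps both clashing values instead of failing."""
    left: Any
    right: Any


def cost_unify(a: Any, b: Any, clash_cost: float = 1.0) -> tuple[Any, float, float]:
    """Cost-based unification sketch: always returns (structure, cost, reward).

    Assumptions (not from the source): each atomic clash costs `clash_cost`,
    and each shared atomic feature that unifies consistently earns reward 1.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result: dict[str, Any] = dict(a)
        cost = reward = 0.0
        for feature, b_value in b.items():
            if feature in result:
                merged, c, r = cost_unify(result[feature], b_value, clash_cost)
                result[feature] = merged
                cost += c
                reward += r
            else:
                result[feature] = b_value  # unshared features are simply added
        return result, cost, reward
    if a == b:
        return a, 0.0, 1.0  # consistent atomic values
    # Inconsistency: never fail, keep both values and charge a cost.
    return Conflict(a, b), clash_cost, 0.0
```

Unifying `{"agr": {"num": "sg", "per": "3"}}` with `{"agr": {"num": "pl", "per": "3"}}` yields a structure whose `num` value is `Conflict("sg", "pl")`, with cost 1.0 and reward 1.0, so the partial result survives for later recovery rather than being discarded.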
Our approach consists of three methods for parsing grammatically
ill-formed inputs, based on syntactic, semantic, and contextual
information. The first method handles local ill-formedness such
as constraint violations. The second and third methods handle
non-local ill-formedness such as word order violations, incomplete
sentential fragments, and ellipses.
The first two methods try to find a phrase that covers the whole
input. The third method then receives their result and finds the
appropriate interpretation of the input using contextual
information. Since all three methods operate on a trade-off
between cost and reward, they can be integrated into a uniform
framework.
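The control flow described above (recovery only after normal parsing fails, with every candidate compared on one cost/reward scale) can be sketched as follows. This is a hypothetical illustration: the names `Candidate`, `net_score`, and `robust_parse` are invented here, the net score `cost - reward` is an assumed trade-off (the actual formula is not given), and the sketch flattens the pipeline, so the third, context-based step is not modelled separately.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    """Hypothetical analysis produced by one of the recovery methods."""
    description: str
    cost: float    # accumulated inconsistency, e.g. from cost-based unification
    reward: float  # accumulated goodness of the result


def net_score(c: Candidate) -> float:
    # Assumed trade-off: lower is better. The real weighting is not specified.
    return c.cost - c.reward


def robust_parse(
    normal_parse: Callable[[list[str]], Optional[Candidate]],
    recovery_methods: list[Callable[[list[str]], list[Candidate]]],
    words: list[str],
) -> Optional[Candidate]:
    """Run normal parsing first; invoke recovery only if no complete parse is found."""
    parse = normal_parse(words)
    if parse is not None:
        return parse
    candidates = [c for method in recovery_methods for c in method(words)]
    if not candidates:
        return None
    # Assumption: all methods score candidates on the same cost/reward scale,
    # so candidates from different methods can be compared directly.
    return min(candidates, key=net_score)
```

Because every candidate carries a cost and a reward regardless of which method produced it, adding a further recovery method in this sketch only means appending another function to `recovery_methods`.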
The contextual processing performed by the third method is based
on Relevance Theory. The most relevant interpretation is obtained
by looking for contextual information of low accessibility that
yields the maximal information. We have implemented a prototype
parser for grammatically ill-formed inputs using an HPSG-style
grammar formalism for Japanese.
Keywords: robust parsing, cost-based unification, relevance theory