Thesaurus Construction and Analysis Method for Dialogue Understanding

-- Robust Natural Language Processing --

Yuji Matsumoto

Graduate School of Information Science, Nara Institute of Science and Technology

8916-5 Takayama, Ikoma, Nara 630-01, Japan

e-mail: matsu@is.aist-nara.ac.jp

A robust processing technique is indispensable for the analysis of spontaneous speech, and such a technique must be far more flexible than those based on strict grammars and dictionaries. For flexible natural language parsing, we augmented our logic-based concurrent parsing system SAX, in which each grammatical category is implemented as an autonomous process that communicates with other processes. The augmentation is done by defining "meta" processes that monitor the partial parsing results passed between grammatical processes. This augmentation makes it possible to clearly separate the grammar from auxiliary parts such as ill-formedness processing. We then surveyed robust natural language processing techniques for coping with ill-formed inputs and studied the pros and cons of some representative techniques. Most of these techniques can be regarded, more or less, as relaxation methods, in that some conditions described in the grammar and the dictionary are relaxed when an analysis fails. We presented an improved technique that posits a small number of principles for relaxing grammatical conditions, rather than focusing on a predefined set of failure situations. We also presented a method for implementing the idea in the concurrent parsing system we have developed for general-purpose natural language processing. In our implementation we were able to overcome the drawbacks found in these relaxation techniques. The advantages of our method are as follows: 1) The concurrent processing model makes it possible to compare multiple candidates for the cause of a parsing failure. 2) Partial analysis results can be reused, since no backtracking takes place. 3) Principle-based meta rules for ill-formedness dispense with the large set of meta rules otherwise needed to describe each possible cause of parse failure.
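
The general relaxation idea can be illustrated with a minimal Python sketch. It is not SAX and does not model the concurrent meta processes described above; the toy lexicon, the two constraints, and the function names (LEXICON, category_order, number_agreement, analyse) are all hypothetical and chosen only for illustration. The sketch applies one general principle, "relax as few constraints as possible", instead of one hand-written rule per failure situation, and it returns every minimal candidate so that alternative explanations of a failure can be compared.

```python
from itertools import combinations

# Hypothetical toy lexicon: word -> (category, number). Not taken from the paper.
LEXICON = {
    "the": ("Det", None),
    "dog": ("N", "sg"),
    "dogs": ("N", "pl"),
    "barks": ("V", "sg"),
    "bark": ("V", "pl"),
}


def category_order(tokens):
    """Strict condition: the input must follow the flat pattern Det N V."""
    return [LEXICON.get(t, (None, None))[0] for t in tokens] == ["Det", "N", "V"]


def number_agreement(tokens):
    """Strict condition: all words that carry number must agree."""
    numbers = {LEXICON.get(t, (None, None))[1] for t in tokens}
    numbers.discard(None)
    return len(numbers) <= 1


# Hypothetical set of grammatical conditions that may be relaxed.
CONSTRAINTS = {
    "category_order": category_order,
    "number_agreement": number_agreement,
}


def analyse(tokens):
    """Return all smallest sets of constraints whose relaxation lets the input pass.

    An empty set means the input is well-formed under the strict conditions.
    """
    names = list(CONSTRAINTS)
    for k in range(len(names) + 1):
        candidates = []
        for relaxed in combinations(names, k):
            active = [n for n in names if n not in relaxed]
            if all(CONSTRAINTS[n](tokens) for n in active):
                candidates.append(set(relaxed))
        if candidates:
            # Stop at the first level that succeeds: minimal relaxation.
            return candidates
    return []


if __name__ == "__main__":
    for sentence in (["the", "dog", "barks"], ["the", "dogs", "barks"]):
        print(sentence, "->", analyse(sentence))
```

In this sketch a well-formed input yields a single empty relaxation set, while an input with an agreement error yields the set containing only the agreement constraint. The exhaustive search over subsets is a stand-in for the comparison of failure candidates and would not scale to a realistic grammar.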

Keywords: robust natural language processing, ungrammaticality, ill-formedness, concurrent parsing