Towards Spontaneous Dialogue Speech Understanding

Hiroaki SAITO

Department of Mathematics, Keio University

3-14-1 Hiyoshi Kouhoku-ku Yokohama 223 Japan

e-mail: hxs@nak.math.keio.ac.jp

An error-recoverable LR parser has been developed which can handle various errors and noise phenomena in spontaneous speech. The context-free grammar for the parser is phoneme-based, so that phonemic and word information are handled uniformly. The parser corrects minor errors such as phoneme insertion, deletion and substitution in the noisy phoneme sequence of an utterance by looking ahead in the action part of the generalized LR table. For a lengthy noisy portion, the parser instead assigns a dummy nonterminal symbol by looking ahead in the goto table. Specifically, at a state i which is expecting shift action(s), the parser also consults the goto table. If an entry m exists in the row of state i under the column labeled with nonterminal D, the parser shifts D onto the stack and goes to state m. Nonterminal D in this case is a dummy. This technique, named "gap-filling", makes the parser more robust to noise in the following two respects.

(1) A dummy nonterminal fills a large missing constituent of the input; without the gap-filling function, such input would yield no hypotheses.

(2) The gap-filling function enables an LR parser to perform a reduce action only when the action creates a nonterminal with a definitely high score.

The portion covered by a dummy nonterminal is likely to be either an insertion or an unknown word. Semantic analysis is also performed along with the syntactic parsing. Taking advantage of a two-way speech system, the parser interacts with the speaker about unidentified portions instead of conjecturing them. The re-utterance of an unidentified portion is parsed efficiently by using the parse record of the first utterance. When the unidentified portion is found to be an unknown word after interaction, the word is incorporated into the system incrementally. The parser has been tested using a grammar which contains a few hundred words.

Two other related pieces of research have been carried out this year. One is the start of an implementation of a speech recognition module for spontaneous speech using a Hidden Markov Model toolkit called 'HTK'. Currently a phoneme recognition model is being built using Tohoku Univ-Matsushita's spoken word database and JIPDEC's spoken sentence database. The other is a parser generator, 'NLyacc', which we have developed for natural language processing. NLyacc, unlike Yacc, can handle arbitrary context-free grammars using the generalized LR parsing algorithm. Although NLyacc is not currently equipped with the error-correction mechanism mentioned above, we have begun to distribute NLyacc as free software. (Contact nlyacc@nak.math.keio.ac.jp for distribution.)
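As an illustration of the gap-filling lookup described above, the following is a minimal sketch and not the parser's actual implementation. The toy ACTION and GOTO tables, the state numbers, the symbol names, the stack representation and the function names are all assumptions made for this example. At a state that expects at least one shift, the sketch reads the goto row of that state and, for every nonterminal D with an entry m, produces a new stack with a dummy D pushed and m as the current state.

```python
from typing import Dict, List, Optional, Tuple

Stack = List[Tuple[Optional[str], int]]  # (symbol, state) pairs, bottom first

# Hypothetical toy tables; every state, symbol and entry is illustrative only.
ACTION: Dict[int, Dict[str, List[Tuple[str, int]]]] = {
    0: {"k": [("shift", 3)]},    # state 0 expects a shift
    1: {"$": [("accept", 0)]},
    2: {"a": [("shift", 4)]},    # state 2 expects a shift, but has no goto row
    3: {"o": [("reduce", 1)]},   # state 3 expects only a reduce
}
GOTO: Dict[int, Dict[str, int]] = {
    0: {"S": 1, "NP": 2},
}


def expects_shift(state: int, action: Dict[int, Dict[str, List[Tuple[str, int]]]]) -> bool:
    """True if any action entry in the row of `state` is a shift."""
    row = action.get(state, {})
    return any(kind == "shift" for entries in row.values() for kind, _ in entries)


def gap_fill(stack: Stack,
             action: Dict[int, Dict[str, List[Tuple[str, int]]]],
             goto: Dict[int, Dict[str, int]]) -> List[Stack]:
    """Return one new stack per dummy nonterminal that can be shifted.

    If the top state i expects a shift and goto[i][D] == m exists,
    push a dummy D and move to state m. Several D may qualify, so a
    list of alternative stacks (hypotheses) is returned.
    """
    _, state = stack[-1]
    if not expects_shift(state, action):
        return []
    return [stack + [("dummy:" + nonterminal, m)]
            for nonterminal, m in sorted(goto.get(state, {}).items())]


if __name__ == "__main__":
    for hypothesis in gap_fill([(None, 0)], ACTION, GOTO):
        print(hypothesis)
```

Returning a list of stacks rather than a single one reflects the generalized LR setting, in which several hypotheses may be pursued in parallel; how the dummy hypotheses are scored against ordinary shifts is left out of this sketch.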

Keywords: speech understanding, spontaneous speech, parsing, interactive system