Common Sense Knowledge Base and Outer World Interface
-- Application to Speech Understanding --

Masao YOKOTA

Department of Communication and Computer Engineering,
Fukuoka Institute of Technology
3-30-1 Wajiro-higashi, Higashi-ku, Fukuoka 811-02, JAPAN
e-mail: yokota@cs.fit.ac.jp

The ideal goal of machine speech recognition is to reconstruct utterances exactly as spoken. In practice, however, this is extremely difficult, if not impossible, because human speech recognition is an active process that draws on many different sources of knowledge, some of which are deeply embedded in the linguistic competence of the speaker and the listener. For example, acoustic-phonetic knowledge alone may suffice to discriminate between "play four" and "pray for" in the speech signal, but filtering out improbable word sequences requires higher-level knowledge sources such as syntax, semantics, discourse context, and knowledge of the world being talked about. We intend to construct a speech understanding system, IMAGES-S, that infers the conceptual information the speaker intends to convey. Processing for this purpose no longer belongs to waveform signal processing but to natural language understanding, and in particular to conceptual processing with background knowledge such as commonsense and world-specific knowledge. Moreover, understanding incompletely perceived speech is nearly equivalent to estimating the concepts of words omitted from a text. IMAGES-S consists of three modules: 1) speech recognition (SRM), 2) language understanding (LUM), and 3) task realization (TRM). SRM transforms acoustic signal waveforms into word lattices. LUM analyzes these lattices syntactically and semantically, drawing on background knowledge, and generates meaning representations. Finally, TRM carries out the task requested by the speaker. LUM and TRM are essentially the same as the corresponding parts of IMAGES-II and are therefore nearly complete. A prototype SRM will be selected from existing HMM-based recognition programs.
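The abstract describes the SRM, LUM and TRM pipeline only at the module level. The Python sketch below illustrates the intended data flow, in which background knowledge picks the plausible word sequence out of an acoustically ambiguous lattice. It is not the IMAGES-S or IMAGES-II implementation, and it rests on several assumptions. The lattice is simplified to a confusion network with made-up acoustic log-scores. The "background knowledge" is a toy, hypothetical table (`KNOWLEDGE`) of word pairs judged plausible in a given context, which stands in for the conceptual meaning representations the paper actually uses. The functions `srm`, `lum` and `trm` are illustrative stand-ins, and combining the acoustic and knowledge scores by simple addition is our own choice.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Arc:
    word: str
    acoustic_logprob: float  # assumed score, as an HMM-based SRM might emit


# Toy confusion-network form of a word lattice: one list of rival words per slot.
Lattice = list[list[Arc]]

# Hypothetical background knowledge: word pairs judged plausible in a context.
KNOWLEDGE = {
    "church": {("pray", "for"), ("for", "peace")},
    "cards": {("play", "four"), ("four", "aces")},
}


def srm(_signal: object) -> Lattice:
    """Stand-in for SRM: returns a fixed lattice instead of decoding audio."""
    return [
        [Arc("play", -1.0), Arc("pray", -1.1)],
        [Arc("four", -0.9), Arc("for", -1.0)],
        [Arc("peace", -1.2), Arc("aces", -1.3)],
    ]


def knowledge_score(words: tuple[str, ...], context: str) -> float:
    """Penalize each adjacent word pair the knowledge base does not support."""
    plausible = KNOWLEDGE.get(context, set())
    return sum(0.0 if pair in plausible else -2.0 for pair in zip(words, words[1:]))


def lum(lattice: Lattice, context: str) -> tuple[str, ...]:
    """Stand-in for LUM: choose the lattice path with the best combined score."""
    best_words: tuple[str, ...] = ()
    best_score = -math.inf
    for path in product(*lattice):
        words = tuple(arc.word for arc in path)
        score = sum(arc.acoustic_logprob for arc in path) + knowledge_score(words, context)
        if score > best_score:
            best_words, best_score = words, score
    return best_words


def trm(words: tuple[str, ...]) -> str:
    """Stand-in for TRM: here it only reports the understood word sequence."""
    return " ".join(words)


if __name__ == "__main__":
    lattice = srm(None)
    print(trm(lum(lattice, "church")))  # pray for peace
    print(trm(lum(lattice, "cards")))   # play four aces
```

Acoustically, "play four" scores slightly better than "pray for" in this toy lattice. The context-dependent knowledge penalty is what reverses the choice in the "church" context. This mirrors, in a very reduced form, the abstract's point that higher-level knowledge rather than the signal alone filters out improbable word sequences.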