Common Sense Knowledge Base and Outer World Interface
-- Application to Speech Understanding --
Masao YOKOTA
Department of Communication and Computer Engineering,
Fukuoka Institute of Technology
3-30-1 Wajiro-higashi, Higashi-ku, Fukuoka 811-02, JAPAN
e-mail: yokota@cs.fit.ac.jp
The ideal goal of machine speech recognition is to reconstruct
utterances word for word. In practice, however, this is very
difficult or nearly impossible, because human speech recognition is an
active process that uses many different sources of knowledge, some of
which are deeply embedded in the linguistic competence of the speaker
and the listener. For example, we can presumably discern between
"play-4" and "pray for" from the speech signal simply using
acoustic-phonetic knowledge. However, in order to filter out
improbable word sequences, we must use such higher-level knowledge
sources as syntax, semantics, discourse context, and world knowledge. We
intend to construct a speech understanding system, IMAGES-S, that can
infer the conceptual information that the speaker intends to convey. The
processing for this purpose no longer belongs to wave signal
processing but to natural language understanding, especially to
conceptual processing with background knowledge such as common sense
and world-specific knowledge. Moreover, understanding
incompletely perceived speech is nearly equivalent to estimating the
concepts of words omitted from texts. The system IMAGES-S consists
of three modules: 1) speech recognition (SRM), 2) language
understanding (LUM), and 3) task realization (TRM). SRM transforms
acoustic signal waves into word-lattices. LUM analyzes these
syntactically and semantically and generates meaning
representations, employing background knowledge. Finally, TRM realizes
the task required by the speaker. The modules LUM and TRM are almost
the same as IMAGES-II; that is, they are almost complete. The prototype of
SRM will be selected from among ready-made programs employing hidden
Markov models (HMMs).
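
As an illustration only, the flow from SRM through LUM to TRM described above might be sketched in Python as follows. Every name here (Edge, srm_stub, lum, trm, PLAUSIBLE_PAIRS), the scores, and the lattice itself are hypothetical and are not taken from IMAGES-S. In particular, the table of plausible word pairs is a toy stand-in for the syntactic, semantic, and world knowledge that LUM employs, and srm_stub returns a fixed lattice rather than decoding audio.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Edge:
    """One word hypothesis in a word lattice (hypothetical layout)."""
    start: int        # lattice node where the word begins
    end: int          # lattice node where the word ends (must be > start)
    word: str
    acoustic: float   # made-up acoustic score in (0, 1]


# Toy stand-in for higher-level knowledge: word pairs treated as plausible.
PLAUSIBLE_PAIRS = {("pray", "for"), ("for", "rain"), ("play", "four")}


def srm_stub() -> List[Edge]:
    """Stand-in for SRM: returns a fixed lattice instead of decoding audio."""
    return [
        Edge(0, 1, "play", 0.5), Edge(0, 1, "pray", 0.5),
        Edge(1, 2, "four", 0.5), Edge(1, 2, "for", 0.5),
        Edge(2, 3, "rain", 1.0),
    ]


def paths(edges: List[Edge], node: int, goal: int) -> List[Tuple[List[str], float]]:
    """Enumerate all word sequences from node to goal with acoustic scores."""
    if node == goal:
        return [([], 1.0)]
    found = []
    for e in edges:
        if e.start == node:
            for words, score in paths(edges, e.end, goal):
                found.append(([e.word] + words, e.acoustic * score))
    return found


def plausibility(words: List[str]) -> float:
    """Penalize each adjacent word pair that is not in the toy table."""
    score = 1.0
    for pair in zip(words, words[1:]):
        score *= 1.0 if pair in PLAUSIBLE_PAIRS else 0.1
    return score


def lum(edges: List[Edge]) -> List[str]:
    """Stand-in for LUM: pick the sequence with the best combined score."""
    goal = max(e.end for e in edges)
    candidates = paths(edges, 0, goal)
    best_words, _ = max(candidates, key=lambda c: c[1] * plausibility(c[0]))
    return best_words


def trm(words: List[str]) -> str:
    """Stand-in for TRM: here it only reports the understood request."""
    return "request: " + " ".join(words)


if __name__ == "__main__":
    print(trm(lum(srm_stub())))
```

In this toy lattice all four word sequences have the same acoustic score, so the choice is made entirely by the higher-level table, which mirrors the point that acoustic-phonetic knowledge alone does not filter out improbable word sequences.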