Thesaurus Construction and Analysis Method for Dialogue Understanding
--Thesaurus Construction for Dialogue Understanding--
Hiroaki TSURUMARU, Hideyuki MAEDA, and Akihiro KAWASHIMA
Department of Electrical Engineering and Computer Science,
1-14 Bunkyou-machi, Nagasaki 852, JAPAN
Many ellipses and demonstrative pronouns occur in dialogue.
Generally speaking, omitted words (or phrases) and pronominal
references are resolved by using common sense and discourse
information. A serious problem for dialogue understanding is
that the definition of common sense is not clear.
A thesaurus consisting of semantic (hierarchical) relations between
words, such as the upper/lower relation and the part/whole relation,
can be regarded as an approximate model of common sense.
There are some thesauri such as ``Bunrui-Goi-Hyo (Word List by Semantic
Principles)'' and ``Roget's Thesaurus''. However, they are not
always sufficient for natural language processing, because they
are designed mainly for human use.
This study aims to clarify a method for constructing a thesaurus based on hierarchical relations between the concepts of words, such as the upper/lower relation and the part/whole relation, and to address the problems of applying the thesaurus to dialogue understanding. Here we regard each sense of a word as one concept. How, and from what source, to obtain these hierarchical relations is one of the most important problems in constructing the thesaurus.
We have been studying how to acquire these hierarchical relations from the definition sentences in an on-line Japanese dictionary, and developing a software system for computer-aided thesaurus construction. This year's work consists mainly of the following. First, we review the algorithm for extracting the hierarchical relations. Second, we discuss the evaluation of the trial thesaurus that was built experimentally from the results of this work. Third, we discuss the application of the thesaurus to inferring elided words in dialogue. We describe these three topics more specifically below.
(1) Concerning the extraction algorithm, we have reviewed it from a theoretical viewpoint. The basic idea of extracting the hierarchical relations is as follows. A definition sentence generally contains one or more core words expressing the central meaning of the word sense, which we call the definition word(s). We extract the definition word(s) and the relational information, and then decide the semantic relation between the entry word and the definition word. The semantic relations include not only the upper/lower relation and the part/whole relation but also the synonymous relation and the element/set relation. We regard the latter two as hierarchical relations in a wide sense as well.
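The extraction idea in (1) can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes English definition sentences and a few hand-written cue patterns, whereas the study works on Japanese dictionary definitions. The names `CUE_PATTERNS`, `Relation`, and `extract_relation` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    entry: str            # the dictionary entry word
    relation: str         # relation type decided from the relational information
    definition_word: str  # core word extracted from the definition sentence


# Assumed cue phrases (illustrative only): each pattern captures the word
# that follows the cue as the definition word and maps it to a relation type.
_ART = r"(?:(?:a|an|the)\s+)?"
CUE_PATTERNS = [
    (re.compile(rf"^{_ART}(?:kind|type|sort) of {_ART}(\w+)"), "upper/lower"),
    (re.compile(rf"^{_ART}part of {_ART}(\w+)"), "part/whole"),
    (re.compile(rf"^{_ART}(?:member|element) of {_ART}(\w+)"), "element/set"),
    (re.compile(rf"^same as {_ART}(\w+)"), "synonym"),
]


def extract_relation(entry: str, definition: str) -> Optional[Relation]:
    """Return the relation between the entry word and its definition word,
    or None when no cue pattern matches the definition sentence."""
    text = definition.strip().lower()
    for pattern, relation in CUE_PATTERNS:
        match = pattern.match(text)
        if match:
            return Relation(entry, relation, match.group(1))
    return None


if __name__ == "__main__":
    for entry, definition in [
        ("sparrow", "A kind of bird."),
        ("wheel", "Part of a bicycle."),
        ("couch", "Same as sofa."),
    ]:
        print(extract_relation(entry, definition))
```

In this sketch, the captured word stands in for the definition word and the matched cue phrase stands in for the relational information. A real system would need morphological analysis of the definition sentence rather than surface patterns.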
(2) Concerning the evaluation of the trial thesaurus, we have investigated the following:
(3) Concerning the application of the thesaurus, we have studied an algorithm for inferring omitted words or phrases in dialogue using the thesaurus and the IPAL basic verbs dictionary, in order to verify the validity of the trial thesaurus. The outline of the algorithm is as follows:
This algorithm can also be used to handle pronominal and anaphoric reference. We have collected the dialogue data for the experiments from the texts of NHK Sequel Basic English (1991).
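The following sketch shows one way a thesaurus and verb case information might be combined to infer an omitted word, as described in (3). It is an illustration under stated assumptions, not the algorithm of this study. The toy `UPPER` table stands in for the thesaurus, `CASE_FRAMES` is a hypothetical stand-in for verb dictionary entries (it does not follow the IPAL format), and the strategy of choosing the most recently mentioned compatible noun is an assumption.

```python
from typing import Dict, List, Optional

# Toy upper/lower relations (assumed data): concept -> its upper concept.
UPPER: Dict[str, str] = {
    "apple": "fruit",
    "fruit": "food",
    "letter": "document",
    "teacher": "person",
}

# Hypothetical case frames: verb -> case slot -> concept required for the filler.
CASE_FRAMES: Dict[str, Dict[str, str]] = {
    "eat": {"agent": "person", "object": "food"},
    "read": {"agent": "person", "object": "document"},
}


def is_a(concept: str, ancestor: str) -> bool:
    """Return True if `ancestor` is `concept` itself or one of its upper concepts."""
    seen = set()
    current: Optional[str] = concept
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)  # guard against accidental cycles in the table
        current = UPPER.get(current)
    return False


def infer_omitted(verb: str, slot: str, context: List[str]) -> Optional[str]:
    """Pick the most recently mentioned noun that satisfies the semantic
    restriction of the omitted case slot, or None if none qualifies."""
    required = CASE_FRAMES[verb][slot]
    for noun in reversed(context):  # most recent mention first
        if is_a(noun, required):
            return noun
    return None


if __name__ == "__main__":
    context = ["teacher", "letter", "apple"]  # nouns in order of mention
    print(infer_omitted("read", "object", context))
    print(infer_omitted("eat", "agent", context))
```

The same lookup could in principle serve pronominal reference by treating the pronoun as an unfilled slot, though that extension is only suggested here, not implemented.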
Keywords: thesaurus, semantic dictionary, word knowledge, conceptual hierarchy