Speech Synthesis Method for Dialogue and Psychological Assessment of Synthetic Speech

Hideki Kasuya, Wen Ding, Jin-lin Lu, Naoki Koshikawa and Yoshihisa Watanabe

Department of Electrical and Electronic Engineering, Utsunomiya University

2753 Ishii-machi, Utsunomiya 321, Japan

e-mail: kasuya@utsunomiya-u.ac.jp

To develop a speech synthesis method for a dialogue system and to derive appropriate methods for assessing synthetic speech in a dialogue environment, three major efforts have been made this year: (1) exploring a novel method to extract the parameters of a voice source and an ARMA model, (2) developing a speaking rate conversion system, and (3) building an integrated speech analysis software system. Simulated annealing was used successfully to estimate the voice source parameters of a Rosenberg-Klatt model, in combination with a nonstationary Kalman filter model that estimates the ARMA parameters. This method was applied to the analysis of some of the VCV (vowel-consonant-vowel) sound segments uttered in the sentence context [sorewa VCV desuka?] (Is it VCV?). The ARMA parameters and the voice source parameters analyzed from each of the VCV segments were concatenated to produce a meaningful sentence, /imagamigoronoume/ (Plum flowers are at their best.). The pitch contour was taken from a separate utterance of the same sentence. The synthetic speech proved to be natural sounding and quite intelligible, so the new method seems promising for developing a speech synthesizer for a dialogue system. The rhythm and tempo of speech seem to play an important role in inter-speaker communication. To investigate the psychological significance of rhythm and tempo in dialogue, an experimental apparatus is needed that can control the rhythm and tempo of dialogue speech in various ways. We have developed a semi-automatic speaking rate conversion system based on the repetition and elimination of single-period waveforms. An integrated speech analysis software system has also been built to measure and statistically analyze prosodic features in dialogue speech efficiently. This system is to be used next year to gain acoustic-phonetic knowledge of dialogue speech.
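
The speaking rate conversion mentioned above, based on repeating and eliminating single-period waveforms, can be illustrated with a minimal sketch. This is not the authors' system: the function name `convert_rate`, the `pitch_marks` input (sample indices of period boundaries, assumed to be supplied or hand-corrected, in keeping with a semi-automatic procedure), and the uniform rule for choosing which periods to repeat or drop are all assumptions made for illustration. A practical system would presumably restrict the operation to voiced segments and smooth the joins, which this sketch does not attempt.

```python
from typing import List, Sequence


def convert_rate(signal: Sequence[float],
                 pitch_marks: Sequence[int],
                 rate: float) -> List[float]:
    """Change speaking rate by repeating or eliminating whole pitch periods.

    rate > 1 speeds the speech up (some periods are eliminated);
    rate < 1 slows it down (some periods are repeated).
    pitch_marks are ascending sample indices of period boundaries
    (hypothetical input; the source does not describe its format).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if len(pitch_marks) < 2:
        raise ValueError("need at least two pitch marks")

    # Cut the marked region into single-period waveforms.
    periods = [list(signal[a:b])
               for a, b in zip(pitch_marks[:-1], pitch_marks[1:])]

    # Samples before the first mark are copied unchanged.
    out = list(signal[:pitch_marks[0]])

    # Number of periods in the output after rate conversion.
    n_out = round(len(periods) / rate)

    for k in range(n_out):
        # Assumed rule: map output period k uniformly onto a source period.
        # Indices repeat when rate < 1 and skip when rate > 1.
        src = min(int(k * rate), len(periods) - 1)
        out.extend(periods[src])

    # Samples after the last mark are copied unchanged.
    out.extend(signal[pitch_marks[-1]:])
    return out


# Toy example: three "periods" of four samples each.
toy_signal = [float(i) for i in range(12)]
toy_marks = [0, 4, 8, 12]
slowed = convert_rate(toy_signal, toy_marks, 0.5)  # each period repeated
sped_up = convert_rate(toy_signal, toy_marks, 1.5)  # last period eliminated
```

Because whole periods are copied or dropped, the pitch of each period is left unchanged and only the duration is altered, which is what makes this kind of manipulation suitable for studying tempo independently of intonation.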