Making Computers Really Talk like People
Alan W Black (Language Technologies Institute, Carnegie Mellon University)


We now have speech synthesizers that sound (almost) as good as prerecorded human speech. The goal of speech synthesis has always been to generate natural-sounding, understandable speech, and we have come a long way from robotic-sounding voice synthesis. This talk will give a brief history of techniques used for synthesis and discuss current commercial and research techniques, highlighting their strengths and weaknesses in providing computer applications with appropriate speech synthesis. But the talk will also go beyond current work and discuss new challenges in speech synthesis: how can we make speech synthesis model conversational speech, modify style and accent, and ultimately generate non-lexical aspects of speech, including back-channeling and laughter?


Alan W Black is an Associate Professor in the Language Technologies Institute at Carnegie Mellon University. He previously worked in the Centre for Speech Technology Research at the University of Edinburgh, and before that at ATR in Japan. He is one of the principal authors of the free software Festival Speech Synthesis System, the FestVox voice-building tools, and CMU Flite, a small-footprint speech synthesis engine. He received his PhD in Computational Linguistics from the University of Edinburgh in 1993, his MSc in Knowledge Based Systems, also from Edinburgh, in 1986, and a BSc (Hons) in Computer Science from Coventry University in 1984.

Although much of his core research focuses on speech synthesis, he also works on real-time hands-free speech-to-speech translation systems (Croatian, Arabic, and Thai), spoken dialog systems, and rapid language adaptation for supporting new languages. He was an elected member of the IEEE Speech Technical Committee (2003-2007) and currently serves on the board of ISCA. He was program chair of the ISCA Speech Synthesis Workshop 2004 and general co-chair of Interspeech 2006 -- ICSLP. In 2004, with Prof Keiichi Tokuda, he initiated the now annual Blizzard Challenge, the largest multi-site evaluation of corpus-based speech synthesis techniques.