Speech communication plays a key role in human intelligence. We study intelligent processing of the speech, audio, and music exchanged among human beings, aiming at automatic recognition, understanding, and interaction systems: (1) automatic transcription of meetings and lectures, (2) analysis of audio scenes and music signals containing multiple sound sources, and (3) humanoid robots that interact naturally by combining verbal and non-verbal information.


Automatic Speech Recognition and Rich Transcription

Automatic speech recognition (ASR) of lectures and meetings, combined with natural language processing (NLP) for segmenting transcripts and extracting their information structure, in order to realize intelligent transcription and captioning systems.
Large Vocabulary Continuous Speech Recognition Platform ...Julius site
Speech Recognition of Lectures and Meetings ...Diet project
Natural Language Processing for Rich Transcription
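As an illustration of the NLP segmentation step, the sketch below inserts sentence boundaries into an unpunctuated ASR transcript using inter-word pause durations. The function name, threshold, and pause-based rule are illustrative assumptions, not the group's actual method.

```python
# Illustrative sketch: segment an unpunctuated ASR word sequence into
# sentences at long inter-word pauses (a simple proxy for boundary detection).

def segment_transcript(words, pauses, pause_threshold=0.5):
    """Split a recognized word sequence into sentences at long pauses.

    words  -- list of recognized words
    pauses -- pause (seconds) following each word; same length as words
    """
    sentences, current = [], []
    for word, pause in zip(words, pauses):
        current.append(word)
        if pause >= pause_threshold:  # long silence -> sentence boundary
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

words = ["thank", "you", "next", "we", "discuss", "the", "budget"]
pauses = [0.1, 0.8, 0.1, 0.1, 0.1, 0.1, 0.0]
print(segment_transcript(words, pauses))
# -> ['thank you', 'next we discuss the budget']
```

In practice such boundaries would be decided from richer cues (lexical features, prosody, language models); the pause rule only illustrates the task.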

Spoken Dialogue Systems for Human-Robot Interaction

Spoken dialogue models and systems integrating verbal and non-verbal information for humanoid robots (androids) that behave like, and interact naturally with, human beings.
Speech Understanding
Interaction Analysis and Modeling
Dialogue Systems
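One concrete flavor of verbal/non-verbal integration is turn-taking. The toy sketch below combines a pause (verbal channel) with gaze (non-verbal channel) to decide whether the robot takes the turn; the rule and thresholds are hypothetical, chosen only to illustrate the idea.

```python
# Toy sketch of multi-modal turn-taking: a pause plus gaze toward the robot
# is a stronger turn-yielding signal than a pause alone.
def robot_should_speak(user_paused_s, user_gazing_at_robot):
    """Decide whether the robot takes the turn (hypothetical rule)."""
    if user_gazing_at_robot:
        return user_paused_s > 0.3   # gaze hands over the turn quickly
    return user_paused_s > 1.0       # without gaze, wait longer

print(robot_should_speak(0.5, True))   # -> True
print(robot_should_speak(0.5, False))  # -> False
```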

Music Information Processing

Sound source separation and automatic transcription of music audio signals, applied to an intelligent sound editor that separates the singing voice from the accompaniment.
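A common building block for this kind of separation is non-negative matrix factorization (NMF) of a magnitude spectrogram, sketched minimally below. This is a generic textbook technique with illustrative names and parameters, not the group's actual editor.

```python
# Minimal NMF sketch: factor a non-negative "spectrogram" V into spectral
# templates W and time activations H using multiplicative updates.
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Approximate V ~ W @ H with non-negative factors."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update templates
    return W, H

# Toy "spectrogram": two spectral templates active at different times.
V = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
W, H = nmf(V, rank=2)
print(np.round(W @ H, 2))  # close to V
```

Masking the spectrogram with each template's contribution then yields separated sources, e.g. voice vs. accompaniment.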

Acoustic Signal Processing for Audio Scene Analysis

Analysis of real-world environments in which multiple people and a variety of sound sources coexist, based on multi-modal sensing and statistical acoustic signal processing.
Source Separation and Speech Enhancement
Robust Speech Recognition
Multi-modal Conversation Analysis
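The sub-topics above can be illustrated with spectral subtraction, a classic single-channel speech-enhancement technique. The sketch below uses a single FFT frame and an oracle noise estimate; the signal parameters and floor value are illustrative assumptions.

```python
# Spectral subtraction sketch: subtract an estimated noise magnitude
# spectrum from the noisy spectrum, keep the noisy phase, and resynthesize.
import numpy as np

def spectral_subtract(noisy, noise_mag_est, floor=0.01):
    """Enhance one frame by subtracting a noise magnitude estimate."""
    spec = np.fft.rfft(noisy)
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - noise_mag_est, floor * mag)  # spectral floor
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy))

rng = np.random.default_rng(1)
n = 256
tone = np.sin(2 * np.pi * 16 * np.arange(n) / n)   # stand-in for "speech"
noise = 0.3 * rng.standard_normal(n)
noise_mag_est = np.abs(np.fft.rfft(noise))          # oracle noise spectrum
enhanced = spectral_subtract(tone + noise, noise_mag_est)
print(np.mean((enhanced - tone) ** 2) < np.mean(noise ** 2))  # -> True
```

Real systems estimate the noise spectrum from non-speech frames and smooth over time; the oracle estimate here only demonstrates the subtraction itself.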

CALL (Computer Assisted Language Learning)

A next-generation CALL system that automatically checks foreign-language learners' pronunciation and serves as a virtual language teacher for simulated conversation practice.
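A standard way to compare a learner's utterance against a reference pronunciation is dynamic time warping (DTW) over acoustic feature sequences. The sketch below uses 1-D toy sequences as stand-ins for e.g. MFCC frames; the scoring setup is illustrative, not the system's actual method.

```python
# DTW sketch: alignment cost between a learner's feature sequence and a
# reference, tolerant of tempo differences but sensitive to contour changes.
import numpy as np

def dtw_distance(a, b):
    """DTW alignment cost between two 1-D feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

reference = [1.0, 2.0, 3.0, 2.0, 1.0]
learner_ok = [1.0, 2.0, 2.0, 3.0, 2.0, 1.0]   # same contour, slower tempo
learner_off = [3.0, 1.0, 1.0, 3.0]            # different contour
print(dtw_distance(reference, learner_ok) < dtw_distance(reference, learner_off))
# -> True
```

A pronunciation score can then be derived from the alignment cost, with low cost indicating a close match to the reference.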