Publications

Speech Recognition of Lectures and Meetings

  • T.Kawahara.
    Automatic meeting transcription system for the Japanese Parliament (Diet).
    In Proc. APSIPA ASC, (overview talk), 2017. (PDF file)
  • S.Ueno, H.Inaguma, M.Mimura, and T.Kawahara.
    Acoustic-to-word attention-based model complemented with character-level CTC-based model.
    In Proc. IEEE-ICASSP, pp.5804--5808, 2018. (PDF file)
  • S.Li, Y.Akita, and T.Kawahara.
    Semi-supervised acoustic model training by discriminative data selection from multiple ASR systems' hypotheses.
    IEEE/ACM Trans. Audio, Speech & Language Process., Vol.24, No.9, pp.1524--1534, 2016. (text) (PDF file) (KURENAI)
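
  Background note: the Ueno et al. entry above complements a word-level attention-based model with a
  character-level CTC-based model. As a minimal illustrative sketch only (not the paper's
  implementation), a common way to combine such branches is a multitask objective that interpolates
  the attention cross-entropy loss with the CTC loss; the weight lam, tensor shapes, and vocabulary
  setup below are assumptions.

      # Illustrative multitask objective: attention cross-entropy + CTC loss.
      # Assumptions (not from the paper): word-level attention logits of shape
      # (B, U, V_word), character-level CTC log-probs of shape (T, B, V_char),
      # CTC blank index 0, and an interpolation weight lam.
      import torch.nn.functional as F

      def joint_loss(att_logits, word_targets, ctc_log_probs, char_targets,
                     input_lengths, char_lengths, lam=0.5):
          """Interpolate a word-level attention loss with a character-level CTC loss."""
          # Attention branch: cross-entropy expects (B, V_word, U) vs. targets (B, U).
          att_loss = F.cross_entropy(att_logits.transpose(1, 2), word_targets)
          # CTC branch: log-probs over characters per encoder frame.
          ctc_loss = F.ctc_loss(ctc_log_probs, char_targets,
                                input_lengths, char_lengths, blank=0)
          return lam * att_loss + (1.0 - lam) * ctc_loss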

Robust Speech Recognition

  • K.Shimada, Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
    Unsupervised beamforming based on multichannel nonnegative matrix factorization for noisy speech recognition.
    In Proc. IEEE-ICASSP, pp.5734--5738, 2018. (PDF file)
  • M.Mimura, S.Sakai, and T.Kawahara.
    Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with phone-class feature.
    EURASIP J. Advances in Signal Processing, Vol.2015, No.62, pp.1--13, 2015. (text) (PDF file) (KURENAI)

Source Separation and Speech Enhancement

  • Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
    Statistical speech enhancement based on probabilistic integration of variational autoencoder and non-negative matrix factorization.
    In Proc. IEEE-ICASSP, pp.716--720, 2018. (PDF file)
  • K.Itakura, Y.Bando, E.Nakamura, K.Itoyama, K.Yoshii, and T.Kawahara.
    Bayesian multichannel audio source separation based on integrated source and spatial models.
    IEEE/ACM Trans. Audio, Speech & Language Process., Vol.26, No.4, pp.831--846, 2018. (text) (PDF file)
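
  Background note: both entries above build on nonnegative matrix factorization (NMF). As general
  background only (not the variational-autoencoder integration or the Bayesian multichannel models
  of these papers), the sketch below shows basic single-channel NMF of a magnitude spectrogram with
  the standard multiplicative updates for the Euclidean cost; the rank, iteration count, and random
  initialization are arbitrary assumptions.

      # Basic single-channel NMF sketch (multiplicative updates, Euclidean cost).
      # Illustrative only: the multichannel/Bayesian extensions in the papers
      # above are not implemented here.
      import numpy as np

      def nmf(V, rank=8, n_iter=200, eps=1e-10):
          """Factorize a nonnegative spectrogram V (F x T) into W (F x rank) and H (rank x T)."""
          rng = np.random.default_rng(0)
          n_freq, n_frames = V.shape
          W = rng.random((n_freq, rank)) + eps
          H = rng.random((rank, n_frames)) + eps
          for _ in range(n_iter):
              # Standard multiplicative updates for the Euclidean (Frobenius) cost.
              H *= (W.T @ V) / (W.T @ W @ H + eps)
              W *= (V @ H.T) / (W @ H @ H.T + eps)
          return W, H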

Dialogue Systems

  • T.Kawahara.
    Spoken dialogue system for a human-like conversational robot ERICA.
    In Proc. Int'l Workshop Spoken Dialogue Systems (IWSDS), (keynote speech), 2018. (PDF file)
  • K.Yoshino and T.Kawahara.
    Conversational system for information navigation based on POMDP with user focus tracking.
    Computer Speech and Language, Vol.34, No.1, pp.275--291, 2015. (text) (PDF file)
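
  Background note: the Yoshino and Kawahara entry above frames information navigation as a POMDP.
  As a generic illustration only (not the paper's user-focus tracking model), the sketch below shows
  the standard POMDP belief update, in which the belief over hidden states is propagated through the
  transition model and reweighted by the observation likelihood; the array layouts and the toy
  numbers in the usage example are assumptions.

      # Generic POMDP belief update: b'(s') is proportional to
      # P(o | s', a) * sum_s P(s' | s, a) * b(s).
      # Assumed layouts (not from the paper): T[a][s, s_next] = P(s_next | s, a),
      # O[a][s_next, o] = P(o | s_next, a), b is a distribution over states.
      import numpy as np

      def update_belief(b, a, o, T, O):
          """Update the belief b after taking action a and observing o."""
          predicted = T[a].T @ b          # sum_s P(s' | s, a) b(s)
          new_b = O[a][:, o] * predicted  # reweight by observation likelihood
          return new_b / new_b.sum()      # renormalize to a distribution

      # Usage with 2 states, 1 action, 2 observations (arbitrary toy numbers).
      T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
      O = np.array([[[0.7, 0.3], [0.4, 0.6]]])
      b = update_belief(np.array([0.5, 0.5]), a=0, o=1, T=T, O=O)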

Interaction Analysis and Modeling

  • T.Kawahara, T.Yamaguchi, K.Inoue, K.Takanashi, and N.Ward.
    Prediction and generation of backchannel form for attentive listening systems.
    In Proc. INTERSPEECH, pp.2890--2894, 2016. (PDF file)

Multi-modal Conversation Analysis

  • T.Kawahara, T.Iwatate, K.Inoue, S.Hayashi, H.Yoshimoto, and K.Takanashi.
    Multi-modal sensing and analysis of poster conversations with smart posterboard.
    APSIPA Trans. Signal & Information Process., Vol.5, No.e2, pp.1--12, 2016. (text)

CALL (Computer Assisted Language Learning)

  • R.Duan, T.Kawahara, M.Dantsuji, and J.Zhang.
    Effective articulatory modeling for pronunciation error detection of L2 learner without non-native training data.
    In Proc. IEEE-ICASSP, pp.5815--5819, 2017. (PDF file)
  • M.Mirzaei, K.Meshgi, and T.Kawahara.
    Exploiting automatic speech recognition errors to enhance partial and synchronized caption for facilitating second language listening.
    Computer Speech and Language, Vol.49, pp.17--36, 2018. (text)
  • H.Wang, C.J.Waple, and T.Kawahara.
    Computer assisted language learning system based on dynamic question generation and error prediction for automatic speech recognition.
    Speech Communication, Vol.51, No.10, pp.995--1005, 2009. (text) (PDF file)
  • Y.Tsubota, T.Kawahara, and M.Dantsuji.
    An English pronunciation learning system for Japanese students based on diagnosis of critical pronunciation errors.
    ReCALL Journal, Vol.16, No.1, pp.173--188, 2004. (text) (PDF file)

Speech Understanding

  • T.Zhao and T.Kawahara.
    Joint learning of dialog act segmentation and recognition in spoken dialog using neural networks.
    In Proc. IJCNLP, pp.704--712, 2017. (PDF file)
  • T.Kawahara, C.-H.Lee, and B.-H.Juang.
    Flexible speech understanding based on combined key-phrase detection and verification.
    IEEE Trans. Speech & Audio Process., Vol.6, No.6, pp.558--568, 1998. (text) (PDF file)

Natural Language Processing for Rich Transcription

  • H.Inaguma, M.Mimura, K.Inoue, K.Yoshii, and T.Kawahara.
    An end-to-end approach to joint social signal detection and automatic speech recognition.
    In Proc. IEEE-ICASSP, pp.6214--6218, 2018. (PDF file)
  • G.Neubig, Y.Akita, S.Mori, and T.Kawahara.
    A monotonic statistical machine translation approach to speaking style transformation.
    Computer Speech and Language, Vol.26, No.5, pp.349--370, 2012. (text) (PDF file)
  • T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, and H.Nanjo.
    Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers.
    IEEE Trans. Speech & Audio Process., Vol.12, No.4, pp.409--419, 2004. (text) (PDF file) (KURENAI)

Large Vocabulary Continuous Speech Recognition Platform

  • A.Lee and T.Kawahara.
    Recent development of open-source speech recognition engine Julius.
    In Proc. APSIPA ASC, pp.131--137, 2009. (PDF file)
  • T.Kawahara, A.Lee, K.Takeda, K.Itou, and K.Shikano.
    Recent progress of open-source LVCSR engine Julius and Japanese model repository.
    In Proc. ICSLP, pp.3069--3072, 2004. (PDF file)
  • A.Lee, T.Kawahara, and K.Shikano.
    Julius -- an open source real-time large vocabulary recognition engine.
    In Proc. EUROSPEECH, pp.1691--1694, 2001. (PDF file)