Publications

T.Kawahara.
Automatic meeting transcription system for the Japanese Parliament (Diet).
In Proc. APSIPA ASC, (overview talk), 2017. (PDF file)
K.Matsuura, S.Ueno, M.Mimura, S.Sakai, and T.Kawahara.
Speech corpus of Ainu folklore and end-to-end speech recognition for Ainu language.
In Proc. Int'l Conf. Language Resources & Evaluation (LREC), pp.2622--2628, 2020. (PDF file)
S.Ueno, A.Lee, and T.Kawahara.
Refining synthesized speech using speaker information and phone masking for data augmentation of speech recognition.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.32, pp.3924--3933, 2024. (text) (KURENAI)
H.Inaguma and T.Kawahara.
Alignment knowledge distillation for online streaming attention-based speech recognition.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.31, pp.1371--1385, 2021. (text)

Y.Gao, H.Shi, C.Chu, and T.Kawahara.
Speech emotion recognition with multi-level acoustic and semantic information extraction and interaction.
In Proc. INTERSPEECH, pp.1060--1064, 2024. (PDF file)
H.Feng, S.Ueno, and T.Kawahara.
End-to-end speech emotion recognition combined with acoustic-to-word ASR model.
In Proc. INTERSPEECH, pp.501--505, 2020. (PDF file)

H.Shi, M.Mimura, and T.Kawahara.
Waveform-domain speech enhancement using spectrogram encoding for robust speech recognition.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.32, pp.3049--3060, 2024. (text) (KURENAI)
K.Shimada, Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
Unsupervised speech enhancement based on multichannel NMF-informed beamforming for noise-robust automatic speech recognition.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.27, No.5, pp.960--971, 2019. (text) (KURENAI)

K.Sekiguchi, Y.Bando, A.A.Nugraha, K.Yoshii, and T.Kawahara.
Fast multichannel nonnegative matrix factorization with directivity-aware jointly-diagonalizable spatial covariance matrices for blind source separation.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.28, pp.2610--2625, 2020. (text)
Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
Statistical speech enhancement based on probabilistic integration of variational autoencoder and non-negative matrix factorization.
In Proc. IEEE-ICASSP, pp.716--720, 2018. (PDF file)

T.Zhao and T.Kawahara.
Joint dialog act segmentation and recognition in human conversations using attention to dialog context.
Computer Speech and Language, Vol.50, pp.108--127, 2019. (text)
T.V.Dang, T.Zhao, S.Ueno, H.Inaguma, and T.Kawahara.
End-to-end speech-to-dialog-act recognition.
In Proc. INTERSPEECH, pp.3910--3914, 2020. (PDF file)

T.Kawahara.
Spoken dialogue system for a human-like conversational robot ERICA.
In Proc. Int'l Workshop Spoken Dialogue Systems (IWSDS), (keynote speech), 2018. (PDF file)
K.Inoue, K.Hara, D.Lala, K.Yamamoto, S.Nakamura, K.Takanashi, and T.Kawahara.
Job interviewer android with elaborate follow-up question generation.
In Proc. ICMI, pp.324--332, 2020. (PDF file)
K.Inoue, D.Lala, K.Yamamoto, S.Nakamura, K.Takanashi, and T.Kawahara.
An attentive listening system with android ERICA: Comparison of autonomous and WOZ interactions.
In Proc. SIGdial Meeting Discourse & Dialogue, pp.118--127, 2020. (PDF file)
T.Kawahara, H.Saruwatari, R.Higashinaka, K.Komatani, and A.Lee.
Spoken Dialogue Technology for Semi-Autonomous Cybernetic Avatars.
In Hiroshi Ishiguro, Fuki Ueno, and Eiki Tachibana, editors, (text)

K.Yamamoto, K.Inoue, and T.Kawahara.
Character expression for spoken dialogue systems with semi-supervised learning using variational auto-encoder.
Computer Speech and Language, Vol.79, No.101469, 2023. (text)
K.Inoue, D.Lala, and T.Kawahara.
Can a robot laugh with you?: Shared laughter generation for empathetic spoken dialogue.
Frontiers in Robotics and AI (Computational Intelligence in Robotics), Vol.9, No.933261, pp.1--11, 2022. (text) (KURENAI)
K.Inoue, B.Jiang, E.Ekstedt, T.Kawahara, and G.Skantze.
Multilingual turn-taking prediction using voice activity projection.
In Proc. COLING, pp.11873--11883, 2024. (PDF file)

K.Inoue, D.Lala, K.Takanashi, and T.Kawahara.
Engagement recognition by a latent character model based on multimodal listener behaviors in spoken dialogue.
APSIPA Trans. Signal & Information Process., Vol.7, No.e9, pp.1--16, 2018. (text)
T.Kawahara, T.Iwatate, K.Inoue, S.Hayashi, H.Yoshimoto, and K.Takanashi.
Multi-modal sensing and analysis of poster conversations with smart posterboard.
APSIPA Trans. Signal & Information Process., Vol.5, No.e2, pp.1--12, 2016. (text)

J.Nozaki, T.Kawahara, K.Ishizuka, and T.Hashimoto.
End-to-end speech-to-punctuated-text recognition.
In Proc. INTERSPEECH, pp.1811--1815, 2022. (PDF file)
M.Mimura, S.Sakai, and T.Kawahara.
An end-to-end model from speech to clean transcript for parliamentary meetings.
In Proc. APSIPA ASC, pp.465--470, 2021. (PDF file)

R.Duan, T.Kawahara, M.Dantsuji, and H.Nanjo.
Cross-lingual transfer learning of non-native acoustic modeling for pronunciation error detection and diagnosis.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.28, No.1, pp.391--401, 2020. (text) (KURENAI)
M.Mirzaei, K.Meshgi, and T.Kawahara.
Exploiting automatic speech recognition errors to enhance partial and synchronized caption for facilitating second language listening.
Computer Speech and Language, Vol.49, pp.17--36, 2018. (text)

A.Lee and T.Kawahara.
Recent development of open-source speech recognition engine Julius.
In Proc. APSIPA ASC, pp.131--137, 2009. (PDF file)
T.Kawahara, A.Lee, K.Takeda, K.Itou, and K.Shikano.
Recent progress of open-source LVCSR engine Julius and Japanese model repository.
In Proc. ICSLP, pp.3069--3072, 2004. (PDF file)

Speech and Audio Processing Laboratory