FY 2025 | 2024 | 2023 | 2022 | 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004 | 2003 | 2002 | 2001 | 2000
FY 2025
- 
H.Shi, X.Lu, K.Shimada, and T.Kawahara.
Combining deterministic enhanced conditions with dual-streaming
 encoding for diffusion-based speech enhancement.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.33,
 pp.4253--4266, 2025.
(text)
 - 
Y.Gao, H.Shi, C.Chu, and T.Kawahara.
Multi-attribute learning for multi-level emotion recognition from
 speech.
APSIPA Trans. Signal \& Information Process., Vol.14, No. e20,
 pp.1--29, 2025.
(text)
 - 
K.Shimada, K.Uchida, Y.Koyama, T.Shibuya, S.Takahashi, Y.Mitsufuji, and
 T.Kawahara.
Open-vocabulary sound event localization and detection with joint
 learning of CLAP embedding and activity-coupled Cartesian DOA vector.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.33,
 pp.2946--2960, 2025.
(text)
 - 
K.Ochi, D.Lala, K.Inoue, T.Kawahara, and H.Kumazaki.
Robot-mediated multi-party conversation aimed at affect improvement
 for psychiatric patients.
IEEE Trans. Affective Computing, (accepted for publication), 2025.
(text)
 - 
T.Kawahara, Y.Akita, and M.Masuyama.
Captioning parliamentary meeting videos using official meeting
 transcripts.
The Journal of Professional Reporting and Transcription (Tiro),
 No.1, 2025.
(text)
 
FY 2024
- 
S.Ueno, A.Lee, and T.Kawahara.
Refining synthesized speech using speaker information and phone
 masking for data augmentation of speech recognition.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.32,
 pp.3924--3933, 2024.
(text)
(KURENAI)
 - 
H.Shi, M.Mimura, and T.Kawahara.
Waveform-domain speech enhancement using spectrogram encoding for
 robust speech recognition.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.32,
 pp.3049--3060, 2024.
(text)
(KURENAI)
 - 
K.Soky, S.Li, C.Chu, and T.Kawahara.
Finetuning pretrained model with embedding of domain and language
 information for ASR of very low-resource settings.
International Journal of Asian Language Processing, Vol.33,
 No.4, pp.2350024:1--16, 2024.
(text)
(KURENAI)
 
FY 2023
- 
K.Yamamoto, K.Inoue, and T.Kawahara.
Character expression of a conversational robot for adapting to user
 personality.
Advanced Robotics, Vol.38, No.4, pp.256--266, 2024.
(text)
 - 
Y.Fu, K.Inoue, D.Lala, K.Yamamoto, C.Chu, and T.Kawahara.
Dual variational generative model and auxiliary retrieval for
 empathetic response generation by conversational robot.
Advanced Robotics, Vol.37, No.21, pp.1406--1418, 2023.
(text)
(KURENAI preprint)
 - 
K.Ochi, K.Inoue, D.Lala, T.Kawahara, and H.Kumazaki.
Effect of attentive listening robot on pleasure and arousal change in
 psychiatric daycare.
Advanced Robotics, Vol.37, No.21, pp.1382--1391, 2023.
(text)
(KURENAI)
(KURENAI preprint)
 - 
K.Yamamoto, K.Inoue, and T.Kawahara.
Character expression for spoken dialogue systems with semi-supervised
 learning using variational auto-encoder.
Computer Speech and Language, Vol.79, No. 101469, pp.1--14,
 2023.
(text)
 
FY 2022
- 
K.Inoue, D.Lala, and T.Kawahara.
Can a robot laugh with you?: Shared laughter generation for
 empathetic spoken dialogue.
Frontiers in Robotics and AI (Computational Intelligence in Robotics), Vol.9, No.933261, pp.1--11, 2022.
(text)
(KURENAI)
 - 
K.Soky, M.Mimura, T.Kawahara, C.Chu, S.Li, C.Ding, and S.Sam.
TriECCC: Trilingual corpus of the Extraordinary Chambers in the
 Courts of Cambodia for speech recognition and translation studies.
International Journal of Asian Language Processing, Vol.31,
 No. 3\&4, pp.225007:1--21, 2022.
(text)
(KURENAI)
 - 
K.Sekiguchi, Y.Bando, A.A.Nugraha, M.Fontaine, K.Yoshii, and T.Kawahara.
Autoregressive moving average jointly-diagonalizable spatial
 covariance analysis for joint source separation and dereverberation.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.30,
 pp.2368--2382, 2022.
(text)
 
FY 2021
- 
Y.Du, R.Scheibler, M.Togami, K.Yoshii, and T.Kawahara.
Computationally-efficient overdetermined blind source separation
 based on iterative source steering.
IEEE Signal Processing Letters, Vol.29, pp.927--931, 2021.
(text)
 - 
H.Inaguma and T.Kawahara.
Alignment knowledge distillation for online streaming attention-based
 speech recognition.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.31,
 pp.1371--1385, 2021.
(text)
 - 
S.Ueno, M.Mimura, S.Sakai, and T.Kawahara.
Synthesizing waveform sequence-to-sequence to augment training data
 for sequence-to-sequence speech recognition.
Acoustical Science \& Technology, Vol.42, No.6, pp.333--343,
 2021.
(text)
(PDF file)
 - 
E.Nakamura and K.Yoshii.
Musical Rhythm Transcription Based on Bayesian Piece-Specific
Score Models Capturing Repetitions.
Information Sciences, Vol.572, pp.482--500, 2021.
(text)
 - 
K.Shibata, E.Nakamura, and K.Yoshii.
Non-Local Musical Statistics as Guides for Audio-to-Score Piano
Transcription.
Information Sciences, Vol.566, pp.262--280, 2021.
(text)
 - 
T.Kawahara, N.Muramatsu, K.Yamamoto, D.Lala, and K.Inoue.
Semi-autonomous avatar enabling unconstrained parallel conversations
 --seamless hybrid of WOZ and autonomous dialogue systems--.
Advanced Robotics, Vol.35, No.11, pp.657--663, 2021.
(text)
 - 
R.Nishikimi, E.Nakamura, M.Goto, and K.Yoshii.
Audio-to-Score Singing Transcription Based on a CRNN-HSMM Hybrid Model.
APSIPA Trans. Signal \& Information Process., Vol.10, No.e7,
pp.1--13, 2021.
(text)
 
FY 2020
- 
A.A.Nugraha, K.Sekiguchi, M.Fontaine, Y.Bando, and K.Yoshii.
Flow-Based Independent Vector Analysis for Blind Source Separation.
IEEE Signal Processing Letters, Vol.28, pp.2173--2177, 2020.
(text)
 - 
Y.Wu, T.Carsault, E.Nakamura, and K.Yoshii.
Semi-Supervised Neural Chord Estimation Based on a Variational
 Autoencoder With Latent Chord Labels and Features.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.28,
 pp.2956--2966, 2020.
(text)
 - 
K.Sekiguchi, Y.Bando, A.A.Nugraha, K.Yoshii, and T.Kawahara.
Fast multichannel nonnegative matrix factorization with
 directivity-aware jointly-diagonalizable spatial covariance matrices for
 blind source separation.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.28,
 pp.2610--2625, 2020.
(text)
 - 
R.Nishikimi, E.Nakamura, M.Goto, K.Itoyama, and K.Yoshii.
Bayesian Singing Transcription Based on a Hierarchical Generative
Model of Keys, Musical Notes, and F0 Trajectories.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.28,
 pp.1678--1691, 2020.
(text)
 - 
H.Tsushima, E.Nakamura, K.Itoyama, and K.Yoshii.
Bayesian Melody Harmonization Based on a Tree-Structured Generative Model of Chord Sequences and Melodies.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.28,
 pp.1644--1655, 2020.
(text)
 - 
A.A.Nugraha, K.Sekiguchi, and K.Yoshii.
A Flow-Based Deep Latent Variable Model for Speech Spectrogram
Modeling and Enhancement.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.28,
 pp.1104--1117, 2020.
(text)
 - 
T.Kawahara, S.Ueno, and M.Morikawa.
Transcription system using automatic speech recognition in the
 Japanese Parliament.
The Journal of Professional Reporting and Transcription (Tiro),
 No.1, 2020.
(text)
 - 
R.Duan, T.Kawahara, M.Dantsuji, and H.Nanjo.
Cross-lingual transfer learning of non-native acoustic modeling for
 pronunciation error detection and diagnosis.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.28,
 pp.391--401, 2020.
(text)
(KURENAI)
 - 
K.Sekiguchi, Y.Bando, A.A.Nugraha, K.Yoshii, and T.Kawahara.
Semi-supervised multichannel speech enhancement with a deep speech
 prior.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.27,
 No.12, pp.2197--2212, 2019.
(text)
 - 
Y.Li, C.T.Ishi, K.Inoue, S.Nakamura, K.Takanashi, and T.Kawahara.
Expressing reactive emotion based on multimodal emotion recognition
 for natural conversation in human-robot interaction.
Advanced Robotics, Vol.33, No.20, pp.1030--1041, 2019.
(text)
 - 
T.Zhao and T.Kawahara.
Joint dialog act segmentation and recognition in human conversations
 using attention to dialog context.
Computer Speech and Language, Vol.57, pp.108--127, 2019.
(text)
(KURENAI)
 - 
K.Shimada, Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
Unsupervised speech enhancement based on multichannel NMF-informed
 beamforming for noise-robust automatic speech recognition.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.27,
 No.5, pp.960--971, 2019.
(text)
(KURENAI)
 
FY 2018
- 
Y.Ojima, E.Nakamura, K.Itoyama, and K.Yoshii.
Chord-aware automatic music transcription based on hierarchical
Bayesian integration of acoustic and language models.
APSIPA Trans. Signal \& Information Process., Vol.7, No.e14,
 pp.1--14, 2018.
(text)
 - 
E.Nakamura and K.Yoshii.
Statistical piano reduction controlling performance difficulty.
APSIPA Trans. Signal \& Information Process., Vol.7, No.e13,
 pp.1--12, 2018.
(text)
 - 
K.Inoue, D.Lala, K.Takanashi, and T.Kawahara.
Engagement recognition by a latent character model based on
 multimodal listener behaviors in spoken dialogue.
APSIPA Trans. Signal \& Information Process., Vol.7, No.e9,
 pp.1--16, 2018.
(text)
 - 
H.Tsushima, E.Nakamura, K.Itoyama, and K.Yoshii.
Generative Statistical Models with Self-Emergent Grammar of Chord
Sequences.
Journal of New Music Research, 2018.
(text)
 - 
M.Mirzaei, K.Meshgi, and T.Kawahara.
Exploiting automatic speech recognition errors to enhance partial and
 synchronized caption for facilitating second language listening.
Computer Speech and Language, Vol.49, pp.17--36, 2018.
(text)
(KURENAI)
 - 
T.Hagiya, T.Horiuchi, T.Yazaki, and T.Kawahara.
Typing Tutor: Individualized tutoring in text entry for older
 adults based on statistical input stumble detection.
J. Information Processing, Vol.26, No.4, 2018.
(text)
 - 
K.Itakura, Y.Bando, E.Nakamura, K.Itoyama, K.Yoshii, and T.Kawahara.
Bayesian multichannel audio source separation based on integrated
 source and spatial models.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.26,
 No.4, pp.831--846, 2018.
(text)
(PDF file)
 
FY 2017
- 
Y.Bando, K.Itoyama, M.Konyo, S.Tadokoro, K.Nakadai, K.Yoshii, T.Kawahara, and
 H.G.Okuno.
Speech enhancement based on Bayesian low-rank and sparse
 decomposition of multichannel magnitude spectrograms.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.26,
 No.2, pp.215--230, 2018.
(text)
(PDF file)
(Errata)
 - 
R.Duan, T.Kawahara, M.Dantsuji, and J.Zhang.
Articulatory modeling for pronunciation error detection without
 non-native training data based on DNN transfer learning.
IEICE Trans., Vol.E100-D, No.9, pp.2174--2182, 2017.
(text)
 - 
T.Hagiya, T.Horiuchi, T.Yazaki, T.Kato, and T.Kawahara.
Assistive typing application for older adults based on input stumble
 detection.
J. Information Processing, Vol.25, No.6, 2017.
(text)
 
FY 2016
- 
M.Mirzaei, K.Meshgi, Y.Akita, and T.Kawahara.
Partial and synchronized captioning: A new tool to assist learners in
 developing second language listening skill.
ReCALL Journal, Vol.29, No.2, pp.178--199, 2017.
(text)
(PDF file)
 - 
M.Ohkita, Y.Bando, Y.Ikemiya, E.Nakamura, K.Itoyama, and K.Yoshii.
Audio-visual beat tracking based on a state-space model for a
robot dancer performing with a human dancer.
Journal Robotics \& Mechatronics, Vol.29, No.1, pp.125--136, 2017.
(text)
 - 
K.Sekiguchi, Y.Bando, K.Itoyama, and K.Yoshii.
Layout optimization of cooperative distributed microphone arrays
based on estimation of source separation performance.
Journal Robotics \& Mechatronics, Vol.29, No.1, pp.83--93, 2017.
(text)
 - 
K.Youssef, K.Itoyama, and K.Yoshii.
Simultaneous identification and localization of still and mobile
speakers based on binaural robot audition.
Journal Robotics \& Mechatronics, Vol.29, No.1, pp.59--71, 2017.
(text)
 - 
Y.Ikemiya, K.Itoyama, and K.Yoshii.
Singing Voice Separation and Vocal F0 Estimation Based on Mutual
Combination of Robust Principal Component Analysis and Subharmonic
Summation.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.24, No.11,
 pp.2084--2095, 2016.
(text)
(PDF file)
 - 
S.Li, Y.Akita, and T.Kawahara.
Semi-supervised acoustic model training by discriminative data
 selection from multiple ASR systems' hypotheses.
IEEE/ACM Trans. Audio, Speech \& Language Process., Vol.24,
 No.9, pp.1524--1534, 2016.
(text)
(PDF file)
(KURENAI)
 - 
T.Kawahara, T.Iwatate, K.Inoue, S.Hayashi, H.Yoshimoto, and K.Takanashi.
Multi-modal sensing and analysis of poster conversations with smart
 posterboard.
APSIPA Trans. Signal \& Information Process., Vol.5, No.e2,
 pp.1--12, 2016.
(text)
 - 
K.Yoshino and T.Kawahara.
Conversational system for information navigation based on POMDP
 with user focus tracking.
Computer Speech and Language, Vol.34, No.1, pp.275--291,
 2015.
(text)
 - 
I.Nishimuta, K.Yoshii, K.Itoyama, and H.G.Okuno.
Toward a Quizmaster Robot for Speech-based Multiparty Interaction.
Advanced Robotics, Vol.29, No.18, pp.1205--1219, 2015.
(text)
 - 
S.Li, Y.Akita, and T.Kawahara.
Automatic lecture transcription based on discriminative data
 selection for lightly supervised acoustic model training.
IEICE Trans., Vol.E98-D, No.8, pp.1545--1552, 2015.
(text)
 - 
R.Gomez, T.Kawahara, and K.Nakadai.
Optimized wavelet-domain filtering under noisy and reverberant
 conditions.
APSIPA Trans. Signal \& Information Process., Vol.4, No.e3,
 pp.1--12, 2015.
(text)
 - 
M.Mimura, S.Sakai, and T.Kawahara.
Reverberant speech recognition combining deep neural networks and
 deep autoencoders augmented with phone-class feature.
EURASIP J. Advances in Signal Processing, Vol.2015, No.62,
 pp.1--13, 2015.
(text)
(PDF file)
(KURENAI)
 - 
T.Tung, R.Gomez, T.Kawahara, and T.Matsuyama.
Multi-party interaction understanding using smart multimodal digital
 signage.
IEEE Trans. Human-Machine Systems, Vol.44, No.5, pp.
 625--637, 2014.
(text)
(PDF file)
 - 
M.Ablimit, T.Kawahara, and A.Hamdulla.
Lexicon optimization based on discriminative learning for automatic
 speech recognition of agglutinative language.
Speech Communication, Vol.60, pp.78--87, 2014.
(text)
(PDF file)
 
FY 2013
- 
S.Sakai and T.Kawahara.
Admissible stopping in Viterbi beam search for unit selection
 speech synthesis.
IEICE Trans., Vol.E96-D, No.6, pp.1359--1367, 2013.
(text)
 - 
G.Neubig, T.Watanabe, S.Mori, and T.Kawahara.
Substring-based machine translation.
Machine Translation, Vol.27, No.2, pp.139--166, 2013.
(text)
(PDF file)
 
FY 2012
- 
H.Nishizaki, T.Akiba, K.Aikawa, T.Kawahara, and T.Matsui.
Evaluation framework design of spoken term detection study at the
 NTCIR-9 IR for spoken documents task.
Journal of Natural Language Processing (自然言語処理), Vol.19, No.4, pp.329--350, 2012.
(text)
(PDF file)
 - 
G.Neubig, Y.Akita, S.Mori, and T.Kawahara.
A monotonic statistical machine translation approach to speaking
 style transformation.
Computer Speech and Language, Vol.26, No.5, pp.349--370,
 2012.
(text)
(PDF file)
 - 
G.Neubig, T.Watanabe, E.Sumita, S.Mori, and T.Kawahara.
Joint phrase alignment and extraction for statistical machine
 translation.
J. Information Processing, Vol.20, No.2, pp.512--523, 2012.
(text)
 
FY 2011
- 
G.Neubig, M.Mimura, S.Mori, and T.Kawahara.
Bayesian learning of a language model from continuous speech.
IEICE Trans., Vol.E95-D, No.2, pp.614--625, 2012.
(text)
 - 
S.Sakai, T.Kawahara, and H.Kawai.
Probabilistic concatenation modeling for corpus-based speech
 synthesis.
IEICE Trans., Vol.E94-D, No.10, pp.2006--2014, 2011.
(text)
 - 
D.Cournapeau, S.Watanabe, A.Nakamura, and T.Kawahara.
Online unsupervised classification with model comparison in the
 Variational Bayes framework for voice activity detection.
IEEE J. Selected Topics in Signal Processing, Vol.4, No.6,
 pp.1071--1083, 2010.
(text)
(PDF file)
(KURENAI)
 - 
R.Gomez and T.Kawahara.
Robust speech recognition based on dereverberation parameter
 optimization using acoustic model likelihood.
IEEE Trans. Audio, Speech \& Language Process., Vol.18, No.7,
 pp.1708--1716, 2010.
(text)
(PDF file)
(KURENAI)
 - 
Y.Akita and T.Kawahara.
Statistical transformation of language and pronunciation models for
 spontaneous speech recognition.
IEEE Trans. Audio, Speech \& Language Process., Vol.18, No.6,
 pp.1539--1549, 2010.
(text)
(PDF file)
(KURENAI)
 - 
K.Ishizuka, S.Araki, and T.Kawahara.
Speech activity detection for multi-party conversation analyses based
 on likelihood ratio test on spatial magnitude.
IEEE Trans. Audio, Speech \& Language Process., Vol.18, No.6,
 pp.1354--1365, 2010.
(text)
(PDF file)
 - 
T.Shinozaki, S.Furui, and T.Kawahara.
Gaussian mixture optimization based on efficient cross-validation.
IEEE J. Selected Topics in Signal Processing, Vol.4, No.3,
 pp.540--547, 2010.
(text)
(PDF file)
 
FY 2009
- 
T.Misu and T.Kawahara.
Bayes risk-based dialogue management for document retrieval system
 with speech interface.
Speech Communication, Vol.52, No.1, pp.61--71, 2010.
(text)
(PDF file)
 - 
H.Wang and T.Kawahara.
Effective prediction of errors by non-native speakers using decision
 tree for speech recognition-based CALL system.
IEICE Trans., Vol.E92-D, No.12, pp.2462--2468, 2009.
(text)
 - 
H.Wang, C.J.Waple, and T.Kawahara.
Computer assisted language learning system based on dynamic question
 generation and error prediction for automatic speech recognition.
Speech Communication, Vol.51, No.10, pp.995--1005, 2009.
(text)
(PDF file)
 
FY 2008
- 
D.Cournapeau and T.Kawahara.
Voice activity detection based on high order statistics and online
 EM algorithm.
IEICE Trans., Vol.E91-D, No.12, pp.2854--2861, 2008.
(text)
 - 
I.R.Lane, T.Kawahara, T.Matsui, and S.Nakamura.
Out-of-domain utterance detection using classification confidences of
 multiple topics.
IEEE Trans. Audio, Speech \& Language Process., Vol.15, No.1,
 pp.150--161, 2007.
(text)
(PDF file)
(KURENAI)
 - 
T.Misu and T.Kawahara.
Dialogue strategy to clarify user's queries for document retrieval
 system with speech interface.
Speech Communication, Vol.48, No.9, pp.1137--1150, 2006.
(text)
(PDF file)
 - 
C.Troncoso and T.Kawahara.
Trigger-based language model adaptation for automatic transcription
 of panel discussions.
IEICE Trans., Vol.E89-D, No.3, pp.1024--1031, 2006.
(text)
 - 
I.R.Lane and T.Kawahara.
Verification of speech recognition results incorporating in-domain
 confidence and discourse coherence measures.
IEICE Trans., Vol.E89-D, No.3, pp.931--938, 2006.
(text)
 - 
M.Nishida and T.Kawahara.
Speaker model selection based on the Bayesian information criterion
 applied to unsupervised speaker indexing.
IEEE Trans. Speech \& Audio Process., Vol.13, No.4, pp.
 583--592, 2005.
(text)
(PDF file)
(KURENAI)
 
FY 2004 
- 
K.Komatani, S.Ueno, T.Kawahara, and H.G.Okuno.
User modeling in spoken dialogue systems to generate flexible
 guidance.
User Modeling and User-Adapted Interaction, Vol.15, No.1, pp.
 169--183, 2005.
(text)
(PDF file)
 - 
I.R.Lane, T.Kawahara, T.Matsui, and S.Nakamura.
Dialogue speech recognition by combining hierarchical topic
 classification and language model switching.
IEICE Trans., Vol.E88-D, No.3, pp.446--454, 2005.
(text)
 - 
Y.Akita and T.Kawahara.
Language model adaptation based on PLSA of topics and speakers for
 automatic transcription of panel discussions.
IEICE Trans., Vol.E88-D, No.3, pp.439--445, 2005.
(text)
 - 
T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, and H.Nanjo.
Automatic indexing of lecture presentations using unsupervised
 learning of presumed discourse markers.
IEEE Trans. Speech \& Audio Process., Vol.12, No.4, pp.
 409--419, 2004.
(text)
(PDF file)
(KURENAI)
 - 
H.Nanjo and T.Kawahara.
Language model and speaking rate adaptation for spontaneous
 presentation speech recognition.
IEEE Trans. Speech \& Audio Process., Vol.12, No.4, pp.
 391--400, 2004.
(text)
(PDF file)
(KURENAI)
 - 
Y.Tsubota, T.Kawahara, and M.Dantsuji.
An English pronunciation learning system for Japanese students
 based on diagnosis of critical pronunciation errors.
ReCALL Journal, Vol.16, No.1, pp.173--188, 2004.
(text)
(PDF file)
 - 
Y.Tsubota, T.Kawahara, and M.Dantsuji.
Formant structure estimation using vocal tract length normalization
 for CALL system.
Acoustical Science \& Technology, Vol.24, No.2, pp.93--96,
 2003.
(text)
(PDF file)
 - 
M.Mimura and T.Kawahara.
Difference of acoustic modeling for read speech and dialogue speech.
Acoustical Science \& Technology, Vol.22, No.5, pp.373--374,
 2001.
(text)
(PDF file)
 
FY 2000
- 
C.-H.Jo, T.Kawahara, S.Doshita, and M.Dantsuji.
Japanese pronunciation instruction system using speech recognition
 methods.
IEICE Trans., Vol.E83-D, No.11, pp.1960--1968, 2000.
(text)
 - 
T.Kawahara, C.-H.Lee, and B.-H.Juang.
Flexible speech understanding based on combined key-phrase detection
 and verification.
IEEE Trans. Speech \& Audio Process., Vol.6, No.6, pp.
 558--568, 1998.
(text)
(PDF file)
 - 
T.Kawahara and S.Doshita.
Comparison of discrete and continuous classifier-based HMM.
J. Acoust. Soc. Japan (E), Vol.13, No.6, pp.361--367, 1992.
(text)
(PDF file)