Audio for Kinect: pushing it to the limit

Ivan Tashev (Microsoft)

ABSTRACT:

This talk discusses aspects of the acoustical design and audio processing pipeline of Kinect, the fastest-selling consumer electronics device in history according to Guinness World Records. The device is the first industrial product with surround sound echo cancellation, one of the first to offer hands-free speech recognition at distances of up to four meters, and the first open-microphone speech recognition device. The presenter, Dr. Ivan Tashev of Microsoft Research, is one of the architects behind Kinect and created most of the algorithms in its audio pipeline.

SPEAKER BIO:

Ivan Tashev received his Diploma Engineer degree in Electronics and his PhD in Computer Science from the Technical University of Sofia, Bulgaria, in 1984 and 1990 respectively. He was an Assistant Professor at the same university when he joined Microsoft in 1998. He is currently a Principal Architect in the Speech Technology Group at Microsoft Research. Dr. Tashev contributed algorithms and designs to the microphone array support in Windows, the RoundTable device, the audio pipeline in the Microsoft Auto platform, and the audio pipeline in Kinect. He is inventor or co-inventor on 40 US patent applications, of which 18 have been granted. Ivan Tashev is a senior member of IEEE and a member of its Audio and Acoustic Signal Processing Technical Committee. He is also a member of the Audio Engineering Society and its Pacific Northwest Committee, and of the Acoustical Society of America. Dr. Tashev is a reviewer for most of the scientific journals in his research area and a member of the organizing or technical committees of ICASSP, IWAENC, WASPAA, HSCMA, and other scientific conferences in his area. He has authored or coauthored four books and more than 70 scientific papers. His latest book, "Sound Capture and Processing", was published in 2009 by John Wiley & Sons Ltd.

REFERENCES:

[1] Ivan Tashev, Sound Capture and Processing: Practical Approaches, 388 pp., Wiley, July 2009
[2] Ivan Tashev, Recent Advances in Human-Machine Interfaces for Gaming and Entertainment, in International Journal on Information Technology and Security, vol. III, no. 3, pp. 69-76, Union of Scientists in Bulgaria, September 2011
[3] Michael L. Seltzer, Yun-Cheng Ju, Ivan Tashev, Ye-Yi Wang, and Dong Yu, In Car Media Search, in IEEE Signal Processing Magazine, IEEE SPS, June 2011
[4] Ivan Tashev, Coherence Based Double Talk Detector with Adaptive Threshold, in XX Scientific Conference ELECTRONICS ET2011, Technical University of Sofia Publishing House, 15 September 2011
[5] Hoang Do, Ivan Tashev, and Alex Acero, A New Speaker Identification Algorithm for Gaming Scenarios, in ICASSP, IEEE, May 2011
[6] Ivan Tashev and Alex Acero, Statistical Modeling of the Speech Signal, in International Workshop on Acoustic Echo and Noise Control (IWAENC), Tel Aviv, Israel, 1 September 2010
[7] Ivan Tashev, Andrew Lovitt, and Alex Acero, Dual stage probabilistic voice activity detector, in NOISE-CON 2010 and 159th Meeting of the Acoustical Society of America, Acoustical Society of America, 20 April 2010
[8] Lae-Hoon Kim, Ivan Tashev, and Alex Acero, Reverberated Speech Signal Separation Based on Regularized Subband Feedforward ICA and Instantaneous Direction of Arrival, in International Conference on Acoustics, Speech and Signal Processing, IEEE, 16 March 2010
[9] Ivan Tashev, Michael L. Seltzer, and Yun-Cheng Ju, Speech and sound for in-car infotainment systems, in Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI 2009), Association for Computing Machinery, Inc., Essen, Germany, 22 September 2009
[10] Ivan Tashev, Michael Seltzer, Yun-Cheng Ju, Ye-Yi Wang, and Alex Acero, Commute UX: Voice Enabled In-car Infotainment System, in Mobile HCI '09: Workshop on Speech in Mobile and Pervasive Environments (SiMPE), Association for Computing Machinery, Inc., Bonn, Germany, 15 September 2009
[11] Yun-Cheng Ju, Michael Seltzer, and Ivan Tashev, Improving Perceived Accuracy for In-Car Media Search, International Speech Communication Association, September 2009
[12] Ivan Tashev, Andrew Lovitt, and Alex Acero, Unified Framework for Single Channel Speech Enhancement, in 2009 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, IEEE, Victoria B.C., Canada, 24 August 2009
[13] Young-In Song, Ye-Yi Wang, Yun-Cheng Ju, Mike Seltzer, Ivan Tashev, and Alex Acero, Voice Search of Structured Media Data, in International Conference on Acoustics, Speech and Signal Processing, Institute of Electrical and Electronics Engineers, Inc., Taipei, Taiwan, April 2009
[14] Ivan Tashev, Slavy Mihov, Tyler Gleghorn, and Alex Acero, Sound Capture System and Spatial Filter for Small Devices, in Proceedings of Interspeech 2008, International Speech Communication Association, Brisbane, Australia, September 2008
[15] Nilesh Madhu, Ivan Tashev, and Alex Acero, An EM-based Probabilistic Approach for Acoustic Echo Suppression, in Proceedings of International Conference on Acoustics, Speech and Signal Processing ICASSP 2008, Institute of Electrical and Electronics Engineers, Inc., Las Vegas, USA, April 2008