Convolutive Bottleneck Network with Dropout for Dysarthric Speech Recognition

T. Nakashika, T. Yoshioka, T. Takiguchi, Y. Ariki, S. Duffner, C. Garcia, Kobe University & LIRIS, 2014

We investigate the recognition of speech produced by a person with an articulation disorder resulting from athetoid cerebral palsy. The articulation of the first spoken words tends to become unstable due to strain on the speech muscles, which degrades speech recognition performance. Therefore, we propose a robust feature extraction method using a convolutive bottleneck network (CBN) instead of the well-known MFCCs. The CBN stacks various types of layers, such as a convolution layer, a subsampling layer, and a bottleneck layer, forming a deep network. By applying the CBN to feature extraction for dysarthric speech, we expect the CBN to reduce the influence of the unstable speaking style caused by the athetoid symptoms. Furthermore, we also adopt dropout in the output layer, since labels automatically assigned to dysarthric speech are usually unreliable due to the ambiguous phonemes uttered by a person with a speech disorder. We confirmed the effectiveness of our approach through word recognition experiments, in which the CBN-based feature extraction method outperformed the conventional feature extraction method.
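The forward pass of such a network can be sketched as follows. This is a minimal NumPy illustration of the layer types the abstract names (convolution, subsampling, bottleneck, and an output layer with dropout); all dimensions, kernel counts, activations, and the random weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, kernels):
    # Valid 2-D convolution of a single-channel input with a bank of kernels,
    # followed by a ReLU (activation choice is an assumption for illustration).
    kh, kw = kernels.shape[1], kernels.shape[2]
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((kernels.shape[0], oh, ow))
    for k, w in enumerate(kernels):
        for i in range(oh):
            for j in range(ow):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return np.maximum(out, 0)

def subsample(x, p=2):
    # Max-pooling over non-overlapping p x p regions, per feature map.
    c, h, w = x.shape
    return x[:, :h // p * p, :w // p * p].reshape(c, h // p, p, w // p, p).max(axis=(2, 4))

def dropout(x, rate, train=True):
    # Inverted dropout: zero units with probability `rate` at training time
    # and rescale the survivors, so no change is needed at test time.
    if not train:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

# Toy input "mel-map": 39 mel bins x 11 frames (sizes are hypothetical).
x = rng.standard_normal((39, 11))

h = conv2d(x, rng.standard_normal((8, 5, 5)) * 0.1)          # convolution layer
h = subsample(h)                                             # subsampling layer
f = h.reshape(-1)
bottleneck = np.tanh(rng.standard_normal((30, f.size)) @ f)  # narrow bottleneck layer
out = dropout(rng.standard_normal((54, 30)) @ bottleneck,    # output layer with dropout
              rate=0.5)

print(bottleneck.shape, out.shape)
```

At test time the trained bottleneck activations, not the output layer, would serve as the extracted features handed to the recognizer in place of MFCCs.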