Convolutive Bottleneck Network with Dropout for Dysarthric Speech Recognition

T. Nakashika, T. Yoshioka, T. Takiguchi, Y. Ariki, S. Duffner, C. Garcia, Kobe University & LIRIS, 2014

We investigate the recognition of speech produced by a person with an articulation disorder resulting from athetoid cerebral palsy. The articulation of the first spoken words tends to become unstable due to strain on the speech muscles, which degrades speech recognition performance. Therefore, we propose a robust feature extraction method using a convolutive bottleneck network (CBN) instead of the well-known MFCCs. The CBN stacks various types of layers, such as a convolution layer, a subsampling layer, and a bottleneck layer, forming a deep network. By applying the CBN to feature extraction for dysarthric speech, we expect the CBN to reduce the influence of the unstable speaking style caused by the athetoid symptoms. Furthermore, we also adopt dropout in the output layer, since labels automatically assigned to dysarthric speech are usually unreliable due to the ambiguous phonemes uttered by a person with a speech disorder. We confirmed the effectiveness of our approach through word recognition experiments, in which the CBN-based feature extraction method outperformed the conventional feature extraction method.
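The forward pass of such a network can be sketched as follows. This is a minimal NumPy illustration of the layer types the abstract names (convolution, subsampling, bottleneck, and an output layer with dropout); all dimensions, kernel counts, activations, and the random weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, kernels):
    # Valid 2-D convolution of a single-channel input with a bank of kernels,
    # followed by a ReLU (activation choice is an assumption for illustration).
    kh, kw = kernels.shape[1], kernels.shape[2]
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((kernels.shape[0], oh, ow))
    for k, w in enumerate(kernels):
        for i in range(oh):
            for j in range(ow):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return np.maximum(out, 0)

def subsample(x, p=2):
    # Max-pooling over non-overlapping p x p regions, per feature map.
    c, h, w = x.shape
    return x[:, :h // p * p, :w // p * p].reshape(c, h // p, p, w // p, p).max(axis=(2, 4))

def dropout(x, rate, train=True):
    # Inverted dropout: zero units with probability `rate` at training time
    # and rescale the survivors, so no change is needed at test time.
    if not train:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

# Toy input "mel-map": 39 mel bins x 11 frames (sizes are hypothetical).
x = rng.standard_normal((39, 11))

h = conv2d(x, rng.standard_normal((8, 5, 5)) * 0.1)          # convolution layer
h = subsample(h)                                             # subsampling layer
f = h.reshape(-1)
bottleneck = np.tanh(rng.standard_normal((30, f.size)) @ f)  # narrow bottleneck layer
out = dropout(rng.standard_normal((54, 30)) @ bottleneck,    # output layer with dropout
              rate=0.5)

print(bottleneck.shape, out.shape)
```

At test time the trained bottleneck activations, not the output layer, would serve as the extracted features handed to the recognizer in place of MFCCs.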