| Deep learning for minimum mean-square error approaches to speech enhancement | 10 |
| Automatic segmentation of speech articulators from real-time midsagittal MRI based on supervised learning | 9 |
| Improving multilingual speech emotion recognition by combining acoustic features in a three-layer model | 9 |
| Dysarthric speech classification from coded telephone speech using glottal features | 8 |
| End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition | 8 |
| Discrimination of L2 Greek vowel contrasts: Evidence from learners with Arabic L1 background | 7 |
| Bone-conducted speech enhancement using deep denoising autoencoder | 6 |
| Updating the Silent Speech Challenge benchmark with deep learning | 6 |
| Measuring communication difficulty through effortful speech production during conversation | 6 |
| Automatic speech emotion recognition using an optimal combination of features based on EMD-TKEO | 6 |
| The sound of im/politeness | 6 |
| ProDis: A dialectometric tool for acoustic prosodic data | 6 |
| Multi-domain adversarial training of neural network acoustic models for distant speech recognition | 5 |
| Deep neural network based i-vector mapping for speaker verification using short utterances | 5 |
| Data augmentation using generative adversarial networks for robust speech recognition | 5 |
| Speaker models for monitoring Parkinson's disease progression considering different communication channels and acoustic conditions | 5 |
| Single-channel multi-talker speech recognition with permutation invariant training | 5 |
| Automatic lexical stress and pitch accent detection for L2 English speech using multi-distribution deep neural networks | 5 |
| Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech | 5 |
| Speaker recognition from whispered speech: A tutorial survey and an application of time-varying linear prediction | 5 |
| Front-end speech enhancement for commercial speaker verification systems | 4 |
| Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis | 4 |
| Speaker recognition using PCA-based feature transformation | 4 |
| Influence of visual cues on head and eye movements during listening tasks in multi-talker audiovisual environments with animated characters | 4 |
| New insights on the optimality of parameterized Wiener filters for speech enhancement applications | 4 |
| Hierarchical sparse coding framework for speech emotion recognition | 4 |
| Estimation of the glottal source from coded telephone speech using deep neural networks | 4 |
| Automatic quantitative analysis of spontaneous aphasic speech | 4 |
| Time-domain speech enhancement using generative adversarial networks | 4 |
| Semi-parametric joint detection and estimation for speech enhancement based on minimum mean square error | 4 |
| Speech emotion recognition based on DNN-decision tree SVM model | 4 |
| OPENGLOT - An open environment for the evaluation of glottal inverse filtering | 4 |
| The relative contribution of computer assisted prosody training vs. instructor based prosody teaching in developing speaking skills by interpreter trainees: An experimental study | 4 |
| Joint dictionary learning using a new optimization method for single-channel blind source separation | 3 |
| Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target | 3 |
| Acoustic classification of Russian plain and palatalized sibilant fricatives: Spectral vs. cepstral measures | 3 |
| The impact of the Lombard effect on audio and visual speech recognition systems | 3 |
| Entrainment profiles: Comparison by gender, role, and feature set | 3 |
| Deep-learning-based audio-visual speech enhancement in presence of Lombard effect | 3 |
| Golden speaker builder - An interactive tool for pronunciation training | 3 |
| Supervised single-channel speech dereverberation and denoising using a two-stage model based sparse representation | 3 |
| Significance of phase in single frequency filtering outputs of speech signals | 3 |
| Improved embedded pre-whitening subspace approach for enhancing speech contaminated by colored noise | 3 |
| Unsupervised single channel speech separation based on optimized subspace separation | 3 |
| Speech enhancement using MMSE estimation of amplitude and complex speech spectral coefficients under phase-uncertainty | 3 |
| Computer-vision analysis reveals facial movements made during Mandarin tone production align with pitch trajectories | 3 |
| Distributed-microphones based in-vehicle speech enhancement via sparse and low-rank spectrogram decomposition | 3 |
| Towards automatic assessment of spontaneous spoken English | 3 |
| On the issues of intra-speaker variability and realism in speech, speaker, and language recognition tasks | 3 |
| Speech-driven animation with meaningful behaviors | 3 |