| Decision tree SVM model with Fisher feature selection for speech emotion recognition | 11 |
| Advanced recurrent network-based hybrid acoustic models for low resource speech recognition | 6 |
| Automatic segmentation of infant cry signals using hidden Markov models | 5 |
| Towards end-to-end speech recognition with transfer learning | 5 |
| Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset | 4 |
| Automatic bird species recognition based on birds vocalization | 3 |
| Dual supervised learning for non-native speech recognition | 3 |
| Music detection from broadcast contents using convolutional neural networks with a Mel-scale kernel | 3 |
| Online/offline score informed music signal decomposition: application to minus one | 2 |
| Introducing phonetic information to speaker embedding for speaker verification | 2 |
| Loudness stability of binaural sound with spherical harmonic representation of sparse head-related transfer functions | 2 |
| Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments | 2 |
| Speech intelligibility improvement in noisy reverberant environments based on speech enhancement and inverse filtering | 2 |
| The use of long-term features for GMM- and i-vector-based speaker diarization systems | 2 |
| Relevance-based quantization of scattering features for unsupervised mining of environmental audio | 1 |
| An adaptive a priori SNR estimator for perceptual speech enhancement | 1 |
| Wind noise reduction for a closely spaced microphone array in a car environment | 1 |
| An artificial patient for pure-tone audiometry | 1 |
| Discriminative frequency filter banks learning with neural networks | 1 |
| Search on speech from spoken queries: the Multi-domain International ALBAYZIN 2018 Query-by-Example Spoken Term Detection Evaluation | 1 |
| Piano multipitch estimation using sparse coding embedded deep learning | 1 |
| Enhancement of speech dynamics for voice activity detection using DNN | 1 |
| Replay attack detection with auditory filter-based relative phase features | 1 |
| Robust image-in-audio watermarking technique based on DCT-SVD transform | 1 |
| Learning long-term filter banks for audio source separation and audio scene classification | 1 |
| A new joint CTC-attention-based speech recognition model with multi-level multi-head attention | 1 |
| Web-based environment for user generation of spoken dialog for virtual assistants | 1 |
| Articulation constrained learning with application to speech emotion recognition | 1 |
| ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation | 0 |
| AudioPairBank: towards a large-scale tag-pair-based audio content analysis | 0 |
| Feature trajectory dynamic time warping for clustering of speech segments | 0 |
| Room-localized speech activity detection in multi-microphone smart homes | 0 |
| Signal enhancement for communication systems used by fire fighters | 0 |
| Punctuation-generation-inspired linguistic features for Mandarin prosody generation | 0 |
| From raw audio to a seamless mix: creating an automated DJ system for Drum and Bass | 0 |
| Speech enhancement methods based on binaural cue coding | 0 |
| Latent class model with application to speaker diarization | 0 |
| A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept | 0 |
| Non-parallel dictionary learning for voice conversion using non-negative Tucker decomposition | 0 |
| Robust singer identification of Indian playback singers | 0 |
| ALBAYZIN 2018 spoken term detection evaluation: a multi-domain international evaluation in Spanish | 0 |
| A parametric prosody coding approach for Mandarin speech using a hierarchical prosodic model | 0 |
| Unsupervised adaptation of PLDA models for broadcast diarization | 0 |