TY - GEN
T1 - Using conditional adversarial networks for intelligibility improvement for dysarthric speech and laryngectomees
AU - Dinh, Tuan
AU - Kain, Alexander
AB - We explored voice conversion systems to improve the speech intelligibility of 1) dysarthric speakers and 2) laryngectomees. In the first case, we explore the potential of conditional generative adversarial networks (cGANs) to learn the mapping from habitual speech to clear speech. We evaluated the performance of cGANs on three tasks: 1) speaker-dependent one-to-one mappings, 2) speaker-independent many-to-one mappings, and 3) speaker-independent many-to-many mappings. In the first task, cGANs outperformed a traditional deep neural network (DNN) mapping in terms of average keyword recall accuracy and the number of speakers with improved intelligibility. In the second task, we showed that, even without clear speech from a speaker, we can significantly improve the intelligibility of that speaker's habitual speech for one of three speakers. In the third and most challenging task, we improved the keyword recall accuracy for two of three speakers. In the second case, we aim to improve the speech of laryngectomees in terms of intelligibility and naturalness. We predict the voicing and degree of voicing for laryngectomees from speech spectra using a deep neural network. We use a logarithmically falling synthetic F0 for statement phrases. Spectra are converted to synthetic target spectra using a cGAN.
AD - Oregon Health and Science University
PB - Oregon Health and Science University
DA - 2020
PY - 2020
DO - 10.6083/4b29b666b
ID - 8295
KW - Speech Intelligibility
KW - Speech Acoustics
KW - Deep Learning
KW - Laryngectomy
KW - laryngectomees
KW - conditional adversarial nets
KW - dysarthric speech
KW - voice conversion
LA - eng
UR - https://digitalcollections.ohsu.edu/record/8295/files/Tuan-Dinh.pdf
UR - https://digitalcollections.ohsu.edu/record/8295/files/Dinh_Presentation.pdf
ER -