TY  - THES
AB  - Concatenative synthesis, the dominant Text-to-Speech (TTS) method, often produces audible discontinuities due to mismatched phonemic and prosodic contexts. Previous linear cross-fading approaches improved smoothness but generated unnatural formant trajectories. This thesis proposes a unit-dependent, parameterized cross-fading algorithm guided by a perceptual cost function predicting speech quality from acoustic distance measures. Using a custom corpus and perceptual experiments, we show that output quality depends on formant trajectory shape across the vowel and correlates with both absolute distance and its derivative. Results demonstrate feasibility of perceptual cost-based optimization for natural-sounding TTS, advancing speech synthesis beyond traditional concatenation techniques.
AD  - Oregon Health and Science University
AU  - Miao, Qi
DA  - 2012
DO  - 10.6083/M4ZG6Q7F
ED  - van Santen, Jan
ID  - 717
KW  - Communication Aids for Disabled
KW  - Auditory Perception
KW  - speech synthesis
L1  - https://digitalcollections.ohsu.edu/record/717/files/720_etd.pdf
LK  - https://digitalcollections.ohsu.edu/record/717/files/720_etd.pdf
PB  - Oregon Health and Science University
PY  - 2012
T1  - Perceptual cost function for cross-fading based concatenation
TI  - Perceptual cost function for cross-fading based concatenation
UR  - https://digitalcollections.ohsu.edu/record/717/files/720_etd.pdf
Y1  - 2012
ER  -