TY  - THES
T1  - Perceptual cost function for cross-fading based concatenation
AU  - Miao, Qi
ED  - van Santen, Jan
N1  - Advisor: van Santen, Jan
PB  - Oregon Health and Science University
AD  - Oregon Health and Science University
PY  - 2012
DA  - 2012
ID  - 717
DO  - 10.6083/M4ZG6Q7F
KW  - Communication Aids for Disabled
KW  - Auditory Perception
KW  - speech synthesis
AB  - Concatenative synthesis, the dominant Text-to-Speech (TTS) method, often produces audible discontinuities due to mismatched phonemic and prosodic contexts. Previous linear cross-fading approaches improved smoothness but generated unnatural formant trajectories. This thesis proposes a unit-dependent, parameterized cross-fading algorithm guided by a perceptual cost function predicting speech quality from acoustic distance measures. Using a custom corpus and perceptual experiments, we show that output quality depends on formant trajectory shape across the vowel and correlates with both absolute distance and its derivative. Results demonstrate feasibility of perceptual cost-based optimization for natural-sounding TTS, advancing speech synthesis beyond traditional concatenation techniques.
L1  - https://digitalcollections.ohsu.edu/record/717/files/720_etd.pdf
UR  - https://digitalcollections.ohsu.edu/record/717/files/720_etd.pdf
ER  - 