Abstract

Concatenative synthesis, the dominant Text-to-Speech (TTS) method, often produces audible discontinuities due to mismatched phonemic and prosodic contexts. Previous linear cross-fading approaches improved smoothness but generated unnatural formant trajectories. This thesis proposes a unit-dependent, parameterized cross-fading algorithm guided by a perceptual cost function that predicts speech quality from acoustic distance measures. Using a custom corpus and perceptual experiments, we show that output quality depends on the shape of the formant trajectory across the vowel and correlates with both the absolute acoustic distance and its derivative. The results demonstrate the feasibility of perceptual-cost-based optimization for natural-sounding TTS, advancing speech synthesis beyond traditional concatenation techniques.
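The abstract describes the approach only at a high level. As a rough illustration, the sketch below shows what a parameterized (non-linear) cross-fade between two concatenated units and a toy perceptual cost over formant trajectories could look like. The power-curve fade family, all function names, and the cost weights are illustrative assumptions, not the thesis's actual algorithm or fitted cost function.

```python
import numpy as np

def crossfade_weights(n, alpha=1.0):
    """Parameterized fade-in curve over n samples.

    alpha = 1.0 gives the linear ramp the abstract cites as the
    baseline; other values bend the curve, standing in for the
    unit-dependent parameterization. The power-curve family is an
    illustrative assumption.
    """
    t = np.linspace(0.0, 1.0, n)
    return t ** alpha

def crossfade(unit_a, unit_b, overlap, alpha=1.0):
    """Join two speech-unit signals with a parameterized cross-fade.

    The last `overlap` samples of unit_a are blended with the first
    `overlap` samples of unit_b using the curve above.
    """
    w = crossfade_weights(overlap, alpha)
    blended = (1.0 - w) * unit_a[-overlap:] + w * unit_b[:overlap]
    return np.concatenate([unit_a[:-overlap], blended, unit_b[overlap:]])

def trajectory_cost(f_a, f_b, alpha, w_dist=1.0, w_slope=1.0):
    """Toy perceptual cost of a cross-faded formant trajectory.

    Blends two formant tracks (Hz) with the same fade curve, then
    scores the join by the absolute acoustic distance of the unit
    pair and the derivative of the blended trajectory. The abstract
    reports that quality correlates with both quantities; the linear
    combination and weights here are assumptions for illustration.
    """
    w = crossfade_weights(len(f_a), alpha)
    f_blend = (1.0 - w) * f_a + w * f_b
    dist = np.abs(f_a - f_b)          # absolute distance of the unit pair
    slope = np.abs(np.diff(f_blend))  # derivative of the blended trajectory
    return w_dist * dist.mean() + w_slope * slope.mean()

# Example: choose the fade shape that minimizes the toy cost for one join.
f_a = 500.0 + 50.0 * np.linspace(0.0, 1.0, 200)  # toy F1 track of unit A (Hz)
f_b = 650.0 - 30.0 * np.linspace(0.0, 1.0, 200)  # toy F1 track of unit B (Hz)
best_alpha = min([0.5, 1.0, 2.0, 4.0],
                 key=lambda a: trajectory_cost(f_a, f_b, a))
```

In this reading, the fade shape becomes a free parameter selected per unit pair by minimizing a cost that penalizes abrupt formant movement, which mirrors the abstract's account of perceptual-cost-guided optimization without claiming its exact form.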
