Abstract

Concatenative synthesis, the dominant Text-to-Speech (TTS) method, often produces audible discontinuities due to mismatched phonemic and prosodic contexts. Previous linear cross-fading approaches improved smoothness but generated unnatural formant trajectories. This thesis proposes a unit-dependent, parameterized cross-fading algorithm guided by a perceptual cost function that predicts speech quality from acoustic distance measures. Using a custom corpus and perceptual experiments, we show that output quality depends on the shape of the formant trajectory across the vowel and correlates with both the absolute acoustic distance and its derivative. The results demonstrate the feasibility of perceptual-cost-based optimization for natural-sounding TTS, advancing speech synthesis beyond traditional concatenation techniques.
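The abstract describes the approach only at a high level. As a rough illustration, the sketch below shows what a parameterized (non-linear) cross-fade between two concatenated units and a toy perceptual cost over formant trajectories could look like. The power-curve fade family, all function names, and the cost weights are illustrative assumptions, not the thesis's actual algorithm or fitted cost function.

```python
import numpy as np

def crossfade_weights(n, alpha=1.0):
    """Parameterized fade-in curve over n samples.

    alpha = 1.0 gives the linear ramp the abstract cites as the
    baseline; other values bend the curve, standing in for the
    unit-dependent parameterization. The power-curve family is an
    illustrative assumption.
    """
    t = np.linspace(0.0, 1.0, n)
    return t ** alpha

def crossfade(unit_a, unit_b, overlap, alpha=1.0):
    """Join two speech-unit signals with a parameterized cross-fade.

    The last `overlap` samples of unit_a are blended with the first
    `overlap` samples of unit_b using the curve above.
    """
    w = crossfade_weights(overlap, alpha)
    blended = (1.0 - w) * unit_a[-overlap:] + w * unit_b[:overlap]
    return np.concatenate([unit_a[:-overlap], blended, unit_b[overlap:]])

def trajectory_cost(f_a, f_b, alpha, w_dist=1.0, w_slope=1.0):
    """Toy perceptual cost of a cross-faded formant trajectory.

    Blends two formant tracks (Hz) with the same fade curve, then
    scores the join by the absolute acoustic distance of the unit
    pair and the derivative of the blended trajectory. The abstract
    reports that quality correlates with both quantities; the linear
    combination and weights here are assumptions for illustration.
    """
    w = crossfade_weights(len(f_a), alpha)
    f_blend = (1.0 - w) * f_a + w * f_b
    dist = np.abs(f_a - f_b)          # absolute distance of the unit pair
    slope = np.abs(np.diff(f_blend))  # derivative of the blended trajectory
    return w_dist * dist.mean() + w_slope * slope.mean()

# Example: choose the fade shape that minimizes the toy cost for one join.
f_a = 500.0 + 50.0 * np.linspace(0.0, 1.0, 200)  # toy F1 track of unit A (Hz)
f_b = 650.0 - 30.0 * np.linspace(0.0, 1.0, 200)  # toy F1 track of unit B (Hz)
best_alpha = min([0.5, 1.0, 2.0, 4.0],
                 key=lambda a: trajectory_cost(f_a, f_b, a))
```

In this reading, the fade shape becomes a free parameter selected per unit pair by minimizing a cost that penalizes abrupt formant movement, which mirrors the abstract's account of perceptual-cost-guided optimization without claiming its exact form.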
