Improving the goodness of pronunciation score by using deep neural networks: single-input classification and sequence-to-sequence classification

Veleta, Moises

doi:10.6083/vbvakq

2018

Abstract

This thesis is a comparative study of the Goodness of Pronunciation (GOP) score, a metric of phone-level pronunciation quality, and traces its formulation and evolution. The effectiveness of the GOP score depends largely on its two main components: a forced aligner, which produces the expected phone segments, and a phone loop, which produces the observed phone segments. Like the derivatives developed since the GOP score was introduced, this thesis explores alternatives to the traditional forced aligner and phone loop, here using several Deep Neural Network (DNN) architectures. These architectures fall into two general classes from deep learning: single-input classifiers and sequence-to-sequence classifiers. Approaches for using DNNs within a GOP score are also proposed. Lastly, a new generalized GOP score, the GOP-ensemble, is proposed, which lets users combine established GOP scores into a new, modular pronunciation score.
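
As an illustration only, and not taken from the thesis, the classic likelihood-ratio form of a GOP score can be sketched as below. The sketch assumes that per-frame log-likelihoods are already available for one segment delimited by a forced aligner, and it approximates the phone loop by picking the best-scoring single phone over that same segment. The function name, the data layout, and the toy values are hypothetical; the DNN-based variants and the GOP-ensemble described in the abstract are not reproduced here.

```python
def gop_score(frame_log_likes, expected_phone):
    """Likelihood-ratio GOP for one aligned segment (illustrative sketch).

    frame_log_likes: list with one dict per frame, mapping each phone
                     label to its log-likelihood for that frame
                     (hypothetical layout, assumed for this example).
    expected_phone:  the phone the forced aligner assigned to the segment.
    """
    n_frames = len(frame_log_likes)
    if n_frames == 0:
        raise ValueError("segment must contain at least one frame")

    # Sum log-likelihoods over the segment for every candidate phone.
    phones = frame_log_likes[0].keys()
    totals = {p: sum(frame[p] for frame in frame_log_likes) for p in phones}

    # Expected phone versus the best competing phone (stand-in for the
    # phone loop), normalized by the segment length in frames.
    expected = totals[expected_phone]
    best = max(totals.values())
    return abs(expected - best) / n_frames


# Toy segment of two frames and two candidate phones (made-up values).
segment = [
    {"ae": -1.0, "eh": -2.0},
    {"ae": -1.5, "eh": -2.5},
]
score_match = gop_score(segment, "ae")      # expected phone is also the best one
score_mismatch = gop_score(segment, "eh")   # a different phone fits better
```

In this formulation a score of zero means the expected phone was also the most likely one, and larger values indicate a greater mismatch between the expected and observed phones.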

Details

Title

Improving the goodness of pronunciation score by using deep neural networks: single-input classification and sequence-to-sequence classification

Creator

Veleta, Moises : Oregon Health and Science University : (0000-0001-6423-0858)

Contributor

Asgari, Meysam : Advisor (Oregon Health and Science University)

Publisher

Oregon Health and Science University

Date

2018

Subjects

Machine Learning
Language Disorders
Voice Quality
Telephone
Neural Networks, Computer
Deep Learning

DOI

https://doi.org/10.6083/vbvakq

Content Type

Thesis

Degree Type

M.S.

Academic Program

Computer Science & Electrical Engineering (sunsetting)

Department

Center for Spoken Language Understanding

School

School of Medicine

Copyright Status

In copyright - single owner

Record ID

7857

Record Created

2023-06-29
