Semi-supervised learning with generative models for pathological speech classification

Trinh, Nam ORCID: 0000-0003-0307-3793 (2021) Semi-supervised learning with generative models for pathological speech classification. Master of Science thesis, Dublin City University.

Abstract

Recent work in pathological speech classification has employed supervised learning algorithms such as neural networks and support vector machines to classify speech as healthy or pathological. A challenge in applying such machine learning techniques to pathological speech classification is the labelled data shortage problem. While labelled data are expensive and scarce, unlabelled data are inexpensive and plentiful. Acquiring labelled data often entails significant human effort and time-consuming experimental design. Further, in medical applications, privacy and ethical issues must be addressed wherever patient data are collected. In this thesis, we investigate a semi-supervised learning (SSL) approach that employs a generative model to incorporate both labelled and unlabelled data into the training process. The generative models explored are a generative adversarial network (GAN) and a variational autoencoder (VAE). To employ a GAN, we modify its traditional discriminator so that it not only differentiates between real and fake speech samples but also classifies a given sample as healthy or pathological. To employ a VAE, we first pre-train the VAE on unlabelled data and subsequently incorporate the pre-trained encoder into a classifier that is trained on labelled data. We test our approach on three commonly used pathological speech datasets: the Spanish Parkinson’s Diseases Dataset (SPDD), the Saarbrucken Voice Database (SVD) and the Arabic Voice Pathology Database (AVPD). We compare the performance of the GAN- and VAE-based approaches, trained on both labelled and unlabelled data, with a traditional supervised approach based on a convolutional neural network (CNN) trained only on labelled data. We observe that our SSL-based approach achieves higher accuracy than the baseline CNN trained only on labelled pathological speech data.
This promising result shows that our approach has the potential to alleviate the labelled data shortage problem in pathological speech classification and other medical applications where labelled data acquisition is challenging.
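The abstract does not give the exact form of the modified discriminator. One widely used way to make a GAN discriminator double as a classifier is the "K+1" formulation of Salimans et al. (2016, "Improved Techniques for Training GANs"): the discriminator outputs K real-class scores plus one "fake" score. The sketch below assumes that formulation with K = 2 (healthy, pathological) and computes the discriminator's loss value in pure Python. The class indices and the names `discriminator_loss` and `predict` are hypothetical illustrations, not the thesis's code.

```python
import math

# Hypothetical output indices for a 3-way (K + 1) discriminator; the thesis
# abstract does not specify how its discriminator encodes the classes.
HEALTHY, PATHOLOGICAL, FAKE = 0, 1, 2
EPS = 1e-12  # guards against log(0)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _mean(values):
    """Mean of an iterable; 0.0 for an empty batch."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def discriminator_loss(labelled, unlabelled, generated):
    """Assumed semi-supervised K+1 discriminator loss (K = 2 real classes).

    labelled:   list of (logits, label), label in {HEALTHY, PATHOLOGICAL}
    unlabelled: list of logits for real but unlabelled speech samples
    generated:  list of logits for samples produced by the generator
    """
    # Supervised term: cross-entropy against the true healthy/pathological label.
    supervised = _mean(-math.log(softmax(z)[y] + EPS) for z, y in labelled)
    # Unsupervised term on real data: push P(fake) down for unlabelled samples.
    real = _mean(-math.log(1.0 - softmax(z)[FAKE] + EPS) for z in unlabelled)
    # Unsupervised term on generated data: push P(fake) up for generator output.
    fake = _mean(-math.log(softmax(z)[FAKE] + EPS) for z in generated)
    return supervised + real + fake


def predict(logits):
    """At test time, classify using only the two real-class outputs."""
    p = softmax(logits)
    return HEALTHY if p[HEALTHY] >= p[PATHOLOGICAL] else PATHOLOGICAL


if __name__ == "__main__":
    # A discriminator that labels, detects real and detects fake correctly...
    good = discriminator_loss([([5.0, 0.0, 0.0], HEALTHY)],
                              [[3.0, 3.0, -5.0]], [[-5.0, -5.0, 5.0]])
    # ...versus one that gets all three wrong.
    bad = discriminator_loss([([0.0, 5.0, 0.0], HEALTHY)],
                             [[-5.0, -5.0, 5.0]], [[3.0, 3.0, -5.0]])
    print(good < bad)
```

The point of this design is that unlabelled recordings still contribute a training signal through the real-versus-fake terms, while only the small labelled set feeds the supervised term. In practice the logits would come from a trainable network optimized with an autodiff framework, and the generator would be trained with its own objective, which is not shown here.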

Metadata

Item Type:	Thesis (Master of Science)
Date of Award:	November 2021
Refereed:	No
Supervisor(s):	O'Brien, Darragh
Subjects:	Computer Science > Artificial intelligence; Computer Science > Machine learning; Engineering > Signal processing
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders:	Science Foundation Ireland (grant No. 17/RC/PHD3488); ADAPT Centre for Digital Content Technology, which is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded under the European Regional Development Fund.
ID Code:	25955
Deposited On:	27 Oct 2021 16:17 by Darragh O'Brien. Last modified: 13 Jan 2022 10:36

Documents

Full text available as:

Nam_Trinh_MSc_thesis_FINAL_signed.pdf (PDF, 3MB)

DORAS | DCU Research Repository
