Trinh, Nam ORCID: 0000-0003-0307-3793 (2021) Semi-supervised learning with generative models for pathological speech classification. Master of Science thesis, Dublin City University.
Abstract
Recent work in pathological speech classification has employed supervised learning algorithms such as neural networks and support vector machines to classify speech as healthy or pathological. A challenge in applying such machine learning techniques to pathological speech classification is the shortage of labelled data. While labelled data are expensive and scarce, unlabelled data are inexpensive and plentiful. Labelled data acquisition often entails significant human effort and time-consuming experimental design. Further, for medical applications, privacy and ethical issues must be addressed where patient data are collected.
In this thesis, we investigate a semi-supervised learning (SSL) approach that employs a generative model to incorporate both labelled and unlabelled data into the training process. The generative models explored are a generative adversarial network (GAN) and a variational autoencoder (VAE). To employ a GAN, we modify its traditional discriminator so that it not only differentiates between real and fake speech samples but also classifies the given sample as healthy or pathological. To employ a VAE, we first pre-train the VAE with unlabelled data and subsequently incorporate the pre-trained encoder into a classifier to be trained on labelled data.
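The modified discriminator described above can be illustrated with a minimal sketch. It assumes one common formulation of a semi-supervised GAN, in which the discriminator emits three logits (healthy, pathological, fake); the abstract does not state that the thesis uses this exact formulation. The class layout, the function names and the equal weighting of the three loss terms are all hypothetical choices made for illustration.

```python
import math

# Hypothetical class layout (an assumption, not taken from the thesis):
# two real classes plus one extra class for generated samples.
HEALTHY, PATHOLOGICAL, FAKE = 0, 1, 2


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(labelled, unlabelled, generated):
    """Sketch of a three-part discriminator loss.

    labelled:   list of (logits, class_index) pairs for labelled real speech
    unlabelled: list of logits for unlabelled real speech
    generated:  list of logits for samples produced by the generator
    """
    # Labelled real samples: cross-entropy against the true class.
    supervised = -sum(math.log(softmax(z)[y]) for z, y in labelled) / len(labelled)

    # Unlabelled real samples: should fall in either real class,
    # so we maximise the total probability of "not fake".
    real = -sum(
        math.log(softmax(z)[HEALTHY] + softmax(z)[PATHOLOGICAL]) for z in unlabelled
    ) / len(unlabelled)

    # Generated samples: should be assigned to the fake class.
    fake = -sum(math.log(softmax(z)[FAKE]) for z in generated) / len(generated)

    # Equal weighting of the three terms is an illustrative assumption.
    return supervised + real + fake


if __name__ == "__main__":
    # Toy logits standing in for the output of a real discriminator network.
    loss = discriminator_loss(
        labelled=[([2.0, 0.5, -1.0], HEALTHY)],
        unlabelled=[[0.3, 1.2, -0.5]],
        generated=[[-0.2, 0.1, 1.5]],
    )
    print(f"discriminator loss: {loss:.4f}")
```

In this formulation the unlabelled samples contribute only to the real-versus-fake part of the loss, while the labelled samples also supply the healthy-versus-pathological signal. That is one way unlabelled speech could influence the classifier's training.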
We test our approach using three commonly used pathological speech datasets: the Spanish Parkinson’s Diseases Dataset (SPDD), the Saarbrücken Voice Database (SVD) and the Arabic Voice Pathology Database (AVPD). We compare the performance of the GAN- and VAE-based approaches trained on both labelled and unlabelled data with a traditional supervised approach based on a convolutional neural network (CNN) trained only on labelled data.
We observe that our SSL-based approach leads to an accuracy gain compared to a baseline CNN trained only on labelled pathological speech data. This promising result shows that our approach has the potential to alleviate the labelled data shortage problem in pathological speech classification and other medical applications where labelled data acquisition is challenging.
Metadata
Item Type: | Thesis (Master of Science) |
---|---|
Date of Award: | November 2021 |
Refereed: | No |
Supervisor(s): | O'Brien, Darragh |
Subjects: | Computer Science > Artificial intelligence; Computer Science > Machine learning; Engineering > Signal processing |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
Funders: | Science Foundation Ireland under grant No. 17/RC/PHD3488; ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded under the European Regional Development Fund |
ID Code: | 25955 |
Deposited On: | 27 Oct 2021 16:17 by Darragh O'Brien. Last Modified 13 Jan 2022 10:36 |
Documents
Full text available as: PDF (3MB)