Article type: Research Article
Authors: Worasawate, Denchai [a] | Asawaponwiput, Warisara [a] | Yoshimura, Natsue [b] | Intarapanich, Apichart [c] | Surangsrirat, Decho [d],*
Affiliations: [a] Department of Electrical Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand | [b] Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan | [c] Educational Technology Team, National Electronics and Computer Technology Center, Pathum Thani, Thailand | [d] Assistive Technology and Medical Devices Research Center, National Science and Technology Development Agency, Pathum Thani, Thailand
Correspondence: [*] Corresponding author: Decho Surangsrirat, Assistive Technology and Medical Devices Research Center, National Science and Technology Development Agency, Pathum Thani, Thailand. E-mail: [email protected].
Abstract: BACKGROUND: Parkinson’s disease (PD) is a long-term neurodegenerative disease of the central nervous system. The current diagnosis depends on clinical observation and on the abilities and experience of a trained specialist. One of the symptoms that affects most patients is voice impairment. OBJECTIVE: Voice samples are non-invasive data that can be collected remotely for diagnosis and disease progression monitoring. In this study, we analyzed smartphone voice recordings as a possible medical self-diagnosis tool, using only a one-second voice recording. We used data from the mPower study, one of the largest mobile PD studies. METHODS: A total of 29,798 ten-second smartphone voice recordings from 4,051 participants were used for the analysis. The recordings were of sustained phonation, with participants saying /aa/ for ten seconds into an iPhone microphone. A dataset comprising 385,143 short one-second audio samples was generated from the original ten-second voice recordings. The samples were converted to spectrograms using a short-time Fourier transform. CNN models were then applied to classify the samples. RESULTS: Classification accuracies of the proposed method with LeNet-5, ResNet-50, and VGGNet-16 are 97.7 ± 0.1%, 98.6 ± 0.2%, and 99.3 ± 0.1%, respectively. CONCLUSIONS: We achieve a respectable classification performance using a generalized approach on a dataset with a large number of samples. The result emphasizes that an analysis based on a one-second clip recorded on a smartphone could be a promising non-invasive and remotely available PD biomarker.
Keywords: PD voice, audio classification, convolutional neural network, mPower study
DOI: 10.3233/THC-220386
Journal: Technology and Health Care, vol. 31, no. 2, pp. 705-718, 2023
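
The preprocessing summarized in the abstract's METHODS (cutting each recording into one-second samples, then converting each sample to a spectrogram with a short-time Fourier transform) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the sampling rate, the segmentation stride, or the STFT window and hop sizes, so `sample_rate`, `hop_seconds`, `frame_len`, and `hop_len` below are hypothetical values, and the function names are made up for illustration. The transform is a plain standard-library DFT chosen for readability rather than speed, and the CNN classification step is not shown.

```python
import cmath
import math


def split_into_clips(signal, sample_rate, clip_seconds=1.0, hop_seconds=1.0):
    """Cut a recording into fixed-length clips, dropping any trailing partial clip.

    hop_seconds is an assumption; values below clip_seconds give overlapping clips.
    """
    clip_len = int(round(clip_seconds * sample_rate))
    hop = int(round(hop_seconds * sample_rate))
    return [signal[start:start + clip_len]
            for start in range(0, len(signal) - clip_len + 1, hop)]


def stft_magnitude(clip, frame_len=64, hop_len=32):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    Returns one list per time frame, each with frame_len // 2 + 1 frequency bins.
    frame_len and hop_len are illustrative, not values taken from the paper.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    # Precompute the DFT basis for the non-negative frequency bins.
    basis = [[cmath.exp(-2j * math.pi * k * n / frame_len)
              for n in range(frame_len)]
             for k in range(n_bins)]
    spectrogram = []
    for start in range(0, len(clip) - frame_len + 1, hop_len):
        frame = [clip[start + n] * window[n] for n in range(frame_len)]
        spectrogram.append([abs(sum(x * b for x, b in zip(frame, row)))
                            for row in basis])
    return spectrogram


if __name__ == "__main__":
    sample_rate = 800  # hypothetical and deliberately low to keep the demo fast
    tone_hz = 100      # synthetic stand-in for a sustained /aa/ phonation
    recording = [math.sin(2 * math.pi * tone_hz * t / sample_rate)
                 for t in range(10 * sample_rate)]
    clips = split_into_clips(recording, sample_rate)
    spec = stft_magnitude(clips[0])
    peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
    print(len(clips), "clips;", len(spec), "frames x", len(spec[0]), "bins;",
          "peak near", peak_bin * sample_rate / 64, "Hz")
```

Each resulting time-by-frequency array is the kind of two-dimensional input an image-style CNN would take. In practice an FFT-based routine from a numerical library would replace the explicit DFT loop.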