Abstract
In this work, we propose a modular multi-modal architecture to automatically detect Alzheimer's disease using the dataset provided in the ADReSSo challenge. The architecture uses both acoustic and text-based features. Because the dataset provides only audio samples of controls and patients, we use Google's cloud-based speech-to-text API to transcribe the audio files automatically and extract text-based features from the transcripts. Several kinds of audio features are extracted using standard packages. The proposed approach consists of four networks: the C-Attention-Acoustic network (acoustic features only), the C-Attention-FT network (linguistic features only), the C-Attention-Embedding network (language embeddings and acoustic embeddings), and a unified network that uses all of these features. The architecture combines attention networks with a convolutional neural network (the C-Attention network) to process these features. Experimental results show that the C-Attention-Unified network with linguistic features and x-vector embeddings performs best on the test dataset, with an accuracy of 80.28% and an F1 score of 0.825.
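As a reading aid for the two reported metrics, the following minimal sketch shows how accuracy and F1 are computed from a binary confusion matrix. The function name `binary_metrics` and the four counts are hypothetical: the abstract gives neither the confusion matrix nor the test-set size, so the counts were chosen only because they are arithmetically consistent with the reported 80.28% and 0.825. They should not be read as the authors' actual results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for a binary classifier.

    tp/fn count positive-class (here: Alzheimer's) samples that were
    classified correctly/incorrectly; tn/fp do the same for controls.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# HYPOTHETICAL counts (not taken from the source): 71 samples in total,
# picked only so the arithmetic matches the figures quoted in the abstract.
metrics = binary_metrics(tp=33, fp=12, fn=2, tn=24)

print(f"accuracy = {metrics['accuracy']:.2%}")  # accuracy = 80.28%
print(f"F1       = {metrics['f1']:.3f}")        # F1       = 0.825
```

Under these assumed counts, recall is much higher than precision, which illustrates why an F1 score can exceed accuracy when a classifier favors the positive class.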
| Original language | English |
|---|---|
| Title of host publication | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 |
| Pages | 4196-4200 |
| Number of pages | 5 |
| ISBN (Electronic) | 9781713836902 |
| State | Published - 2021 |
| Event | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic. Duration: 30 Aug 2021 → 3 Sep 2021 |
Publication series
| Name | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
|---|---|
| Volume | 6 |
| ISSN (Print) | 2308-457X |
| ISSN (Electronic) | 2958-1796 |
Conference
| Conference | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 |
|---|---|
| Country/Territory | Czech Republic |
| City | Brno |
| Period | 30/08/21 → 3/09/21 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- Acoustic feature
- Alzheimer's disease
- CNN-attention network
- Linguistic feature
- Multi-modal approach
Fingerprint
Dive into the research topics of 'Modular multi-modal attention network for Alzheimer's disease detection using patient audio and language data'. Together they form a unique fingerprint.