Accessible human-error interactions in AI applications for the blind

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citations

Abstract

People who are blind experience challenges when performing everyday tasks that are heavily dependent on vision, such as identifying objects, clothing, and packages of food, as well as entering text on a smartphone using touchscreen keyboards. To overcome these challenges, they typically rely on other sensory channels such as touch, taste, smell, and hearing. For example, they use a screen-reading application such as VoiceOver to identify keys on a touchscreen keyboard, or Braille to recognize everyday objects (e.g., by attaching adhesive Braille labels to them). Machine learning (ML) applications such as automatic speech recognition (ASR) and computer vision can make it easier for this population to carry out such tasks by providing access to the visual world and supporting interaction through preferred modalities. Prior work [2, 34] has shown that speech input is indeed the preferred method of text entry on a mobile device for blind people. Many computer vision applications have also been proposed to enable these users to navigate unfamiliar indoor environments [3, 30], access printed text [5, 25], and identify objects of interest [1, 20, 27] using the built-in cameras on their mobile devices. Although advances in ML promise improved accuracy, ML applications are inherently error prone. ASR systems for English are approaching a word error rate of 5.1% [33], and in computer vision the top-5 error rate of image classifiers is down to 3.7% [26]. Given that these numbers are averages reported on benchmarking datasets, users may face additional challenges when such models are deployed in real-world applications, especially under conditions that deviate from those the models were trained on. For example, ASR errors can arise from speaker variation, disfluency, background noise, word ambiguity, and user mistakes [14, 18]. Object recognition errors, even though models are typically trained on a limited number of objects, can arise from a lack of discriminative characteristics, lighting conditions, and reflective surfaces; they are more likely still for blind users because of their challenges in photo taking, which can lead to background clutter, scale, viewpoints, occlusion, and image quality that differ from the photos taken by sighted users used in training [20, 35].
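As a quick illustration of the two error metrics cited above, the following Python sketch computes word error rate (word-level edit distance divided by reference length) and top-5 error rate (fraction of samples whose true label is missing from the model's five highest-scoring classes). The function names and toy data are illustrative only and are not part of the paper.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def top5_error(true_labels, ranked_predictions) -> float:
    """Fraction of samples whose true label is not among the top five guesses."""
    misses = sum(1 for label, preds in zip(true_labels, ranked_predictions)
                 if label not in preds[:5])
    return misses / len(true_labels)

if __name__ == "__main__":
    # "please call me back" misrecognized as "please all me pack": 2 errors / 4 words
    print(word_error_rate("please call me back", "please all me pack"))  # 0.5
    # one of two images has its true label outside the model's top five guesses
    print(top5_error(["cat", "dog"],
                     [["cat", "tiger", "lynx", "fox", "wolf"],
                      ["wolf", "fox", "cat", "bear", "lion"]]))           # 0.5

On this toy data, a 5.1% WER or a 3.7% top-5 error would correspond to roughly one misrecognized word in twenty and one misclassified image in twenty-seven, which is why the abstract stresses that real-world conditions can push errors well above these benchmark averages.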

Original language: English
Title of host publication: UbiComp/ISWC 2018 - Adjunct Proceedings of the 2018 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2018 ACM International Symposium on Wearable Computers
Pages: 522-528
Number of pages: 7
ISBN (Electronic): 9781450359665
DOIs
State: Published - 8 Oct 2018
Event: 2018 Joint ACM International Conference on Pervasive and Ubiquitous Computing, UbiComp 2018 and 2018 ACM International Symposium on Wearable Computers, ISWC 2018 - Singapore, Singapore
Duration: 8 Oct 2018 - 12 Oct 2018

Publication series

Name: UbiComp/ISWC 2018 - Adjunct Proceedings of the 2018 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2018 ACM International Symposium on Wearable Computers

Conference

Conference: 2018 Joint ACM International Conference on Pervasive and Ubiquitous Computing, UbiComp 2018 and 2018 ACM International Symposium on Wearable Computers, ISWC 2018
Country/Territory: Singapore
City: Singapore
Period: 8/10/18 - 12/10/18

Keywords

  • Accessibility
  • Automatic speech recognizer
  • Machine learning
  • Personalized object recognizer
  • Speech input
