Identifying speech input errors through audio-only interaction

Jonggi Hong, Leah Findlater

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

Speech has become an increasingly common means of text input, from smartphones and smartwatches to voice-based intelligent personal assistants. However, reviewing the recognized text to identify and correct errors is a challenge when no visual feedback is available. In this paper, we first quantify and describe the speech recognition errors that users are prone to miss, and investigate how to better support this error identification task by manipulating pauses between words, speech rate, and speech repetition. To achieve these goals, we conducted a series of four studies. Study 1, an in-lab study, showed that participants missed identifying over 50% of speech recognition errors when listening to audio output of the recognized text. Building on this result, Studies 2 to 4 were conducted using an online crowdsourcing platform and showed that adding a pause between words improves error identification compared to no pause, the ability to identify errors degrades with higher speech rates (300 WPM), and repeating the speech output does not improve error identification. We derive implications for the design of audio-only speech dictation.
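The pause and rate manipulations described above could be realized in a standard text-to-speech pipeline via SSML. The sketch below is a minimal illustration, not the authors' own implementation: it assumes an SSML-compatible synthesizer and a nominal 200 WPM baseline speaking rate, and shows one way to insert a fixed break between consecutive words of the recognized text and set an overall speaking rate.

    # Minimal sketch (assumed SSML-compatible TTS engine; not from the paper):
    # insert a pause between consecutive words of the recognized text and set
    # the speaking rate before handing the markup to a synthesizer.

    def build_ssml(recognized_text: str, pause_ms: int = 200, rate_wpm: int = 200) -> str:
        """Wrap recognized text in SSML with per-word breaks and a target rate.

        rate_wpm is converted to a percentage relative to an assumed
        ~200 WPM engine default; adjust the baseline for the engine used.
        """
        rate_pct = int(rate_wpm / 200 * 100)  # assumed 200 WPM baseline
        words = recognized_text.split()
        body = f'<break time="{pause_ms}ms"/>'.join(words)
        return f'<speak><prosody rate="{rate_pct}%">{body}</prosody></speak>'

    if __name__ == "__main__":
        # e.g., a 250 ms pause between words at the 300 WPM rate tested in Study 3
        print(build_ssml("the quick brown fox", pause_ms=250, rate_wpm=300))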

Original language: English
Title of host publication: CHI 2018 - Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems
Subtitle of host publication: Engage with CHI
ISBN (Electronic): 9781450356206, 9781450356213
State: Published - 20 Apr 2018
Event: 2018 CHI Conference on Human Factors in Computing Systems, CHI 2018 - Montreal, Canada
Duration: 21 Apr 2018 – 26 Apr 2018

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings
Volume: 2018-April

Conference

Conference: 2018 CHI Conference on Human Factors in Computing Systems, CHI 2018
Country/Territory: Canada
City: Montreal
Period: 21/04/18 – 26/04/18

Keywords

  • Audio-only interaction
  • Error correction
  • Eyes-free use
  • Speech dictation
  • Synthesized speech
  • Text entry
