TY - CONF
T1 - Identifying speech input errors through audio-only interaction
AU - Hong, Jonggi
AU - Findlater, Leah
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/4/20
Y1 - 2018/4/20
N2 - Speech has become an increasingly common means of text input, from smartphones and smartwatches to voice-based intelligent personal assistants. However, reviewing the recognized text to identify and correct errors is a challenge when no visual feedback is available. In this paper, we first quantify and describe the speech recognition errors that users are prone to miss, and investigate how to better support this error identification task by manipulating pauses between words, speech rate, and speech repetition. To achieve these goals, we conducted a series of four studies. Study 1, an in-lab study, showed that participants missed identifying over 50% of speech recognition errors when listening to audio output of the recognized text. Building on this result, Studies 2 to 4 were conducted using an online crowdsourcing platform and showed that adding a pause between words improves error identification compared to no pause, the ability to identify errors degrades with higher speech rates (300 WPM), and repeating the speech output does not improve error identification. We derive implications for the design of audio-only speech dictation.
KW - Audio-only interaction
KW - Error correction
KW - Eyes-free use
KW - Speech dictation
KW - Synthesized speech
KW - Text entry
UR - http://www.scopus.com/inward/record.url?scp=85046956521&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046956521&partnerID=8YFLogxK
U2 - 10.1145/3173574.3174141
DO - 10.1145/3173574.3174141
M3 - Conference contribution
AN - SCOPUS:85046956521
T3 - Conference on Human Factors in Computing Systems - Proceedings
BT - CHI 2018 - Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
T2 - 2018 CHI Conference on Human Factors in Computing Systems, CHI 2018
Y2 - 21 April 2018 through 26 April 2018
ER -