TY - CHAP
T1 - Exploring User Behavior and Validation Proficiency in Assessing Responses From a Conversational Agent
AU - Huang, Jiayin
AU - Hong, Jonggi
N1 - Publisher Copyright:
© 2025. Published by AHFE.
PY - 2025
Y1 - 2025
N2 - With the rapid development of large language models (LLMs) such as ChatGPT, conversational agents are becoming popular alternatives to traditional search engines. However, users’ ability to distinguish accurate from inaccurate replies generated by conversational agents, along with their behavior in validating these replies, remains unclear. This study examines users’ behavior and their ability to detect incorrect responses from ChatGPT, both with and without Google search results for validation, through a user study with 15 participants. Participants assessed ChatGPT’s answers to questions about Alzheimer’s Disease; the answers had an accuracy rate of 93.33% (28/30) and an error rate of 6.67% (2/30). Interestingly, when Google search results were available, participants tended to view both correct and incorrect responses favorably. These findings provide insights into the strategies users employ to validate conversational agents’ responses, highlighting differences in behavior with and without the assistance of search engines.
AB - With the rapid development of large language models (LLMs) such as ChatGPT, conversational agents are becoming popular alternatives to traditional search engines. However, users’ ability to distinguish accurate from inaccurate replies generated by conversational agents, along with their behavior in validating these replies, remains unclear. This study examines users’ behavior and their ability to detect incorrect responses from ChatGPT, both with and without Google search results for validation, through a user study with 15 participants. Participants assessed ChatGPT’s answers to questions about Alzheimer’s Disease; the answers had an accuracy rate of 93.33% (28/30) and an error rate of 6.67% (2/30). Interestingly, when Google search results were available, participants tended to view both correct and incorrect responses favorably. These findings provide insights into the strategies users employ to validate conversational agents’ responses, highlighting differences in behavior with and without the assistance of search engines.
KW - Artificial intelligence
KW - Computing methodologies
KW - Empirical studies in HCI
KW - Human-centered computing
KW - Human-computer interaction (HCI)
UR - https://www.scopus.com/pages/publications/105031127606
U2 - 10.54941/ahfe1006707
DO - 10.54941/ahfe1006707
M3 - Chapter
AN - SCOPUS:105031127606
T3 - Applied Human Factors and Ergonomics International
SP - 139
EP - 148
BT - Applied Human Factors and Ergonomics International
PB - AHFE International
ER -