TY - GEN
T1 - Information retrieval evaluation as search simulation
T2 - 7th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2017
AU - Zhang, Yinan
AU - Liu, Xueqing
AU - Zhai, Chengxiang
N1 - Publisher Copyright:
© 2017 Copyright held by the owner/author(s).
PY - 2017/10/1
Y1 - 2017/10/1
AB - While the Cranfield evaluation methodology based on test collections has been very useful for evaluating simple IR systems that return a ranked list of documents, it has significant limitations when applied to search systems with interface features beyond a ranked list, and to sophisticated interactive IR systems in general. In this paper, we propose a general formal framework for evaluating IR systems based on search session simulation that can be used to perform reproducible experiments for evaluating any IR system, including interactive systems and systems with sophisticated interfaces. We show that the traditional Cranfield evaluation method can be regarded as a special instantiation of the proposed framework in which the simulated search session is a user sequentially browsing the presented search results. By examining a number of existing evaluation metrics within the proposed framework, we reveal the exact assumptions they implicitly make about the simulated users and discuss possible ways to improve these metrics. We further show that the proposed framework enables us to evaluate a set of tag-based search interfaces, a generalization of faceted browsing interfaces, producing results consistent with real user experiments and revealing interesting findings about the effectiveness of the interfaces for different types of users.
KW - IR evaluation
KW - Interface card
KW - User simulation
UR - http://www.scopus.com/inward/record.url?scp=85033237683&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85033237683&partnerID=8YFLogxK
DO - 10.1145/3121050.3121070
M3 - Conference contribution
AN - SCOPUS:85033237683
T3 - ICTIR 2017 - Proceedings of the 2017 ACM SIGIR International Conference on the Theory of Information Retrieval
SP - 193
EP - 200
BT - ICTIR 2017 - Proceedings of the 2017 ACM SIGIR International Conference on the Theory of Information Retrieval
Y2 - 1 October 2017 through 4 October 2017
ER -