TY - CONF
T1 - LinkSO
T2 - 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, NL4SE 2018, co-located with FSE 2018
AU - Liu, Xueqing
AU - Wang, Chi
AU - Leng, Yue
AU - Zhai, Cheng Xiang
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/11/4
Y1 - 2018/11/4
N2 - We present LinkSO, a dataset for learning to rank similar questions on Stack Overflow. Stack Overflow contains a massive amount of crowd-sourced question links of high quality, which provides a great opportunity for evaluating retrieval algorithms for community-based question answer (cQA) archives and for learning to retrieve similar questions. However, due to the existence of missing links, one question is whether question links can be readily used as the relevance judgment for evaluation. We study this question by measuring the closeness between question links and the relevance judgment, and we find their agreement rates range from 80% to 88%. We conduct an empirical study that evaluates existing retrieval models’ performance on LinkSO. While existing work focuses on non-learning approaches, our preliminary exploration that assembles simple learning models shows great potential for further improving the retrieval performance with machine learning.
KW - Community-based question answering
KW - Information retrieval
UR - http://www.scopus.com/inward/record.url?scp=85061829762&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061829762&partnerID=8YFLogxK
U2 - 10.1145/3283812.3283815
DO - 10.1145/3283812.3283815
M3 - Conference contribution
AN - SCOPUS:85061829762
T3 - NL4SE 2018 - Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, Co-located with FSE 2018
SP - 2
EP - 5
BT - NL4SE 2018 - Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, Co-located with FSE 2018
A2 - Yu, Yijun
A2 - Fredericks, Erik
A2 - Devanbu, Premkumar
Y2 - 4 November 2018
ER -