LinkSO: A dataset for learning to retrieve similar question answer pairs on software development forums

Xueqing Liu, Chi Wang, Yue Leng, Cheng Xiang Zhai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

We present LinkSO, a dataset for learning to rank similar questions on Stack Overflow. Stack Overflow contains a massive amount of crowd-sourced question links of high quality, which provides a great opportunity for evaluating retrieval algorithms for community-based question answering (cQA) archives and for learning to retrieve similar questions. However, due to the existence of missing links, one question is whether question links can be readily used as the relevance judgment for evaluation. We study this question by measuring the closeness between question links and the relevance judgment, and we find their agreement rates range from 80% to 88%. We conduct an empirical study that evaluates existing retrieval models’ performance on LinkSO. While existing work focuses on non-learning approaches, our preliminary exploration that assembles simple learning models shows great potential for further improving the retrieval performance with machine learning.

Original language: English
Title of host publication: NL4SE 2018 - Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, Co-located with FSE 2018
Editors: Yijun Yu, Erik Fredericks, Premkumar Devanbu
Pages: 2-5
Number of pages: 4
ISBN (Electronic): 9781450360555
State: Published - 4 Nov 2018
Event: 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, NL4SE 2018, co-located with FSE 2018 - Lake Buena Vista, United States
Duration: 4 Nov 2018 → …

Publication series

Name: NL4SE 2018 - Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, Co-located with FSE 2018

Conference

Conference: 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, NL4SE 2018, co-located with FSE 2018
Country/Territory: United States
City: Lake Buena Vista
Period: 4/11/18 → …

Keywords

  • Community-based question answering
  • Information retrieval
