Text-to-SQL Generation for Question Answering on Electronic Medical Records

Ping Wang, Tian Shi, Chandan K. Reddy

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

53 Scopus citations

Abstract

Electronic medical records (EMR) contain comprehensive patient information and are typically stored in a relational database with multiple tables. Effective and efficient patient information retrieval from EMR data is a challenging task for medical experts. Question-to-SQL generation methods tackle this problem by first predicting the SQL query for a given question about a database, and then, executing the query on the database. However, most of the existing approaches have not been adapted to the healthcare domain due to a lack of healthcare Question-to-SQL dataset for learning models specific to this domain. In addition, wide use of the abbreviation of terminologies and possible typos in questions introduce additional challenges for accurately generating the corresponding SQL queries. In this paper, we tackle these challenges by developing a deep learning based TRanslate-Edit Model for Question-to-SQL (TREQS) generation, which adapts the widely used sequence-to-sequence model to directly generate the SQL query for a given question, and further performs the required edits using an attentive-copying mechanism and task-specific look-up tables. Based on the widely used publicly available electronic medical database, we create a new large-scale Question-SQL pair dataset, named MIMICSQL, in order to perform the Question-to-SQL generation task in healthcare domain. An extensive set of experiments are conducted to evaluate the performance of our proposed model on MIMICSQL. Both quantitative and qualitative experimental results indicate the flexibility and efficiency of our proposed method in predicting condition values and its robustness to random questions with abbreviations and typos.

Original languageEnglish
Title of host publicationThe Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
Pages350-361
Number of pages12
ISBN (Electronic)9781450370233
DOIs
StatePublished - 20 Apr 2020
Event29th International World Wide Web Conference, WWW 2020 - Taipei, Taiwan, Province of China
Duration: 20 Apr 202024 Apr 2020

Publication series

NameThe Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020

Conference

Conference29th International World Wide Web Conference, WWW 2020
Country/TerritoryTaiwan, Province of China
CityTaipei
Period20/04/2024/04/20

Keywords

  • SQL query.
  • Sequence-to-sequence model
  • attention mechanism
  • electronic medical records
  • pointer-generator network

Fingerprint

Dive into the research topics of 'Text-to-SQL Generation for Question Answering on Electronic Medical Records'. Together they form a unique fingerprint.

Cite this