EAGER: A Domain-Informed Generative Framework for Joint Learning of Public Medical Knowledge and Individual Health Records

Project: Research project

Project Details

Description

Access to comprehensive knowledge about diseases, conditions, and medications is not only empowering but also essential for the public to understand complex medical information, leading to better personal well-being. However, current medical information sources, such as Google Knowledge Graph, often have limited scope, primarily covering a small subset of well-known diseases. Public data sources like academic papers on uncommon diseases are not well-structured or easily understood by the general public. This project aims to create a flexible and open-resource medical knowledge base called FORMED, providing multi-faceted information on a wide range of diseases and conditions for public access. This knowledge base will include well-structured sections on symptoms, causes, and treatments, enabling efficient disease classification and indexing. By integrating medical knowledge with individual health records, the project will also evaluate its effectiveness in predicting individual health risks for uncommon diseases. Additionally, this project will involve educational initiatives such as developing new courses on large language models; conducting interdisciplinary research activities to train graduate, undergraduate, and high-school students in data science and bioinformatics; and increasing participation of women and minority groups in academic research. All core outcomes of this project, including software, datasets, and publications, will be made available to the general public.

The goal of this project is twofold: (1) to create a public-oriented medical knowledge base called FORMED, covering a wide range of diseases, conditions, and medications in the current disease classification system with descriptive attributes including symptoms, causes, and treatments; and (2) to develop a temporal health outcome prediction and generation framework to evaluate the generated knowledge base with individual health records.

This project creates a set of technologies for semi-structured text generation, knowledge graph construction, and mixed-structure temporal data prediction as well as generation. Specifically, the research activities include: (1) Developing hyperbolic embedding-enhanced domain-specific large language models for building FORMED; (2) Constructing a knowledge graph from FORMED to represent the logical concepts of disease characteristics and causes; (3) Designing novel learning and prompting strategies to augment the reasoning capability of large language models with knowledge graphs; and (4) Building a robust testing platform to evaluate the effectiveness of the generated knowledge graph for forecasting individual health risks. The establishment of comprehensive medical knowledge bases will significantly enhance public understanding of uncommon diseases and improve the inference capabilities of generative models, enabling searching-based services. Research outcomes of this project will be disseminated in peer-reviewed publications, tutorials, seminars, and workshops.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Active
Effective start/end date: 1/10/24 to 30/09/26

Funding

  • National Science Foundation
