CRII: RI: Formalizing Human-Interpretable Machine Learning

  • Ho, Mark (PI)

Project: Research project

Project Details

Description

Advances in artificial intelligence have the potential to dramatically benefit society, but whether this possibility is realized depends on whether researchers, industry leaders, and policy-makers can ensure that autonomous systems are aligned with human goals, expectations, and cognitive strategies. To this end, this project will develop approaches for training artificial intelligence systems, such as self-driving cars, to make their decision-making processes more transparent and interpretable to people. A distinctive feature of this project is that it will pair human studies with algorithm design to ensure that the computational methods that are developed are informed by an experimentally grounded understanding of human psychology. In the longer term, this project will bring the fields of psychology and artificial intelligence into closer contact with one another by facilitating the development of shared methodologies and theoretical tools to build trustworthy systems.

The goals of this project are twofold. First, it aims to develop a theoretical framework that formalizes human interpretability in terms of the cognitive cost of the simplest mental models that account for an autonomous system’s behavior. The correspondence of different quantitative predictions of this framework with actual human judgments will be validated through a series of rigorously designed behavioral experiments with human participants. Second, the project will develop new deep reinforcement learning algorithms that use the proposed formalism to optimize for human interpretability directly. The key emphasis of the project is to develop novel, psychologically grounded approaches to human-interpretable machine learning that meaningfully bridge contemporary research in cognitive science and artificial intelligence.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Finished
Effective start/end date: 15/04/24 to 30/09/24

Funding

  • National Science Foundation
