Understanding the causal mechanisms underlying an observed phenomenon is one of the primary goals of science. The realization that statistical associations by themselves are insufficient for elucidating those mechanisms has led researchers to enrich traditional statistical analysis with techniques based on "causal inference". Most recent advances in the field, however, rely on overly optimistic assumptions that are often not met in practical, large-scale settings. This project seeks to develop a sound and general causal inference theory that covers those settings. The goal is to design a decision-making framework for intelligent systems that includes (1) learning a causal representation of the data-generating environment (learning), (2) performing efficient inference with the learned model (planning/inference), and (3) using the representation obtained in (1) and (2) to decide how to act next (decision-making). The new findings will benefit investigators across the empirical sciences, including artificial intelligence, machine learning, statistics, economics, and the health and social sciences. The research is expected to fundamentally change the practice of data science in areas where the standard causal assumptions are violated (e.g., due to missing data, selection bias, or confounding bias). The work on decision-making is expected to pave the way toward the design of an "automated scientist", i.e., a program that combines observational and experimental data, conducts its own experiments, and decides on the best actions and policies. The project will also help disseminate the principles of causal inference throughout the sciences by (1) contributing to the establishment of a new "data science" curriculum in which causal inference plays a central role, and (2) developing new educational materials (e.g., a book) that explain the practice of causal inference to students and the general public.
Furthermore, the project supports the causal inference community by fostering educational initiatives such as forums and workshops, and by creating new incentives for the development of educational material (e.g., a "Causality Education Award").
Determining whether causal connections exist (structural learning), estimating the magnitude of causal effects (identification), and designing optimal interventions (decision-making) are among the most important tasks in data-driven fields. This project will study identification, learning, and decision-making in settings where (1) data are missing not at random, (2) non-parametric estimation is not feasible, and (3) aggregate behavior does not translate into guidance for individual-level decision-making. Specifically, the project will consider the setting in which measurements are systematically distorted (missing data); this problem has received an enormous amount of attention in the statistical literature but has essentially not been investigated in the context of causal inference when data are missing not at random. The project will further leverage the special properties of linear models, the most common first approximation to non-parametric causal inference, to elucidate causal relationships in data and to facilitate sensitivity analysis in such models. Finally, the project will consider the fundamental problem of how causal and counterfactual knowledge can speed up experimentation and support principled decision-making. The goal is to develop a complete algorithmic theory that determines when a particular causal effect can be learned from data, and how causal knowledge (possibly acquired through experimentation) can be incorporated so that it is amortized over new environmental conditions.