Kristima Guha Thakurta - MS Final Oral
Speaker: Kristima Guha Thakurta
Title: Studying effectiveness of recovering pairwise causal relations from data with application to Bayesian networks
Abstract: Causal structure discovery is a much-studied topic and a fundamental problem in machine learning. This thesis aims to determine the direction of the causal arrow between a pair of variables. Causal inference is the process of recovering the cause-effect relationships between the variables in a dataset. In general, the causal inference problem is to decide whether X causes Y, Y causes X, or X and Y are related only indirectly through a confounder. Even under very stringent assumptions, causal structure discovery is challenging. Bivariate causal discovery methods have received considerable attention in recent years. This thesis extends the bivariate case to allow at least one confounder between X and Y, and further extends the causal inference process to recover the structure of Bayesian networks from data. The contributions of this thesis include (a) extending causal discovery methods to networks with exactly one confounder (a third variable); (b) an algorithm to recover the causal graph between every pair of variables in the presence of a confounder in a large dataset; (c) a depth-first search algorithm, guided by the probability values the model assigns to the different labels, for generating a total order of the variables; and (d) feeding the resulting collection of orders to ordering-based search algorithms, which yield Bayesian network scores for the best network learned using score-and-search methods.
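Contribution (c) is described only at a high level, so the sketch below shows one plausible way a probability-guided depth-first search can produce several total orders; it is not the thesis's algorithm. It assumes a hypothetical matrix `q`, where `q[a][b]` is the probability that variable a precedes variable b (for instance, derived from a pairwise model's label probabilities), and scores an order by summing log q[a][b] over every pair with a placed before b. The function name `dfs_orders` and the cutoff `k` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Set, Tuple


def dfs_orders(q: Sequence[Sequence[float]], k: int = 3) -> List[Tuple[List[int], float]]:
    """Collect up to k total orders with a depth-first search that expands
    the most probable child first (hypothetical sketch).

    q[a][b] is an assumed probability that variable a precedes variable b;
    an order's score is the sum of log q[a][b] over every pair with a before b.
    """
    n = len(q)
    found: List[Tuple[List[int], float]] = []

    def gain(v: int, remaining: Set[int]) -> float:
        # Placing v next means v precedes every other still-unplaced variable.
        return sum(math.log(max(q[v][u], 1e-12)) for u in remaining if u != v)

    def visit(prefix: List[int], remaining: Set[int], score: float) -> None:
        if not remaining:
            found.append((list(prefix), score))  # a complete total order
            return
        gains = {v: gain(v, remaining) for v in remaining}
        # Try the most probable next variable first, then backtrack to the others.
        for v in sorted(gains, key=gains.get, reverse=True):
            if len(found) >= k:
                return  # enough orders collected
            prefix.append(v)
            remaining.remove(v)
            visit(prefix, remaining, score + gains[v])
            remaining.add(v)
            prefix.pop()

    visit([], set(range(n)), 0.0)
    return found
```

Because the search is depth-first, the orders come out in discovery order rather than sorted by score; sorting the returned list by its second element is a simple change if a ranked collection is wanted.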
Introducing confounders into the bivariate causal graphs improved the results.
The thesis also attempts to improve the Bayesian network scores of some medium- to large-sized networks using standard ordering-based search algorithms such as OBS and WINASOBS. The proposed methods were evaluated on benchmark datasets of cause-effect pairs and on datasets from the BLIP library.
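Ordering-based search rests on a simple observation: once a total order is fixed, any parent choice that draws only from earlier variables is automatically acyclic, so a decomposable score can be maximized one variable at a time. The sketch below illustrates that inner step and a loop that keeps the best network over several candidate orders. It is an illustration under stated assumptions, not OBS or WINASOBS themselves: `toy_score` and its `TOY` table are made-up local scores standing in for a real decomposable score such as BIC, and `max_parents` is an assumed in-degree limit.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

LocalScore = Callable[[int, FrozenSet[int]], float]
Network = Tuple[Dict[int, FrozenSet[int]], float]


def best_network_for_order(order: Sequence[int], local_score: LocalScore,
                           max_parents: int = 2) -> Network:
    """Best DAG consistent with a fixed order under a decomposable score."""
    parents: Dict[int, FrozenSet[int]] = {}
    total = 0.0
    for pos, v in enumerate(order):
        preds = order[:pos]  # only earlier variables may be parents
        best_set, best_score = frozenset(), local_score(v, frozenset())
        for size in range(1, min(max_parents, len(preds)) + 1):
            for cand in combinations(preds, size):
                s = local_score(v, frozenset(cand))
                if s > best_score:
                    best_set, best_score = frozenset(cand), s
        parents[v] = best_set
        total += best_score
    return parents, total


def best_over_orders(orders: Sequence[Sequence[int]], local_score: LocalScore,
                     max_parents: int = 2) -> Network:
    """Score every candidate order and keep the highest-scoring network."""
    return max((best_network_for_order(o, local_score, max_parents) for o in orders),
               key=lambda result: result[1])


# Hypothetical local scores (higher is better) for a 3-variable toy problem.
TOY = {
    (0, frozenset()): -10.0,
    (1, frozenset()): -12.0, (1, frozenset({0})): -8.0,
    (2, frozenset()): -15.0, (2, frozenset({0})): -11.0,
    (2, frozenset({1})): -10.0, (2, frozenset({0, 1})): -7.0,
}


def toy_score(v: int, ps: FrozenSet[int]) -> float:
    # Parent sets missing from the table get a large (assumed) penalty.
    return TOY.get((v, ps), -100.0)
```

With these toy scores, `best_over_orders([[2, 1, 0], [1, 0, 2], [0, 1, 2]], toy_score)` selects the order `[0, 1, 2]`, giving parents 1 ← {0} and 2 ← {0, 1} with a total score of -25.0. Supplying several orders rather than one is what lets a better order rescue a poor one.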