Ganesha Upadhyaya: PhD Final Oral

Ganesha Upadhyaya
Friday, November 17, 2017 - 10:00am
223 Atanasoff

First Name: Ganesha
Last Name: Upadhyaya
Major Professor:  Rajan, Hridesh
Committee Member 1:  Aduri, Pavankumar
Committee Member 2:  Le, Wei
Committee Member 3:  Lutz, Robyn
Committee Member 4:  Somani, Arun

Status:  PhD Final Oral
Date: Fri, 2017-11-17
Time: 10:00 am
Location: 223 Atanasoff Hall

Title: Collective Program Analysis
Abstract: The popularity of data-driven software engineering has led to an
increasing demand on infrastructure to support efficient execution of
tasks that require deeper source code analysis. Extant techniques have
focused on leveraging distributed computing to meet the demand, but with a
concomitant increase in computational resource needs. This thesis presents
collective program analysis (CPA), a technique for scaling large-scale source
code analysis by leveraging analysis-specific similarity. Analysis-specific
similarity concerns whether two or more programs can be considered similar
for a given analysis. The key idea of collective program analysis is to
cluster programs based on analysis-specific similarity, such that running the
analysis on one candidate in each cluster is sufficient to produce the result
for the others. For determining the analysis-specific similarity and for
clustering analysis-equivalent programs, we use a sparse representation and a
canonical labeling scheme. A sparse representation contains only the parts
that are relevant for the analysis and the canonical labeling helps with
finding isomorphic sparse representations. In a nutshell, two or more
programs with the same sparse representation must behave similarly for the given
analysis. Our evaluation shows that, for a variety of source code analysis
tasks run on a large dataset of programs, our technique achieves a
substantial reduction in analysis time: on average 69% compared to the
baseline and on average 36% compared to a prior technique.
We also show that a large number of analysis-equivalent programs exist
in large datasets for a variety of analyses.
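
Illustration: the clustering idea in the abstract can be sketched in a few
lines of Python. This is a toy sketch under stated assumptions, not the
thesis implementation. Programs are assumed to be tiny control flow graphs
whose nodes carry a statement kind; the function names, the "open"/"close"
kinds, and the sample analysis are all hypothetical. The canonical label is
computed by brute force over node orderings, which is only workable for very
small graphs and stands in for whatever labeling scheme the thesis uses.

```python
from collections import defaultdict
from itertools import permutations


def sparse_representation(nodes, edges, relevant):
    """Keep only analysis-relevant nodes and reconnect edges across the rest.

    nodes: dict node_id -> statement kind; edges: set of (src, dst) pairs;
    relevant: set of kinds the analysis cares about (hypothetical input).
    """
    keep = {n for n, kind in nodes.items() if kind in relevant}
    succ = defaultdict(set)
    for s, d in edges:
        succ[s].add(d)

    def next_kept(start):
        # Kept nodes reachable from `start` by passing only through dropped nodes.
        found, seen, stack = set(), set(), list(succ[start])
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if n in keep:
                found.add(n)
            else:
                stack.extend(succ[n])
        return found

    sparse_edges = {(s, d) for s in keep for d in next_kept(s)}
    return {n: nodes[n] for n in keep}, sparse_edges


def canonical_label(nodes, edges):
    """Brute-force canonical form: the smallest encoding over all node orderings."""
    best = None
    for perm in permutations(sorted(nodes)):
        index = {n: i for i, n in enumerate(perm)}
        kinds = tuple(nodes[n] for n in perm)
        edge_list = tuple(sorted((index[s], index[d]) for s, d in edges))
        code = (kinds, edge_list)
        if best is None or code < best:
            best = code
    return best


def collective_analysis(programs, relevant, analyze):
    """Cluster programs by canonical label; analyze one candidate per cluster."""
    sparse, clusters = {}, defaultdict(list)
    for name, (nodes, edges) in programs.items():
        sparse[name] = sparse_representation(nodes, edges, relevant)
        clusters[canonical_label(*sparse[name])].append(name)
    results, runs = {}, 0
    for members in clusters.values():
        result = analyze(*sparse[members[0]])  # run once per cluster
        runs += 1
        for name in members:
            results[name] = result  # reuse the result for the other members
    return results, runs


def has_unclosed_open(nodes, edges):
    """Toy analysis: is there an "open" node with no "close" successor?"""
    closed = {s for s, d in edges if nodes[d] == "close"}
    return any(kind == "open" and n not in closed for n, kind in nodes.items())


programs = {
    "p1": ({1: "open", 2: "assign", 3: "close"}, {(1, 2), (2, 3)}),
    "p2": ({10: "open", 20: "call", 30: "call", 40: "close"},
           {(10, 20), (20, 30), (30, 40)}),
    "p3": ({1: "open", 2: "assign"}, {(1, 2)}),
}
results, runs = collective_analysis(programs, {"open", "close"}, has_unclosed_open)
```

In this sketch p1 and p2 differ as full programs but reduce to isomorphic
sparse graphs, so they share a cluster and the analysis runs twice for three
programs. Reusing a result this way is only sound when the analysis outcome
depends solely on the sparse representation, which is the assumption the
abstract's notion of analysis-equivalence captures.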