Hridesh Rajan and Tien Nguyen have received an NSF EAGER grant for the Boa project. NSF EAGER grants support exploratory work in its early stages on untested, but potentially transformative, research ideas or approaches. Projects funded under EAGER are sometimes considered especially "high risk-high payoff" in the sense that they involve radically different approaches, apply new expertise, or engage novel disciplinary or interdisciplinary perspectives.
Boa is a domain-specific programming language and infrastructure for analyzing ultra-large-scale software repositories such as SourceForge (350,000+ projects), GitHub (250,000+ projects), and Google Code (250,000+ projects). These repositories contain an enormous corpus of software and related information about software projects.
Scientists and engineers alike are interested in analyzing this wealth of information, both out of curiosity and to test important hypotheses. However, the current barrier to entry is often prohibitive: only a few groups with well-established research infrastructure and deep expertise in mining software repositories can attempt such ultra-large-scale experiments. The necessary expertise includes programmatically accessing version control systems, data storage and retrieval, data mining, and parallelization.
The need for expertise in these four areas significantly increases the cost of scientific research that attempts to answer questions involving ultra-large-scale software repositories. As a result, experiments are irreproducible, the reusability of experimental infrastructure is low, and the data associated with and produced by such experiments is often lost or becomes inaccessible and obsolete because there is no systematic curation. Boa makes analyzing ultra-large-scale software repositories both easy and fast.
More information about the Boa project, including a programmer's guide, a researcher's guide, and user account information, is available at boa.cs.iastate.edu.