In today's software-centric world, ultra-large-scale software repositories, e.g. SourceForge (350,000+ projects), GitHub (250,000+ projects), and Google Code (250,000+ projects) are the new library of Alexandria. They contain an enormous corpus of software and information about software. Scientists and engineers alike are interested in analyzing this wealth of information both for curiosity as well as for testing important research hypotheses. However, the current barrier to entry is often prohibitive and only a few with well-established research infrastructure and deep expertise in mining software repositories can attempt such ultra-large-scale experiments.
A facility called Boa has been prototyped: a domain-specific language and a BIGDATA repository containing 700,000+ open source projects for analyzing ultra-large scale software repositories to help with such experiments.
This experimental research infrastructure is of significant interest to a wide community of software engineering and programming language researchers. The main goal of this EAGER project is to examine the requirements for making Boa broadly available to the software engineering and programming language community, to work with an initial set of researchers to try to fulfill these requirements, and to take preliminary steps toward making Boa a community-sustained, scalable, and extensible research infrastructure. This is an enabling and transformative project. Its success will aid and accelerate scientific research in software engineering, allowing scientists and engineers to focus on the essential tasks. This advance will primarily be achieved by significantly lowering the barrier to entry and thus enabling a larger and more ambitious line of data-intensive scientific discovery in this area.