Interpretability of Configurable Software in the Biosciences
Users of bioinformatics software tools range from bench scientists with little computational experience to sophisticated developers. As the number and types of tools available to this diverse set of users grow, the tools are also increasing in flexibility. This flexibility makes them highly configurable: the end user is provided with many customization (configuration) options. At the same time, biologists and chemists are engineering living organisms by programming their DNA in a process that mimics software development. As they share their designs and promote reuse, their programs are also emerging as highly configurable.
As these bioscience systems become mainstream tools for the biology and bioinformatics communities, their dependability, reliability, and reproducibility become critical. Scientists are making decisions and drawing conclusions based on the software they use, and the constructs designed by synthetic biologists are being built into living organisms and used in the real world. Yet there is little guidance for users of bioinformatics tools or for those building new synthetic organisms.
For an end user equipped with minimal information, it is hard to predict the effect of changing a particular configuration option, yet the choice of configuration can lead to large variation in functionality and performance. Even if the individual configuration options make sense to an expert user, understanding all options and their interactions is difficult, and evaluating them exhaustively can be infeasible because the number of combinations grows exponentially with the number of options. Similarly, synthetic biologists must choose how to combine small DNA segments. There can be millions of ways to combine these pieces, however, and determining the architecture can require significant domain knowledge.
In this dissertation, we address the challenges of interpreting the effects of configurability in two areas of the biosciences: (1) bioinformatics software, and (2) synthetic biology. We highlight the challenges of configurability in these areas and provide approaches that help users navigate their configuration spaces, leading to more interpretable configurable software in the biosciences.
First, we demonstrate that there is variability in both the functional and performance outcomes of highly configurable bioinformatics tools, and we find previously undetected faults. We discuss the implications of this variability and provide suggestions for developers. Second, we develop a user-oriented framework that identifies the effect of changing configuration options in software and communicates these effects to the end user in a simple format. We demonstrate our framework in a large study and compare it to a state-of-the-art method for performance-influence modeling in software.
Last, we define a mapping of software product line engineering to the domain of synthetic biology, resulting in organic software product lines. We demonstrate the potential for reuse and the existence of both commonality and variability in an open-source synthetic biology repository. We build feature models for four commonly engineered biological functions and demonstrate how product line engineering can benefit synthetic biologists.
Committee: Myra Cohen (major professor), Samik Basu, Robyn Lutz, Julie Dickerson, and James Lathrop