Neuro-Symbolic Program Generation and Execution for Hybrid Reasoning
Neuro-symbolic learning aims to combine neural networks and symbolic reasoning into a hybrid AI. It promises many desiderata of human-like intelligence, including explainability, efficiency, compositionality, and robustness, which today's monolithic deep neural networks sorely lack. A well-designed interface between deep learning and symbolic reasoning provides a structural learning prior that can yield empirical performance improvements and state-of-the-art results. In the bigger picture, integrating deep learning with symbolic reasoning algorithmically unifies empiricism and rationalism, the two branches of epistemology that explain how humans acquire knowledge, which makes neuro-symbolic reasoning a fundamentally important problem that may one day lead to AGI. This dissertation explores problems and methods of neuro-symbolic learning with computer programs as the symbolic form, targeting program generation and execution as its two main topics.
First, Neuron Dependency Graphs (NDGs) discover symbolic rules commonly present in trained neural networks and represent them as directed graphs, where each node corresponds to the boolean activation value of a neuron and each edge models an approximate logical implication from one node to another. Beyond providing symbolic explanations of the neural network's internal structure, an NDG can represent a Structural Causal Model (SCM) that is a causal abstraction of the corresponding neural network, "unfolding" the same way under interventions.
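The edge-mining idea can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function names, the sample data, and the confidence threshold are all assumptions chosen for exposition.

```python
# Illustrative sketch of mining an approximate Neuron Dependency Graph:
# nodes are boolean neuron activations, and a directed edge a -> b is
# kept when "a active implies b active" holds on nearly all samples.
# All names and the threshold below are assumptions, not the actual method.
import itertools

def mine_ndg(activations, threshold=0.95):
    """activations: list of dicts mapping neuron id -> bool (one per input).
    Returns directed edges (a, b) meaning a's activation approximately
    implies b's, with empirical support >= threshold."""
    neurons = activations[0].keys()
    edges = []
    for a, b in itertools.permutations(neurons, 2):
        a_on = [s for s in activations if s[a]]
        if not a_on:
            continue
        support = sum(1 for s in a_on if s[b]) / len(a_on)
        if support >= threshold:
            edges.append((a, b))
    return edges

samples = [
    {"n1": True, "n2": True, "n3": False},
    {"n1": True, "n2": True, "n3": True},
    {"n1": False, "n2": True, "n3": False},
]
print(mine_ndg(samples))  # includes ('n1', 'n2'): n2 fires whenever n1 does
```

In practice the boolean activations would be obtained by thresholding real neuron outputs over a dataset, and the resulting graph can then be read as a set of approximate logical implications.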
Then, NSEdit designs a domain-specific language (DSL) as the interface through which Transformers edit code. The DSL supports localization, insertion, and deletion, and a neuro-symbolic bi-modal decoder learns to perform bug localization and repair jointly with it, predicting mixed data types including editing actions, locations, and words. At the time of publication, NSEdit achieved state-of-the-art program repair performance.
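The flavor of such an editing DSL can be sketched as below. The action names and the triple layout are hypothetical stand-ins for whatever the decoder actually emits; the point is that a small symbolic vocabulary of edits suffices to express a repair.

```python
# Sketch of a code-editing DSL in the spirit of NSEdit: each edit is an
# (action, location, payload) triple that a decoder could predict.
# Action names and data layout are assumptions for exposition only.

def apply_edits(tokens, edits):
    """Apply a sequence of DSL edits to a token list.
    Each edit is ('insert', i, word) or ('delete', i, None);
    locations index into the current token list."""
    out = list(tokens)
    for action, loc, payload in edits:
        if action == "insert":
            out.insert(loc, payload)
        elif action == "delete":
            del out[loc]
        else:
            raise ValueError(f"unknown action: {action}")
    return out

buggy = ["return", "a", "-", "b"]
fixed = apply_edits(buggy, [("delete", 2, None), ("insert", 2, "+")])
print(" ".join(fixed))  # return a + b
```

A neural decoder over such a DSL predicts a mix of discrete types (actions, integer locations, vocabulary words), which is what makes the decoder "bi-modal" rather than a plain sequence-to-sequence model.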
Next, Neural Interpretation (NI) presents a neural model of procedural code execution, in which each function is represented by a neural network and each variable by a vector. NI mirrors how humans abstractly understand what a program would do when read top to bottom, without knowing how it will actually run step by step. Experiments show that the neuro-symbolic interpreter can be trained end-to-end with gradient descent, and that it can learn to "execute" library functions without test inputs, because variables are represented as vectors and require neither concrete values nor entry points.
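The execution model can be sketched as follows. The tiny random linear maps stand in for trained neural modules, and the program representation is a deliberate simplification; both are assumptions made for illustration.

```python
# Minimal sketch of Neural Interpretation's execution model: every
# variable is a vector, every function a learnable map, and statements
# are "executed" top to bottom by composing those maps. The random
# linear "networks" below stand in for real trained modules.
import random

DIM = 4

def linear_module(seed):
    """Stand-in for a per-function neural network: a fixed linear map."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
    return lambda v: [sum(W[i][j] * v[j] for j in range(DIM)) for i in range(DIM)]

# One neural module per function name, including library functions whose
# source is unavailable; no concrete values or entry points are needed.
modules = {"parse": linear_module(0), "validate": linear_module(1)}

def interpret(program, env):
    """program: list of (target_var, fn_name, arg_var) statements,
    executed top to bottom over a vector-valued environment."""
    for target, fn, arg in program:
        env[target] = modules[fn](env[arg])
    return env

env = {"x": [1.0, 0.0, 0.0, 0.0]}          # abstract vector for input x
env = interpret([("y", "parse", "x"), ("z", "validate", "y")], env)
print(len(env["z"]))  # the result is itself a DIM-dimensional vector
```

Because every intermediate value is a vector rather than a concrete runtime value, the "interpreter" can process code whose functions were never run, which is what enables end-to-end gradient training.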
Building on Neural Interpretation, the Neuro-symbolic Interpreter for Arithmetic Composition (NIAC) demonstrates NI's compositional generalization ability on arithmetic calculation. NIAC learns a structure-preserving mapping between neural execution and arithmetic calculation. Unlike LLMs, which lack compositional generalization with respect to productivity (length) and systematicity (format), NIAC guarantees perfect compositional generalization and uses constant memory for potentially unbounded input lengths during inference.
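The structure-preserving idea can be made concrete with a toy sketch: if an encoder phi and a step map f satisfy f(phi(s), phi(x)) = phi(s + x), then chaining f reproduces arithmetic exactly for any input length while carrying only a constant-size state. Here phi and f are exact hand-written maps rather than learned networks, which is purely an expository assumption.

```python
# Toy sketch of NIAC's structure-preserving mapping: the neural step
# mirrors addition in embedding space, so composition in the neural
# domain equals composition in arithmetic, for inputs of any length.

def phi(n):
    # toy 2-d "embedding" of an integer (hand-written, not learned)
    return (float(n), 1.0)

def step(state, emb):
    # step map that commutes with addition: step(phi(s), phi(x)) == phi(s + x)
    return (state[0] + emb[0], 1.0)

def run(xs):
    state = phi(0)
    for x in xs:           # constant memory regardless of len(xs)
        state = step(state, phi(x))
    return int(state[0])   # decode back into the arithmetic domain

print(run(range(1, 101)))  # 5050, correct at lengths never seen in training
```

The guarantee in the dissertation comes from enforcing this homomorphism property, whereas an LLM that merely fits short examples has no reason to stay correct as inputs grow longer or change format.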
Additionally, TableLabler uses large language models (LLMs) for tabular dataset construction. The dataset construction method addresses program synthesis for new programming languages with very little training data through in-context learning.
Finally, QualityFlow proposes an agentic workflow for program synthesis, consisting of software engineering roles, including a program generator, test designer, and self-debugger, all controlled by a centralized quality checker. QualityFlow achieved state-of-the-art performance on several program synthesis benchmarks.
Committee: Dr. Jin Tian (co-major professor), Dr. Chris Quinn (co-major professor), Dr. Ryan Martin, Dr. Ali Jannesari, and Dr. Hongyang Gao
Join on Zoom: https://iastate.zoom.us/j/91625622314?pwd=us2LypLUecPFrzzLKNMXr6wh1UyeLI.1
Zoom Passcode: 964132