MS Final Oral Exam: Sai Harish Uthravalli

Mar 12, 2025 - 12:30 PM

Dynamic, Unconstrained Optimization of Secreted Enzyme Production in Fed-batch Fermentation Using Reinforcement Learning

Reinforcement learning (RL) has been used to control a wide range of dynamic processes, especially ones that are too complex to model well or that are subject to stochastic environmental perturbations. Fed-batch fermentations experience changes in starting cell growth rates and process variations that can affect cell growth and production of a secreted target. RL has been demonstrated on digital fermentation environments for controlling known setpoints (such as temperature) but has yet to be demonstrated for unconstrained product maximization. In this work we develop a fed-batch fermentation model (digital twin) of Aspergillus niger secreting amylase using the Monod model, known literature parameters, and assumed constants chosen to align with typical production values. An RL agent is trained on this environment to evaluate the choice of algorithm (Proximal Policy Optimization (PPO) versus Soft Actor-Critic (SAC)), the speed of learning, and the effects of process perturbations. State variables fed to the agent include run time, cell concentration, and measured enzyme activity in the fermentation broth, with the objective of maximizing enzyme activity. We find that SAC outperforms PPO, achieving 87.4% of the maximum quality within 190 training episodes, compared to PPO's 63.7% after 1,637 episodes. The RL controller is benchmarked against a traditional, model-free controller that uses Bayesian optimization to discover the optimal feed rate for a given cell type. The traditional controller can be implemented with fewer training runs; however, it is not as robust when exposed to variations in starting cell growth or to process perturbations such as faulty feed or cooling pumps. In all cases the RL controller maintains higher enzyme production despite changes in the process. Finally, the RL controller is exposed to new cell types (in silico) to determine the experimental cost of updating the trained model with real bioreactor runs. Surprisingly, we find that with no updates the model performs well across a wide range of new cell types, and that performance improves further with retraining. These results indicate that an in silico trained RL agent can be updated with an array of fermentation experiments to provide robust fermentation control.
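For readers unfamiliar with the setup, a minimal sketch of how such a digital twin could be framed as an RL environment is given below. It is not the implementation used in this work: the library choice (Gymnasium), the class and parameter names, and all kinetic constants are illustrative assumptions. Only the overall structure follows the abstract: Monod growth kinetics, a substrate feed rate as the action, a state of run time, cell concentration, and enzyme activity, and a reward equal to the gain in enzyme activity.

```python
# Minimal sketch (not the thesis implementation) of a Gymnasium-style fed-batch
# digital twin with Monod growth kinetics. All constants and names below are
# illustrative assumptions, not values from the work described above.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class FedBatchEnv(gym.Env):
    """Fed-batch fermentation: action = substrate feed rate, reward = gain in enzyme activity."""

    def __init__(self, dt=0.1, horizon=100):
        self.dt, self.horizon = dt, horizon
        # Assumed Monod / yield parameters (illustrative only)
        self.mu_max, self.Ks = 0.25, 0.5        # 1/h, g/L
        self.Yxs, self.q_p = 0.5, 0.05          # g biomass per g substrate, product per biomass-hour
        self.S_feed = 100.0                     # substrate concentration in the feed, g/L
        self.action_space = spaces.Box(0.0, 0.2, shape=(1,), dtype=np.float32)       # feed rate, L/h
        self.observation_space = spaces.Box(0.0, np.inf, shape=(3,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.V = 0.0, 1.0                          # time (h), broth volume (L)
        self.X, self.S, self.P = 0.1, 5.0, 0.0             # biomass, substrate, product (g/L)
        return self._obs(), {}

    def _obs(self):
        # State mirrors the abstract: run time, cell concentration, measured enzyme activity
        return np.array([self.t, self.X, self.P], dtype=np.float32)

    def step(self, action):
        F = float(np.clip(action[0], 0.0, 0.2))
        mu = self.mu_max * self.S / (self.Ks + self.S)     # Monod specific growth rate
        D = F / self.V                                      # dilution rate from feeding
        # Euler integration of the fed-batch mass balances
        self.X += self.dt * (mu * self.X - D * self.X)
        self.S = max(self.S + self.dt * (-mu * self.X / self.Yxs + D * (self.S_feed - self.S)), 0.0)
        dP = self.dt * (self.q_p * self.X - D * self.P)    # growth-associated secretion
        self.P += dP
        self.V += self.dt * F
        self.t += self.dt
        terminated = self.t >= self.horizon * self.dt
        return self._obs(), dP, terminated, False, {}       # reward = increase in enzyme activity
```

An off-the-shelf agent (for example, SAC or PPO from a library such as Stable-Baselines3) could then be trained directly on this environment; that pairing is an assumption here, not a statement of the tooling used in the thesis.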

Committee: Baskar Ganapathysubramanian (major professor), Pavan Aduri, and Nigel Reuel