MS Final Oral Exam: Toral Chauhan
Benchmarking Adversarial Agents for Strategy Mitigation in Turn-Based Tactical Games
Adversarial gameplay modeling in turn-based tactical environments poses a fundamental challenge: designing agents that not only compete effectively but also adapt intelligently to their opponents' behavioral patterns. Existing approaches rarely distinguish between agents that exploit structural flaws in opponent training and those that exhibit genuine strategic adaptation, leaving a critical gap in both evaluation methodology and adversarial design. This work addresses that gap by developing a three-model adversarial gameplay system for a custom turn-based tactical game inspired by Into the Breach.
The system comprises three interdependent components. A reinforcement learning-based Player Model is trained under multiple distinct reward structures to produce a corresponding set of behavioral variants, including Balanced, Aggressive, Defensive, Guardian, and Efficient play styles, generating a substantial dataset of winning game sequences across a curated pool of verified maps. A Vector Quantized Variational Autoencoder (VQ-VAE) Strategy Classifier processes sequences of player actions in real time, mapping turn-level behavior to discrete codebook entries that represent the learned strategy categories. The primary contribution of this work is a multi-adversary benchmarking framework implementing a diverse library of 19 adversarial agents spanning rule-based, search-based, and learning-based paradigms. These agents were evaluated in a controlled, fixed-player experimental design that holds the opponent constant across all runs, enabling clean attribution of performance differences to adversarial strategy alone.
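The core of the VQ-VAE classifier described above is nearest-neighbor quantization against a learned codebook. The sketch below illustrates that lookup step only, with a toy feature dimension and codebook; the actual feature encoding, codebook size, and training procedure are not specified in this abstract and are assumptions here.

```python
import numpy as np

def quantize_sequence(action_features, codebook):
    """Map each turn-level feature vector to its nearest codebook entry.

    action_features: (T, D) array of per-turn behavior features (hypothetical encoding)
    codebook: (K, D) array of learned strategy embeddings
    Returns the index of the nearest codebook vector for each turn.
    """
    # Squared Euclidean distance from every turn vector to every code
    dists = ((action_features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)  # (T,) discrete strategy codes

# Toy example: 2-D features, 3 strategy codes
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
turns = np.array([[0.1, -0.1], [0.9, 1.2]])
codes = quantize_sequence(turns, codebook)  # first turn maps to code 0, second to code 1
```

In a full VQ-VAE the codebook is learned jointly with an encoder and decoder; at inference time only this quantization step is needed to label a player's turn sequence.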
Experimental results reveal a pronounced performance gap between a small number of high-performing adversaries and the broader field. The top-performing rule-based agent achieves its dominance by targeting game objectives that the Proximal Policy Optimization (PPO) player was not trained to defend, exposing a structural reward misalignment rather than demonstrating genuine strategic intelligence. In contrast, the best-performing adaptive agent succeeds through a qualitatively different mechanism: online belief updates over inferred player behavioral profiles, with meaningful win rate variation across player variants confirming genuine real-time adaptation. Search-based adversaries perform the worst overall, reflecting a structural mismatch between tree-based planning approaches and the game's short-horizon nature. Distribution analysis further confirms that only the top two adversaries produce tight, high-scoring performance profiles consistent with systematic and reliable effectiveness.
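The online belief updating used by the adaptive agent can be sketched as a standard Bayesian posterior update over the player variants. The variant names match those in the abstract, but the likelihood values and the form of the observation model are purely illustrative assumptions.

```python
def update_belief(belief, likelihoods):
    """One Bayesian update of the belief over player behavioral profiles.

    belief: dict variant -> prior probability
    likelihoods: dict variant -> P(observed action | variant), hypothetical values
    """
    posterior = {v: belief[v] * likelihoods[v] for v in belief}
    z = sum(posterior.values())  # normalizing constant
    return {v: p / z for v, p in posterior.items()}

# Uniform prior over three player variants; an aggressive-looking action is observed
belief = {"Aggressive": 1 / 3, "Defensive": 1 / 3, "Balanced": 1 / 3}
likelihoods = {"Aggressive": 0.6, "Defensive": 0.1, "Balanced": 0.3}
belief = update_belief(belief, likelihoods)  # posterior mass shifts toward Aggressive
```

Repeating this update each turn is what lets the adversary's win rate vary meaningfully with the true player variant, the signature of real-time adaptation noted above.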
A novel Value-Based Reward Shaping (VBRS) adversary is introduced, grounded in recent published research, that blends immediate environment rewards with a learned long-term value estimate via an adaptive mixing parameter. VBRS is the only agent in the framework designed to improve continuously with experience: the mixing parameter starts at zero and increases as the critic's confidence matures, so early-stage results reflect incomplete training rather than the adversary’s ultimate capability. This agent is positioned as the natural foundation for future strategy-conditioned adversarial policies, designed to integrate with the real-time strategy classifier once full training converges.
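The VBRS blending described above admits a simple convex-combination form. The sketch below assumes a shaping rule of the form r_shaped = (1 - λ) * r_env + λ * V(s) with λ annealed from zero as critic confidence grows; the exact functional form, schedule, and cap are assumptions, not the thesis's stated implementation.

```python
def shaped_reward(env_reward, value_estimate, mixing):
    """Blend the immediate environment reward with a learned long-term value estimate.

    Assumed form: r_shaped = (1 - mixing) * env_reward + mixing * value_estimate
    """
    return (1.0 - mixing) * env_reward + mixing * value_estimate

def mixing_schedule(critic_confidence, max_mixing=0.5):
    """Mixing parameter starts at zero and increases as critic confidence matures.

    critic_confidence in [0, 1] is a hypothetical confidence signal; max_mixing
    caps how much the value estimate can dominate the immediate reward.
    """
    return max_mixing * min(max(critic_confidence, 0.0), 1.0)

lam = mixing_schedule(0.8)          # lam = 0.4 with the assumed cap
r = shaped_reward(1.0, 2.0, lam)    # 0.6 * 1.0 + 0.4 * 2.0 = 1.4
```

Because λ is zero at the start of training, early VBRS results reflect the raw environment reward only, consistent with the abstract's caveat that early-stage performance understates the adversary's ultimate capability.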
This work establishes a reproducible, controlled benchmarking methodology for evaluating adversarial agents in game environments and demonstrates that aggregate win rate alone is insufficient to characterize adversarial effectiveness. The critical distinction between structural exploitation and genuine strategic adaptation carries direct implications for both adversary design and player model robustness.
Committee: Simanta Mitra (major professor)