HeRoN: A Mediated RL-LLM Framework for Adaptive NPC Behavior in Interactive Environments
Gaetano Cimino;Vincenzo Deufemia;Andrea Selice
In press
Abstract
Non-Player Characters (NPCs) play a central role in interactive environments, where adaptive and context-aware behavior is essential for engaging gameplay. Traditional rule-based and utility-driven approaches often lack adaptability and contextual coherence. Reinforcement Learning (RL) and Large Language Models (LLMs) offer complementary strengths but exhibit distinct limitations: RL suffers from training inefficiency and limited generalization, whereas LLMs are prone to hallucinations and context drift. This paper introduces HeRoN, a mediated framework that integrates RL and LLMs through functional separation and critique-based refinement to enable coherent and strategically adaptive NPC behavior. The architecture comprises an RL-controlled NPC policy for action execution, an LLM-based strategy generator that provides context-aware action proposals, and a lightweight reviewer that refines these proposals to enforce consistency with environment constraints. Through experiments in two structurally distinct custom game environments, we show that early LLM-mediated guidance improves exploration efficiency and generalization. Compared to standard RL baselines, HeRoN achieves up to an 81% improvement in task success rate while substantially reducing constraint-violating actions.
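The three-component architecture described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`choose_action`, `simple_review`, `guidance_prob`) is hypothetical. Two assumptions are worth stating: the reviewer is reduced to an accept-or-reject check against the set of valid actions, whereas the abstract describes a reviewer that refines proposals, and "early guidance" is modeled as a probability that a caller would lower as training proceeds.

```python
import random
from typing import Callable, Optional, Sequence


def simple_review(state: dict, proposal: str,
                  valid_actions: Sequence[str]) -> Optional[str]:
    """Hypothetical reviewer: keep a proposal only if the environment allows it.

    Assumption: a real reviewer would refine the proposal; this one only filters.
    """
    return proposal if proposal in valid_actions else None


def choose_action(
    state: dict,
    valid_actions: Sequence[str],
    rl_policy: Callable[[dict, Sequence[str]], str],
    llm_propose: Callable[[dict], str],
    review: Callable[[dict, str, Sequence[str]], Optional[str]],
    guidance_prob: float,
    rng: random.Random,
) -> str:
    """Select one NPC action.

    With probability guidance_prob the (stubbed) LLM proposes an action and the
    reviewer checks it; otherwise, or if the proposal is rejected, the RL policy
    decides. Lowering guidance_prob over time is an assumed way to confine the
    guidance to early training.
    """
    if rng.random() < guidance_prob:
        proposal = llm_propose(state)
        reviewed = review(state, proposal, valid_actions)
        if reviewed is not None:
            return reviewed
    return rl_policy(state, valid_actions)


if __name__ == "__main__":
    rng = random.Random(0)
    state = {"hp": 3}
    actions = ["attack", "heal"]
    first_valid = lambda s, v: v[0]      # stand-in for a trained RL policy
    always_heal = lambda s: "heal"       # stand-in for an LLM call
    print(choose_action(state, actions, first_valid, always_heal,
                        simple_review, guidance_prob=1.0, rng=rng))
```

Keeping the three roles behind plain callables mirrors the functional separation named in the abstract: any of them can be swapped without touching the other two, and a constraint-violating proposal never reaches the environment because the fallback is always the RL policy.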


