Proximal Reward Shaping with Action Masking to create a Pacifist NetHack Agent
Publisher
Saudi Digital Library
Abstract
Reward shaping is a classic and effective technique in reinforcement learning that uses domain knowledge to guide an agent toward a solution. This project implements a proximal variation of reward shaping that rewards or penalizes the agent for being in proximity to certain entities. It also experiments with three versions of action masking, a technique that prevents the agent from performing specified sets of actions. We run 10 experiments, producing training plots, test results, and 10 test videos to assess the agents both qualitatively and quantitatively. This paper presents the results of applying these two methods to create a pacifist agent in the game NetHack using the NetHack Learning Environment (NLE). Given the complexity and depth of NetHack and the difficulty of maintaining a pacifist agent, this project did not succeed in creating such an agent with these methods. Despite the lackluster results, several agents were created, trained, and analyzed; the results, even if disappointing, remain valuable for future research.
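To make the two techniques concrete, the following is a minimal illustrative sketch, not the implementation used in this project. It assumes a linear falloff over Chebyshev (grid) distance for the proximity term and assumes that masking is applied by setting the logits of forbidden actions to negative infinity before the softmax. The function names, the `radius` and `weight` parameters, and the example values are all hypothetical and do not come from NLE or from the experiments.

```python
import numpy as np


def proximity_shaping(agent_pos, entity_positions, radius, weight):
    """Hypothetical proximal shaping term.

    Each entity within `radius` (Chebyshev distance) of the agent contributes
    `weight` scaled linearly by closeness. A negative weight penalizes
    proximity, a positive weight rewards it.
    """
    ax, ay = agent_pos
    total = 0.0
    for ex, ey in entity_positions:
        distance = max(abs(ax - ex), abs(ay - ey))
        closeness = max(0.0, 1.0 - distance / radius)
        total += weight * closeness
    return total


def masked_policy(logits, allowed):
    """Hypothetical action mask: forbidden actions get probability zero.

    `allowed` is a boolean array with at least one True entry.
    """
    masked = np.where(allowed, logits, -np.inf)
    shifted = masked - np.max(masked)  # subtract the max for numerical stability
    weights = np.exp(shifted)          # exp(-inf) evaluates to 0.0
    return weights / weights.sum()


# Example: penalize standing next to a monster, and forbid action index 1
# (imagined here as an attack action).
shaped_reward = 0.0 + proximity_shaping((5, 5), [(5, 6)], radius=3, weight=-1.0)
probs = masked_policy(np.array([1.0, 2.0, 3.0]), np.array([True, False, True]))
```

In this sketch the shaping term is simply added to the environment reward at each step, and the masked distribution is the one the agent would sample from, so a masked action can never be selected regardless of its logit.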