Proximal Reward Shaping with Action Masking to create a Pacifist NetHack Agent

Publisher

Saudi Digital Library

Abstract

Reward shaping is a classic and effective reinforcement learning technique that uses domain knowledge to guide an agent toward a solution. This project implements a proximal variation of reward shaping that rewards or penalizes the agent for being in proximity to certain entities. It also experiments with three versions of action masking, a technique that prevents the agent from performing specified sets of actions. We apply these two methods to the problem of creating a pacifist agent in the game NetHack using the NetHack Learning Environment (NLE). We run 10 experiments, producing training plots, testing results, and 10 testing videos to assess the agents qualitatively and quantitatively. Given the complexity and depth of NetHack and the difficulty of maintaining pacifist behavior, the project did not succeed in creating such an agent with these methods. Despite the lackluster results, several agents were created, trained, and analyzed, and the negative results remain valuable for future research.
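
The two techniques named in the abstract can be illustrated with a minimal sketch. This is not the project's implementation: the grid positions, the `radius`, `penalty`, and `bonus` values, the choice of Chebyshev distance, and the function names are all assumptions made for illustration, and the code does not call NLE. It only shows the general idea of adding a proximity-based term to the reward and of restricting action selection to an allowed subset.

```python
from typing import Iterable, Sequence, Tuple

Pos = Tuple[int, int]  # (row, col) on a 2D map; an assumed representation


def chebyshev(a: Pos, b: Pos) -> int:
    """Grid distance where diagonal moves cost 1 (an assumed metric)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def proximal_shaping(
    agent: Pos,
    penalized: Iterable[Pos],
    rewarded: Iterable[Pos],
    radius: int = 2,        # hypothetical proximity threshold
    penalty: float = -0.1,  # hypothetical per-entity penalty
    bonus: float = 0.1,     # hypothetical per-entity bonus
) -> float:
    """Return a shaping term to add to the environment reward."""
    shaped = 0.0
    for pos in penalized:
        if chebyshev(agent, pos) <= radius:
            shaped += penalty
    for pos in rewarded:
        if chebyshev(agent, pos) <= radius:
            shaped += bonus
    return shaped


def masked_argmax(scores: Sequence[float], allowed: Sequence[bool]) -> int:
    """Pick the highest-scoring action among those the mask allows."""
    candidates = [i for i, ok in enumerate(allowed) if ok]
    if not candidates:
        raise ValueError("mask forbids every action")
    return max(candidates, key=lambda i: scores[i])


# Usage: one penalized entity is adjacent, one rewarded entity is far away,
# so only the penalty applies.
shaping = proximal_shaping(agent=(5, 5), penalized=[(6, 6)], rewarded=[(9, 9)])

# Usage: action 0 has the best score but is masked out, so action 2 is chosen.
action = masked_argmax(scores=[0.9, 0.2, 0.5], allowed=[False, True, True])
```

In a pacifist setting, a mask of this kind could exclude actions that attack, while the shaping term could discourage standing next to entities the agent should avoid. Which entities and actions the project actually used is not stated in the abstract.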

Copyright owned by the Saudi Digital Library (SDL) © 2025