Proximal Reward Shaping with Action Masking to create a Pacifist NetHack Agent

WADHA SAUD NASSER ALHAMDAN

Authors

WADHA SAUD NASSER ALHAMDAN

Publisher

Saudi Digital Library

Abstract

Reward shaping is a classic and effective technique in reinforcement learning that uses domain knowledge to guide agents toward a solution. This project implements a proximal variation of reward shaping that rewards or penalizes the agent for being in proximity to certain entities. It also experiments with three versions of action masking, a technique that prevents the agent from performing specified sets of actions. We perform 10 experiments, producing training plots, testing results, and 10 testing videos to assess the agents qualitatively and quantitatively. This paper presents the results of applying these two methods to create a pacifist agent in the game NetHack using the NetHack Learning Environment (NLE). Given the complexity and depth of NetHack and the difficulty of maintaining a pacifist agent, this project could not create such an agent using these methods. Despite lackluster results, several agents were created, trained, and analyzed; the results, even if disappointing, remain valuable for future research.
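The two techniques named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the Chebyshev distance, the linear falloff, the radius, and the per-entity weights are all assumptions chosen for illustration, and masking logits before a softmax is only one common way to apply an action mask.

```python
import math


def proximity_bonus(agent_pos, entities, radius=3.0):
    """Hypothetical proximal shaping term.

    entities is a list of ((row, col), weight) pairs. A positive weight
    rewards being near the entity, a negative weight penalizes it.
    The bonus falls off linearly with distance (an assumed shape).
    """
    total = 0.0
    for (row, col), weight in entities:
        # Chebyshev distance, assumed because grid movement is 8-directional.
        dist = max(abs(row - agent_pos[0]), abs(col - agent_pos[1]))
        if dist <= radius:
            total += weight * (1.0 - dist / (radius + 1.0))
    return total


def masked_softmax(logits, allowed):
    """Turn logits into probabilities, giving masked actions probability 0."""
    if not any(allowed):
        raise ValueError("at least one action must be allowed")
    masked = [l if ok else -math.inf for l, ok in zip(logits, allowed)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(m - top) for m in masked]
    norm = sum(exps)
    return [e / norm for e in exps]


def shaped_reward(env_reward, agent_pos, entities):
    """Environment reward plus the assumed proximity term."""
    return env_reward + proximity_bonus(agent_pos, entities)
```

In a pacifist setting, one might give monsters a negative weight so the agent is discouraged from approaching them, and mask out attack-like actions so they can never be sampled. Which entities and actions the project actually used is not stated in the abstract.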

URI

https://drepo.sdl.edu.sa/handle/20.500.14154/66312

Collections

SACM - United Kingdom
