Formal Specification and Testing for Reinforcement Learning
In this paper we propose specifications for Reinforcement Learning (RL) problems and a well-established class of RL algorithms namely Temporal Difference (TD) methods. The obtained specification is used for testing implementations of RL algorithms and problems. We provide a test harness including property-based tests for the set of case studies in the paper which can be extended with properties for other RL problems and algorithms.
This artifact includes the source code used in our experiments to answer the following Research Questions (RQs):
(1) Is the specification general enough to accommodate diverse reinforcement learning problems? As evidence for the answer to this question, the reinforcement learning framework symsim is included, containing a set of RL case studies in Scala3 and the test harness including property-based tests in Quickcheck style;
(2) How effective is the test harness in finding bugs in reinforcement learning problems? To answer this question, we design experiments using mutation testing (relying on Stryker4s) to evaluate the effectiveness of the test harness. The artifact includes the required infrastructure to run and collect the results of the experiments;
(3) To what extent can generic problem properties be used to reduce the cost of testing for reinforcement learning problems? To answer this question, we design and run experiments using mutation testing that involve only the generic tests in the test harness. The included readme file in the artifact provides detailed information on how to run the experiments for each RQ. A VM image is also included in order to facilitate reproducing the results of the experiments.