Mitigating Value Hallucination in Dyna-Style Planning via Multistep Predecessor Models

Authors:

Martha White

Journal of Artificial Intelligence Research, Volume 80

https://doi.org/10.1613/jair.1.15155

Published: 09 June 2024

Abstract

Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model. However, it is often difficult to learn accurate models of environment dynamics, and even small errors may cause Dyna agents to fail. In this paper, we highlight that one potential cause of that failure is bootstrapping off the values of simulated states, and we introduce a new Dyna algorithm to avoid this failure. We discuss a design space of Dyna algorithms, based on using successor or predecessor models (simulating forwards or backwards) and using one-step or multi-step updates. Three of the variants have been explored, but surprisingly the fourth has not: using predecessor models with multi-step updates. We present the Hallucinated Value Hypothesis (HVH): updating the values of real states towards values of simulated states can result in misleading action values that adversely affect the control policy. We discuss and evaluate all four variants of Dyna: three update real states toward simulated states, and so potentially toward hallucinated values, whereas our proposed approach does not. The experimental results provide evidence for the HVH, and suggest that using predecessor models with multi-step updates is a fruitful direction toward developing Dyna algorithms that are more robust to model error.
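
To make the distinction in the abstract concrete, the following is a minimal tabular sketch of two of the update styles it contrasts. It is an illustration under stated assumptions and is not the algorithm from the paper: the function names, the step size ALPHA, the discount GAMMA, and the deterministic toy chain models are all hypothetical. The first update moves a real state's action value toward the value of a simulated successor, which is the pattern the HVH warns about. The second rolls a predecessor model backwards from a real state and updates only the simulated predecessors, each toward a multi-step return that bootstraps from the real state's value.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed for illustration)
ALPHA = 0.5  # step size (assumed for illustration)


def make_q():
    """Tabular action values: Q[state][action] -> float, default 0.0."""
    return defaultdict(lambda: defaultdict(float))


def state_value(Q, s):
    """Greedy state value max_a Q[s][a]; 0.0 if no action has been seen."""
    actions = Q[s]
    return max(actions.values()) if actions else 0.0


def successor_one_step_update(Q, s_real, a, successor_model):
    """One-step update of a REAL state toward a SIMULATED successor.

    The bootstrap target uses the value of s_sim, which the model may
    have hallucinated.
    """
    r, s_sim = successor_model(s_real, a)
    target = r + GAMMA * state_value(Q, s_sim)
    Q[s_real][a] += ALPHA * (target - Q[s_real][a])


def predecessor_multistep_update(Q, s_real, predecessor_model, n_steps):
    """Multi-step updates of SIMULATED predecessors toward a REAL state.

    Rolls the predecessor model backwards from s_real. The predecessor
    k steps back is updated toward the discounted simulated rewards plus
    GAMMA**k times the value of the real state; s_real is never updated.
    """
    ret = state_value(Q, s_real)  # bootstrap only from the real state
    s = s_real
    for _ in range(n_steps):
        s_prev, a_prev, r = predecessor_model(s)
        ret = r + GAMMA * ret  # extend the return one step backwards
        Q[s_prev][a_prev] += ALPHA * (ret - Q[s_prev][a_prev])
        s = s_prev


# Hypothetical deterministic chain with integer states and zero reward.
def toy_successor_model(s, a):
    return 0.0, s + 1


def toy_predecessor_model(s):
    return s - 1, "right", 0.0
```

In this sketch, an error in the predecessor model can only corrupt the values of simulated predecessor states; the value of the real state that anchors the rollout is read but never written.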

Index Terms

1. Theory of computation
  1. Design and analysis of algorithms
  2. Theory and algorithms for application domains
    1. Machine learning theory

Index terms have been assigned to the content through auto-classification.

Recommendations

Multi-step linear Dyna-style planning
NIPS'09: Proceedings of the 22nd International Conference on Neural Information Processing Systems

In this paper we introduce a multi-step linear Dyna-style planning algorithm. The key element of the multi-step linear Dyna is a multi-step linear model that enables multi-step projection of a sampled feature and multi-step planning based on the ...

Dyna-style planning with linear function approximation and prioritized sweeping
UAI'08: Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence

We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available after each interaction with the world. This paper develops an explicitly model-...

Dyna-MLAC: Trading Computational and Sample Complexities in Actor-Critic Reinforcement Learning
BRACIS '15: Proceedings of the 2015 Brazilian Conference on Intelligent Systems (BRACIS)

Sampling and computation budgets are two of the key elements that determine the performance of a reinforcement learning algorithm. In essence, any reinforcement learning agent must sample the environment and perform some computation over the samples to ...

Information & Contributors

Information

Published In

Journal of Artificial Intelligence Research, Volume 80

Sep 2024

1696 pages

Publisher

AI Access Foundation

El Segundo, CA, United States

Qualifiers

Article

Bibliometrics & Citations

Article Metrics

Total Citations: 0
Total Downloads: 60
Downloads (Last 12 months): 60
Downloads (Last 6 weeks): 24

Reflects downloads up to 24 Oct 2024
