The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning

Authors: Jiashun Liu, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Ling Pan

paper, 2025-06-16, English

deep learning portfolio, arxiv

Description

Off-policy deep reinforcement learning (RL) typically uses replay buffers to reuse past experiences during learning. This can improve sample efficiency when the collected data is informative and aligned with the learning objectives. When it is not, the replay buffer becomes "polluted" with data that can exacerbate optimization challenges, in addition to wasting environment interactions. We argue that sampling these uninformative...