Obtaining accurate estimated action values in categorical distributional reinforcement learning
Authors: Yingnan Zhao, Peng Liu, Chenjia Bai, Wei Zhao, Xianglong Tang
Affiliation: Harbin Institute of Technology, Harbin 150001, China
Journal: Knowledge-Based Systems, 2020, Vol. 194
Source database: Elsevier Journal
DOI: 10.1016/j.knosys.2020.105511
Keywords: Distributional reinforcement learning; Estimated action value; Bootstrapping; Interval estimation
Abstract: Categorical Distributional Reinforcement Learning (CDRL) uses a categorical distribution with evenly spaced outcomes to model the entire distribution of returns and produces state-of-the-art empirical performance. However, using inappropriate bounds with CDRL may generate inaccurate estimated action values, which affect the policy update step and the final performance. In CDRL, the bounds of the distribution indicate the range of the action values that the agent can obtain in one task, without considering the policy's performance and state–action pairs. The action values that the agent obtains are often far from the bounds, and this reduces the accuracy of the estimated action values. This paper describes a method of obtaining more accurate estimated action values for CDRL...
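As background to the abstract's point about bounds: in CDRL (e.g. the C51 algorithm), the return distribution is a categorical distribution over evenly spaced atoms between fixed bounds, and the estimated action value is the expectation under that distribution. The sketch below illustrates this under assumed, illustrative names and parameter values (`v_min`, `v_max`, `n_atoms`); it is not the paper's proposed method.

```python
import numpy as np

def estimated_action_value(probs, v_min=-10.0, v_max=10.0, n_atoms=51):
    """Expected return of a categorical distribution over evenly spaced atoms.

    probs: categorical probabilities over the n_atoms support points.
    v_min, v_max: the fixed distribution bounds the abstract refers to.
    """
    atoms = np.linspace(v_min, v_max, n_atoms)  # evenly spaced support
    return float(np.dot(probs, atoms))          # E[Z] = sum_i p_i * z_i

# A uniform distribution over a symmetric support gives an estimate of 0;
# if the returns actually achievable lie in a much narrower range than
# [v_min, v_max], mass placed on the outer atoms distorts this estimate,
# which is the inaccuracy the abstract describes.
probs = np.full(51, 1.0 / 51)
q = estimated_action_value(probs)  # 0.0 for this symmetric support
```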
Full-text access: Elsevier (partner)
Impact factor: 4.104 (2012)
