Paper Title
Batch Value-function Approximation with Only Realizability
Paper Authors
Abstract
We make progress on a long-standing problem of batch reinforcement learning (RL): learning $Q^\star$ from an exploratory and polynomial-sized dataset, using a realizable and otherwise arbitrary function class. In fact, all existing algorithms demand function-approximation assumptions stronger than realizability, and the mounting negative evidence has led to a conjecture that sample-efficient learning is impossible in this setting (Chen and Jiang, 2019). Our algorithm, BVFT, breaks the hardness conjecture (albeit under a stronger notion of exploratory data) via a tournament procedure that reduces the learning problem to pairwise comparison, and solves the latter with the help of a state-action partition constructed from the compared functions. We also discuss how BVFT can be applied to model selection, among other extensions and open problems.
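The abstract names two ingredients: a pairwise comparison that uses a state-action partition built from the two compared functions, and a tournament over all pairs. The sketch below is one illustrative reading of those ingredients, not the paper's implementation. It assumes a finite action set, a discretization width `eps` used to bin Q-values into partition cells, a within-cell average of the backup $r + \gamma \max_{a'} f(s', a')$ as the projection step, and a min-max rule for the tournament. All names (`pairwise_loss`, `tournament`, `eps`) are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

State = Hashable
Action = Hashable
QFunc = Callable[[State, Action], float]
Transition = Tuple[State, Action, float, State]  # (s, a, r, s_next)


def pairwise_loss(f: QFunc, g: QFunc, data: Sequence[Transition],
                  actions: Sequence[Action], gamma: float, eps: float) -> float:
    """Empirical Bellman residual of f after projecting its backup onto a
    partition built from f and g (hypothetical reading of the abstract)."""
    def cell(s, a):
        # Assumption: a cell is the pair of eps-discretized values of f and g.
        return (math.floor(f(s, a) / eps), math.floor(g(s, a) / eps))

    sums, counts = defaultdict(float), defaultdict(int)
    for s, a, r, s_next in data:
        target = r + gamma * max(f(s_next, b) for b in actions)
        k = cell(s, a)
        sums[k] += target
        counts[k] += 1

    # Compare f against the cell-wise average of its own backup.
    sq_err = 0.0
    for s, a, _, _ in data:
        k = cell(s, a)
        sq_err += (f(s, a) - sums[k] / counts[k]) ** 2
    return math.sqrt(sq_err / len(data))


def tournament(candidates: List[QFunc], data: Sequence[Transition],
               actions: Sequence[Action], gamma: float, eps: float) -> int:
    """Index of the candidate whose worst-case pairwise loss is smallest."""
    scores = [max(pairwise_loss(f, g, data, actions, gamma, eps)
                  for g in candidates)
              for f in candidates]
    return min(range(len(candidates)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two-state chain: state 0 (reward 0) -> state 1 (reward 1, absorbing).
    # With gamma = 0.5, Q*(1, 0) = 2 and Q*(0, 0) = 1.
    data = [(0, 0, 0.0, 1), (1, 0, 1.0, 1)]
    q_star = lambda s, a: {0: 1.0, 1: 2.0}[s]
    q_bad = lambda s, a: 1.5
    print(tournament([q_star, q_bad], data, [0], gamma=0.5, eps=0.1))  # 0
```

In this toy chain, `q_bad` looks fairly consistent under its own one-cell partition (loss 0.25), but the partition induced by `q_star` separates the two states and exposes a larger residual (about 0.56). This illustrates, under the assumptions above, why the comparison function's partition matters.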