Paper Title

Combinatorial Pure Exploration of Causal Bandits

Paper Authors

Nuoya Xiong, Wei Chen

Paper Abstract


The combinatorial pure exploration of causal bandits is the following online learning task: given a causal graph with unknown causal inference distributions, in each round we choose a subset of variables to intervene on, or do no intervention, and observe the random outcomes of all random variables, with the goal that, using as few rounds as possible, we can output an intervention that gives the best (or almost best) expected outcome on the reward variable $Y$ with probability at least $1-\delta$, where $\delta$ is a given confidence level. We provide the first gap-dependent and fully adaptive pure exploration algorithms on two types of causal models -- the binary generalized linear model (BGLM) and general graphs. For BGLM, our algorithm is the first designed specifically for this setting and achieves polynomial sample complexity, while all existing algorithms for general graphs either have sample complexity exponential in the graph size or rely on some unreasonable assumptions. For general graphs, our algorithm provides a significant improvement in sample complexity, and it nearly matches the lower bound we prove. Our algorithms achieve this improvement through a novel integration of prior causal bandit algorithms and prior adaptive pure exploration algorithms: the former utilize the rich observational feedback in causal bandits but are not adaptive to reward gaps, while the latter have the opposite issue.
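For concreteness, the success criterion stated above admits a standard $(\varepsilon,\delta)$-PAC formalization (a sketch of the objective only; the accuracy symbol $\varepsilon$ and the notation $\mathbf{S}$ for a candidate intervention set are assumed here for illustration and may differ from the paper's notation): the algorithm stops after as few rounds as possible and outputs an intervention $do(\hat{\mathbf{S}})$ satisfying

$$\Pr\left[\,\mathbb{E}\big[Y \mid do(\hat{\mathbf{S}})\big] \;\geq\; \max_{\mathbf{S}} \mathbb{E}\big[Y \mid do(\mathbf{S})\big] - \varepsilon\,\right] \;\geq\; 1-\delta,$$

where the maximum is taken over all allowed intervention sets and $\varepsilon = 0$ recovers exact best-intervention identification.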
