Paper Title
Risk Preferences of Learning Algorithms
Paper Authors
Paper Abstract
Agents' learning from feedback shapes economic outcomes, and many economic decision-makers today employ learning algorithms to make consequential choices. This note shows that a widely used learning algorithm, $\varepsilon$-Greedy, exhibits emergent risk aversion: it prefers actions with lower variance. When presented with actions of the same expectation, under a wide range of conditions, $\varepsilon$-Greedy chooses the lower-variance action with probability approaching one. This emergent preference can have wide-ranging consequences, from concerns about fairness to homogenization, and it holds transiently even when the riskier action has a strictly higher expected payoff. We discuss two methods to correct this bias. The first requires the algorithm to reweight data as a function of how likely each action was to be chosen. The second requires the algorithm to hold optimistic estimates of actions for which it has not collected much data. We show that these corrections restore risk neutrality.
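To make the mechanism concrete, here is a minimal simulation sketch, not the paper's code. It runs $\varepsilon$-Greedy on two illustrative arms with equal mean zero, one constant (zero variance) and one paying $\pm 1$ (high variance), and tracks, alongside the plain sample means that drive the greedy choice, an inverse-propensity-weighted (IPW) estimate of the risky arm's mean as one possible instance of the reweighting correction. The function name `eps_greedy_two_arms`, the arm distributions, the horizon, and the value of $\varepsilon$ are all assumptions for illustration.

```python
import random

def eps_greedy_two_arms(T=10_000, eps=0.1, seed=0):
    """epsilon-Greedy on two arms with equal mean 0: arm 0 always pays 0
    (zero variance); arm 1 pays +1 or -1 (high variance). Alongside the
    plain sample means that drive the greedy choice, track an
    inverse-propensity-weighted (IPW) estimate of arm 1's mean, one
    possible instance of the reweighting correction."""
    rng = random.Random(seed)
    counts = [0, 0]      # pulls per arm
    sums = [0.0, 0.0]    # total reward per arm (plain sample means)
    ipw_sum = 0.0        # sum of reward * 1{a=1} / P(choose arm 1)
    pulls0 = 0
    for t in range(T):
        if counts[0] == 0 or counts[1] == 0:
            a = 0 if counts[0] == 0 else 1   # initialize: pull each arm once
            p1 = 1.0 if a == 1 else 0.0      # propensity of arm 1 this round
        else:
            means = [sums[i] / counts[i] for i in range(2)]
            greedy = 0 if means[0] >= means[1] else 1
            # marginal probability of pulling arm 1 under epsilon-Greedy
            p1 = (1 - eps) * (greedy == 1) + eps / 2
            a = greedy if rng.random() > eps else rng.randrange(2)
        reward = 0.0 if a == 0 else rng.choice([-1.0, 1.0])
        counts[a] += 1
        sums[a] += reward
        if a == 1:
            # each round, E[reward * 1{a=1} / p1] equals arm 1's true mean,
            # so ipw_sum / T stays unbiased despite adaptive sampling
            ipw_sum += reward / p1
        pulls0 += (a == 0)
    return pulls0 / T, sums[1] / counts[1], ipw_sum / T

if __name__ == "__main__":
    runs = [eps_greedy_two_arms(seed=s) for s in range(200)]
    n = len(runs)
    print("avg fraction of pulls on low-variance arm:",
          sum(r[0] for r in runs) / n)   # typically well above 1/2
    print("avg sample-mean estimate of risky arm    :",
          sum(r[1] for r in runs) / n)   # typically below 0: downward bias
    print("avg IPW estimate of risky arm            :",
          sum(r[2] for r in runs) / n)   # near the true mean of 0
```

In typical runs the plain sample mean of the risky arm comes out below its true mean, because the arm stops being pulled exactly when its estimate happens to be low; the IPW estimate removes this adaptivity bias, which is the lever the first correction uses. The second correction, optimistic estimates for under-sampled actions, is not implemented in this sketch.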