Path Independent Equilibrium Models Can Better Exploit Test-Time Computation

Paper title

Path Independent Equilibrium Models Can Better Exploit Test-Time Computation

Authors

Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, Zico Kolter, Roger Grosse

Abstract

Designing networks capable of attaining better performance with an increased inference budget is important to facilitate generalization to harder problem instances. Recent efforts have shown promising results in this direction by making use of depth-wise recurrent networks. We show that a broad class of architectures named equilibrium models display strong upwards generalization, and find that stronger performance on harder examples (which require more iterations of inference to get correct) strongly correlates with the path independence of the system -- its tendency to converge to the same steady-state behaviour regardless of initialization, given enough computation. Experimental interventions made to promote path independence result in improved generalization on harder problem instances, while those that penalize it degrade this ability. Path independence analyses are also useful on a per-example basis: for equilibrium models that have good in-distribution performance, path independence on out-of-distribution samples strongly correlates with accuracy. Our results help explain why equilibrium models are capable of strong upwards generalization and motivate future work that harnesses path independence as a general modelling principle to facilitate scalable test-time usage.
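The notion of path independence described in the abstract can be illustrated with a toy fixed-point iteration. The sketch below is not the paper's model or its measurement procedure. It assumes a hypothetical scalar-weight update `z <- tanh(w * z + x)` with `w = 0.5`, which is a contraction, and the names `step`, `solve`, and `path_gap` are introduced here purely for illustration. It compares the states reached from a zero initialization and a random one after a small and a large iteration budget.

```python
import math
import random


def step(z, x, w=0.5):
    """One application of a toy contractive update z <- tanh(w * z + x)."""
    return [math.tanh(w * zi + xi) for zi, xi in zip(z, x)]


def solve(z0, x, n_iters):
    """Apply the update n_iters times, starting from the state z0."""
    z = list(z0)
    for _ in range(n_iters):
        z = step(z, x)
    return z


def path_gap(x, n_iters, seed=0):
    """Largest coordinate-wise gap between states reached from two initializations."""
    rng = random.Random(seed)
    z_from_zero = solve([0.0] * len(x), x, n_iters)
    z_from_rand = solve([rng.uniform(-1.0, 1.0) for _ in x], x, n_iters)
    return max(abs(a - b) for a, b in zip(z_from_zero, z_from_rand))


x = [0.3, -1.2, 0.8]                 # hypothetical input injection
gap_short = path_gap(x, n_iters=2)   # small compute budget
gap_long = path_gap(x, n_iters=50)   # large compute budget
print(gap_short, gap_long)
```

Because the toy update shrinks distances by at least a factor of `w` per step, the gap between the two trajectories decays toward zero as the iteration count grows, which is the behaviour the abstract calls path independence. A learned equilibrium model carries no such guarantee, so there this kind of gap would have to be measured empirically.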
