Paper Title
Controlling Computation versus Quality for Neural Sequence Models
Paper Authors
Paper Abstract
Most neural networks utilize the same amount of compute for every example independent of the inherent complexity of the input. Further, methods that adapt the amount of computation to the example focus on finding a fixed inference-time computational graph per example, ignoring any external computational budgets or varying inference time limitations. In this work, we utilize conditional computation to make neural sequence models (Transformer) more efficient and computation-aware during inference. We first modify the Transformer architecture, making each set of operations conditionally executable depending on the output of a learned control network. We then train this model in a multi-task setting, where each task corresponds to a particular computation budget. This allows us to train a single model that can be controlled to operate on different points of the computation-quality trade-off curve, depending on the available computation budget at inference time. We evaluate our approach on two tasks: (i) WMT English-French Translation and (ii) Unsupervised representation learning (BERT). Our experiments demonstrate that the proposed Conditional Computation Transformer (CCT) is competitive with vanilla Transformers when allowed to utilize its full computational budget, while improving significantly over computationally equivalent baselines when operating on smaller computational budgets.
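To make the core idea concrete, below is a minimal sketch (not the authors' released implementation) of a Transformer feed-forward sub-layer whose execution is gated by a small learned control network conditioned on a scalar computation budget. The class names `ControlNetwork` and `GatedFeedForward`, the pooling scheme, and the 0.5 inference threshold are illustrative assumptions, not details taken from the paper.

```python
# Sketch only: a budget-conditioned gate that can (softly) skip a feed-forward block.
import torch
import torch.nn as nn

class ControlNetwork(nn.Module):
    """Predicts a gate in [0, 1] from the layer input and a target budget (assumed form)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model + 1, 1)

    def forward(self, x: torch.Tensor, budget: float) -> torch.Tensor:
        # Pool over the sequence dimension and append the budget scalar.
        pooled = x.mean(dim=1)                                   # (batch, d_model)
        b = torch.full((x.size(0), 1), budget, device=x.device)  # (batch, 1)
        return torch.sigmoid(self.proj(torch.cat([pooled, b], dim=-1)))

class GatedFeedForward(nn.Module):
    """Feed-forward sub-layer that the control network can scale down or skip."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.control = ControlNetwork(d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, budget: float) -> torch.Tensor:
        gate = self.control(x, budget)            # (batch, 1)
        if not self.training:
            # Hard decision at inference: skipped examples pay no feed-forward cost.
            gate = (gate > 0.5).float()
            if gate.sum() == 0:
                return x
        return self.norm(x + gate.unsqueeze(-1) * self.ff(x))

# Usage: the same layer evaluated under two different inference-time budgets.
layer = GatedFeedForward(d_model=64, d_ff=256).eval()
tokens = torch.randn(2, 10, 64)                   # (batch, seq_len, d_model)
low_budget_out = layer(tokens, budget=0.2)
full_budget_out = layer(tokens, budget=1.0)
```

In this reading, the soft sigmoid gate keeps training differentiable, while the hard threshold at inference realizes the conditional execution; feeding different budget values to the same weights corresponds to selecting different points on the computation-quality trade-off curve described in the abstract.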