Paper Title
Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment
Paper Authors
Paper Abstract
The scaling hypothesis motivates the expansion of models past trillions of parameters as a path towards better performance. Recent significant developments, such as GPT-3, have been driven by this conjecture. However, as models scale up, training them efficiently with backpropagation becomes difficult. Because model, pipeline, and data parallelism distribute parameters and gradients over compute nodes, communication is challenging to orchestrate: this is a bottleneck to further scaling. In this work, we argue that alternative training methods can mitigate these issues and can inform the design of extreme-scale training hardware. Indeed, using a synaptically asymmetric method with a parallelizable backward pass, such as Direct Feedback Alignment, communication needs are drastically reduced. We present a photonic accelerator for Direct Feedback Alignment, able to compute random projections with trillions of parameters. We demonstrate our system on benchmark tasks, using both fully-connected and graph convolutional networks. Our hardware is the first architecture-agnostic photonic co-processor for training neural networks. This is a significant step towards building scalable hardware able to go beyond backpropagation, opening new avenues for deep learning.
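To make the training method concrete, below is a minimal NumPy sketch of Direct Feedback Alignment (DFA) for a small multilayer perceptron. All dimensions, activations, and hyperparameters here are hypothetical choices for illustration, not the paper's setup; in the described system, the fixed random projections (the `B @ e` products below) are the operation offloaded to the photonic co-processor, at far larger scale. The key property is that each layer's teaching signal is an independent random projection of the output error, so the backward pass has no serial chain through the forward weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): input, hidden, output, batch size.
d_in, d_h, d_out, n = 32, 64, 10, 256

# Trainable forward weights.
W1 = rng.normal(0, 1 / np.sqrt(d_in), (d_h, d_in))
W2 = rng.normal(0, 1 / np.sqrt(d_h), (d_h, d_h))
W3 = rng.normal(0, 1 / np.sqrt(d_h), (d_out, d_h))

# Fixed random feedback matrices: DFA projects the output error directly
# to each hidden layer, replacing the transposed forward weights used by
# backpropagation. These never change during training.
B1 = rng.normal(0, 1 / np.sqrt(d_out), (d_h, d_out))
B2 = rng.normal(0, 1 / np.sqrt(d_out), (d_h, d_out))

def dtanh(a):
    """Derivative of tanh evaluated at pre-activation a."""
    return 1.0 - np.tanh(a) ** 2

# Toy data: random inputs and random one-hot targets.
X = rng.normal(size=(d_in, n))
Y = np.eye(d_out)[rng.integers(0, d_out, n)].T

lr = 0.05
for step in range(200):
    # Forward pass.
    a1 = W1 @ X
    h1 = np.tanh(a1)
    a2 = W2 @ h1
    h2 = np.tanh(a2)
    logits = W3 @ h2
    p = np.exp(logits - logits.max(axis=0))
    p /= p.sum(axis=0)

    # Output error for softmax + cross-entropy.
    e = p - Y

    # DFA backward pass: each hidden layer receives an independent random
    # projection of e. The two projections below do not depend on each
    # other, so they can be computed in parallel (optically, in the paper).
    d2 = (B2 @ e) * dtanh(a2)
    d1 = (B1 @ e) * dtanh(a1)

    # Weight updates from local activations and the projected errors.
    W3 -= lr * (e @ h2.T) / n
    W2 -= lr * (d2 @ h1.T) / n
    W1 -= lr * (d1 @ X.T) / n
```

In this sketch, the only cross-layer communication in the backward pass is broadcasting the single output-error vector `e`, which is what reduces the orchestration burden relative to backpropagation's layer-by-layer gradient chain.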