Paper Title
Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment
Paper Authors
Paper Abstract
The scaling hypothesis motivates the expansion of models past trillions of parameters as a path towards better performance. Recent significant developments, such as GPT-3, have been driven by this conjecture. However, as models scale up, training them efficiently with backpropagation becomes difficult. Because model, pipeline, and data parallelism distribute parameters and gradients over compute nodes, communication is challenging to orchestrate: this is a bottleneck to further scaling. In this work, we argue that alternative training methods can mitigate these issues and can inform the design of extreme-scale training hardware. Indeed, using a synaptically asymmetric method with a parallelizable backward pass, such as Direct Feedback Alignment, communication needs are drastically reduced. We present a photonic accelerator for Direct Feedback Alignment, able to compute random projections with trillions of parameters. We demonstrate our system on benchmark tasks, using both fully-connected and graph convolutional networks. Our hardware is the first architecture-agnostic photonic co-processor for training neural networks. This is a significant step towards building scalable hardware able to go beyond backpropagation, opening new avenues for deep learning.
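To make the training method concrete, below is a minimal NumPy sketch of Direct Feedback Alignment (DFA) for a small multilayer perceptron. All dimensions, activations, and hyperparameters here are hypothetical choices for illustration, not the paper's setup; in the described system, the fixed random projections (the `B @ e` products below) are the operation offloaded to the photonic co-processor, at far larger scale. The key property is that each layer's teaching signal is an independent random projection of the output error, so the backward pass has no serial chain through the forward weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): input, hidden, output, batch size.
d_in, d_h, d_out, n = 32, 64, 10, 256

# Trainable forward weights.
W1 = rng.normal(0, 1 / np.sqrt(d_in), (d_h, d_in))
W2 = rng.normal(0, 1 / np.sqrt(d_h), (d_h, d_h))
W3 = rng.normal(0, 1 / np.sqrt(d_h), (d_out, d_h))

# Fixed random feedback matrices: DFA projects the output error directly
# to each hidden layer, replacing the transposed forward weights used by
# backpropagation. These never change during training.
B1 = rng.normal(0, 1 / np.sqrt(d_out), (d_h, d_out))
B2 = rng.normal(0, 1 / np.sqrt(d_out), (d_h, d_out))

def dtanh(a):
    """Derivative of tanh evaluated at pre-activation a."""
    return 1.0 - np.tanh(a) ** 2

# Toy data: random inputs and random one-hot targets.
X = rng.normal(size=(d_in, n))
Y = np.eye(d_out)[rng.integers(0, d_out, n)].T

lr = 0.05
for step in range(200):
    # Forward pass.
    a1 = W1 @ X
    h1 = np.tanh(a1)
    a2 = W2 @ h1
    h2 = np.tanh(a2)
    logits = W3 @ h2
    p = np.exp(logits - logits.max(axis=0))
    p /= p.sum(axis=0)

    # Output error for softmax + cross-entropy.
    e = p - Y

    # DFA backward pass: each hidden layer receives an independent random
    # projection of e. The two projections below do not depend on each
    # other, so they can be computed in parallel (optically, in the paper).
    d2 = (B2 @ e) * dtanh(a2)
    d1 = (B1 @ e) * dtanh(a1)

    # Weight updates from local activations and the projected errors.
    W3 -= lr * (e @ h2.T) / n
    W2 -= lr * (d2 @ h1.T) / n
    W1 -= lr * (d1 @ X.T) / n
```

In this sketch, the only cross-layer communication in the backward pass is broadcasting the single output-error vector `e`, which is what reduces the orchestration burden relative to backpropagation's layer-by-layer gradient chain.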