Paper Title
A Fully Tensorized Recurrent Neural Network
Paper Authors
Paper Abstract
Recurrent neural networks (RNNs) are powerful tools for sequential modeling, but typically require significant overparameterization and regularization to achieve optimal performance. This leads to difficulties in the deployment of large RNNs in resource-limited settings, while also introducing complications in hyperparameter selection and training. To address these issues, we introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell using a lightweight tensor-train (TT) factorization. This approach represents a novel form of weight sharing which reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs. Experiments on image classification and speaker verification tasks demonstrate further benefits for reducing inference times and stabilizing model training and hyperparameter selection.
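To make the core idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of how the concatenated input-to-hidden and hidden-to-hidden weight matrices of a vanilla RNN cell could be stored as a single tensor-train factorization. The helper names (`tt_to_matrix`, `rnn_step`), the choice of dimensions, and the naive reconstruction of the full matrix at each step are all hypothetical simplifications for exposition; the paper's actual architecture, gating, and contraction strategy may differ.

```python
import numpy as np

def tt_to_matrix(cores, row_dims, col_dims):
    """Contract a list of TT cores into a full weight matrix.

    Each core has shape (r_{k-1}, row_dims[k], col_dims[k], r_k),
    with boundary ranks r_0 = r_d = 1. The contracted tensor is
    reshaped into a (prod(row_dims), prod(col_dims)) matrix.
    """
    full = cores[0]
    for core in cores[1:]:
        # Contract the trailing TT rank of `full` with the leading rank of `core`.
        full = np.tensordot(full, core, axes=([-1], [0]))
    # `full` now has shape (1, m1, n1, m2, n2, ..., md, nd, 1).
    full = full.squeeze(axis=(0, -1))
    d = len(cores)
    # Reorder to (m1, ..., md, n1, ..., nd), then flatten to a matrix.
    perm = list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2))
    full = full.transpose(perm)
    return full.reshape(int(np.prod(row_dims)), int(np.prod(col_dims)))

def rnn_step(tt_cores, row_dims, col_dims, x_t, h_prev, b):
    """One vanilla RNN step with the stacked weights [W_x; W_h]
    encoded jointly by a single TT-factorized matrix (illustrative only)."""
    W = tt_to_matrix(tt_cores, row_dims, col_dims)   # (input_dim + hidden_dim, hidden_dim)
    z = np.concatenate([x_t, h_prev]) @ W + b
    return np.tanh(z)

# Hypothetical sizes: 64-dim input and 64-dim hidden state, so the stacked
# weight matrix is 128 x 64, factorized into three TT cores of rank 2.
row_dims, col_dims, rank = [4, 4, 8], [4, 4, 4], 2
ranks = [1, rank, rank, 1]
rng = np.random.default_rng(0)
cores = [rng.standard_normal((ranks[k], row_dims[k], col_dims[k], ranks[k + 1])) * 0.1
         for k in range(3)]
h = rnn_step(cores, row_dims, col_dims,
             x_t=rng.standard_normal(64), h_prev=np.zeros(64), b=np.zeros(64))
print(h.shape)  # (64,)
```

The parameter count here is the sum of the core sizes (roughly a few hundred values) rather than the 128 x 64 entries of the dense matrix, which is the source of the compression the abstract describes; the specific ranks and factor dimensions above are illustrative choices, not values from the paper.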