Paper title
Continuous Depth Recurrent Neural Differential Equations
Paper authors
Paper abstract
Recurrent neural networks (RNNs) have driven many advances in sequence labeling tasks and in modeling sequence data. However, their effectiveness is limited when the observations in a sequence are irregularly sampled, that is, when they arrive at irregular time intervals. To address this, continuous-time variants of RNNs were introduced based on neural ordinary differential equations (NODE). These models learn a better representation of the data by transforming the hidden states continuously over time, taking into account the time interval between observations. However, they remain limited in capability because they apply discrete transformations through a fixed, discrete number of layers (depth) to each input in the sequence to produce the output observation. We address this limitation by proposing RNNs based on differential equations that model continuous transformations over both depth and time to predict an output for a given input in the sequence. Specifically, we propose continuous depth recurrent neural differential equations (CDR-NDE), which generalize RNN models by continuously evolving the hidden states in both the temporal and depth dimensions. CDR-NDE uses two separate differential equations, one over each of these dimensions, and models the evolution in the temporal and depth directions alternately. We also propose the CDR-NDE-heat model, based on partial differential equations, which treats the computation of hidden states as solving a heat equation over time. We demonstrate the effectiveness of the proposed models by comparing them against state-of-the-art RNN models on real-world sequence labeling problems and data.
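To make the idea of alternating evolution over time and depth more concrete, the following is a minimal illustrative sketch and not the paper's formulation. The abstract gives no equations, so everything here is an assumption: the dynamics functions `f_time` and `g_depth`, the weight names `W_h`, `W_x`, and `U`, the use of explicit Euler steps, and the choice to feed the depth-evolved state back into the next time step. The weights are random rather than trained; the sketch only shows how irregular time gaps and a continuous depth variable could enter the hidden-state update.

```python
import numpy as np

def f_time(h, x, W_h, W_x):
    # Hypothetical dynamics along the time axis: dh/dt = tanh(W_h h + W_x x) - h
    return np.tanh(W_h @ h + W_x @ x) - h

def g_depth(h, U):
    # Hypothetical dynamics along the depth axis: dh/dd = tanh(U h) - h
    return np.tanh(U @ h) - h

def cdr_sketch(xs, ts, depth_horizon=1.0, n_depth_steps=4, hidden=8, seed=0):
    """Alternate one Euler step in time with several Euler steps in depth.

    xs: array of shape (T, n_in), one input per observation.
    ts: increasing positive observation times of shape (T,), possibly irregular.
    Returns an array of shape (T, hidden) with one hidden state per observation.
    """
    rng = np.random.default_rng(seed)
    n_in = xs.shape[1]
    # Untrained random weights, used only for illustration.
    W_h = rng.normal(scale=0.3, size=(hidden, hidden))
    W_x = rng.normal(scale=0.3, size=(hidden, n_in))
    U = rng.normal(scale=0.3, size=(hidden, hidden))

    h = np.zeros(hidden)
    t_prev = 0.0
    dd = depth_horizon / n_depth_steps  # step size along the depth axis
    outputs = []
    for x, t in zip(xs, ts):
        dt = t - t_prev  # observed gap; varies for irregularly sampled data
        # Temporal direction: one Euler step scaled by the gap.
        h = h + dt * f_time(h, x, W_h, W_x)
        # Depth direction: integrate over a continuous depth interval.
        for _ in range(n_depth_steps):
            h = h + dd * g_depth(h, U)
        outputs.append(h.copy())
        t_prev = t
    return np.stack(outputs)

if __name__ == "__main__":
    xs = np.ones((5, 3))
    ts = np.array([0.1, 0.4, 0.5, 1.3, 1.4])  # irregular sampling times
    print(cdr_sketch(xs, ts).shape)
```

In this sketch the same inputs observed at different times produce different hidden states, because the time gap scales the temporal update. A practical model would presumably replace the Euler steps with an adaptive ODE solver and learn the weights by backpropagation.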