Paper Title
Decoupled Appearance and Motion Learning for Efficient Anomaly Detection in Surveillance Video
Paper Authors
Paper Abstract
Automating the analysis of surveillance video footage is of great interest when urban environments or industrial sites are monitored by a large number of cameras. As anomalies are often context-specific, it is hard to predefine events of interest and collect labelled training data, so a purely unsupervised approach to automated anomaly detection is much more suitable. For every camera, a separate algorithm could then be deployed that learns, over time, a baseline model of appearance- and motion-related features of the objects within the camera viewport. Anything that deviates from this baseline is flagged as an anomaly for further analysis downstream. We propose a new neural network architecture that learns normal behavior in a purely unsupervised fashion. In contrast to previous work, we use latent code prediction as our anomaly metric. We show that this outperforms reconstruction-based and frame-prediction-based methods on different benchmark datasets, both in terms of accuracy and robustness against changing lighting and weather conditions. By decoupling the appearance and motion models, our model can also process 16 to 45 times more frames per second than related approaches, which makes it suitable for deployment on the camera itself or on other edge devices.
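The core idea described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual architecture: it assumes an appearance encoder that maps frames to latent codes and a separate motion model that predicts the next latent code from a short history of codes, with the anomaly score defined as the prediction error in latent space rather than in pixel space. All module names, layer sizes, and the 4-frame history length are hypothetical choices.

```python
import torch
import torch.nn as nn

class AppearanceEncoder(nn.Module):
    """Encodes a single grayscale frame into a compact latent code (assumed 64-d)."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class MotionPredictor(nn.Module):
    """Predicts the next frame's latent code from a short history of codes."""
    def __init__(self, latent_dim=64, history=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim * history, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, codes):            # codes: (batch, history, latent_dim)
        return self.net(codes.flatten(1))

def anomaly_score(encoder, predictor, frames):
    """Latent-code prediction error for the last frame of a clip.

    frames: (batch, history + 1, 1, H, W); the first `history` frames are the
    context, the last frame is the one being scored.
    """
    b, t, c, h, w = frames.shape
    codes = encoder(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
    predicted = predictor(codes[:, :-1])             # predicted next latent code
    actual = codes[:, -1]                            # code of the observed frame
    return (predicted - actual).pow(2).mean(dim=1)   # higher = more anomalous

if __name__ == "__main__":
    enc, pred = AppearanceEncoder(), MotionPredictor()
    clip = torch.randn(2, 5, 1, 64, 64)              # two dummy 5-frame clips
    print(anomaly_score(enc, pred, clip))
```

Because scoring only requires one encoder pass per frame plus a small predictor over low-dimensional codes, this decoupled setup avoids decoding full frames at inference time, which is consistent with the efficiency argument made in the abstract.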