Paper Title


TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos

Paper Authors

Declan McIntosh, Tunai Porto Marques, Alexandra Branzan Albu, Rodney Rountree, Fabio De Leo

Paper Abstract


Recent advancements in cabled ocean observatories have increased the quality and prevalence of underwater videos; this data enables the extraction of high-level, biologically relevant information such as species' behaviours. Despite this increase in capability, most modern methods for the automatic interpretation of underwater videos focus only on the detection and counting of organisms. We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos. TempNet uses an encoder bridge and residual blocks to maintain model performance across its two-stage encoder, which processes spatial and then temporal information. TempNet also applies temporal attention during spatial encoding, as well as Wavelet Down-Sampling pre-processing, to improve model accuracy. Although our system is designed for application to diverse fish behaviours (i.e., it is generic), we demonstrate its use in detecting sablefish (Anoplopoma fimbria) startle events. We compare the proposed approach with a state-of-the-art end-to-end video detection method (ReMotENet) and a hybrid method previously proposed exclusively for the detection of sablefish startle events in videos from an existing dataset. Results show that our novel method comfortably outperforms the comparison baselines on multiple metrics, reaching a per-clip accuracy and precision of 80% and 0.81, respectively. This represents a relative improvement of 31% in accuracy and 27% in precision over the compared methods on this dataset. Our computational pipeline is also highly efficient, as it processes each 4-second video clip in only 38 ms. Furthermore, since it does not employ features specific to sablefish startle events, our system can be easily extended to other behaviours in future work.
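The abstract mentions temporal attention applied during encoding. As a rough intuition for what temporal attention does over a clip, the sketch below implements generic dot-product attention pooling across per-frame feature vectors: frames whose features align with a learned query receive higher weights in the pooled representation. This is a minimal, dependency-free illustration of the general mechanism, not TempNet's actual architecture; the function name, query formulation, and feature shapes are all assumptions for demonstration.

```python
import math


def temporal_attention(frame_feats, query):
    """Attention-weighted pooling over per-frame feature vectors.

    frame_feats: list of T feature vectors (one per frame), each of length D.
    query: a length-D vector (e.g. a learned parameter) used to score frames.
    Returns the attention-pooled feature vector and the per-frame weights.
    NOTE: generic dot-product attention for illustration only; TempNet's
    temporal attention operates inside its spatial encoder and differs.
    """
    # Dot-product relevance score for each frame.
    scores = [sum(q * f for q, f in zip(query, feats)) for feats in frame_feats]
    # Numerically stable softmax turns scores into weights summing to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of frame features: frames relevant to the query dominate.
    dim = len(frame_feats[0])
    pooled = [
        sum(weights[t] * frame_feats[t][d] for t in range(len(frame_feats)))
        for d in range(dim)
    ]
    return pooled, weights
```

With three frames and a query matching the first feature dimension, frames expressing that dimension receive proportionally larger weights, so sudden-motion frames (in a startle-detection setting) would dominate the pooled clip representation.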
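The abstract also cites Wavelet Down-Sampling pre-processing. The idea behind wavelet down-sampling in general is that a single-level 2D Haar decomposition halves spatial resolution while keeping all of the input's information as four subbands (one low-pass approximation plus three detail bands), unlike plain strided pooling, which discards detail. The pure-Python sketch below assumes a single-level orthonormal Haar transform on a 2D grayscale array; the paper's exact wavelet, normalization, and channel layout may differ.

```python
def haar_downsample(img):
    """Single-level 2D Haar wavelet decomposition (illustrative sketch).

    Splits an H x W image (H, W even) into four H/2 x W/2 subbands:
    LL (low-pass approximation) and LH, HL, HH (detail bands). Stacking
    the subbands as channels halves resolution losslessly.
    """
    h, w = len(img), len(img[0])
    half_h, half_w = h // 2, w // 2
    ll = [[0.0] * half_w for _ in range(half_h)]
    lh = [[0.0] * half_w for _ in range(half_h)]
    hl = [[0.0] * half_w for _ in range(half_h)]
    hh = [[0.0] * half_w for _ in range(half_h)]
    for i in range(half_h):
        for j in range(half_w):
            # The four pixels of each non-overlapping 2x2 block.
            a = img[2 * i][2 * j]          # top-left
            b = img[2 * i][2 * j + 1]      # top-right
            c = img[2 * i + 1][2 * j]      # bottom-left
            d = img[2 * i + 1][2 * j + 1]  # bottom-right
            # Orthonormal Haar: sums/differences scaled by 1/2.
            ll[i][j] = (a + b + c + d) / 2.0  # local average (low-pass)
            lh[i][j] = (a - b + c - d) / 2.0  # horizontal detail
            hl[i][j] = (a + b - c - d) / 2.0  # vertical detail
            hh[i][j] = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh
```

On a constant image the three detail bands are exactly zero and all information lands in LL, which is why this transform can feed a CNN a smaller input without the information loss of ordinary subsampling.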
