A Streaming Machine Learning Framework for Online Aggression Detection on Twitter

Paper Title

A Streaming Machine Learning Framework for Online Aggression Detection on Twitter

Authors

Herodotos Herodotou, Despoina Chatzakou, Nicolas Kourtellis

Abstract

The rise of online aggression on social media is evolving into a major point of concern. Several machine and deep learning approaches have been proposed recently for detecting various types of aggressive behavior. However, social media are fast paced, generating an increasing amount of content, while aggressive behavior evolves over time. In this work, we introduce the first, practical, real-time framework for detecting aggression on Twitter via embracing the streaming machine learning paradigm. Our method adapts its ML classifiers in an incremental fashion as it receives new annotated examples and is able to achieve the same (or even higher) performance as batch-based ML models, with over 90% accuracy, precision, and recall. At the same time, our experimental analysis on real Twitter data reveals how our framework can easily scale to accommodate the entire Twitter Firehose (of 778 million tweets per day) with only 3 commodity machines. Finally, we show that our framework is general enough to detect other related behaviors such as sarcasm, racism, and sexism in real time.
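The incremental adaptation described in the abstract can be illustrated with a minimal sketch. It is not the authors' framework: it assumes a plain online logistic regression updated by stochastic gradient descent and evaluated in a test-then-train (prequential) loop. The class `OnlineLogisticRegression`, the helpers `prequential_accuracy`, `synthetic_stream`, and `tweets_per_second`, and the two toy numeric features are all hypothetical stand-ins for real annotated tweets and their features. The last helper only works out the per-machine rate implied by the Firehose figure quoted above.

```python
import math
import random


class OnlineLogisticRegression:
    """Binary logistic regression updated one example at a time (SGD).

    Hypothetical illustration only; the paper's classifiers may differ.
    """

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        # Gradient of the log loss for a single labelled example.
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def prequential_accuracy(model, stream):
    """Test-then-train: predict each example first, then learn from its label."""
    correct = 0
    total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


def synthetic_stream(n, seed=0):
    """Toy stand-in for annotated tweets: label is 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, (1 if x[0] + x[1] > 0 else 0)


def tweets_per_second(tweets_per_day, machines=1):
    """Average rate each machine must sustain for a given daily volume."""
    return tweets_per_day / 86_400 / machines


if __name__ == "__main__":
    model = OnlineLogisticRegression(n_features=2)
    print("prequential accuracy:", prequential_accuracy(model, synthetic_stream(5000)))
    print("tweets/s per machine:", tweets_per_second(778_000_000, machines=3))
```

Because each example is scored before the model learns from it, the reported accuracy reflects performance on unseen data without a separate held-out set. For scale, 778 million tweets per day averages roughly 9,005 tweets per second overall, or about 3,002 per machine when spread evenly across 3 machines.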
