Paper title

Does Twitter know your political views? POLiTweets dataset and semi-automatic method for political leaning discovery

Authors

Joanna Baran, Michał Kajstura, Maciej Ziółkowski, Krzysztof Rajda

Abstract

Every day, the world is flooded by millions of messages and statements posted on Twitter or Facebook. Social media platforms try to protect users' personal data, but there is still a real risk of misuse, including election manipulation. Did you know that only 13 posts addressing socially important or controversial topics are enough to predict a person's political affiliation with a 0.85 F1-score? To examine this phenomenon, we created a novel universal method of semi-automated political leaning discovery. It relies on a heuristic data annotation procedure, which was evaluated to achieve 0.95 agreement with human annotators (counted as an accuracy metric). We also present POLiTweets, the first publicly open Polish dataset for political affiliation discovery in a multi-party setup, consisting of over 147k tweets from almost 10k Polish-writing users annotated heuristically and almost 40k tweets from 166 users annotated manually as a test set. We used our data to study aspects of domain shift across topics and across types of content writers: ordinary citizens vs. professional politicians.
