Discourse Parsing of Contentious, Non-Convergent Online Discussions

Title

Discourse Parsing of Contentious, Non-Convergent Online Discussions

Authors

Stepan Zakharov, Omri Hadar, Tovit Hakak, Dina Grossman, Yifat Ben-David Kolikant, Oren Tsur

Abstract

Online discourse is often perceived as polarized and unproductive. While some conversational discourse parsing frameworks are available, they do not naturally lend themselves to the analysis of contentious and polarizing discussions. Inspired by the Bakhtinian theory of Dialogism, we propose a novel theoretical and computational framework, better suited for non-convergent discussions. We redefine the measure of a successful discussion, and develop a novel discourse annotation schema which reflects a hierarchy of discursive strategies. We consider an array of classification models, from Logistic Regression to BERT. We also consider various feature types and representations, e.g., LIWC categories, standard embeddings, conversational sequences, and non-conversational discourse markers learnt separately. Given the 31 labels in the tagset, an average F-score of 0.61 is achieved if we allow a different model for each tag, and 0.526 with a single model. The promising results achieved in annotating discussions according to the proposed schema pave the way for a number of downstream tasks and applications such as early detection of discussion trajectories, active moderation of open discussions, and teacher-assistive bots. Finally, we share the first labeled dataset of contentious non-convergent online discussions.
