Paper Title
Conversational Pattern Mining using Motif Detection
Paper Authors
Paper Abstract
Conversational mining has attracted great interest recently, driven by the explosion of social and other online media. Complementing this explosion of text are advances in pre-trained language models, which help us leverage these sources of information. Conversations are an interesting domain to analyse, both for their complexity and for their value. The complexity arises because a conversation can be asynchronous and can involve multiple parties; conversations are also computationally intensive to process. We use unsupervised methods to develop a conversational pattern mining technique that does not require time-consuming, knowledge-demanding and resource-intensive labelling exercises. The task of identifying repeating patterns in sequences is well researched in the field of bioinformatics. We adapt this task to Natural Language Processing and make several extensions to a motif detection algorithm. To demonstrate the algorithm on a dynamic, real-world data set, we extract motifs from an open-source film script data source and run an exploratory investigation into the types of motifs we are able to mine.