Paper Title

ClioQuery: Interactive Query-Oriented Text Analytics for Comprehensive Investigation of Historical News Archives

Authors

Abram Handler, Narges Mahyar, Brendan O'Connor

Abstract

Historians and archivists often find and analyze the occurrences of query words in newspaper archives, to help answer fundamental questions about society. But much work in text analytics focuses on helping people investigate other textual units, such as events, clusters, ranked documents, entity relationships, or thematic hierarchies. Informed by a study into the needs of historians and archivists, we thus propose ClioQuery, a text analytics system uniquely organized around the analysis of query words in context. ClioQuery applies text simplification techniques from natural language processing to help historians quickly and comprehensively gather and analyze all occurrences of a query word across an archive. It also pairs these new NLP methods with more traditional features like linked views and in-text highlighting to help engender trust in summarization techniques. We evaluate ClioQuery with two separate user studies, in which historians explain how ClioQuery's novel text simplification features can help facilitate historical research. We also evaluate with a separate quantitative comparison study, which shows that ClioQuery helps crowdworkers find and remember historical information. Such results suggest possible new directions for text analytics in other query-oriented settings.
