Paper Title

DetAIL: A Tool to Automatically Detect and Analyze Drift In Language

Paper Authors

Nishtha Madaan, Adithya Manjunatha, Hrithik Nambiar, Aviral Kumar Goel, Harivansh Kumar, Diptikalyan Saha, Srikanta Bedathur

Paper Abstract

Machine learning and deep learning-based decision making have become part of today's software. The goal of this work is to ensure that machine learning and deep learning-based systems are as trusted as traditional software. Traditional software is made dependable by following rigorous practices such as static analysis, testing, debugging, verification, and repair throughout the development and maintenance life-cycle. Similarly, machine learning systems must be kept up to date so that their performance is not compromised. To this end, current systems rely on scheduled re-training of these models as new data arrives. In this work, we propose instead to measure the data drift that occurs as new data arrives, so that models can be adaptively re-trained whenever re-training is actually required, irrespective of schedules. In addition, we generate explanations at both the sentence level and the dataset level to capture why a given payload text has drifted.
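The abstract's central mechanism, measuring drift on incoming payload data and re-training only when the drift is significant, can be sketched as below. This is a minimal illustration under assumed choices, not the DetAIL implementation: the two-sample Kolmogorov-Smirnov test over embedding dimensions, the `detect_drift` helper, and `DRIFT_THRESHOLD` are all hypothetical stand-ins for whatever drift measure the tool actually uses.

```python
# Minimal sketch of drift-triggered re-training. NOT the DetAIL method:
# the KS test, threshold, and all names here are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

DRIFT_THRESHOLD = 0.05  # assumed overall significance level


def detect_drift(reference: np.ndarray, payload: np.ndarray) -> bool:
    """Flag drift if any feature dimension of the payload batch differs
    significantly from the reference (training-time) batch.

    Both arrays are (n_samples, n_features), e.g. sentence embeddings
    of the reference corpus and of the incoming payload texts.
    """
    n_features = reference.shape[1]
    for j in range(n_features):
        stat, p_value = ks_2samp(reference[:, j], payload[:, j])
        # Bonferroni correction keeps the overall false-alarm rate bounded.
        if p_value < DRIFT_THRESHOLD / n_features:
            return True
    return False


# Usage: re-train only when drift is actually detected, not on a schedule.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 8))  # seen at training time
payload = rng.normal(0.5, 1.0, size=(500, 8))    # shifted incoming batch
if detect_drift(reference, payload):
    print("Drift detected: trigger adaptive re-training.")
```

A per-dimension test with a multiple-comparison correction is one simple way to bound false alarms; a production system would likely persist summary statistics of the reference data rather than the raw training batch, and would pair the drift flag with the sentence- and dataset-level explanations the paper describes.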
