Efficient Classification with Counterfactual Reasoning and Active Learning

Paper Title

Efficient Classification with Counterfactual Reasoning and Active Learning

Authors

Azhar Mohammed, Dang Nguyen, Bao Duong, Thin Nguyen

Abstract

Data augmentation is one of the most successful techniques for improving the classification accuracy of machine learning models in computer vision. However, applying data augmentation to tabular data is challenging because it is hard to generate synthetic samples with labels. In this paper, we propose an efficient classifier with a novel data augmentation technique for tabular data. Our method, called CCRAL, combines causal reasoning, to learn counterfactual samples for the original training samples, with active learning, to select useful counterfactual samples based on a region of uncertainty. By doing this, our method can maximize the model's generalization on unseen test data. We validate our method analytically and compare it with standard baselines. Our experimental results show that CCRAL achieves significantly better performance than the baselines across several real-world tabular datasets in terms of accuracy and AUC. Data and source code are available at: https://github.com/nphdang/CCRAL.
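
The selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the region of uncertainty is defined, so the sketch assumes a binary classifier and treats the region as a band of width `margin` around the 0.5 decision boundary. The function names `select_uncertain` and `augment`, the `margin` parameter, and the assumption that labeled candidate samples are already available are all hypothetical; generating the counterfactual candidates themselves is not shown.

```python
import numpy as np


def select_uncertain(probs, margin=0.1):
    """Return indices of candidates whose predicted positive-class
    probability lies within `margin` of 0.5 (assumed uncertainty region)."""
    probs = np.asarray(probs, dtype=float)
    return np.flatnonzero(np.abs(probs - 0.5) <= margin)


def augment(X_train, y_train, X_cand, y_cand, cand_probs, margin=0.1):
    """Append only the uncertain candidates to the training set.

    X_cand / y_cand are candidate samples and their labels (hypothetical
    inputs); cand_probs are the current model's predicted probabilities
    for those candidates.
    """
    idx = select_uncertain(cand_probs, margin)
    X_aug = np.vstack([X_train, X_cand[idx]])
    y_aug = np.concatenate([y_train, y_cand[idx]])
    return X_aug, y_aug, idx
```

In such a scheme the classifier would typically be retrained on the augmented set, and the selection repeated with the updated probabilities; a smaller `margin` admits fewer, more ambiguous candidates.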

Paper Title