计算良好模型的规则列表的集合

论文标题

计算良好模型的规则列表的集合

Computing the Collection of Good Models for Rule Lists

论文作者

Mata, Kota, Kanamori, Kentaro, Arimura, Hiroki

论文摘要

自2001年布雷曼（Breiman）开创性论文以来，他指出了从可解释的AI的角度来指出预测多重性的潜在危害，对所有良好模型的集合（也称为“ Rashomon set”）的全球分析在过去几年中引起了很多关注。由于找到这样一组好的模型是一个严重的计算问题，因此到目前为止，该问题只有几种算法，其中大多数是近似或不完整的。为了克服这一困难，我们研究了所有良好模型的有效枚举，用于可解释模型的子类，称为规则列表。基于Angelino等人提出的最佳最佳规则清单学习者Corels。在2017年，我们提出了一种有效的枚举算法Corelsenum，用于精确地使用输入大小中的多项式空间计算一组所有良好模型，给定数据集和最佳模型的错误容忍度。 By experiments with the COMPAS dataset on recidivism prediction, our algorithm CorelsEnum successfully enumerated all of several tens of thousands of good rule lists of length at most $\ell = 3$ in around 1,000 seconds, while a state-of-the-art top-$K$ rule list learner based on Lawler's method combined with CORELS, proposed by Hara and Ishihata in 2018, found only 40 models until the timeout of 6,000秒。对于全球分析，我们进行了实验，以表征Rashomon集合，并观察到模型的模型多样性多样性。

Since the seminal paper by Breiman in 2001, who pointed out a potential harm of prediction multiplicities from the view of explainable AI, global analysis of a collection of all good models, also known as a `Rashomon set,' has been attracted much attention for the last years. Since finding such a set of good models is a hard computational problem, there have been only a few algorithms for the problem so far, most of which are either approximate or incomplete. To overcome this difficulty, we study efficient enumeration of all good models for a subclass of interpretable models, called rule lists. Based on a state-of-the-art optimal rule list learner, CORELS, proposed by Angelino et al. in 2017, we present an efficient enumeration algorithm CorelsEnum for exactly computing a set of all good models using polynomial space in input size, given a dataset and a error tolerance from an optimal model. By experiments with the COMPAS dataset on recidivism prediction, our algorithm CorelsEnum successfully enumerated all of several tens of thousands of good rule lists of length at most $\ell = 3$ in around 1,000 seconds, while a state-of-the-art top-$K$ rule list learner based on Lawler's method combined with CORELS, proposed by Hara and Ishihata in 2018, found only 40 models until the timeout of 6,000 seconds. For global analysis, we conducted experiments for characterizing the Rashomon set, and observed large diversity of models in predictive multiplicity and fairness of models.

下载PDF全文

下载文献需遵守相关版权规定

论文标题