Paper Title
Learning Interpretable Concept-Based Models with Human Feedback
Paper Authors
Paper Abstract
Machine learning models that first learn a representation of a domain in terms of human-understandable concepts, then use it to make predictions, have been proposed to facilitate interpretation of and interaction with models trained on high-dimensional data. However, these methods have important limitations: the way they define concepts is not inherently interpretable, and they assume that concept labels either exist for individual instances or can easily be acquired from users. These limitations are particularly acute for high-dimensional tabular features. We propose an approach for learning a set of transparent concept definitions in high-dimensional tabular data that relies on users labeling concept features instead of individual instances. Our method produces concepts that both align with users' intuitive sense of what a concept means and facilitate prediction of the downstream label by a transparent machine learning model. This ensures that the full model is transparent and intuitive, and as predictive as possible given this constraint. We demonstrate with simulated user feedback on real prediction problems, including one in a clinical domain, that this kind of direct feedback is much more efficient at learning solutions that align with ground-truth concept definitions than alternative transparent approaches that rely on labeling instances or other existing interaction mechanisms, while maintaining similar predictive performance.
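To make the setup concrete, the following is a minimal sketch of what a transparent concept-based model over tabular features could look like when a user labels which features define each concept. This is an illustrative assumption-laden toy, not the authors' implementation: the concept names, feature indices, the use of the downstream label as a weak supervision signal for concept scorers, and the scikit-learn logistic regressions are all hypothetical choices.

```python
# Hypothetical sketch of a transparent concept-based model on tabular data.
# Assumption: a user has labeled which raw features belong to each concept
# ("concept feature labels"); each concept is scored by a small linear model
# over only its assigned features, and a linear model over the concept scores
# predicts the downstream label, so every stage stays inspectable.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical user feedback: concept name -> indices of features that define it.
concept_features = {
    "vitals_instability": [0, 1, 2],
    "lab_abnormality": [3, 4],
}

def fit_concept_scorers(X, y, concept_features):
    """Fit one transparent (linear) scorer per concept, restricted to the
    user-assigned features, using the downstream label as weak supervision."""
    scorers = {}
    for name, idx in concept_features.items():
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X[:, idx], y)
        scorers[name] = (idx, clf)
    return scorers

def concept_representation(X, scorers):
    """Map raw features to a low-dimensional, human-readable concept representation."""
    cols = [clf.predict_proba(X[:, idx])[:, 1] for idx, clf in scorers.values()]
    return np.column_stack(cols)

# Usage on simulated tabular data with 6 features and a binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

scorers = fit_concept_scorers(X, y, concept_features)
C = concept_representation(X, scorers)            # n_samples x n_concepts
final_model = LogisticRegression().fit(C, y)      # transparent downstream predictor
print(dict(zip(concept_features, final_model.coef_[0].round(2))))
```

Under this kind of structure, the paper's user feedback on concept features would amount to editing the feature-to-concept assignments rather than labeling individual instances; the sketch above fixes those assignments by hand purely for illustration.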