On genetic programming representations and fitness functions for interpretable dimensionality reduction

Paper title

On genetic programming representations and fitness functions for interpretable dimensionality reduction

Paper authors

Thomas Uriot, Marco Virgolin, Tanja Alderliesten, Peter Bosman

Abstract

Dimensionality reduction (DR) is an important technique for data exploration and knowledge discovery. However, most of the main DR methods are either linear (e.g., PCA), do not provide an explicit mapping between the original data and its lower-dimensional representation (e.g., MDS, t-SNE, Isomap), or produce mappings that cannot be easily interpreted (e.g., kernel PCA, neural autoencoders). Recently, genetic programming (GP) has been used to evolve interpretable DR mappings in the form of symbolic expressions. There are a number of ways in which GP can be used to this end, and no existing study compares them. In this paper, we fill this gap by comparing existing GP methods as well as devising new ones. We evaluate our methods on several benchmark datasets based on predictive accuracy and on how well the original features can be reconstructed using the lower-dimensional representation only. Finally, we qualitatively assess the resulting expressions and their complexity. We find that various GP methods can be competitive with state-of-the-art DR algorithms and that they have the potential to produce interpretable DR mappings.
