A Novel Random Forest Dissimilarity Measure for Multi-View Learning

Paper Title

A Novel Random Forest Dissimilarity Measure for Multi-View Learning

Authors

Hongliu Cao, Simon Bernard, Robert Sabourin, Laurent Heutte

Abstract

Multi-view learning is a learning task in which data is described by several concurrent representations. Its main challenge is most often to exploit the complementarities between these representations to help solve a classification/regression task. This is a challenge that can be met nowadays if there is a large amount of data available for learning. However, this is not necessarily true for all real-world problems, where data are sometimes scarce (e.g. problems related to the medical environment). In these situations, an effective strategy is to use intermediate representations based on the dissimilarities between instances. This work presents new ways of constructing these dissimilarity representations, learning them from data with Random Forest classifiers. More precisely, two methods are proposed, which modify the Random Forest proximity measure, to adapt it to the context of High Dimension Low Sample Size (HDLSS) multi-view classification problems. The second method, based on an Instance Hardness measurement, is significantly more accurate than other state-of-the-art measurements including the original RF Proximity measurement and the Large Margin Nearest Neighbor (LMNN) metric learning measurement.
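
For readers unfamiliar with the baseline that the abstract refers to, the original Random Forest proximity measure is commonly defined as the fraction of trees in which two instances fall into the same leaf, and a dissimilarity can be taken as one minus that proximity. The sketch below illustrates only this classic baseline, not the two modified measures proposed in the paper. It assumes scikit-learn is available; the helper name `rf_dissimilarity` and the synthetic dataset are hypothetical and chosen purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def rf_dissimilarity(forest, X):
    """Classic RF dissimilarity: 1 - (fraction of trees where two instances share a leaf).

    Hypothetical helper for illustration; this is the baseline proximity,
    not the modified measures proposed in the paper.
    """
    # leaves[i, t] is the index of the leaf that instance i reaches in tree t
    leaves = forest.apply(X)
    # same_leaf[i, j, t] is True when instances i and j share a leaf in tree t
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    proximity = same_leaf.mean(axis=2)
    return 1.0 - proximity


# Synthetic data with few samples and many features (an assumption that
# loosely mimics an HDLSS setting; it is not data from the paper).
X, y = make_classification(
    n_samples=40, n_features=200, n_informative=10, random_state=0
)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# D[i, j] is the dissimilarity between instances i and j
D = rf_dissimilarity(forest, X)
```

The resulting matrix `D` is symmetric with a zero diagonal, and each of its rows can serve as an intermediate dissimilarity representation of the corresponding instance. In a multi-view setting one such matrix could be computed per view; how the views are combined is not specified in the abstract, so it is left out of this sketch.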
