Paper title
The Delaunay Density Diagnostic
Paper authors
Paper abstract
Accurate approximation of a real-valued function depends on two aspects of the available data: the density of inputs within the domain of interest and the variation of the outputs over that domain. There are few methods for assessing whether the density of inputs is \textit{sufficient} to identify the relevant variations in outputs (i.e., the ``geometric scale'' of the function), despite the fact that sampling density is closely tied to the success or failure of an approximation method. In this paper, we introduce a general-purpose computational approach to detecting the geometric scale of real-valued functions over a fixed domain using a deterministic interpolation technique from computational geometry. The algorithm is intended to work on scalar data in moderate dimensions (2-10). Our algorithm is based on the observation that a sequence of piecewise linear interpolants will converge to a continuous function at a quadratic rate (in $L^2$ norm) if and only if the data are sampled densely enough to distinguish the feature from noise (assuming sufficiently regular sampling). We present numerical experiments demonstrating how our method can identify feature scale, estimate uncertainty in feature scale, and assess the sampling density for fixed (i.e., static) datasets of input-output pairs. We include analytical results in support of our numerical findings and have released lightweight code that can be adapted for use in a variety of data science settings.
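The convergence-rate observation at the heart of the abstract can be illustrated with a minimal sketch. This is not the authors' released code: it is a one-dimensional analogue written for illustration, in which sorting the nodes plays the role that a Delaunay triangulation plays in dimensions 2-10. The function names (`linear_interpolant`, `l2_difference`, `convergence_rates`), the uniform grids on [0, 1], and the two test functions are all assumptions made for this example. The sketch builds piecewise linear interpolants on successively refined grids, measures the $L^2$ difference between consecutive interpolants, and reports the empirical rate at which those differences shrink.

```python
import math
import random
from bisect import bisect_right


def linear_interpolant(xs, ys):
    """Piecewise linear interpolant through sorted nodes xs with values ys."""
    def interp(x):
        # Index of the interval [xs[i], xs[i + 1]] containing x, clamped to range.
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return (1.0 - t) * ys[i] + t * ys[i + 1]
    return interp


def l2_difference(f, g, n_quad=8192):
    """Midpoint-rule approximation of the L2 norm of f - g on [0, 1]."""
    h = 1.0 / n_quad
    total = 0.0
    for k in range(n_quad):
        x = (k + 0.5) * h
        total += (f(x) - g(x)) ** 2
    return math.sqrt(total * h)


def convergence_rates(func, sizes=(33, 65, 129, 257, 513)):
    """Empirical rates from successive interpolants on uniform grids.

    Each grid has half the spacing of the previous one (hypothetical
    default sizes), so the rate is log2 of the ratio of consecutive
    L2 differences between interpolants.
    """
    interpolants = []
    for n in sizes:
        xs = [k / (n - 1) for k in range(n)]
        interpolants.append(linear_interpolant(xs, [func(x) for x in xs]))
    diffs = [l2_difference(a, b) for a, b in zip(interpolants, interpolants[1:])]
    return [math.log2(d0 / d1) for d0, d1 in zip(diffs, diffs[1:])]


def smooth(x):
    """A feature that the grids above resolve: one period of a sine wave."""
    return math.sin(2.0 * math.pi * x)


_rng = random.Random(0)


def noise(x):
    """Pure noise: an independent Gaussian draw at every evaluation."""
    return _rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    print("smooth rates:", convergence_rates(smooth))
    print("noise rates: ", convergence_rates(noise))
```

For the resolved sine wave the reported rates should sit close to 2, matching the quadratic convergence described above, while for pure noise the differences between interpolants do not shrink and the rates stay well below 2. In higher dimensions the same loop would need a simplicial interpolant over a Delaunay triangulation in place of `linear_interpolant`; SciPy's `LinearNDInterpolator` is one existing tool of that kind, though whether it matches the authors' implementation is not stated here.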