Paper Title

A Hypergradient Approach to Robust Regression without Correspondence

Authors

Xie, Yujia; Mao, Yixiu; Zuo, Simiao; Xu, Hongteng; Ye, Xiaojing; Zhao, Tuo; Zha, Hongyuan

Abstract

We consider a variant of the regression problem in which the correspondence between input and output data is not available. Such shuffled data are commonly observed in many real-world problems. In flow cytometry, for example, the measuring instruments may not be able to maintain the correspondence between the samples and the measurements. Due to the combinatorial nature of the problem, most existing methods are applicable only when the sample size is small, and are limited to linear regression models. To overcome these bottlenecks, we propose a new computational framework, ROBOT, for the shuffled regression problem, which is applicable to large data and complex nonlinear models. Specifically, we reformulate regression without correspondence as a continuous optimization problem. Then, by exploiting the interaction between the regression model and the data correspondence, we develop a hypergradient approach based on differentiable programming techniques. This hypergradient approach essentially views the data correspondence as an operator of the regression, and therefore allows us to find a better descent direction for the model parameters by differentiating through the data correspondence. ROBOT can be further extended to the inexact correspondence setting, where there may not be an exact alignment between the input and output data. Thorough numerical experiments show that ROBOT achieves better performance than existing methods in both linear and nonlinear regression tasks, including real-world applications such as flow cytometry and multi-object tracking.
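
The idea of differentiating through the data correspondence can be illustrated with a small JAX sketch. This is not the authors' implementation: the abstract does not say how the correspondence is made differentiable, so the sketch assumes a soft correspondence computed by unrolled, entropy-regularized Sinkhorn iterations. The function names (`sinkhorn_plan`, `pairwise_cost`, `soft_matching_loss`, `fixed_plan_loss`), the linear model, the regularization strength `eps`, and the synthetic data are all hypothetical choices made for illustration.

```python
import math

import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def sinkhorn_plan(cost, eps=0.5, n_iters=100):
    """Soft correspondence: entropy-regularized plan with uniform marginals.

    Assumption: Sinkhorn iterations stand in for the correspondence solver.
    """
    n, m = cost.shape
    log_a = jnp.full((n,), -math.log(n))  # log of uniform row marginals
    log_b = jnp.full((m,), -math.log(m))  # log of uniform column marginals
    f = jnp.zeros(n)
    g = jnp.zeros(m)
    for _ in range(n_iters):  # unrolled loop, so autodiff traces every update
        f = eps * (log_a - logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (log_b - logsumexp((f[:, None] - cost) / eps, axis=0))
    return jnp.exp((f[:, None] + g[None, :] - cost) / eps)


def pairwise_cost(w, x, y_shuffled):
    """Squared error between every prediction and every shuffled output."""
    pred = x @ w  # linear model; any differentiable model could be swapped in
    return (pred[:, None] - y_shuffled[None, :]) ** 2


def soft_matching_loss(w, x, y_shuffled):
    """Loss whose gradient flows through the correspondence as well."""
    cost = pairwise_cost(w, x, y_shuffled)
    plan = sinkhorn_plan(cost)  # the plan depends on w
    return jnp.sum(plan * cost)


def fixed_plan_loss(w, x, y_shuffled):
    """Same value, but the correspondence is treated as a constant."""
    cost = pairwise_cost(w, x, y_shuffled)
    plan = jax.lax.stop_gradient(sinkhorn_plan(cost))
    return jnp.sum(plan * cost)


# Synthetic data (hypothetical): outputs are shuffled, so pairing is lost.
key_x, key_perm = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(key_x, (20, 2))
w_true = jnp.array([2.0, -1.0])
y_shuffled = jax.random.permutation(key_perm, x @ w_true)

w = jnp.array([1.0, 0.0])
hypergrad = jax.grad(soft_matching_loss)(w, x, y_shuffled)
fixed_grad = jax.grad(fixed_plan_loss)(w, x, y_shuffled)
print("gradient through the correspondence:", hypergrad)
print("gradient with correspondence fixed: ", fixed_grad)
```

The two losses have the same value at any `w`; they differ only in whether the gradient accounts for how the soft correspondence changes with the model parameters. The second variant corresponds to updating the model while holding the current matching fixed.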
