Paper Title

Machine Learning for Temporal Data in Finance: Challenges and Opportunities

Authors

Jason Wittenbach, Brian d'Alessandro, C. Bayan Bruss

Abstract

Temporal data are ubiquitous in the financial services (FS) industry -- traditional data like economic indicators, operational data such as bank account transactions, and modern data sources like website clickstreams -- all of these occur as a time-indexed sequence. But machine learning efforts in FS often fail to account for the temporal richness of these data, even in cases where domain knowledge suggests that the precise temporal patterns between events should contain valuable information. At best, such data are often treated as uniform time series, where there is a sequence but no sense of exact timing. At worst, rough aggregate features are computed over a pre-selected window so that static sample-based approaches can be applied (e.g. number of open lines of credit in the previous year or maximum credit utilization over the previous month). Such approaches are at odds with the deep learning paradigm which advocates for building models that act directly on raw or lightly processed data and for leveraging modern optimization techniques to discover optimal feature transformations en route to solving the modeling task at hand. Furthermore, a full picture of the entity being modeled (customer, company, etc.) might only be attainable by examining multiple data streams that unfold across potentially vastly different time scales. In this paper, we examine the different types of temporal data found in common FS use cases, review the current machine learning approaches in this area, and finally assess challenges and opportunities for researchers working at the intersection of machine learning for temporal data and applications in FS.
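
The contrast the abstract draws between windowed aggregate features and event sequences that retain exact timing can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and does not come from the paper: the sample transactions, the 31-day window, and the helper names `window_aggregates` and `with_time_deltas` are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical account transactions as (timestamp, amount); illustrative data only.
transactions = [
    (datetime(2021, 1, 1, 9, 0), 20.0),
    (datetime(2021, 1, 1, 9, 5), 500.0),
    (datetime(2021, 1, 20, 14, 0), 35.0),
    (datetime(2021, 2, 15, 8, 30), 12.5),
]


def window_aggregates(events, as_of, window):
    """Static features over a pre-selected window ending at `as_of` (exclusive)."""
    in_window = [amt for ts, amt in events if as_of - window <= ts < as_of]
    return {"count": len(in_window), "max_amount": max(in_window, default=0.0)}


def with_time_deltas(events):
    """Keep every event and attach the seconds elapsed since the previous event."""
    out = []
    prev = None
    for ts, amt in sorted(events):
        delta = 0.0 if prev is None else (ts - prev).total_seconds()
        out.append((delta, amt))
        prev = ts
    return out


# Aggregate view: one fixed-size feature record; the timing inside the window is lost.
aggregates = window_aggregates(transactions, datetime(2021, 2, 1), timedelta(days=31))

# Sequence view: one (seconds_since_previous, amount) pair per event.
sequence = with_time_deltas(transactions)

print(aggregates)
print(sequence)
```

In the aggregate view, the two January 1 transactions that occur five minutes apart are indistinguishable from transactions spread across the month. The sequence view keeps that gap as an explicit input that a sequence model could use.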
