Paper Title
Event-Case Correlation for Process Mining using Probabilistic Optimization
Paper Authors
Paper Abstract
Process mining supports the analysis of the actual behavior and performance of business processes using event logs. An essential requirement is that every event in the log be associated with a unique case identifier (e.g., the order ID of an order-to-cash process). In reality, however, this case identifier may not always be present, especially when logs are acquired from different systems or extracted from non-process-aware information systems. In such settings, the event log needs to be pre-processed by grouping events into cases, an operation known as event correlation. Existing techniques for correlating events have relied on assumptions that make the problem tractable: some assume the generative processes to be acyclic, while others require heuristic information or user input. Moreover, these techniques abstract the log to activities and timestamps and thus miss the opportunity to use data attributes. In this paper, we lift these assumptions and propose a new technique called EC-SA-Data based on probabilistic optimization. The technique takes as inputs a sequence of timestamped events (the log without case IDs), a process model describing the underlying business process, and constraints over the event attributes. It returns an event log in which every event is associated with a case identifier. The technique allows users to flexibly incorporate rules on process knowledge and data constraints. The approach minimizes the misalignment between the generated log and the input process model as well as the variance between activity durations across cases, and maximizes the support of the given data constraints over the correlated log. Our experiments with various real-life datasets show the advantages of our approach over the state of the art.
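The three-part objective described in the abstract can be illustrated with a toy sketch. This is not EC-SA-Data itself. It assumes simulated annealing as the probabilistic optimizer, approximates the process model by a single expected trace scored with edit distance instead of a proper model alignment, and uses a hypothetical data constraint (all events of a case share the same "customer" attribute). The events, the weights w, and the function names edit_distance, cost and anneal are all illustrative assumptions.

```python
import math
import random
from statistics import pvariance

# Hypothetical toy input: timestamp-sorted (timestamp, activity, attributes), no case IDs.
EVENTS = [
    (0, "A", {"customer": "c1"}),
    (1, "A", {"customer": "c2"}),
    (3, "B", {"customer": "c1"}),
    (4, "B", {"customer": "c2"}),
    (6, "C", {"customer": "c1"}),
    (7, "C", {"customer": "c2"}),
]
# Simplifying assumption: the process model is reduced to one expected trace.
MODEL_TRACE = ["A", "B", "C"]


def edit_distance(a, b):
    """Levenshtein distance, a crude stand-in for the cost of aligning a case to the model."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def cost(assign, k, w=(1.0, 1.0, 0.1)):
    """Weighted cost: misalignment + (1 - constraint support) + duration variance."""
    cases = [[] for _ in range(k)]
    for idx, case in enumerate(assign):
        cases[case].append(idx)  # EVENTS is sorted, so each case stays time-ordered
    cases = [c for c in cases if c]
    # 1) Misalignment of each case's activity sequence with the simplified model.
    misalign = sum(edit_distance([EVENTS[i][1] for i in c], MODEL_TRACE) for c in cases)
    # 2) Support of the hypothetical constraint "one customer per case" (to maximize).
    support = sum(len({EVENTS[i][2]["customer"] for i in c}) == 1 for c in cases) / len(cases)
    # 3) Variance across cases of each activity's time since the previous event in its case.
    durations = {}
    for c in cases:
        for prev_i, i in zip(c, c[1:]):
            durations.setdefault(EVENTS[i][1], []).append(EVENTS[i][0] - EVENTS[prev_i][0])
    var = sum(pvariance(d) for d in durations.values() if len(d) > 1)
    return w[0] * misalign + w[1] * (1 - support) + w[2] * var


def anneal(k, steps=5000, t0=2.0, cooling=0.999, seed=0):
    """Simulated annealing over event-to-case assignments; returns the best one seen."""
    rng = random.Random(seed)
    assign = [rng.randrange(k) for _ in EVENTS]
    cur = cost(assign, k)
    best, best_cost = assign[:], cur
    t = t0
    for _ in range(steps):
        cand = assign[:]
        cand[rng.randrange(len(EVENTS))] = rng.randrange(k)  # move one event to a random case
        c = cost(cand, k)
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if c <= cur or rng.random() < math.exp((cur - c) / t):
            assign, cur = cand, c
            if cur < best_cost:
                best, best_cost = assign[:], cur
        t *= cooling
    return best, best_cost


if __name__ == "__main__":
    # Assumption: the number of cases equals the number of start-activity events.
    k = sum(1 for _, act, _ in EVENTS if act == MODEL_TRACE[0])
    best, best_cost = anneal(k)
    print(best, best_cost)
```

Here k is set to the number of events of the model's start activity (two in the toy log), and anneal returns a case index per event together with its cost. The weights in w trade off the three terms; a faithful implementation would replace the edit distance with an alignment against an actual process model and let users supply their own data constraints.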