Paper Title
Generalization in Automated Process Discovery: A Framework based on Event Log Patterns
Paper Authors
Paper Abstract
The importance of quality measures in process mining has increased. One of the key quality aspects, generalization, is concerned with measuring the degree of overfitting of a process model w.r.t. an event log, since the recorded behavior is just an example of the true behavior of the underlying business process. Existing generalization measures exhibit several shortcomings that severely hinder their applicability in practice. For example, they assume that the event log fully fits the discovered process model, and they cannot deal with large real-life event logs and complex process models. More significantly, current measures neglect generalizations for clear patterns that demand a certain construct in the model. For example, a repeating sequence in an event log should be generalized with a loop structure in the model. We address these shortcomings by proposing a framework of measures that generalize a set of patterns discovered from an event log with representative traces and check the corresponding control-flow structures in the process model via their trace alignment. We instantiate the framework with a generalization measure that uses tandem repeats to identify repetitive patterns, which are compared to the loop structures of the process model, and a concurrency oracle to identify concurrent patterns, which are compared to its parallel structures. In an extensive qualitative and quantitative evaluation using 74 log-model pairs against two baseline generalization measures, we show that the proposed generalization measure consistently ranks process models that fulfil the observed patterns with generalizing control-flow structures higher than those that do not, while the baseline measures disregard those patterns. Further, we show that our measure can be efficiently computed for datasets two orders of magnitude larger than the largest dataset the baseline generalization measures can handle.
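
To make the notion of a tandem repeat in a trace concrete, the following is a minimal sketch of a naive detector. It is an illustration only and not the method used in the paper: the function name `find_tandem_repeats`, the `min_repeats` parameter, and the representation of a trace as a list of activity labels are all assumptions made for this example.

```python
from typing import List, Tuple


def find_tandem_repeats(
    trace: List[str], min_repeats: int = 2
) -> List[Tuple[int, Tuple[str, ...], int]]:
    """Naive scan (hypothetical helper, not the paper's algorithm).

    Returns every (start index, repeated unit, repeat count) such that the
    unit occurs at least `min_repeats` times back to back starting at `start`.
    Shifted or shorter occurrences of the same repeat are also reported.
    """
    results = []
    n = len(trace)
    for start in range(n):
        # A unit can only repeat min_repeats times if it fits that often.
        max_len = (n - start) // min_repeats
        for length in range(1, max_len + 1):
            unit = trace[start:start + length]
            count = 1
            pos = start + length
            # Count consecutive copies of the unit that follow it directly.
            while trace[pos:pos + length] == unit:
                count += 1
                pos += length
            if count >= min_repeats:
                results.append((start, tuple(unit), count))
    return results


if __name__ == "__main__":
    # Trace in which the sequence (b, c) is executed three times in a row.
    example_trace = ["a", "b", "c", "b", "c", "b", "c", "d"]
    for start, unit, count in find_tandem_repeats(example_trace):
        print(start, unit, count)
```

In this example trace the sequence `b, c` occurs three times in a row starting at index 1. In the spirit of the abstract, such a repeat is the kind of pattern one would expect a generalizing model to capture with a loop rather than with three unrolled copies.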