Paper Title
UniParser: A Unified Log Parser for Heterogeneous Log Data
Authors
Abstract
Logs provide first-hand information for engineers to diagnose failures in large-scale online service systems. Log parsing, which transforms semi-structured raw log messages into structured data, is a prerequisite for automated log analysis such as log-based anomaly detection and diagnosis. Almost all existing log parsers follow the general idea of extracting the common part as templates and the dynamic part as parameters. However, these log parsing methods often neglect the semantic meaning of log messages. Furthermore, the high diversity among log sources poses an obstacle to generalizing log parsing across different systems. In this paper, we propose UniParser to capture the common logging behaviours from heterogeneous log data. UniParser utilizes a Token Encoder module and a Context Encoder module to learn patterns from each log token and its neighbouring context. A Context Similarity module is specially designed to model the commonalities of the learned patterns. We have performed extensive experiments on 16 public log datasets, and our results show that UniParser outperforms state-of-the-art log parsers by a large margin.
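To make the template/parameter split mentioned in the abstract concrete, the following is a minimal sketch of the general idea only. It is not UniParser's learned approach: it swaps in a few hand-written regular expressions, and the names `PARAM_PATTERNS` and `parse_log`, the `<*>` placeholder, and the sample log line are all assumptions chosen for illustration.

```python
import re

# Hypothetical masking rules (assumptions, not taken from the paper):
# a token matching any of these is treated as a dynamic parameter.
PARAM_PATTERNS = [
    re.compile(r"^\d+(\.\d+)*$"),     # integers and dotted numbers such as IPv4 addresses
    re.compile(r"^0x[0-9a-fA-F]+$"),  # hexadecimal values
    re.compile(r"^/\S+$"),            # absolute file paths
]


def parse_log(message):
    """Split a raw log message into (template, parameters).

    Tokens that look dynamic are replaced by the placeholder "<*>" in the
    template and collected, in order, in the parameter list.
    """
    template_tokens = []
    parameters = []
    for token in message.split():
        if any(pattern.match(token) for pattern in PARAM_PATTERNS):
            template_tokens.append("<*>")
            parameters.append(token)
        else:
            template_tokens.append(token)
    return " ".join(template_tokens), parameters


if __name__ == "__main__":
    # Illustrative log line, invented for this example.
    template, params = parse_log("Received block 4821 of size 67108864 from 10.0.0.7")
    print(template)
    print(params)
```

Rule-based masking like this ignores what a token means in its context, which is the limitation the abstract attributes to existing parsers and the motivation for learning from each token and its neighbouring context instead.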