Paper title
Is the Best Better? Bayesian Statistical Model Comparison for Natural Language Processing
Paper authors
Paper abstract
Recent work raises concerns about the use of standard splits to compare natural language processing models. We propose a Bayesian statistical model comparison technique which uses k-fold cross-validation across multiple data sets to estimate the likelihood that one model will outperform the other, or that the two will produce practically equivalent results. We use this technique to rank six English part-of-speech taggers across two data sets and three evaluation metrics.
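The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, assuming a single data set and the Bayesian correlated t-test of Corani and Benavoli rather than whatever multi-data-set model the paper actually uses. The function name `posterior_probs`, the `rope` half-width of 0.01, and the fold correlation `rho = 1/k` are all illustrative assumptions, not details taken from the paper. Given per-fold score differences between two models, it estimates by Monte Carlo the posterior probability that the first model is worse, practically equivalent, or better.

```python
import math
import random
import statistics


def posterior_probs(diffs, k, rope=0.01, draws=20000, seed=0):
    """Estimate P(A worse), P(practically equivalent), P(A better).

    diffs: per-fold score differences, model A minus model B.
    k:     number of folds (k > 1); rho = 1/k is the assumed
           correlation between overlapping training sets.
    rope:  assumed half-width of the region of practical equivalence.
    """
    n = len(diffs)
    mean = statistics.fmean(diffs)
    var = statistics.variance(diffs)  # sample variance (n - 1 denominator)
    rho = 1.0 / k
    # Posterior of the mean difference: Student t with n - 1 degrees of
    # freedom, centred on the sample mean, with a correlation-inflated scale.
    scale = math.sqrt((1.0 / n + rho / (1.0 - rho)) * var)
    df = n - 1
    rng = random.Random(seed)

    worse = equivalent = better = 0
    for _ in range(draws):
        # Student-t draw: standard normal divided by sqrt(chi-square / df).
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        mu = mean + scale * z / math.sqrt(chi2 / df)
        if mu < -rope:
            worse += 1
        elif mu > rope:
            better += 1
        else:
            equivalent += 1
    return worse / draws, equivalent / draws, better / draws
```

Calling `posterior_probs(diffs, k=10)` with ten per-fold accuracy differences returns three probabilities that sum to one. A ranking like the one the abstract describes could then be built from pairwise comparisons, though how the paper combines results across data sets and metrics is not stated here.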