Paper Title

CNN-Trans-Enc: A CNN-Enhanced Transformer-Encoder On Top Of Static BERT representations for Document Classification

Paper Authors

Charaf Eddine Benarab, Shenglin Gui

Paper Abstract

BERT achieves remarkable results in text classification tasks, yet it is not fully exploited, since only the last layer is typically used as the representation fed to downstream classifiers. Recent studies on the nature of the linguistic features learned by BERT suggest that different layers focus on different kinds of linguistic features. We propose a CNN-Enhanced Transformer-Encoder model trained on top of fixed BERT $[CLS]$ representations from all layers, employing Convolutional Neural Networks to generate the QKV feature maps inside the Transformer-Encoder, instead of linear projections of the input into the embedding space. CNN-Trans-Enc is relatively small as a downstream classifier and does not require any fine-tuning of BERT: it makes optimal use of the $[CLS]$ representations from all layers, leveraging different linguistic features through more meaningful and generalizable QKV representations of the input. Using BERT with CNN-Trans-Enc retains $98.9\%$ and $94.8\%$ of the current state-of-the-art performance on the IMDB and SST-5 datasets respectively, while obtaining a new state-of-the-art on YELP-5 with $82.23\%$ accuracy (an $8.9\%$ improvement) and on Amazon-Polarity with $0.98$ accuracy (a $0.2\%$ improvement), measured by k-fold cross-validation on 1M-sample subsets of both datasets. On the AG News dataset, CNN-Trans-Enc reaches $99.94\%$ of the current state-of-the-art, and it sets a new top performance on DBPedia-14 with an average accuracy of $99.51\%$.

Index Terms: Text Classification, Natural Language Processing, Convolutional Neural Networks, Transformers, BERT
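
To make the described architecture concrete, below is a minimal PyTorch sketch, not the authors' released implementation: the kernel size, head count, single-block depth, mean pooling, and names such as `cls_from_all_layers` and `CNNTransEncBlock` are illustrative assumptions. Only the overall idea follows the abstract: stack the frozen $[CLS]$ vector from every BERT layer, then run a Transformer-Encoder block whose Q/K/V come from 1-D convolutions rather than linear projections.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumption: any BERT checkpoint with 12 layers


@torch.no_grad()
def cls_from_all_layers(texts):
    """Stack the frozen [CLS] vector from the embedding layer and all 12
    encoder layers, giving one (13, 768) "sequence" per document."""
    tok = AutoTokenizer.from_pretrained(MODEL_NAME)
    bert = AutoModel.from_pretrained(MODEL_NAME).eval()
    enc = tok(texts, padding=True, truncation=True, return_tensors="pt")
    out = bert(**enc, output_hidden_states=True)
    # hidden_states: tuple of 13 tensors, each (batch, seq_len, 768); [CLS] is position 0
    return torch.stack([h[:, 0] for h in out.hidden_states], dim=1)


class CNNTransEncBlock(nn.Module):
    """Encoder block whose Q/K/V feature maps are produced by Conv1d layers
    over the stacked layer representations instead of linear projections."""

    def __init__(self, d_model=768, n_heads=8, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size, padding=pad)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size, padding=pad)
        self.v_conv = nn.Conv1d(d_model, d_model, kernel_size, padding=pad)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                   # x: (batch, 13, 768)
        c = x.transpose(1, 2)                # Conv1d expects (batch, channels, length)
        q = self.q_conv(c).transpose(1, 2)
        k = self.k_conv(c).transpose(1, 2)
        v = self.v_conv(c).transpose(1, 2)
        out, _ = self.attn(q, k, v)
        return self.norm(x + out)            # residual connection + layer norm


class CNNTransEncClassifier(nn.Module):
    def __init__(self, n_classes, d_model=768):
        super().__init__()
        self.block = CNNTransEncBlock(d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):
        h = self.block(x).mean(dim=1)        # pool over the 13 layer positions
        return self.head(h)


# Usage: features = cls_from_all_layers(["a great movie", "a dull movie"])
#        logits = CNNTransEncClassifier(n_classes=2)(features)
```

Because BERT stays frozen, the $[CLS]$ features can be precomputed once per dataset and only the small classifier is trained, which is what makes the approach cheap relative to full fine-tuning.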
