Paper Title

Universal Spam Detection using Transfer Learning of BERT Model

Paper Authors

Tida, Vijay Srinivas; Hsu, Sonya

Paper Abstract

Deep learning transformer models have become important for training on text data through self-attention mechanisms. This manuscript demonstrates a novel universal spam detection model that uses Google's pre-trained Bidirectional Encoder Representations from Transformers (BERT) base uncased model with four datasets to classify ham or spam emails efficiently in real-time scenarios. Models were first trained individually on the Enron, SpamAssassin, LingSpam, and SpamText message classification datasets, yielding a single model with acceptable performance across all four datasets. The Universal Spam Detection Model (USDM) was then trained on all four datasets, leveraging the hyperparameters from each individual model; the combined model was fine-tuned with the same hyperparameters taken from these four models separately. When each individual model was evaluated on its corresponding dataset, its F1-score was at or above 0.9. The overall accuracy of the USDM reached 97%, with an F1-score of 0.96. Research results and implications are discussed.
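For readers curious what the fine-tuning step looks like in practice, below is a minimal sketch of binary ham/spam fine-tuning on a pre-trained BERT base uncased model, in the spirit of the approach the abstract describes. It assumes the Hugging Face transformers and PyTorch libraries; the dataset wrapper, toy examples, label encoding, learning rate, and epoch count are illustrative assumptions, not the authors' exact USDM configuration.

```python
# Sketch: fine-tune bert-base-uncased for binary ham/spam classification.
# Assumes: pip install torch transformers
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizerFast, BertForSequenceClassification

class SpamDataset(Dataset):
    """Wraps (text, label) pairs; label 0 = ham, 1 = spam (assumed encoding)."""
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return {"input_ids": self.enc["input_ids"][i],
                "attention_mask": self.enc["attention_mask"][i],
                "labels": self.labels[i]}

# Toy examples standing in for the four corpora (Enron, SpamAssassin,
# LingSpam, SpamText); in practice each corpus would be loaded from disk.
texts = ["Meeting moved to 3pm, see agenda attached.",
         "WINNER!! Claim your free prize now!!!"]
labels = [0, 1]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)

loader = DataLoader(SpamDataset(texts, labels, tokenizer), batch_size=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed LR

model.train()
for epoch in range(1):  # epoch count is illustrative
    for batch in loader:
        optimizer.zero_grad()
        out = model(**batch)   # passing labels yields cross-entropy loss
        out.loss.backward()
        optimizer.step()
```

Per the abstract, the USDM variant of this setup would repeat such fine-tuning per dataset, then train a combined model reusing the per-dataset hyperparameters.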
