DeepCapture: Image Spam Detection Using Deep Learning and Data Augmentation

Paper Title

DeepCapture: Image Spam Detection Using Deep Learning and Data Augmentation

Authors

Bedeuro Kim, Sharif Abuadbba, Hyoungshick Kim

Abstract

Image spam emails are often used to evade text-based spam filters, which detect spam by looking for frequently used keywords. In this paper, we propose a new image spam email detection tool called DeepCapture, built on a convolutional neural network (CNN) model. There have been many efforts to detect image spam emails, but performance degrades significantly against entirely new and unseen image spam emails due to overfitting during the training phase. To address this challenging issue, we mainly focus on developing a more robust model that mitigates overfitting. Our key idea is to build a CNN-XGBoost framework consisting of only eight layers and to train it on a large number of samples generated with data augmentation techniques tailored to the image spam detection task. To show the feasibility of DeepCapture, we evaluate its performance on publicly available datasets consisting of 6,000 spam and 2,313 non-spam image samples. The experimental results show that DeepCapture achieves an F1-score of 88%, a 6 percentage point improvement over the best existing spam detection model, CNN-SVM, which has an F1-score of 82%. Moreover, DeepCapture outperformed existing image spam detection solutions against new and unseen image datasets.
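The abstract reports results as F1-scores. For readers unfamiliar with the metric, the sketch below shows how an F1-score is derived from confusion-matrix counts as the harmonic mean of precision and recall. The function name `f1_score` and the counts passed to it are illustrative assumptions; they are not taken from the paper or its evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score given true positives, false positives, false negatives."""
    if tp == 0:
        # With no true positives, precision and recall are both zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted spam that is really spam
    recall = tp / (tp + fn)     # fraction of real spam that was detected
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen only for illustration; not results from the paper.
example = f1_score(tp=80, fp=10, fn=20)
print(f"F1 = {example:.4f}")
```

Because F1 combines precision and recall, a detector cannot score well by flagging every image as spam or by flagging almost none, which makes it a reasonable choice for an imbalanced dataset such as the one described above.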
