Fake detection in imbalance dataset by Semi-supervised learning with GAN

Paper title

Fake detection in imbalance dataset by Semi-supervised learning with GAN

Authors

Bordbar, Jinus; Ardalan, Saman; Mohammadrezaie, Mohammadreza; Ghasemi, Zahra

Abstract

As social media continues to grow rapidly, harassment on these platforms has become more prevalent, which has drawn researchers' attention to fake detection. Social media data often form complex graphs with numerous nodes, which poses several challenges: a large number of irrelevant features in the matrices, high data dispersion, and an imbalanced class distribution within the dataset. To overcome these challenges, researchers have employed auto-encoders and a combination of semi-supervised learning with a GAN algorithm, referred to as SGAN. Our proposed method uses auto-encoders for feature extraction and incorporates SGAN. By leveraging an unlabeled dataset, the unsupervised layer of SGAN compensates for the scarcity of labeled data and makes efficient use of the few labeled instances available. Multiple evaluation metrics were employed, including the confusion matrix and the ROC curve. The dataset was divided into training and testing sets, with 100 labeled samples for training and 1,000 samples for testing. The novelty of our research lies in applying SGAN to the problem of imbalanced datasets in fake account detection. By optimizing the use of a small number of labeled instances and reducing the need for extensive computational power, our method offers a more efficient solution. Our study also contributes to the field by achieving 81% accuracy in detecting fake accounts using only 100 labeled samples. This demonstrates the potential of SGAN as a powerful tool for handling minority classes and addressing big data challenges in fake account detection.
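The abstract does not state which loss the SGAN discriminator uses, so the following is only a minimal sketch of one widely used semi-supervised GAN formulation, not necessarily the authors' implementation. It assumes a discriminator that emits K class logits (here K = 2, hypothetically "genuine account" and "fake account") plus an implicit "generated sample" class whose logit is fixed at 0. Labeled samples contribute a cross-entropy term, while unlabeled and generated samples contribute a real-versus-generated term, which is how unlabeled data can supplement a small labeled set. All function names are illustrative.

```python
import math

def log_sum_exp(logits):
    # Numerically stable log(sum_k exp(logit_k)).
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def prob_real(logits):
    # D(x) = Z / (Z + 1) with Z = sum_k exp(logit_k); the implicit
    # "generated" class has its logit fixed at 0 (an assumption of this sketch).
    return math.exp(-softplus(-log_sum_exp(logits)))

def supervised_loss(logits, label):
    # Cross-entropy over the K real classes for one labeled sample.
    return log_sum_exp(logits) - logits[label]

def unsupervised_loss(unlabeled_logits, generated_logits):
    # -log D(x_unlabeled) - log(1 - D(G(z))) for one unlabeled sample
    # and one generated sample.
    return (softplus(-log_sum_exp(unlabeled_logits))
            + softplus(log_sum_exp(generated_logits)))

# Hypothetical discriminator outputs for K = 2 classes.
labeled_logits, label = [2.0, -1.0], 0   # a labeled "genuine" account
unlabeled_logits = [1.5, 0.5]            # an unlabeled real account
generated_logits = [-2.0, -3.0]          # a sample from the generator

total = (supervised_loss(labeled_logits, label)
         + unsupervised_loss(unlabeled_logits, generated_logits))
print(f"D(unlabeled) = {prob_real(unlabeled_logits):.3f}, loss = {total:.3f}")
```

In a full training loop these per-sample terms would be averaged over mini-batches and minimized with respect to the discriminator parameters, with the generator trained against the unsupervised term; the feature vectors fed to the discriminator would come from the auto-encoder stage described in the abstract.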
