Paper title
Investigating the Synthetic Minority class Oversampling Technique (SMOTE) on an imbalanced cardiovascular disease (CVD) dataset
Authors
Abstract
In this work, we employ the Synthetic Minority Oversampling Technique (SMOTE) to generate instances of the minority class of an imbalanced Coronary Artery Disease (CAD) dataset. We first analyze the public Z-Alizadeh Sani dataset, which is used for non-invasive prediction of CAD. We perform feature selection to exclude attributes unrelated to CAD risk. New samples are generated using SMOTE, a technique commonly employed in machine learning tasks. We design Artificial Neural Networks, Decision Trees, and Support Vector Machines to classify both the original and the augmented dataset. The results demonstrate that data augmentation may be beneficial in specific cases, but it is not a panacea, and its application to a specific dataset should be carefully examined.
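The interpolation step behind SMOTE can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy data are hypothetical, and the sketch assumes every feature is numeric. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority-class neighbours.

    Illustrative sketch only: assumes numeric features and Euclidean distance.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class with two numeric features (made-up values, not from the dataset).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies between two existing minority samples, the method adds no information beyond the region those samples already span, which is consistent with the abstract's caution that augmentation is not a panacea.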