Machine Learning Performance Analysis to Predict Stroke Based on Imbalanced Medical Dataset

Title

Machine Learning Performance Analysis to Predict Stroke Based on Imbalanced Medical Dataset

Author

Jing, Yuru

Abstract

Cerebral stroke, the second leading cause of death worldwide, has been a major public health concern in recent years. Machine learning techniques make early detection of stroke warning signs feasible, which can help prevent strokes or reduce their severity. Medical datasets, however, are frequently imbalanced in their class labels, so models trained on them tend to predict the minority class poorly. In this paper, the potential risk factors for stroke are investigated. In addition, four distinct approaches for improving classification of the minority class in an imbalanced stroke dataset are applied and their performance is compared: an ensemble weighted voting classifier, the Synthetic Minority Over-sampling Technique (SMOTE), Principal Component Analysis with K-Means clustering (PCA-Kmeans), and Focal Loss with a Deep Neural Network (DNN). The results show that SMOTE and PCA-Kmeans combined with DNN-Focal Loss work best on this large yet severely imbalanced dataset, in which the minority class is limited in size, outperforming existing Kaggle work by a factor of 2 to 4.
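
The abstract names focal loss and SMOTE without defining them. The NumPy sketch below is a rough illustration only and is not the author's implementation: the paper's code, network and hyperparameters are not given here. It shows the standard binary focal loss of Lin et al. (2017), with their defaults alpha=0.25 and gamma=2.0 assumed, and a simplified SMOTE-style oversampler. The function names `binary_focal_loss` and `smote_like_oversample` are hypothetical. A maintained implementation such as `imblearn.over_sampling.SMOTE` would normally be preferred in practice.

```python
import numpy as np


def binary_focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p_t is the predicted probability of the true class. With gamma > 0 the
    factor (1 - p_t)**gamma shrinks the loss of confidently correct (typically
    majority-class) examples, so training focuses on hard, often minority, ones.
    alpha=0.25 and gamma=2.0 are the defaults from Lin et al. (2017), assumed here.
    """
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    p_t = np.where(y_true == 1, p, 1.0 - p)
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def smote_like_oversample(X_minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point is x + u * (x_nn - x), where x is a random minority
    sample, x_nn is one of its k nearest minority neighbours and u ~ U(0, 1).
    """
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(rng)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # random minority anchors
    nn = neighbours[base, rng.integers(0, k, size=n_new)]  # random neighbour of each anchor
    u = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X[base] + u * (X[nn] - X[base])


if __name__ == "__main__":
    X_min = np.random.default_rng(0).normal(size=(10, 3))
    X_syn = smote_like_oversample(X_min, n_new=40, k=5, rng=1)
    print(X_syn.shape)  # (40, 3)
    # A confidently correct prediction costs far less than a confidently wrong one.
    print(binary_focal_loss([0], [0.05]) < binary_focal_loss([1], [0.05]))  # True
```

Setting gamma=0 and alpha=0.5 reduces the focal loss to half the ordinary binary cross-entropy, which makes a quick sanity check. In a real pipeline, oversampling should be applied to the training split only, so that synthetic points do not leak into evaluation.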
