    Citation: CHEN Zhaozhi, LI Dongdong, WANG Zhe, RUAN Tong, GAO Ju. Locality Sensitive Discriminant Matrixized Classifier with Under-Sampling for Heart Failure Prediction[J]. Journal of East China University of Science and Technology, 2019, 45(1): 156-162. DOI: 10.14135/j.cnki.1006-3080.20171217001

    Locality Sensitive Discriminant Matrixized Classifier with Under-Sampling for Heart Failure Prediction

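    • Summary: Imbalanced classification problems are characterized by large differences in the number of samples per class, which biases classification results toward the majority class and causes minority-class samples to be neglected. Yet in imbalanced classification problems it is the minority class that needs more attention. Based on heart failure medical data provided by Shanghai Shuguang Hospital, this paper proposes a framework for predicting the mortality of heart failure patients, providing useful information for the auxiliary treatment and diagnosis of heart failure. Heart failure case records are a typical imbalanced classification problem: heart failure patients make up only a small share of all patients, and examinations should focus on heart failure cases as much as possible. The proposed framework uses under-sampling to adjust the sample proportions so that the classes are balanced in size; applies principal component analysis for feature selection on the high-dimensional data; and trains a locality sensitive discriminant matrixized classifier on the sampled dataset, increasing the attention paid to local samples to obtain better classification performance. Experimental results show that the framework gives good predictions on the heart failure medical data and outperforms comparable algorithms, making it an effective and practical method.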

       

      Abstract: Class imbalance arises when the numbers of samples in different classes differ greatly, which biases a classifier toward the majority class and causes the minority class to be neglected. However, the minority class usually needs more attention in imbalanced problems, since its samples, being scarce, may matter most in the application scenario. A wrong prediction for the minority class is more costly than one for the majority class. In financial fraud detection, for example, the detector must be sensitive to anomalous data, because misclassified samples can cause critical damage to a bank's financial system. Likewise, a latent disease mistakenly predicted as normal delays the best time for treating the patient. Hence, research on the classification of imbalanced data is important in machine learning. In this paper, we propose a framework for predicting heart failure mortality based on a dataset from Shanghai Shuguang Hospital, so as to provide effective information for the auxiliary treatment and diagnosis of heart failure. Heart failure classification is a typical imbalanced problem: heart failure patients account for only a small fraction of all cases. In examination, heart failure cases should receive as much attention as possible. The proposed framework adjusts the proportion of samples to balance the sizes of the two classes. Because the raw data include redundant features that harm classification performance, the framework selects the more important features via principal component analysis. Since samples near the border between the two classes may disturb the construction of the decision boundary, the locality sensitive discriminant matrixized classifier is used to strengthen the influence of local samples, yielding a model that is robust to noisy samples. Experimental results show that the proposed framework attains better prediction performance on the heart failure dataset than similar methods.
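
      The class-balancing step described in the abstract can be illustrated with a short sketch. It assumes plain random under-sampling down to the size of the smallest class; the abstract does not state which under-sampling scheme the authors actually use, and the function name `random_under_sample`, the toy feature vectors, and the 0/1 labels are all hypothetical. The feature selection and classifier stages are not shown.

```python
import random
from collections import Counter, defaultdict


def random_under_sample(samples, labels, seed=0):
    """Randomly drop samples so that every class matches the smallest class size.

    Illustrative only: this is generic random under-sampling, not necessarily
    the scheme used in the paper.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Every class is reduced to the size of the smallest one.
    target = min(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Sample without replacement, so no record is duplicated.
        for x in rng.sample(xs, target):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Made-up toy data: 8 samples of class 0 (majority), 2 of class 1 (minority).
X = [[float(i), float(i % 3)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_under_sample(X, y)
print(Counter(y_bal))
```

      Under-sampling discards majority-class records rather than synthesizing minority ones, so the balanced set contains only real samples, at the cost of a smaller training set.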

       
