Abstract:
Class imbalance arises when the numbers of samples in different classes differ greatly, which biases a classifier toward the majority class and leads it to neglect the minority class. However, the minority class usually needs more attention in imbalanced problems, since its samples, precisely because they are scarce, may be the most important ones in the application. The cost of a wrong prediction for the minority class is usually higher than that for the majority class. In financial fraud detection in particular, the detector must be sensitive to anomalous records, since misclassified samples can cause critical damage to a bank's financial system. Likewise, a potential disease that is mistakenly predicted as showing no abnormality can cause the patient to miss the best window for treatment. Hence, research on the classification of imbalanced data is important in machine learning. In this paper, we propose a framework for predicting heart failure mortality, based on a dataset from Shanghai Shuguang Hospital, to provide useful information for the auxiliary diagnosis and treatment of heart failure. Heart failure classification is a typical imbalanced problem: heart failure patients account for only a small fraction of all cases, and these cases should receive as much attention as possible during examination. The proposed framework adjusts the proportion of samples to balance the sizes of the two classes. Because the raw data include redundant features that degrade classification performance, the framework selects the more important features via principal component analysis. Since samples near the border between the two classes may disturb the generation of the decision boundary, a locality sensitive discriminant matrixized classifier is used to strengthen the influence of local samples and thereby obtain a model that is robust to noisy samples.
Experimental results show that the proposed framework attains better prediction performance on the heart failure dataset than similar methods.
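
To make the class-balancing step concrete, the following is a minimal sketch of one common way to adjust the proportion of samples, namely random oversampling of the minority class. The abstract does not state which resampling strategy the framework uses, so the choice of oversampling, the function name `random_oversample`, and the toy data are all assumptions made purely for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class.

    Illustrative assumption: the paper's actual resampling strategy
    is not specified in the abstract.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the class itself.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: six majority-class (0) and two minority-class (1) samples.
X = [[0.1, 1.0], [0.2, 0.9], [0.3, 1.1], [0.4, 0.8],
     [0.5, 1.2], [0.6, 1.0], [2.0, 3.0], [2.2, 3.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # class counts after balancing
```

In a pipeline of the kind described above, the balanced samples would then be passed to the feature-selection and classification stages. Resampling is normally applied to the training split only, so that evaluation still reflects the original class distribution.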