ISSN No:-2456-2165
Abstract:- Developing effective survival analysis models would help guide decision-making in managing major health challenges. Model development can be achieved through various approaches. Diabetes is a health challenge in Nigeria that has attracted the interest of researchers; thus, much research has been carried out on its management, necessitating the development of models. This study carried out a machine learning analysis on diabetes data collected from Central Hospital, Warri, Delta State, implementing the Cox-PH model, given the role both machine learning and Cox-PH play in survival analysis. A dataset of 100 diabetic patients' records was collected. The dataset was used to train multiple machine learning algorithms, namely, Support Vector Machine (SVM), K-nearest neighbors (KNN) classifier, etc., and the proposed model (Cox-PH Hybrid or CPH-SML). The performance evaluation of the machine learning algorithms and the proposed model gave the following accuracy levels: KNN, 47%; SVM, 74%; and Cox-PH Hybrid, 96%. The concordance index was used to evaluate the proposed model, which achieved an index of 0.7204 on several covariates such as Age, Gender, Education, Marital Status, history of smoking, systolic blood pressure (SBP), diastolic blood pressure (DBP), etc. From this analysis of the diabetic data, the study concludes that the variables associated with diabetes mortality are the age of the patient and diabetes type. The patients' hazard ratio is lower when they are young than when they are old, and it also depends on the diabetes type. Thus, early diagnosis and proper health management of diabetics can prolong the lives of diabetic patients.

Keywords:- Survival Model, Machine Learning, Cox Proportional Hazard, Diabetes.

I. INTRODUCTION

The number of diabetes patients is increasing rapidly, and it is estimated that 90-95 percent of people with diabetes globally have Type 2 diabetes, which is one of the leading causes of death and contributes to a large number of deaths each year, often unnoticed [1]. Diabetes Mellitus (DM) has been defined as a condition induced by unregulated diabetes that may lead to multi-organ failure in patients [2]. Diabetes has become one of the biggest health challenges in the world, hence the need to control and manage it.

Related work in healthcare information systems shows that the increasing volume of healthcare data requires effective means of extracting information to aid the delivery of healthcare services to patients [3, 4, 5, 6 & 7]. A prediction model is required to guide clinical decisions about whether to continue therapy or take an alternate action that affects the life of patients receiving treatment [8, 9 & 10]. To respond to this need, many studies have concentrated on prediction models based on traditional techniques, while computer scientists have used machine learning methodologies to construct prediction models [11, 12].

Machine learning has played a significant role in survival analysis, supporting clinical forecasting. Work in this area includes “Machine learning for survival analysis: a case study on recurrence of prostate cancer” [13], “ComoRbidity: an R package for the systematic analysis of disease comorbidities” [14], and “Survival model for Diabetes Mellitus Patients’ using support vector machine” [15]. This field has drawn much attention and has become a dominant technology in the AI community [16].

Survival analysis is one of the most popular data mining methods; it deals with estimating the time until an event such as death, childbirth, or radioactive decay occurs [3, 17, 18 & 19].

Due to the increasing rate of DM in the world, there is a need for more medical attention. One area that has responded to this need is the development of survival analysis models by researchers using machine learning algorithms. One such algorithm is the Support Vector Machine (SVM), which was used in a recent study for survival analysis of DM patients in Nigeria [20]. This algorithm is said to perform poorly when handling large datasets, which calls for the development of an enhanced model for diabetes survival analysis. This study developed a survival analysis model for diabetes mellitus patients using a machine learning algorithm while implementing the Cox Proportional Hazard model. The diabetes dataset was collected from Central Hospital in Delta State, Nigeria.

II. RELATED WORKS

Several models have been built to solve different human problems. These problems range from customer challenges at airports and online transactions to advisory systems, intrusion detection systems, and so on (Okpeki et al. [21]; Okofu [22]; Okpeki & Omede [23]; Okofu et al. [24]; Efozia et al. [25]; Akazue [26]; Oijoe [27]; Ojugo & Otakore [28]; Ojugo & Yoro [29]). It is therefore worthwhile to provide survival models for health challenges. Idowu et al. [30] developed a model to predict the survival of pediatric Sickle Cell Disease (SCD) patients from clinical variables. The predictive model works with fuzzy logic: three (3) clinical variables were used, and the rules for the inference engine were elicited from an expert pediatrician.
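The fuzzy-logic approach described for [30] can be sketched as a small rule-based inference system. This is an illustrative sketch only: the three inputs (`v1`, `v2`, `v3`), their scaling to [0, 1], the linear membership functions, the rules, and the output levels are all hypothetical placeholders, not the clinical variables or expert-elicited rules used by Idowu et al. It uses zero-order Sugeno-style inference (min for AND, max for OR, weighted-average defuzzification).

```python
# Hypothetical sketch of a three-input fuzzy inference system
# (zero-order Sugeno style). Inputs, memberships, rules, and output
# levels are placeholders, NOT those of Idowu et al. [30].
from typing import List, Tuple


def clip01(v: float) -> float:
    """Clamp a value to the unit interval."""
    return max(0.0, min(1.0, v))


def low(v: float) -> float:
    """Degree to which a normalised input is 'low' (linear membership)."""
    return 1.0 - clip01(v)


def high(v: float) -> float:
    """Degree to which a normalised input is 'high' (linear membership)."""
    return clip01(v)


# Hypothetical crisp output level for each rule consequent.
OUTPUT_LEVEL = {"low": 0.2, "medium": 0.5, "high": 0.9}


def infer_risk(v1: float, v2: float, v3: float) -> float:
    """Return a crisp risk score in [0, 1] from three inputs scaled to [0, 1]."""
    # Each rule is (firing strength, consequent); AND = min, OR = max.
    rules: List[Tuple[float, str]] = [
        (min(high(v1), high(v2)), "high"),    # IF v1 high AND v2 high THEN high
        (max(high(v2), high(v3)), "medium"),  # IF v2 high OR v3 high THEN medium
        (min(low(v1), low(v3)), "low"),       # IF v1 low AND v3 low THEN low
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return OUTPUT_LEVEL["medium"]  # no rule fired: fall back to the medium level
    # Weighted-average defuzzification over the fired rules.
    return sum(s * OUTPUT_LEVEL[c] for s, c in rules) / total
```

For example, `infer_risk(0.0, 0.0, 0.0)` fires only the low-risk rule and returns 0.2, while `infer_risk(1.0, 1.0, 1.0)` averages the high- and medium-risk rules to about 0.7. In a real system the membership functions and rules would come from the domain expert, as in [30].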
A 2019 study used machine learning techniques to identify factors predicting the survival of breast cancer patients, developing models for detecting and visualizing significant prognostic indicators of breast cancer survival rate [35]. The dataset was a hospital-based breast cancer dataset from the University of Malaya Medical Centre, Kuala Lumpur, Malaysia, with diagnostic information between 1993 and 2016. To determine the predictive factors, models were developed with decision tree, random forest, neural networks, extreme boost, logistic regression, and support vector machine. The study reports that all algorithms produced close outcomes, with the lowest accuracy obtained from the decision tree (79.8%) and the highest from the random forest (82.7%).

Further work on breast cancer survival includes Kalafi et al. [36], using 4,902 patient records from the University of Malaya Medical Centre Breast Cancer Registry (UMMCBCR). The prediction models were designed and implemented with machine learning (SVM, RF, and DT) and deep learning (MLP) techniques. Findings show that the multilayer perceptron (MLP), random forest (RF), and decision tree (DT) classifiers could predict survivorship with 88.2%, 83.3%, and 82.5% accuracy, respectively, on the tested samples, while the support vector machine (SVM) recorded a lower accuracy of 80.5%.

Tachkov et al. [37] conducted a study to evaluate the life expectancy of patients with diabetes and to

Survival analysis has also been applied to coronavirus patients through two models called Cox_COVID_19 and Deep_Cox_COVID_19. These models were developed to help hospitals select patients with better chances of survival and to predict the most important features affecting the rate of survival [39]. The first model, Cox_COVID_19, is based on Cox regression, while the second, Deep_Cox_COVID_19, is a hybrid model, i.e., a combination of an autoencoder deep neural network and Cox regression. The study affirms that both models can predict the survival likelihood and also present significant symptoms that differentiate severe cases from death cases.

Ojie et al. [41] applied a classification algorithm in their proposed hybrid model of a Genetic Algorithm and the Data Value Metric (DVM), an information-theoretic metric for quantifying the quality and utility of features for feature selection. They proposed that this approach can be applied to traditional data.

III. METHODS AND MATERIALS
C. Building the Model

Multiple models were trained, namely, Random Forest, Gradient Boosting, Decision Tree, Support Vector, Multi-Layer Perceptron, and K-nearest neighbors classifiers. The Cox proportional hazards model was also implemented and evaluated on the dataset. This model, introduced by Cox, accounts for the effect of several variables at a time and relates the survival distribution to these variables. The Random Forest algorithm yielded the best result, with an accuracy of 82% against other

From the fitted Cox-PH model, the variables associated with mortality are the age of the patient, diabetes, and education level. The patients' hazard ratio increases by 1.95 when they have secondary education compared to when they have primary education. It reduces by 0.96 when they are young compared to when they are old, and by 0.53 when they have diabetes compared to when they have no diabetes.
[Accuracy Chart: accuracy (%) of KNN, Logistic, SVM, and Cox-PH Hybrid]

[Chart: Precision, Recall, and F1-Score of KNN, Logistic, SVM, and Cox-PH Hybrid]
Fig. 3: Precision, Recall Value, and F-Measure Analysis on Diabetes Datasets
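Fig. 3 and the accuracy chart summarise classification performance, and the study also evaluates its Cox-PH model with the concordance index. As a minimal sketch of how these quantities are commonly defined (not the authors' evaluation code), the Python below computes accuracy, precision, recall, and F1 from binary labels, assuming 1 marks the positive class, and Harrell's concordance index from survival times, event indicators, and risk scores, assuming a higher risk score means a shorter expected survival. The function names are hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risk_scores: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs whose risk order matches survival order."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(len(times)):
            if times[j] > times[i]:  # j outlived i
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable if comparable else float("nan")
```

For example, `concordance_index([1, 2, 3], [1, 1, 1], [3, 2, 1])` returns 1.0 because every higher-risk patient fails first. A c-index of 0.5 corresponds to random ordering, which gives a reference point for the reported 0.7204.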
V. RESULT AND DISCUSSION

To perform the survival analysis of diabetes mellitus, this study developed a hybrid model that combines the Random Forest algorithm with Cox-PH. The performance evaluation of the designed model shows that integrating machine learning and the Cox proportional hazards model in survival analysis is achievable. The results of the model on the dataset showed that the variables associated with diabetes mortality are the age of the patient and diabetes type. The study shows that the patients' hazard ratio reduces by 0.96 when they are young compared to when they are old, and by 0.53 when they have diabetes compared to when they have no diabetes.

The algorithm's predictions were evaluated with a statistical tool known as the concordance index (c-index). Cox-PH models were fitted with several covariates, and the model with the highest c-index was selected and deployed. The fit of this Cox-PH model on the dataset identifies the same variables, the age of the patient and diabetes type, as associated with diabetes mortality, with the hazard ratios given above.

VI. CONCLUSION

The study has established that a machine learning model for survival analysis and prediction can be implemented alongside parametric or non-parametric tools for modeling time-to-event data, because machine learning tools and algorithms are efficient in building prediction and analysis models.

REFERENCES
[1.] S. I. Ayom and I. Milon, “Diabetes Prediction: A Deep Learning Approach,” I.J. Information Engineering and Electronic Business, vol. 2, pp. 21–27, 2019.
[2.] J. Chaki, S. T. Ganesh, S. K. Cidham and S. A. Theertan, “Machine learning and artificial intelligence-based Diabetes Mellitus detection and self-management: A systematic review,” Journal of King Saud University – Computer and Information Sciences, vol. xxx, no. xxxx, pp. 1–22, 2020.