
Volume 8, Issue 5, May – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Interpreting the Premium Prediction of Health Insurance Through Random Forest Algorithm Using Supervised Machine Learning Technology
V. Srinivasa Rao (1), Asst. Professor, NRI Institute of Technology, A.P, India-521212
M. Iswarya (2), UG Scholar, Dept. of IT, NRI Institute of Technology, A.P, India-521212
SK. Ameer Hamza (3), UG Scholar, Dept. of IT, NRI Institute of Technology, A.P, India-521212
B. Satish (4), UG Scholar, Dept. of IT, NRI Institute of Technology, A.P, India-521212

Abstract:- In this study, we examine individual insurance amounts using health data. Three regression models are employed and their performance is compared: multiple linear regression, decision tree regression, and random forest regression. The dataset is used to train the models, and the trained models are then used to produce predictions. The models are tested and verified by comparing the predicted amount with the actual data, and their accuracy levels are then compared. According to the analysis, the random forest regression algorithm outperforms both the decision tree and linear regression. The prediction enables a person to understand the required amount based on their health situation. They can then examine any health insurance company, its plans, and its benefits while keeping in mind the anticipated amount from the project. This can also help someone concentrate on the health-related aspects of insurance rather than the irrelevant ones. In addition, most people are susceptible to being misled about the cost of insurance and may unnecessarily purchase expensive medical coverage. This project does not provide the precise sum charged by any health insurance provider, but it does provide a general sense of the sum an individual needs for personal health insurance. The prediction is approximate and is not tied to any organization; therefore, it should not be the only factor considered when choosing a health insurance plan. Estimating the cost of health insurance in advance helps a person examine the amount required and be confident that the amount he or she is going to pay is justified. It can also suggest how to get more benefit from health insurance.

Keywords:- Health Insurance Premium Prediction, Linear Regression, Decision Tree Regression, Multiple Regression Algorithm, Machine Learning, Python, Deep Learning, Insurance Amount Prediction, Random Forest Regression Algorithm.

I. INTRODUCTION

The goal of this work is to examine how various features relate to one another and to fit a multiple linear regression of people's current medical expenses on characteristics such as age, physical or family condition, and location. The fitted model is used to predict people's future medical expenses and to assist medical insurance companies in determining how much to charge for coverage. The project's primary objective is to give people a general sense of the amount needed based on their personal health status. They can then review any health insurance company's policies and benefits while taking the anticipated amount from our project into consideration. This can help someone concentrate on the health-related aspects of insurance rather than the irrelevant ones. These days, health insurance is virtually always required, and almost all policies are connected to a public or commercial health insurance organisation. The variables affecting the cost of insurance vary from business to business. Additionally, relatively few individuals in rural areas are aware that the Indian government offers free health insurance to those who fall below the poverty level.

Obtaining insurance is a highly complicated process, and some rural residents opt to forgo health insurance altogether or purchase only a small amount of private coverage. In addition, consumers can be misled into purchasing expensive health insurance unnecessarily if they are given false information about the cost of the policy. This project does not provide the precise sum charged by any health insurance provider, but it does provide a good indication of the cost associated with a person's own health insurance.

The prediction is approximate and is not governed by any corporation, so it should not be the only factor considered while choosing a health insurance policy. Early estimation of health insurance cost helps a person judge the required amount and ensure that the amount they choose is appropriate. Additionally, it can give guidance on how to get more advantages from health insurance. This study aims to provide a person with an understanding of the

IJISRT23MAY741 www.ijisrt.com 726


required amount based on their personal health situation. Later, customers can look at any health insurance provider and their plans and advantages while taking the anticipated amount from our project into consideration.

Many methods have been proposed for speeding up the processing of large-scale data. Over the past few decades, several clustering techniques have been developed to achieve the best performance in a variety of applications, and earlier research has produced numerous distinct rule discovery algorithms. Prior work on clustering can be grouped into four areas: hybrid k-means, Parkinson's disease, situation understanding for intelligent online learning platforms, and artificial neural networks. In the k-means clustering algorithm, the initial centroids are chosen at random as k points, and the remaining points are grouped according to how close they are to those centroids. In circular k-means (CK-means), the cluster vectors contain directional information. A competitive k-means algorithm was proposed to address the inconsistent results of traditional k-means, which scales poorly to very large data sets. Kumar presented a taxonomy of clustering techniques with the aim of determining the ideal number of clusters for the k-means algorithm. He also investigated combining machine learning and intelligent systems with k-means. [Fig. 1]

II. TECHNOLOGIES USED

Fig 1 Technologies Used

Machine Learning:
Internet search engines, spam-sniffing email filters, websites that offer personalised advice, banking software that can spot unusual transactions, and numerous mobile apps that use voice recognition are all examples of applications that use machine learning. Machine learning is a subfield of artificial intelligence and computer science that uses data and algorithms to mimic how people learn, gradually improving the accuracy of its results. These days, the technology has a wide range of potential applications, some of which have higher stakes. Future developments will significantly affect society and could support the UK economy.

Machine learning, for instance, can give us readily available "personal assistants" to help us manage our lives; it might significantly improve the transportation system by utilising autonomous vehicles; and it could also significantly improve the healthcare system by enhancing disease diagnosis or tailoring therapy. Machine learning [Fig. 2] can also be used in security applications, such as examining and analysing email or internet usage. These and other applications need to be investigated promptly, and steps should be taken to ensure that they are used for the benefit of society. While machine learning is distinct from robotics, there are several areas where they intersect.

Fig 2 Machine Learning

Deep Learning:
Deep learning is a subset of machine learning that is essentially a neural network with three or more layers. These neural networks aim to mimic how the human brain operates, which allows them to learn from massive volumes of data. A neural network with a single layer can already make approximate predictions, and additional hidden layers help to refine and improve their accuracy. Deep learning powers a variety of artificial intelligence programmes and tools that advance automation by carrying out analytical and physical tasks without the need for human intervention. Technology based on deep learning is not limited to everyday products and services. [Fig. 3]

Fig 3 Deep Learning

Python:
Guido van Rossum created Python, a high-level, interpreted programming language. It was primarily designed to offer a simple syntax that is easy to read and understand. Many programmers gradually adopted Python because its programs are short and easy to write. It also includes built-in features and supports procedural, functional, and object-oriented programming.

Python is also a platform-neutral programming language. It is free and open source, has a large library of support, can be used for a wide range of tasks, and many programmers find it far simpler to learn and use than many other languages. It features built-in memory management and exception handling. It is concise and dynamically typed, so variables need no type declarations. Indentation is essential in Python because it defines how statements are grouped into blocks. Python is widely used for artificial intelligence [Fig. 4], which makes it useful in a variety of industries. It also serves as the foundational language for the Raspberry Pi and is used in game development and information security. Python is very easy to read, yet it is also very powerful.

Fig 4 Python

Random Forest Regression:
Random Forest is a well-known supervised machine learning technique. It is applied to problems involving regression as well as classification. It is based on the idea of ensemble learning, which is a method of combining multiple models to solve a difficult problem and enhance the performance of the overall model.

As suggested by its name, Random Forest builds several decision trees on different subsets of the input dataset and combines their results to improve predictive accuracy. Instead of relying on a single decision tree, the random forest takes the prediction from each tree. For classification, it predicts the outcome that received the most votes; for regression, it averages the trees' predictions.

In this study, random forest regression [Fig. 5] outperforms linear, multiple, and decision tree regression algorithms for forecasting the cost of health insurance.

Fig 5 Random Forest

III. SOFTWARE REQUIREMENTS SPECIFICATION

The SRS is a comprehensive description of how the system should function. It is typically approved at the conclusion of the requirements engineering phase. It outlines how the software system will interact with hardware, internal modules, other programmes, and human users in a variety of situations that are similar to real-world ones.

Fig 6 SRS

Reliability:
The system is reliable. Random forest can perform both regression and classification tasks, produces predictions that are easy to understand, and handles huge datasets efficiently. It provides a higher level of accuracy in predicting outcomes than the decision tree algorithm. [Fig. 6]

Quality:
The project is intended to deliver good-quality and efficient predictions.

Maintainability:
Maintenance of the software is straightforward and is carried out by the administrator, who keeps the information safe from failures and errors.

Efficiency:
The system is efficient for users and provides a good prediction of health insurance premiums.

Portability:
The system should be portable to any platform.

Performance:
The system performs well and efficiently, so it serves its users well.

IV. EXISTING SYSTEM

The existing system uses linear regression, multiple regression, and decision tree regression to predict health insurance premiums. Linear regression is very sensitive to noise in the data, which can affect the predictions given to those using the system. In addition, it involves a protracted and difficult analysis and calculation process, and it does not fit complex datasets properly.
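The sensitivity of linear regression noted for the existing system can be illustrated with a small ordinary least squares fit. The sketch below is illustrative only: the ages, the charges, and the `fit_line` helper are hypothetical and are not taken from the study's dataset or code. It assumes that "sensitive" refers to the effect of unusual records on the fitted line.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical ages and yearly charges lying exactly on charge = 1000 + 200 * age.
ages = [20, 30, 40, 50, 60]
charges = [1000 + 200 * a for a in ages]
clean_intercept, clean_slope = fit_line(ages, charges)

# A single unusually expensive record for the oldest person shifts the whole line.
noisy_charges = charges[:-1] + [charges[-1] + 20000]
noisy_intercept, noisy_slope = fit_line(ages, noisy_charges)

print("slope without outlier:", clean_slope)
print("slope with one outlier:", noisy_slope)
```

With these made-up numbers, one outlying record changes the fitted slope from 200 to 600 per year of age, so every prediction from the line changes, not only the prediction for the outlying person.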



Linear Regression:
The primary purpose of linear regression analysis is to predict the value of one variable from the value of another variable. The variable that needs to be predicted is the dependent variable. The variable used to predict it is the independent variable. The mathematical formula used in linear regression models is simple to understand and can be used to make predictions. Linear regression is widely used in both the business world and academic research.

Multiple Regression:
Multiple regression is a statistical technique for analysing a single dependent variable in relation to several independent variables. Its main goal is to use independent variables whose values are known to predict the value of the single dependent variable. Multiple regression analysis enables researchers to evaluate the strength of the relationship between an outcome and several predictor variables, as well as the importance of each predictor to that relationship, frequently with the effect of the other predictors statistically removed.

Disadvantages of Existing System:
• The existing system does not predict the exact amount.
• Its efficiency is low.
• The prediction process is slow.
• Linear regression is prone to noise and overfitting.
• Linear regression, as used in the existing system, is highly sensitive.
• Multiple regression is also not good at predicting accurate results.

V. PROPOSED SYSTEM

In this project, we propose the random forest algorithm to improve system performance. Random Forest is a very well-known supervised machine learning algorithm that is used for both classification and regression problems. It builds several decision trees on numerous subsets of the input dataset and averages their results to improve predictive accuracy. In this study, it is the best-performing algorithm for the task.

Random Forest Regression Algorithm:
Random Forest is a machine learning algorithm that uses the supervised learning approach. It is used for classification and regression problems. It is based on the idea of ensemble learning, which is the process of combining multiple models to solve a complex problem and enhance the performance of the overall model. As the name indicates, Random Forest consists of a number of decision trees built on different subsets of the given dataset and takes the average of their outputs to improve predictive accuracy. Instead of relying on one decision tree, the random forest takes the prediction from each tree and produces the final output from the majority of those predictions for classification, or from their average for regression. In this study, random forest regression outperforms linear, multiple, and decision tree regression in terms of accuracy in predicting the cost of health insurance.

Advantages of Proposed System:
• The proposed system employs the Random Forest algorithm, which is capable of both classification and regression tasks.
• It makes accurate and reliable predictions.
• It is simple to comprehend.
• It can handle large datasets effectively.
• The random forest algorithm in the proposed system offers a high level of accuracy.
• It offers a useful way to handle missing data and is flexible and simple to use.
• The random forest algorithm is widely used in the health care industry.

VI. SYSTEM ARCHITECTURE

Fig 7 System Architecture
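The averaging behaviour described for Random Forest regression can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses bootstrap sampling of rows only and omits the per-split feature subsampling that full Random Forest implementations add. The feature names (age, BMI, smoker flag), the synthetic data, and all function names are hypothetical.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, depth=0, max_depth=3, min_leaf=5):
    """Grow a small regression tree; a leaf is simply the mean target."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return mean(targets)
    best = None  # (error, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, targets) if r[f] < t]
            right = [y for r, y in zip(rows, targets) if r[f] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:
        return mean(targets)
    _, f, t = best
    left = [(r, y) for r, y in zip(rows, targets) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, targets) if r[f] >= t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       depth + 1, max_depth, min_leaf),
            build_tree([r for r, _ in right], [y for _, y in right],
                       depth + 1, max_depth, min_leaf))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's mean."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node


def build_forest(rows, targets, n_trees=20, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """For regression, the forest averages the trees' outputs."""
    return mean(predict_tree(tree, row) for tree in forest)


# Synthetic, hypothetical data: (age, bmi, smoker flag) -> yearly charge.
rng = random.Random(42)
rows = [(rng.randint(18, 64), rng.randint(18, 40), rng.randint(0, 1))
        for _ in range(120)]
charges = [250.0 * a + 300.0 * b + 20000.0 * s + rng.gauss(0, 500)
           for a, b, s in rows]

forest = build_forest(rows, charges)
non_smoker = predict_forest(forest, (40, 27, 0))
smoker = predict_forest(forest, (40, 27, 1))
print("non-smoker estimate:", round(non_smoker))
print("smoker estimate:", round(smoker))
```

Because the synthetic charges add a large fixed amount for smokers, the forest's estimate for the smoker row comes out above the estimate for the otherwise identical non-smoker row. Each estimate is the mean of the 20 per-tree outputs, which is what distinguishes the forest from a single decision tree.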

VII. FUTURE SCOPE

In future, the health insurance premium prediction could provide a more precise predicted amount, allowing people to understand their situation easily. The prediction of premium amounts focuses more on an individual's health than on the policies of particular insurance companies. These models can also be used to predict amounts from data that will be collected in the upcoming years. Additionally, they may aid insurance companies in their operations. [Fig. 7]

VIII. CONCLUSION

This project employs several machine learning regression models to forecast insurance amounts from individual attributes. Using the amounts a health insurance company billed each user, the model forecasts the insurance claim each user will submit. Businesses' efficiency will increase as a result. The model can handle a large amount of data. Artificial intelligence and machine learning are well suited to analysing large volumes of data and producing estimates from them, which makes health insurance operations simpler. For insurers, predicting health insurance premiums will save time and cost. Our random forest regression model achieved an accuracy of 92.72%.

REFERENCES

[1]. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, et al., "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825-2830, Oct 2011.
[2]. E. Wang and G. Gee, "Larger Issuers, Larger Premium Increases: Health insurance issuer competition post-ACA," 2015.
[3]. C. C. a. A. Semanskee, "Analysis of UnitedHealth Group's Premiums and Participation in ACA Marketplaces," 2016.
[4]. Ng, "Machine learning for Housing Price Prediction Mobile Application," Masters, Department of Computing, Imperial College London, 2015.
[5]. Sturm, Roland. "The effects of obesity, smoking, and drinking on medical problems and costs." Health Affairs 21, no. 2 (2002): 245-253.
[6]. Sturm, Roland, Ruopeng An, Josiase Maroba, and Deepak Patel. "The effects of obesity, smoking, and excessive alcohol intake on healthcare expenditure in a comprehensive medical scheme." South African Medical Journal 103, no. 11 (2013): 840-844.
[7]. Kim, David D., and Anirban Basu. "Estimating the medical care costs of obesity in the United States: systematic review, meta-analysis, and empirical analysis." Value in Health 19, no. 5 (2016): 602-613.
[8]. Han, Kimyoung, Minho Cho, and Kihong Chun. "Determinants of health care expenditures and the contribution of associated factors: 16 cities and provinces in Korea, 2003-2010." Journal of Preventive Medicine and Public Health 46, no. 6 (2013): 300.
[9]. Han, Kimyoung, Minho Cho, and Kihong Chun. "Determinants of health care expenditures and the contribution of associated factors: 16 cities and provinces in Korea, 2003-2010." Journal of Preventive Medicine and Public Health 46, no. 6 (2013): 300.
[10]. Sharma, Ashish, Ashish Sharma, and Anand Singh Jalal. "Distance-based facility location problem for fuzzy demand with simultaneous opening of two facilities." International Journal of Computing Science and Mathematics 9.6 (2018): 590-601.
[11]. Singh, Anshy, Shashi Shekhar, and Anand Singh Jalal. "Semantic based image retrieval using multi-agent model by searching and filtering replicated web images." 2012 World Congress on Information and Communication Technologies. IEEE, 2012.
[12]. Shekhar, Shashi, et al. "A WEBIR crawling framework for retrieving highly relevant web documents: evaluation based on rank aggregation and result merging algorithms." 2011 International Conference on Computational Intelligence and Communication Networks. IEEE, 2011.
[13]. Varun K L Srivastava, N. Chandra Sekhar Reddy, Dr. Anubha Shrivastava, "An Effective Code Metrics for Evaluation of Protected Parameters in Database Applications", International Journal of Advanced Trends in Computer Science and Engineering, Volume 8, No.1.3, 2019. doi.org/10.30534/ijatcse/2019/1681.32019
[14]. Prasad, K.S., Reddy, N.C.S. & Puneeth, B.N. "A Framework for Diagnosing Kidney Disease in Diabetes Patients Using Classification Algorithms." SN COMPUT. SCI. 1, 101 (2020).

BIOGRAPHIES

V. Srinivasa Rao is currently working as an Asst. Professor in the Department of Information Technology at NRI Institute of Technology, Pothavarappadu, Agiripalli, Krishna (dist), India.

M. Iswarya is currently studying B. Tech with a specialization in Information Technology at NRI Institute of Technology. She has completed a mini project on health insurance premium prediction.


SK. Ameer Hamza is currently studying B. Tech with a specialization in Information Technology at NRI Institute of Technology. He has completed a mini project on health insurance premium prediction.

B. Satish is currently studying B. Tech with a specialization in Information Technology at NRI Institute of Technology. He has completed a mini project on health insurance premium prediction.
