
Volume 7, Issue 7, July – 2022 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Prediction of Flood with Real-Time Data Integrating Machine Learning Models and Scraping Techniques

Shiju E (1), Mr. D. Justin Jose (2)
Department of CSE, MACET,

Abstract:- Floods are among the most damaging natural disasters and are notoriously difficult to model. Research on the development of flood prediction models contributes to risk reduction, policy formulation, minimization of the loss of human life, and reduction of the property damage associated with floods. To emulate the complicated mathematical expressions of the physical processes of floods, neural network methods have, over the past two decades, contributed substantially to the development of prediction systems offering better performance and cost-effective solutions. To predict whether a flood will occur from a rainfall dataset, this work examines neural network-based techniques. The dataset is analyzed with a Multi-Layer Perceptron (MLP) classifier; variable identification, missing-value treatment, data validation, and data cleaning/preparation are carried out on the entire given dataset. Overall flood-prediction performance is assessed by accuracy measurement with a classification report and the confusion matrix, and the result shows the effectiveness of the GUI-based software on the given attributes. In addition to this model, we improve performance by adding a component that fetches real-time data from live sources over the web, so that the output is a real-time flood prediction for any given region.

Keywords:- Dataset, Python, Preprocessing, MLP Classifier, Web Scraping.

I. INTRODUCTION

Machine learning aims to predict the future from past data. Machine learning (ML) is a branch of artificial intelligence (AI) that allows computers to learn without being explicitly programmed. ML focuses on developing computer programs that can adapt when exposed to new data; this work covers the basics of ML and implements simple ML algorithms using Python. The training and prediction process relies on specialized algorithms: training data is fed to an algorithm, which uses it to make predictions about new test data. There are three classes of machine learning, namely supervised learning, unsupervised learning, and reinforcement learning. Supervised learning programs receive both the input data and the correct labels, which must be assigned in advance by humans. In unsupervised learning, no labels are given to the learning algorithm, which has to discover the grouping of the input data on its own. Finally, reinforcement learning interacts dynamically with its environment and receives positive or negative feedback to improve its performance. On top of such a model, performance can be further improved by adding the ability to retrieve real-time data from live sources over the Internet, resulting in real-time flood prediction for specific regions.

II. RELATED WORK

A. Prediction of Flood Using Radial Basis Function (RBF) using Internet of Things (IoT)
An ANN was trained with water-level and rainfall data and used to predict the water level and daily rainfall of the following month. The parameters that yielded the lowest prediction error for water level (TMA) and rainfall (CH) with the best radial basis function neural network were 700 training epochs and a learning rate of 0.00007. The Radial Basis Function network is used here to predict floods; the data was obtained from the Citarum River Hall. The output of the RBF neural network is sent to an Android application that displays the probability of flooding. Using up to 700 epochs gives an error of 0.027 for TMA and 0.002 for CH; a learning rate of 0.00007 gives an error of 0.286 for TMA and 0.002 for CH; and 2 hidden neurons give an error of 0.6483 for TMA and 15.999 for CH.

B. Predicting Flood with the Use of a Multi-Layer ANN in a Monitoring System with Rain Gauge and Soil Moisture Sensors
This research requires the implementation of a real-time monitoring system capable of measuring parameters such as rainfall intensity, soil moisture, water level, and rate of water rise. Various sensors are integrated into the system to record and store data. A prediction model based on multilayer artificial neural networks was developed and tested in a real-world setup, and the response of hierarchical network models was examined. The flood prediction model showed an RMSD of 2.2648, slightly off from the actual water level. Flooding has been a major problem in the Philippines, causing property damage, infrastructure damage, and even loss of life. Current systems address problem solving and the prevention of catastrophic flood disasters. A multilayer artificial neural network built in MATLAB was used to develop the predictive model. The network fit very well in training, testing, validation, and the overall data set. Specifically, the fit was 0.99889 for the training dataset, 0.99362 for the test

IJISRT22JUL1550 www.ijisrt.com 1619


dataset, 0.99764 for the validation dataset, and 0.97952 for all data in the dataset.

C. Technique of Optimal Web Scraping
Data entry is one of the most tedious tasks, requiring a lot of human resources to create structured data from inputs. The large amount of data entering a system can contradict the original data and cause confusion. This is especially true when data must be collected from image files. This paper proposes a text recognition system that can automatically recognize text from images and update a target file with it. The proposed method accepts a web URL as input and uses web scraping techniques to retrieve text or images; the system extracts text data from user-defined ranges. Additionally, the extracted text is classified using a support vector machine (SVM) and a naive Bayes classifier. Output is saved in Google Sheets, CSV, PDF, Text, or Excel format, depending on user selection. State-of-the-art text recognition models such as PyTesseract, PyOCR, and TesseOCR are compared using metrics such as accuracy, precision, and execution speed. Experimental results show that PyTesseract provides 83.45% accuracy and 75.55% precision. The performance of the support vector machine and the naive Bayes classifier are also compared: the SVM achieves 92.08% accuracy with a recall of 90.148%, outperforming the naive Bayes classifier.

III. METHODS

A. Data Validation, Cleaning or Preparing
Load the specified data set and import the library packages. Analyze variable identification according to data format and data type, and evaluate missing and duplicate values. A validation dataset is a sample of data held back from model training and used to estimate the capabilities of the model while tuning the model and procedure. It can be used to test datasets and validate models as they are evaluated and put to optimal use. Data cleaning/preparation includes steps such as renaming the given dataset and removing columns, followed by univariate, bivariate, and multivariate analysis. Data cleansing techniques vary depending on the data set; the purpose of data cleansing is to identify and remove errors and anomalies in order to increase the value of the data for analysis and decision making.

 Data Processing:
Data preprocessing refers to the transformations applied to data before it is fed to an algorithm. It is a technique used to transform raw data into a clean data set: data collected from various sources arrives in raw form and is not directly useful for analysis. To get better results from models applied with machine learning techniques, the data must be in the right format. Some machine learning models require specific forms of input; for example, the random forest algorithm does not support null values. Therefore, to run the random forest algorithm, null values must be handled in the original raw data set.

B. Creating a Predicted Variable Using the Range of Rainfall
A validation dataset is a sample of data retained during model training and used to estimate model performance during model tuning, analyzed through univariate, bivariate, and multivariate processes. The model is validated and tested once it is prepared for evaluation. Data preparation steps such as renaming specific datasets and removing columns are applied; data cleansing procedures and techniques vary from dataset to dataset. The main purpose of data cleaning is to detect and eliminate errors and anomalies in order to increase the value of the data in analysis and decision making. Data visualization provides an important set of tools for qualitative understanding: it is useful for exploring training datasets and identifying patterns, corrupted data, outliers, and more. With a little expertise, data visualization can be used to represent and illustrate key relationships in charts and graphs that are more intuitive and accessible than raw measures of relevance and importance. Data visualization and exploratory data analysis are distinct areas in their own right.

Data may not make sense until it can be presented in a visual form with charts and figures. Being able to quickly visualize data samples is a core skill in applied statistics and machine learning. This work uses the different types of charts available for visualizing data in Python to better understand the data.

C. Performance Measurement of ML Algorithms

 Logistic Regression
Logistic regression is a statistical technique used to analyze a data set in which one or more independent features influence an outcome. The outcome is measured with a dichotomous variable (only two possible outcomes). The goal of logistic regression is to find the best-fitting model to describe the relationship between a dichotomous characteristic of interest (the dependent, response, or outcome variable) and a set of independent (predictor or explanatory) variables. Logistic regression is a machine learning classification algorithm used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable encoded as 1 (yes, success, etc.) or 0 (no, failure, etc.).

 Support Vector Machines
A classifier that separates a data set by determining the best hyperplane between the data points. We chose this classifier because a wide variety of kernel functions can be applied and the model can achieve high predictive accuracy. Support vector machines are among the most popular and widely discussed machine learning algorithms. They were very popular when they were developed in the 1990s and remain a powerful method of choice with only minor tweaks. Key aspects include:
 The multiple names used to refer to support vector machines.
 The representation used by an SVM when the model is actually saved to disk.
 Predicting new data using the trained SVM model representation.
 How the model learns from the training data.
 Preparing data that is optimal for the SVM algorithm.
 Where to find more information about SVMs.
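To make the points above concrete, here is a minimal sketch, assuming a scikit-learn stack and a synthetic two-feature rainfall table (both are illustrative assumptions, not the paper's actual dataset), of training a linear SVM, saving its learned representation to disk, and predicting new data with the reloaded model:

```python
# Illustrative sketch: train a linear SVM, persist its learned
# representation, and predict new data with the reloaded model.
# The synthetic rainfall features and the flood rule are invented here.
import pickle

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(0, 200, size=(300, 2))     # two toy rainfall features (mm)
y = (X.sum(axis=1) > 200).astype(int)      # 1 = flood, 0 = no flood (toy rule)

model = SVC(kernel="linear")               # separating hyperplane between classes
model.fit(X, y)

# The saved representation is essentially the support vectors and coefficients.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
print(restored.predict([[150.0, 160.0], [10.0, 20.0]]))
```

Because pickling stores the fitted support vectors and coefficients, the reloaded object can predict immediately without retraining.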

 K-Nearest Neighbor (KNN)

The KNN method is a supervised machine learning algorithm that stores all instances corresponding to training data points in an n-dimensional space. When it receives unseen discrete data, it examines the k nearest neighbors and returns the most common class among them as the prediction; for real-valued data, it returns the average of the k nearest neighbors. The distance-weighted nearest neighbor algorithm weights the contribution of each of the k nearest neighbors according to its distance from the query point, giving larger weight to the closest neighbors.

KNN typically averages over the k nearest neighbors, so it is robust to noisy data. The k-nearest neighbor algorithm is a supervised classification algorithm: it takes a set of labeled points and uses them to learn how to label other points.

To label a new point, it finds the labeled points closest to the new point (its nearest neighbors) and has those points vote, so that the label held by the most neighbors becomes the label of the new point (where 'k' is the number of neighbors checked). The entire training set is used to make predictions on the validation set: KNN predicts a new instance by finding the k "closest" instances across the set, where "closeness" is determined using a (Euclidean) distance measure over all features.

D. Performance of the MLP Classifier

MLPs (multilayer perceptrons) are a class of feed-forward artificial neural networks (ANNs). The term MLP is used loosely to refer to any feed-forward ANN, and sometimes strictly to networks composed of multiple layers of ("threshold-activated") perceptrons. Multilayer perceptrons are sometimes colloquially referred to as neural networks, especially when there is only one hidden layer. A single-layer perceptron can only learn linear functions, whereas a multilayer perceptron can also learn nonlinear functions. An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Each node, with the exception of the input nodes, is a neuron with a nonlinear activation function. MLPs use a supervised learning technique called backpropagation for training. Their multiple layers and nonlinear activations distinguish MLPs from linear perceptrons and allow them to separate data that are not linearly separable.

Fig 1: MLP Classifier

A perceptron is a very simple learning machine. It takes some inputs, each with a weight that indicates its importance, and produces a '0' or '1' output decision. However, it can be combined with many other perceptrons to form an artificial neural network. Given enough training data and computational power, such a neural network can in theory approximate an answer to almost any question.

Fig 2: Input and Output of a Perceptron

A multi-layer perceptron (MLP) solves complex problems by combining perceptrons stacked in multiple layers. Each perceptron in the first layer (the input layer, on the left) sends its output to all perceptrons in the second layer (the hidden layer), and all perceptrons in the second layer send their outputs to the final layer (the output layer, on the right). Each perceptron thus sends multiple signals, one to each perceptron in the next layer.
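A minimal sketch of such a network, assuming scikit-learn's MLPClassifier and a synthetic rainfall table (the three station features and the 100 mm flood rule are invented stand-ins for the paper's dataset):

```python
# Illustrative sketch: an MLP flood classifier on synthetic rainfall data.
# The three station features and the 100 mm flood rule are invented here.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.uniform(0, 200, size=(500, 3))     # rainfall at 3 stations, in mm
y = (X.mean(axis=1) > 100).astype(int)     # 1 = flood, 0 = no flood (toy rule)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# One hidden layer of nonlinearly-activated neurons, trained by
# backpropagation; inputs are standardized first to help convergence.
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                  max_iter=2000, random_state=0),
)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

The held-out accuracy printed here plays the same role as the classification-report accuracy used to compare models in Section III-C.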
E. Web-Scraping of Weather Data
Web scraping is the extraction of data from the Internet or from websites using automated processes. Data on a website is unstructured; web scraping helps collect this unstructured data and store it in a structured format. There are many ways to scrape a website, including online services, APIs, and writing your own code. This section describes how web scraping is implemented in Python.
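As an illustrative sketch of that step, the snippet below parses a daily rainfall value out of weather-page HTML with BeautifulSoup. The inline HTML, the table layout, and the `rain` class name are invented stand-ins for a real weather site; in live use the markup would first be fetched with something like `requests.get(url).text`:

```python
# Illustrative sketch of the scraping step: pull a rainfall figure out of
# weather-page HTML. The inline snippet stands in for a page that would
# normally be fetched live, e.g. html = requests.get(url).text.
from bs4 import BeautifulSoup

html = """
<html><body>
  <table id="forecast">
    <tr><th>Day</th><th>Rainfall (mm)</th></tr>
    <tr><td>Today</td><td class="rain">104.2</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
rainfall_mm = float(soup.find("td", class_="rain").text)
print(rainfall_mm)   # the value that would be fed to the trained flood model
```

The extracted number can then be passed to the trained classifier exactly like a row of the offline rainfall dataset.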

Fig 3: Web Scraping

IV. CONCLUSION

The analytical process began with data cleansing and processing, handling of missing values, and exploratory analysis, and ended with modeling and evaluation. Machine learning algorithms were then used to predict flash floods, producing different results; the best result came from the MLP algorithm (97.40% accuracy). This gives us the following insight into flood forecasting: the main purpose of this project is to extract live weather data, using web scraping techniques to obtain the daily precipitation from a weather website, and then to use that day's rainfall amount to predict floods.

In Section III-C, flooding can be predicted using several different algorithms such as SVM, Logistic Regression, and KNN; each of the algorithms mentioned above gives a different accuracy level.

In Section III-D, we can see that the MLP classifier gives the highest prediction accuracy; therefore, MLP is the best algorithm for flood forecasting here. Section III-E describes web scraping, which can be used to extract live precipitation data from forecast and weather websites. As a further improvement, the process can be automated on a regular schedule so that the model extracts data from websites and makes predictions automatically.

REFERENCES

[1]. Febus Reidj G. Cruz, Matthew G. Binag, Marlou Ryan G. Ga, and Francis Aldrine A. Uy, "Flood Prediction Using Multi-Layer Artificial Neural Network in Monitoring System with Rain Gauge, Water Level, Soil Moisture Sensors," IEEE, 28-31 Oct. 2018, DOI: 10.1109/TENCON.2018.8650387.
[2]. Roopesh N, Akarsh M S, and C. Narendra Babu, "An Optimal Data Entry Method, Using Web Scraping and Text Recognition," 2021 International Conference on Information Technology (ICIT), 2021, DOI: 10.1109/ICIT52682.2021.9491643.
[3]. H. Hartenstein and L. P. Laberteaux, "A tutorial survey on vehicular ad hoc networks," IEEE Commun. Mag., vol. 46, no. 6, pp. 164–171, Jun. 2008.
[4]. B. Parno and A. Perrig, "Challenges in securing vehicular networks," in Proc. Workshop Hot Topics Netw. (HotNets-IV), MD, USA, Nov. 2005, pp. 1–6.
[5]. F. Dötzer, "Privacy issues in vehicular ad hoc networks," in Proc. Int. Workshop Privacy Enhancing Technol., May 2005, pp. 197–209.
[6]. J. R. Douceur, "The sybil attack," in Proc. Int. Workshop Peer-to-Peer Syst., 2002, pp. 251–260.
[7]. M. Mousa, X. Zhang, and C. Claudel, "Flash flood detection in urban cities using ultrasonic and infrared sensors," IEEE Sensors Journal, vol. 16, no. 19, pp. 7204–7216, 2016.
