ISSN No:-2456-2165
Abstract:- The objective of this study was to use and compare multiple classifying models that can be used for classifying astronomical data. The models were tested on data obtained from the Sloan Digital Sky Survey: Data Release-16. Various classifying models were trained and tested by dividing the data into two parts: 80 per cent of the data was used for training and 20 per cent for testing. In order to classify the tabular data, consisting of spectroscopic and photometric parameters, effectively, the study was not limited to the use of individual models. Stacking, the combination of multiple classifying models, was also implemented, and multiple stacking models were created for this purpose. Stacking models have on multiple occasions achieved higher evaluation metrics, and thus significantly better performance, than any individual classifier, indicating that stacking is a better choice for classifying the data. Certain other models, such as Bagging and Hard Voting, were found to have performance comparable to that of the stacked models. Box plots for individual classes were also plotted to compare the models and determine which were capable of identifying a single class of stellar objects. The models from this study could be used as a reliable classification tool for a wide variety of astronomical purposes to accelerate the expansion of the sample sizes of stars, galaxies, and quasars.

© 2023 Elsevier Ltd. All rights reserved.

Keyword:- Classification, Classifiers, Stars, Galaxies, Quasars.

I. INTRODUCTION

The universe is composed of various objects of different shapes, sizes and colors. In order to understand the universe, we first need to classify the objects that make it up. For centuries, astronomers have been observing and studying the sky to understand what kinds of objects are in the universe. From ancient times to the present day, humanity has created thousands of different astronomical catalogs. The goal of all of them is to collect observations of astronomical objects made with one or more instruments and to combine them into a unique homogeneous description. This enables anybody interested in the study of a given class of sources to compare their properties on an equal basis. Now, with more extensive and higher quality catalogs, we can perform this study in a better way. Stars, quasars and galaxies are the most commonly found objects in the universe [1]. A star is an astronomical object comprising a luminous spheroid of plasma held together by its gravity [wiki]. Galaxies are made of billions of such stars that revolve around the gravitational center of a black hole [1]. Quasars are quasi-stellar objects which emit electromagnetic radiation more potent than the luminosities of the galaxies combined [1]. Numerous large-scale surveys and catalogs have been made to map the universe and the celestial objects present in it. Among the most important of these is the Sloan Digital Sky Survey (SDSS), which commenced observations of the universe in 1998 [2]. There have been four major phases of this survey with multiple data releases (the fifth phase was under way in 2022). The information captured by the SDSS includes optical, spectroscopic, and photometric information, along with an array of other observations. Here we use the data of SDSS from Data Release 14, which was made available in 2017 [sdss.org]. Our objective in this project is to compare multiple classifying machine learning models and determine the best classifier among them. Here we only use the spectroscopic and photometric information of the SDSS DR-14 dataset. The objectives are as follows:-
To perform Exploratory Data Analysis on the SDSS dataset and to tidy the data, and to produce count plots and scatter plots for data visualization.
To select multi-class classifier models such as DecisionTree, LogisticRegression, Stacked, Boosted etc., which can be used to classify the data.
To split the data in an 80/20 ratio for training and testing.
To use the training data to train the selected models and then test them.
To compare the predictions of the tested models with the original test data using evaluation metrics such as accuracy score, precision score, recall, F1 and the classification report, thus identifying the best performing model.
To box plot the evaluation metrics of the classification models to get a better visualization of the performance of the models.

A. Background

Many large surveys of the universe have been carried out over time. Among the most popular surveys that capture information about the celestial objects in the universe is the Sloan Digital Sky Survey (SDSS) [2]. Machine learning and deep learning architectures are continually being designed and utilized in many large-scale astronomical surveys. Both supervised and unsupervised machine learning are used for classification, but supervised machine learning has proven to be superior for the task [7]. CNNs and ANNs, like Skynet and AstroNN, are designed and used to survey astronomical data collected by observatories like APO (Apache Point Observatory) [1]. The Javalambre Photometric Local
Statistic  U          G          R          I          Z          Redshift
Mean       18.619355  17.371931  16.840963  16.583579  16.422833   0.143726
Std         0.828656   0.945457   1.067764   1.141805   1.203188   0.388774
Min        12.988970  12.799550  12.431600  11.947210  11.610410  -0.004136
25%        18.178035  16.815100  16.173333  15.853705  15.618285   0.000081
50%        18.853095  17.495135  16.858700  16.554985  16.389945   0.042591
75%        19.259232  18.010145  17.512675  17.258550  17.141447   0.092579
Max        19.599900  19.918970  24.802040  28.179630  22.833060   5.353854
Table 2: Dataset Statistics
Actual \ Predicted   Negative         Positive
Negative             True Negative    False Positive
Positive             False Negative   True Positive
Table 3: TP, TN, FP, & FN explanation table
True Positive (TP) - The actual value is positive and is predicted to be positive.
True Negative (TN) - The actual value is negative and is predicted to be negative.
False Positive (FP) - The actual value is negative but is predicted to be positive.
False Negative (FN) - The actual value is positive but is predicted to be negative.

IV. EXPERIMENTAL RESULTS

To compute the performance of the classifier models we used evaluation metrics such as accuracy, recall, precision and F1 score. We also used the classification report from sklearn to obtain class-wise evaluations for each model.
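
As a rough illustration of how the TP, FP and FN counts defined above turn into class-wise precision, recall and F1 score in a multi-class setting, the following minimal sketch treats each class in turn as "positive" and all other classes as "negative" (one-vs-rest). It is written in plain Python and is not the code used in the study; the function name `per_class_metrics`, the class labels and the toy predictions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return (per-class report, accuracy) using one-vs-rest TP/FP/FN counts.

    Hypothetical helper for illustration; not taken from the study.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    for actual, predicted in zip(y_true, y_pred):
        if actual == predicted:
            tp[actual] += 1      # correct: true positive for that class
        else:
            fp[predicted] += 1   # predicted class gains a false positive
            fn[actual] += 1      # actual class gains a false negative

    report = {}
    for label in labels:
        predicted_pos = tp[label] + fp[label]
        actual_pos = tp[label] + fn[label]
        precision = tp[label] / predicted_pos if predicted_pos else 0.0
        recall = tp[label] / actual_pos if actual_pos else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}

    # Accuracy is the fraction of all samples that were classified correctly.
    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    return report, accuracy


# Toy labels (assumed for illustration only, not data from the study).
y_true = ["STAR", "STAR", "GALAXY", "GALAXY", "GALAXY", "QSO"]
y_pred = ["STAR", "GALAXY", "GALAXY", "GALAXY", "QSO", "QSO"]

report, accuracy = per_class_metrics(y_true, y_pred)
for label, scores in report.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
print("accuracy", round(accuracy, 3))
```

The classification report in sklearn lists the same per-class quantities (precision, recall and F1 score) together with the number of samples per class, so a sketch like this is mainly useful for checking by hand how a reported value follows from the confusion counts.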