
Volume 8, Issue 2, February – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Classification of Stars, Galaxies and Quasars


Vansh Teotia (a,b), Praveena Melepatte (a,c), Poojan Upadhyay (a,d), Saksham Chandna (a,e), Dr Aditi Panda (f)

a Research Intern, Spartificial Innovations Private Limited, Indira Colony, Kotkasim, 301702, Rajasthan, India
b Department of Physics, University of Delhi, New Delhi, 110007, India
c Department of Physics, ARSD, Ring Rd, Dhaula Kuan Enclave I, Dhaula Kuan, New Delhi, 110021, India
d Department of Physics, KSV University, Sector 15, 382016, Gujarat, India
e Department of Physics, Indian Institute of Science Education and Research, Bhauri Bypass Rd., Bhopal, 462066, Madhya Pradesh, India
f Department of CSE, J-15, Khandagiri Marg, Dharam Vihar, Jagamara, Bhubaneshwar, 751030, Odisha, India

Abstract:- The objective of this study was to use and compare multiple classifying models for astronomical data, tested on data obtained from the Sloan Digital Sky Survey, Data Release 14. Various classifying models were trained and tested by dividing the data into two parts: 80 per cent of the data was used for training and 20 per cent for testing. To classify the tabular data, consisting of spectroscopic and photometric parameters, effectively, the study was not limited to individual models: stacking, the combination of multiple classifying models, was also implemented, and multiple stacked models were created. Stacked models have on multiple occasions proven to have higher evaluation metrics, and thus significantly better performance, than any individual classifier, suggesting that stacking is a better choice for this classification task. Certain individual models, such as Bagging and Hard Voting, were found to have performance comparable to that of the stacked models. Box plots for individual classes were also plotted to compare and determine the models capable of identifying a single class of stellar objects. The models from this study could be used as a reliable classification tool for a wide variety of astronomical purposes, accelerating the expansion of the sample sizes of stars, galaxies, and quasars.

Keywords:- Classification, Classifiers, Stars, Galaxies, Quasars.

I. INTRODUCTION

The universe is composed of objects of various shapes, sizes and colors. To understand the universe, we first need to classify the objects that make it up. For centuries, astronomers have observed and studied the sky to understand what kinds of objects populate the universe. From ancient times to the present day, humanity has created thousands of astronomical catalogs. The goal of all of them is to collect observations of astronomical objects made with one or more instruments and to combine them into a single homogeneous description. This enables anybody interested in the study of a given class of sources to compare their properties on an equal basis. Now, with more extensive and higher-quality catalogs, we can perform this study in a better way. Stars, quasars and galaxies are the most commonly found objects in the universe [1]. A star is an astronomical object comprising a luminous spheroid of plasma held together by its own gravity [wiki]. Galaxies are made of billions of such stars that revolve around a gravitational center, typically a supermassive black hole [1]. Quasars are quasi-stellar objects whose electromagnetic radiation can outshine the combined luminosity of their host galaxies [1]. Numerous large-scale survey catalogs have been compiled to map the universe and the celestial objects in it. Among the most important is the Sloan Digital Sky Survey (SDSS), which commenced observations in 1998 [2]. There have been four major phases of this survey with multiple data releases (a fifth phase is ongoing as of 2022). The information captured by the SDSS includes optical, spectroscopic and photometric measurements, along with an array of other observations. Here we use data from SDSS Data Release 14, made available in 2017 [sdss.org]. Our objective in this project is to compare multiple classifying machine learning models and determine the best classifier among them, using only the spectroscopic and photometric information of the SDSS DR-14 dataset. The objectives are as follows:
• To perform Exploratory Data Analysis on the SDSS dataset and tidy the data; also, to produce count plots and scatter plots of the data for visualization.
• To select multi-class classifier models such as Decision Tree, Logistic Regression, stacked and boosted models, which can be used to classify the data.
• To split the data in an 80/20 ratio for training and testing.
• To use the training data to train the selected models and then test them.
• To use evaluation metrics such as accuracy, precision, recall, F1 score and the classification report to compare the tested models against the test data, thus identifying the best-performing model.
• To box-plot the evaluation metrics of the classification models for a better visualization of their performance.

A. Background
Many large surveys of the universe have been carried out over the years. Among the most popular surveys capturing information about celestial objects is the Sloan Digital Sky Survey (SDSS) [2]. Machine learning and deep learning architectures are continually being designed and utilized in many large-scale astronomical surveys. Both supervised and unsupervised machine learning are used for classification, but supervised machine learning has proven superior for the task [7]. CNNs and ANNs, such as SkyNet and astroNN, are designed and used to survey astronomical data collected by observatories like APO (Apache Point Observatory) [1]. The Javalambre Photometric Local

IJISRT23FEB329 www.ijisrt.com 432


Universe Survey (J-PLUS) is also one of the surveys designed to observe several thousand square degrees in optical bands [5]. Other databases used for such classification instead of SDSS include Gaia, WISE and UKIDSS [6]. There has been a significant increase in research related to stellar spectrum detection and classification. Many researchers have focused on star-quasar, galaxy-quasar or star-galaxy binary classification; others on multi-class classification of stars, galaxies and quasars. In these works, various methods have been applied to classify the heavenly bodies automatically and accurately [6]. Many authors used classical machine learning algorithms such as support vector machines (SVM), k-nearest neighbors (KNN), Decision Tree (DT), XGBoost and Random Forest (RF) [1][2][3]. Others adopted deep learning techniques or developed their own novel solutions. Still others use data released from surveys such as VEXAS, with ensembles of classifiers like KNN, ANN and CatBoost, to obtain better results for classifying stellar objects [4].

B. Data
The data used in this study come from the Sloan Digital Sky Survey (SDSS), a leading astronomical survey that has been working for more than 20 years to produce extremely precise and detailed imaging and maps of the universe. This public dataset, Data Release 14, is the second release of the fourth phase of the survey and includes observations through July 2016 [7]. It contains 18 variables with 10,000 total entries and no missing values, and was extracted from the SDSS public server using an SQL query. Eleven variables (the location of an object on the celestial sphere, the field/area of pixels in the image taken, specifications of the spectroscopy and optical fiber, etc.) were removed since they do not contribute towards classification [8]. The descriptions of the remaining 6 feature variables and 1 class variable (drawn from the SDSS documentation: Camera; Measures of Flux and Magnitude; Redshifts; The Photometric Camera and the CCDs; Understanding SDSS Imaging Data), their first 10 entries, and their statistics are shown in the tables below:

U         The intensity of light (flux) with a wavelength of 3551 Å emitted by the object
G         The intensity of light (flux) with a wavelength of 4686 Å emitted by the object
R         The intensity of light (flux) with a wavelength of 6166 Å emitted by the object
I         The intensity of light (flux) with a wavelength of 7480 Å emitted by the object
Z         The intensity of light (flux) with a wavelength of 8932 Å emitted by the object
Redshift  Measurement of how fast the object is moving away relative to Earth; a result of the Doppler effect: light emitted from a receding object increases in wavelength and shifts towards the red end of the spectrum
Class     Classification of the object as star, galaxy or quasar
Table 1: Description Of Variables
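As an illustration of the variable selection described in B. Data: assuming the table has been loaded into a pandas DataFrame, the non-informative columns can be dropped as below. The eleven column names follow the common public export of this dataset and are an assumption, not something stated in this paper; the values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the DR14 table; column names follow the common public
# export of this dataset (assumption), values are illustrative only.
df = pd.DataFrame({
    "objid": [1, 2], "ra": [183.5, 184.0], "dec": [0.1, 0.2],
    "run": [752, 752], "rerun": [301, 301], "camcol": [4, 4],
    "field": [267, 269], "specobjid": [3722, 3638],
    "plate": [3306, 323], "mjd": [54922, 51615], "fiberid": [491, 541],
    "u": [19.47, 18.66], "g": [17.04, 17.21], "r": [15.94, 16.67],
    "i": [15.50, 16.48], "z": [15.22, 16.39],
    "redshift": [-0.0000088, 0.1231], "class": ["STAR", "GALAXY"],
})

# Drop the 11 identifier/instrument columns that do not help classification.
non_informative = ["objid", "ra", "dec", "run", "rerun", "camcol",
                   "field", "specobjid", "plate", "mjd", "fiberid"]
df = df.drop(columns=non_informative)

# Remaining 6 features + 1 class variable, as in Table 1.
print(df.columns.tolist())
```

This leaves exactly the u, g, r, i, z, redshift and class columns described above.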

C. Exploratory Data Analysis


Exploratory data analysis (EDA) is the initial process of analysing a data set through summary statistics and graph plots, wherever applicable. For EDA, the pandas and seaborn libraries were used. The number of variables, their types and entries, and any missing or null values were examined. Then the mean, standard deviation, minimum, lower quartile, median, upper quartile and maximum of the feature variables were calculated and organised. Finally, a count plot and scatter plot of the dataset were plotted.
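A minimal sketch of this EDA step on a toy stand-in for the dataset (the values are illustrative only; the seaborn calls that would produce Figs. 1 and 2 are shown as comments):

```python
import pandas as pd

# Illustrative stand-in for the cleaned 6-feature table.
df = pd.DataFrame({
    "u": [18.2, 19.1, 17.9, 18.6],
    "redshift": [0.04, 1.21, -0.0001, 0.09],
    "class": ["GALAXY", "QSO", "STAR", "GALAXY"],
})

# Variable types, entry counts and missing values.
df.info()
print(df.isnull().sum())

# Mean, std, min, quartiles and max of the feature variables (cf. Table 2).
print(df.describe())

# Class frequencies behind the count plot (Fig. 1).
print(df["class"].value_counts())

# With seaborn installed, the plots themselves would be:
#   import seaborn as sns
#   sns.countplot(data=df, x="class")   # Fig. 1
#   sns.pairplot(df, hue="class")       # Fig. 2
```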

U G R I Z Redshift
Mean 18.619355 17.371931 16.840963 16.583579 16.422833 0.143726
Std 0.828656 0.945457 1.067764 1.141805 1.203188 0.388774
Min 12.988970 12.799550 12.431600 11.947210 11.610410 -0.004136
25% 18.178035 16.815100 16.173333 15.853705 15.618285 0.000081
50% 18.853095 17.495135 16.858700 16.554985 16.389945 0.042591
75% 19.259232 18.010145 17.512675 17.258550 17.141447 0.092579
Max 19.599900 19.918970 24.802040 28.179630 22.833060 5.353854
Table 2: Dataset Statistics


Table 3: First 10 entries of the dataset

Fig. 1: Quantitative distribution of class-labels in data

Fig. 2: Pair plots between features of dataset.

II. REVIEW AND BACKGROUND INFORMATION

Various algorithms and machine learning models can be used to classify the data from the Sloan Digital Sky Survey (SDSS). In this study we have focused on comparing results from individual models and stacked models to determine the best possible method.

A. Background
The classification of celestial objects is of great significance in astronomy. Its most direct benefit is providing the means to gather data samples of stars, galaxies and quasars. Quasars in particular, despite their importance to a wide range of astronomical studies, still have relatively small sample sizes [1][2]. To increase these sample sizes, fast and reliable classifying models are crucial. Modern telescopes can gather large amounts of data, emphasising the need for fast and reliable classification models even further [1]. Past studies have shown that supervised models achieve higher accuracies than unsupervised models, which have in turn been shown to be more efficient at identifying unknown objects [3][4]. Supervised models such as Support Vector Machines and Random Forests have been shown to classify the data with accuracies of up to 90 per cent, both in studies that solely identify quasars and in studies that classify stars, galaxies and quasars [5]. The Decision Tree classifier has been shown to be considerably more effective than most other supervised classifiers, such as Logistic Regression [2].

III. PROPOSED SOLUTIONS

Our solution to the classification task is described below.

A. Classifiers
We compared forty-three classifiers in this project, including boosting, bagging and ensemble classifiers. Naive Bayes (NB) is a probabilistic classifier based on Bayes' theorem with an assumption of conditional independence between features. Three variants, differing in their assumptions about the distribution of the feature likelihoods, were used in this project: Gaussian Naive Bayes (GNB), Multinomial Naive Bayes (MNB) and Bernoulli Naive Bayes (BNB) [9]. Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) are classifiers that use linear and non-linear surfaces, respectively, to separate the classes [9]. K-nearest neighbors (KNN) is an algorithm that classifies a test data point by calculating its distances to the training data points and then finding the probability of the point belonging to each class among its K nearest neighbors [10]. The value of K can be changed (n_neighbors = 5, default), and the algorithm used to compute the nearest neighbors can also be varied. We used three KNN classifiers with three different neighbor-search algorithms: brute force (KNNB), ball tree (KNN) and KD tree (KNNKD) [10]. Nearest Centroid (NC) is a classifier that assigns a test data point to the class whose centroid is nearest to it [9]. The Ridge Classifier (RC) is based on the ridge regression method, which uses linear least squares with l2 regularization; Ridge CV (RCV) is a ridge classifier with built-in cross-validation. The Stochastic Gradient Descent classifier (SGD) is a linear classifier optimized by SGD, which computes the gradient of the loss per sample [11]. We used three variations of this classifier by changing the penalty parameter: SGD l1 (penalty = 'l1'), SGD l2 (penalty = 'l2', the default) and SGD elastic-net (penalty = 'elasticnet'). Support Vector Machines (SVM) are classifiers that use thresholds with soft margins to separate the training data into clusters and subsequently classify the test data based on its position relative to the threshold. We used two SVM models with different kernels, which enable non-linear classification: PSVM (kernel = 'poly') and RBFSVM (kernel = 'rbf', radial basis function). Since SVMs are inherently binary classifiers, they use the one-versus-one approach for multi-class classification by default [9]. The Nu-Support Vector Machine (Nu-SVM) is similar to SVM but uses a different regularization parameter, Nu, while the Linear Support Vector Machine (LSVM) is similar to an SVM with a linear kernel but works on the liblinear library instead of libsvm [9]. We used LSVM with two penalty parameters: LSVM1 (penalty = 'l1') and LSVM2 (penalty = 'l2', the default) [9]. The Perceptron (PTN) is a linear classifier that acts as an artificial neuron, while the Multilayer Perceptron (MLP) is an artificial neural network that works non-linearly [12]. Passive-Aggressive (PA) is also a linear classifier; it does not use a learning rate but uses a regularization parameter (C = 1.0, default). Calibrated CV (CCV) uses cross-validation to determine the parameters and to calibrate a classifier (estimator = LinearSVC, default) [9]. Logistic regression (LR) is a linear classifier in which the probabilities of an outcome are modeled using a logistic function. For models that are binary classifiers, the multi-class dataset is split into binary-class subsets; the default strategy in the scikit-learn library for handling multi-class datasets with binary classifiers is, in general, One-vs-Rest. Decision Trees (DT) have a tree-like hierarchical structure and use different criteria (criterion = 'gini', default) to find the optimum split. Extra Trees (ET) and Random Forest (RF) are collections of decision trees, with ET using random splits and RF using optimum splits and subsampling of the data [13]. We used two RF models with different split criteria: RFE (criterion = 'entropy') and RF (criterion = 'gini') [9]. Both are extensions of the Bagging classifier, an ensemble algorithm that trains multiple versions of a base estimator (estimator = DecisionTreeClassifier, default) on data subsets in parallel and then combines their individual predictions into the final outcome. Boosting is an ensemble learning technique that builds a number of weak classifiers sequentially to produce a strong classifier [9]. We used numerous boosting classifiers: Adaptive Boosting (AdaB), which identifies misclassified data points, adjusts their weights so as to minimize the error, and feeds them to the next sequential classifier; Gradient Boosting (GB), which works on reducing the residuals of the previous classifier's predictions; Extreme Gradient Boosting (XGB), a computationally efficient implementation of GB; Light Gradient Boosting Machine (LightGBM), which is similar to XGBoost but chooses the leaf that will lead

to the highest reduction in loss and grows the tree leaf-wise; CatBoost (CB), an open-source gradient boosting library; and HistGradBoost (HGB), a histogram-based gradient boosting classifier. In order to increase the accuracy of the classifiers, we stacked a few of them and made five stacked models:
• Stacked model-1 with DT, LR, KNN and SVM (RBF)
• Stacked model-2 with KNN, SVM (RBF) and AdaBoost, which were some of the classifiers with lower performance
• Stacked model-3 with LR, KNN and SVM (RBF)
• Stacked model-4 with MLP, RF and SVM (RBF)
• Stacked model-5 with SVM (RBF), KNN and PA

The stacked models showed higher performance than any individual model. The Voting classifier is also an ensemble learning method and is of two types: hard voting (HV), where the vote is based on the predicted output classes, and soft voting (SV), where it is based on the predicted class probabilities. We built one soft voting and two hard voting classifiers:
• Soft Voting: (i) QDA, (ii) Nu-SVM, (iii) RBFSVM, (iv) PSVM, (v) DTC, (vi) RF, (vii) XGBC, (viii) BC, (ix) MLP, (x) ETC, (xi) GNB.
• Hard Voting 1: (i) RFE, (ii) XGBoost.
• Hard Voting 2: (i) QDA, (ii) Nu-SVM, (iii) RBFSVM, (iv) PSVM, (v) DTC, (vi) RF, (vii) XGBC, (viii) BC, (ix) MLP, (x) ETC, (xi) GNB.

B. Evaluation Metrics
We used four evaluation metrics to calculate and compare the performance of all the classifier models: precision, recall, accuracy and F1 score. We also calculated the precision, recall and F1 score of each class (galaxy, star and quasar) for each classifier model using the classification report function in the scikit-learn library. Calculating these values requires a confusion matrix, such as the following:

                  Predicted Negative    Predicted Positive
Actual Negative   True Negative         False Positive
Actual Positive   False Negative        True Positive
Table 3: TP, TN, FP & FN explanation table

• True Positive (TP) - The actual value is positive and is predicted to be positive.
• True Negative (TN) - The actual value is negative and is predicted to be negative.
• False Positive (FP) - The actual value is negative but is predicted to be positive.
• False Negative (FN) - The actual value is positive but is predicted to be negative.

IV. EXPERIMENTAL RESULTS

To compute the performance of the classifier models, we used evaluation metrics such as accuracy, recall, precision and F1 score. We also used the classification report from sklearn to obtain class-wise evaluations for each model.
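These metrics can be sketched with scikit-learn's metric functions; the label arrays below are toy values standing in for the test-set predictions, not results from this study:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, classification_report)

# Toy true/predicted labels standing in for the 20% test split.
y_true = ["GALAXY", "GALAXY", "STAR", "QSO", "STAR", "GALAXY"]
y_pred = ["GALAXY", "STAR",   "STAR", "QSO", "STAR", "GALAXY"]

# Overall metrics; weighted averaging is one common choice for a
# multi-class table such as Table 4.
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="weighted"))
print("recall   :", recall_score(y_true, y_pred, average="weighted"))
print("f1       :", f1_score(y_true, y_pred, average="weighted"))

# Per-class precision/recall/F1, as used for the class-wise box plots.
print(classification_report(y_true, y_pred))
```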

A. Evaluation Metrics of the Classifiers Used

Of the 43 models used, the evaluation metrics of the best 20 are listed below:

S.no Classifier Accuracy Precision Recall F1 score


1 Stacked Model-5 0.9785 0.9787 0.9785 0.9784
2 Extra Trees 0.9790 0.9790 0.9790 0.9789
3 SGD l1 0.9790 0.9790 0.9790 0.9789
4 LSVM(L1) 0.9800 0.9799 0.9800 0.9789
5 Multi Layer Perceptron 0.9825 0.9825 0.9825 0.9824
6 Hard Voting-2 0.9875 0.9875 0.9875 0.9874
7 Decision Tree 0.9880 0.9879 0.9880 0.9879
8 Grad Boost 0.9885 0.9879 0.9885 0.9884
9 LightGBM 0.9885 0.9884 0.9885 0.9884
10 HistGradBoost 0.9890 0.9890 0.9890 0.9889
11 CatBoost 0.9890 0.9889 0.9890 0.9889
12 RF entropy 0.9890 0.9890 0.9890 0.9889
13 Soft Voting 0.9890 0.9890 0.9890 0.9889
14 RF gini 0.9895 0.9894 0.9895 0.9894
15 Stacked Model-2 0.9895 0.9895 0.9895 0.9894
16 XGBoost 0.9895 0.9895 0.9895 0.9894
17 Hard Voting-1 0.9900 0.9900 0.9900 0.9899
18 Stacked Model-3 0.9900 0.9900 0.9900 0.9900
19 Stacked Model-4 0.9903 0.9903 0.9903 0.9903
20 Stacked Model-1 0.9910 0.9910 0.9910 0.9909
Table 4: Best 20 classifiers
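As a sketch of how ensembles such as Stacked Model-1 (DT, LR, KNN and SVM-RBF) and a hard-voting pair can be assembled in scikit-learn: the data here are synthetic, and XGBoost is replaced by a second scikit-learn tree ensemble so the sketch stays self-contained. This is an outline under those assumptions, not the study's exact pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class stand-in for the 6-feature SDSS table.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # the study's 80/20 split

# Stacked Model-1: DT, LR, KNN and SVM (RBF) as base estimators,
# default hyper-parameters, as in the study.
stack1 = StackingClassifier(estimators=[
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(kernel="rbf")),
])
stack1.fit(X_train, y_train)
print("stacked accuracy:", stack1.score(X_test, y_test))

# In the spirit of Hard Voting 1 (RFE + XGBoost); XGBoost is swapped for
# a plain decision tree here to avoid a non-sklearn dependency.
hv = VotingClassifier(estimators=[
    ("rfe", RandomForestClassifier(criterion="entropy", random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
], voting="hard")
hv.fit(X_train, y_train)
print("hard-voting accuracy:", hv.score(X_test, y_test))
```

Swapping in the other member lists from Section III gives the remaining stacked and voting models.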

B. Corresponding Box-Plots
Box-plots are made for the 20 models with the highest accuracy out of the total 43:

(a) Accuracy Box-Plot (b) Galaxies Precision Box-Plot

(c) Galaxies Recall Box-Plot (d) Galaxies F1-score Box-Plot

(e) Quasars Precision Box-Plot (f) Quasars Recall Box-Plot

(g) Quasars F1-score Box-Plot (h) Stars Precision Box-Plot

(i) Stars Recall Box-Plot (j) Stars F1-score Box-Plot


Fig. 3: Box-Plots
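A box plot like those in Fig. 3 can be sketched with matplotlib; the per-model F1 samples below are illustrative placeholders, not the study's measurements:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt

# Hypothetical per-class F1 samples from repeated fits of three models;
# the numbers are made up for illustration.
f1_samples = {
    "Stacked-1": [0.990, 0.991, 0.989, 0.992],
    "XGBoost":   [0.989, 0.990, 0.988, 0.990],
    "DT":        [0.986, 0.988, 0.985, 0.987],
}

fig, ax = plt.subplots()
ax.boxplot(list(f1_samples.values()))
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(f1_samples.keys())
ax.set_ylabel("F1-score")
ax.set_title("F1-score spread over repeated fits")
fig.savefig("f1_boxplot.png")
```

One such plot per metric and per class reproduces the layout of panels (a) through (j) above.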

V. CONCLUSION AND FUTURE SCOPE

After utilizing several machine learning models to classify the astronomical dataset from the Sloan Digital Sky Survey, Data Release 14 (SDSS DR14), into the class labels "GALAXY", "QSO" and "STAR", a thorough analysis of the evaluation metrics (accuracy, precision, recall and F1-score, the last three per class label) is performed and the results are analyzed. The data consist of information on 17 feature variables and 1 class variable for 10,000 astronomical objects in total, of which only 6 feature variables are retained as input for the classification models. An initial EDA of the dataset is performed to prepare the data for the subsequent classification task, wherein an 80/20 split of the data into training and testing sets is made. The training data are used to train independent machine learning models, directly accessible from the sklearn Python library, while keeping the hyper-parameters at their defaults. Some ensemble-based algorithms, such as hard and soft voting, are also analyzed, with their ensemble members derived from optimally performing voting algorithms evident in previous research. Stacking algorithms are constructed from both poorly and highly performing models to observe the resulting increase in performance. In total, 43 machine learning models are fitted to the training data, and their corresponding evaluation metrics are computed on the testing data. The results are tabulated above in increasing order of model accuracy. As expected, stacked models tended to perform better on almost all evaluation metrics. Soft and hard voting algorithms also managed to score better than most individual models tested. For a more detailed analysis of metric variation under repeated fitting, box-plots of all evaluation metrics are produced for the top 20 machine learning models in the table and presented above. Given that the data are unbalanced in the frequency of class labels, the plots of precision, recall and F1-score are analyzed separately for each class label. For the "QSO" class label, the evaluation-metric box-plots are more spread out than for the other class labels, a result that can be attributed to the fact that the number of objects labeled "QSO" is significantly smaller than those labeled "GALAXY" and "STAR", both of which are almost equally distributed. The combined accuracy plot identifies Stacked Model 1, Stacked Model 4 and XGBoost as the best-performing models in terms of accuracy, owing to their high mean accuracy values and small spread about the mean. For "GALAXY", Stacked Model 1, Hard Voting 1 and XGBoost seem to dominate with respect to recall and F1-score, but individual models seem to give better precision than ensemble-based models. For "QSO", owing to the lower frequency of data, almost all models perform equivalently across all evaluation metrics, and therefore a decision regarding the best models for this class label cannot be reached with the available data. For "STAR", Stacked Model 1, Hard Voting 1, XGBoost, Gradient Boosting and Decision Tree seem to outperform in precision and F1-score, whereas almost all models perform equivalently well on recall. A generally high performance is observed for "STAR" compared to the other class labels, implying that objects belonging to this class are more easily classified by several machine learning models. The hierarchy of model performance on each class label is sufficiently presented, and with the ever-increasing data incoming from various ongoing astronomical surveys, machine learning algorithms specialized in classifying each class label can be identified using the models already narrowed down above. Since astronomical survey data are publicly released, a basic set of specialized high-performing models could be made readily available for public access, avoiding the long hours associated with finding the right fit for a data set, considering the numerous machine learning models available today. Further optimization based on computational time and complexity can also be performed according to the requirements of individual projects and the available resources.

REFERENCES

[1.] Sabeesh Ethiraj, Bharath Kumar Bolla, "Classification of Quasars, Galaxies, and Stars using Multi-modal Deep Learning", arXiv, May 2022.
[2.] Zhuliang Qi, "Stellar Classification by Machine Learning", SHS Web of Conferences, August 2022.
[3.] Jordi Sabatés de la Huerta, "Classifying astronomical sources with machine learning", Dipòsit Digital de la Universitat de Barcelona, January 2022.
[4.] V. Khramtsov, C. Spiniello, A. Agnello, A. Sergeyev, "VEXAS: VISTA EXtension to Auxiliary Surveys", Astronomy & Astrophysics, March 2021.
[5.] Cunshi Wang, Yu Bai, C. López-Sanjuan, et al., "J-PLUS: Support vector machine applied to STAR-GALAXY-QSO classification", Astronomy & Astrophysics, December 2021.
[6.] Michał Wierzbiński, Paweł Pławiak, Mohamed Hammad, U. Rajendra Acharya, "Development of accurate classification of heavenly bodies using novel machine learning techniques", Methodologies and Application, March 2021.
[7.] Yulun Wu, "Machine Learning Classification of Stars, Galaxies, and Quasars", MATTER: International Journal of Science and Technology, 2021.
[8.] Blanton, M. R., Bershady, M. A., Abolfathi, B., Albareti, F. D., Allende Prieto, C., Almeida, A., ... Zou, H. (2017, July). Sloan Digital Sky Survey IV: Mapping the Milky Way, Nearby Galaxies, and the Distant Universe. NASA/ADS. https://ui.adsabs.harvard.edu/abs/2017AJ....154...28B/abstract
[9.] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825-2830.
[10.] Christopher, A. (2021, February 3). K-Nearest Neighbor. Medium. Retrieved December 14, 2022, from https://medium.com/swlh/k-nearest-neighbor-ca2593d7a3c4
[11.] Introduction to SGD classifier - Michael Fuchs Python. MFuchs. (2019, November 11). Retrieved December 14, 2022, from https://michael-fuchs-python.netlify.app/2019/11/11/introduction-to-sgd-

classifier/
[12.] Hange, A. (2021, April 18). Flux prediction using single-layer perceptron and Multilayer Perceptron. Medium. Retrieved December 14, 2022, from https://medium.com/nerd-for-tech/flux-prediction-using-single-layer-perceptron-and-multilayer-perceptron-cf82c1341c33
[13.] Aznar, P. (2020, June 17). What is the difference between extra trees and random forest? Quantdare. Retrieved December 14, 2022, from https://quantdare.com/what-is-the-difference-between-extra-trees-and-random-forest/
[14.] Wang, C.; Bai, Y.; López-Sanjuan, C.; Yuan, H. et al. (2022, March). J-PLUS: Support vector machine applied to STAR-GALAXY-QSO classification. Astronomy & Astrophysics, Volume 659, id. A144, 23 pp. https://arxiv.org/pdf/2106.12787.pdf
[15.] Mahalakshmi, G. S., Swadesh, B., Aswin, R. R. V., Sendhilkumar, S., Swaminathan, A., & Surendran, S. (2022, August). Classification and Feature Prediction of Star, Galaxies, Quasars, and Galaxy Morphologies Using Machine Learning. Research Square. https://assets.researchsquare.com/files/rs-1885343/v1_covered.pdf?c=1661794633
[16.] O. Clarke, A. M. M. Scaife, R. Greenhalgh and V. Griguta: Identifying galaxies, quasars, and stars with machine learning: A new catalogue of classifications for 111 million SDSS sources without spectra. Astronomy & Astrophysics, Volume 639, id. A84, 29 pp., July 2020. https://doi.org/10.1051/0004-6361/201936770
[17.] Nicholas M. Ball, Robert J. Brunner, Adam D. Myers, David Tcheng: Robust Machine Learning Applied to Astronomical Data Sets. I. Star-Galaxy Classification of the Sloan Digital Sky Survey DR3 Using Decision Trees. The Astrophysical Journal, 650, 497, October 2006. DOI:10.1086/507440
[18.] Wierzbiński, M., Pławiak, P., Hammad, M. et al.: Development of accurate classification of heavenly bodies using novel machine learning techniques. Soft Computing 25, 7213-7228 (2021). https://doi.org/10.1007/s00500-021-05687-4
[19.] S. Chaini, A. Bagul, A. Deshpande, R. Gondkar, K. Sharma, M. Vivek, A. Kembhavi: Photometric identification of compact galaxies, stars, and quasars using multiple neural networks. Monthly Notices of the Royal Astronomical Society, Volume 518, Issue 2, January 2023, Pages 3123-3136. https://doi.org/10.1093/mnras/stac3336
