
Volume 7, Issue 7, July – 2022 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Deep Action: An Approach on the Basis of Deep Learning for the Prediction of Novel Drug-Target Interactions
Hilma K K., Karthika Krishnan, Lima P Subran
Department of Computer Science and Engineering
Mar Athanasius College of Engineering
Ernakulam, India

Prof. Neetha Joseph
Faculty in charge, Department of Computer Science and Engineering
Mar Athanasius College of Engineering
Ernakulam, India

Abstract:- Drug-target interactions (DTIs) play a vital role in drug discovery and development.
Predicting DTIs through laboratory experiments is time-consuming, costly, and laborious.
Computational approaches can recognize new interactions between drug-target pairs and speed up
drug repurposing, but challenges such as large-scale data and class imbalance remain, and the
number of unknown interactions is huge. Therefore, a deep learning-based approach (deepACTION)
is put forward to predict possible or previously unknown DTIs. Each drug chemical structure and
protein sequence is transformed, using different descriptors, into structural and sequence
features that accurately represent its properties. The majority and minority instances in the
dataset are balanced using the SMOTE technique. For accurate DTI prediction, a convolutional
neural network (CNN) is trained on the balanced and reduced features. AUC is taken as the
primary evaluation metric for comparing the deepACTION model with other methods. The deepACTION
model achieves an AUC of 0.9133 on the experimental dataset acquired from the DrugBank database.
The experimental results show that the model is able to predict a considerable number of new
DTIs and provides detailed knowledge that can encourage scientists to develop new drugs.

Keywords:- Drug-target interaction, CNN, Data balancing.

I. INTRODUCTION

DTI determination is a broad area of research that plays a crucial role in developing new drugs
for known targets and in identifying new protein targets for existing drugs. The number of
potential undiscovered interactions is large, and many identified drug candidates are rejected
because of side effects or high toxicity. Conventional laboratory experiments are expensive and
laborious, so computational approaches have been developed to detect new DTIs. Public databases
such as KEGG and DrugBank, which are built on verified interaction information, store and
provide experimentally derived knowledge for building computational tools that identify new
DTIs, and they serve as gold standard datasets. To address the limitations of previous models,
a convolutional neural network based model, deepACTION, is proposed to identify potential DTIs
from the chemical structures of drugs and the sequences of proteins. The drug-target pairs and
their corresponding chemical structures and protein sequences were downloaded from the DrugBank
and KEGG databases.

First, the chemical structure of each drug is converted into topological, constitutional, and
geometrical descriptors. Various protein descriptors are used to describe the sequence
information of each target. The extracted drug and protein features are then combined to form
the dataset, in which interacting pairs are labeled as positive and non-interacting pairs as
negative. The imbalanced drug-target dataset is then handled with a data balancing technique,
and finally a CNN classifier is trained to predict interacting and non-interacting pairs.
Compared with other classifiers and methods, the proposed model exhibits the highest performance.
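The pair-construction step described above can be illustrated with a minimal sketch. It assumes that each drug and each target has already been reduced to a list of numeric descriptors; the helper names `to_fixed_length` and `build_pairs`, the zero padding, and the toy values are illustrative assumptions, not the authors' implementation.

```python
from itertools import product


def to_fixed_length(features, length, pad_value=0.0):
    """Trim or zero-pad a feature list so it has exactly `length` entries."""
    trimmed = list(features[:length])
    return trimmed + [pad_value] * (length - len(trimmed))


def build_pairs(drug_features, target_features, known_interactions,
                drug_len, target_len):
    """Concatenate every drug vector with every target vector and label the pair.

    Label 1 marks a known (positive) interaction, label 0 a
    non-interacting (negative) pair.
    """
    samples, labels = [], []
    for drug_id, target_id in product(sorted(drug_features), sorted(target_features)):
        drug_vec = to_fixed_length(drug_features[drug_id], drug_len)
        target_vec = to_fixed_length(target_features[target_id], target_len)
        samples.append(drug_vec + target_vec)
        labels.append(1 if (drug_id, target_id) in known_interactions else 0)
    return samples, labels


# Toy, made-up descriptors (hypothetical IDs and values).
drugs = {"D1": [0.2, 0.7, 0.1], "D2": [0.9]}
targets = {"T1": [1.0, 0.0], "T2": [0.3, 0.3, 0.3]}
known = {("D1", "T2")}

X, y = build_pairs(drugs, targets, known, drug_len=2, target_len=2)
```

With one known interaction among four candidate pairs, the toy labels already show the class imbalance that the balancing step is meant to address.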

Fig. 1: Deep Action model

IJISRT22JUL964 www.ijisrt.com 1544


II. DRUG TARGET INTERACTION (DTI)

A drug-target interaction is the binding of a drug to a target site that results in a change in
the target's behaviour or function. A drug or medicine is any chemical compound that brings
about a physiological change in the human body when it is ingested, injected, or absorbed. A
target, also known as a biological target, can be any part of a living organism to which a drug
binds in order to produce the physiological change. Entities such as proteins or nucleic acids
whose activity is modified in this way are considered targets. Nuclear receptors, ion channels,
G-protein coupled receptors, and enzymes are the most common biological targets.

DTI prediction plays an important role in drug discovery. Drug discovery focuses on identifying
previously unknown drug compounds for different biological targets. A drug's chemical compound
attaches to molecules of the target through temporary bonds. The attached compound then reacts
with the biological target. This reaction may produce changes that are positive or negative and
accordingly elicits a response from the target. To treat diseases, drugs block certain catalyzed
reactions in the human body by inhibiting the function of the target. This is achieved by
preventing the target, typically an enzyme, from coming into contact with its substrate.

DTIs occur in two ways. In the first, drugs known as competitive inhibitors bind to the active
site of the target and so impede the reaction. In the second, allosteric inhibitors, which are
drugs that bind to the target's allosteric site, alter the shape and structure of the target so
that the substrate can no longer recognize it and the reaction does not take place. Shutting off
the target's reactions can correct metabolic imbalances or kill pathogens, and thereby cure
diseases.

DTI prediction has numerous applications. It can ease drug discovery, drug repositioning, and
the prediction of drug side effects. Drug discovery can be defined as the search for new drugs
that are capable of interacting with a specified target.

III. BACKGROUND

A. Datasets
The dataset required for the experiment is freely available from DrugBank, a Canadian online
database that contains information about drugs and targets. The dataset contains 12,674
interactions between 5,877 drugs and 3,348 target proteins. Different versions of the DrugBank
dataset have been used as standard datasets in earlier research.

B. Extraction of Drug-target features
In the dataset, drugs are represented in SMILES format and targets in FASTA format. The
Simplified Molecular Input Line Entry System (SMILES) describes the chemical structure of a drug
in a form that a computer can process. The chemical structure of a drug is three-dimensional and
is difficult to manipulate directly, so the 3D representation is converted to a 1D
representation, the SMILES string. FASTA is short for FAST-All and was originally the name of a
sequence alignment tool. The FASTA format describes nucleotide or amino acid sequences using
single-letter codes. Drug and target IDs, which are available from the KEGG database, are used
to retrieve the SMILES and FASTA records from the DrugBank and KEGG databases respectively. The
extracted features are then preprocessed.

Because the generated features may vary in length, all of them are trimmed to fixed-length
vectors before being fed to the chosen classifier. Features of little or no importance to the
prediction are eliminated. In the end, a total of 193 drug features and 1,290 target features
remain. These are then combined to form the drug-target pairs.

C. SMOTE For Balancing
In the proposed model, SMOTE, an over-sampling method, is used to handle the imbalanced dataset
and to create the training dataset of drug-target pairs. To improve the predictive capability of
the classifier, SMOTE balances the minority class by synthetically generating positive samples
for the experimental dataset. The SMOTE procedure is as follows:
• Select a sample at random from the minority class and find its K nearest neighbors within the
minority class.
• Compute the difference between the feature vector of the sample and one of its nearest
neighbors.
• Multiply the difference by a random number between 0 and 1.
• Add the result to the feature vector of the sample to form a new synthetic sample.
• Repeat the process until the minority and majority classes have the same number of samples.

D. Convolutional neural network
A convolutional neural network (CNN or ConvNet) is a kind of artificial neural network (ANN)
used in pattern recognition and image processing. It is specifically designed to operate on
pixel data. CNNs are frequently used in applications such as voice analysis, recommender
systems, video and image recognition and classification, and natural language processing. A
ConvNet can reduce images to a form that is easier to process without losing features that are
crucial for a good prediction. In general, a CNN consists of an input layer, an output layer,
and multiple hidden layers (HLs), which may include many convolutional layers (CLs). To
alleviate model overfitting, a dropout layer, which randomly adds noise to the HLs, is included.
Dropout is a regularization strategy: nodes that are ‘dropped out’ take part in neither the
forward pass nor backpropagation. The common ReLU activation function is applied and is followed
by further convolutional layers.

Convolutional layers are the main building blocks of a CNN. Deeper CLs are used to learn more
significant features, which is achieved by sliding kernels over the outputs of the earlier
layers. A pooling operation is usually applied after each CL. Pooling layers (PLs) reduce the
number of features and, through local nonlinear functions, provide translation invariance.

IV. EXPERIMENTAL RESULTS

The proposed approach is implemented entirely in Python (version 3.6), together with the PyTorch
and scikit-learn libraries. Spyder, a free and open-source environment written in Python, is the
IDE used. Keras and TensorFlow are used to implement the neural networks. Various tests were
performed to examine the accuracy of the different classifiers and of the balancing and feature
selection approaches. Finally, the classifier outputs a list of previously unknown potential
interactions.

A. Performance analysis
The model is trained on the training data after its parameters are set. Metrics for analyzing
performance include accuracy, sensitivity, specificity, and the Matthews correlation coefficient
(MCC). The AUC metric is used to evaluate the performance of the deepACTION method. The ROC
curve is plotted with the FPR (1 - specificity) on the X-axis and the TPR (sensitivity) on the
Y-axis by varying the decision threshold.

B. Performance comparison with other classifiers
Various classifiers were compared on the DrugBank dataset to analyse the effectiveness and
robustness of the deepACTION model. Different classifier models obtain different scores on the
same data.

The CNN classifier produced the highest AUC, 0.9133, for the dataset. The GBN classifier
achieved the second-highest result, with an AUC of 0.8930. The Random Forest and KNN classifiers
achieved AUC values of 0.885 and 0.882 respectively.

The CNN classifier therefore has a higher AUC than the GBN, Random Forest, and KNN classifiers,
by 2.03, 2.83, and 3.13 percentage points respectively.
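As an illustration of how an ROC curve and its AUC are obtained by sweeping the decision threshold, the following is a minimal pure-Python sketch. The labels and scores are made-up toy values, and the function names `roc_points` and `auc` are illustrative assumptions; only the four AUC values in `reported_auc` are taken from this section.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold.

    Assumes labels are 0/1 and that both classes are present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at the same score are processed together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example: 1 = interacting pair, 0 = non-interacting pair.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)

# AUC values reported in this section, and the margin of CNN over each other classifier.
reported_auc = {"CNN": 0.9133, "GBN": 0.8930, "Random Forest": 0.885, "KNN": 0.882}
margins = {name: reported_auc["CNN"] - value
           for name, value in reported_auc.items() if name != "CNN"}
```

In practice a library routine such as scikit-learn's `roc_curve` and `roc_auc_score` would normally be used; the hand-written version is shown only to make the threshold sweep explicit.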

Fig. 2: Performance Evaluation: Accuracy
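The metrics named in Section IV.A can all be derived from the four confusion-matrix counts. The sketch below uses the standard definitions; the function names and the toy labels and predictions are illustrative assumptions and do not reproduce the paper's results.

```python
import math


def confusion_counts(labels, predictions):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (TPR), specificity (TNR), and MCC.

    Assumes at least one positive and one negative sample.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Toy example: 3 interacting pairs and 5 non-interacting pairs.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(*confusion_counts(labels, predictions))
```

On imbalanced data, accuracy alone can look high even when few positives are found, which is why sensitivity, specificity, and MCC are reported alongside it.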

Fig. 3: ROC Curve Before and After Balancing Using SMOTE
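The SMOTE procedure listed in Section III.C can be sketched in a few lines of pure Python. This is a simplified illustration under stated assumptions (Euclidean distance, one randomly chosen neighbor per synthetic sample, at least two minority samples); the function names and toy vectors are hypothetical, and a maintained implementation such as the one in the imbalanced-learn package would normally be preferred.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic samples by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        # Step 1: pick a minority sample and find its k nearest minority neighbors.
        sample = rng.choice(minority)
        others = [m for m in minority if m is not sample]
        neighbors = sorted(others, key=lambda m: squared_distance(sample, m))[:k]
        neighbor = rng.choice(neighbors)
        # Steps 2-4: difference, scaled by a random number in [0, 1), added to the sample.
        gap = rng.random()
        synthetic.append([s + gap * (n - s) for s, n in zip(sample, neighbor)])
    return synthetic


def balance(majority, minority, k=5, seed=0):
    """Step 5: add synthetic minority samples until both classes are the same size."""
    needed = max(len(majority) - len(minority), 0)
    return minority + smote(minority, needed, k=k, seed=seed)


# Toy data: 6 negative (majority) and 3 positive (minority) feature vectors.
negatives = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3]]
positives = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
balanced_positives = balance(negatives, positives, k=2)
```

Because each synthetic vector lies on the line segment between two existing minority samples, the new points stay inside the region already occupied by the minority class instead of duplicating existing samples.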

V. FUTURE RESEARCH

Presently the project uses the SMILES format to represent drugs and the FASTA format to
represent proteins. As future work, a ligand-based protein representation that describes
proteins by the SMILES sequences of their interacting ligands could be used, so that both inputs
are represented in the same format.

Fig. 4: ROC Curve of GBN

Fig. 5: ROC Curve of KNN

Fig. 6: ROC Curve of RandomForest


Fig. 7: Performance comparison of different classifiers

In addition, heterogeneous drug-target datasets (networks) with auPR metrics for the experiments
can be considered, and a stand-alone web application with simple mechanisms and a user-friendly
interface can be developed for the model. Future research can also be extended to related
prediction tasks such as drug-drug interactions, drug side effects, and protein-disease
associations.

By integrating multiple heterogeneous data sources of drugs and targets, a more efficient
prediction model can be created.

VI. CONCLUSION

The project developed a deep learning-based model called deepACTION for predicting interactions
between drugs and targets. The interactions are determined using a CNN. The drug and target
structures are represented in numerical form using various feature extraction techniques.
Related works used different methods to manage imbalanced datasets. The deepACTION model uses
SMOTE to balance the majority (negative) and minority (positive) instances in the dataset. SMOTE
balances the class distribution by generating synthetic minority class samples, interpolating
between existing minority samples and their nearest neighbors.

Different classifiers (CNN, GBN, Random Forest, and KNN) were applied. CNN achieved the highest
AUC, 91.33%, compared with the other classifiers. GBN achieved the second-highest, 89.30%.
Random Forest and KNN achieved 88.5% and 88.2% respectively.

The experimental results show that the deepACTION method attains the highest prediction
performance and is able to predict novel drug-target pairs from the DrugBank dataset. The
improved performance of the proposed model may motivate scientists to use this method to predict
new DTIs.
