ISSN No:-2456-2165
Abstract:- Identifying drug function from drug properties is important in drug discovery. Each year, billions of dollars are spent on empirical drug testing, which is costly, time-consuming, and wasteful of chemicals. Computational experiments could significantly reduce the time and cost of drug discovery. Most existing work has focused on single-label drug function identification, and the capability of a drug's biological properties (transporter, target, carrier, and enzyme) has not yet been explored for identifying multiple drug functions. Because identifying drug function is a multi-label classification problem, the present work proposes a multi-label long short-term memory (MLLSTM) framework for the task. The data on biological properties were extracted from DrugBank, and the drug functions were collected from PubChem. The proposed framework performs well in terms of accuracy, precision, recall, F1, ROC-AUC score, and Hamming loss, and it achieved a highest accuracy of 95.80%.

Keywords:- Multi-Label, LSTM, Biological Properties, Drug Function, Machine Learning.

The biological properties of drugs are increasingly used to discover new drug-drug interactions [8, 9, 10] and to identify side effects [11, 12, 13, 14]. However, biological properties have not yet been used to analyze drug function. Such a study would analyze drug function efficiently from the biological properties of the drug. In pharmacology, biotechnology, and drug discovery, development, and design, analyzing drug functions is essential for discovering new drugs efficiently. A drug can have multiple drug functions; therefore, classifying a drug into different drug functions is a multi-label task [15, 16]. The analysis of drug function can be carried out using a Multi-Label Long Short-Term Memory (MLLSTM) network. Unlike single-label classification, the multi-label approach identifies one or more drug functions at the same time. This work demonstrates how the multi-label long short-term memory approach is applied to the biological properties of drugs to analyze various drug functions derived from the Medical Subject Headings (MeSH) [17]. A common problem in multi-label classification is class imbalance, which makes the learning task harder and may affect the results. The Multi-Label Synthetic Minority Over-Sampling Technique (MLSMOTE) has therefore been used to address the class imbalance issue [18].
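Class imbalance in a multi-label dataset is usually quantified per label before any oversampling is applied. The sketch below computes the per-label imbalance ratio (IRLbl) and its mean (MeanIR), the measures commonly used in descriptions of MLSMOTE to decide which labels count as minority labels. The function names and the toy label matrix are assumptions made for illustration; this is not the authors' code, and it does not generate synthetic samples.

```python
def imbalance_ratios(label_matrix):
    """Return (IRLbl per label, MeanIR) for a multi-hot label matrix.

    Assumes every label occurs at least once, so no count is zero.
    """
    n_labels = len(label_matrix[0])
    counts = [sum(row[j] for row in label_matrix) for j in range(n_labels)]
    largest = max(counts)
    # IRLbl: frequency of the most common label divided by this label's frequency.
    irlbl = [largest / c for c in counts]
    mean_ir = sum(irlbl) / n_labels
    return irlbl, mean_ir


def minority_labels(label_matrix):
    """Indices of labels whose IRLbl exceeds MeanIR (candidates for oversampling)."""
    irlbl, mean_ir = imbalance_ratios(label_matrix)
    return [j for j, ratio in enumerate(irlbl) if ratio > mean_ir]


# Toy example (hypothetical): 4 drugs, 3 drug functions.
toy_labels = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
]
toy_irlbl, toy_mean_ir = imbalance_ratios(toy_labels)
toy_minority = minority_labels(toy_labels)
```

In the toy matrix the first function appears in every drug while the other two appear once each, so the last two labels are flagged as minority labels.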
Hence, multi-label identification is essential for identifying multiple drug functions from the biological properties of a drug. The aim of multi-label drug function identification is to assign multiple labels (drug functions) to a drug Drug_k, where the input is a collection of drug features (X) and the output is a set of possible labels Drug_Function.
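The formulation above can be made concrete with a multi-hot encoding, in which each drug has a binary feature vector over its biological properties and a binary label vector over the drug functions. The following is a minimal sketch; the protein identifiers and drug function names are hypothetical placeholders, not entries taken from DrugBank or PubChem.

```python
def multi_hot(items, vocabulary):
    """Encode a collection of items as a 0/1 vector over a fixed vocabulary."""
    index = {name: i for i, name in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for item in items:
        vector[index[item]] = 1
    return vector


# Hypothetical vocabularies (placeholders for illustration only).
protein_vocab = ["target_A", "enzyme_B", "carrier_C", "transporter_D"]
function_vocab = ["function_1", "function_2", "function_3"]

# One hypothetical drug with two biological properties and two functions.
drug_features = multi_hot(["target_A", "transporter_D"], protein_vocab)
drug_labels = multi_hot(["function_1", "function_3"], function_vocab)
```

Here `drug_features` plays the role of X and `drug_labels` the role of the Drug_Function set; a single-label formulation would allow only one 1 in the label vector.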
Biological Properties
Biological properties are also crucial for in silico experiments in drug discovery and development. In this work, transporter, target, carrier, and enzyme are used to classify drug function. The popular drug information database DrugBank [23] is used to retrieve the biological information. After mapping the biological properties of the drugs to the drug functions, the dataset contains 1108 drugs corresponding to 12 drug functions.

The dataset with biological properties and drug functions is illustrated in Fig. 2, where n in Function_n is the total number of drug functions (n = 12), and transporter, target, carrier, and enzyme are the properties of the drugs.

Fig. 3. Frequency of class distribution on protein dataset.

C. Proposed Methodology
For identifying drug functions, the input consists of biological features, and the output is the set of drug functions of a particular drug. One drug may have more than one drug function, so this is a multi-label task. The framework of the proposed methodology is represented diagrammatically in Fig. 4. To solve the multi-label drug function identification task, an LSTM approach that supports multi-label output is proposed. Finally, the performance of the MLLSTM classification algorithm is evaluated using ROC-AUC, precision, Hamming loss, accuracy, recall, and F1 score.
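To illustrate what makes an LSTM classifier multi-label, the sketch below runs the forward pass of a single LSTM layer followed by one independent sigmoid output per drug function, so that any number of functions can be predicted at once. It is an untrained, randomly initialised toy in plain Python and not the authors' architecture: the class name, the way features are arranged into a sequence, the 4-dimensional time steps, and the 0.5 threshold are all assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyMultiLabelLSTM:
    """Hypothetical one-layer LSTM with a sigmoid output per label."""

    def __init__(self, n_features, n_hidden, n_labels, seed=0):
        rng = random.Random(seed)
        # One weight matrix per gate: input (i), forget (f), output (o),
        # candidate (c); each acts on the concatenation [x_t, h_prev].
        self.w = {k: rand_matrix(n_hidden, n_features + n_hidden, rng) for k in "ifoc"}
        self.b = {k: [0.0] * n_hidden for k in "ifoc"}
        self.w_out = rand_matrix(n_labels, n_hidden, rng)
        self.b_out = [0.0] * n_labels
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        z = list(x) + list(h)
        pre = {k: [a + b for a, b in zip(matvec(self.w[k], z), self.b[k])] for k in "ifoc"}
        i_gate = [sigmoid(v) for v in pre["i"]]
        f_gate = [sigmoid(v) for v in pre["f"]]
        o_gate = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c_new = [f * c_old + i * g for f, c_old, i, g in zip(f_gate, c, i_gate, cand)]
        h_new = [o * math.tanh(cv) for o, cv in zip(o_gate, c_new)]
        return h_new, c_new

    def predict_proba(self, sequence):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in sequence:
            h, c = self.step(x, h, c)
        logits = [a + b for a, b in zip(matvec(self.w_out, h), self.b_out)]
        # Independent sigmoids (not a softmax): labels do not compete.
        return [sigmoid(v) for v in logits]

    def predict(self, sequence, threshold=0.5):
        return [1 if p >= threshold else 0 for p in self.predict_proba(sequence)]


# Hypothetical input: two time steps of 4 binary features, 12 drug functions.
model = TinyMultiLabelLSTM(n_features=4, n_hidden=16, n_labels=12)
example_sequence = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]
example_probs = model.predict_proba(example_sequence)
example_pred = model.predict(example_sequence)
```

The key design choice is the output layer: thresholding independent sigmoids allows zero, one, or several functions per drug, whereas a softmax with argmax would force exactly one.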
Input, Hidden Layer Sizes   Accuracy   Precision   Recall   F1 Score   ROC-AUC   Hamming-Loss
16, 16                      94.40%     90.71%      87.34%   89%        97.59%    5.59%
16, 32                      94.30%     91.88%      85.52%   85.55%     97.75%    5.66%
16, 64                      95.80%     92.05%      91.11%   91.60%     98.15%    4.23%
32, 32                      92.20%     91.70%      76.14%   83.20%     97.70%    7.82%
32, 64                      90.40%     91.04%      69.28%   78.69%     97.46%    9.64%
Table II: Results of the MLLSTM approach on biological properties to identify drug functions.
B. Results
This section discusses the outcomes of the drug function identification experiment. The biological properties (transporter, target, carrier, and enzyme) were used to determine drug function with a multi-label LSTM framework. The performance of the MLLSTM framework on biological properties is presented in Table II. The configuration with input and hidden layer sizes of 16 and 64 performed best on every reported measure, with an accuracy of 95.80% and a Hamming loss of 4.23%.
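Several of the measures in Table II can be computed directly from multi-hot truth and prediction matrices. The sketch below shows Hamming loss together with micro-averaged precision, recall, and F1. The paper does not state which averaging scheme it uses, so micro-averaging is an assumption here, as are the function names and the toy data; ROC-AUC is omitted because it needs predicted scores rather than 0/1 decisions.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (drug, label) slots where prediction and truth differ."""
    slots = sum(len(row) for row in y_true)
    wrong = sum(
        1
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
        if t != p
    )
    return wrong / slots


def micro_precision_recall_f1(y_true, y_pred):
    """Pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example (hypothetical): 2 drugs, 3 drug functions.
truth = [[1, 0, 1], [0, 1, 0]]
predicted = [[1, 0, 0], [0, 1, 1]]

toy_hamming = hamming_loss(truth, predicted)
toy_precision, toy_recall, toy_f1 = micro_precision_recall_f1(truth, predicted)
```

If accuracy is computed per label slot rather than per drug, it equals one minus the Hamming loss by definition.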
Fig. 5. ROC curve of MLLSTM classifier on biological properties for input and hidden layer unit 16 and 16 respectively.
Fig. 7. ROC curve of MLLSTM classifier on biological properties for input and hidden layer unit 16 and 64 respectively.