
Volume 6, Issue 9, September – 2021 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Explainable Artificial Intelligence: How Face Masks
are Detected via Deep Neural Networks

Ahmet Haydar Ornek, Mustafa Celik
Integration Solutions Development Department
Huawei Turkey R&D Center
Istanbul, Turkey

Murat Ceylan
Electrical and Electronics Engineering Department
Konya Technical University
Konya, Turkey

Abstract:- Image classification has become a well-known process with the development of deep neural networks. Although classification studies above 90% accuracy are realized, their explainability is still an open area, meaning the classification process is not known by the researchers. In this study, we show what a deep neural network model learns from face images to classify them into with-mask and without-mask classes, using the last convolutional layers of the model. ResNet-18 was selected as the deep neural network model; it was trained with 18600 balanced face images belonging to the two classes and tested with 4540 face images different from the training images. The model's test results are 95.16% sensitivity, 96.69% specificity, and 96.58% accuracy. With the created activation maps it is clearly seen that the model learns the face structure for images without a mask and the mask structure for images with a mask.

Fig 1. Problem definition. While Deep Neural Networks are able to correctly classify the face images, the decision process is not known. By highlighting the convolutional layers, image parts learned by deep neural networks are shown in this study.

Keywords:- Classification; Covid-19; Explainable Artificial Intelligence; Transfer Learning.
I. INTRODUCTION

Computer vision applications such as image classification, object detection, image segmentation and clustering are being solved with high performance by deep neural network models [1-9]. Since the models have layers with more than 1M parameters, their inner workings are poorly understood and the models are called black-box models. By applying the class activation map [10] technique to our image classification problem, which tries to detect whether people wear a face mask or not (Fig. 1 was created to show the problem definition), we show how the images are classified into mask and no mask classes by highlighting the related regions over the images.

During a pandemic that affects everyone, it is important to monitor whether people wear a face mask [11]. To do so, an intelligent system that has a camera and a processing unit is required.

In traditional approaches such as feature extraction, feature selection and classification, the whole process is realized manually [12]. When the extracted features are few and a method like the decision tree is used, the classification can be explained. In deep learning models it is almost impossible to show all features' contributions by using the decision tree classifier.

In explanations of the models, there are three main approaches: numerical, visual, and rule-based [13]. Numerical methods calculate the contributions of all inputs, going from all features down to zero or vice versa, by using a method like Information Gain [14]. By manually trying which input changes the classification result, important features are detected in numerical methods.

Rule-based methods such as Decision Tree or Random Forest use information gain to decide how important the inputs are for the classification. However, in big models such as deep neural networks, they cannot be implemented to create an explainable structure because of the parameter size.

Visual approaches such as class activation maps are used for deep neural models. By using a created activation map (also known as a heat-map), the importance of each pixel of an image can be represented over the image.

In this sense we make the following contributions to the literature by applying class activation maps to our real-world classification problem to explain how images are classified into mask and no mask classes:
 Real-world images are collected under difficult conditions such as low resolution, low quality, and changing light and background.
 The images are classified into mask and no mask classes by using the transfer learning method (with the ResNet-18 architecture).



 The important regions over the images are highlighted using class activation maps to show how images are classified into classes.

The rest of the paper is organized as follows: related work about face detection, mask detection and explanation studies is detailed in Section 2. The materials used in the study, such as images and the working environment, are described in Section 3. In Section 4, Convolutional Neural Networks, Transfer Learning, Class Activation Maps, and the evaluation metrics are described. Experiments and results, and the Discussion, are given in Section 5 and Section 6, respectively. In Section 7, conclusion and future works are explained.

II. RELATED WORK

To identify COVID-19, a number of studies with deep learning have been carried out [15-20], but these studies are about clinical findings. Taking advantage of deep learning, a system that monitors whether people wear a mask can be developed.

To detect face mask images, [21] proposed a hybrid deep learning and machine learning model. Deep learning is used to extract features, and support vector machines (SVM) classify the extracted features. The SVM classifier achieved 99.64%, 99.49%, and 100% testing accuracy on different datasets.

In [22] a face mask-wearing condition identification method is developed by combining super-resolution and classification methods for images. Their algorithm consists of four steps: pre-processing, face detection and cropping, super-resolution, and face mask-wearing identification. They achieved 98.70% accuracy using the proposed deep learning method.

[23] analyzed unmasked and masked face recognition accuracy using principal component analysis (PCA). According to its results, a face without a mask achieves higher performance in PCA-based face recognition. They tried four different scenarios by changing their test size. The average accuracies are 95.68% and 70.53% for no-mask and mask, respectively.

Face detection [24-25] is a challenging problem. With the advances in deep learning, convolution-based solutions provide high efficiency in the face detection problem [26-29]. [30] used YOLOv3 [31] to detect face images by changing its layers, using the softmax function, and reducing the features' dimension.

In [32], an edge computing-based mask detection model is proposed to provide real-time performance on common camera devices. Their system contains three steps: restoration, face detection, and mask detection. They achieved 95.9% accuracy.

Convolutional neural networks have the ability to learn representations of images. Although they perform well on classification problems, their transparency is still open to development. Gradient-based methods have been developed to highlight the important parts of images [10]; example studies are person re-identification [33-34], object localization [35-37], texture analysis [38], aerial imaging [39-40] and image segmentation [41-43].

In [44] a new method is proposed which computes and highlights the main components of the important representations from the layers. They claim up to 12% improvement on weakly supervised object localization.

To the best of our knowledge, mask detection and class activation maps were used together for the first time in [45]. They developed a system that monitors social distance, face masks, and face-touching conditions by combining a deep learning based imaging system and class activation maps.

III. MATERIAL

AI projects require hundreds of images to learn effectively, and computation sources to realize mathematical operations. The collected images and working environment will be clarified in this section.

A. Creating Dataset
The images used in this study have been collected from an imaging system created at the Huawei entrance and from open source datasets [46-47]. Fig. 2 shows the created imaging system used to take real-world face images. The system is detailed in the Working Environment section.

Fig 2. Created Imaging Setup at the Huawei Entrance. Face images are collected by using this system.

Not all images in the open source datasets were used for training, because of mislabeled images. The collected images with mask and without mask can be seen in Fig. 3 and Fig. 4, respectively.

Fig 3. Collected face images with mask. (a and b are from the M2150 camera, c is from a mobile phone, d-j are from open-source datasets.)


Fig 4. Collected face images without mask. (a and b are from the M2150 camera, c is from a mobile phone, d-j are from open-source datasets.)

With the system in Fig. 2 and the open source datasets, 18400 training, 200 validation and 4540 test images (the test images were collected after the training) have been collected, as shown in Table 1.

Table 1. Number of Train - Validation - Test Images
Type          Number of Images
Train         18400
Validation    200
Test          4540
Total         23140

B. Working Environment
To take real-world face images, a camera setup was built at the Huawei entrance (Fig. 2). The camera used is a Huawei M2150-10-EI [48], which has a 5MP image sensor, 1 TOPS of computing power, and 2560(H) x 1920(V) effective pixels, as shown in Table 2. It can capture face images and send them via File Transfer Protocol.

Table 2. Technical Specifications of the Camera
CPU                     Hi3516D
Computing Power         1 TOPS
Intelligent Analysis    Face and Person Detection
Effective Pixels        2560 (H) x 1920 (V)
Video Encoding Format   H.265/H.264/MJPEG
Frame Rate              30 FPS

To train the AI model, a GPU-based Linux server (CentOS 7, CUDA 11.2) with a Tesla T4 was used. As can be seen in Table 3, Python was selected as the programming language, PyTorch as the deep learning library, and ResNet-18 as the pre-trained model.

Table 3. Used Hardware and Software
Programming Language     Python
Deep Learning Library    PyTorch
Transfer Learning Model  ResNet-18
GPU                      Tesla T4
CUDA                     11.2
Operating System         Linux CentOS 7

IV. METHODS

This part describes how to classify images and create class activation maps, which refer to highlighted areas of the images.

A. Convolutional Neural Networks (CNNs)
The Convolutional Neural Network is one of the well-known deep learning models, specialized for image-related problems such as image classification, object detection, and image segmentation [49].

CNNs consist of convolutional and neural structures responsible for automatically extracting and classifying important features of given images [50]. For instance, edge, corner and pattern features are important features for images. A CNN architecture can be seen in Fig. 5.

Fig 5. Simple CNN Architecture including convolution, pooling and fully-connected layers.

As seen in Fig. 5, the convolutional structure includes convolution and pooling layers, which are feature extraction and dimension reduction methods, respectively. After the features are extracted and reduced, they are classified by the neural layer, which is also known as the fully-connected layer.

Compared to traditional learning methods, end-to-end learning can be realized by using the CNN architecture. The comparison of CNNs and traditional learning is demonstrated in Table 4.

Table 4. Comparison of CNNs and Traditional Learning
Operations           CNNs           Traditional Learning
Feature Extraction   Convolution    Local Binary Pattern
Dimension Reduction  Pooling        Linear Discriminant Analysis
Classification       Neural Layer   Artificial Neural Network

As seen in Table 4, in traditional methods all features should be extracted by using algorithms such as Local Binary Pattern [51] and then reduced by using embedded or filter-based algorithms such as Linear Discriminant Analysis [52].

After reduction of the features, a classifier such as an Artificial Neural Network or Support Vector Machine should be used to classify the features into the desired classes [53]. When it comes to CNNs, all operations are realized by the convolution, pooling and neural layers automatically.
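To make the Fig. 5 pipeline concrete, the following is a minimal PyTorch sketch of such an architecture (a toy network for illustration only, not the model used in this study; the layer sizes are assumptions):

import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Convolution + pooling: feature extraction and dimension
        # reduction (the CNN column of Table 4)
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Neural (fully-connected) layer: classification
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x):            # x: (N, 3, 224, 224)
        x = self.features(x)         # -> (N, 32, 56, 56)
        return self.classifier(x.flatten(1))

All three rows of Table 4 are realized by layers of a single network, which is what makes the learning end-to-end.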



B. Transfer Learning
Training a deep learning model from scratch incurs computation cost and requires hundreds of labelled images. Transfer learning is a method that uses pre-trained models and modifies them according to the application, in order to avoid starting the learning process from scratch and to train with comparatively little data [54].

Transfer learning uses pre-trained models that are trained with millions of labelled images from the ImageNet dataset [55]. There are various pre-trained models such as VGG16 [56], ResNet [57], and Inception [58]. Some differences between the models are depth, filter sizes, connections of the layers, and activation functions.

In this study a ResNet architecture called ResNet-18, which is 18 layers deep, is selected as the CNN model. As mentioned before, ResNet is a pre-trained model trained with the ImageNet dataset, which has 1000 classes such as mouse, desk, lemon, and pizza. To change its classes and re-train the model, the operations in Algorithm 1 are used. (Fig. 6 presents the transfer learning process.)

Algorithm 1:
 Freezing the convolutional layers
 Removing the last layer with 1000 classes
 Adding a new neural layer with the desired classes
 Training

ResNet-18 accepts images of size 224x224. Since pre-trained models have specific input sizes, all input images (Fig. 6 Part 1) need to be resized.

When the first layers of trained models are examined, it is seen that low-level features such as corners, edges and curves are learned by the models. Instead of training the first layers (Fig. 6 Parts 2, 3, 4, 5) again, they are frozen and the other layers (Fig. 6 Parts 6, 7, 8, 9) are trained.

Fig 6. Transfer Learning Architecture. (2-5 are frozen and 6-9 are trained.)

Since the pre-trained models are trained with 1000 classes, their last layer should be changed according to the number of classes desired. To change the last layer (Fig. 6 Part 9), the last layer is removed and a new neural layer is added. After the model is modified, the training is started.
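A minimal PyTorch sketch of Algorithm 1, assuming the torchvision ResNet-18 ImageNet weights (the variable names are illustrative, not taken from the paper's code):

import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)  # pre-trained on ImageNet

# Freeze the convolutional layers (Fig. 6 Parts 2-5)
for param in model.parameters():
    param.requires_grad = False

# Remove the 1000-class last layer and add a new 2-class neural
# layer (mask / no mask); its parameters are trainable by default
model.fc = nn.Linear(model.fc.in_features, 2)

Only the new layer's weights are then updated during training, which realizes the freeze-and-retrain scheme of Fig. 6.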

C. Class Activation Maps
The Class Activation Map is a method which allows us to understand classification processes by creating a heat-map over input images after training is completed. Two class activation maps obtained from this study can be seen in Fig. 7.

In this study, gradient-weighted class activation mapping [59] is used to create the heat-maps, because it uses the gradients of the classes, going from the last neural layer back to the final convolutional layer. Therefore, without making any change in the trained model, important regions in the input image are highlighted.

The overall operations to obtain the heat-maps can be seen in Algorithm 2.

Fig 7. Class Activation Maps samples.

Algorithm 2:
 Resizing the image to (224x224)
 Classifying the image
 Getting the weights between the global average pooling and neural layer
 Reshaping the last convolutional layer from (7x7, 512) to (49, 512)
 Dot product of the weights and the convolutional layer
 Reshaping from (1x49) to (7x7)
 Resizing from (7x7) to (224x224)
 Overlaying the input image and the heat-map

To detail how CAMs work, Fig. 8 has been created. This figure shows the last three layers of the modified ResNet-18 architecture. There are 512 convolution filters of size 7x7 in the last convolutional layer (Fig. 8 Part 1). By applying the global average pooling operation, this size is converted from (7x7, 512) to (1x1, 512) (Fig. 8 Part 2). These (1x1, 512) filters are classified by the neural layer (Fig. 8 Part 3) with 2 neurons that are responsible for the mask and no mask classes.

Fig 8. Class Activation Maps, Last Layers of the ResNet-18.



means "mask", the weights between the global average pooling images (18400 train, 200 validation, 4540 test) were collected
and "mask" neuron are taken (w1,1; w2,1; ... w512,1), as shown at Table 1.
otherwise the weights between the global average pooling and
"no mask" neuron are taken (w1,2; w2,2; ... w512,2). Pre-trained models such as VGG16, and ResNet18 have
ability to classify images with 1000 classes. Since there exists
After the (1x512) weights were obtained, (7x7, 512) two classes as mask and no mask in this study, ResNet-18
filters in the last convolutional layer (Fig. 8 Part 1) is converted model was modified to classify the images with two classes. To
to (49x512). Dot product of (1x512) size neural weights and do that, its 1000-classes last neural layer was removed and a
(49x512) size filter weights is realized to obtain (1x49) size new 2-classes neural layer was added to the ResNet-18 model.
importance weights.
The ResNet-18 model was re-trained with the 18400 train
The obtained importance weights (1x49) are reshaped to and 200 validation images for 30 epochs. The used parameters
(7x7) and then resized to (224x224). The (224x224) size map is can be seen at the Table 5.
called class activation map and used to overlay with the input
image. Table 5. Parameters To Be Used For Training and
Number of Images
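The reshape and dot-product steps above can be sketched as follows (a simplified CAM computation, assuming the (7x7, 512) activations of the last convolutional layer have already been captured, e.g. with a forward hook on the model; the function and its names are illustrative, not the paper's implementation):

import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weights, class_idx):
    # features: (512, 7, 7) last convolutional layer activations
    # fc_weights: (2, 512) weights between the global average
    # pooling and the neural layer; class_idx selects mask/no mask
    w = fc_weights[class_idx]               # the 1x512 class weights
    flat = features.reshape(512, 49)        # flatten each 7x7 filter
    cam = (w @ flat).reshape(1, 1, 7, 7)    # dot product -> (7, 7)
    cam = F.interpolate(cam, size=(224, 224),
                        mode="bilinear", align_corners=False)
    cam = cam.squeeze()                     # (224, 224) heat-map
    # normalize to [0, 1] before overlaying on the input image
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

Here fc_weights would be model.fc.weight.detach() for the modified ResNet-18, and features the last convolutional block's output for one image.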
D. Metrics to Evaluate the Classification Results
A classification process is generally evaluated by the accuracy (Eq. 1), sensitivity (Eq. 2) and specificity (Eq. 3) metrics.

Accuracy = (CCIWM + CCIWOM) / (IWM + IWOM)   (1)
Sensitivity = CCIWOM / IWOM                  (2)
Specificity = CCIWM / IWM                    (3)

where IWM stands for images with mask, IWOM stands for images without mask, CCIWM stands for correctly classified images with mask, and CCIWOM stands for correctly classified images without mask.

Accuracy gives information about how well all images are correctly classified, but when a balanced test set is not available, the sensitivity and specificity metrics are used. Sensitivity measures how well the positive (no mask) class is correctly classified, and specificity measures how well the negative (mask) class is correctly classified.
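As a worked check with the test counts reported in Table 6 (Section V), these formulas reproduce the stated results:

Sensitivity = 295 / 310 ≈ 95.16%
Specificity = 4090 / 4230 ≈ 96.69%
Accuracy = (4090 + 295) / (4230 + 310) = 4385 / 4540 ≈ 96.58%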
V. EXPERIMENTS AND RESULTS

This section describes all the experiments and results, taking into account the hyper-parameters, training, validation, testing, and class activation maps. The overall process can be seen in Fig. 9.

Fig 9. The Overall Process.

As seen in Fig. 9, the first process is the collection of images. To collect the images, two different sources are used: a system built at the Huawei entrance and open-source datasets. By using the system at the entrance, real-world images were collected in order to develop a system that works under hard conditions such as low resolution, darkness, brightness, and varying image size. Combined with the open-source datasets, a total of 23140 images (18400 train, 200 validation, 4540 test) were collected, as shown in Table 1.

Pre-trained models such as VGG16 and ResNet-18 are able to classify images into 1000 classes. Since there are two classes, mask and no mask, in this study, the ResNet-18 model was modified to classify the images into two classes. To do that, its 1000-class last neural layer was removed and a new 2-class neural layer was added to the ResNet-18 model.

The ResNet-18 model was re-trained with the 18400 training and 200 validation images for 30 epochs. The parameters used can be seen in Table 5.

Table 5. Parameters To Be Used For Training
Model            ResNet-18
Loss Function    Cross Entropy
Optimiser        Stochastic Gradient Descent
Learning Rate    0.001
Momentum         0.9
Epoch            30

After the training, 4540 new images (4230 with mask, 310 without mask) were taken from the system at the Huawei entrance to test the model with real-world images. According to the classification results:

 4090 of 4230 images with mask (96.69%)
 295 of 310 images without mask (95.16%)

were correctly classified, as shown in Table 6.

Table 6. All Results
Images With Mask                           4230
Images Without Mask                        310
Correctly Classified Images With Mask      4090
Correctly Classified Images Without Mask   295
Falsely Classified Images With Mask        140
Falsely Classified Images Without Mask     15
Sensitivity                                95.16%
Specificity                                96.69%
Accuracy                                   96.58%

Detected face images with and without mask can be seen in Fig. 10 and Fig. 11, respectively.

Fig 10. Classified face images with mask. (a and b are from the M2150 camera, c is from a mobile phone, d-j are from open-source datasets.)
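A minimal training sketch matching the Table 5 parameters (assuming model is the modified ResNet-18 of Section IV-B and train_loader is a standard PyTorch DataLoader over the resized face images; both names are illustrative):

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

criterion = nn.CrossEntropyLoss()            # loss function (Table 5)
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.001, momentum=0.9)                  # optimiser (Table 5)

for epoch in range(30):                      # 30 epochs (Table 5)
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()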



Fig 11. Classified face images without mask. (a and b are from the M2150 camera, c is from a mobile phone, d-j are from open-source datasets.)

The gradient-weighted class activation maps need class-related weights. Since ResNet-18 has 512 filters of size 1x1 in the global average pooling layer (Fig. 8 Part 2) and two neurons in the neural layer (Fig. 8 Part 3), there are 2 sets of 1x512 weights, related to the mask and no mask classes. After the classification result is obtained, the weights of the desired class are taken. For example, if the mask class's activations are sought, the mask class's weights are taken (size 1x512). Class activation maps of the mask and no mask classes are shown in Fig. 12 and Fig. 13.
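In PyTorch terms, this weight selection is a single indexing step (a sketch; model is the modified ResNet-18, and which index corresponds to the mask class depends on the label encoding used during training):

logits = model(image)                     # image: (1, 3, 224, 224)
class_idx = logits.argmax(dim=1).item()   # predicted class
w = model.fc.weight[class_idx].detach()   # the 1x512 class-related weights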

As can be seen from Fig. 12, images are correctly classified as mask, and the region where the mask exists is highlighted and followed by the class activation maps. When the mask is removed, images are still correctly classified and the class activation maps highlight and follow the face over the image (Fig. 13).

VI. DISCUSSION

When a classification is realized using deep neural networks such as CNNs, the classification results are obtained with high performance, but why the model classified an image into a class is not explained, because of the complexity of deep neural networks. That is why such models are called black-box models.

To explain what a CNN learns from images with and without face masks, the ResNet-18 pre-trained model was selected as the CNN model and re-trained with our images, and the class activation maps technique was applied to the images to highlight the regions learned by the model.

According to the results with the test images, the sensitivity and specificity values were obtained as 95.16% and 96.69%, respectively. The results show that the CNN model can successfully classify images into mask and no mask classes, and when we ask why an image was classified into the mask class, it answers by highlighting the region where the mask exists. The importance of this is that we can be sure there was no overfitting and the CNN model was correctly trained.

VII. CONCLUSION AND FUTURE WORKS

Explainable Artificial Intelligence is a relatively new topic in deep neural network models. When a model makes a decision in a classification, it is not known why the model decided that way.

In this study, we show why face images are classified into mask and no mask classes by highlighting class-related activation maps using the gradient-weighted class activation maps technique.

In future works, we are planning to grow the study by adding weakly supervised object detection techniques, so that we would have the ability to realize object detection without using a labeling application that causes time costs.

ACKNOWLEDGMENT

This study was supported by the "Epidemic Prevention System" of Huawei Turkey R&D Center.
REFERENCES

[1]. A. Mikołajczyk and M. Grochowski, “Data augmentation for improving deep learning in image classification problem,” in 2018 International Interdisciplinary PhD Workshop (IIPhDW). IEEE, 2018, pp. 117–122.
[2]. C. Affonso, A. L. D. Rossi, F. H. A. Vieira, A. C. P. de Leon Ferreira et al., “Deep learning for biological image classification,” Expert Systems with Applications, vol. 85, pp. 114–122, 2017.
[3]. W. Zhao and S. Du, “Spectral–spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4544–4554, 2016.
[4]. S. Ghosh, N. Das, I. Das, and U. Maulik, “Understanding deep learning techniques for image segmentation,” ACM Computing Surveys (CSUR), vol. 52, no. 4, pp. 1–35, 2019.
[5]. G. Wang, W. Li, M. A. Zuluaga, R. Pratt, P. A. Patel, M. Aertsen, T. Doel, A. L. David, J. Deprest, S. Ourselin et al., “Interactive medical image segmentation using deep learning with image-specific fine tuning,” IEEE Transactions on Medical Imaging, vol. 37, no. 7, pp. 1562–1573, 2018.
[6]. S. Minaee, Y. Y. Boykov, F. Porikli, A. J. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image segmentation using deep learning: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[7]. Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, “Object detection with deep learning: A review,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212–3232, 2019.
[8]. A. R. Pathak, M. Pandey, and S. Rautaray, “Application of deep learning for object detection,” Procedia Computer Science, vol. 132, pp. 1706–1717, 2018.
[9]. J. Han, D. Zhang, G. Cheng, N. Liu, and D. Xu, “Advanced deep-learning techniques for salient and category-specific object detection: a survey,” IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 84–100, 2018.
[10]. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.
[11]. N. C. Brienen, A. Timen, J. Wallinga, J. E. Van Steenbergen, and P. F. Teunis, “The effect of mask use on the spread of influenza during a pandemic,” Risk Analysis: An International Journal, vol. 30, no. 8, pp. 1210–1218, 2010.
[12]. M. F. Uddin, J. Lee, S. Rizvi, and S. Hamada, “Proposing enhanced feature engineering and a selection model for machine learning processes,” Applied Sciences, vol. 8, no. 4, p. 646, 2018.
[13]. L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, and L. Kagal, “Explaining explanations: An overview of interpretability of machine learning,” in 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2018, pp. 80–89.
[14]. L. E. Raileanu and K. Stoffel, “Theoretical comparison between the Gini index and information gain criteria,” Annals of Mathematics and Artificial Intelligence, vol. 41, no. 1, pp. 77–93, 2004.
[15]. L. Huang, R. Han, T. Ai, P. Yu, H. Kang, Q. Tao, and L. Xia, “Serial quantitative chest CT assessment of COVID-19: a deep learning approach,” Radiology: Cardiothoracic Imaging, vol. 2, no. 2, p. e200075, 2020.
[16]. Y. Oh, S. Park, and J. C. Ye, “Deep learning COVID-19 features on CXR using limited training datasets,” IEEE Transactions on Medical Imaging, vol. 39, no. 8, pp. 2688–2700, 2020.
[17]. E. E.-D. Hemdan, M. A. Shouman, and M. E. Karar, “Covidx-net: A framework of deep learning classifiers to diagnose COVID-19 in X-ray images,” arXiv preprint arXiv:2003.11055, 2020.
[18]. A. M. Ismael and A. Şengür, “Deep learning approaches for COVID-19 detection based on chest X-ray images,” Expert Systems with Applications, vol. 164, p. 114054, 2021.
[19]. A. A. Ardakani, A. R. Kanafi, U. R. Acharya, N. Khadem, and A. Mohammadi, “Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks,” Computers in Biology and Medicine, vol. 121, p. 103795, 2020.
[20]. Q. Ni, Z. Y. Sun, L. Qi, W. Chen, Y. Yang, L. Wang, X. Zhang, L. Yang, Y. Fang, Z. Xing et al., “A deep learning approach to characterize 2019 coronavirus disease (COVID-19) pneumonia in chest CT images,” European Radiology, vol. 30, no. 12, pp. 6517–6527, 2020.
[21]. M. Loey, G. Manogaran, M. H. N. Taha, and N. E. M. Khalifa, “A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic,” Measurement, vol. 167, p. 108288, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0263224120308289
[22]. B. Qin and D. Li, “Identifying facemask-wearing condition using image super-resolution with classification network to prevent COVID-19,” Sensors, vol. 20, no. 18, p. 5236, 2020.
[23]. M. S. Ejaz, M. R. Islam, M. Sifatullah, and A. Sarker, “Implementation of principal component analysis on masked and non-masked face recognition,” in 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT). IEEE, 2019, pp. 1–5.
[24]. E. Hjelmås and B. K. Low, “Face detection: A survey,” Computer Vision and Image Understanding, vol. 83, no. 3, pp. 236–274, 2001.
[25]. A. Kumar, A. Kaur, and M. Kumar, “Face detection techniques: a review,” Artificial Intelligence Review, vol. 52, no. 2, pp. 927–948, 2019.
[26]. C. Li, R. Wang, J. Li, and L. Fei, “Face detection based on YOLOv3,” in Recent Trends in Intelligent Computing, Communication and Devices. Springer, 2020, pp. 277–284.
[27]. X. Sun, P. Wu, and S. C. Hoi, “Face detection using deep learning: An improved faster RCNN approach,” Neurocomputing, vol. 299, pp. 42–50, 2018.
[28]. W. Wu, Y. Yin, X. Wang, and D. Xu, “Face detection with different scales based on faster R-CNN,” IEEE Transactions on Cybernetics, vol. 49, no. 11, pp. 4017–4028, 2018.
[29]. R. Qi, R.-S. Jia, Q.-C. Mao, H.-M. Sun, and L.-Q. Zuo, “Face detection method based on cascaded convolutional networks,” IEEE Access, vol. 7, pp. 110740–110748, 2019.
[30]. C. Li, R. Wang, J. Li, and L. Fei, “Face detection based on YOLOv3,” in Recent Trends in Intelligent Computing, Communication and Devices. Springer, 2020, pp. 277–284.
[31]. J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[32]. X. Kong, K. Wang, S. Wang, X. Wang, X. Jiang, Y. Guo, G. Shen, X. Chen, and Q. Ni, “Real-time mask identification for COVID-19: an edge computing-based deep learning framework,” IEEE Internet of Things Journal, 2021.
[33]. W. Yang, H. Huang, Z. Zhang, X. Chen, K. Huang, and S. Zhang, “Towards rich feature discovery with class activation maps augmentation for person re-identification,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[34]. Z. Dai, M. Chen, X. Gu, S. Zhu, and P. Tan, “Batch dropblock network for person re-identification and beyond,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 3691–3701.
[35]. S. Yang, Y. Kim, Y. Kim, and C. Kim, “Combinational class activation maps for weakly supervised object localization,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), March 2020.
[36]. V. Gupta, M. Demirer, M. Bigelow, M. Y. Sarah, S. Y. Joseph, L. M. Prevedello, R. D. White, and B. S. Erdal, “Using transfer learning and class activation maps supporting detection and localization of femoral fractures on anteroposterior radiographs,” in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020, pp. 1526–1529.
[37]. W. Bae, J. Noh, and G. Kim, “Rethinking class activation mapping for weakly supervised object localization,” in European Conference on Computer Vision. Springer, 2020, pp. 618–634.
[38]. J. Cai, F. Xing, A. Batra, F. Liu, G. A. Walter, K. Vandenborne, and L. Yang, “Texture analysis for muscular dystrophy classification in MRI with improved class activation mapping,” Pattern Recognition, vol. 86, pp. 368–375, 2019.
[39]. K. Fu, W. Dai, Y. Zhang, Z. Wang, M. Yan, and X. Sun, “Multicam: Multiple class activation mapping for aircraft recognition in remote sensing images,” Remote Sensing, vol. 11, no. 5, p. 544, 2019.
[40]. B. Vasu, F. U. Rahman, and A. Savakis, “Aerial-cam: Salient structures and textures in network class activation maps of aerial imagery,” in 2018 IEEE 13th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP). IEEE, 2018, pp. 1–5.
[41]. Y. Wang, F. Zhu, C. J. Boushey, and E. J. Delp, “Weakly supervised food image segmentation using class activation maps,” in 2017 IEEE International Conference on Image Processing (ICIP), 2017, pp. 1277–1281.
[42]. H.-G. Nguyen, A. Pica, J. Hrbacek, D. C. Weber, F. La Rosa, A. Schalenbourg, R. Sznitman, and M. B. Cuadra, “A novel segmentation framework for uveal melanoma in magnetic resonance imaging based on class activation maps,” in International Conference on Medical Imaging with Deep Learning. PMLR, 2019, pp. 370–379.
[43]. Y. Zhu, Y. Zhou, H. Xu, Q. Ye, D. Doermann, and J. Jiao, “Learning instance activation maps for weakly supervised instance segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3116–3125.
[44]. M. B. Muhammad and M. Yeasin, “Eigen-cam: Class activation map using principal components,” in 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020, pp. 1–7.
[45]. F. I. Eyiokur, H. K. Ekenel, and A. Waibel, “A computer vision system to help prevent the transmission of COVID-19,” arXiv preprint arXiv:2103.08773, 2021.
[46]. W. Intelligence, “Face Mask Detection Dataset,” www.kaggle.com/wobotintelligence/face-mask-detection-dataset, 2021.
[47]. A. Jangra, “Face Mask Detection,” www.kaggle.com/ashishjangra27/facemask-12k-images-dataset, 2021.
[48]. Huawei, “Huawei M2150,” support.huawei.com/enterprise/en/intelligentvision/m2150-10-ei-pid-250673491, 2021.
[49]. G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Medical Image Analysis, vol. 42, pp. 60–88, 2017.
[50]. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
[51]. Z. Guo, L. Zhang, and D. Zhang, “A completed modeling of local binary pattern operator for texture classification,” IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1657–1663, 2010.
[52]. A. J. Izenman, “Linear discriminant analysis,” in Modern Multivariate Statistical Techniques. Springer, 2013, pp. 237–280.
[53]. A. Tzotsos and D. Argialas, “Support vector machine classification for object-based image analysis,” in Object-Based Image Analysis. Springer, 2008, pp. 663–677.
[54]. C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on deep transfer learning,” in International Conference on Artificial Neural Networks. Springer, 2018, pp. 270–279.
[55]. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
[56]. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014. [Online]. Available: http://arxiv.org/abs/1409.1556
[57]. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CoRR, vol. abs/1512.03385, 2015. [Online]. Available: http://arxiv.org/abs/1512.03385
[58]. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” CoRR, vol. abs/1512.00567, 2015. [Online]. Available: http://arxiv.org/abs/1512.00567
[59]. R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra, “Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization,” CoRR, vol. abs/1610.02391, 2016. [Online]. Available: http://arxiv.org/abs/1610.02391
