Automatic Gate for Body Temperature Check and Masks Wearing Compliance Using an Embedded System and Deep Learning

A novel coronavirus (2019-nCoV) has emerged with a fast transmission rate. The World Health Organization (WHO) has declared the disease it causes, COVID-19, a global pandemic that requires special handling. Many parties have worked to reduce virus transmission by implementing health protocols and adopting a new normal lifestyle. Implementing these protocols creates new problems, especially for the health checks at main entrances. The officers in charge of measuring body temperature are at risk of COVID-19 infection, and manual measurement is prone to errors. This study proposes an automatic gate system that works according to the new normal health protocol. The system uses the MLX90614 contactless temperature sensor to measure body temperature. It applies deep learning, using a Convolutional Neural Network (CNN) with the MobileNetV2 architecture, to determine whether a face mask is being worn. The system is equipped with an IoT-based remote controller for the gate. Experimental results show that the system works well: the complete check takes about 20 seconds per user, and both the sensor and the mask classification model reach 99% accuracy.


Introduction
Coronavirus Disease 2019 (COVID-19) is an infectious disease caused by a novel coronavirus discovered in 2019. COVID-19 causes flu-like illness in humans and spreads very quickly through physical contact or through other media at close range [1]. The World Health Organization (WHO) has declared the spread of COVID-19 a global pandemic that requires special handling to suppress its transmission. The virus has spread across several regions of Indonesia, with 45,892 cases, a recovery rate of 40%, and a fatality rate of 5% [2]. To handle COVID-19 cases, the Indonesian government has implemented several policies, such as Large-Scale Social Restrictions (Pembatasan Sosial Berskala Besar, PSBB), Enforcement of Community Activity Restrictions (Pemberlakuan Pembatasan Kegiatan Masyarakat, PPKM), and adaptation to a new normal lifestyle. The new normal lifestyle aims to break the chain of COVID-19 transmission while allowing economic and community activities, which must continue during the pandemic, to remain active. ITERA has also adopted the new normal protocol for academic activities that cannot be conducted online. Through the New Life Adaptation Development Agency (Badan Pembina Adaptasi Kehidupan Baru, BPAKB), the ITERA campus has compiled a new normal protocol based on the standards set by the Ministry of Education and Culture (Kemdikbud). The main points of this protocol include body temperature and mask-wearing checks carried out at the ITERA main gate (a one-gate system) [3]. The gate serves as a control point for incoming and outgoing vehicles [4] as well as for members of the academic community. Every member of the ITERA academic community who wants to enter the campus must pass body temperature and mask-wearing checks performed by ITERA officers. 
Having officers perform these checks at the main gate causes several problems, such as a high risk of transmitting the virus to the officers and queues when the number of visitors is disproportionate to the number of officers on duty.
Because of these problems, a system is needed that can replace the role of the officers and be integrated automatically with the campus gate. Intelligent systems connected to a gate have been developed before; for example, a gate that opens through facial recognition using the Haar Cascade algorithm [5]. A similar system addressed influenza by measuring body temperature without contact using the MLX90614 to monitor school attendance [6]. In addition, a face mask detection model based on the MobileNetV2 architecture has been reported to achieve 2-3% higher accuracy on public datasets [7].
Based on these previous studies, this study develops an automatic gate system that uses the MLX90614 sensor, which has good accuracy [8], to measure body temperature without contact, and the MobileNetV2 architecture to classify mask use, chosen for its small model size and adequate computational capability.

Method
The research methodology consists of several stages, as shown in Figure 1.

a. Problem Identification
The initial stage of this research is problem identification: in particular, reducing the interaction of officers at the ITERA campus gate by means of an automatic gate system that can check body temperature and detect mask use.

b. Literature Study
The literature study was conducted to obtain relevant references for the research. The references come from various domestic and international journals on the methods and objects studied, to avoid duplicating previous research. The literature study also provides the basic and conceptual theory underlying this research, summarized below.

a) Computer Vision
Computer vision, a branch of artificial intelligence, plays an important role in the processing of images and videos by machines. The main goal of applying computer vision is to imitate the human eye so that machines can recognize objects in the images or videos they are given. Beyond imitating human vision, an important concern in the field is building systems that still perform well with limited power and resources [9]. Computer vision is now widely used in intelligent devices that detect and recognize objects such as faces (face recognition) and other objects [10].

b) Body Temperature
Temperature is the degree of heat or cold of an object and can be measured with an instrument called a thermometer. Body temperature is generally divided into two types: skin (surface) temperature and core temperature. Core temperature, the temperature inside the body, stays constant within about ±1 ºF (±0.6 ºC) in a non-febrile condition. Skin temperature, the heat at the surface of the skin, fluctuates because it is influenced by the ambient temperature: if excessive heat is produced in the body, skin temperature rises, and vice versa. In general, normal human body temperature ranges from 97 ºF (36 ºC) to 99.5 ºF (37.5 ºC). 
Body temperature is also affected by activity; for example, it rises during sports or other physical activities. Extreme environmental temperatures can also affect body temperature [11].

c) MobileNetV2
MobileNetV2 is a neural network architecture developed to address the limited resources of mobile devices, with several improvements that increase efficiency [12]. It reduces the number of operations and the memory usage while maintaining model accuracy. MobileNetV2 can be implemented efficiently using standard operations in modern frameworks. It is also well suited to embedded systems because it reduces memory usage during inference [13].

d) OpenCV
OpenCV is an open-source library for computer vision applications. It was initially developed to accelerate the performance and capabilities of computing machines in computer vision and artificial intelligence by providing a robust infrastructure. OpenCV is written in C and C++ and runs on several operating systems, including Linux, Windows, and macOS [14].
OpenCV also provides interfaces for other programming languages, such as Python, Java, and MATLAB. It offers an infrastructure that makes it easier to apply computer vision concepts such as image processing. OpenCV currently contains more than 500 functions that can be applied in areas such as health and security [15].

e) TensorFlow
TensorFlow is a widely used library for developing and deploying deep neural networks, developed by the Google Brain team. It is an open-source machine learning system that operates in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations on that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within each machine across multiple computing devices, such as multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives developers flexibility in designing parameters and managing shared state in the system [16].
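The Fahrenheit and Celsius values quoted in item b) follow from the standard conversion ºC = (ºF - 32) × 5/9, while a temperature difference converts without the 32-degree offset. The short Python sketch below checks this arithmetic; the function names and the use of the 97 to 99.5 ºF band as a "normal" test are illustrative assumptions, not the gate's documented threshold.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert an absolute temperature from ºF to ºC."""
    return (temp_f - 32.0) * 5.0 / 9.0


def fahrenheit_delta_to_celsius(delta_f: float) -> float:
    """Convert a temperature difference (no 32-degree offset)."""
    return delta_f * 5.0 / 9.0


# Assumed reference band: the normal range quoted in item b).
NORMAL_LOW_F, NORMAL_HIGH_F = 97.0, 99.5


def is_in_normal_range(temp_c: float) -> bool:
    """True if a Celsius reading falls inside the assumed normal band."""
    low_c = fahrenheit_to_celsius(NORMAL_LOW_F)
    high_c = fahrenheit_to_celsius(NORMAL_HIGH_F)
    return low_c <= temp_c <= high_c


print(round(fahrenheit_to_celsius(97.0), 1))       # 36.1
print(round(fahrenheit_to_celsius(99.5), 1))       # 37.5
print(round(fahrenheit_delta_to_celsius(1.0), 2))  # 0.56
```

The printed values show that the rounded figures in the text (36 ºC and ±0.6 ºC) correspond to about 36.1 ºC and ±0.56 ºC.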
c. System Planning
The system planning stage defines the components and system requirements and produces the circuit and workflow of the system. Building a smart device that is integrated with the gate involves several main requirements, which are listed in Table 1.

Raspberry Pi 4
A single-board computer that acts as the data processing center and works in three stages: input, process, and output.

MLX90614
A non-contact temperature sensor that measures body temperature by passively sensing infrared radiation.

Raspberry Pi Camera
Captures images or video to be used as input data.

Raspberry Pi Display / Monitor
Displays the graphical output generated by the Raspberry Pi.

Relay
An electrically operated switch that lets a small control current switch a larger electric current.

HC-SR04
An ultrasonic sensor that measures the distance between an object and the sensor.

MicroSD
Storage for the operating system and other files on the Raspberry Pi.
The automatic gate system uses a Raspberry Pi 4 as the central unit that processes the data obtained from the sensors and the camera. The Raspberry Pi also controls the relay actuator based on the results of these checks. The interactions between the components are shown in the schematic diagram in Figure 2.
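As a minimal sketch of how the Raspberry Pi could turn raw sensor data into the quantities the gate needs, the Python code below converts MLX90614 readings and HC-SR04 echo times. The default address 0x5A, the RAM registers 0x06 and 0x07, and the 0.02 K-per-LSB scale come from the MLX90614 datasheet; the use of the third-party smbus2 package, I2C bus 1, a speed of sound of about 343 m/s, and all function names are assumptions. Measuring the HC-SR04 echo pulse width itself (for example with GPIO edge timing) is left out.

```python
MLX90614_ADDR = 0x5A     # factory-default SMBus address
MLX90614_TA = 0x06       # RAM register: ambient temperature
MLX90614_TOBJ1 = 0x07    # RAM register: object (body) temperature
SPEED_OF_SOUND_CM_S = 34300.0  # about 343 m/s in air near 20 ºC (assumed)


def mlx90614_raw_to_celsius(raw: int) -> float:
    """Convert a 16-bit MLX90614 RAM value (0.02 K per LSB) to ºC."""
    return raw * 0.02 - 273.15


def read_mlx90614(register: int = MLX90614_TOBJ1, bus_id: int = 1) -> float:
    """Read one temperature register over I2C on the Raspberry Pi.

    smbus2 is imported lazily so the pure conversion helpers can be
    used and tested on a machine without I2C hardware.
    """
    from smbus2 import SMBus
    with SMBus(bus_id) as bus:
        raw = bus.read_word_data(MLX90614_ADDR, register)
    return mlx90614_raw_to_celsius(raw)


def echo_to_distance_cm(echo_seconds: float) -> float:
    """HC-SR04: the echo pulse spans the round trip, so halve the path."""
    return echo_seconds * SPEED_OF_SOUND_CM_S / 2.0
```

For example, a raw object reading of 15500 corresponds to 36.85 ºC, and a 1 ms echo pulse to about 17.15 cm.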

Figure 2. System schematic diagram
The workflow of the automatic gate system follows the defined new normal protocol. To open the gate, the system must verify the user's compliance with the protocol through two checks: body temperature measurement and mask detection. The workflow of the gate system is represented by the block diagram in Figure 3. The software that controls the gate system consists of several functions, shown in the use case diagram in Figure 4.
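The two-check rule can be summarized as a small decision function. This is a sketch, not the authors' code: the thresholds MAX_TEMP_C and MAX_DISTANCE_CM, the idea that the HC-SR04 confirms the user is close enough before measuring, and all names are assumptions. Hardware access (sensors, camera classifier, relay) is passed in as callables so the logic can be exercised without a Raspberry Pi.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical thresholds, for illustration only.
MAX_TEMP_C = 37.5       # upper end of the normal range cited in item b)
MAX_DISTANCE_CM = 10.0  # assumed range at which the MLX90614 is read


@dataclass
class GateDecision:
    temperature_c: float
    wearing_mask: bool
    opened: bool


def run_gate_check(
    read_distance_cm: Callable[[], float],
    read_temperature_c: Callable[[], float],
    detect_mask: Callable[[], bool],
    open_gate: Callable[[], None],
) -> Optional[GateDecision]:
    """Open the gate only if the temperature AND the mask check both pass.

    Returns None when no user is close enough to be measured.
    """
    if read_distance_cm() > MAX_DISTANCE_CM:
        return None  # nobody in position yet; skip the measurement
    temperature = read_temperature_c()
    wearing_mask = detect_mask()
    allowed = temperature <= MAX_TEMP_C and wearing_mask
    if allowed:
        open_gate()  # e.g. energize the relay that drives the gate
    return GateDecision(temperature, wearing_mask, allowed)
```

On the device, the callables would wrap the sensor readers, the mask classifier, and the relay output; keeping them injectable is a design choice that makes the rule itself easy to test.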

d. Dataset Preparation
The dataset is used to develop the face mask classification model. This study uses three datasets, and the one that produces the best model is implemented in the system. All datasets are obtained from the Kaggle data science platform. Their specifications are listed in Table 2.
Each dataset contains images in .jpg or .png format. The images are divided into two labels (classes), 'with_mask' and 'without_mask', stored in two separate folders containing equal numbers of images, so the dataset is balanced.

e. Data Pre-Processing
This stage processes and manipulates the datasets before they are used for training. Data pre-processing includes label transformation, data splitting, and data augmentation. Label transformation converts the data labels ('with_mask', 'without_mask') into one-hot NumPy arrays such as [1, 0] so that they can be used in model creation. Data splitting divides the entire dataset into two parts: 75% training data and 25% testing data. Data augmentation enriches the data by manipulating a set of images; it helps the model generalize features and increases accuracy, especially during training [20]. In computer vision, augmented data can be generated by applying geometric transformations to an original image to obtain a new image. Common geometric transformations include translation, rotation, scaling, shearing, zooming, and flipping. An illustration of image augmentation is shown in Figure 5.
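The label transformation, the 75/25 split, and two of the geometric augmentations can be sketched with NumPy alone, as below. The class order (so that 'with_mask' becomes [1, 0]), the random seed, and the helper names are assumptions for illustration. In practice, scikit-learn's `train_test_split(..., test_size=0.25, stratify=labels)` keeps both classes balanced in each part, and Keras preprocessing layers such as `RandomFlip`, `RandomRotation`, `RandomZoom`, and `RandomTranslation` cover the other transformations listed above.

```python
import numpy as np

# Assumed class order: index 0 -> [1, 0], index 1 -> [0, 1].
CLASSES = ("with_mask", "without_mask")


def one_hot(labels):
    """Label transformation: class-name strings to one-hot NumPy rows."""
    index = {name: i for i, name in enumerate(CLASSES)}
    encoded = np.zeros((len(labels), len(CLASSES)), dtype=np.float32)
    for row, name in enumerate(labels):
        encoded[row, index[name]] = 1.0
    return encoded


def split_train_test(images, targets, train_fraction=0.75, seed=42):
    """Data splitting: shuffle once, then cut into train/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    cut = int(round(train_fraction * len(images)))
    train_idx, test_idx = order[:cut], order[cut:]
    return images[train_idx], images[test_idx], targets[train_idx], targets[test_idx]


def flip_horizontal(image):
    """Augmentation: mirror an H x W (x C) image left to right."""
    return image[:, ::-1, ...]


def translate(image, dx, dy):
    """Augmentation: shift by (dx, dy) pixels, zero-filling uncovered pixels."""
    shifted = np.zeros_like(image)
    h, w = image.shape[:2]
    if abs(dx) >= w or abs(dy) >= h:
        return shifted  # shifted completely out of view
    # Source and destination windows that stay inside the image bounds.
    src_y = slice(max(0, -dy), h - max(0, dy))
    dst_y = slice(max(0, dy), h - max(0, -dy))
    src_x = slice(max(0, -dx), w - max(0, dx))
    dst_x = slice(max(0, dx), w - max(0, -dx))
    shifted[dst_y, dst_x, ...] = image[src_y, src_x, ...]
    return shifted
```

A plain shuffled split like this one can leave the classes slightly unbalanced in small datasets, which is why the stratified library option is usually preferred.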

f. Model Development
The model creation stage consists of designing the model architecture and training the model on the dataset prepared in the data pre-processing stage. The face mask classification model uses the MobileNetV2 architecture as the base model, while the fully connected layers are designed for this dataset and task. MobileNetV2 retains the depthwise separable convolutions of MobileNetV1 but adds two features: linear bottlenecks and shortcut connections. MobileNetV2 consists of an initial convolution layer with 32 filters, followed by 19 residual bottleneck layers, and uses ReLU6 as its non-linearity [13]. A simple visualization of the MobileNetV2 architecture is shown in Figure 6, and the specifications of its layers are given in Table 3. In this study, the default fully connected layer of MobileNetV2 was not used; it was redesigned to match the face mask classification task and the dataset. The fully connected layers used are listed in Table 4.
The fully connected head consists of five layers attached to the MobileNetV2 base model: a 2D average pooling layer with a 7 × 7 pool size, a flatten layer, a dense layer with 128 units and ReLU activation, a dropout layer with a rate of 0.5, and a dense output layer with 2 units and softmax activation. Pooling layers are generally used to reduce the size of the feature matrix. A pooling layer is a window of fixed size that slides across the feature map. A simple example of the processing in a pooling layer is shown in Figure 7.
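A minimal Keras sketch of this architecture is shown below, assuming TensorFlow 2.x and 224 × 224 RGB input images, the input size at which MobileNetV2's last feature map is 7 × 7 and therefore matches the 7 × 7 average pooling. Freezing the base model, the Adam optimizer, the learning rate, and the function name are illustrative assumptions; the hyperparameters actually used are those in Table 5.

```python
import tensorflow as tf


def build_mask_classifier(weights="imagenet", input_size=224, learning_rate=1e-4):
    """MobileNetV2 base plus the five-layer head described in the text."""
    base = tf.keras.applications.MobileNetV2(
        weights=weights,
        include_top=False,  # drop the default ImageNet classifier head
        input_shape=(input_size, input_size, 3),
    )
    base.trainable = False  # assumption: keep pre-trained features frozen

    x = tf.keras.layers.AveragePooling2D(pool_size=(7, 7))(base.output)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)

    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",  # matches one-hot [1, 0] / [0, 1] labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_mask_classifier()` downloads ImageNet weights for transfer learning; passing `weights=None` builds the same graph with random weights, which is convenient for quick shape checks.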

Figure 8. Illustration of average pooling process
Average pooling is a type of pooling that replaces each block of values with its average [21] [22]. The next stage is flattening, a layer that converts the matrix into a one-dimensional vector that can be passed into the neural network. Figure 8 visualizes this simple flattening process, which allows data in matrix form to enter the input layer of the neural network. In the neural network layers, ReLU is used as the activation function to turn each linear component into a non-linear one. The ReLU activation function is calculated with formula (1) [23] [24].
$$f(x) = \max(0, x) \qquad (1)$$
ReLU sets a lower limit of zero: if x ≤ 0, the output is 0, and if x > 0, the output equals x.
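The pooling, flattening, and ReLU steps can be illustrated on a toy 4 × 4 feature map with NumPy. This sketch assumes non-overlapping windows (stride equal to the window size), and the function names are illustrative. In the model described above, a 7 × 7 window over a 7 × 7 feature map reduces each channel to a single value; a 2 × 2 window is used here only to make the arithmetic easy to follow.

```python
import numpy as np


def average_pool2d(feature_map, pool=2):
    """Non-overlapping average pooling (stride == pool) on an H x W map."""
    h, w = feature_map.shape
    h_out, w_out = h // pool, w // pool
    # Drop rows/cols that do not fill a whole window, then average each block.
    trimmed = feature_map[: h_out * pool, : w_out * pool]
    blocks = trimmed.reshape(h_out, pool, w_out, pool)
    return blocks.mean(axis=(1, 3))


def flatten(feature_map):
    """Row-major flattening into a 1-D vector for a dense layer."""
    return feature_map.reshape(-1)


def relu(x):
    """Formula (1), f(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)
```

For the map [[1, 3, 2, 4], [5, 7, 6, 8], [0, 0, 1, 1], [2, 2, 3, 3]], `average_pool2d` returns [[4, 5], [1, 2]], and `flatten` turns that into [4, 5, 1, 2].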
At the end of the neural network, the softmax function is used as the activation function of the output layer. Softmax can be used for multi-class classification [25]. It computes a probability for each case, here the probability of each label the model can predict. Each output lies between zero and one, and together the outputs sum to one. The softmax activation function is calculated with equation (2) [26] [27].
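A minimal NumPy sketch of the softmax function, as formalized in equation (2), follows; it is illustrative and not taken from the authors' implementation. Subtracting the largest input before exponentiating is a standard numerical-stability step that leaves the result unchanged, because the common factor cancels between numerator and denominator.

```python
import numpy as np


def softmax(z):
    """Softmax over the last axis, stabilized by subtracting the maximum."""
    z = np.asarray(z, dtype=np.float64)
    shifted = z - z.max(axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)  # normalization term
```

For a two-class output such as the mask model's, `softmax([2.0, 0.0])` gives roughly [0.88, 0.12], so the first class would be predicted with about 88% probability.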
$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \qquad (2)$$
where $\mathbf{z} = (z_1, \ldots, z_K)$ is the input vector of the softmax function, $z_i$ is an element of the input vector (a real number), $e^{z_i}$ is the exponential function applied to each $z_i$, the denominator $\sum_{j=1}^{K} e^{z_j}$ is a normalization term that ensures the output is a valid probability distribution, and $K$ is the number of classes.

After the model is designed, the next step is the training process, in which a hyperparameter configuration is used to optimize training and produce a good model [28]. The hyperparameters are the batch size, the learning rate, and the number of epochs; the values used to train the mask classification model are listed in Table 5. Training produces the model that is later implemented in the system. The resulting model is evaluated with a classification report consisting of precision, recall, and F1-score. Precision (positive predictive value) [29] [28] measures how many of the samples predicted as a class actually belong to that class. Precision is calculated with equation (3), where TP and FP are the numbers of true positives and false positives.
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (3)$$
Recall (sensitivity, or true positive rate) measures how successfully the model finds the samples of a given class. Recall is obtained with equation (4), where FN is the number of false negatives.
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (4)$$
Finally, classification quality is summarized by the F1-score, the harmonic mean of precision and recall. The F1-score in equation (5) describes performance well when the numbers of FP and FN are not balanced (non-symmetric).
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (5)$$

g. Implementation
Implementation is the stage in which all system components are assembled. The hardware components are assembled and integrated into the flow shown in Figure 9. After the training process, the research produced a model file with the .model extension, saved in the HDF5 (h5) format. The model is stored in the internal storage of the Raspberry Pi. Based on the training experiments, the model with the best performance is the one trained on the second dataset, and this model is used by the Raspberry Pi to classify users' mask use. The process the system uses to classify masks is shown in the following figure.
Several processes are carried out to classify mask use from the video captured by the camera. The model file created in the training process, together with a pre-trained face detection model provided by OpenCV, is loaded into the system, which then receives a stream of image frames from the camera. From each frame, the face detector extracts the Region of Interest (ROI) containing the user's face, and this ROI is passed to the mask classification model. The detected face area is displayed as a bounding box labeled with the classification probability as a percentage. 
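The ROI handling and labeling steps can be sketched in Python as follows. This is a hypothetical outline, not the authors' code: the face box is assumed to come from the OpenCV face detector as pixel coordinates (x1, y1, x2, y2), the display names and function names are invented for illustration, and the [-1, 1] scaling is the input range MobileNetV2 expects (the same mapping as `tf.keras.applications.mobilenet_v2.preprocess_input`). OpenCV is imported only inside `classify_face`, so the pure helpers can be used without it.

```python
import numpy as np

# Hypothetical display names for the [1, 0] / [0, 1] classes.
CLASS_NAMES = ("Mask", "No Mask")


def clamp_box(box, frame_w, frame_h):
    """Clip an (x1, y1, x2, y2) face box so the ROI stays inside the frame."""
    x1, y1, x2, y2 = box
    x1, y1 = max(0, int(x1)), max(0, int(y1))
    x2, y2 = min(frame_w - 1, int(x2)), min(frame_h - 1, int(y2))
    return x1, y1, x2, y2


def scale_for_mobilenetv2(roi_rgb):
    """Map uint8 pixels [0, 255] to [-1, 1], the range MobileNetV2 expects."""
    return roi_rgb.astype(np.float32) / 127.5 - 1.0


def label_text(probs):
    """Pick the most probable class and format it with its probability."""
    idx = int(np.argmax(probs))
    return f"{CLASS_NAMES[idx]}: {100.0 * float(probs[idx]):.2f}%"


def classify_face(frame_bgr, box, model, size=224):
    """Crop the ROI, resize it, and run the classifier (needs OpenCV + a model)."""
    import cv2  # lazy import: only needed when working on real frames
    h, w = frame_bgr.shape[:2]
    x1, y1, x2, y2 = clamp_box(box, w, h)
    if x2 <= x1 or y2 <= y1:
        return None  # empty ROI, nothing to classify
    roi = cv2.cvtColor(frame_bgr[y1:y2, x1:x2], cv2.COLOR_BGR2RGB)
    roi = cv2.resize(roi, (size, size))
    batch = scale_for_mobilenetv2(roi)[np.newaxis, ...]
    probs = model.predict(batch, verbose=0)[0]
    return label_text(probs), probs
```

The returned label string is what would be drawn next to the bounding box, for example with `cv2.rectangle` and `cv2.putText`.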
The results of the classification with the model can be seen in Figure 11.

Result
The development produced an embedded device in which every hardware and software component is fully integrated. The gate system has a user interface with features such as a home screen and sensor calibration, shown in Figure 12. A mobile application was also developed to control the gate remotely under certain conditions. The mobile application was built with Flutter, a multiplatform mobile framework [30], which was chosen because it can target several platforms at once more effectively and efficiently. Flutter uses Dart as its main programming language. The mobile application for this gate system provides features such as calibration and direct gate control through relay activation, as shown in Figure 13. The developed gate system was then evaluated through sensor testing, classification model testing, and system performance testing. The model training results were also compared to select the best model to implement in the gate system.

Figure 13. System application user interface
The sensors tested in this study are the MLX90614 temperature sensor and the HC-SR04 ultrasonic sensor. Each sensor was compared with a standard measuring instrument: the temperature sensor with an infrared thermometer (thermogun) and the ultrasonic sensor with a ruler. Each test was repeated 10 times, and the resulting sensor accuracy is shown in Table 6.
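The paper does not state how the accuracy percentage in Table 6 is computed. One common convention, assumed here, is 100 × (1 - mean relative error) over paired sensor and reference readings; the Python sketch below implements that assumption, and the function name is hypothetical.

```python
def accuracy_percent(measured, reference):
    """Mean accuracy = 100 * (1 - mean relative error) over paired readings.

    ASSUMED formula: the paper reports accuracy but not how it is derived.
    `measured` holds sensor readings, `reference` the instrument readings
    (thermogun or ruler) taken at the same time.
    """
    if len(measured) != len(reference) or not measured:
        raise ValueError("need equal-length, non-empty reading lists")
    rel_errors = [abs(m - r) / abs(r) for m, r in zip(measured, reference)]
    return 100.0 * (1.0 - sum(rel_errors) / len(rel_errors))
```

With 10 paired readings per sensor, as in the test described above, the function returns a single percentage comparable to the values in Table 6, provided this is indeed the convention the authors used.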

Figure 14. Mobile application user interface
Based on the test data, the MLX90614 temperature sensor and the HC-SR04 ultrasonic sensor both achieve a very good accuracy of 99%. In general, both sensors perform at a level equivalent to the standard measuring instruments. The mask classification model is tested using the confusion matrix obtained after training, shown in Figure 15.
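Per-class precision, recall, and F1-score can be derived from a confusion matrix using equations (3) to (5): Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1 = 2 × Precision × Recall/(Precision + Recall). The Python sketch below does this for any square matrix whose rows are true classes and columns are predicted classes; it is illustrative, the function name is hypothetical, and scikit-learn's `classification_report` offers the same summary from label arrays.

```python
def per_class_metrics(confusion):
    """Return {class_index: (precision, recall, f1)} for a square confusion
    matrix where confusion[i][j] counts true class i predicted as class j."""
    n = len(confusion)
    report = {}
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # column minus diagonal
        fn = sum(confusion[k]) - tp                         # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0      # equation (3)
        recall = tp / (tp + fn) if tp + fn else 0.0         # equation (4)
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)               # equation (5)
        report[k] = (precision, recall, f1)
    return report
```

For the mask model, index 0 would be 'with_mask' and index 1 'without_mask', following the label order used during training.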
From the confusion matrix in Figure 15, a classification report can be calculated for model training and for testing the model's ability to classify different types of masks. The classification report for the mask classification model shows the highest results for the first and second datasets; the performance measurements are listed in Table 7. To choose the best face mask classification model, the models are also tested on how successfully they classify a variety of masks. The model trained on the second dataset achieves the highest success rate, 80%, compared with 40% for the first dataset and 20% for the third. The model trained on the second dataset is therefore used in the system. The selected model is further tested on various types of masks commonly used by the public, and the results of this test are shown in Table 8.
The test of the model's ability to recognize mask variations was carried out on 15 types of masks and yielded an average accuracy of 99.63%. This shows that the classification model can recognize the masks commonly used by the public.
The response time, that is, the system's operating time, is also evaluated to determine how fast the system works and whether queues are likely. Response time was measured over 10 trials, and the results are shown in Table 9.
According to the response time test, mask classification has the longest response time because it is the most resource-intensive step. The average total time for the system to process one user is 19.8 seconds, which is fast compared with manual checks by officers.
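A response-time test like the one in Table 9 can be scripted by timing each stage over repeated runs. The sketch below is a hypothetical harness: the stage names and callables are placeholders for the actual distance, temperature, and mask classification steps, and `time.perf_counter` is used because it is a monotonic, high-resolution clock suited to measuring intervals.

```python
import time
from statistics import mean


def time_stages(stages, runs=10):
    """Run each named stage `runs` times and return the mean seconds per
    stage plus the mean total per run. `stages` maps a name to a
    zero-argument callable; stages run in insertion order."""
    per_stage = {name: [] for name in stages}
    totals = []
    for _ in range(runs):
        run_total = 0.0
        for name, fn in stages.items():
            start = time.perf_counter()
            fn()
            elapsed = time.perf_counter() - start
            per_stage[name].append(elapsed)
            run_total += elapsed
        totals.append(run_total)
    summary = {name: mean(values) for name, values in per_stage.items()}
    summary["total"] = mean(totals)
    return summary
```

A call such as `time_stages({"temperature": read_temp, "mask_classification": classify}, runs=10)`, with `read_temp` and `classify` standing in for the real steps, would produce one row of averages comparable to Table 9.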

Conclusion
The research results suggest that the automatic entrance gate system works as expected. It checks body temperature accurately and quickly, and it enforces compliance with mask wearing. The MLX90614 sensor used in the gate system is accurate enough to measure body temperature without contact. The MobileNetV2-based classification model classifies mask wearing with an average accuracy of 99.63%. The average working time of the system is 19.8 seconds per person, which is fast enough to reduce the likelihood of queues at the gate.