Detection of Cyber Malware Attack Based on Network Traffic Features Using Neural Network

Various techniques have been developed to detect cyber malware attacks, such as the behavior-based method, which analyzes the permissions and system calls made by a process. However, this technique cannot keep up with malware that continues to evolve. Therefore, another kind of suspicious activity, namely network traffic, needs to be analyzed. Network traffic acts as the medium through which malware developers communicate with the malware infecting a victim's device. The applications analyzed in this study are divided into three classes: adware, general malware, and benign. The classification draws on 79 features extracted from network traffic flows and analyzes them using a Neural Network, which suits the time-series character of the features. A total of 442,240 network traffic flows are used. The results show that 15 main features selected from literature studies yield an F-measure of 0.6404 with 12 hidden neurons, a learning rate of 0.1, and 300 epochs. As a comparison, the researchers chose 12 features based on the nature of the malware, which yield an F-measure of 0.6660 with 12 hidden neurons, a learning rate of 0.05, and 300 epochs. This study also found that data normalization is important to ensure that no feature is far more dominant than the others. It is concluded that the analysis of network traffic features using a Neural Network can be used to detect cyber malware attacks and that more features do not imply better detection performance; real-time malware detection is still required for network traffic on IoT devices and smartphones.


Introduction
As society adopts more technology, the use of IoT (Internet of Things) devices and smartphones has become increasingly widespread, and the security threats to these devices have increased with it. Various cyberattacks can be carried out on IoT devices and smartphones, ranging from taking over access rights, destroying data, and stealing important information to recording users' personal activities on the device [1]. Most of these cyberattacks enter the system through malicious software, or malware, that has been successfully planted on IoT devices and smartphones.
Malware is an application with a harmful purpose, such as corrupting data, stealing important information, disrupting device performance, or taking over the system. This threat continues to increase every year. In 2017, around 3.5 million new malware samples were found on Android smartphones alone [2]. One suspicious activity of malware is its use of network traffic, which can serve as a medium for sending confidential information such as PINs, bank account information, personal messages, and passwords to the malware makers [3]. Malware can also use network traffic as a backdoor through which other malware enters.
Network traffic on IoT devices and smartphones has the same basis as network traffic in general: it consists of packets that have a header and a data section [4]. Data is obtained and processed at the application layer, while headers are added at each layer. The sizes of the data and the headers vary within specified limits. The data section carries the content that the sender wants to deliver from source to destination. The header contains the destination IP address, the sender's IP address, the source port, the destination port, and other related information. Most network traffic features are time-series.
In general, a malware detection system classifies applications into adware, general malware, and benign [5]. Adware is a type of malware that displays advertisements in running software. Adware aims to generate revenue for its developers, who are paid by the advertised company. General malware has a confirmed harmful purpose, such as damaging or stealing data. A benign application is a normal application without dangerous purposes; it behaves as the application developer describes in the documentation.
Several efforts to detect mobile malware have been carried out using various approaches. The behavior-based approach, which uses permissions and system calls as features, still produces relatively low accuracy, with an average of around 60%: Simple Logistic 65.29%, Naive Bayes 65.29%, SMO 70.31%, and Random Tree 54.79% [6]. Other studies that apply the Neural Network (NN) method to network traffic features have detected botnet malware on smartphones with a precision of around 88.3% [7]. This result is much higher than that of the Naive Bayes and Logistic Regression methods, which reach 7% and 32%, respectively [7]. In addition, the NN method outperformed the Support Vector Machine (SVM) method in classifying network traffic [8]. The NN method is often chosen for classification because of its robustness; it can even be used for quality classification [9]. Detecting malware through network traffic analysis, where the data is mostly time-series, suits the NN machine learning method.
Previous research has three weaknesses. First, the NN method is applied to all network traffic features, even though some features play a more important role than others. For example, the destination port is more important than the length of the header contents. Second, using all network traffic features increases the internal errors carried in the data. Third, features with large values automatically weigh more; for example, commonly used port values are much smaller than the values of data flow across the network [5].
This study differs from previous research in the network traffic dataset, the combination of features, and the iteration over NN configurations. The dataset was obtained from the Canadian Institute for Cybersecurity, University of New Brunswick [10], and combined with sample data collected at the Computer Laboratory of the Harapan Bangsa Institute of Technology (ITHB). It covers a total of 1,900 Android applications, of which 20% are malware and 80% are benign. The malware is divided into two types: adware and general malware. The combination of features is based on literature studies, taking the intersection of the network traffic features that are frequently used in malware detection systems. The iteration over NN configurations is carried out programmatically and varies the learning rate, the number of epochs, and the evaluated parameters. The purpose of this study is to obtain an NN model configuration for detecting cyber malware attacks and to investigate which combination of network traffic features yields high precision, recall, and F-measure in the detection of mobile malware using NN.

Methods
a. Research Flowchart
The research steps are arranged in the form of a flowchart, which begins with preprocessing. The preprocessing step normalizes the features to be used by dividing each feature's values by the maximum value of that feature. This rescales every feature to a comparable range, so that no feature dominates the others.
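The normalization step can be illustrated with a minimal sketch. It assumes non-negative feature values and reuses the training maxima for the test data; neither detail is stated in the text, and the function name `max_normalize` and the sample values are hypothetical.

```python
import numpy as np

def max_normalize(features, max_values=None):
    """Divide each feature column by its maximum value."""
    features = np.asarray(features, dtype=float)
    if max_values is None:
        max_values = features.max(axis=0)
    # Assumption: a column whose maximum is zero is left unchanged.
    safe_max = np.where(max_values == 0, 1.0, max_values)
    return features / safe_max, max_values

# Hypothetical flows: [destination port, flow duration in microseconds]
train = np.array([[443.0, 1_200_000.0],
                  [80.0, 300_000.0],
                  [8080.0, 6_000_000.0]])
train_scaled, train_max = max_normalize(train)

# Assumption: test flows are scaled with the maxima found in the training data.
test_scaled, _ = max_normalize(np.array([[53.0, 600_000.0]]), train_max)
print(train_scaled)
print(test_scaled)
```

After scaling, the port column and the duration column both lie between 0 and 1, even though the raw durations were several orders of magnitude larger.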
Next, the learning stage applies the Neural Network method with the backpropagation algorithm, and the testing phase uses the feed-forward method. In the initial phase, the weights are assigned randomly according to the predefined settings and stored in a weight file. Learning produces new weight values, and testing uses the weights saved by the preceding learning stage. The test output falls into one of three classes: benign, adware, or general malware. Neural networks belong to supervised learning, and the resulting model takes the form of weights [11]. The weights are used at the test stage, and the output is passed through the activation function to determine which label it refers to.
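The learning and testing stages can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the sigmoid activation, the squared-error objective, full-batch updates, the initial weight range, and the toy data are all assumptions that the text does not specify.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_weights(n_in, n_hidden, n_out, rng):
    """Random initial weights (assumed range); biases start at zero."""
    return {
        "W1": rng.uniform(-0.5, 0.5, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.uniform(-0.5, 0.5, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def feed_forward(x, w):
    hidden = sigmoid(x @ w["W1"] + w["b1"])
    output = sigmoid(hidden @ w["W2"] + w["b2"])
    return hidden, output

def mean_squared_error(x, y_onehot, w):
    return float(np.mean((feed_forward(x, w)[1] - y_onehot) ** 2))

def train(x, y_onehot, w, learning_rate, epochs):
    """Full-batch backpropagation on squared error."""
    n = len(x)
    for _ in range(epochs):
        hidden, output = feed_forward(x, w)
        # Error signal at the output layer (includes the sigmoid derivative).
        delta_out = (output - y_onehot) * output * (1.0 - output)
        # Propagate the error back to the hidden layer before updating W2.
        delta_hid = (delta_out @ w["W2"].T) * hidden * (1.0 - hidden)
        w["W2"] -= learning_rate * hidden.T @ delta_out / n
        w["b2"] -= learning_rate * delta_out.mean(axis=0)
        w["W1"] -= learning_rate * x.T @ delta_hid / n
        w["b1"] -= learning_rate * delta_hid.mean(axis=0)
    return w

def predict(x, w):
    """Testing uses only the forward pass; the largest output picks the class."""
    return feed_forward(x, w)[1].argmax(axis=1)

# Hypothetical toy data: 30 flows, 4 normalized features, 3 classes.
rng = np.random.default_rng(0)
x = rng.random((30, 4))
labels = rng.integers(0, 3, size=30)
y_onehot = np.eye(3)[labels]

weights = init_weights(n_in=4, n_hidden=12, n_out=3, rng=rng)
error_before = mean_squared_error(x, y_onehot, weights)
weights = train(x, y_onehot, weights, learning_rate=0.1, epochs=300)
error_after = mean_squared_error(x, y_onehot, weights)
predicted = predict(x, weights)
print(error_before, error_after)
```

The trained `weights` dictionary plays the role of the weight file: the testing phase only needs those arrays and the forward pass.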

Figure 2. Neural Network Component
As shown in Figure 2, there are three main layers in the Neural Network: the input layer, the hidden layer, and the output layer. The figure also contains circles of various colors according to their role. The blue circles are called nodes or neurons, while the red circles are biases, which serve to increase the flexibility of the model.
The input layer receives the initial input, which is processed to produce the output of the hidden layer. The hidden layer is situated between the input and output layers and helps the neural network learn complex features. The hidden layer itself can consist of several layers, each of which may have a different number of neurons. The output of the hidden layer is then passed through an activation function and mapped to a class in the output layer. Figure 3 shows that each neuron has as many weights as it has connections with other neurons. The output is calculated from the weights and the input values, and the result is then processed with an activation function. According to Stevanovic [7], this mechanism enables the Neural Network to read and analyze many network traffic features simultaneously and to detect malware with a high degree of precision.
In this study, three layers are used: the input layer, one hidden layer, and the output layer. The number of neurons in the input layer equals the number of features used. The single hidden layer is tested with different numbers of neurons, such as 4, 5, 6, and 12. The output layer produces one of three classes: benign, adware, or general malware. The tests apply several combinations of Neural Network parameters, namely the learning rate, the number of hidden neurons, and the number of epochs. The learning rates tested are 0.1, 0.05, and 0.01, with 100, 200, and 300 epochs.
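The parameter combinations described above can be enumerated with a short sketch. It assumes that the listed values form the complete grid, although the text introduces the hidden-neuron counts with "such as"; the dictionary keys are illustrative names.

```python
from itertools import product

hidden_neurons = [4, 5, 6, 12]
learning_rates = [0.1, 0.05, 0.01]
epoch_counts = [100, 200, 300]

# One entry per Neural Network configuration to train and test.
configurations = [
    {"hidden_neurons": h, "learning_rate": lr, "epochs": e}
    for h, lr, e in product(hidden_neurons, learning_rates, epoch_counts)
]
print(len(configurations))  # 4 * 3 * 3 = 36 configurations per feature set
```

Under this assumption, each feature combination is evaluated on 36 configurations.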

c. Dataset
The dataset consists of pcap (packet capture) files that contain network traffic packets, from which a total of 79 features are extracted. The pcap files were obtained from a total of 1,900 Android applications, of which about 20% are malware and 80% are benign. The applications fall into three groups: 250 adware applications, 150 general malware applications, and 1,500 benign applications. The training data contains 2,312 network traffic flows from general malware, 149,871 from adware, and 201,609 from benign applications, while the testing data contains 1,626 general malware flows, 24,271 adware flows, and 62,551 benign flows. In total, 442,240 network traffic flows are used. The CICFlowMeter application converts the pcap files to CSV files, in which one flow corresponds to one line of data.
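As a quick consistency check on the flow counts reported above, the following sketch sums the flows per class and prints each class's share of the total. All numbers come from the text; only the variable names are illustrative.

```python
train_flows = {"general malware": 2_312, "adware": 149_871, "benign": 201_609}
test_flows = {"general malware": 1_626, "adware": 24_271, "benign": 62_551}

train_total = sum(train_flows.values())
test_total = sum(test_flows.values())
total = train_total + test_total
print(train_total, test_total, total)

# Share of each class among all flows (training and testing combined).
for label in train_flows:
    count = train_flows[label] + test_flows[label]
    print(f"{label}: {count} flows ({count / total:.2%})")
```

The counts add up to the stated 442,240 flows and correspond to an 80/20 split between training and testing. Note that the proportions per flow differ from the proportions per application: general malware accounts for fewer than 1% of all flows.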

d. Feature Combination Analysis
The combination of features used in the Neural Network is chosen based on an analysis of the literature study, whose results can be observed in Table 1. As a comparison, the researchers chose 12 features according to their own understanding of malware. For example, adware variants characteristically interrupt the application to display advertisements, which are in fact malicious code. This causes a large number of forward and backward packets in the flow. The twelve features selected by the researchers do not overlap with the features from the literature study and are listed in Table 2.

e. Objective and Evaluation
The analysis of the dataset revealed a class imbalance: malware accounts for only about 20% of the applications, compared to 80% benign [10], so an evaluation based on the ordinary accuracy metric is insufficient. Therefore, the F-measure is used as the metric instead of accuracy, and it guides the conclusion about which Neural Network parameters perform best. The advantage of the F-measure is that it combines precision and recall into a single, interrelated value. Table 2 shows the confusion matrix used to obtain the values of True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). Equations (1) to (3) are employed to determine precision, recall, and F-measure, respectively.

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{1} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{2} \]
\[ F\text{-measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3} \]
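The three metrics can be computed from a three-class confusion matrix as in the sketch below. The confusion matrix values are hypothetical, and the macro average across classes is an assumption, since the text does not state how the per-class values are combined into a single score.

```python
import numpy as np

def per_class_metrics(confusion):
    """confusion[i, j] = number of flows of true class i predicted as class j."""
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = confusion.sum(axis=1) - tp  # belong to the class, but missed
    # Convention used here: an undefined ratio (zero denominator) counts as 0.
    precision = np.divide(tp, tp + fp, out=np.zeros_like(tp), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=np.zeros_like(tp), where=(tp + fn) > 0)
    denom = precision + recall
    f_measure = np.divide(2 * precision * recall, denom,
                          out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f_measure

# Hypothetical confusion matrix; rows and columns: benign, adware, general malware.
confusion = [[80, 15, 5],
             [20, 70, 10],
             [10, 10, 30]]
precision, recall, f_measure = per_class_metrics(confusion)
macro_f = f_measure.mean()  # assumption: unweighted mean over the three classes
print(precision, recall, f_measure, macro_f)

# A model that labels every flow as benign never detects the malware classes.
_, _, f_all_benign = per_class_metrics([[10, 0, 0], [5, 0, 0], [2, 0, 0]])
print(f_all_benign)
```

In the second example, precision, recall, and F-measure are all 0 for adware and general malware, even though most flows are classified correctly. This is the situation in which accuracy alone would be misleading.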

Results and Discussion
Implementation and testing are carried out in a cloud computing environment because the CSV data to be processed is quite large, both for training and for testing. The initial weights of the Neural Network are generated randomly. Then the first training pass is carried out and the weights are updated. Training continues until the specified number of epochs is reached. After that, testing is carried out with the feed-forward method. Table 4 compares the Neural Network results for the features obtained from literature studies and the features chosen from the researchers' knowledge.
Complete test results for each combination of features are given in the supplement of this article. The highest F-measure was achieved with 12 hidden neurons. These results are consistent with Stevanovic's research [7,12], which states that Neural Network performance tends to improve as more hidden neurons are used, until it reaches a saturation point. The learning rate behaves differently. A comparison of learning rates and epochs for each combination of features with 12 hidden neurons can be seen in Table 5.
A higher learning rate does not guarantee a better F-measure. For the researchers' feature combination, the best results are achieved only with a learning rate of 0.05. The literature-study feature combination, in contrast, does achieve its best results with the maximum Neural Network parameter configuration (learning rate 0.1 and 300 epochs). Technically, the learning rate is the magnitude of the change applied to a weight, which is adjusted according to the error value, whereas the number of epochs is the number of training iterations performed. A learning rate that is too high or too low might leave the new weights further from the expected weights. Table 5 shows that, for the researchers' feature combination, several learning runs produce a value of 0 for precision, recall, and F-measure. It is assumed that the models produced with these parameters (learning rates 0.01 and 0.1) experienced underfitting.
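The role of the learning rate as the size of each weight change can be illustrated with a toy example. The sketch below is not the network from this study: it minimizes a hypothetical one-weight error function, E(w) = (w - 3)^2, and the learning rates are chosen only to make the effect visible.

```python
def gradient_descent(learning_rate, steps, w=0.0, target=3.0):
    """Minimise the toy error E(w) = (w - target)^2 by repeated weight updates."""
    for _ in range(steps):
        gradient = 2.0 * (w - target)
        # The size of each update is proportional to the learning rate.
        w -= learning_rate * gradient
    return w

for lr in (0.001, 0.1, 1.1):
    print(lr, gradient_descent(lr, steps=50))
```

With the smallest rate the weight is still far from the target of 3 after 50 steps, with the middle rate it has practically reached the target, and with the largest rate each update overshoots and the weight moves further away. In this toy setting, neither a very low nor a very high learning rate brings the weight close to the expected value within the given number of iterations.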
The F-measure of the 12 researchers' features is greater than that of the 15 literature-study features (0.6660 > 0.6404). This shows that using more features does not necessarily improve the accuracy of malware detection with the Neural Network. The two feature sets do not intersect, yet their F-measure values differ only slightly. This suggests that other combinations of features may produce a better F-measure than either set. For this reason, the researchers merged the two combinations and conducted training and testing once again. The merged combination of 27 features achieved its highest F-measure with 12 hidden neurons, a learning rate of 0.1, and 300 epochs; the resulting F-measure is 0.6395 (see Table 4). This score is lower than that of the literature-study features alone, and therefore also lower than that of the researchers' features. These results once again show that more features do not necessarily improve detection accuracy. One reason is that using more features might introduce more internal errors into the learning process. Each feature carries internal errors, such as measurement errors or rounding errors [13]. Another factor is that each feature makes its own contribution to malware detection, and features that are combined may cancel each other out, so that the detection accuracy might decrease [15].

Conclusion
Detection of cyber malware attacks based on network traffic features using a Neural Network yields different F-measure values for different combinations of features. The combination based on literature studies (15 features) produces an F-measure of 0.6404, the combination from the researchers' analysis (12 features) produces an F-measure of 0.6660, and the merged combination (27 features) produces an F-measure of 0.6395. The conclusion is that a larger number of features does not by itself increase the accuracy of malware detection. Instead, an improper combination of features can reduce detection accuracy.
For the learning process, this research uses a dataset of 442,240 flows, which combines an existing dataset with the results of laboratory experiments. It is recommended that the existing Neural Network model be applied to detect malware in real time on IoT devices and smartphones. Further research is also needed on the combination of network traffic features to produce even better accuracy.