Convolutional Neural Network and Support Vector Machine in Classification of Flower Images

Flowers are among the raw materials used in many industries, including pharmaceuticals and cosmetics. Manual classification of flowers requires the expert judgment of a botanist and can be time-consuming and inconsistent. Classifying flowers automatically from images offers a practical way to address this problem. Two popular algorithms for image classification are the Convolutional Neural Network (CNN) and the Support Vector Machine (SVM). CNN is a deep neural network classification algorithm, while SVM is a classical machine learning algorithm. This research sought to determine which of the two methods performs better in flower image classification. Our observations suggest that CNN outperforms SVM in flower image classification. CNN gives an accuracy of 91.6%, precision of 91.6%, recall of 91.6% and F1 score of 91.6%.


Introduction
Flowers are God's creation and have long been admired as symbols of beauty and romance because of their diverse shapes and colors. Flowers are also used in esotericism, witchcraft and medicine, and as a source of food. Many industries use flowers as raw materials. The numerous uses of flowers and their huge diversity create a need for flower classification. Classifying flowers is a strenuous and painstaking task that requires the expertise of a botanist. The ability to use computer software and technology for flower image classification will greatly help the pharmaceutical, cosmetic and other industries.
Flower image classification has attracted many researchers seeking new methods that give accurate results. Among these methods are deep learning and neural networks, which are progressing rapidly within artificial intelligence. These methods have taken a huge leap in recent years and are capable of surpassing humans in several tasks related to detecting and labeling objects.
Convolutional Neural Network (CNN) is one of the most widely used deep learning methods among image processing methods [1]. CNN has solved a number of categorization problems involving images of real-life objects. The method is relatively simple and flexible, which makes it easy to integrate into various platforms. However, these advantages come at a cost: obtaining the desired model requires high-capability hardware [2]. CNN processes images in a way analogous to the human brain, namely by teaching and learning [3]. Image classification starts from labeled images, from which the computer learns, through a set of equations, to classify new images.
Besides CNN, another method that can be used in image classification is the Support Vector Machine (SVM). SVM is regarded as a strong and uncomplicated baseline classification method [4]. SVM works by defining the boundary between two classes so that its distance from the closest data points is as large as possible. This maximum distance is obtained by finding the best hyperplane (separating boundary) in the input space, that is, the hyperplane with the largest margin.
Even in a high-dimensional input space, SVM generalizes well without additional knowledge. This is the advantage of SVM. Its weakness is that, in its basic form, it is limited to linearly separable data. For non-linear data, a suitable kernel function is required.
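The margin and kernel ideas described above can be illustrated with a minimal sketch in plain Python. The hyperplane (w, b) below is a hypothetical one chosen by hand, not one learned from the flower data, and the RBF kernel is shown only as one common example of a non-linear kernel; which kernel was used in this study is not stated.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Margin width 2 / ||w||, valid when the closest points satisfy |w.x + b| = 1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def linear_kernel(x, z):
    """Plain dot product, which corresponds to a linear decision boundary."""
    return sum(xi * zi for xi, zi in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative value, not from the study."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)

# Hypothetical hyperplane in a two-dimensional feature space.
w, b = [3.0, 4.0], -5.0
print(margin_width(w))                    # 0.4
print(decision_value(w, b, [1.0, 1.0]))   # 2.0
```

Replacing the dot product with a kernel such as `rbf_kernel` is what allows the same maximum-margin formulation to separate data that are not linearly separable in the original input space.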
Color is one of the most dominant characteristics of flowers. Research related to color in flower images was carried out by [5], who classified flowers using the SVM method and obtained an accuracy of 85.93%. Using Artificial Neural Networks (ANN), [6] applied pre-processing to improve image quality, followed by segmentation and feature extraction, and obtained a classification accuracy of 81.19%. Flower classification using CNN was carried out by [7], whose model produced an overall accuracy of 90% and reached 98% when making predictions in real time. Meanwhile, [8] classified flower images with CNN, validated the model with k-fold cross-validation, and obtained a highest average accuracy of 76.49%.
To the best of our knowledge, the results of flower image classification using deep neural network and classical machine learning methods have not yet been compared to find out which method performs best. Previous studies have focused on the classification process of a single method.

Method
This research was carried out in several stages: pre-processing the initial image data, splitting the data into training and testing sets, building classification models with the CNN and SVM methods, evaluating the classification models, comparing the performance of the models, and drawing conclusions. The stages of the research method are shown in Figure 1.

a. Dataset
The dataset used in this study consists of images of roses, tulips and asters. It contains 1200 images in total, collected randomly using Google Images: 400 roses, 400 tulips and 400 asters. The data in each category are divided into 80% training data and 20% testing data [9]. Table 1 presents the resulting division of the dataset.
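The 80/20 division per category can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the file names, the fixed random seed and the helper `split_per_class` are assumptions introduced here.

```python
import random

def split_per_class(items_by_class, train_fraction=0.8, seed=0):
    """Shuffle each class separately and split it into training and testing parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = int(round(len(shuffled) * train_fraction))
        train += [(path, label) for path in shuffled[:cut]]
        test += [(path, label) for path in shuffled[cut:]]
    return train, test

# Hypothetical file names standing in for the 400 images per category.
dataset = {
    label: [f"{label}_{i:03d}.jpg" for i in range(400)]
    for label in ("rose", "tulip", "aster")
}

train, test = split_per_class(dataset)
print(len(train), len(test))  # 960 240
```

Splitting each category separately keeps the classes balanced, giving 320 training and 80 testing images per category.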

b. Preprocessing Data
Image quality can be improved by pre-processing. Noise removal and selection of the part of the image to be used are among the methods applied in pre-processing. Unwanted outer areas can be removed to make the images better focused and more uniform; this technique is known as cropping [10].
Good image quality is essential for optimal classification results, so noise cleaning is a mandatory step. Noise is a disturbance in the image that does not reflect the true intensity of the actual scene [11].
All images must have the same size, so each image is resized before modeling. Simple resizing can be done by stretching each image, changing its aspect ratio, and forcing it to the new size. The input size should be chosen to be as effective as possible for classification: a larger input image results in a longer classification process, while a smaller input image makes classification faster [12]. This study uses images with a size of 200x200 pixels. An example of an image after preprocessing is shown in Figure 2.
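The stretching-style resize described above can be sketched in plain Python. The interpolation method used in the study is not stated; nearest-neighbour sampling is assumed here for simplicity, and the 300x450 source image is a hypothetical example.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2-D list of pixels by nearest-neighbour sampling.

    The aspect ratio is not preserved: the image is stretched or squeezed
    independently along each axis to reach the requested size.
    """
    old_height, old_width = len(image), len(image[0])
    return [
        [
            image[r * old_height // new_height][c * old_width // new_width]
            for c in range(new_width)
        ]
        for r in range(new_height)
    ]

# Hypothetical 300x450 image whose pixel value records its own (row, column).
source = [[(r, c) for c in range(450)] for r in range(300)]

resized = resize_nearest(source, 200, 200)
print(len(resized), len(resized[0]))  # 200 200
```

In practice an image library would normally handle this step; the sketch only shows what forcing every image to 200x200 pixels involves.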

c. Classification Modeling
Classification modeling is carried out before data processing to determine which parameters are used in the algorithm. The goal is to assess the effect of the parameters on the resulting performance and to find the best parameters.
In this research, CNN uses several parameters, namely the number of epochs, the layer depth, and the activation function. The activation function is ReLU (Rectified Linear Unit). The number of epochs is 20. The network has 4 convolutional layers, each with a 3 x 3 kernel. The number of convolution filters is 16 in the first layer, 32 in the second layer, 64 in the third layer and 128 in the fourth layer.
CNN includes a pooling step, a type of non-linear down-sampling that reduces the dimensions of the convolved feature maps. The max pooling kernel size used in this research is 2 x 2. To avoid overfitting, L2 regularization is combined with dropout, with a dropout rate of 0.5 and an L2 value of 0.0001. The classifier network ends with 3 dense layers containing 256, 256 and 128 neurons.
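To see how these settings shrink a 200x200 input, the feature-map sizes can be traced layer by layer. The sketch below rests on several assumptions that are not stated in the text: filter counts of 16, 32, 64 and 128, a convolution stride of 1, one 2 x 2 max pooling step (stride 2) after every convolutional layer, and either "valid" (no padding) or "same" padding.

```python
def conv_out(size, kernel=3, padding="valid"):
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (odd remainders are dropped)."""
    return size // pool

def trace_shapes(input_size=200, filters=(16, 32, 64, 128), padding="valid"):
    """Return (height, width, channels) after each convolution + pooling block."""
    size, shapes = input_size, []
    for n_filters in filters:
        size = pool_out(conv_out(size, 3, padding), 2)
        shapes.append((size, size, n_filters))
    return shapes

shapes = trace_shapes()
height, width, channels = shapes[-1]
print(shapes)                     # [(99, 99, 16), (48, 48, 32), (23, 23, 64), (10, 10, 128)]
print(height * width * channels)  # 12800 values passed to the dense layers
```

Under the "same" padding assumption the last block would instead be 12 x 12 x 128, so the exact size of the flattened vector depends on a choice the text does not specify.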
SVM uses several parameters, including tol, the tolerance value that determines when training stops. The tol value used in this study is 0.001.
Feature extraction and preprocessing are performed to transform the data from an image matrix into a vector. The resulting vector, which consists of 16 texture features, is used as input for the SVM classification.

d. Convolutional Neural Network
Image classification using the CNN method is carried out in two stages: the training process and the testing process. Feature extraction in the CNN method is carried out by applying convolution, ReLU, max pooling and flattening to the input image. The features generated by this process differ from those generated by GLCM (Gray Level Co-occurrence Matrix) feature extraction. The CNN method used is described by the flowchart in Figure 3.
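The feature extraction chain of convolution, ReLU, max pooling and flattening can be illustrated on a tiny example in plain Python. The 6x6 image and the hand-written vertical-edge kernel are hypothetical; in a trained CNN the kernel values are learned, not chosen by hand.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation,
    which is what deep learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]

def flatten(feature_map):
    """Turn the 2-D feature map into a 1-D feature vector."""
    return [v for row in feature_map for v in row]

# Hypothetical image: dark left half, bright right half (a vertical edge).
image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = flatten(max_pool_2x2(relu(conv2d_valid(image, vertical_edge))))
print(features)  # [27, 27, 27, 27]
```

The 6x6 input becomes a 4x4 feature map after the 3x3 convolution, a 2x2 map after pooling, and a vector of four values after flattening.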

e. Support Vector Machine
SVM classification is divided into two stages, namely the training process and the testing process. The training process is carried out after the background and foreground of the image have been separated. The purpose of this step is for the foreground to retain its RGB colors while the background becomes white [13]. The image is then segmented, that is, divided into several areas or objects [14]. In this research, segmentation is done using edge detection. The segmented flower image is used as input to simplify texture feature extraction, which is performed with GLCM. The stages of the SVM method are modeled by the flowchart in Figure 4.
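GLCM texture extraction can be sketched in plain Python as follows. The sketch assumes a single horizontal offset, a symmetric normalised matrix, and three commonly used properties (contrast, angular second moment and homogeneity). The 16 texture features used in this study are not itemised in the text, so this example does not reproduce them, and the 4x4 four-level image is hypothetical.

```python
def glcm(image, levels, d_row=0, d_col=1):
    """Normalised, symmetric gray-level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Contrast, angular second moment (ASM) and homogeneity of a GLCM."""
    idx = range(len(p))
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i in idx for j in idx),
        "asm": sum(p[i][j] ** 2 for i in idx for j in idx),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i in idx for j in idx),
    }

# Hypothetical 4x4 image with gray levels 0..3.
gray = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

features = texture_features(glcm(gray, levels=4))
print(round(features["contrast"], 4))  # 0.5833
```

Repeating the computation for several offsets or directions and concatenating the results is one common way to build a longer texture feature vector for the SVM.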

f. Classification Model Evaluation
Measuring the performance of the created model is an important step in machine learning. The results of a classification model are usually summarized in a confusion matrix. According to [15], test data that are not used in training can be used for evaluation. The classification models are evaluated with four metrics: accuracy, precision, recall and F1 score.
Accuracy is the percentage of data items that are classified correctly by the system [16]. Equation 1 calculates accuracy.
$$\text{Accuracy} = \frac{\text{number of correctly classified items}}{\text{total number of items}} \tag{1}$$
Precision is the level of agreement between the information provided by the system and the information required by the user, while recall is the success rate of the system in finding that information [17]. Equations 2 and 3 calculate precision and recall for a class, where TP, FP and FN denote the numbers of true positives, false positives and false negatives for that class.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
The F1 score is the harmonic mean of precision and recall [18]. Equation 4 calculates the F1 score.
$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
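The four metrics can be computed directly from a confusion matrix, as in the minimal sketch below. The 3x3 matrix is a hypothetical example, not the matrix obtained in this study, and macro averaging (the unweighted mean over classes) is an assumption about how per-class values are combined.

```python
def per_class_metrics(cm):
    """cm[i][j] is the number of items of true class i predicted as class j."""
    n = len(cm)
    total = sum(map(sum, cm))
    accuracy = sum(cm[k][k] for k in range(n)) / total
    rows = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(cm[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows.append((precision, recall, f1))
    return accuracy, rows

def macro_average(rows):
    """Unweighted mean of precision, recall and F1 over all classes."""
    return tuple(sum(column) / len(rows) for column in zip(*rows))

# Hypothetical confusion matrix for three classes with 10 test items each.
cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]

accuracy, rows = per_class_metrics(cm)
print(round(accuracy, 4))  # 0.8
print(tuple(round(v, 4) for v in macro_average(rows)))
```

When every class has the same number of test items, the macro-averaged recall equals the accuracy, because both reduce to the number of correct predictions divided by the number of test items.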

Result

a. CNN
Modeling with CNN was done over 20 epochs. The model produced at the 20th epoch gives 91.67% accuracy, 2.59% loss, 91.67% precision, and 92% recall, as shown in Figure 5. Figure 6 shows how the accuracy, loss, precision and recall values change during the modeling and classification process with CNN. The model shows overfitting at the 3rd epoch, as indicated by a loss value of 26.13% against a validation loss value of 74.70%. The loss value quantifies the error of the model's predictions. As the number of epochs increases, the loss decreases, which means that the error in the model becomes smaller, so that the final loss is 1.54%.
The summary of the classification results shows the final value of precision, recall, and other metrics for each category. The classification summary for the CNN model is shown in Figure 7.
The confusion matrix for classification with CNN is shown in Figure 8. For each class, most images were predicted correctly: 73 out of 80 for the Aster class, 71 out of 80 for the Rose (Mawar) class, and 76 out of 80 for the Tulip class. Based on equation (1), the accuracy of classification with CNN is (73 + 71 + 76) / 240 = 220 / 240 = 0.9167, reported here as 91.6%.

b. SVM
Image classification using the SVM method produces the accuracy, F1 score, precision and recall values shown in Figure 9. The confusion matrix for this method is shown in Figure 10, with 63 correct predictions for the Aster class, 64 for the Rose (Mawar) class, and 61 for the Tulip class.
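The accuracy figures can be re-derived from the correct-prediction counts reported for the two confusion matrices. The sketch below assumes that both models were evaluated on the same 80 test images per class; the Rose class appears as Mawar in the figures. Only the diagonal counts are given in the text, so precision cannot be recomputed this way.

```python
TEST_IMAGES_PER_CLASS = 80  # 20% of the 400 images in each category

# Correct predictions per class, as reported for each confusion matrix.
correct = {
    "CNN": {"Aster": 73, "Rose": 71, "Tulip": 76},
    "SVM": {"Aster": 63, "Rose": 64, "Tulip": 61},
}

def summarize(counts, per_class=TEST_IMAGES_PER_CLASS):
    """Return overall accuracy and per-class recall from correct counts."""
    total = per_class * len(counts)
    accuracy = sum(counts.values()) / total
    recall = {label: n / per_class for label, n in counts.items()}
    return accuracy, recall

for model, counts in correct.items():
    accuracy, recall = summarize(counts)
    print(model, round(accuracy, 4))
# CNN 0.9167
# SVM 0.7833
```

The resulting values, 220 / 240 for CNN and 188 / 240 for SVM, agree with the reported accuracies of 91.6% and 78.3%.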

c. Comparison
Classification with CNN and with SVM results in different performance. Table 2 presents the performance of both methods on the four metrics measured. The table shows that the CNN model scores about 91.6% for accuracy, precision and recall. The SVM method gives a lower accuracy of 78.3%, and lower precision and recall values of 78.6% and 78.3%, respectively. Other studies have also reported higher accuracy with CNN models. In [7], CNN classification gave an accuracy of 90%, while classification using the SVM method in [5] gave an accuracy of 85.93%, which is lower than the results of classification using the CNN model. Some researchers have combined a CNN model for feature extraction with a machine learning method for the classification step. The hybrid methods in [19] and [20] showed high classification performance, with accuracy above 98%.

Conclusion
Our investigation shows that the CNN model for flower classification gives an accuracy of 91.6%, precision of 91.6%, recall of 91.6% and F1 score of 91.6%. Classification with SVM produces lower values: accuracy of 78.3%, precision of 78.6%, recall of 78.3% and F1 score of 78.6%. The results suggest that the CNN model outperforms the SVM method in flower classification tasks.