Batik Pattern Classification using Naïve Bayes Method Based on Texture Feature Extraction

Batik cloth is one of the arts of Surakarta culture. Batik is a heritage handed down from the nation's ancestors, and its manufacture requires specific tools and materials. Surakarta batik has many patterns and motifs, such as Sawat, Satriomanah, and Semenrante. The pattern is the pictorial framework that determines the type of batik. Because one type of batik may closely resemble another, a classification technique is needed to determine the type. This study aims to develop a classification method for batik cloth using the Naïve Bayes classification technique. Texture values are extracted from each image with the Gray Level Co-Occurrence Matrix (GLCM). The stages of this research are pre-processing, feature extraction, classification, and testing. The dataset contains 200 images for each of the Sawat, Satriomanah, and Semenrante classes, 600 images in total, obtained through data augmentation by flipping, zooming, cropping, shifting, and changing the brightness of the images. The data were divided into training and testing sets in three ways to compare accuracy: 60% training and 40% testing, 70% training and 30% testing, and 80% training and 20% testing. Using WEKA 3.8.6, the Naïve Bayes method obtained its best accuracy of 97.22% with the 70% percentage split, compared with 96.66% for both the 80% and 60% percentage splits; the difference is attributed to the different training and test data. The results indicate that the Naïve Bayes method can be used to classify batik cloth patterns based on texture feature extraction.


Introduction
Indonesia has many kinds of cultural wealth, one of which, batik cloth, has been recognized by UNESCO as cultural heritage [1]. Batik comes in many styles and patterns. This diversity is currently in great demand in various circles, and batik is worn at both formal and informal events. Batik patterns have also adapted over time to suit changing tastes in appearance. The word batik comes from the Javanese "amba" and "nitik", which means to paint a wide dot on a writing medium [2]. Batik is also an ancient art and cultural tradition that earlier generations used as a promotional medium to other countries [3]. Batik cloth is still produced traditionally, that is, by hand and with wax as the main ingredient [4], [5]. Two types of batik cloth can be identified directly, namely geometric and non-geometric batik cloth, as shown in Figure 1. Geometric patterns usually take the form of symmetrical shapes such as squares, circles, and lines, for example, Parang batik. Non-geometric patterns depict an object such as a plant, temple, or animal, for example, Semenrante batik [6]. Each region has a distinctive pattern with various styles [7]. Interest in batik has also spread to other countries [8]. Many foreign tourists who come to Indonesia regard batik cloth as an item that must be purchased and brought back to their country of origin. The diversity, attractiveness, and uniqueness of batik patterns make it difficult for people to tell the patterns apart and to know the type of a given batik [9]. Classifying batik images requires a suitable method because of the level of symmetry and the repetitive patterns [10]. Batik cloth can therefore be recognized based on its patterns and motifs, and a pattern recognition process can help people identify the types of batik that exist.
Technology is developing very rapidly [11]-[13]. Image processing is one of the techniques used in developing machine vision to recognize and analyze the data contained in moving or still images [14]. Image processing is one example of a technological development that is used in various fields [15], and it can make tasks easier to complete than traditional techniques [16]. Methods such as Linear Regression, Back-propagation, Support Vector Machines, Logistic Regression, Naive Bayesian, the Rocchio Method, Random Forest, Decision Tree, k-Nearest Neighbor, and Neural Network can be used to classify the results of image processing [17]. The purpose of batik image recognition is to transform image data into information through analysis. Recognizing an image involves data acquisition, image recovery, segmentation, and recognition [18]. Several previous researchers have used image processing techniques and classification methods to detect batik cloth patterns. Masa and Hamdani classified the types of batik cloth patterns; the Convolutional Neural Network (CNN) and K-Means Clustering (K-MC) techniques they used produced an accuracy value of 80% in sharpening [19].
Another method used to detect batik cloth patterns is Backpropagation. Surya et al. have identified batik patterns using GLCM feature extraction and continued with the backpropagation classification technique resulting in the highest accuracy of 91.2% at epoch 100 [20]. According to research by Septiarini et al., the Naïve Bayes (NB) classification technique can be applied in classifying Samarinda sheath. The results obtained show that the application of combining color and texture features can reduce classification errors [21].
This study uses the GLCM method because previous studies that used it obtained good accuracy [22], [23], [24]. GLCM feature extraction is computed at angles of 0°, 45°, 90°, and 135° using six features, namely dissimilarity, correlation, homogeneity, contrast, ASM, and energy. NB is used as the classification technique because it is widely implemented in similar studies and achieves good results [25], [26]. The method is used to distinguish the Sawat, Semenrante, and Satriomanah batik patterns, which resemble one another, based on their texture.
This research uses the Python programming language and the WEKA tools. Python was chosen because it is easy to use and widely adopted [27]. WEKA was chosen because it provides many algorithms for data analysis [28], [29]. Testing was carried out on an Acer Aspire E14 laptop with an Intel(R) Core(TM) i7-5500U CPU @ 2.40 GHz and 4 GB of RAM. The remainder of this paper presents the research methods, the results and discussion, and the conclusions.

a. Data Collection
The dataset was collected from dataset provider services on the internet, with images taken at a distance of 25-30 cm between the object and the camera. This study used three types of batik cloth patterns, namely Sawat, Semenrante, and Satriomanah, as shown in Figure 2. These are typical Surakarta batik types whose styles are closely similar. Each type of batik cloth has 200 images, for a total of 600 images. The image data are divided into training data and test data. The data are split in three ways to compare the accuracy: 60% training and 40% testing, 70% training and 30% testing, and 80% training and 20% testing.

b. Method Stages
In this study, NB is used as the classification method and GLCM as the feature extraction method. The research flow is shown in Figure 3. The goal of pre-processing is to improve image quality so that the subsequent steps become easier. Pre-processing is carried out in two stages, namely cropping and resizing all cropped images to the same pixel dimensions. Cropping removes parts of the image that are not needed for further processing [30]. Equalizing the pixel dimensions of the images reduces processing time during computation. The cropped and resized images are stored in the dataset folder [10]. The next process converts the RGB image to a grayscale image, which simplifies the subsequent calculations. Grayscale conversion is carried out using Equation (1) [31], [32].
where gray is the resulting gray-level value of a pixel, and R (Red), G (Green), and B (Blue) are the values of that pixel's red, green, and blue channels.
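As an illustration of a weighted RGB-to-grayscale conversion, the following minimal sketch assumes the widely used ITU-R BT.601 luma weights (0.299, 0.587, 0.114). These coefficients, the function name `rgb_to_gray`, and the sample pixels are assumptions made for illustration and may differ from what the study used. Plain Python lists are used to keep the sketch dependency-free.

```python
# Assumed ITU-R BT.601 luma weights; the study's own coefficients may differ.
W_R, W_G, W_B = 0.299, 0.587, 0.114


def rgb_to_gray(image):
    """Map rows of (R, G, B) tuples in 0-255 to rows of gray levels in 0-255."""
    return [
        [min(255, max(0, round(W_R * r + W_G * g + W_B * b))) for (r, g, b) in row]
        for row in image
    ]


# Illustrative 2 x 2 image: black, white, pure red, pure blue.
image = [
    [(0, 0, 0), (255, 255, 255)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = rgb_to_gray(image)
```

Under these weights, black maps to 0 and white to 255, while saturated colors land in between according to their perceived brightness.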
This study also used data augmentation. Data augmentation is a technique for increasing the variety of data by adjusting an image's rotation, brightness, cropping, and flipping.
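The augmentation operations can be sketched on a small grayscale image as follows. This is only an illustration under simple assumptions: the function names, the zero padding used when shifting, and the sample values are hypothetical and are not taken from the study's code.

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def crop(img, top, left, height, width):
    """Keep a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def shift_right(img, dx, fill=0):
    """Shift pixels dx columns to the right (0 <= dx <= width), padding with fill."""
    return [[fill] * dx + row[:len(row) - dx] for row in img]


def adjust_brightness(img, delta):
    """Add delta to every gray level and clip the result to 0-255."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


# Illustrative 3 x 3 grayscale image.
sample = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 250],
]
flipped = flip_horizontal(sample)
cropped = crop(sample, 0, 1, 2, 2)
shifted = shift_right(sample, 1)
brighter = adjust_brightness(sample, 20)
```

Each function returns a new image, so one source image can yield several augmented variants.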

d. Feature Extraction
Feature extraction aims to obtain different patterns in an image so that the process of separating class categories in classification becomes simple. This study uses GLCM as texture feature extraction.
GLCM is a technique for obtaining second-order statistical values by calculating the probability that two pixels separated by a certain distance (d) at a certain angle (θ) take a given pair of gray levels [33]. For each batik style, the GLCM is calculated in four angle directions, namely 0°, 45°, 90°, and 135° [34], as illustrated in Figure 4. Six GLCM features are used in this study, namely dissimilarity, correlation, homogeneity, contrast, ASM, and energy.
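The construction of a co-occurrence matrix can be sketched as follows. This minimal example assumes a distance of d = 1, a symmetric matrix (each pair is counted in both orders), and normalization to probabilities; the study does not state these settings, so they are assumptions, as are the names `OFFSETS` and `glcm` and the small 4-level test image.

```python
# (row step, column step) for each angle at distance d = 1 (assumed).
# Row indices grow downward, so 45 degrees ("up and right") is (-1, +1).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, levels, angle, symmetric=True):
    """Return the normalized co-occurrence matrix P for one angle."""
    dr, dc = OFFSETS[angle]
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # count the same pair in reverse order
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


# Small image with gray levels 0-3, chosen for illustration.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P0 = glcm(image, levels=4, angle=0)
```

For this image, the horizontal neighbor pairs give 24 symmetric counts, of which 6 are the pair (2, 2), so `P0[2][2]` equals 0.25.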
Dissimilarity measures how much the textures in an image differ; the value is large if the pattern is random and small if the pattern is uniform. The dissimilarity equation is shown in Equation (2).
Correlation measures the linear dependence between the gray levels of pixel pairs. The correlation equation is shown in Equation (3).
Homogeneity measures the similarity of gray levels in an image; the value is high if all pixels have the same value. The homogeneity equation is shown in Equation (4).
Contrast measures the spatial frequency of the image. The contrast equation is shown in Equation (5).
Angular Second Moment (ASM) measures the uniformity of pixels in an image. The ASM equation is shown in Equation (6).
Energy measures the uniformity of the gray-level distribution in an image. The energy equation is shown in Equation (7).
Where i is the matrix row, j is the matrix column, and P(i,j) is the general matrix element of row (i) and column (j). μi, μj is the average of the matrix's row and column elements. σi, σj are the standard deviations in the rows and columns of the matrix.
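For reference, the conventional definitions of these six features can be written in the notation above as follows. This assumes the study follows the common formulation (the one implemented, for instance, in scikit-image's `graycoprops`), in which energy is the square root of ASM; the numbering matches Equations (2) to (7).

```latex
\begin{align}
\text{Dissimilarity} &= \sum_{i,j} P(i,j)\,\lvert i-j \rvert \tag{2}\\
\text{Correlation}   &= \sum_{i,j} \frac{(i-\mu_i)(j-\mu_j)\,P(i,j)}{\sigma_i\,\sigma_j} \tag{3}\\
\text{Homogeneity}   &= \sum_{i,j} \frac{P(i,j)}{1+(i-j)^2} \tag{4}\\
\text{Contrast}      &= \sum_{i,j} P(i,j)\,(i-j)^2 \tag{5}\\
\text{ASM}           &= \sum_{i,j} P(i,j)^2 \tag{6}\\
\text{Energy}        &= \sqrt{\text{ASM}} \tag{7}
\end{align}
```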

e. Testing and Training
This study uses the NB method for training in the classification process. NB is a machine learning classification technique used to predict a phenomenon [35] and to classify data that does not yet have a class [36]. The method has been reported to deliver high accuracy and speed when applied to large datasets [37]. The Naïve Bayes formulation is shown in Equation (8) [38].
Where Ci is the value of a class, C is the choice of class, t is a feature (one feature), and F is the number of features. The likelihood is a new belief in data.
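The decision rule can be sketched as choosing the class whose prior multiplied by the per-feature likelihoods is largest. The minimal example below assumes Gaussian likelihoods for numeric features, which is a common choice; it is not WEKA's implementation, and the names `fit_gaussian_nb`, `predict`, and `var_floor`, as well as the toy data, are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the prior and per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor (assumed) avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, features):
    """Return the class with the largest log posterior (up to a constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for t, m, v in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (t - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)


# Toy two-feature data (illustrative values, not taken from the study).
X = [[0.10, 0.90], [0.15, 0.85], [0.12, 0.88],
     [0.80, 0.20], [0.85, 0.25], [0.82, 0.22]]
y = ["sawat", "sawat", "sawat", "satriomanah", "satriomanah", "satriomanah"]
model = fit_gaussian_nb(X, y)
label = predict(model, [0.11, 0.90])
```

Working with log probabilities turns the product over the F features into a sum, which avoids numerical underflow when many features are used.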

Result and Discussion
This section shows the results of the proposed process. The processes include pre-processing, feature extraction, classification, and evaluation. Pre-processing and feature extraction are carried out using the Python 3.6.13 programming language, while the classification technique uses WEKA Tools 3.8.6 with feature values stored in the form of comma-separated values (CSV).

a. Pre-processing
Pre-processing applies data augmentation, such as rotating, enlarging or reducing, brightening, and shifting objects, to increase the variation of the data samples. All images are resized to 50 × 50 pixels to shorten the computation time. The three batik types are used because their patterns have a fairly high level of similarity. Next, the resized images are converted to grayscale. Table 1 shows the results of converting the RGB images to grayscale images for each data sample. The conversion computes each grayscale value as a weighted sum of the red, green, and blue components; the value 0 represents black, and the value 255 represents white. The purpose of the conversion is to give every image the same single-channel representation, which simplifies the following process, namely texture feature extraction.

b. Gray-Level Co-Occurrence Matrix (GLCM)
GLCM texture feature extraction produces 24 feature values for each batik image: dissimilarity, correlation, homogeneity, contrast, ASM, and energy at each of the angles 0°, 45°, 90°, and 135°. The results of extracting texture features from the batik patterns are shown in Tables 2 to 5.
Table 2 presents the GLCM values at 0°. Each column gives the numerical result for one feature: dissimilarity, correlation, homogeneity, contrast, ASM, and energy. The table shows a sample of the calculations, which were carried out on 200 images for each of the Satriomanah, Sawat, and Semenrante classes.
Tables 3, 4, and 5 present the corresponding values at 45°, 90°, and 135°, respectively, with the same columns and computed from the same 200 images per class. Together, the tables show that the GLCM method can be used to extract the texture of batik cloth for each of the Satriomanah, Sawat, and Semenrante classes.
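The way six features at four angles yield a 24-value vector can be sketched as follows. The sketch assumes a symmetric, normalized GLCM at distance 1 and the conventional feature definitions (energy as the square root of ASM, correlation set to 1 when a standard deviation is zero); the function names, the feature naming scheme such as `contrast_0`, and the test image are hypothetical.

```python
import math

OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
FEATURES = ("dissimilarity", "correlation", "homogeneity", "contrast", "ASM", "energy")


def glcm(image, levels, angle):
    """Symmetric, normalized co-occurrence matrix at distance 1 (assumed)."""
    dr, dc = OFFSETS[angle]
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                counts[image[r2][c2]][image[r][c]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def texture_features(P):
    """Compute the six texture features from a normalized matrix P."""
    n = len(P)
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * p for i, _, p in cells)
    mu_j = sum(j * p for _, j, p in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p for i, _, p in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p for _, j, p in cells))
    cov = sum((i - mu_i) * (j - mu_j) * p for i, j, p in cells)
    asm = sum(p * p for _, _, p in cells)
    return {
        "dissimilarity": sum(abs(i - j) * p for i, j, p in cells),
        # Convention (assumed): a constant image has correlation 1.
        "correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 1.0,
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        "ASM": asm,
        "energy": math.sqrt(asm),
    }


def feature_vector(image, levels):
    """Return the 6 features x 4 angles as one flat dictionary of 24 values."""
    vector = {}
    for angle in OFFSETS:
        values = texture_features(glcm(image, levels, angle))
        for name in FEATURES:
            vector[f"{name}_{angle}"] = values[name]
    return vector


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
vector = feature_vector(image, levels=4)
```

Each image then becomes one row of 24 numbers plus a class label, which is the shape of data that can be written to a CSV file for a classifier.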

c. Classification and Evaluation Methods
The method is tested on a data sample of 600 images: 200 images each of Sawat, Semenrante, and Satriomanah batik. The dataset is divided using a percentage split, a feature that divides the data into training data and test data according to an adjustable percentage. Three splits were used in this study: A (60% training and 40% testing), B (70% training and 30% testing), and C (80% training and 20% testing). The process uses WEKA tools version 3.8.6 with the NB classification technique, and the results are shown in Figure 5.
Based on Figure 5, the highest accuracy, 97.22%, is obtained with the 70% percentage split, which has 420 training samples and 180 test samples. The 60% percentage split, with 360 training samples and 240 test samples, obtains an accuracy of 96.66%. The 80% percentage split, with 480 training samples and 120 test samples, also obtains an accuracy of 96.66%.
The confusion matrices behind the results in Figure 5 are given in Tables 6 to 8. For example, in Table 7 (420 training samples), the class "satriomanah" has 58 actual instances, of which 53 are predicted correctly (true positives, TP) and 5 are predicted incorrectly (false negatives, FN). Recall, precision, and accuracy are evaluated using Equations (11) to (13).
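The split sizes and evaluation measures can be checked with a short sketch. It assumes the standard definitions recall = TP / (TP + FN), precision = TP / (TP + FP), and accuracy = correct / total; the function names are hypothetical, and only the counts quoted in the text (600 images, 53 TP and 5 FN for the "satriomanah" class) are taken from the study.

```python
def split_sizes(total, train_fraction):
    """Return (training, testing) counts for a percentage split."""
    train = round(total * train_fraction)
    return train, total - train


def recall(tp, fn):
    """Share of actual class members that were predicted correctly."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of predictions for a class that were correct."""
    return tp / (tp + fp)


def accuracy(correct, total):
    """Share of all test samples that were predicted correctly."""
    return correct / total


splits = {name: split_sizes(600, fraction)
          for name, fraction in (("A", 0.6), ("B", 0.7), ("C", 0.8))}

# "satriomanah" in the 70% split: 53 true positives, 5 false negatives.
satriomanah_recall = recall(tp=53, fn=5)
```

With these counts, the recall for "satriomanah" is 53 / 58, or about 91.38%. The reported 97.22% accuracy on 180 test samples is consistent with 175 correct predictions, since 175 / 180 is about 0.9722.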

Conclusion
This study develops a classification of batik cloth patterns using the NB method based on texture feature extraction. The highest accuracy, 97.22%, was obtained with the 70% percentage split, which has 420 training samples and 180 test samples. The 60% percentage split (360 training samples and 240 test samples) and the 80% percentage split (480 training samples and 120 test samples) each obtained an accuracy of 96.66%. The results were evaluated with a confusion matrix over the three batik classes. It can be concluded that this method successfully classified the batik cloth patterns in this study. Future research can add variety to the dataset through other data collection methods and explore other feature extraction or classification techniques.