Corn Seeds Identification Based on Shape and Colour Features

Corn is an agricultural product that is essential as a daily food source and as an energy source. Corn selection, or sorting, is important for producing high-quality seeds before they are distributed to areas with varying conditions and agricultural characteristics. Hence, an automatic corn seed identification system is needed. In this paper, we propose a corn seed identification technique that combines shape and colour features. The identification process consists of three main stages: ROI selection, feature extraction, and classification using an Artificial Neural Network (ANN). The shape feature is the eccentricity, that is, the ratio of the distance between the foci of the ellipse fitted to an object to the length of its major axis. The colour features are extracted from the HSV (Hue-Saturation-Value) channels. The experimental results show that the proposed system performs well in identifying poor and good corn quality for the BIMA-20 and NASA-29 varieties. The classification of BIMA-20 Good vs. BIMA-20 Bad gives an accuracy of 89%, while the classification accuracy of BIMA-20 Good vs. NASA-29 Good is 97%.


Introduction
In the field of food and agriculture technology, meeting production requirements is very difficult. The difficulty is related to the level of product diversity and the non-homogeneous nature of the products. To assess the quality of products or goods, including corn, it is necessary to consider their feature specifications [1].
Corn is an agricultural product that can be used either as an essential daily food source or as an energy source. The selection or sorting process must be carried out to produce quality seeds, which are then distributed to areas with varying conditions and agricultural characteristics. Prakasa et al. developed a system that automatically detects a region of interest (ROI) using the K-means algorithm. The ROI detection produces images containing only one corn seed each by determining the location and bounding box of each corn seed in the input image. In their tests, the model detected ROIs with an accuracy exceeding 90% [2].
It is very important to check the quality of corn seeds so that they are safe and of a high standard. The traditional way of inspecting corn seeds is very time-consuming and depends on the skills of the human inspectors. An automated approach is attractive because it can reduce human error and deliver the same quality regardless of the varying skill levels of the staff [3].
Feature extraction has a fundamental role in classification techniques [4]. The colour value is one of the object features that are widely used in classification algorithms. The colour feature is significant in determining the quality of any fruit [5]. Colour characteristics can be expressed in a variety of colour representations, commonly referred to as colour spaces. One of these colour spaces is HSV (hue, saturation, value), which is obtained by a geometrical transformation of the RGB (red, green, blue) colour space [6]. Another feature that can be used to characterize physical properties is the object shape. Shape features can be expressed as roundness, convexity, compactness, elongation, eccentricity, and sphericity [7]. Shape features have been used, for example, to classify pollen grains [8].
Moallem et al. [4] identified healthy and defective apples by applying several segmentation steps: background removal, stem-end detection, calyx detection, primary defect segmentation, and refinement of the defective areas. By combining statistical, texture, and geometric features, they achieved classification accuracies of 91.75% for healthy apples and 91.5% on average for defective apples.
Xiao Ling et al. classified waxy corn kernels based on a combination of spectral, morphological, and texture features extracted from visible and near-infrared hyperspectral images. Five morphological features were used: area, circularity, aspect ratio, roundness, and solidity. Eight texture features were used, including energy, contrast, correlation, entropy, and standard deviation. The morphological and texture features were extracted as the appearance features of each corn kernel. Classification models for corn seed varieties were then built on the various feature groups using a support vector machine (SVM) and partial least squares-discriminant analysis (PLS-DA). The SVM model achieved higher accuracy than the PLS-DA model [9].
According to the grain-grading standards, corn quality is classified based on several criteria, such as moisture, weight, colour, shape, odour, and damage. Among these criteria, the moisture, weight, and odour of corn can be evaluated using electronic analyzers or other special instruments. Damage, by contrast, is still assessed by manual visual inspection, so the assessment can introduce errors. A plausible alternative is to classify corn automatically using computer vision [10].
In this study, we propose the identification of corn seeds using a combination of shape and colour features. The data used in the study consist of three categories: BIMA-20 Good, BIMA-20 Bad, and NASA-29 Good. Figure 1 shows our proposed corn seed identification system. As shown in Figure 1, the corn seed image data are divided into training and testing data. Each corn image contains many corn seeds. The first step is to preprocess the corn image data and detect the regions of interest (ROI) to produce images containing only one corn seed each. The next step is feature extraction based on shape and colour. In the last stage, classification is carried out: the features extracted from the training data are used to train the ANN, which produces a classification model. The classification model is then evaluated on the test data using the extracted shape and colour features.

a. Capturing the corn image data
The corn image data were captured at the Laboratory of Computer Vision Research Group, Research Center for Informatics, Indonesian Institute of Sciences (LIPI), Bandung, with an apparatus modified from the previous research [2]. The corn seed samples were provided by the Assessment Institutes for Agricultural Technology of Gorontalo, Ministry of Agriculture, Republic of Indonesia. The images were captured using a Canon EOS 70D DSLR camera, with the lighting arranged so that the corn can be seen clearly. The apparatus arrangement is displayed in Figure 2.

b. Detection of the region of interest (ROI) and Preprocessing
The purpose of ROI detection is to obtain a single corn seed in each image. Prakasa et al. [2] developed an algorithm that detects the ROI automatically using the k-means algorithm. It produces images containing only one corn seed each by determining the location and bounding box of each corn seed in the input image. Figure 3 is an input image in RGB, from which the red channel is extracted, as shown in Figure 4. Figure 5 shows the result of median filtering, which is applied to eliminate noise in the image. Figure 6 shows the result of segmenting the corn seeds from the background by calculating the threshold value using the Otsu method. The result of the erosion process is depicted in Figure 7.
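The red-channel extraction and Otsu thresholding steps described above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names are hypothetical, the image is assumed to be a nested list of 8-bit (r, g, b) tuples, the seeds are assumed to be brighter than the background in the red channel, and the median filtering, erosion, and k-means bounding-box steps are left out.

```python
def red_channel(rgb_image):
    """Keep only the red component of each (r, g, b) pixel."""
    return [[pixel[0] for pixel in row] for row in rgb_image]


def otsu_threshold(gray_image):
    """Return the 8-bit level that maximises the between-class variance."""
    histogram = [0] * 256
    for row in gray_image:
        for value in row:
            histogram[value] += 1

    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += histogram[level]          # pixels with value <= level
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg          # pixels with value > level
        if weight_fg == 0:
            break
        sum_bg += level * histogram[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def segment(rgb_image):
    """Binary mask: 1 where the red value exceeds the Otsu threshold (assumed seed)."""
    red = red_channel(rgb_image)
    threshold = otsu_threshold(red)
    return [[1 if value > threshold else 0 for value in row] for row in red]


# Hypothetical 2x2 image: dark background and yellowish seed pixels.
image = [[(20, 20, 20), (220, 180, 40)],
         [(220, 180, 40), (20, 20, 20)]]
mask = segment(image)
```

In practice an image library would normally supply these operations; the pure-Python version is only meant to make the thresholding logic explicit.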

c. Feature Extraction
The features used in this study are of two types, namely shape and colour. We use the eccentricity value to represent the shape feature and the HSV values to represent the colour feature.
1) Shape feature: eccentricity
Shape features are physical dimensional measures that characterize the appearance of kernels [11]. Eccentricity is the ratio of the distance between the foci of the ellipse fitted to an object to the length of its major axis. An elongated or line-like object has an eccentricity close to one, whereas a circular object has an eccentricity close to zero [12][13]. In terms of the major-axis length $a$ and the minor-axis length $b$, this ratio equals
$$e = \sqrt{1 - \left(\frac{b}{a}\right)^{2}} \qquad (1)$$
Figure 8 shows an example of the minor and major axes. The minor axis is the shortest line (line p3 to p4), while the major axis is the longest line (line p1 to p2). The lengths of the two lines are calculated using the Euclidean distance [14].
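As a rough illustration of the shape feature, the sketch below computes the eccentricity from the four axis endpoints p1 to p4 named in Figure 8. It assumes the standard ellipse relation e = sqrt(1 - (b/a)^2), with a and b the Euclidean lengths of the major and minor axes. The function name and the sample coordinates are hypothetical.

```python
import math


def eccentricity(p1, p2, p3, p4):
    """Eccentricity from the major axis (p1-p2) and minor axis (p3-p4) endpoints."""
    major = math.dist(p1, p2)   # Euclidean length of the longest line
    minor = math.dist(p3, p4)   # Euclidean length of the shortest line
    if major < minor:           # tolerate swapped arguments
        major, minor = minor, major
    if major == 0:
        raise ValueError("the object has zero extent")
    return math.sqrt(1.0 - (minor / major) ** 2)


# Hypothetical endpoints: a round object and an elongated one.
round_seed = eccentricity((0, 0), (10, 0), (5, -5), (5, 5))
long_seed = eccentricity((0, 0), (10, 0), (5, -3), (5, 3))
```

A round outline gives a value of zero, and the value grows towards one as the minor axis shrinks relative to the major axis.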

2) Colour feature: HSV
The colour value is one of the essential indicators for measuring the quality of fruit appearance [15][16].
To distinguish an object with a particular colour, we can use the Hue value, which represents the colour of visible light (red, orange, yellow, green, blue, purple). Hue is combined with Saturation, the purity of the colour, and Value, the brightness level of the colour. The original RGB values are converted to HSV to obtain the values of these three parameters [17][18].
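The RGB-to-HSV conversion can be sketched with Python's standard `colorsys` module. The text does not say how the per-pixel HSV values are aggregated into features, so averaging each channel over the seed pixels is an assumption made here for illustration, and `mean_hsv` is a hypothetical helper name.

```python
import colorsys


def mean_hsv(rgb_pixels):
    """Average H, S, and V over 8-bit (r, g, b) pixels; each result lies in [0, 1]."""
    count = len(rgb_pixels)
    if count == 0:
        raise ValueError("no pixels given")
    h_sum = s_sum = v_sum = 0.0
    for r, g, b in rgb_pixels:
        # colorsys expects channel values scaled to the range [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        h_sum += h
        s_sum += s
        v_sum += v
    return h_sum / count, s_sum / count, v_sum / count


# Hypothetical seed pixels in yellowish tones.
features = mean_hsv([(255, 255, 0), (230, 200, 40), (240, 210, 60)])
```

Hue is an angle, so a plain arithmetic mean is only reasonable when the hues do not straddle the wrap-around at red; a circular mean would be the safer choice for reddish seeds.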

d. Artificial Neural Network
An Artificial Neural Network (ANN) is a method that models the nervous system of the human brain, whose basic units are commonly referred to as neurons, and its task is to recognize patterns, especially to group them into classes. The model is based on the ability of the human brain to organize neurons so that they can recognize patterns effectively [6]. An ANN is characterized by the pattern of connections between its neurons, the method of determining the weight of each connection, and its activation function. In general, an ANN consists of:
1) An input layer, which contains several input neurons that pass the data on
2) A hidden layer, where the primary processing of the ANN occurs
3) An output layer, which contains several output neurons
4) An activation function
The ANN process starts from the inputs received along with their weight values. The weighted inputs are summed by a propagation function, and the result is processed by the activation function of each neuron. The result is then compared with a certain threshold. If the value exceeds the threshold, the neuron is activated; otherwise, it remains inactive. Once active, the neuron sends its output value to the output layer.
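A single neuron can be sketched as a weighted sum followed by an activation function. The sketch follows the conventional formulation, in which a step activation switches the neuron on once the summed input reaches the threshold, and a sigmoid provides a smooth alternative. The function names, weights, and inputs are illustrative assumptions.

```python
import math


def weighted_sum(inputs, weights, bias=0.0):
    """Propagation function: sum of input * weight products plus a bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def step(z, threshold=0.0):
    """Hard threshold activation: 1 (active) when z reaches the threshold, else 0."""
    return 1 if z >= threshold else 0


def sigmoid(z):
    """Smooth activation that maps any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical neuron with two inputs.
z = weighted_sum([1.0, 2.0], [0.5, -0.25], bias=0.1)
fired = step(z)
soft_output = sigmoid(z)
```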

Result and Discussion
The following numbers of corn seed images were used in the experiment:
a. BIMA-20 Good: 632 images
b. BIMA-20 Bad: 487 images
c. NASA-29 Good: 488 images
Figure 9 shows some examples of the corn images used in the experiment. From each image, we extract the shape and colour features; example results are given in Table 1. We split the data into 70% training data and 30% testing data.
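A 70/30 split of this kind can be sketched as follows. The text does not state whether the split was random, stratified, or made separately for each scenario, so the seeded shuffle below is an assumption, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and cut it into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# BIMA-20 Good (632) plus BIMA-20 Bad (487) gives 1119 images in the first scenario.
train_ids, test_ids = split_dataset(range(632 + 487))
```

With 1119 images, this yields 783 training and 336 testing samples.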

Figure 9: Examples of extracted corn seed images
The ANN used in this paper consists of three hidden layers and one output layer. The number of neurons in each hidden layer is as follows: layer 1 = 128 neurons, layer 2 = 128 neurons, and layer 3 = 64 neurons. This setting was selected because it produced the highest accuracy among the settings we tried. We also tried a smaller setting: layer 1 = 32 neurons, layer 2 = 32 neurons, and layer 3 = 16 neurons. However, the accuracy obtained with this setting was 57% for BIMA-20 Good vs. BIMA-20 Bad and 71% for BIMA-20 Good vs. NASA-29 Good. We also tried a larger setting: layer 1 = 256 neurons, layer 2 = 256 neurons, and layer 3 = 128 neurons. The accuracy of this setting was only 70%.
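To make the selected layer sizes concrete, the sketch below builds an untrained network with hidden layers of 128, 128, and 64 neurons and runs one forward pass. Several details are assumptions that the text does not state: four input features (eccentricity, H, S, V), a single sigmoid output unit, ReLU hidden activations, and random initial weights. Training is omitted.

```python
import math
import random

# Assumed layout: 4 inputs, hidden layers of 128, 128, and 64 neurons, 1 output.
LAYER_SIZES = [4, 128, 128, 64, 1]


def init_layers(sizes, seed=0):
    """Create (weights, biases) for each pair of consecutive layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


def forward(layers, features):
    """Propagate one feature vector through the network."""
    activations = list(features)
    last = len(layers) - 1
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        if index == last:
            activations = [1.0 / (1.0 + math.exp(-v)) for v in z]  # sigmoid output
        else:
            activations = [max(0.0, v) for v in z]                 # ReLU hidden layers
    return activations


layers = init_layers(LAYER_SIZES)
# Hypothetical feature vector: eccentricity, H, S, V.
score = forward(layers, [0.8, 0.12, 0.9, 0.7])[0]
```

Under these assumptions, the network has 25,473 trainable parameters.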
We investigate two scenarios in our corn seed identification: (1) classification of BIMA-20 Good vs. BIMA-20 Bad, and (2) classification of BIMA-20 Good vs. NASA-29 Good. The results of the classification by the ANN method are summarized in confusion matrices. Confusion matrices are commonly used to describe the performance of a classification method on data whose actual classes are known [12]. The entries of a confusion matrix show how many predictions for each class do and do not correspond to the actual class. Let TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively [1]. The F1 score is used to measure classification performance by applying Equation 2 [21]:
$$F_1 = \frac{2\,TP}{2\,TP + FP + FN} \qquad (2)$$
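The usual metrics derived from these four counts can be sketched as below, using the standard definitions of accuracy, precision, recall, and F1. The function name and the counts in the example are hypothetical and are not results from the experiment.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

Note that TN enters the accuracy but not the F1 score, so the two measures can diverge when the classes are imbalanced.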

a. Classification result of BIMA-20 Good vs. BIMA-20 Bad
In this section, we classify the corn seeds of the BIMA-20 Good and BIMA-20 Bad classes. The classification results of the ANN method, summarized as confusion matrices, are shown in Figure 10, Figure 11, and Figure 12. True Positive is the number of samples from the BIMA-20 Good class that were correctly classified into the Good class. True Negative is the number of samples from the BIMA-20 Bad class that were correctly classified into the Bad class. False Positive is the number of samples from the BIMA-20 Bad class that were not classified into the Bad class. False Negative is the number of samples from the BIMA-20 Good class that were not classified into the Good class.
The classification using colour features is depicted in Figure 11; using H, S, and V, we achieve a classification accuracy of 88%. Meanwhile, Figure 12 shows that combining the shape and colour features on the BIMA-20 Good and BIMA-20 Bad data gives an accuracy of 89%.

b. Classification result of BIMA-20 Good vs. NASA-29 Good
The confusion matrices for this scenario are shown in Figure 13, Figure 14, and Figure 15. True Positive is the number of samples from the BIMA-20 Good class that were correctly classified into the BIMA-20 Good class. True Negative is the number of samples from the NASA-29 Good class that were correctly classified into the NASA-29 Good class. False Positive is the number of samples from the NASA-29 Good class that were not classified into the NASA-29 Good class. False Negative is the number of samples from the BIMA-20 Good class that were not classified into the BIMA-20 Good class.
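The four counts defined above can be tallied from lists of actual and predicted labels, treating BIMA-20 Good as the positive class in both scenarios. The sketch below is illustrative: `confusion_counts` is a hypothetical helper and the labels are made up.

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, TN, FP, FN) for a two-class problem."""
    tp = tn = fp = fn = 0
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            if guess == positive:
                tp += 1     # positive sample recognised as positive
            else:
                fn += 1     # positive sample missed
        else:
            if guess == positive:
                fp += 1     # negative sample mistaken for positive
            else:
                tn += 1     # negative sample recognised as negative
    return tp, tn, fp, fn


# Made-up labels for illustration only.
actual = ["BIMA-20 Good", "BIMA-20 Good", "NASA-29 Good", "NASA-29 Good", "BIMA-20 Good"]
predicted = ["BIMA-20 Good", "NASA-29 Good", "NASA-29 Good", "BIMA-20 Good", "BIMA-20 Good"]
tp, tn, fp, fn = confusion_counts(actual, predicted, positive="BIMA-20 Good")
```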
In Figure 13, we use only the shape feature, namely eccentricity, and obtain an accuracy of 71%, whereas in Figure 14 we use the colour features, namely the H, S, and V values, and obtain an accuracy of 96%. Figure 15 shows that combining the shape and colour features on the BIMA-20 Good vs. NASA-29 Good data gives an accuracy of 97%. The F1 score is shown in Figure 16. Based on Figure 16, the classification results for BIMA-20 Good vs. NASA-29 Good are better than those for BIMA-20 Good vs. BIMA-20 Bad, whether the shape feature, the colour features, or both are used. In addition, the colour features give better classification results than the shape feature, so colour has a strong influence on this classification. Figure 17 is an example of an image that was misclassified in the NASA-29 Good vs. BIMA-20 Good scenario. The misclassification occurs because the corn seeds from NASA-29 Good are brighter in colour than the corn seeds from BIMA-20 Good.

Conclusion
In this study, we proposed corn seed identification using colour and shape features. We used eccentricity as the shape feature and the HSV channels as the colour features. Based on the experimental results, we conclude that the proposed algorithm performs better in classifying corn varieties of the same quality than in quality grading. The colour features produce a higher accuracy than the shape feature, because the shapes of the corn seeds are relatively similar.

Data Availability
The corn images are available in the RIN (Repositori Ilmiah Nasional/National Scientific Repository) of the Indonesian Institute of Sciences. The images are stored as three datasets, BIMA-20 Good, BIMA-20 Bad, and NASA-29 Good. The dataset can be found in the following link: https://data.lipi.go.id/dataverse/seed-grading