Performance Assessment of University Lecturers: A Data Mining Approach

A lecturer who performs well has a positive impact on the quality of teaching and learning, including the delivery of teaching materials, the learning methods used, and ultimately the academic results of students. Lecturer performance also contributes significantly to the quality of research and community service, which in turn improves the quality of teaching materials. It is therefore desirable to have a method for measuring lecturer performance in carrying out the Tri Dharma (the three responsibilities) of higher education: teaching and learning, research, and community service, including publications at national and international levels. This study measures the performance of lecturers and clusters them into three categories: “satisfactory”, “good”, and “poor”. The data were taken from the records of academic activities of lecturers in the nursing and pharmacy study programs. Clustering is carried out using two machine learning approaches: the K-Means and K-Medoids algorithms. Evaluation of the clustering results suggests that the K-Medoids algorithm performs better than K-Means: the Davies-Bouldin Index (DBI) score is -0.417 for K-Means and -0.652 for K-Medoids. This difference indicates that K-Medoids is better suited to grouping lecturers by their performance in carrying out Tri Dharma activities.


Introduction
Performance appraisal is an activity usually carried out by an organization or institution. It is an organized, structured, and periodic process of observing individual performance and institutional productivity against predetermined organizational criteria and goals [1]. Performance appraisal is synonymous with performance evaluation [2].
Performance appraisal is implemented in many organizations; in universities, it means assessing the performance of lecturers [3]. Lecturers are important stakeholders of tertiary education institutions, playing the roles of both educators and scientists. Lecturers are responsible for exploring new knowledge and disseminating it to the public and to students [4]. Evaluating lecturer performance is important for assessing the achievements of higher education institutions and for encouraging lecturers to be productive. The lecturer activities under evaluation include teaching and learning, research as evidenced by published papers, and community service [5].
In many cases, lecturer performance is evaluated using a questionnaire given to students, which assesses only teaching and learning activities. Teaching evaluation alone is not sufficient, because lecturers not only teach but also conduct research and community service. Multicriteria performance appraisal, however, requires a calculation that combines all the items being examined. This paper reports the results of research that calculates performance figures using data mining methods. The assessment aspects are transformed into data attributes to be processed, and two calculation methods are examined: the K-Means and K-Medoids algorithms.
Data mining is a machine learning approach that seeks to find knowledge from a big set of available data utilizing artificial intelligence techniques, statistics, and mathematics. Data mining is usually operated against large amounts of data stored in databases, warehouses, or other repositories [6]. Data mining is often referred to as an effort to find knowledge in databases or Knowledge Discovery in Databases (KDD) [7].
Many papers have discussed the application of data mining to data from higher education institutions. Applications include predicting students' length of study, assessing student performance and lecturer performance, determining college promotion strategies, selecting scholarship grantees, and evaluating the learning outcomes of alumni. Data mining has been applied to gain new knowledge about the behavior of leaders, students, alumni, lecturers, and university staff, supporting decision support systems that assist managers in making decisions [8]. Twijri and Noaman revealed that data mining in tertiary institutions is a rapidly developing research area that has quickly become popular because of its benefits to institutions [9]. Romero et al. stated that research interest in applying data mining methods in the educational sector has increased, giving rise to the term Educational Data Mining. Such research has proved useful for revealing student behavior, assisting instructors, improving teaching quality, assessing and improving e-learning systems, and improving curricula [10].
Chalaris et al. revealed that the educational process can be improved through decisions informed by knowledge that data mining extracts, either from the organization's existing databases or from data collected with questionnaires [11]. Data mining techniques are very useful for marketing analysis, student admission selection, predicting student performance, curriculum planning, analyzing learning outcomes, and maximizing the efficiency of the educational process [12,13].
Data mining is a process of exploration and analysis in an automatic or semi-automatic way to find meaningful patterns and rules on large amounts of data [14]. Data mining is one of the most common methods used to investigate information, patterns, and relationships that have not yet been explored [15]. Data mining provides benefits in many fields including e-commerce, bioinformatics and education known as Educational Data Mining (EDM) [16,17].
The applications of data mining described above, particularly in education, motivated the author to apply data mining to the assessment of lecturer performance. Two data mining methods were tested: K-Means and K-Medoids. The K-Means clustering algorithm groups data by assigning each object to the nearest cluster center. The K-Medoids algorithm, also known as PAM (Partitioning Around Medoids), partitions a set of n objects into k clusters by first randomly selecting an initial representative object (medoid) for each cluster. Each remaining object is then assigned to the most similar medoid. K-Medoids thus uses a representative object as the reference point of each cluster rather than the mean of the cluster's objects. The algorithm takes as input the number of clusters k into which the n objects are to be partitioned.
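The difference between the two kinds of reference point can be illustrated with a minimal sketch, assuming one-dimensional scores (where the Euclidean distance reduces to the absolute difference). The scores and the function name `medoid` are hypothetical; the point is that the mean can be pulled toward an outlier, while the medoid is always an actual member of the cluster.

```python
from statistics import mean

def medoid(values):
    """Return the member with the smallest total distance to all other members."""
    # In one dimension, the Euclidean distance is the absolute difference.
    return min(values, key=lambda c: sum(abs(c - v) for v in values))

# Hypothetical total scores of one cluster; 150 is an outlier.
scores = [30, 35, 40, 45, 150]

print(mean(scores))    # centroid used by K-Means: 60, pulled toward the outlier
print(medoid(scores))  # medoid used by K-Medoids: 40, an actual member
```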

Method
This research consists of two major processes: data mining and evaluation (validation). The data mining process has four main stages: (a) data collection, (b) data preprocessing, (c) data mining, and (d) analysis [18]. Evaluation (validation) is then applied to the results of each clustering algorithm (see Figure 1).

Figure 1. Knowledge acquisition with data mining

a. Data collection
Data for this study were obtained from the Tri Dharma activities of a higher education institution and cover the aspects of teaching, research, and community service. Data on teaching were obtained from the Quality Assurance Unit, data on research and community service activities from the Research and Community Service Unit, and data on lecture evaluation by students from the Academic Administration Unit.

b. Data preprocessing
Preprocessing prepares the data before the main data mining process. It serves several purposes, such as correcting typos and filling in empty table cells that could otherwise cause computations to fail. Preprocessing also reduces the dimensionality of the data and adjusts the attributes to simplify the calculation. In this study, preprocessing groups the raw data into the categories of teaching, research, and community service, producing an accumulated value for each of the three aspects.
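The accumulation step can be sketched as follows, assuming each lecturer's raw data is a mapping from attribute to credit score. The attribute names, the `ASPECTS` mapping, the `accumulate` function, and the scores are all hypothetical; the actual credit values come from the 2019 credit-score guidelines cited in this paper [24].

```python
# Hypothetical shorthand names for the attributes of each Tri Dharma aspect.
ASPECTS = {
    "teaching": ["pekerti", "applied_approach", "textbook", "literary_work",
                 "learning_method", "edom"],
    "research": ["ipr", "keynote_international", "keynote_national",
                 "journal_international", "journal_accredited", "journal_national",
                 "art_work", "sports_achievement", "award"],
    "community_service": ["appropriate_technology", "environment",
                          "technology_application", "empowerment", "partnership"],
}

def accumulate(record, aspects=ASPECTS):
    """Reduce one lecturer's raw credit scores to one accumulated score per aspect.

    Missing or empty attributes count as 0, so an incomplete record never breaks the sum.
    """
    return {aspect: sum(record.get(attr) or 0 for attr in attrs)
            for aspect, attrs in aspects.items()}

# Hypothetical raw record: credit scores per attribute, with one empty field.
raw = {"pekerti": 15, "edom": 20, "journal_national": 10,
       "ipr": 15, "partnership": None}
print(accumulate(raw))
# {'teaching': 35, 'research': 25, 'community_service': 0}
```

The resulting three scores per lecturer are the reduced attributes that the clustering step works on.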

c. Data mining
The data mining method in this study is clustering. Clustering is an unsupervised learning method that partitions the objects in a data set into several groups. Many clustering algorithms use a distance measure such as the Euclidean distance [19] to quantify the similarity between data points, which determines whether an object belongs to a particular cluster [20]. This study examines two clustering techniques for grouping lecturers by performance: K-Means and K-Medoids.
K-Means clustering partitions a set of objects into k clusters by assigning each object to the cluster whose centroid is closest. The K-Means stages applied in this study are as follows [21] (see also Figure 2).
Stage 1 : Determine the number of clusters k for the preprocessed dataset.
Stage 2 : Randomly select k objects as the initial cluster centers (centroids).
Stage 3 : Assign each object to the cluster with the nearest centroid.
Stage 4 : Recalculate the centroid of each cluster from its current members.
Stage 5 : Repeat Stages 3 and 4 until no object moves to another cluster.
K-Medoids, also known as the PAM (Partitioning Around Medoids) algorithm, likewise partitions a set of objects into k clusters. The K-Medoids stages are as follows [26] (see also Figure 3).
Stage 1 : Initialize k cluster centers (medoids) by randomly selecting k objects.
Stage 2 : Assign each object to the cluster of the nearest medoid based on the Euclidean distance.
Stage 3 : Randomly select an object in each cluster as a candidate new medoid.
Stage 4 : Calculate the distance of each object in each cluster to the candidate medoid.
Stage 5 : Calculate the total deviation S = (total new distance) - (total old distance). If S < 0, swap the current medoid with the candidate to form a new set of k medoids.
Stage 6 : Repeat Stages 3 to 5 until the medoids no longer change, which yields the final clusters and their members.
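The K-Means stages above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the RapidMiner implementation used in this study: objects are numeric tuples, distances are Euclidean, the `seed` parameter is added for reproducibility, and the sample totals are hypothetical.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster `points` (tuples of numbers) into k clusters; return (centroids, labels)."""
    rng = random.Random(seed)
    # Stage 2: pick k distinct objects at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Stage 3: assign every object to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        # Stage 5: stop as soon as no object changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Stage 4: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical one-dimensional total scores for nine lecturers (Stage 1: k = 3).
totals = [(20,), (25,), (30,), (60,), (65,), (70,), (95,), (100,), (105,)]
centroids, labels = kmeans(totals, k=3)
```

Because the initial centroids are random, different seeds can converge to different local optima, so in practice K-Means is often run several times and the best result kept.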

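A corresponding sketch of K-Medoids follows, under the same assumptions (numeric tuples, Euclidean distance, hypothetical data, an added `seed` parameter). One deliberate difference from the stages listed above: instead of drawing one random candidate per cluster, this version tries every non-medoid object as a replacement for every medoid, as in the classic PAM swap step, and accepts a swap whenever S < 0. Once the initial medoids are chosen, the stopping rule (no swap lowers the total distance) is therefore deterministic.

```python
import math
import random

def total_distance(points, medoids):
    """Sum of Euclidean distances from each object to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def kmedoids(points, k, seed=0):
    """Cluster `points` (tuples of numbers) around k medoids; return (medoids, labels)."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)  # Stage 1: k random objects as initial medoids
    cost = total_distance(points, medoids)
    changed = True
    while changed:  # Stage 6: repeat until no swap changes the medoids
        changed = False
        for i in range(k):
            for candidate in points:  # Stage 3: candidate replacement for medoid i
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                new_cost = total_distance(points, trial)  # Stage 4
                s = new_cost - cost  # Stage 5: S = total new distance - total old distance
                if s < 0:  # swap only when it lowers the total distance
                    medoids, cost, changed = trial, new_cost, True
    # Stage 2: assign each object to its nearest medoid.
    labels = [min(range(k), key=lambda j: math.dist(p, medoids[j])) for p in points]
    return medoids, labels

totals = [(20,), (25,), (30,), (60,), (65,), (70,), (95,), (100,), (105,)]  # hypothetical
medoids, labels = kmedoids(totals, k=3)
print(sorted(medoids))  # [(25,), (65,), (100,)]
```

Every medoid returned is an actual lecturer record, which is why the K-Medoids cluster centers are reported as observed totals rather than averages.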
d. Analysis
The analysis phase is carried out to identify the patterns in the grouping of lecturer performance. The tool used is RapidMiner, which is widely used in data science for data preparation, machine learning, text mining, and predictive analysis [22].

e. Evaluation/validation
The evaluation uses the Davies-Bouldin Index (DBI), a metric for evaluating the performance of clustering algorithms proposed by David L. Davies and Donald W. Bouldin in 1979 [23]. The metric compares the distances between clusters with the spread of the data within each cluster. A small DBI value, which arises when clusters are far apart and the objects within each cluster lie close together, indicates optimal clustering.

Teaching performance scores were obtained from the attributes of PEKERTI, applied approach, textbooks/references, literary works, development of learning methods, and lecturer evaluation by students. Research performance scores were obtained from the attributes of intellectual property rights, international-level keynote/invited speakers, national-level keynote/invited speakers, papers in reputable international journals, papers in accredited national journals, papers in national journals, works of art, sports achievements, and awards. The community service score is derived from the attributes of technology implementation, environmental management, technology application, community empowerment, and partnership development (see Tables 1 to 3).

a. Data Collection
The data mining process using K-Means and K-Medoids begins with data collection. Samples were taken from the performance data of lecturers in two study programs at Universitas Muhammadiyah Kalimantan Timur. The specified attributes are the teaching, research, and community service performance scores.

b. Data pre-processing
The pre-processing phase reduces the raw data to three attributes: the teaching, research, and community service performance scores. Each attribute is scored according to the Operational Guidelines for Assessing Credit Scores for Academic Promotion / Lecturer Rank, Directorate General of Science and Technology Resources and Higher Education, Ministry of Research, Technology and Higher Education, 2019 [24]. The teaching performance score is the sum of the credit scores for PEKERTI, applied approach, textbooks or references, literary works, development of learning methods, and EDOM (lecturer evaluation by students). The research score is the sum of the credit scores for IPR (intellectual property rights), international and national keynote/invited speakers, publications in reputable international journals, accredited national journals, national journals, works of art, sports achievements, and awards. The community service performance score is the sum of the credit scores for appropriate technology service, environmental arrangement, technology application, community empowerment, and partnership development.

c. Data mining
The preprocessed data are processed using RapidMiner. The lecturers are clustered into three categories by applying the K-Means and K-Medoids algorithms.
Applying the K-Means algorithm in RapidMiner begins with determining the centroids. Because the lecturers are to be clustered into three categories, three centroids are generated, one for each cluster (see Table 5).

Table 5. K-Means centroid values
Attribute   Cluster_0   Cluster_1   Cluster_2
Total          66.500      27.778     100.667

Clustering assigns each data item to the cluster whose centroid is closest. In RapidMiner, 10 data items were assigned to Cluster_0, 9 to Cluster_1, and 6 to Cluster_2. Cluster_1, with the smallest centroid value, is labeled Poor; Cluster_0, with the medium centroid value, is labeled Good; and Cluster_2, with the highest centroid value, is labeled Satisfactory. The membership of each cluster is shown in Table 6.
The K-Medoids algorithm likewise begins by determining three medoids for the three target clusters. The medoid values obtained in RapidMiner for the K-Medoids method are shown in Table 7. The cluster labeled Satisfactory has its medoid at 90, while the clusters labeled Good and Poor have medoids at 65 and 35, respectively. Clustering with K-Medoids produces somewhat different cluster sizes: Cluster_0 (Satisfactory) has 9 members, Cluster_1 (Poor) has 9 members, and Cluster_2 (Good) has 7 members (see Table 8).
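The labeling rule described above (smallest center Poor, middle Good, largest Satisfactory) can be written as a small sketch. It assumes each cluster center is summarized by a single total value, as in the Total row of Table 5; the function name is hypothetical, and the values are the K-Means centroids and K-Medoids medoids reported in the text.

```python
def label_by_centroid(centers, labels=("Poor", "Good", "Satisfactory")):
    """Map each cluster id to a performance label by ranking its center value."""
    ranked = sorted(centers, key=centers.get)  # cluster ids from lowest to highest center
    return dict(zip(ranked, labels))

kmeans_centroids = {"Cluster_0": 66.500, "Cluster_1": 27.778, "Cluster_2": 100.667}
kmedoids_medoids = {"Cluster_0": 90, "Cluster_1": 35, "Cluster_2": 65}

print(label_by_centroid(kmeans_centroids))
# {'Cluster_1': 'Poor', 'Cluster_0': 'Good', 'Cluster_2': 'Satisfactory'}
print(label_by_centroid(kmedoids_medoids))
# {'Cluster_1': 'Poor', 'Cluster_2': 'Good', 'Cluster_0': 'Satisfactory'}
```

Cluster ids are arbitrary in both algorithms, so ranking by center value is what makes the labels comparable across the two methods.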

d. Analysis
Tables 5 to 8 show that clustering with K-Means and K-Medoids gives different results in several respects. The centroid calculations give different values for the same cluster label. For example, for the cluster Satisfactory, K-Means places the centroid at 100.7, while K-Medoids places the medoid at 90.0. As expected, the number of data items assigned to each cluster therefore differs. Table 9 shows how the two methods assign each data item to a cluster, revealing which items they group into the same category and which into different categories.
Both methods put the same 9 data items into the cluster Poor. The K-Means method places 10 data items in the cluster Good and 6 in the cluster Satisfactory, whereas the K-Medoids method places 7 data items in the cluster Good and 9 in the cluster Satisfactory. The placements differ for the data items Lecturer 2, Lecturer 5, and Lecturer 7; all three have the same attribute values in the pre-processed data in Table 4.
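The comparison in Table 9 amounts to cross-tabulating the two labelings of the same lecturers. A minimal sketch, with a hypothetical function name and hypothetical lecturer identifiers and labels (not the data of this study):

```python
from collections import Counter

def compare_assignments(first, second):
    """Cross-tabulate two labelings of the same items and list the items they disagree on."""
    crosstab = Counter((first[item], second[item]) for item in first)
    disagreements = sorted(item for item in first if first[item] != second[item])
    return crosstab, disagreements

# Hypothetical labels for four lecturers under each method.
kmeans_labels = {"A": "Poor", "B": "Good", "C": "Good", "D": "Satisfactory"}
kmedoids_labels = {"A": "Poor", "B": "Satisfactory", "C": "Good", "D": "Satisfactory"}

crosstab, disagreements = compare_assignments(kmeans_labels, kmedoids_labels)
print(disagreements)                       # ['B']
print(crosstab[("Good", "Satisfactory")])  # 1
```

The off-diagonal cells of the cross-tabulation (such as Good under K-Means but Satisfactory under K-Medoids) are exactly the discrepant placements.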

e. Evaluation/validation
Clustering with the two methods, K-Means and K-Medoids, has produced somewhat different clusters. The next question is which method gives better or more accurate results. Answering it requires a standardized, agreed-upon measure. One measure that can be used to determine which method clusters more optimally is the Davies-Bouldin Index (DBI); more optimal clustering yields a smaller DBI value. Table 10 presents the DBI values for the clustering of lecturer performance data using K-Means (first row) and K-Medoids (second row). The DBI is -0.417 for K-Means and -0.652 for K-Medoids. By this measure, K-Medoids yields more optimal clustering than K-Means in this case.
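For reference, the index can be computed directly from its definition: for k clusters with centroids c_i and average within-cluster distances s_i, DBI = (1/k) Σ_i max_{j≠i} (s_i + s_j) / d(c_i, c_j). The sketch below assumes Euclidean distance, numeric tuples, and hypothetical data; it returns the index as defined, which is never negative, whereas the values in Table 10 are as reported by RapidMiner, so this sketch is not guaranteed to reproduce those figures. Lower values indicate more compact, better-separated clusters.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin Index of a clustering (lower is better); needs at least 2 clusters."""
    clusters = sorted(set(labels))
    members = {c: [p for p, lab in zip(points, labels) if lab == c] for c in clusters}
    # Centroid of each cluster: the coordinate-wise mean of its members.
    centroids = {c: tuple(sum(axis) / len(m) for axis in zip(*m)) for c, m in members.items()}
    # Scatter s_i: average distance of a cluster's members to its centroid.
    scatter = {c: sum(math.dist(p, centroids[c]) for p in m) / len(m)
               for c, m in members.items()}
    # For each cluster, take the worst (largest) similarity ratio to any other cluster.
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(clusters)

points = [(0,), (2,), (10,), (12,)]  # hypothetical data
print(davies_bouldin(points, [0, 0, 1, 1]))  # compact, well separated: 0.2
print(davies_bouldin(points, [0, 1, 0, 1]))  # overlapping clusters: 5.0
```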

Conclusion
In this research, we clustered lecturer performance data on teaching, research and publication, and community service activities. The data were obtained from records of lecturer activities in two study programs, Nursing and Pharmacy. We examined 6 attributes of teaching activities, 9 attributes of research and publication activities, and 5 attributes of community service activities. In pre-processing, each activity was scored according to the Credit Score Assessment Guidelines for Academic Promotion of Lecturers, and the dimensionality was then reduced by accumulating the scores for each aspect.
We examined two clustering methods, K-Means and K-Medoids. Both methods produce three clusters, each with three different centroid points. The placement of data items in the clusters differs somewhat. Both methods place the same 9 data items in cluster Poor. K-Means places 10 data items in cluster Good and 6 in cluster Satisfactory, whereas K-Medoids places 7 in cluster Good and 9 in cluster Satisfactory. Evaluating the clustering results with the Davies-Bouldin Index (DBI) gives a value of -0.417 for K-Means and -0.652 for K-Medoids.
This last result suggests that K-Medoids produces better clustering results than K-Means.