Performance Comparison of Data Mining Algorithms on a Student Alcohol Consumption Dataset

Noviyanti Sagala(1*), Hendrik Tampubolon(2)

(1) Universitas Kristen Krida Wacana
(2) Universitas Kristen Krida Wacana
(*) Corresponding Author
DOI: https://doi.org/10.23917/khif.v4i2.7061

Abstract

Data mining extracts knowledge from large collections of data. This study applies data mining algorithms and analyzes their performance in predicting alcohol consumption among secondary school students, and examines the associated factors. The workflow consists of data preprocessing, feature selection, classification, and model evaluation. In the preprocessing stage, several features are transformed into forms suitable for classification. Next, the Gain Ratio and Fast Correlation-Based Filter (FCBF) algorithms are used to select the relevant and important features for the classification stage. Decision Tree C5.0, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naïve Bayes (NB) are then run on the selected feature subsets. Model accuracy is evaluated using 10-fold cross-validation (CV). The results show that the Naïve Bayes classifier achieves the highest accuracy when using the five best features ranked by Gain Ratio. In addition, feature selection generally improves the performance of all classifiers. Further testing on the same data and on different data is needed to gain deeper insight into the performance of the algorithms used.
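As an illustration of the feature selection step, the following is a minimal sketch of Gain Ratio ranking for categorical features, computed as information gain divided by split information. It is not the authors' implementation. The feature names (`goes_out`, `study_time`, `sex`), the toy rows, and the helper names are hypothetical and are not taken from the dataset used in the study.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Gain Ratio = information gain / split information."""
    n = len(labels)
    # Group the class labels by feature value to get H(labels | feature).
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A feature with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: eight students, class label = alcohol consumption level.
labels = ["high"] * 4 + ["low"] * 4
features = {
    "goes_out": ["often"] * 4 + ["rarely"] * 4,
    "study_time": ["short", "short", "short", "long",
                   "short", "long", "long", "long"],
    "sex": ["F", "M", "F", "M", "F", "M", "F", "M"],
}

# Score every feature, then keep the k highest-ranked ones.
scores = {name: gain_ratio(col, labels) for name, col in features.items()}
k = 2
top_features = sorted(scores, key=scores.get, reverse=True)[:k]
print(top_features)
```

In a pipeline like the one described above, only the columns in `top_features` would be passed on to the classifiers, and each classifier would then be scored with 10-fold cross-validation.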

Keywords

data mining; student alcohol consumption; Naïve Bayes; KNN; decision tree
