Combination of Support Vector Machine and Lexicon-Based Algorithm in Twitter Sentiment Analysis

Rindu Hafil Muhammadi(1*), Tri Ginanjar Laksana(2), Amalia Beladinna Arifa(3)

(1) 
(2) Institut Teknologi Telkom Purwokerto
(3) Institut Teknologi Telkom Purwokerto
(*) Corresponding Author
DOI: https://doi.org/10.23917/khif.v8i1.15213

Abstract

Data from the Ministry of Public Works and Public Housing (Kementerian PUPR) in 2019 show that around 81 million millennials do not own a house. Government Regulation Number 25 of 2020 on the Implementation of Public Housing Savings, commonly called PP 25 Tapera 2020, is one of the government's efforts to ensure that Indonesians can afford housing. Tapera is a deposit made by workers to finance housing, refundable after the term expires. The regulation drew many public responses immediately after its enactment. We investigate public sentiment toward the regulation using a Support Vector Machine (SVM), chosen for its good accuracy. Because SVM is a supervised method, it requires labeled training data; to speed up labeling, we use a lexicon-based method. The main weakness of the lexicon-based method is its dependence on the dictionary, which is the most significant factor in its performance. Combining the two methods makes it possible to update the dictionary automatically: SVM can contribute to the lexicon-based method, and the lexicon-based method can label the datasets for SVM to produce good accuracy. The research consists of collecting data from Twitter, preprocessing the raw, unstructured data into ready-to-use data, labeling the data with the lexicon-based method, weighting terms with TF-IDF, classifying with SVM, and evaluating model performance with a confusion matrix. The results show that the combination of the lexicon-based method and SVM worked well. The lexicon-based method labeled 519 tweets. SVM achieved an accuracy of 81.73% with the RBF kernel. The Sigmoid kernel attained the highest precision, 78.68%. The RBF kernel had the highest recall, 81.73%. The F1-score was 79.60% for both the RBF and Sigmoid kernels.
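
The labeling, weighting, and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The tiny English lexicon, the function names, and the tf * log(N / df) weighting variant are all assumptions made for illustration, and the SVM training step is omitted (an RBF-kernel or Sigmoid-kernel classifier would sit between the TF-IDF and evaluation steps). The scores are support-weighted averages, which is also an assumption, since the abstract does not state the averaging scheme. Under that scheme recall always equals accuracy, which would be consistent with the identical 81.73% reported for both.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; the dictionary used in the paper is not given here.
LEXICON = {"good": 1, "helpful": 1, "afford": 1, "burden": -1, "bad": -1, "reject": -1}


def lexicon_label(tokens):
    """Label a tokenized tweet by summing word polarities from the lexicon."""
    score = sum(LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix whose rows are true labels and columns are predicted labels."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def weighted_scores(m):
    """Return accuracy and support-weighted precision, recall, and F1."""
    total = sum(map(sum, m))
    acc = sum(m[i][i] for i in range(len(m))) / total
    prec = rec = f1 = 0.0
    for i in range(len(m)):
        support = sum(m[i])                   # tweets whose true class is i
        predicted = sum(row[i] for row in m)  # tweets predicted as class i
        p = m[i][i] / predicted if predicted else 0.0
        r = m[i][i] / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        prec += p * support / total
        rec += r * support / total
        f1 += f * support / total
    return acc, prec, rec, f1


if __name__ == "__main__":
    tweets = [["tapera", "helpful", "afford"], ["tapera", "burden"], ["tapera", "regulation"]]
    print([lexicon_label(t) for t in tweets])
    print(tfidf(tweets))
    m = confusion_matrix(
        ["positive", "negative", "negative", "neutral"],
        ["positive", "negative", "positive", "neutral"],
        ["negative", "neutral", "positive"],
    )
    print(weighted_scores(m))
```

A word that occurs in every tweet, such as "tapera" in the toy data, receives a TF-IDF weight of zero under this variant, so it does not influence the classifier.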

Keywords

sentiment analysis; tapera; public housing; lexicon-based; confusion matrix


