Identifying Hate Speech in Tweets with Sentiment Analysis on Indonesian Twitter Utilizing Support Vector Machine Algorithm

Imam Riadi(1*), Abdul Fadlil(2), Murni Murni(3)

(1) Universitas Ahmad Dahlan
(2) Universitas Ahmad Dahlan
(3) Universitas Ahmad Dahlan
(*) Corresponding Author
DOI: https://doi.org/10.23917/khif.v9i2.22470

Abstract

Twitter had 24 million users in Indonesia at the beginning of 2023. Although it has fewer users than other platforms, its speed and immediacy make it a significant channel for disseminating information. Tweets offer various benefits, but the platform also has negative consequences, including the spread of fake news, cyberbullying, and hate speech. Hate speech uses offensive language to discriminate against an individual or group on the basis of race, ethnicity, nationality, religion, gender, sexual orientation, or other personal attributes, and it leads to discord. Such behavior falls under several legal instruments, including the Constitution, the Criminal Code, and the ITE Law. The primary objective of this research is to classify tweets as hate speech or non-hate speech using a Support Vector Machine (SVM) algorithm on a dataset of 5,000 tweets. The research involved data preprocessing, labeling, feature extraction using TF-IDF, model training on 80% of the data, and testing on the remaining 20%. The final stage tuned the SVM parameters with grid search and cross-validation (GridSearchCV), followed by analysis using a confusion matrix and the Matplotlib library. Among the SVM kernels evaluated, the Radial Basis Function (RBF) kernel with C=10 and gamma=0.1 performed best, reaching 84% accuracy. For the hate speech class, the RBF kernel attained 85% precision, 97% recall, and a 91% F1-score. In conclusion, the evaluation of SVM kernel performance shows that the RBF kernel achieves the highest accuracy, and it compares hate speech precision, recall, and F1-score across the kernel types.
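
The workflow summarized in the abstract (TF-IDF features, an 80/20 split, SVM tuning with GridSearchCV, and a confusion matrix) can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes scikit-learn, the toy corpus and the parameter grid are hypothetical placeholders for the 5,000 labeled tweets and the grid actually searched, and the preprocessing and Matplotlib plotting steps are omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the labeled tweets
# (1 = hate speech, 0 = non-hate speech). Not the authors' data.
hate_phrases = ["benci kelompok itu", "usir mereka semua",
                "kelompok itu sampah", "mereka pantas dihina"]
neutral_phrases = ["selamat pagi semua", "cuaca hari ini cerah",
                   "terima kasih atas bantuannya", "ayo belajar bersama"]
suffixes = ["", " sekarang", " sekali", " ya", " dong"]

texts = [p + s for p in hate_phrases + neutral_phrases for s in suffixes]
labels = ([1] * (len(hate_phrases) * len(suffixes))
          + [0] * (len(neutral_phrases) * len(suffixes)))

# 80% training, 20% testing, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# TF-IDF feature extraction followed by an SVM classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", SVC()),
])

# Assumed search grid; the source reports only the best RBF setting
# (C=10, gamma=0.1), not the full grid that was searched.
param_grid = [
    {"svm__kernel": ["linear"], "svm__C": [0.1, 1, 10]},
    {"svm__kernel": ["rbf"], "svm__C": [0.1, 1, 10],
     "svm__gamma": [0.01, 0.1, 1]},
]

# Grid search with 4-fold cross-validation on the training split only.
search = GridSearchCV(pipeline, param_grid, cv=4, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the best model on the held-out 20%.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

print("Best parameters:", search.best_params_)
print("Confusion matrix (rows = true, columns = predicted):")
print(cm)
print(classification_report(
    y_test, y_pred, labels=[0, 1],
    target_names=["non-hate speech", "hate speech"], zero_division=0,
))
```

Wrapping the vectorizer and the classifier in one pipeline means the TF-IDF vocabulary is refit inside every cross-validation fold, so no information from the validation folds leaks into feature extraction. Scores on this toy corpus say nothing about the accuracy reported in the abstract.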

Keywords

Sentiment Analysis; Hate Speech; Indonesian Twitter; Support Vector Machine Algorithm; Machine Learning
