Effectiveness of SVM Method by Naïve Bayes Weighting in Movie Review Classification

Fadli Fauzi Zain (1*), Yuliant Sibaroni (2)

(1) Telkom University
(2) Telkom University
(*) Corresponding Author
DOI: https://doi.org/10.23917/khif.v5i2.7770

Abstract

Classification of movie reviews belongs to the realm of text classification, specifically sentiment analysis. Two widely used text classification methods are the support vector machine (SVM) and Naïve Bayes, each of which is known to perform well on text classification on its own. Combining the two is expected to improve classifier performance compared with using either one separately. This paper reports an effort to classify movie reviews using a combined method in which Naïve Bayes supplies the feature weights for an SVM; this combination is commonly called NBSVM. The results show that the best accuracy, 88.8%, is obtained with NBSVM using combined unigram and bigram features, with data cleansing as the only pre-processing step.
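The NBSVM idea summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the NBSVM formulation of Wang and Manning (2012), in which each binarized unigram/bigram feature is scaled by the Naïve Bayes log-count ratio r = log((p / ||p||_1) / (q / ||q||_1)) before a linear SVM is trained. The cleansing rule, the smoothing value `alpha`, the Pegasos-style SVM solver, the regularized bias feature, and the four toy reviews are all assumptions made for illustration, and Wang and Manning's interpolation between the SVM weights and their mean is omitted.

```python
import math
import random
import re

BIAS = "<bias>"  # constant feature standing in for the SVM bias (regularized: a simplification)


def cleanse(text):
    # Hypothetical "data cleansing" step: lowercase and keep alphanumeric tokens only.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_features(tokens):
    # Binarized unigram + bigram features (presence only).
    feats = set(tokens)
    feats.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return feats


def log_count_ratio(docs, labels, alpha=1.0):
    # r = log((p / ||p||_1) / (q / ||q||_1)), where p and q are the alpha-smoothed
    # feature counts over positive (+1) and negative (-1) documents.
    vocab = set().union(*docs)
    p = dict.fromkeys(vocab, alpha)
    q = dict.fromkeys(vocab, alpha)
    for feats, y in zip(docs, labels):
        counts = p if y == 1 else q
        for f in feats:
            counts[f] += 1
    p_sum, q_sum = sum(p.values()), sum(q.values())
    return {f: math.log((p[f] / p_sum) / (q[f] / q_sum)) for f in vocab}


def nb_scale(feats, r):
    # NB-weighted input: each present feature gets value r[f]; unseen features are dropped.
    x = {f: r[f] for f in feats if f in r}
    x[BIAS] = 1.0
    return x


def train_linear_svm(xs, ys, lam=0.01, epochs=50, seed=0):
    # Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style stochastic subgradient steps.
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = xs[i], ys[i]
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            for f in w:  # shrink step from the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge-loss subgradient step on a margin violation
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, r, text):
    x = nb_scale(ngram_features(cleanse(text)), r)
    score = sum(w.get(f, 0.0) * v for f, v in x.items())
    return 1 if score >= 0 else -1


# Toy, made-up reviews (not the paper's data set): +1 = positive, -1 = negative.
train = [
    ("a truly great movie, I loved it", 1),
    ("great acting and a wonderful story", 1),
    ("not good, a boring and terrible movie", -1),
    ("terrible plot, I hated it", -1),
]
docs = [ngram_features(cleanse(text)) for text, _ in train]
labels = [y for _, y in train]
r = log_count_ratio(docs, labels)
w = train_linear_svm([nb_scale(d, r) for d in docs], labels)
print(predict(w, r, "great wonderful story"), predict(w, r, "boring terrible plot"))
```

On this toy data the script prints `1 -1`. Features seen only in positive reviews (such as "great") receive a positive ratio and features seen only in negative reviews (such as "terrible") a negative one, so the SVM starts from inputs that already carry the Naïve Bayes evidence. In a real experiment a library solver such as scikit-learn's `LinearSVC` would normally replace the hand-written trainer.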

Keywords

movie review; NBSVM; Naïve Bayes; SVM; text mining


