Automatic Language Identification for Indonesian-Malaysian Language Using Machine Learning
(1) Universitas Sriwijaya
(2) Universitas Sriwijaya
(*) Corresponding Author
DOI: https://doi.org/10.23917/khif.v9i2.21669
References
T. Jauhiainen, K. Lindén, and H. Jauhiainen, “Evaluation of language identification methods using 285 languages,” in NoDaLiDa 2017 - 21st Nordic Conference of Computational Linguistics, Proceedings of the Conference, May 2017, pp. 183–191.
S. Carter, W. Weerkamp, and M. Tsagkias, “Microblog language identification: Overcoming the limitations of short, unedited and idiomatic text,” Language Resources and Evaluation, vol. 47, no. 1, pp. 195–215, 2013, doi: 10.1007/s10579-012-9195-y.
R. Ferdiana, F. Jatmiko, D. D. Purwanti, A. S. T. Ayu, and W. F. Dicka, “Dataset Indonesia untuk Analisis Sentimen,” Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI), vol. 8, no. 4, p. 334, 2019, doi: 10.22146/jnteti.v8i4.533.
B. Ranaivo-Malancon, “Automatic Identification of Close Languages - Case study: Malay and Indonesian,” ECTI-CIT, vol. 2, no. 2, pp. 126–134, 2006, doi: 10.37936/ecti-cit.200622.53288.
Z. Indra, N. Zamin, and J. Jaafar, “A Language Identifier for Indonesian and Malay Text Document,” p. 5, 2015.
H. Nomoto, S. Akasegawa, and A. Shiohara, “Reclassification of the Leipzig Corpora Collection for Malay and Indonesian,” Research Institute for Languages and Cultures of Asia and Africa, Tokyo University of Foreign Studies, Sep. 30, 2018, doi: 10.15026/92899.
Y. Goldberg, “A Primer on Neural Network Models for Natural Language Processing,” Journal of Artificial Intelligence Research, vol. 57, pp. 345–420, 2016.
A. Massaro, V. Maritati, and A. Galiano, “Automated self-learning Chatbot initially built as a FAQS database information retrieval system: Multi-level and Intelligent Universal Virtual Front-Office Implementing Neural Network,” Informatica (Slovenia), vol. 42, no. 4, pp. 515–525, 2018, doi: 10.31449/inf.v42i3.2173.
A. Massaro, D. Giannone, V. Birardi, and A. M. Galiano, “An innovative approach for the evaluation of the web page impact combining user experience and neural network score,” Future Internet, vol. 13, no. 6, p. 145, 2021, doi: 10.3390/fi13060145.
S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu, and J. Gao, “Deep Learning Based Text Classification: A Comprehensive Review,” ACM Computing Surveys (CSUR), vol. 54, no. 3, pp. 1–40, 2021.
A. Massaro, V. Vitti, A. Galiano, and A. Morelli, “Business Intelligence Improved by Data Mining Algorithms and Big Data Systems: An Overview of Different Tools Applied in Industrial Research,” Computer Science and Information Technology, vol. 7, no. 1, pp. 1–21, 2019, doi: 10.13189/csit.2019.070101.
Y. Li and B. Liu, “A new vector representation of short texts for classification,” International Arab Journal of Information Technology, vol. 17, no. 2, pp. 241–249, 2020, doi: 10.34028/iajit/17/2/12.
E. Tromp and M. Pechenizkiy, “Graph-based N-gram language identification on short texts,” in Proceedings of the 20th Annual Belgian-Dutch Conference on Machine Learning, 2011, pp. 27–34.
P. Gamallo, M. Garcia, S. Sotelo, and J. R. Pichel, “Comparing ranking-based and Naive Bayes approaches to language detection on tweets,” in CEUR Workshop Proceedings, 2014, vol. 1228, pp. 12–16.
A. Jaech, G. Mulcaire, S. Hathi, M. Ostendorf, and N. A. Smith, “Hierarchical Character-Word Models for Language Identification,” in EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the 4th International Workshop on Natural Language Processing for Social Media, SocialNLP 2016, 2016, pp. 84–93. doi: 10.18653/v1/w16-6212.
T. Kocmi and O. Bojar, “LanideNN: Multilingual language identification on character window,” in 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of Conference, 2017, vol. 2, pp. 927–936. doi: 10.18653/v1/e17-1087.
D. Jurgens, Y. Tsvetkov, and D. Jurafsky, “Incorporating dialectal variability for socially equitable language identification,” in ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), 2017, vol. 2, pp. 51–57. doi: 10.18653/v1/P17-2009.
D. Goldhahn, T. Eckart, and U. Quasthoff, “Building large monolingual dictionaries at the Leipzig Corpora Collection: From 100 to 200 languages,” in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, 2012, pp. 759–765.
L. Bottou and C.-J. Lin, “Support Vector Machine Solvers,” in Large-Scale Kernel Machines, MIT Press, 2007, pp. 301–320, doi: 10.7551/mitpress/7496.003.0003.
W. S. Noble, “What is a support vector machine?,” Nature Biotechnology, vol. 24, no. 12, pp. 1565–1567, 2006, doi: 10.1038/nbt1206-1565.
A. M. Kibriya, E. Frank, B. Pfahringer, and G. Holmes, “Multinomial naive bayes for text categorization revisited,” in Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), 2004, vol. 3339, pp. 488–499. doi: 10.1007/978-3-540-30549-1_43.