Word Cloud of UKSW Lecturer Research Competence Based on Google Scholar Data

Suryasatriya Trihandaru, Hanna Arini Parhusip, Bambang Susanto, Carolina Febe Ronicha Putri

DOI: https://doi.org/10.23917/khif.v7i2.13123


Universitas Kristen Satya Wacana (UKSW) needs to identify the research competence of its lecturers at both the study program and the University level. Meeting this requirement calls for a fast, automated analysis of research output and publications. Research articles are scattered across many publisher systems and journals, which may be reputable or non-reputable, and accredited or unaccredited. We developed a program that quickly and efficiently retrieves publication titles recorded in Google Scholar using a machine learning algorithm. The results are displayed as a word cloud, so that dominant and frequent words stand out in the visualization. To determine which scientific terms to display, we used a modified version of the Python wordcloud module and an unmodified Term Frequency-Inverse Document Frequency (TF-IDF) library. The algorithm was tested on the publication titles of our own study program at UKSW, and the results were verified directly. The system can produce a word cloud for an individual lecturer, for all lecturers in a study program, or for the University as a whole. Publication sources have not yet been differentiated by reputation, which might affect the accuracy of competence identification.
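The TF-IDF term weighting and the three aggregation levels described above (individual lecturer, study program, University) can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' code. It makes several assumptions. Titles are taken as already retrieved from Google Scholar, and retrieval is not shown. Each lecturer's concatenated titles are treated as one document. The stopword list is small and hypothetical. A hand-written smoothed TF-IDF, with idf = ln((1 + n) / (1 + df)) + 1, stands in for the library the authors used. Group-level clouds are approximated by summing per-lecturer weights.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list (English plus a few Indonesian function words);
# a real system would need a much fuller list.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with",
             "using", "based", "by", "dan", "di", "yang", "untuk", "pada", "dengan"}


def tokenize(title: str) -> list[str]:
    """Lowercase a title, keep ASCII letter runs, drop stopwords and tokens under 3 chars."""
    words = re.findall(r"[a-z]+", title.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


def tfidf_weights(titles_by_lecturer: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Smoothed TF-IDF where each lecturer's concatenated titles form one document."""
    docs = {name: [tok for t in titles for tok in tokenize(t)]
            for name, titles in titles_by_lecturer.items()}
    n_docs = len(docs)
    # Document frequency: number of lecturers whose titles contain the term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    weights = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        total = sum(tf.values()) or 1  # avoid division by zero for empty documents
        weights[name] = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return weights


def aggregate(weights: dict[str, dict[str, float]], names: list[str]) -> dict[str, float]:
    """Sum per-lecturer weights over a group, e.g. a study program or the University."""
    total = Counter()
    for name in names:
        total.update(weights[name])
    return dict(total)


def top_terms(term_weights: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Return the k heaviest terms, breaking ties alphabetically for stable output."""
    return sorted(term_weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    # Made-up titles for illustration only, not UKSW data.
    titles = {
        "lecturer_a": ["Word cloud visualization of research titles",
                       "Machine learning for word frequency analysis"],
        "lecturer_b": ["Bayesian regression for rainfall data",
                       "Rainfall prediction using machine learning"],
    }
    w = tfidf_weights(titles)
    print([t for t, _ in top_terms(w["lecturer_a"], 3)])  # ['word', 'analysis', 'cloud']
    program = aggregate(w, list(titles))
    print([t for t, _ in top_terms(program, 2)])  # ['rainfall', 'word']
```

Each resulting term-to-weight dictionary has the shape accepted by `WordCloud.generate_from_frequencies` in the Python `wordcloud` package, so rendering the image would be one further call on top of this sketch. Summing weights favors terms that are distinctive for several lecturers at once. Averaging the weights, or recomputing TF-IDF over the whole group, would be equally plausible design choices.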


Keywords: machine learning; word cloud; corpus; research competence

