Combination of Graph-based Approach and Sequential Pattern Mining for Extractive Text Summarization with Indonesian Language

Dian Sa'adillah Maylawati (1*); Yogan Jaya Kumar (2); Fauziah Binti Kasmin (3)

(1) Centre for Advanced Computing Technology, Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Malaysia and Department of Informatics, UIN Sunan Gunung Djati Bandung, Indonesia
(2) Centre for Advanced Computing Technology, Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Malaysia
(3) Centre for Advanced Computing Technology, Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Malaysia
(*) Corresponding Author
DOI: https://doi.org/10.23917/khif.v9i2.21495

Abstract

A major challenge in Indonesian automatic text summarization research is producing readable summaries. A high-quality summary can be achieved if the meaning of the source text is properly preserved. Therefore, this study aims to improve the quality of extractive Indonesian automatic text summarization by focusing on the quality of the structured text representation. It employs Sequential Pattern Mining (SPM) to generate sequences of words as a structured representation of the text, and a graph-based approach to generate the summary. The SPM algorithm used is PrefixSpan, and the graph-based approach uses the Bellman-Ford algorithm. Experiments on the IndoSum dataset show that combining SPM with Bellman-Ford improves the precision, recall, and F-measure of ROUGE-1, ROUGE-2, and ROUGE-L. Compared with Bellman-Ford alone, combining it with SPM raises the ROUGE-1 F-measure from 0.2299 to 0.3342, the ROUGE-2 F-measure from 0.1342 to 0.2191, and the ROUGE-L F-measure from 0.1904 to 0.2878. These results show that SPM can improve the performance of the Bellman-Ford algorithm in producing Indonesian text summaries.
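The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that each sentence of a document is one sequence for PrefixSpan, that each sentence is represented by the set of frequent word sequences it contains, that sentences form a complete directed graph whose edge weight is 1 minus the Jaccard similarity of those sets, and that the summary keeps the `k` sentences with the smallest Bellman-Ford distance from the first sentence. This graph construction, the choice of source node, the parameters `min_support`, `max_len`, and `k`, and all function names (`tokenize`, `prefixspan`, `bellman_ford`, `contains`, `summarize`, `rouge_n`) are hypothetical. `rouge_n` follows the standard clipped n-gram overlap definition of ROUGE-N; ROUGE-L, which is based on the longest common subsequence, is not shown.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase and keep word characters only; no stemming or stopword removal.
    return re.findall(r"\w+", sentence.lower())


def prefixspan(db, min_support, max_len=3):
    """PrefixSpan restricted to sequences of single words (gaps allowed)."""
    patterns = {}

    def mine(prefix, projected):
        # One (sequence index, suffix start) pair per sequence, so counts are supports.
        counts = Counter()
        for seq_id, start in projected:
            counts.update(set(db[seq_id][start:]))
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            patterns[pattern] = support
            if len(pattern) < max_len:
                # Project each sequence onto the suffix after the first match of item.
                next_projected = []
                for seq_id, start in projected:
                    suffix = db[seq_id][start:]
                    if item in suffix:
                        next_projected.append((seq_id, start + suffix.index(item) + 1))
                mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(db))])
    return patterns


def bellman_ford(n, edges, source):
    """Shortest distances from source; edges are (u, v, weight) triples."""
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(n - 1):  # relax every edge at most n - 1 times
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    for u, v, w in edges:  # any further improvement means a negative cycle
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist


def contains(tokens, pattern):
    # True if pattern occurs in tokens as a (not necessarily contiguous) subsequence.
    it = iter(tokens)
    return all(word in it for word in pattern)


def summarize(sentences, k=2, min_support=2):
    # Hypothetical pipeline: PrefixSpan representation -> sentence graph -> Bellman-Ford.
    tokens = [tokenize(s) for s in sentences]
    frequent = prefixspan(tokens, min_support)
    reps = [{p for p in frequent if contains(t, p)} for t in tokens]
    n = len(sentences)
    edges = []
    for i in range(n):
        for j in range(n):
            if i != j:
                union = reps[i] | reps[j]
                sim = len(reps[i] & reps[j]) / len(union) if union else 0.0
                edges.append((i, j, 1.0 - sim))  # similar sentences are "close"
    dist = bellman_ford(n, edges, source=0)
    # Keep the k sentences closest to the first one, in original document order.
    chosen = sorted(sorted(range(n), key=lambda i: dist[i])[:k])
    return [sentences[i] for i in chosen]


def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F-measure from clipped n-gram overlap."""
    def grams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = grams(candidate), grams(reference)
    overlap = sum((cand & ref).values())
    p = overlap / sum(cand.values()) if cand else 0.0
    r = overlap / sum(ref.values()) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    doc = ["Algoritma Bellman-Ford mencari jalur terpendek.",
           "Jalur terpendek dihitung oleh algoritma Bellman-Ford.",
           "Cuaca hari ini cerah.",
           "Algoritma PrefixSpan menambang pola sekuensial."]
    print(summarize(doc, k=2))  # keeps the first two sentences
```

Because every edge weight in this sketch is non-negative, Dijkstra's algorithm would return the same distances; Bellman-Ford is used here to match the abstract, and it would also handle negative weights if a different weighting scheme produced them.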

Keywords

automatic text summarization; Bellman-Ford algorithm; graph-based approach; Indonesian language; PrefixSpan; sequence of words; sequential pattern mining

This work is licensed under a Creative Commons Attribution 4.0 International License.