Writer Identification of Lampung Handwritten Documents Based on Selected Characters

Akmal Junaidi, Syifa Trianingsih, Muhammad Iqbal

DOI: https://doi.org/10.23917/khif.v6i1.8418

Abstract

Writer identification is a sub-field of handwriting recognition whose objective is to determine the identity of a writer from a handwriting sample. It is typically used for forensic purposes, such as finding the perpetrators of crimes who leave evidence in the form of written messages. Writer identification can also be used to determine the identity of a historical figure who left behind a valuable written artefact. The object of this research is the traditional script of the Lampung region, known as Had Lampung by the local community. The script consists of 20 main characters and 12 diacritics. Writers are identified from selected characters using features derived from Principal Component Analysis (PCA), a linear feature extraction method used in pattern recognition. The PCA algorithm consists of several stages: calculating the mean of the dataset, subtracting the mean from each data vector, calculating the covariance, calculating the eigenvectors and eigenvalues, reducing the set of eigenvectors, and projecting the dataset onto the reduced eigenvector space. In this paper, the PCA projection serves as the feature for image recognition. The dataset used in this study is the Lampung Dataset, a handwritten character recognition (HWCR) dataset consisting of 82 Lampung handwritten documents. All Lampung character images in the dataset were extracted from these documents using a connected component extraction algorithm, which yielded 32,140 images. These images were then converted to grayscale. Of these, 12,500 grayscale images of Lampung handwritten characters were chosen to represent 82 different writers. This data was used as training and testing data for the proposed method. The highest writer identification accuracy obtained with the PCA feature is 82.92%, while the lowest is 28.29%.
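The six PCA stages listed in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `pca_fit` and `pca_project`, the 8x8 image size, the number of retained components, and the random stand-in data are all assumptions made for the example.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on X, an (n_samples, n_features) matrix of flattened grayscale images."""
    # Stage 1: mean of the dataset
    mean = X.mean(axis=0)
    # Stage 2: subtract the mean from every data vector
    centered = X - mean
    # Stage 3: covariance matrix of the features
    cov = np.cov(centered, rowvar=False)
    # Stage 4: eigenvalues and eigenvectors (eigh, because cov is symmetric)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Stage 5: reduction, keeping the eigenvectors with the largest eigenvalues
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return mean, eigenvectors[:, order], eigenvalues[order]

def pca_project(X, mean, components):
    # Stage 6: project the centered data onto the reduced eigenvector space
    return (X - mean) @ components

# Hypothetical stand-in data: 200 random "images" of 8x8 pixels, flattened.
rng = np.random.default_rng(0)
images = rng.random((200, 64))

mean, components, variances = pca_fit(images, n_components=10)
features = pca_project(images, mean, components)
print(features.shape)  # (200, 10)
```

Each row of `features` is the reduced representation of one character image. The abstract does not name the classifier that consumes these features, so none is shown here.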

Keywords

Lampung Script; Writer Identification; Principal Component Analysis; Lampung Dataset
