Performance Analysis of Isolation Forest Algorithm in Fraud Detection of Credit Card Transactions

Indra Waspada(1*), Nurdin Bahtiar(2), Panji Wisnu Wirawan(3), Bagus Dwi Ari Awan(4)

(1) 
(2) Universitas Diponegoro
(3) Universitas Diponegoro
(4) Universitas Diponegoro
(*) Corresponding Author
DOI: https://doi.org/10.23917/khif.v6i2.10520

Abstract

Losses caused by fraud in e-commerce transactions, especially credit card transactions, continue to increase and amount to large sums every year. One way to reduce the risk of fraudulent credit card transactions is to apply a detection technique to transactions as they occur. This is challenging because raw credit card transaction data is unlabeled, fraudulent transactions make up a very small fraction of the training data (so the classes are highly imbalanced), and fraud patterns keep changing. Isolation Forest is an unsupervised algorithm that detects anomalies efficiently, and several techniques can be applied to improve the performance of an Isolation Forest model. Previous studies analyzed Isolation Forest performance with the ROC-AUC metric, which can give misleading information. This study makes two contributions. The first is a performance analysis using both ROC-AUC and the area under the precision-recall curve (AUCPR), which shows that a high ROC-AUC value does not guarantee that the model detects fraud reliably, whereas AUCPR better describes the model's ability to capture fraudulent transactions. The second is a set of techniques for improving the performance of the Isolation Forest model: optimizing the amount of training data, selecting features, adjusting the amount of fraud contamination, and tuning hyper-parameters in the modeling (training) stage. Experiments were carried out on a real-world dataset from ULB. The best results were obtained with a validation data split ratio of 60:40, the five most important features, only 60% of the fraud data, and hyper-parameters of 100 trees, a maximum sample size of 128, and a contamination of 0.001.
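To illustrate why a high ROC-AUC does not by itself indicate reliable fraud capture, the sketch below computes both metrics for a small set of hypothetical anomaly scores (not the ULB data): 1,000 legitimate transactions and 5 frauds. AUCPR is estimated here as average precision (the mean precision at the rank of each fraud), which is one common estimator; the abstract does not state which AUCPR estimator the study used. The scores, data, and helper names (`roc_auc`, `average_precision`) are assumptions made for illustration only.

```python
# Toy illustration (hypothetical scores, not the ULB dataset): a detector can
# reach a high ROC-AUC on heavily imbalanced data while its AUCPR stays low.

def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random fraud outscores a random
    legitimate transaction (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUCPR estimated as average precision: the mean of the precision
    measured at the rank of each fraud. Assumes no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    tp = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            precision_sum += tp / rank
    return precision_sum / n_pos


# Hypothetical anomaly scores (higher = more suspicious):
# 1,000 legitimate transactions spread evenly over [0, 1) and 5 frauds
# (about 0.5% of all transactions) that each outscore at least 95% of
# the legitimate ones.
legit_scores = [i / 1000 for i in range(1000)]
fraud_scores = [0.9505, 0.9605, 0.9705, 0.9805, 0.9905]
scores = legit_scores + fraud_scores
labels = [0] * len(legit_scores) + [1] * len(fraud_scores)

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"ROC-AUC = {auc:.3f}")
print(f"AUCPR (average precision) = {ap:.3f}")
```

This prints a ROC-AUC of 0.971 but an AUCPR of only about 0.095: every fraud ranks above at least 95% of the legitimate transactions, yet between 9 and 49 legitimate transactions still rank above each fraud, so precision at each fraud's rank never exceeds 10%. If the model were built with scikit-learn, the reported configuration would most naturally map to `IsolationForest(n_estimators=100, max_samples=128, contamination=0.001)`, and because `score_samples` returns lower values for more anomalous points, its negation could serve as `scores`; this mapping is an assumption, since the abstract does not name the implementation.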
With the best configuration, the model achieves a validation precision of 0.809917, recall of 0.710145, F1-score of 0.756757, ROC-AUC of 0.969728, and AUCPR of 0.637993. On the test data, it achieves a precision of 0.807143, recall of 0.763514, F1-score of 0.784722, ROC-AUC of 0.97371, and AUCPR of 0.759228.
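As a consistency check, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes F1 from the reported precision and recall values; the helper name `f1_score` and the `REPORTED` table are illustrative only. Because the inputs are themselves rounded to six decimals, the recomputed values may differ from the reported ones in the sixth decimal place.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) exactly as given in the abstract.
REPORTED = {
    "validation": (0.809917, 0.710145, 0.756757),
    "test": (0.807143, 0.763514, 0.784722),
}

recomputed = {}
for split, (p, r, f1_reported) in REPORTED.items():
    recomputed[split] = f1_score(p, r)
    print(f"{split}: recomputed F1 = {recomputed[split]:.6f}, "
          f"reported F1 = {f1_reported:.6f}")
```

Both recomputed values agree with the reported F1-scores to within 1e-6, so the reported precision, recall, and F1 figures are mutually consistent.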

Keywords

credit card; fraud; Isolation Forest; unsupervised; precision; recall; ROC-AUC; AUCPR
