Statistical Profiling of Banking Data for Anomaly Identification: An Applied Study Using a Synthetic Banking Database

Djina Ivanovic; Evans Khoza

doi:10.48161/qelj.v1n2a3

Published: 2022-06-30

Keywords:

Anomaly Detection, Statistical Analysis, Fraud Analytics, Class Imbalance, Synthetic Data

Djina Ivanovic

Digital Economics Department, Institute of Economic Sciences, Belgrade, Serbia

Evans Khoza

Business Management, Cape Peninsula University of Technology, Cape Town, South Africa

Abstract

This study examines whether statistical analysis can meaningfully distinguish anomalous from non-anomalous records in an openly available synthetic banking database. The dataset contains 5,000 observations and 40 variables covering customer, account, transaction, loan, credit-card, and feedback information. For analysis, the original anomaly label was recoded so that 300 records (6.0%) were treated as anomalous and 4,700 records (94.0%) as normal. The empirical design combines descriptive statistics, Mann-Whitney U tests, chi-square association tests, and a logistic-regression benchmark evaluated on a hold-out sample. The results show that the major numerical variables do not differ significantly between the two classes, and that only account type and resolution status exhibit statistically significant, though substantively small, associations with the anomaly label. The benchmark classifier performs weakly (ROC-AUC = 0.490; average precision = 0.062), which is close to chance level for a 6.0% anomaly rate and indicates that the available variables provide limited discriminatory signal. The study therefore contributes less as an operational fraud-detection model and more as a transparent methodological illustration of how class imbalance, weak labels, and synthetic feature spaces can constrain inference. This version of the manuscript also expands the literature review with pre-2021 sources, introduces a comparison matrix against prior studies, and adds statistical tables and figures derived from the dataset.
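The reported benchmark figures are easier to read against chance-level baselines. The following is a minimal sketch, not the study's code: it assumes only the class sizes stated in the abstract (300 anomalous, 4,700 normal) and assigns purely random scores, so it neither uses the dataset nor fits a logistic regression. The function names `roc_auc` and `average_precision` are illustrative. ROC-AUC is computed from the Mann-Whitney U statistic, which shows how the rank test and the classifier metric are related.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC via the Mann-Whitney U statistic: U / (n_pos * n_neg)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores and give each member the midrank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based midrank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean precision at each positive, ranked by descending score.

    Assumes at least one positive label and no tied scores.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Assumption: random scores stand in for a model with no signal.
rng = random.Random(0)
labels = [1] * 300 + [0] * 4700
scores = [rng.random() for _ in labels]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)

print(f"prevalence        = {prevalence:.3f}")
print(f"random ROC-AUC    = {auc:.3f}")
print(f"random avg. prec. = {ap:.3f}")
```

Under these assumptions, an uninformative scorer should land near ROC-AUC 0.5 and near an average precision equal to the anomaly rate (0.06), which is the reference point for the reported 0.490 and 0.062.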

Issue

Vol. 1 No. 2 (2022)

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
