Statistical Profiling of Banking Data for Anomaly Identification: An Applied Study Using a Synthetic Banking Database
Abstract
This study examines whether statistical analysis can meaningfully distinguish anomalous from non-anomalous records in an openly available banking database. The dataset contains 5,000 observations and 40 variables covering customer, account, transaction, loan, credit-card, and feedback information. For analysis, the original anomaly label was recoded so that 300 records (6.0%) were treated as anomalous and 4,700 records (94.0%) as normal. The empirical design combines descriptive statistics, Mann-Whitney U tests, chi-square association tests, and a logistic-regression benchmark evaluated on a hold-out sample. The results show that the major numerical variables do not differ significantly between the two classes, and that only account type and resolution status exhibit statistically significant, but substantively small, associations with the anomaly label. The benchmark classifier performs weakly (ROC-AUC = 0.490, roughly chance level; average precision = 0.062, close to the 6.0% base rate), indicating that the available variables provide limited discriminatory signal. The study therefore contributes less as an operational fraud-detection model and more as a transparent methodological illustration of how class imbalance, weak labels, and synthetic feature spaces can constrain inference. The present version also expands the literature review with pre-2021 sources, introduces a comparison matrix against prior studies, and adds statistical tables and figures derived from the dataset.
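The empirical design summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the data are randomly generated stand-ins with the same size and class balance (5,000 records, 300 anomalous), the variable names `amount` and `account_type` are hypothetical, and the use of Cramér's V as an effect size, a 30% stratified hold-out, and balanced class weights are illustrative choices that the abstract does not specify.

```python
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Hypothetical stand-in data: 300 anomalous labels (6.0%) placed at random,
# so the features carry no real signal by construction.
y = np.zeros(n, dtype=int)
y[rng.choice(n, size=300, replace=False)] = 1
amount = rng.lognormal(mean=6.0, sigma=1.0, size=n)  # hypothetical numeric variable
account_type = rng.integers(0, 3, size=n)            # hypothetical categorical variable

# Mann-Whitney U test: does the numeric variable differ between the two classes?
u_stat, u_p = mannwhitneyu(amount[y == 1], amount[y == 0], alternative="two-sided")

# Chi-square test of association on a 3 x 2 contingency table (category x class).
table = np.array(
    [[np.sum((account_type == k) & (y == c)) for c in (0, 1)] for k in range(3)]
)
chi2, chi_p, dof, _ = chi2_contingency(table)
# Cramér's V (assumed effect-size measure): small values mean a weak association
# even when the p-value is below the significance threshold.
cramers_v = np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1)))

# Logistic-regression benchmark on a stratified hold-out sample.
one_hot = np.eye(3)[account_type][:, 1:]  # drop the first level as reference
X = np.column_stack([np.log(amount), one_hot])
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]
roc_auc = roc_auc_score(y_te, scores)
avg_prec = average_precision_score(y_te, scores)

print(f"Mann-Whitney U p-value: {u_p:.3f}")
print(f"Chi-square p-value: {chi_p:.3f}, Cramér's V: {cramers_v:.3f}")
print(f"ROC-AUC: {roc_auc:.3f}, average precision: {avg_prec:.3f}")
```

When reading such output, note that an uninformative scorer has a ROC-AUC near 0.5 and an average precision near the positive-class prevalence (0.06 here), which is the yardstick against which the reported 0.490 and 0.062 should be judged.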
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.