XBRL Open Information Model for Risk Based Tax Audit using Machine Learning

Bagas Dwi Suryo Wibowo, University of Glasgow
Keywords: Audit-Selection, Assets, Current-Ratio, Financial-Risk, Machine-Learning, Open Information-Model, Risk, Risk-Based, Risk-Scoring, Rule-Based, Standard-Industry-Classification, Tax-Audit, XBRL


Tax audit is an effective instrument for preserving tax
compliance, and risk-based tax audit selection can optimize
it. Risk-based tax audit selection focuses audits on wealthy
taxpayers with high financial risk. However, manually
selecting audit targets from the plethora of taxpayer data is
difficult, prone to human error, costly, and time-consuming.
Fortunately, the eXtensible Business Reporting Language
(XBRL), a well-known financial statement reporting
standard, enables automation. This project proposes
software named XAFR as a model for extracting,
transforming, and loading the US-SEC dataset in the latest
XBRL Open Information Model (OIM) 1.0 standard and
providing it as a data source for risk classification using
rule-based risk scoring and Machine Learning. Thorough
testing showed the Random Forest classifier to be the best
model for Machine Learning risk classification, with high
accuracy. This result shows how well the rule-based risk
scoring approach works together with Machine Learning for
risk classification, and it underlines the importance of XBRL
as a transparent but robust reporting standard that tax
authorities can utilize. The integrated system can expose
wealthy high-risk taxpayers and high-risk industries and
predict risk classification based on two-year financial
statements. Moreover, this report introduces the critical
importance of RCA (Risk, Current Ratio, Assets) analysis and
SIC (Standard Industry Classification) utilization to generate
risk classification, rank, and explanation. This project uses
financial indicators from a limited number of years and leaves
semantic analysis for future work because of time and
hardware limitations. Predicting possible tax debt is a
promising direction for future Machine Learning
development.
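
As a rough illustration of the rule-based side of RCA analysis, the sketch below labels taxpayers by current ratio and ranks them for audit by risk and then by total assets. The abstract does not state the actual scoring rules, so the `Filing` fields, the threshold of 1.0, the two-level label, and the sample figures are all hypothetical assumptions, not the XAFR implementation.

```python
from dataclasses import dataclass


@dataclass
class Filing:
    """Hypothetical, simplified view of one taxpayer's financial statement."""
    name: str
    sic: str                    # industry classification code
    total_assets: float
    current_assets: float
    current_liabilities: float


def current_ratio(f: Filing) -> float:
    """Current ratio = current assets / current liabilities."""
    if f.current_liabilities == 0:
        return float("inf")     # no short-term obligations to cover
    return f.current_assets / f.current_liabilities


def risk_label(f: Filing, ratio_threshold: float = 1.0) -> str:
    """Assumed rule: a current ratio below the threshold counts as 'high' risk."""
    return "high" if current_ratio(f) < ratio_threshold else "low"


def rank_for_audit(filings):
    """High-risk taxpayers first; within each group, largest total assets first."""
    return sorted(
        filings,
        key=lambda f: (risk_label(f) != "high", -f.total_assets),
    )


# Made-up sample data for illustration only.
filings = [
    Filing("A", "1000", 500.0, 100.0, 200.0),
    Filing("B", "2000", 900.0, 300.0, 100.0),
    Filing("C", "1000", 800.0, 50.0, 100.0),
]
ranked = [f.name for f in rank_for_audit(filings)]
```

Labels produced by rules of this kind could then serve as training targets for a classifier such as Random Forest, which is the combination the abstract describes.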


How to Cite
B. Suryo Wibowo, “XBRL Open Information Model for Risk Based Tax Audit using Machine Learning”, INJIISCOM, vol. 3, no. 1, pp. 19-44, Apr. 2022.