DATA SCIENCE-DRIVEN APPROACHES TO IDENTIFYING AI-GENERATED CONTENT: MACHINE LEARNING AND NLP MODELS FOR ACADEMIC INTEGRITY AND DIGITAL TRANSPARENCY

Abdullaev Munis Kurbonovich; Kungratov Ilmurod Kuzibay ugli

doi:10.55439/EIT/vol13_iss5/724

Authors

Abdullaev Munis Kurbonovich, Head of the Department of "Industrial Management and Digital Technologies", International Nordic University. https://orcid.org/0000-0003-0290-8453
Kungratov Ilmurod Kuzibay ugli, Master's student (data science) at International Nordic University; scientific journals editorial specialist at Tashkent State University of Economics. https://orcid.org/0009-0008-1397-2905

DOI:

https://doi.org/10.55439/EIT/vol13_iss5/724

Keywords:

artificial intelligence; AI-generated content; machine learning; natural language processing; academic integrity; text detection; large language models; generative AI; stylometric analysis; explainable AI; higher education; content authenticity.

Abstract

In the rapidly evolving digital landscape, the spread of generative artificial intelligence (GenAI) systems and large language models (LLMs) has created serious challenges for academic integrity and content authenticity. This paper proposes a data science-driven framework for identifying AI-generated content in academic and digital environments using machine learning (ML) techniques and natural language processing (NLP) models. First, we survey the current state of AI-generated content detection, reviewing both traditional ML approaches and state-of-the-art transformer-based architectures. We show that although recent systems can achieve high accuracy (e.g., over 90%) in controlled settings, significant limitations remain, especially regarding fairness, generalisability, and bias against non-native English writers. Next, we develop and evaluate a hybrid detection model that combines engineered features (lexical, syntactic, stylometric) with embedding-based representations and a supervised classifier trained on a curated dataset of human-written and AI-generated academic prose. We integrate explainable AI (XAI) techniques to interpret model decisions and identify the most discriminative features distinguishing human from machine authorship. Our results indicate that the proposed model outperforms baseline detectors in both accuracy and transparency, and we further examine its application to institutional workflows for academic integrity, such as submission screening and authenticity audits. Finally, we discuss the ethical, operational, and policy implications of deploying such detection systems in higher-education settings, including false positives, equity, transparency, and the evolving “arms race” between AI generation and detection.
By framing detection as part of a broader ecosystem of digital transparency and trust, this research contributes both methodologically and practically to safeguarding academic standards in the era of AI-augmented writing.
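
To make the feature-engineering and explainability ideas in the abstract more concrete, the following is a minimal sketch, not the authors' model. It assumes a handful of hand-picked stylometric features, a plain logistic regression written from scratch, and a simple weight-times-value attribution as a stand-in for the XAI step. The embedding-based component of the hybrid model is omitted. All function names, feature names, and hyperparameters (stylometric_features, fit_logistic, explain, lr, epochs) are hypothetical and chosen only for illustration.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical feature set; the paper does not list its exact features.
FEATURE_NAMES = ["type_token_ratio", "mean_sentence_len", "sentence_len_std",
                 "mean_word_len", "punct_per_word"]

WORD_RE = r"[A-Za-z']+"


def stylometric_features(text):
    """Return a small lexical/stylometric feature vector for one document."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(WORD_RE, text.lower())
    if not words or not sentences:
        return [0.0] * len(FEATURE_NAMES)
    sent_lens = [len(re.findall(WORD_RE, s)) for s in sentences]
    punct = len(re.findall(r"[,;:()\-]", text))
    return [
        len(set(words)) / len(words),   # lexical diversity
        mean(sent_lens),                # average sentence length in words
        pstdev(sent_lens),              # variation in sentence length
        mean(len(w) for w in words),    # average word length
        punct / len(words),             # internal punctuation per word
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by stochastic gradient descent.

    X is a list of feature vectors, y a list of 0/1 labels
    (here 1 is assumed to mean "AI-generated").
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def explain(w, x):
    """Per-feature contribution to the logit (weight * value), largest first."""
    contribs = [(name, wj * xj) for name, wj, xj in zip(FEATURE_NAMES, w, x)]
    return sorted(contribs, key=lambda c: abs(c[1]), reverse=True)
```

In use, one would compute stylometric_features for each labelled document, pass the resulting vectors and labels to fit_logistic, and call explain on a new document's vector to see which features pushed the score towards either class. In practice the features would likely need scaling before training, and a linear attribution of this kind is only a rough proxy for dedicated XAI methods.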

