THE MAIN WAYS TO IDENTIFY AI-GENERATED DATA IN ACADEMIC AND ONLINE PUBLICATIONS USING MACHINE LEARNING AND NLP MODELS

Authors

  • Abdullaev Munis Kurbonovich, PhD, Associate Professor, Head of the Department of "Industrial Management and Digital Technologies", International Nordic University; Independent Researcher, TSUE. https://orcid.org/0000-0003-0290-8453
  • Kungratov Ilmurod Kuzibay ugli, Master's student (data science), International Nordic University; scientific journals editorial specialist, TSUE. https://orcid.org/0009-0008-1397-2905

DOI:

https://doi.org/10.55439/EIT/vol14_iss2/816

Keywords:

Synthetic text detection, generative architecture, natural language processing, transformer, academic integrity, stylometry, verification, probability, classifier.

Abstract

This article analyzes contemporary methods for identifying text generated by artificial intelligence (AI) in academic and online publications. The rapid maturation of generative language architectures has changed how information is synthesized and has made it considerably harder to determine the true origin of authorship. The primary objective of the research is to assess how effectively synthetic texts can be identified in practice using Natural Language Processing (NLP) techniques together with Machine Learning (ML) models. The study comparatively evaluates stylometric analysis, "zero-shot" probabilistic models, and transformer-based deep learning architectures such as RoBERTa. The findings indicate that traditional plagiarism-detection frameworks are systematically ineffective against synthetic text and support the high accuracy of hybrid detection models. In particular, ensemble approaches are shown to remain reliable under complex algorithmic attacks and automated text paraphrasing. The conclusions yield practical recommendations for higher education institutions and the editorial boards of academic journals.
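As a rough illustration of two ideas named in the abstract, stylometric analysis and ensemble (hybrid) scoring, the following minimal sketch may help. It is not the authors' implementation: the feature set, the function names `stylometric_features` and `ensemble_score`, and the detector names and weights in the usage note are all assumptions chosen for illustration.

```python
import re
import statistics


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few simple stylometric features (an illustrative choice)."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased alphabetic tokens (apostrophes kept for contractions).
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Number of word tokens in each sentence.
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return {
        # Lexical diversity: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        "mean_sentence_length": statistics.fmean(lengths) if lengths else 0.0,
        # Population standard deviation of sentence length.
        "sentence_length_stdev": (
            statistics.pstdev(lengths) if len(lengths) > 1 else 0.0
        ),
    }


def ensemble_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-detector probabilities, each assumed in [0, 1]."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total
```

In use, the features would feed a stylometric classifier, and its output would be combined with the other detectors' outputs. The three scores below are hypothetical placeholders, not measured results:

```python
sample = "Short one. This sentence is noticeably longer than the first one. Tiny."
features = stylometric_features(sample)

# Hypothetical probabilities that a text is synthetic, one per detector.
scores = {"stylometry": 0.6, "zero_shot": 0.8, "classifier": 0.9}
weights = {"stylometry": 1.0, "zero_shot": 1.0, "classifier": 2.0}
combined = ensemble_score(scores, weights)
```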

Published

2026-04-30

How to Cite

Abdullaev Munis Kurbonovich, & Kungratov Ilmurod Kuzibay ugli. (2026). THE MAIN WAYS TO IDENTIFY AI-GENERATED DATA IN ACADEMIC AND ONLINE PUBLICATIONS USING MACHINE LEARNING AND NLP MODELS. Economics and Innovative Technologies, 14(2), 116–124. https://doi.org/10.55439/EIT/vol14_iss2/816

Issue

Section

Application of information and communication technologies in the branches and sectors of the national economy