Raphaël Jeanningros¹ and Sonam Mittal², ¹Department of Computer Engineering, CY Tech, Pau, FRANCE; ²Department of Information Technology, BK Birla Institute of Engineering and Technology, Pilani, INDIA
In recent years, large language models (LLMs) have become increasingly sophisticated and can generate text that is difficult to distinguish from human-written text. As a result, new detectors are created every day to keep pace with the rapid evolution of LLMs. To date, most artificial intelligence (AI) text identification systems are based on the BERT model, although some systems still rely on the TF-IDF model. This study therefore has three goals: first, to understand the differences between AI-generated and human-written content; second, to build one AI text identification system based on the BERT model and another based on the TF-IDF model; and finally, to analyze the results of each system and determine which is the most efficient and accurate.
AI-generated content detection, Large Language Models, BERT, TF-IDF.
Copyright © NLPSIG 2024