Authors
Meric Demirors, Ahmet Murat Ozbayoglu and Toygar Akgun, TOBB University of Economics and Technology, Turkey
Abstract
Advances in generative AI have made it extremely difficult to distinguish artificially generated content from real content, and reliable detection of such content has become increasingly important. This research addresses the detection of speech generated by future models in unknown languages, and asks: "What information does a model use to distinguish fake audio from real audio? Does it learn how languages sound, or a specific trait of generated speech?" Multiple models are trained on various datasets to detect synthetic audio signals produced by generative AI models. The best accuracy scores for the different test sessions are 94.92% for a known language from an unknown model, 98.44% for an unknown language from a known model, and 95.18% for an unknown language from an unknown model.
Keywords
CNN, Bispectrum