R&D Resources
Here is the list of datasets and resources shared publicly by TurkAI.
Multi-Purpose Language Understanding (ÇADA) Dataset
The Multi-Purpose Language Understanding (ÇADA) dataset was developed to evaluate the performance of Turkish artificial intelligence (AI) models. It is a comprehensive dataset designed to measure how well Turkish natural language processing (NLP) models perform across a variety of tasks.
TVoice Dataset
TVoice is a dataset of Turkish audio clips curated for training Turkish speech-to-text (STT) models. It places special emphasis on regional accents and dialects from various parts of Turkey, including the Doğu (Eastern), Ege (Aegean), and Kuzey Doğu (Northeastern) regions. By providing this linguistic diversity, TVoice aims to make STT models more accurate and versatile at understanding and transcribing speech across local dialects and accents.
** Requires a Hugging Face account.
AIRVIC: AI Recognition of Viral CPE
AIRVIC is an artificial intelligence model designed to detect virus-induced cytopathic effects (CPE). These cellular changes, caused by various viruses, are visually indistinguishable and require advanced methods for detection. AIRVIC uses deep learning to identify CPE accurately and to distinguish it from other cellular changes, such as aging.
** Free for non-commercial use.