R&D Resources

Here is the list of datasets and resources shared publicly by TurkAI.

Multi-Purpose Language Understanding (ÇADA) Dataset

The Multi-Purpose Language Understanding (ÇADA) dataset was developed to evaluate the performance of Turkish artificial intelligence (AI) models. It is a comprehensive dataset designed to measure how well Turkish natural language processing (NLP) models perform across a variety of tasks.

Access to Repository
* Requires a HuggingFace account.

TVoice Dataset

TVoice is a dataset of Turkish audio clips curated for training Turkish speech-to-text (STT) models. It places special emphasis on regional accents and dialects from various parts of Türkiye, including the Doğu (Eastern), Ege (Aegean), and Kuzey Doğu (Northeastern) regions. By providing this linguistic diversity, TVoice aims to help STT models understand and transcribe speech more accurately across local dialects and accents.

Access Request Form
* After you submit the form, you will be granted access to the HuggingFace repository.

** Requires a HuggingFace account.

AIRVIC: AI Recognition of Viral CPE

AIRVIC is an artificial intelligence model designed to detect virus-induced cytopathic effects (CPE). These cellular changes, caused by various viruses, are difficult to distinguish visually and require advanced methods for detection. AIRVIC uses deep learning to accurately identify CPEs and to distinguish them from other cellular changes, such as those caused by aging.

Access to Application
* Use AIRVIC by signing in with your Google account.

** Free for non-commercial use.