English speech datasets include American English, British English, and accented English from native speakers of Chinese, German, French, Japanese, Spanish, etc.
European-language speech datasets include German, French, Russian, Spanish, and Italian speech datasets, among others. Each dataset is collected from native speakers of the language.
Chinese Mandarin datasets are recorded by local Chinese speakers and include standard and accented Mandarin speech data as well as Chinese-English code-switching speech data. They cover different scenarios, such as car infotainment, wake words, and noise, and speakers of different ages.
Chinese dialect speech datasets include speech data from local speakers in major dialect regions, such as the Cantonese, Shanghai dialect, Minnan dialect, Taiwan Mandarin, Kunming dialect, Wuhan dialect, Changsha dialect, and Sichuan dialect datasets.
Asian-language speech datasets include Hindi, Japanese, Korean, Vietnamese, Malay, and Thai speech datasets, among others. Each dataset is collected from native speakers of the language.
Datatang also provides off-the-shelf Portuguese (Brazilian) and Hebrew (Israeli) speech datasets.
Supports requirements for different languages, speaking speeds, recording environments, recording devices, and ages.
In-car, home, meeting room, outdoor, and recording room.
Supports children's speech collection across different nationalities, genders, languages, and age groups.
Supports annotation of speaker attributes, audio attributes, and noise, as well as speech transcription.
Supports annotation that splits audio into multiple paragraphs.
Marks timestamps, voice validity, noise attributes, etc.
Datatang independently developed its own private annotation platform to fully ensure the safety of our clients' data. The platform contains hundreds of annotation tool modules, supported by our own service process system. The ultimate goal is to meet the highly personalized needs of our clients.
Speech Data Solutions