Speech Data Solutions

Datatang offers 60,000 hours of off-the-shelf speech datasets, including English speech datasets in different accents, as well as European-language, Mandarin Chinese, Chinese-dialect, Asian-language, and other-language speech datasets. These high-quality training datasets can support clients' TTS and ASR solutions. In addition to off-the-shelf speech datasets, Datatang also provides speech data collection and annotation services to meet different requirements.

10,000 Hours of English Speech Datasets (Different Accents)

English speech datasets include American English, British English, and accented English from native speakers of Chinese, German, French, Japanese, Spanish, etc.

10,000 Hours of European Languages Speech Datasets

European languages speech datasets include German, French, Russian, Spanish, and Italian speech datasets, among others. All datasets are collected from native speakers of each language.

20,000 Hours of Mandarin Speech Datasets

Chinese Mandarin datasets are recorded by local Chinese speakers and include standard and accented Mandarin speech data as well as Chinese-English code-switching speech data. They cover different age groups and different scenarios, such as car infotainment datasets, wake word datasets, and noise datasets.

10,000 Hours of Chinese Dialects Speech Datasets

Chinese dialects speech datasets include speech data from local speakers in the major dialect regions, such as Cantonese, Shanghai dialect, Minnan dialect, Taiwan Mandarin, Kunming dialect, Wuhan dialect, Changsha dialect, and Sichuan dialect datasets.

6,000 Hours of Asian Languages Speech Datasets

Asian languages speech datasets include Hindi, Japanese, Korean, Vietnamese, Malay, and Thai speech datasets, among others. All datasets are collected from native speakers of each language.

2,500 Hours of Speech Datasets in Other Languages

Datatang also provides off-the-shelf Portuguese (Brazilian) and Hebrew (Israeli) speech datasets.

Customized Speech Data Solution

Get a Customized Solution and Quote within 1 Hour from Your Dedicated Account Manager

Supports requirements for different languages, speeds, recording environments, recording devices, and ages.

In-car, home, meeting room, outdoor, and recording room environments.

Supports children's speech collection across different nationalities, genders, languages, and age groups.

Supports annotation of speaker attributes, audio attributes, and noise, as well as speech transcription.

Supports annotation for splitting audio into multiple segments.

Marks timestamps, voice validity, noise attributes, etc.

Data Annotation Platform


Datatang independently developed its own private annotation platform to protect the security of our clients' data. The platform contains hundreds of annotation tool modules, supported by our own service process system. The ultimate goal is to meet our clients' highly personalized needs.

Terms Privacy Datatang. All Rights Reserved. Legal statement and privacy policy
