Our off-the-shelf datasets cover 800 TB of image and video data, 200,000 hours of speech data, and 2 billion pieces of text data, all ready to go.

Speech Recognition Datasets Computer Vision Datasets

200,000 hours of speech recognition datasets, recorded on a variety of professional equipment and covering diverse scenarios and multiple languages.

Data Collection

A professional data collection team with extensive collection experience and quality proven by the world’s leading AI companies.

Data Annotation

We have 3 data processing centers, 5,000 experienced annotators, and extensive project experience across point cloud, image, video, speech, and text data.

Our Specialties

Off-the-Shelf Datasets

We offer a large volume of datasets covering fields such as computer vision, speech recognition, and NLP. All datasets have clear copyright.


Our “human-in-the-loop” intelligent data labeling technology runs semi-automatic labeling pipelines based on human-machine interaction and improves efficiency by up to 3-4 times. It has been successfully applied to nearly 5,000 projects.


As the world’s leading AI data service provider, we have provided work opportunities for over 80,000 people from more than 50 countries and regions.

Data Security

Our data labeling platform supports customizable annotation templates and built-in automatic labeling tools. It is designed to meet all types of annotation requirements.

Our Customers

Security and Compliance

Datatang has supported us in various CV and speech recognition research projects for years. We truly appreciate the prompt turnaround, the strong skills in managing parallel projects, and the high-quality data that Datatang has provided over the years.

We’re making considerable progress with our algorithmic development thanks to Datatang’s ready-to-go datasets, which really helped us catch up on the project. I would recommend Datatang’s datasets and service to anyone who needs reliable training data.

Training data is a very important component of ML development, but data labeling is quite labor-intensive. With Datatang’s well-designed platform, annotation service, and extraordinary project management, we are able to put more focus on improving algorithms and doing what we are good at.

Contact Us

Leave your e-mail and we will get in touch with you soon.
Sharpen Your AI with Better Data

Terms Privacy 2021 Datatang. All Rights Reserved. Legal statement and privacy policy