Computer vision

Speech Recognition

Dataset Name	Product type	Capture Content	Data Size	Use of Data
1,000 people, Multiple Races,7 types of facial emotion recognition data	Image	Each person has seven expressions collected: normal, happy, surprised, sad, angry, disgusted, and fearful.	1,000 people	Facial expression recognition
3,000 Images-Human Face Segmentation Data	Image	Segmentation annotation of human face, facial features, body and	3,000 images	Face segmentation
3,000 Images of 106 Facial Landmarks Annotation Data (complex scenarios)	Image	9 facial attributes and 106 facial landmarks. The 9 facial attributes are gender, age, race, wearing cap/hat or not, wearing glasses or not, background, face orientation, eye status, and mouth status.	3,000 images	Facial landmark localization, Face recognition
100 People-Driver Behavior Collection Data	Image	Dangerous driving behavior, fatigue driving behavior, visual movement behavior	100 people	Driver behavior Detection
100 People-Liveness Detection Data	Image	Living body action video, lip language video, non-living body video (anti-spoofing sample), anti-spoofing data of lip language, anti-spoofing data of RGB images	100 people	Liveness detection
50,016 Gesture Recognition Data	Image	18 static hand gestures and 21 keypoints of the hand landmarks	50,016 data	Gesture recognition
100 People-Human Face Recognition Data in Surveillance	Image	Human faces captured in surveillance scenes, annotated with gender and age labels.	100 people	Face recognition
1,000 people, 7,156 Cross-age Face Images data	Image	The age span is 10 years, and at least 4 images of each person at different ages were collected.	1,000 people	Face recognition
1,000 People multi-race and Multi-pose Face Images Data	Image	Image quantity: 29 images per person (14 multi-pose face images of indoor scenes + 14 multi-pose face images of outdoor scenes + 1 ID photo). Race, gender, age and face pose labels were annotated.	1,000 people	Face recognition
10 Categories-200 Groups of Refined Urban Management Data	Image	18 subcategories such as streets, snack streets, shop entrance, corridor, community entrance, construction sites, etc., and each group of data contains 2 images from different angles	200 groups	Refined urban management
3,000 images Natural Scene OCR Data of 12 Languages	Image	Covers Asian and European language families, with line-level quadrilateral bounding box annotation and transcription of the text	3,000 images	Multilingual OCR task
100 People with Occlusion and Multi-pose Face Recognition Data	Image	200 images covering 4 lighting conditions × 10 occlusion cases (including the non-occluded case) × 5 face poses. Each image is annotated with face pose and occlusion labels.	100 people	Face recognition
100 People- 3D Liveness Detection Data	Image	Living face image data, anti-spoofing data of living face images, and anti-spoofing data of mask images, covering three races with different skin colors. Each image corresponds to a depth image, a depth information file, and a camera intrinsic parameters file.	100 people	Face recognition, Liveness detection
100 People- Electric Bicycle Entering Elevator Data	Image	For each subject, 1 image and 4 videos were collected, and gender, race and age were labeled. For each video, the collection scene and electric bicycle model were annotated.	100 people	Refined urban management
1,435 Images- Alpha Matte Human Body Segmentation Data (fine version)	Image	Half-body and full-body images, with alpha matte segmentation annotation of the human body. Each subject's race, gender, age and collection scene are labeled.	1,435 images	Semantic segmentation
200 People- Gait Recognition Data in Surveillance	Image	Each subject walked at slow, normal and fast speeds along a specified walking route. Each subject walked 9 times in 3 seasonal outfits (summer, autumn and winter) respectively.	200 people	Gait recognition
200 People- Re-ID Data in Real Surveillance Scenes	Image	8 human body orientations were collected, with bounding boxes and 15 attributes annotated for each human body.	200 people	Re-ID
200 People- Re-ID Data in Surveillance	Image	Bounding boxes and 15 attributes are annotated for each human body. Each subject's gender, age, race, collection scene, clothing category, camera number, and camera height are labeled.	200 people	Re-ID
200 Yellow People - Multi-Pose Face Images & Videos Data	Image	Face pose, head pose, nationality, gender, collecting environment and age	200 people	Face recognition

Dataset Name	Product type	Collection equipment	Data Size	Use of Data
1505 hours- Mandarin Speech Data by Mobile phone	Speech	Mobile phone	1505 hours, 6,278 speakers	Speech recognition, Voiceprint recognition, Machine translation
300 hours- Mandarin Conversational Speech Data by Mobile phone	Speech	Mobile phone	300 hours, 440 speakers	Speech recognition, Voiceprint recognition, Machine translation
200 Hours- Chinese Children Speech Data by Mobile phone	Speech	Mobile phone	200 hours, 557 speakers	Speech recognition, Voiceprint recognition
200 hours- Mixed Speech with Chinese and English Data by Mobile phone	Speech	Mobile phone	200 hours, 701 speakers	Speech recognition, Voiceprint recognition
300 Hours, 10 Dialects Speech Data by Mobile phone	Speech	Mobile phone	300 hours	Speech recognition, Dialect recognition
Interspeech_ Accented English Speech Recognition Competition Data	Speech	Mobile phone	200 hours, 528 speakers	Speech recognition, Language recognition
50 People- Far-field Speech Data in Home Environment	Speech	Microphone array	50 speakers	Speech enhancement, Speech recognition
200 Hours- 10 Foreign Languages Speech Data by Mobile phone	Speech	Mobile phone	200 hours	Acoustic study, Language model training, Algorithm research

Note: Please apply only for datasets relevant to your research field. You may apply for at most 6 computer vision datasets.

Note: Please apply only for datasets relevant to your research field. You may apply for at most 4 speech recognition datasets.