82 Million Cantonese Script Data
Cantonese textual data, 82 million pieces in total, collected from Cantonese script text. The dataset can be used for natural language understanding, knowledge base construction, and other tasks.
56,920 Car Fine Granularity Comments Annotation Data
Comments were collected from various car forums, and fine-grained annotation was carried out on the posts written by users. Annotations include labels for manufacturer, brand, model, attribute, description value, tendency, etc. The data can be used in fine-grained natural language understanding research, sentiment analysis, and related fields.
10,000 Chinese News Events Annotation Data
10,000 pieces of annotated Chinese news event data. The content is drawn from hot news in 2013. Each news item contains one or more events, and each event is annotated. The data is stored in XML and can be used for natural language understanding.
47,811 Sentences - Intention Annotation Data in Interactive Scenes
Single-sentence textual data annotated by intent, 47,811 sentences in total. Each sentence is annotated with an intent class, along with slot and slot value information. The intent domains include music, weather, date, schedule, home equipment, etc. The data is suited to intent recognition research and related fields.
28,699 Intent-Type Single-Sentence Annotation Data
Single-sentence textual data annotated by intent, 28,699 sentences in total. The sentences were written manually, and each is annotated with an intent class, along with slot and slot value information. The intent domains include music, weather, date, schedule, home equipment, etc. The data is suited to intent recognition research and related fields.
410,000 Groups – Chinese-Korean Parallel Corpus Data
410,000 sets of Chinese-Korean parallel translation corpus, stored in txt files. It covers many domains, including travel, medicine, daily life, and TV dramas. Data cleaning, desensitization, and quality inspection have been carried out. It can serve as a base text corpus and can be used for machine translation.