40,002 Images – OCR Data of Internet Image
This dataset contains 40,002 internet images for OCR. Collection scenes include subtitles, advertisements, cellphone screenshots, comics, emoticons, posters, magazine covers, etc. The text is mostly Chinese, with a small amount of English. Most images are annotated with line-level rectangular bounding boxes and text transcriptions; a small portion of the data uses column-level quadrilateral bounding boxes with transcriptions. The dataset can be used for OCR tasks on internet images.
OCR, Multiple types of internet images
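To make the two annotation styles mentioned above concrete, here is a minimal sketch of how a line-level rectangle and a column-level quadrilateral could share one in-memory representation. The listing does not describe the dataset's actual file format, so the `TextRegion` class, its field names, the helper functions, and the sample coordinates and transcriptions are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class TextRegion:
    # Hypothetical record layout; the real annotation schema may differ.
    points: List[Point]   # four corners, clockwise from top-left
    transcription: str    # text content of the region


def rect_to_region(x: float, y: float, w: float, h: float, text: str) -> TextRegion:
    """Express an axis-aligned rectangle as four corner points."""
    return TextRegion([(x, y), (x + w, y), (x + w, y + h), (x, y + h)], text)


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Line-level rectangular box (made-up coordinates and text).
line = rect_to_region(10, 20, 100, 30, "Hello")

# Column-level quadrilateral box for vertical text (made-up values).
column = TextRegion([(200, 10), (230, 12), (228, 150), (198, 148)], "竖排文字")

for region in (line, column):
    print(region.transcription, polygon_area(region.points))
```

Storing rectangles as four points lets the same downstream code (cropping, area filtering, visualization) handle both annotation types without branching.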
28,699 Intent-type single sentence annotation data
Intent-type single-sentence annotated text data comprising 28,699 manually written sentences. Each sentence is annotated with an intent class as well as slot and slot-value information. Intent domains include music, weather, date, schedule, home equipment, etc. The data is intended for intent recognition research and related fields.
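As an illustration of what an intent annotation with slots and slot values might look like, the sketch below models one record per sentence. The listing does not specify the dataset's schema, so the `IntentExample` class, the label and slot names, and the sample sentences are assumptions invented for this example, not samples from the data.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IntentExample:
    # Hypothetical record layout; the real annotation schema may differ.
    text: str                                             # the single sentence
    intent: str                                           # intent class label
    slots: Dict[str, str] = field(default_factory=dict)   # slot name -> slot value


# Made-up sentences covering a few of the listed intent domains.
examples = [
    IntentExample("Play a song by Jay Chou", "music", {"artist": "Jay Chou"}),
    IntentExample(
        "What is the weather in Beijing tomorrow",
        "weather",
        {"city": "Beijing", "date": "tomorrow"},
    ),
    IntentExample(
        "Turn on the living room light",
        "home_equipment",
        {"device": "light", "room": "living room"},
    ),
]


def slots_are_grounded(example: IntentExample) -> bool:
    """Sanity check: every slot value should appear verbatim in the sentence."""
    return all(value in example.text for value in example.slots.values())


# Per-intent sentence counts, useful for checking class balance.
intent_counts = Counter(ex.intent for ex in examples)

for ex in examples:
    print(ex.intent, ex.slots, slots_are_grounded(ex))
```

A check like `slots_are_grounded` is a cheap way to catch annotation or parsing errors before training an intent recognition or slot-filling model.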