Thai Speech Data

From: Datatang Date: 2023-08-04

Among the many languages being integrated into speech recognition technology, Thai holds a significant place. Thai speech recognition has been a focal point of research and development, driven by the growing demand for localized and personalized user experiences.

Over the past few years, Thai speech recognition technology has witnessed remarkable advancements, largely due to the availability of extensive linguistic data. The foundation of any speech recognition system lies in its dataset, and Thai is no exception. The abundance of voice data from various sources, including social media, podcasts, and recorded conversations, has played a pivotal role in training machine learning algorithms. As a result, Thai speech recognition systems have achieved unprecedented accuracy and fluency.

However, this progress is not devoid of challenges. The linguistic complexity of Thai poses hurdles in developing accurate recognition models. The language is tonal and features a unique script, demanding a deep understanding of its phonetics and syntax. Acquiring and annotating precise data for Thai speech recognition remains an ongoing challenge. Moreover, ensuring the inclusivity of regional accents and dialects further complicates the data collection process.

Datatang Thai Speech Datasets

203 Hours – Thai Speech Data by Mobile Phone_Reading

Thai speech data (reading) was collected from 498 native Thai speakers and recorded in a quiet environment. The recordings are rich in content, covering multiple categories such as economics, entertainment, news, figures, and oral language. Each speaker read around 400 sentences. The valid data volume is 203 hours. All texts were manually transcribed with high accuracy.
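As a rough sanity check on these figures, the sketch below derives the approximate sentence count and average sentence length from the numbers quoted above. The function name `dataset_estimates` is illustrative only, and because the per-speaker sentence count is stated as "around 400", the results are estimates rather than published statistics.

```python
# Figures taken from the dataset description above.
SPEAKERS = 498
SENTENCES_PER_SPEAKER = 400  # stated as "around 400", so approximate
VALID_HOURS = 203


def dataset_estimates(speakers, sentences_per_speaker, hours):
    """Derive rough per-sentence and per-speaker averages (illustrative helper)."""
    total_sentences = speakers * sentences_per_speaker
    total_seconds = hours * 3600
    return {
        "total_sentences": total_sentences,
        "avg_seconds_per_sentence": total_seconds / total_sentences,
        "avg_minutes_per_speaker": total_seconds / speakers / 60,
    }


estimates = dataset_estimates(SPEAKERS, SENTENCES_PER_SPEAKER, VALID_HOURS)
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the set holds roughly 199,200 sentences averaging a little under 4 seconds each, or about 24 minutes of valid audio per speaker.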

1,077 Hours - Thai Conversational Speech Data by Telephone

The 1,077 Hours - Thai Conversational Speech Data set involves 1,986 native speakers with a balanced gender ratio. Speakers chose a few familiar topics from a given list and held conversations on them, which keeps the dialogues fluent and natural. The recordings were made on various mobile phones in quiet indoor environments, and the audio format is 8 kHz, 8-bit. All audio was manually transcribed, with annotations for the text content, the start and end time of each valid sentence, and the speaker identity.
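To make the annotation scheme and audio format more concrete, here is a minimal sketch of how one annotated sentence might be represented and how much raw audio the stated format implies. The `Segment` class, its field names, the speaker labels, and the sample sentences are all hypothetical; the actual delivery format of the dataset is not described here. The size estimate assumes uncompressed single-channel audio, which the description does not confirm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed sentence (hypothetical field names, not the vendor's schema)."""
    speaker_id: str
    start_s: float  # start time of the sentence, in seconds
    end_s: float    # end time of the sentence, in seconds
    text: str

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def talk_time_by_speaker(segments):
    """Sum annotated speech duration per speaker."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker_id] += seg.duration_s
    return dict(totals)


def raw_audio_bytes(hours, sample_rate_hz=8000, bits_per_sample=8, channels=1):
    """Uncompressed size implied by the format (mono is an assumption)."""
    return int(hours * 3600 * sample_rate_hz * (bits_per_sample // 8) * channels)


# Made-up example segments for illustration only.
segments = [
    Segment("spk_A", 0.0, 2.5, "สวัสดีครับ"),
    Segment("spk_B", 2.5, 4.0, "สวัสดีค่ะ"),
    Segment("spk_A", 4.0, 7.0, "วันนี้อากาศดีมาก"),
]

print(talk_time_by_speaker(segments))  # {'spk_A': 5.5, 'spk_B': 1.5}
print(raw_audio_bytes(1077))           # 31017600000 bytes, about 31 GB
```

If the conversations were stored as two channels, one per speaker, the size estimate would double; pass `channels=2` to see that case.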

Improve Speech Emotion Recognition with High-quality Datasets

With the rise of deep learning, emotion recognition methods based on deep neural networks have been widely used. Speech Emotion Recognition, also known as SER, is a computer simulation of the process of human emotion perception and understanding. The computer analyzes the speech, extracts emotional feature values, uses these parameters to build a model that maps feature values to emotions, and finally classifies the emotion.

How Data Empowers Multimodal Machine Learning

In the rapidly evolving landscape of artificial intelligence, one of the most promising frontiers is multimodal machine learning, where algorithms learn from and make decisions based on a combination of different data types such as text, images, audio, and more. At the heart of this innovation lies a fundamental truth: the power of multimodal machine learning is intricately woven with the quality, diversity, and abundance of data.
