Here at Regula, we’ve spent years and considerable effort building the world’s most extensive database of document templates. Those templates now form a solid foundation for reliable document processing and accurate OCR.
Today, we’re starting a series of posts that cover the peculiarities of document recognition specific to a given country. In this article, we’ll have a closer look at Thai documents.
Let’s get started.
The challenges of processing Thai IDs
The beautiful Thai alphabet contains more than twice as many characters as the English one (72 vs. 26), which gives an OCR engine far more symbols to tell apart. On top of that, some linguistic specifics stem from the country’s cultural and historical background. All of this calls for a non-standard approach to processing Thai ID documents and to verifying the fields written in Thai with OCR.
The concept of a surname officially appeared in Thailand only in the 20th century, when Thai citizens had to choose their last names. Some registered simple ones, while others preferred more elaborate family names. This was especially typical of Chinese Thais, who kept the meaning their surnames had in Chinese (e.g., “the moonlight falling on the lotus”), which produced long surnames in Thai.
Unlike the Western Smiths and Joneses, surnames in Thailand must be unique: people aren’t supposed to use a last name already taken by someone else unless they are related. To keep a new surname unique, people often add more characters to it.
Finally, it’s common for Thai people to change their names for personal, religious, or superstitious reasons. Superstition is probably what made Mr. San Sroi-soongnoen change his name to Mr. Makelifebetter (Meet Mr Makelifebetter: Theft suspect). The current law, though, keeps excessive creativity at bay: according to the 1962 Personal Names Act, a brand-new Thai surname must be no longer than ten letters, excluding vowel symbols and diacritics.
Because vowel symbols and diacritics don’t count toward that limit, and the limit applies only to newly registered surnames, you can encounter Thai names anywhere from 15 to 40 characters long. Such variation can create difficulties for an OCR engine, so identity verification providers must take it into account.
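To make the gap between letters and characters concrete, here is a minimal Python sketch that counts a name two ways: the letters the ten-letter rule would count, and every Unicode code point an OCR engine actually has to read. It is a simplification, not Regula’s implementation: it assumes the Thai consonant range U+0E01 to U+0E2E stands in for “letters” and excludes everything else (vowel signs, tone marks, other diacritics). The function names are hypothetical.

```python
# Simplifying assumption: Thai consonants (U+0E01 KO KAI to U+0E2E HO NOKHUK)
# stand in for the "letters" counted by the ten-letter rule; vowel signs,
# tone marks, and other diacritics fall outside this range and are skipped.
THAI_CONSONANTS = range(0x0E01, 0x0E2F)


def count_letters(name: str) -> int:
    """Count code points in the Thai consonant range (hypothetical helper)."""
    return sum(1 for ch in name if ord(ch) in THAI_CONSONANTS)


def count_code_points(name: str) -> int:
    """Count every Unicode code point, i.e. what the OCR field must hold."""
    return len(name)


for name in ("สมชาย", "น้ำ"):
    print(name, count_letters(name), count_code_points(name))
```

For a name like สมชาย (Somchai), the letter count is lower than the code-point count because the vowel sign า is not counted as a letter; with longer names and more vowel signs and tone marks, the gap widens.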
When training Regula’s OCR to recognize Thai ID documents, we added extra space to the first name and surname fields so that the engine can cope with most names. Of course, it’s impossible to cover every extreme case. But if we or any of our customers encounter an extremely long surname, we adjust our Thai document templates and retrain the OCR engine on the new data. It’s a proven and well-established process.
Right now, it’s 2023 for most of us, but in Thailand it’s also 2566. That’s because Thailand uses two official calendars: the Western Gregorian calendar and the Thai solar calendar.
The Thai solar calendar counts years from the Buddha’s entry into nirvana, so its year numbers run 543 ahead of the Gregorian ones. This means a date in Thailand may look like 15 April 2566 BE (2023 CE), where BE stands for the Buddhist Era and CE for the Common Era.
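In code, the year conversion is a fixed offset. The Python sketch below assumes the modern convention, in use since Thailand moved its New Year to 1 January in 1941, under which the offset is a constant 543 for the whole year; older historical dates would need extra handling. The function names are illustrative.

```python
# Thai solar calendar (Buddhist Era) years run 543 ahead of Gregorian years.
# Assumes the post-1941 convention where both calendars start the year on 1 January.
BE_OFFSET = 543


def be_to_ce(year_be: int) -> int:
    """Convert a Buddhist Era year to a Gregorian (Common Era) year."""
    return year_be - BE_OFFSET


def ce_to_be(year_ce: int) -> int:
    """Convert a Gregorian (Common Era) year to a Buddhist Era year."""
    return year_ce + BE_OFFSET


print(be_to_ce(2566))  # 2023
print(ce_to_be(2023))  # 2566
```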
Thai ID cards show dates in both the Thai and Gregorian formats at the same time. The two versions carry the same information: the date of birth, the date of issue, and the expiry date.
For automated data entry alone, the OCR engine doesn’t need to convert a Thai date into a Western one, as it can simply read the Gregorian date from its own field.
However, the ability to process the Thai format gives you an extra layer of security, because you can cross-check the two dates to validate the document. That’s exactly what Regula does: our Document Reader SDK not only recognizes the Thai date and converts it into the Gregorian format, but also makes sure the two dates match. Amateur counterfeiters often overlook this detail. The SDK also checks that the month is written correctly in Thai.
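A cross-check of this kind can be sketched in Python as follows. It is a hypothetical illustration, not the Document Reader SDK’s API: it assumes the Thai date is read as day, abbreviated Thai month, and Buddhist Era year (for example, 15 เม.ย. 2566), and the Gregorian date as day, abbreviated English month, and year (15 Apr. 2023). An unknown Thai month abbreviation counts as a failed check, mirroring the month-spelling check mentioned above.

```python
from datetime import date

BE_OFFSET = 543  # Buddhist Era year = Gregorian year + 543

# Abbreviated Thai month names (assumed field format: "15 เม.ย. 2566").
THAI_MONTHS = {
    "ม.ค.": 1, "ก.พ.": 2, "มี.ค.": 3, "เม.ย.": 4, "พ.ค.": 5, "มิ.ย.": 6,
    "ก.ค.": 7, "ส.ค.": 8, "ก.ย.": 9, "ต.ค.": 10, "พ.ย.": 11, "ธ.ค.": 12,
}
# English abbreviations mapped by hand so parsing does not depend on the locale.
ENGLISH_MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_thai_date(text: str) -> date:
    """Parse 'DD <Thai month abbreviation> YYYY' (BE year) into a Gregorian date."""
    day, month_name, year_be = text.split()
    if month_name not in THAI_MONTHS:
        raise ValueError(f"unknown Thai month: {month_name!r}")
    return date(int(year_be) - BE_OFFSET, THAI_MONTHS[month_name], int(day))


def parse_english_date(text: str) -> date:
    """Parse 'DD Mon. YYYY' (the dot after the month is optional)."""
    day, month_name, year = text.replace(".", "").split()
    if month_name not in ENGLISH_MONTHS:
        raise ValueError(f"unknown month: {month_name!r}")
    return date(int(year), ENGLISH_MONTHS[month_name], int(day))


def dates_match(thai_text: str, english_text: str) -> bool:
    """True only if both dates parse and describe the same calendar day."""
    try:
        return parse_thai_date(thai_text) == parse_english_date(english_text)
    except ValueError:
        # Unreadable fields, a misspelled Thai month, or an impossible date
        # all count as a failed cross-check.
        return False


print(dates_match("15 เม.ย. 2566", "15 Apr. 2023"))  # True
print(dates_match("15 เม.ย. 2566", "15 Apr. 2024"))  # False
```

Converting both sides to a date object before comparing keeps the check independent of how each field is formatted.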
Some Thai documents, such as ID cards and driver’s licenses, don’t contain a gender field with the conventional M or F. However, this information may be required, for example, during hotel check-in.
To address this issue, Regula relies on another peculiarity of Thai ID documents: they include the prefix Mr., Miss, or Mrs. as part of the first name. For example, the ID may say Mr. Kuntapon, while the actual given name is just Kuntapon. Using lexical analysis, we can automatically determine the gender of the document holder: Mr. is interpreted as male, and Miss and Mrs. as female. The inferred gender is then written to and displayed in the results.
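The prefix rule lends itself to a short Python sketch. It is an illustration under stated assumptions rather than Regula’s lexical analysis: the prefix table covers only the three titles mentioned above, in English and in Thai (นาย for Mr., นางสาว for Miss, นาง for Mrs.), and the function name split_prefix is hypothetical. Prefixes are matched longest first, because the Thai นางสาว begins with นาง and would otherwise be split incorrectly.

```python
from typing import Optional, Tuple

# Prefix -> inferred gender. Only the three titles from the article are
# covered, in English and Thai (นาย = Mr., นางสาว = Miss, นาง = Mrs.).
PREFIXES = {
    "Mr.": "M", "Miss": "F", "Mrs.": "F",
    "นาย": "M", "นางสาว": "F", "นาง": "F",
}
# Try longer prefixes first so that "นางสาว" (Miss) is not cut as "นาง" (Mrs.).
_ORDERED = sorted(PREFIXES, key=len, reverse=True)


def split_prefix(full_name: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Return (prefix, given name, gender); prefix and gender are None if absent."""
    name = full_name.strip()
    for prefix in _ORDERED:
        if name.startswith(prefix):
            return prefix, name[len(prefix):].strip(), PREFIXES[prefix]
    return None, name, None


print(split_prefix("Mr. Kuntapon"))  # ('Mr.', 'Kuntapon', 'M')
print(split_prefix("นางสาวสมศรี"))
```

Returning None instead of guessing when no known prefix is found leaves the gender undetermined, which is safer than defaulting to one value.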
How to effectively process Thai documents
Like documents from any other country, Thai documents contain many security features and a variety of national characteristics. In this article, we have highlighted just a few of them that can affect the accuracy of automated data extraction. But there’s more.
A proper solution must know which algorithm Thai documents use to calculate the personal number, exactly what should be written in the Nationality field down to the last character, and so on. If you’re looking for a solution that considers all of the above and more, you’ve come to the right place.
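As an example of such an algorithm, the 13-digit Thai national ID number ends with a widely documented mod-11 check digit: each of the first 12 digits is multiplied by a weight from 13 down to 2, and the check digit is (11 - (sum mod 11)) mod 10. The Python sketch below implements that public checksum; it is not a description of Regula’s internal checks, the function names are hypothetical, and the number in the example is made up.

```python
DIGITS = "0123456789"


def thai_id_check_digit(first12: str) -> int:
    """Compute the mod-11 check digit from the first 12 digits of a Thai ID number."""
    if len(first12) != 12 or any(ch not in DIGITS for ch in first12):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights run from 13 (first digit) down to 2 (twelfth digit).
    total = sum(int(d) * w for d, w in zip(first12, range(13, 1, -1)))
    return (11 - total % 11) % 10


def is_valid_thai_id(number: str) -> bool:
    """Validate a 13-digit Thai ID; keeps only ASCII digits, so dashes and spaces are ignored."""
    digits = "".join(ch for ch in number if ch in DIGITS)
    if len(digits) != 13:
        return False
    return thai_id_check_digit(digits[:12]) == int(digits[12])


print(thai_id_check_digit("112345678901"))  # 4
print(is_valid_thai_id("1-1234-56789-01-4"))  # True
print(is_valid_thai_id("1-1234-56789-01-5"))  # False
```

A check like this catches single mistyped or misread digits, but a number that passes it still has to be verified against the rest of the document.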
Regula’s team has cooperated with businesses, regulators, and security agencies worldwide for over 30 years. This has resulted in the world’s largest and most comprehensive database of document templates. It includes all Thai identity documents along with a wide range from other countries, totaling more than 12,500 templates (document types). This lets you process any document with ease and confidence.
Contact us to get insights from boots-on-the-ground practitioners on how to approach the document processing, verification, and OCR accuracy challenges you’re facing.