Step into Cambodia, a nation whose rich culture is woven into the fabric of daily life, including its identity documents. The first challenge? Mastering the Khmer language. This is essential for anyone working with Cambodian IDs, from verifying identities to processing paperwork.
This article explores the challenges and the solutions that facilitate seamless document processing in Cambodia.
The challenges of processing Cambodian IDs
It’s no surprise that Cambodian identity documents are predominantly in Khmer, the national language of Cambodia. Khmer boasts a script that dates back to the 7th century. This ornate script poses a significant challenge for many optical character recognition (OCR) tools built in the 21st century.
For starters, the Khmer alphabet is listed in the Guinness Book of World Records as the world's longest alphabet. It consists of 74 letters: consonants, sub-consonants, and vowels that combine to form distinct syllables.
Another interesting fact is that words in a sentence are written together without any spaces between them. What about punctuation marks? Yes, Khmer has them too (although Western punctuation is also commonly used). For example, ៕ is a period mark used to indicate the end of an entire text or a chapter.
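To make these script properties concrete, here is a minimal Python sketch. It assumes only the Unicode standard: Khmer characters live in the block U+1780 to U+17FF, and ៕ is U+17D5. The helper names `is_khmer` and `khmer_ratio` are hypothetical and used purely for illustration.

```python
import unicodedata

# Unicode "Khmer" block: U+1780..U+17FF (letters, signs, and digits).
KHMER_BLOCK = range(0x1780, 0x1800)

# The end-of-text mark mentioned above.
END_OF_TEXT = "\u17D5"


def is_khmer(ch: str) -> bool:
    """True if a single character belongs to the Khmer Unicode block."""
    return ord(ch) in KHMER_BLOCK


def khmer_ratio(text: str) -> float:
    """Share of non-whitespace characters that are Khmer (0.0 for empty input)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_khmer(c) for c in chars) / len(chars)


# "Khmer language": two words written with no space between them.
phrase = "\u1797\u17B6\u179F\u17B6\u1781\u17D2\u1798\u17C2\u179A"

print(khmer_ratio(phrase))            # every character is Khmer
print(len(phrase.split()))            # whitespace splitting finds a single token
print(unicodedata.name(END_OF_TEXT))  # the official Unicode name of the mark
```

Because whitespace splitting returns the whole phrase as one token, any pipeline that needs individual words has to rely on a dictionary or a trained segmenter rather than on spaces.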
Adapting technology to accurately capture and translate these details is vital for effective Cambodian document processing.
Khmer language in ID documents
The set of Cambodian identity documents is pretty standard. It includes passports of multiple types, ID cards, driver’s licenses, and more, no different from most countries. However, information is duplicated in the national language. This always raises the bar for OCR tools, especially the ones used for ID document processing.
Perhaps the documents of the greatest interest (and complexity) in this sense are Cambodian ID cards. They are widely used within the country, but unlike, for example, Cambodian passports, the data is almost entirely written in Khmer.
The only Latin letters on a Cambodian ID card are those that duplicate the holder’s name and those in the machine-readable zone (MRZ). However, the card carries much more data:
Date of birth
Gender
Height
Place of birth
Address
Date of issue and date of expiry
Identifiers (this field contains distinctive features of the individual, for example, moles, scars, etc.)
You might be surprised not to see any numbers in the fields with dates, but they are there—written in Khmer as well.
Khmer numerals & national calendar
Khmer numerals are distinctive and widely used in Cambodia. Their appearance in identity documents is a common source of complexity in document processing. Recognizing and converting these numerals into a universally understandable format is crucial for verification processes.
Khmer numerals also determine how dates are recorded in Cambodian ID documents.
Cambodia’s national Khmer calendar is a lunisolar system known as Chhankitek. It blends the Buddhist lunar calendar with the solar cycle, and the two are kept in sync by adding an extra month or day to certain years.
In Cambodian ID documents, however, another mix is used: the dates are written using the Gregorian calendar, but in Khmer numerals.
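As an illustration, the conversion from Khmer to Arabic numerals can be sketched in a few lines of Python. The digits ០ to ៩ occupy U+17E0 to U+17E9 in Unicode. The function names are hypothetical, and the day-month-year order with a "." separator is an assumption made for the example, not a statement about the actual card layout.

```python
from datetime import date

# Khmer digits zero to nine are U+17E0..U+17E9.
KHMER_DIGITS = "".join(chr(0x17E0 + i) for i in range(10))
TO_ARABIC = str.maketrans(KHMER_DIGITS, "0123456789")


def khmer_to_arabic_digits(text: str) -> str:
    """Replace Khmer digits with ASCII digits; other characters are untouched."""
    return text.translate(TO_ARABIC)


def parse_khmer_date(text: str, sep: str = ".") -> date:
    """Parse a Gregorian date written in Khmer numerals.

    Assumption: the fields come in day-month-year order and are split by
    `sep`. Adjust both to match the document you actually process.
    """
    day, month, year = (int(p) for p in khmer_to_arabic_digits(text).split(sep))
    return date(year, month, day)


sample = "\u17E0\u17E5.\u17E1\u17E2.\u17E1\u17E9\u17E9\u17E0"  # Khmer numerals for 05.12.1990
print(khmer_to_arabic_digits(sample))
print(parse_khmer_date(sample).isoformat())
```

Building a `date` object instead of keeping a string means impossible values, such as a 13th month, raise an error immediately rather than slipping through to later checks.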
How to effectively process Cambodian documents
One may assume that since Cambodians use international passports with Latin characters, handling Cambodian IDs poses no issues either, as you can use the data from the MRZ. However, that’s not quite the case.
The rule of thumb: the more sources you have, the more you can trust verification results. Hence, the key to effective ID processing is supporting multiple ways of obtaining each piece of data and multiple ways of using it.
If you’d like to let your Cambodian customers use an ID card for identity verification, the MRZ alone might not be enough. Some data, such as the holder’s address, isn’t present in the MRZ, and the date of issue doesn’t appear anywhere except in the visual inspection zone. Being able to read Khmer gives you access to this additional data.
Regula Document Reader SDK has been taught to recognize Khmer script and is still being improved (the alphabet from the Guinness Book is no joke!). It can also recognize Khmer numerals and convert them into Arabic numerals so that they can, for example, be cross-checked against the dates encoded in the MRZ.
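The cross-check idea can be illustrated with a generic Python sketch. This is not the Regula SDK API: the function names are hypothetical, and the inputs are assumed to be fields already read from the card. The only external fact relied on is that MRZ dates follow the ICAO Doc 9303 `YYMMDD` format.

```python
from datetime import date


def to_mrz_date(d: date) -> str:
    """Format a date the way an MRZ encodes it: YYMMDD (ICAO Doc 9303)."""
    return d.strftime("%y%m%d")


def dates_agree(khmer_day: str, khmer_month: str, khmer_year: str,
                mrz_yymmdd: str) -> bool:
    """Compare a date read in Khmer numerals with its MRZ counterpart.

    Python's int() accepts any Unicode decimal digits, Khmer ones included,
    so no explicit digit mapping is needed here. Unreadable or impossible
    dates are treated as a mismatch instead of raising.
    """
    try:
        visual = date(int(khmer_year), int(khmer_month), int(khmer_day))
    except ValueError:
        return False
    return to_mrz_date(visual) == mrz_yymmdd


# Khmer numerals for day 05, month 12, year 1990.
day, month, year = "\u17E0\u17E5", "\u17E1\u17E2", "\u17E1\u17E9\u17E9\u17E0"
print(dates_agree(day, month, year, "901205"))
print(dates_agree(day, month, year, "901206"))
```

Comparing in the MRZ's own two-digit-year format avoids guessing the century of the MRZ value. A mismatch may mean either an OCR error or a tampered document, so it is a signal for further review rather than a verdict.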
We at Regula already have a proven track record of smoothly handling Cambodian identity documents. If you’re looking for the right approach and technology, contact us for further details.