
28 Jul 2025 in IDs by countries

Challenges of ID Verification in China: Script Reading and Legal Framework

Maryia Valchanina

Head of Data Processing Department, Regula

This month saw the rollout of China’s brand-new, state-run digital ID system, known as Cyberspace ID. The system issues citizens a verified digital ID token that can be used to log in to internet services. Crucially, services that accept the token cannot demand additional personal details beyond what it provides. This initiative adds another layer to China’s already complex ID verification and privacy rules, and businesses operating in the Chinese market must be aware of it.

On top of that, the process of ID verification itself is full of challenges because of the Chinese script. While it’s convenient that all Chinese IDs use Simplified Chinese as a common script (some also add a regional language), even this standardized version of the language can be hard to process optically.

In this article, we will explore both of these hurdles: we will break down the OCR (Optical Character Recognition) problems caused by the complexity of the Chinese script, as well as the most relevant requirements for KYC in China.


The current state of OCR for the Chinese script

In one of our previous articles, we underscored how the sheer diversity of ID documents in circulation complicates Chinese ID verification. And this is not the only challenge: the script used in these documents can be too complex for some automated OCR systems to process correctly. What’s more, the script tends to amplify technical problems that affect even Latin-script documents, such as poor lighting or focus.

Script-specific OCR challenges

Unlike English or other languages written in the Latin alphabet, Chinese uses logographic characters that are dense with strokes. A single character can contain a dozen or more distinct strokes, and at the small font sizes used on ID cards, those strokes can smear together, making it hard for software to tell similar-looking characters apart.

For example, 贝 (bèi, “shell”), which appears in both surnames and given names, is simple and highly symmetrical. However, OCR can still confuse it with 见 (jiàn, “to see”) if the lower sweep is slightly faded. 吉 (jí, “lucky”), also very common in both surnames and given names, consists of 士 (shì, “scholar”) above 口 (kǒu, “mouth”). If stroke boundaries bleed or contrast is weak, OCR may mistake it for 卡 (kǎ) or 哲 (zhé), which share overlapping parts.

In more complicated cases, parents sometimes choose auspicious characters such as 淼 (miǎo, “vast expanse of water”) for names, which consists of three stacked 水 (shuǐ, “water”) components. An OCR system that doesn’t recognize this triple-stack pattern could segment 淼 into several characters, reading it as a sequence of components (水水水), or misidentify it as another water-related character.
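To illustrate one way context can help with look-alike characters such as 贝 and 见, here is a minimal Python sketch that rescores an OCR engine’s candidate characters using a prior over characters common in names. The prior table, its weights, the OCR confidences, and the function names are all hypothetical and invented for illustration; a real engine would need statistics drawn from real data.

```python
from typing import Dict, List, Tuple

# Hypothetical prior: how likely each character is to appear in a personal name.
# These numbers are invented for illustration only.
NAME_PRIOR: Dict[str, float] = {"贝": 0.8, "见": 0.2, "吉": 0.9, "哲": 0.7, "卡": 0.1}
DEFAULT_PRIOR = 0.05  # fallback weight for characters missing from the table

# Each position in a name field holds the OCR engine's (character, confidence) candidates.
Candidates = List[Tuple[str, float]]


def pick_character(candidates: Candidates) -> str:
    """Choose the candidate whose OCR confidence, weighted by the name prior, is highest."""
    best_char, _ = max(
        candidates, key=lambda c: c[1] * NAME_PRIOR.get(c[0], DEFAULT_PRIOR)
    )
    return best_char


def read_name_field(positions: List[Candidates]) -> str:
    """Rescore every character position of a name field independently."""
    return "".join(pick_character(c) for c in positions)


if __name__ == "__main__":
    ocr_output = [
        [("见", 0.55), ("贝", 0.45)],  # faded lower sweep: OCR slightly prefers 见
        [("吉", 0.60), ("卡", 0.40)],
    ]
    print(read_name_field(ocr_output))  # 贝吉
```

Taking only the OCR engine’s top choices would give 见吉; weighting by the name prior flips the first position to 贝 while leaving the confident 吉 unchanged.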

Another critical issue is that some IDs are printed in both Chinese and English. Chinese passport data pages, for instance, show fields in both Chinese characters and English, and even mix scripts within a single field: dates are printed as digits combined with the Chinese character 月 (“month”).

This matters because, during an ID check, a document reader will attempt to match the name written in Chinese characters to its Latin transliteration in the machine-readable zone (MRZ). If either the Chinese characters or the transliteration is ambiguous, the check may fail: the two-character name 张伟, for example, might be transliterated as “Zhang Wei” or “ZHANGWEI”, while the MRZ itself encodes it as ZHANG<<WEI, with filler characters separating surname and given name. That’s why OCR solutions need context-aware transliteration logic and language-specific matching.
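The separator problem alone can be handled by normalizing both sides before comparing. The following minimal Python sketch assumes the ICAO 9303 MRZ convention, where `<<` separates the surname from the given names, a single `<` stands for a space, and trailing `<` characters are filler. The function names are hypothetical, and the sketch covers only Latin-to-Latin comparison, not transliteration from Chinese characters.

```python
import re
from typing import Tuple


def parse_mrz_name(field: str) -> Tuple[str, str]:
    """Split an ICAO 9303 MRZ name field into (surname, given names).

    '<<' separates surname from given names, a single '<' stands for a space,
    and trailing '<' characters are filler.
    """
    surname, _, given = field.rstrip("<").partition("<<")
    return surname.replace("<", " "), given.replace("<", " ")


def normalize_latin(name: str) -> str:
    """Uppercase and drop everything but A-Z, so 'Zhang Wei' and 'ZHANGWEI' compare equal."""
    return re.sub(r"[^A-Z]", "", name.upper())


def latin_names_match(visual_name: str, mrz_field: str) -> bool:
    """Compare a Latin-script name from the visual zone with the MRZ name field."""
    surname, given = parse_mrz_name(mrz_field)
    return normalize_latin(visual_name) == normalize_latin(surname + given)
```

Collapsing all separators means the surname/given-name boundary is no longer checked; a stricter matcher could compare the two parts separately.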

Chinese ID MRZ Reading

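During MRZ reading in Chinese passports, IDV software must reconcile the Latin-script name in the MRZ with the original Chinese name in the visual zone. Because pinyin omits tones and a single syllable such as WEI corresponds to many different characters, the Latin form cannot be converted back into one definite Chinese name, so the comparison generally has to run the other way: transliterate the Chinese characters and check the result against the MRZ.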
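The asymmetry can be shown with a deliberately tiny, hypothetical character-to-pinyin table. This Python sketch transliterates the Chinese name forward and compares it with the MRZ, and a reverse lookup shows why going the other way is ambiguous. The table and function names are illustrative assumptions; a real system would need a complete pinyin dictionary and handling for characters with more than one reading.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately tiny character-to-pinyin table (uppercase, no tones).
PINYIN: Dict[str, str] = {
    "张": "ZHANG",
    "章": "ZHANG",
    "伟": "WEI",
    "威": "WEI",
    "维": "WEI",
    "薇": "WEI",
}


def to_pinyin(chinese_name: str) -> str:
    """Transliterate character by character (raises KeyError for unknown characters)."""
    return "".join(PINYIN[ch] for ch in chinese_name)


def matches_mrz(chinese_name: str, mrz_field: str) -> bool:
    """Forward check: transliterate the Chinese name and compare letters only."""
    return to_pinyin(chinese_name) == re.sub(r"[^A-Z]", "", mrz_field.upper())


def reverse_candidates(syllable: str) -> List[str]:
    """Reverse lookup: every character in the table that shares this syllable."""
    return [ch for ch, py in PINYIN.items() if py == syllable]
```

Note that 章薇 matches the same MRZ as 张伟: the MRZ check can confirm that the two zones are consistent, but the actual characters still have to come from reading the visual zone.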

What’s more, national identity cards issued in certain autonomous regions add a second language (naturally, not English this time): for example, Guangxi ID cards include Zhuang script, and Xinjiang IDs may include Uyghur (Arabic script).

Regional Chinese ID cards

ID cards for ethnic Mongolians (left) and Uyghurs (right) issued by Chinese authorities display data in both the holder’s native language and Chinese characters.

Technical OCR challenges (compounded by the script)

Regardless of language, image capture conditions and document design still play a major role in OCR accuracy. For Chinese ID verification, this is especially true because correct recognition depends on fine stroke details.

Lighting is a common issue, as harsh reflections or shadows can easily ruin text visibility. Chinese IDs and licenses are often laminated or coated, so overhead lights are easily reflected in the glossy surface. On the other hand, low light introduces noise and requires longer exposure, which often yields blurry images if the hand isn’t perfectly steady.
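A capture flow can screen frames for these lighting problems before running OCR at all. Below is a minimal Python sketch that works on a grayscale image stored as nested lists (0 = black, 255 = white) and flags frames with too many blown-out highlights or too low an average brightness. The thresholds and the function name are hypothetical and would need tuning on real captures.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale pixel values, 0 (black) to 255 (white)

# Hypothetical thresholds; real values would be tuned on sample captures.
GLARE_LEVEL = 250          # pixels at or above this value count as blown-out glare
MAX_GLARE_FRACTION = 0.02  # reject frames where more than 2% of pixels are glare
MIN_MEAN_BRIGHTNESS = 60   # reject frames that are too dark on average


def exposure_report(img: Image) -> Dict[str, object]:
    """Summarize glare and brightness for one frame and decide whether to keep it."""
    pixels = [p for row in img for p in row]
    glare_fraction = sum(p >= GLARE_LEVEL for p in pixels) / len(pixels)
    mean_brightness = sum(pixels) / len(pixels)
    too_much_glare = glare_fraction > MAX_GLARE_FRACTION
    too_dark = mean_brightness < MIN_MEAN_BRIGHTNESS
    return {
        "glare_fraction": glare_fraction,
        "mean_brightness": mean_brightness,
        "too_much_glare": too_much_glare,
        "too_dark": too_dark,
        "usable": not (too_much_glare or too_dark),
    }


if __name__ == "__main__":
    # A mid-gray 10x10 frame with a 2x5 reflection patch (10% of pixels).
    frame = [[128] * 10 for _ in range(10)]
    for y in range(2):
        for x in range(5):
            frame[y][x] = 255
    print(exposure_report(frame)["usable"])  # False
```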

Lighting is also often the underlying reason for another problem: security features interfering with OCR. The Chinese driver’s license, for example, is highly glossy with vibrant holograms, and even slight tilting causes bright reflections that can wash out parts of the text or that OCR may misread as stray shapes across it. Similarly, ghost images, guilloché background patterns, microprinted text, or UV markings can all reduce contrast or add clutter for the OCR software.

Chinese driver's license - hologram

This is why high-end document readers like the Regula 72X3 use multiple light sources (visible, infrared, ultraviolet) and capture multiple images, which software then analyzes. A mobile OCR solution, however, may have to rely on a single RGB image, so it’s all about optimizing how that image is captured to minimize interference from security features.

An ID document must also be framed correctly and remain in focus, as a blurry or angled image can cause OCR to misinterpret strokes: for example, 千 (qiān) can be read as 干 (gàn) or 于 (yú).
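Focus can be screened in a similar way. One widely used heuristic, sketched here in plain Python, is the variance of the Laplacian: sharp edges produce strong positive and negative second-derivative responses, while blur flattens them. The function names and the threshold in `is_sharp` are hypothetical placeholders that would need tuning per camera and resolution.

```python
from typing import List

Image = List[List[float]]  # grayscale pixel values; must be at least 3x3


def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Crisp strokes create large positive and negative responses at their edges;
    blur smooths them out, so a low variance suggests an out-of-focus frame.
    """
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_sharp(img: Image, threshold: float = 100.0) -> bool:
    """Hypothetical acceptance rule: keep the frame if its Laplacian variance is high enough."""
    return laplacian_variance(img) >= threshold
```

A hard black-to-white edge scores highly, while a smooth gradient, which has no second-derivative response at all, scores zero.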

Last but not least, the physical condition of an ID is a factor. IDs can be scratched, scuffed, faded, or stained, and any such damage will impact OCR. With Chinese, a scratch across a character can easily erase a stroke, while dirt or smudges can look like extra strokes; both can lead to false readings.


The current state of China’s KYC framework

China is known as one of the most tightly controlled and technically demanding KYC environments in the world. In many ways, it is unlike any other country in how identity data is issued, stored, verified, and regulated.

How exactly? Here's a breakdown of the factors that define it:

Revised anti-money laundering (AML) law

The original AML Law dates back to 2007, and after 17 years, a major overhaul was needed to keep pace with new technological developments. In November 2024, China’s legislature approved a sweeping revision of the AML Law, effective January 1, 2025.

Key points of the revision included:

  • Expanded scope of coverage: Originally, only traditional financial institutions (banks, securities firms, insurance companies) were explicitly required to implement AML/KYC programs. The new law extends AML obligations to a range of non-financial industries, including real estate developers and agents, precious metal and jewelry dealers, lawyers and accountants involved in financial transactions, etc. 

  • Mandated KYC cooperation: The law underscores that all organizations and individuals must cooperate with KYC efforts, and prohibits anyone from helping others conceal illicit funds. In effect, if a bank asks to verify your identity or source of funds, you must comply, since refusing could be interpreted as non-cooperation under the law. It also means a company cannot, for example, let a client opt out of identity verification where it is legally required.

  • Stricter customer due diligence: The revised law puts a strong emphasis on customer due diligence (CDD), as it requires a more structured approach to verifying customers, including initial identification, ongoing monitoring, and reverification at certain triggers. For example, financial firms must not only collect ID info at onboarding, but also update it periodically and have risk-based verification (meaning higher risk customers get more frequent or deeper checks).

  • More protection for personal data: The law also includes provisions to protect personal information obtained during KYC/AML processes, and mandates that any personal data collected must be kept confidential. Only under lawful circumstances can that info be shared (e.g., reporting to regulators or as evidence in a case).

  • Connecting AML to national security: Article 1 of the law was revised to say that AML efforts must support national security and public interest. In practice, this doesn’t change what a bank does day-to-day, but it means that violating these rules could be seen as not just a financial infraction but harming national security.
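As a rough illustration of the risk-based approach described under customer due diligence, here is a Python sketch of how a compliance system might schedule re-verification. The law does not prescribe these numbers: the risk tiers, review intervals, trigger events, and function names below are all hypothetical.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Set


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical review intervals: higher-risk customers are re-verified more often.
REVIEW_INTERVAL = {
    Risk.LOW: timedelta(days=3 * 365),
    Risk.MEDIUM: timedelta(days=365),
    Risk.HIGH: timedelta(days=90),
}

# Hypothetical events that force re-verification regardless of the schedule.
TRIGGER_EVENTS = {"id_document_expired", "name_changed", "unusual_large_transfer"}


def next_verification_due(
    last_verified: date, risk: Risk, recent_events: Set[str], today: date
) -> date:
    """Return the date by which the customer must next be re-verified."""
    if recent_events & TRIGGER_EVENTS:
        return today  # a trigger event means re-verification is due now
    return last_verified + REVIEW_INTERVAL[risk]


def is_overdue(last_verified: date, risk: Risk, recent_events: Set[str], today: date) -> bool:
    """True if re-verification is due on or before today."""
    return next_verification_due(last_verified, risk, recent_events, today) <= today
```

Passing `today` explicitly keeps the schedule deterministic and easy to test, rather than reading the system clock inside the function.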

Latest personal and biometric data protection requirements

Back in 2021, China enacted the Personal Information Protection Law (PIPL) and the Data Security Law (DSL); PIPL in particular is often compared to Europe’s GDPR. Together, these laws create a framework for how personal data, including ID information, must be handled by any business or entity.

Under PIPL, identifying information such as a name or ID number is personal information, and categories such as biometric data are classed as sensitive personal information, which is subject to stricter rules. For a company doing KYC in China, this means it should collect only what is necessary for verification and inform users about what data is collected and how it will be used.

One major impact of this is the restriction on cross-border transfers of personal data. If an international company is verifying Chinese IDs, it needs to check whether any such data (like ID copies) is being sent to servers outside China: exporting it without meeting the legal conditions, such as the user’s separate consent combined with a recognized transfer mechanism like a CAC security assessment or standard contract, could violate data export rules. That’s why many companies choose to keep Chinese citizens’ data on servers in China, especially given China’s strict data localization practices. Another key element is data retention: Chinese regulations mandate retention of identification data and transaction logs for at least 5 years (a duration that has become a common standard).
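The localization and retention points above can be expressed as a simple storage policy. This Python sketch uses hypothetical region names, treats a document issued by China (ICAO code CHN) as a proxy for a Chinese citizen’s data, and counts the 5-year figure cited above from the end of the customer relationship. That starting point is an assumption of the sketch and should be checked against the applicable rules.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical storage region identifiers.
DOMESTIC_REGION = "cn-storage"
DEFAULT_REGION = "global-storage"

# Retention figure cited in this article; confirm against the current rules.
RETENTION_YEARS = 5


@dataclass
class KycRecord:
    issuing_country: str    # ICAO 9303 three-letter code of the issuing state, e.g. "CHN"
    relationship_end: date  # assumption: retention is counted from this date


def storage_region(record: KycRecord) -> str:
    """Keep records based on Chinese-issued documents on servers inside China."""
    return DOMESTIC_REGION if record.issuing_country == "CHN" else DEFAULT_REGION


def retain_until(record: KycRecord) -> date:
    """Earliest date the record may be deleted under the retention period."""
    end = record.relationship_end
    try:
        return end.replace(year=end.year + RETENTION_YEARS)
    except ValueError:  # 29 February with no counterpart in the target year
        return end.replace(year=end.year + RETENTION_YEARS, day=28)
```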

More recently, China has taken a notable stance on private-sector use of biometrics for identity verification. In March 2025, the Cyberspace Administration of China (CAC) issued the Security Management Measures for the Application of Facial Recognition Technology, which state that individuals must not be forced to verify their identity via facial recognition.

While current trends suggest that facial recognition may soon become a standard KYC requirement worldwide, the Chinese rules require that an alternative method be provided that is “reasonable and convenient” for the user. In other words, a business can use facial recognition for access or login, but if a customer declines to use their face, an alternative ID verification method (e.g., showing an ID card) must be available. Importantly, these restrictions do not apply to police or state security use of facial recognition; the rules specifically focus on companies and other non-public entities.
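In a verification flow, this rule can be enforced as a configuration check. The following Python sketch, with hypothetical method and function names, refuses setups where facial recognition is the only option and drops it when the user declines. Whether the remaining alternative is genuinely “reasonable and convenient” is a product decision that no code can verify.

```python
from enum import Enum
from typing import List


class Method(Enum):
    FACE_MATCH = "face_match"
    ID_DOCUMENT = "id_document"  # e.g. presenting or scanning an ID card


def verification_options(enabled: List[Method], user_declines_face: bool) -> List[Method]:
    """Return the methods to offer, enforcing that face matching is never the only path."""
    alternatives = [m for m in enabled if m is not Method.FACE_MATCH]
    if not alternatives:
        raise ValueError("at least one non-facial verification method must be enabled")
    return alternatives if user_declines_face else list(enabled)
```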

Mandates on real-name verification for telecom and online services

In recent years, China has also instituted a broad “real-name system” across many domains: service providers are required to collect users’ real names, ID numbers, and other key information. Some of the earliest and most affected industries have been telecom and online services.

For instance, since 2013 the Telecommunications Real-Name Regulation has required all phone SIM cards to be registered to the buyer’s real identity (ID document). The Cybersecurity Law, which took effect in 2017, extended real-name verification to internet services: users of internet platforms must be verified with their true identity information before they can post content or use certain online services. In practice, this means social networks, forums, and even online comment sections often require you to link your account to a verified phone number, which in turn is registered under your real name.

The real-name system has arguably reached its peak this month, as the National Online Identity Authentication Public Service Platform, the Cyberspace ID system mentioned at the start of this article, went live on July 15, 2025. The state-run platform issues citizens a “Net Number” and a digital “Net Certificate” that can be used to log in to internet services without repeatedly handing over a name or national ID number. Users can voluntarily obtain these credentials by verifying their identity documents once through the government app.

As of the system’s release, the new cyber ID is voluntary for both users and service providers. Users may choose whether to use it as their login credential, and providers are permitted but not required to support it; those that do must still offer other authentication methods for users who choose not to use it.
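For a platform, supporting such a credential while keeping a fallback might look roughly like the following. This is a generic, hypothetical Python sketch: it uses a shared-key HMAC assertion that carries only a pseudonymous subject identifier, and it does not reflect the actual protocol or interfaces of the Cyberspace ID platform, which are not described here. All names, including the key, are illustrative.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared secret between an identity provider and this platform.
IDP_KEY = b"demo-key-not-for-production"


def sign_assertion(payload: dict, key: bytes = IDP_KEY) -> str:
    """Identity-provider side (simulated): encode and sign a pseudonymous assertion."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode()).decode()
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{tag}"


def verify_assertion(token: str, key: bytes = IDP_KEY) -> Optional[dict]:
    """Platform side: return the payload if the signature checks out, else None."""
    body, _, tag = token.rpartition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))


def login(token: Optional[str]) -> dict:
    """Accept the pseudonymous credential if offered, else use another login method."""
    if token is None:
        # Providers must keep a non-cyber-ID option, e.g. their existing account login.
        return {"method": "fallback"}
    payload = verify_assertion(token)
    if payload is None:
        raise PermissionError("identity assertion failed verification")
    # Persist only the pseudonymous subject identifier, not a name or ID number.
    return {"method": "cyber_id", "subject": payload["sub"]}
```

The point of the design is data minimization: the platform stores an opaque identifier that it can re-recognize on later logins, without ever receiving the user’s name or ID number.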

However, industry uptake may well create de facto expectations over time, and the system could eventually see universal use or even become legally required. It has been reported that using the cyber ID reduced the amount of user data collected by platforms by 89%, and by launch, the dedicated app had more than 16 million downloads. On top of that, a number of major tech companies (Tencent, Alibaba, ByteDance, etc.) have integrated the system, with 67 platforms supporting the cyber ID as of mid-2025.

Meeting Chinese ID verification challenges with Regula

Given the above challenges, it is hard to find a solution that will work flawlessly under all conditions, especially for Chinese documents. The best results come from combining an advanced OCR engine with Chinese-language support and an extensive document template library that helps the engine interpret fields correctly.
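As an example of how field-level knowledge from a template helps, consider the 18-character Chinese resident ID number, whose last character is a check character computed under GB 11643-1999 (ISO 7064 MOD 11-2). The Python sketch below validates that field. The `TEMPLATE_VALIDATORS` mapping and the field name are hypothetical stand-ins for a real template format, not Regula’s.

```python
from typing import Callable, Dict, List

# Weights and check characters defined in GB 11643-1999 (ISO 7064 MOD 11-2).
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CHARS = "10X98765432"  # indexed by (weighted sum mod 11)


def resident_id_is_valid(value: str) -> bool:
    """Check length, digit body, and check character of an 18-character resident ID number."""
    value = value.strip().upper()
    body, check = value[:17], value[17:]
    if len(value) != 18 or not all(c in "0123456789" for c in body):
        return False
    total = sum(int(d) * w for d, w in zip(body, WEIGHTS))
    return CHECK_CHARS[total % 11] == check


# Hypothetical template fragment mapping field names to validators.
TEMPLATE_VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "personal_number": resident_id_is_valid,
}


def failed_fields(ocr_fields: Dict[str, str]) -> List[str]:
    """Return the names of OCR'd fields that fail their template validator."""
    return [
        name
        for name, check in TEMPLATE_VALIDATORS.items()
        if name in ocr_fields and not check(ocr_fields[name])
    ]
```

Because 11 is prime and every weight is non-zero modulo 11, any single-digit substitution in the first 17 positions changes the checksum, so a one-character OCR misread there is always caught.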

Regula provides both parts of the solution: Regula Document Reader SDK supports over 138 languages (including Chinese) and more than 600 data fields, while our template database is the biggest in the world, with 15,000+ documents from 252 countries and territories.

In addition, the SDK supports full UI localization for 35 languages (including Chinese-language interfaces), which helps local deployments.

With Regula Document Reader SDK, you will be able to:

  • Authenticate thousands of ID documents from all over the world, including China.

  • Read machine-readable zones (MRZs) and barcodes.

  • Read and authenticate RFID chips. 

  • Verify digital signatures encoded in barcodes using the ICAO Datastructure format.

  • Verify dynamic security features, including holograms and optically variable ink (OVI).

  • And more.

Let’s drive the future—together. Book a call to learn more about our solutions!
