When it comes to remote onboarding, one of the key struggles is ensuring that the person on the other side of the screen isn’t a static image or recording used by a fraudster to impersonate someone else.
This process is known as liveness detection.
With over 8 billion people worldwide, each with a unique appearance, and the growing sophistication of fraud techniques, liveness detection isn’t the easiest task. As with many other things these days, it employs artificial intelligence (AI).
The question remains: how do you train AI to recognize live people and protect against identity fraud? What are the common pitfalls, and how can you avoid them?
The role of neural networks in liveness checks
As we wrote in another post about AI in identity verification, a neural network is a system inspired by the structure of the human brain. It analyzes vast amounts of data to detect patterns, learns from these observations, and then uses these insights to make informed guesses with new, similar data.
Think of ChatGPT, a prime example of a neural network at work. It’s trained on billions of text pages from across the web so that it can produce meaningful text in response to a user’s prompt.
Teaching a neural network to do liveness checks involves a similar principle to that applied in ChatGPT. Since neural networks learn to perform tasks by analyzing existing data, you’ll need a lot of relevant data as training material. This data can vary depending on the method: passive liveness will require images, while active liveness will require videos.
In fact, all modern liveness checks use AI in one way or another for a simple reason: effectiveness.
AI algorithms process large amounts of data in real time and make it possible to verify users’ liveness without noticeable delays. Neural networks are continuously trained and updated with new data, so their performance becomes more effective over time. Last but not least, AI can handle many liveness checks simultaneously. That makes it ideal for services with large user bases, such as online banking, remote onboarding, and e-commerce.
However, the power comes at a price.
You might also like: Can I Use ChatGPT for Identity Verification?
Challenge #1: Collecting extensive data for training
You need a lot of samples to train AI. No, seriously. A LOT. To have your network ready for fieldwork, you’ll need not one but two datasets:
A dataset for actual training. These samples must match the nature, quality, and variety of the images that real users will later submit.
A dataset for training result validation. These samples are needed to test how well the network performs. It’s important never to feed items from this dataset to the network during training to avoid “teaching to the test.” Otherwise, the network might perform well in your testing sandbox but fail to generalize to new data in real-world applications.
For example, if your validation dataset contains 1,000 samples, a single error already translates into a 0.1% error rate. So, to demonstrate a 99.999% accuracy rate (an allowed error rate of just 0.001%), you’ll need at least 100,000 examples to prove that the network performs reliably under diverse conditions.
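Here’s a minimal sketch of that arithmetic (an illustration, not Regula’s actual methodology), computing the smallest validation set that can even resolve a given accuracy level:

```python
def min_validation_size(target_accuracy: float) -> int:
    """Smallest validation set in which a single error still fits
    within the allowed error rate."""
    max_error_rate = 1.0 - target_accuracy
    # A single mistake contributes 1/n to the measured error rate,
    # so n must be at least 1 / max_error_rate. round() guards
    # against floating-point noise in the subtraction above.
    return round(1.0 / max_error_rate)

for accuracy in (0.999, 0.9999, 0.99999):
    print(f"{accuracy:.3%} accuracy -> at least "
          f"{min_validation_size(accuracy):,} validation samples")
```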
But that’s not all.
Challenge #2: Obtaining enough fraudulent samples
The goal of a liveness check is to mitigate identity fraud. Hence, you’ll also need samples of fraudulent images, i.e., attempts to cheat the system. As we said above, accuracy comes at the cost of the amount of data. So, to be 99.9% confident of detecting an attack, roughly half of the items in your dataset should be representations of attacks.
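To illustrate, here’s a minimal sketch of assembling such a 50/50 training set; the file names and pool sizes are hypothetical:

```python
import random

# Hypothetical pools of labeled sample paths.
genuine_samples = [f"genuine_{i}.jpg" for i in range(50_000)]
attack_samples = [f"attack_{i}.jpg" for i in range(50_000)]

def build_balanced_dataset(genuine, attacks, seed=42):
    """Draw equal numbers of genuine and attack samples so the
    network sees a 50/50 class split during training."""
    n = min(len(genuine), len(attacks))
    rng = random.Random(seed)
    dataset = [(path, "live") for path in rng.sample(genuine, n)]
    dataset += [(path, "attack") for path in rng.sample(attacks, n)]
    rng.shuffle(dataset)
    return dataset

dataset = build_balanced_dataset(genuine_samples, attack_samples)
print(len(dataset))  # 100,000 samples, half of them attacks
```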
Challenge #3: Collecting a wide variety of attributes
It’s also important to balance your dataset to avoid biases in AI behavior. If your business is based, say, in Europe, you may lack portrait samples of people of African descent. As a result, an otherwise effective system can perform poorly when verifying such individuals.
Also, people often wear accessories, like glasses, hats, scarves, and more. These items hide parts of the face, making the task more difficult, since it’s not always possible to ask a person to remove an accessory. Furthermore, accessories can be used in fraud attacks, for example, to disguise imperfections in masks. That’s why it is important to have these kinds of samples as well, in order to train the network to address these attributes.
When training Regula’s liveness check module, we consider the following attributes:
Race and ethnicity;
Gender;
Age;
Physical features: variations in facial hair, makeup, scars, etc.;
Accessories: eyewear, headgear, and other accessories that people commonly wear.
The actual proportion of attributes depends on your context. Ideally, your dataset should correlate with the geographic area in which your business operates. So, if your target market includes countries where it’s commonplace for people to have their heads covered for religious reasons, you’ll need a larger share of such samples.
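A simple audit script can help keep an eye on this balance. The sketch below assumes each sample carries metadata tags and that you’ve defined target shares for your markets; all names and numbers are illustrative:

```python
from collections import Counter

# Hypothetical metadata: one record per labeled sample.
samples = [
    {"path": "img_001.jpg", "region": "EU", "headwear": False},
    {"path": "img_002.jpg", "region": "MENA", "headwear": True},
    # ...thousands more records
]

# Assumed target shares derived from the markets you operate in.
target_shares = {"EU": 0.5, "MENA": 0.3, "APAC": 0.2}

counts = Counter(s["region"] for s in samples)
total = sum(counts.values())

for region, target in target_shares.items():
    actual = counts.get(region, 0) / total
    flag = "OK" if abs(actual - target) < 0.05 else "REBALANCE"
    print(f"{region}: {actual:.1%} of samples (target {target:.0%}) -> {flag}")
```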
Challenge #4: The risk of mislearning from data
One of the most significant risks is that the AI system may learn incorrect patterns that result in false positives or negatives.
Such errors stem from the way neural networks are trained. When you feed the network samples, it selects the features that let it give correct answers. The network then remembers and generalizes these features so that it can apply them to evaluate new, unfamiliar samples.
However, a training dataset might contain systematic errors. For example, say you collect a large set of “attacks” using a camera with broken pixels. These defects might not be visible to the naked eye, but the network may “learn” them as a meaningful feature. Samples submitted by real-world users won’t have this peculiarity, so the network will produce false positives—i.e., identify fraudsters as genuine users.
The same thing happens when the variety of training samples is limited. If you train a network using only 10 masks, it will likely learn to identify those specific masks with high accuracy. However, in the real world, when a fraudster uses a different mask, the network might fail because it hasn’t generalized the features sufficiently.
💡By the way, this is why it’s dangerous to use AI image generators to create more samples for your training datasets. As they generate images with distortions (like people with six fingers), your network can learn these distortions as an important feature.
Fixing these errors involves adjusting the entire dataset—either by removing misleading data, adding new examples, or both—to help the network learn the correct patterns.
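One practical safeguard against this kind of mislearning is to validate on capture sources the network never saw during training. Here’s a minimal sketch using scikit-learn’s GroupShuffleSplit, assuming each sample is tagged with the ID of the camera that captured it:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Stand-in data: feature vectors, liveness labels, and camera IDs.
X = np.random.rand(1_000, 128)
y = np.random.randint(0, 2, size=1_000)       # 1 = live, 0 = attack
camera_ids = np.random.randint(0, 20, size=1_000)

# Hold out entire cameras: if the network has latched onto
# device-specific artifacts (like broken pixels), its accuracy
# will drop sharply on the held-out cameras.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(X, y, groups=camera_ids))

assert set(camera_ids[train_idx]).isdisjoint(camera_ids[val_idx])
print(f"{len(set(camera_ids[train_idx]))} cameras in training, "
      f"{len(set(camera_ids[val_idx]))} held out for validation")
```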
Challenge #5: Complexity of parameters
The network takes a huge number of parameters into account. Technically, you often won’t even know exactly which features it has picked up on as important: a texture in the eyelid area, or a color match between a few spots on the forehead and cheeks.
It is almost impossible to pinpoint the root cause of an error in one specific example. So, all errors are addressed through broad statistical analysis; isolated cases (3-5 examples of errors) are of little help and cannot guide systemic improvements.
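To make “broad statistical analysis” concrete, here’s a minimal sketch (with made-up numbers) that aggregates errors per attribute group and attaches a 95% confidence interval, so a handful of isolated failures never drives a conclusion on its own:

```python
import math

def wilson_interval(errors: int, n: int, z: float = 1.96):
    """95% Wilson score interval for an observed error rate."""
    if n == 0:
        return (0.0, 1.0)
    p = errors / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - margin), min(1.0, center + margin)

# Hypothetical per-group results: (group, errors, samples evaluated).
results = [
    ("eyewear", 12, 8_000),
    ("headgear", 4, 500),       # wide interval: too few samples to conclude
    ("no accessory", 30, 90_000),
]

for group, errors, n in results:
    low, high = wilson_interval(errors, n)
    print(f"{group}: {errors}/{n} errors, 95% CI [{low:.3%}, {high:.3%}]")
```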
Challenge #6: A human in the loop
To train a network, you need labeled data. In the case of a liveness check, a team member literally labels each image as “live,” “not live,” or “unknown.” Unfortunately, humans are prone to making mistakes.
Since dataset issues are the root cause of poor output quality, labeling errors at this stage severely impact the overall result. They are hard to find and correct later, and fixing them may require revising all the marked-up data.
To minimize the risks at the markup stage, we at Regula provide detailed instructions that describe the process step by step. We also conduct training sessions and boot camps for the employees responsible for sample labeling so that they understand exactly what to do.
In addition, we perform cross-validation of marked-up data. Two people independently label the same samples. In the case of mismatched results, the data is shown to a third, more experienced employee. If the third person cannot make an unambiguous decision, we consult with domain experts and developers. If a decision still can’t be made, we put such examples in a separate group.
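In code, that escalation chain might look like the following sketch, where senior_reviewer and experts are hypothetical callables standing in for the human steps:

```python
def resolve_label(label_a: str, label_b: str, senior_reviewer, experts) -> str:
    """Cross-validate markup from two independent annotators,
    escalating disagreements step by step."""
    if label_a == label_b:
        return label_a                  # annotators agree
    decision = senior_reviewer(label_a, label_b)
    if decision is not None:
        return decision                 # senior annotator settles it
    decision = experts(label_a, label_b)
    if decision is not None:
        return decision                 # domain experts settle it
    return "undecided"                  # set aside in a separate group

# Agreement needs no escalation at all:
print(resolve_label("live", "live",
                    senior_reviewer=lambda a, b: None,
                    experts=lambda a, b: None))  # -> live
```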
Challenge #7: New mentality
AI networks mimic human trial-and-error learning processes, so they work more like humans than traditional deterministic algorithms. This fact means two things:
They are not infallible.
Fixing a mistake in AI behavior isn’t as straightforward as updating a line of code.
Improving AI performance requires a big-picture perspective rather than ad hoc tweaks because addressing an issue in one scenario could unintentionally affect performance in another. You need to systematically work with the entire dataset: re-balance it, add more samples, and run error correction.
Again, the success of these improvements depends on the size of your dataset. If it includes only a handful of samples, the neural network won’t perform as expected in real-world applications.
How to prepare a good dataset for a liveness check
Achieving a reliable AI-based liveness check is the result of an equation where the common peculiarities of neural networks are multiplied by the complexity of human biometrics. However, it boils down to three key steps:
1. Extend your dataset: Continuously adding new samples to your datasets helps the AI adapt to a wide variety of relevant scenarios, and improves its reliability.
2. Revise: Once you collect or update the dataset, review it for errors and biases, such as mislabeled images or underrepresented groups, to ensure the data is as accurate and inclusive as possible. Dataset issues like these imbalances are what ultimately cause the technology to underperform.
3. Automate: Automating some aspects of the review process can speed up data validation. For instance, algorithms can be used to check for duplication or estimate the balance between ethnicities.
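For example, a simple perceptual hash can surface duplicate candidates for human review. The sketch below hand-rolls the classic average-hash (aHash) technique using only Pillow; the dataset folder is hypothetical:

```python
from pathlib import Path
from PIL import Image

def average_hash(path: Path, size: int = 8) -> int:
    """Classic aHash: downscale to an 8x8 grayscale image and set one
    bit per pixel depending on whether it's brighter than the mean.
    Near-identical images produce identical or very close hashes."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for pixel in pixels:
        bits = (bits << 1) | (pixel > mean)
    return bits

# Flag exact hash collisions as duplicate candidates.
seen = {}
for path in Path("dataset").glob("*.jpg"):  # hypothetical folder
    h = average_hash(path)
    if h in seen:
        print(f"possible duplicate: {path} ~ {seen[h]}")
    else:
        seen[h] = path
```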
When training our networks at Regula, we apply principles of auto-learning. The process goes in iterations. A trained network marks up the data, which is then reviewed by a human. The human either confirms the results or corrects them. These corrections are valuable as they highlight the network's imperfections. Once a sufficient number of corrections is accumulated, a new cycle of training and automatic markup begins, continuing the process in a loop.
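Schematically, one pass of such a loop might look like this sketch, where train, auto_label, and human_review stand in for the actual tooling:

```python
def auto_learning_loop(model, labeled_data, unlabeled_batches,
                       train, auto_label, human_review,
                       min_corrections: int = 100):
    """Iterate: train -> auto-label -> human review -> retrain."""
    for batch in unlabeled_batches:
        model = train(model, labeled_data)
        proposals = auto_label(model, batch)        # network marks up data
        confirmed, corrections = human_review(proposals)
        labeled_data += confirmed + corrections     # both feed the next cycle
        if len(corrections) < min_corrections:
            break  # markup quality is good enough; stop iterating
    return model, labeled_data
```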
The idea is simple, yet execution is everything.
Key takeaways
Properly training an AI for a liveness check requires a large number of samples. To confirm a 99.999% accuracy rate, you’ll need at least 100,000 samples in the validation dataset alone.
You’ll need at least two separate datasets to train a neural network: one for training, and one for training validation.
A good dataset for training an AI to perform a liveness check should include samples that are diverse in terms of ethnicity, gender, age, and other attributes, and that reflect the markets your business targets.
A good dataset also includes two groups of samples: positive (legitimate images) and negative (fraud attempts). There should be examples of different fraud attempts, not just one type.
The main source of errors in AI-based liveness checks is an imperfect dataset: an insufficient number of samples, imbalances, mislearning issues, etc. To improve performance, you’ll need to adjust the entire dataset and then retrain the network.