How to Automate KYC with OCR and AI?
Learn how automated KYC, powered by AI technologies, lets businesses provide a frictionless onboarding experience.
A complete guide with all you need to know about capturing data from passports and ID cards with OCR & AI.
A passport is one of the most common documents used for identity verification. More and more businesses are required to capture data from passports or other identity documents, such as ID cards and driver’s licenses, to verify clients’ identities. This is an essential step of the customer onboarding process because it protects organizations from crimes like fraud, corruption, and money laundering. However, the process can be time-consuming and labor-intensive if performed manually. After receiving digital copies of clients’ passports or ID cards, employees of businesses that must follow the Know Your Customer (KYC) standard have to enter all the important information by hand. Imagine how long this takes if a business has to process thousands of identity documents every day.
To capture data from passports more efficiently, businesses have used Optical Character Recognition (OCR) engines to convert images into machine-encoded text. Unfortunately, some shortcomings cannot be overcome with OCR alone. In this blog post, we will talk about:
Passports and ID cards contain the personal information that businesses need to create new client profiles or verify clients’ identities. Some common fields needed include:
Many people might not be familiar with the Machine Readable Zone (MRZ). The MRZ is the two lines of text, consisting of numbers and letters, found at the bottom of a passport’s data page. It was designed to be read by machines and to make the identity verification process more efficient and accurate.
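One reason the MRZ supports accurate machine reading is that several of its fields carry a check digit. The sketch below follows the check-digit scheme described in ICAO Doc 9303 (repeating weights 7, 3, 1; digits keep their value, A to Z map to 10 to 35, and the filler character `<` counts as 0). The function names are hypothetical and chosen only for illustration; this is not code from any particular OCR product.

```python
def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value (ICAO Doc 9303 scheme)."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":  # filler character
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum of the field's characters, modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def field_is_valid(field: str, check_digit: str) -> bool:
    """Compare the computed check digit with the one read from the MRZ."""
    return check_digit.isdigit() and mrz_check_digit(field) == int(check_digit)


# Document number and birth date from the ICAO specimen passport.
print(mrz_check_digit("L898902C3"))  # 6
print(mrz_check_digit("740812"))     # 2
```

A mismatch between the computed and printed check digit is a cheap signal that a character was misread and the image should be re-captured or reviewed.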
In highly regulated industries, knowing enough about clients is critical to preventing criminal activity and cannot be skipped.
All financial institutions are required by regulators to conduct KYC procedures before onboarding their clients. To avoid money laundering and other financial crimes, running an effective KYC program is paramount, and it starts with getting personal information from clients’ passports or ID cards. Automating the process with OCR or intelligent document processing solutions not only provides a better customer experience, since clients only have to take a photo of their passport or ID card instead of typing in every field, but also decreases the operational costs associated with manual data entry.
The COVID-19 pandemic has had a huge impact on the global economy. In early 2020, lockdowns and other precautionary measures drove the global economy into crisis. This created a variety of challenges for everyone, so governments around the world provided relief funds or grants to help local businesses and citizens persevere and, more importantly, recover to pre-pandemic levels faster.
However, the entire process can take a long time if it is done manually, since applicants have to enter their personal information and send scanned copies of their passports or ID cards. Afterwards, the information still has to be manually verified and saved to a database. To speed things up, public-sector organizations automate data capture from passports or ID cards with OCR and other AI technologies so that applications can be processed much faster.
Criminals can have devices or numbers registered under others’ names and perform criminal activities without leaving any traces. As a result, it is the telecom service providers’ responsibility to verify clients’ identity. This is usually done by asking clients who are getting new phone numbers to submit scanned copies of their passports or ID cards.
Other industries, such as healthcare, travel, and insurance, also need to verify clients’ identities or collect more information from them for further analysis. The traditional way to do this more efficiently was to adopt OCR technology.
Optical character recognition (OCR) replaces manual data entry by extracting printed or written text from images or scanned documents. The text is then converted into a machine-readable, machine-encoded format that can be edited or processed. However, the problem with OCR data capture is that OCR engines do not understand context and therefore fail to organize the information in a systematic way. Let’s demonstrate it with an example.
If we use an online OCR tool to capture data from this sample driver’s license, the result will look like:
For any application or system to make use of data, it usually has to be presented in a structured way, most commonly as JSON or CSV files. Therefore, developers have combined OCR with other technologies, such as machine learning, natural language processing, and more, to intelligently recognize, extract, and organize information from passports, ID cards, and other types of identity documents.
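To make the gap between raw OCR text and structured data concrete, here is a minimal sketch. The sample text is invented for illustration and is not the output of any real OCR tool; the field labels (`DL`, `LN`, `FN`, `DOB`, `EXP`) and the regex rules are assumptions. Hand-written rules like these are a much simpler stand-in for the machine learning and natural language processing mentioned above, and they break as soon as the layout changes, which is why more robust approaches are used in practice.

```python
import json
import re

# Invented sample of flat text an OCR engine might return (illustrative only).
RAW_OCR_TEXT = """DRIVER LICENSE
DL 1234567
LN SAMPLE
FN ALEX
DOB 01/31/1990
EXP 01/31/2030"""

# Hypothetical label-to-field rules; real documents need far more robust logic.
FIELD_PATTERNS = {
    "license_number": re.compile(r"^DL[ \t]+(\w+)$", re.MULTILINE),
    "last_name": re.compile(r"^LN[ \t]+(.+)$", re.MULTILINE),
    "first_name": re.compile(r"^FN[ \t]+(.+)$", re.MULTILINE),
    "date_of_birth": re.compile(r"^DOB[ \t]+(\d{2}/\d{2}/\d{4})$", re.MULTILINE),
    "expiry_date": re.compile(r"^EXP[ \t]+(\d{2}/\d{2}/\d{4})$", re.MULTILINE),
}


def structure_ocr_text(text: str) -> dict:
    """Turn flat OCR text into a keyed record; missing fields become None."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1).strip() if match else None
    return record


print(json.dumps(structure_ocr_text(RAW_OCR_TEXT), indent=2))
```

The resulting JSON object can be stored or passed to another system directly, which the flat text cannot.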
The FormX platform allows you to build OCR extraction models for identity documents using a user-friendly web portal. All you have to do is upload a master image and annotate the fields you want. An API that can parse photos of IDs into JSON or Excel will be ready right away.
Step 1: Sign up at FormX.ai
You can create an account at https://auth.formextractorai.com/signup
Step 2: Select from the pre-built IDs
FormX provides extractors for a wide range of IDs around the world, including international passports and personal IDs from Hong Kong, Singapore, Taiwan, Macau, and more.
If the desired document is not available, you can create your own easily.
Step 3: Create new form
Go to the “Form List”, click “Add New Form”, and select “My document has a fixed format”.
Step 4: Upload the Master Image
You can upload an image as the Master. This image should be well lit, clear, and free of noise. We recommend using a scanned image here.
Step 5: Annotate the Anchor Regions
Select the “Anchor Region” tool (in red) from the toolbar and mark anchors on the template features of the image. You can use headings, keys, and image elements that stay the same across documents as anchors. Mark at least 3 anchors; the more, the better. Ideally, they should span the 4 corners of the image.
FormX uses these anchors to locate the document in incoming images and to correct rotation and perspective. A pre-processing model is also applied to the documents to maximize data extraction performance on mobile-captured images.
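The article does not describe how FormX aligns images internally, so the following is only an illustration of the general idea, not FormX’s method. Three non-collinear anchor pairs are the minimum needed to determine an affine transform (rotation, scale, shear, and translation) that maps a photo back onto the master image; correcting true perspective distortion needs at least four pairs, which is one reason more anchors help. All function names and coordinates here are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, v):
    """Solve m * x = v for a 3x3 system using Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("anchors are collinear; pick anchors that span the image")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(det3(replaced) / d)
    return solution


def affine_from_anchors(photo_pts, master_pts):
    """Return (a, b, c, d, e, f) with x' = a*x + b*y + c and y' = d*x + e*y + f."""
    m = [[float(x), float(y), 1.0] for x, y in photo_pts]
    a, b, c = solve3(m, [float(x) for x, _ in master_pts])
    d, e, f = solve3(m, [float(y) for _, y in master_pts])
    return (a, b, c, d, e, f)


def apply_affine(t, point):
    """Map a point from photo coordinates to master-image coordinates."""
    a, b, c, d, e, f = t
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Anchors on the master image and where they appear in a rotated, shifted photo.
master = [(0, 0), (100, 0), (0, 50)]
photo = [(60, 10), (60, 110), (10, 10)]
transform = affine_from_anchors(photo, master)
print(apply_affine(transform, (10, 110)))  # maps back to roughly (100, 50)
```

Once the transform is known, every detection region defined on the master image can be located in the incoming photo, regardless of how the document was rotated or shifted when it was captured.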
Step 6: Annotate the Detection Regions
Next, you can select the “Detection Region” tool (in purple) from the toolbar and mark the region containing the values you want to extract.
Then choose the data type from the drop-down menu and give the field a name.
Step 7: Test the extractor
After annotating all the regions, save the form and test it in the “Test” tab. Upload a photo of a document and you will see the extracted data presented in JSON format below.
Step 8: Obtain Form ID and Access Token
Copy the Form ID and Access Token from the “API” tab.
Step 9: Process the image with the API
The extractor can be integrated with other software through the RESTful API to enrich your automation workflow. Send the image to the API endpoint “https://worker.formextractorai.com/extract” along with the Form ID and Access Token. The API response will contain the extracted information.
Example with cURL
Example with Python
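The original code sample is not available here, so the following is a minimal sketch using only the Python standard library. The endpoint comes from Step 9; the header names `X-WORKER-FORM-ID` and `X-WORKER-TOKEN`, the raw-bytes request body, and the JSON response are assumptions and should be checked against the “API” tab of your form. The function names are hypothetical.

```python
import json
import urllib.request

EXTRACT_URL = "https://worker.formextractorai.com/extract"


def build_extract_request(image_bytes, form_id, access_token,
                          content_type="image/jpeg"):
    """Build the POST request; header names are assumed, not confirmed."""
    headers = {
        "Content-Type": content_type,
        "X-WORKER-FORM-ID": form_id,     # assumed header for the Form ID
        "X-WORKER-TOKEN": access_token,  # assumed header for the Access Token
    }
    return urllib.request.Request(
        EXTRACT_URL, data=image_bytes, headers=headers, method="POST"
    )


def extract_fields(image_path, form_id, access_token):
    """Send one image to the extractor and return the parsed JSON response."""
    with open(image_path, "rb") as image_file:
        image_bytes = image_file.read()
    request = build_extract_request(image_bytes, form_id, access_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumes the API answers with a UTF-8 encoded JSON document.
        return json.loads(response.read().decode("utf-8"))
```

To try it, call `extract_fields("passport.jpg", "<FORM ID>", "<ACCESS TOKEN>")` with the values copied in Step 8. Building the request is kept separate from sending it so that the headers can be inspected without making a network call.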
Collecting and having your data in order is the first step towards digital transformation, which is now a necessity rather than a competitive advantage. FormX has helped a variety of businesses automate their data extraction from different documents, including passports, ID cards, business certificates, receipts, and more, so that they can provide better customer experience, improve operational efficiency, and even reduce costs.
Sign up for a free account or contact us to see how FormX can help you intelligently digitize your passports and ID cards.