A complete guide to everything you need to know about capturing data from passports and ID cards with OCR and AI.
The passport is one of the most common documents used for identity verification. More and more businesses are required to capture data from passports or other identity documents, such as ID cards and driver’s licenses, to verify their clients’ identities. This is an essential step of the customer onboarding process, as it protects organizations from crimes such as fraud, corruption, and money laundering. However, the process can be time-consuming and labor-intensive when performed manually. After receiving digital copies of clients’ passports or ID cards, employees at businesses that must follow Know Your Customer (KYC) standards have to enter all the important information by hand. Imagine how long this takes when a business has to process thousands of identity documents every day.
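To capture data from passports more efficiently, businesses turned to Optical Character Recognition (OCR) engines to convert images into machine-encoded text. Unfortunately, some shortcomings cannot be overcome with OCR alone. In this blog post, we will talk about the information businesses capture from identity documents, the industries that rely on it, where OCR falls short, and how to automate passport and ID card data extraction with FormX.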
Passports and ID cards contain personal information that businesses need to create new client profiles or verify clients’ identities. Some common fields include:
Many people might not be familiar with the Machine Readable Zone (MRZ) code. On a passport, the MRZ consists of the two lines of letters, numbers, and filler characters (<) printed at the bottom of the data page. It was designed to be read by machines, making the identity verification process more efficient and accurate.
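Part of what makes machine reading reliable is that ICAO Doc 9303, the specification behind the MRZ, protects key fields such as the document number, date of birth, and expiry date with a check digit. As a minimal sketch, the Python below computes that check digit: each character is mapped to a value (digits as-is, A to Z as 10 to 35, the filler < as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The function names are our own, and the sample values come from the ICAO specimen passport rather than a real document.

```python
DIGITS = "0123456789"


def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its ICAO 9303 numeric value."""
    if ch in DIGITS:
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character counts as zero
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def field_is_valid(field: str, check_char: str) -> bool:
    """Compare a field with the check digit printed right after it in the MRZ."""
    return check_char in DIGITS and mrz_check_digit(field) == int(check_char)


if __name__ == "__main__":
    # Specimen values from the ICAO Doc 9303 example passport.
    print(mrz_check_digit("L898902C3"))      # document number
    print(mrz_check_digit("740812"))         # date of birth
    print(mrz_check_digit("120415"))         # expiry date
    print(field_is_valid("L898902C3", "6"))  # compare with the printed digit
```

A single misread character (for example, an OCR engine confusing 0 and O) almost always changes the computed digit, which is why validating check digits is a cheap first filter before trusting extracted MRZ data.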
In highly regulated industries, knowing enough about clients is critical and cannot be skipped if organizations want to prevent criminal activity.
Regulators require all financial institutions to conduct KYC procedures before onboarding clients. To prevent money laundering and other financial crimes, creating and running an effective KYC program is paramount, and it starts with capturing personal information from clients’ passports or ID cards. Automating this step with OCR or intelligent document processing solutions not only provides a better customer experience, since clients only have to take a photo of their passport or ID card instead of typing in every detail, but also decreases the operational costs of manual data entry.
The COVID-19 pandemic has had a huge impact on the global economy. In early 2020, lockdowns and other precautionary measures drove the global economy into crisis. The pandemic created a variety of challenges for everyone, so governments around the world provided relief funds and grants to help local businesses and citizens not only persevere but, more importantly, recover to pre-pandemic levels faster.
However, the application process can take a long time if it is done manually, since applicants have to enter their personal information and send scanned copies of their passports or ID cards. Afterwards, the information still has to be verified by hand and saved to a database. To speed things up, public-sector agencies automate data capture from passports and ID cards with OCR and other AI technologies so that applications can be processed much faster.
Criminals can register devices or phone numbers under other people’s names and carry out criminal activities without leaving a trace. As a result, telecom service providers are responsible for verifying their clients’ identities. This is usually done by asking clients who are getting new phone numbers to submit scanned copies of their passports or ID cards.
Other industries, such as healthcare, travel, and insurance, also need to verify clients’ identities or collect more information from them for further analysis. To do this more efficiently, the traditional approach has been to adopt OCR technology.
Optical character recognition (OCR) replaces manual data entry by extracting printed or handwritten text from images or scanned documents. This text is then converted into a machine-readable, machine-encoded format that can be edited or processed. However, the problem with OCR data capture is that OCR engines do not understand context, so they cannot organize the extracted information in a structured way. Let’s demonstrate this with an example.
If we use an online OCR tool to capture data from this sample driver’s license, the result looks like this:
For any application or system to make use of data, the data usually has to be presented in a structured format, most commonly JSON or CSV. Therefore, developers have combined OCR with other technologies, such as machine learning and natural language processing, to intelligently recognize, extract, and organize information from passports, ID cards, and other identity documents.
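To see the gap between raw text and structured data, consider the passport MRZ: OCR returns it as two strings, but an application needs named fields. As a minimal sketch, the Python below slices the two 44-character lines into fields using the fixed TD3 layout defined in ICAO Doc 9303 and prints the result as JSON. This is a plain rule-based parser for one standardized zone, not the machine learning approach described above, and the output field names are our own choice.

```python
import json


def _clean(field: str) -> str:
    """Turn MRZ filler characters into spaces and collapse extra whitespace."""
    return " ".join(field.replace("<", " ").split())


def parse_td3_mrz(line1: str, line2: str) -> dict:
    """Parse the two 44-character MRZ lines of a passport (ICAO 9303 TD3 layout)."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("TD3 MRZ lines must be exactly 44 characters long")

    # Surname and given names are separated by a double filler "<<".
    surname, _, given_names = line1[5:44].partition("<<")
    return {
        "document_type": line1[0],
        "issuing_country": line1[2:5],
        "surname": _clean(surname),
        "given_names": _clean(given_names),
        "document_number": line2[0:9].rstrip("<"),
        "nationality": line2[10:13],
        "date_of_birth": line2[13:19],  # YYMMDD
        "sex": line2[20],
        "expiry_date": line2[21:27],    # YYMMDD
        "personal_number": line2[28:42].rstrip("<"),
    }


if __name__ == "__main__":
    # ICAO Doc 9303 specimen passport; line 1 is padded with filler to 44 characters.
    line1 = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
    line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
    print(json.dumps(parse_td3_mrz(line1, line2), indent=2))
```

Fixed-position parsing works here only because the MRZ layout is standardized; the visual part of a passport or driver’s license varies by issuer, which is where learned extractors earn their keep.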
Besides allowing users to train their own extractors, FormX also provides several pre-trained extractors that you can use right away. All you have to do is select the corresponding extractor, test it with some sample images, and integrate it with your software or application via the API to set up an automated passport and ID card processing workflow.
Step 1: Sign up at FormX.ai
You can create an account at https://auth.formextractorai.com/signup
Step 2: Create an extractor
After creating an account, you can then create different types of extractors based on your needs. FormX provides a set of pre-built extractors and also allows you to train your own extractor by providing sample images and marking the areas where the desired information is located.
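The idea of marking where each field sits can be illustrated with a simple template match: given OCR words with bounding boxes and one rectangle per field, keep the words whose centre falls inside each rectangle. This is a hypothetical sketch; the Box, Word, and extract_fields names, the coordinates, and the sample words are invented for illustration, and we make no claim that this is how FormX works internally.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned rectangle in page pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass
class Word:
    """One word from a hypothetical OCR step, with its bounding box."""
    text: str
    box: Box


def extract_fields(words: list[Word], regions: dict[str, Box]) -> dict[str, str]:
    """Collect the words whose centre falls inside each marked region."""
    result = {}
    for field, region in regions.items():
        inside = [
            w for w in words
            if region.contains((w.box.left + w.box.right) / 2,
                               (w.box.top + w.box.bottom) / 2)
        ]
        # Sort top-to-bottom, then left-to-right, to approximate reading order.
        inside.sort(key=lambda w: (w.box.top, w.box.left))
        result[field] = " ".join(w.text for w in inside)
    return result


if __name__ == "__main__":
    words = [
        Word("Surname", Box(10, 10, 80, 25)),
        Word("ERIKSSON", Box(10, 30, 110, 45)),
        Word("Given", Box(10, 60, 60, 75)),
        Word("ANNA", Box(10, 80, 60, 95)),
        Word("MARIA", Box(70, 80, 130, 95)),
    ]
    # Regions a user might mark on a sample image (hypothetical coordinates).
    regions = {"surname": Box(0, 28, 200, 50), "given_names": Box(0, 78, 200, 100)}
    print(extract_fields(words, regions))
```

A fixed template like this breaks as soon as the layout shifts or a different issuer’s card is scanned, which is the motivation for training extractors on multiple sample images instead of hard-coding coordinates.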
Step 3: Select “Government ID / Passport” as your extractor
We’ve pre-trained an extractor allowing our users to extract data from a variety of national IDs, driver’s licenses, and passports.
Step 4: Test your extractor
After selecting your extractor, upload a sample image to test it out. You’ll be able to see the result along with the JSON output.
Step 5: Obtain Form ID and Access Token
Copy the Form ID and Access Token from the “Extract” tab.
Step 6: Process the image with the API
The extractor can be integrated with other software through its RESTful API to extend your automation workflow. Send the image to the API endpoint https://worker.formextractorai.com/extract along with your Form ID and Access Token. The API response will contain the extracted information.
Example with cURL
Example with Python
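Below is a minimal sketch using only the Python standard library. The endpoint URL is the one given above. The header names X-WORKER-TOKEN and X-WORKER-FORM-ID, sending the image as the raw request body, the environment variable names, and the assumption that the response is JSON are our assumptions, not details confirmed by this guide; check them against the FormX API reference before use.

```python
import json
import os
import urllib.request

# Endpoint given in the guide above.
EXTRACT_URL = "https://worker.formextractorai.com/extract"


def build_extract_request(
    image_bytes: bytes, form_id: str, access_token: str
) -> urllib.request.Request:
    """Build a POST request that uploads the raw image bytes.

    ASSUMPTION: the header names and raw-bytes body are unverified guesses;
    confirm them in the FormX API documentation before relying on them.
    """
    headers = {
        "X-WORKER-TOKEN": access_token,  # assumed header for the Access Token
        "X-WORKER-FORM-ID": form_id,     # assumed header for the Form ID
        "Content-Type": "image/jpeg",    # adjust to match your image format
    }
    return urllib.request.Request(
        EXTRACT_URL, data=image_bytes, headers=headers, method="POST"
    )


def extract(image_path: str) -> dict:
    """Upload one image and return the response body parsed as JSON."""
    # Hypothetical environment variable names that keep credentials out of code.
    form_id = os.environ["FORMX_FORM_ID"]
    access_token = os.environ["FORMX_ACCESS_TOKEN"]
    with open(image_path, "rb") as image_file:
        request = build_extract_request(image_file.read(), form_id, access_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With FORMX_FORM_ID and FORMX_ACCESS_TOKEN set in the environment, calling extract("passport.jpg") would upload the image and return the parsed response. Reading the credentials from environment variables rather than hard-coding them keeps the Access Token out of source control.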
Collecting your data and keeping it in order is the first step towards digital transformation, which is now a necessity rather than a competitive advantage. FormX has helped a variety of businesses automate data extraction from different documents, including passports, ID cards, business certificates, receipts, and more, so that they can provide a better customer experience, improve operational efficiency, and even reduce costs.
Sign up for a free account or contact us to see how FormX can help you intelligently digitize your passports and ID cards.