How to Capture Data from Passport and ID Cards with OCR & AI?

A complete guide with all you need to know about capturing data from passports and ID cards with OCR & AI.

Published on
March 23, 2022

The passport is one of the most common documents used for identity verification. More and more businesses are required to capture data from passports or other identity documents, such as ID cards and driver’s licenses, to verify clients’ identities. This is an essential step of the customer onboarding process, as it protects organizations from crimes like fraud, corruption, and money laundering. However, the process can be time-consuming and labor-intensive if performed manually. After receiving digital copies of clients’ passports or ID cards, employees at businesses that must follow the Know Your Customer (KYC) standard have to enter all the important information by hand. Imagine how long this takes if a business has to process thousands of identity documents every day.

To capture data from passports more efficiently, businesses have used Optical Character Recognition (OCR) engines to convert images into machine-encoded text. Unfortunately, OCR has shortcomings that it cannot overcome on its own. In this blog post, we will talk about the use cases of passport and ID OCR, how ID OCR works and why it is not enough, and how to automate data capture from passports and ID cards with FormX.

What Are Some of the Use Cases of Passport and ID OCR?

Passports and ID cards contain the personal information that businesses need to create new customer profiles or verify customers’ identities. Commonly needed fields include:

  • Full Name
  • Nationality
  • Date of Birth
  • Place of Birth
  • Gender
  • Date of Issue
  • Expiry Date
  • Location of Issue
  • Document / Passport Number
  • Social Security Number
  • Machine Readable Zone Code

Many might not be familiar with the Machine Readable Zone (MRZ) code. The MRZ code is the two lines of text, consisting of numbers, letters, and the filler character "<", found at the bottom of a passport’s identity page. It was designed to be read by machines and to make the identity verification process more efficient and accurate.

A sample European passport with the machine readable zone code highlighted
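Part of what makes the MRZ accurate for machines is that several of its fields are followed by a check digit, so a misread character can be detected. The sketch below shows the check-digit rule defined in ICAO Doc 9303 (weights 7, 3, 1 repeated, digits keep their value, A to Z count as 10 to 35, and "<" counts as 0). The function names are illustrative, and the sample values are the ones commonly quoted from the ICAO specimen passport; this is a minimal sketch, not a full MRZ parser.

```python
def mrz_char_value(ch):
    """Map one MRZ character to its numeric value under ICAO Doc 9303."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10, B=11, ..., Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field):
    """Compute the check digit for an MRZ field (weights 7, 3, 1 repeating)."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


# Fields from the ICAO specimen MRZ: document number, date of birth, expiry date.
print(mrz_check_digit("L898902C3"))  # document number
print(mrz_check_digit("740812"))     # date of birth, YYMMDD
print(mrz_check_digit("120415"))     # expiry date, YYMMDD
```

If the digit computed from an extracted field does not match the check digit printed in the MRZ, the field was probably misread and should be sent for review.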

In highly regulated industries, collecting enough information about clients is a critical step that cannot be skipped, because it helps prevent criminal activity.

Financial Institutions

All financial institutions are required by regulators to conduct KYC procedures before onboarding their clients. To avoid money laundering and other financial crime, creating and running an effective KYC program is paramount, and it starts with getting personal information from clients’ passports or ID cards. Automating the process with OCR or intelligent document processing solutions not only provides a better customer experience, as clients only have to take a photo of their passports or ID cards instead of typing in every field, but also decreases the operational costs associated with manual data entry.

Public Sector

The COVID-19 pandemic has had a huge impact on the global economy. In early 2020, lockdowns and other precautionary measures drove the global economy into crisis. This created a variety of challenges for everyone, so governments around the world have provided relief funds and grants to help local businesses and citizens not only persevere but, more importantly, recover to pre-pandemic levels faster.

However, the entire process can take a long time if it is done manually, since applicants have to enter their personal information and send scanned copies of their passports or ID cards. Afterwards, the information still has to be manually verified and saved to a database. To speed things up, public sector organizations automate data capture from passports or ID cards with OCR and other AI technologies so that applications can be processed much faster.

Telecommunications

Criminals can have devices or numbers registered under others’ names and perform criminal activities without leaving any traces. As a result, it is the telecom service providers’ responsibility to verify clients’ identity. This is usually done by asking clients who are getting new phone numbers to submit scanned copies of their passports or ID cards.

Other industries, such as healthcare, travel, and insurance, also need to verify clients’ identities or get more information from them for further analysis. To do this more efficiently, the traditional approach has been to adopt OCR technology.

How Does ID OCR Work and Why Is It Not Enough?

Optical character recognition (OCR) replaces manual data entry by extracting printed or handwritten text from images or scanned documents. The text is then converted into a machine-readable, machine-encoded format that can be edited or processed. However, the problem with OCR data capture is that OCR engines do not understand context and therefore fail to organize the information in a systematic way. Let’s demonstrate this with an example.

A sample California driver's license.

If we use an online OCR tool to capture data from this sample driver's license, the result looks like this:

CaliforniausA DRIVER LICENSE
DL 11234562 EXP 08/31/2014 LN SAMPLE FNALEXANDER JOSEPH 2570 24TH STREET ANYTOWN.CA 95818 11-1111 Doe 08/31 /1977 RSTR NONE
FEDERAL LIMITS APPLY
Ct/tic J'aire-e--
SEX M HAIR ELK EYES ERN HGT 5•08" WGT 150 lb ISS DD 013/00/0000NNNANANFONY 08/31/2009

For any application or system to make use of data, it usually has to be presented in a structured way, most commonly as JSON or CSV files. Therefore, developers have combined OCR with other technologies, such as machine learning, natural language processing, and more, to intelligently recognize, extract, and organize information from passports, ID cards, and other types of identity documents. 
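As a rough illustration of what organizing the raw text involves, the sketch below pulls a few fields out of the OCR output above with hand-written regular expressions and prints them as JSON. The field names and patterns are assumptions written for this one sample, not a description of how FormX works internally. Rules like these break as soon as the layout or the OCR errors change, which is why machine learning is usually layered on top of OCR.

```python
import json
import re

# The first two lines of the raw OCR output shown above, copied verbatim.
RAW_OCR_TEXT = (
    "CaliforniausA DRIVER LICENSE\n"
    "DL 11234562 EXP 08/31/2014 LN SAMPLE FNALEXANDER JOSEPH "
    "2570 24TH STREET ANYTOWN.CA 95818 11-1111 Doe 08/31 /1977 RSTR NONE\n"
)

# Hypothetical field names mapped to patterns keyed on the printed labels.
FIELD_PATTERNS = {
    "license_number": r"\bDL\s+(\w+)",
    "expiry_date": r"\bEXP\s+(\d{2}/\d{2}/\d{4})",
    "last_name": r"\bLN\s+([A-Z]+)",
    # The OCR engine dropped the space after "FN", so allow zero spaces.
    "first_name": r"\bFN\s*([A-Z]+(?: [A-Z]+)*)",
}


def extract_fields(text):
    """Return a dict of field name to matched value (None when not found)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields


print(json.dumps(extract_fields(RAW_OCR_TEXT), indent=2))
```

Note that the date of birth is not covered: the OCR engine misread its "DOB" label as "Doe" and inserted a stray space in the date, so a label-based rule cannot find it reliably.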

How to Automate Data Capture from Passports and ID Cards with FormX Within a Few Minutes?

The FormX platform allows you to build OCR extraction models for identity documents using a user-friendly web portal. All you have to do is upload a master image and annotate the fields you want. An API that can parse photos of IDs into JSON or Excel will be ready right away.

Step 1: Sign up at FormX.ai

You can create an account at https://auth.formextractorai.com/signup

Step 2: Select from the pre-built IDs

FormX provides extractors for a wide range of IDs from around the world, including international passports and personal IDs from Hong Kong, Singapore, Taiwan, Macau, and more.

If the desired document is not available, you can create your own easily.

Step 3: Create new form

Go to the “Form List”, click “Add New Form”, and choose “My document has a fixed format”.

Step 4: Upload the Master Image

Upload an image to use as the Master. This image should be well lit, clear, and free of noise. We recommend using a scanned image here.

Step 5: Annotate the Anchor Regions

Select the “Anchor Region” tool (in red) from the toolbar and mark anchors on the template features of the image. You can use headings, keys, and image elements that stay the same across documents as anchors. Mark at least 3 anchors; the more, the better. Ideally, they should be spread across the 4 corners of the image.

FormX uses these anchors to locate the document in incoming images and to correct its rotation and perspective. A pre-processing model is also applied to the documents to maximize data extraction performance on mobile-captured images.
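To give a feel for why at least 3 anchors are needed, the sketch below fits an affine transform from three matched anchor points and uses it to map a position in a photo back onto the master image. This is only an illustration of the general idea, not FormX's actual pre-processing. All function names and coordinates are made up for the example, and an affine transform covers rotation, scaling, and shifting but not true perspective, which needs at least 4 points.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m @ x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("anchors are collinear; spread them across the page")
    solution = []
    for col in range(3):
        # Replace one column of m with rhs and take the determinant ratio.
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(photo_pts, master_pts):
    """Fit x' = a*x + b*y + c and y' = d*x + e*y + f from 3 point pairs."""
    m = [[float(x), float(y), 1.0] for x, y in photo_pts]
    row_x = solve3(m, [float(x) for x, _ in master_pts])
    row_y = solve3(m, [float(y) for _, y in master_pts])
    return row_x, row_y


def apply_affine(transform, point):
    """Map a photo coordinate to the corresponding master coordinate."""
    (a, b, c), (d, e, f) = transform
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Anchor positions on the master, and where the same anchors were found in a
# photo that is rotated by 90 degrees and shifted (illustrative numbers).
master_anchors = [(0, 0), (100, 0), (0, 50)]
photo_anchors = [(200, 10), (200, 110), (150, 10)]

transform = fit_affine(photo_anchors, master_anchors)
# A value seen at (180, 50) in the photo sits at about (40, 20) on the master.
print(apply_affine(transform, (180, 50)))
```

With three anchors the fit is exact, so a badly placed anchor skews the whole result. Extra anchors let a real system average out such errors, which is one reason more anchors work better.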

Step 6: Annotate the Detection Regions

Next, select the “Detection Region” tool (in purple) from the toolbar and mark each region that contains a value you want to extract.

Then choose the data type from the drop-down menu and give the field a name.

Step 7: Test the extractor

After annotating all the regions, save the form and test it in the “Test” tab. Upload a photo of a document and the extracted data will be shown below in JSON format.

Step 8: Obtain Form ID and Access Token

Copy the Form ID and Access Token from the “API” tab.

Step 9: Process the image with the API

The extractor can be integrated with other software through the RESTful API to extend your automation workflow. Send the image to the API endpoint “https://worker.formextractorai.com/extract” along with the Form ID and Access Token. The API response will then contain the extracted information.

Example with cURL

```
curl -X POST \
https://worker.formextractorai.com/extract \
-H 'Content-Type: image/jpeg' \
-H 'X-WORKER-FORM-ID: REPLACE-YOUR-FORM-ID-HERE' \
-H 'X-WORKER-TOKEN: REPLACE-YOUR-WORKER-TOKEN-HERE' \
--data-binary "@/path/to/query/image.jpg"
```

Example with Python

```
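import requests

url = "https://worker.formextractorai.com/extract"
headers = {
    "X-WORKER-TOKEN": "ACCESS_TOKEN",  # Access Token from the "API" tab
    "X-WORKER-FORM-ID": "FORM_ID",  # Form ID from the "API" tab
    "Content-Type": "image/jpeg",
}

# Send the raw image bytes as the request body; the file is closed afterwards.
with open("FILE_PATH_TO_IMAGE", "rb") as image_file:
    response = requests.post(url, headers=headers, data=image_file, timeout=30)

response.raise_for_status()  # Stop with an error on a 4xx or 5xx response
print(response.text)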
```

Empower Your Business With Automated Data Extraction

Collecting and having your data in order is the first step towards digital transformation, which is now a necessity rather than a competitive advantage. FormX has helped a variety of businesses automate their data extraction from different documents, including passports, ID cards, business certificates, receipts, and more, so that they can provide better customer experience, improve operational efficiency, and even reduce costs. 

Sign up for a free account or contact us to see how FormX can help you intelligently digitize your passports and ID cards. 
