How to Capture Data from Passport and ID Cards with OCR & AI?

A complete guide with all you need to know about capturing data from passports and ID cards with OCR & AI.

Last updated:
December 5, 2022

Passports are among the most common documents used for identity verification. A growing number of businesses are required to capture data from passports or other identity documents, such as ID cards and driver’s licenses, to verify clients’ identities. This is an essential step of the customer onboarding process because it protects organizations from crimes like fraud, corruption, and money laundering. However, the process can be time-consuming and labor-intensive when performed manually. After receiving digital copies of clients’ passports or ID cards, employees at businesses that must follow the Know Your Customer (KYC) standard have to enter all the important information by hand. Imagine how long this takes if a business has to process thousands of identity documents every day.

To capture data from passports more efficiently, businesses turned to Optical Character Recognition (OCR) engines, which convert images into machine-encoded text. Unfortunately, OCR has shortcomings that it cannot overcome on its own. In this blog post, we will cover the use cases of passport and ID OCR, how ID OCR works and why it is not enough, and how to automate data capture from passports and ID cards with FormX.

What Are Some of the Use Cases of Passport and ID OCR?

Passports and ID cards contain the personal information that businesses need to create new client profiles or verify clients’ identities. Some common fields needed include:

  • Full Name
  • Nationality
  • Date of Birth
  • Place of Birth
  • Gender
  • Date of Issue
  • Expiry Date
  • Location of Issue
  • Document / Passport Number
  • Social Security Number
  • Machine Readable Zone Code

Many might not be familiar with the Machine Readable Zone (MRZ) code. The MRZ code is the two lines of text, made up of numbers and letters, found at the bottom of a passport’s data page. It was designed to be read by machines and to make the identity verification process more efficient and accurate.

A sample European passport with the machine readable zone code highlighted
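
Part of what makes the MRZ suitable for machine reading is that several of its fields are followed by a check digit, so a reader can detect a misread character. The following is a minimal sketch, assuming the check digit scheme defined in ICAO Doc 9303: characters are weighted 7, 3, 1 repeatedly, digits count as themselves, A to Z count as 10 to 35, the filler character `<` counts as 0, and the check digit is the weighted sum modulo 10. The function names are illustrative, and the sample values are the specimen values commonly used in ICAO examples.

```python
def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value."""
    if ch in "0123456789":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10, B=11, ..., Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Compute the check digit for an MRZ field using weights 7, 3, 1."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def mrz_field_is_valid(field: str, check_digit: str) -> bool:
    """Return True when the printed check digit matches the computed one."""
    return check_digit.isdigit() and mrz_check_digit(field) == int(check_digit)


print(mrz_check_digit("L898902C3"))        # 6 (specimen document number)
print(mrz_field_is_valid("740812", "2"))   # True (specimen date of birth)
```

A mismatch between the computed and printed digit usually means the OCR engine misread a character in that field, so the document can be flagged for a retake or manual review.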

In highly regulated industries, collecting sufficient information about clients is a critical step that cannot be skipped, because it helps prevent criminal activity.

Financial Institutions

All financial institutions are required by regulatory organizations to conduct KYC procedures before onboarding their clients. To avoid money laundering and other financial crime, running an effective KYC program is paramount, and it starts with getting personal information from clients’ passports or ID cards. Automating the process with OCR or intelligent document processing solutions not only provides a better customer experience, as clients only have to take a photo of their passports or ID cards instead of typing in every field, but also decreases the operational costs associated with manual data entry.

Public Sector

The COVID-19 pandemic has had a huge impact on the global economy. In early 2020, lockdowns and other precautionary measures drove the global economy into crisis. The pandemic created a variety of challenges for everyone, so governments around the world provided relief funds or grants to help local businesses and citizens not only persevere but, more importantly, recover to pre-pandemic levels faster.

However, the entire process can take a long time if it is done manually, since applicants have to enter their personal information and send scanned copies of their passports or ID cards. Afterwards, the information still has to be manually verified and saved to a database. To speed things up, the public sector automates data capture from passports or ID cards with OCR and other AI technologies so that applications can be processed much faster.

Telecommunications

Criminals can have devices or numbers registered under others’ names and perform criminal activities without leaving any traces. As a result, it is the telecom service providers’ responsibility to verify clients’ identity. This is usually done by asking clients who are getting new phone numbers to submit scanned copies of their passports or ID cards.

Other industries, such as healthcare, travel, and insurance, also need to verify clients’ identities or gather more information from them for further analysis. To do this more efficiently, the traditional approach was to adopt OCR technology.

How Does ID OCR Work and Why Is It Not Enough?

Optical character recognition (OCR) replaces manual data entry by extracting printed or handwritten text from images or scanned documents. The text is then converted into a machine-readable, machine-encoded format that can be edited or processed. However, the problem with OCR data capture is that OCR engines do not understand context and therefore fail to organize the information in a systematic way. Let’s demonstrate this with an example.

A sample California driver's license.

If we use an online OCR tool to capture data from this sample driver’s license, the result will look like this:

CaliforniausA DRIVER LICENSE
DL 11234562 EXP 08/31/2014 LN SAMPLE FNALEXANDER JOSEPH 2570 24TH STREET ANYTOWN.CA 95818 11-1111 Doe 08/31 /1977 RSTR NONE
FEDERAL LIMITS APPLY
Ct/tic J'aire-e--
SEX M HAIR ELK EYES ERN HGT 5•08" WGT 150 lb ISS DD 013/00/0000NNNANANFONY 08/31/2009

For any application or system to make use of data, it usually has to be presented in a structured way, most commonly as JSON or CSV files. Therefore, developers have combined OCR with other technologies, such as machine learning, natural language processing, and more, to intelligently recognize, extract, and organize information from passports, ID cards, and other types of identity documents. 
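
To illustrate what “structured” means here, the sketch below pulls a few fields out of an excerpt of the raw OCR text above with hand-written regular expressions and prints them as JSON. The field names and patterns are illustrative assumptions written for this one sample. They are not FormX’s method or output schema.

```python
import json
import re

# Excerpt of the raw OCR output from the sample driver's license above.
RAW_OCR_TEXT = (
    "DL 11234562 EXP 08/31/2014 LN SAMPLE FNALEXANDER JOSEPH "
    "2570 24TH STREET ANYTOWN.CA 95818\n"
    "SEX M HAIR ELK EYES ERN"
)

# Hypothetical field names mapped to patterns tuned for this single layout.
FIELD_PATTERNS = {
    "license_number": r"\bDL\s+(\d+)",
    "expiry_date": r"\bEXP\s+(\d{2}/\d{2}/\d{4})",
    "last_name": r"\bLN\s+([A-Z]+)",
    # The OCR engine dropped the space after "FN", so allow zero spaces.
    "first_name": r"\bFN\s*([A-Z]+(?: [A-Z]+)?)",
    "sex": r"\bSEX\s+([MF])\b",
}


def parse_license_text(raw_text: str) -> dict:
    """Return a dict of field name to extracted value (None if not found)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, raw_text)
        fields[name] = match.group(1) if match else None
    return fields


fields = parse_license_text(RAW_OCR_TEXT)
print(json.dumps(fields, indent=2))
```

Rules like these only work while the layout and the OCR errors stay the same. A different state’s license or one more misread label would silently produce missing fields, which is the gap that machine learning based extraction aims to close.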

How to Automate Data Capture From Passports and ID Cards With FormX Within a Few Minutes?

Aside from allowing users to train their own extractors, FormX also provides several pre-trained extractors that you can use right away. All you have to do is select the corresponding extractor, test it with some sample images, and integrate it with your software or application via the API to establish an automated passport and ID card processing workflow.

Step 1: Sign up at FormX.ai

You can create an account at https://auth.formextractorai.com/signup

Step 2: Create an extractor

After creating an account, you can then create different types of extractors based on your needs. FormX provides a set of pre-built extractors and also allows you to train your own extractor by providing sample images and marking the areas where the desired information is located.

Step 3: Select “Government ID / Passport” as your extractor

We’ve pre-trained an extractor allowing our users to extract data from a variety of national IDs, driver’s licenses, and passports.

Step 4: Test your extractor

After selecting your extractor, upload a sample image to test it out. You’ll be able to see the result along with the JSON output.

Step 5: Obtain Form ID and Access Token

Copy the Form ID and Access Token from the “Extract” tab.

Step 6: Process the image with the API

The extractor can be integrated with other software through the RESTful API to extend your automation workflow. Send the image to the API endpoint *“https://worker.formextractorai.com/extract”* with the Form ID and Access Token in the request headers. The API response will contain the extracted information.

Example with cURL

```
curl -X POST \
https://worker.formextractorai.com/extract \
-H 'Content-Type: image/jpeg' \
-H 'X-WORKER-FORM-ID: REPLACE-YOUR-FORM-ID-HERE' \
-H 'X-WORKER-TOKEN: REPLACE-YOUR-WORKER-TOKEN-HERE' \
--data-binary "@/path/to/query/image.jpg"
```

Example with Python

```
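import requests

url = "https://worker.formextractorai.com/extract"
headers = {
    "X-WORKER-TOKEN": "ACCESS_TOKEN",  # replace with your Access Token
    "X-WORKER-FORM-ID": "FORM_ID",  # replace with your Form ID
    "Content-Type": "image/jpeg",
}

# Open the image in binary mode; the context manager closes the file afterwards.
with open("FILE_PATH_TO_IMAGE", "rb") as image:
    response = requests.post(url, headers=headers, data=image)

# Print the raw response body, which contains the extracted information.
print(response.text)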
```
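
For environments where the `requests` package is not installed, the same call can be sketched with the Python standard library alone. This is a minimal sketch, assuming the endpoint and headers from the examples above and assuming the response body is JSON. `build_extract_request` and `extract_document` are hypothetical helper names, not part of any FormX SDK.

```python
import json
import urllib.request

EXTRACT_URL = "https://worker.formextractorai.com/extract"


def build_extract_request(
    image_bytes: bytes, form_id: str, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request for one JPEG image."""
    return urllib.request.Request(
        EXTRACT_URL,
        data=image_bytes,
        method="POST",
        headers={
            "Content-Type": "image/jpeg",
            "X-WORKER-FORM-ID": form_id,
            "X-WORKER-TOKEN": access_token,
        },
    )


def extract_document(
    image_path: str, form_id: str, access_token: str, timeout: float = 30.0
):
    """Send one image and return the decoded response (assumed to be JSON)."""
    with open(image_path, "rb") as image_file:
        request = build_extract_request(image_file.read(), form_id, access_token)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `extract_document("FILE_PATH_TO_IMAGE", "FORM_ID", "ACCESS_TOKEN")` with your own values would send the image and return the parsed response. Keeping request construction separate from sending makes the headers easy to inspect without touching the network.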

Empower Your Business With Automated Data Extraction

Collecting and having your data in order is the first step towards digital transformation, which is now a necessity rather than a competitive advantage. FormX has helped a variety of businesses automate their data extraction from different documents, including passports, ID cards, business certificates, receipts, and more, so that they can provide better customer experience, improve operational efficiency, and even reduce costs. 

Sign up for a free account or contact us to see how FormX can help you intelligently digitize your passports and ID cards. 
