Custom OCR

Custom OCR lets you define an OCR template, extract structured text from cards and tickets, and return the results as key-value pairs.

Applicable scenarios

Custom OCR applies to the structured recognition of cards and tickets, such as logistics documents, invoices, business licenses, itineraries, and train tickets.

API reference

Add templates

Before recognizing text, create a template through the Add Template API. To improve recognition accuracy, the image used to create the template should closely resemble the images to be recognized and contain clear text. A single template can contain multiple recognition areas.

For each area to be recognized, specify the four corner points of its rectangular frame and a name for the area. You can use common image processing software such as GIMP to help obtain the coordinates.

The following describes how to create a template:

Go to Tools.
Click Select Local Image.
Move the pointer over the image, press the mouse button at the top-left corner of the content to be recognized, drag to the bottom-right corner, release the button, and then enter the corresponding area name in the pop-up dialog box.
If there are multiple areas to be recognized, repeat the previous step for each of them.
Click Copy Result to Clipboard. Use the copied content as the request body to create a custom template, and record the template ID returned after the template is created.
After creating the template, run a text recognition test with the original image and the template ID to confirm that the template accurately recognizes the required information.
(Optional) If the extracted information is incomplete, confirm that the coordinate points are marked correctly, and recreate the template with a suitably expanded recognition area.

Important

The rectangular box must completely cover the text to be recognized. For accurate recognition, leave enough margin on all sides of each recognition area for error tolerance, without overlapping other recognition areas.
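
As a rough illustration of this guidance, the following Python sketch builds one template entry from a top-left and a bottom-right corner, pads it by a margin, and checks that two padded areas do not overlap. The helper names `make_area` and `overlaps`, the `margin` parameter, and the coordinates are hypothetical; the point order (top-left, top-right, bottom-right, bottom-left) is assumed from the example request in this document.

```python
def make_area(name, left, top, right, bottom, margin=0):
    """Return one template entry: four corner points plus the area name.

    Points are ordered top-left, top-right, bottom-right, bottom-left
    (an assumption based on the example request in this document).
    """
    l, t = max(left - margin, 0), max(top - margin, 0)
    r, b = right + margin, bottom + margin
    return [[[l, t], [r, t], [r, b], [l, b]], name]


def overlaps(area_a, area_b):
    """Return True if the bounding boxes of two template entries intersect."""
    def bounds(area):
        xs = [point[0] for point in area[0]]
        ys = [point[1] for point in area[0]]
        return min(xs), min(ys), max(xs), max(ys)

    ax1, ay1, ax2, ay2 = bounds(area_a)
    bx1, by1, bx2, by2 = bounds(area_b)
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2


# Illustrative coordinates only; measure real ones on your own template image.
name_area = make_area("name", 421, 465, 909, 495, margin=4)
id_area = make_area("tax_id", 421, 505, 909, 533, margin=4)
print(overlaps(name_area, id_area))
```

If the check reports an overlap, reducing the margin or redrawing the areas before creating the template is one way to follow the guidance above.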

The following describes the API:

HTTP request method: POST
Request body parameters

Name	Type	Required	Description
url	String	Choose url or img.	Image URL address, which supports HTTP/HTTPS and S3 protocols. Supported image formats are jpg/jpeg/png/bmp, with the longest side not exceeding 4096px.
img	String	Choose url or img.	Base64-encoded image data.
type	String	Yes	Fixed value is `add`.
template	List	Yes	Each element corresponds to the coordinates of an area to be extracted and its name.
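
The image limits in the table can be checked on the client before a request is sent. The sketch below is a minimal pre-flight check under stated assumptions: `image_problems` is a hypothetical helper, the format is inferred from the file extension only (the service may inspect the actual content), and the width and height are supplied by the caller.

```python
import os

# Limits taken from the parameter table above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}
MAX_SIDE_PX = 4096


def image_problems(filename, width, height):
    """Return a list of reasons the image would violate the documented limits."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    longest_side = max(width, height)
    if longest_side > MAX_SIDE_PX:
        problems.append(f"longest side {longest_side}px exceeds {MAX_SIDE_PX}px")
    return problems


print(image_problems("invoice.PNG", 4096, 3000))
print(image_problems("scan.tiff", 5000, 100))
```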

Example JSON request

{
    "type": "add", 
    "url": "Image URL address", 
    "template": [
        [
            [[421, 465], [909, 471], [911, 503], [419, 495]], "名称"
        ], 
        [
            [[419, 495], [911, 503], [909, 533], [415, 527]], "识别号"
        ], 
        [
            [[345, 339], [595, 343], [583, 397], [341, 385]], "发票号"
        ]
    ]
}

{
    "type": "add",
    "img": "Base64-encoded image data",
    "template": [
        [
            [[421, 465], [909, 471], [911, 503], [419, 495]], "名称"
        ], 
        [
            [[419, 495], [911, 503], [909, 533], [415, 527]], "识别号"
        ], 
        [
            [[345, 339], [595, 343], [583, 397], [341, 385]], "发票号"
        ]
    ]
}
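
A request body like the ones above can also be assembled in code. The following Python sketch is one possible way to do it, not the solution's own client: `build_add_request` is a hypothetical helper, and "Choose url or img" is interpreted here as "provide exactly one of the two", which is an assumption.

```python
import base64
import json


def build_add_request(template, url=None, image_bytes=None):
    """Build the request body for creating a template.

    Exactly one of `url` or `image_bytes` must be given (assumed reading
    of the "Choose url or img" rule in the parameter table).
    """
    if (url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of url or image_bytes")
    body = {"type": "add", "template": template}
    if url is not None:
        body["url"] = url
    else:
        # The img parameter carries Base64-encoded image data.
        body["img"] = base64.b64encode(image_bytes).decode("ascii")
    return body


template = [
    [[[421, 465], [909, 471], [911, 503], [419, 495]], "name"],
]
# Placeholder bytes for illustration; read a real image file in practice.
body = build_add_request(template, image_bytes=b"abc")
print(json.dumps(body, ensure_ascii=False))
```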

Response parameters

Name	Type	Description
template_id	String	Template ID

Example JSON response

{
    "template_id": "Template ID"
}

Content recognition

After the template is created, you can use the corresponding template ID to perform text recognition on the image, and the returned value is the name and text content of the recognized area in the template.

HTTP request method: POST
Request body parameters

Name	Type	Required	Description
url	String	Choose url or img.	Image URL address, which supports HTTP/HTTPS and S3 protocols. Supported image formats are jpg/jpeg/png/bmp, with the longest side not exceeding 4096px.
img	String	Choose url or img.	Base64-encoded image data.
type	String	Fixed value is `query`.
template_id	String	Yes	Existing template ID.

Example JSON request

{
  "template_id": "Existing template ID",
  "url": "Image URL address"
}

Response parameters

Name	Type	Description
key	String	Field name.
value	String	Extracted value.
score	Float	Confidence score.

Example JSON response

[
    {
        "key": "名称", 
        "value": "亚马逊通技术服务(北京)有限公司", 
        "score": 97.98
    }, 
    {
        "key": "识别号", 
        "value": "91110116592334142D", 
        "score": 99.62
    }, 
    {
        "key": "发票号", 
        "value": "4403212222", 
        "score": 96.58
    }
]
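
Because each result carries a confidence score, a caller may want to separate trusted values from ones that need manual review. The sketch below uses the example response above; `to_fields` and the `min_score` threshold are hypothetical, and the 0 to 100 score scale is assumed from the example values.

```python
def to_fields(results, min_score=90.0):
    """Split recognition results into accepted values and low-confidence keys."""
    accepted, low_confidence = {}, []
    for item in results:
        if item["score"] >= min_score:
            accepted[item["key"]] = item["value"]
        else:
            low_confidence.append(item["key"])
    return accepted, low_confidence


# The example response from this section.
results = [
    {"key": "名称", "value": "亚马逊通技术服务(北京)有限公司", "score": 97.98},
    {"key": "识别号", "value": "91110116592334142D", "score": 99.62},
    {"key": "发票号", "value": "4403212222", "score": 96.58},
]

accepted, low_confidence = to_fields(results, min_score=97.0)
print(accepted)
print(low_confidence)
```

With a threshold of 97.0, the third field falls below the cutoff and is reported for review instead of being accepted.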

Remove templates

To delete a template, specify its template ID. Note that a deleted template cannot be recovered.

HTTP request method: POST
Request body parameters

Name	Type	Required	Description
template_id	List	Yes	Existing template ID.
type	String	Yes	Fixed value is `del`.

Example JSON request

{
    "type": "del", 
    "template_id": "Existing template ID"
}

Response parameters

Name	Type	Description
template_id	String	Removed template ID.

Example JSON response

{
    "template_id": "Removed template ID"
}

List all templates

You can list the IDs of all created templates.

HTTP request method: POST
Request body parameters

Name	Type	Description
type	String	Fixed value is `list`.

Example JSON request

{
    "type": "list"
}

Response parameters

Name	Type	Description
template_id_list	List	List of existing template IDs.

Example JSON response

{
    "template_id_list": ["List of existing templates"]
}
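
The list and remove operations can be combined to clean up unused templates. The following is a minimal sketch, assuming the list response has the shape shown in the example above and that each template is removed with its own `del` request; `build_delete_requests` and its `keep` parameter are hypothetical names. Because deleted templates cannot be recovered, the sketch only builds the request bodies and leaves sending them to the caller.

```python
def build_delete_requests(list_response, keep=()):
    """Build one `del` request body per listed template ID not in `keep`."""
    keep = set(keep)
    return [
        {"type": "del", "template_id": template_id}
        for template_id in list_response["template_id_list"]
        if template_id not in keep
    ]


# Hypothetical IDs for illustration.
list_response = {"template_id_list": ["tmpl-a", "tmpl-b", "tmpl-c"]}
for request_body in build_delete_requests(list_response, keep=["tmpl-b"]):
    print(request_body)
```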

API test

You can use the following tools to test the APIs: API Explorer, Postman, cURL, Python, and Java.

API Explorer

Prerequisites

When deploying the solution, you need to:

Set the API Explorer parameter to yes.
Set the API Gateway Authorization parameter to NONE.

Otherwise, you can only view the API definitions in the API explorer, but cannot call the APIs online.

Steps

Sign in to the AWS CloudFormation console.
On the Stacks page, select the solution’s root stack. Do not select the NESTED stack.
Choose the Outputs tab, and find the URL for APIExplorer.
Click the URL to access the API explorer. The APIs that you have selected during deployment will be displayed.
For the API you want to test, click the down arrow to display the request method.
Choose Try it out, and enter the request body data for the test.
Make sure the format is correct, and choose Execute.
Check the returned result in JSON format in the Responses body. If needed, copy or download the result.
Check the Response headers.
(Optional) Choose Clear next to the Execute button to clear the request body and responses.

Postman (AWS_IAM Authentication)

Sign in to the AWS CloudFormation console.
On the Stacks page, select the solution’s root stack.
Choose the Outputs tab, and find the URL with the prefix GeneralOCR.
Create a new tab in Postman. Paste the URL into the address bar, and select POST as the HTTP call method.
Open the Authorization configuration, select Amazon Web Service Signature from the drop-down list, and enter the AccessKey, SecretKey, and Amazon Web Service Region of the corresponding account (such as cn-north-1 or cn-northwest-1).
Open the Body configuration item and select the raw and JSON data types.
Enter the test data in the Body, and click the Send button to see the corresponding return results.

{
  "url": "Image URL address"
}

cURL

Windows

curl --location --request POST "https://[API_ID].execute-api.[AWS_REGION].amazonaws.com/[STAGE]/custom_ocr" ^
--header "Content-Type: application/json" ^
--data-raw "{\"url\": \"Image URL address\"}"

Linux/macOS

curl --location --request POST 'https://[API_ID].execute-api.[AWS_REGION].amazonaws.com/[STAGE]/custom_ocr' \
--header 'Content-Type: application/json' \
--data-raw '{
  "url":"Image URL address"
}'

Python (AWS_IAM Authentication)

import requests
import json
from aws_requests_auth.boto_utils import BotoAWSRequestsAuth

auth = BotoAWSRequestsAuth(aws_host='[API_ID].execute-api.[AWS_REGION].amazonaws.com',
                           aws_region='[AWS_REGION]',
                           aws_service='execute-api')

url = 'https://[API_ID].execute-api.[AWS_REGION].amazonaws.com/[STAGE]/custom_ocr'
payload = {
    'url': 'Image URL address'
}
response = requests.request("POST", url, data=json.dumps(payload), auth=auth)
print(json.loads(response.text))

Python (NONE Authentication)

import requests
import json

url = "https://[API_ID].execute-api.[AWS_REGION].amazonaws.com/[STAGE]/custom_ocr"

payload = json.dumps({
  "url": "Image URL address"
})
headers = {
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)
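
The calls above can be chained into a single smoke test that creates a template, recognizes one image with it, and removes the template again. This is a sketch under stated assumptions rather than part of the solution: `post_json` and `smoke_test` are hypothetical helpers, NONE authorization is assumed, the URL placeholders must be replaced, and sending `"type": "query"` follows the parameter table although the example recognition request omits it.

```python
import json
import urllib.request

# Placeholder endpoint; replace [API_ID], [AWS_REGION], and [STAGE].
API_URL = "https://[API_ID].execute-api.[AWS_REGION].amazonaws.com/[STAGE]/custom_ocr"


def post_json(body, url=API_URL):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def smoke_test(image_url, template, post=post_json):
    """Create a template, recognize one image with it, then delete the template."""
    created = post({"type": "add", "url": image_url, "template": template})
    template_id = created["template_id"]
    try:
        # Assumption: the type field is sent for recognition requests.
        return post({"type": "query", "template_id": template_id, "url": image_url})
    finally:
        # Clean up even if recognition fails; deletion cannot be undone.
        post({"type": "del", "template_id": template_id})
```

Passing a different `post` function makes it possible to swap in a signed request for AWS_IAM authorization or a stub for offline testing.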

Java

import okhttp3.*;

// Build the HTTP client and the JSON request body.
OkHttpClient client = new OkHttpClient().newBuilder()
  .build();
MediaType mediaType = MediaType.parse("application/json");
RequestBody body = RequestBody.create(mediaType, "{\n  \"url\":\"Image URL address\"\n}");
// Replace the [API_ID], [AWS_REGION], and [STAGE] placeholders with your own values.
Request request = new Request.Builder()
  .url("https://[API_ID].execute-api.[AWS_REGION].amazonaws.com/[STAGE]/custom_ocr")
  .method("POST", body)
  .addHeader("Content-Type", "application/json")
  .build();
// execute() throws IOException; handle or declare it in the calling method.
Response response = client.newCall(request).execute();

Cost estimation

You are responsible for the cost of using each Amazon Web Services service when running the solution. As of this revision, the main cost factors affecting the solution include:

Amazon API Gateway calls
Amazon API Gateway data output
Amazon CloudWatch Logs storage
Amazon Elastic Container Registry storage

If you choose an AWS Lambda based deployment, the factors also include:

AWS Lambda invocations
AWS Lambda running time

If you choose an Amazon SageMaker based deployment, the factors also include:

Amazon SageMaker endpoint node instance type
Amazon SageMaker endpoint node data input
Amazon SageMaker endpoint node data output

Cost estimation example 1

This example assumes the AWS China (Ningxia) Region operated by NWCD (cn-northwest-1), where each request processes a 1MB image in 1 second.

The cost of using this solution to process 1 million images is shown below:

Service	Dimensions	Cost
AWS Lambda	1 million invocations	¥1.36
AWS Lambda	8192MB memory, 1 second per run	¥907.8
Amazon API Gateway	1 million invocations	¥28.94
Amazon API Gateway	100KB data output each time, ¥0.933/GB	¥93.3
Amazon CloudWatch Logs	10KB each time, ¥6.228/GB	¥62.28
Amazon Elastic Container Registry	0.5GB storage, ¥0.69/GB each month	¥0.35
Total		¥1094.03

Cost estimation example 2

This example assumes the US East (Ohio) Region (us-east-2), where each request processes a 1MB image in 1 second.

The cost of using this solution to process 1 million images is shown below:

Service	Dimensions	Cost
AWS Lambda	1 million invocations	$0.20
AWS Lambda	8192MB memory, 1 second per run	$133.3
Amazon API Gateway	1 million invocations	$3.5
Amazon API Gateway	100KB data output each time, $0.09/GB	$9
Amazon CloudWatch Logs	10KB each time, $0.50/GB	$5
Amazon Elastic Container Registry	0.5GB storage, $0.1/GB each month	$0.05
Total		$151.05
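
The data-volume rows in the table above follow from simple arithmetic: the number of invocations times the data per call, converted to GB, times the per-GB price. The sketch below reproduces three of those rows using the rates listed in the table; `usage_cost` is a hypothetical helper, and decimal units (1 GB = 1,000,000 KB) are assumed because they match the table's figures.

```python
def usage_cost(invocations, kb_per_call, price_per_gb):
    """Cost of a per-call data volume, assuming 1 GB = 1,000,000 KB."""
    total_gb = invocations * kb_per_call / 1_000_000
    return round(total_gb * price_per_gb, 2)


# Rates and volumes taken from the us-east-2 table above.
api_gateway_output = usage_cost(1_000_000, 100, 0.09)  # 100KB output per call
cloudwatch_logs = usage_cost(1_000_000, 10, 0.50)      # 10KB of logs per call
ecr_storage = round(0.5 * 0.1, 2)                      # 0.5GB stored for one month

print(api_gateway_output, cloudwatch_logs, ecr_storage)
```

Substituting your own call volume and payload sizes gives a rough estimate for the same cost factors; the rates themselves may change over time.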

Uninstall the deployment

To uninstall the Custom OCR feature via Amazon CloudFormation, follow the steps described in Add or remove AI features, and make sure the CustomOCR parameter is set to no in the parameters section.

Note

Uninstalling the deployment takes approximately 20 minutes.