Text Similarity

Compares two Chinese words or sentences and returns a similarity score.

Applicable scenarios

Applicable to search engines, recommendation systems, machine translation, automatic response, named entity recognition, spelling error correction and other scenarios.

API reference

The API supports two input modes: single text or text pair.

Single text mode

Given a single text as input, the API returns the feature vector of the text. To use the vectors, you need to maintain your own vector retrieval system. This mode suits search or recall scenarios.

  • HTTP request method: POST

  • Request body parameters

Name | Type | Required | Description
text | String | Yes | Text data
  • Example JSON request
{
  "text": "Test"
}
  • Response parameters
Name | Type | Description
result | List | List of 768 floating-point values that make up the 768-dimensional text vector
  • Example JSON response
{
    "result": [
        0.025645000860095024, 
        0.001914000022225082, 
        0.007929000072181225, 
        ...
    ]
}
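Because single text mode only returns vectors, ranking stored texts against a query is left to the caller. The following is a minimal in-memory sketch of that retrieval step, not part of the solution itself: the names `cosine_similarity`, `top_k`, and the `index` dictionary are hypothetical, and toy 3-dimensional vectors stand in for the 768-dimensional `result` lists. A production system would more likely use a dedicated vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=3):
    """Return the k (doc_id, score) pairs most similar to query_vector.

    index maps a document id to the vector previously returned for it.
    """
    scored = [(doc_id, cosine_similarity(query_vector, vec))
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors (hypothetical data) standing in for stored "result" lists.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.9, 0.1, 0.0],
}
ranked = top_k([1.0, 0.0, 0.0], index, k=2)
print(ranked)
```

Sorting every stored vector is linear in the size of the index, which is fine for small collections but is the part an approximate nearest-neighbor index would replace.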

Text pair mode

Given a text pair as input, the API returns the cosine similarity of the two texts. This mode suits direct similarity comparison.

  • HTTP request method: POST

  • Request body parameters

Name | Type | Required | Description
text_1 | String | Yes | Text data.
text_2 | String | Yes | Text data.
  • Example JSON request
{
  "text_1": "Test1",
  "text_2": "Test2"
}
  • Response parameters
Name | Type | Description
similarity | Float | Cosine similarity of the text pair, a Float value between 0 and 1. The closer the value is to 1, the more similar the two texts are.
  • Example JSON response
{
    "similarity": 0.95421
}
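The two modes differ only in the keys of the request body. As a small sketch of that rule, the hypothetical helper `build_request_body` below (not part of the solution) produces the JSON body for either mode from one or two texts, using the field names `text`, `text_1`, and `text_2` from the tables above.

```python
import json


def build_request_body(*texts):
    """Return the JSON body for single text mode (one text)
    or text pair mode (two texts)."""
    if not all(isinstance(t, str) and t for t in texts):
        raise ValueError("every text must be a non-empty string")
    if len(texts) == 1:
        payload = {"text": texts[0]}
    elif len(texts) == 2:
        payload = {"text_1": texts[0], "text_2": texts[1]}
    else:
        raise ValueError("expected one or two texts")
    # json.dumps escapes non-ASCII characters by default, so Chinese
    # input is sent as plain ASCII \uXXXX sequences.
    return json.dumps(payload)


single_body = build_request_body("Test")
pair_body = build_request_body("Test1", "Test2")
print(single_body)
print(pair_body)
```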

API test

You can test the API with any of the following tools: API Explorer, Postman, cURL, Python, or Java.

API Explorer

Prerequisites

When deploying the solution, you need to:

  • set the parameter API Explorer to yes.
  • set the parameter API Gateway Authorization to NONE.

Otherwise, you can only view the API definitions in the API explorer; you cannot call the API from it.

Steps

  1. Sign in to the AWS CloudFormation console.
  2. On the Stacks page, select the solution’s root stack. Do not select the NESTED stack.

  3. Choose the Outputs tab, and find the URL for APIExplorer.

  4. Click the URL to access the API explorer. The APIs that you have selected during deployment will be displayed.

  5. For the API you want to test, click the down arrow to display the request method.

  6. Choose the Try it out button, and enter the request Body data for the API you are testing.
  7. Make sure the format is correct, and choose Execute.
  8. Check the returned result in JSON format in the Responses body. If needed, copy or download the result.
  9. Check the Response headers.
  10. (Optional) Choose Clear next to the Execute button to clear the request body and responses.

Postman (AWS_IAM Authentication)

  1. Sign in to the AWS CloudFormation console.
  2. On the Stacks page, select the solution’s root stack.
  3. Choose the Outputs tab, and find the URL for the Text Similarity API.
  4. Create a new tab in Postman. Paste the URL into the address bar, and select POST as the HTTP call method.

  5. Open the Authorization configuration, select AWS Signature from the drop-down list, and enter the AccessKey, SecretKey, and AWS Region of the corresponding account (such as cn-north-1 or cn-northwest-1).

  6. Open the Body configuration item and select the raw and JSON data types.

  7. Enter the test data in the Body, and click the Send button to see the returned result.
{
  "text_1": "Test1",
  "text_2": "Test2"
}

cURL

  • Windows
curl --location --request POST "https://[API_ID].execute-api.[AWS_REGION].amazonaws.com/[STAGE]/text_similarity" ^
--header "Content-Type: application/json" ^
--data-raw "{\"text_1\": \"Test1\", \"text_2\": \"Test2\"}"
  • Linux/MacOS
curl --location --request POST 'https://[API_ID].execute-api.[AWS_REGION].amazonaws.com/[STAGE]/text_similarity' \
--header 'Content-Type: application/json' \
--data-raw '{
  "text_1": "Test1",
  "text_2": "Test2"
}'
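The request URL in the commands above is assembled from three placeholders. The sketch below shows that assembly with a hypothetical helper, `build_endpoint`; the sample values `abc123` and `prod` are made up. It also assumes that API Gateway endpoints in the AWS China Regions use the `amazonaws.com.cn` domain, so treat the URL shown on the CloudFormation Outputs tab as authoritative if it differs.

```python
def build_endpoint(api_id, region, stage, path="text_similarity"):
    """Assemble the invoke URL from the [API_ID], [AWS_REGION] and [STAGE]
    placeholders used in the cURL commands."""
    # Assumption: China Regions (cn-north-1, cn-northwest-1) use the
    # amazonaws.com.cn domain; all other Regions use amazonaws.com.
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"https://{api_id}.execute-api.{region}.{suffix}/{stage}/{path}"


# Hypothetical API id and stage name, for illustration only.
us_url = build_endpoint("abc123", "us-east-2", "prod")
cn_url = build_endpoint("abc123", "cn-northwest-1", "prod")
print(us_url)
print(cn_url)
```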

Python (AWS_IAM Authentication)

import requests
import json
from aws_requests_auth.boto_utils import BotoAWSRequestsAuth

auth = BotoAWSRequestsAuth(aws_host='[API_ID].execute-api.[AWS_REGION].amazonaws.com',
                           aws_region='[AWS_REGION]',
                           aws_service='execute-api')

url = 'https://[API_ID].execute-api.[AWS_REGION].amazonaws.com/[STAGE]/text_similarity'
# Text pair mode; use {'text': '...'} instead for single text mode.
payload = {
    'text_1': 'Test1',
    'text_2': 'Test2'
}
response = requests.request("POST", url, data=json.dumps(payload), auth=auth)
print(json.loads(response.text))

Python (NONE Authentication)

import requests
import json

url = "https://[API_ID].execute-api.[AWS_REGION].amazonaws.com/[STAGE]/text_similarity"

# Text pair mode; use {"text": "..."} instead for single text mode.
payload = json.dumps({
  "text_1": "Test1",
  "text_2": "Test2"
})
headers = {
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)
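The Python samples above print the raw response body. Since the API returns either a `result` vector or a `similarity` score depending on the input mode, a caller usually needs to tell the two apart. The following is one possible sketch; the helper name `parse_response` and its return convention are assumptions, not part of the solution.

```python
import json


def parse_response(body_text):
    """Classify a response body as ("similarity", float) or ("vector", list)."""
    data = json.loads(body_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if "similarity" in data:
        # Text pair mode: a single cosine similarity score.
        return "similarity", float(data["similarity"])
    if "result" in data:
        # Single text mode: the feature vector of the text.
        return "vector", [float(v) for v in data["result"]]
    raise ValueError(f"unexpected response keys: {sorted(data)}")


kind, value = parse_response('{"similarity": 0.95421}')
print(kind, value)
```

With the samples above, the call would be `parse_response(response.text)`, ideally after checking that `response.status_code` is 200.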

Java

OkHttpClient client = new OkHttpClient().newBuilder()
  .build();
MediaType mediaType = MediaType.parse("application/json");
RequestBody body = RequestBody.create(mediaType, "{\n  \"text_1\":\"Test1\",\n  \"text_2\":\"Test2\"\n}");
Request request = new Request.Builder()
  .url("https://[API_ID].execute-api.[AWS_REGION].amazonaws.com/[STAGE]/text_similarity")
  .method("POST", body)
  .addHeader("Content-Type", "application/json")
  .build();
Response response = client.newCall(request).execute();

Cost Estimation

You are responsible for the cost of each Amazon Web Services service used when running the solution. As of this revision, the main cost factors affecting the solution include:

  • AWS Lambda invocations
  • AWS Lambda running time
  • Amazon API Gateway calls
  • Amazon API Gateway data output
  • Amazon CloudWatch Logs storage
  • Amazon Elastic Container Registry storage

Cost estimation example 1

In AWS China (Ningxia) Region operated by NWCD (cn-northwest-1), processing 1 million requests, with each request taking 1 second.

The cost of using this solution to process the text is shown below:

Service | Dimensions | Cost
AWS Lambda | 1 million invocations | ¥1.36
AWS Lambda | 8192MB memory, 1 second run each time | ¥907.8
Amazon API Gateway | 1 million invocations | ¥28.94
Amazon API Gateway | 100KB data output each time, ¥0.933/GB | ¥93.3
Amazon CloudWatch Logs | 10KB each time, ¥6.228/GB | ¥62.28
Amazon Elastic Container Registry | 0.5GB storage, ¥0.69/GB each month | ¥0.35
Total ¥1094.03

Cost estimation example 2

In US East (Ohio) Region (us-east-2), processing 1 million requests, with each request taking 1 second.

The cost of using this solution to process this text is shown below:

Service | Dimensions | Cost
AWS Lambda | 1 million invocations | $0.20
AWS Lambda | 8192MB memory, 1 second run each time | $133.3
Amazon API Gateway | 1 million invocations | $3.5
Amazon API Gateway | 100KB data output each time, $0.09/GB | $9
Amazon CloudWatch Logs | 10KB each time, $0.50/GB | $5
Amazon Elastic Container Registry | 0.5GB storage, $0.1/GB each month | $0.05
Total $151.05
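The per-GB rows in both tables follow from simple arithmetic: per-request size times 1 million requests, converted to GB, times the per-GB price. The sketch below reworks those rows with a hypothetical helper, `volume_cost`. It assumes 1 GB is counted as 1,000,000 KB, which is the convention that matches the listed figures, and takes the prices from the tables rather than from current AWS pricing.

```python
def volume_cost(kb_per_request, requests, price_per_gb):
    """Cost of a per-GB charge, counting 1 GB as 1,000,000 KB
    (the convention that matches the tables above)."""
    gigabytes = kb_per_request * requests / 1_000_000
    return round(gigabytes * price_per_gb, 2)


REQUESTS = 1_000_000

# Example 1, cn-northwest-1 (prices in CNY, taken from the table).
cn_data_out = volume_cost(100, REQUESTS, 0.933)  # API Gateway data output
cn_logs = volume_cost(10, REQUESTS, 6.228)       # CloudWatch Logs

# Example 2, us-east-2 (prices in USD, taken from the table).
us_data_out = volume_cost(100, REQUESTS, 0.09)
us_logs = volume_cost(10, REQUESTS, 0.50)

print(cn_data_out, cn_logs, us_data_out, us_logs)
```

Changing `kb_per_request` or `REQUESTS` gives a quick estimate for a different payload size or traffic volume.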

Uninstall the deployment

You can uninstall the Text Similarity feature through AWS CloudFormation as described in Add or remove AI features. Make sure the TextSimilarity parameter is set to no in the parameters section.

Note

Uninstalling the deployment takes approximately 10 minutes.