Named Entity Recognition (NER)
Rhubarb comes with 50 built-in entities, including common ones such as LOCATION and EVENT. Entities
are available via the Entities class. You can choose which entities to detect and pass them to the
run_entity() method.
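    import boto3
    from rhubarb import DocAnalysis, Entities

    # Assumes default AWS credentials and region are configured.
    session = boto3.Session()
    # Analyze only pages 1 and 3 of the document.
    da = DocAnalysis(file_path="./test_docs/employee_enrollment.pdf",
                     boto3_session=session,
                     pages=[1, 3])
    # Detect only the PERSON and ADDRESS entities.
    resp = da.run_entity(message="Extract all the specified entities from this document.",
                         entities=[Entities.PERSON, Entities.ADDRESS])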
Sample response
    {
      "output": [
        {
          "page": 1,
          "entities": [
            {"PERSON": "Martha C Rivera"},
            {"ADDRESS": "8 Any Plaza, 21 Street, Any City, CA 90210"}
          ]
        },
        {
          "page": 3,
          "entities": [
            {"PERSON": "Mateo Rivera"},
            {"PERSON": "Pat Rivera"},
            {"ADDRESS": "8 Any Plaza, 21 Street, Any City, CA 90210"}
          ]
        }
      ],
      "token_usage": {
        "input_tokens": 3531,
        "output_tokens": 168
      }
    }
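The response nests entities per page, so a common follow-up step is to regroup them by entity type. The sketch below assumes the response is a plain Python dict shaped like the sample above; the helper name `group_entities` is hypothetical and not part of Rhubarb.

```python
from collections import defaultdict

# Sample response copied from the section above.
response = {
    "output": [
        {
            "page": 1,
            "entities": [
                {"PERSON": "Martha C Rivera"},
                {"ADDRESS": "8 Any Plaza, 21 Street, Any City, CA 90210"},
            ],
        },
        {
            "page": 3,
            "entities": [
                {"PERSON": "Mateo Rivera"},
                {"PERSON": "Pat Rivera"},
                {"ADDRESS": "8 Any Plaza, 21 Street, Any City, CA 90210"},
            ],
        },
    ],
    "token_usage": {"input_tokens": 3531, "output_tokens": 168},
}


def group_entities(resp):
    """Map each entity type to a list of (page, value) pairs.

    Hypothetical helper: assumes `resp` follows the sample response layout.
    """
    grouped = defaultdict(list)
    for page_result in resp.get("output", []):
        page = page_result.get("page")
        for entity in page_result.get("entities", []):
            # Each entity is a single-key dict such as {"PERSON": "..."}.
            for entity_type, value in entity.items():
                grouped[entity_type].append((page, value))
    return dict(grouped)


grouped = group_entities(response)
print(grouped["PERSON"])
# [(1, 'Martha C Rivera'), (3, 'Mateo Rivera'), (3, 'Pat Rivera')]
```

Keeping the page number next to each value makes it possible to trace an entity back to where it was found in the document.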
Supported Entities
The table below lists the supported entities.
| Entity | Description |
|---|---|
|  | A physical address, such as ‘100 Main Street, Anytown, USA’ or ‘Suite #12, Building 123’. |
|  | An individual’s age, including the quantity and unit of time. |
|  | A unique identifier that’s associated with a secret access key; used to sign programmatic AWS requests cryptographically. |
|  | A unique identifier that’s associated with an access key. |
|  | A three-digit card verification code (CVV) present on VISA, MasterCard, and Discover credit and debit cards. |
|  | The expiration date for a credit or debit card. |
|  | The number for a credit or debit card. |
|  | A date can include a year, month, day, day of week, or time of day. |
|  | The number assigned to a driver’s license. |
|  | An email address. |
|  | An International Bank Account Number has specific formats in each country. |
|  | An IPv4 address. |
|  | A license plate for a vehicle. |
|  | A media access control (MAC) address. |
|  | An individual’s name. |
|  | An alphanumeric string used as a password. |
|  | A phone number. |
|  | A four-digit personal identification number (PIN). |
|  | A SWIFT code. |
|  | A web address. |
|  | A user name that identifies an account. |
|  | A Vehicle Identification Number (VIN). |
|  | A Canadian Health Service Number. |
|  | A Canadian Social Insurance Number (SIN). |
|  | An Indian Aadhaar number. |
|  | An Indian National Rural Employment Guarantee Act (NREGA) number. |
|  | An Indian Permanent Account Number. |
|  | An Indian Voter ID number. |
|  | A UK National Health Service Number. |
|  | A UK National Insurance Number (NINO). |
|  | A UK Unique Taxpayer Reference (UTR) is a 10-digit number that identifies a taxpayer or a business. |
|  | A US bank account number, typically 10 to 12 digits long. |
|  | A US bank routing number, typically nine digits long. |
|  | A passport number, ranging from six to nine alphanumeric characters. |
|  | A US Individual Taxpayer Identification Number (ITIN) is a nine-digit number. |
|  | A US Social Security Number (SSN) is a nine-digit number. |
|  | A Spanish NIF number (Personal tax ID). |
|  | An Italian VAT code number. |
|  | A Polish PESEL number. |
|  | A National Registration Identification Card. |
|  | The Australian Business Number (ABN) is a unique 11-digit identifier issued to all entities registered in the Australian Business Register (ABR). |
|  | An Australian Company Number is a unique nine-digit number issued by the Australian Securities and Investments Commission to every company registered under the Commonwealth Corporations Act 2001 as an identifier. |
|  | The tax file number (TFN) is a unique identifier issued by the Australian Taxation Office to each taxpaying entity. |
|  | A Medicare number is a unique identifier issued by the Australian Government. |
|  | A branded product. |
|  | An event, such as a festival, concert, election, etc. |
|  | Large organizations, such as a government, company, religion, sports team, etc. |
|  | Individuals, groups of people, nicknames, fictional characters. |
|  | A quantified amount, such as currency, percentages, numbers, bytes, etc. |
|  | An official name given to any creation or creative work, such as movies, books, songs, etc. |
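
Several descriptions in the table state a fixed digit count, which allows a cheap sanity check on extracted values. The following is a minimal sketch of such a check, not a Rhubarb feature: the dictionary keys are hypothetical labels chosen for this example (not Rhubarb entity names), the sample values are made up, and the check covers length only, not checksums or issuing rules.

```python
import re

# Digit counts taken from the descriptions in the table above.
# The keys are hypothetical labels for this sketch, not Rhubarb entity names.
EXPECTED_DIGITS = {
    "cvv": 3,              # three-digit card verification code
    "pin": 4,              # four-digit personal identification number
    "uk_utr": 10,          # UK Unique Taxpayer Reference
    "us_bank_routing": 9,  # "typically nine digits long"
    "us_itin": 9,          # US Individual Taxpayer Identification Number
    "us_ssn": 9,           # US Social Security Number
    "au_abn": 11,          # Australian Business Number
    "au_acn": 9,           # Australian Company Number
}


def has_expected_digits(label, value):
    """Return True if `value` contains exactly the expected number of digits."""
    digits = re.sub(r"\D", "", value)  # drop spaces, dashes, and other separators
    return len(digits) == EXPECTED_DIGITS[label]


print(has_expected_digits("us_ssn", "123-45-6789"))     # True
print(has_expected_digits("au_abn", "12 345 678 901"))  # True
print(has_expected_digits("pin", "12345"))              # False
```

A value that fails the check is not necessarily wrong (for example, the routing number length is only described as typical), so a result of False is best treated as a flag for manual review.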