PII RecognitionΒΆ
Just like Named Entity Recognition (NER), you can also perform PII (Personally Identifiable Information) detection. Infact, the default NER entities,
is just a super-set of PII entities. For example, Entities.NAME
and Entities.SSN
are PII entities.
from rhubarb import DocAnalysis, Entities
da = DocAnalysis(file_path="./test_docs/employee_enrollment.pdf",
boto3_session=session,
pages=[1,3])
resp = da.run_entity(message="Extract all the specified entities from this document.",
entities=[Entities.ADDRESS, Entities.SSN])
Sample output
{
"output": [
{
"page": 1,
"entities": [
{"SSN": "376 12 1987"},
{"ADDRESS": "8 Any Plaza, 21 Street"}
]
},
{
"page": 3,
"entities": [
{"SSN": "791 36 9771"},
{"ADDRESS": "8 Any Plaza, 21 Street"},
{"SSN": "824 26 2211"},
{"ADDRESS": "8 Any Plaza, 21 Street"}
]
}
],
"token_usage": {
"input_tokens": 3534,
"output_tokens": 183
}
}