@cdklabs/generative-ai-cdk-constructs / opensearchserverless / TokenFilterType
TokenFilterType defines the available token filters for text analysis. Token filters process tokens after they have been created by the tokenizer. They can modify, add, or remove tokens based on specific rules.
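As a rough illustration of where token filters sit in an analysis chain, the toy sketch below runs a whitespace tokenizer and then two filters, one that modifies tokens and one that removes them. The names `TokenFilter`, `lowercaseFilter`, `stopFilter`, and `analyze` are hypothetical and exist only in this example; it does not reproduce how OpenSearch implements its real filters.

```typescript
// Toy model of an analysis chain: tokenizer first, then token filters in order.
// All names here are hypothetical and local to this sketch.
type TokenFilter = (tokens: string[]) => string[];

// Modifies tokens: lowercases each one.
const lowercaseFilter: TokenFilter = (tokens) => tokens.map((t) => t.toLowerCase());

// Removes tokens: drops any token found in the given stop-word set.
const stopFilter = (stopWords: Set<string>): TokenFilter =>
  (tokens) => tokens.filter((t) => !stopWords.has(t));

// Splits on whitespace, then feeds each filter the output of the previous one.
function analyze(text: string, filters: TokenFilter[]): string[] {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  return filters.reduce((acc, f) => f(acc), tokens);
}

const stopWords = new Set(["the"]);
const result = analyze("The Quick brown Fox", [lowercaseFilter, stopFilter(stopWords)]);
const reversed = analyze("The Quick brown Fox", [stopFilter(stopWords), lowercaseFilter]);
```

Filter order matters in this sketch: `result` drops `The` because lowercasing runs before the stop-word check, while `reversed` keeps it because the check sees the capitalized form first.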
CJK_WIDTH:
"cjk_width"
Normalizes CJK width differences by folding fullwidth ASCII variants into the equivalent basic Latin characters and halfwidth Katakana variants into the equivalent fullwidth Kana
ICU_FOLDING:
"icu_folding"
Applies Unicode folding (such as case folding and diacritic removal) so that variant forms of the same character match
JA_STOP:
"ja_stop"
Removes Japanese stop words from text
KUROMOJI_BASEFORM:
"kuromoji_baseform"
Converts inflected Japanese words to their base form
KUROMOJI_PART_OF_SPEECH:
"kuromoji_part_of_speech"
Removes tokens that match a configured set of part-of-speech tags in Japanese text analysis
KUROMOJI_STEMMER:
"kuromoji_stemmer"
Normalizes common katakana spelling variations by removing a trailing prolonged sound mark (ー, U+30FC)
LOWERCASE:
"lowercase"
Converts all characters to lowercase
NORI_NUMBER:
"nori_number"
Normalizes Korean numbers to Arabic decimal numbers in halfwidth characters
NORI_PART_OF_SPEECH:
"nori_part_of_speech"
Removes tokens that match a configured set of part-of-speech tags in Korean text analysis
NORI_READINGFORM:
"nori_readingform"
Converts tokens written in Hanja to their Hangul reading form
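To show how these values might be combined, the sketch below mirrors a few of the documented string values in a local constant and picks a filter list per language. It is deliberately self-contained: `LocalTokenFilterType`, `Language`, and `filtersFor` are hypothetical stand-ins rather than the library's API, and the specific combinations are assumptions chosen for illustration, not recommendations from the package. Real code would import `TokenFilterType` from the package instead.

```typescript
// Local mirror of a few documented values; a stand-in for the real TokenFilterType.
const LocalTokenFilterType = {
  CJK_WIDTH: "cjk_width",
  JA_STOP: "ja_stop",
  KUROMOJI_BASEFORM: "kuromoji_baseform",
  LOWERCASE: "lowercase",
  NORI_NUMBER: "nori_number",
  NORI_READINGFORM: "nori_readingform",
} as const;

// Union of the string values above, e.g. "cjk_width" | "ja_stop" | ...
type LocalTokenFilter = (typeof LocalTokenFilterType)[keyof typeof LocalTokenFilterType];

type Language = "ja" | "ko" | "other";

// Hypothetical helper: returns an ordered filter list for a language.
// The combinations are illustrative assumptions only.
function filtersFor(language: Language): LocalTokenFilter[] {
  switch (language) {
    case "ja":
      return [
        LocalTokenFilterType.KUROMOJI_BASEFORM,
        LocalTokenFilterType.JA_STOP,
        LocalTokenFilterType.CJK_WIDTH,
        LocalTokenFilterType.LOWERCASE,
      ];
    case "ko":
      return [
        LocalTokenFilterType.NORI_READINGFORM,
        LocalTokenFilterType.NORI_NUMBER,
        LocalTokenFilterType.LOWERCASE,
      ];
    default:
      return [LocalTokenFilterType.LOWERCASE];
  }
}

const japaneseFilters = filtersFor("ja");
const koreanFilters = filtersFor("ko");
```

Keeping the values in a typed union means a misspelled filter name fails at compile time rather than at index creation.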