generative-ai-cdk-constructs

@cdklabs/generative-ai-cdk-constructs / opensearchserverless / TokenFilterType

Enumeration: TokenFilterType

TokenFilterType defines the available token filters for text analysis. Token filters process tokens after they have been created by the tokenizer. They can modify, add, or remove tokens based on specific rules.
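The modify/add/remove behaviour described above can be sketched with a toy filter chain. This is only an illustration of the concept, not how OpenSearch or this library implements filters; the `TokenFilter` type, the stop-word list, the synonym table, and `applyFilters` are all hypothetical names introduced for the example.

```typescript
// A token filter takes the tokenizer's output and returns a new token list.
type TokenFilter = (tokens: string[]) => string[];

// Modifies tokens: lowercases each one.
const lowercase: TokenFilter = (tokens) => tokens.map((t) => t.toLowerCase());

// Removes tokens: drops entries found in a hypothetical stop-word list.
const stopWords: string[] = ["the", "a", "of"];
const removeStopWords: TokenFilter = (tokens) =>
  tokens.filter((t) => stopWords.indexOf(t) === -1);

// Adds tokens: emits a hypothetical synonym right after the original token.
const synonyms: Record<string, string> = { quick: "fast" };
const addSynonyms: TokenFilter = (tokens) =>
  tokens.reduce<string[]>((out, t) => {
    out.push(t);
    if (Object.prototype.hasOwnProperty.call(synonyms, t)) {
      out.push(synonyms[t]);
    }
    return out;
  }, []);

// Filters run in the order listed; each consumes the previous one's output.
function applyFilters(tokens: string[], filters: TokenFilter[]): string[] {
  return filters.reduce((acc, f) => f(acc), tokens);
}

// Pretend a tokenizer already split "The Quick Fox" into these tokens.
const tokenizerOutput = ["The", "Quick", "Fox"];
const analyzed = applyFilters(tokenizerOutput, [lowercase, removeStopWords, addSynonyms]);
// analyzed is ["quick", "fast", "fox"]
```

Order matters in this sketch: running `removeStopWords` before `lowercase` would leave "The" in place, because the stop-word list holds lowercase entries only.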

Enumeration Members

CJK_WIDTH

CJK_WIDTH: "cjk_width"

Normalizes CJK width differences by folding fullwidth ASCII variants into their basic Latin equivalents and halfwidth Katakana variants into their fullwidth Kana equivalents


ICU_FOLDING

ICU_FOLDING: "icu_folding"

Applies Unicode folding rules, such as case folding and accent removal, so that equivalent characters match during search


JA_STOP

JA_STOP: "ja_stop"

Removes Japanese stop words from text


KUROMOJI_BASEFORM

KUROMOJI_BASEFORM: "kuromoji_baseform"

Converts inflected Japanese verbs and adjectives to their dictionary (base) form


KUROMOJI_PART_OF_SPEECH

KUROMOJI_PART_OF_SPEECH: "kuromoji_part_of_speech"

Removes tokens that match a configured set of part-of-speech tags in Japanese text analysis


KUROMOJI_STEMMER

KUROMOJI_STEMMER: "kuromoji_stemmer"

Normalizes common katakana spelling variations by removing the trailing prolonged sound mark (ー) from sufficiently long words


LOWERCASE

LOWERCASE: "lowercase"

Converts all characters to lowercase


NORI_NUMBER

NORI_NUMBER: "nori_number"

Normalizes Korean numbers to regular Arabic decimal numbers in half-width characters


NORI_PART_OF_SPEECH

NORI_PART_OF_SPEECH: "nori_part_of_speech"

Removes tokens that match a configured set of part-of-speech tags in Korean text analysis


NORI_READINGFORM

NORI_READINGFORM: "nori_readingform"

Converts Hanja (Chinese characters used in Korean text) to their Hangul reading form
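As a rough illustration of how these string values are used, the sketch below assembles an OpenSearch custom analyzer definition from a few enum members. To stay runnable without installing the package, it declares a local mirror of three members; in a real project the enum would come from the package's `opensearchserverless` module. The `buildAnalyzer` helper and the `ja_analyzer` name are hypothetical, and the construct that consumes the enum in the library may expect a different shape.

```typescript
// Local mirror of three members documented above (assumption: used here only
// so the sketch is self-contained; import the real enum in actual code).
enum TokenFilterType {
  KUROMOJI_BASEFORM = "kuromoji_baseform",
  JA_STOP = "ja_stop",
  LOWERCASE = "lowercase",
}

// Shape of a custom analyzer entry in OpenSearch index settings.
interface CustomAnalyzer {
  type: "custom";
  tokenizer: string;
  filter: string[];
}

// Hypothetical helper: pairs a tokenizer name with an ordered filter list.
function buildAnalyzer(tokenizer: string, filters: TokenFilterType[]): CustomAnalyzer {
  // String enum members are plain strings at runtime, so a copy is enough.
  return { type: "custom", tokenizer: tokenizer, filter: filters.slice() };
}

// Japanese analyzer sketch: base forms first, then stop words, then lowercasing.
const jaAnalyzer = buildAnalyzer("kuromoji_tokenizer", [
  TokenFilterType.KUROMOJI_BASEFORM,
  TokenFilterType.JA_STOP,
  TokenFilterType.LOWERCASE,
]);

// Index settings fragment; "ja_analyzer" is an arbitrary, hypothetical name.
const indexSettings = {
  analysis: {
    analyzer: {
      ja_analyzer: jaAnalyzer,
    },
  },
};
```

The filters are applied in array order, so the list passed to `buildAnalyzer` determines the sequence in which tokens are transformed.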