@cdklabs/generative-ai-cdk-constructs


Interface: WebCrawlerDataSourceAssociationProps

Properties for adding a new web crawler data source to an existing knowledge base (KB).

Extends

DataSourceAssociationProps

Extended by

Properties

chunkingStrategy?

readonly optional chunkingStrategy: ChunkingStrategy

The chunking strategy to use for splitting your documents or content. The chunks are then converted to embeddings and written to the vector index, allowing for similarity search and retrieval of the content.

Default

ChunkingStrategy.DEFAULT

Inherited from

DataSourceAssociationProps.chunkingStrategy


crawlingRate?

readonly optional crawlingRate: number

The maximum rate at which pages are crawled, up to 300 per minute per host. Higher values shorten sync time but increase the load on the host.

Default

300
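
The trade-off between sync time and host load can be made concrete with rough arithmetic. The following is a minimal sketch, not part of the library: `checkCrawlingRate` and `minCrawlMinutes` are hypothetical helpers, and the estimate assumes the crawl is limited only by the configured rate on a single host.

```typescript
// Hypothetical helpers, not part of the library.
// Documented ceiling on crawlingRate: pages per minute per host.
const MAX_CRAWLING_RATE = 300;

// Rejects values outside the documented range before they reach the construct.
// Assumption: the rate must be a positive integer; the text only states the upper bound.
function checkCrawlingRate(rate: number): number {
  if (!Number.isInteger(rate) || rate < 1 || rate > MAX_CRAWLING_RATE) {
    throw new RangeError(
      `crawlingRate must be an integer between 1 and ${MAX_CRAWLING_RATE}, got ${rate}`
    );
  }
  return rate;
}

// Lower bound on crawl time for one host, assuming the rate is the only limit.
function minCrawlMinutes(pagesOnHost: number, rate: number): number {
  return Math.ceil(pagesOnHost / checkCrawlingRate(rate));
}

const minutesAtDefaultRate = minCrawlMinutes(6000, 300); // 20
const minutesAtReducedRate = minCrawlMinutes(6000, 60); // 100
```

Under these assumptions, dropping the rate from 300 to 60 makes a 6000-page host take at least five times as long to sync, while reducing the request load on that host by the same factor.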

crawlingScope?

readonly optional crawlingScope: CrawlingScope

The scope of the crawling.

Default

- CrawlingScope.DEFAULT

customTransformation?

readonly optional customTransformation: CustomTransformation

The custom transformation strategy to use.

Default

- No custom transformation is used.

Inherited from

DataSourceAssociationProps.customTransformation


dataDeletionPolicy?

readonly optional dataDeletionPolicy: DataDeletionPolicy

The data deletion policy to apply to the data source.

Default

- Sets the data deletion policy to the default of the data source type.

Inherited from

DataSourceAssociationProps.dataDeletionPolicy


dataSourceName?

readonly optional dataSourceName: string

The name of the data source.

Default

- A new name will be generated.

Inherited from

DataSourceAssociationProps.dataSourceName


description?

readonly optional description: string

A description of the data source.

Default

- No description is provided.

Inherited from

DataSourceAssociationProps.description


filters?

readonly optional filters: CrawlingFilters

The filters (regular expression patterns) for the crawling. If a URL matches both an inclusion and an exclusion pattern, the exclusion pattern takes precedence.

Default

None
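
The precedence rule can be illustrated with a small local model. This is a sketch only: `CrawlingFiltersSketch` is a stand-in for the documented `CrawlingFilters` type, its field names `includePatterns` and `excludePatterns` are assumptions, and JavaScript `RegExp` is used in place of whatever matching the service performs. It also assumes that an empty inclusion list keeps every URL that is not excluded.

```typescript
// Local stand-in for the documented CrawlingFilters type (field names assumed).
interface CrawlingFiltersSketch {
  includePatterns?: string[];
  excludePatterns?: string[];
}

// Returns true when a URL would be kept under "exclude wins" semantics.
function isUrlSelected(url: string, filters: CrawlingFiltersSketch): boolean {
  const matchesAny = (patterns: string[] | undefined): boolean =>
    (patterns ?? []).some((p) => new RegExp(p).test(url));

  if (matchesAny(filters.excludePatterns)) {
    return false; // exclusion takes precedence over any inclusion
  }
  // Assumption: with no include patterns, every non-excluded URL is kept.
  const includes = filters.includePatterns ?? [];
  return includes.length === 0 || matchesAny(includes);
}

const exampleFilters: CrawlingFiltersSketch = {
  includePatterns: ["^https://www\\.sitename\\.com/docs/.*"],
  excludePatterns: [".*\\.pdf$"],
};

// Matches only the include pattern: kept.
const keepsGuide = isUrlSelected("https://www.sitename.com/docs/guide.html", exampleFilters);
// Matches both patterns: the exclude pattern wins, so it is dropped.
const keepsPdf = isUrlSelected("https://www.sitename.com/docs/manual.pdf", exampleFilters);
// Matches neither pattern: dropped because an include list is present.
const keepsBlog = isUrlSelected("https://www.sitename.com/blog/post.html", exampleFilters);
```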

kmsKey?

readonly optional kmsKey: IKey

The KMS key to use to encrypt the data source.

Default

- Service owned and managed key.

Inherited from

DataSourceAssociationProps.kmsKey


parsingStrategy?

readonly optional parsingStrategy: ParsingStategy

The parsing strategy to use.

Default

- No parsing strategy is used.

Inherited from

DataSourceAssociationProps.parsingStrategy


sourceUrls

readonly sourceUrls: string[]

The source URLs, in the format https://www.sitename.com. A maximum of 100 URLs is allowed.
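
The documented constraints on this property can be checked before the props are passed on. The following is a minimal sketch: `WebCrawlerPropsSketch` and `validateSourceUrls` are hypothetical and not part of the library. It assumes that the stated format means an `https://` URL with a host and no whitespace, and that an empty list is not useful.

```typescript
// Local stand-in mirroring only the sourceUrls property; not the library's type.
interface WebCrawlerPropsSketch {
  sourceUrls: string[];
}

const MAX_SOURCE_URLS = 100; // documented maximum

// Assumption: "the format https://www.sitename.com" is read as https:// plus a host,
// optionally followed by a path, with no whitespace.
const SOURCE_URL_PATTERN = /^https:\/\/[^\s\/]+(\/\S*)?$/;

// Collects readable problems instead of throwing, so all issues surface at once.
function validateSourceUrls(props: WebCrawlerPropsSketch): string[] {
  const problems: string[] = [];
  if (props.sourceUrls.length === 0) {
    // Assumption: the source does not state a minimum; an empty list is flagged here.
    problems.push("sourceUrls should contain at least one URL");
  }
  if (props.sourceUrls.length > MAX_SOURCE_URLS) {
    problems.push(`sourceUrls has ${props.sourceUrls.length} entries; the maximum is ${MAX_SOURCE_URLS}`);
  }
  for (const url of props.sourceUrls) {
    if (!SOURCE_URL_PATTERN.test(url)) {
      problems.push(`not in the expected https:// format: ${url}`);
    }
  }
  return problems;
}

const validProblems = validateSourceUrls({ sourceUrls: ["https://www.sitename.com"] });
const schemeProblems = validateSourceUrls({ sourceUrls: ["ftp://www.sitename.com"] });
```

Here `validProblems` is empty, while `schemeProblems` contains one entry for the URL with the unexpected scheme.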