@cdklabs/generative-ai-cdk-constructs • Docs
@cdklabs/generative-ai-cdk-constructs / bedrock / WebCrawlerDataSourceAssociationProps
Interface to add a new data source to an existing knowledge base (KB).
readonly
optional
chunkingStrategy:ChunkingStrategy
The chunking strategy to use for splitting your documents or content. The chunks are then converted to embeddings and written to the vector index, allowing for similarity search and retrieval of the content.
ChunkingStrategy.DEFAULT
Inherited from DataSourceAssociationProps.chunkingStrategy
readonly
optional
crawlingRate:number
The max rate at which pages are crawled, up to 300 per minute per host. Higher values will decrease sync time but increase the load on the host.
300
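The trade-off between crawling rate and sync time can be made concrete with a little arithmetic. The sketch below is a rough lower bound only: `minCrawlMinutes` is a hypothetical helper, not part of the library, and it assumes a single host and a crawler that sustains the configured rate.

```typescript
// Documented ceiling: pages per minute per host.
const MAX_CRAWLING_RATE = 300;

// Hypothetical helper (not a library function): lower bound, in minutes,
// on the time needed to crawl `pageCount` pages from one host.
function minCrawlMinutes(pageCount: number, crawlingRate: number = MAX_CRAWLING_RATE): number {
  // Assumption: any rate above zero is acceptable; only the upper limit is documented.
  if (!(crawlingRate > 0) || crawlingRate > MAX_CRAWLING_RATE) {
    throw new RangeError("crawlingRate must be greater than 0 and at most " + MAX_CRAWLING_RATE);
  }
  return Math.ceil(pageCount / crawlingRate);
}

const atDefaultRate = minCrawlMinutes(6000);     // 6000 / 300 = 20 minutes
const atReducedRate = minCrawlMinutes(6000, 60); // 6000 / 60 = 100 minutes
```

Lowering the rate from 300 to 60 makes the estimate five times longer while cutting the request load on the host by the same factor.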
readonly
optional
crawlingScope:CrawlingScope
The scope of the crawling.
- CrawlingScope.DEFAULT
readonly
optional
customTransformation:CustomTransformation
The custom transformation strategy to use.
- No custom transformation is used.
Inherited from DataSourceAssociationProps.customTransformation
readonly
optional
dataDeletionPolicy:DataDeletionPolicy
The data deletion policy to apply to the data source.
- Sets the data deletion policy to the default of the data source type.
Inherited from DataSourceAssociationProps.dataDeletionPolicy
readonly
optional
dataSourceName:string
The name of the data source.
- A new name will be generated.
Inherited from DataSourceAssociationProps.dataSourceName
readonly
optional
description:string
A description of the data source.
- No description is provided.
Inherited from DataSourceAssociationProps.description
readonly
optional
filters:CrawlingFilters
The filters (regular expression patterns) for the crawling. If there’s a conflict, the exclude pattern takes precedence.
None
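The precedence rule for filters can be sketched as follows. This is an illustration of the rule only, not the service's matching code: the `CrawlingFiltersSketch` shape, its field names, and the `isCrawled` helper are assumptions made for the example, as is the choice to treat an empty include list as "everything is in scope".

```typescript
// Assumed shape for illustration; check the CrawlingFilters reference for the real fields.
interface CrawlingFiltersSketch {
  includePatterns?: string[];
  excludePatterns?: string[];
}

// Hypothetical helper: decides whether a URL would be crawled under the given filters.
function isCrawled(url: string, filters: CrawlingFiltersSketch): boolean {
  const matchesAny = (patterns: string[] | undefined): boolean =>
    (patterns ?? []).some((pattern) => new RegExp(pattern).test(url));

  // The exclude pattern takes precedence when both kinds of pattern match.
  if (matchesAny(filters.excludePatterns)) {
    return false;
  }
  // Assumption: with no include patterns, every non-excluded URL is in scope.
  const include = filters.includePatterns;
  return include === undefined || include.length === 0 || matchesAny(include);
}

const exampleFilters: CrawlingFiltersSketch = {
  includePatterns: ["^https://www\\.sitename\\.com/docs/.*"],
  excludePatterns: [".*\\.pdf$"],
};

const guidePage = isCrawled("https://www.sitename.com/docs/guide.html", exampleFilters); // true
const pdfPage = isCrawled("https://www.sitename.com/docs/manual.pdf", exampleFilters);   // false, exclude wins
const blogPage = isCrawled("https://www.sitename.com/blog/post", exampleFilters);        // false, not included
```

The PDF URL matches both an include and an exclude pattern, so it is the case where the conflict rule decides the outcome.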
readonly
optional
kmsKey:IKey
The KMS key to use to encrypt the data source.
- Service owned and managed key.
Inherited from DataSourceAssociationProps.kmsKey
readonly
optional
parsingStrategy:ParsingStategy
The parsing strategy to use.
- No parsing strategy is used.
Inherited from DataSourceAssociationProps.parsingStrategy
readonly
sourceUrls:string[]
The source URLs, in the format https://www.sitename.com. Maximum of 100 URLs.
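A pre-check along these lines could catch malformed input before deployment. It is a minimal sketch: `validateSourceUrls` is a hypothetical helper, not a library function, and the `https://` requirement is an assumption read off the documented format rather than a confirmed rule.

```typescript
// Documented maximum number of source URLs.
const MAX_SOURCE_URLS = 100;

// Hypothetical pre-check (not part of the library): returns a list of problems,
// which is empty when the input satisfies the limits described above.
function validateSourceUrls(sourceUrls: string[]): string[] {
  const problems: string[] = [];
  if (sourceUrls.length > MAX_SOURCE_URLS) {
    problems.push("too many source URLs: " + sourceUrls.length + " (maximum " + MAX_SOURCE_URLS + ")");
  }
  for (const url of sourceUrls) {
    // Assumption: URLs follow the documented https://www.sitename.com format.
    if (!/^https:\/\/[^\s/]+/.test(url)) {
      problems.push("not in the expected https:// format: " + url);
    }
  }
  return problems;
}

const validInput = validateSourceUrls(["https://www.sitename.com"]); // no problems
const missingScheme = validateSourceUrls(["www.sitename.com"]);      // one problem reported
```

Returning a list of problems instead of throwing on the first one lets a caller report every bad URL in a single pass.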