
DataCatalogDatabase

AWS Glue Catalog database for an Amazon S3 dataset.

Overview

DataCatalogDatabase is an AWS Glue Data Catalog database configured for an Amazon S3-based dataset:

  • The database's default location points to the S3 location s3://<locationBucket>/<locationPrefix>/
  • The database can hold multiple tables, each stored under its own prefix, for example: s3://<locationBucket>/<locationPrefix>/<table_prefix>/
  • By default, a database-level crawler is scheduled to run once a day at 00:01 (AWS Glue evaluates cron schedules in UTC). You can disable the crawler, or change its schedule and frequency with a cron expression.
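The location layout described above can be illustrated with a small helper that joins the bucket name, the database prefix, and an optional table prefix into an S3 URI. This is a hypothetical sketch, not part of DSF on AWS: the `tableLocation` and `trimSlashes` functions, and the choice to strip leading and trailing slashes from each segment, are assumptions made for illustration only.

```typescript
// Hypothetical helpers (not part of DSF on AWS): build the S3 URI following
// s3://<locationBucket>/<locationPrefix>/<table_prefix>/

// Remove leading and trailing '/' so segments join without '//'
function trimSlashes(segment: string): string {
  return segment.replace(/^\/+|\/+$/g, '');
}

function tableLocation(bucketName: string, locationPrefix: string, tablePrefix?: string): string {
  const parts = [bucketName, locationPrefix, tablePrefix ?? '']
    .map(trimSlashes)
    .filter((p) => p.length > 0); // drop empty segments (e.g. no table prefix)
  return `s3://${parts.join('/')}/`;
}

// Database default location, then the location of one table under it
console.log(tableLocation('my-bucket', '/databasePath'));
// s3://my-bucket/databasePath/
console.log(tableLocation('my-bucket', '/databasePath', 'orders'));
// s3://my-bucket/databasePath/orders/
```

Each table prefix sits directly under the database location, which is what lets a single database-level crawler discover every table.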

[Diagram: Data Catalog Database]

Data Catalog encryption

The AWS Glue Data Catalog resources created by the DataCatalogDatabase construct are not encrypted, because AWS Glue only supports encryption at the catalog level. Changing the catalog-level encryption settings affects all existing Glue resources and their producers and consumers. For the same reason, changing the catalog-level encryption configuration after this construct is deployed can break all the resources created as part of DSF on AWS.

Usage

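import * as cdk from 'aws-cdk-lib';
import { Bucket } from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';
import * as dsf from '@cdklabs/aws-data-solutions-framework';

class ExampleDefaultDataCatalogDatabaseStack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);
    // S3 bucket that holds the dataset files
    const bucket = new Bucket(this, 'DataCatalogBucket');

    // Glue database located under the prefix, with the default daily crawler
    new dsf.governance.DataCatalogDatabase(this, 'DataCatalogDatabase', {
      locationBucket: bucket,
      locationPrefix: '/databasePath',
      name: 'example-db',
    });
  }
}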

Modifying the crawler behavior

You can change the default configuration of the AWS Glue Crawler to match your requirements:

  • Enable or disable the crawler
  • Change the crawler run frequency
  • Provide your own key to encrypt the crawler logs
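
import * as cdk from 'aws-cdk-lib';
import { Key } from 'aws-cdk-lib/aws-kms';
import { Bucket } from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';
import * as dsf from '@cdklabs/aws-data-solutions-framework';

class ExampleCrawlerDataCatalogDatabaseStack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);
    const bucket = new Bucket(this, 'DataCatalogBucket');
    const encryptionKey = new Key(this, 'CrawlerLogEncryptionKey'); // KMS key for crawler logs

    new dsf.governance.DataCatalogDatabase(this, 'DataCatalogDatabase', {
      locationBucket: bucket,
      locationPrefix: '/databasePath',
      name: 'example-db',
      autoCrawl: true, // set to false to disable the crawler
      autoCrawlSchedule: {
        scheduleExpression: 'cron(1 0 * * ? *)', // daily at 00:01 UTC
      },
      crawlerLogEncryptionKey: encryptionKey,
      crawlerTableLevelDepth: 3, // S3 folder depth where the table folders are located
    });
  }
}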
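The schedule above, `cron(1 0 * * ? *)`, uses the six-field AWS cron format: minutes, hours, day of month, month, day of week, and year. AWS cron does not allow both day fields to be set, so one of them must be `?`. The sketch below is a hypothetical helper (`dailyGlueCron` is not a DSF on AWS API) that builds a daily expression suitable for `autoCrawlSchedule.scheduleExpression` from an hour and a minute, assuming the usual 0 to 23 and 0 to 59 ranges.

```typescript
// Hypothetical helper (not part of DSF on AWS): builds a daily AWS cron
// expression. Field order: minutes hours day-of-month month day-of-week year.
function dailyGlueCron(hour: number, minute: number): string {
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError(`hour must be an integer in [0, 23], got ${hour}`);
  }
  if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
    throw new RangeError(`minute must be an integer in [0, 59], got ${minute}`);
  }
  // '*' for day-of-month, month and year; '?' for day-of-week because
  // AWS cron does not allow both day fields to be specified.
  return `cron(${minute} ${hour} * * ? *)`;
}

console.log(dailyGlueCron(0, 1)); // cron(1 0 * * ? *), the default schedule
console.log(dailyGlueCron(6, 30)); // cron(30 6 * * ? *)
```

The resulting string would be passed as `autoCrawlSchedule: { scheduleExpression: dailyGlueCron(6, 30) }`. Keep in mind that the hour is interpreted in UTC, not in your local time zone.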