RedshiftServerlessWorkgroup
A Redshift Serverless Workgroup with helpers method for Redshift administration.
Overview
RedshiftServerlessWorkgroup
is a Redshift Serverless Workgroup with the following options/capabilities:
- Deployed in a VPC in private subnets. The network configuation can be customized.
- Provide helper methods for running SQL commands via the Redshift Data API. Commands can be custom or predefined for common administration tasks like creating and granting roles.
- Initialize a Glue Data Catalog integration with auto crawling via Glue Crawlers. This would allow tables in Redshift Serverless to appear in the Glue Data Catalog for the purposes of discovery and integration.
The default VPC created by the construct follows the standard implementation of the VPC L2 CDK Construct. As a result, if no account ID and no region are configured in the CDK Stack, the VPC will only contain 2 AZ and the Redshift Serverless Workgroup will fail to deploy. Be sure to configure the account ID and region to create a 3 AZ VPC.
Usage
- TypeScript
- Python
class ExampleDefaultRedshiftServerlessWorkgroupStack extends Stack {
constructor(scope: Construct, id: string) {
super(scope, id)
const namespace = new dsf.consumption.RedshiftServerlessNamespace(this, 'DefaultRedshiftServerlessNamespace', {
name: "default",
dbName: 'defaultdb',
})
new dsf.consumption.RedshiftServerlessWorkgroup(this, "DefaultRedshiftServerlessWorkgroup", {
name: "default",
namespace: namespace,
})
}
}
class ExampleDefaultRedshiftServerlessWorkgroupStack(Stack):
def __init__(self, scope, id):
super().__init__(scope, id)
namespace = dsf.consumption.RedshiftServerlessNamespace(self, "DefaultRedshiftServerlessNamespace",
name="default",
db_name="defaultdb"
)
dsf.consumption.RedshiftServerlessWorkgroup(self, "DefaultRedshiftServerlessWorkgroup",
name="default",
namespace=namespace
)
Bootstrapping Redshift Serverless w/ RedshiftData Construct
The RedshiftData
construct allows custom SQLs to run against the RedshiftServerlessWorkgroup
via the Data API. This allows users to bootstrap Redshift directly from CDK.
The RedshiftData
construct provides the following helpers for bootstrapping Redshift databases:
- Run a custom SQL command
- Create Redshift roles
- Grant Redshift roles full access to schemas
- Grant Redshift roles read only access
- Run a COPY command to load data
- TypeScript
- Python
const workgroup = new dsf.consumption.RedshiftServerlessWorkgroup(this, "DefaultRedshiftServerlessWorkgroup", {
name: "default",
namespace: namespace,
})
// Run a custom SQL to create a customer table
const createTable = workgroup.runCustomSQL('CreateCustomerTable', "defaultdb",
`
CREATE TABLE customer(
customer_id varchar(50),
salutation varchar(5),
first_name varchar(50),
last_name varchar(50),
email_address varchar(100)
)
diststyle even
`,
"drop table customer"
);
// Run a COPY command to load data into the customer table
const ingestion = workgroup.ingestData('ExampleCopy', "defaultdb", "customer", bucket, "data-products/customer/", "csv ignoreheader 1");
// Add dependencies between Redshift Data API commands because CDK cannot infer them
ingestion.node.addDependency(createTable);
// Create an engineering role in the defaultdb
const dbRole = workgroup.createDbRole('EngineeringRole', 'defaultdb', 'engineering');
// Grant the engineering role full access to the public schema in the defaultdb
const dbSchema = workgroup.grantDbSchemaToRole('EngineeringGrant', 'defaultdb', 'public', 'engineering');
// Enforce dependencies
dbSchema.node.addDependency(dbRole);
workgroup = dsf.consumption.RedshiftServerlessWorkgroup(self, "DefaultRedshiftServerlessWorkgroup",
name="default",
namespace=namespace
)
# Run a custom SQL to create a customer table
create_table = workgroup.run_custom_sQL("CreateCustomerTable", "defaultdb", """
CREATE TABLE customer(
customer_id varchar(50),
salutation varchar(5),
first_name varchar(50),
last_name varchar(50),
email_address varchar(100)
)
diststyle even
""", "drop table customer")
# Run a COPY command to load data into the customer table
ingestion = workgroup.ingest_data("ExampleCopy", "defaultdb", "customer", bucket, "data-products/customer/", "csv ignoreheader 1")
# Add dependencies between Redshift Data API commands because CDK cannot infer them
ingestion.node.add_dependency(create_table)
# Create an engineering role in the defaultdb
db_role = workgroup.create_db_role("EngineeringRole", "defaultdb", "engineering")
# Grant the engineering role full access to the public schema in the defaultdb
db_schema = workgroup.grant_db_schema_to_role("EngineeringGrant", "defaultdb", "public", "engineering")
# Enforce dependencies
db_schema.node.add_dependency(db_role)
Cataloging Redshift Serverless Tables
Redshift tables and databases can also be automatically catalog in Glue Data Catalog using an helper method. This method creates a Glue Catalog database as well as a crawler to populate the database with table metadata from your Redshift database.
The default value of the path that the crawler would use is <databaseName>/public/%
which translates to all the tables in the public schema. Please refer to the crawler documentation for more information for JDBC data sources.
- TypeScript
- Python
class ExampleRedshiftServerlessWorkgroupCatalogStack extends Stack {
constructor(scope: Construct, id: string) {
super(scope, id)
const namespace = new dsf.consumption.RedshiftServerlessNamespace(this, 'DefaultRedshiftServerlessNamespace', {
name: "default",
dbName: 'defaultdb',
})
const workgroup = new dsf.consumption.RedshiftServerlessWorkgroup(this, "DefaultRedshiftServerlessWorkgroup", {
name: "default",
namespace: namespace,
})
workgroup.catalogTables('RedshiftCatalog', "example-redshift-db", 'defaultdb/public/%')
}
}
class ExampleRedshiftServerlessWorkgroupCatalogStack(Stack):
def __init__(self, scope, id):
super().__init__(scope, id)
namespace = dsf.consumption.RedshiftServerlessNamespace(self, "DefaultRedshiftServerlessNamespace",
name="default",
db_name="defaultdb"
)
workgroup = dsf.consumption.RedshiftServerlessWorkgroup(self, "DefaultRedshiftServerlessWorkgroup",
name="default",
namespace=namespace
)
workgroup.catalog_tables("RedshiftCatalog", "example-redshift-db", "defaultdb/public/%")