
Athena Workgroup

An Amazon Athena workgroup with provided configuration.

Overview

AthenaWorkGroup provides an Athena workgroup configured with best practices:

  • An Amazon S3 bucket for query results, based on AnalyticsBucket.
  • Query results encrypted with an AWS KMS key.
  • An execution role for the PySpark query engine.
  • A grant method that allows principals to run queries.

Usage

import * as dsf from '@cdklabs/aws-data-solutions-framework'

new dsf.consumption.AthenaWorkGroup(this, 'AthenaWorkGroupDefault', {
  name: 'athena-default',
  resultLocationPrefix: 'athena-default-results/',
})
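The resultLocationPrefix is a key prefix inside the results bucket. Athena workgroups store query results at an S3 output location of the form s3://bucket/prefix/. The sketch below shows how a bucket name and a prefix could combine into that location. The helper athenaOutputLocation and the bucket name my-results-bucket are hypothetical, and the assumption that the construct builds its location exactly this way is not confirmed here.

```typescript
// Hypothetical helper: combines a results bucket name and a resultLocationPrefix
// into an Athena S3 output location (s3://bucket/prefix/).
function athenaOutputLocation(bucketName: string, resultLocationPrefix: string): string {
  // Drop leading slashes so the prefix follows the bucket name directly.
  const prefix = resultLocationPrefix.replace(/^\/+/, "");
  // Add a trailing slash so results are written under the prefix.
  const normalized = prefix === "" || prefix.endsWith("/") ? prefix : `${prefix}/`;
  return `s3://${bucketName}/${normalized}`;
}

// 'my-results-bucket' is a made-up bucket name for illustration.
console.log(athenaOutputLocation("my-results-bucket", "athena-default-results/"));
// prints "s3://my-results-bucket/athena-default-results/"
```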

User provided S3 bucket for query results

You can provide your own S3 bucket for query results. If you do so, you must also provide a KMS key that will be used to encrypt query results.
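The rule above (a user-provided bucket requires a KMS key, while a key alone is allowed) can be sketched as a small validation function. This is a simplified, hypothetical model: ResultsStorageProps and validateResultsStorage are illustrative names, and the bucket and key are plain strings instead of the CDK objects the real construct accepts.

```typescript
// Hypothetical, simplified props: bucket and key are plain identifiers here,
// not the CDK objects the real construct accepts.
interface ResultsStorageProps {
  resultBucket?: string;
  resultsEncryptionKey?: string;
}

// Enforces the documented rule: a user-provided results bucket requires a KMS key.
// A key without a bucket is accepted: it then encrypts the construct's own bucket.
function validateResultsStorage(props: ResultsStorageProps): void {
  if (props.resultBucket !== undefined && props.resultsEncryptionKey === undefined) {
    throw new Error("resultsEncryptionKey is required when resultBucket is provided");
  }
}

// Accepted combinations
validateResultsStorage({});
validateResultsStorage({ resultsEncryptionKey: "user-key" });
validateResultsStorage({ resultBucket: "user-bucket", resultsEncryptionKey: "user-key" });

// Rejected combination: bucket without key
try {
  validateResultsStorage({ resultBucket: "user-bucket" });
} catch (err) {
  console.log((err as Error).message);
}
```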

Results encryption

If you provide your own S3 bucket, you also need to provide a KMS key to encrypt query results. You must also grant access to this key to the AthenaWorkGroup's executionRole (if the Spark engine is used), or to the principals that were granted permission to run queries with the AthenaWorkGroup's grantRunQueries method.
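Granting access to the key means that the execution role or query principals can use it to read and write encrypted results. As an illustration, the sketch below builds a KMS key policy statement in IAM JSON form for a list of principal ARNs. The helper resultsKeyStatement, the Sid, and the example role ARN are hypothetical, and the action list (kms:Decrypt, kms:GenerateDataKey) is an assumption based on what SSE-KMS reads and writes typically require, not the construct's exact grant.

```typescript
// Minimal IAM key policy statement type (only the fields used here).
interface KeyPolicyStatement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Principal: { AWS: string[] };
  Action: string[];
  Resource: string;
}

// Hypothetical helper: lets the given principals encrypt and decrypt
// Athena query results with the key this policy is attached to.
function resultsKeyStatement(principalArns: string[]): KeyPolicyStatement {
  if (principalArns.length === 0) {
    throw new Error("at least one principal is required");
  }
  return {
    Sid: "AllowAthenaResultsEncryption",
    Effect: "Allow",
    Principal: { AWS: principalArns },
    // Assumed actions for reading and writing SSE-KMS encrypted S3 objects.
    Action: ["kms:Decrypt", "kms:GenerateDataKey"],
    // In a key policy, "*" refers to the key the policy is attached to.
    Resource: "*",
  };
}

// Example: a made-up execution role ARN in the AWS documentation account.
console.log(JSON.stringify(resultsKeyStatement(["arn:aws:iam::111122223333:role/example-spark-role"]), null, 2));
```

In CDK code, a similar grant is usually expressed with a call such as userDataKey.grantEncryptDecrypt(role) on the aws-cdk-lib KMS key, which grants a broader set of KMS actions than this sketch.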

Caution

You can also provide your own KMS key to encrypt query results stored in the S3 bucket provided by the construct (that is, when you are not providing your own S3 bucket).

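// userResultsBucket and userDataKey are an existing S3 bucket and KMS key defined elsewhere in your stack
new dsf.consumption.AthenaWorkGroup(this, 'AthenaWorkGroupUserBucket', {
  name: 'athena-user-bucket',
  resultBucket: userResultsBucket,
  resultsEncryptionKey: userDataKey,
  resultLocationPrefix: 'athena-wg-results/',
})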

Apache Spark (PySpark) Engine version

You can choose the Athena query engine with the engineVersion property, using the EngineVersion enum.

The default is AUTO, which selects Athena engine version 3.

If you switch the query engine to PySpark, the construct creates an IAM execution role for the Spark engine unless you provide one. You can access this role through the executionRole property.

const sparkEngineVersion = dsf.consumption.EngineVersion.PYSPARK_V3

new dsf.consumption.AthenaWorkGroup(this, 'AthenaWorkGroupSpark', {
  name: 'athena-spark',
  engineVersion: sparkEngineVersion,
  resultLocationPrefix: 'athena-wg-results/',
})
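The engine and execution role behavior described above can be summarized as a small decision function. This is a hypothetical model, not the library's code: EngineChoice only covers the two values named in this section (AUTO and PYSPARK_V3), and resolveEngine and the returned engine labels are illustrative.

```typescript
// Only the two engine choices named in this section; a stand-in for the real enum.
type EngineChoice = "AUTO" | "PYSPARK_V3";

interface EngineResolution {
  engine: string;                // engine the workgroup ends up using
  createsExecutionRole: boolean; // whether a new execution role would be created
}

// Hypothetical model of the documented behavior.
function resolveEngine(choice: EngineChoice = "AUTO", userExecutionRoleArn?: string): EngineResolution {
  if (choice === "PYSPARK_V3") {
    // A role is created only when the user does not provide one.
    return { engine: "PySpark engine version 3", createsExecutionRole: userExecutionRoleArn === undefined };
  }
  // AUTO selects Athena engine version 3; no Spark execution role is involved.
  return { engine: "Athena engine version 3", createsExecutionRole: false };
}

console.log(resolveEngine());
console.log(resolveEngine("PYSPARK_V3"));
console.log(resolveEngine("PYSPARK_V3", "arn:aws:iam::111122223333:role/my-spark-role"));
```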

Construct properties

You can use different properties to customize your Athena workgroup. For example, resultsRetentionPeriod specifies the retention period for your query results, and bytesScannedCutoffPerQuery limits the amount of data a single query can scan. You can also provide your own KMS key for encryption while using the results bucket provided by the construct. You can explore the other available properties in AthenaWorkGroupProps.

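import { Duration } from 'aws-cdk-lib'

new dsf.consumption.AthenaWorkGroup(this, 'AthenaWorkGroupProperties', {
  name: 'athena-properties',
  // Cancel queries that scan more than 100 MiB (104,857,600 bytes)
  bytesScannedCutoffPerQuery: 104857600,
  resultLocationPrefix: 'athena-results/',
  // Your own KMS key, used with the results bucket provided by the construct
  resultsEncryptionKey: userDataKey,
  // Keep query results for one day
  resultsRetentionPeriod: Duration.days(1),
})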
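The value 104857600 above is 100 MiB (100 × 1024 × 1024 bytes). The sketch below derives it from a MiB count and checks it against the 10,000,000-byte minimum that the Athena API reference lists for BytesScannedCutoffPerQuery (verify against current AWS docs). It also shows one common way to enforce a retention period on S3 objects, an S3 lifecycle expiration rule scoped to the results prefix. Whether the construct implements resultsRetentionPeriod this way is an assumption, and cutoffFromMiB and retentionRule are hypothetical helpers.

```typescript
// Binary megabyte (MiB): 1024 * 1024 bytes.
const BYTES_PER_MIB = 1024 * 1024;
// Minimum listed for BytesScannedCutoffPerQuery in the Athena API reference (verify).
const MIN_CUTOFF_BYTES = 10000000;

// Hypothetical helper: converts MiB to a byte cutoff and rejects values below the minimum.
function cutoffFromMiB(mib: number): number {
  const bytes = Math.round(mib * BYTES_PER_MIB);
  if (bytes < MIN_CUTOFF_BYTES) {
    throw new Error(`cutoff of ${bytes} bytes is below the ${MIN_CUTOFF_BYTES}-byte minimum`);
  }
  return bytes;
}

// Shape of one rule in an S3 lifecycle configuration.
interface LifecycleRule {
  ID: string;
  Status: "Enabled" | "Disabled";
  Filter: { Prefix: string };
  Expiration: { Days: number };
}

// Hypothetical: expire objects under the results prefix after `days` days.
function retentionRule(prefix: string, days: number): LifecycleRule {
  if (!Number.isInteger(days) || days < 1) {
    throw new Error("S3 expiration must be a positive whole number of days");
  }
  return {
    ID: "athena-results-retention",
    Status: "Enabled",
    Filter: { Prefix: prefix },
    Expiration: { Days: days },
  };
}

console.log(cutoffFromMiB(100)); // prints 104857600
console.log(JSON.stringify(retentionRule("athena-results/", 1)));
```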

Grant permission to run queries

We provide the grantRunQueries method to grant principals permission to run queries using the workgroup.

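// athenaExampleRole is an existing IAM role (or other grantable principal) defined elsewhere in your stack
const athenaWg = new dsf.consumption.AthenaWorkGroup(this, 'AthenaWorkGroupGrant', {
  name: 'athena-grant',
  resultLocationPrefix: 'athena-results/',
})

athenaWg.grantRunQueries(athenaExampleRole)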
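To make the grant more concrete, the sketch below builds an IAM policy statement that allows core Athena query actions on a single workgroup, identified by an ARN of the form arn:aws:athena:region:account:workgroup/name. This is an illustration only: workgroupArn and runQueriesStatement are hypothetical helpers, and grantRunQueries most likely grants additional permissions (for example on the results bucket and encryption key) that are not listed here.

```typescript
// Minimal IAM identity policy statement type.
interface IamStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

// ARN of an Athena workgroup.
function workgroupArn(region: string, account: string, name: string): string {
  return `arn:aws:athena:${region}:${account}:workgroup/${name}`;
}

// Hypothetical: core query actions scoped to one workgroup (not the construct's exact grant).
function runQueriesStatement(region: string, account: string, workgroupName: string): IamStatement {
  return {
    Effect: "Allow",
    Action: [
      "athena:StartQueryExecution",
      "athena:GetQueryExecution",
      "athena:GetQueryResults",
      "athena:StopQueryExecution",
    ],
    Resource: [workgroupArn(region, account, workgroupName)],
  };
}

// Region and account are made-up example values.
console.log(JSON.stringify(runQueriesStatement("us-east-1", "111122223333", "athena-grant"), null, 2));
```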

Workgroup removal

You can specify whether the Athena workgroup construct resources are deleted when the CDK stack is destroyed by using the removalPolicy property. As an additional layer of protection, we require you to set a global context value for data removal in your CDK application.

The Athena workgroup is destroyed only if both the construct's removalPolicy and the DSF global removal policy are set to remove objects.

When the workgroup is set to be destroyed, the construct uses recursiveDeleteOption, which deletes the workgroup and its contents even if it contains named queries.
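The two-key removal rule above can be written as a small truth table. The sketch below is a hypothetical model, not the library's code: resolveWorkgroupRemoval is an illustrative name, and falling back to RETAIN whenever either setting does not allow removal is an assumption.

```typescript
type RemovalChoice = "DESTROY" | "RETAIN";

interface WorkgroupRemoval {
  removalPolicy: RemovalChoice;
  recursiveDeleteOption: boolean;
}

// Hypothetical model: destroy only when BOTH the construct policy and the global flag allow it.
function resolveWorkgroupRemoval(constructPolicy: RemovalChoice, removeDataOnDestroy: boolean): WorkgroupRemoval {
  const destroy = constructPolicy === "DESTROY" && removeDataOnDestroy;
  return {
    // Assumed fallback: keep the workgroup when either setting does not allow removal.
    removalPolicy: destroy ? "DESTROY" : "RETAIN",
    // Recursive delete (including named queries) only applies when destroying.
    recursiveDeleteOption: destroy,
  };
}

for (const policy of ["DESTROY", "RETAIN"] as RemovalChoice[]) {
  for (const flag of [true, false]) {
    console.log(policy, flag, resolveWorkgroupRemoval(policy, flag));
  }
}
```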

You can set the global data removal policy with the @data-solutions-framework-on-aws/removeDataOnDestroy context value (true or false) in cdk.json:

cdk.json
{
  "context": {
    "@data-solutions-framework-on-aws/removeDataOnDestroy": true
  }
}
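To check what a given cdk.json enables, the sketch below parses its text and reads the context flag. removeDataOnDestroyFromCdkJson is a hypothetical helper, and it assumes that only the JSON boolean true opts in; a missing key or any other value (including the string "true") is treated as false here, which may differ from how the framework itself interprets context values.

```typescript
const REMOVE_DATA_CONTEXT_KEY = "@data-solutions-framework-on-aws/removeDataOnDestroy";

// Hypothetical helper: reads the global removal flag from cdk.json text.
function removeDataOnDestroyFromCdkJson(cdkJsonText: string): boolean {
  const parsed: unknown = JSON.parse(cdkJsonText);
  if (typeof parsed !== "object" || parsed === null) {
    return false;
  }
  const context = (parsed as { context?: Record<string, unknown> }).context;
  // Only a literal boolean true opts in; anything else keeps data.
  return context !== undefined && context[REMOVE_DATA_CONTEXT_KEY] === true;
}

const exampleCdkJson = `{
  "context": {
    "@data-solutions-framework-on-aws/removeDataOnDestroy": true
  }
}`;

console.log(removeDataOnDestroyFromCdkJson(exampleCdkJson)); // prints true
```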