Class TPCDSMetadataHandler
java.lang.Object
  com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
    com.amazonaws.athena.connectors.tpcds.TPCDSMetadataHandler
- All Implemented Interfaces:
com.amazonaws.services.lambda.runtime.RequestStreamHandler
public class TPCDSMetadataHandler extends MetadataHandler
Handles metadata requests for the Athena TPC-DS Connector. For more detail, please see the module's README.md. Some notable characteristics of this class include:
1. It provides 5 schemas, each representing a different scale factor (1, 10, 100, 250, 1000).
2. Each schema has 25 TPC-DS tables.
3. Each table is divided into NUM_SPLITS * scale_factor / 10 splits.
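As a quick check of the split arithmetic in item 3, the sketch below computes the split count for each scale factor. NUM_SPLITS is not documented on this page, so the value 10 used here is a hypothetical placeholder, and the Math.max guard that keeps at least one split is an assumption rather than confirmed handler behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: NUM_SPLITS is a hypothetical placeholder; the real constant
// in TPCDSMetadataHandler may have a different value.
class SplitMath {
    static final int NUM_SPLITS = 10;

    // Scale factors exposed as schemas, per the class description.
    static final List<Integer> SCALE_FACTORS = List.of(1, 10, 100, 250, 1000);

    // splits = NUM_SPLITS * scaleFactor / 10 (integer arithmetic),
    // clamped to at least one split (assumption, not documented behavior).
    static int splitsFor(int numSplits, int scaleFactor) {
        return Math.max(1, numSplits * scaleFactor / 10);
    }

    // Split count for every scale factor, in schema order.
    static Map<Integer, Integer> splitsPerScaleFactor() {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int sf : SCALE_FACTORS) {
            result.put(sf, splitsFor(NUM_SPLITS, sf));
        }
        return result;
    }

    public static void main(String[] args) {
        // With NUM_SPLITS = 10: {1=1, 10=10, 100=100, 250=250, 1000=1000}
        System.out.println(splitsPerScaleFactor());
    }
}
```

With a smaller hypothetical NUM_SPLITS (say 4), scale factor 1 would round down to zero splits under plain integer division, which is why the sketch clamps the result.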
-
Field Summary
Fields
Modifier and Type                        Field
protected static TPCDSQueryPassthrough   queryPassthrough
protected static Set<String>             SCHEMA_NAMES
protected static String                  SPLIT_NUMBER_FIELD
protected static String                  SPLIT_SCALE_FACTOR_FIELD
protected static String                  SPLIT_TOTAL_NUMBER_FIELD
-
Fields inherited from class com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
configOptions, DISABLE_SPILL_ENCRYPTION, KMS_KEY_ID_ENV, SPILL_BUCKET_ENV, SPILL_PREFIX_ENV
-
Constructor Summary
Constructors
Modifier    Constructor
protected   TPCDSMetadataHandler(EncryptionKeyFactory keyFactory, software.amazon.awssdk.services.secretsmanager.SecretsManagerClient secretsManager, software.amazon.awssdk.services.athena.AthenaClient athena, String spillBucket, String spillPrefix, Map<String,String> configOptions)
            TPCDSMetadataHandler(Map<String,String> configOptions)
-
Method Summary
Modifier and Type                   Method and Description
GetDataSourceCapabilitiesResponse   doGetDataSourceCapabilities(BlockAllocator allocator, GetDataSourceCapabilitiesRequest request)
    Used to describe the types of capabilities supported by a data source.
GetTableResponse                    doGetQueryPassthroughSchema(BlockAllocator allocator, GetTableRequest request)
    Used to get the definition (field names, types, descriptions, etc.) of a Query Pass-Through.
GetSplitsResponse                   doGetSplits(BlockAllocator allocator, GetSplitsRequest request)
    Used to split up the reads required to scan the requested batch of partition(s).
GetTableResponse                    doGetTable(BlockAllocator allocator, GetTableRequest request)
    Used to get the definition (field names, types, descriptions, etc.) of a table using the static metadata provided by Teradata's TPC-DS generator.
ListSchemasResponse                 doListSchemaNames(BlockAllocator allocator, ListSchemasRequest request)
    Returns the static list of schemas, each of which corresponds to a scale factor of the dataset we will generate.
ListTablesResponse                  doListTables(BlockAllocator allocator, ListTablesRequest request)
    Used to get the list of static tables from Teradata's TPC-DS generator.
void                                getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker)
    Partitioning is not supported at this time, because partition-pruning performance is not one of the dimensions we test using TPC-DS.
protected GetSplitsResponse         setupQueryPassthroughSplit(GetSplitsRequest request)
    Helper function that provides a single partition for Query Pass-Through.
Methods inherited from class com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
doGetTableLayout, doHandleRequest, doPing, enhancePartitionSchema, getSecret, handleRequest, makeEncryptionKey, makeSpillLocation, onPing, resolveSecrets
-
Field Detail
-
SPLIT_NUMBER_FIELD
protected static final String SPLIT_NUMBER_FIELD
- See Also:
- Constant Field Values
-
SPLIT_TOTAL_NUMBER_FIELD
protected static final String SPLIT_TOTAL_NUMBER_FIELD
- See Also:
- Constant Field Values
-
SPLIT_SCALE_FACTOR_FIELD
protected static final String SPLIT_SCALE_FACTOR_FIELD
- See Also:
- Constant Field Values
-
queryPassthrough
protected static final TPCDSQueryPassthrough queryPassthrough
-
Constructor Detail
-
TPCDSMetadataHandler
protected TPCDSMetadataHandler(EncryptionKeyFactory keyFactory, software.amazon.awssdk.services.secretsmanager.SecretsManagerClient secretsManager, software.amazon.awssdk.services.athena.AthenaClient athena, String spillBucket, String spillPrefix, Map<String,String> configOptions)
-
Method Detail
-
doGetDataSourceCapabilities
public GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities(BlockAllocator allocator, GetDataSourceCapabilitiesRequest request)
Description copied from class: MetadataHandler
Used to describe the types of capabilities supported by a data source. An engine can use this to determine which portions of the query to push down. A connector that returns any optimization guarantees that the associated predicate will be pushed down.
- Overrides:
doGetDataSourceCapabilities in class MetadataHandler
- Parameters:
allocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details about the catalog being used.
- Returns:
A GetDataSourceCapabilitiesResponse object containing a map of the supported optimizations that the connector advertises to the consumer. The connector assumes all responsibility for whatever is passed here.
-
doListSchemaNames
public ListSchemasResponse doListSchemaNames(BlockAllocator allocator, ListSchemasRequest request)
Returns the static list of schemas, each of which corresponds to a scale factor of the dataset we will generate.
- Specified by:
doListSchemaNames in class MetadataHandler
- Parameters:
allocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details on who made the request and which Athena catalog they are querying.
- Returns:
A ListSchemasResponse which primarily contains a Set of schema names and a catalog name corresponding to the Athena catalog that was queried.
- See Also:
MetadataHandler
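Because each schema stands for one scale factor, the handler needs to move between schema names and scale factors. The sketch below assumes schema names are a `tpcds` prefix followed by the scale factor (for example `tpcds10`); the prefix and the `scaleFactorOf` helper are illustrative assumptions, not documented API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: maps scale factors to schema names and back.
// The "tpcds" prefix is an assumption for illustration only.
class SchemaNames {
    static final String PREFIX = "tpcds";
    static final int[] SCALE_FACTORS = {1, 10, 100, 250, 1000};

    // Builds a static schema list like the one doListSchemaNames returns,
    // preserving scale-factor order.
    static Set<String> schemaNames() {
        Set<String> names = new LinkedHashSet<>();
        for (int sf : SCALE_FACTORS) {
            names.add(PREFIX + sf);
        }
        return names;
    }

    // Recovers the scale factor from a schema name, rejecting unknown schemas.
    static int scaleFactorOf(String schemaName) {
        if (!schemaNames().contains(schemaName)) {
            throw new IllegalArgumentException("Unknown schema: " + schemaName);
        }
        return Integer.parseInt(schemaName.substring(PREFIX.length()));
    }

    public static void main(String[] args) {
        System.out.println(schemaNames());             // [tpcds1, tpcds10, tpcds100, tpcds250, tpcds1000]
        System.out.println(scaleFactorOf("tpcds250")); // 250
    }
}
```

Validating against the static set, rather than parsing any numeric suffix, keeps requests for unsupported scale factors from silently producing a dataset.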
-
doListTables
public ListTablesResponse doListTables(BlockAllocator allocator, ListTablesRequest request)
Used to get the list of static tables from Teradata's TPC-DS generator.
- Specified by:
doListTables in class MetadataHandler
- Parameters:
allocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details on who made the request and which Athena catalog and database they are querying.
- Returns:
A ListTablesResponse which primarily contains a List enumerating the tables in this (catalog, database) pair. It also contains the catalog name corresponding to the Athena catalog that was queried.
- See Also:
MetadataHandler
-
doGetTable
public GetTableResponse doGetTable(BlockAllocator allocator, GetTableRequest request)
Used to get the definition (field names, types, descriptions, etc.) of a table using the static metadata provided by Teradata's TPC-DS generator.
- Specified by:
doGetTable in class MetadataHandler
- Parameters:
allocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
- Returns:
A GetTableResponse which primarily contains:
1. An Apache Arrow Schema object describing the table's columns, types, and descriptions.
2. A Set of partition column names (or empty if the table isn't partitioned).
- See Also:
MetadataHandler
-
doGetQueryPassthroughSchema
public GetTableResponse doGetQueryPassthroughSchema(BlockAllocator allocator, GetTableRequest request) throws Exception
Description copied from class: MetadataHandler
Used to get the definition (field names, types, descriptions, etc.) of a Query Pass-Through.
- Overrides:
doGetQueryPassthroughSchema in class MetadataHandler
- Parameters:
allocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
- Returns:
A GetTableResponse which primarily contains:
1. An Apache Arrow Schema object describing the table's columns, types, and descriptions.
2. A Set of partition column names (or empty if the table isn't partitioned).
- Throws:
Exception
-
getPartitions
public void getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker) throws Exception
Partitioning is not supported at this time, because partition-pruning performance is not one of the dimensions we test using TPC-DS. Because this method is a no-op, the Athena Federation SDK automatically generates a single placeholder partition to signal to Athena that there is data to be read and that it should call doGetSplits.
- Specified by:
getPartitions in class MetadataHandler
- Parameters:
blockWriter - Used to write rows (partitions) into the Apache Arrow response.
request - Provides details of the catalog, database, and table being queried as well as any filter predicate.
queryStatusChecker - A QueryStatusChecker that you can use to stop doing work for a query that has already terminated.
- Throws:
Exception
- See Also:
MetadataHandler
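The placeholder-partition fallback described above can be modeled without the SDK. In this simplified sketch, the `PartitionWriter` interface and the `"*"` marker value are hypothetical stand-ins for BlockWriter and whatever placeholder row the SDK actually writes; only the "no rows written, so one placeholder is added" logic reflects the documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, not SDK code: if getPartitions writes no rows,
// a single placeholder partition is substituted so that Athena still
// proceeds to request splits.
class PlaceholderPartition {
    static final String PLACEHOLDER = "*"; // hypothetical marker value

    // Hypothetical stand-in for the SDK's BlockWriter.
    interface PartitionWriter {
        void write(List<String> partitions);
    }

    static List<String> resolvePartitions(PartitionWriter getPartitions) {
        List<String> written = new ArrayList<>();
        getPartitions.write(written);
        if (written.isEmpty()) {
            // Nothing was written: fall back to one placeholder partition.
            written.add(PLACEHOLDER);
        }
        return written;
    }

    public static void main(String[] args) {
        // A no-op getPartitions, as in TPCDSMetadataHandler.
        List<String> partitions = resolvePartitions(out -> { });
        System.out.println(partitions.size()); // 1
    }
}
```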
-
doGetSplits
public GetSplitsResponse doGetSplits(BlockAllocator allocator, GetSplitsRequest request)
Used to split up the reads required to scan the requested batch of partition(s). We generate a fixed number of splits based on the scale factor.
- Specified by:
doGetSplits in class MetadataHandler
- Parameters:
allocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details of the catalog, database, table, and partition(s) being queried as well as any filter predicate.
- Returns:
A GetSplitsResponse which primarily contains:
1. A Set of Splits, which represent the read operations Amazon Athena must perform by calling your read function.
2. (Optional) A continuation token, which allows you to paginate the generation of splits for large queries.
- See Also:
MetadataHandler
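To illustrate how a fixed set of splits might carry the SPLIT_NUMBER_FIELD, SPLIT_TOTAL_NUMBER_FIELD, and SPLIT_SCALE_FACTOR_FIELD properties, and how a continuation token pages through them, here is a standalone sketch. The property key strings, the `Page` class, and the page size are hypothetical (the constants' real values are only linked, not shown, on this page), and this reference does not state whether TPCDSMetadataHandler actually paginates; each split is modeled as a plain property map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of fixed split generation keyed by three split properties.
// Key strings and page size are hypothetical placeholders.
class SplitGenerator {
    static final String SPLIT_NUMBER_FIELD = "splitNum";
    static final String SPLIT_TOTAL_NUMBER_FIELD = "totalNumSplits";
    static final String SPLIT_SCALE_FACTOR_FIELD = "scaleFactor";

    // One page of splits plus an optional continuation token (null = done).
    static final class Page {
        final List<Map<String, String>> splits;
        final String continuationToken;

        Page(List<Map<String, String>> splits, String continuationToken) {
            this.splits = splits;
            this.continuationToken = continuationToken;
        }
    }

    // Emits splits [start, start + pageSize), where start comes from the token.
    static Page generate(int scaleFactor, int totalSplits, String token, int pageSize) {
        int start = (token == null) ? 0 : Integer.parseInt(token);
        int end = Math.min(totalSplits, start + pageSize);
        List<Map<String, String>> splits = new ArrayList<>();
        for (int i = start; i < end; i++) {
            splits.add(Map.of(
                    SPLIT_NUMBER_FIELD, String.valueOf(i),
                    SPLIT_TOTAL_NUMBER_FIELD, String.valueOf(totalSplits),
                    SPLIT_SCALE_FACTOR_FIELD, String.valueOf(scaleFactor)));
        }
        String next = (end < totalSplits) ? String.valueOf(end) : null;
        return new Page(splits, next);
    }

    public static void main(String[] args) {
        String token = null;
        int pages = 0;
        int count = 0;
        do {
            Page page = generate(100, 100, token, 40);
            count += page.splits.size();
            token = page.continuationToken;
            pages++;
        } while (token != null);
        System.out.println(pages + " pages, " + count + " splits"); // 3 pages, 100 splits
    }
}
```

Encoding the total and the split index in every split lets each reader compute its own slice of the generated data independently, which is the usual reason to carry both numbers.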
-
setupQueryPassthroughSplit
protected GetSplitsResponse setupQueryPassthroughSplit(GetSplitsRequest request)
Helper function that provides a single partition for Query Pass-Through