Class MetadataHandler
- java.lang.Object
-
- com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
-
- All Implemented Interfaces:
com.amazonaws.services.lambda.runtime.RequestStreamHandler
- Direct Known Subclasses:
AmazonMskMetadataHandler
,AwsCmdbMetadataHandler
,BigQueryMetadataHandler
,CloudwatchMetadataHandler
,ExampleMetadataHandler
,GlueMetadataHandler
,JdbcMetadataHandler
,KafkaMetadataHandler
,MetricsMetadataHandler
,TPCDSMetadataHandler
public abstract class MetadataHandler extends Object implements com.amazonaws.services.lambda.runtime.RequestStreamHandler
This class defines the functionality required by any valid source of federated metadata for Athena. It is recommended that all connectors extend this class for Metadata operations though it is possible for you to write your own from the ground up as long as you satisfy the wire protocol. For all cases we've encountered it has made more sense to start with this base class and use it's implementation for most of the boilerplate related to Lambda and resource lifecycle so we could focus on the task of integrating with the source we were interested in.
-
-
Field Summary
Fields Modifier and Type Field Description protected Map<String,String>
configOptions
protected static String
DISABLE_SPILL_ENCRYPTION
protected static String
KMS_KEY_ID_ENV
protected static String
SPILL_BUCKET_ENV
protected static String
SPILL_PREFIX_ENV
-
Constructor Summary
Constructors Constructor Description MetadataHandler(EncryptionKeyFactory encryptionKeyFactory, software.amazon.awssdk.services.secretsmanager.SecretsManagerClient secretsManager, software.amazon.awssdk.services.athena.AthenaClient athena, String sourceType, String spillBucket, String spillPrefix, Map<String,String> configOptions)
MetadataHandler(String sourceType, Map<String,String> configOptions)
When MetadataHandler is used as a Lambda, the "Main" class will pass in System.getenv() as the configOptions.
-
Method Summary
All Methods Instance Methods Abstract Methods Concrete Methods Modifier and Type Method Description GetDataSourceCapabilitiesResponse
doGetDataSourceCapabilities(BlockAllocator allocator, GetDataSourceCapabilitiesRequest request)
Used to describe the types of capabilities supported by a data source.GetTableResponse
doGetQueryPassthroughSchema(BlockAllocator allocator, GetTableRequest request)
Used to get definition (field names, types, descriptions, etc...) of a Query PassThrough.abstract GetSplitsResponse
doGetSplits(BlockAllocator allocator, GetSplitsRequest request)
Used to split-up the reads required to scan the requested batch of partition(s).abstract GetTableResponse
doGetTable(BlockAllocator allocator, GetTableRequest request)
Used to get definition (field names, types, descriptions, etc...) of a Table.GetTableLayoutResponse
doGetTableLayout(BlockAllocator allocator, GetTableLayoutRequest request)
Used to get the partitions that must be read from the request table in order to satisfy the requested predicate.protected void
doHandleRequest(BlockAllocator allocator, com.fasterxml.jackson.databind.ObjectMapper objectMapper, MetadataRequest req, OutputStream outputStream)
abstract ListSchemasResponse
doListSchemaNames(BlockAllocator allocator, ListSchemasRequest request)
Used to get the list of schemas (aka databases) that this source contains.abstract ListTablesResponse
doListTables(BlockAllocator allocator, ListTablesRequest request)
Used to get the list of tables that this source contains.PingResponse
doPing(PingRequest request)
Used to warm up your function as well as to discovery its capabilities (e.g.void
enhancePartitionSchema(SchemaBuilder partitionSchemaBuilder, GetTableLayoutRequest request)
This method can be used to add additional fields to the schema of our partition response.abstract void
getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker)
Used to get the partitions that must be read from the request table in order to satisfy the requested predicate.protected String
getSecret(String secretName)
void
handleRequest(InputStream inputStream, OutputStream outputStream, com.amazonaws.services.lambda.runtime.Context context)
protected EncryptionKey
makeEncryptionKey()
protected SpillLocation
makeSpillLocation(MetadataRequest request)
Used to make a spill location for a split.void
onPing(PingRequest request)
Provides you a signal that can be used to warm up your function.protected String
resolveSecrets(String rawString)
Resolves any secrets found in the supplied string, for example: MyString${WithSecret} would have ${WithSecret} by the corresponding value of the secret in AWS Secrets Manager with that name.
-
-
-
Field Detail
-
SPILL_BUCKET_ENV
protected static final String SPILL_BUCKET_ENV
- See Also:
- Constant Field Values
-
SPILL_PREFIX_ENV
protected static final String SPILL_PREFIX_ENV
- See Also:
- Constant Field Values
-
KMS_KEY_ID_ENV
protected static final String KMS_KEY_ID_ENV
- See Also:
- Constant Field Values
-
DISABLE_SPILL_ENCRYPTION
protected static final String DISABLE_SPILL_ENCRYPTION
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
MetadataHandler
public MetadataHandler(String sourceType, Map<String,String> configOptions)
When MetadataHandler is used as a Lambda, the "Main" class will pass in System.getenv() as the configOptions. Otherwise in situations where the connector is used outside of a Lambda, it may be a config map that is passed in.- Parameters:
sourceType
- Used to aid in logging diagnostic info when raising a support case.
-
MetadataHandler
public MetadataHandler(EncryptionKeyFactory encryptionKeyFactory, software.amazon.awssdk.services.secretsmanager.SecretsManagerClient secretsManager, software.amazon.awssdk.services.athena.AthenaClient athena, String sourceType, String spillBucket, String spillPrefix, Map<String,String> configOptions)
- Parameters:
sourceType
- Used to aid in logging diagnostic info when raising a support case.
-
-
Method Detail
-
resolveSecrets
protected String resolveSecrets(String rawString)
Resolves any secrets found in the supplied string, for example: MyString${WithSecret} would have ${WithSecret} by the corresponding value of the secret in AWS Secrets Manager with that name. If no such secret is found the function throws.- Parameters:
rawString
- The string in which you'd like to replace SecretsManager placeholders. (e.g. ThisIsA${Secret}Here - The ${Secret} would be replaced with the contents of an SecretsManager secret called Secret. If no such secret is found, the function throws. If no ${} are found in the input string, nothing is replaced and the original string is returned.
-
makeEncryptionKey
protected EncryptionKey makeEncryptionKey()
-
makeSpillLocation
protected SpillLocation makeSpillLocation(MetadataRequest request)
Used to make a spill location for a split. Each split should have a unique spill location, so be sure to call this method once per split!- Parameters:
request
-- Returns:
- A unique spill location.
-
handleRequest
public final void handleRequest(InputStream inputStream, OutputStream outputStream, com.amazonaws.services.lambda.runtime.Context context) throws IOException
- Specified by:
handleRequest
in interfacecom.amazonaws.services.lambda.runtime.RequestStreamHandler
- Throws:
IOException
-
doHandleRequest
protected final void doHandleRequest(BlockAllocator allocator, com.fasterxml.jackson.databind.ObjectMapper objectMapper, MetadataRequest req, OutputStream outputStream) throws Exception
- Throws:
Exception
-
doListSchemaNames
public abstract ListSchemasResponse doListSchemaNames(BlockAllocator allocator, ListSchemasRequest request) throws Exception
Used to get the list of schemas (aka databases) that this source contains.- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details on who made the request and which Athena catalog they are querying.- Returns:
- A ListSchemasResponse which primarily contains a Set
of schema names and a catalog name corresponding the Athena catalog that was queried. - Throws:
Exception
-
doListTables
public abstract ListTablesResponse doListTables(BlockAllocator allocator, ListTablesRequest request) throws Exception
Used to get the list of tables that this source contains.- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details on who made the request and which Athena catalog and database they are querying.- Returns:
- A ListTablesResponse which primarily contains a List
enumerating the tables in this catalog, database tuple. It also contains the catalog name corresponding the Athena catalog that was queried. - Throws:
Exception
-
doGetQueryPassthroughSchema
public GetTableResponse doGetQueryPassthroughSchema(BlockAllocator allocator, GetTableRequest request) throws Exception
Used to get definition (field names, types, descriptions, etc...) of a Query PassThrough.- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details on who made the request and which Athena catalog, database, and table they are querying.- Returns:
- A GetTableResponse which primarily contains:
1. An Apache Arrow Schema object describing the table's columns, types, and descriptions.
2. A Set
of partition column names (or empty if the table isn't partitioned). - Throws:
Exception
-
doGetTable
public abstract GetTableResponse doGetTable(BlockAllocator allocator, GetTableRequest request) throws Exception
Used to get definition (field names, types, descriptions, etc...) of a Table.- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details on who made the request and which Athena catalog, database, and table they are querying.- Returns:
- A GetTableResponse which primarily contains:
1. An Apache Arrow Schema object describing the table's columns, types, and descriptions.
2. A Set
of partition column names (or empty if the table isn't partitioned). - Throws:
Exception
-
doGetTableLayout
public GetTableLayoutResponse doGetTableLayout(BlockAllocator allocator, GetTableLayoutRequest request) throws Exception
Used to get the partitions that must be read from the request table in order to satisfy the requested predicate.- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details of the catalog, database, and table being queried as well as any filter predicate.- Returns:
- A GetTableLayoutResponse which primarily contains:
1. An Apache Arrow Block with 0 or more partitions to read. 0 partitions implies there are 0 rows to read.
2. Set
of partition column names which should correspond to columns in your Apache Arrow Block. - Throws:
Exception
-
enhancePartitionSchema
public void enhancePartitionSchema(SchemaBuilder partitionSchemaBuilder, GetTableLayoutRequest request)
This method can be used to add additional fields to the schema of our partition response. Athena expects each partitions in the response to have a column corresponding to your partition columns. You can choose to add additional columns to that response which Athena will ignore but will pass on to you when it call GetSplits(...) for each partition.- Parameters:
partitionSchemaBuilder
- The SchemaBuilder you can use to add additional columns and metadata to the partitions response.request
- The GetTableLayoutResquest that triggered this call.
-
getPartitions
public abstract void getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker) throws Exception
Used to get the partitions that must be read from the request table in order to satisfy the requested predicate.- Parameters:
blockWriter
- Used to write rows (partitions) into the Apache Arrow response.request
- Provides details of the catalog, database, and table being queried as well as any filter predicate.queryStatusChecker
- A QueryStatusChecker that you can use to stop doing work for a query that has already terminated- Throws:
Exception
-
doGetSplits
public abstract GetSplitsResponse doGetSplits(BlockAllocator allocator, GetSplitsRequest request) throws Exception
Used to split-up the reads required to scan the requested batch of partition(s).- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details of the catalog, database, table, andpartition(s) being queried as well as any filter predicate.- Returns:
- A GetSplitsResponse which primarily contains:
1. A Set
which represent read operations Amazon Athena must perform by calling your read function. 2. (Optional) A continuation token which allows you to paginate the generation of splits for large queries. - Throws:
Exception
-
doGetDataSourceCapabilities
public GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities(BlockAllocator allocator, GetDataSourceCapabilitiesRequest request)
Used to describe the types of capabilities supported by a data source. An engine can use this to determine what portions of the query to push down. A connector that returns any optimization will guarantee that the associated predicate will be pushed down.- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details about the catalog being used.- Returns:
- A GetDataSourceCapabilitiesResponse object which returns a map of supported optimizations that the connector is advertising to the consumer. The connector assumes all responsibility for whatever is passed here.
-
doPing
public PingResponse doPing(PingRequest request)
Used to warm up your function as well as to discovery its capabilities (e.g. SDK capabilities)- Parameters:
request
- The PingRequest.- Returns:
- A PingResponse.
-
onPing
public void onPing(PingRequest request)
Provides you a signal that can be used to warm up your function.- Parameters:
request
- The PingRequest.
-
-