Class CloudwatchMetadataHandler
- java.lang.Object
- com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
- com.amazonaws.athena.connectors.cloudwatch.CloudwatchMetadataHandler
- All Implemented Interfaces:
com.amazonaws.services.lambda.runtime.RequestStreamHandler
public class CloudwatchMetadataHandler extends MetadataHandler
Handles metadata requests for the Athena Cloudwatch Connector. For more detail, see the module's README.md. Notable characteristics of this class include:
1. Each LogGroup is treated as a schema (aka database).
2. Each LogStream is treated as a table.
3. A special 'all_log_streams' view is added, which lets you query all LogStreams in a LogGroup.
4. LogStreams are treated as partitions and scanned in parallel.
5. Timestamp predicates are pushed down into Cloudwatch itself.
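The LogGroup-to-schema and LogStream-to-table mapping described above can be illustrated with a minimal, self-contained sketch. It does not use the connector or the AWS SDK: the class name, the method names, and the in-memory map standing in for Cloudwatch are all hypothetical, and only the 'all_log_streams' view name comes from the class description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the connector.
class LogGroupCatalogSketch {
    // View name taken from the class description above.
    static final String ALL_LOG_STREAMS_VIEW = "all_log_streams";

    // Each LogGroup name is exposed as a schema (database) name.
    static List<String> listSchemas(Map<String, List<String>> logGroups) {
        return new ArrayList<>(logGroups.keySet());
    }

    // Each LogStream in the LogGroup is exposed as a table, plus the special view.
    static List<String> listTables(Map<String, List<String>> logGroups, String schema) {
        List<String> streams = logGroups.get(schema);
        if (streams == null) {
            throw new IllegalArgumentException("Unknown LogGroup: " + schema);
        }
        List<String> tables = new ArrayList<>(streams);
        tables.add(ALL_LOG_STREAMS_VIEW);
        return tables;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the LogGroups and LogStreams of an account.
        Map<String, List<String>> logGroups = new TreeMap<>();
        logGroups.put("/aws/lambda/orders", Arrays.asList("stream-a", "stream-b"));

        System.out.println(listSchemas(logGroups));
        System.out.println(listTables(logGroups, "/aws/lambda/orders"));
    }
}
```

In this sketch a query against a single table name would target one LogStream, while a query against the view would target every LogStream in the LogGroup.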
-
-
Field Summary
Fields (Modifier and Type, Field):
protected static String ALL_LOG_STREAMS_TABLE
protected static org.apache.arrow.vector.types.pojo.Schema CLOUDWATCH_SCHEMA
protected static String LOG_GROUP_FIELD
protected static String LOG_MSG_FIELD
protected static String LOG_STREAM_FIELD
protected static String LOG_STREAM_SIZE_FIELD
protected static String LOG_TIME_FIELD
protected static int MAX_SPLITS_PER_REQUEST
-
Fields inherited from class com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
configOptions, DISABLE_SPILL_ENCRYPTION, KMS_KEY_ID_ENV, SPILL_BUCKET_ENV, SPILL_PREFIX_ENV
-
-
Constructor Summary
Constructors (Modifier, Constructor):
CloudwatchMetadataHandler(Map<String,String> configOptions)
protected CloudwatchMetadataHandler(software.amazon.awssdk.services.cloudwatchlogs.CloudWatchLogsClient awsLogs, EncryptionKeyFactory keyFactory, software.amazon.awssdk.services.secretsmanager.SecretsManagerClient secretsManager, software.amazon.awssdk.services.athena.AthenaClient athena, String spillBucket, String spillPrefix, Map<String,String> configOptions)
-
Method Summary
Methods (Modifier and Type, Method, Description):
GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities(BlockAllocator allocator, GetDataSourceCapabilitiesRequest request)
Used to describe the types of capabilities supported by a data source.
GetTableResponse doGetQueryPassthroughSchema(BlockAllocator allocator, GetTableRequest request)
Used to get the definition (field names, types, descriptions, etc.) of a Query PassThrough.
GetSplitsResponse doGetSplits(BlockAllocator allocator, GetSplitsRequest request)
Each partition is converted into a single Split, which means we will potentially read all LogStreams required for the query in parallel.
GetTableResponse doGetTable(BlockAllocator blockAllocator, GetTableRequest getTableRequest)
Returns the pre-set schema for the requested Cloudwatch table (LogStream) and schema (LogGroup) after validating that it exists.
ListSchemasResponse doListSchemaNames(BlockAllocator blockAllocator, ListSchemasRequest listSchemasRequest)
List LogGroups in your Cloudwatch account, treating each as a 'schema' (aka database).
ListTablesResponse doListTables(BlockAllocator blockAllocator, ListTablesRequest listTablesRequest)
List LogStreams within the requested schema (aka LogGroup) in your Cloudwatch account, treating each as a 'table'.
void enhancePartitionSchema(SchemaBuilder partitionSchemaBuilder, GetTableLayoutRequest request)
We add one additional field to the partition schema.
void getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker)
Gets the list of LogStreams that need to be scanned to satisfy the requested table.
-
Methods inherited from class com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
doGetTableLayout, doHandleRequest, doPing, getSecret, handleRequest, makeEncryptionKey, makeSpillLocation, onPing, resolveSecrets
-
Field Detail
-
MAX_SPLITS_PER_REQUEST
protected static final int MAX_SPLITS_PER_REQUEST
- See Also:
- Constant Field Values
-
ALL_LOG_STREAMS_TABLE
protected static final String ALL_LOG_STREAMS_TABLE
- See Also:
- Constant Field Values
-
LOG_STREAM_FIELD
protected static final String LOG_STREAM_FIELD
- See Also:
- Constant Field Values
-
LOG_GROUP_FIELD
protected static final String LOG_GROUP_FIELD
- See Also:
- Constant Field Values
-
LOG_TIME_FIELD
protected static final String LOG_TIME_FIELD
- See Also:
- Constant Field Values
-
LOG_MSG_FIELD
protected static final String LOG_MSG_FIELD
- See Also:
- Constant Field Values
-
LOG_STREAM_SIZE_FIELD
protected static final String LOG_STREAM_SIZE_FIELD
- See Also:
- Constant Field Values
-
CLOUDWATCH_SCHEMA
protected static final org.apache.arrow.vector.types.pojo.Schema CLOUDWATCH_SCHEMA
-
-
Constructor Detail
-
CloudwatchMetadataHandler
protected CloudwatchMetadataHandler(software.amazon.awssdk.services.cloudwatchlogs.CloudWatchLogsClient awsLogs, EncryptionKeyFactory keyFactory, software.amazon.awssdk.services.secretsmanager.SecretsManagerClient secretsManager, software.amazon.awssdk.services.athena.AthenaClient athena, String spillBucket, String spillPrefix, Map<String,String> configOptions)
-
-
Method Detail
-
doListSchemaNames
public ListSchemasResponse doListSchemaNames(BlockAllocator blockAllocator, ListSchemasRequest listSchemasRequest) throws TimeoutException
List LogGroups in your Cloudwatch account, treating each as a 'schema' (aka database).
- Specified by:
doListSchemaNames in class MetadataHandler
- Parameters:
blockAllocator - Tool for creating and managing Apache Arrow Blocks.
listSchemasRequest - Provides details on who made the request and which Athena catalog they are querying.
- Returns:
A ListSchemasResponse which primarily contains a Set of schema names and a catalog name corresponding to the Athena catalog that was queried.
- Throws:
TimeoutException
- See Also:
MetadataHandler
-
doListTables
public ListTablesResponse doListTables(BlockAllocator blockAllocator, ListTablesRequest listTablesRequest) throws TimeoutException
List LogStreams within the requested schema (aka LogGroup) in your Cloudwatch account, treating each as a 'table'.
- Specified by:
doListTables in class MetadataHandler
- Parameters:
blockAllocator - Tool for creating and managing Apache Arrow Blocks.
listTablesRequest - Provides details on who made the request and which Athena catalog and database they are querying.
- Returns:
A ListTablesResponse which primarily contains a List enumerating the tables in this catalog, database tuple. It also contains the catalog name corresponding to the Athena catalog that was queried.
- Throws:
TimeoutException
- See Also:
MetadataHandler
-
doGetTable
public GetTableResponse doGetTable(BlockAllocator blockAllocator, GetTableRequest getTableRequest)
Returns the pre-set schema for the requested Cloudwatch table (LogStream) and schema (LogGroup) after validating that it exists.
- Specified by:
doGetTable in class MetadataHandler
- Parameters:
blockAllocator - Tool for creating and managing Apache Arrow Blocks.
getTableRequest - Provides details on who made the request and which Athena catalog, database, and table they are querying.
- Returns:
A GetTableResponse which primarily contains:
1. An Apache Arrow Schema object describing the table's columns, types, and descriptions.
2. A Set of partition column names (or empty if the table isn't partitioned).
- See Also:
MetadataHandler
-
enhancePartitionSchema
public void enhancePartitionSchema(SchemaBuilder partitionSchemaBuilder, GetTableLayoutRequest request)
We add one additional field to the partition schema. This field is used for our own purposes and is ignored by Athena, but it is passed to calls to GetSplits(...), where we set it on our Split without needing to call Cloudwatch a second time.
- Overrides:
enhancePartitionSchema in class MetadataHandler
- Parameters:
partitionSchemaBuilder - The SchemaBuilder you can use to add additional columns and metadata to the partitions response.
request - The GetTableLayoutRequest that triggered this call.
- See Also:
MetadataHandler
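The idea of carrying an extra partition field through to the Split can be sketched without the SDK. In the sketch below, plain maps stand in for a partition row and for Split properties; the class name, the method names, and both field keys are hypothetical, and which field the handler actually adds is not stated on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; maps stand in for Arrow partition rows and Split properties.
class PartitionFieldCarrySketch {
    static final String STREAM_FIELD = "log_stream";      // hypothetical key
    static final String EXTRA_FIELD = "log_stream_bytes"; // hypothetical key

    // Partition planning: record the extra value while the stream metadata is already at hand.
    static Map<String, String> partitionRow(String streamName, long storedBytes) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put(STREAM_FIELD, streamName);
        row.put(EXTRA_FIELD, Long.toString(storedBytes));
        return row;
    }

    // Split planning: copy the fields from the partition row instead of looking them up again.
    static Map<String, String> toSplitProperties(Map<String, String> partitionRow) {
        return new LinkedHashMap<>(partitionRow);
    }

    public static void main(String[] args) {
        Map<String, String> row = partitionRow("stream-a", 2048L);
        System.out.println(toSplitProperties(row));
    }
}
```

The design point is that the second planning step reads only what the first step wrote, so no second remote lookup is needed.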
-
getPartitions
public void getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker) throws Exception
Gets the list of LogStreams that need to be scanned to satisfy the requested table. In most cases this will be just 1 LogStream, which results in just 1 partition. If, however, the request is for the special ALL_LOG_STREAMS view, then all LogStreams in the requested LogGroup (schema) are queried and turned into partitions 1:1.
- Specified by:
getPartitions in class MetadataHandler
- Parameters:
blockWriter - Used to write rows (partitions) into the Apache Arrow response.
request - Provides details of the catalog, database, and table being queried as well as any filter predicate.
queryStatusChecker - A QueryStatusChecker that you can use to stop doing work for a query that has already terminated.
- Throws:
Exception
- See Also:
MetadataHandler
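The partition rule described for getPartitions can be sketched as a small pure function. This is a simplified stand-in, not the handler's code: the class and method names are hypothetical, the list of streams is passed in rather than fetched from Cloudwatch, and the exact, case-sensitive match on the view name is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of the connector.
class PartitionResolutionSketch {
    // View name taken from the class description; exact matching is an assumption.
    static final String ALL_LOG_STREAMS_VIEW = "all_log_streams";

    // Returns one partition per LogStream that must be scanned for the requested table.
    static List<String> resolvePartitions(String table, List<String> streamsInGroup) {
        if (ALL_LOG_STREAMS_VIEW.equals(table)) {
            // The special view maps every LogStream in the LogGroup to a partition, 1:1.
            return new ArrayList<>(streamsInGroup);
        }
        // An ordinary table is a single LogStream, hence a single partition.
        return Collections.singletonList(table);
    }

    public static void main(String[] args) {
        List<String> streams = Arrays.asList("stream-a", "stream-b", "stream-c");
        System.out.println(resolvePartitions("stream-b", streams));
        System.out.println(resolvePartitions(ALL_LOG_STREAMS_VIEW, streams));
    }
}
```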
-
doGetSplits
public GetSplitsResponse doGetSplits(BlockAllocator allocator, GetSplitsRequest request)
Each partition is converted into a single Split, which means we will potentially read all LogStreams required for the query in parallel.
- Specified by:
doGetSplits in class MetadataHandler
- Parameters:
allocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details of the catalog, database, table, and partition(s) being queried as well as any filter predicate.
- Returns:
A GetSplitsResponse which primarily contains:
1. A Set of Splits, which represent the read operations Amazon Athena must perform by calling your read function.
2. (Optional) A continuation token which allows you to paginate the generation of splits for large queries.
- See Also:
MetadataHandler
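One way to picture the one-Split-per-partition rule together with the optional continuation token is the following sketch. Everything in it is an assumption made for illustration: the class, the Page type, the per-page cap of 2 (the real MAX_SPLITS_PER_REQUEST value is not given on this page), and the encoding of the token as the index of the next partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the connector.
class SplitPaginationSketch {
    // Illustrative cap; the real constant's value is not stated in this page.
    static final int MAX_SPLITS_PER_PAGE = 2;

    // Stand-in for a response: a batch of splits plus an optional continuation token.
    static final class Page {
        final List<String> splits;
        final String continuationToken; // null when there is nothing left to generate

        Page(List<String> splits, String continuationToken) {
            this.splits = splits;
            this.continuationToken = continuationToken;
        }
    }

    // Produces one split per partition, at most MAX_SPLITS_PER_PAGE per call.
    static Page nextPage(List<String> partitions, String continuationToken) {
        // Assumed token encoding: the index of the first partition not yet processed.
        int start = (continuationToken == null) ? 0 : Integer.parseInt(continuationToken);
        int end = Math.min(start + MAX_SPLITS_PER_PAGE, partitions.size());
        List<String> splits = new ArrayList<>();
        for (String logStream : partitions.subList(start, end)) {
            splits.add("split:" + logStream);
        }
        String next = (end < partitions.size()) ? Integer.toString(end) : null;
        return new Page(splits, next);
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("stream-a", "stream-b", "stream-c");
        String token = null;
        do {
            Page page = nextPage(partitions, token);
            System.out.println(page.splits + " next=" + page.continuationToken);
            token = page.continuationToken;
        } while (token != null);
    }
}
```

The caller keeps passing the returned token back until it is null, so each partition yields exactly one split across all pages.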
-
doGetDataSourceCapabilities
public GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities(BlockAllocator allocator, GetDataSourceCapabilitiesRequest request)
Description copied from class: MetadataHandler
Used to describe the types of capabilities supported by a data source. An engine can use this to determine what portions of the query to push down. A connector that returns any optimization will guarantee that the associated predicate will be pushed down.
- Overrides:
doGetDataSourceCapabilities in class MetadataHandler
- Parameters:
allocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details about the catalog being used.
- Returns:
A GetDataSourceCapabilitiesResponse object which returns a map of supported optimizations that the connector is advertising to the consumer. The connector assumes all responsibility for whatever is passed here.
-
doGetQueryPassthroughSchema
public GetTableResponse doGetQueryPassthroughSchema(BlockAllocator allocator, GetTableRequest request) throws Exception
Description copied from class: MetadataHandler
Used to get the definition (field names, types, descriptions, etc.) of a Query PassThrough.
- Overrides:
doGetQueryPassthroughSchema in class MetadataHandler
- Parameters:
allocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
- Returns:
A GetTableResponse which primarily contains:
1. An Apache Arrow Schema object describing the table's columns, types, and descriptions.
2. A Set of partition column names (or empty if the table isn't partitioned).
- Throws:
Exception