Class CloudwatchMetadataHandler

  • All Implemented Interfaces:
    com.amazonaws.services.lambda.runtime.RequestStreamHandler

    public class CloudwatchMetadataHandler
    extends MetadataHandler
    Handles metadata requests for the Athena Cloudwatch Connector.

    For more detail, please see the module's README.md, some notable characteristics of this class include:

    1. Each LogGroup is treated as a schema (aka database). 2. Each LogStream is treated as a table. 3. A special 'all_log_streams' view is added which allows you to query all LogStreams in a LogGroup. 4. LogStreams area treated as partitions and scanned in parallel. 5. Timestamp predicates are pushed into Cloudwatch itself.

    • Constructor Detail

      • CloudwatchMetadataHandler

        public CloudwatchMetadataHandler​(Map<String,​String> configOptions)
      • CloudwatchMetadataHandler

        protected CloudwatchMetadataHandler​(com.amazonaws.services.logs.AWSLogs awsLogs,
                                            EncryptionKeyFactory keyFactory,
                                            com.amazonaws.services.secretsmanager.AWSSecretsManager secretsManager,
                                            com.amazonaws.services.athena.AmazonAthena athena,
                                            String spillBucket,
                                            String spillPrefix,
                                            Map<String,​String> configOptions)
    • Method Detail

      • doListSchemaNames

        public ListSchemasResponse doListSchemaNames​(BlockAllocator blockAllocator,
                                                     ListSchemasRequest listSchemasRequest)
                                              throws TimeoutException
        List LogGroups in your Cloudwatch account treating each as a 'schema' (aka database)
        Specified by:
        doListSchemaNames in class MetadataHandler
        Parameters:
        blockAllocator - Tool for creating and managing Apache Arrow Blocks.
        listSchemasRequest - Provides details on who made the request and which Athena catalog they are querying.
        Returns:
        A ListSchemasResponse which primarily contains a Set of schema names and a catalog name corresponding the Athena catalog that was queried.
        Throws:
        TimeoutException
        See Also:
        MetadataHandler
      • doListTables

        public ListTablesResponse doListTables​(BlockAllocator blockAllocator,
                                               ListTablesRequest listTablesRequest)
                                        throws TimeoutException
        List LogStreams within the requested schema (aka LogGroup) in your Cloudwatch account treating each as a 'table'.
        Specified by:
        doListTables in class MetadataHandler
        Parameters:
        blockAllocator - Tool for creating and managing Apache Arrow Blocks.
        listTablesRequest - Provides details on who made the request and which Athena catalog and database they are querying.
        Returns:
        A ListTablesResponse which primarily contains a List enumerating the tables in this catalog, database tuple. It also contains the catalog name corresponding the Athena catalog that was queried.
        Throws:
        TimeoutException
        See Also:
        MetadataHandler
      • doGetTable

        public GetTableResponse doGetTable​(BlockAllocator blockAllocator,
                                           GetTableRequest getTableRequest)
        Returns the pre-set schema for the request Cloudwatch table (LogStream) and schema (LogGroup) after validating that it exists.
        Specified by:
        doGetTable in class MetadataHandler
        Parameters:
        blockAllocator - Tool for creating and managing Apache Arrow Blocks.
        getTableRequest - Provides details on who made the request and which Athena catalog, database, and table they are querying.
        Returns:
        A GetTableResponse which primarily contains: 1. An Apache Arrow Schema object describing the table's columns, types, and descriptions. 2. A Set of partition column names (or empty if the table isn't partitioned).
        See Also:
        MetadataHandler
      • enhancePartitionSchema

        public void enhancePartitionSchema​(SchemaBuilder partitionSchemaBuilder,
                                           GetTableLayoutRequest request)
        We add one additional field to the partition schema. This field is used for our own purposes and ignored by Athena but it will get passed to calls to GetSplits(...) which is where we will set it on our Split without the need to call Cloudwatch a second time.
        Overrides:
        enhancePartitionSchema in class MetadataHandler
        Parameters:
        partitionSchemaBuilder - The SchemaBuilder you can use to add additional columns and metadata to the partitions response.
        request - The GetTableLayoutResquest that triggered this call.
        See Also:
        MetadataHandler
      • getPartitions

        public void getPartitions​(BlockWriter blockWriter,
                                  GetTableLayoutRequest request,
                                  QueryStatusChecker queryStatusChecker)
                           throws Exception
        Gets the list of LogStreams that need to be scanned to satisfy the requested table. In most cases this will be just 1 LogStream and this results in just 1 partition. If, however, the request is for the special ALL_LOG_STREAMS view then all LogStreams in the requested LogGroup (schema) are queried and turned into partitions 1:1.
        Specified by:
        getPartitions in class MetadataHandler
        Parameters:
        blockWriter - Used to write rows (partitions) into the Apache Arrow response.
        request - Provides details of the catalog, database, and table being queried as well as any filter predicate.
        queryStatusChecker - A QueryStatusChecker that you can use to stop doing work for a query that has already terminated
        Throws:
        Exception
        See Also:
        MetadataHandler
      • doGetSplits

        public GetSplitsResponse doGetSplits​(BlockAllocator allocator,
                                             GetSplitsRequest request)
        Each partition is converted into a single Split which means we will potentially read all LogStreams required for the query in parallel.
        Specified by:
        doGetSplits in class MetadataHandler
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details of the catalog, database, table, andpartition(s) being queried as well as any filter predicate.
        Returns:
        A GetSplitsResponse which primarily contains: 1. A Set which represent read operations Amazon Athena must perform by calling your read function. 2. (Optional) A continuation token which allows you to paginate the generation of splits for large queries.
        See Also:
        MetadataHandler
      • doGetDataSourceCapabilities

        public GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities​(BlockAllocator allocator,
                                                                             GetDataSourceCapabilitiesRequest request)
        Description copied from class: MetadataHandler
        Used to describe the types of capabilities supported by a data source. An engine can use this to determine what portions of the query to push down. A connector that returns any optimization will guarantee that the associated predicate will be pushed down.
        Overrides:
        doGetDataSourceCapabilities in class MetadataHandler
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details about the catalog being used.
        Returns:
        A GetDataSourceCapabilitiesResponse object which returns a map of supported optimizations that the connector is advertising to the consumer. The connector assumes all responsibility for whatever is passed here.
      • doGetQueryPassthroughSchema

        public GetTableResponse doGetQueryPassthroughSchema​(BlockAllocator allocator,
                                                            GetTableRequest request)
                                                     throws Exception
        Description copied from class: MetadataHandler
        Used to get definition (field names, types, descriptions, etc...) of a Query PassThrough.
        Overrides:
        doGetQueryPassthroughSchema in class MetadataHandler
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
        Returns:
        A GetTableResponse which primarily contains: 1. An Apache Arrow Schema object describing the table's columns, types, and descriptions. 2. A Set of partition column names (or empty if the table isn't partitioned).
        Throws:
        Exception