Class RedisMetadataHandler

  • All Implemented Interfaces:
    com.amazonaws.services.lambda.runtime.RequestStreamHandler

    public class RedisMetadataHandler
    extends GlueMetadataHandler
    Handles metadata requests for the Athena Redis Connector using Glue for schema.

    For more detail, please see the module's README.md. Some notable characteristics of this class include:

    1. Uses Glue table properties (redis-endpoint, redis-value-type, redis-key-prefix, redis-keys-zset, redis-ssl-flag, redis-cluster-flag, and redis-db-number) to provide schema as well as connectivity details to Redis.
    2. Attempts to resolve sensitive fields such as redis-endpoint via SecretsManager, so that you can substitute variables with values from SecretsManager by doing something like hostname:port:password=${my_secret}.
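    The ${...} substitution described in point 2 can be pictured with the minimal sketch below. It is not the connector's code: the class name SecretPlaceholderSketch is hypothetical, and a plain Map stands in for the SecretsManager lookup, so no AWS calls are made. It only shows how each ${name} placeholder in a property value is swapped for the matching secret.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecretPlaceholderSketch {
    // Matches ${name}; the syntax follows the hostname:port:password=${my_secret} example.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)}");

    /**
     * Replaces every ${name} in raw with secrets.get(name).
     * The map is a hypothetical stand-in for a SecretsManager lookup.
     */
    public static String resolve(String raw, Map<String, String> secrets) {
        Matcher m = PLACEHOLDER.matcher(raw);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = secrets.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Unknown secret: " + name);
            }
            // quoteReplacement keeps '$' or '\' inside the secret from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String endpoint = "myhost:6379:password=${my_secret}";
        System.out.println(resolve(endpoint, Map.of("my_secret", "s3cr3t")));
        // prints myhost:6379:password=s3cr3t
    }
}
```

    Failing on an unknown placeholder, rather than leaving it in place, is a design choice of this sketch; it surfaces a misnamed secret before any connection attempt.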

    • Constructor Detail

      • RedisMetadataHandler

        public RedisMetadataHandler(Map<String,String> configOptions)
      • RedisMetadataHandler

        protected RedisMetadataHandler(software.amazon.awssdk.services.glue.GlueClient awsGlue,
                                       EncryptionKeyFactory keyFactory,
                                       software.amazon.awssdk.services.secretsmanager.SecretsManagerClient secretsManager,
                                       software.amazon.awssdk.services.athena.AthenaClient athena,
                                       RedisConnectionFactory redisConnectionFactory,
                                       String spillBucket,
                                       String spillPrefix,
                                       Map<String,String> configOptions)
    • Method Detail

      • doGetDataSourceCapabilities

        public GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities(BlockAllocator allocator,
                                                                             GetDataSourceCapabilitiesRequest request)
        Description copied from class: MetadataHandler
        Used to describe the types of capabilities supported by a data source. An engine can use this to determine what portions of the query to push down. A connector that returns any optimization will guarantee that the associated predicate will be pushed down.
        Overrides:
        doGetDataSourceCapabilities in class MetadataHandler
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details about the catalog being used.
        Returns:
        A GetDataSourceCapabilitiesResponse object which returns a map of supported optimizations that the connector is advertising to the consumer. The connector assumes all responsibility for whatever is passed here.
      • doGetTable

        public GetTableResponse doGetTable(BlockAllocator blockAllocator,
                                           GetTableRequest request)
                                    throws Exception
        Retrieves the schema for the requested table from Glue, then enriches that result with Redis-specific metadata and columns.
        Overrides:
        doGetTable in class GlueMetadataHandler
        Parameters:
        blockAllocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
        Returns:
        A GetTableResponse mostly containing the columns, their types, and any table properties for the requested table.
        Throws:
        Exception
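    The enrichment step of doGetTable can be sketched in plain Java as below. This is an illustration under stated assumptions, not the handler's implementation: the records Column and TableDef are hypothetical stand-ins for the Arrow schema and Glue table, and the synthetic key column name "_key_" is an assumption chosen for the example. The sketch appends that column if Glue did not define it and passes the table properties (redis-endpoint, redis-value-type, and so on) through unchanged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TableEnrichmentSketch {
    // Hypothetical name for a synthetic column exposing each row's Redis key.
    public static final String KEY_COLUMN = "_key_";

    public record Column(String name, String type) {}

    public record TableDef(List<Column> columns, Map<String, String> properties) {}

    /**
     * Takes the schema Glue returned and adds Redis-specific pieces:
     * a key column (only if absent) plus the table properties, copied unchanged.
     */
    public static TableDef enrich(TableDef glueTable) {
        List<Column> columns = new ArrayList<>(glueTable.columns());
        boolean hasKey = columns.stream().anyMatch(c -> c.name().equals(KEY_COLUMN));
        if (!hasKey) {
            columns.add(new Column(KEY_COLUMN, "string"));
        }
        // Keep the properties so later steps (partitions, splits) can read the Redis settings.
        Map<String, String> properties = new LinkedHashMap<>(glueTable.properties());
        return new TableDef(List.copyOf(columns), properties);
    }

    public static void main(String[] args) {
        TableDef glue = new TableDef(
            List.of(new Column("name", "string"), new Column("age", "int")),
            Map.of("redis-value-type", "hash", "redis-key-prefix", "user-*"));
        System.out.println(enrich(glue).columns());
    }
}
```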
      • doGetQueryPassthroughSchema

        public GetTableResponse doGetQueryPassthroughSchema(BlockAllocator allocator,
                                                            GetTableRequest request)
                                                     throws Exception
        Description copied from class: MetadataHandler
        Used to get the definition (field names, types, descriptions, etc.) of a Query PassThrough.
        Overrides:
        doGetQueryPassthroughSchema in class MetadataHandler
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
        Returns:
        A GetTableResponse which primarily contains: 1. An Apache Arrow Schema object describing the table's columns, types, and descriptions. 2. A Set of partition column names (or empty if the table isn't partitioned).
        Throws:
        Exception
      • enhancePartitionSchema

        public void enhancePartitionSchema(SchemaBuilder partitionSchemaBuilder,
                                           GetTableLayoutRequest request)
        Description copied from class: MetadataHandler
        This method can be used to add additional fields to the schema of our partition response. Athena expects each partition in the response to have a column corresponding to each of your partition columns. You can choose to add additional columns to that response, which Athena will ignore but will pass on to you when it calls GetSplits(...) for each partition.
        Overrides:
        enhancePartitionSchema in class MetadataHandler
        Parameters:
        partitionSchemaBuilder - The SchemaBuilder you can use to add additional columns and metadata to the partitions response.
        request - The GetTableLayoutRequest that triggered this call.
      • getPartitions

        public void getPartitions(BlockWriter blockWriter,
                                  GetTableLayoutRequest request,
                                  QueryStatusChecker queryStatusChecker)
                           throws Exception
        Even though our table doesn't support complex layouts or partitioning, we need to convey that there is at least 1 partition to read as part of the query or Athena will assume partition pruning found no candidate layouts to read. We also use this 1 partition to carry settings that we will need in order to generate splits.
        Specified by:
        getPartitions in class MetadataHandler
        Parameters:
        blockWriter - Used to write rows (partitions) into the Apache Arrow response.
        request - Provides details of the catalog, database, and table being queried as well as any filter predicate.
        queryStatusChecker - A QueryStatusChecker that you can use to stop doing work for a query that has already terminated.
        Throws:
        Exception
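    The single-partition idea in getPartitions can be shown with the plain-Java sketch below. It does not use the SDK's BlockWriter; the class name is hypothetical, and the choice of which table properties ride along on the partition row is an assumption (the names themselves come from the class description). The extra fields on the row correspond to the kind of columns enhancePartitionSchema adds, which Athena ignores but hands back when generating splits.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SinglePartitionSketch {
    // Property names from the class description; which ones are carried is an assumption.
    public static final List<String> CARRIED_SETTINGS = List.of(
        "redis-endpoint", "redis-value-type", "redis-key-prefix", "redis-keys-zset",
        "redis-ssl-flag", "redis-cluster-flag", "redis-db-number");

    /**
     * Returns exactly one partition row. The table is not really partitioned, but a row
     * must exist or Athena concludes there is nothing to read; the row's extra fields
     * carry the settings that split generation needs.
     */
    public static List<Map<String, String>> getPartitions(Map<String, String> tableProperties) {
        Map<String, String> row = new LinkedHashMap<>();
        for (String setting : CARRIED_SETTINGS) {
            String value = tableProperties.get(setting);
            if (value != null) {
                row.put(setting, value);
            }
        }
        return List.of(row); // always one row, even when no settings are present
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of(
            "redis-endpoint", "myhost:6379", "redis-key-prefix", "user-*", "classification", "redis");
        System.out.println(getPartitions(props));
        // prints [{redis-endpoint=myhost:6379, redis-key-prefix=user-*}]
    }
}
```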
      • doGetSplits

        public GetSplitsResponse doGetSplits(BlockAllocator blockAllocator,
                                             GetSplitsRequest request)
        If the table comprises multiple key prefixes, we parallelize the scan by making each prefix its own split.
        Specified by:
        doGetSplits in class MetadataHandler
        Parameters:
        blockAllocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details of the catalog, database, table, and partition(s) being queried as well as any filter predicate.
        Returns:
        A GetSplitsResponse which primarily contains: 1. A Set of splits, which represent the read operations Amazon Athena must perform by calling your read function. 2. (Optional) A continuation token which allows you to paginate the generation of splits for large queries.
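    The one-split-per-prefix strategy of doGetSplits can be sketched as follows. The Split record here is a hypothetical stand-in that only records a key prefix, not the SDK's Split class, and treating redis-key-prefix as a comma-separated list is an assumption made for the example. Each resulting split could then be scanned independently and in parallel.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixSplitSketch {
    // Stand-in for a connector split; it only records which key prefix to scan.
    public record Split(String keyPrefix) {}

    /**
     * Produces one split per key prefix so each prefix can be scanned in parallel.
     * The comma delimiter is an assumption; blank entries are skipped.
     */
    public static List<Split> splitsFor(String keyPrefixProperty) {
        List<Split> splits = new ArrayList<>();
        for (String prefix : keyPrefixProperty.split(",")) {
            String trimmed = prefix.trim();
            if (!trimmed.isEmpty()) {
                splits.add(new Split(trimmed));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        for (Split s : splitsFor("user-*, order-*")) {
            System.out.println(s.keyPrefix());
        }
        // prints user-* then order-*
    }
}
```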
      • convertField

        protected org.apache.arrow.vector.types.pojo.Field convertField(String name,
                                                                        String type)
        Overrides the default Glue Type to Apache Arrow Type mapping so that we can fail fast on tables which define types that are not supported by this connector.
        Overrides:
        convertField in class GlueMetadataHandler
        Parameters:
        name - The name of the field in Glue.
        type - The type of the field in Glue.
        Returns:
        The corresponding Apache Arrow Field.
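    The fail-fast behaviour of convertField can be illustrated with the dependency-free sketch below. It deliberately avoids the real Arrow API: ArrowKind and Field are hypothetical stand-ins for Arrow types, and the set of supported Glue types is an assumption, not the connector's actual list. The point is only that an unsupported Glue type raises an error while metadata is being resolved, before any data is read.

```java
import java.util.Locale;
import java.util.Map;

public class FieldConversionSketch {
    // Hypothetical stand-ins for Apache Arrow types.
    public enum ArrowKind { UTF8, INT32, INT64, FLOAT64, BOOL }

    // Hypothetical set of supported Glue types; the connector's real list may differ.
    private static final Map<String, ArrowKind> SUPPORTED = Map.of(
        "string", ArrowKind.UTF8,
        "varchar", ArrowKind.UTF8,
        "int", ArrowKind.INT32,
        "bigint", ArrowKind.INT64,
        "double", ArrowKind.FLOAT64,
        "boolean", ArrowKind.BOOL);

    public record Field(String name, ArrowKind kind) {}

    /** Maps a Glue column type to a field, throwing immediately for unsupported types. */
    public static Field convertField(String name, String glueType) {
        ArrowKind kind = SUPPORTED.get(glueType.toLowerCase(Locale.ROOT));
        if (kind == null) {
            // Fail while fetching metadata instead of later while reading records.
            throw new IllegalArgumentException(
                "Column " + name + " has unsupported Glue type: " + glueType);
        }
        return new Field(name, kind);
    }

    public static void main(String[] args) {
        System.out.println(convertField("id", "BIGINT").kind()); // prints INT64
    }
}
```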