Class MetadataHandler

    • Constructor Detail

      • MetadataHandler

        public MetadataHandler​(String sourceType,
                               Map<String,​String> configOptions)
        When MetadataHandler is used as a Lambda, the "Main" class will pass in System.getenv() as the configOptions. Otherwise in situations where the connector is used outside of a Lambda, it may be a config map that is passed in.
        Parameters:
        sourceType - Used to aid in logging diagnostic info when raising a support case.
      • MetadataHandler

        public MetadataHandler​(EncryptionKeyFactory encryptionKeyFactory,
                               software.amazon.awssdk.services.secretsmanager.SecretsManagerClient secretsManager,
                               software.amazon.awssdk.services.athena.AthenaClient athena,
                               String sourceType,
                               String spillBucket,
                               String spillPrefix,
                               Map<String,​String> configOptions)
        Parameters:
        sourceType - Used to aid in logging diagnostic info when raising a support case.
    • Method Detail

      • resolveSecrets

        protected String resolveSecrets​(String rawString)
        Resolves any secrets found in the supplied string, for example: MyString${WithSecret} would have ${WithSecret} by the corresponding value of the secret in AWS Secrets Manager with that name. If no such secret is found the function throws.
        Parameters:
        rawString - The string in which you'd like to replace SecretsManager placeholders. (e.g. ThisIsA${Secret}Here - The ${Secret} would be replaced with the contents of an SecretsManager secret called Secret. If no such secret is found, the function throws. If no ${} are found in the input string, nothing is replaced and the original string is returned.
      • getSecret

        protected String getSecret​(String secretName)
      • makeEncryptionKey

        protected EncryptionKey makeEncryptionKey()
      • makeSpillLocation

        protected SpillLocation makeSpillLocation​(MetadataRequest request)
        Used to make a spill location for a split. Each split should have a unique spill location, so be sure to call this method once per split!
        Parameters:
        request -
        Returns:
        A unique spill location.
      • handleRequest

        public final void handleRequest​(InputStream inputStream,
                                        OutputStream outputStream,
                                        com.amazonaws.services.lambda.runtime.Context context)
                                 throws IOException
        Specified by:
        handleRequest in interface com.amazonaws.services.lambda.runtime.RequestStreamHandler
        Throws:
        IOException
      • doListSchemaNames

        public abstract ListSchemasResponse doListSchemaNames​(BlockAllocator allocator,
                                                              ListSchemasRequest request)
                                                       throws Exception
        Used to get the list of schemas (aka databases) that this source contains.
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details on who made the request and which Athena catalog they are querying.
        Returns:
        A ListSchemasResponse which primarily contains a Set of schema names and a catalog name corresponding the Athena catalog that was queried.
        Throws:
        Exception
      • doListTables

        public abstract ListTablesResponse doListTables​(BlockAllocator allocator,
                                                        ListTablesRequest request)
                                                 throws Exception
        Used to get the list of tables that this source contains.
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details on who made the request and which Athena catalog and database they are querying.
        Returns:
        A ListTablesResponse which primarily contains a List enumerating the tables in this catalog, database tuple. It also contains the catalog name corresponding the Athena catalog that was queried.
        Throws:
        Exception
      • doGetQueryPassthroughSchema

        public GetTableResponse doGetQueryPassthroughSchema​(BlockAllocator allocator,
                                                            GetTableRequest request)
                                                     throws Exception
        Used to get definition (field names, types, descriptions, etc...) of a Query PassThrough.
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
        Returns:
        A GetTableResponse which primarily contains: 1. An Apache Arrow Schema object describing the table's columns, types, and descriptions. 2. A Set of partition column names (or empty if the table isn't partitioned).
        Throws:
        Exception
      • doGetTable

        public abstract GetTableResponse doGetTable​(BlockAllocator allocator,
                                                    GetTableRequest request)
                                             throws Exception
        Used to get definition (field names, types, descriptions, etc...) of a Table.
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
        Returns:
        A GetTableResponse which primarily contains: 1. An Apache Arrow Schema object describing the table's columns, types, and descriptions. 2. A Set of partition column names (or empty if the table isn't partitioned).
        Throws:
        Exception
      • doGetTableLayout

        public GetTableLayoutResponse doGetTableLayout​(BlockAllocator allocator,
                                                       GetTableLayoutRequest request)
                                                throws Exception
        Used to get the partitions that must be read from the request table in order to satisfy the requested predicate.
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details of the catalog, database, and table being queried as well as any filter predicate.
        Returns:
        A GetTableLayoutResponse which primarily contains: 1. An Apache Arrow Block with 0 or more partitions to read. 0 partitions implies there are 0 rows to read. 2. Set of partition column names which should correspond to columns in your Apache Arrow Block.
        Throws:
        Exception
      • enhancePartitionSchema

        public void enhancePartitionSchema​(SchemaBuilder partitionSchemaBuilder,
                                           GetTableLayoutRequest request)
        This method can be used to add additional fields to the schema of our partition response. Athena expects each partitions in the response to have a column corresponding to your partition columns. You can choose to add additional columns to that response which Athena will ignore but will pass on to you when it call GetSplits(...) for each partition.
        Parameters:
        partitionSchemaBuilder - The SchemaBuilder you can use to add additional columns and metadata to the partitions response.
        request - The GetTableLayoutResquest that triggered this call.
      • getPartitions

        public abstract void getPartitions​(BlockWriter blockWriter,
                                           GetTableLayoutRequest request,
                                           QueryStatusChecker queryStatusChecker)
                                    throws Exception
        Used to get the partitions that must be read from the request table in order to satisfy the requested predicate.
        Parameters:
        blockWriter - Used to write rows (partitions) into the Apache Arrow response.
        request - Provides details of the catalog, database, and table being queried as well as any filter predicate.
        queryStatusChecker - A QueryStatusChecker that you can use to stop doing work for a query that has already terminated
        Throws:
        Exception
      • doGetSplits

        public abstract GetSplitsResponse doGetSplits​(BlockAllocator allocator,
                                                      GetSplitsRequest request)
                                               throws Exception
        Used to split-up the reads required to scan the requested batch of partition(s).
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details of the catalog, database, table, andpartition(s) being queried as well as any filter predicate.
        Returns:
        A GetSplitsResponse which primarily contains: 1. A Set which represent read operations Amazon Athena must perform by calling your read function. 2. (Optional) A continuation token which allows you to paginate the generation of splits for large queries.
        Throws:
        Exception
      • doGetDataSourceCapabilities

        public GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities​(BlockAllocator allocator,
                                                                             GetDataSourceCapabilitiesRequest request)
        Used to describe the types of capabilities supported by a data source. An engine can use this to determine what portions of the query to push down. A connector that returns any optimization will guarantee that the associated predicate will be pushed down.
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details about the catalog being used.
        Returns:
        A GetDataSourceCapabilitiesResponse object which returns a map of supported optimizations that the connector is advertising to the consumer. The connector assumes all responsibility for whatever is passed here.
      • doPing

        public PingResponse doPing​(PingRequest request)
        Used to warm up your function as well as to discovery its capabilities (e.g. SDK capabilities)
        Parameters:
        request - The PingRequest.
        Returns:
        A PingResponse.
      • onPing

        public void onPing​(PingRequest request)
        Provides you a signal that can be used to warm up your function.
        Parameters:
        request - The PingRequest.