Class DataLakeGen2MetadataHandler

  • All Implemented Interfaces:
    com.amazonaws.services.lambda.runtime.RequestStreamHandler

    public class DataLakeGen2MetadataHandler
    extends JdbcMetadataHandler
    • Method Detail

      • doGetDataSourceCapabilities

        public GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities​(BlockAllocator allocator,
                                                                             GetDataSourceCapabilitiesRequest request)
        Description copied from class: MetadataHandler
        Used to describe the types of capabilities supported by a data source. An engine can use this to determine what portions of the query to push down. A connector that returns any optimization will guarantee that the associated predicate will be pushed down.
        Overrides:
        doGetDataSourceCapabilities in class MetadataHandler
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details about the catalog being used.
        Returns:
        A GetDataSourceCapabilitiesResponse object which returns a map of supported optimizations that the connector is advertising to the consumer. The connector assumes all responsibility for whatever is passed here.
      • getPartitionSchema

        public org.apache.arrow.vector.types.pojo.Schema getPartitionSchema​(String catalogName)
        Description copied from class: JdbcMetadataHandler
        Delegates creation of partition schema to database type implementation.
        Specified by:
        getPartitionSchema in class JdbcMetadataHandler
        Parameters:
        catalogName - Athena provided catalog name.
        Returns:
        schema. See Schema
      • getPartitions

        public void getPartitions​(BlockWriter blockWriter,
                                  GetTableLayoutRequest getTableLayoutRequest,
                                  QueryStatusChecker queryStatusChecker)
        The partitions are being implemented based on the type of data externally in case of Gen 2. Considering the ADLS Gen2 data has already been partitioned and distributed within Gen 2 storage system, connector will fetch data as single split.
        Specified by:
        getPartitions in class JdbcMetadataHandler
        Parameters:
        blockWriter -
        getTableLayoutRequest -
        queryStatusChecker -
        Throws:
        Exception
      • doGetSplits

        public GetSplitsResponse doGetSplits​(BlockAllocator blockAllocator,
                                             GetSplitsRequest getSplitsRequest)
        Description copied from class: MetadataHandler
        Used to split-up the reads required to scan the requested batch of partition(s).
        Specified by:
        doGetSplits in class JdbcMetadataHandler
        Parameters:
        blockAllocator - Tool for creating and managing Apache Arrow Blocks.
        getSplitsRequest - Provides details of the catalog, database, table, andpartition(s) being queried as well as any filter predicate.
        Returns:
        A GetSplitsResponse which primarily contains: 1. A Set which represent read operations Amazon Athena must perform by calling your read function. 2. (Optional) A continuation token which allows you to paginate the generation of splits for large queries.
      • doGetTable

        public GetTableResponse doGetTable​(BlockAllocator blockAllocator,
                                           GetTableRequest getTableRequest)
                                    throws Exception
        Description copied from class: MetadataHandler
        Used to get definition (field names, types, descriptions, etc...) of a Table.
        Overrides:
        doGetTable in class JdbcMetadataHandler
        Parameters:
        blockAllocator - Tool for creating and managing Apache Arrow Blocks.
        getTableRequest - Provides details on who made the request and which Athena catalog, database, and table they are querying.
        Returns:
        A GetTableResponse which primarily contains: 1. An Apache Arrow Schema object describing the table's columns, types, and descriptions. 2. A Set of partition column names (or empty if the table isn't partitioned).
        Throws:
        Exception