Class NeptuneMetadataHandler

  • All Implemented Interfaces:
    com.amazonaws.services.lambda.runtime.RequestStreamHandler

    public class NeptuneMetadataHandler
    extends GlueMetadataHandler
    This class is part of a tutorial that walks you through building a connector for your custom data source. The README for this module (athena-example) guides you through preparing your development environment, modifying this example MetadataHandler, building, deploying, and then using your new source in an Athena query.

    More specifically, this class is responsible for providing Athena with metadata about the schemas (aka databases), tables, and table partitions that your source contains. Lastly, this class tells Athena how to split up reads against this source. This gives you control over the level of performance and parallelism your source can support.

    For more examples, please see the other connectors in this repository (e.g. athena-cloudwatch, athena-docdb).

    • Constructor Detail

      • NeptuneMetadataHandler

        public NeptuneMetadataHandler​(Map<String,​String> configOptions)
      • NeptuneMetadataHandler

        protected NeptuneMetadataHandler​(software.amazon.awssdk.services.glue.GlueClient glue,
                                         NeptuneConnection neptuneConnection,
                                         EncryptionKeyFactory keyFactory,
                                         software.amazon.awssdk.services.secretsmanager.SecretsManagerClient awsSecretsManager,
                                         software.amazon.awssdk.services.athena.AthenaClient athena,
                                         String spillBucket,
                                         String spillPrefix,
                                         Map<String,​String> configOptions)
    • Method Detail

      • doGetDataSourceCapabilities

        public GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities​(BlockAllocator allocator,
                                                                             GetDataSourceCapabilitiesRequest request)
        Description copied from class: MetadataHandler
        Used to describe the types of capabilities supported by a data source. An engine can use this to determine what portions of the query to push down. A connector that returns any optimization will guarantee that the associated predicate will be pushed down.
        Overrides:
        doGetDataSourceCapabilities in class MetadataHandler
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details about the catalog being used.
        Returns:
        A GetDataSourceCapabilitiesResponse object which returns a map of supported optimizations that the connector is advertising to the consumer. The connector assumes all responsibility for whatever is passed here.
      • doListSchemaNames

        public ListSchemasResponse doListSchemaNames​(BlockAllocator allocator,
                                                     ListSchemasRequest request)
        Since the entire Neptune cluster is treated as a single graph database, this method returns the provided Glue database name as the only database (schema) name.
        Overrides:
        doListSchemaNames in class GlueMetadataHandler
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details on who made the request and which Athena catalog they are querying.
        Returns:
        A ListSchemasResponse which primarily contains a Set of schema names and a catalog name corresponding to the Athena catalog that was queried.
        See Also:
        GlueMetadataHandler
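
        The single-schema behavior described for doListSchemaNames can be pictured with a minimal, JDK-only sketch. It does not use the Athena Federation SDK types, and the option key glue_database_name is an assumption made for illustration, as are the class and method names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

public class SingleSchemaSketch {
    // Assumed option key, used here for illustration only.
    public static final String GLUE_DB_OPTION = "glue_database_name";

    /**
     * Returns exactly one schema name: the Glue database named in the
     * connector's configuration. The whole cluster is one graph database,
     * so there is nothing else to enumerate.
     */
    public static Set<String> listSchemaNames(Map<String, String> configOptions) {
        String glueDb = configOptions.get(GLUE_DB_OPTION);
        if (glueDb == null || glueDb.isEmpty()) {
            throw new IllegalStateException("Missing config option: " + GLUE_DB_OPTION);
        }
        return Collections.singleton(glueDb);
    }

    public static void main(String[] args) {
        Map<String, String> options = Map.of(GLUE_DB_OPTION, "graph-database");
        System.out.println(listSchemaNames(options));
    }
}
```

        Failing fast on a missing option is a design choice of this sketch; the real handler may treat that case differently.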
      • doListTables

        public ListTablesResponse doListTables​(BlockAllocator allocator,
                                               ListTablesRequest request)
        Used to get the list of tables that this data source contains. In this case, fetches the list of tables in the provided Glue database.
        Overrides:
        doListTables in class GlueMetadataHandler
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details on who made the request and which Athena catalog and database they are querying.
        Returns:
        A ListTablesResponse which primarily contains a List enumerating the tables in this (catalog, database) tuple. It also contains the catalog name corresponding to the Athena catalog that was queried.
        See Also:
        GlueMetadataHandler
      • doGetTable

        public GetTableResponse doGetTable​(BlockAllocator blockAllocator,
                                           GetTableRequest request)
                                    throws Exception
        Used to get the definition (field names, types, descriptions, etc.) of a table.
        Overrides:
        doGetTable in class GlueMetadataHandler
        Parameters:
        blockAllocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
        Returns:
        A GetTableResponse which primarily contains: 1. An Apache Arrow Schema object describing the table's columns, types, and descriptions. 2. A Set of partition column names (or empty if the table isn't partitioned). 3. A TableName object confirming the schema and table name the response is for. 4. A catalog name corresponding to the Athena catalog that was queried.
        Throws:
        Exception
      • getPartitions

        public void getPartitions​(BlockWriter blockWriter,
                                  GetTableLayoutRequest request,
                                  QueryStatusChecker queryStatusChecker)
                           throws Exception
        This connector's tables do not support complex layouts or partitioning, so this method is a no-op.
        Specified by:
        getPartitions in class MetadataHandler
        Parameters:
        blockWriter - Used to write rows (partitions) into the Apache Arrow response.
        request - Provides details of the catalog, database, and table being queried as well as any filter predicate.
        queryStatusChecker - A QueryStatusChecker that you can use to stop doing work for a query that has already terminated.
        Throws:
        Exception
      • doGetSplits

        public GetSplitsResponse doGetSplits​(BlockAllocator blockAllocator,
                                             GetSplitsRequest request)
        Used to split up the reads required to scan the requested batch of partition(s).
        Specified by:
        doGetSplits in class MetadataHandler
        Parameters:
        blockAllocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details of the catalog, database, table, and partition(s) being queried as well as any filter predicate.
        Returns:
        A GetSplitsResponse which primarily contains: 1. A Set of Splits, each representing a read operation Amazon Athena must perform by calling your read function. 2. (Optional) A continuation token which allows you to paginate the generation of splits for large queries.
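
        The continuation-token contract described above can be illustrated with a small JDK-only sketch. It is not this class's implementation and does not use the SDK's GetSplitsResponse or Split types: SplitPage, getSplits, the string split labels, and the offset-as-token encoding are all hypothetical stand-ins chosen for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SplitPaginationSketch {
    /** Hypothetical stand-in for a response: one page of splits plus an optional token. */
    public static final class SplitPage {
        public final Set<String> splits;
        public final String continuationToken; // null when no more pages remain

        public SplitPage(Set<String> splits, String continuationToken) {
            this.splits = splits;
            this.continuationToken = continuationToken;
        }
    }

    /**
     * Produces at most pageSize splits per call. The token is simply the
     * index of the next split to generate, encoded as a string.
     */
    public static SplitPage getSplits(int totalSplits, int pageSize, String continuationToken) {
        int start = (continuationToken == null) ? 0 : Integer.parseInt(continuationToken);
        int end = Math.min(start + pageSize, totalSplits);
        Set<String> splits = new LinkedHashSet<>();
        for (int i = start; i < end; i++) {
            splits.add("split-" + i);
        }
        String next = (end < totalSplits) ? String.valueOf(end) : null;
        return new SplitPage(splits, next);
    }

    public static void main(String[] args) {
        // The caller keeps asking for pages until no token is returned.
        String token = null;
        do {
            SplitPage page = getSplits(5, 2, token);
            System.out.println(page.splits + " next=" + page.continuationToken);
            token = page.continuationToken;
        } while (token != null);
    }
}
```

        A source that needs only one read operation can return a single split and no token, in which case the loop above runs once.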
      • convertField

        protected org.apache.arrow.vector.types.pojo.Field convertField​(String name,
                                                                        String glueType)
        Description copied from class: GlueMetadataHandler
        Maps a Glue field to an Apache Arrow Field.
        Overrides:
        convertField in class GlueMetadataHandler
        Parameters:
        name - The name of the field in Glue.
        glueType - The type of the field in Glue.
        Returns:
        The corresponding Apache Arrow Field.
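
        As a rough picture of what mapping a Glue type to an Arrow type involves, here is a JDK-only sketch. It does not call Apache Arrow: ArrowKind is a hypothetical stand-in enum whose constant names mirror Arrow minor types, and the particular Glue type names covered are an illustrative subset, not the mapping this class actually applies.

```java
import java.util.Locale;
import java.util.Map;

public class GlueTypeMappingSketch {
    /** Hypothetical stand-in for an Arrow type; names mirror Arrow minor types. */
    public enum ArrowKind { INT, BIGINT, FLOAT8, VARCHAR, BIT }

    // Illustrative subset of Glue type names and the Arrow kind each maps to.
    private static final Map<String, ArrowKind> GLUE_TO_ARROW = Map.of(
            "int", ArrowKind.INT,
            "bigint", ArrowKind.BIGINT,
            "double", ArrowKind.FLOAT8,
            "string", ArrowKind.VARCHAR,
            "boolean", ArrowKind.BIT);

    /** Looks up the Arrow kind for a Glue type, ignoring case and padding. */
    public static ArrowKind convert(String name, String glueType) {
        ArrowKind kind = GLUE_TO_ARROW.get(glueType.trim().toLowerCase(Locale.ROOT));
        if (kind == null) {
            throw new IllegalArgumentException(
                    "Unsupported Glue type '" + glueType + "' for field '" + name + "'");
        }
        return kind;
    }

    public static void main(String[] args) {
        System.out.println(convert("age", "int"));
        System.out.println(convert("title", "STRING"));
    }
}
```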
      • doGetQueryPassthroughSchema

        public GetTableResponse doGetQueryPassthroughSchema​(BlockAllocator allocator,
                                                            GetTableRequest request)
                                                     throws Exception
        Description copied from class: MetadataHandler
        Used to get the definition (field names, types, descriptions, etc.) of a Query PassThrough.
        Overrides:
        doGetQueryPassthroughSchema in class MetadataHandler
        Parameters:
        allocator - Tool for creating and managing Apache Arrow Blocks.
        request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
        Returns:
        A GetTableResponse which primarily contains: 1. An Apache Arrow Schema object describing the table's columns, types, and descriptions. 2. A Set of partition column names (or empty if the table isn't partitioned).
        Throws:
        Exception