Class HbaseMetadataHandler
- java.lang.Object
- com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
- com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler
- com.amazonaws.athena.connectors.hbase.HbaseMetadataHandler
- All Implemented Interfaces:
FederationRequestHandler, com.amazonaws.services.lambda.runtime.RequestStreamHandler
public class HbaseMetadataHandler extends GlueMetadataHandler
Handles metadata requests for the Athena HBase Connector. For more detail, see the module's README.md. Notable characteristics of this class include:
1. Uses a Glue table property (hbase-metadata-flag) to indicate that the table (whose name matches the HBase table name) can be used to supplement metadata from HBase itself.
2. Uses a Glue table property (hbase-native-storage-flag) to indicate that the table is stored in HBase using native byte storage (e.g. an int as 4 bytes instead of an int serialized as a String).
3. Attempts to resolve sensitive fields such as HBase connection strings via SecretsManager so that you can substitute variables with values from SecretsManager by doing something like hostname:port:password=${my_secret}
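Characteristics 2 and 3 above can be illustrated with a minimal, self-contained sketch. It is not the connector's implementation: the class HbaseNotesSketch, its method names, and the in-memory map standing in for a SecretsManager lookup are all hypothetical. It only shows the difference between a 4-byte int and the same int stored as text, and what substituting a ${my_secret} style placeholder could look like.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the connector or the SDK.
class HbaseNotesSketch {
    // Matches placeholders of the form ${name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Native storage: an int always occupies 4 bytes (big-endian, ByteBuffer's default order).
    static byte[] encodeNative(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // String storage: the decimal text of the int, encoded as UTF-8.
    static byte[] encodeAsString(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Replaces each ${name} with its value from the map.
    // Assumption: the map stands in for a secrets lookup.
    static String resolvePlaceholders(String raw, Map<String, String> secrets) {
        Matcher matcher = PLACEHOLDER.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = secrets.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No secret named " + matcher.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the secret from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeNative(1234567).length);   // prints 4
        System.out.println(encodeAsString(1234567).length); // prints 7
        System.out.println(resolvePlaceholders(
                "hostname:port:password=${my_secret}",
                Collections.singletonMap("my_secret", "s3cr3t"))); // prints hostname:port:password=s3cr3t
    }
}
```

The fixed 4-byte width is why a reader must know whether the native storage flag applies: the same cell bytes decode to different values depending on which encoding was used to write them.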
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler
GlueMetadataHandler.DatabaseFilter, GlueMetadataHandler.TableFilter
-
-
Field Summary
protected static String END_KEY_FIELD
protected static String HBASE_CONN_STR
protected static String HBASE_NATIVE_STORAGE_FLAG
protected static String REGION_ID_FIELD
protected static String REGION_NAME_FIELD
protected static String START_KEY_FIELD
-
Fields inherited from class com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler
COLUMN_NAME_MAPPING_PROPERTY, DATETIME_FORMAT_MAPPING_PROPERTY, DATETIME_FORMAT_MAPPING_PROPERTY_NORMALIZED, GET_TABLES_REQUEST_MAX_RESULTS, GLUE_TABLE_CONTAINS_PREVIOUSLY_UNSUPPORTED_TYPE, SOURCE_TABLE_PROPERTY, VIEW_METADATA_FIELD
-
Fields inherited from class com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
configOptions, DISABLE_SPILL_ENCRYPTION, KMS_KEY_ID_ENV, SPILL_BUCKET_ENV, SPILL_PREFIX_ENV
-
-
Constructor Summary
HbaseMetadataHandler(Map<String,String> configOptions)
protected HbaseMetadataHandler(software.amazon.awssdk.services.glue.GlueClient awsGlue, EncryptionKeyFactory keyFactory, software.amazon.awssdk.services.secretsmanager.SecretsManagerClient secretsManager, software.amazon.awssdk.services.athena.AthenaClient athena, HbaseConnectionFactory connectionFactory, String spillBucket, String spillPrefix, Map<String,String> configOptions)
-
Method Summary
protected org.apache.arrow.vector.types.pojo.Field convertField(String name, String glueType)
Maps a Glue field to an Apache Arrow Field.
GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities(BlockAllocator allocator, GetDataSourceCapabilitiesRequest request)
Used to describe the types of capabilities supported by a data source.
GetTableResponse doGetQueryPassthroughSchema(BlockAllocator allocator, GetTableRequest request)
Used to get the definition (field names, types, descriptions, etc.) of a Query PassThrough.
GetSplitsResponse doGetSplits(BlockAllocator blockAllocator, GetSplitsRequest request)
If the table is spread across multiple region servers, then we parallelize the scan by making each region server a split.
GetTableResponse doGetTable(BlockAllocator blockAllocator, GetTableRequest request)
If Glue is enabled as a source of supplemental metadata, we look up the requested schema/table in Glue and filter out any results that don't have the HBASE_METADATA_FLAG set.
ListSchemasResponse doListSchemaNames(BlockAllocator blockAllocator, ListSchemasRequest request)
List namespaces in your HBase instance, treating each as a 'schema' (aka database).
ListTablesResponse doListTables(BlockAllocator blockAllocator, ListTablesRequest request)
List tables in the requested schema in your HBase instance, treating the requested schema as an HBase namespace.
void getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker)
Our table doesn't support complex layouts or partitioning, so this is left as a no-op. The SDK will notice that we do not have any partition columns and have not set any custom fields using enhancePartitionSchema(...), and as a result it will generate a single placeholder partition for us.
protected GetSplitsResponse setupQueryPassthroughSplit(GetSplitsRequest request)
Helper function that provides a single partition for Query Pass-Through.
-
Methods inherited from class com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler
doGetTable, doListSchemaNames, doListTables, getAwsGlue, getCatalog, getColumnNameMapping, getSourceTableName, populateSourceTableNameIfAvailable
-
Methods inherited from class com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
doGetTableLayout, doHandleRequest, doPing, enhancePartitionSchema, getCachableSecretsManager, getRequestOverrideConfig, getSecret, handleRequest, makeEncryptionKey, makeSpillLocation, onPing, resolveSecrets, resolveWithDefaultCredentials
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.amazonaws.athena.connector.lambda.handlers.FederationRequestHandler
getAthenaClient, getRequestOverrideConfig, getS3Client, getSessionCredentials
-
-
Field Detail
-
HBASE_NATIVE_STORAGE_FLAG
protected static final String HBASE_NATIVE_STORAGE_FLAG
- See Also:
- Constant Field Values
-
HBASE_CONN_STR
protected static final String HBASE_CONN_STR
- See Also:
- Constant Field Values
-
START_KEY_FIELD
protected static final String START_KEY_FIELD
- See Also:
- Constant Field Values
-
END_KEY_FIELD
protected static final String END_KEY_FIELD
- See Also:
- Constant Field Values
-
REGION_ID_FIELD
protected static final String REGION_ID_FIELD
- See Also:
- Constant Field Values
-
REGION_NAME_FIELD
protected static final String REGION_NAME_FIELD
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
HbaseMetadataHandler
protected HbaseMetadataHandler(software.amazon.awssdk.services.glue.GlueClient awsGlue, EncryptionKeyFactory keyFactory, software.amazon.awssdk.services.secretsmanager.SecretsManagerClient secretsManager, software.amazon.awssdk.services.athena.AthenaClient athena, HbaseConnectionFactory connectionFactory, String spillBucket, String spillPrefix, Map<String,String> configOptions)
-
-
Method Detail
-
doGetDataSourceCapabilities
public GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities(BlockAllocator allocator, GetDataSourceCapabilitiesRequest request)
Description copied from class: MetadataHandler
Used to describe the types of capabilities supported by a data source. An engine can use this to determine what portions of the query to push down. A connector that returns any optimization will guarantee that the associated predicate will be pushed down.
- Overrides:
doGetDataSourceCapabilities in class MetadataHandler
- Parameters:
allocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details about the catalog being used.
- Returns:
- A GetDataSourceCapabilitiesResponse object which returns a map of supported optimizations that the connector is advertising to the consumer. The connector assumes all responsibility for whatever is passed here.
-
doListSchemaNames
public ListSchemasResponse doListSchemaNames(BlockAllocator blockAllocator, ListSchemasRequest request) throws IOException
List namespaces in your HBase instance, treating each as a 'schema' (aka database).
- Overrides:
doListSchemaNames in class GlueMetadataHandler
- Parameters:
blockAllocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details on who made the request and which Athena catalog they are querying.
- Returns:
- The ListSchemasResponse which mostly contains the list of schemas (aka databases).
- Throws:
IOException
- See Also:
GlueMetadataHandler
-
doListTables
public ListTablesResponse doListTables(BlockAllocator blockAllocator, ListTablesRequest request)
List tables in the requested schema in your HBase instance, treating the requested schema as an HBase namespace.
- Overrides:
doListTables in class GlueMetadataHandler
- Parameters:
blockAllocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details on who made the request and which Athena catalog they are querying.
- Returns:
- The ListTablesResponse which mostly contains the list of table names.
- See Also:
GlueMetadataHandler
-
doGetTable
public GetTableResponse doGetTable(BlockAllocator blockAllocator, GetTableRequest request) throws Exception
If Glue is enabled as a source of supplemental metadata, we look up the requested schema/table in Glue and filter out any results that don't have the HBASE_METADATA_FLAG set. If no matching results are found in Glue, we fall back to inferring the schema of the HBase table using HbaseSchemaUtils.inferSchema(...). If there is no such table in HBase, the operation fails.
- Overrides:
doGetTable in class GlueMetadataHandler
- Parameters:
blockAllocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
- Returns:
- A GetTableResponse mostly containing the columns, their types, and any table properties for the requested table.
- Throws:
Exception
- See Also:
GlueMetadataHandler
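The lookup order described for doGetTable (Glue first, schema inference second) can be sketched as follows. This is a simplified illustration under stated assumptions, not the handler's code: GlueTable and resolveSchema are hypothetical stand-ins, the schema is a plain String rather than an Apache Arrow Schema, and the sketch assumes that the mere presence of the hbase-metadata-flag property counts as the flag being set.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical illustration; these types are stand-ins, not connector or SDK classes.
class SchemaFallbackSketch {
    // Property name taken from the class description.
    static final String METADATA_FLAG = "hbase-metadata-flag";

    // Stand-in for a Glue table: its properties plus a schema (a String here for brevity).
    static final class GlueTable {
        final Map<String, String> properties;
        final String schema;

        GlueTable(Map<String, String> properties, String schema) {
            this.properties = properties;
            this.schema = schema;
        }
    }

    // Prefer the Glue schema when the table carries the flag; otherwise infer from HBase.
    // Assumption: presence of the property key is treated as "flag set".
    static String resolveSchema(Optional<GlueTable> glueTable, Supplier<String> inferFromHbase) {
        return glueTable
                .filter(table -> table.properties.containsKey(METADATA_FLAG))
                .map(table -> table.schema)
                .orElseGet(inferFromHbase);
    }

    public static void main(String[] args) {
        GlueTable flagged = new GlueTable(Collections.singletonMap(METADATA_FLAG, "true"), "glue-schema");
        GlueTable unflagged = new GlueTable(new HashMap<>(), "glue-schema");

        System.out.println(resolveSchema(Optional.of(flagged), () -> "inferred-schema"));   // prints glue-schema
        System.out.println(resolveSchema(Optional.of(unflagged), () -> "inferred-schema")); // prints inferred-schema
        System.out.println(resolveSchema(Optional.empty(), () -> "inferred-schema"));       // prints inferred-schema
    }
}
```

Passing the inference step as a Supplier keeps it lazy, so the comparatively expensive scan-based inference only runs when Glue yields nothing usable.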
-
getPartitions
public void getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker)
Our table doesn't support complex layouts or partitioning, so this is left as a no-op. The SDK will notice that we do not have any partition columns and have not set any custom fields using enhancePartitionSchema(...), and as a result it will generate a single placeholder partition for us. This is needed because we must convey that there is at least 1 partition to read as part of the query; otherwise Athena will assume partition pruning found no candidate layouts to read.
- Specified by:
getPartitions in class MetadataHandler
- Parameters:
blockWriter - Used to write rows (partitions) into the Apache Arrow response.
request - Provides details of the catalog, database, and table being queried as well as any filter predicate.
queryStatusChecker - A QueryStatusChecker that you can use to stop doing work for a query that has already terminated.
- See Also:
GlueMetadataHandler
-
doGetSplits
public GetSplitsResponse doGetSplits(BlockAllocator blockAllocator, GetSplitsRequest request) throws IOException
If the table is spread across multiple region servers, then we parallelize the scan by making each region server a split.
- Specified by:
doGetSplits in class MetadataHandler
- Parameters:
blockAllocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details of the catalog, database, table, and partition(s) being queried as well as any filter predicate.
- Returns:
- A GetSplitsResponse which primarily contains:
1. A Set<Split> which represent read operations Amazon Athena must perform by calling your read function.
2. (Optional) A continuation token which allows you to paginate the generation of splits for large queries.
- Throws:
IOException
- See Also:
GlueMetadataHandler
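The split-per-region idea behind doGetSplits can be sketched without any HBase or SDK dependency. Everything here is a hypothetical stand-in: the Region class, the toSplits method, and the property keys, which simply reuse the names of the field constants as placeholder strings because the actual constant values are not shown on this page. The sketch assumes each region is described by an id, a name, and a start and end row key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; Region and the property keys are stand-ins, not connector types.
class RegionSplitSketch {
    static final class Region {
        final long id;
        final String name;
        final String startKey;
        final String endKey;

        Region(long id, String name, String startKey, String endKey) {
            this.id = id;
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Emits one property map ("split") per region so each key range can be scanned independently.
    static List<Map<String, String>> toSplits(List<Region> regions) {
        List<Map<String, String>> splits = new ArrayList<>();
        for (Region region : regions) {
            Map<String, String> split = new LinkedHashMap<>();
            // Placeholder keys: the names of the field constants, not their real values.
            split.put("REGION_ID_FIELD", Long.toString(region.id));
            split.put("REGION_NAME_FIELD", region.name);
            split.put("START_KEY_FIELD", region.startKey);
            split.put("END_KEY_FIELD", region.endKey);
            splits.add(split);
        }
        return splits;
    }

    public static void main(String[] args) {
        // In this sketch an empty string stands for an open-ended key boundary.
        List<Region> regions = Arrays.asList(
                new Region(1L, "region-a", "", "row500"),
                new Region(2L, "region-b", "row500", ""));
        for (Map<String, String> split : toSplits(regions)) {
            System.out.println(split);
        }
    }
}
```

Because each split carries its own start and end key, a reader can scan only that range, which is what allows the scans to run in parallel.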
-
convertField
protected org.apache.arrow.vector.types.pojo.Field convertField(String name, String glueType)
Description copied from class: GlueMetadataHandler
Maps a Glue field to an Apache Arrow Field.
- Overrides:
convertField in class GlueMetadataHandler
- Parameters:
name - The name of the field in Glue.
glueType - The type of the field in Glue.
- Returns:
- The corresponding Apache Arrow Field.
- See Also:
GlueMetadataHandler
-
doGetQueryPassthroughSchema
public GetTableResponse doGetQueryPassthroughSchema(BlockAllocator allocator, GetTableRequest request) throws Exception
Description copied from class: MetadataHandler
Used to get the definition (field names, types, descriptions, etc.) of a Query PassThrough.
- Overrides:
doGetQueryPassthroughSchema in class MetadataHandler
- Parameters:
allocator - Tool for creating and managing Apache Arrow Blocks.
request - Provides details on who made the request and which Athena catalog, database, and table they are querying.
- Returns:
- A GetTableResponse which primarily contains:
1. An Apache Arrow Schema object describing the table's columns, types, and descriptions.
2. A Set<String> of partition column names (or empty if the table isn't partitioned).
- Throws:
Exception
-
setupQueryPassthroughSplit
protected GetSplitsResponse setupQueryPassthroughSplit(GetSplitsRequest request)
Helper function that provides a single partition for Query Pass-Through