Class ElasticsearchMetadataHandler
- java.lang.Object
-
- com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
-
- com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler
-
- com.amazonaws.athena.connectors.elasticsearch.ElasticsearchMetadataHandler
-
- All Implemented Interfaces:
com.amazonaws.services.lambda.runtime.RequestStreamHandler
public class ElasticsearchMetadataHandler extends GlueMetadataHandler
This class is responsible for providing Athena with metadata about the domain (aka databases), indices, contained in your Elasticsearch instance. Additionally, this class tells Athena how to split up reads against this source. This gives you control over the level of performance and parallelism your source can support.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler
GlueMetadataHandler.DatabaseFilter, GlueMetadataHandler.TableFilter
-
-
Field Summary
Fields Modifier and Type Field Description protected static String
INDEX_KEY
protected static String
SHARD_KEY
Key used to store shard information in the Split's properties map (later used by the Record Handler).-
Fields inherited from class com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler
COLUMN_NAME_MAPPING_PROPERTY, DATETIME_FORMAT_MAPPING_PROPERTY, DATETIME_FORMAT_MAPPING_PROPERTY_NORMALIZED, GET_TABLES_REQUEST_MAX_RESULTS, GLUE_TABLE_CONTAINS_PREVIOUSLY_UNSUPPORTED_TYPE, SOURCE_TABLE_PROPERTY, VIEW_METADATA_FIELD
-
Fields inherited from class com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
configOptions, DISABLE_SPILL_ENCRYPTION, KMS_KEY_ID_ENV, SPILL_BUCKET_ENV, SPILL_PREFIX_ENV
-
-
Constructor Summary
Constructors Modifier Constructor Description ElasticsearchMetadataHandler(Map<String,String> configOptions)
protected
ElasticsearchMetadataHandler(software.amazon.awssdk.services.glue.GlueClient awsGlue, EncryptionKeyFactory keyFactory, software.amazon.awssdk.services.secretsmanager.SecretsManagerClient awsSecretsManager, software.amazon.awssdk.services.athena.AthenaClient athena, String spillBucket, String spillPrefix, ElasticsearchDomainMapProvider domainMapProvider, AwsRestHighLevelClientFactory clientFactory, long queryTimeout, Map<String,String> configOptions)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected org.apache.arrow.vector.types.pojo.Field
convertField(String fieldName, String glueType)
Maps a Glue field to an Apache Arrow Field.GetDataSourceCapabilitiesResponse
doGetDataSourceCapabilities(BlockAllocator allocator, GetDataSourceCapabilitiesRequest request)
Used to describe the types of capabilities supported by a data source.GetTableResponse
doGetQueryPassthroughSchema(BlockAllocator allocator, GetTableRequest request)
Used to get definition (field names, types, descriptions, etc...) of a Query PassThrough.GetSplitsResponse
doGetSplits(BlockAllocator allocator, GetSplitsRequest request)
Used to split-up the reads required to scan the requested index by shard.GetTableResponse
doGetTable(BlockAllocator allocator, GetTableRequest request)
Used to get definition (field names, types, descriptions, etc...) of a Table.ListSchemasResponse
doListSchemaNames(BlockAllocator allocator, ListSchemasRequest request)
Used to get the list of domains (aka databases) for the Elasticsearch service.ListTablesResponse
doListTables(BlockAllocator allocator, ListTablesRequest request)
Used to get the list of indices contained in the specified domain.void
getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker)
Elasticsearch does not support partitioning so this method is a NoOp.-
Methods inherited from class com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler
doGetTable, doListSchemaNames, doListTables, getAwsGlue, getCatalog, getColumnNameMapping, getSourceTableName, populateSourceTableNameIfAvailable
-
Methods inherited from class com.amazonaws.athena.connector.lambda.handlers.MetadataHandler
doGetTableLayout, doHandleRequest, doPing, enhancePartitionSchema, getSecret, handleRequest, makeEncryptionKey, makeSpillLocation, onPing, resolveSecrets
-
-
-
-
Field Detail
-
SHARD_KEY
protected static final String SHARD_KEY
Key used to store shard information in the Split's properties map (later used by the Record Handler).- See Also:
- Constant Field Values
-
INDEX_KEY
protected static final String INDEX_KEY
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
ElasticsearchMetadataHandler
public ElasticsearchMetadataHandler(Map<String,String> configOptions)
-
ElasticsearchMetadataHandler
protected ElasticsearchMetadataHandler(software.amazon.awssdk.services.glue.GlueClient awsGlue, EncryptionKeyFactory keyFactory, software.amazon.awssdk.services.secretsmanager.SecretsManagerClient awsSecretsManager, software.amazon.awssdk.services.athena.AthenaClient athena, String spillBucket, String spillPrefix, ElasticsearchDomainMapProvider domainMapProvider, AwsRestHighLevelClientFactory clientFactory, long queryTimeout, Map<String,String> configOptions)
-
-
Method Detail
-
doListSchemaNames
public ListSchemasResponse doListSchemaNames(BlockAllocator allocator, ListSchemasRequest request)
Used to get the list of domains (aka databases) for the Elasticsearch service.- Overrides:
doListSchemaNames
in classGlueMetadataHandler
- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details on who made the request and which Athena catalog they are querying.- Returns:
- A ListSchemasResponse which primarily contains a Set
of schema names and a catalog name corresponding the Athena catalog that was queried.
-
doListTables
public ListTablesResponse doListTables(BlockAllocator allocator, ListTablesRequest request) throws IOException
Used to get the list of indices contained in the specified domain.- Overrides:
doListTables
in classGlueMetadataHandler
- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details on who made the request and which Athena catalog and database they are querying.- Returns:
- A ListTablesResponse which primarily contains a List
enumerating the tables in this catalog, database tuple. It also contains the catalog name corresponding the Athena catalog that was queried. - Throws:
RuntimeException
- when the domain does not exist in the map, or the client is unable to retrieve the indices from the Elasticsearch instance.IOException
-
doGetTable
public GetTableResponse doGetTable(BlockAllocator allocator, GetTableRequest request)
Used to get definition (field names, types, descriptions, etc...) of a Table.- Overrides:
doGetTable
in classGlueMetadataHandler
- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details on who made the request and which Athena catalog, database, and table they are querying.- Returns:
- A GetTableResponse which primarily contains:
1. An Apache Arrow Schema object describing the table's columns, types, and descriptions.
2. A Set
of partition column names (or empty if the table isn't partitioned). 3. A TableName object confirming the schema and table name the response is for. 4. A catalog name corresponding the Athena catalog that was queried. - Throws:
RuntimeException
- when the domain does not exist in the map, or the client is unable to retrieve mapping information for the index from the Elasticsearch instance.
-
getPartitions
public void getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker)
Elasticsearch does not support partitioning so this method is a NoOp.- Specified by:
getPartitions
in classMetadataHandler
- Parameters:
blockWriter
- Used to write rows (partitions) into the Apache Arrow response.request
- Provides details of the catalog, database, and table being queried as well as any filter predicate.queryStatusChecker
- A QueryStatusChecker that you can use to stop doing work for a query that has already terminated
-
doGetSplits
public GetSplitsResponse doGetSplits(BlockAllocator allocator, GetSplitsRequest request) throws IOException
Used to split-up the reads required to scan the requested index by shard. Cluster-health information is retrieved for shards associated with the specified index. A split will then be generated for each shard that is primary and active.- Specified by:
doGetSplits
in classMetadataHandler
- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details of the catalog, domain, and index being queried, as well as any filter predicate.- Returns:
- A GetSplitsResponse which primarily contains:
1. A Set
each containing a domain and endpoint, and the shard to be retrieved by the Record handler. 2. (Optional) A continuation token which allows you to paginate the generation of splits for large queries. - Throws:
RuntimeException
- when the domain does not exist in the map, or an error occurs while processing the cluster/shard health information.IOException
-
doGetDataSourceCapabilities
public GetDataSourceCapabilitiesResponse doGetDataSourceCapabilities(BlockAllocator allocator, GetDataSourceCapabilitiesRequest request)
Description copied from class:MetadataHandler
Used to describe the types of capabilities supported by a data source. An engine can use this to determine what portions of the query to push down. A connector that returns any optimization will guarantee that the associated predicate will be pushed down.- Overrides:
doGetDataSourceCapabilities
in classMetadataHandler
- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details about the catalog being used.- Returns:
- A GetDataSourceCapabilitiesResponse object which returns a map of supported optimizations that the connector is advertising to the consumer. The connector assumes all responsibility for whatever is passed here.
-
doGetQueryPassthroughSchema
public GetTableResponse doGetQueryPassthroughSchema(BlockAllocator allocator, GetTableRequest request) throws Exception
Description copied from class:MetadataHandler
Used to get definition (field names, types, descriptions, etc...) of a Query PassThrough.- Overrides:
doGetQueryPassthroughSchema
in classMetadataHandler
- Parameters:
allocator
- Tool for creating and managing Apache Arrow Blocks.request
- Provides details on who made the request and which Athena catalog, database, and table they are querying.- Returns:
- A GetTableResponse which primarily contains:
1. An Apache Arrow Schema object describing the table's columns, types, and descriptions.
2. A Set
of partition column names (or empty if the table isn't partitioned). - Throws:
Exception
-
convertField
protected org.apache.arrow.vector.types.pojo.Field convertField(String fieldName, String glueType)
Description copied from class:GlueMetadataHandler
Maps a Glue field to an Apache Arrow Field.- Overrides:
convertField
in classGlueMetadataHandler
- Parameters:
fieldName
- The name of the field in Glue.glueType
- The type of the field in Glue.- Returns:
- The corresponding Apache Arrow Field.
- See Also:
GlueMetadataHandler
-
-