Class StorageMetadata
- java.lang.Object
  - com.amazonaws.athena.connectors.gcs.storage.StorageMetadata
public class StorageMetadata extends Object
-
-
Constructor Summary
Constructors
Constructor: StorageMetadata(String gcsCredentialJsonString)
Description: Instantiates a storage data source object with the provided config.
-
Method Summary
Modifier and Type, Method, and Description (one entry per method):
-
org.apache.arrow.vector.types.pojo.Schema buildTableSchema(software.amazon.awssdk.services.glue.model.Table table, org.apache.arrow.memory.BufferAllocator allocator)
Builds the table schema based on the provided field.
-
protected Optional<String> getAnyFilenameInPath(String bucket, String prefixPath)
Retrieves the filename of any file that has a non-zero size within the bucket/prefix.
-
protected org.apache.arrow.vector.types.pojo.Schema getFileSchema(String bucketName, String path, org.apache.arrow.dataset.file.FileFormat format, org.apache.arrow.memory.BufferAllocator allocator)
-
List<Map<String,String>> getPartitionFolders(org.apache.arrow.vector.types.pojo.Schema schema, TableName tableInfo, Constraints constraints, software.amazon.awssdk.services.glue.GlueClient awsGlue)
Retrieves a list of partition folders from the GCS bucket based on the partition.pattern table parameter and the partition keys set in the Glue table.
-
List<String> getStorageSplits(URI locationUri)
Retrieves a list of StorageSplit entries that contain the list of all files for a given table type in a storage location.
-
-
-
Constructor Detail
-
StorageMetadata
public StorageMetadata(String gcsCredentialJsonString) throws IOException
Instantiates a storage data source object with the provided config.
Parameters:
gcsCredentialJsonString - The GCS credential JSON, as a string, containing the properties needed to instantiate an appropriate data source
Throws:
IOException - If an error occurs while initializing the input stream with the GCS credential JSON
-
-
Method Detail
-
getStorageSplits
public List<String> getStorageSplits(URI locationUri)
Retrieves a list of StorageSplit entries that contain the list of all files for a given table type in a storage location.
Parameters:
locationUri - Location URI
Returns:
A list of files
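As an illustration of how a location URI can be broken down before listing files, the following minimal sketch splits a URI into a bucket name and an object prefix using only `java.net.URI`. The `gs://bucket/prefix` form, the class name `GcsLocationSketch`, and both helper methods are assumptions made for this example; they are not part of `StorageMetadata`.

```java
import java.net.URI;

// Hypothetical helper, not part of the connector: splits a location URI
// of the assumed form gs://bucket/prefix into its two parts.
final class GcsLocationSketch {

    /** Returns the bucket name, taken from the authority component of the URI. */
    static String bucketOf(URI location) {
        // getAuthority() is used instead of getHost() because getHost()
        // returns null for names that are not valid hostnames
        // (for example, bucket names containing underscores).
        return location.getAuthority();
    }

    /** Returns the object prefix: the URI path without its leading slash. */
    static String prefixOf(URI location) {
        String path = location.getPath();
        if (path == null || path.isEmpty()) {
            return "";
        }
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        URI location = URI.create("gs://my-bucket/data/sales/");
        System.out.println("bucket = " + bucketOf(location));
        System.out.println("prefix = " + prefixOf(location));
    }
}
```

The bucket and prefix obtained this way are the kind of inputs a file listing call would need; the actual listing logic inside `getStorageSplits` is not shown in this page and is not reproduced here.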
-
getPartitionFolders
public List<Map<String,String>> getPartitionFolders(org.apache.arrow.vector.types.pojo.Schema schema, TableName tableInfo, Constraints constraints, software.amazon.awssdk.services.glue.GlueClient awsGlue) throws URISyntaxException
Retrieves a list of partition folders from the GCS bucket based on the partition.pattern table parameter and the partition keys set in the Glue table. If the summary from the constraints is empty (no WHERE clauses, or only unsupported clauses), it returns all the partition folders from the GCS bucket. If there are constraints to apply, it uses them to filter the partition folders and narrow down the data to load.
Parameters:
schema - An instance of Schema that describes the underlying table's schema
tableInfo - Name of the table
constraints - An instance of Constraints, captured from the WHERE clauses
awsGlue - An instance of GlueClient
Returns:
A list of Map instances
Throws:
URISyntaxException - If an error occurs while parsing the URI
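The filtering behaviour described above can be sketched with plain Java collections. This example assumes a `partition.pattern` value that uses `${column}` placeholders (such as `year=${year}/month=${month}/`) and models constraints as simple column-to-value equality pairs; both are simplifications made for illustration. The class `PartitionPatternSketch` and its methods are hypothetical and do not reflect the connector's internal implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: matches folder paths against a pattern with
// ${column} placeholders (assumed syntax) and filters by equality constraints.
final class PartitionPatternSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    private final List<String> keys = new ArrayList<>();
    private final Pattern folderRegex;

    PartitionPatternSketch(String partitionPattern) {
        // Turn the pattern into a regex: literal text is quoted, and each
        // placeholder becomes a capture group matching one path segment.
        StringBuilder regex = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(partitionPattern);
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(partitionPattern.substring(last, m.start())));
            regex.append("([^/]+)");
            keys.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(partitionPattern.substring(last)));
        folderRegex = Pattern.compile(regex.toString());
    }

    /** Extracts partition key/value pairs from a folder path, if it matches. */
    Optional<Map<String, String>> parse(String folder) {
        Matcher m = folderRegex.matcher(folder);
        if (!m.matches()) {
            return Optional.empty();
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            values.put(keys.get(i), m.group(i + 1));
        }
        return Optional.of(values);
    }

    /** Keeps matching folders; an empty constraint map keeps all of them. */
    List<Map<String, String>> select(List<String> folders, Map<String, String> equalityConstraints) {
        List<Map<String, String>> selected = new ArrayList<>();
        for (String folder : folders) {
            parse(folder)
                .filter(p -> p.entrySet().containsAll(equalityConstraints.entrySet()))
                .ifPresent(selected::add);
        }
        return selected;
    }

    public static void main(String[] args) {
        PartitionPatternSketch sketch = new PartitionPatternSketch("year=${year}/month=${month}/");
        List<String> folders = List.of("year=2023/month=01/", "year=2024/month=02/", "tmp/");
        System.out.println(sketch.select(folders, Map.of()));
        System.out.println(sketch.select(folders, Map.of("year", "2024")));
    }
}
```

Each returned map pairs a partition column with the value found in the folder name, which mirrors the `List<Map<String,String>>` return type. Real `Constraints` objects can express ranges and other predicates that this equality-only sketch does not cover.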
-
getAnyFilenameInPath
protected Optional<String> getAnyFilenameInPath(String bucket, String prefixPath)
Retrieves the filename of any file that has a non-zero size within the bucket/prefix.
Parameters:
bucket - Name of the bucket
prefixPath - Prefix (that is, a folder in the Storage service) of the bucket from which this method will retrieve files
Returns:
A single file name under the prefix
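The selection rule (any object under the prefix with a non-zero size) can be sketched without a storage client by modelling the bucket listing as a map from object name to size in bytes. The class `AnyFilenameSketch`, the method `anyNonEmptyFile`, and the in-memory listing are assumptions for this example; the real method lists objects in GCS.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: picks the first non-empty object under a prefix
// from an in-memory listing (object name -> size in bytes).
final class AnyFilenameSketch {

    static Optional<String> anyNonEmptyFile(Map<String, Long> objectSizes, String prefixPath) {
        return objectSizes.entrySet().stream()
            .filter(e -> e.getKey().startsWith(prefixPath)) // only objects under the prefix
            .filter(e -> e.getValue() > 0)                  // skip zero-byte objects such as folder markers
            .map(Map.Entry::getKey)
            .findFirst();
    }

    public static void main(String[] args) {
        Map<String, Long> listing = new LinkedHashMap<>();
        listing.put("data/", 0L);                  // zero-byte folder placeholder
        listing.put("data/part-0.parquet", 128L);
        listing.put("other/file.csv", 10L);

        System.out.println(anyNonEmptyFile(listing, "data/"));
        System.out.println(anyNonEmptyFile(listing, "missing/"));
    }
}
```

Returning `Optional<String>` matches the documented signature and lets callers handle a prefix that contains no usable file without a null check.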
-
getFileSchema
protected org.apache.arrow.vector.types.pojo.Schema getFileSchema(String bucketName, String path, org.apache.arrow.dataset.file.FileFormat format, org.apache.arrow.memory.BufferAllocator allocator)
-
buildTableSchema
public org.apache.arrow.vector.types.pojo.Schema buildTableSchema(software.amazon.awssdk.services.glue.model.Table table, org.apache.arrow.memory.BufferAllocator allocator) throws URISyntaxException
Builds the table schema based on the provided field.
Parameters:
table - Glue table object
Returns:
An instance of Schema
Throws:
URISyntaxException
-
-