Class StorageMetadata


  • public class StorageMetadata
    extends Object
    • Constructor Summary

      Constructors 
      Constructor Description
      StorageMetadata​(String gcsCredentialJsonString)
      Instantiate a storage data source object with provided config
    • Constructor Detail

      • StorageMetadata

        public StorageMetadata​(String gcsCredentialJsonString)
                        throws IOException
        Instantiate a storage data source object with provided config
        Parameters:
        gcsCredentialJsonString - The GCS credential JSON, as a string, used to instantiate the storage data source
        Throws:
        IOException - If an I/O error occurs while initializing the input stream with the GCS credential JSON
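The constructor takes the credential JSON as a string and may fail while an input stream is initialized from it. A minimal sketch of that step, using only the JDK, is shown here. The class `CredentialStreamSketch` and its `toStream` helper are hypothetical and not part of `StorageMetadata`, and rejecting blank input with an `IOException` is an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of StorageMetadata.
final class CredentialStreamSketch {

    // Wraps the credential JSON text in an in-memory stream.
    // Assumption: blank input is reported as an IOException.
    static InputStream toStream(String gcsCredentialJsonString) throws IOException {
        if (gcsCredentialJsonString == null || gcsCredentialJsonString.trim().isEmpty()) {
            throw new IOException("GCS credential JSON is empty");
        }
        return new ByteArrayInputStream(
                gcsCredentialJsonString.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder text only; this is not a usable credential.
        String json = "{\"type\":\"service_account\"}";
        try (InputStream in = toStream(json)) {
            System.out.println(in.available() + " bytes ready for a credential loader");
        }
    }
}
```

A caller of the real constructor would handle `IOException` in the same way, for example with a try/catch around `new StorageMetadata(json)`.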
    • Method Detail

      • getStorageSplits

        public List<String> getStorageSplits​(URI locationUri)
        Retrieves a list of storage splits, which is essentially the list of all files for a given table type in a storage location
        Parameters:
        locationUri - URI of the storage location
        Returns:
        A list of files
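As an illustration of how a location URI can be broken into a bucket and an object prefix before files are listed, here is a minimal sketch using `java.net.URI`. The `gs://bucket/prefix/` layout, the class `LocationUriSketch`, and its two helpers are assumptions for illustration; the source does not state how `getStorageSplits` parses its argument.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of StorageMetadata.
final class LocationUriSketch {

    // The authority component holds the bucket name in a gs://bucket/... URI.
    static String bucketOf(URI locationUri) {
        return locationUri.getAuthority();
    }

    // The path without its leading slash is the object prefix inside the bucket.
    static String prefixOf(URI locationUri) {
        String path = locationUri.getPath();
        if (path == null) {
            return ""; // opaque URIs have no path component
        }
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        URI location = URI.create("gs://my-bucket/tables/sales/"); // made-up location
        System.out.println(bucketOf(location));
        System.out.println(prefixOf(location));
    }
}
```

`getAuthority()` is used rather than `getHost()` because `getHost()` returns null for names that are not valid hostnames, such as bucket names containing underscores.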
      • getPartitionFolders

        public List<Map<String,​String>> getPartitionFolders​(org.apache.arrow.vector.types.pojo.Schema schema,
                                                                  TableName tableInfo,
                                                                  Constraints constraints,
                                                                  software.amazon.awssdk.services.glue.GlueClient awsGlue)
                                                           throws URISyntaxException
        Retrieves a list of partition folders from the GCS bucket, based on the partition.pattern table parameter and the partition keys defined in the Glue table. If the summary from the constraints is empty (no WHERE clauses, or only unsupported clauses), it returns all partition folders from the GCS bucket. If there are constraints to apply, it uses them to filter the partition folders and so narrow down the data to load
        Parameters:
        schema - An instance of Schema that describes underlying Table's schema
        tableInfo - Name of the table
        constraints - An instance of Constraints, captured from where clauses
        awsGlue - An instance of GlueClient
        Returns:
        A list of Map<String,String> instances describing the selected partition folders
        Throws:
        URISyntaxException - If a syntax error occurs while parsing the URI
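The filtering behaviour described for getPartitionFolders (return every folder when there are no constraints, otherwise keep only matching folders) can be sketched as follows. This is not the connector's implementation: it assumes Hive-style `key=value` folder names instead of the `partition.pattern` parameter, and it models constraints as a plain map of equality conditions instead of a `Constraints` instance. `PartitionFilterSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of StorageMetadata.
final class PartitionFilterSketch {

    // Parses "year=2023/month=01/" into {year=2023, month=01}.
    // Assumption: folders use a Hive-style key=value layout.
    static Map<String, String> parseFolder(String folder) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : folder.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                values.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return values;
    }

    // Keeps the folders whose partition values satisfy every equality condition.
    // An empty condition map selects all folders.
    static List<Map<String, String>> select(List<String> folders,
                                            Map<String, String> equalityConstraints) {
        List<Map<String, String>> selected = new ArrayList<>();
        for (String folder : folders) {
            Map<String, String> partition = parseFolder(folder);
            boolean matches = true;
            for (Map.Entry<String, String> c : equalityConstraints.entrySet()) {
                if (!c.getValue().equals(partition.get(c.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                selected.add(partition);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> folders = List.of("year=2023/month=01/", "year=2024/month=01/");
        System.out.println(select(folders, Map.of()));               // no constraints
        System.out.println(select(folders, Map.of("year", "2024"))); // one equality constraint
    }
}
```

Each returned map pairs a partition key with the value found in the folder name, which mirrors the `List<Map<String,String>>` return type of the method above.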
      • getAnyFilenameInPath

        protected Optional<String> getAnyFilenameInPath​(String bucket,
                                                        String prefixPath)
        Retrieves the filename of any file that has a non-zero size within the bucket/prefix
        Parameters:
        bucket - Name of the bucket
        prefixPath - Prefix (that is, a folder in the Storage service) of the bucket from which this method will retrieve files
        Returns:
        A single file name under the prefix
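The "any file with a non-zero size" selection can be illustrated with a small sketch over an in-memory listing. The listing is modelled here as a map from object name to size in bytes, which is an assumption; the real method reads from the storage service. `AnyFilenameSketch` and `anyNonEmptyFile` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not part of StorageMetadata.
final class AnyFilenameSketch {

    // Returns the first listed object under the prefix whose size is greater than zero.
    static Optional<String> anyNonEmptyFile(Map<String, Long> objectSizes, String prefixPath) {
        return objectSizes.entrySet().stream()
                .filter(e -> e.getKey().startsWith(prefixPath))
                .filter(e -> e.getValue() > 0L) // skip zero-byte entries such as folder markers
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        Map<String, Long> listing = new LinkedHashMap<>(); // made-up listing
        listing.put("data/", 0L);
        listing.put("data/part-0.parquet", 128L);
        System.out.println(anyNonEmptyFile(listing, "data/"));
        System.out.println(anyNonEmptyFile(listing, "missing/"));
    }
}
```

Returning `Optional<String>` lets the caller distinguish an empty or missing prefix from a found file without using null.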
      • getFileSchema

        protected org.apache.arrow.vector.types.pojo.Schema getFileSchema​(String bucketName,
                                                                          String path,
                                                                          org.apache.arrow.dataset.file.FileFormat format,
                                                                          org.apache.arrow.memory.BufferAllocator allocator)
      • buildTableSchema

        public org.apache.arrow.vector.types.pojo.Schema buildTableSchema​(software.amazon.awssdk.services.glue.model.Table table,
                                                                          org.apache.arrow.memory.BufferAllocator allocator)
                                                                   throws URISyntaxException
        Builds the table schema based on the provided Glue table
        Parameters:
        table - Glue table object
        Returns:
        An instance of Schema
        Throws:
        URISyntaxException