Class PartitionUtil


  • public class PartitionUtil
    extends Object
    • Method Detail

      • getPartitionColumnData

        public static Map<String,String> getPartitionColumnData(com.amazonaws.services.glue.model.Table table,
                                                                      String partitionFolder)
        Returns a map of partition column names to their values for the given partition folder.
        Parameters:
        table - response of get table from AWS Glue
        partitionFolder - partition folder name
        Returns:
        Map of case-insensitive partition column names to partition column values
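The sketch below illustrates what a case-insensitive partition map means for callers. It is not PartitionUtil's implementation: the helper parseHiveStyleFolder is hypothetical and assumes Hive-style "key=value" folder segments separated by '/', whereas the real method derives values from the Glue Table's partition.pattern. A TreeMap ordered by String.CASE_INSENSITIVE_ORDER is one standard way to get case-insensitive key lookups in Java.

```java
import java.util.Map;
import java.util.TreeMap;

class CaseInsensitivePartitionDemo {

    // Hypothetical helper: parses Hive-style "key=value" segments separated by '/'.
    // The real getPartitionColumnData uses the Glue Table's partition.pattern instead;
    // this only illustrates the case-insensitive shape of the result.
    static Map<String, String> parseHiveStyleFolder(String partitionFolder) {
        // CASE_INSENSITIVE_ORDER makes "Year", "year" and "YEAR" the same key
        Map<String, String> result = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String segment : partitionFolder.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip empty segments, e.g. from a leading slash
            }
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Not a key=value segment: " + segment);
            }
            result.put(segment.substring(0, eq), segment.substring(eq + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> partitions = parseHiveStyleFolder("Year=2023/Month=07");
        System.out.println(partitions.get("year"));  // 2023
        System.out.println(partitions.get("MONTH")); // 07
    }
}
```

Because the keys compare without regard to case, a caller can look up a column by the name used in a query even if the folder spells it differently.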
      • getPartitionColumnData

        protected static Map<String,String> getPartitionColumnData(String partitionPattern,
                                                                         String partitionFolder,
                                                                         String folderNameRegEx,
                                                                         List<com.amazonaws.services.glue.model.Column> partitionColumns)
        Returns the storage partition values parsed from the given partition folder name, keyed by partition column name.
        Parameters:
        partitionPattern - partition pattern (the partition.pattern of the Glue Table)
        partitionFolder - partition folder name
        folderNameRegEx - folder name regular expression
        partitionColumns - list of partition columns
        Returns:
        Map of case-insensitive partition column names to partition column values
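One plausible way to turn a folder name and a folder-name regular expression into column values is to match the whole folder name and read one capture group per partition column. The sketch below assumes exactly that (one capturing group per column, in column order) and uses plain column names in place of Glue Column objects; RegexPartitionExtractor and extract are hypothetical names, and returning an empty map for a non-matching folder is an assumed policy, not documented behavior.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPartitionExtractor {

    // Hypothetical sketch: capture group i+1 holds the value of partitionColumnNames.get(i).
    static Map<String, String> extract(String partitionFolder,
                                       String folderNameRegEx,
                                       List<String> partitionColumnNames) {
        Map<String, String> values = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        Matcher matcher = Pattern.compile(folderNameRegEx).matcher(partitionFolder);
        if (!matcher.matches()) {
            return values; // assumed policy: a non-conforming folder yields no values
        }
        if (matcher.groupCount() != partitionColumnNames.size()) {
            throw new IllegalArgumentException("Expected one capture group per partition column");
        }
        for (int i = 0; i < partitionColumnNames.size(); i++) {
            values.put(partitionColumnNames.get(i), matcher.group(i + 1));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> values = extract("year=2023/month=07",
                "year=([^/]+)/month=([^/]+)", List.of("year", "month"));
        System.out.println(values); // {month=07, year=2023}
    }
}
```

Using matches() rather than find() requires the entire folder name to conform to the pattern, so a partially matching folder is rejected.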
      • getRegExExpression

        protected static Optional<String> getRegExExpression(com.amazonaws.services.glue.model.Table table)
        Returns a regular expression for the partition pattern from AWS Glue. The expression is generated dynamically and is used to check whether a folder in GCS conforms to the partition keys already set up in the AWS Glue Table (if any).
        Parameters:
        table - response of get table from AWS Glue
        Returns:
        Optional String containing the regular expression
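A minimal sketch of generating such an expression, assuming the partition pattern uses ${columnName} placeholders as in the /folderName1=${partitionKey1} example for getPartitionsFolderLocationUri. Literal text is escaped with Pattern.quote and each placeholder becomes a capture group that stops at a folder separator. PartitionPatternRegex and toRegex are hypothetical names, and taking the pattern as a String (instead of reading it from the Glue Table) is a simplification.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartitionPatternRegex {

    // Matches a ${name} placeholder and captures the name
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Hypothetical sketch: empty Optional when the table has no partition pattern.
    static Optional<String> toRegex(String partitionPattern) {
        if (partitionPattern == null || partitionPattern.trim().isEmpty()) {
            return Optional.empty();
        }
        StringBuilder regex = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(partitionPattern);
        int last = 0;
        while (m.find()) {
            // Quote literal text so characters such as '.' or '=' match literally
            regex.append(Pattern.quote(partitionPattern.substring(last, m.start())));
            // Each placeholder becomes one capture group bounded by '/'
            regex.append("([^/]+)");
            last = m.end();
        }
        regex.append(Pattern.quote(partitionPattern.substring(last)));
        return Optional.of(regex.toString());
    }

    public static void main(String[] args) {
        String regex = toRegex("year=${year}/month=${month}").get();
        System.out.println("year=2023/month=07".matches(regex)); // true
        System.out.println("year=2023/day=07".matches(regex));   // false
    }
}
```

The capture groups appear in placeholder order, which is what a regex-based extractor would need to pair each group with its partition column.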
      • getPartitionsFolderLocationUri

        public static URI getPartitionsFolderLocationUri(com.amazonaws.services.glue.model.Table table,
                                                         List<org.apache.arrow.vector.FieldVector> fieldVectors,
                                                         int readerPosition)
                                                  throws URISyntaxException
        Determines the partition folder URI from the Table's partition.pattern and the value retrieved from the partition field reader (from the readWithConstraint() method of GcsRecordHandler). For example, given the following partition.pattern on the Glue Table:

        /folderName1=${partitionKey1}

        And for the following partition row (from getPartitions() method in GcsMetadataHandler):

        Partition fields and value:

        • Partition column: partitionKey1
        • Partition column value: asdf

        When the Table's location URI is gs://my_table/data/, this method returns a URI that refers to the GCS location gs://my_table/data/folderName1=asdf.
        Returns:
        GCS location URI
        Throws:
        URISyntaxException - if the resulting location string cannot be parsed as a URI
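The worked example above (pattern /folderName1=${partitionKey1}, value asdf, location gs://my_table/data/) can be reproduced with the sketch below. It is an approximation under stated assumptions: the real method reads values from Arrow FieldVectors at readerPosition, while this hypothetical folderUri takes a plain Map of column name to value and the table location as a String. Placeholders are substituted with Matcher.appendReplacement, and the location and folder are joined with exactly one '/'.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartitionFolderUriSketch {

    // Matches a ${name} placeholder and captures the partition column name
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Hypothetical sketch: a Map stands in for the Arrow field readers.
    static URI folderUri(String tableLocation, String partitionPattern,
                         Map<String, String> partitionValues) throws URISyntaxException {
        StringBuilder folder = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(partitionPattern);
        while (m.find()) {
            String value = partitionValues.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for partition column " + m.group(1));
            }
            // quoteReplacement keeps '$' or '\' in values from being treated as group references
            m.appendReplacement(folder, Matcher.quoteReplacement(value));
        }
        m.appendTail(folder);

        // Join location and folder with exactly one '/'
        String base = tableLocation.endsWith("/")
                ? tableLocation.substring(0, tableLocation.length() - 1)
                : tableLocation;
        String relative = folder.toString().startsWith("/") ? folder.substring(1) : folder.toString();
        return new URI(base + "/" + relative);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = folderUri("gs://my_table/data/", "/folderName1=${partitionKey1}",
                Map.of("partitionKey1", "asdf"));
        System.out.println(uri); // gs://my_table/data/folderName1=asdf
    }
}
```

Normalizing the slash at the join point means the result is the same whether or not the table location ends with '/' or the pattern starts with one.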