Class BlockUtils


  • public class BlockUtils
    extends Object
    This utility class abstracts many facets of reading values from Apache Arrow's FieldReader objects and writing values into FieldVector objects.
    • Method Summary

      Modifier and Type Method Description
      static int copyRows​(Block srcBlock, Block dstBlock, int firstRow, int lastRow)
      Copies an inclusive range of rows from one block to another.
      static String fieldToString​(org.apache.arrow.vector.complex.reader.FieldReader reader)
      Used to convert a single cell for the given FieldReader to a human readable string.
      static Class getJavaType​(org.apache.arrow.vector.types.Types.MinorType minorType)  
      static boolean isNullRow​(Block block, int row)
      Checks if a row is null by checking that all fields in that row are null (aka not set).
      static Block newBlock​(BlockAllocator allocator, String columnName, org.apache.arrow.vector.types.pojo.ArrowType type, Object... values)
      Creates a new Block with a single column, populated with the provided values.
      static Block newBlock​(BlockAllocator allocator, String columnName, org.apache.arrow.vector.types.pojo.ArrowType type, Collection<Object> values)
      Creates a new Block with a single column, populated with the provided values.
      static Block newEmptyBlock​(BlockAllocator allocator, String columnName, org.apache.arrow.vector.types.pojo.ArrowType type)
      Creates a new, empty, Block with a single column.
      static String rowToString​(Block block, int row)
      Used to convert a specific row in the provided Block to a human readable string.
      static void setComplexValue​(org.apache.arrow.vector.FieldVector vector, int pos, FieldResolver resolver, Object value)
      Used to set complex values (Struct, List, etc...) on the provided FieldVector.
      static void setValue​(org.apache.arrow.vector.FieldVector vector, int pos, Object value)
      Used to set values (Int, BigInt, Bit, etc...) on the provided FieldVector.
      static void unsetRow​(int row, Block block)
      In some filtering situations it can be useful to 'unset' a row as an indication to a later processing stage that the row is irrelevant.
      protected static void writeAllValue​(org.apache.arrow.vector.complex.writer.FieldWriter writer, org.apache.arrow.vector.types.pojo.Field field, org.apache.arrow.memory.BufferAllocator allocator, int pos, FieldResolver resolver, Object value, boolean fromMapOrStruct)  
      protected static void writeList​(org.apache.arrow.memory.BufferAllocator allocator, org.apache.arrow.vector.complex.writer.FieldWriter writer, org.apache.arrow.vector.types.pojo.Field field, int pos, Iterable value, FieldResolver resolver)
      Used to write a List value.
      protected static void writeMap​(org.apache.arrow.memory.BufferAllocator allocator, org.apache.arrow.vector.complex.writer.BaseWriter.MapWriter writer, org.apache.arrow.vector.types.pojo.Field field, int pos, Object value, FieldResolver resolver)
      Used to write a Map value.
      protected static void writeSimpleValue​(org.apache.arrow.vector.complex.writer.FieldWriter writer, org.apache.arrow.vector.types.pojo.Field field, org.apache.arrow.memory.BufferAllocator allocator, Object value, boolean fromMapOrStruct)
      Used to write an individual value into a field. Multiple calls to this method per cell are expected in order to write the N values of a list of size N.
      protected static void writeStruct​(org.apache.arrow.memory.BufferAllocator allocator, org.apache.arrow.vector.complex.writer.BaseWriter.StructWriter writer, org.apache.arrow.vector.types.pojo.Field field, int pos, Object value, FieldResolver resolver)
      Used to write a Struct value.
    • Field Detail

      • UTC_ZONE_ID

        public static final ZoneId UTC_ZONE_ID
    • Method Detail

      • newBlock

        public static Block newBlock​(BlockAllocator allocator,
                                     String columnName,
                                     org.apache.arrow.vector.types.pojo.ArrowType type,
                                     Object... values)
        Creates a new Block with a single column, populated with the provided values.
        Parameters:
        allocator - The BlockAllocator to use when creating the Block.
        columnName - The name of the single column in the Block's Schema.
        type - The Apache Arrow Type of the column.
        values - The values to write to the new Block. Each value will be its own row.
        Returns:
        The newly created Block with a single-column Schema, populated with the provided values.
      • newBlock

        public static Block newBlock​(BlockAllocator allocator,
                                     String columnName,
                                     org.apache.arrow.vector.types.pojo.ArrowType type,
                                     Collection<Object> values)
        Creates a new Block with a single column, populated with the provided values.
        Parameters:
        allocator - The BlockAllocator to use when creating the Block.
        columnName - The name of the single column in the Block's Schema.
        type - The Apache Arrow Type of the column.
        values - The values to write to the new Block. Each value will be its own row.
        Returns:
        The newly created Block with a single-column Schema, populated with the provided values.
      • newEmptyBlock

        public static Block newEmptyBlock​(BlockAllocator allocator,
                                          String columnName,
                                          org.apache.arrow.vector.types.pojo.ArrowType type)
        Creates a new, empty, Block with a single column.
        Parameters:
        allocator - The BlockAllocator to use when creating the Block.
        columnName - The name of the single column in the Block's Schema.
        type - The Apache Arrow Type of the column.
        Returns:
        The newly created, empty, Block with a single column Schema.
      • setComplexValue

        public static void setComplexValue​(org.apache.arrow.vector.FieldVector vector,
                                           int pos,
                                           FieldResolver resolver,
                                           Object value)
        Used to set complex values (Struct, List, etc...) on the provided FieldVector.
        Parameters:
        vector - The FieldVector into which we should write the provided value.
        pos - The row number that the value should be written to.
        resolver - The FieldResolver that can be used to map your value to the complex type (mostly for Structs, Maps).
        value - The value to write.
      • setValue

        public static void setValue​(org.apache.arrow.vector.FieldVector vector,
                                    int pos,
                                    Object value)
        Used to set values (Int, BigInt, Bit, etc...) on the provided FieldVector.
        Parameters:
        vector - The FieldVector into which we should write the provided value.
        pos - The row number that the value should be written to.
        value - The value to write.
      • rowToString

        public static String rowToString​(Block block,
                                         int row)
        Used to convert a specific row in the provided Block to a human readable string. This is useful for diagnostic logging.
        Parameters:
        block - The Block to read the row from.
        row - The row number to read.
        Returns:
        The human readable String representation of the requested row.
      • fieldToString

        public static String fieldToString​(org.apache.arrow.vector.complex.reader.FieldReader reader)
        Used to convert a single cell for the given FieldReader to a human readable string.
        Parameters:
        reader - The FieldReader from which we should read the current cell. This means the position to be read should have been set on the reader before calling this method.
        Returns:
        The human readable String representation of the value at the FieldReader's current position.
      • copyRows

        public static int copyRows​(Block srcBlock,
                                   Block dstBlock,
                                   int firstRow,
                                   int lastRow)
        Copies an inclusive range of rows from one block to another. Both firstRow and lastRow are part of the copied range.
        Parameters:
        srcBlock - The source Block to copy the range of rows from.
        dstBlock - The destination Block to copy the range of rows to.
        firstRow - The first row we'd like to copy (inclusive).
        lastRow - The last row we'd like to copy (inclusive).
        Returns:
        The number of rows that were copied.
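
        The inclusive-range contract above is easy to get wrong by one, so here is a minimal plain-Java model of it. This is not the SDK's implementation: the class InclusiveRangeCopy is hypothetical, it uses java.util lists in place of Block, and both appending to the destination and rejecting invalid ranges are assumptions of the sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of an inclusive row-range copy.
// It illustrates the firstRow/lastRow contract only; it is NOT BlockUtils.
final class InclusiveRangeCopy {
    private InclusiveRangeCopy() {}

    // Appends src[firstRow..lastRow] (both ends inclusive) to dst and
    // returns how many rows were appended.
    static <T> int copyRows(List<T> src, List<T> dst, int firstRow, int lastRow) {
        // Assumption of this sketch: invalid ranges are rejected up front.
        if (firstRow < 0 || lastRow >= src.size() || firstRow > lastRow) {
            throw new IllegalArgumentException("invalid inclusive range");
        }
        int copied = 0;
        for (int row = firstRow; row <= lastRow; row++) { // note <=, not <
            dst.add(src.get(row));
            copied++;
        }
        return copied;
    }

    public static void main(String[] args) {
        List<String> src = Arrays.asList("r0", "r1", "r2", "r3", "r4");
        List<String> dst = new ArrayList<>();
        int copied = copyRows(src, dst, 1, 3);
        // Prints: 3 rows copied: [r1, r2, r3]
        System.out.println(copied + " rows copied: " + dst);
    }
}
```

        With an inclusive range the returned count is lastRow - firstRow + 1, so copying rows 1 through 3 yields 3 rows, and a range where firstRow equals lastRow copies exactly one row.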
      • isNullRow

        public static boolean isNullRow​(Block block,
                                        int row)
        Checks if a row is null by checking that all fields in that row are null (aka not set).
        Parameters:
        block - The Block we'd like to check.
        row - The row number we'd like to check.
        Returns:
        True if the entire row is null (aka all fields null/unset), False if any field has a non-null value.
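
        The null-row rule described above (a row is null only when every field in it is null) can be modeled in plain Java as follows. The class NullRowCheck and its column-map representation are hypothetical stand-ins for a Block, used only to illustrate the rule; they are not part of the SDK.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of the "all fields null" rule.
// Each map entry is one column; list index is the row number.
final class NullRowCheck {
    private NullRowCheck() {}

    // Returns true only if every column holds null at the given row.
    static boolean isNullRow(Map<String, List<Object>> columns, int row) {
        for (List<Object> column : columns.values()) {
            if (column.get(row) != null) {
                return false; // one set field is enough to make the row non-null
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        columns.put("id", Arrays.<Object>asList(null, 7));
        columns.put("name", Arrays.<Object>asList(null, null));
        System.out.println(isNullRow(columns, 0)); // prints: true
        System.out.println(isNullRow(columns, 1)); // prints: false
    }
}
```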
      • writeList

        protected static void writeList​(org.apache.arrow.memory.BufferAllocator allocator,
                                        org.apache.arrow.vector.complex.writer.FieldWriter writer,
                                        org.apache.arrow.vector.types.pojo.Field field,
                                        int pos,
                                        Iterable value,
                                        FieldResolver resolver)
        Used to write a List value.
        Parameters:
        allocator - The BufferAllocator which can be used to generate Apache Arrow Buffers for types which require conversion to an Arrow Buffer before they can be written using the FieldWriter.
        writer - The FieldWriter for the List field we'd like to write into.
        field - The Schema details of the List Field we are writing into.
        pos - The position (row) in the Apache Arrow batch we are writing to.
        value - An Iterable over the collection of values we want to write into the row.
        resolver - The field resolver that can be used to extract individual values from the value Iterable.
      • writeStruct

        protected static void writeStruct​(org.apache.arrow.memory.BufferAllocator allocator,
                                          org.apache.arrow.vector.complex.writer.BaseWriter.StructWriter writer,
                                          org.apache.arrow.vector.types.pojo.Field field,
                                          int pos,
                                          Object value,
                                          FieldResolver resolver)
        Used to write a Struct value.
        Parameters:
        allocator - The BufferAllocator which can be used to generate Apache Arrow Buffers for types which require conversion to an Arrow Buffer before they can be written using the FieldWriter.
        writer - The FieldWriter for the Struct field we'd like to write into.
        field - The Schema details of the Struct Field we are writing into.
        pos - The position (row) in the Apache Arrow batch we are writing to.
        value - The value we'd like to write as a struct.
        resolver - The field resolver that can be used to extract individual Struct fields from the value.
      • writeMap

        protected static void writeMap​(org.apache.arrow.memory.BufferAllocator allocator,
                                       org.apache.arrow.vector.complex.writer.BaseWriter.MapWriter writer,
                                       org.apache.arrow.vector.types.pojo.Field field,
                                       int pos,
                                       Object value,
                                       FieldResolver resolver)
        Used to write a Map value.
        Parameters:
        allocator - The BufferAllocator which can be used to generate Apache Arrow Buffers for types which require conversion to an Arrow Buffer before they can be written using the FieldWriter.
        writer - The FieldWriter for the Map field we'd like to write into.
        field - The Schema details of the Map Field we are writing into.
        pos - The position (row) in the Apache Arrow batch we are writing to.
        value - The value we'd like to write as a Map.
        resolver - The field resolver that can be used to extract individual Map entries from the value.
      • writeAllValue

        protected static void writeAllValue​(org.apache.arrow.vector.complex.writer.FieldWriter writer,
                                            org.apache.arrow.vector.types.pojo.Field field,
                                            org.apache.arrow.memory.BufferAllocator allocator,
                                            int pos,
                                            FieldResolver resolver,
                                            Object value,
                                            boolean fromMapOrStruct)
        Parameters:
        writer - The FieldWriter for the field we'd like to write into.
        field - The Schema details of the Field we are writing into.
        allocator - The BufferAllocator which can be used to generate Apache Arrow Buffers for types which require conversion to an Arrow Buffer before they can be written using the FieldWriter.
        pos - The position (row) in the Apache Arrow batch we are writing to.
        resolver - The field resolver that can be used to extract individual fields from the value.
        value - The value we'd like to write.
        fromMapOrStruct - Whether the field belongs to a Map or Struct.
      • writeSimpleValue

        protected static void writeSimpleValue​(org.apache.arrow.vector.complex.writer.FieldWriter writer,
                                               org.apache.arrow.vector.types.pojo.Field field,
                                               org.apache.arrow.memory.BufferAllocator allocator,
                                               Object value,
                                               boolean fromMapOrStruct)
        Used to write an individual value into a field. Multiple calls to this method per cell are expected in order to write the N values of a list of size N.
        Parameters:
        writer - The FieldWriter (already positioned at the row) that we want to write into.
        field - The concrete type of the values.
        allocator - The BufferAllocator that can be used for allocating Arrow Buffers for fields which require conversion to an Arrow Buffer before being written.
        value - The value to write.
        fromMapOrStruct - Whether the simple value is being written as part of a Map or Struct (true) or as a standalone value (false).
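
        The "N calls for a list of size N" pattern mentioned above can be illustrated with a small plain-Java model. RecordingWriter is a hypothetical stand-in for a positioned Arrow list writer; it only records the order of calls. The startList/endList bracketing mirrors how Arrow list writers are typically driven, but whether BlockUtils brackets its writes exactly this way is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: one list cell is written as a bracketed
// sequence of per-element simple-value writes.
final class ListCellWriteSketch {
    private ListCellWriteSketch() {}

    // Stand-in for a writer already positioned at the target row.
    static final class RecordingWriter {
        final List<String> events = new ArrayList<>();
        void startList() { events.add("startList"); }
        void writeSimpleValue(Object value) { events.add("write:" + value); }
        void endList() { events.add("endList"); }
    }

    // Writes every element of values into a single cell and returns
    // the number of per-element write calls that were made.
    static int writeListCell(RecordingWriter writer, Iterable<?> values) {
        writer.startList();
        int written = 0;
        for (Object value : values) {
            writer.writeSimpleValue(value); // one call per list element
            written++;
        }
        writer.endList();
        return written;
    }

    public static void main(String[] args) {
        RecordingWriter writer = new RecordingWriter();
        int calls = writeListCell(writer, Arrays.asList("a", "b", "c"));
        // Prints: 3 [startList, write:a, write:b, write:c, endList]
        System.out.println(calls + " " + writer.events);
    }
}
```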
      • unsetRow

        public static void unsetRow​(int row,
                                    Block block)
        In some filtering situations it can be useful to 'unset' a row as an indication to a later processing stage that the row is irrelevant. The mechanism by which we 'unset' a row is actually field type specific and as such this method is not supported for all field types.
        Parameters:
        row - The row number to unset in the provided Block.
        block - The Block where we'd like to unset the specified row.
      • getJavaType

        public static Class getJavaType​(org.apache.arrow.vector.types.Types.MinorType minorType)