Class Block

  • All Implemented Interfaces:
    AutoCloseable

    public class Block
    extends SchemaAware
    implements AutoCloseable
    This class provides a convenient interface for working with (reading/writing) Apache Arrow Batches. As such, this class is mostly a holder for an Apache Arrow Schema and the associated VectorSchema (used for read/write). The class also includes helper functions for easily loading/unloading data in the form of Arrow Batches.
    • Constructor Summary

      Constructors 
      Constructor Description
      Block​(String allocatorId, org.apache.arrow.vector.types.pojo.Schema schema, org.apache.arrow.vector.VectorSchemaRoot vectorSchema)
      Used by a BlockAllocator to construct a block by setting the key values that a Block 'holds'.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void close()
      Frees all Apache Arrow Buffers and resources associated with this block.
      void constrain​(ConstraintEvaluator constraintEvaluator)
      Used to constrain writes to the Block.
      boolean equals​(Object o)
      Provides some basic equality checking for a Block.
      boolean equalsAsSet​(Object o)
      Provides some basic equality checking for a Block ignoring ordering.
      String getAllocatorId()  
      ConstraintEvaluator getConstraintEvaluator()
      Returns the ConstraintEvaluator used by the block.
      org.apache.arrow.vector.complex.reader.FieldReader getFieldReader​(String fieldName)
      Provides access to the Apache Arrow FieldReader for the given field name.
      List<org.apache.arrow.vector.complex.reader.FieldReader> getFieldReaders()
      Provides access to the list of all top-level FieldReaders in this Block.
      org.apache.arrow.vector.FieldVector getFieldVector​(String fieldName)
      Provides access to the Apache Arrow FieldVector which can be used to write values for the given field name.
      List<org.apache.arrow.vector.FieldVector> getFieldVectors()
      Provides access to the list of all top-level FieldVectors in this Block.
      org.apache.arrow.vector.ipc.message.ArrowRecordBatch getRecordBatch()
      Used to unload the Apache Arrow data in this Block in preparation for Serialization.
      int getRowCount()
      Returns the current row count as set by calling setRowCount(...).
      org.apache.arrow.vector.types.pojo.Schema getSchema()  
      long getSize()
      Calculates the current used size in 'bytes' for all Apache Arrow Buffers that comprise the row data for this Block.
      protected org.apache.arrow.vector.VectorSchemaRoot getVectorSchema()
      Provides access to the Apache Arrow Vector Schema when direct access to Apache Arrow is required.
      int hashCode()
      Provides some basic hashcode capabilities for the Block.
      protected org.apache.arrow.vector.types.pojo.Schema internalGetSchema()
      Provides access to the Schema object.
      void loadRecordBatch​(org.apache.arrow.vector.ipc.message.ArrowRecordBatch batch)
      Used to load Apache Arrow data into this Block after it has been deserialized.
      boolean offerComplexValue​(String fieldName, int row, FieldResolver fieldResolver, Object value)
      Attempts to set the provided value for the given field name and row.
      boolean offerValue​(String fieldName, int row, Object value)
      Attempts to write the provided value to the specified field on the specified row.
      boolean setComplexValue​(String fieldName, int row, FieldResolver fieldResolver, Object value)
      Attempts to set the provided value for the given field name and row.
      void setRowCount​(int rowCount)
      Sets the valid row count on the underlying Apache Arrow Vector Schema.
      boolean setValue​(String fieldName, int row, Object value)
      Writes the provided value to the specified field on the specified row.
      String toString()  
    • Constructor Detail

      • Block

        public Block​(String allocatorId,
                     org.apache.arrow.vector.types.pojo.Schema schema,
                     org.apache.arrow.vector.VectorSchemaRoot vectorSchema)
        Used by a BlockAllocator to construct a block by setting the key values that a Block 'holds'. Most of the meaningful construction actually takes place within the BlockAllocator that calls this constructor.
        Parameters:
        allocatorId - Identifier of the BlockAllocator that owns the Block's memory resources.
        schema - The schema of the data that can be read/written to the provided VectorSchema.
        vectorSchema - Used to read/write values from the Apache Arrow memory buffers owned by this object.
    • Method Detail

      • constrain

        public void constrain​(ConstraintEvaluator constraintEvaluator)
        Used to constrain writes to the Block.
        Parameters:
        constraintEvaluator - The ConstraintEvaluator used to check whether a value should be allowed to be written to the Block.
      • getConstraintEvaluator

        public ConstraintEvaluator getConstraintEvaluator()
        Returns the ConstraintEvaluator used by the block.
      • getAllocatorId

        public String getAllocatorId()
      • getSchema

        public org.apache.arrow.vector.types.pojo.Schema getSchema()
      • setValue

        public boolean setValue​(String fieldName,
                                int row,
                                Object value)
        Writes the provided value to the specified field on the specified row. This method does _not_ update the row count on the underlying Apache Arrow VectorSchema. You must call setRowCount(...) to ensure the values you have written are considered 'valid rows' and thus available when you attempt to serialize this Block. This method relies on BlockUtils' field conversion/coercion logic to convert the provided value into a type that matches Apache Arrow's supported serialization format. For more details on coercion, see BlockUtils.
        Parameters:
        fieldName - The name of the field you wish to write to.
        row - The row number to write to. Note that Apache Arrow Blocks begin with row 0 just like a typical array.
        value - The value you wish to write.
        Returns:
        True if the value was written to the Block, False if the value was not written due to failing a constraint.
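        The write-then-setRowCount contract described above can be illustrated with a small stand-in. The sketch below does not use the real Block or Apache Arrow classes: ToyBlock, its list-backed columns, and the BiPredicate used as a constraint are hypothetical simplifications. It only shows that a write does not advance the row count and that a failed constraint yields false.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical stand-in for Block: columns are plain lists, not Arrow vectors.
class ToyBlock {
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private final BiPredicate<String, Object> constraint;
    private int rowCount = 0;

    ToyBlock(List<String> fieldNames, BiPredicate<String, Object> constraint) {
        for (String name : fieldNames) {
            columns.put(name, new ArrayList<>());
        }
        this.constraint = constraint;
    }

    // Writes a value if it passes the constraint; never touches rowCount.
    boolean setValue(String fieldName, int row, Object value) {
        if (!constraint.test(fieldName, value)) {
            return false;
        }
        List<Object> column = columns.get(fieldName);
        if (column == null) {
            // Assumption of this sketch: writing to an unknown field is an error.
            throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
        while (column.size() <= row) {
            column.add(null); // grow the column so that index 'row' exists
        }
        column.set(row, value);
        return true;
    }

    void setRowCount(int rowCount) { this.rowCount = rowCount; }

    int getRowCount() { return rowCount; }
}

class RowCountSketch {
    // Writes the years that pass the constraint and returns the final row count.
    static int writeYears(ToyBlock block, int[] years) {
        int row = 0;
        for (int year : years) {
            // Advance only when the write was accepted, so a rejected value does not consume a row.
            if (block.setValue("year", row, year)) {
                row++;
            }
        }
        block.setRowCount(row); // without this call the row count would still be 0
        return block.getRowCount();
    }

    public static void main(String[] args) {
        ToyBlock block = new ToyBlock(Arrays.asList("year"),
                (field, value) -> (Integer) value >= 2000);
        System.out.println(writeYears(block, new int[] {1999, 2001, 2020})); // prints 2
    }
}
```

        Using the boolean result to decide whether to advance the row index is one common pattern for filtering while writing; it is shown here as an illustration, not as a requirement of the API.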
      • offerValue

        public boolean offerValue​(String fieldName,
                                  int row,
                                  Object value)
        Attempts to write the provided value to the specified field on the specified row. This method does _not_ update the row count on the underlying Apache Arrow VectorSchema. You must call setRowCount(...) to ensure the values you have written are considered 'valid rows' and thus available when you attempt to serialize this Block. This method relies on BlockUtils' field conversion/coercion logic to convert the provided value into a type that matches Apache Arrow's supported serialization format. For more details on coercion, see BlockUtils.
        Parameters:
        fieldName - The name of the field you wish to write to.
        row - The row number to write to. Note that Apache Arrow Blocks begin with row 0 just like a typical array.
        value - The value you wish to write.
        Returns:
        True if the value was written to the Block (even if the field is missing from the Block), False if the value was not written due to failing a constraint.
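        The return contract above (false only for a failed constraint, true even when the field is missing) can be sketched with a stand-in class. OfferSketch, its array-backed columns, the read helper, and the BiPredicate constraint are hypothetical and are not part of the real Block API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical stand-in: columns are fixed-size Object arrays, not Arrow vectors.
class OfferSketch {
    private final Map<String, Object[]> columns = new HashMap<>();
    private final BiPredicate<String, Object> constraint;

    OfferSketch(int capacity, BiPredicate<String, Object> constraint, String... fieldNames) {
        for (String name : fieldNames) {
            columns.put(name, new Object[capacity]);
        }
        this.constraint = constraint;
    }

    // Mirrors the documented return contract: false only on a failed constraint,
    // true otherwise, even when the field is not part of this sketch's schema.
    boolean offerValue(String fieldName, int row, Object value) {
        if (!constraint.test(fieldName, value)) {
            return false;
        }
        Object[] column = columns.get(fieldName);
        if (column != null) {
            column[row] = value;
        }
        return true;
    }

    // Helper for the demo: returns the stored value, or null for an unknown field.
    Object read(String fieldName, int row) {
        Object[] column = columns.get(fieldName);
        return column == null ? null : column[row];
    }

    public static void main(String[] args) {
        OfferSketch block = new OfferSketch(4, (f, v) -> v != null, "id");
        System.out.println(block.offerValue("id", 0, 42));     // field present, constraint passes
        System.out.println(block.offerValue("missing", 0, 7)); // field absent, still reported as accepted
        System.out.println(block.offerValue("id", 1, null));   // constraint fails
    }
}
```

        Tolerating missing fields lets the same row-building code run against a schema that omits some columns, at the cost of silently dropping values for fields that are not present.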
      • setComplexValue

        public boolean setComplexValue​(String fieldName,
                                       int row,
                                       FieldResolver fieldResolver,
                                       Object value)
        Attempts to set the provided value for the given field name and row. If the Block's schema does not contain such a field, this method does nothing and returns false.
        Parameters:
        fieldName - The name of the field you wish to write to.
        row - The row number to write to. Note that Apache Arrow Blocks begin with row 0 just like a typical array.
        value - The value you wish to write.
        Returns:
        True if the value was written to the Block, False if the value was not written due to failing a constraint.
      • offerComplexValue

        public boolean offerComplexValue​(String fieldName,
                                         int row,
                                         FieldResolver fieldResolver,
                                         Object value)
        Attempts to set the provided value for the given field name and row. If the Block's schema does not contain such a field, this method does nothing and returns false.
        Parameters:
        fieldName - The name of the field you wish to write to.
        row - The row number to write to. Note that Apache Arrow Blocks begin with row 0 just like a typical array.
        value - The value you wish to write.
        Returns:
        True if the value was written to the Block (even if the field is missing from the Block), False if the value was not written due to failing a constraint.
      • getVectorSchema

        protected org.apache.arrow.vector.VectorSchemaRoot getVectorSchema()
        Provides access to the Apache Arrow Vector Schema when direct access to Apache Arrow is required.
        Returns:
        The Apache Arrow Vector Schema.
      • setRowCount

        public void setRowCount​(int rowCount)
        Sets the valid row count on the underlying Apache Arrow Vector Schema.
        Parameters:
        rowCount - The row count to set.
      • getRowCount

        public int getRowCount()
        Returns the current row count as set by calling setRowCount(...).
        Returns:
        The current valid row count for the Apache Arrow Vector Schema.
      • getFieldReader

        public org.apache.arrow.vector.complex.reader.FieldReader getFieldReader​(String fieldName)
        Provides access to the Apache Arrow FieldReader for the given field name.
        Parameters:
        fieldName - The name of the field to retrieve.
        Returns:
        The FieldReader that can be used to read values from the Block for the specified field.
      • getFieldVector

        public org.apache.arrow.vector.FieldVector getFieldVector​(String fieldName)
        Provides access to the Apache Arrow FieldVector which can be used to write values for the given field name.
        Parameters:
        fieldName - The name of the field to retrieve.
        Returns:
        The FieldVector that can be used to write values to the Block for the specified field, or null if the field is not in this Block's Schema.
      • getFieldReaders

        public List<org.apache.arrow.vector.complex.reader.FieldReader> getFieldReaders()
        Provides access to the list of all top-level FieldReaders in this Block.
        Returns:
        List containing the top-level FieldReaders for this block.
      • getSize

        public long getSize()
        Calculates the current used size in 'bytes' for all Apache Arrow Buffers that comprise the row data for this Block.
        Returns:
        The used bytes of row data in this Block.
      • getFieldVectors

        public List<org.apache.arrow.vector.FieldVector> getFieldVectors()
        Provides access to the list of all top-level FieldVectors in this Block.
        Returns:
        List containing the top-level FieldVectors for this block.
      • getRecordBatch

        public org.apache.arrow.vector.ipc.message.ArrowRecordBatch getRecordBatch()
        Used to unload the Apache Arrow data in this Block in preparation for Serialization.
        Returns:
        An ArrowRecordBatch containing all row data in this Block for use in serializing the Block.
      • loadRecordBatch

        public void loadRecordBatch​(org.apache.arrow.vector.ipc.message.ArrowRecordBatch batch)
        Used to load Apache Arrow data into this Block after it has been deserialized.
        Parameters:
        batch - An ArrowRecordBatch containing all row data you'd like to load into this Block.
      • close

        public void close()
                   throws Exception
        Frees all Apache Arrow Buffers and resources associated with this block.
        Specified by:
        close in interface AutoCloseable
        Throws:
        Exception
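        Because close() comes from AutoCloseable and is declared to throw Exception, a try-with-resources statement is a natural way to make sure the buffers are released. The sketch below uses a hypothetical TrackedResource in place of a real Block, since constructing a Block requires a BlockAllocator; only the try-with-resources pattern itself carries over.

```java
// Hypothetical resource standing in for a Block; it only records that close() ran.
class TrackedResource implements AutoCloseable {
    private boolean closed = false;

    boolean isClosed() { return closed; }

    @Override
    public void close() throws Exception {
        closed = true; // a real Block would free its Arrow buffers here
    }
}

class CloseSketch {
    // Uses the resource inside try-with-resources and reports whether it ended up closed.
    static boolean useAndClose(TrackedResource resource) {
        try (TrackedResource r = resource) {
            if (r.isClosed()) {
                throw new IllegalStateException("resource was already closed");
            }
            // Reads and writes through r would go here.
        } catch (Exception e) {
            // close() is declared to throw Exception, so it must be handled or propagated.
            throw new IllegalStateException("failed to release resource", e);
        }
        return resource.isClosed();
    }

    public static void main(String[] args) {
        System.out.println(useAndClose(new TrackedResource())); // prints true
    }
}
```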
      • internalGetSchema

        protected org.apache.arrow.vector.types.pojo.Schema internalGetSchema()
        Description copied from class: SchemaAware
        Provides access to the Schema object.
        Specified by:
        internalGetSchema in class SchemaAware
        Returns:
        The Schema currently being used by this object.
      • equals

        public boolean equals​(Object o)
        Provides some basic equality checking for a Block. This method has some drawbacks in that it isn't a deep equality and will not work for some large complex blocks. At present this method is useful for testing purposes but may be refactored in a future release.
        Overrides:
        equals in class Object
      • equalsAsSet

        public boolean equalsAsSet​(Object o)
        Provides some basic equality checking for a Block, ignoring ordering. This method has some drawbacks in that it isn't a deep equality and will not work for some large complex blocks. At present this method is useful for testing purposes but may be refactored in a future release.
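        The idea of order-insensitive equality can be sketched independently of Arrow. The example below is not the implementation used by Block: it models each row as a List of column values and compares the two row collections as sets. Whether the real method treats duplicate rows as a set or as a multiset is not stated here, so the set semantics are an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class RowSetEquality {
    // Compares two row collections while ignoring row order (and, in this sketch, duplicates).
    static boolean equalsAsSet(List<List<Object>> left, List<List<Object>> right) {
        return new HashSet<>(left).equals(new HashSet<>(right));
    }

    public static void main(String[] args) {
        List<List<Object>> a = Arrays.asList(
                Arrays.<Object>asList(1, "x"),
                Arrays.<Object>asList(2, "y"));
        List<List<Object>> b = Arrays.asList(
                Arrays.<Object>asList(2, "y"),
                Arrays.<Object>asList(1, "x"));
        System.out.println(a.equals(b));       // prints false: same rows, different order
        System.out.println(equalsAsSet(a, b)); // prints true: order is ignored
    }
}
```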
      • hashCode

        public int hashCode()
        Provides some basic hashcode capabilities for the Block. This method has some drawbacks in that it is difficult to maintain as we add new types, and it becomes error-prone and slow if misused. This challenge is compounded by the fact that the right and wrong ways to use it are not easy to convey.
        Overrides:
        hashCode in class Object