Class Block
- java.lang.Object
  - com.amazonaws.athena.connector.lambda.data.SchemaAware
    - com.amazonaws.athena.connector.lambda.data.Block

All Implemented Interfaces: AutoCloseable
public class Block extends SchemaAware implements AutoCloseable
This class provides a convenient interface for reading and writing Apache Arrow batches. It is mostly a holder for an Apache Arrow Schema and the associated VectorSchemaRoot, which is used for reads and writes. The class also includes helper functions for loading and unloading data in the form of Arrow record batches.
Method Summary
Modifier and Type | Method | Description
--- | --- | ---
void | close() | Frees all Apache Arrow buffers and resources associated with this Block.
void | constrain(ConstraintEvaluator constraintEvaluator) | Used to constrain writes to the Block.
boolean | equals(Object o) | Provides some basic equality checking for a Block.
boolean | equalsAsSet(Object o) | Provides some basic equality checking for a Block, ignoring ordering.
String | getAllocatorId() |
ConstraintEvaluator | getConstraintEvaluator() | Returns the ConstraintEvaluator used by the Block.
org.apache.arrow.vector.complex.reader.FieldReader | getFieldReader(String fieldName) | Provides access to the Apache Arrow FieldReader for the given field name.
List<org.apache.arrow.vector.complex.reader.FieldReader> | getFieldReaders() | Provides access to the list of all top-level FieldReaders in this Block.
org.apache.arrow.vector.FieldVector | getFieldVector(String fieldName) | Provides access to the Apache Arrow FieldVector which can be used to write values for the given field name.
List<org.apache.arrow.vector.FieldVector> | getFieldVectors() | Provides access to the list of all top-level FieldVectors in this Block.
org.apache.arrow.vector.ipc.message.ArrowRecordBatch | getRecordBatch() | Used to unload the Apache Arrow data in this Block in preparation for serialization.
int | getRowCount() | Returns the current row count as set by calling setRowCount(...).
org.apache.arrow.vector.types.pojo.Schema | getSchema() |
long | getSize() | Calculates the current used size in bytes of all Apache Arrow buffers that comprise the row data for this Block.
protected org.apache.arrow.vector.VectorSchemaRoot | getVectorSchema() | Provides access to the Apache Arrow VectorSchemaRoot when direct access to Apache Arrow is required.
int | hashCode() | Provides some basic hashcode capabilities for the Block.
protected org.apache.arrow.vector.types.pojo.Schema | internalGetSchema() | Provides access to the Schema object.
void | loadRecordBatch(org.apache.arrow.vector.ipc.message.ArrowRecordBatch batch) | Used to load Apache Arrow data into this Block after it has been deserialized.
boolean | offerComplexValue(String fieldName, int row, FieldResolver fieldResolver, Object value) | Attempts to set the provided value for the given field name and row.
boolean | offerValue(String fieldName, int row, Object value) | Attempts to write the provided value to the specified field on the specified row.
boolean | setComplexValue(String fieldName, int row, FieldResolver fieldResolver, Object value) | Attempts to set the provided value for the given field name and row.
void | setRowCount(int rowCount) | Sets the valid row count on the underlying Apache Arrow VectorSchemaRoot.
boolean | setValue(String fieldName, int row, Object value) | Writes the provided value to the specified field on the specified row.
String | toString() |
-
Methods inherited from class com.amazonaws.athena.connector.lambda.data.SchemaAware
getFields, getMetaData, getMetaData
Constructor Detail
-
Block
public Block(String allocatorId, org.apache.arrow.vector.types.pojo.Schema schema, org.apache.arrow.vector.VectorSchemaRoot vectorSchema)
Used by a BlockAllocator to construct a Block by setting the key values that a Block holds. Most of the meaningful construction takes place within the BlockAllocator that calls this constructor.
Parameters:
- allocatorId - Identifier of the BlockAllocator that owns the Block's memory resources.
- schema - The schema of the data that can be read from or written to the provided VectorSchemaRoot.
- vectorSchema - Used to read and write values from the Apache Arrow memory buffers owned by this object.
Method Detail
-
constrain
public void constrain(ConstraintEvaluator constraintEvaluator)
Used to constrain writes to the Block.
Parameters:
- constraintEvaluator - The ConstraintEvaluator used to check whether a value should be allowed to be written to the Block.
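The constraint hook can be pictured as a predicate that is consulted before every write. The sketch below is a hypothetical, self-contained model rather than the SDK's implementation: ConstrainedColumns and its methods are invented for illustration, and a java.util.function.BiPredicate stands in for ConstraintEvaluator, whose real interface is not described on this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical model of constrained writes; not the Athena SDK's Block.
// The BiPredicate stands in for ConstraintEvaluator (an assumption).
class ConstrainedColumns {
    private final Map<String, List<Object>> columns = new HashMap<>();
    private BiPredicate<String, Object> constraint; // null means every value is accepted

    ConstrainedColumns(String... fieldNames) {
        for (String name : fieldNames) {
            columns.put(name, new ArrayList<>());
        }
    }

    void constrain(BiPredicate<String, Object> constraint) {
        this.constraint = constraint;
    }

    // Returns false and writes nothing when the constraint rejects the value.
    boolean setValue(String fieldName, int row, Object value) {
        if (constraint != null && !constraint.test(fieldName, value)) {
            return false;
        }
        List<Object> column = columns.get(fieldName);
        if (column == null) {
            throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
        while (column.size() <= row) {
            column.add(null); // grow the column until index `row` exists
        }
        column.set(row, value);
        return true;
    }

    Object get(String fieldName, int row) {
        List<Object> column = columns.get(fieldName);
        return row < column.size() ? column.get(row) : null;
    }
}

class ConstrainDemo {
    public static void main(String[] args) {
        ConstrainedColumns block = new ConstrainedColumns("year");
        // Accept only years >= 2020, similar in spirit to a pushed-down filter.
        block.constrain((field, value) -> !"year".equals(field) || (Integer) value >= 2020);
        System.out.println(block.setValue("year", 0, 2021)); // true
        System.out.println(block.setValue("year", 1, 2019)); // false
    }
}
```

A rejected write leaves the row slot untouched, which is why callers are expected to check the boolean result rather than assume every value landed.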
-
getConstraintEvaluator
public ConstraintEvaluator getConstraintEvaluator()
Returns the ConstraintEvaluator used by the block.
-
getAllocatorId
public String getAllocatorId()
-
getSchema
public org.apache.arrow.vector.types.pojo.Schema getSchema()
-
setValue
public boolean setValue(String fieldName, int row, Object value)
Writes the provided value to the specified field on the specified row. This method does not update the row count on the underlying Apache Arrow VectorSchemaRoot. You must call setRowCount(...) to ensure that the values you have written are considered valid rows and are therefore available when you serialize this Block. This method relies on BlockUtils' field conversion/coercion logic to convert the provided value into a type that matches Apache Arrow's supported serialization format. For more details on coercion, see BlockUtils.
Parameters:
- fieldName - The name of the field you wish to write to.
- row - The row number to write to. Rows are numbered from 0, just like a typical array.
- value - The value you wish to write.
Returns:
- True if the value was written to the Block, False if the value was not written because it failed a constraint.
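The "write values, then publish the row count" contract can be modeled in plain Java. This is a hypothetical sketch, not the SDK's Block: RowCountedColumns and validRows() are invented names, and the sketch only assumes what the text above states, namely that writes do not change the row count and that only rows below the published count are treated as valid (in Arrow terms, the count set through VectorSchemaRoot.setRowCount(int)).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the "write values, then publish the row count" contract.
// Not the SDK's Block.
class RowCountedColumns {
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private int rowCount = 0;

    RowCountedColumns(String... fieldNames) {
        for (String name : fieldNames) {
            columns.put(name, new ArrayList<>());
        }
    }

    // Writes a value but deliberately leaves rowCount unchanged.
    void setValue(String fieldName, int row, Object value) {
        List<Object> column = columns.get(fieldName);
        while (column.size() <= row) {
            column.add(null);
        }
        column.set(row, value);
    }

    void setRowCount(int rowCount) {
        this.rowCount = rowCount;
    }

    int getRowCount() {
        return rowCount;
    }

    // Only rows [0, rowCount) are valid, i.e. what a serializer would see.
    List<List<Object>> validRows() {
        List<List<Object>> rows = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            List<Object> row = new ArrayList<>();
            for (List<Object> column : columns.values()) {
                row.add(r < column.size() ? column.get(r) : null);
            }
            rows.add(row);
        }
        return rows;
    }
}

class RowCountDemo {
    public static void main(String[] args) {
        RowCountedColumns block = new RowCountedColumns("id", "name");
        block.setValue("id", 0, 1);
        block.setValue("name", 0, "alice");
        block.setValue("id", 1, 2);
        block.setValue("name", 1, "bob");
        System.out.println(block.validRows().size()); // 0: rows written but not yet valid
        block.setRowCount(2);
        System.out.println(block.validRows()); // [[1, alice], [2, bob]]
    }
}
```

Forgetting the final setRowCount call is the failure mode this models: the data is in the buffers, but nothing is published.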
-
offerValue
public boolean offerValue(String fieldName, int row, Object value)
Attempts to write the provided value to the specified field on the specified row. This method does not update the row count on the underlying Apache Arrow VectorSchemaRoot. You must call setRowCount(...) to ensure that the values you have written are considered valid rows and are therefore available when you serialize this Block. This method relies on BlockUtils' field conversion/coercion logic to convert the provided value into a type that matches Apache Arrow's supported serialization format. For more details on coercion, see BlockUtils.
Parameters:
- fieldName - The name of the field you wish to write to.
- row - The row number to write to. Rows are numbered from 0, just like a typical array.
- value - The value you wish to write.
Returns:
- True if the value was written to the Block, or if the field is not part of the Block's schema (in which case nothing is written); False if the value was not written because it failed a constraint.
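One plausible use of offerValue is a connector that offers every column of a source record without first checking which columns the query projected. The hypothetical sketch below models that behavior with invented names (OfferingColumns, hasField); it checks the constraint before looking up the field, an ordering this page does not specify, and again uses a BiPredicate as a stand-in for ConstraintEvaluator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical model of "offer" semantics; not the SDK's Block.
class OfferingColumns {
    private final Map<String, List<Object>> columns = new HashMap<>();
    private final BiPredicate<String, Object> constraint;

    OfferingColumns(BiPredicate<String, Object> constraint, String... fieldNames) {
        this.constraint = constraint;
        for (String name : fieldNames) {
            columns.put(name, new ArrayList<>());
        }
    }

    // False only when the constraint rejects the value. A field that is not in the
    // schema is silently skipped and still reported as true.
    boolean offerValue(String fieldName, int row, Object value) {
        if (!constraint.test(fieldName, value)) {
            return false;
        }
        List<Object> column = columns.get(fieldName);
        if (column == null) {
            return true; // nothing to write: field not in this block's schema
        }
        while (column.size() <= row) {
            column.add(null);
        }
        column.set(row, value);
        return true;
    }

    boolean hasField(String fieldName) {
        return columns.containsKey(fieldName);
    }

    Object get(String fieldName, int row) {
        List<Object> column = columns.get(fieldName);
        return row < column.size() ? column.get(row) : null;
    }
}

class OfferDemo {
    public static void main(String[] args) {
        // The block only holds the projected column "id"; every source column is offered.
        OfferingColumns block = new OfferingColumns((field, value) -> true, "id");
        Map<String, Object> sourceRecord = Map.of("id", 7, "comment", "not projected");
        for (Map.Entry<String, Object> e : sourceRecord.entrySet()) {
            block.offerValue(e.getKey(), 0, e.getValue());
        }
        System.out.println(block.get("id", 0)); // 7
        System.out.println(block.hasField("comment")); // false
    }
}
```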
-
setComplexValue
public boolean setComplexValue(String fieldName, int row, FieldResolver fieldResolver, Object value)
Attempts to set the provided value for the given field name and row. If the Block's schema does not contain such a field, this method does nothing and returns false.
Parameters:
- fieldName - The name of the field you wish to write to.
- row - The row number to write to. Rows are numbered from 0, just like a typical array.
- value - The value you wish to write.
Returns:
- True if the value was written to the Block, False if the value was not written because it failed a constraint.
-
offerComplexValue
public boolean offerComplexValue(String fieldName, int row, FieldResolver fieldResolver, Object value)
Attempts to set the provided value for the given field name and row. If the Block's schema does not contain such a field, this method does nothing and returns false.
Parameters:
- fieldName - The name of the field you wish to write to.
- row - The row number to write to. Rows are numbered from 0, just like a typical array.
- value - The value you wish to write.
Returns:
- True if the value was written to the Block (even if the field is missing from the Block), False if the value was not written because it failed a constraint.
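The complex-value methods take a FieldResolver alongside the value. The hypothetical sketch below illustrates the general idea of a resolver for a struct-typed field: given a child field name and the parent value, it returns that child's value, and the writer fans the parent out into per-child columns. ChildResolver, its resolve method, and StructColumn are invented for illustration and are not the SDK's FieldResolver interface or BlockUtils logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FieldResolver: given a child field name and the
// parent value, return the value for that child. Name and signature are invented.
interface ChildResolver {
    Object resolve(String childName, Object parentValue);
}

// Hypothetical struct column stored as one list per child field.
class StructColumn {
    private final Map<String, List<Object>> children = new LinkedHashMap<>();

    StructColumn(String... childNames) {
        for (String name : childNames) {
            children.put(name, new ArrayList<>());
        }
    }

    // Uses the resolver to pull each child's value out of an arbitrary parent object.
    void setComplexValue(int row, ChildResolver resolver, Object value) {
        for (Map.Entry<String, List<Object>> child : children.entrySet()) {
            List<Object> column = child.getValue();
            while (column.size() <= row) {
                column.add(null);
            }
            column.set(row, resolver.resolve(child.getKey(), value));
        }
    }

    Object getChild(String childName, int row) {
        List<Object> column = children.get(childName);
        return row < column.size() ? column.get(row) : null;
    }
}

class ComplexDemo {
    public static void main(String[] args) {
        StructColumn location = new StructColumn("lat", "lon");
        // Resolve children from a Map-shaped parent value.
        ChildResolver fromMap = (name, parent) -> ((Map<?, ?>) parent).get(name);
        location.setComplexValue(0, fromMap, Map.of("lat", 47.6, "lon", -122.3));
        System.out.println(location.getChild("lat", 0)); // 47.6
    }
}
```

Keeping the resolver separate from the writer lets the same write path accept Maps, POJOs, or driver-specific row objects, as long as a resolver for that shape is supplied.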
-
getVectorSchema
protected org.apache.arrow.vector.VectorSchemaRoot getVectorSchema()
Provides access to the Apache Arrow VectorSchemaRoot when direct access to Apache Arrow is required.
Returns:
- The Apache Arrow VectorSchemaRoot.
-
setRowCount
public void setRowCount(int rowCount)
Sets the valid row count on the underlying Apache Arrow VectorSchemaRoot.
Parameters:
- rowCount - The row count to set.
-
getRowCount
public int getRowCount()
Returns the current row count as set by calling setRowCount(...).
Returns:
- The current valid row count for the Apache Arrow VectorSchemaRoot.
-
getFieldReader
public org.apache.arrow.vector.complex.reader.FieldReader getFieldReader(String fieldName)
Provides access to the Apache Arrow FieldReader for the given field name.
Parameters:
- fieldName - The name of the field to retrieve.
Returns:
- The FieldReader that can be used to read values from the Block for the specified field.
-
getFieldVector
public org.apache.arrow.vector.FieldVector getFieldVector(String fieldName)
Provides access to the Apache Arrow FieldVector which can be used to write values for the given field name.
Parameters:
- fieldName - The name of the field to retrieve.
Returns:
- The FieldVector that can be used to write values to the Block for the specified field, or null if the field is not in this Block's Schema.
-
getFieldReaders
public List<org.apache.arrow.vector.complex.reader.FieldReader> getFieldReaders()
Provides access to the list of all top-level FieldReaders in this Block.
Returns:
- A List containing the top-level FieldReaders for this Block.
-
getSize
public long getSize()
Calculates the current used size in bytes of all Apache Arrow buffers that comprise the row data for this Block.
Returns:
- The used bytes of row data in this Block.
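To get a feel for the magnitudes getSize() reports, the sketch below estimates used bytes for nullable 32-bit integer columns from the Apache Arrow columnar layout: a data buffer of 4 bytes per value plus a validity bitmap of one bit per value, rounded up to whole bytes. It assumes that getSize() counts buffers at their used rather than allocated size; the exact accounting is up to the SDK and Arrow's vector classes. ArrowSizeEstimate is a hypothetical helper, and variable-width types such as strings, which also carry an offsets buffer, are ignored.

```java
// Back-of-the-envelope estimate of "used" bytes for nullable 32-bit integer
// columns in the Apache Arrow columnar layout: a data buffer of 4 bytes per
// value plus a validity bitmap of one bit per value, rounded up to whole bytes.
// Allocation padding and capacity beyond the value count are not counted.
final class ArrowSizeEstimate {
    private ArrowSizeEstimate() {}

    static long int32ColumnBytes(int valueCount) {
        long dataBytes = 4L * valueCount;
        long validityBytes = (valueCount + 7L) / 8; // ceil(valueCount / 8)
        return dataBytes + validityBytes;
    }

    static long int32BlockBytes(int rowCount, int columnCount) {
        return columnCount * int32ColumnBytes(rowCount);
    }

    public static void main(String[] args) {
        System.out.println(int32ColumnBytes(10));     // 42 = 40 data + 2 validity
        System.out.println(int32BlockBytes(1000, 3)); // 12375 = 3 * (4000 + 125)
    }
}
```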
-
getFieldVectors
public List<org.apache.arrow.vector.FieldVector> getFieldVectors()
Provides access to the list of all top-level FieldVectors in this Block.
Returns:
- A List containing the top-level FieldVectors for this Block.
-
getRecordBatch
public org.apache.arrow.vector.ipc.message.ArrowRecordBatch getRecordBatch()
Used to unload the Apache Arrow data in this Block in preparation for serialization.
Returns:
- An ArrowRecordBatch containing all row data in this Block, for use in serializing the Block.
-
loadRecordBatch
public void loadRecordBatch(org.apache.arrow.vector.ipc.message.ArrowRecordBatch batch)
Used to load Apache Arrow data into this Block after it has been deserialized.
Parameters:
- batch - An ArrowRecordBatch containing all row data you'd like to load into this Block.
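getRecordBatch() and loadRecordBatch(...) form the two halves of a round trip: unload the valid rows, turn them into bytes, and rebuild an equivalent Block on the other side. The hypothetical sketch below shows only the shape of that round trip. IntBatch is an invented stand-in for ArrowRecordBatch, and it uses hand-rolled DataOutputStream framing instead of the Arrow IPC format that real record batches are serialized with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for an unloaded record batch: a row count plus int columns.
// Uses DataOutputStream framing, NOT the Arrow IPC format.
final class IntBatch {
    final int rowCount;
    final int[][] columns; // columns[c][r]

    IntBatch(int rowCount, int[][] columns) {
        this.rowCount = rowCount;
        this.columns = new int[columns.length][];
        for (int c = 0; c < columns.length; c++) {
            // Keep only the valid rows, mirroring how the row count bounds what is unloaded.
            this.columns[c] = Arrays.copyOf(columns[c], rowCount);
        }
    }

    // "Unload": header (row count, column count) followed by column values.
    byte[] serialize() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(rowCount);
            out.writeInt(columns.length);
            for (int[] column : columns) {
                for (int value : column) {
                    out.writeInt(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // "Load": rebuild an equivalent batch on the receiving side.
    static IntBatch deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int rowCount = in.readInt();
            int[][] columns = new int[in.readInt()][rowCount];
            for (int[] column : columns) {
                for (int r = 0; r < rowCount; r++) {
                    column[r] = in.readInt();
                }
            }
            return new IntBatch(rowCount, columns);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean sameContent(IntBatch other) {
        return rowCount == other.rowCount && Arrays.deepEquals(columns, other.columns);
    }
}

class BatchDemo {
    public static void main(String[] args) {
        // The third slot in each column lies beyond rowCount and is dropped.
        IntBatch original = new IntBatch(2, new int[][] {{1, 2, 99}, {10, 20, 99}});
        IntBatch copy = IntBatch.deserialize(original.serialize());
        System.out.println(copy.sameContent(original)); // true
    }
}
```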
-
close
public void close() throws Exception
Frees all Apache Arrow buffers and resources associated with this Block.
Specified by:
- close in interface AutoCloseable
Throws:
- Exception
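Because Block implements AutoCloseable, a try-with-resources statement is a natural way to guarantee its buffers are freed. This matters because Arrow allocates its buffers off-heap and tracks them by reference counting, so they are not reclaimed by the garbage collector alone. With the real class, since Block.close() is declared to throw Exception, the enclosing code must either catch Exception or declare it. The sketch below uses a hypothetical BufferHolder whose close() throws no checked exception, purely to show the lifecycle.

```java
// Hypothetical stand-in for a Block that owns Arrow buffers.
class BufferHolder implements AutoCloseable {
    private byte[] buffer;

    BufferHolder(int bytes) {
        buffer = new byte[bytes];
    }

    int size() {
        if (buffer == null) {
            throw new IllegalStateException("already closed");
        }
        return buffer.length;
    }

    boolean isClosed() {
        return buffer == null;
    }

    // Release the "buffers". Unlike Block.close(), this declares no checked exception.
    @Override
    public void close() {
        buffer = null;
    }
}

class CloseDemo {
    public static void main(String[] args) {
        BufferHolder escaped = null;
        try (BufferHolder holder = new BufferHolder(1024)) {
            escaped = holder;
            System.out.println(holder.size()); // 1024
        } // holder.close() runs here, even if the body throws
        System.out.println(escaped.isClosed()); // true
    }
}
```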
-
internalGetSchema
protected org.apache.arrow.vector.types.pojo.Schema internalGetSchema()
Description copied from class: SchemaAware
Provides access to the Schema object.
Specified by:
- internalGetSchema in class SchemaAware
Returns:
- The Schema currently being used by this object.
-
equals
public boolean equals(Object o)
Provides some basic equality checking for a Block. This method has some drawbacks: it is not a deep equality check and will not work for some large, complex Blocks. At present this method is useful for testing purposes, but it may be refactored in a future release.
-
equalsAsSet
public boolean equalsAsSet(Object o)
Provides some basic equality checking for a Block, ignoring row ordering. This method has some drawbacks: it is not a deep equality check and will not work for some large, complex Blocks. At present this method is useful for testing purposes, but it may be refactored in a future release.
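The difference between equals and equalsAsSet, and one way a comparison can fail to be deep, can be illustrated with plain Java collections. RowComparison is a hypothetical helper over rows modeled as lists of values, not the Block implementation. In particular, whether Block.equalsAsSet collapses duplicate rows the way a java.util.Set does is an assumption of this sketch, and the byte[] case simply shows how element-wise equals falls back to identity for arrays.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical comparison helpers over rows represented as lists of values.
final class RowComparison {
    private RowComparison() {}

    // Order-sensitive: same rows in the same order.
    static boolean equalsInOrder(List<?> rowsA, List<?> rowsB) {
        return rowsA.equals(rowsB);
    }

    // Order-insensitive: compares rows as sets, so duplicate rows collapse.
    static boolean equalsAsSet(List<?> rowsA, List<?> rowsB) {
        return new HashSet<Object>(rowsA).equals(new HashSet<Object>(rowsB));
    }

    public static void main(String[] args) {
        List<List<Object>> a = List.of(List.<Object>of(1, "a"), List.<Object>of(2, "b"));
        List<List<Object>> b = List.of(List.<Object>of(2, "b"), List.<Object>of(1, "a"));
        System.out.println(equalsInOrder(a, b)); // false
        System.out.println(equalsAsSet(a, b));   // true

        // Shallow: byte[] uses identity equality, so equal bytes still compare unequal.
        List<List<Object>> x = List.of(List.<Object>of(new byte[] {1}));
        List<List<Object>> y = List.of(List.<Object>of(new byte[] {1}));
        System.out.println(equalsInOrder(x, y)); // false
    }
}
```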
-
hashCode
public int hashCode()
Provides some basic hashcode capabilities for the Block. This method has some drawbacks: it is difficult to maintain as new types are added, and it becomes error-prone and slow if misused. This challenge is compounded by the fact that the right and wrong ways to use it are not easy to convey.