Class HbaseSchemaUtils
- java.lang.Object
-
- com.amazonaws.athena.connectors.hbase.HbaseSchemaUtils
-
public class HbaseSchemaUtils extends Object
A collection of utilities for HBase schema inference, type conversion, and naming conversion.
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- protected static String ROW_COLUMN_NAME
-
Method Summary
Methods (Modifier and Type, Method, Description):
- static Object coerceType(boolean isNative, org.apache.arrow.vector.types.pojo.ArrowType type, byte[] value)
Helper that coerces the given HBase value to the requested Apache Arrow type.
- static String[] extractColumnParts(String glueColumnName)
Helper that converts a Glue/Apache Arrow column name into its HBase family and column.
- static org.apache.arrow.vector.types.pojo.Schema inferSchema(HBaseConnection client, org.apache.hadoop.hbase.TableName tableName, int numToScan)
Produces an Apache Arrow Schema for the given TableName and HBase connection by scanning up to the requested number of rows and using basic schema inference to determine data types.
- static org.apache.arrow.vector.types.Types.MinorType inferType(String strVal)
Given a value from HBase, attempts to infer its type.
- static byte[] toBytes(boolean isNative, Object value)
Converts Apache Arrow typed values to HBase values.
-
-
-
Field Detail
-
ROW_COLUMN_NAME
protected static final String ROW_COLUMN_NAME
- See Also:
- Constant Field Values
-
-
Method Detail
-
inferSchema
public static org.apache.arrow.vector.types.pojo.Schema inferSchema(HBaseConnection client, org.apache.hadoop.hbase.TableName tableName, int numToScan) throws IOException
This method will produce an Apache Arrow Schema for the given TableName and HBase connection by scanning up to the requested number of rows and using basic schema inference to determine data types.
- Parameters:
client - The HBase connection to use for the scan operation.
tableName - The HBase TableName for which to produce an Apache Arrow Schema.
numToScan - The number of records to scan as part of producing the Schema.
- Returns:
- An Apache Arrow Schema representing the schema of the HBase table.
- Throws:
IOException
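The scan-and-infer idea described for inferSchema can be illustrated with a minimal, self-contained sketch. It does not use the connector or HBase APIs. The class SchemaScanSketch, the use of plain maps in place of HBase rows, the string type names, and the rule that conflicting observations fall back to VARCHAR are all assumptions made for illustration, not the documented behavior of this method.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of schema inference by sampling rows; not the connector's code. */
class SchemaScanSketch {
    /** Assumed fallback when a column shows more than one inferred type. */
    static final String FALLBACK_TYPE = "VARCHAR";

    /**
     * Each row maps a "family:column" name to its string value.
     * typeOf is a caller-supplied classifier that names the type of one value.
     */
    static Map<String, String> inferSchema(List<Map<String, String>> rows,
                                           int numToScan,
                                           Function<String, String> typeOf) {
        Map<String, String> schema = new LinkedHashMap<>();
        int scanned = 0;
        for (Map<String, String> row : rows) {
            if (scanned >= numToScan) {
                break; // scan at most numToScan rows
            }
            scanned++;
            for (Map.Entry<String, String> cell : row.entrySet()) {
                String seen = typeOf.apply(cell.getValue());
                // Keep the type while observations agree, otherwise fall back.
                schema.merge(cell.getKey(), seen,
                        (prev, next) -> prev.equals(next) ? prev : FALLBACK_TYPE);
            }
        }
        return schema;
    }
}
```

In this sketch a smaller numToScan is cheaper but can miss columns or values that only appear in later rows, which is the usual trade-off with sampled schema inference.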
-
inferType
public static org.apache.arrow.vector.types.Types.MinorType inferType(String strVal)
Given a value from HBase, attempts to infer its type.
- Parameters:
strVal - An HBase value.
- Returns:
- The Apache Arrow Minor Type most closely associated with the provided value.
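One plausible way to infer a type from a string value is to try progressively looser parses. The sketch below is an assumption about how such inference could work, not a description of what inferType actually does. The class TypeInferenceSketch and its SketchType enum are hypothetical stand-ins for the Arrow MinorType values.

```java
/** Hypothetical sketch of string-based type inference; not the connector's code. */
class TypeInferenceSketch {
    /** Stand-in for a few Arrow minor types. */
    enum SketchType { BIGINT, FLOAT8, VARCHAR }

    static SketchType inferType(String strVal) {
        if (strVal == null) {
            return SketchType.VARCHAR; // nothing to parse, treat as text
        }
        try {
            Long.parseLong(strVal); // whole numbers first
            return SketchType.BIGINT;
        } catch (NumberFormatException ignored) {
            // not an integer, try a floating point parse next
        }
        try {
            Double.parseDouble(strVal);
            return SketchType.FLOAT8;
        } catch (NumberFormatException ignored) {
            // not numeric at all
        }
        return SketchType.VARCHAR;
    }
}
```

The order matters in this sketch: every value that parses as a long would also parse as a double, so the narrower check runs first.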
-
coerceType
public static Object coerceType(boolean isNative, org.apache.arrow.vector.types.pojo.ArrowType type, byte[] value)
Helper that coerces the given HBase value to the requested Apache Arrow type.
- Parameters:
isNative - If true, the HBase value is stored using native bytes. If false, the value is serialized as a String.
type - The Apache Arrow type that the value should be coerced to before returning.
value - The HBase value to coerce.
- Returns:
- The coerced value, which is now compatible with the provided Apache Arrow type.
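The difference between native and String-serialized storage can be sketched as follows. This is an illustration only: the class CoerceSketch and its SketchType enum are hypothetical, and the sketch assumes that native values are fixed-width big-endian bytes and that serialized values are UTF-8 text. Neither assumption is confirmed by this page.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of decoding HBase bytes; not the connector's code. */
class CoerceSketch {
    /** Stand-in for a few Arrow types. */
    enum SketchType { BIGINT, FLOAT8, VARCHAR }

    static Object coerce(boolean isNative, SketchType type, byte[] value) {
        if (value == null) {
            return null;
        }
        switch (type) {
            case BIGINT:
                if (isNative) {
                    // Assumed layout: 8 bytes, big-endian (ByteBuffer's default order).
                    return ByteBuffer.wrap(value).getLong();
                }
                return Long.parseLong(new String(value, StandardCharsets.UTF_8));
            case FLOAT8:
                if (isNative) {
                    return ByteBuffer.wrap(value).getDouble();
                }
                return Double.parseDouble(new String(value, StandardCharsets.UTF_8));
            case VARCHAR:
                // Text is decoded the same way in both modes in this sketch.
                return new String(value, StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }
}
```

With isNative set to false, the bytes of the text "42" decode to the long 42. With isNative set to true, the same long is expected as eight raw bytes.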
-
extractColumnParts
public static String[] extractColumnParts(String glueColumnName)
Helper that converts a Glue/Apache Arrow column name into its HBase family and column.
- Parameters:
glueColumnName - The input column name in the format "family:column".
- Returns:
- The HBase family and column extracted from the column name.
-
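Splitting a "family:column" name can be sketched in a few lines. The class ColumnNameSketch is hypothetical. Returning the family at index 0 and the column at index 1, splitting at the first colon, and rejecting names without a colon are assumptions of this sketch rather than documented behavior of extractColumnParts.

```java
/** Hypothetical sketch of splitting a "family:column" name; not the connector's code. */
class ColumnNameSketch {
    static String[] split(String glueColumnName) {
        int separator = glueColumnName.indexOf(':');
        if (separator < 0) {
            throw new IllegalArgumentException(
                    "Expected family:column but got: " + glueColumnName);
        }
        // Assumed ordering: family first, then column.
        return new String[] {
            glueColumnName.substring(0, separator),
            glueColumnName.substring(separator + 1)
        };
    }
}
```

Splitting at the first colon keeps any later colons inside the column part, so "info:a:b" yields the family "info" and the column "a:b" in this sketch.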
toBytes
public static byte[] toBytes(boolean isNative, Object value)
Used to convert from Apache Arrow typed values to HBase values.
- Parameters:
isNative - If true, the HBase value should be stored using native bytes. If false, the value should be serialized as a String before storing it.
value - The value to convert.
- Returns:
- The HBase byte representation of the value.
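The effect of the isNative flag on encoding can be sketched as follows. The class ToBytesSketch is hypothetical and covers only a few value types. The sketch assumes that native encoding means fixed-width big-endian bytes and that String serialization uses UTF-8, which this page does not confirm.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of encoding values as HBase bytes; not the connector's code. */
class ToBytesSketch {
    static byte[] toBytes(boolean isNative, Object value) {
        if (value == null) {
            return null;
        }
        if (!isNative || value instanceof String) {
            // Serialized mode: store the value's text form.
            return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
        }
        if (value instanceof Long) {
            return ByteBuffer.allocate(Long.BYTES).putLong((Long) value).array();
        }
        if (value instanceof Integer) {
            return ByteBuffer.allocate(Integer.BYTES).putInt((Integer) value).array();
        }
        if (value instanceof Double) {
            return ByteBuffer.allocate(Double.BYTES).putDouble((Double) value).array();
        }
        throw new IllegalArgumentException(
                "Unsupported native type: " + value.getClass().getName());
    }
}
```

In this sketch the long 42 becomes the two text bytes "42" when isNative is false and eight raw bytes when isNative is true, so readers and writers need to agree on the flag.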
-
-