Class HbaseSchemaUtils


  • public class HbaseSchemaUtils
    extends Object
    Collection of helpful utilities that handle HBase schema inference, type, and naming conversion.
    • Method Summary

      Modifier and Type Method Description
      static Object coerceType​(boolean isNative, org.apache.arrow.vector.types.pojo.ArrowType type, byte[] value)
      Helper that can coerce the given HBase value to the requested Apache Arrow type.
      static String[] extractColumnParts​(String glueColumnName)
      Helper which can go from a Glue/Apache Arrow column name to its HBase family + column.
      static org.apache.hadoop.hbase.TableName getQualifiedTable​(TableName tableName)
      Helper which goes from an Athena Federation SDK TableName to an HBase TableName.
      static String getQualifiedTableName​(TableName tableName)
      Helper which goes from an Athena Federation SDK TableName to an HBase table name string.
      static org.apache.arrow.vector.types.pojo.Schema inferSchema​(HBaseConnection client, TableName tableName, int numToScan)
      This method will produce an Apache Arrow Schema for the given TableName and HBase connection by scanning up to the requested number of rows and using basic schema inference to determine data types.
      static org.apache.arrow.vector.types.Types.MinorType inferType​(String strVal)
      Given a value from HBase, attempt to infer its type.
      static byte[] toBytes​(boolean isNative, Object value)
      Used to convert from Apache Arrow typed values to HBase values.
    • Method Detail

      • inferSchema

        public static org.apache.arrow.vector.types.pojo.Schema inferSchema​(HBaseConnection client,
                                                                            TableName tableName,
                                                                            int numToScan)
        This method will produce an Apache Arrow Schema for the given TableName and HBase connection by scanning up to the requested number of rows and using basic schema inference to determine data types.
        Parameters:
        client - The HBase connection to use for the scan operation.
        tableName - The HBase TableName for which to produce an Apache Arrow Schema.
        numToScan - The number of records to scan as part of producing the Schema.
        Returns:
        An Apache Arrow Schema representing the schema of the HBase table.
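The scan-and-infer idea described above can be sketched without HBase or Arrow on the classpath. This is a minimal illustration, not the library's implementation: the in-memory rows, the `SketchType` enum, the integer-then-float-then-string inference order, and the rule that conflicting observations fall back to `VARCHAR` are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InferSchemaSketch {
    // Hypothetical stand-in for the Arrow minor types; not the real Arrow enum.
    enum SketchType { BIGINT, FLOAT8, VARCHAR }

    // Assumed inference order: integer first, then floating point, else string.
    static SketchType inferType(String strVal) {
        if (strVal == null) {
            return SketchType.VARCHAR;
        }
        try {
            Long.parseLong(strVal);
            return SketchType.BIGINT;
        } catch (NumberFormatException ex) {
            // Not an integer; try the next candidate type.
        }
        try {
            Double.parseDouble(strVal);
            return SketchType.FLOAT8;
        } catch (NumberFormatException ex) {
            // Not a number at all.
        }
        return SketchType.VARCHAR;
    }

    // Each row maps "family:column" to a string value. At most numToScan rows are read.
    static Map<String, SketchType> inferSchema(List<Map<String, String>> rows, int numToScan) {
        Map<String, SketchType> schema = new LinkedHashMap<>();
        int scanned = 0;
        for (Map<String, String> row : rows) {
            if (scanned >= numToScan) {
                break;
            }
            for (Map.Entry<String, String> cell : row.entrySet()) {
                SketchType inferred = inferType(cell.getValue());
                SketchType previous = schema.get(cell.getKey());
                // Assumption: columns with conflicting observations become VARCHAR.
                boolean consistent = previous == null || previous == inferred;
                schema.put(cell.getKey(), consistent ? inferred : SketchType.VARCHAR);
            }
            scanned++;
        }
        return schema;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("info:age", "30"),
                Map.of("info:age", "unknown"));
        System.out.println(inferSchema(rows, 1).get("info:age")); // BIGINT
        System.out.println(inferSchema(rows, 2).get("info:age")); // VARCHAR
    }
}
```

The example shows why `numToScan` matters: a small sample can settle on a narrower type than the full table would support.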
      • getQualifiedTableName

        public static String getQualifiedTableName​(TableName tableName)
        Helper which goes from an Athena Federation SDK TableName to an HBase table name string.
        Parameters:
        tableName - An Athena Federation SDK TableName.
        Returns:
        The corresponding HBase table name string.
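As a rough sketch of the conversion, assuming the Athena schema name maps to the HBase namespace and the two parts are joined with HBase's `:` namespace delimiter (the page itself does not state the exact format), the string form could be built as follows. The class and method names are hypothetical.

```java
final class QualifiedNameSketch {
    // Assumption: the result has the form "<namespace>:<table>".
    static String qualifiedName(String schemaName, String tableName) {
        return schemaName + ":" + tableName;
    }

    public static void main(String[] args) {
        System.out.println(qualifiedName("default", "users")); // default:users
    }
}
```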
      • getQualifiedTable

        public static org.apache.hadoop.hbase.TableName getQualifiedTable​(TableName tableName)
        Helper which goes from an Athena Federation SDK TableName to an HBase TableName.
        Parameters:
        tableName - An Athena Federation SDK TableName.
        Returns:
        The corresponding HBase TableName.
      • inferType

        public static org.apache.arrow.vector.types.Types.MinorType inferType​(String strVal)
        Given a value from HBase, attempt to infer its type.
        Parameters:
        strVal - An HBase value, represented as a String.
        Returns:
        The Apache Arrow Minor Type most closely associated with the provided value.
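One plausible way to infer a type from a string is to try progressively looser parses. The sketch below assumes an integer-then-float-then-string order and uses a hypothetical `SketchType` enum in place of Arrow's `Types.MinorType`; the actual rules used by `inferType` are not spelled out on this page.

```java
final class InferTypeSketch {
    // Hypothetical stand-in for the relevant Arrow minor types.
    enum SketchType { BIGINT, FLOAT8, VARCHAR }

    static SketchType inferType(String strVal) {
        if (strVal == null) {
            return SketchType.VARCHAR; // Assumption: treat missing values as strings.
        }
        try {
            Long.parseLong(strVal);
            return SketchType.BIGINT;
        } catch (NumberFormatException ex) {
            // Not an integer; fall through to the floating-point attempt.
        }
        try {
            Double.parseDouble(strVal);
            return SketchType.FLOAT8;
        } catch (NumberFormatException ex) {
            // Not numeric; fall through to the string default.
        }
        return SketchType.VARCHAR;
    }

    public static void main(String[] args) {
        System.out.println(inferType("42"));    // BIGINT
        System.out.println(inferType("3.14"));  // FLOAT8
        System.out.println(inferType("hello")); // VARCHAR
    }
}
```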
      • coerceType

        public static Object coerceType​(boolean isNative,
                                        org.apache.arrow.vector.types.pojo.ArrowType type,
                                        byte[] value)
        Helper that can coerce the given HBase value to the requested Apache Arrow type.
        Parameters:
        isNative - If true, the HBase value is stored using native bytes. If false, the value is serialized as a String.
        type - The Apache Arrow Type that the value should be coerced to before returning.
        value - The HBase value to coerce.
        Returns:
        The coerced value, which is compatible with the provided Apache Arrow type.
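To make the `isNative` distinction concrete, here is a sketch of decoding the same logical value from both storage styles. It assumes that "native bytes" means fixed-width big-endian encoding (the layout produced by `java.nio.ByteBuffer` by default) and that string-serialized values are UTF-8 text. The `TargetType` enum is a hypothetical stand-in for `ArrowType`, and only three types are handled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class CoerceSketch {
    // Hypothetical stand-in for the Arrow types handled in this example.
    enum TargetType { BIGINT, FLOAT8, VARCHAR }

    static Object coerce(boolean isNative, TargetType type, byte[] value) {
        if (value == null) {
            return null;
        }
        // For string-serialized values, decode the text once up front.
        String text = isNative ? null : new String(value, StandardCharsets.UTF_8);
        switch (type) {
            case BIGINT:
                // Native: 8 big-endian bytes. Otherwise: parse the decimal text.
                return isNative ? ByteBuffer.wrap(value).getLong() : Long.parseLong(text);
            case FLOAT8:
                return isNative ? ByteBuffer.wrap(value).getDouble() : Double.parseDouble(text);
            case VARCHAR:
                return new String(value, StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        byte[] asText = "42".getBytes(StandardCharsets.UTF_8);
        byte[] asNative = ByteBuffer.allocate(Long.BYTES).putLong(42L).array();
        System.out.println(coerce(false, TargetType.BIGINT, asText));  // 42
        System.out.println(coerce(true, TargetType.BIGINT, asNative)); // 42
    }
}
```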
      • extractColumnParts

        public static String[] extractColumnParts​(String glueColumnName)
        Helper which can go from a Glue/Apache Arrow column name to its HBase family + column.
        Parameters:
        glueColumnName - The input column name in format "family:column".
        Returns:
        A String[] containing the HBase family and column parts of the name.
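Given the documented "family:column" input format, the split can be sketched as below. Limiting the split to two parts, so that any further colons stay in the column part, is a choice made for this example and is not confirmed by this page. The class name is hypothetical.

```java
final class ColumnPartsSketch {
    // Splits "family:column" into { "family", "column" }.
    static String[] extractColumnParts(String glueColumnName) {
        // Assumption: only the first ':' separates family from column.
        return glueColumnName.split(":", 2);
    }

    public static void main(String[] args) {
        String[] parts = extractColumnParts("info:name");
        System.out.println(parts[0]); // info
        System.out.println(parts[1]); // name
    }
}
```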
      • toBytes

        public static byte[] toBytes​(boolean isNative,
                                     Object value)
        Used to convert from Apache Arrow typed values to HBase values.
        Parameters:
        isNative - If true, the HBase value should be stored using native bytes. If false, the value should be serialized as a String before storing it.
        value - The value to convert.
        Returns:
        The HBase byte representation of the value.
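The write path can be sketched in the same spirit. The example assumes that native storage uses fixed-width big-endian bytes and that string storage uses the UTF-8 bytes of the value's `toString()` form. The set of supported value classes is illustrative only, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ToBytesSketch {
    static byte[] toBytes(boolean isNative, Object value) {
        if (value == null) {
            return null;
        }
        // Strings, and everything in non-native mode, are stored as UTF-8 text.
        if (!isNative || value instanceof String) {
            return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
        }
        if (value instanceof Long) {
            return ByteBuffer.allocate(Long.BYTES).putLong((Long) value).array();
        }
        if (value instanceof Integer) {
            return ByteBuffer.allocate(Integer.BYTES).putInt((Integer) value).array();
        }
        if (value instanceof Double) {
            return ByteBuffer.allocate(Double.BYTES).putDouble((Double) value).array();
        }
        throw new IllegalArgumentException("Unsupported value class: " + value.getClass());
    }

    public static void main(String[] args) {
        System.out.println(toBytes(false, 42L).length); // 2, the characters '4' and '2'
        System.out.println(toBytes(true, 42L).length);  // 8, one fixed-width long
    }
}
```

The two encodings are not interchangeable, so a reader must use the same `isNative` setting that was used when the value was written.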