Class HbaseSchemaUtils


  • public class HbaseSchemaUtils
    extends Object
    Collection of utilities for HBase schema inference, type conversion, and column-name conversion.
    • Method Summary

      Modifier and Type Method Description
      static Object coerceType(boolean isNative, org.apache.arrow.vector.types.pojo.ArrowType type, byte[] value)
      Helper that can coerce the given HBase value to the requested Apache Arrow type.
      static String[] extractColumnParts(String glueColumnName)
      Helper that splits a Glue/Apache Arrow column name into its HBase column family and column.
      static org.apache.arrow.vector.types.pojo.Schema inferSchema(HBaseConnection client, org.apache.hadoop.hbase.TableName tableName, int numToScan)
      This method will produce an Apache Arrow Schema for the given TableName and HBase connection by scanning up to the requested number of rows and using basic schema inference to determine data types.
      static org.apache.arrow.vector.types.Types.MinorType inferType(String strVal)
      Given a value from HBase, attempts to infer its type.
      static byte[] toBytes(boolean isNative, Object value)
      Used to convert from Apache Arrow typed values to HBase values.
    • Method Detail

      • inferSchema

        public static org.apache.arrow.vector.types.pojo.Schema inferSchema(HBaseConnection client,
                                                                            org.apache.hadoop.hbase.TableName tableName,
                                                                            int numToScan)
                                                                     throws IOException
        This method will produce an Apache Arrow Schema for the given TableName and HBase connection by scanning up to the requested number of rows and using basic schema inference to determine data types.
        Parameters:
        client - The HBase connection to use for the scan operation.
        tableName - The HBase TableName for which to produce an Apache Arrow Schema.
        numToScan - The maximum number of rows to scan when producing the Schema.
        Returns:
        An Apache Arrow Schema representing the schema of the HBase table.
        Throws:
        IOException
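A minimal sketch of the scan-and-infer loop described above, assuming each scanned row can be modeled as a map from a "family:column" name to the cell's String value. The real method works against an HBaseConnection and a TableName; this sketch replaces them with an in-memory List, and SketchMinorType is a hypothetical stand-in for Arrow's Types.MinorType. The per-value rule (integers to BIGINT, other numbers to FLOAT8, everything else to VARCHAR) and the choice to fall back to VARCHAR when rows disagree on a column's type are assumptions of this sketch, not documented behavior.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class InferSchemaSketch {
    // Hypothetical stand-in for org.apache.arrow.vector.types.Types.MinorType.
    enum SketchMinorType { BIGINT, FLOAT8, VARCHAR }

    private InferSchemaSketch() {}

    // Assumed per-value rule: integers -> BIGINT, other numbers -> FLOAT8, else VARCHAR.
    static SketchMinorType inferType(String strVal) {
        if (strVal == null) {
            return SketchMinorType.VARCHAR;
        }
        try {
            Long.parseLong(strVal);
            return SketchMinorType.BIGINT;
        } catch (NumberFormatException notLong) {
            // Not an integer; fall through to the floating-point check.
        }
        try {
            Double.parseDouble(strVal);
            return SketchMinorType.FLOAT8;
        } catch (NumberFormatException notDouble) {
            return SketchMinorType.VARCHAR;
        }
    }

    // Each row models one HBase result as "family:column" -> String cell value.
    // Scans at most numToScan rows; conflicting types for a column widen to VARCHAR (assumption).
    static Map<String, SketchMinorType> inferSchema(List<Map<String, String>> rows, int numToScan) {
        Map<String, SketchMinorType> schema = new TreeMap<>(); // sorted for stable output
        int scanned = 0;
        for (Map<String, String> row : rows) {
            if (scanned >= numToScan) {
                break;
            }
            scanned++;
            for (Map.Entry<String, String> cell : row.entrySet()) {
                schema.merge(cell.getKey(), inferType(cell.getValue()),
                        (existing, next) -> existing == next ? existing : SketchMinorType.VARCHAR);
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("cf:id", "1", "cf:price", "9.99"),
                Map.of("cf:id", "2", "cf:price", "n/a"),
                Map.of("cf:id", "3", "cf:note", "not scanned"));
        // Only the first two rows are scanned, so cf:note never appears.
        System.out.println(inferSchema(rows, 2)); // {cf:id=BIGINT, cf:price=VARCHAR}
    }
}
```

Widening to VARCHAR on conflict is a conservative choice in this sketch: any value can be represented as a string, so a mixed column never fails to load.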
      • inferType

        public static org.apache.arrow.vector.types.Types.MinorType inferType(String strVal)
        Given a value from HBase, attempts to infer its type.
        Parameters:
        strVal - An HBase value, in its String form.
        Returns:
        The Apache Arrow Minor Type most closely associated with the provided value.
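One plausible way to implement this kind of inference is to try progressively looser parses of the String. The sketch below assumes integers map to BIGINT, other numbers to FLOAT8, and everything else to VARCHAR; the actual rules are not stated on this page, and SketchMinorType is a hypothetical stand-in for Arrow's Types.MinorType.

```java
final class InferTypeSketch {
    // Hypothetical stand-in for org.apache.arrow.vector.types.Types.MinorType.
    enum SketchMinorType { BIGINT, FLOAT8, VARCHAR }

    private InferTypeSketch() {}

    static SketchMinorType inferType(String strVal) {
        if (strVal == null) {
            // Nothing to parse; treat as a string column.
            return SketchMinorType.VARCHAR;
        }
        try {
            Long.parseLong(strVal);
            return SketchMinorType.BIGINT;
        } catch (NumberFormatException notLong) {
            // Not an integer; try a floating-point parse next.
        }
        try {
            Double.parseDouble(strVal);
            return SketchMinorType.FLOAT8;
        } catch (NumberFormatException notDouble) {
            // Not numeric at all.
            return SketchMinorType.VARCHAR;
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"42", "3.14", "hello"}) {
            System.out.println(v + " -> " + inferType(v));
        }
        // 42 -> BIGINT
        // 3.14 -> FLOAT8
        // hello -> VARCHAR
    }
}
```

Trying the integer parse first matters: every valid long also parses as a double, so reversing the order would never yield BIGINT.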
      • coerceType

        public static Object coerceType(boolean isNative,
                                        org.apache.arrow.vector.types.pojo.ArrowType type,
                                        byte[] value)
        Helper that can coerce the given HBase value to the requested Apache Arrow type.
        Parameters:
        isNative - If true, the HBase value is stored using native bytes. If false, the value is serialized as a String.
        type - The Apache Arrow Type that the value should be coerced to before returning.
        value - The HBase value to coerce.
        Returns:
        The coerced value, which is now compatible with the provided Apache Arrow type.
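A minimal sketch of the isNative distinction, assuming that native values use the big-endian layouts produced by HBase's Bytes.toBytes(long) and Bytes.toBytes(double), and that String-serialized values are UTF-8 text. TargetType is a hypothetical stand-in for ArrowType and covers only three types; the real method accepts any ArrowType.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class CoerceTypeSketch {
    // Hypothetical stand-in for the subset of ArrowType handled here.
    enum TargetType { BIGINT, FLOAT8, VARCHAR }

    private CoerceTypeSketch() {}

    static Object coerceType(boolean isNative, TargetType type, byte[] value) {
        if (value == null) {
            return null;
        }
        if (!isNative) {
            // String-serialized: decode the text, then parse it into the target type.
            String str = new String(value, StandardCharsets.UTF_8);
            switch (type) {
                case BIGINT: return Long.valueOf(str);
                case FLOAT8: return Double.valueOf(str);
                default:     return str;
            }
        }
        // Native (assumption): 8-byte big-endian values; ByteBuffer reads big-endian by default.
        switch (type) {
            case BIGINT: return ByteBuffer.wrap(value).getLong();
            case FLOAT8: return ByteBuffer.wrap(value).getDouble();
            default:     return new String(value, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        byte[] asString = "42".getBytes(StandardCharsets.UTF_8);
        byte[] asNative = ByteBuffer.allocate(Long.BYTES).putLong(42L).array();
        System.out.println(coerceType(false, TargetType.BIGINT, asString)); // 42
        System.out.println(coerceType(true, TargetType.BIGINT, asNative));  // 42
    }
}
```

Both calls yield the same Long, which is the point of the flag: the caller must know how the column was written, because the two encodings of 42 ("42" as two UTF-8 bytes versus eight raw bytes) are not interchangeable.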
      • extractColumnParts

        public static String[] extractColumnParts(String glueColumnName)
        Helper that splits a Glue/Apache Arrow column name into its HBase column family and column.
        Parameters:
        glueColumnName - The input column name in format "family:column".
        Returns:
        An array whose first element is the HBase column family and whose second element is the column.
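A minimal sketch of the split, assuming ':' is the separator (taken from the documented "family:column" format). Passing a limit of 2 to String.split, so that any further ':' characters stay in the column part, is a choice made for this sketch and may differ from the real implementation.

```java
import java.util.Arrays;

final class ExtractColumnPartsSketch {
    // Assumed separator, based on the documented "family:column" format.
    private static final String SEPARATOR = ":";

    private ExtractColumnPartsSketch() {}

    static String[] extractColumnParts(String glueColumnName) {
        // Limit 2: split only at the first ':' so the result has at most two parts.
        return glueColumnName.split(SEPARATOR, 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extractColumnParts("personal:name"))); // [personal, name]
    }
}
```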
      • toBytes

        public static byte[] toBytes(boolean isNative,
                                     Object value)
        Used to convert from Apache Arrow typed values to HBase values.
        Parameters:
        isNative - If true, the HBase value should be stored using native bytes. If false, the value should be serialized as a String before storing it.
        value - The value to convert.
        Returns:
        The HBase byte representation of the value.
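A minimal sketch of the reverse direction, under the same assumptions as the coercion sketch: String mode writes the value's text as UTF-8, and native mode writes the big-endian layouts used by HBase's Bytes.toBytes overloads. Only Long, Integer, Double, and String are handled here; which Java types the real method receives from Arrow is not stated on this page.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ToBytesSketch {
    private ToBytesSketch() {}

    static byte[] toBytes(boolean isNative, Object value) {
        if (value == null) {
            return null;
        }
        if (!isNative) {
            // String mode: store the value's text form as UTF-8 bytes.
            return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
        }
        // Native mode (assumption): big-endian layouts, ByteBuffer's default byte order.
        if (value instanceof Long) {
            return ByteBuffer.allocate(Long.BYTES).putLong((Long) value).array();
        }
        if (value instanceof Integer) {
            return ByteBuffer.allocate(Integer.BYTES).putInt((Integer) value).array();
        }
        if (value instanceof Double) {
            return ByteBuffer.allocate(Double.BYTES).putDouble((Double) value).array();
        }
        if (value instanceof String) {
            return ((String) value).getBytes(StandardCharsets.UTF_8);
        }
        throw new IllegalArgumentException("Unsupported type in this sketch: " + value.getClass());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toBytes(false, 42L))); // [52, 50]
        System.out.println(toBytes(true, 42L).length);            // 8
    }
}
```

Writing with this sketch and reading back with the coercion sketch round-trips as long as both sides use the same isNative setting.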