HBase Shell - Create a reduced table from an existing HBase table

I want to create a reduced version of an HBase table via the HBase shell. For example:
The HBase table 'test' already exists in HBase with the following info:
TableName: 'test'
ColumnFamily: 'f'
Columns: 'f:col1', 'f:col2', 'f:col3', 'f:col4'
I want to create another table in HBase, 'test_reduced', which looks like this:
TableName: 'test_reduced'
ColumnFamily: 'f'
Columns: 'f:col1', 'f:col3'
How can we do this via the HBase shell? I know how to copy the table using the snapshot command, so I am mainly looking for a way to drop columns from the HBase table.

You can't do it. You need to use the HBase Client API:
1- read the table in.
2- only "put" the columns you want into your new table.
Cloudera came close by enabling users to perform "partial HBase table copies" with the CopyTable function, but that only allows you to change column family names (I am not sure whether you are using Cloudera), and even that is not what you are looking for.
For your reference:
http://blog.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/
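
The two steps above can be sketched without any client library. The following is a minimal Python model, not HBase Client API code: rows are plain dicts keyed by 'family:qualifier', and the names reduce_row and copy_reduced are hypothetical. In a real job the input would come from a scan of 'test' and each yielded row would become a put against 'test_reduced'.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Row = Dict[str, bytes]  # 'family:qualifier' -> cell value (assumed model of a row)


def reduce_row(row: Row, keep: Set[str]) -> Row:
    """Return a copy of the row holding only the wanted columns."""
    return {col: val for col, val in row.items() if col in keep}


def copy_reduced(rows: Iterable[Tuple[str, Row]],
                 keep: Set[str]) -> Iterator[Tuple[str, Row]]:
    """Yield (row_key, reduced_row), skipping rows with no wanted columns."""
    for key, row in rows:
        reduced = reduce_row(row, keep)
        if reduced:  # nothing to put for this row key otherwise
            yield key, reduced


if __name__ == "__main__":
    source = [
        ("r1", {"f:col1": b"a", "f:col2": b"b", "f:col3": b"c", "f:col4": b"d"}),
        ("r2", {"f:col2": b"x"}),
    ]
    for key, row in copy_reduced(source, {"f:col1", "f:col3"}):
        print(key, sorted(row))  # prints: r1 ['f:col1', 'f:col3']
```

Rows that contain none of the wanted columns are skipped, since there would be nothing to write for them.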

Related

Spark(2.3) not able to identify new columns in Parquet table added via Hive Alter Table command

I have a Hive Parquet table which I create using the Spark 2.3 API df.write.saveAsTable. A separate Hive process alters the same Parquet table to add columns (based on requirements).
However, the next time I read the same Parquet table into a Spark DataFrame, the new column that was added with the Hive ALTER TABLE command does not show up in the df.printSchema output.
Based on initial analysis, it seems there is a conflict: Spark is using its own stored schema instead of reading the schema from the Hive metastore.
Hence, I tried the options below:
Changing the Spark setting:
spark.sql.hive.convertMetastoreParquet=false
and refreshing the Spark catalog:
spark.catalog.refreshTable("table_name")
However, neither of these two options solves the problem.
Any suggestions or alternatives would be super helpful.
This sounds like the bug described in SPARK-21841. The JIRA description also contains an idea for a possible workaround:
...Interestingly enough it appears that if you create the table differently like:
spark.sql("create table mydb.t1 select ip_address from mydb.test_table limit 1")
Run your alter table on mydb.t1
val t1 = spark.table("mydb.t1")
Then it works properly...
To work around this, run the same ALTER TABLE command in spark-shell that was run in Hive:
spark.sql("alter table TABLE_NAME add COLUMNS (col_A string)")

How to use describe 'table_name' in HBase shell to create a table.

I have to create a table in a different cluster, and I only have the description of the HBase table at hand. How do I create the new HBase table in the different cluster?
Open the HBase shell by typing hbase shell in a terminal on your new cluster, then run the command create '<table name>', '<column family>', using the table name and column family name that you already have from describe 'table name' on the previous cluster.
For more info:
https://www.tutorialspoint.com/hbase/hbase_create_table.htm
https://www.tutorialspoint.com/hbase/hbase_describe_and_alter.htm
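
If there are many tables to recreate, the create command can be generated rather than typed. This is a minimal Python sketch under the assumption that you have already read the table name and column family names off the describe output by hand; the helper name hbase_create_command is hypothetical, and per-family settings shown by describe (such as VERSIONS) are not carried over.

```python
from typing import Iterable


def hbase_create_command(table: str, families: Iterable[str]) -> str:
    """Build an HBase shell `create` command for the given column families."""
    names = list(families)
    if not names:
        raise ValueError("an HBase table needs at least one column family")
    quoted = ", ".join(f"'{name}'" for name in names)
    return f"create '{table}', {quoted}"


if __name__ == "__main__":
    # Paste the printed line into the HBase shell on the new cluster.
    print(hbase_create_command("test", ["f"]))  # prints: create 'test', 'f'
```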

Tables created in HBase shell are not detected in Phoenix shell

When I create a table in the Phoenix shell, it is detected in the HBase shell by the list command, but a table created in the HBase shell is not detected in Phoenix.
Phoenix only detects tables that were created in the Phoenix shell, plus the default HBase tables.
How can I fix this problem?
Part of the problem is that Phoenix is case-sensitive: it converts unquoted names to uppercase, so it only identifies tables with uppercase names unless the name is wrapped in double quotes.
You also need to create a view on top of the HBase table before you can run any query against it in Phoenix.
To create a view, open the Phoenix shell and issue a CREATE VIEW command like the one below:
CREATE VIEW "<table_name>" ( ROWKEY VARCHAR PRIMARY KEY, "<column_family_name>"."<column_name>" <data_type>, "<column_family_name>"."<column_name>" <data_type> )
For more details you can check How to use existing HBase table in Apache Phoenix
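
For tables with many columns, the CREATE VIEW statement can be assembled from a column list. This is a minimal Python sketch that follows the template above; the helper name phoenix_create_view is hypothetical, and the table name, column names, and data types in the usage line are illustrative assumptions, not taken from a real schema. Every identifier is double-quoted so that Phoenix keeps its original case.

```python
from typing import Iterable, Tuple


def phoenix_create_view(table: str,
                        columns: Iterable[Tuple[str, str, str]]) -> str:
    """Build a CREATE VIEW statement; columns are (family, qualifier, type)."""
    parts = ["ROWKEY VARCHAR PRIMARY KEY"]
    for family, qualifier, data_type in columns:
        # Double quotes preserve the case used in the HBase table.
        parts.append(f'"{family}"."{qualifier}" {data_type}')
    body = ", ".join(parts)
    return f'CREATE VIEW "{table}" ( {body} )'


if __name__ == "__main__":
    print(phoenix_create_view("test", [("f", "col1", "VARCHAR"),
                                       ("f", "col3", "VARCHAR")]))
```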

Sqoop - Create empty hive partitioned table based on schema of oracle partitioned table

I have an Oracle table which has 80 columns and is partitioned on the state column. My requirement is to create a Hive table with a schema similar to the Oracle table, partitioned on state.
I tried using the Sqoop create-hive-table option, but I keep getting this error:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: Partition key state cannot be a column to import.
I understand that in Hive the partition column should not be in the table's column definition, but then how do I get around the issue?
I do not want to write the create table command manually, as I have 50 such tables to import and would like to use Sqoop.
Any suggestions or ideas?
Thanks
There is a workaround for this.
Below is the procedure I follow:
On Oracle, run a query to get the schema for a table and store it in a file.
Move that file to Hadoop.
On Hadoop, create a shell script which constructs an HQL file.
That HQL file contains the Hive create table statement along with the columns. To build it, the script uses the file from above (the Oracle schema file copied to Hadoop).
To run this script you just need to pass the Hive database name, table name, partition column name, path, etc., depending on your level of customization. At the end of the shell script, add "hive -f <HQL filename>".
Once everything is ready, it takes just a couple of minutes to create each table.
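
The step that constructs the HQL file can be sketched as follows. This is a minimal illustration in Python rather than shell, and it rests on assumptions not in the answer above: the schema has already been parsed into (name, type) pairs with Oracle types mapped to Hive types, and the helper name hive_create_table is hypothetical. The point it shows is moving the partition column out of the column list and into PARTITIONED BY, which is what the Sqoop error is complaining about.

```python
from typing import List, Tuple


def hive_create_table(database: str, table: str,
                      columns: List[Tuple[str, str]],
                      partition_column: str) -> str:
    """Build a Hive CREATE TABLE statement with the partition column
    moved out of the column list and into PARTITIONED BY."""
    wanted = partition_column.lower()
    regular = [(n, t) for n, t in columns if n.lower() != wanted]
    partition = [(n, t) for n, t in columns if n.lower() == wanted]
    if len(partition) != 1:
        raise ValueError(f"expected exactly one column named {partition_column!r}")
    cols = ",\n  ".join(f"`{n}` {t}" for n, t in regular)
    pname, ptype = partition[0]
    return (f"CREATE TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)\n"
            f"PARTITIONED BY (`{pname}` {ptype});")


if __name__ == "__main__":
    # Illustrative schema; write the result to a file and run `hive -f` on it.
    print(hive_create_table("mydb", "customers",
                            [("id", "BIGINT"), ("name", "STRING"),
                             ("state", "STRING")],
                            "state"))
```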

How can I know all the columns in an HBase table?

In the HBase shell, when I use describe 'table_name', only the column families are returned. How can I find out all the columns in each column family?
As @zsxwing said, you need to scan all the rows, since in HBase each row can have a completely different schema (that's part of the power of Hadoop: the ability to store poly-structured data). If you look at the HFile file structure, you can see that HBase doesn't track the columns.
Thus the column families and their settings are in fact the schema of the HBase table, and that's what you get when you 'describe' it.
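
The scan-everything approach amounts to taking the union of the column names seen across all rows. This is a minimal Python sketch of that idea only; it assumes the scan results have already been reduced to one collection of 'family:qualifier' names per row, and the helper name columns_by_family is hypothetical. On a large table this means a full scan, so it can be slow.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def columns_by_family(rows: Iterable[Iterable[str]]) -> Dict[str, Set[str]]:
    """Collect the distinct qualifiers seen in each column family."""
    found = defaultdict(set)
    for row in rows:
        for column in row:
            # Split on the first ':' only; qualifiers may contain ':' too.
            family, _, qualifier = column.partition(":")
            found[family].add(qualifier)
    return dict(found)


if __name__ == "__main__":
    scanned = [["f:col1", "f:col2"], ["f:col3"], ["f:col1", "f:col4"]]
    for family, qualifiers in columns_by_family(scanned).items():
        print(family, sorted(qualifiers))  # prints: f ['col1', 'col2', 'col3', 'col4']
```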
