Migrate ClickHouse DB to new server

I have one database on a clickhouse-server instance and I want to migrate it to a new ClickHouse server. How do I do this correctly? I tried using clickhouse-backup, but got these errors:
2021/05/31 15:21:45 warn can't create table 'Data.Calls': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.Fcst_1': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.Fcst_3': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.GAInfo': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.HistoryLogs': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.Info': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.Info1': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.Info15': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.Info548': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.Info60': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.InfoAXPNew': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.InfoAXPOld': <nil>, will try again
2021/05/31 15:21:45 warn can't create table 'Data.InfoPageView': <nil>, will try again

It depends on the connectivity between the old and new clusters.
If it is present (and good), you can try the remote() table function, which is the most flexible way to migrate.
I would further suggest writing a script that reads each table through the remote() function, creates the table on the new cluster, and then inserts the data either in one go or in daily, weekly, or monthly iterations with WHERE somedatecolumn BETWEEN {start_date} AND {end_date}; a sketch follows below.
If you cannot establish direct connectivity between the old and new clusters, you will have to fall back to conventional ways of extracting and uploading the data.
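A minimal sketch of that approach, run on the new server. The host, port, credentials, and somedatecolumn below are placeholders, and Data.Calls is one of the tables from the log above; the table definition has to be recreated first from the output of SHOW CREATE TABLE on the old server.

CREATE DATABASE IF NOT EXISTS Data;

-- paste the DDL from SHOW CREATE TABLE Data.Calls (old server) here, then:
INSERT INTO Data.Calls
SELECT *
FROM remote('old-server:9000', Data.Calls, 'default', 'password')
WHERE somedatecolumn BETWEEN '2021-05-01' AND '2021-05-31';

Repeat the INSERT for each date range (or drop the WHERE clause to copy the table in one go), and repeat the whole process for each table in the database.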

Related

Unable to create Transactional ORC table in Hive

I am trying to create a transactional ORC table in Hive using beeline.
DDL:
CREATE TABLE employee_trans (
id int,
name string,
age int,
gender string)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
I have also set the below properties:
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
But I am getting the below error,
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:The table must be bucketed and stored using an ACID compliant format (such as ORC))
Can someone please help!
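The error above says the table must be bucketed, which is a requirement for transactional tables on Hive 1.x/2.x. A minimal sketch of the same DDL with a CLUSTERED BY clause added (the bucket count of 4 is an arbitrary choice here, and the SET statements from the question still apply):

CREATE TABLE employee_trans (
  id int,
  name string,
  age int,
  gender string)
-- bucketing is what the error message is asking for
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');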

Access ENUM from HIVE to IMPALA without changing the existing HIVE schema

Requirement: I have a HIVE table which has an ENUM column, and I would like to access the same data from the IMPALA environment.
Blocker: As IMPALA does not support ENUM, I could not access the data in the HIVE table.
Restriction: I should not change the existing HIVE schema.
Can someone please take a look and suggest a way I can resolve this issue?
Code snippet (in Impala), where ABC is the HIVE table and the enum_col column has the ENUM datatype:
invalidate metadata ABC;
select enum_col from ABC;
Error:
TableLoadingException: Failed to load metadata for table: ABC CAUSED BY:
AnalysisException: Unsupported type 'enum' of column 'enum_col'
Ways I've tried (none of them work):
select cast(enum_col as string) from ABC;
select coalesce (cast(enum_col as string), null) from ABC;

Load Hbase table from hive

I am trying to load an HBase table from a Hive table. For that I am using the following approach, and it works fine if the HBase table has only a single column family; however, with multiple families it throws an error.
Approach
source table
CREATE EXTERNAL TABLE temp.employee_orc(id String, name String, Age int)
STORED AS ORC
LOCATION '/tmp/employee_orc/table';
Create Hive table with Hbase Serde
CREATE TABLE temp.employee_hbase(id String, name String, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,emp:name,emp:Age')
TBLPROPERTIES("hbase.table.name" = "bda:employee_hbase", "hfile.family.path"="/tmp/employee_hbase/emp", "hive.hbase.generatehfiles"="true");
export the hbase files
SET hive.hbase.generatehfiles=true;
INSERT OVERWRITE TABLE temp.employee_hbase SELECT DISTINCT id, name, Age FROM temp.employee_orc CLUSTER BY id;
Load the hbase table
export HADOOP_CLASSPATH=`hbase classpath`
hadoop jar /usr/hdp/current/hbase-client/lib/hbase-server.jar completebulkload /tmp/employee_hbase/ 'bda:employee_hbase'
Error
I am getting the following error when the HBase table has multiple column families:
java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Multiple family directories found in hdfs://hadoopdev/apps/hive/warehouse/temp.db/employee_hbase/_temporary/0/_temporary/attempt_1527799542731_1180_r_000000_0
Is there another way to load the HBase table, if not this approach?
With a bulk load from Hive to HBase, the target table can only be written through a single column family.
You can either use HBase's own bulk load tooling, which supports multiple column families, or you can use a separate Hive table for each column family, as sketched below.
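A minimal sketch of the second option (one Hive table per column family), assuming a hypothetical second family contact with a single column email; each table maps the row key plus one family and writes its HFiles to its own hfile.family.path:

-- family "emp", same mapping as above
CREATE TABLE temp.employee_hbase_emp(id String, name String, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,emp:name,emp:Age')
TBLPROPERTIES("hbase.table.name" = "bda:employee_hbase", "hfile.family.path" = "/tmp/employee_hbase/emp", "hive.hbase.generatehfiles" = "true");

-- hypothetical family "contact"
CREATE TABLE temp.employee_hbase_contact(id String, email String)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,contact:email')
TBLPROPERTIES("hbase.table.name" = "bda:employee_hbase", "hfile.family.path" = "/tmp/employee_hbase/contact", "hive.hbase.generatehfiles" = "true");

Then run the INSERT OVERWRITE step once per table, so each pass only ever produces a single family directory, before running completebulkload as above.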

alter table/add columns in non native table in hive

I created a Hive table with a storage handler, and now I want to add a column to that table, but it gives me the error below:
[Code: 10134, SQL State: 42000] Error while compiling statement: FAILED:
SemanticException [Error 10134]: ALTER TABLE can only be used for [ADDPROPS,
DROPPROPS] to a non-native table
As per the Hive documentation, any Hive table you create with a storage handler is a non-native table.
Here's a link: https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
There is an open JIRA enhancement request with Apache for the same:
https://issues.apache.org/jira/browse/HIVE-1240
For example, I am using the Druid storage handler in my case.
I created a hive table using:
CREATE TABLE druid_table_1
(`__time` TIMESTAMP, `dimension1` STRING, `metric1` int)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler';
and then I am trying to add a column:
ALTER TABLE druid_table_1 ADD COLUMNS (`dimension2` STRING);
With the above approach I am getting the error.
Is there any other way to add a column to a non-native table in Hive without recreating it?
A patch is available in HDP 2.5+ from Hortonworks; support for adding columns has been added to the ALTER TABLE statement.
A column can be added to a Druid-backed table using ALTER TABLE DDL in Hive:
ALTER TABLE table_name ADD COLUMNS (col_name data_type);
There is no need to specify a partition spec, as these are Druid-backed Hive tables and partitioning/storage is maintained by Druid.

Spark Sql 1.5 dataframe saveAsTable how to add hive table properties

I am running Spark SQL on Hive. I need to add the auto.purge table property while creating a new Hive table. I tried the code below to add the option when calling the saveAsTable method:
inputDF.write.option("auto.purge", "true").saveAsTable(hiveTableName)
The above line of code added the property under the WITH SERDEPROPERTIES section of the table.
I need to add this property under the TBLPROPERTIES section of the Hive DDL.
Finally, I found a solution; I am not sure if it is the best one.
Unfortunately, the Spark 1.5 SQL saveAsTable method doesn't accept table properties as input: it creates a new tableProperties map before the Hive table is created.
Check out the code here:
https://github.com/apache/spark/blob/v1.5.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
To add table properties to an existing Hive table, use the ALTER TABLE command:
ALTER TABLE table_name SET TBLPROPERTIES ('auto.purge'='true');
The above command adds the table property to the Hive metastore.
To drop an existing table inside an encryption zone, run the above command before the DROP TABLE command.
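For example, a minimal sketch of that order of operations (the table name is hypothetical):

ALTER TABLE my_spark_table SET TBLPROPERTIES ('auto.purge'='true');
DROP TABLE my_spark_table;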
