Hive move & restore some partitions - hadoop

I have a 50 TB managed Hive table, partitioned by date, from which I want to move some old partitions to an external HDD so that I can restore them later if required.
The scripts are as follows:
Move out:
$ hdfs dfs -get ${HIVE_WAREHOUSE_TABLE_PATH}/ingest_date=2016-01-01 ${LOCAL_TABLE_PATH}/2016-01-01
$ hdfs dfs -rm -r -skipTrash ${HIVE_WAREHOUSE_TABLE_PATH}/ingest_date=2016-01-01
$ hive -e "ALTER TABLE ${TABLE} DROP IF EXISTS PARTITION (ingestion_date='2016-01-01') PURGE;"
Restore:
$ hdfs dfs -put ${LOCAL_TABLE_PATH}/2016-01-01 ${HIVE_WAREHOUSE_TABLE_PATH}/ingest_date=2016-01-01
$ hive -e "ALTER TABLE ${TABLE} ADD PARTITION (ingest_date='2016-01-01') LOCATION ${HIVE_WAREHOUSE_TABLE_PATH}/ingest_date=2016-01-01;"
Am I missing something in the above strategy?
I have tried:
$ hive --hivevar local_path=${LOCAL_TABLE_PATH} -e "EXPORT TABLE myDatabase.theTable PARTITION (ingest_date='2016-01-01') to '${local_path}/2016-01-01';"
but this takes too long to copy the table data for a year's worth of partitions, and I am trying to avoid that.
Thank you,
Gee

Related

Dump data from beeline in HDFS directory

I am writing a bash script to export a dynamic SQL query into an HQL file in an HDFS directory. I am going to run this bash script through Oozie.
sql_v="select 'create table table_name from user_tab_columns where ...;'"
beeline -u "$sql_v" > local_path
The sql_v variable will store a dynamic create table command which I want to store in an HQL file in an HDFS directory. If I run the above two steps it runs fine because I am storing the data in a local path, but instead of passing local_path I want to store the SQL in an HDFS directory. Is there a way I can pass an HDFS path instead of local_path, like below? It doesn't work as written. Can I use any other command instead of beeline to achieve this?
beeline -u "$sql_v" | hdfs dfs -appendToFile -
If the goal is to write the output of beeline to an HDFS file, then the options below should work, since both commands pipe the standard output of beeline into a hadoop command that reads its input from standard input (indicated by the - argument).
beeline -u beeline_connection_string .... -e "$sql_v" | hadoop fs -put - /user/userid/file.hql
beeline -u beeline_connection_string .... -e "$sql_v" | hadoop fs -appendToFile - /user/userid/file.hql
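Put together, the whole flow could look like this (a minimal sketch; the JDBC URL and output path are assumed placeholders, and --silent/--outputformat just keep beeline's decorations out of the generated file):
# sketch: generate the DDL with beeline and stream it straight into HDFS
sql_v="select 'create table table_name from user_tab_columns where ...;'"   # your dynamic query
beeline_url="jdbc:hive2://hiveserver2-host:10000/default"                   # assumed connection string
beeline -u "$beeline_url" --silent=true --outputformat=tsv2 -e "$sql_v" \
  | hadoop fs -put -f - /user/userid/file.hql                               # -f overwrites if the file exists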
Note:
1. It's a little unclear from your question and comments why you can't use the suggestion given by @cricket_007 and why you need beeline in particular:
echo "$sql_v" > file.hql
hadoop fs -put file.hql /user/userid/file.hql
beeline -u beeline_connection_string .... -e "$sql_v" > file.hql
hadoop fs -appendToFile file.hql /user/userid/file.hql
beeline -u beeline_connection_string .... -e "$sql_v" > file.hql
hadoop fs -put file.hql /user/userid/file.hql
2. If an Oozie shell action is used to run the bash script containing the sql_v and beeline commands, beeline needs to be present on the node where the shell action runs; if not, you will face a "beeline: command not found" error.
Refer: beeline-command-not-found-error
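A quick guard at the top of the shell-action script (a small sketch) makes the missing-client case fail fast instead of dying mid-run:
# fail fast if the beeline client is not installed on the node running the Oozie shell action
if ! command -v beeline >/dev/null 2>&1; then
  echo "beeline not found on $(hostname); install the Hive client or fall back to echo + hadoop fs -put" >&2
  exit 1
fi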

How can I solve the error "file:/user/hive/warehouse/records is not a directory or unable to create one"?

hive> CREATE TABLE records (year STRING, temperature INT, quality INT)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user/hive/warehouse/records is not a directory or unable to create one)
How can I solve the error?
Where is /user/hive/warehouse/ located? On my local ext4 filesystem under Ubuntu, there is no such path as /user/hive/warehouse/.
How can I get information about, i.e. examine, /user/hive/warehouse/?
You should create the /user/hive/warehouse folder in the HDFS file system before running Hive commands.
Hive internally uses the Hadoop HDFS file system to store database data. You can check the configured HDFS directory path in the hive-default.xml and/or hive-site.xml configuration files, or in the Hive terminal using the command below:
hive> set hive.metastore.warehouse.dir;
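The same check can be done non-interactively (a sketch; /etc/hive/conf is a typical but not universal location for the configuration files):
# print the effective warehouse location without opening an interactive hive session
hive -e "set hive.metastore.warehouse.dir;"
# or look for an explicit override in the site configuration
grep -A 1 "hive.metastore.warehouse.dir" /etc/hive/conf/hive-site.xml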
As mentioned, Hive uses Hadoop, so:
Hadoop must be installed and running
The HADOOP_HOME environment variable must be set:
export HADOOP_HOME=hadoop-install-dir
export PATH=$PATH:$HADOOP_HOME/bin
Directories in the HDFS file system must be created and made writable for Hive:
hadoop fs -mkdir -p /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
To list directories in the HDFS file system:
hadoop fs -ls /user
hadoop fs -ls /
hadoop fs -ls /user/hive/
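To confirm that both directories exist and picked up the group write bit, a quick check (a sketch):
# -d lists the directory entries themselves rather than their contents;
# both should show 'w' in the group permission bits, e.g. drwxrwxr-x
hadoop fs -ls -d /tmp /user/hive/warehouse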
Reference: Hive Wiki page

Oozie iterative workflow

I am building an application to ingest data from a MySQL DB into Hive tables. The app will be scheduled to execute every day.
The very first action is to read a Hive table to load the import table info (e.g. name, type, etc.) and to create a list of tables to import in a file. Next, a Sqoop action transfers the data for each table in sequence.
Is it possible to create a shell script Oozie action which will iterate through the table list and launch an Oozie sub-workflow Sqoop action for each table in sequence? Could you provide some reference? Any suggestion of a better approach is also welcome!
I have come up with the following shell script containing a Sqoop action. It works fine with some environment variable tweaking.
hdfs_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata'
table_temp_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp'

# only proceed if the table-metadata directory exists in HDFS
if hadoop fs -test -e "$hdfs_path"
then
  # each file under $hdfs_path contains the name of one table to import
  for file in $(hadoop fs -ls "$hdfs_path" | grep -o -e "$hdfs_path/*.*")
  do
    echo "${file}"
    TABLENAME=$(hadoop fs -cat "${file}")
    echo "$TABLENAME"
    HDFSPATH=$table_temp_path
    # import the current table from MySQL into a per-table directory under the temp path
    sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --table "$TABLENAME" --username=retail_dba --password=cloudera --direct -m 1 --delete-target-dir --target-dir "$table_temp_path/$TABLENAME"
  done
fi
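A slightly more robust variant (a sketch reusing the same connection details and paths) reads the table names line by line instead of grepping the output of hadoop fs -ls:
# read every line of every metadata file as a table name and import it
hadoop fs -cat "$hdfs_path"/* | while read -r TABLENAME
do
  [ -z "$TABLENAME" ] && continue   # skip blank lines
  sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
    --username=retail_dba --password=cloudera \
    --table "$TABLENAME" --direct -m 1 \
    --delete-target-dir --target-dir "$table_temp_path/$TABLENAME"
done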

Is there any chance to retrieve files after deleting from HDFS trash

In our project we have four environments: Production, Development, UAT and QA. I am using the UAT environment.
We have a cluster with 43 data nodes. My role is HDFS clean-up. Unfortunately, I deleted some jobs from the Hive database and also from the trash. Is there any chance to retrieve those files and tables?
I am using the following commands:
hadoop fs -du -h / | grep ' T'
hadoop fs -rm -r <source path>
hadoop fs -rm -r .Trash/<some path>

How to import/export hbase data via hdfs (hadoop commands)

I have saved the data crawled by Nutch in HBase, whose file system is HDFS. Then I copied my data (one HBase table) from HDFS directly to a local directory with the command
hadoop fs -copyToLocal /hbase/input ~/Documents/output
After that, I copied the data back into another HBase instance (on another system) with the following command
hadoop fs -copyFromLocal ~/Documents/input /hbase/mydata
It is saved in HDFS, and when I use the list command in the hbase shell it shows 'mydata' as another table, but when I run the scan command it says there is no table named 'mydata'.
What is the problem with the above procedure?
In simple words:
I want to copy an HBase table to my local file system using a hadoop command
Then, I want to save it directly into HDFS on another system using a hadoop command
Finally, I want the table to appear in HBase and show its data like the original table
If you want to export the table from one HBase cluster and import it into another, use any one of the following methods:
Using Hadoop
Export
$ bin/hadoop jar <path/to/hbase-{version}.jar> export \
<tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
NOTE: Copy the output directory in HDFS from the source to the destination cluster
Import
$ bin/hadoop jar <path/to/hbase-{version}.jar> import <tablename> <inputdir>
Note: Both outputdir and inputdir are in hdfs.
Using Hbase
Export
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export \
<tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
Copy the output directory in HDFS from the source to the destination cluster
Import
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
Reference: Hbase tool to export and import
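An end-to-end run of the second (Using Hbase) route could look like this (a sketch: the table name, backup paths, and destination namenode address are made-up placeholders, and the table must already exist with the same column families on the destination cluster before the import):
# on the source cluster: dump the table to an HDFS directory
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backup/mytable
# copy the dump between clusters (distcp, or any transfer method you prefer)
$ hadoop distcp /backup/mytable hdfs://dest-namenode:8020/backup/mytable
# on the destination cluster: load the dump into the pre-created table
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import mytable /backup/mytable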
If you can use the HBase commands instead to back up HBase tables, you can use the HBase ExportSnapshot tool, which copies the HFiles, logs and snapshot metadata to another filesystem (local/HDFS/S3) using a MapReduce job.
Take a snapshot of the table
$ ./bin/hbase shell
hbase> snapshot 'myTable', 'myTableSnapshot-122112'
Export to the required file system
$ ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot myTableSnapshot-122112 -copy-to fs://path_to_your_directory
You can export it back from the local file system to hdfs://srv2:8082/hbase and run the restore command from the hbase shell to recover the table from the snapshot.
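The copy-back step is the same ExportSnapshot tool pointed at the destination cluster's HBase root directory (a sketch; the -copy-from option is available in recent HBase versions, and the srv2:8082 address is just the example value from above):
$ ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot myTableSnapshot-122112 \
    -copy-from file:///path_to_your_directory \
    -copy-to hdfs://srv2:8082/hbase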
$ ./bin/hbase shell
hbase> disable 'myTable'
hbase> restore_snapshot 'myTableSnapshot-122112'
Reference: HBase Snapshots
