Exporting/ETL data from Hive to MySQL - hadoop

I have a table in Hive called foo. I want to load data from Hive table foo into MySQL table bar. I want to do this using an Oozie Sqoop action.
I am looking for an example Oozie workflow.
--
My Hive table schema is:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS foo (
id int, city string
)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
LOCATION
'/user/cloudera/foo' ;
--
My MySQL table schema is:
mysql> create table bar (id int, city varchar(25));
--
I loaded local file foo to Hive table foo:
[cloudera@localhost ~]$ cat foo
1 a
4 b
The contents of file foo are tab-separated.
hive> load data local inpath '/home/cloudera/foo' into table foo;
--
Sqoop command could be something like:
sqoop export --connect jdbc:mysql://ap1.abcxyz.net/test --username rio --password r3o% --table bar --export-dir /user/cloudera/foo --input-fields-terminated-by '\t' --input-lines-terminated-by '\n'
--
Thanks,
Rio
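For reference, a minimal sketch of what the Oozie workflow.xml with a Sqoop export action could look like. This is untested and makes assumptions: jobTracker and nameNode are supplied as job properties, the action schema version matches your Oozie install, and the MySQL JDBC driver jar is available to the action (e.g. in the workflow's lib/ directory).
<workflow-app name="hive-to-mysql-export" xmlns="uri:oozie:workflow:0.4">
    <start to="sqoop-export"/>
    <action name="sqoop-export">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Same export as the command above, passed as individual args
                 so the tab/newline delimiters are not mangled by shell parsing -->
            <arg>export</arg>
            <arg>--connect</arg>
            <arg>jdbc:mysql://ap1.abcxyz.net/test</arg>
            <arg>--username</arg>
            <arg>rio</arg>
            <arg>--password</arg>
            <arg>r3o%</arg>
            <arg>--table</arg>
            <arg>bar</arg>
            <arg>--export-dir</arg>
            <arg>/user/cloudera/foo</arg>
            <arg>--input-fields-terminated-by</arg>
            <arg>\t</arg>
            <arg>--input-lines-terminated-by</arg>
            <arg>\n</arg>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop export failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>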

Related

Export hql output to csv in beeline

I am trying to export my HQL output to CSV in beeline using the command below:
beeline -u "jdbc:hive2://****/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"?tez.queue.name=devices-jobs --outputformat=csv2 -e "use schema_name; select * from table_name where open_time_new>= '2020-07-13' and open_time_new < '2020-07-22'" > filename.csv
The problem is that some column values in the table contain commas, which pushes part of that column's data into the next column.
For example:
| abcd | as per data,outage fault,xxxx.
| xyz | as per the source,ghfg,hjhjg.
The above data will get saved as 4 columns instead of 2.
Need help!
Try the approach with local directory:
insert overwrite local directory '/tmp/local_csv_report'
row format delimited fields terminated by "," escaped by '\\'
select *
from table_name
where open_time_new >= '2020-07-13'
and open_time_new < '2020-07-22'
This will create several CSV files under your local /tmp/local_csv_report directory, so a simple cat afterwards will merge the results into a single file.
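For instance (the output file name here is just illustrative):
cat /tmp/local_csv_report/* > /tmp/report.csv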

How to export a Hive table into a CSV file including header?

I used this Hive query to export a table into a CSV file.
hive -f mysql.sql
where mysql.sql contains:
insert overwrite local directory '/LocalPath/'
row format delimited fields terminated by ','
select * from Mydatabase.Mytable limit 100;
cat /LocalPath/* > /LocalPath/table.csv
However, it does not include table column names.
How do I export the column names in the CSV as well?
show tablename ?
You should add set hive.cli.print.header=true; before your select query to get the column names as the first row of your output. The output would look like Mytable.col1, Mytable.col2, ...
If you don't want the table name with the column names, use set hive.resultset.use.unique.column.names=false;. The first row of your output would then look like col1, col2 ...
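Put together, a minimal sketch of the same approach inside a hive session (database and table names as in the question):
set hive.cli.print.header=true;
set hive.resultset.use.unique.column.names=false;
-- the column names now appear as the first row of the output
select * from Mydatabase.Mytable limit 100;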
Invoking hive command-line with the parameters suggested in the other answer here works for a plain select. So, you can extract the column names and create the csv to start with, as follows:
hive -S --hiveconf hive.cli.print.header=true --hiveconf hive.resultset.use.unique.column.names=false --database Mydatabase -e 'select * from Mytable limit 0;' > /LocalPath/table.csv
After that you can run the actual data extraction part, except this time, remember to append to the csv:
cat /LocalPath/* >> /LocalPath/table.csv ## From your question with >> for append

Sqoop import Null string

The Null values are displayed as '\N' when a hive external table is queried.
Below is the sqoop import script:
sqoop import -libjars /usr/lib/sqoop/lib/tdgssconfig.jar,/usr/lib/sqoop/lib/terajdbc4.jar -Dmapred.job.queue.name=xxxxxx \
--connect jdbc:teradata://xxx.xx.xxx.xx/DATABASE=$db,LOGMECH=LDAP --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username $user --password $pwd --query "
select col1,col2,col3 from $db.xxx
where \$CONDITIONS" \
--null-string '\N' --null-non-string '\N' \
--fields-terminated-by '\t' --num-mappers 6 \
--split-by job_number \
--delete-target-dir \
--target-dir $hdfs_loc
Please advise what change should be made to the script so that NULLs are displayed as NULL when the external Hive table is queried.
Sathiyan - below are my findings after many trials:
1. If the null-string properties are not included during sqoop import, then NULLs are stored as [blank for integer columns] and [blank for string columns] in HDFS.
2. If the Hive table on top of that HDFS data is queried, we would see [NULL for integer columns] and [blank for string columns].
3. If the (--null-string '\N') properties are included during sqoop import, then NULLs are stored as ['\N' for both integer and string columns].
4. If the Hive table is then queried, we would see [NULL for both integer and string columns], not '\N'.
In your sqoop script you mentioned --null-string '\N' --null-non-string '\N', which means:
--null-string '\N' = the string to be written for a null value in string columns
--null-non-string '\N' = the string to be written for a null value in non-string columns
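For completeness, the Sqoop user guide's Hive import section shows the backslash escaped so that a literal \N (Hive's default null representation) ends up in the generated code; a sketch of just the relevant flags, with the rest of the command elided:
sqoop import ... \
  --null-string '\\N' --null-non-string '\\N'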
If any value is NULL in the table and we sqoop that table, then by default Sqoop imports the NULL value as the string "null" in HDFS. That creates a problem when we try to use a NULL condition in a Hive query.
For example, let's insert a NULL value into the MySQL table "cities":
mysql> insert into cities values(6,7,NULL);
By default, Sqoop will import the NULL value as the string "null" in HDFS.
Let's sqoop it and see what happens:
sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username sqoop -P --table cities --hive-import --hive-overwrite --hive-table vikas.cities -m 1
http://deltafrog.com/how-to-handle-null-value-during-sqoop-import-export/
In the sqoop import command, remove the --null-string and --null-non-string '\N' options.
By default the system will assign null for both string and non-string values.
I have tried --null-string '\N', --null-string '', and other options, but got blanks and different issues.

Hive table creation error through Bash Shell

Can anyone tell me why I am getting an error while creating a partitioned table from the bash shell?
[cloudera@localhost ~]$ hive -e "create table peoplecountry (
name1 string,
name2 string,
salary int,
country string
)
partitioned by (country string)
row format delimited
column terminated by '\n'";
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.7.0.jar!/hive-log4j.properties
Hive history file=/tmp/cloudera/hive_job_log_0fdf7083-8ab4-499f-8048-a85f162d1357_376056456.txt
FAILED: ParseException line 8:0 missing EOF at 'column' near 'delimited'
If you meant a newline at the end of each row of your data, then you need to use:
lines terminated by '\n'
instead of column terminated by.
In case you meant each column in the row to be separated by a delimiter, then specify it as:
fields terminated by '\n'
Refer to:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
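For reference, a corrected sketch of the statement from the question. Two assumptions here: country is removed from the regular column list (in Hive a partition column cannot also be a regular column), and a tab field delimiter stands in for the original '\n':
-- sketch only: partition column kept out of the column list,
-- tab field delimiter assumed
create table peoplecountry (
  name1 string,
  name2 string,
  salary int
)
partitioned by (country string)
row format delimited
fields terminated by '\t'
lines terminated by '\n';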

hive load data:how to specify file column separator and dynamic partition columns?

Well, I have a question about loading MySQL data into hive2, and I don't know how to specify the separator. I tried several times but got nothing.
Below is the Hive table; id is the partition column:
0: jdbc:hive2://localhost/> desc test;
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| a | string | |
| id | int | |
+-----------+------------+----------+
When I execute
load data local inpath 'file:///root/test' into table test partition (id=1);
it says:
Invalid path ''file:///root/test'': No files matching path file
but it does exist.
I wish to dynamically partition using the specified file, so I added the partition column into the file like this:
root#<namenode|~>:#cat /root/test
a,1
b,2
But it also failed; the docs say nothing about this, so I guess it isn't supported right now.
Does anyone have any ideas? Any help will be appreciated!
If you want to specify column separators, use the following:
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
Replace the ',' with your separator.
Also, if you want to partition a Hive table, you specify the partition column using:
CREATE TABLE Foo (bar int )
PARTITIONED BY (testpartition string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
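For the dynamic-partition part of the question, a common pattern is to load the delimited file into a plain staging table first and then insert into the partitioned table with dynamic partitioning enabled. A minimal sketch; the staging table name is made up for illustration:
-- staging table matching the file layout (comma-separated: a,id)
create table test_staging (a string, id int)
row format delimited
fields terminated by ',';

load data local inpath 'file:///root/test' into table test_staging;

-- enable dynamic partitioning and populate the partitioned table from the staging table
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table test partition (id)
select a, id from test_staging;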
