Hadoop - Sqoop Export/Import Partitioned Table

Can anyone explain how to export a partitioned table from Hive to a MySQL database?
And how to import from MySQL into a partitioned Hive table?
I have read the documents I found on Google, but I am not sure about the latest techniques that can be used.
Thanks

Sqoop to Hive partition import
1. Create a table in MySQL with 4 fields (id, name, age, sex):
CREATE TABLE `mon2`
(`id` int, `name` varchar(43), `age` int, `sex` varchar(334));
2. Insert the data from the CSV file abc.csv into the MySQL table:
1,mahesh,23,m
2,ramesh,32,m
3,prerna,43,f
4,jitu,23,m
5,sandip,32,m
6,gps,43,f
mysql> LOAD DATA LOCAL INFILE 'location_of_your_csv/abc.csv' INTO TABLE mon2 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
3. Now start your Hadoop services, go to $SQOOP_HOME, and run the Sqoop import command for a partitioned Hive import:
sqoop import \
--connect jdbc:mysql://localhost:3306/apr \
--username root \
--password root \
-e "select id, name, age from mon2 where sex='m' and \$CONDITIONS" \
--target-dir /user/hive/warehouse/hive_part \
--split-by id \
--hive-overwrite \
--hive-import \
--create-hive-table \
--hive-partition-key sex \
--hive-partition-value 'm' \
--fields-terminated-by ',' \
--hive-table mar.hive_part \
--direct
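If the import succeeds, you can verify the new partition from the Hive shell. A minimal check, assuming the database and table names from the command above:
use mar;
show partitions hive_part;       -- should list sex=m
select * from hive_part limit 5; -- rows imported from mon2 where sex='m'
Note that --hive-partition-value takes a single static value, so every other partition (for example sex='f') needs its own sqoop import run with a matching WHERE clause.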
Hive to MySQL export with a partition (sqoop export)
1. Create a hive_temp table to load the data into:
create table hive_temp
(id int, name string, age int, gender string)
row format delimited fields terminated by ',';
2. Load the data:
load data local inpath '/home/zicone/Documents/pig_to_hbase/stack.csv' into table hive_temp;
3. Create a partitioned table, partitioned by the column you want (here gender):
create table hive_part1
(id int, name string, age int)
partitioned by (gender string)
row format delimited fields terminated by ',';
4. Add a partition to the hive_part1 table:
alter table hive_part1 add partition(gender='m');
5. Copy the data from hive_temp into the hive_part1 partition:
insert overwrite table hive_part1 partition(gender='m')
select id, name, age from hive_temp where gender='m';
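Side note: if hive_temp holds more than one gender value, Hive's dynamic partitioning can load all partitions in one statement instead of one insert per value. A minimal sketch, assuming the same two tables:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table hive_part1 partition(gender)
select id, name, age, gender from hive_temp;
The partition column has to come last in the select list.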
6. Sqoop export command. First create a table in MySQL:
mysql> create table mon3 like mon2;
Then run the export. The files under gender=m contain only id, name, and age (the partition column is stored in the directory name, not in the files), so restrict the export to those columns with --columns:
sqoop export \
--connect jdbc:mysql://localhost:3306/apr \
--table mon3 \
--columns "id,name,age" \
--export-dir /user/hive/warehouse/mar.db/hive_part1/gender=m \
-m 1 \
--username root \
--password root
Now go to the MySQL terminal and run:
select * from mon3;
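Because the partition column is not stored in the exported files, the sex column in mon3 comes out NULL. One way to fill it in afterwards, assuming only the gender=m partition was exported:
mysql> update mon3 set sex = 'm' where sex is null;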
Hope it works for you :)

Related

Importing subset of columns from RDBMS to Hive table with Sqoop

Suppose we have a MySQL database called lastdb with a table person. This table contains 4 columns: id, firstname, lastname, age.
Rows inside the person table:
1, Firstname, Lastname, 20
I want to import data from this MySQL person table into a Hive table with the same structure, but only the first and the last column of the person table. So after the import, the rows in the Hive table should look like this:
1, NULL, NULL, 20
I tried this sqoop command:
sqoop import --connect jdbc:mysql://localhost:3306/lastdb --table person --username root --password pass --hive-import --hive-database lastdb --hive-table person --columns id,age
But it imports the rows into the Hive table in this format:
1, 20, NULL, NULL
Could anyone tell me how to fix that?
Given that the rows inside your MySQL table have the columns id, firstname, lastname, age with values 1, NULL, NULL, 20, running the sqoop import script below will give your desired outcome in the Hive person table.
~]$ sqoop import \
--connect \
jdbc:mysql://localhost:3306/lastdb \
--username=root \
--password=pass \
--query 'select * from person WHERE $CONDITIONS' \
--hive-import \
--hive-table person \
--hive-database lastdb \
--target-dir /user/hive/warehouse/person -m 1
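If you specifically want firstname and lastname to come through as NULL while importing only id and age from MySQL (the output shown in the question), another option is a free-form query with typed NULL placeholders for the skipped columns. A sketch, assuming the same connection settings as above:
sqoop import \
--connect jdbc:mysql://localhost:3306/lastdb \
--username=root \
--password=pass \
--query 'select id, cast(NULL as char(50)) as firstname, cast(NULL as char(50)) as lastname, age from person where $CONDITIONS' \
--hive-import \
--hive-table person \
--hive-database lastdb \
--target-dir /user/hive/warehouse/person -m 1
The casts keep the placeholder columns typed as strings so Sqoop can map them to Hive columns.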

unknown column 'cities.city' in 'field list'

I am getting the error unknown column 'cities.city' in 'field list' when I run the following sqoop import:
sqoop import \
--connect jdbc:mysql://localhost/sivam_db \
--username root \
--password cloudera \
--query 'select cities.city as ccity,normcities.city as ncity from cities full join normcities using(id) where $CONDITIONS' \
--split-by id \
--target-dir /user/duplicatecolumn \
--m 1 \
--boundary-query "select min(id),max(id) from cities" \
--mapreduce-job-name fjoin \
--direct
I have checked all the posts related to this error and tried their suggestions, but it is still not resolved.
Schema of Cities:
create table cities(id int not null auto_increment,country varchar(30) not null,city varchar(30) not null, primary key (id));
Schema of normcities:
create table normcities(id int not null auto_increment,country_id int not null,city varchar(30) not null, primary key(id));
MySQL does not support FULL JOIN. In the query above, MySQL parses the word full as an alias for the cities table, so the reference cities.city in the select list no longer resolves, which is exactly the 'unknown column' error you get.
Remove the word 'full' (leaving a plain inner JOIN, which only returns matched ids) and run the command again; you will get output. If you really need full outer join semantics in MySQL, you have to emulate them, for example with a UNION of a LEFT JOIN and a RIGHT JOIN.
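For reference, a sketch of the corrected command, identical to the one above except that the join is now a plain JOIN:
sqoop import \
--connect jdbc:mysql://localhost/sivam_db \
--username root \
--password cloudera \
--query 'select cities.city as ccity,normcities.city as ncity from cities join normcities using(id) where $CONDITIONS' \
--split-by id \
--target-dir /user/duplicatecolumn \
--m 1 \
--boundary-query "select min(id),max(id) from cities" \
--mapreduce-job-name fjoin \
--direct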

Sqoop import issue

I am importing a table into Hive. I created an external table in Hadoop and imported the data from Oracle using Sqoop, but when I query the data, all of the columns end up in a single column in Hive.
Table:
CREATE EXTERNAL TABLE `default.dba_cdr_head`(
`BI_FILE_NAME` varchar(50),
`BI_FILE_ID` int,
`UPDDATE` TIMESTAMP)
LOCATION
'hdfs:/tmp/dba_cdr_head';
Sqoop:
sqoop import \
--connect jdbc:oracle:thin:@172.16.XX.XX:15xx:CALLS \
--username username \
--password password \
--table CALLS.DBM_CDR_HEAD \
--columns "BI_FILE_NAME, BI_FILE_ID, UPDDATE" \
--target-dir /tmp/dba_cdr_head \
--hive-table default.dba_cdr_head
The data looks like this:
hive> select * from dba_cdr_head limit 5;
OK
CFT_SEP0801_20120724042610_20120724043808M,231893, NULL NULL
CFT_SEP1002_20120724051341_20120724052057M,232467, NULL NULL
CFT_SEP1002_20120724052057_20120724052817M,232613, NULL NULL
CFT_SEP0701_20120724054201_20120724055154M,232904, NULL NULL
CFT_SEP0601_20120724054812_20120724055853M,233042, NULL NULL
Time taken: 3.693 seconds, Fetched: 5 row(s)
I changed the table creation statement to add ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' and that solved it:
CREATE EXTERNAL TABLE `default.dba_cdr_head`(
`BI_FILE_NAME` varchar(50),
`BI_FILE_ID` int,
`UPDDATE` TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION
'hdfs:/tmp/dba_cdr_head';
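A related point: a plain (non-hive-import) Sqoop import writes comma-delimited text files by default, which is why the ',' delimiter above matches. To keep the files and the external table definition explicitly in sync, you can set the delimiter on the Sqoop side as well. A sketch based on the command in the question (connection details unchanged; --hive-table is dropped because it has no effect without --hive-import):
sqoop import \
--connect jdbc:oracle:thin:@172.16.XX.XX:15xx:CALLS \
--username username \
--password password \
--table CALLS.DBM_CDR_HEAD \
--columns "BI_FILE_NAME, BI_FILE_ID, UPDDATE" \
--fields-terminated-by ',' \
--target-dir /tmp/dba_cdr_head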

Error unrecognized argument --hive-partition-key

I am getting the error Unrecognized argument: --hive-partition-key when I run the following statement:
sqoop import
--connect 'jdbc:sqlserver://192.168.56.1;database=xyz_dms_cust_100;username-hadoop;password=hadoop'
--table e_purchase_category
--hive_import
--delete-target-dir
--hive-table purchase_category_p
--hive-partition-key "creation_date"
--hive-partition-value "2015-02-02"
The partitioned table exists.
The Hive partition key (creation_date in your example) should not be part of your database table when you are using hive-import. When you create a partitioned table in Hive, you do not include the partition column in the table schema, and the same applies to a Sqoop hive-import.
Based on your sqoop command, I am guessing that the creation_date column is present in your SQL Server table. If so, you will get this error:
ERROR tool.ImportTool: Imported Failed:
Partition key creation_date cannot be a column to import.
To resolve this issue, there are two solutions:
1. Make sure the partition column is not present in the SQL Server table (or is excluded from the import). Then, when Sqoop creates the Hive table, the partition column and its value appear only as a directory in the Hive warehouse.
2. Change the sqoop command to use a free-form query that selects all the columns except the partition column, and do the hive-import. Below is an example of this solution.
Example:
sqoop import \
--connect jdbc:mysql://localhost:3306/hadoopexamples \
--query 'select City.ID, City.Name, City.District, City.Population from City where $CONDITIONS' \
--target-dir /user/XXXX/City \
--delete-target-dir \
--hive-import \
--hive-table City \
--hive-partition-key "CountryCode" \
--hive-partition-value "USA" \
--fields-terminated-by ',' \
-m 1
Another method:
You can also do the tasks in separate steps:
1. Create a partitioned table in Hive (example: city_partition).
2. Load the data from the RDBMS into a plain Hive table (example: city) using a Sqoop hive-import.
3. Using INSERT OVERWRITE, load the data from the plain Hive table (city) into the partitioned table (city_partition), like:
INSERT OVERWRITE TABLE city_partition
PARTITION (CountryCode='USA')
SELECT id, name, district, population FROM city;
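For completeness, a minimal sketch of the two Hive tables these steps assume (table and column names follow the example; adjust them to your source schema):
-- partitioned target table (step 1)
create table city_partition (id int, name string, district string, population int)
partitioned by (countrycode string)
row format delimited fields terminated by ',';
-- plain staging table filled by the sqoop hive-import (step 2)
create table city (id int, name string, countrycode string, district string, population int)
row format delimited fields terminated by ',';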
This approach also works:
sqoop import --connect jdbc:mysql://localhost/akash \
--username root \
-P \
--table mytest \
--where "dob='2019-12-28'" \
--columns "id,name,salary" \
--target-dir /user/cloudera/ \
--m 1 --hive-table mytest \
--hive-import \
--hive-overwrite \
--hive-partition-key dob \
--hive-partition-value '2019-12-28'

Import data from Oracle into Hive using Sqoop - cannot use --hive-partition-key

I have a simple table:
create table osoba(id number, imie varchar2(100), nazwisko varchar2(100), wiek integer);
insert into osoba values(1, 'pawel','kowalski',36);
insert into osoba values(2, 'john','smith',55);
insert into osoba values(3, 'paul','psmithski',44);
insert into osoba values(4, 'jakub','kowalski',70);
insert into osoba values(5, 'scott','tiger',70);
commit;
that I want to import into Hive using Sqoop. I want to have a partitioned table in Hive. This is my sqoop command:
-bash-4.1$ sqoop import -Dmapred.job.queue.name=pr --connect "jdbc:oracle:thin:@localhost:1521/oracle_database" \
--username "user" --password "password" --table osoba --hive-import \
--hive-table pk.pk_osoba --delete-target-dir --hive-overwrite \
--hive-partition-key nazwisko
And I get an error:
FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
Can anyone suggest how the --hive-partition-key parameter should be used?
The sqoop command without the --hive-partition-key parameter works fine and creates the table pk.pk_osoba in Hive.
Regards
Pawel
In the --columns option, provide the names of all the columns you want to import except the partition column. We don't specify the partition column in --columns because it gets added automatically.
Below is an example:
I am importing the DEMO2 table, which has the columns ID, NAME, COUNTRY. I have not specified the COUNTRY column in --columns because it is my partition column.
$SQOOP_HOME/bin/sqoop import --connect jdbc:oracle:thin:@192.168.41.67:1521/orcl --username orcluser1 --password impetus \
--hive-import --table DEMO2 --hive-table "DEMO2" --columns ID,NAME \
--hive-partition-key 'COUNTRY' --hive-partition-value 'INDIA' \
--m 1 --verbose --delete-target-dir --target-dir /tmp/13/DEMO2
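Note that --hive-partition-value loads one static partition value per run, so each country needs its own import. A sketch for loading a second partition, assuming the same connection details (the country value and the --where filter here are illustrative):
$SQOOP_HOME/bin/sqoop import --connect jdbc:oracle:thin:@192.168.41.67:1521/orcl --username orcluser1 --password impetus \
--hive-import --table DEMO2 --hive-table "DEMO2" --columns ID,NAME \
--where "COUNTRY = 'USA'" \
--hive-partition-key 'COUNTRY' --hive-partition-value 'USA' \
--m 1 --verbose --delete-target-dir --target-dir /tmp/13/DEMO2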
