Error unrecognized argument --hive-partition-key - hadoop

I am getting the error "Unrecognized argument: --hive-partition-key" when I run the following statement:
sqoop import \
--connect 'jdbc:sqlserver://192.168.56.1;database=xyz_dms_cust_100;username-hadoop;password=hadoop' \
--table e_purchase_category \
--hive_import \
--delete-target-dir \
--hive-table purchase_category_p \
--hive-partition-key "creation_date" \
--hive-partition-value "2015-02-02"
The partitioned table exists.

The Hive partition key (creation_date in your example) should not be part of your database table when you are using hive-import. When you create a partitioned table in Hive, you do not include the partition column in the table schema; the same applies to Sqoop's hive-import.
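For reference, this is what a partitioned Hive table definition looks like: the partition column appears only in the PARTITIONED BY clause, never in the column list. A minimal sketch (the two non-partition columns are illustrative placeholders, not taken from your table):
CREATE TABLE purchase_category_p
(purchase_id INT, category_name STRING)
PARTITIONED BY (creation_date STRING);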
Based on your sqoop command, I am guessing that the creation_date column is present in your SQL Server table. If so, you might be getting this error:
ERROR tool.ImportTool: Imported Failed:
Partition key creation_date cannot be a column to import.
To resolve this issue, I have two solutions:
Make sure that the partition column is not present in the SQL Server table. Then, when sqoop creates the Hive table, it adds the partition column and its value as a directory in the Hive warehouse.
Change the sqoop command to use a free-form query that selects all the columns except the partition column, and do the hive-import. Below is an example of this solution.
Example:
sqoop import \
--connect jdbc:mysql://localhost:3306/hadoopexamples \
--query 'select City.ID, City.Name, City.District, City.Population from City where $CONDITIONS' \
--target-dir /user/XXXX/City \
--delete-target-dir \
--hive-import \
--hive-table City \
--hive-partition-key "CountryCode" \
--hive-partition-value "USA" \
--fields-terminated-by ',' \
-m 1
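After this import, the data lands under the CountryCode=USA partition of the City table; a quick way to confirm it from the Hive shell:
hive> show partitions City;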
Another method:
You can also try to do your tasks in separate steps (a sketch of the first two steps follows the query below):
Create a partitioned table in Hive (example: city_partition)
Load data from the RDBMS into a plain Hive table using sqoop hive-import (example: city)
Using insert overwrite, load data into the partitioned table (city_partition) from the plain Hive table (city), like:
INSERT OVERWRITE TABLE city_partition
PARTITION (CountryCode='USA')
SELECT id, name, district, population FROM city;
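A minimal sketch of steps 1 and 2, assuming the same table names and MySQL connection used in the example above (column names and types are illustrative guesses based on that example):
-- Step 1: partitioned target table in Hive
CREATE TABLE city_partition
(id INT, name STRING, district STRING, population INT)
PARTITIONED BY (CountryCode STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
# Step 2: plain hive-import into the staging table
sqoop import \
--connect jdbc:mysql://localhost:3306/hadoopexamples \
--table City \
--hive-import \
--hive-table city \
--fields-terminated-by ',' \
-m 1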

This approach can also be applied:
sqoop import --connect jdbc:mysql://localhost/akash \
--username root \
-P \
--table mytest \
--where "dob='2019-12-28'" \
--columns "id,name,salary" \
--target-dir /user/cloudera/ \
-m 1 --hive-table mytest \
--hive-import \
--hive-overwrite \
--hive-partition-key dob \
--hive-partition-value '2019-12-28'

Related

How to pass a string value in sqoop free form query

I need to import data from a few different SQL Servers which have the same tables, table structure and even primary key values. So, to uniquely identify a record ingested from a particular SQL Server, say "S1", I want to have an extra column, say "serverName", in my Hive tables. How should I add this in my sqoop free-form query?
All I want to do is pass a hardcoded value along with the list of columns, such that the hardcoded column value gets stored in Hive. Once that works, I can take care of dynamically changing this value depending on which server the data comes from.
sqoop import --connect "connDetails" --username "user" --password "pass" --query "select col1, col2, col3, 'S1' from table where \$CONDITIONS" --hive-import --hive-overwrite --hive-table stg.T1 --split-by col1 --as-textfile --target-dir T1 --hive-drop-import-delims
S1 being the hardcoded value here. I am thinking in the SQL way: when you select a hardcoded value, the same is returned in the query result. Any pointers on how to get this done?
Thanks in Advance.
SOLVED: It just needed an alias for the hardcoded value. The sqoop command that worked is:
sqoop import --connect "connDetails" --username "user" --password "pass" --query "select col1, col2, col3, 'S1' as serverName from table where \$CONDITIONS" --hive-import --hive-overwrite --hive-table stg.T1 --split-by col1 --as-textfile --target-dir T1 --hive-drop-import-delims
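As a quick sanity check after the import, a Hive query like the following (table name taken from the command above) should show the hardcoded value on every row:
select serverName, count(*) from stg.T1 group by serverName;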

Importing subset of columns from RDBMS to Hive table with Sqoop

Suppose we have a MySQL database called lastdb with a table person. This table contains 4 columns: id, firstname, lastname, age.
Rows inside person table:
1, Firstname, Lastname, 20
I want to import data from this MySQL person table into a Hive table with the same structure, but only the first and the last columns from table person. So after my import the rows in the Hive table should look like this:
1, NULL, NULL, 20
I tried this sqoop command:
sqoop import --connect jdbc:mysql://localhost:3306/lastdb --table person --username root --password pass --hive-import --hive-database lastdb --hive-table person --columns id,age
But it imports rows into the Hive table in this format:
1, 20, NULL, NULL
Could anyone tell me how to fix that?
The shift happens because Sqoop writes only the selected columns, in order, into the Hive table. To keep id and age in their original positions and get NULLs for the skipped columns, select literal NULLs for them in a free-form query (cast so Sqoop can map their type). Running the sqoop import below gives the desired outcome in the Hive person table:
sqoop import \
--connect jdbc:mysql://localhost:3306/lastdb \
--username root \
--password pass \
--query 'select id, cast(null as char(100)) as firstname, cast(null as char(100)) as lastname, age from person where $CONDITIONS' \
--hive-import \
--hive-table person \
--hive-database lastdb \
--target-dir /user/hive/warehouse/person \
-m 1
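Assuming the import succeeds, selecting from the Hive table should now return 1, NULL, NULL, 20:
hive> select * from lastdb.person;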

Hadoop-Sqoop import without an integer value using split-by

I am importing data from MemSQL to HDFS using Sqoop. My source table in MemSQL doesn't have any integer column, so I created a new table that adds a new column 'test' to the existing columns.
Following is the query:
sqoop import --connect jdbc:mysql://XXXXXXXXX:3306/db_name --username XXXX --password XXXXX --query "select closed,extract_date,open,close,cast(floor(rand()*1000000 as int) as test from tble_name where \$CONDITIONS" --target-dir /user/XXXX--split-by test;
This query gave me the following error:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'as int) as test from table_name where (1 = 0)' at line 1
I tried it another way as well:
sqoop import --connect jdbc:mysql://XXXXX:3306/XXXX --username XXXX --password XXXX --query "select closed,extract_date,open,close,ceiling(rand()*1000000) as test from table_name where \$CONDITIONS" --target-dir /user/dfsdlf --split-by test;
With this second query the job gets executed, but no data is transferred. It says the split-by column is of float type and that it must be changed to an integer type.
Please help me change the split-by column from float type to integer type.
The problem mostly seems to be related to the use of an alias as the --split-by parameter.
If you need to split on that particular expression, run the query
'select closed,extract_date,open,close,ceiling(rand()*1000000) from table_name' in the console, take the column name the console reports for that expression, and use it in --split-by 'complete_column_name_from_console' (here it should be --split-by 'ceiling(rand()*1000000)').
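A minimal sketch of the command that suggestion leads to, using the placeholder connection details from the question (whether this satisfies the integer-type check still depends on the type the database reports for the expression):
sqoop import \
--connect jdbc:mysql://XXXXX:3306/XXXX \
--username XXXX --password XXXX \
--query "select closed,extract_date,open,close,ceiling(rand()*1000000) from table_name where \$CONDITIONS" \
--split-by 'ceiling(rand()*1000000)' \
--target-dir /user/dfsdlf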

Hadoop - sqoop Export/Import Partitioned table

Can anyone explain how to export a partitioned table from Hive to a MySQL database?
And how to import into a Hive partitioned table from MySQL?
I have read the documents I found on Google, but I am not sure about the latest techniques that can be used.
Thanks
sqoop to hive partition import
1. create a table in mysql with 4 fields (id, name, age, sex)
CREATE TABLE `mon2`
(`id` int, `name` varchar(43), `age` int, `sex` varchar(334))
2. load data into the mysql table from the csv file abc.csv
1,mahesh,23,m
2,ramesh,32,m
3,prerna,43,f
4,jitu,23,m
5,sandip,32,m
6,gps,43,f
mysql> load data local infile 'location_of_your_csv/abc.csv' into table mon2 fields terminated by ',' lines terminated by '\n';
3. now start your hadoop services, go to $SQOOP_HOME and write the sqoop import query for the partitioned hive import.
sqoop import \
--connect jdbc:mysql://localhost:3306/apr \
--username root \
--password root \
-e "select id, name, age from mon2 where sex='m' and \$CONDITIONS" \
--target-dir /user/hive/warehouse/hive_part \
--split-by id \
--hive-overwrite \
--hive-import \
--create-hive-table \
--hive-partition-key sex \
--hive-partition-value 'm' \
--fields-terminated-by ',' \
--hive-table mar.hive_part \
--direct
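If the import succeeds, the partition should be visible in Hive (database and table names taken from the command above):
hive> show partitions mar.hive_part;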
hive to sqoop export with partition
1. create a hive_temp table to load the data into
create table hive_temp
(id int, name string, age int, gender string)
row format delimited fields terminated by ',';
2. load data
load data local inpath '/home/zicone/Documents/pig_to_hbase/stack.csv' into table hive_temp;
3. create a partitioned table with the specific column that you want to partition by.
create table hive_part1
(id int, name string, age int)
partitioned by (gender string)
row format delimited fields terminated by ',';
4. add a partition to the hive_part1 table
alter table hive_part1 add partition(gender='m');
5. copy data from the temp table (hive_temp) into the partitioned table (hive_part1)
insert overwrite table hive_part1 partition(gender='m')
select id, name, age from hive_temp where gender='m';
6. sqoop export command
create a table in mysql
mysql> create table mon3 like mon2;
sqoop export \
--connect jdbc:mysql://localhost:3306/apr \
--table mon3 \
--export-dir /user/hive/warehouse/mar.db/hive_part1/gender=m \
-m 1 \
--username root \
--password root
now go to the mysql terminal and run
select * from mon3;
hope it works for you
:)

Import data from oracle into hive using sqoop - cannot use --hive-partition-key

I have a simple table:
create table osoba(id number, imie varchar2(100), nazwisko varchar2(100), wiek integer);
insert into osoba values(1, 'pawel','kowalski',36);
insert into osoba values(2, 'john','smith',55);
insert into osoba values(3, 'paul','psmithski',44);
insert into osoba values(4, 'jakub','kowalski',70);
insert into osoba values(5, 'scott','tiger',70);
commit;
that I want to import into Hive using sqoop. I want to have a partitioned table in Hive. This is my sqoop command:
-bash-4.1$ sqoop import -Dmapred.job.queue.name=pr --connect "jdbc:oracle:thin:@localhost:1521/oracle_database" \
--username "user" --password "password" --table osoba --hive-import \
--hive-table pk.pk_osoba --delete-target-dir --hive-overwrite \
--hive-partition-key nazwisko
And I get an error:
FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
Can anyone suggest how the --hive-partition-key parameter should be used?
The sqoop command without the --hive-partition-key parameter works fine and creates the table pk.pk_osoba in Hive.
Regards
Pawel
In the --columns option, provide the names of all the columns you want to import except the partition column. You don't specify the partition column in --columns, as it gets added automatically.
Below is an example:
I am importing the DEMO2 table, which has the columns ID, NAME, COUNTRY. I have not specified the COUNTRY column in --columns because it is my partition column.
$SQOOP_HOME/bin/sqoop import --connect jdbc:oracle:thin:@192.168.41.67:1521/orcl --username orcluser1 --password impetus --hive-import --table DEMO2 --hive-table "DEMO2" --columns ID,NAME --hive-partition-key 'COUNTRY' --hive-partition-value 'INDIA' --m 1 --verbose --delete-target-dir --target-dir /tmp/13/DEMO2
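Applied to the osoba table from the question, the command would look roughly like this (the partition value 'kowalski' is only an illustration; use whichever nazwisko value you are loading):
sqoop import -Dmapred.job.queue.name=pr --connect "jdbc:oracle:thin:@localhost:1521/oracle_database" \
--username "user" --password "password" --table osoba --columns ID,IMIE,WIEK \
--hive-import --hive-table pk.pk_osoba --delete-target-dir --hive-overwrite \
--hive-partition-key nazwisko --hive-partition-value 'kowalski' -m 1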
