I've got an error importing data from Teradata to a Hadoop cluster using Sqoop.
My Teradata table has two columns whose titles (not column names) are identical. Is there an automatic way to use the column name instead of the column title in my Sqoop job?
I've tried to use a "select * from table" query, but it does not work.
And I can't change the column titles in Teradata.
Here is my job code:
sqoop job -Dmapred.job.queue.name=shortduration \
--create inc_My_Table \
-- import \
--connect jdbc:teradata://RCT/DATABASE=DWHBIG \
--driver com.teradata.jdbc.TeraDriver \
--username MBIGDATA -P \
--query "select a.* from My_Table a where \$CONDITIONS" \
--target-dir /data/source/fb/$i \
--check-column DAT_MAJ_DWH \
--incremental append \
--last-value 2001-01-01 \
--split-by ID
Any idea? Thanks
Since Teradata JDBC Driver 16.00.00.28, you can use the connection URL parameter COLUMN_NAME to control whether getColumnName and getColumnLabel return the column name, the column title, or the AS-clause name. That should resolve your problem.
COLUMN_NAME=OFF (the default) specifies that the
ResultSetMetaData.getColumnName method should return the AS-clause
name if available, or the column name if available, or the column
title, and specifies that the ResultSetMetaData.getColumnLabel method
should return the column title.
COLUMN_NAME=ON specifies that, when StatementInfo parcel support is
available, the ResultSetMetaData.getColumnName method should return
the column name if available, and specifies that the
ResultSetMetaData.getColumnLabel method should return the AS-clause
name if available, or the column name if available, or the column
title. This option has no effect when StatementInfo parcel support is
unavailable.
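In practice, the parameter is appended to the --connect URL in the job above (a sketch, assuming Teradata JDBC Driver 16.00.00.28 or later is on the Sqoop classpath; Teradata JDBC URL parameters are comma-separated):
--connect jdbc:teradata://RCT/DATABASE=DWHBIG,COLUMN_NAME=ON \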
I finally found a solution. I aliased the column that was causing the issue using the SQL AS keyword!
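For reference, that amounts to replacing the select a.* in the --query above with an explicit column list that aliases the clashing column (a sketch; COL_WITH_DUP_TITLE and COL_ALIAS are hypothetical placeholders, not the real names):
--query "select a.ID, a.DAT_MAJ_DWH, a.COL_WITH_DUP_TITLE AS COL_ALIAS from My_Table a where \$CONDITIONS" \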
Related
I am importing data from MemSQL to HDFS using Sqoop. My source table in MemSQL doesn't have any integer column, so I added a new column 'test' alongside the existing columns.
Following is the query:
sqoop import --connect jdbc:mysql://XXXXXXXXX:3306/db_name --username XXXX --password XXXXX --query "select closed,extract_date,open,close,cast(floor(rand()*1000000 as int) as test from table_name where \$CONDITIONS" --target-dir /user/XXXX --split-by test;
This query gave me the following error:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'as int) as test from table_name where (1 = 0)' at line 1
I tried it another way as well:
sqoop import --connect jdbc:mysql://XXXXX:3306/XXXX --username XXXX --password XXXX --query "select closed,extract_date,open,close,ceiling(rand()*1000000) as test from table_name where \$CONDITIONS" --target-dir /user/dfsdlf --split-by test;
With this second query the job gets executed, but no data is transferred. It says the split-by column is of float type and must strictly be changed to an integer type.
Please help me change the split-by column from float to integer type.
The problem mostly seems to be related to the use of an alias as the --split-by parameter.
If you need to use that particular column, you can run the query
'select closed,extract_date,open,close,ceiling(rand()*1000000) from table_name' in the console, note the column name the database reports for that expression, and use it in --split-by 'complete_column_name_from_console' (here it would be --split-by 'ceiling(rand()*1000000)').
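A sketch of the full command as suggested above, keeping the placeholders from the question and passing the expression itself to --split-by:
sqoop import --connect jdbc:mysql://XXXXX:3306/XXXX --username XXXX --password XXXX \
--query "select closed,extract_date,open,close,ceiling(rand()*1000000) from table_name where \$CONDITIONS" \
--target-dir /user/dfsdlf \
--split-by 'ceiling(rand()*1000000)'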
My doubt is: say I have a file A1.csv with 2000 records loaded into a SQL Server table, and I import this data into HDFS. Later that day I add 3000 records to the same table on SQL Server.
Now I want to run an incremental import for this second chunk of data, but I do not want all 3000 records to be imported. I need only part of the data, say 1000 records that meet a certain condition, to be brought in as part of the incremental import.
Is there a way to do that using the Sqoop incremental import command?
Please help, thank you.
You need a unique key or a timestamp field to identify the deltas, which are the new 1000 records in your case. Using that field, you have two options to bring the data into Hadoop.
Option 1
is to use Sqoop incremental append; below is an example of it:
sqoop import \
--connect jdbc:oracle:thin:@enkx3-scan:1521:dbm2 \
--username wzhou \
--password wzhou \
--table STUDENT \
--incremental append \
--check-column student_id \
-m 4 \
--split-by major
Arguments :
--check-column (col) #Specifies the column to be examined when determining which rows to import.
--incremental (mode) #Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value) #Specifies the maximum value of the check column from the previous import.
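On a subsequent run, the delta since the previous import is picked up by passing the highest check-column value already imported through --last-value (a sketch reusing the example above; 2000 is only an illustrative value):
sqoop import \
--connect jdbc:oracle:thin:@enkx3-scan:1521:dbm2 \
--username wzhou \
--password wzhou \
--table STUDENT \
--incremental append \
--check-column student_id \
--last-value 2000 \
-m 4 \
--split-by major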
Option 2
is to use the --query argument in Sqoop, where you can use native SQL for MySQL or for any other database you connect to.
Example :
sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
--split-by a.id --target-dir /user/foo/joinresults
sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
-m 1 --target-dir /user/foo/joinresults
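If only a filtered subset of the new rows is needed (the 1000 records with a certain condition from the question), the extra filter can be carried inside the free-form query while the incremental options still track the check column. A sketch with assumed table and column names (orders, order_id, region) and placeholder connection details:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sales \
--username sqoop --password sqoop \
--query "SELECT * FROM orders WHERE region = 'EU' AND \$CONDITIONS" \
--split-by order_id \
--check-column order_id \
--incremental append \
--last-value 2000 \
--target-dir /user/foo/orders_increment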
Can anyone explain how to export a partitioned table from Hive to a MySQL database?
And how to import data into a partitioned Hive table from MySQL?
I have read documents found through Google but am not sure about the latest techniques that can be used.
Thanks
Sqoop import into a partitioned Hive table
1. Create a table in MySQL with 4 fields (id, name, age, sex):
CREATE TABLE `mon2`
(`id` int, `name` varchar(43), `age` int, `sex` varchar(334));
2. Insert data into the MySQL table from the CSV file abc.csv:
1,mahesh,23,m
2,ramesh,32,m
3,prerna,43,f
4,jitu,23,m
5,sandip,32,m
6,gps,43,f
mysql> LOAD DATA LOCAL INFILE 'location_of_your_csv/abc.csv' INTO TABLE mon2 FIELDS TERMINATED BY ',';
3. Now start your Hadoop services, go to $SQOOP_HOME, and run the Sqoop import command for the partitioned Hive import.
sqoop import \
--connect jdbc:mysql://localhost:3306/apr \
--username root \
--password root \
-e "select id, name, age from mon2 where sex='m' and \$CONDITIONS" \
--target-dir /user/hive/warehouse/hive_part \
--split-by id \
--hive-overwrite \
--hive-import \
--create-hive-table \
--hive-partition-key sex \
--hive-partition-value 'm' \
--fields-terminated-by ',' \
--hive-table mar.hive_part \
--direct
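Once the job finishes, the new partition can be verified from the Hive shell (assuming the mar database and hive_part table used in the command above):
hive -e "use mar; show partitions hive_part;"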
Sqoop export from a partitioned Hive table
1. Create a hive_temp table to load the data into:
create table hive_temp
(id int, name string, age int, gender string)
row format delimited fields terminated by ',';
2. load data
load data local inpath '/home/zicone/Documents/pig_to_hbase/stack.csv' into table hive_temp;
3. Create a partitioned table with the specific column you want to partition by:
create table hive_part1
(id int, name string, age int)
partitioned by (gender string)
row format delimited fields terminated by ',';
4. Add a partition to the hive_part1 table:
alter table hive_part1 add partition(gender='m');
5. Copy the data from hive_temp to the hive_part1 table:
insert overwrite table hive_part1 partition(gender='m')
select id, name, age from hive_temp where gender='m';
6. Sqoop export command
First, create a table in MySQL:
mysql> create table mon3 like mon2;
sqoop export \
--connect jdbc:mysql://localhost:3306/apr \
--table mon3 \
--export-dir /user/hive/warehouse/mar.db/hive_part1/gender=m \
-m 1 \
--username root \
--password root
Now go to the MySQL terminal and run:
select * from mon3;
hope it works for you
:)
I am getting the error "Unrecognized argument: --hive-partition-key" when I run the following statement:
sqoop import \
--connect 'jdbc:sqlserver://192.168.56.1;database=xyz_dms_cust_100;username=hadoop;password=hadoop' \
--table e_purchase_category \
--hive-import \
--delete-target-dir \
--hive-table purchase_category_p \
--hive-partition-key "creation_date" \
--hive-partition-value "2015-02-02"
The partitioned table exists.
The Hive partition key (creation_date in your example) should not be part of your database table when you are using hive-import. When you create a partitioned table in Hive, you do not include the partition column in the table schema. The same applies to Sqoop hive-import.
Based on your sqoop command, I am guessing that the creation_date column is present in your SQL Server table. If so, you might be getting this error:
ERROR tool.ImportTool: Imported Failed:
Partition key creation_date cannot be a column to import.
To resolve this issue, I have two solutions:
Make sure that the partition column is not present in the SQL Server table. Then, when Sqoop creates the Hive table, it represents that partition column and its value as a directory in the Hive warehouse.
Change the Sqoop command to include a free-form query that gets all the columns except the partition column, and do the hive-import. Below is an example of this solution.
Example:
sqoop import \
--connect jdbc:mysql://localhost:3306/hadoopexamples \
--query 'select City.ID, City.Name, City.District, City.Population from City where $CONDITIONS' \
--target-dir /user/XXXX/City \
--delete-target-dir \
--hive-import \
--hive-table City \
--hive-partition-key "CountryCode" \
--hive-partition-value "USA" \
--fields-terminated-by ',' \
-m 1
Another method:
You can also try to do your tasks in different steps:
Create a partitioned table in Hive (example: city_partition).
Load data from the RDBMS with a Sqoop hive-import into a plain Hive table (example: city).
Using INSERT OVERWRITE, import the data into the partitioned table (city_partition) from the plain Hive table (city), like this (a sketch of the first two steps follows the query below):
INSERT OVERWRITE TABLE city_partition
PARTITION (CountryCode='USA')
SELECT id, name, district, population FROM city;
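For completeness, a minimal sketch of the first two steps, assuming the same hadoopexamples database and City source table as in the earlier example (credentials omitted there as well):
# step 1: create the partitioned target table in Hive
hive -e "CREATE TABLE city_partition (id INT, name STRING, district STRING, population INT)
PARTITIONED BY (CountryCode STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';"
# step 2: plain hive-import of the source table into a non-partitioned Hive table
sqoop import \
--connect jdbc:mysql://localhost:3306/hadoopexamples \
--table City \
--hive-import \
--hive-table city \
-m 1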
This could be applied too:
sqoop import --connect jdbc:mysql://localhost/akash \
--username root \
-P \
--table mytest \
--where "dob='2019-12-28'" \
--columns "id,name,salary" \
--target-dir /user/cloudera/ \
-m 1 \
--hive-table mytest \
--hive-import \
--hive-overwrite \
--hive-partition-key dob \
--hive-partition-value '2019-12-28'
I have an SQL table with the following columns:
name, fname, e-mail, phone
How to import this table with Sqoop into a CSV file on HDFS with:
An extra phone2 column, so the output CSV records have the following format:
name, fname, e-mail, phone, phone2
where phone2 has the value undef for all output records.
Some input records may have an empty e-mail field, which results in CSV lines with empty ,, fields like this:
John,Smith,,1234567
How do I replace the empty ,, strings with the string undef, to get CSV records like:
John ,Smith ,undef ,1234567, undef
Tom, Brooks, toom@abc.com, 78979878, undef
...
etc
Sqoop can take a query, so in addition to specifying your --null-string and --null-non-string options, you can specify whatever query you like for the import. For you, the query is pretty simple:
select name, fname, e-mail, phone, null AS phone2 FROM people
And then you just drop it into your sqoop command. Note that you may need to use --map-column-java to tell Sqoop what data type you want the columns to be, since with a custom query it won't necessarily be able to figure it out.
sqoop import \
--query 'select name, fname, e-mail, phone, null AS phone2 FROM people WHERE $CONDITIONS' \
--null-string UNDEF \
--null-non-string UNDEF \
... connection info and other options, if necessary ...
Bonus tip: some databases can transfer data super fast with the --direct option enabled, so you may want to look at that, depending on the size of your table.
As far as I know, while importing SQL data with Sqoop there is no way to add extra columns.
But it is possible to change null values to some other value using --null-string. For example:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--null-string 'UNDEF' \
--null-non-string 'UNDEF'
The above command replaces null values with 'UNDEF'.