How to use sqoop to export the default hive delimited output?

I have a Hive query:
insert overwrite directory '/x'
select ...
Then I try to export the data with Sqoop:
sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /x --input-fields-terminated-by 0x01 --lines-terminated-by '\n'
But this seems to fail to parse the fields according to the delimiter.
What am I missing?
I think the --input-fields-terminated-by 0x01 part doesn't work as expected?
I do not want to create additional tables in Hive that contain the query results.
stack trace:
2013-09-24 05:39:21,705 ERROR org.apache.sqoop.mapreduce.TextExportMapper: Exception:
java.lang.NumberFormatException: For input string: "9-2"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
...
The vi view of the output:
16-09-2013 23^A1182^A-1^APub_X^A21782^AIT^A1^A0^A0^A0^A0^A0.0^A0.0^A0.0
16-09-2013 23^A1182^A6975^ASoMo Audience Corp^A2336143^AUS^A1^A1^A0^A0^A0^A0.2^A0.0^A0.0
16-09-2013 23^A1183^A-1^APub_UK, Inc.^A1564001^AGB^A1^A0^A0^A0^A0^A0.0^A0.0^A0.0
17-09-2013 00^A1120^A-1^APub_US^A911^A--^A181^A0^A0^A0^A0^A0.0^A0.0^A0.0

I've found the correct solution for producing that special character in bash:
#!/bin/bash
# ... your script
hive_char=$( printf "\x01" )
sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /x --input-fields-terminated-by "${hive_char}" --lines-terminated-by '\n'
The problem was getting the separator recognized correctly (nothing to do with types or the schema), and that is what hive_char achieves.
Another way to enter this special character on the Linux command line is to type Ctrl+V followed by Ctrl+A.

Using
--input-fields-terminated-by '\001' --lines-terminated-by '\n'
as flags in the sqoop export command seems to do the trick for me.
So, in your example, the full command would be:
sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /x --input-fields-terminated-by '\001' --lines-terminated-by '\n'

I think it's a data type mismatch with your RDBMS schema.
Try to find the column that receives the value "9-2" and check its data type in the RDBMS schema.
If it's int or numeric, Sqoop will try to parse the value before inserting, and "9-2" is not a numeric value.
Let me know if this doesn't work.
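A quick way to check is to compare the table definition on the MySQL side against the fields in the export file (connection details are taken from the question; x_data is the table being exported to):
mysql --host=mysqlm --user=site --password=site --database=site -e "DESCRIBE x_data"
If the column that lines up with the "9-2" value is declared as an int or numeric type, that is where the parse fails.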

It seems like sqoop is taking '0' as the delimiter.
You are getting the error because:
the first column in your mysql table could be varchar and the second column is a number.
As per the string below:
16- 0 9-2 0 13 23^A1182^A-1^APub_X^A21782^AIT^A1^A0^A0^A0^A0^A0.0^A0.0^A0.0
The first column parsed by sqoop is: 16-
and the second column is: 9-2
So it's better to specify the delimiter in quotes ('0x01'),
or
(it's always easier and gives better control) use a hive create table command such as:
create table tablename row format delimited fields terminated by '\t' as select ... and specify '\t' as the delimiter in your sqoop command, as sketched below.
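A rough sketch of that approach, reusing the connection details from the question; the staging table name is illustrative and the export dir assumes the default Hive warehouse location:
hive -e "create table x_export row format delimited fields terminated by '\t' stored as textfile as select ..."
sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /user/hive/warehouse/x_export --input-fields-terminated-by '\t' --lines-terminated-by '\n'
Note that a '\t' delimiter only works if none of the selected columns can contain tab characters.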

Related

Special characters are not correct after sqooping data into Hive from Teradata

I'm trying to sqoop the Teradata table into Hive using the "sqoop-import" command below.
sqoop tdimport
-Dtdch.output.hdfs.avro.schema.file=/tmp/data/country.avsc --connect jdbc:teradata://tdserver/database=SALES --username tduser
--password tdpw --as-avrodatafile --target-dir /tmp/data/country_avro --table COUNTRY
--split-by SALESCOUNTRYCODE --num-mappers 1
The Teradata table contains special characters in some columns. After sqooping into Hive, the special characters do not come through correctly.
Is there any way to preserve the special characters when firing the sqoop import command?
Do we need to use UTF-8 to resolve this issue?
Can anyone please advise on this issue?

sqoop export command for data in hdfs which has leading spaces

I have data stored in HDFS that has spaces before and after the values. When I try to export it to MySQL it gives a NumberFormatException, but when I create the data without spaces it is inserted into MySQL successfully.
My question is: can't we export data which has spaces from HDFS to MySQL using the sqoop export command?
The data which I used:
1201, adi, sen manager, 30000, it
1201, pavan, jun manager, 5000, cs
1203, santhosh, junior, 60000, mech
I created the table like:
create table emp(id BIGINT,name varchar(20),desg varchar(20),salary BIGINT,dept varchar(20));
The sqoop command:
sqoop export \
--connect jdbc:mysql://127.0.0.1/mydb \
--username root \
--table emp \
--m 1 \
--export-dir /mydir \
--input-fields-terminated-by ',' \
--input-lines-terminated-by '\n'
Result: NumberFormatException for input string: '1201'
It can't parse the data.
I discussed this in a forum and they said to trim the spaces, but I want to know whether the spaces can be trimmed automatically while performing the sqoop export.
Can somebody give suggestions on this?
You can do one simple thing:
Create a temporary table in MySQL with all VARCHAR columns:
create table emp_temp(id varchar(20), name varchar(20), desg varchar(20), salary varchar(20), dept varchar(20));
Now create the real table with the numeric fields after TRIM() and CAST():
create table emp as select CAST(TRIM(id) AS UNSIGNED) AS id, name, desg, CAST(TRIM(salary) AS UNSIGNED) AS salary, dept FROM emp_temp;
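The export step that feeds the staging table runs before that second create table; a sketch, reusing the command from the question but pointing at emp_temp:
sqoop export \
--connect jdbc:mysql://127.0.0.1/mydb \
--username root \
--table emp_temp \
--m 1 \
--export-dir /mydir \
--input-fields-terminated-by ',' \
--input-lines-terminated-by '\n'
Since every column in emp_temp is VARCHAR, values with surrounding spaces load without a NumberFormatException, and the TRIM()/CAST() step above produces the clean numeric columns in emp.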
Sqoop internally runs a MapReduce job.
A simple solution is to run a mapper that trims the spaces in your data, write the output to a different location, and run sqoop export on the new files.
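For a small dataset like this one, a plain shell pipeline (swapped in here for the MapReduce mapper) achieves the same cleanup; the /mydir_trimmed directory is an assumption:
hdfs dfs -mkdir -p /mydir_trimmed
hdfs dfs -cat /mydir/* | sed -e 's/ *, */,/g' -e 's/^ *//' -e 's/ *$//' | hdfs dfs -put -f - /mydir_trimmed/part-00000
sqoop export \
--connect jdbc:mysql://127.0.0.1/mydb \
--username root \
--table emp \
--m 1 \
--export-dir /mydir_trimmed \
--input-fields-terminated-by ',' \
--input-lines-terminated-by '\n'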

Sqoop Export to Oracle-Caused by: java.lang.RuntimeException: Can't parse input data: '\N'

Sqoop export to Oracle fails with the exception below:
Caused by: java.lang.RuntimeException: Can't parse input data: '\N'
I have null columns in HDFS.
Below is the command I used:
sqoop export --connect jdbc:oracle:thin:#XXXXXXXXXXXXX \
--username XX \
--password XXXXX \
--table XXXXXXXXXXXXXXXXXX\
--export-dir '/datalake/qa/etl/XXXXXXX/XXXXXXXXXXXX' --input-fields-terminated-by ',' --input-null-string '\\N' --input-null-non-string '\\N'
I also tried with --input-null-string "\\\\N" --input-null-non-string "\\\\N", still no luck.
The issue is caused by a NUL character in the text.
For an Oracle database we don't need to specify --input-null-string for NULL values, which I had tried in different ways thinking that was the cause of the issue.
I checked the log files of the map task that failed and found the NUL character in the string that was causing the issue.
I resolved the issue by applying regexp_replace in the Hive query before exporting to the HDFS directory:
regexp_replace(regexp_replace(rtrim(A.chat_agent_text),',','.'),'\0','.')
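Applied in context, the staging query would look something like the sketch below; the target directory and the source table name are illustrative, and ROW FORMAT on INSERT OVERWRITE DIRECTORY assumes Hive 0.11 or later:
insert overwrite directory '/datalake/qa/etl/stage_dir'
row format delimited fields terminated by ','
select regexp_replace(regexp_replace(rtrim(A.chat_agent_text), ',', '.'), '\0', '.')
from source_table A;
The inner replace presumably turns embedded commas into dots so that they do not collide with the ',' field delimiter used by the export.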
The issue is resolved now and the sqoop export is successful.
Observation:
"Can't parse input data: '\N'" is not always related to NULL values in a column.

Sqoop creating insert statements containing multiple records

We are trying to load data with Sqoop into Netezza, and we are facing the following issue:
java.io.IOException: org.netezza.error.NzSQLException: ERROR:
An example input dataset is shown below:
1,2,3
1,3,4
The sqoop command is shown below:
sqoop export --table <tablename> --export-dir <path>
--input-fields-terminated-by '\t' --input-lines-terminated-by '\n' --connect
'jdbc:netezza://<host>/<db>' --driver org.netezza.Driver
--username <username> --password <passwrd>
Sqoop is creating an insert statement in the following way:
insert into (c1,c2,c3) values (1,2,3),(1,3,4).
We are able to load one record, but when we try to load multiple records we get the error mentioned above.
Your help is highly appreciated.
Setting sqoop.export.records.per.statement=1 will definitely help, but it will make the export process extremely slow if your export record count is very large, say 5 million.
To solve this you need to add the following:
1.) A properties file, sqoop.properties; it must contain the property jdbc.transaction.isolation=TRANSACTION_READ_UNCOMMITTED, which avoids deadlocks during exports. (A minimal version of the file is sketched after this list.)
You also need to specify this in the export command:
--connection-param-file /path/to/sqoop.properties
2.) Also set sqoop.export.records.per.statement=100; this will increase the speed of the export.
3.) Add --batch to use batch mode for the underlying statement execution.
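For reference, the sqoop.properties file from step 1 contains just the single property named above:
# /path/to/sqoop.properties
jdbc.transaction.isolation=TRANSACTION_READ_UNCOMMITTED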
So your final export will look like this:
sqoop export -D sqoop.export.records.per.statement=100 --table <tablename> --export-dir <path>
--input-fields-terminated-by '\t' --input-lines-terminated-by '\n' --connect
'jdbc:netezza://<host>/<db>' --driver org.netezza.Driver
--username <username> --password <passwrd>
--connection-param-file /path/to/sqoop.properties
--batch
Hope this will help.
You can customise the number of rows used in one insert statement with the property "sqoop.export.records.per.statement". For example, for Netezza you will need to set it to 1:
sqoop export -Dsqoop.export.records.per.statement=1 --connect ...
I would also recommend taking a look at the Apache Sqoop Cookbook, where this and many other tips are described.

Encoding columns in Hive

I'm importing a table from mysql to hive using Sqoop. Some columns are latin1 encoded. Is there any way to do either:
Set the encoding for those columns as latin1 in Hive. OR
Convert the columns to utf-8 while importing with sqoop?
The --default-character-set option sets the character set for the whole database, not for specific columns. I was not able to find a Sqoop parameter that would convert table columns to UTF-8 on the fly; rather, the column types are expected to be fixed up front.
$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
--direct -- --default-character-set=latin1
I believe you would need to convert the latin1 columns to UTF-8 first in your MySQL, and then you can import with Sqoop. You can use the following script to convert all the columns to UTF-8, which I found here:
mysql --database=dbname -B -N -e "SHOW TABLES" | \
awk '{print "ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;"}' | \
mysql --database=dbname &
It turned out the problem was unrelated. The column works fine regardless of encoding... but the table's schema had changed in MySQL. I assumed that since I'm passing in the overwrite flag, Sqoop would remake the table in Hive every time. Not so! The schema changes in MySQL didn't get transferred to Hive, so the data in the md5 column was actually data from a different column.
The "fix" we settled on was: before every sqoop import, check for schema changes, and if there was a change, drop the table and re-import. This forces a schema update in Hive.
Edit: my original sqoop command was something like:
sqoop import --connect jdbc:mysql://HOST:PORT/DB --username USERNAME --password PASSWORD --table uploads --hive-table uploads --hive-import --hive-overwrite --split-by id --num-mappers 8 --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'
But now I manually issue a drop table uploads in Hive first if the schema changes; a rough sketch of that check follows.
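A minimal sketch, assuming the mysql and hive CLIs are available and that /tmp/uploads.schema holds the schema seen at the last import (the file location and the helper script itself are hypothetical):
#!/bin/bash
# Hypothetical pre-import check: drop the Hive table only when the MySQL schema changed.
current_schema=$(mysql --host=HOST --port=PORT --user=USERNAME --password=PASSWORD -N -e "DESCRIBE DB.uploads")
if ! diff -q <(echo "$current_schema") /tmp/uploads.schema >/dev/null 2>&1; then
    hive -e "drop table if exists uploads"
    echo "$current_schema" > /tmp/uploads.schema
fi
# then run the sqoop import command shown above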
