How to pass a string into a split-by condition in Sqoop

I have a sqoop query like this.
sqoop import -Ddb2.jcc.sslConnection=true --connect jdbc:db2://192.1.1.2:6060/DB2M --username ${username} --password $password --query "
SELECT ACCOUNT_DATE, DIV_VALUE FROM ${qualifier}.DTL where year = '${year}' AND SUBSTR(LOSS_TRAN,1,1) NOT IN ('1','9') and \$CONDITIONS " -m 500 --split-by "DIV_VALUE" --fields-terminated-by '|' --target-dir s3://test${env}/${year}
The --split-by option is throwing an exception; I am not able to pass a string column to split-by. Any help would be appreciated.

By default, --split-by expects an integer column. If you want to split on a string column, you need to enable the property
-Dorg.apache.sqoop.splitter.allow_text_splitter=true in your Sqoop command. Like all generic Hadoop -D options, it must come immediately after sqoop import, before the tool-specific arguments.
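Applied to the command from the question, the placement looks like this (a sketch that only builds and inspects the command string rather than running Sqoop; the connection details are the asker's):

```shell
# The generic -D options must come immediately after "sqoop import",
# before any tool-specific arguments such as --connect or --split-by.
sqoop_cmd='sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true -Ddb2.jcc.sslConnection=true --connect jdbc:db2://192.1.1.2:6060/DB2M --split-by DIV_VALUE'

# Sanity-check the placement without running Sqoop:
echo "$sqoop_cmd" | grep -q '^sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true' && echo "property placed correctly"
# prints "property placed correctly"
```

Note that text splitting can produce uneven splits depending on how the DIV_VALUE values are distributed, which is why Sqoop gates it behind this property.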

Related

How to pass column names having spaces to sqoop --map-column-java

I have to import data using Sqoop. My source column names contain spaces, so when I add them to the --map-column-java parameter I get the error below.
Sample Sqoop import:
sqoop import --connect jdbc-con --username "user1" --query "select * from table where \$CONDITIONS" --target-dir /target/path/ -m 1 --map-column-java data col1=String, data col2=String, data col3=String --as-avrodatafile
Column names:
data col1,
data col2,
data col3
Error:
19/03/07 07:31:55 DEBUG sqoop.Sqoop: Malformed mapping. Column mapping should be the form key=value[,key=value]*
java.lang.IllegalArgumentException: Malformed mapping. Column mapping should be the form key=value[,key=value]*
at org.apache.sqoop.SqoopOptions.parseColumnMapping(SqoopOptions.java:1355)
at org.apache.sqoop.SqoopOptions.setMapColumnJava(SqoopOptions.java:1375)
at org.apache.sqoop.tool.BaseSqoopTool.applyCodeGenOptions(BaseSqoopTool.java:1363)
at org.apache.sqoop.tool.ImportTool.applyOptions(ImportTool.java:1011)
at org.apache.sqoop.tool.SqoopTool.parseArguments(SqoopTool.java:435)
at org.apache.sqoop.Sqoop.run(Sqoop.java:135)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
I was able to resolve these issues as follows:
1. Spaces issue: quote the entire mapping so the shell passes it to Sqoop as a single argument:
sqoop import --connect jdbc-con --username "user1" --query "select * from table where \$CONDITIONS" --target-dir /target/path/ -m 1 --map-column-java "data col1=String, data col2=String, data col3=String" --as-avrodatafile
2. ERROR tool.ImportTool: Import failed: Cannot convert SQL type 2005:
SQL type 2005 is java.sql.Types.CLOB. Three source columns had this type (backing nvarchar data); adding them to --map-column-java as String resolved this issue.
3. org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["null","long"]: 1****
This was caused by using * in the select query, so I modified the Sqoop query to list the columns explicitly:
sqoop import --connect jdbc-con --username "user1" --query "select [col1,data col2,data col3] from table where \$CONDITIONS" --target-dir /target/path/ -m 1 --map-column-java "data col1=String, data col2=String, data col3=String" --as-avrodatafile
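The quoting in fix 1 matters because, without the quotes, the shell splits the mapping on the spaces inside the column names, and Sqoop receives several malformed fragments instead of one key=value list. A minimal sketch of the difference, using the column names from the question:

```shell
# Simulate the argument list the shell would hand to Sqoop.

# Unquoted: the spaces split the mapping into three separate arguments.
set -- --map-column-java data col1=String,data col2=String
echo "unquoted: $(($# - 1)) value arguments"   # unquoted: 3 value arguments

# Quoted: the whole mapping arrives as one argument, as Sqoop expects.
set -- --map-column-java "data col1=String,data col2=String"
echo "quoted: $(($# - 1)) value arguments"     # quoted: 1 value arguments
```

With three fragments, Sqoop's parser sees tokens like `col1=String,data` that do not match the key=value[,key=value]* grammar, which is exactly the "Malformed mapping" exception above.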
Alternatively, you can use the following method; I have used it and it works. Here I cast the columns to String so that timestamps are not converted to int. Sqoop expects a comma-separated list of mappings in the form 'column-name=new-type'. The placeholders below mean:
address = localhost or the server IP address
port = your database port number
columns-name = the timestamp/datetime column(s) you want imported as String
database-name = your database name
user-name = your database user name
database-password = your password
A template to understand the command:
sqoop import --map-column-java "columns-name=String" --connect jdbc:postgresql://address:port/database-name --username user-name --password database-password --query "select * from demo where \$CONDITIONS;" -m 1 --target-dir /jdbc/star --as-parquetfile --enclosed-by '\"'
Example for a single column:
sqoop import --map-column-java "date_of_birth=String" --connect jdbc:postgresql://192.168.0.1:1928/alpha --username postgres --password mysecretpass --query "select * from demo where \$CONDITIONS;" -m 1 --target-dir /jdbc/star --as-parquetfile --enclosed-by '\"'
Example for multiple columns:
sqoop import --map-column-java "date_of_birth=String,create_date=String" --connect jdbc:postgresql://192.168.0.1:1928/alpha --username postgres --password mysecretpass --query "select * from demo where \$CONDITIONS;" -m 1 --target-dir /jdbc/star --as-parquetfile --enclosed-by '\"'

Importing data from Oracle by Sqoop to Hive with date format in Oracle

I want to import a table with date columns from an Oracle database into Hive using Sqoop. I don't understand why the imported fields in Hive are string and not timestamp (as in Oracle). I prefer not to use --map-column-hive because I want this to happen automatically for every date column in every table.
This is my Sqoop request :
sqoop import -D mapDateToTimestamp=true --direct --connect
jdbc:oracle:thin:@//$connection --username $username --password
$password -m 7 --table $table1 --hive-import --hive-overwrite
--hive-database $database1 --hive-table $table2 --where "$date>to_date('$dd/mm/yyyy hh:mm:ss','dd/mm/yyyy hh24:mi:ss')"
--null-string '\N' --null-non-string '\N' --target-dir $targetdirectory -- --schema $schema
I tried it with both mapDateToTimestamp=true and oracle.jdbc.mapDateToTimestamp=true; the data is imported, but the date columns still come through as string.
Do you have a solution? Suggestions? Advice?
Thanks a lot!

How to pass date into shell script for a sqoop command dynamically?

I'm working on a Sqoop import with the following command:
#!/bin/bash
while IFS=":" read -r server dbname table; do
sqoop eval --connect jdbc:mysql://$server/$dbname --username root --password cloudera --table mydata --hive-import --hive-table dynpart --check-column id --last-value $(hive -e "select max(id) from dynpart"); --hive-partition-key 'thisday' --hive-partition-value '01-01-2016'
done<tables.txt
I'm partitioning by day.
Hive table:
create table dynpart(id int, name char(30), city char(30))
partitioned by(thisday char(10))
row format delimited
fields terminated by ','
stored as textfile
location '/hive/mytables'
tblproperties("comment"="partition column: thisday structure is dd-mm-yyyy");
But I don't want to hard-code the partition value, because I want to create a Sqoop job and run it every day. In the script, how can I pass the date value to the Sqoop command dynamically (format: dd/mm/yyyy) instead of giving it directly?
Any help is appreciated.
You can use the shell command date to get it (Ubuntu 14.04):
$ date +%d/%m/%Y
22/03/2017
You can try the code below (note %Y for a four-digit year, to match the dd-mm-yyyy comment on the partition column, and no stray semicolon after the $(hive ...) substitution, which would otherwise terminate the sqoop command early):
#!/bin/bash
DATE=$(date +"%d-%m-%Y")
while IFS=":" read -r server dbname table; do
sqoop eval --connect jdbc:mysql://$server/$dbname --username root --password cloudera --table mydata --hive-import --hive-table dynpart --check-column id --last-value $(hive -e "select max(id) from dynpart") --hive-partition-key 'thisday' --hive-partition-value "$DATE"
done<tables.txt
Hope this helps.
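The %Y-versus-%y distinction is easy to check in isolation. A quick sketch with a fixed input date (GNU date's -d flag, used here only so the output is reproducible):

```shell
# %Y gives the four-digit year required by the dd-mm-yyyy partition format;
# %y would give "16" and produce partition values like 01-01-16.
thisday=$(date -d '2016-01-01' +%d-%m-%Y)
echo "$thisday"   # 01-01-2016
```

In the real script you would drop -d and take the current date, e.g. DATE=$(date +%d-%m-%Y).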

Passing System date to Sqoop

I have a Hive table, say student(id, name, date), and in MySQL a table named student(id, name). Using Sqoop I'm importing data incrementally, and I want the system date to be added while importing. How can I achieve that?
Sqoop Query :
sqoop import --connect jdbc:mysql:dbName --username userName --password pass --m mapperNo --query 'select id, name from syudent WHERE $CONDITIONS' --target-dir outputPath --append --check-column id --incremental append --last-value last_Value
I'm writing this entire script inside a shell script, taking the current date with a shell command, and I want to pass it into the Sqoop query so that the import adds the current date.
You can pass it through a variable, for example:
cur_date=$(date +%d/%m/%y)
sqoop import --connect jdbc:mysql:dbName --username userName --password pass --m mapperNo --query "select id, name, '$cur_date' from syudent WHERE \$CONDITIONS" --target-dir outputPath --append --check-column id --incremental append --last-value last_Value
let me know if this works.
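One caveat with this approach: because the query must now be double-quoted so that $cur_date expands, $CONDITIONS has to be escaped as \$CONDITIONS, or the shell expands the (unset) variable to an empty string before Sqoop ever sees it. A sketch of what the shell actually hands to Sqoop in each case:

```shell
cur_date=$(date +%d/%m/%y)

# Unescaped: the shell substitutes the unset CONDITIONS variable with nothing,
# leaving a dangling WHERE clause.
echo "select id, name, '$cur_date' from syudent WHERE $CONDITIONS"

# Escaped: the literal $CONDITIONS token survives for Sqoop to replace
# with its per-mapper split predicate.
echo "select id, name, '$cur_date' from syudent WHERE \$CONDITIONS"
```

Only the second form gives Sqoop the placeholder it requires for free-form query imports.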

How to specify multiple conditions in sqoop?

Sqoop version: 1.4.6.2.3.4.0-3485
I have been trying to import data using sqoop using the following command:
sqoop import -libjars /usr/local/bfm/lib/java/jConnect-6/6.0.0/jconn3-6.0.0.jar --connect jdbc:sybase:db --username user --password 'pwd' --driver com.sybase.jdbc3.jdbc.SybDriver --query 'SELECT a.* from table1 a,table2 b where b.run_group=a.run_group and a.date<"7/22/2016" AND $CONDITIONS' --target-dir /user/user/a/ --verbose --hive-import --hive-table default.temp_a --split-by id
I get the following error:
Invalid column name '7/22/2016'
I have tried enclosing the query in double quotes, but then it says:
CONDITIONS: Undefined variable.
I tried several combinations of single/double quotes, escaping $CONDITIONS, and using a --where switch as well.
PS: the conditions are non-numeric. (It works for cases like where x<10, but not when the value is a string or date.)
In your command, --split-by id should be --split-by a.id. I would use a join instead of the extra where condition, and I would convert the date column to VARCHAR using the Sybase-specific convert function:
sqoop import -libjars /usr/local/bfm/lib/java/jConnect-6/6.0.0/jconn3-6.0.0.jar \
--connect jdbc:sybase:db \
--username user \
--password 'pwd' \
--driver com.sybase.jdbc3.jdbc.SybDriver \
--query "SELECT a.* from table1 a join table2 b on a.id=b.id where a.run_group=b.run_group and convert(varchar, a.date, 101) < '7/22/2016' AND \$CONDITIONS" \
--target-dir /user/user/a/ \
--verbose \
--hive-import \
--hive-table default.temp_a \
--split-by a.id
A workaround that can be used: --options-file
Copy the query into your options file and use the switch. The options file might look like:
--query
select * \
from table t1 \
where t1.field="text" \
and t1.value="value" \
and $CONDITIONS
Note: I'm not sure whether it was a version issue, but --query directly on the command line just refused to work with $CONDITIONS. (Yes, I tried escaping it with \ and several other quoting combinations.)
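All of these quoting combinations reduce to one rule: inside single quotes the shell leaves $CONDITIONS alone, while inside double quotes it needs a backslash. A minimal sketch (the table and column names are made up):

```shell
# Both assignments deliver the same literal token for Sqoop to substitute:
single='SELECT a.* FROM t WHERE $CONDITIONS'
double="SELECT a.* FROM t WHERE \$CONDITIONS"
echo "$single"
echo "$double"
# both lines print: SELECT a.* FROM t WHERE $CONDITIONS
```

The original "Invalid column name '7/22/2016'" error is a separate issue: inside a single-quoted shell string, the inner double quotes reach Sybase, which (with quoted_identifier enabled) treats a double-quoted token as a column name. Writing the literal as '7/22/2016' in the SQL, as the answer above does via a double-quoted shell string, avoids that.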