Suppose I have a Hive table student(id, name, date) and a MySQL table student(id, name). I'm importing data incrementally with Sqoop and want the system date to be added during the import. How can I achieve that?
Sqoop query:
sqoop import --connect jdbc:mysql:dbName --username userName --password pass --m mapperNo --query 'select id, name from student WHERE $CONDITIONS' --target-dir outputPath --append --check-column id --incremental append --last-value last_Value
I'm writing this entire command inside a shell script, taking the current date with a shell command, and I want to pass it into the Sqoop query so that the current date is added during the import.
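You can pass it through a shell variable, for example:
cur_date=$(date +%d/%m/%y)
sqoop import --connect jdbc:mysql:dbName --username userName --password pass --m mapperNo --query "select id, name, '$cur_date' from student WHERE \$CONDITIONS" --target-dir outputPath --append --check-column id --incremental append --last-value last_Value
Note that inside double quotes $CONDITIONS must be written as \$CONDITIONS so that the shell passes it to Sqoop literally instead of expanding it. Let me know if this works.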
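To see what the shell actually hands to Sqoop, here is a small sketch that only builds and prints the query string, so it runs without a cluster. The load_date alias and the ISO date format are my own assumptions, not part of the original command.

```shell
#!/bin/sh
# Assumption: ISO date format and the load_date alias are illustrative choices.
cur_date=$(date +%Y-%m-%d)

# Unescaped inside double quotes: the shell expands $CONDITIONS itself.
# It is normally unset, so the placeholder silently disappears.
unset CONDITIONS
broken="select id, name from student WHERE $CONDITIONS"

# Escaped: the literal text $CONDITIONS survives, while $cur_date is expanded.
query="select id, name, '${cur_date}' as load_date from student WHERE \$CONDITIONS"

printf 'broken: %s\n' "$broken"
printf 'fixed:  %s\n' "$query"
```

The second string is the one to pass to --query.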
I have to import data using Sqoop. My source column names contain spaces, so when I add them to the --map-column-java parameter I get an error.
Sample Sqoop import:
sqoop import --connect jdbc-con --username "user1" --query "select * from table where \$CONDITIONS" --target-dir /target/path/ -m 1 --map-column-java data col1=String, data col2=String, data col3=String --as-avrodatafile
Column names:
data col1,
data col2,
data col3
Error:
19/03/07 07:31:55 DEBUG sqoop.Sqoop: Malformed mapping. Column mapping should be the form key=value[,key=value]*
java.lang.IllegalArgumentException: Malformed mapping. Column mapping should be the form key=value[,key=value]*
at org.apache.sqoop.SqoopOptions.parseColumnMapping(SqoopOptions.java:1355)
at org.apache.sqoop.SqoopOptions.setMapColumnJava(SqoopOptions.java:1375)
at org.apache.sqoop.tool.BaseSqoopTool.applyCodeGenOptions(BaseSqoopTool.java:1363)
at org.apache.sqoop.tool.ImportTool.applyOptions(ImportTool.java:1011)
at org.apache.sqoop.tool.SqoopTool.parseArguments(SqoopTool.java:435)
at org.apache.sqoop.Sqoop.run(Sqoop.java:135)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Malformed mapping. Column mapping should be the form key=value[,key=value]*
I was able to resolve this issue:
1. Spaces issue:
sqoop import --connect jdbc-con --username "user1" --query "select * from table where \$CONDITIONS" --target-dir /target/path/ -m 1 --map-column-java "data col1=String, data col2=String, data col3=String" --as-avrodatafile
2. ERROR tool.ImportTool: Import failed: Cannot convert SQL type 2005:
Three columns in the source are of SQL type 2005 (CLOB in java.sql.Types) and nvarchar; adding them to --map-column-java resolved this issue.
3. org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["null","long"]: 1****
This is caused by using * in the select query, so I modified the Sqoop query to list the columns explicitly:
sqoop import --connect jdbc-con --username "user1" --query "select [data col1], [data col2], [data col3] from table where \$CONDITIONS" --target-dir /target/path/ -m 1 --map-column-java "data col1=String, data col2=String, data col3=String" --as-avrodatafile
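The reason quoting fixes the "Malformed mapping" error can be checked in plain shell. This is only a sketch: count_args is a hypothetical helper that reports how many arguments it received, standing in for what Sqoop sees on its command line.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints the number of arguments passed to it.
count_args() {
  echo "$#"
}

# Unquoted: the shell splits the mapping on spaces into six separate words,
# so only the first word would be taken as the value of --map-column-java.
unquoted=$(count_args data col1=String, data col2=String, data col3=String)

# Quoted: the whole mapping reaches the program as a single argument.
quoted=$(count_args "data col1=String, data col2=String, data col3=String")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted argument"
```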
Alternatively, you can use the following method.
I have used it and it works.
Here I am mapping the columns to String so that the timestamp is not converted to an int.
Sqoop expects a comma-separated list of mappings in the form 'name of column'='new type'.
Keep the following placeholders in mind; they will help you build the command correctly:
address = localhost or the server IP address
port = your database port number
columns-name = the name of the timestamp or datetime column in your database
database-name = your database name
user-name = your database user name
database-password = your password
Demo to understand the command properly:
sqoop import --map-column-java "columns-name=String" --connect jdbc:postgresql://address:port/database-name --username user-name --password database-password --query "select * from demo where \$CONDITIONS;" -m 1 --target-dir /jdbc/star --as-parquetfile --enclosed-by '\"'
Demo for a single column:
sqoop import --map-column-java "date_of_birth=String" --connect jdbc:postgresql://192.168.0.1:1928/alpha --username postgres --password mysecretpass --query "select * from demo where \$CONDITIONS;" -m 1 --target-dir /jdbc/star --as-parquetfile --enclosed-by '\"'
Demo for multiple columns:
sqoop import --map-column-java "date_of_birth=String,create_date=String" --connect jdbc:postgresql://192.168.0.1:1928/alpha --username postgres --password mysecretpass --query "select * from demo where \$CONDITIONS;" -m 1 --target-dir /jdbc/star --as-parquetfile --enclosed-by '\"'
I have a sqoop query like this.
sqoop import -Ddb2.jcc.sslConnection=true --connect jdbc:db2://192.1.1.2:6060/DB2M --username ${username} --password $password --query "
SELECT ACCOUNT_DATE,DIV_VALUE,from ${qualifier}.DTL where year = '${year}' AND SUBSTR(LOSS_TRAN,1,1) NOT IN ('1','9') and \$CONDITIONS " -m 500 --split-by "DIV_VALUE" --fields-terminated-by '|' --target-dir s3://test${env}/${year}
The --split-by option is throwing an exception. I am not able to pass a string column to --split-by. Any help would be appreciated.
By default, --split-by expects an integer column. If you want to split on a string column, you need to enable this property in your Sqoop command:
-Dorg.apache.sqoop.splitter.allow_text_splitter=true
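A sketch of where the property goes, assuming placeholder connection details (the host, database, schema, mapper count, and target directory below are hypothetical). Generic Hadoop arguments such as -D have to come right after the tool name, before the Sqoop-specific options. The command is only printed here so the snippet can be run anywhere.

```shell
#!/bin/sh
# Placeholder values (hypothetical): replace them with your own connection details.
# The password option is left out; supply it as in your original command.
set -- import \
  -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect "jdbc:db2://db-host:50000/SAMPLEDB" \
  --username "db_user" \
  --query 'SELECT ACCOUNT_DATE, DIV_VALUE FROM MYSCHEMA.DTL WHERE $CONDITIONS' \
  --split-by DIV_VALUE \
  -m 4 \
  --target-dir /tmp/dtl_import

# Print the assembled command instead of executing it.
echo "sqoop $*"
```

To run it for real, replace the echo line with sqoop "$@".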
I want to import a table with date columns from an Oracle database into Hive using Sqoop. I don't understand why the imported fields in Hive are strings and not timestamps (as in Oracle). I would prefer not to use --map-column-hive because I want this to happen automatically for every date column in every table.
This is my Sqoop command:
sqoop import -D mapDateToTimestamp=true --direct --connect \
jdbc:oracle:thin:@//$connection --username $username -password \
$password -m 7 --table $table1 --hive-import --hive-overwrite \
--hive-database $database1 --hive-table $table2 --where "$date>to_date('$dd/mm/yyyy hh:mm:ss','dd/mm/yyyy hh24:mi:ss')" \
--null-string '\N' --null-non-string '\N' --target-dir $targetdirectory -- --schema $schema
I tried both mapDateToTimestamp=true and oracle.jdbc.mapDateToTimestamp=true. The data is imported, but the date columns are still strings.
Do you have a solution or any suggestions?
Thanks a lot !
I'm working on a Sqoop import with the following command:
#!/bin/bash
while IFS=":" read -r server dbname table; do
sqoop eval --connect jdbc:mysql://$server/$dbname --username root --password cloudera --table mydata --hive-import --hive-table dynpart --check-column id --last-value $(hive -e "select max(id) from dynpart"); --hive-partition-key 'thisday' --hive-partition-value '01-01-2016'
done<tables.txt
I'm creating one partition per day.
Hive table:
create table dynpart(id int, name char(30), city char(30))
partitioned by(thisday char(10))
row format delimited
fields terminated by ','
stored as textfile
location '/hive/mytables'
tblproperties("comment"="partition column: thisday structure is dd-mm-yyyy");
But I don't want to hard-code the partition value, because I want to create a Sqoop job and run it every day. In the script, how can I pass the date value (format: dd/mm/yyyy) to the sqoop command dynamically instead?
Any help is appreciated.
You can use the shell command date to get it (Ubuntu 14.04):
$ date +%d/%m/%Y
22/03/2017
You can try the code below:
#!/bin/bash
# Partition value in dd-mm-yyyy format, matching the table comment (%Y is the four-digit year).
DATE=$(date +"%d-%m-%Y")
while IFS=":" read -r server dbname table; do
# Import into today's partition; the last imported id is read from Hive.
sqoop import --connect jdbc:mysql://$server/$dbname --username root --password cloudera --table mydata --hive-import --hive-table dynpart --check-column id --last-value "$(hive -e "select max(id) from dynpart")" --hive-partition-key 'thisday' --hive-partition-value "$DATE"
done<tables.txt
Hope this helps.
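The table comment describes the partition column as dd-mm-yyyy, which corresponds to the date format %d-%m-%Y. A quick check of the two year specifiers (the variable names are just for illustration):

```shell
#!/bin/sh
# %Y is the four-digit year, %y only the last two digits.
partition_value=$(date +%d-%m-%Y)
short_year=$(date +%d-%m-%y)

# ${#var} is the length of the string; a char(10) partition column expects 10 characters.
echo "four-digit year: $partition_value (${#partition_value} characters)"
echo "two-digit year:  $short_year (${#short_year} characters)"
```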
Consider a table departments with the following data:
ID -1,2,3,8000
Name- A,B,C,D
I imported the data into HDFS using Sqoop.
I added 2 new rows with IDs 4 and 5 in MySQL.
I performed an incremental import with last value 3 and mode=append.
The imported data has two rows for ID 8000, because the condition used is department_id > 3.
How can I tweak the command below to make sure duplicate rows are not created?
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--table departments \
--target-dir /user/cloudera/dep1 \
--append \
--check-column "department_id" \
--incremental append \
--last-value 3
You cannot tweak this command to do that.
--incremental append only appends rows whose --check-column value is greater than --last-value.
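To see why ID 8000 comes in a second time, the append-mode filter can be mimicked in plain shell. This is only an illustration of the comparison, not how Sqoop is implemented; the ID list is the one from the question.

```shell
#!/bin/sh
# IDs in the source table after rows 4 and 5 were added.
ids="1 2 3 8000 4 5"
last_value=3

# Keep every id that is greater than the last value, as append mode does.
selected=""
for id in $ids; do
  if [ "$id" -gt "$last_value" ]; then
    selected="$selected $id"
  fi
done
selected="${selected# }"

echo "$selected"   # prints: 8000 4 5
```

ID 8000 passes the filter even though it was already imported in the first run.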
For your use case you should use --incremental lastmodified.
In that mode, --check-column must be of a date, time, datetime, or timestamp data type.
Sqoop will then fetch all records that were added or updated after --last-value.
Sample command:
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--table departments \
--target-dir /user/cloudera/dep1 \
--incremental lastmodified \
--check-column last_update_date \
--last-value "2015-10-20 06:00:01"
All the records added or updated after "2015-10-20 06:00:01" will be imported.
Check the Sqoop documentation for more details.
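One addition that may help with the duplicate rows: as far as I know, when the target directory already exists, lastmodified mode has to be told how to combine old and new rows, and --merge-key keeps a single row per key. The sketch below assumes department_id is the key and that a last_update_date column exists; neither is confirmed by the question. It only prints the command.

```shell
#!/bin/sh
# Assumptions: department_id is the key column and last_update_date exists in the table.
# The password option is left out; supply it as in the sample command.
set -- import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba \
  --table departments \
  --target-dir /user/cloudera/dep1 \
  --incremental lastmodified \
  --check-column last_update_date \
  --last-value "2015-10-20 06:00:01" \
  --merge-key department_id

# Print the assembled command instead of executing it.
echo "sqoop $*"
```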