query with special characters in sqoop - sqoop

Can we use a query in a Sqoop options file? I used a query with special characters, and when I ran it I got the error: incorrect syntax near "\". Should I use an escape character in the properties file?
In the options file I have put the query, which is then used by the sqoop import command.
Properties file:
--query "select top 10 source_system,company_code,gl_document,***************negative_posting_flag, to_number(to_varchar(to_date(create_tmstmp),'yyyymm')) as part_date from c_fin_a.gl_transaction_data where to_number(to_varchar(to_date(create_tmstmp),'yyyymm'))=201602 and \$CONDITIONS"
Sqoop import command:
sudo sqoop import \
--options-file /home/emaarae/sqoop_shell/sqoop_hdfs.properties \
--append \
--null-string '' \
--null-non-string '' \
--fields-terminated-by '\001' \
--lines-terminated-by '\n' \
--m 15
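One likely explanation, offered as an assumption rather than a confirmed fix: an options file is read by Sqoop itself, not by the shell, so the backslash in \$CONDITIONS is never consumed and reaches the database literally, which matches the incorrect syntax near "\" error. A minimal sketch of the corrected options-file line, with the escape removed:
--query "select top 10 source_system,company_code,gl_document,***************negative_posting_flag, to_number(to_varchar(to_date(create_tmstmp),'yyyymm')) as part_date from c_fin_a.gl_transaction_data where to_number(to_varchar(to_date(create_tmstmp),'yyyymm'))=201602 and $CONDITIONS"
The \$ escape is only needed when the query is passed on the command line inside double quotes, where the shell would otherwise expand $CONDITIONS.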

Related

Sqoop - FileAlreadyExists exception

I need some help with sqoop.
First of all, I'm sorry, my English isn't very good.
Using the following command:
sqoop import -D mapreduce.output.fileoutputformat.compress=false \
--num-mappers 1 \
--connection-manager "com.quest.oraoop.OraOopConnManager" \
--connect "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver)(PORT=1534)))(CONNECT_DATA=(SERVICE_NAME=myservice)))" \
--username "rodrigo" --password pwd \
--query "SELECT column1, column2 from myTable where \$CONDITIONS" \
--null-string '' --null-non-string '' --fields-terminated-by '|' \
--lines-terminated-by '\n' --as-textfile --target-dir /data/rodrigo/myTable \
--hive-import --hive-partition-key yearmonthday --hive-partition-value '20180101' --hive-overwrite --verbose -P --m 1 --hive-table myTable
My table is already created, because I must submit a request to have a table created in my Hive database, so I can't create it dynamically inside the sqoop command.
I have permission to create the directory in HDFS.
When I remove the directory, sqoop logs an error saying that I have no create-table permissions, and when I create the directory beforehand, it returns a FileAlreadyExistsException.
What can I do to solve that?
Thanks from Brazil.
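A hedged suggestion, under the assumption that the pre-created directory is only Sqoop's staging area for the Hive load (the permission error on the missing directory would still need to be resolved separately): let Sqoop clear the directory itself with --delete-target-dir instead of creating it by hand. A sketch with that one change, where "..." elides the full connect descriptor shown above:
# Sketch only: --delete-target-dir makes Sqoop remove the staging
# directory before the import, avoiding FileAlreadyExistsException.
sqoop import -D mapreduce.output.fileoutputformat.compress=false --num-mappers 1 \
--connection-manager "com.quest.oraoop.OraOopConnManager" \
--connect "jdbc:oracle:thin:@(DESCRIPTION=...)" --username "rodrigo" -P \
--query "SELECT column1, column2 from myTable where \$CONDITIONS" \
--null-string '' --null-non-string '' --fields-terminated-by '|' \
--lines-terminated-by '\n' --as-textfile --target-dir /data/rodrigo/myTable \
--delete-target-dir \
--hive-import --hive-partition-key yearmonthday --hive-partition-value '20180101' \
--hive-overwrite --verbose --m 1 --hive-table myTable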

Sqoop export having duplicate entries into a table without a primary key

I have a table with columns department_id, department_name, LastModifieddate.
When I run the command below:
sqoop export \
--connect "jdbc:mysql://ip-172-31-13-154:3306/sqoopex" \
--username sqoopuser \
--password NHkkP876rp \
--table dep_prasad \
--input-fields-terminated-by '|' \
--input-lines-terminated-by '\n' \
--export-dir /user/venkateswarlujvs2821/dep_prasad/ \
--num-mappers 2 \
--outdir /user/venkateswarlujvs2821/dep_prasad
It works fine and inserts the records.
When I modify the file in HDFS to add some more records and then try to export again, it inserts duplicate entries into MySQL.
I am using the following sqoop command the second time:
sqoop export \
--connect "jdbc:mysql://ip-172-31-13-154:3306/sqoopex" \
--username sqoopuser \
--password NHkkP876rp \
--table dep_prasad \
--input-fields-terminated-by '|' \
--input-lines-terminated-by '\n' \
--update-key department_id \
--update-mode allowinsert \
--export-dir /user/venkateswarlujvs2821/dep_prasad/ \
--num-mappers 2 \
--outdir /user/venkateswarlujvs2821/dep_prasad
Note: my table does NOT have a primary key.
I want to update only the new records; how can I do that?
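The duplicates follow from how the flags work on MySQL: --update-mode allowinsert is implemented with INSERT ... ON DUPLICATE KEY UPDATE, and that statement can only recognise an existing row through a PRIMARY KEY or UNIQUE index. Without either, --update-key department_id has nothing to match against, so every exported row is inserted again. Assuming department_id really does identify a department uniquely, a sketch of a fix is to add a unique index before re-running the export:
-- Assumption: department_id uniquely identifies each department.
-- A UNIQUE index gives ON DUPLICATE KEY UPDATE (which backs
-- --update-mode allowinsert on MySQL) a key to match, so re-exported
-- rows are updated instead of duplicated.
ALTER TABLE dep_prasad ADD UNIQUE KEY uq_department_id (department_id);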

sqoop issue while importing data from SAP HANA

We are currently moving data from SAP HANA to Hadoop using sqoop.
SAP HANA table and column names use the '/' character. Our regular sqoop command works, but it fails when I use "split-by". Can anyone please help?
Code:
/usr/hdp/sqoop/bin/sqoop import \
--connect "jdbc:sap://***-***.**.*****.com:30015" \
--username DFIT_SUPP_USR --password **** \
--driver com.sap.db.jdbc.Driver \
--query "select '\"/BA1/C55LGENT/\"' FROM \"_SYS_BIC\".\"sap.fs.frdp.300.RDL/BV_RDL_ZAFI______Z_SLPD\" where \$CONDITIONS and (\"/BA1/C55LGENT\") IN ('0000000671','0000000615') and (\"/BA1/C55LGENT\" != '0000000022') AND (\"/BIC/ZCINTEIND\" ='01') AND (\"/BA1/IGL_ACCOUNT\") IN ( '0000401077', '0000401035') AND (\"/BA1/C55POSTD\">= '20170101' AND \"/BA1/C55POSTD\" <='20170101')" \
--target-dir /user/arekapalli/pfit_export_test12 \
--delete-target-dir \
--split-by //BA1//C55LGENT// \
-m 10
Below is the error we got:
Caused by: com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [257] (at 12): sql syntax error: incorrect syntax near "/": line 1 col 12 (at pos 12)
Your problem is probably here:
--query "select '\"/BA1/C55LGENT/\"' FROM \"_SYS_BIC\".\"sap.fs.frdp.300.RDL/BV_RDL_ZAFI______Z_SLPD\" where \$CONDITIONS and (\"/BA1/C55LGENT\") IN ('0000000671','0000000615') and (\"/BA1/C55LGENT\" != '0000000022') AND (\"/BIC/ZCINTEIND\" ='01') AND (\"/BA1/IGL_ACCOUNT\") IN ( '0000401077', '0000401035') AND (\"/BA1/C55POSTD\">= '20170101' AND \"/BA1/C55POSTD\" <='20170101')" \
You are assuming that "\" is an escape character processed by the terminal; that is probably wrong. Try the following:
--query 'select "/BA1/C55LGENT/" FROM "_SYS_BIC"."sap.fs.frdp.300.RDL/BV_RDL_ZAFI______Z_SLPD" where $CONDITIONS and ("/BA1/C55LGENT") IN ("0000000671","0000000615") and ("/BA1/C55LGENT" != "0000000022") AND ("/BIC/ZCINTEIND" ="01") AND ("/BA1/IGL_ACCOUNT") IN ( "0000401077", "0000401035") AND ("/BA1/C55POSTD">= "20170101" AND "/BA1/C55POSTD" <="20170101")' \
I am not a SAP user, so there could still be something wrong with the query, but you can see that I removed all your escaped quotes and used ' as the delimiter of the query.
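One further hedged observation: the error points at column 12 of the generated SQL, which is exactly where the --split-by column lands in Sqoop's boundary query (SELECT MIN( is eleven characters, so position 12 is the first "/"), and //BA1//C55LGENT// is not a valid HANA identifier. Passing the real column name as a quoted identifier may be what is needed, for example:
# Sketch: the outer single quotes keep the shell from consuming the inner
# double quotes, so HANA receives the quoted identifier "/BA1/C55LGENT".
--split-by '"/BA1/C55LGENT"'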

sqoop hive import with partitions

I have some sqoop jobs importing into Hive that I want to partition, but I can't get it to work. The import itself succeeds: the table is sqooped, it's visible in Hive, and there's data, but the partition parameters I'm expecting don't appear when I describe the table. I HAVE sqooped this table as a CSV, created an external Parquet table, and inserted the data into that (which works), but I want to avoid the extra steps if possible. Here's my current code. Am I missing something, or am I trying to do the impossible? Thanks!
sqoop import -Doraoop.import.hint=" " \
--options-file /home/[user]/pass.txt \
--verbose \
--connect jdbc:oracle:thin:@ldap://oid:389/cn=OracleContext,dc=[employer],dc=com/SQSOP051 \
--username [user] \
--num-mappers 10 \
--hive-import \
--query "select DISC_PROF_SK_ID, CLM_RT_DISC_IND, EASY_PAY_PLN_DISC_IND, TO_CHAR(L40_ATOMIC_TS,'YYYY') as YEAR, TO_CHAR(L40_ATOMIC_TS,'MM') as MONTH from ${DataSource[index]}.$TableName where \$CONDITIONS" \
--hive-database [dru_user] \
--hcatalog-partition-keys YEAR \
--hcatalog-partition-values '2015' \
--target-dir hdfs://nameservice1/data/res/warehouse/finance/[dru_user]/Claims_Data/$TableName \
--hive-table $TableName'testing' \
--split-by ${SplitBy[index]} \
--delete-target-dir \
--direct \
--null-string '\\N' \
--null-non-string '\\N' \
--as-parquetfile
You can replace the options file with --password-file; however, that will not solve the partition problem. For the partition problem, you can try creating the partitioned table $TableName before the import.
sqoop import -Doraoop.import.hint=" " \
--password-file /home/[user]/pass.txt \
--verbose \
--connect jdbc:oracle:thin:@ldap://oid:389/cn=OracleContext,dc=[employer],dc=com/SQSOP051 \
--username [user] \
--num-mappers 10 \
--hive-import \
--query "SELECT disc_prof_sk_id,
clm_rt_disc_ind,
easy_pay_pln_disc_ind,
To_char(l40_atomic_ts,'YYYY') AS year,
To_char(l40_atomic_ts,'MM') AS month
FROM ${DataSource[index]}.$TableName
WHERE \$CONDITIONS" \
--hcatalog-database [dru_user] \
--hcatalog-partition-keys YEAR \
--hcatalog-partition-values '2015' \
--target-dir hdfs://nameservice1/data/res/warehouse/finance/[dru_user]/Claims_Data/$TableName \
--hcatalog-table $TableName \
--split-by ${SplitBy[index]} \
--delete-target-dir \
--direct \
--null-string '\\N' \
--null-non-string '\\N' \
--as-parquetfile
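To make "create the partitioned table first" concrete, a hypothetical DDL sketch: the column types are guesses and the names are placeholders in the bracket style used above, but note that YEAR moves out of the column list because it becomes the partition key:
-- Sketch only: pre-create the partitioned, Parquet-backed table in Hive.
-- Types are assumptions; match them to the Oracle source columns.
CREATE TABLE [dru_user].[table_name] (
  disc_prof_sk_id BIGINT,
  clm_rt_disc_ind STRING,
  easy_pay_pln_disc_ind STRING,
  month STRING
)
PARTITIONED BY (year STRING)
STORED AS PARQUET;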

How to specify multiple conditions in sqoop?

Sqoop version: 1.4.6.2.3.4.0-3485
I have been trying to import data with Sqoop using the following command:
sqoop import -libjars /usr/local/bfm/lib/java/jConnect-6/6.0.0/jconn3-6.0.0.jar --connect jdbc:sybase:db --username user --password 'pwd' --driver com.sybase.jdbc3.jdbc.SybDriver --query 'SELECT a.* from table1 a,table2 b where b.run_group=a.run_group and a.date<"7/22/2016" AND $CONDITIONS' --target-dir /user/user/a/ --verbose --hive-import --hive-table default.temp_a --split-by id
I get the following error:
Invalid column name '7/22/2016'
I have tried enclosing the query in double quotes, but then it says:
CONDITIONS: Undefined variable.
Tried several combinations of single/double quotes and escaping $CONDITIONS and using a --where switch as well.
PS: The conditions are non-numeric. (It works for cases like where x<10 or so, but not when the value is a string or date.)
In your command, --split-by id should be --split-by a.id. I would use a join instead of the extra where condition, and I would also convert the date to a string (VARCHAR) using a Sybase-specific function:
sqoop import -libjars /usr/local/bfm/lib/java/jConnect-6/6.0.0/jconn3-6.0.0.jar \
--connect jdbc:sybase:db \
--username user \
--password 'pwd' \
--driver com.sybase.jdbc3.jdbc.SybDriver \
--query "SELECT a.* from table1 a join table2 b on a.id=b.id where a.run_group=b.run_group and convert(varchar, a.date, 101) < '7/22/2016' AND \$CONDITIONS" \
--target-dir /user/user/a/ \
--verbose \
--hive-import \
--hive-table default.temp_a \
--split-by a.id
A workaround that can be used: --options-file.
Copy the query into your options file and use the switch.
The options file might look like this:
--query
select * \
from table t1 \
where t1.field="text" \
and t1.value="value" \
and $CONDITIONS
Note: Not sure whether it was a particular version issue, but --query directly on the command line just refused to work with $CONDITIONS. (Yes, I tried escaping it with \ and several other combinations of quoting.)
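A quoting summary that ties these questions together, stated as general shell behaviour rather than anything Sqoop-specific: inside double quotes the shell expands $NAME, so the token Sqoop needs must be escaped; inside single quotes, or in an options file the shell never reads, it must be left alone:
# Double quotes: the shell would expand $CONDITIONS, so escape it.
--query "select a.* from table1 a where \$CONDITIONS"
# Single quotes: the shell passes the text through untouched; no backslash.
--query 'select a.* from table1 a where $CONDITIONS'
# Options file: read by Sqoop directly, never by the shell; no backslash either.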
