We are currently moving data from SAP HANA to Hadoop using Sqoop.
SAP HANA uses the '/' character in table and column names. Our regular Sqoop command works, but it fails when I add --split-by. Can anyone please help?
code:
/usr/hdp/sqoop/bin/sqoop import \
--connect "jdbc:sap://***-***.**.*****.com:30015" \
--username DFIT_SUPP_USR --password **** \
--driver com.sap.db.jdbc.Driver \
--query "select '\"/BA1/C55LGENT/\"' FROM \"_SYS_BIC\".\"sap.fs.frdp.300.RDL/BV_RDL_ZAFI______Z_SLPD\" where \$CONDITIONS and (\"/BA1/C55LGENT\") IN ('0000000671','0000000615') and (\"/BA1/C55LGENT\" != '0000000022') AND (\"/BIC/ZCINTEIND\" ='01') AND (\"/BA1/IGL_ACCOUNT\") IN ( '0000401077', '0000401035') AND (\"/BA1/C55POSTD\">= '20170101' AND \"/BA1/C55POSTD\" <='20170101')" \
--target-dir /user/arekapalli/pfit_export_test12 \
--delete-target-dir \
--split-by //BA1//C55LGENT// \
-m 10
Below is the error we got:
Caused by: com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [257] (at 12): sql syntax error: incorrect syntax near "/": line 1 col 12 (at pos 12)
Your problem is probably here:
--query "select '\"/BA1/C55LGENT/\"' FROM \"_SYS_BIC\".\"sap.fs.frdp.300.RDL/BV_RDL_ZAFI______Z_SLPD\" where \$CONDITIONS and (\"/BA1/C55LGENT\") IN ('0000000671','0000000615') and (\"/BA1/C55LGENT\" != '0000000022') AND (\"/BIC/ZCINTEIND\" ='01') AND (\"/BA1/IGL_ACCOUNT\") IN ( '0000401077', '0000401035') AND (\"/BA1/C55POSTD\">= '20170101' AND \"/BA1/C55POSTD\" <='20170101')" \
You are assuming that "\" is an escape character in the terminal; that is probably wrong. Try the following:
--query 'select "/BA1/C55LGENT/" FROM "_SYS_BIC"."sap.fs.frdp.300.RDL/BV_RDL_ZAFI______Z_SLPD" where $CONDITIONS and ("/BA1/C55LGENT") IN ("0000000671","0000000615") and ("/BA1/C55LGENT" != "0000000022") AND ("/BIC/ZCINTEIND" ="01") AND ("/BA1/IGL_ACCOUNT") IN ( "0000401077", "0000401035") AND ("/BA1/C55POSTD">= "20170101" AND "/BA1/C55POSTD" <="20170101")' \
I am not an SAP user, so there may still be something wrong with the query itself. Either way, you can see that I removed all the ' characters from your query and used ' as the delimiter around the query instead.
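Another possible reading of the error, offered as an assumption rather than a confirmed diagnosis: the message points at col 12, and if Sqoop builds its split-boundary statement as SELECT MIN(<split column>), MAX(<split column>), the unquoted --split-by value puts a "/" exactly there. The sketch below only does the string arithmetic in the shell; the statement shape and the quoted-identifier fix are assumptions, not taken from the thread.

```shell
# The value passed to --split-by in the question.
split_col='//BA1//C55LGENT//'

# Assumption: Sqoop starts its bounding statement roughly like this.
bounds="SELECT MIN(${split_col}), MAX(${split_col})"

# 1-based column of the first '/' in that statement.
prefix="${bounds%%/*}"
first_slash_col=$(( ${#prefix} + 1 ))
echo "first '/' at col $first_slash_col"

# Hypothetical fix: pass the column as a quoted identifier, i.e.
#   --split-by '"/BA1/C55LGENT"'
quoted_col='"/BA1/C55LGENT"'
echo "SELECT MIN(${quoted_col}), MAX(${quoted_col})"
```

If the computed column matches the one in the error, quoting the --split-by value the same way the column is quoted inside --query is worth trying first.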
I created a Sqoop process that imports data from MS SQL to Hive, but I have a problem with 'char' type fields. Sqoop import code:
sqoop import \
--create-hcatalog-table \
--connect "connection_parameters" \
--username USER \
--driver net.sourceforge.jtds.jdbc.Driver \
--null-string '' \
--null-non-string '' \
--class-name TABLE_X \
--hcatalog-table TABLE_X_TEST \
--hcatalog-database default \
--hcatalog-storage-stanza "stored as orc tblproperties ('orc.compress'='SNAPPY')" \
--map-column-hive "column_1=char(10),column_2=char(35)" \
--num-mappers 1 \
--query "select top 10 "column_1", "column_2" from TABLE_X where \$CONDITIONS" \
--outdir "/tmp"
column_1, which is of type char(10), should be NULL if there is no data, but Hive fills the field with 10 spaces.
column_2, which is of type char(35), should be NULL too, but it contains 35 spaces.
It is a huge problem because I cannot run a query like this:
select count(*) from TABLE_X_TEST where column_1 is NULL and column_2 is NULL;
Instead I have to use this one:
select count(*) from TABLE_X_TEST where column_1 = ' ' and column_2 = ' ';
I tried changing the query parameter to use trim functions:
--query "select top 10 rtrim(ltrim("column_1")), rtrim(ltrim("column_2")) from TABLE_X where \$CONDITIONS"
but it does not work, so I suppose the problem is not with the source but with Hive.
How can I prevent Hive from inserting spaces in empty fields?
You need to change these parameters:
--null-string '\\N' \
--null-non-string '\\N' \
Hive, by default, expects that the NULL value will be encoded using the string constant \N. Sqoop, by default, encodes it using the string constant null. To rectify the mismatch, you’ll need to override Sqoop’s default behavior with Hive’s using parameters --null-string and --null-non-string (this is what you do but with incorrect values). For details, see docs.
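To see what these flags hand to Sqoop, the shell layer can be checked on its own. The sketch below only inspects the strings; that Sqoop then turns \\N into \N in its generated code is my understanding of the Sqoop documentation, not something shown in this thread.

```shell
# What the shell passes for --null-string '\\N':
# single quotes keep both backslashes, so Sqoop receives three characters.
null_token='\\N'

# What the question passed instead: an empty string, which makes
# a NULL and an empty value indistinguishable in the output files.
empty_token=''

printf 'null token:  [%s] (%d chars)\n' "$null_token" "${#null_token}"
printf 'empty token: [%s] (%d chars)\n' "$empty_token" "${#empty_token}"
```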
I tried omitting the --null-string and --null-non-string options when creating ORC tables through Sqoop HCatalog. All the NULLs in the source show up as NULL, and I am able to query them with IS NULL.
Let me know if you found any other solution for handling NULLs.
I need some help with Sqoop.
First of all, I'm sorry, my English isn't very good.
I am using the following command:
sqoop import -D mapreduce.output.fileoutputformat.compress=false --num-mappers 1 --connection-manager "com.quest.oraoop.OraOopConnManager" --connect "jdbc:Oracle:thin:#(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver)(PORT=1534)))(CONNECT_DATA=(SERVICE_NAME=myservice)))" --username "rodrigo" --password pwd \
--query "SELECT column1, column2 from myTable where \$CONDITIONS" \
--null-string '' --null-non-string '' --fields-terminated-by '|' \
--lines-terminated-by '\n' --as-textfile --target-dir /data/rodrigo/myTable \
--hive-import --hive-partition-key yearmonthday --hive-partition-value '20180101' --hive-overwrite --verbose -P --m 1 --hive-table myTable
My table already exists, because I have to submit a request to get a table created in my Hive database, so I can't create it dynamically inside the sqoop command.
I have permission to create the directory in HDFS.
When I remove the directory, Sqoop logs an error saying that I have no create-table permission, and when I create the directory beforehand, it returns a FileAlreadyExistsException.
What can I do to solve that?
Thanks from Brazil.
As a beginner in the Hadoop field, I was trying my hand at the Sqoop tool (version: Sqoop 1.4.6-cdh5.8.0).
Although I referred to various sites and forums, I could not find a workable way to import data using any delimiter other than ','.
Please find below the code that I used:
--- Connecting to MySQL, creating a table and records with ',' in a string.
mysql> create database GRHadoop;
Query OK, 1 row affected (0.00 sec)
mysql> use GRHadoop;
Database changed
mysql> Create table sitecustomer(Customerid int(10), Customername varchar(100),Productid int(4),Salary int(20));
Query OK, 0 rows affected (0.22 sec)
mysql> Insert into sitecustomer values(1,'Sohail',100,50000),(2,'Reshma',200,80000),(3,'Tom',200,60000);
Query OK, 3 rows affected (0.06 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> Insert into sitecustomer values(4,'Su,kama',300,50000),(5,'Ram,bha',100,80000),(6,'Suz',200,60000);
Query OK, 3 rows affected (0.03 sec)
Records: 3 Duplicates: 0 Warnings: 0
Sqoop Command :
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/GRHadoop \
--username root \
--password cloudera \
--table sitecustomer \
--input-fields-terminated-by '|' \
--lines-terminated-by "\n" \
--target-dir /user/cloudera/GR/Sqoop/sitecustomer_data \
--m 1;
Expected Output :
1|Sohail|100|50000
2|Reshma|200|80000
3|Tom|200|60000
4|Su,kama|300|50000
5|Ram,bha|100|80000
6|Suz|200|60000
Actual output :
1,Sohail,100,50000
2,Reshma,200,80000
3,Tom,200,60000
4,Su,kama,300,50000
5,Ram,bha,100,80000
6,Suz,200,60000
Please point out where I am going wrong.
The --input-fields-terminated-by argument tells Sqoop how to parse the input files during export. You should be using --fields-terminated-by; this argument controls how the output is formatted.
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/GRHadoop \
--username root \
--password cloudera \
--table sitecustomer \
--fields-terminated-by '|' \
--lines-terminated-by "\n" \
--target-dir /user/cloudera/GR/Sqoop/sitecustomer_data \
--m 1;
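Why the delimiter matters for this data can be checked without Sqoop. The sketch below takes row 4 from the example in both forms and counts the fields a downstream reader would see; the variable names are only for illustration.

```shell
# Row 4 from the example, as written with the default ',' and with '|'.
comma_row='4,Su,kama,300,50000'
pipe_row='4|Su,kama|300|50000'

# Count the fields a reader would see when splitting on each delimiter.
comma_fields=$(printf '%s\n' "$comma_row" | awk -F',' '{ print NF }')
pipe_fields=$(printf '%s\n' "$pipe_row" | awk -F'|' '{ print NF }')

echo "comma-delimited: $comma_fields fields"
echo "pipe-delimited:  $pipe_fields fields"
```

The table has four columns, so only the pipe-delimited row splits back into the right number of fields.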
Sqoop version: 1.4.6.2.3.4.0-3485
I have been trying to import data with Sqoop using the following command:
sqoop import -libjars /usr/local/bfm/lib/java/jConnect-6/6.0.0/jconn3-6.0.0.jar --connect jdbc:sybase:db --username user --password 'pwd' --driver com.sybase.jdbc3.jdbc.SybDriver --query 'SELECT a.* from table1 a,table2 b where b.run_group=a.run_group and a.date<"7/22/2016" AND $CONDITIONS' --target-dir /user/user/a/ --verbose --hive-import --hive-table default.temp_a --split-by id
I get the following error:
Invalid column name '7/22/2016'
I have tried enclosing the query in double quotes, but then it says:
CONDITIONS: Undefined variable.
I tried several combinations of single and double quotes, escaping $CONDITIONS, and using the --where switch as well.
PS: The conditions are non-numeric. (It works for cases like where x<10, but not when the value is a string or a date.)
In your command, --split-by id should be --split-by a.id. I would use a join instead of adding an extra where condition, and I would also convert the date to a VARCHAR string (using a Sybase-specific function):
sqoop import -libjars /usr/local/bfm/lib/java/jConnect-6/6.0.0/jconn3-6.0.0.jar \
--connect jdbc:sybase:db \
--username user \
--password 'pwd' \
--driver com.sybase.jdbc3.jdbc.SybDriver \
--query "SELECT a.* from table1 a join table2 b on a.id=b.id where a.run_group=b.run_group and convert(varchar, a.date, 101) < '7/22/2016' AND \$CONDITIONS" \
--target-dir /user/user/a/ \
--verbose \
--hive-import \
--hive-table default.temp_a \
--split-by a.id
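The two errors in the question look like quoting effects: the double-quoted "7/22/2016" is read by the database as an identifier, and an unescaped $CONDITIONS inside double quotes is expanded by the shell. The sketch below shows two ways to build the same query string, checked in a POSIX shell without running Sqoop. The shortened query is hypothetical, and the suggestion that the second form also helps in csh-style shells is an assumption.

```shell
# Form 1: double-quoted shell string; \$ stops the shell from expanding CONDITIONS,
# and the date keeps SQL single quotes.
q1="SELECT a.* FROM table1 a WHERE a.date < '7/22/2016' AND \$CONDITIONS"

# Form 2: single-quoted shell string; each SQL single quote is spliced in as '"'"',
# so no backslash is needed in front of $CONDITIONS.
q2='SELECT a.* FROM table1 a WHERE a.date < '"'"'7/22/2016'"'"' AND $CONDITIONS'

# Both lines print the text that would be handed to --query.
printf '%s\n' "$q1" "$q2"
```

Either variable can then be passed as --query "$q1" (or "$q2"), which keeps the command line itself free of nested quoting.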
A workaround that can be used: --options-file
Copy the query into your options file and pass the file with that switch.
The options file might look like this:
--query
select * \
from table t1 \
where t1.field="text" \
and t1.value="value" \
and $CONDITIONS
Note: I am not sure whether it was a version-specific issue, but --query directly on the command line just refused to work with $CONDITIONS. (Yes, I tried escaping it with \ and several other combinations of quotes.)
Can we use a query in a Sqoop options file? I used a query with special characters, and when I ran it I got the error: incorrect syntax near "\". Should I use an escape character in the properties file?
I put the query in the options file and reference that file from the sqoop import command.
Properties file:
--query "select top 10 source_system,company_code,gl_document,***************negative_posting_flag, to_number(to_varchar(to_date(create_tmstmp),'yyyymm')) as part_date from c_fin_a.gl_transaction_data where to_number(to_varchar(to_date(create_tmstmp),'yyyymm'))=201602 and \$CONDITIONS"
sqoop import command:
sudo sqoop import \
--options-file /home/emaarae/sqoop_shell/sqoop_hdfs.properties \
--append \
--null-string '' \
--null-non-string '' \
--fields-terminated-by '\001' \
--lines-terminated-by '\n' \
--m 15
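One possible explanation, offered as an assumption rather than a confirmed diagnosis: the shell never processes an options file, so the backslash in \$CONDITIONS reaches the database unchanged and produces the syntax error near "\". The sketch below writes a hypothetical options file (the path and the shortened query are made up for illustration) with a plain $CONDITIONS, and puts the option and its value on separate lines, as the options-file examples in the Sqoop documentation do.

```shell
# Hypothetical options file path, for illustration only.
opts_file="${TMPDIR:-/tmp}/sqoop_hdfs_example.properties"

# The quoted 'EOF' delimiter stops the shell from expanding $CONDITIONS
# while the file is being written; no backslash ends up in the file.
cat > "$opts_file" <<'EOF'
--query
"select top 10 source_system, company_code from c_fin_a.gl_transaction_data where $CONDITIONS"
EOF

# Show the file exactly as Sqoop would read it.
cat "$opts_file"
```

The file is then passed with --options-file exactly as in the import command above.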