Sqoop with duplicate column name - hadoop

I wrote a Sqoop import with duplicate column names (with aliases), but it threw the error "Duplicate Column identifier specified: 'id'". I modified the query to use the concat function, and now it gives me the error "Hive does not support the SQL type for column a".
sqoop import \
--connect jdbc:mysql://foo.test.net/mfg \
--username pingp \
--password 987yjd \
--hive-import \
--hive-table third_map \
--query "select concat(r.id,'') a, concat(p.id,'') b from tblDimMfg r join tblDimMfg p on r.id = p.id where r.Name = 'bbp' and p.Name = 'bbt' and \$CONDITIONS" \
--target-dir /user/test/hivehome/mysql/third_map \
--fields-terminated-by '\t' \
--hive-drop-import-delims \
-m 1
Any suggestion?
Thank you,
Rio

The resolution is to wrap the query in a sub-select that aliases the duplicate column names; then it works.
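For reference, a minimal sketch of what that sub-select might look like, reusing the command from the question. The derived-table alias t is illustrative, and the concat workaround is dropped since the inner aliases already resolve the duplicate names:
sqoop import \
--connect jdbc:mysql://foo.test.net/mfg \
--username pingp \
--password 987yjd \
--hive-import \
--hive-table third_map \
--query "select t.a, t.b from (select r.id a, p.id b from tblDimMfg r join tblDimMfg p on r.id = p.id where r.Name = 'bbp' and p.Name = 'bbt') t where \$CONDITIONS" \
--target-dir /user/test/hivehome/mysql/third_map \
--fields-terminated-by '\t' \
--hive-drop-import-delims \
-m 1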

Related

Sqoop - FileAlreadyExists exception

I need some help with sqoop.
First of all, I'm sorry, my English isn't very good.
Using the following command:
sqoop import -D mapreduce.output.fileoutputformat.compress=false --num-mappers 1 --connection-manager "com.quest.oraoop.OraOopConnManager" --connect "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver)(PORT=1534)))(CONNECT_DATA=(SERVICE_NAME=myservice)))" --username "rodrigo" --password pwd \
--query "SELECT column1, column2 from myTable where \$CONDITIONS" \
--null-string '' --null-non-string '' --fields-terminated-by '|' \
--lines-terminated-by '\n' --as-textfile --target-dir /data/rodrigo/myTable \
--hive-import --hive-partition-key yearmonthday --hive-partition-value '20180101' --hive-overwrite --verbose -P --m 1 --hive-table myTable
My table is already created, because I have to submit a request to have a table created in my Hive database, so I can't create it dynamically inside the sqoop command.
I have permission to create the directory in HDFS.
When I remove the directory, sqoop logs an error saying that I have no create-table permissions, and when the directory already exists, it returns a FileAlreadyExistsException.
What can I do to solve that?
Thanks from Brazil.

sqoop issue while importing data from SAP HANA

We are currently moving data from SAP HANA to Hadoop using sqoop.
SAP HANA table and column names use the '/' character. Our regular sqoop command works, but it fails when I use --split-by. Can anyone please help?
code:
/usr/hdp/sqoop/bin/sqoop import \
--connect "jdbc:sap://***-***.**.*****.com:30015" \
--username DFIT_SUPP_USR --password **** \
--driver com.sap.db.jdbc.Driver \
--query "select '\"/BA1/C55LGENT/\"' FROM \"_SYS_BIC\".\"sap.fs.frdp.300.RDL/BV_RDL_ZAFI______Z_SLPD\" where \$CONDITIONS and (\"/BA1/C55LGENT\") IN ('0000000671','0000000615') and (\"/BA1/C55LGENT\" != '0000000022') AND (\"/BIC/ZCINTEIND\" ='01') AND (\"/BA1/IGL_ACCOUNT\") IN ( '0000401077', '0000401035') AND (\"/BA1/C55POSTD\">= '20170101' AND \"/BA1/C55POSTD\" <='20170101')" \
--target-dir /user/arekapalli/pfit_export_test12 \
--delete-target-dir \
--split-by //BA1//C55LGENT// \
-m 10
Below is the error we got:
Caused by: com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [257] (at 12): sql syntax error: incorrect syntax near "/": line 1 col 12 (at pos 12)
Your problem is probably here:
--query "select '\"/BA1/C55LGENT/\"' FROM \"_SYS_BIC\".\"sap.fs.frdp.300.RDL/BV_RDL_ZAFI______Z_SLPD\" where \$CONDITIONS and (\"/BA1/C55LGENT\") IN ('0000000671','0000000615') and (\"/BA1/C55LGENT\" != '0000000022') AND (\"/BIC/ZCINTEIND\" ='01') AND (\"/BA1/IGL_ACCOUNT\") IN ( '0000401077', '0000401035') AND (\"/BA1/C55POSTD\">= '20170101' AND \"/BA1/C55POSTD\" <='20170101')" \
You are assuming that "\" is an escape character interpreted by the terminal; that is probably wrong. Try the following:
--query 'select "/BA1/C55LGENT/" FROM "_SYS_BIC"."sap.fs.frdp.300.RDL/BV_RDL_ZAFI______Z_SLPD" where $CONDITIONS and ("/BA1/C55LGENT") IN ("0000000671","0000000615") and ("/BA1/C55LGENT" != "0000000022") AND ("/BIC/ZCINTEIND" ="01") AND ("/BA1/IGL_ACCOUNT") IN ( "0000401077", "0000401035") AND ("/BA1/C55POSTD">= "20170101" AND "/BA1/C55POSTD" <="20170101")' \
I am not an SAP user, so something may still be wrong with the query, but you can see that I removed all the ' characters from inside the query and used ' as the delimiter of the whole query.
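As a quick illustration of that escaping point (generic fragments, not the HANA query itself): inside double quotes the shell would try to expand $CONDITIONS, so it has to be escaped, while inside single quotes the string is passed through untouched and no backslash is needed:
# double-quoted: escape the token or the shell will expand it
--query "select * from t where \$CONDITIONS"
# single-quoted: the shell leaves the string alone, so no backslash
--query 'select * from t where $CONDITIONS'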

impala incremental last modified

I have a Sqoop import that brings in data from Oracle with a join on two tables. I need to do an --incremental lastmodified import based on a column that is common to both tables:
--query "SELECT customer_info.customer_id, customer_info.customer_name,
customer.date_created, sales_info.last_update_date as sales_last_update_date
from customer_info
inner join
sales_info ON customer_info.customer_id = sales_info.customer_id
AND \$CONDITIONS" \
--split-by "customer_id" \
--fields-terminated-by '\t' \
--target-dir (name_of_dir) \
--incremental lastmodified \
--check-column sales_last_update_date \
The last_update_date column is common to both tables.
But I get the error:
ORA-00904: "sales_last_update_date": invalid identifier
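No resolution is recorded for this one, but ORA-00904 is consistent with Oracle not allowing a select-list alias to be referenced in the WHERE clause of the same query block, which is where Sqoop injects the lastmodified filter for the --check-column. One possible, untested workaround, analogous to the sub-select fix in the first question (and assuming date_created belongs to customer_info), is to make the alias a real column of a derived table:
--query "SELECT t.* FROM (SELECT customer_info.customer_id, customer_info.customer_name,
customer_info.date_created, sales_info.last_update_date AS sales_last_update_date
FROM customer_info
INNER JOIN sales_info ON customer_info.customer_id = sales_info.customer_id) t
WHERE \$CONDITIONS" \
--split-by "customer_id" \
--incremental lastmodified \
--check-column sales_last_update_date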

sqoop hive import with partitions

I have some sqoop jobs importing into Hive that I want to partition, but I can't get it to work. The import itself succeeds: the table is sqooped, it's visible in Hive, and there's data, but the partition parameters I'm expecting don't appear when I describe the table. I HAVE sqooped this table as a CSV, created an external Parquet table, and inserted the data into that (which works), but I want to avoid the extra steps if possible. Here's my current code. Am I missing something, or am I trying to do the impossible? Thanks!
sqoop import -Doraoop.import.hint=" " \
--options-file /home/[user]/pass.txt \
--verbose \
--connect jdbc:oracle:thin:@ldap://oid:389/cn=OracleContext,dc=[employer],dc=com/SQSOP051 \
--username [user] \
--num-mappers 10 \
--hive-import \
--query "select DISC_PROF_SK_ID, CLM_RT_DISC_IND, EASY_PAY_PLN_DISC_IND, TO_CHAR(L40_ATOMIC_TS,'YYYY') as YEAR, TO_CHAR(L40_ATOMIC_TS,'MM') as MONTH from ${DataSource[index]}.$TableName where \$CONDITIONS" \
--hive-database [dru_user] \
--hcatalog-partition-keys YEAR \
--hcatalog-partition-values '2015' \
--target-dir hdfs://nameservice1/data/res/warehouse/finance/[dru_user]/Claims_Data/$TableName \
--hive-table $TableName'testing' \
--split-by ${SplitBy[index]} \
--delete-target-dir \
--direct \
--null-string '\\N' \
--null-non-string '\\N' \
--as-parquetfile
You can replace the options file with --password-file. However, that will not solve the partition problem. For that, you can try creating the partitioned table $TableName first, before the import.
sqoop import -Doraoop.import.hint=" " \
--password-file /home/[user]/pass.txt \
--verbose \
--connect jdbc:oracle:thin:@ldap://oid:389/cn=OracleContext,dc=[employer],dc=com/SQSOP051 \
--username [user] \
--num-mappers 10 \
--hive-import \
--query "SELECT disc_prof_sk_id,
clm_rt_disc_ind,
easy_pay_pln_disc_ind,
To_char(l40_atomic_ts,'YYYY') AS year,
To_char(l40_atomic_ts,'MM') AS month
FROM ${DataSource[index]}.$TableName
WHERE \$CONDITIONS" \
--hcatalog-database [dru_user] \
--hcatalog-partition-keys YEAR \
--hcatalog-partition-values '2015' \
--target-dir hdfs://nameservice1/data/res/warehouse/finance/[dru_user]/Claims_Data/$TableName \
--hcatalog-table $TableName \
--split-by ${SplitBy[index]} \
--delete-target-dir \
--direct \
--null-string '\\N' \
--null-non-string '\\N' \
--as-parquetfile
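Since that answer depends on the partitioned table existing before the import, the pre-created Hive table might look roughly like the following. The table name and all column types here are assumptions for illustration only; the real definition has to match the Oracle source:
-- illustrative DDL only: name and types are guesses, not taken from the source schema
CREATE TABLE claims_data_testing (
  disc_prof_sk_id       STRING,
  clm_rt_disc_ind       STRING,
  easy_pay_pln_disc_ind STRING,
  month                 STRING
)
PARTITIONED BY (year STRING)
STORED AS PARQUET;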

How to specify multiple conditions in sqoop?

Sqoop version: 1.4.6.2.3.4.0-3485
I have been trying to import data using sqoop using the following command:
sqoop import -libjars /usr/local/bfm/lib/java/jConnect-6/6.0.0/jconn3-6.0.0.jar --connect jdbc:sybase:db --username user --password 'pwd' --driver com.sybase.jdbc3.jdbc.SybDriver --query 'SELECT a.* from table1 a,table2 b where b.run_group=a.run_group and a.date<"7/22/2016" AND $CONDITIONS' --target-dir /user/user/a/ --verbose --hive-import --hive-table default.temp_a --split-by id
I get the following error:
Invalid column name '7/22/2016'
I have tried enclosing the query in double quotes, but then it says:
CONDITIONS: Undefined variable.
Tried several combinations of single/double quotes and escaping $CONDITIONS and using a --where switch as well.
PS: The conditions are non numeric. (It works for cases like where x<10 or so, but not in case where it's a string or date)
In your command, --split-by id should be --split-by a.id. The error itself happens because Sybase is treating the double-quoted "7/22/2016" as a column identifier rather than a string literal, which is why it reports "Invalid column name '7/22/2016'". I would use a join instead of the extra where condition, and I would convert the date to a string (VARCHAR) using a Sybase-specific function:
sqoop import -libjars /usr/local/bfm/lib/java/jConnect-6/6.0.0/jconn3-6.0.0.jar \
--connect jdbc:sybase:db \
--username user \
--password 'pwd' \
--driver com.sybase.jdbc3.jdbc.SybDriver \
--query "SELECT a.* from table1 a join table2 b on a.id=b.id where a.run_group=b.run_group and convert(varchar, a.date, 101) < '7/22/2016' AND \$CONDITIONS" \
--target-dir /user/user/a/ \
--verbose \
--hive-import \
--hive-table default.temp_a \
--split-by a.id
A workaround that can be used: --options-file
Copy the query into your options file and use that switch.
The options file might look like this:
--query
select * \
from table t1 \
where t1.field="text" \
and t1.value="value" \
and $CONDITIONS
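Invoking it would then look something like this; the options-file path is illustrative, and the remaining options are carried over from the original command:
sqoop import -libjars /usr/local/bfm/lib/java/jConnect-6/6.0.0/jconn3-6.0.0.jar \
--options-file /home/user/query_options.txt \
--connect jdbc:sybase:db \
--username user --password 'pwd' \
--driver com.sybase.jdbc3.jdbc.SybDriver \
--target-dir /user/user/a/ \
--verbose \
--hive-import --hive-table default.temp_a \
--split-by a.id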
Note: I'm not sure whether it was a version-specific issue, but --query directly on the command line just refused to work with $CONDITIONS. (Yes, I tried escaping it with \ and several other combinations of quoting.)
