Sqoop-import to hive from HANA with special characters in table name - sqoop

I am trying to Sqoop from an SAP HANA database. My purpose is to do a direct Hive import. I used the command as required (it works for most of the tables), but in some cases the import doesn't work because there are special characters in the SAP table name, e.g. the table name is "/BIC/AS100/".
Due to the "/" in the table name,
I am unable to do a direct Hive import.
Is there any way I can import the table and create a new Hive table with a proper name?

Thanks, Sathiyan,
the issue is resolved.
I did a direct Hive import, specifying a new table name of my choice.
The column names are still imported with the special character, but we can handle that in Hive, e.g.:
SELECT `/bic/xyz` FROM tablename; (the backtick escapes the special character)
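For reference, here is a minimal sketch of such an import. The HANA JDBC driver class, host, schema and the target Hive table name are placeholder assumptions, not taken from the thread, and the quoting of the source table name may need adjusting for your setup:

sqoop import \
  --driver com.sap.db.jdbc.Driver \
  --connect "jdbc:sap://hanahost:30015/?currentschema=SAPSR3" \
  --username someuser -P \
  --table "/BIC/AS100/" \
  --hive-import \
  --hive-table as100_clean \
  -m 1

The --hive-table option is what lets the Hive side get a proper name while the source table keeps its special characters.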

Related

Drop Columns while importing Data in sqoop

I am importing data from Oracle to Hive. My table doesn't have any integer column that can be used as a primary key, so I am not able to use one as my split-by column.
As an alternative, I created a row_num column for all rows present in the table; this row_num column will be used as the split-by column. Finally, I want to drop this column from my Hive table.
The column list is huge; I don't want to select all columns using --columns, nor do I want to create any temporary table for this purpose.
Please let me know whether we can handle this with Sqoop arguments.
Can a little tweak on the --query parameter help you?
Something like the following:
sqoop import --query 'query string'
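For example, here is a sketch of the free-form query form, assuming row_num is an existing column of the source table (as described in the question) and using placeholder connection details and column names; $CONDITIONS and --target-dir are required by Sqoop whenever --query is used:

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --query 'SELECT col1, col2, col3 FROM some_table WHERE $CONDITIONS' \
  --boundary-query 'SELECT MIN(row_num), MAX(row_num) FROM some_table' \
  --split-by row_num \
  --target-dir /user/hive/warehouse/some_table \
  --num-mappers 4

With the boundary query given explicitly, row_num only has to exist in the source table for the split conditions; it is not part of the SELECT list, so it should not end up in the imported data.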

After performing a Sqoop import from an RDBMS, how to check whether the data was properly imported into Hive

Are there any tools available?
Normally I check manually with count(*), min, max, and select-where queries on both the RDBMS and the Hive table. Is there any other way?
Please use --validate with sqoop import or export to compare the row counts of the source and the destination.
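For example (placeholder connection details; --validate compares the source and target row counts after the transfer):

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table SOME_TABLE \
  --target-dir /user/data/some_table \
  --validate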
Update: column-level checking.
There is no built-in parameter in Sqoop to achieve this, but you can do it as below:
1. Store the imported data in a temp table.
Use a shell script for the following steps:
2. Get the data from the source table and compare it with the temp table using shell variables.
3. If it matches, then copy the data from the temp table to the original table (see the sketch below).
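A minimal shell sketch of steps 2 and 3, with placeholder connection details, table names and a deliberately simplified parsing of the sqoop eval output (adjust for your environment):

#!/bin/bash
# Step 2: get the row count from the source table and from the Hive temp table
SRC_COUNT=$(sqoop eval \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott --password-file /user/scott/.pw \
  --query 'SELECT COUNT(*) FROM src_table' 2>/dev/null | grep -oE '[0-9]+' | tail -1)
HIVE_COUNT=$(hive -S -e 'SELECT COUNT(*) FROM staging_db.temp_table;')

# Step 3: copy the data from temp to the original table only if the counts match
if [ "$SRC_COUNT" = "$HIVE_COUNT" ]; then
  hive -e 'INSERT INTO TABLE target_db.final_table SELECT * FROM staging_db.temp_table;'
else
  echo "Count mismatch: source=$SRC_COUNT hive=$HIVE_COUNT" >&2
  exit 1
fi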

Sqoop Direct Import Netezza Table Permissions

We are using Netezza direct mode to import data from Netezza to Hadoop as part of a POC.
We have a couple of questions on Netezza specifics and the Netezza-Sqoop integration.
Q1. Does Sqoop direct mode always require the CREATE EXTERNAL TABLE and DROP privileges to perform a direct transfer?
Q2. Does an external table get created in Netezza? If yes, in which database? I see Sqoop using the query below:
CREATE EXTERNAL TABLE '/yarn/local/usercache/someuser/appcache/application_1483624176418_42787/work/task_1483624176418_42787_m_000000/nzexttable-0.txt'
USING (REMOTESOURCE 'JDBC'
BOOLSTYLE 'T_F'
CRINSTRING FALSE DELIMITER 44 ENCODING
'internal' FORMAT 'Text' INCLUDEZEROSECONDS TRUE
NULLVALUE 'null' MAXERRORS 1)
AS SELECT * FROM SOME_TBL WHERE (DATASLICEID % 3)
Does it get created in the database selected in the JDBC URL? jdbc:netezza://somehostname:5480/SOME_DB_1
Q3. If Netezza needs to create external tables, can it create the external table in a different database than the one holding the actual table whose data needs to be pulled into Hadoop? What is the config change that needs to be done?
Q4. Does Sqoop run DROP TABLE on the external tables created by the individual mappers?
Sqoop command used:
export HADOOP_CLASSPATH=/opt/nz/lib/nzjdbc3.jar
sqoop import -D mapreduce.job.queuename=some_queue \
  -D yarn.nodemanager.local-dirs=/tmp -D mapreduce.map.log.level=DEBUG \
  --direct --connect jdbc:netezza://somehost:5480/SOME_DB --table SOME_TBL_1 \
  --username SOMEUSER --password xxxxxxx --target-dir /tmp/netezza/some_tbl_file \
  --num-mappers 2 --verbose
This is the reply I got from the Sqoop user community (thanks, Szabolcs Vasas).
In the case of Netezza direct imports, Sqoop executes a CREATE EXTERNAL TABLE command (so you will need the CREATE EXTERNAL TABLE privilege) to back up the content of the table to a temporary file, and it copies the content of this file to the final output on HDFS.
The SQL command you pasted is indeed the one executed by Sqoop, but as far as I understand from the Netezza documentation (http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_create_external_tbl_expls.html, 6th example) this does not really create a new external table in any schema; it just backs up the content of the table, and because of that no DROP TABLE statement is executed.
Q1. Yes, Sqoop needs the CREATE EXTERNAL TABLE privilege, but not DROP.
Q2. Sqoop does not really create a new external table in any schema; it just backs up the content of the table (http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_create_external_tbl_expls.html, 6th example).
Q3. It is not possible to create the external table in a specific schema.
Q4. No, Sqoop does not run a DROP command.
Moreover, the table created by the Sqoop direct process is a Netezza TET (transient external table). The external REMOTESOURCE JDBC table is therefore dropped once the mapper receives the data over the named FIFO, so no tables are left stored in Netezza after the transfer.

Sqoop not loading CLOB type data into hive table properly

I am trying to use a Sqoop job to import data from Oracle, and one of the columns in the Oracle table is of data type CLOB and contains newline characters.
In this case, the option --hive-drop-import-delims is not working: the Hive table doesn't read the \n characters properly.
Please suggest how I can import the CLOB data into the target directory, parsing all the characters properly.
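For context, here is a sketch of the kind of import described above, with placeholder connection details, table and column names. The --map-column-java mapping is not from this thread; it is a commonly suggested workaround that treats the CLOB as a plain Java String so that the delimiter-handling options apply to it:

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table DOCS_TABLE \
  --map-column-java CLOB_COL=String \
  --hive-import \
  --hive-table docs_table \
  --hive-drop-import-delims \
  -m 1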

Sqoop - Create empty hive partitioned table based on schema of oracle partitioned table

I have an Oracle table which has 80 columns and is partitioned on the state column. My requirement is to create a Hive table with a similar schema to the Oracle table, partitioned on state.
I tried using the sqoop --create-hive-table option, but keep getting the error:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: Partition key state cannot be a column to import.
I understand that in Hive the partition column should not be part of the table definition, but then how do I get around the issue?
I do not want to write the CREATE TABLE commands manually, as I have 50 such tables to import and would like to use Sqoop.
Any suggestions or ideas?
Thanks
There is a workaround for this.
Below is the procedure I follow:
1. On Oracle, run a query to get the schema for the table and store it in a file.
2. Move that file to Hadoop.
3. On Hadoop, create a shell script which constructs an HQL file. That HQL file contains the Hive CREATE TABLE statement along with its columns; for this we can use the file from step 1 (the Oracle schema file copied to Hadoop).
4. To run the script you just need to pass the Hive database name, table name, partition column name, path, etc., depending on your level of customization. At the end of the shell script, add hive -f <HQL filename>.
If everything is ready, it takes only a couple of minutes per table, as in the sketch below.
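A minimal sketch of such a shell script, assuming the Oracle schema file has already been converted to one "column_name hive_type" pair per line with the partition column excluded; every name and path here is a placeholder:

#!/bin/bash
# Usage: ./create_partitioned_table.sh <hive_db> <table> <partition_column> <schema_file>
HIVE_DB=$1
TABLE=$2
PART_COL=$3
SCHEMA_FILE=$4
HQL_FILE=/tmp/create_${TABLE}.hql

{
  echo "CREATE TABLE ${HIVE_DB}.${TABLE} ("
  sed '$!s/$/,/' "${SCHEMA_FILE}"      # append a comma to every column line except the last
  echo ") PARTITIONED BY (${PART_COL} STRING)"
  echo "STORED AS TEXTFILE;"
} > "${HQL_FILE}"

hive -f "${HQL_FILE}"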
