Escape character for SQL query - shell

I'm using the query below in a properties file and consuming it from a shell script, but because of the special characters in the query it is not giving me the output with the special characters preserved.
query="select top 10 source_system,updt_etl_instnc_run_id,negative_posting_flag, to_number(to_varchar(to_date(create_tmstmp),'yyyymm')) as part_date from c_fin_a.gl_transaction_data where to_number(to_varchar(to_date(create_tmstmp),'yyyymm'))=$NOW and \$CONDITIONS"
I have also tried escaping all the special characters, but even then it does not give me the same output.
query= \ " select top 10 source_system,updt_etl_instnc_run_id,negative_posting_flag, to_number \ (to_varchar \ ( to_date \ ( create_tmstmp \ ) , \ ' yyyymm \ ' \ ) \ ) as part_date from c_fin_a.gl_transaction_data where to_number \ ( to_varchar \ ( to_date \ ( create_tmstmp \ ) , \ ' yyyymm \ ' \ ) \ )= \ $NOW and \ \$CONDITIONS \ "

First: never use spaces around the equals sign when you assign a value to a variable. That is, var=value is OK; var = value is not.
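With the spaces, the shell parses the variable name as a command name, which is worth seeing once (a bash example):
$ query = "select 1"
query: command not found
$ query="select 1"    # correct: no spaces around =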
Now, let's assume your shell variables have the following values:
NOW=201603
CONDITIONS="city='New York'"
Then you need to use the following:
query="select top 10 source_system,updt_etl_instnc_run_id,negative_posting_flag, to_number(to_varchar(to_date(create_tmstmp),'yyyymm')) as part_date from c_fin_a.gl_transaction_data where to_number(to_varchar(to_date(create_tmstmp),'yyyymm'))=${NOW} and ${CONDITIONS}"
to generate a valid SQL statement.
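A quick way to verify the result is to print the expanded string before handing it to sqoop (a minimal sketch reusing the variables above; the long column list is elided):
NOW=201603
CONDITIONS="city='New York'"
query="select top 10 ... where to_number(to_varchar(to_date(create_tmstmp),'yyyymm'))=${NOW} and ${CONDITIONS}"
echo "$query"
# select top 10 ... where to_number(to_varchar(to_date(create_tmstmp),'yyyymm'))=201603 and city='New York'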

Related

Exporting data from Teradata to HDFS using TDCH

I'm trying to export a table from Teradata into a file in HDFS using TDCH.
I'm using the parameters below:
hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorImportTool \
-libjars $LIB_JARS \
-Dmapred.job.queue.name=default \
-Dtez.queue.name=default \
-Dmapred.job.name=TDCH \
-classname com.teradata.jdbc.TeraDriver \
-url jdbc:teradata://$ipServer/logmech=ldap,database=$database,charset=UTF16 \
-jobtype hdfs \
-fileformat textfile \
-separator ',' \
-enclosedby '"' \
-targettable ${targetTable} \
-username ${userName} \
-password ${password} \
-sourcequery "select * from ${database}.${targetTable}" \
-nummappers 1 \
-sourcefieldnames "" \
-targetpaths ${targetPaths}
It's working, but I need the headers in the file, and when I add the parameter:
-targetfieldnames "ID","JOB","DESC","DT","REG" \
It doesn't work; the file isn't even generated anymore.
Can anyone help me?
The -targetfieldnames option is only valid for -jobtype hive.
It does not put headers in the HDFS file, it specifies Hive column names.
(There is no option to prefix CSV with a header record.)
Also, the value supplied for -targetfieldnames should be a single comma-separated string like "ID,JOB,DESC,DT,REG" rather than a list of strings.
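For example, if you were running a Hive job, the corrected option would look like this (a sketch only; a complete Hive job needs its own target settings, which are omitted here):
-jobtype hive \
-targetfieldnames "ID,JOB,DESC,DT,REG" \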

Sqoop Error: --table or --query is required for import

I am getting "command not found" errors while using the following sqoop command (Oracle query).
I used another sqoop command to connect to another DB with a different JDBC connection string, and it connected and fetched the data without error.
Not sure what the problem is here; can someone help fix this error? Thanks in advance.
sqoop import
--connectjdc:oracle:thin:#//(connection string)
--query
"Select sys,
case when (substring(gid,3,3))="_A_" or gid="NJ_Parsipanny") then "core"
when
else "misc" end "org",
aid, aname,
b.workid as "waddress",
f.ai,
a.ag,
b.jobd,
b.jobk,
e.emstatus,
b.jobfunc,
b.superid,
c.fname+" "+c.lname as "S_Name",
FROM ad.db.tbl_a a
left join common.db.b b
on a.tid=b.sbid
left join common.db.c c
on b.sid=c.sbid
left join common.db.d d
on c.sid=d.sbid
left join common.db.e e
on d.sid=e.sbid
left join ad.db.tbl_f f
on a.AG=f.AG
WHERE RIGHT(a.AG,1) IN ("E","T")
AND \$CONDITIONS"
--num-mappers 1
--target-dir /abc/46780
--fields-terminated-by ","
--user-name xyz
--password-file hdfs:///abc/46780/p/pswd.txt
Errors: command not found.
--table or --query is required for import
line 26:--query: command not found
line 44:--num-mappers: command not found
line 45:--target-dir: command not found
line 46:--fields-terminated-by: command not found
line 47:--username:command not found
line 48:--password-file: command not found
It looks like the shell is giving you the error, not Sqoop.
When you have a single command split across multiple lines, you need to add a backslash \ at the end of each line.
sqoop import \
--connect jdbc:oracle:thin:#//(connection string) \
--query \
'Select sys,
case when (substring(gid,3,3))="_A_" or gid="NJ_Parsipanny") then "core"
when
else "misc" end "org",
aid, aname,
b.workid as "waddress",
f.ai,
a.ag,
b.jobd,
b.jobk,
e.emstatus,
b.jobfunc,
b.superid,
c.fname+" "+c.lname as "S_Name",
FROM ad.db.tbl_a a
left join common.db.b b
on a.tid=b.sbid
left join common.db.c c
on b.sid=c.sbid
left join common.db.d d
on c.sid=d.sbid
left join common.db.e e
on d.sid=e.sbid
left join ad.db.tbl_f f
on a.AG=f.AG
WHERE RIGHT(a.AG,1) IN ("E","T")
AND $CONDITIONS' \
--num-mappers 1 \
--target-dir /abc/46780 \
--fields-terminated-by "," \
--username xyz \
--password-file hdfs:///abc/46780/p/pswd.txt
Note: you don't need to add \ inside the quotes.
Changes:
I changed the double quotes to single quotes, since you have double quotes inside your SQL query.
I removed the \ before $CONDITIONS: inside single quotes, $ does not need to be escaped, because the shell already treats it literally.
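The quoting difference is easy to check from the shell itself:
$ echo '$CONDITIONS'     # single quotes: $ is passed through literally
$CONDITIONS
$ echo "\$CONDITIONS"    # double quotes: $ must be escaped to stay literal
$CONDITIONS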
I assume you are running the sqoop command from the Linux shell command line.
I would rewrite the sqoop command as follows.
First, I would test the query through the Oracle console to see if the query works on its own.
Second, you should put a backslash \ at the end of each line.
Third, you should put string literals in the query between single quotes, 'string literal', instead of double quotes.
sqoop import \
--connect jdbc:oracle:thin:#//(connection string) \
--query "Select sys,
case when (substring(gid,3,3))='_A_' or gid='NJ_Parsipanny') then 'core'
when
else 'misc' end 'org',
aid, aname,
b.workid as waddress,
f.ai,
a.ag,
b.jobd,
b.jobk,
e.emstatus,
b.jobfunc,
b.superid,
c.fname+' '+c.lname as S_Name,
FROM ad.db.tbl_a a
left join common.db.b b
on a.tid=b.sbid
left join common.db.c c
on b.sid=c.sbid
left join common.db.d d
on c.sid=d.sbid
left join common.db.e e
on d.sid=e.sbid
left join ad.db.tbl_f f
on a.AG=f.AG
WHERE RIGHT(a.AG,1) IN ('E','T')
AND \$CONDITIONS" \
--num-mappers 1 \
--target-dir /abc/46780 \
--fields-terminated-by "," \
--username xyz \
--password-file hdfs:///abc/46780/p/pswd.txt
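In both versions, the important point is that the token $CONDITIONS reaches Sqoop unexpanded: Sqoop replaces it at run time with a per-mapper split condition. Roughly, a single mapper's query might end up looking like this (the split column and bounds here are invented for illustration):
Select ... WHERE RIGHT(a.AG,1) IN ('E','T') AND ( aid >= 1 AND aid < 50000 )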

Whitespace instead of NULL in Hive table after Sqoop import

I created a Sqoop process which imports data from MS SQL to Hive, but I have a problem with 'char' type fields. Sqoop import code:
sqoop import \
--create-hcatalog-table \
--connect "connection_parameters" \
--username USER \
--driver net.sourceforge.jtds.jdbc.Driver \
--null-string '' \
--null-non-string '' \
--class-name TABLE_X \
--hcatalog-table TABLE_X_TEST \
--hcatalog-database default \
--hcatalog-storage-stanza "stored as orc tblproperties ('orc.compress'='SNAPPY')" \
--map-column-hive "column_1=char(10),column_2=char(35)" \
--num-mappers 1 \
--query "select top 10 "column_1", "column_2" from TABLE_X where \$CONDITIONS" \
--outdir "/tmp"
column_1 which is type char(10) should be NULL if there is no data. But Hive fills the field with 10 spaces.
column_2 which is type char(35) should be NULL too, but there are 35 spaces.
It is a huge problem because I cannot run a query like this:
select count(*) from TABLE_X_TEST where column_1 is NULL and column_2 is NULL;
But I have to use this one:
select count(*) from TABLE_X_TEST where column_1 = ' ' and column_2 = ' ';
I tried changing the query parameter and using the trim function:
--query "select top 10 rtrim(ltrim("column_1")), rtrim(ltrim("column_2")) from TABLE_X where \$CONDITIONS"
but it does not work, so I suppose it is not a problem with the source, but with Hive.
How can I prevent Hive from inserting spaces into empty fields?
You need to change these parameters:
--null-string '\\N' \
--null-non-string '\\N' \
Hive, by default, expects the NULL value to be encoded as the string constant \N. Sqoop, by default, encodes it as the string constant null. To fix the mismatch, you need to override Sqoop's default with Hive's using the --null-string and --null-non-string parameters (which is what you are doing, but with incorrect values). For details, see the docs.
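For illustration, in a plain-text, comma-delimited table the three encodings behave quite differently (a minimal sketch; the same mismatch applies to what Sqoop hands over for the Hive table):
1,\N,foo      -- Hive reads column 2 as NULL
1,null,foo    -- Hive reads column 2 as the literal string "null"
1,,foo        -- an empty string, which a char(n) column pads out with spaces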
I tried without the --null-string and --null-non-string options when creating ORC tables using Sqoop hcatalog; all the NULLs in the source come through as NULL, and I am able to query them with is null. Let me know if you found any other solution for handling NULLs.

Query with special characters in Sqoop

Can we use a query in a Sqoop options file? I used a query with special characters, and when I ran it, it gave the error incorrect syntax near "\". Should I use an escape character in the properties file?
In the options file I have specified the query, which I then use in the sqoop import command.
Properties file:
--query "select top 10 source_system,company_code,gl_document,***************negative_posting_flag, to_number(to_varchar(to_date(create_tmstmp),'yyyymm')) as part_date from c_fin_a.gl_transaction_data where to_number(to_varchar(to_date(create_tmstmp),'yyyymm'))=201602 and \$CONDITIONS"
Sqoop import command:
sudo sqoop import \
--options-file /home/emaarae/sqoop_shell/sqoop_hdfs.properties \
--append \
--null-string '' \
--null-non-string '' \
--fields-terminated-by '\001' \
--lines-terminated-by '\n' \
--m 15

Badly placed ()'s when creating an Oracle database link in tcsh

#my code
echo \
'create database link remotec101 \
connect to "os_user" \
identified by "password" \
using ' \
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP) \
(HOST=c101) \
(PORT=1521)) \
(CONNECT_DATA=(SID=XE)))';'|sqlplus
I tried running some SQL this way and it worked, but when creating the database link I got an error saying badly placed ()'s.
This code is in tcsh.
Please help me.
Thanks
The parentheses are not quoted, so they're treated as shell metacharacters.
This:
echo \
'create database link remotec101 \
connect to "os_user" \
identified by "password" \
using \
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP) \
(HOST=c101) \
(PORT=1521)) \
(CONNECT_DATA=(SID=XE)));' | sqlplus
will feed the following to the sqlplus command:
create database link remotec101
connect to "os_user"
identified by "password"
using
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)
(HOST=c101)
(PORT=1521))
(CONNECT_DATA=(SID=XE)));
But a "here document" is probably cleaner:
sqlplus <<'EOF'
create database link remotec101
connect to "os_user"
identified by "password"
using
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)
(HOST=c101)
(PORT=1521))
(CONNECT_DATA=(SID=XE)));
EOF
If you want the last 4 lines to become a single line of input to sqlplus, I think you'll need to put them all on one line in your script. Or you might find it easier to use the printf command to organize your output, for example:
printf '%s\n%s\n%s\n%s\n%s %s %s %s\n' \
'create database link remotec101' \
'connect to "os_user"' \
'identified by "password"' \
'using' \
'(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)' \
'(HOST=c101)' \
'(PORT=1521))' \
'(CONNECT_DATA=(SID=XE)));' | sqlplus
This prints the last 4 lines as a single line. You can adjust the format string as needed.
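With that format string, the text reaching sqlplus would be:
create database link remotec101
connect to "os_user"
identified by "password"
using
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP) (HOST=c101) (PORT=1521)) (CONNECT_DATA=(SID=XE)));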
I figured it out. You have to utilize the fact that tcsh supports both quote types and that echo accepts multiple arguments:
echo 'create database link remotec101 \
connect to "os_user" \
identified by "password" \
using ' "'" ' ( DESCRIPTION= ( ADDRESS= ( PROTOCOL=TCP )
( HOST=c101 )
( PORT=1521 ) )
( CONNECT_DATA= ( SID=XE ) ) ) ' "';"
