i have a requirement to import the data into .csv file with comma(,) as a delimiter .
i am using below sqoop options .
--optionally-enclosed-by '\"'
--escaped-by '\\'
below is the input data and output data i want .
input "foo output i want ""foo
but i am getting below
input "foo output "foo
another example :
input foo" output i want foo""
but i am getting below
input foo" output foo"
how can i achieve the desired output
Refer SqoopGuide 7.2.11. Large Objects for a better understanding of --enclosed-by,--escaped-by and --optionally-enclosed-by with examples.
Based on the question, below are the details understood.
--fields-terminated-by , Since you need a file with a comma as the delimiter.
--optionally-enclosed-by '\"' This will enclose only the fields whose data contains delimiter comma , in them.
--escaped-by \\ Used to escape the enclosing characters(double quotes in this case) if they are present in the data field which requires enclosing.
Example:
Input: Suppose if the data in source table is like below with the respective columns. For representation, I used pipe(|) as the delimiter.
Some string, with a comma.|1|2|3...
Another "string with quotes"|4|5|6...
Output: sqoop import --fields-terminated-by , --enclosed-by '\"' --escaped-by \ ...
"Some string, with a comma.","1","2","3"...
"Another \"string with quotes\"","4","5","6"...
Explanation: All fields are terminated by comma and all fields are enclosed by double-quotes. If there is any field with double-quotes in the data then those quotes will be escaped by a backslash() as in the second line.
Output: sqoop import --fields-terminated-by , --optionally-enclosed-by '\"' --escaped-by \ ...
"Some string, with a comma.",1,2,3...
"Another \"string with quotes\"",4,5,6...
Explanation: All fields are terminated by comma and only fields contacting the comma in the data are enclosed by double-quotes. If there is any field with double-quotes in the data then those quotes will be escaped by a backslash() as in the second line and even this column will also be enclosed as in the second line.
For your scenario:
Input: Suppose if the data in the source table is like below with the respective columns. For representation, I used pipe(|) as the delimiter.
"foo|bar"|1|2
foo"|3|4|"bar
Possible Output: sqoop import --fields-terminated-by , --enclosed-by '\"' --escaped-by \ ...
"\"foo","bar\"","1","2"
"foo\"","3","4","\"bar"
Possible Output: sqoop import --fields-terminated-by , --optionally-enclosed-by '\"' --escaped-by \ ...
"\"foo","bar\"",1,2
"foo\"",3,4,"\"bar"
Related
How can I escape double quotes inside a double string in Bash?
For example, in my shell script
#!/bin/bash
dbload="load data local infile \"'gfpoint.csv'\" into table $dbtable FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY \"'\n'\" IGNORE 1 LINES"
I can't get the ENCLOSED BY '\"' with double quote to escape correctly. I can't use single quotes for my variable, because I want to use variable $dbtable.
Use a backslash:
echo "\"" # Prints one " character.
A simple example of escaping quotes in the shell:
$ echo 'abc'\''abc'
abc'abc
$ echo "abc"\""abc"
abc"abc
It's done by finishing an already-opened one ('), placing the escaped one (\'), and then opening another one (').
Alternatively:
$ echo 'abc'"'"'abc'
abc'abc
$ echo "abc"'"'"abc"
abc"abc
It's done by finishing already opened one ('), placing a quote in another quote ("'"), and then opening another one (').
More examples: Escaping single-quotes within single-quoted strings
Keep in mind that you can avoid escaping by using ASCII codes of the characters you need to echo.
Example:
echo -e "This is \x22\x27\x22\x27\x22text\x22\x27\x22\x27\x22"
This is "'"'"text"'"'"
\x22 is the ASCII code (in hex) for double quotes and \x27 for single quotes. Similarly you can echo any character.
I suppose if we try to echo the above string with backslashes, we will need a messy two rows backslashed echo... :)
For variable assignment this is the equivalent:
a=$'This is \x22text\x22'
echo "$a"
# Output:
This is "text"
If the variable is already set by another program, you can still apply double/single quotes with sed or similar tools.
Example:
b="Just another text here"
echo "$b"
Just another text here
sed 's/text/"'\0'"/' <<<"$b" #\0 is a special sed operator
Just another "0" here #this is not what i wanted to be
sed 's/text/\x22\x27\0\x27\x22/' <<<"$b"
Just another "'text'" here #now we are talking. You would normally need a dozen of backslashes to achieve the same result in the normal way.
Bash allows you to place strings adjacently, and they'll just end up being glued together.
So this:
echo "Hello"', world!'
produces
Hello, world!
The trick is to alternate between single and double-quoted strings as required. Unfortunately, it quickly gets very messy. For example:
echo "I like to use" '"double quotes"' "sometimes"
produces
I like to use "double quotes" sometimes
In your example, I would do it something like this:
dbtable=example
dbload='load data local infile "'"'gfpoint.csv'"'" into '"table $dbtable FIELDS TERMINATED BY ',' ENCLOSED BY '"'"'"' LINES "'TERMINATED BY "'"'\n'"'" IGNORE 1 LINES'
echo $dbload
which produces the following output:
load data local infile "'gfpoint.csv'" into table example FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY "'\n'" IGNORE 1 LINES
It's difficult to see what's going on here, but I can annotate it using Unicode quotes. The following won't work in Bash – it's just for illustration:
dbload=‘load data local infile "’“'gfpoint.csv'”‘" into ’“table $dbtable FIELDS TERMINATED BY ',' ENCLOSED BY '”‘"’“' LINES ”‘TERMINATED BY "’“'\n'”‘" IGNORE 1 LINES’
The quotes like “ ‘ ’ ” in the above will be interpreted by bash. The quotes like " ' will end up in the resulting variable.
If I give the same treatment to the earlier example, it looks like this:
echo “I like to use” ‘"double quotes"’ “sometimes”
Store the double quote character in a variable:
dqt='"'
echo "Double quotes ${dqt}X${dqt} inside a double quoted string"
Output:
Double quotes "X" inside a double quoted string
Check out printf...
#!/bin/bash
mystr="say \"hi\""
Without using printf
echo -e $mystr
Output: say "hi"
Using printf
echo -e $(printf '%q' $mystr)
Output: say \"hi\"
Make use of $"string".
In this example, it would be,
dbload=$"load data local infile \"'gfpoint.csv'\" into table $dbtable FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY \"'\n'\" IGNORE 1 LINES"
Note (from the man page):
A double-quoted string preceded by a dollar sign ($"string") will cause the string to be translated according to the current locale. If the current locale is C or POSIX, the dollar sign is ignored. If the string is translated and replaced, the replacement is double-quoted.
For use with variables that might contain spaces in you Bash script, use triple quotes inside the main quote, e.g.:
[ "$(date -r """$touchfile""" +%Y%m%d)" -eq "$(date +%Y%m%d)" ]
Add "\" before double quote to escape it, instead of \
#! /bin/csh -f
set dbtable = balabala
set dbload = "load data local infile "\""'gfpoint.csv'"\"" into table $dbtable FIELDS TERMINATED BY ',' ENCLOSED BY '"\""' LINES TERMINATED BY "\""'\n'"\"" IGNORE 1 LINES"
echo $dbload
# load data local infile "'gfpoint.csv'" into table balabala FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY "''" IGNORE 1 LINES
Let's say I have a file test.txt with contents:
+-foo.bar:2.4
| bar.foo:1.1:test
\| hello.goobye:3.3.3
\|+- baz.yeah:4
I want to use the tr command to delete all instances of the following set of characters:
{' ', '+', '-', '|', '\'}
Done some pretty extensive research on this but found no clear/concise answers.
This is the command that works:
input:
cat test.txt | tr -d "[:blank:]|\\\+-"
output:
foo.bar:2.4
bar.foo:1.1:test
hello.goobye:3.3.3
baz.yeah:4
I experimented with many combinations of that set and I found out that the '-' was being treated as a range indicator (like... [a-z]) and therefore must be put at the end. But I have two main questions:
1) Why must the backslash be double escaped in order to be included in the set?
2) Why does putting the '|' at the end of the set string cause the tr program to delete everything in the file except for trailing new line characters?
Like this:
tr -d '\-|\\+[:blank:] ' < file
You have to escape the - because it is used for denoting ranges of characters like:
tr -d '1-5'
and must therefore being escaped if you mean a literal hyphen. You can also put it at the end. (learned that, thanks! :) )
Furthermore the \ must be escaped when you mean a literal \ because it has a special meaning needed for escape sequences.
The remaining characters must not being escaped.
Why must the \ being doubly escaped in your example?
It's because you are using a "" (double quoted) string to quote the char set. A double quoted string will be interpreted by the shell, a \\ in a double quoted string means a literal \. Try:
echo "\+"
echo "\\+"
echo "\\\+"
To avoid to doubly escape the \ you can just use single quotes as in my example above.
Why does putting the '|' at the end of the set string cause the tr program to delete everything in the file except for trailing new line characters?
Following CharlesDuffy's comment having the | at the end means also that you had the unescaped - not at the end, which means it was describing a range of characters where the actual range depends on the position you had it in the set.
another approach is to define the allowed chars
$ tr -cd '[:alnum:]:.\n' <file
foo.bar:2.4
bar.foo:1.1:test
hello.goobye:3.3.3
baz.yeah:4
or, perhaps delete all the prefix non-word chars
$ sed -E 's/\W+//' file
Since I have special char in one of the fields, I wanted to use lower value as delimiter. Hive works fine with the delimiter(\0) but sqoop fails with NoSuchElement Exception. Looks like it is not detecting the delimiter as \0.
This is how my hive an sqoop script looks like. Any help please.
CREATE TABLE SCHEMA.test
(
name CHAR(20),
id int,
dte_report date
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\0'
LOCATION '/user/$USER/test';
sqoop-export \
-Dmapred.job.name="TEST" \
-Dorg.apache.sqoop.export.text.dump_data_on_error=true \
--options-file ${OPTION_FILE_LOCATION}\conn_mysql \
--export-dir /user/$USER/test \
--input-fields-terminated-by '\0' \
--input-lines-terminated-by '\n' \
--input-null-string '\\N' \
--input-null-non-string '\\N' \
--table MYSQL_TEST \
--validate \
--outdir /export/home/$USER/javalib
In VI editor, the delimiter looks like '^#' and with od -c the delimiter is \0
Set the character set to UTF 8 in the my sql conn string that can resolve this issue.
mysql.url=jdbc:mysql://localhost:3306/nbs?useJvmCharsetConverters=false&useDynamicCharsetInfo=false&useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8&useEncoding=true
You should use \000 as delimiter , it will generate that character as a delimiter.
The Null values are displayed as '\N' when a hive external table is queried.
Below is the sqoop import script:
sqoop import -libjars /usr/lib/sqoop/lib/tdgssconfig.jar,/usr/lib/sqoop/lib/terajdbc4.jar -Dmapred.job.queue.name=xxxxxx \
--connect jdbc:teradata://xxx.xx.xxx.xx/DATABASE=$db,LOGMECH=LDAP --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username $user --password $pwd --query "
select col1,col2,col3 from $db.xxx
where \$CONDITIONS" \
--null-string '\N' --null-non-string '\N' \
--fields-terminated-by '\t' --num-mappers 6 \
--split-by job_number \
--delete-target-dir \
--target-dir $hdfs_loc
Please advise what change should be done to the script so that nulls are displayed as nulls when the external hive table is queried.
Sathiyan- Below are my findings after many trials
If (null string) property is not included during sqoop import, then NULLs are stored as [blank for integer columns] and [blank for string columns] in HDFS.
2.If the HIVE table on top of HDFS is queried, we would see [NULL for integer column] and [blank for String columns]
If the (--null-string '\N') property is included during sqoop import, then NULLs are stored as ['\N' for both integer and string columns].
If the HIVE table on top of HDFS is queried, we would see [NULL for both integer and string columns not '\N']
In your sqoop script you mentioned --null-string '\N' --null-non-string '\N which means,
--null-string '\N' = The string to be written for a null value for string columns
--null-non-string '\N' = The string to be written for a null value for non-string columns
If any value is NULL in the table and we want to sqoop that table ,then sqoop will import NULL value as string null in HDFS. So, that will create problem to use Null condition in our query using hive
For example: – Lets insert NULL value to mysql table “cities”.
mysql> insert into cities values(6,7,NULL);
By default, Sqoop will import NULL value as string null in HDFS.
Lets sqoop and see what happens:–
sqoop import –connect jdbc:mysql://localhost:3306/sqoop –username sqoop -P –table cities –hive-import –hive-overwrite –hive-table vikas.cities -m 1
http://deltafrog.com/how-to-handle-null-value-during-sqoop-import-export/
In The sqoop import command remove the --null-string and --null-non-string '\N' option.
by default system will assign null for both strings and non string values.
I have tried --null-string '\N' and --null-string '' and other options but getting blank and different issues.
How can I escape double quotes inside a double string in Bash?
For example, in my shell script
#!/bin/bash
dbload="load data local infile \"'gfpoint.csv'\" into table $dbtable FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY \"'\n'\" IGNORE 1 LINES"
I can't get the ENCLOSED BY '\"' with double quote to escape correctly. I can't use single quotes for my variable, because I want to use variable $dbtable.
Use a backslash:
echo "\"" # Prints one " character.
A simple example of escaping quotes in the shell:
$ echo 'abc'\''abc'
abc'abc
$ echo "abc"\""abc"
abc"abc
It's done by finishing an already-opened one ('), placing the escaped one (\'), and then opening another one (').
Alternatively:
$ echo 'abc'"'"'abc'
abc'abc
$ echo "abc"'"'"abc"
abc"abc
It's done by finishing already opened one ('), placing a quote in another quote ("'"), and then opening another one (').
More examples: Escaping single-quotes within single-quoted strings
Keep in mind that you can avoid escaping by using ASCII codes of the characters you need to echo.
Example:
echo -e "This is \x22\x27\x22\x27\x22text\x22\x27\x22\x27\x22"
This is "'"'"text"'"'"
\x22 is the ASCII code (in hex) for double quotes and \x27 for single quotes. Similarly you can echo any character.
I suppose if we try to echo the above string with backslashes, we will need a messy two rows backslashed echo... :)
For variable assignment this is the equivalent:
a=$'This is \x22text\x22'
echo "$a"
# Output:
This is "text"
If the variable is already set by another program, you can still apply double/single quotes with sed or similar tools.
Example:
b="Just another text here"
echo "$b"
Just another text here
sed 's/text/"'\0'"/' <<<"$b" #\0 is a special sed operator
Just another "0" here #this is not what i wanted to be
sed 's/text/\x22\x27\0\x27\x22/' <<<"$b"
Just another "'text'" here #now we are talking. You would normally need a dozen of backslashes to achieve the same result in the normal way.
Bash allows you to place strings adjacently, and they'll just end up being glued together.
So this:
echo "Hello"', world!'
produces
Hello, world!
The trick is to alternate between single and double-quoted strings as required. Unfortunately, it quickly gets very messy. For example:
echo "I like to use" '"double quotes"' "sometimes"
produces
I like to use "double quotes" sometimes
In your example, I would do it something like this:
dbtable=example
dbload='load data local infile "'"'gfpoint.csv'"'" into '"table $dbtable FIELDS TERMINATED BY ',' ENCLOSED BY '"'"'"' LINES "'TERMINATED BY "'"'\n'"'" IGNORE 1 LINES'
echo $dbload
which produces the following output:
load data local infile "'gfpoint.csv'" into table example FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY "'\n'" IGNORE 1 LINES
It's difficult to see what's going on here, but I can annotate it using Unicode quotes. The following won't work in Bash – it's just for illustration:
dbload=‘load data local infile "’“'gfpoint.csv'”‘" into ’“table $dbtable FIELDS TERMINATED BY ',' ENCLOSED BY '”‘"’“' LINES ”‘TERMINATED BY "’“'\n'”‘" IGNORE 1 LINES’
The quotes like “ ‘ ’ ” in the above will be interpreted by bash. The quotes like " ' will end up in the resulting variable.
If I give the same treatment to the earlier example, it looks like this:
echo “I like to use” ‘"double quotes"’ “sometimes”
Store the double quote character in a variable:
dqt='"'
echo "Double quotes ${dqt}X${dqt} inside a double quoted string"
Output:
Double quotes "X" inside a double quoted string
Check out printf...
#!/bin/bash
mystr="say \"hi\""
Without using printf
echo -e $mystr
Output: say "hi"
Using printf
echo -e $(printf '%q' $mystr)
Output: say \"hi\"
Make use of $"string".
In this example, it would be,
dbload=$"load data local infile \"'gfpoint.csv'\" into table $dbtable FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY \"'\n'\" IGNORE 1 LINES"
Note (from the man page):
A double-quoted string preceded by a dollar sign ($"string") will cause the string to be translated according to the current locale. If the current locale is C or POSIX, the dollar sign is ignored. If the string is translated and replaced, the replacement is double-quoted.
For use with variables that might contain spaces in you Bash script, use triple quotes inside the main quote, e.g.:
[ "$(date -r """$touchfile""" +%Y%m%d)" -eq "$(date +%Y%m%d)" ]
Add "\" before double quote to escape it, instead of \
#! /bin/csh -f
set dbtable = balabala
set dbload = "load data local infile "\""'gfpoint.csv'"\"" into table $dbtable FIELDS TERMINATED BY ',' ENCLOSED BY '"\""' LINES TERMINATED BY "\""'\n'"\"" IGNORE 1 LINES"
echo $dbload
# load data local infile "'gfpoint.csv'" into table balabala FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY "''" IGNORE 1 LINES