Escaping parentheses in Pig declare statement - hadoop

PIG VERSION: 0.12.0-cdh5.10.1
I am fairly new to Pig. I learned that there are several ways to define parameters in Pig, one of which is the 'declare' statement. I would like to know whether characters such as "(" and ")" (parentheses) can be used in a parameter value. I am trying to store a few lookup values (they vary between feeds) in declare statements, and some of them contain "(" or ")", which causes an error. I also tried escaping these characters with "\" and "\\", but that does not seem to work.
For example, after running the statement below in Pig:
%declare DESC 'Joe\\(s URL'
I get an error when I try to read the value back with this command:
sh echo $DESC
ERROR:
2018-02-25 10:11:55,692 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 8, column 13. Encountered: "(" (40), after : ""
However, the same escaping approach works fine for characters like "%" and "=", which are mentioned on the page below:
https://wiki.apache.org/pig/ParameterSubstitution
Is there any way to escape characters like "(" and ")" in the declare statement? I noticed the same problem with " ' " as well.

It seems as though parentheses don't require escaping in Pig declare statements. See this toy example:
%declare DESC 'Joe(s URL'
A = LOAD ...
B = LIMIT A 2;
C = FOREACH B GENERATE '$DESC' AS var;
DUMP C;
(Joe(s URL)
(Joe(s URL)
I was also able to pass parameters with parentheses to Pig through the command line, e.g.:
pig -f temp.pig -p DESC='Joe(s URL'

Related

Double quotes in csv-table cell

I am struggling to add a cell containing double quotes to a csv-table.
.. csv-table::
:header: f,d
2,"ts*"
The above works fine.
But if I try to get the cell as ts"*" instead of ts*, it throws an error:
Error with CSV data in "csv-table" directive: ',' expected after '"'
I tried using escape characters (like \) but that didn't work.
I was trying it in an online editor.
I think I found the solution: there is an :escape: option that specifies the escape character, here '.
.. csv-table::
:escape: '
:header: f,d
2,"ts'"*'""
It is now showing the cell as ts"*".

ODI: KM Java BeanShell - escape double quotes

I want to set a variable inside a Knowledge Module's Task, with the target technology set to Java BeanShell. The value represents mapping EXPRESSIONs whose source table is in an MSSQL database. Column names are surrounded by double quotes, which causes a problem with templating.
Column expression is:
source_tab."Entry Number"
Task (Java BeanShell)
<$
String SEL_COLS = "<%=odiRef.getColList(0, "", "[EXPRESSION]\t[ALIAS_SEP] [CX_COL_NAME]", ",\n\t", "", "")%>";
$>
This variable assignment fails because the " in source_tab."Entry Number" is not escaped, so the code does not compile.
odiRef.getQuotedString does not solve the problem...
odiRef.getQuotedString could help if the generated code were executed as the final code in the JBS technology. When we use it in the following way (in a ?-, $- or #-substitution):
<$
String SEL_COLS = <%=odiRef.getQuotedString(odiRef.getColList(0, "", "[EXPRESSION]\t[ALIAS_SEP] [CX_COL_NAME]", ",\n\t", "", ""))%>;
$>
then the result fails like this:
... Caused by: org.apache.bsf.BSFException: BeanShell script error:
Parse error at line 3, column 37. Encountered: Entry BSF info: ....
... 11 more
Text: <$
String SEL_COLS = "SOURCE_TAB.\"Entry Number\" ENTRY_NUMBER";
$>.
This looks good but does not work. It would work as final code (that is, the result of all substitutions) in the JBS technology. Unfortunately, every substitution pass eats the backslashes.
OK, if the standard odiRef function does not work, let's write our own:
<%
String getQuotedStringCustomized(String s){
return '"'+s.replaceAll('"'.toString(),'"'+"+'"+'"'+"'+"+'"')+'"';
}
%>
-- other code........
<$
String SEL_COLS = <%=getQuotedStringCustomized(odiRef.getColList(0, "", "[EXPRESSION]\t[ALIAS_SEP] [CX_COL_NAME]", ",\n\t", "", ""))%>;
$>
The only way to put " into a Java literal within a JBS substitution is to concatenate the char literal '"', or to use the '"'.toString() expression where a char cannot be used.
FINALLY:
In the final JBS code you may use \", but within substitutions only +'"'+.

Regular expression to match special characters within double quotes

My input string is :
"& is here "& is here also, & has again occured""
Using Ruby's gsub method, is there a way to replace the character '&' with '$' wherever it occurs within double quotes? If gsub cannot solve this, is there another approach that would?
Since the first argument to gsub can be a regex, and whatever it matches is replaced by the second argument, finding the right regex might be enough to solve this.
The expected output is:
& is here "$ is here also , $ has again occured"
str = %q{& is here "& is here also , & has again occured"}
str.gsub!(/".*?"/) do |substr|
substr.gsub(/&/, '$')
end
puts str
# => & is here "$ is here also , $ has again occured"
EDIT: Just noticed that stribizhev proposed this way before I wrote it.
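A small variation on the same idea, sketched here on made-up input with two quoted segments, shows that only the '&' characters inside a quoted pair change. The method name dollarize_quoted and the sample string are assumptions for this illustration, and the pattern "[^"]* is swapped in for the non-greedy .*? above; both stop at the next closing quote.

```ruby
# Hypothetical helper: returns a copy of text with '&' turned into '$'
# inside each double-quoted segment. Text outside quotes is left alone.
def dollarize_quoted(text)
  text.gsub(/"[^"]*"/) { |quoted| quoted.tr('&', '$') }
end

sample = %q{a & b "c & d" e & f "g & h"}
puts dollarize_quoted(sample)
# => a & b "c $ d" e & f "g $ h"
```

Unlike gsub!, this version does not modify the original string, which can be handy when the input is reused elsewhere.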

Expecting QUOTED STRING in pig script

I have written a script to select from vsql:
LOAD 'sql://{select * from sandesh.insights_voice_day
WHERE Observation_date BETWEEN '2011-11-22' AND '2011-11-23' AND
Type='total'
ORDER BY Observation_date}'
It throws the exception 'Expecting QUOTEDSTRING'. What is the problem?
Pig expects LOAD to be followed by a quoted string naming the file you are loading. Pig is not SQL, so you have to first dump your query result into a file and then do something like:
A = LOAD 'your_file' AS (column1:datatype, column2:datatype);
B = FILTER A BY observation_date >= '2011-11-22' AND observation_date <= '2011-11-23' AND
Type == 'total';
C = ORDER B BY observation_date;
DUMP C;
Note that this compares and orders the dates as strings. Depending on the version of Pig you're using, you'll need to handle timestamps with the appropriate function. Something like:
http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/evaluation/datetime/convert/CustomFormatToISO.html
The problem seems to be the repeated use of single quotes. The following, written on a single line, seems to compile (pig -c test.pig):
A = LOAD 'sql://{select * from sandesh.insights_voice_day WHERE Observation_date BETWEEN "2011-11-22" AND "2011-11-23" AND Type="total" ORDER BY Observation_date}';

Escaping Strings For Ruby SQLite Insert

I'm creating a Ruby script to import a tab-delimited text file of about 150k lines into SQLite. Here it is so far:
require 'sqlite3'
file = File.new("/Users/michael/catalog.txt")
string = []
# Escape single quotes, remove newline, split on tabs,
# wrap each item in quotes, and join with commas
def prepare_for_insert(s)
s.gsub(/'/,"\\\\'").chomp.split(/\t/).map {|str| "'#{str}'"}.join(", ")
end
file.each_line do |line|
string << prepare_for_insert(line)
end
database = SQLite3::Database.new("/Users/michael/catalog.db")
# Insert each string into the database
string.each do |str|
database.execute( "INSERT INTO CATALOG VALUES (#{str})")
end
The script errors out on the first line containing a single quote in spite of the gsub to escape single quotes in my prepare_for_insert method:
/Users/michael/.rvm/gems/ruby-1.9.3-p0/gems/sqlite3-1.3.5/lib/sqlite3/database.rb:91:
in `initialize': near "s": syntax error (SQLite3::SQLException)
It's erroring out on line 15. If I inspect that line with puts string[14], I can see where it's showing the error near "s". It looks like this: 'Touch the Top of the World: A Blind Man\'s Journey to Climb Farther Than the Eye Can See'
Looks like the single quote is escaped, so why am I still getting the error?
Don't do it like that at all; string interpolation and SQL tend to be a bad combination. Use a prepared statement instead and let the driver deal with quoting and escaping:
# Ditch the gsub in prepare_for_insert and...
db = SQLite3::Database.new('/Users/michael/catalog.db')
ins = db.prepare('insert into catalog (column_name) values (?)')
string.each { |s| ins.execute(s) }
You should replace column_name with the real column name, of course. You don't have to specify the column names in an INSERT, but you should always do it anyway. If you need to insert more columns, add more placeholders and pass more arguments to ins.execute.
Using prepare and execute should be faster, safer, easier, and it won't make you feel like you're writing PHP in 1999.
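As for why the backslash-escaped line still failed: SQLite string literals do not treat a backslash as an escape character. A single quote inside a literal is written as two single quotes. The sketch below only illustrates that rule. The helper name sqlite_literal is made up for this example and is not part of the sqlite3 gem, and the prepared statement above remains the better route.

```ruby
# Hypothetical helper, for illustration only (not part of the sqlite3 gem):
# wraps a value in single quotes and doubles any embedded single quote,
# which is how SQLite expects quotes inside a string literal to be written.
def sqlite_literal(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

title = "A Blind Man's Journey"

# Doubled-quote form, which SQLite accepts as one string literal.
puts sqlite_literal(title)

# Backslash form produced by the gsub in the question, shown for comparison.
puts "'" + title.gsub(/'/, "\\\\'") + "'"
```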
Also, you should use the standard CSV parser to parse your tab-separated files. XSV formats aren't much fun to deal with (they're downright evil, in fact), and you have better things to do with your time than deal with their nonsense and edge cases.
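A minimal sketch of the CSV suggestion, assuming Ruby's csv library with col_sep set to a tab. The sample row and its three columns are invented for illustration, since the real layout of catalog.txt is not shown in the question.

```ruby
require 'csv'

# Made-up sample row: title, format and price separated by tabs.
line = "A Blind Man's Journey\tPaperback\t9.99\n"

# col_sep switches the parser from commas to tabs; the trailing newline
# is treated as the row terminator and is not part of the last field.
fields = CSV.parse_line(line, col_sep: "\t")

# fields is a plain Array of Strings, one per column, with the
# single quote left untouched.
p fields
```

Each parsed row could then be passed to the prepared statement as bind values, one per placeholder. If the file contains bare double quotes inside fields, the default quote handling may reject those rows, and an option such as liberal_parsing: true may be needed.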
