We know that to replace a character or word we can use the REPLACE function, like below:
RELATION = FOREACH data GENERATE REPLACE(string,'a','b');
The above statement replaces every 'a' with 'b'.
But what if I want to replace the dollar sign ($)? In Pig, '$' refers to a column by position. For example, I want to take a string like '$1234.56' and get '1234.56' as output. I tried:
RELATION = FOREACH data GENERATE REPLACE(string,'$','');
But this does not work for me.
Can anyone please help? Thanks in advance.
Using Unicode:
REPLACE(string,'\u0024','')
REPLACE treats its second argument as a Java regular expression, so it can be helpful to look at how regexes work in Java, for instance: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
In your particular case, you can use the following:
REPLACE(string, '[$]', '')
For increased flexibility (when dealing with other currency types, for instance), it might be a good idea to remove all non-numeric characters except '.'. In that case, use:
REPLACE(string, '[^\\d.]', '')
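To see why a bare '$' pattern has no effect while '[$]' does, the same patterns can be tried in any regex engine. Below is a minimal sketch in Ruby rather than Pig (an assumption made only so the snippet runs standalone; for these particular patterns and inputs Ruby behaves the same way as Java's regex engine). The sample values are illustrative.

```ruby
# Illustration only: Ruby stands in for the Java regex engine behind Pig's REPLACE.
price = '$1234.56'

# A bare $ is the end-of-line anchor, so it matches an empty position, not the character.
unchanged = price.gsub(/$/, '')

# Inside a character class, $ is a literal dollar sign.
stripped = price.gsub(/[$]/, '')

# Negated class: drop everything that is neither a digit nor a dot.
digits_only = 'EUR 1,234.56'.gsub(/[^\d.]/, '')

puts unchanged    # $1234.56
puts stripped     # 1234.56
puts digits_only  # 1234.56
```

Note that the negated class also removes thousands separators, which may or may not be what you want for your data.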
This worked for me (triple backslashes):
REPLACE(string,'\\\$','')
I am facing a problem with my data: a column contains characters other than alphanumerics. For example, in the Name column: Ravicᅩhandr¬an (¬ᅩ○`), and there are many characters like these. I need a result like Ravichandran. How can I achieve this? Is there any way to remove them in the Transformer stage?
I tried the Convert function in the Transformer stage, but the problem with Convert is that I do not know in advance which unknown characters will appear; the ones shown above are just an example.
My requirement is that everything other than alphanumeric characters must be removed, and the rest of the string should stay the same.
How can I get this done?
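The following Convert function can be used in the Transformer stage to remove any kind of unknown/special characters from the column. The inner Convert deletes every allowed character (letters, digits, space) from the value, leaving only the unwanted characters; the outer Convert then deletes those characters from the original value.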
Convert(Convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 ','', Column_Name1),'',Column_Name1)
Ex: Convert(Convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 ','', to_txm.SourceCode),'',to_txm.SourceCode)
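The nested-Convert trick can be easier to follow outside DataStage. Here is a minimal Ruby sketch of the same two-pass idea; `convert_remove` is a hypothetical helper that only imitates Convert with an empty replacement list, and it is not part of any DataStage API.

```ruby
# Hypothetical stand-in for Convert(from_list, '', expr):
# every character listed in from_list is removed from expr.
def convert_remove(from_list, expr)
  expr.each_char.reject { |c| from_list.include?(c) }.join
end

# Characters we want to keep: letters, digits and the space.
ALLOWED = (('A'..'Z').to_a + ('a'..'z').to_a + ('0'..'9').to_a + [' ']).join

def keep_alphanumeric(value)
  # Inner pass: strip the allowed characters, so only the unwanted ones remain.
  unwanted = convert_remove(ALLOWED, value)
  # Outer pass: strip exactly those unwanted characters from the original value.
  convert_remove(unwanted, value)
end

puts keep_alphanumeric('Ravic#handr~an')  # Ravichandran
```

The point of the two passes is that the list of unwanted characters never has to be known in advance; it is derived from the value itself.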
I would like to extract a line of text but am having difficulty finding the correct regex. Any help would be appreciated.
String to extract: KSEA 122053Z 21008KT 10SM FEW020 SCT250 17/08 A3044 RMK AO2 SLP313 T01720083 50005
For some reason StackOverflow won't let me cut and paste the XML data here since it includes "<>" characters. Basically, I am trying to extract the data between "raw_text" ... "/raw_text" from an XML document that will always be formatted like the following: http://www.aviationweather.gov/adds/dataserver_current/httpparam?dataSource=metars&requestType=retrieve&format=xml&hoursBeforeNow=3&mostRecent=true&stationString=PHNL%20KSEA
However, the station name, in this case "KSEA", will not always be the same. It will change based on user input into a search variable.
Thanks in advance
If I can assume that every string you want starts with KSEA, then the answer would be:
.*(KSEA.*?)KSEA.*
Using ? makes .* lazy, so it matches as little as possible.
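If the goal is the text between the raw_text tags regardless of station, a lazy match anchored on the tags avoids hard-coding KSEA. A minimal Ruby sketch follows; the XML fragment is a hand-written stand-in for the feed (its layout and the second report are assumptions made for illustration), and for real XML a proper parser is generally more robust than a regex.

```ruby
# Hypothetical fragment shaped like the feed described in the question;
# it is not copied from the real service.
xml = <<~XML
  <METAR>
    <raw_text>KSEA 122053Z 21008KT 10SM FEW020 SCT250 17/08 A3044 RMK AO2 SLP313 T01720083 50005</raw_text>
    <station_id>KSEA</station_id>
  </METAR>
  <METAR>
    <raw_text>PHNL 122053Z 06012KT 10SM FEW030 27/18 A3002</raw_text>
    <station_id>PHNL</station_id>
  </METAR>
XML

# (.*?) is lazy, so each match stops at the nearest closing tag.
raw_reports = xml.scan(%r{<raw_text>(.*?)</raw_text>}m).flatten

# Pick the report for whatever station the user searched for.
station = 'KSEA'
report = raw_reports.find { |text| text.start_with?(station) }
puts report
```

Because the station only appears in the final lookup, the same pattern works for any user-supplied station code.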
I have to work with a given XPath/XQuery processor and I cannot use the replace() or matches() functions, because they are not supported.
But I need their functionality.
What would be good alternatives?
I am trying to do something like this:
replace "-" symbols in a string with "", meaning I have to erase the minus symbols
e.g. turn
"--ssam----ple----string"
into
"samplestring"
and later I need to
look for a certain string pattern in the resulting string, e.g.
matches("samplestring", [a-z]*st[a-z]*)
but since I cannot use replace or matches, I don't know how to do this.
Thanks
In your particular cases, consider fn:translate():
translate('--ssam----ple----string', '-', '')
and fn:contains():
contains('samplestring', 'st')
This is one more solution (provided that your processor supports codepoint functions):
contains(
codepoints-to-string(
string-to-codepoints("--ssam----ple----string")[. ne 45]
)
, "st")
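For readers who want to check what these XPath expressions evaluate to without an XPath processor at hand, here is a rough Ruby counterpart (an illustration only; the method names are Ruby's, not XPath's). Note that the sample input as written contains "ssam", so the actual result is "ssamplestring".

```ruby
input = '--ssam----ple----string'

# Counterpart of translate(input, '-', ''): delete every '-' character.
translated = input.delete('-')

# Counterpart of the codepoint version: drop codepoint 45 ('-') and rebuild the string.
rebuilt = input.codepoints.reject { |cp| cp == 45 }.pack('U*')

puts translated                 # ssamplestring
puts translated == rebuilt      # true
puts translated.include?('st')  # true
```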
Let's say I have a string:
asd;;%$##!G'{}[]
Now I want to escape special symbols:
;&|><*?`$(){}[]!#
So, the output will be something like:
asd\;\;%\$#\#\!G\'\{\}\[\]
How can I achieve this using gsub/sub in Ruby?
test_value = "asd;;%$##!G'{}[]"
SPEC_REGEXP = /((;)|(\&)|(\|)|(>)|(<)|(\*)|(\?)|(`)|(\$)|(\()|(\))|({)|(})|(\[)|(\])|(!)|(#))/
test_value.gsub!(SPEC_REGEXP,'\\\\\1')
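Counting backslashes in the replacement string is error-prone, so one alternative is worth sketching: build the pattern with Regexp.union (which escapes each character for you) and use the block form of gsub. This is a minimal sketch under the assumption that the special-character list is exactly the one from the question; the constant and method names are made up for illustration.

```ruby
# The special characters from the question, as a plain string (no regex escaping needed here).
SPECIAL_CHARS = ';&|><*?`$(){}[]!#'

# Regexp.union escapes each one-character string and joins them with |.
SPECIAL_PATTERN = Regexp.union(SPECIAL_CHARS.chars)

def escape_special(text)
  # Block form: the match is passed in as m, so no \0 or \1 back-references are needed.
  text.gsub(SPECIAL_PATTERN) { |m| "\\#{m}" }
end

puts escape_special(%q(asd;;%$##!G'{}[]))  # asd\;\;%\$\#\#\!G'\{\}\[\]
```

The % and ' characters are left alone because they are not in the list; add them to SPECIAL_CHARS if they should be escaped too.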
Here's pretty much the same idea as in soundar's solution (but using character classes and no capturing):
"asd;;%$##!G'{}[]".gsub(/[;&|><*?`$(){}\[\]!#]/, '\\\\\\0')
I'm trying to get the first word in this string: Basic (11/17/2011 - 12/17/2011)
So ultimately I want to get Basic out of that.
Other example string: Premium (11/22/2011 - 12/22/2011)
The format is always "Single-word followed by parenthesized date range" and I just want the single word.
Use this:
str = "Premium (11/22/2011 - 12/22/2011)"
str.split.first # => "Premium"
split uses ' ' (whitespace) as the default separator if you don't specify one.
After that, get the first element with first.
You don't need a regexp for that; you can just use
str.split(' ')[0]
I know you found the answer you needed, but in case anyone stumbles on this in the future, here is how to pull the needed value out of a large string of unknown length:
word_you_need = s.slice(/(\b[a-zA-Z]*\b \(\d+\/\d+\/\d+ - \d+\/\d+\/\d+\))/).split[0]
This regular expression will match the first word without the trailing space:
"^\w+ ??"
If you really want a regex you can get the first group after using this regex:
(\w*) .*
"Single-word followed by parenthesized date range"
'word' and 'parenthesized date range' should be better defined,
since, by your requirement statement, they should be anchors and/or delimiters.
These raw regexes are just a general guess.
\w+(?=\s*\([^)]*\))
or
\w+(?=\s*\(\s*\d+(?:/\d+)*\s*-\s*\d+(?:/\d+)*\s*\))
Actually, all you need is:
s.split[0]
...or...
s.split.first
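To compare the approaches from this thread side by side, here is a small runnable sketch using the two sample strings from the question. The variable names are illustrative, and the `\A\w+` variant is an extra anchored pattern added here for comparison, not one taken from the answers.

```ruby
samples = [
  'Basic (11/17/2011 - 12/17/2011)',
  'Premium (11/22/2011 - 12/22/2011)'
]

# Whitespace split: the first token is the plan name.
by_split = samples.map { |s| s.split.first }

# Regex alternative: leading run of word characters, anchored at the start of the string.
by_regex = samples.map { |s| s[/\A\w+/] }

# Stricter variant: a word only counts if a parenthesized group follows it.
by_lookahead = samples.map { |s| s[/\w+(?=\s*\([^)]*\))/] }

p by_split      # ["Basic", "Premium"]
p by_regex      # ["Basic", "Premium"]
p by_lookahead  # ["Basic", "Premium"]
```

All three return nil for an empty string, so guard for that if the input can be blank.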