Alternatives for replace() and matches() - xpath

I have to work with a given XPath/XQuery processor and cannot use the replace() or matches() functions, because they are not supported.
But I need their functionality.
What would be good alternatives?
I am trying to do something like this:
replace the "-" symbols in a string with "", i.e. erase the minus signs,
e.g. turn
"--ssam----ple----string"
into
"samplestring"
and later I need to
look for a certain string pattern in the resulting string, e.g.
matches("samplestring", "[a-z]*st[a-z]*")
But since I cannot use replace() or matches(), I don't know how to do this.
Thanks

In your particular cases, consider fn:translate():
translate('--ssam----ple----string', '-', '')
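and fn:contains(), which is enough here because the unanchored pattern [a-z]*st[a-z]* matches any string that contains "st":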
contains('samplestring', 'st')

Here is one more solution (provided that your processor supports the codepoint functions); 45 is the codepoint of "-":
contains(
codepoints-to-string(
string-to-codepoints("--ssam----ple----string")[. ne 45]
)
, "st")

Related

How to replace '$' from string in pig?

We know that to replace a word we can use the REPLACE function, like below:
RELATION = FOREACH data GENERATE REPLACE(string,'a','b');
The above statement replaces all 'a' letters with 'b'.
But what if I want to replace the dollar sign ($)? In Pig, '$' indicates a column position. For example, I want to remove '$' from a string like '$1234.56' and get the output '1234.56'.
RELATION = FOREACH data GENERATE REPLACE(string,'$','');
But this does not work for me.
Can anyone please help? Thanks in advance.
Using Unicode:
REPLACE(string,'\u0024','')
It can be helpful to look at how regular expressions work in Java, for instance: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
In your particular case, you can use the following:
REPLACE(string, '[$]', '')
For increased flexibility, (when dealing with other currency types for instance), it might be a good idea to remove all non-numeric characters, except '.'. In that case use:
REPLACE(string, '[^\\d.]', '')
This worked for me (triple backslashes):
REPLACE(string,'\\\$','')
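The reason the unescaped version removes nothing is most likely that the pattern is treated as a Java regular expression, where a bare $ is an end-of-input anchor rather than a literal character. The sketch below illustrates the same behaviour in Ruby, whose regex engine also treats $ as an anchor. It is only a stand-in for the Pig calls above, not Pig code, and the variable names are made up for illustration.

```ruby
# Ruby stand-in for the Pig examples above; Pig itself is not involved here.
price = '$1234.56'

# A bare "$" is an anchor (end of line), so no visible character is removed.
anchor_only = price.gsub(Regexp.new('$'), '')

# Inside a character class, "$" is a literal dollar sign.
char_class = price.gsub(/[$]/, '')

# Escaping it with a backslash also makes it literal.
escaped = price.gsub(/\$/, '')

# Keep only digits and dots, whatever the currency marker is.
numeric_only = 'USD 1234.56'.gsub(/[^\d.]/, '')

puts anchor_only   # $1234.56
puts char_class    # 1234.56
puts escaped       # 1234.56
puts numeric_only  # 1234.56
```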

Changing "word" to "Word" using a RegEx like [A-Z]([a-z]*)\b

The title sums up my conundrum pretty well. I've been searching around the net for a while, and being new to Ruby and Regular Expressions as a whole, I'm stuck trying to figure out how to alter the case of a single word string using a RegEx "filter" such as [A-Z]([a-z]*)\b.
Basically I want the flow to be
input: woRD
filter: [A-Z]([a-z]*)\b
output: Word
I already have the words filtered into a list, so I don't need to match words; I only need to filter the case of the word using a RegEx filter.
I do not want to use standard capitalization methods, I want this to be done using Regular Expressions.
You can use
"woRD".downcase.capitalize
Ruby provides predefined methods for this type of functionality. Try to use them instead of a regex; it saves coding time!
Well, for some reason you want to use regexps. Here you go:
# prepare hashes for gsub
to_down = (to_upper = Hash[('a'..'z').zip('A'..'Z')]).invert
# convert to downcase
downcased = 'woRD'.gsub(/[A-Z]/, to_down)
# ⇒ 'word'
titlecased = downcased.gsub(/^\w/, to_upper)
# ⇒ 'Word'
Hope it helps. Note the usage of String#gsub(re, hash) method.
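The two gsub passes above can also be folded into one. This is only a sketch of that idea: the merged lookup table, the pattern and the method name titlecase_with_regex are my own and not part of the answer above, and it assumes plain ASCII letters.

```ruby
# Lookup tables, built the same way as in the answer above.
to_upper = Hash[('a'..'z').zip('A'..'Z')]
to_down  = to_upper.invert

# One table for both directions: lowercase keys map up, uppercase keys map down.
swap_case = to_upper.merge(to_down)

# Hypothetical helper: title-cases a single ASCII word in one gsub pass.
def titlecase_with_regex(word, table)
  # \A[a-z]      a lowercase letter at the very start (needs raising)
  # (?!\A)[A-Z]  an uppercase letter anywhere except the start (needs lowering)
  targets = /\A[a-z]|(?!\A)[A-Z]/
  word.gsub(targets, table)
end

puts titlecase_with_regex('woRD', swap_case)  # Word
puts titlecase_with_regex('WORD', swap_case)  # Word
```

Letters that are already in the right case never match the pattern, so they are left untouched.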
You can't do this kind of case alteration with a regex alone.
Please read this topic carefully: How to change case of letters in string using regex in Ruby.
The best way to solve your problem is to use:
"woRD".downcase.capitalize
or
name_of_your_variable.capitalize!
if you want to alter the string in your variable in place without assigning it to another variable. capitalize! already lowercases the rest of the word, and chaining bang methods such as downcase!.capitalize! is unsafe because each of them returns nil when it changes nothing.

How to handle word boundary in FreeMarker when replacing string?

When using FreeMarker, I want to replace some words in the template, but the replace built-in does not handle word boundaries, so my output is messed up. Is it possible to handle word boundaries in FreeMarker? Thanks!
Edit:
The word boundary question is solved, but I have another question about backreferences.
I just found that I should pass the optional third parameter, the flag "r", to tell FreeMarker that I am using a regular expression. For my purpose, I use something like this:
block?replace("\\b${arg}\\b", "__${arg}", "r")
Note that we must write \\b for the word boundary, because the backslash has to be escaped inside the FreeMarker string literal.

Regex can this be achieved

Am I too ambitious, or is there a way to do this:
add a string if it is not present,
and
remove the same string if it is present?
I want to do all of this using a regex and avoid the if/else statement.
Here is an example.
I have the string
"admin,artist,location_manager,event_manager"
Can the substring location_manager be added or removed according to the above conditions?
Basically, I'm looking to avoid the if/else statement and do all of this purely in regex:
"admin,artist,location_manager,event_manager".test(/some_regex/)
The some_regex would remove location_manager from the string if it is present, or else add it.
Am I being overambitious?
You will need to use some sort of logic.
str += ',location_manager' unless str.gsub!(/location_manager,/,'')
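I'm assuming that if it's not present you append it to the end of the string. Note that the pattern expects a trailing comma, so it will not match location_manager when it is the last item.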
Regex will not actually add or remove anything in any language that I am aware of. It is simply used to match. You must use some other language construct (a regex based replacement function for example) to achieve this functionality. It would probably help to mention your specific language so as to get help from those users.
Here's one kinda off-the-wall solution. It doesn't use regexes, but it also doesn't use any if/else statements either. It's more academic than production-worthy.
Assumptions: Your string is a comma-separated list of titles, and that these are a unique set (no duplicates), and that order doesn't matter:
require 'set'
str = "admin,artist,location_manager,event_manager"
titles = Set.new(str.split(','))
#=> #<Set: {"admin", "artist", "location_manager", "event_manager"}>
titles_to_toggle = ["location_manager"]
#=> ["location_manager"]
titles ^= titles_to_toggle
#=> #<Set: {"admin", "artist", "event_manager"}>
titles ^= titles_to_toggle
#=> #<Set: {"location_manager", "admin", "artist", "event_manager"}>
titles.to_a.join(",")
#=> "location_manager,admin,artist,event_manager"
All this assumes that you're using a string as a kind of set. If so, you should probably just use a set. If not, and you actually need string-manipulation functions to operate on it, there's probably no way around using if-else or a variant, such as the ternary operator, or unless, or Bergi's answer.
Also worth noting regarding regex as a solution: Make sure you consider the edge cases. If 'location_manager' is in the middle of the string, will you remove the extraneous comma? Will you handle removing commas correctly if it's at the beginning or the end of the string? Will you correctly add commas when it's added? For these reasons treating a set as a set or array instead of a string makes more sense.
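If the original order of the list matters, the same toggle can be sketched with plain arrays instead of a Set. The helper name toggle_role is hypothetical, and the sketch assumes the roles are unique and contain no commas themselves.

```ruby
# Hypothetical helper: toggles `role` in a comma-separated list without if/else.
# Assumes the roles are unique and contain no commas themselves.
def toggle_role(csv, role)
  roles = csv.split(',')
  # Symmetric difference: everything in either list, minus what is in both.
  toggled = (roles | [role]) - (roles & [role])
  toggled.join(',')
end

with_role = 'admin,artist,location_manager,event_manager'
without_role = toggle_role(with_role, 'location_manager')

puts without_role                                   # admin,artist,event_manager
puts toggle_role(without_role, 'location_manager')  # admin,artist,event_manager,location_manager
```

Because the list is split and re-joined, the comma edge cases (first item, last item, only item) take care of themselves.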
No. Regex can only match/test whether "a string" is present (or not). Then, the function you've used can do something based on that result, for example replace can remove a match.
Yet, you want to do two actions (each can be done with regex), remove if present and add if not. You can't execute them sequentially, because they overlap - you need to execute either the one or the other. This is where if-else structures (or ternary operators) come into play, and they are required if there is no library/native function that contains them to do exactly this job. I doubt there is one in Ruby.
If you want to avoid the if-else statement (for one-liners or expressions), you can use the ternary operator. Or you can use a lambda expression that returns the correct value:
# kind of pseudo code
string.replace(/location,?|$/, function($0) return $0 ? "" : ",location" )
This matches the string "location" (with optional comma) or the string end, and replaces that with nothing if a match was found or the string ",location" otherwise. I'm sure you can adapt this to Ruby.
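One possible Ruby adaptation of the pseudo code, as a rough sketch: the method name toggle_with_regex is made up, String#sub with a block stands in for the callback, and \z is used for the end of the string. It shares the comma edge cases of the pseudo code.

```ruby
# Hypothetical helper: removes `item` (plus an optional trailing comma) if it
# occurs in `str`, otherwise appends ",item" at the end of the string.
def toggle_with_regex(str, item)
  # Either the item itself, or the (empty) position at the very end.
  pattern = /#{Regexp.escape(item)},?|\z/
  # sub replaces only the first match; an empty match means "item not found".
  str.sub(pattern) { |match| match.empty? ? ",#{item}" : '' }
end

puts toggle_with_regex('admin,artist,location_manager,event_manager', 'location_manager')
# admin,artist,event_manager
puts toggle_with_regex('admin,artist,event_manager', 'location_manager')
# admin,artist,event_manager,location_manager
```

Caveats of this sketch: removing the last item leaves a trailing comma behind, an empty input yields a leading comma, and the item would also match inside a longer name that contains it.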
Removing something that matches a pattern is really easy:
(admin,?|artist,?|location_manager,?|event_manager,?)
Then choose the string to replace the match with (in your case an empty string) and pass everything to the replace method.
The other operation you suggested is more difficult to achieve with regex only. Maybe someone knows a better answer.

Using regex to replace all spaces NOT in quotes in Ruby

I'm trying to write a regex to replace all spaces that are not included in quotes so something like this:
a = 4, b = 2, c = "space here"
would return this:
a=4,b=2,c="space here"
I spent some time searching this site and I found a similar q/a ( Split a string by spaces -- preserving quoted substrings -- in Python ) that would replace all the spaces inside quotes with a token that could be re-substituted in after wiping all the other spaces...but I was hoping there was a cleaner way of doing it.
It's worth noting that any regular expression solution will fail in cases like the following:
a = 4, b = 2, c = "space" here"
While it is true that you could construct a regexp to handle the three-quote case specifically, you cannot solve the problem in the general sense. This is a mathematically provable limitation of simple DFAs, of which regexps are a direct representation. To perform any serious brace/quote matching, you will need the more powerful pushdown automaton, usually in the form of a text parser library (ANTLR, Bison, Parsec).
With that said, it sounds like regular expressions should be sufficient for your needs. Just be aware of the limitations.
This seems to work:
result = string.gsub(/( |(".*?"))/, "\\2")
I consider this very clean:
mystring.scan(/((".*?")|([^ ]))/).map { |x| x[0] }.join
I doubt gsub could do any better (assuming you want a pure regex approach).
Try this one; it also matches strings in single or double quotes (so you need to filter those out if you only want the spaces):
/( |("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/
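A minimal sketch of how a pattern like this could be applied with gsub and a block, so that quoted strings are put back unchanged and only bare spaces are dropped. The method name strip_unquoted_spaces is hypothetical, non-capturing groups are used for brevity, and the input is assumed to have balanced quotes.

```ruby
# Hypothetical helper: removes spaces that are not inside single- or
# double-quoted strings. Assumes the quotes in `text` are balanced.
def strip_unquoted_spaces(text)
  # A double-quoted string, a single-quoted string (both allowing backslash
  # escapes), or a single space. Quoted strings are tried first.
  quoted_or_space = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'| /
  # Drop bare spaces; return quoted strings exactly as they were matched.
  text.gsub(quoted_or_space) { |match| match == ' ' ? '' : match }
end

puts strip_unquoted_spaces('a = 4, b = 2, c = "space here"')
# a=4,b=2,c="space here"
```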
