xpath expression for regex-like matching?


I want to find div elements in an HTML document whose id matches a certain pattern.
In regex terms, this is the pattern I want to match:
foo_([[:digit:]]{1,8})
but using XPath. What is the XPath equivalent of the above pattern?
I'm stuck at //div[@id="foo_ and then what? Could someone complete it into a legal expression?
EDIT
Sorry, I think I have to elaborate. Actually it's not foo_, it's post_message_
By the way, I use Mechanize/Nokogiri (Ruby).
Here's the snippet:
html_doc = Nokogiri::HTML(open(myfile))
message_div = html_doc.xpath('//div[substring(@id,13) = "post_message_" and substring-after(@id, "post_message_") => 0 and substring-after(@id, "post_message_") <= 99999999]')
It still fails. Error message:
Couldn't evaluate expression '//div[substring(@id,13) = "post_message_" and substring-after(@id, "post_message_") => 0 and substring-after(@id, "post_message_") <= 99999999]' (Nokogiri::XML::XPath::SyntaxError)

How about this (updated):
XPath 1.0:
"//div[substring-before(@id, '_') = 'foo'
  and substring-after(@id, '_') >= 0
  and substring-after(@id, '_') <= 99999999]"
Edit #2: The OP made a change to the question. The following, even more reduced XPath 1.0 expression works for me:
"//div[substring(@id, 1, 13) = 'post_message_'
  and substring(@id, 14) >= 0
  and substring(@id, 14) <= 99999999]"
XPath 2.0 has a convenient matches() function:
"//div[matches(@id, '^foo_\d{1,8}$')]"
Apart from the better portability, I would expect the numerical expression (XPath 1.0 style) to perform better than the regex test, though this would only become noticeable when processing large data sets.
Original version of the answer:
"//div[substring-before(@id, '_') = 'foo'
  and number(substring-after(@id, '_')) = substring-after(@id, '_')
  and number(substring-after(@id, '_')) >= 0
  and number(substring-after(@id, '_')) <= 99999999]"
The use of the number() function is unnecessary because the comparison operators coerce their arguments to numbers implicitly; any non-number becomes NaN, and the greater-than/less-than tests then fail.
I also removed the encoding of the angle brackets, since this is an XML requirement, not an XPath requirement.
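A side note on the numeric range test above: XPath 1.0 number conversion also accepts values such as 1.5, so it is slightly looser than \d{1,8}. When Ruby is available anyway, one option is to select candidates loosely and apply the exact pattern in Ruby. A minimal sketch, assuming the id values are already available as strings; the constant POST_MESSAGE_ID, the helper post_message_id? and the sample ids are made up for illustration:

```ruby
# Hypothetical helper: true when id is "post_message_" followed by 1 to 8 digits.
POST_MESSAGE_ID = /\Apost_message_\d{1,8}\z/

def post_message_id?(id)
  POST_MESSAGE_ID.match?(id.to_s)
end

# Sample ids, invented for this sketch.
ids = ["post_message_1", "post_message_12345678", "post_message_123456789",
       "post_message_1.5", "post_message_", "other_42"]
puts ids.select { |id| post_message_id?(id) }.inspect
```

With Nokogiri, the same check could be applied to each node's id, for example by calling select on the result of html_doc.xpath("//div[starts-with(@id, 'post_message_')]") and testing node['id'].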

As already pointed out, XPath 2.0 offers standard regex support through functions such as matches().
One possible XPath 1.0 solution:
//div[starts-with(@id, 'post_message_')
  and string-length(@id) = 21
  and translate(substring-after(@id, 'post_message_'), '0123456789', '') = '']
Do note the following:
The use of the standard XPath function starts-with().
The use of the standard XPath function string-length().
The use of the standard XPath function substring-after().
The use of the standard XPath function translate().
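To see what the three tests above accept, here is a rough Ruby mirror of the predicate. It is only a sketch: the method name xpath_style_match? and the total_length keyword are hypothetical. Note that string-length(@id) = 21 admits exactly eight digits, since the prefix is 13 characters long.

```ruby
PREFIX = "post_message_"

# Hypothetical mirror of the XPath 1.0 predicate: starts-with, string-length, translate.
def xpath_style_match?(id, total_length: 21)
  return false unless id.start_with?(PREFIX)      # starts-with(@id, 'post_message_')
  return false unless id.length == total_length   # string-length(@id) = 21
  rest = id[PREFIX.length..-1]                    # substring-after(@id, 'post_message_')
  rest.delete("0-9").empty?                       # translate(rest, '0123456789', '') = ''
end

puts xpath_style_match?("post_message_12345678")  # eight digits
puts xpath_style_match?("post_message_123")       # three digits, wrong length
```

If one to eight digits should be accepted, as in the question, the length test would presumably become a range check, for example string-length(@id) >= 14 and string-length(@id) <= 21 on the XPath side.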

Or use the XPath function matches(string, pattern).
<xsl:if test="matches(name(.),'foo_')">
Unfortunately this simple pattern is not the full regex from the question, but it might be enough unless you have other foo_ tags you don't need; in that case I guess you can add a few more "if" checks to cull them out.

Nikkou makes this very easy and readable:
doc.search('div').attr_matches('id', /post_message_\d{1,8}/)

Related

Traversing and search with xpath

My code:
"/root/pharagraph/sentence[" + y + "]/sequence/word"
which I believe behaves the same as
"/root/pharagraph[1]/sentence[" + y + "]/sequence/word"
The problem is that I want something like:
"/root/pharagraph[*]/sentence[" + y + "]/sequence/word"
So my XPath finds sentence y only in the first pharagraph, but I want to find sentence y in all pharagraph elements.

No. Your first XPath expression is equivalent to your hypothetical third one. If the first XPath returns only the first matched element, the problem is in the code that executes the XPath, not in the XPath itself. For example, coming from .NET, this can happen when one calls SelectSingleNode() where SelectNodes() is needed.

XPath 1.0 using arithmetic operators

Let's say we have this:
<a href="www.something/page/1">something</a>
Now, is there a way to return the @href as "www.something/page/2"? Basically, I want to return the @href value, but with substring-after(., "page/") incremented by 1. I've been trying something like
//a/@href[number(substring-after(.,"page/"))+1]
but it doesn't work, and I don't think I can use
//a/@href/number(substring-after(.,"page/"))+1
It's not really a paging thing where I could use the site's pagination; I just picked that as an example. The point is to find a way to increment a value in XPath 1.0. Any help?

What you can do is
concat(
  translate(//a/@href, '0123456789', ''),
  translate(//a/@href, translate(//a/@href, '0123456789', ''), '') + 1
)
This concatenates the href attribute with all digits removed and the sum of 1 and the href with everything but digits removed.
That might suffice if all digits in your URLs occur at the end of the URL. But generally XPath 1.0 is good at selecting nodes in your input and bad at constructing new values from parts of node values.

There is a simpler way to achieve this: take the substring after the page marker, add 1, and then put it all back together.
This XPath assumes the current node is the @href attribute:
concat(substring-before(., 'page/'),
       'page/',
       substring-after(., 'page/') + 1
)

Your order of operations is a little, well, out of order. Use something like this:
substring-after(//a/@href, 'page/') + 1
Note that it is not necessary to explicitly convert the string value to a number. From the spec:
"The numeric operators convert their operands to numbers as if by calling the number function."
Putting it all together:
concat(
  substring-before(//a/@href, 'page/'),
  'page/',
  substring-after(//a/@href, 'page/') + 1)
Result:
www.something/page/2
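If the attribute value can be post-processed in the host language instead, the same split, increment and rejoin is short to write. A minimal Ruby sketch, assuming the href has already been extracted as a string; the helper name next_page and its marker parameter are hypothetical:

```ruby
# Hypothetical helper mirroring
# concat(substring-before(., 'page/'), 'page/', substring-after(., 'page/') + 1).
def next_page(href, marker = "page/")
  before, found, after = href.partition(marker)
  return href if found.empty?   # marker absent: leave the URL unchanged
  # Note: to_i treats non-numeric text as 0, whereas XPath would yield NaN.
  "#{before}#{found}#{after.to_i + 1}"
end

puts next_page("www.something/page/1")
```

Unlike the translate-based variant, this does not depend on all digits sitting at the end of the URL, only on the number following the marker.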

thymeleaf eq with spring variables bug?

I'm trying to run the following th:if:
th:if="${camelContext.getRouteStatus( route.id )} &eq; 'Hey'
but I get this error:
org.thymeleaf.exceptions.TemplateProcessingException: Could not parse as expression: "${camelContext.getRouteStatus( route.id )} &neq; 'Hey' " (camel:92)
However, if I try
th:if="${camelContext.getRouteStatus( route.id )} > 41 "
I get a different error, but now indicating that it's able to parse the expression, its just that it cannot compare Strings and numbers:
Cannot execute GREATER THAN from Expression "${camelContext.getRouteStatus( route.id )} > 41". Left is "Started", right is "41" (camel:92)
That's fine, I just wanted to check if I was writing the syntax correctly, and I don't want to compare numbers anyways, I want to compare the RouteStatus string.
Anyways, maybe someone can help me with this problem? Basically I want to do a if-else on the contents of a string, but I can't get this to work..
Cheers

Have you tried this:
th:if="${camelContext.getRouteStatus( route.id )} == 'Hey'"
Maybe it will work like this?
The Thymeleaf documentation shows something similar:
Values in expressions can be compared with the >, <, >= and <= symbols, as usual, and also the == and != operators can be used to check equality (or the lack of it). Note that XML establishes that the < and > symbols should not be used in attribute values, and so they should be substituted by &lt; and &gt;.
th:if="${prodStat.count} &gt; 1"
th:text="'Execution mode is ' + ( (${execMode} == 'dev')? 'Development' : 'Production')"
Textual aliases exist for some of these operators: gt (>), lt (<), ge (>=), le (<=), not (!), eq (==), neq/ne (!=). Even so, it is sometimes better to stick with the old-fashioned symbolic operators.

It seems that your expression is malformed, but maybe this is a copy-paste issue.
Could you try th:if="${camelContext.getRouteStatus( route.id ) eq 'Hey'}"?

match regular expression

I have a requirement to check the value 91981552e1775310VgnVCM100000a2b6140a____;standard;212.58.244.70;Oct-22-2012;24353teehdtehg; where the date and 24353teehdtehg are dynamic.
How can I make the check more generic, so that I can test expected_value =~ /actual_value/ while excluding the dynamic values, in Ruby?

I wouldn't use a regular expression if at all possible. You seem to have an input string that can easily be altered and used to compare against an expected value without using a regular expression.
str = "91981552e1775310VgnVCM100000a2b6140a____;standard;212.58.244.70;Oct-22-2012;24353teehdtehg;"
actual_value = str.split(';')[0..-3].join(';')
# "91981552e1775310VgnVCM100000a2b6140a____;standard;212.58.244.70"
Then just compare the two
expected_value == actual_value

I guess you could use something like :
/91981552e1775310VgnVCM100000a2b6140a____;standard;212\.58\.244\.70;(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-\d{2}-\d{4};\d{5}[a-z]{9};/
depending on what the string could actually be.
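A variation on the regex idea is to escape the fixed part with Regexp.escape and describe only the dynamic fields loosely. This is a sketch under assumptions: the date is taken to look like Mon-DD-YYYY and the last field to be any non-empty run without a semicolon. The names FIXED_PREFIX, EXPECTED and matches_expected? are made up for illustration.

```ruby
# Fixed part of the value, copied from the question.
FIXED_PREFIX = "91981552e1775310VgnVCM100000a2b6140a____;standard;212.58.244.70;"

# Assumed shapes for the dynamic fields: a Mon-DD-YYYY date, then a token without ';'.
EXPECTED = /\A#{Regexp.escape(FIXED_PREFIX)}[A-Z][a-z]{2}-\d{2}-\d{4};[^;]+;\z/

def matches_expected?(value)
  EXPECTED.match?(value)
end

puts matches_expected?("#{FIXED_PREFIX}Oct-22-2012;24353teehdtehg;")
```

Regexp.escape takes care of the dots in the IP address, so the fixed part does not have to be escaped by hand.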

Checking if a string has balanced parentheses

I am currently working on a Ruby problem quiz, but I'm not sure if my solution is right. After running the check, it shows that the compilation was successful, but I'm worried it is not the right answer.
The problem:
A string S consisting only of characters '(' and ')' is called properly nested if:
S is empty,
S has the form "(U)" where U is a properly nested string,
S has the form "VW" where V and W are properly nested strings.
For example, "(()(())())" is properly nested and "())" isn't.
Write a function
def nesting(s)
that given a string S returns 1 if S is properly nested and 0 otherwise.
Assume that the length of S does not exceed 1,000,000. Assume that S consists only of characters '(' and ')'.
For example, given S = "(()(())())" the function should return 1 and given S = "())" the function should return 0, as explained above.
Solution:
def nesting ( s )
  # write your code here
  if s == '(()(())())' && s.length <= 1000000
    return 1
  elsif s == ' ' && s.length <= 1000000
    return 1
  elsif s == '())'
    return 0
  end
end

Here are descriptions of two algorithms that should accomplish the goal. I'll leave it as an exercise to the reader to turn them into code (unless you explicitly ask for a code solution):
Start with a variable set to 0 and loop through each character in the string: when you see a '(', add one to the variable; when you see a ')', subtract one from the variable. If the variable ever goes negative, you have seen too many ')' and can return 0 immediately. If you finish looping through the characters and the variable is not exactly 0, then you had too many '(' and should return 0.
Remove every occurrence of '()' in the string (replace with ''). Keep doing this until you find that nothing has been replaced (check the return value of gsub!). If the string is empty, the parentheses were matched. If the string is not empty, it was mismatched.
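For reference, the first algorithm (the running counter) could be sketched as follows. This is one possible reading of the description, using the nesting(s) signature from the problem statement and assuming the input contains only '(' and ')':

```ruby
# Counter-based check: returns 1 if s is properly nested, 0 otherwise.
def nesting(s)
  depth = 0
  s.each_char do |char|
    depth += (char == "(" ? 1 : -1)
    return 0 if depth < 0        # a ')' appeared with nothing open
  end
  depth.zero? ? 1 : 0            # leftover '(' means not properly nested
end

puts nesting("(()(())())")
puts nesting("())")
```

It makes a single pass over the string and keeps only one integer, which suits the stated 1,000,000-character limit.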

You're not supposed to just enumerate the given examples; you're supposed to solve the problem generally. You're also not supposed to check that the length is below 1,000,000; you're allowed to assume that.
The most straightforward solution to this problem is to iterate through the string and keep track of how many parentheses are currently open. If you ever see a closing parenthesis when no parentheses are open, the string is not well-balanced. If any parentheses are still open when you reach the end, the string is not well-balanced. Otherwise it is.
Alternatively, you could turn the specification directly into a regex pattern using the recursive regex feature of Ruby 1.9, if you were so inclined.
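The recursive regex idea might look like the sketch below, which uses Ruby's named group with a \g<name> subexpression call. The names NESTED and nesting_regex are hypothetical, and how well this copes with very long or very deeply nested input is an open assumption that would need testing.

```ruby
# A properly nested string is zero or more groups of the form "(" nested* ")".
NESTED = /\A(?<paren>\(\g<paren>*\))*\z/

# Hypothetical wrapper returning 1 or 0, as the problem statement asks.
def nesting_regex(s)
  NESTED.match?(s) ? 1 : 0
end

puts nesting_regex("(()(())())")
puts nesting_regex("())")
```

The pattern follows the problem's definition closely: the outer repetition covers the "VW" rule and the empty string, and the recursive call covers "(U)".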

My algorithm would use a stack for this purpose. Stacks are well suited to this kind of problem.
Algorithm
Define a hash that maps each opening bracket to its closing bracket, for instance {"(" => ")", "{" => "}", and so on...}
Declare a stack (in our case, an array), i.e. brackets = []
Loop through the string using each_char; when a character is one of the hash keys (an opening bracket), push it onto brackets
Within the same loop, when a character is one of the hash values (a closing bracket), pop from brackets and check that the popped opener pairs with it; if it does not, the string is unbalanced
In the end, if the brackets stack is empty, the brackets are balanced.
def brackets_balanced?(string)
  brackets_hash = {"(" => ")", "{" => "}", "[" => "]"}
  brackets = []
  string.each_char do |x|
    if brackets_hash.key?(x)
      brackets.push(x)  # opening bracket: remember it
    elsif brackets_hash.value?(x)
      # closing bracket: must pair with the most recent opener (pop gives nil on an empty stack)
      return false unless brackets_hash[brackets.pop] == x
    end
  end
  brackets.empty?  # any opener left over means the string is unbalanced
end

You can solve this problem theoretically, by using a grammar like this:
S ← LSRS | ε
L ← (
R ← )
The grammar is easily handled by a recursive algorithm.
That would be the most elegant solution. Otherwise, as already mentioned here, count the open parentheses.

Here's a neat way to do it using inject:
class String
  def valid_parentheses?
    valid = true
    self.gsub(/[^\(\)]/, '').split('').inject(0) do |counter, parenthesis|
      counter += (parenthesis == '(' ? 1 : -1)
      valid = false if counter < 0
      counter
    end.zero? && valid
  end
end
> "(a+b)".valid_parentheses? # => true
> "(a+b)(".valid_parentheses? # => false
> "(a+b))".valid_parentheses? # => false
> "(a+b))(".valid_parentheses? # => false

You're right to be worried; I think you've got the wrong end of the stick, and you're solving the problem too literally. (The note that the string doesn't exceed 1,000,000 characters is just there to stop people worrying about how slowly their code would run if the length were 100 times that, and the examples are just that, examples, not the definitive list of strings you can expect to receive.)
I'm not going to do your homework for you (by writing the code), but I will give you a pointer to a solution that occurs to me:
The string is correctly nested if every left bracket has a matching right bracket to its right, with either nothing or a correctly nested set of brackets between them. So how about a recursive function, or a loop, that removes every occurrence of "()" from the string? When you run out of matches, what are you left with? Nothing? Then that was a properly nested string. Something else (like ')' or ')(', etc.) would mean it was not correctly nested in the first place.

Define method:
def check_nesting str
  pattern = /\(\)/
  while str =~ pattern do
    str = str.gsub pattern, ''
  end
  str.length == 0
end
And test it:
>ruby nest.rb (()(())())
true
>ruby nest.rb (()
false
>ruby nest.rb ((((()))))
true
>ruby nest.rb (()
false
>ruby nest.rb (()(((())))())
true
>ruby nest.rb (()(((())))()
false

Your solution only returns the correct answer for the strings "(()(())())" and "())". You surely need a solution that works for any string!
As a start, how about counting the number of occurrences of ( and ), and seeing if they are equal?
