Given the following sample XML:
<a z="123" y="321"></a>
<b z="456" y="654"></b>
<c x="456" w="654"></c>
<c x="123" w="111"></c>
<c x="789" w="321"></c>
I need an XPath query that will return element 'a', because there is a 'c' element whose @x equals the a's @z, and whose @w does NOT equal the a's @y.
Notice that 'b' is not returned because there is a 'c' element where @x=@z and @w=@y.
Also, the elements being returned can be any element (*). The important bit is there is a matching 'c' element, where the second attribute doesn't match.
The closest I've come up with is this:
//*[@z=//c/@x and .[@y != //c/@w]]
However, in my sample above this would not return 'a', because @z matches @x of one 'c' element and @y matches @w of a different 'c' element. The second attribute check needs to be made against the same 'c' element.
I hope this makes sense.
This XPath 2.0 expression:
//*[
  for $a in .
  return
    following-sibling::*[@x eq $a/@z and not(@w eq $a/@y)]
]
binds the context element to a variable inside the predicate, and then uses that variable in a second predicate on the element's following siblings, so both attribute conditions are checked against the same sibling. In XPath 3.0 the same binding can also be written as let $a := . return ...
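To make the "same sibling" requirement concrete, here is a rough Ruby sketch of the logic the predicate expresses. It does not run XPath at all: the ELEMENTS array is a hand-written stand-in for the sample XML, and the mismatched helper is a hypothetical name used only for illustration.

```ruby
# Hypothetical in-memory stand-in for the sample XML: each element is a name
# plus a hash of attributes, listed in document order as siblings.
ELEMENTS = [
  { name: "a", attrs: { "z" => "123", "y" => "321" } },
  { name: "b", attrs: { "z" => "456", "y" => "654" } },
  { name: "c", attrs: { "x" => "456", "w" => "654" } },
  { name: "c", attrs: { "x" => "123", "w" => "111" } },
  { name: "c", attrs: { "x" => "789", "w" => "321" } },
]

# Keep an element when one and the same following sibling has x equal to the
# element's z and w different from the element's y.
def mismatched(elements)
  elements.each_with_index.select do |el, i|
    z = el[:attrs]["z"]
    y = el[:attrs]["y"]
    next false if z.nil? # elements without a z attribute can never match

    elements[(i + 1)..].any? do |sib|
      sib[:attrs]["x"] == z && sib[:attrs]["w"] != y
    end
  end.map { |el, _| el[:name] }
end

p mismatched(ELEMENTS) # prints ["a"]
```

Both conditions sit inside a single any? block, which plays the role of the single predicate on following-sibling::*. Splitting them into two separate any? calls would reproduce the problem described in the question.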
Is it possible to select elements of xml tree that end with a given string? Not the elements that contain an attribute that ends with a string, but the elements themselves?
As mentioned in my comment, you can use the XPath-2.0 function ends-with to solve this. Its signature is
fn:ends-with($arg1 as xs:string?, $arg2 as xs:string?) as xs:boolean
fn:ends-with($arg1 as xs:string?,
             $arg2 as xs:string?,
             $collation as xs:string) as xs:boolean
Summary:
Returns an xs:boolean indicating whether or not the value of $arg1 ends with a sequence of collation units that provides a minimal match to the collation units of $arg2 according to the collation that is used.
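So you can use the following expression to "select elements of xml tree that end with a given string" document-wide:
//*[ends-with(., 'given string')]
Note that . tests the element's string value. To test the element's name instead, pass name() as the first argument:
//*[ends-with(name(), 'given string')]
To do this in XPath 1.0, refer to this SO answer.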
For example, to select all elements that end with "es", you can search for all the elements whose name contains the substring "es" starting at the position corresponding to the length of the name minus 1:
//*[substring(name(),string-length(name())-1,2) = "es"]
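The position arithmetic is easy to get wrong because XPath positions are 1-based. As a quick check, here is a small Ruby sketch; xpath_substring is a hypothetical helper that mimics XPath's substring() for integer arguments only, and is not part of any XPath library.

```ruby
# Hypothetical helper mimicking XPath substring(s, start, len) for integer
# arguments: positions are 1-based, and positions that fall outside the
# string are simply ignored.
def xpath_substring(s, start, len)
  (start...(start + len)).select { |pos| pos >= 1 && pos <= s.length }
                         .map { |pos| s[pos - 1] }
                         .join
end

# Same test as the XPath 1.0 predicate: the two characters starting at
# position string-length(name) - 1 must be "es".
def name_ends_with_es?(name)
  xpath_substring(name, name.length - 1, 2) == "es"
end

puts name_ends_with_es?("boxes") # prints true
puts name_ends_with_es?("box")   # prints false
```

For a suffix of length n, the start position generalises to string-length(name()) - n + 1, with n as the third argument.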
So I have elements that look like this
<li class="attribute "></li> # note the space
<li class="attribute"></li>
Using the XPath //li[@class="attribute"] will get the second element but not the first. How can I get both elements with the same XPath?
This XPath 1.0 expression,
//li[contains(concat(' ', normalize-space(@class), ' '),
              ' attribute ')]
will select all li elements whose class attribute contains attribute as a whitespace-separated token, regardless of leading or trailing spaces and of any other class names present.
If you want to match only the value attribute itself, allowing for leading and trailing spaces but no other class names, just use normalize-space():
//li[normalize-space(@class) = 'attribute']
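The difference between the two expressions can be sketched in plain Ruby. The helper names below are hypothetical, and normalize_space only approximates the XPath function (Ruby's split treats a few more characters as whitespace than XPath does).

```ruby
# Approximation of XPath normalize-space(): trim both ends and collapse
# internal runs of whitespace into a single space.
def normalize_space(s)
  s.split.join(" ")
end

# Mirrors contains(concat(' ', normalize-space(@class), ' '), ' token '):
# padding both sides with a space makes the token match on word boundaries.
def has_class_token?(cls, token)
  " #{normalize_space(cls)} ".include?(" #{token} ")
end

# Mirrors normalize-space(@class) = 'token': no other class names allowed.
def class_is?(cls, token)
  normalize_space(cls) == token
end

puts has_class_token?("attribute ", "attribute")     # prints true
puts has_class_token?("big attribute", "attribute")  # prints true
puts has_class_token?("attributes", "attribute")     # prints false
puts class_is?("big attribute", "attribute")         # prints false
```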
I need a very simple string validator that shows the position of the first symbol that doesn't match the desired format. I want to use a regex, but then I have to find the place where the string stops matching the expression, and I can't find a method that does that.
(It's got to be a fairly simple method... maybe there isn't one?)
For example if I have regex:
/^Q+E+R+$/
with string:
"QQQQEEE2ER"
The desired result should be 7
An idea: what you can do is to tokenize your pattern and write it with optional nested capturing groups:
^(Q+(E+(R+($)?)?)?)?
Then you only need to count the number of capture groups you obtain to know where the regex engine stops in the pattern and you can determine the offset of the match end in the string with the whole match length.
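In Ruby this idea might look like the sketch below. PARTIAL and match_progress are illustrative names, not part of the original answer, and the sketch assumes single-line input (in Ruby, ^ and $ are line anchors).

```ruby
# Nested optional groups: each level can only match if the previous one did.
PARTIAL = /^(Q+(E+(R+($)?)?)?)?/

# Returns how many groups took part in the match and how many characters
# were consumed before the string stopped following the pattern.
def match_progress(str)
  m = PARTIAL.match(str) # never nil: every part of the pattern is optional
  { groups: m.captures.compact.size, offset: m[0].length }
end

progress = match_progress("QQQQEEE2ER")
puts progress[:groups] # prints 2 (Q+ and E+ matched, R+ did not)
puts progress[:offset] # prints 7
```

The number of groups says which element of the pattern failed, and the offset says where in the string that happened.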
As @zx81 notes in his comment, if one of the elements can also match the next element (for example, if Q can match what E matches), things become different.
Let's say that Q is \w (and can match E and R). For the string QQQEEERRR the previous pattern will give only one capturing group (the greedy \w+ matches everything), whereas ^(\w+)(E+)(R+)$ will give three groups: QQQEE, E, RRR
To obtain the same result you need to add an alternation:
^((?:\w+(?=E)|\w+)(E+(R+($)?)?)?)?
In the alternation, the case where E exists must be tested first, and only if this branch fails (with the lookahead), then the other branch where E doesn't exist is used.
Thus the full pattern can be rewritten like this to deal with this specific case:
^((?:Q+(?=E)|Q+)((?:E+(?=R)|E+)((?:R+(?=$)|R+)($)?)?)?)?
You could perhaps take a look at the amatch gem too.
This is an interesting task that can be accomplished with a neat regex trick:
^(?:(?=(Q+)))?(?:(?=(Q+E+)))?(?:(?=(Q+E+R+)))?(?:(?=(Q+E+R+$)))?
We have four optional lookaheads checking various parts of the pattern and capturing the partial matches to Groups 1, 2, 3 and 4 incrementally.
Group 1 contains Q+ if it can be matched, in your example QQQQ.
Group 2 contains Q+E+ if it can be matched, in your example QQQQEEE.
Group 3 contains Q+E+R+ if it can be matched, in your example nil.
Group 4 contains Q+E+R+$ if it can be matched, in your example nil.
In your code, check which is the last Group that is set by testing !$1.nil?, !$2.nil? and so on.
The last one set gives you the length that is matchable, so in your example $2.length gives you the 7 you wanted.
Incidentally, the fact that Group 2 is the last one set also tells you that we fail on R+.
For your example, you could do the following.
Code
Change your regex from:
/^Q+E+R+$/
to
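R = /^(Q+)?(E+)?(R+)?/
and then apply the following method to the string:
def nbr_matched_chars(str)
  # Add up the lengths of the leading groups; stop at the first group that did not match (nil).
  str.scan(R).flatten.reduce(0) { |t, e| return t if e.nil?; t + e.size }
end
If str matches the original regex, then nbr_matched_chars(str) == str.size. The converse also requires all three groups to be non-nil: "QQEE" gives 4 even though the R+ part is missing.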
Examples
nbr_matched_chars("QQQQEEE2ER") #=> 7
nbr_matched_chars("QQQQEEEERR") #=> 10 (= "QQQQEEEERR".size)
nbr_matched_chars("QQAQQEEEER") #=> 2
Explanation
To see why this [evidently :-)] works, we can look at the results of invoking String#scan, followed by Array#flatten:
"QQQQEEE2ER".scan(R).flatten #=> ["QQQQ", "EEE" , nil ]
"QQQQEEEERR".scan(R).flatten #=> ["QQQQ", "EEEE", "RR"]
"QQAQQEEEER".scan(R).flatten #=> ["QQ" , nil , nil ]
(Sorry for the bad title, any suggestion appreciated) ;-)
Well, consider those strings:
first = "SC/SCO_160ZA206_T_mlaz_kdiz_nziizjeij.ext"
second = "MLA/SA2_jkj15PO_B_lkazkl lakzlk-akzl.oxt"
third = "A12A/AZD_KZALKZL_F_LKAZ_AZ__azaz___.ixt"
I'm looking for a regular expression allowing me to get arrays like this (in ruby):
first_array = ['SCO', '160ZA206', 'T', 'mlaz_kdiz_nziizjeij']
second_array = ['SA2', 'jkj15PO', 'B', 'lkazkl lakzlk-akzl']
third_array = ['AZD', 'KZALKZL', 'F', 'LKAZ_AZ__azaz___']
The first match must be anything right after the / and before the first _
The second match must be anything between the first and the second _
The third match must be anything between the second and the third _
The last match must be anything between the third _ and the last .
I can't get it: [^\/].?([A-Z]*)_(.*)_(.*)[\.$] :-(
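You're super close. Just add a question mark to the second matcher to make it lazy (otherwise it won't stop at the first underscore), and then duplicate that matcher:
[^\/].?([A-Z]*)_(.*?)_(.*?)_(.*)[\.$]
Two caveats: ([A-Z]*) only matches capital letters, so for the second string (SA2 contains a digit) the first group comes back empty, and inside a character class $ is a literal dollar sign rather than an anchor. Anchoring on the slash avoids the first problem:
\/([^_]*)_(.*?)_(.*?)_(.*)\.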
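For what it's worth, here is a small Ruby sketch that runs a slash-anchored variant of the pattern against all three sample strings. FIELDS and fields are illustrative names, and the pattern assumes the first three fields never contain an underscore themselves.

```ruby
# Start right after the slash, take three underscore-free fields, then let the
# greedy (.*) run to the end and backtrack to the last dot.
FIELDS = %r{/([^_]*)_([^_]*)_([^_]*)_(.*)\.}

# Returns the four captured fields, or nil when the string has another shape.
def fields(path)
  m = FIELDS.match(path)
  m && m.captures
end

p fields("SC/SCO_160ZA206_T_mlaz_kdiz_nziizjeij.ext")
# prints ["SCO", "160ZA206", "T", "mlaz_kdiz_nziizjeij"]
p fields("MLA/SA2_jkj15PO_B_lkazkl lakzlk-akzl.oxt")
# prints ["SA2", "jkj15PO", "B", "lkazkl lakzlk-akzl"]
```

Because the last group is greedy, extra underscores and even extra dots in the fourth field end up inside it; only the final dot is treated as the extension separator.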
Following up on @fge's split suggestion:
str = "SC/SCO_160ZA206_T_mlaz_kdiz_nziizjeij.ext"
p str[(str.index('/')+1)...str.rindex('.')].split( '_', 4)
#=> ["SCO", "160ZA206", "T", "mlaz_kdiz_nziizjeij"]
It splits on _ for max 4 elements (the fourth element is the remainder).
XML document:
<doc>
<A>
<Node>Hello!</Node>
</A>
<B>
<Node/>
</B>
<C>
</C>
<D/>
</doc>
How would you evaluate the following XPath queries?
/doc/A/Node != 'abcd'
/doc/B/Node != 'abcd'
/doc/C/Node != 'abcd'
/doc/D/Node != 'abcd'
I would expect ALL of these to evaluate to true.
However, here are the results:
/doc/A/Node != 'abcd' true
/doc/B/Node != 'abcd' true
/doc/C/Node != 'abcd' false
/doc/D/Node != 'abcd' false
Is this expected behavior? Or is it a bug with my XPath provider (jaxen)?
Recommendation: Never use the != operator to compare inequality where one or both arguments are node-sets.
By definition the expression:
$node-set != $value
evaluates to true() exactly when there is at least one node in $node-set such that its string value is not equal to the string value of $value.
Using this definition:
$empty-nodeset != $value
is always false(), because there isn't even a single node in $empty-nodeset for which the inequality holds.
Solution:
Use:
not($node-set = $value)
Then you get all results true(), as wanted.
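The "at least one node" rule can be modelled in a few lines of Ruby. This is only a sketch of the comparison semantics, not an XPath evaluator: each node-set is represented by an array of string values, and xpath_ne and xpath_eq are hypothetical helper names.

```ruby
# XPath 1.0 node-set vs. string comparison is existential: the comparison is
# true if it holds for at least one node in the set.
def xpath_ne(node_values, value)
  node_values.any? { |v| v != value }
end

def xpath_eq(node_values, value)
  node_values.any? { |v| v == value }
end

# String values of /doc/A/Node, /doc/B/Node, /doc/C/Node and /doc/D/Node.
a = ["Hello!"]
b = [""]
c = []
d = []

puts [a, b, c, d].map { |ns| xpath_ne(ns, "abcd") }.inspect
# prints [true, true, false, false]
puts [a, b, c, d].map { |ns| !xpath_eq(ns, "abcd") }.inspect
# prints [true, true, true, true]
```

On an empty array any? is always false, which is why != yields false for C and D while not(... = ...) yields true.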
From the XPath spec:
If one object to be compared is a node-set and the other is a string, then the comparison will be true if and only if there is a node in the node-set such that the result of performing the comparison on the string-value of the node and the other string is true.
This means that if the node-set is empty (as in your cases C and D), the result of the boolean expression will be false, since there is no node to which the inequality can apply.
You can work around this behaviour and get the result you want using an expression like:
count(/doc/C/Node) = 0 or /doc/C/Node != 'abcd'