How to do group capture in XPath

I am looking for an XPath expression along the lines of: //foo[@n="$1"]//bar[@n="$2"]
that returns $1 and $2, or at least returns both <foo> and <bar>.
Here is more detail. I have an XML document:
<xml>
<foo>
<a n="1">
<b n="1"/>
<b n="2"/>
</a>
</foo>
<a n="2">
<b n="1"/>
</a>
<a n="3">
<b n="1"/>
<foo>
<b n="2"/>
</foo>
<b n="3"/>
</a>
</xml>
I want to generate a string based on the n attribute of <a> and <b>.
So I have this XPath: //a[@n]//b[@n]
Then, for every result I get back, I use ./@n and ./ancestor::a/@n to get the information I want.
This works fine, but I need something more general, because I have many structures like this and need to generate the XPath automatically.
So for the example above, I am looking for an XPath like: //a[@n="$1"]//b[@n="$2"]
that returns:
(1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3)

Here is one XPath 1.0 expression that selects all the wanted n attributes:
//a[.//b]/@n | //a//b/@n
Without optimization, evaluating the above expression performs at least two complete traversals of the XML document.
This XPath 1.0 expression may be more efficient:
//*[self::a and .//b or self::b and ancestor::a]/@n
Both expressions can be simplified if it is guaranteed that every a has a b descendant.
They become, respectively:
//a/@n | //a//b/@n
And:
//*[self::a or self::b and ancestor::a]/@n
Further simplification is possible if it is guaranteed that every a has a descendant b and every b has an ancestor a:
//*[self::a or self::b]/@n
A single XPath 1.0 expression cannot return the string values of all the wanted attributes. One has to select the attributes using one of the above expressions, then apply a second XPath expression, string(), to each selected attribute.
In XPath 2.0 a single expression can return all the string values of the wanted attributes, simply by appending /string(.) to any of the expressions above.
For example, for the simplest one:
//(a|b)/@n/string(.)
Update:
The OP has clarified his question. Now we know that he wants this result to be produced:
(1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3)
It isn't possible to produce the wanted result with a single XPath 1.0 expression.
The following XPath 2.0 expression produces the wanted result:
for $a in //a[@n and .//b[@n]],
$b in $a//b[@n]
return
concat('(', $a/@n, ',', $b/@n, ') ')
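Where only XPath 1.0 is available, the same pairing can be done in the host language, much as the question's ancestor-based approach already does. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The root element name doc and the function name n_pairs are made up for illustration; the nested loops mirror the two range variables $a and $b.

```python
import xml.etree.ElementTree as ET

# The sample document from the question; the root is renamed to <doc>
# purely for this illustration.
DOC = """<doc>
<foo><a n="1"><b n="1"/><b n="2"/></a></foo>
<a n="2"><b n="1"/></a>
<a n="3"><b n="1"/><foo><b n="2"/></foo><b n="3"/></a>
</doc>"""


def n_pairs(xml_text):
    """Return (a/@n, b/@n) for every <b n> that is a descendant of an <a n>."""
    root = ET.fromstring(xml_text)
    result = []
    for a in root.iter("a"):          # like //a, in document order
        if "n" not in a.attrib:
            continue
        for b in a.iter("b"):         # like $a//b, in document order
            if "n" in b.attrib:
                result.append((a.get("n"), b.get("n")))
    return result


print(n_pairs(DOC))
```

Because the pairing is driven by the outer loop over a, a b nested under several a elements would yield one pair per ancestor, which is also what the XPath 2.0 expression does.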


Use XPath:Replace on first instance of a single character

I'm trying to replace only the first instance of a character in a string like fred-3-1-2 with XPath::replace and replace it with a / so that the resulting string is fred/3-1-2. I cannot guarantee anything else about the original string other than that it will have one or more dashes in it. I'm having a ton of difficulty finding a regex pattern that works with XPath::replace and consistently matches only that particular first instance of -.
I feel like I came close with:
(?:.+?)(-)(?:.+)
But this also matches the full string as well, so it is no good.
Please do not offer solutions using anything but plain regular expressions that would work on https://regex101.com. The "flavor" of regex should abide by XPath/XQuery semantics (https://www.w3.org/TR/xmlschema-2/#regexs).
You can use:
replace('fred-3-1-2', '^([^-]*)-','$1/')
Check it. Result:
fred/3-1-2
Meaning:
From the start of the string, match any run of non-hyphen characters followed by one hyphen, and replace the match with the first captured group plus a / character.
Do note: XPath/XQuery regular expressions extend the XML Schema syntax you cite with the following features: "Matching the Start and End of the String", "Reluctant Quantifiers", "Captured Sub-Expressions", "Back-References".
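The pattern can also be checked outside an XPath processor. The sketch below uses Python's re module, which is not the XPath regex flavor but treats this particular pattern the same way. The function name replace_first_hyphen is hypothetical, and Python writes the back-reference as \1 where XPath's replace() uses $1.

```python
import re


def replace_first_hyphen(s):
    """Replace only the first '-' in s with '/'.

    The ^ anchor pins the match to the start of the string and [^-]*
    cannot cross a hyphen, so at most one match (ending at the first
    hyphen) is possible.
    """
    return re.sub(r"^([^-]*)-", r"\1/", s)


print(replace_first_hyphen("fred-3-1-2"))
```

A string without any hyphen is returned unchanged, because the pattern then fails to match at all.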
Pure XPath 1.0 solution -- no regular expressions:
concat(substring-before('fred-3-1-2', '-'), '/', substring-after('fred-3-1-2', '-'))
In case the string is contained in a variable $s and it isn't known that it contains a -, then use:
concat(
substring(concat(substring-before($s, '-'), '/', substring-after($s, '-')),
1 div contains($s, '-')
),
substring($s, 1 div not(contains($s, '-')))
)
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="s" select="'fred-3-1-2'"/>
<xsl:variable name="t" select="'fred+3+1+2'"/>
<xsl:template match="/">
<xsl:value-of select="concat(substring-before('fred-3-1-2', '-'), '/', substring-after('fred-3-1-2', '-'))"/>
=================
<xsl:value-of select=
"concat(
substring(concat(substring-before($s, '-'), '/', substring-after($s, '-')),
1 div contains($s, '-')
),
substring($s, 1 div not(contains($s, '-')))
)"/>
===============
<xsl:value-of select=
"concat(
substring(concat(substring-before($t, '-'), '/', substring-after($t, '-')),
1 div contains($t, '-')
),
substring($t, 1 div not(contains($t, '-')))
)"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to any XML document (its content is not used), it evaluates the XPath expressions given in this answer and produces the wanted, correct results: if the string contains a hyphen, only its first occurrence is replaced; if the string doesn't contain any hyphens, the result is the same string unchanged:
fred/3-1-2
=================
fred/3-1-2
===============
fred+3+1+2
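The conditional expression relies on 1 div true() being 1 and 1 div false() being positive infinity: substring(x, 1) returns all of x, while substring(x, Infinity) returns the empty string, so exactly one of the two concatenated parts survives. The Python sketch below models that reasoning. All helper names are made up for illustration, and substring_from only covers the two start values that occur here.

```python
import math


def substring_before(s, sep):
    """Model of XPath substring-before: '' when sep does not occur in s."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """Model of XPath substring-after: '' when sep does not occur in s."""
    return s.partition(sep)[2]


def one_div(flag):
    """Model of XPath `1 div boolean`: true() -> 1, false() -> +Infinity."""
    return 1.0 if flag else math.inf


def substring_from(s, start):
    """Model of XPath substring(s, start) for start = 1 or +Infinity only."""
    return "" if math.isinf(start) else s[int(start) - 1:]


def replace_first(s, old="-", new="/"):
    has_old = old in s
    replaced = substring_before(s, old) + new + substring_after(s, old)
    # Exactly one of the two parts below is non-empty.
    return (substring_from(replaced, one_div(has_old))
            + substring_from(s, one_div(not has_old)))


print(replace_first("fred-3-1-2"))
print(replace_first("fred+3+1+2"))
```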

XPath query that matches multiple attributes of any particular element name

Given the following sample XML:
<a z="123" y="321"></a>
<b z="456" y="654"></b>
<c x="456" w="654"></c>
<c x="123" w="111"></c>
<c x="789" w="321"></c>
I need an XPath query that returns element 'a', because there is a 'c' element whose @x equals a's @z and whose @w does NOT equal a's @y.
Notice that 'b' is not returned, because there is a 'c' element where @x=@z and @w=@y.
Also, the elements being returned can be any element (*). The important bit is that there is a matching 'c' element whose second attribute doesn't match.
The closest I've come up with is this:
//*[@z=//c/@x and .[@y != //c/@w]]
However, in my sample above, this would not return 'a', because @z matches @x of one 'c' element and @y matches @w of a different 'c' element. The second attribute check needs to be made against the same 'c' element.
I hope this makes sense.
This XPath 3.0 expression (the let clause is not part of XPath 2.0; there, for $a in . return plays the same role):
//*[
let $a := .
return
following-sibling::*[@x eq $a/@z and not(@w eq $a/@y)]
]
binds the context element to a variable inside the predicate, then uses that variable in a predicate on the following-sibling elements to check whether their attributes satisfy the stated requirements.
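The key point is that both attribute tests are applied to the same following sibling. A rough Python equivalent using the standard library's xml.etree.ElementTree makes that explicit. The wrapping <root> element and the function name elements_with_mismatch are assumptions added so that the sample parses and runs.

```python
import xml.etree.ElementTree as ET

# The sample elements wrapped in a hypothetical <root> so the text parses.
DOC = """<root>
<a z="123" y="321"></a>
<b z="456" y="654"></b>
<c x="456" w="654"></c>
<c x="123" w="111"></c>
<c x="789" w="321"></c>
</root>"""


def elements_with_mismatch(xml_text):
    """Elements with a later sibling whose @x equals their @z but whose
    @w is not equal to their @y (both tests on the same sibling)."""
    root = ET.fromstring(xml_text)
    hits = []
    for parent in root.iter():
        children = list(parent)
        for i, el in enumerate(children):
            z, y = el.get("z"), el.get("y")
            if z is None:
                continue
            for sib in children[i + 1:]:      # following-sibling::*
                w = sib.get("w")
                same_key = sib.get("x") == z
                same_value = w is not None and y is not None and w == y
                if same_key and not same_value:
                    hits.append(el)
                    break
    return hits


print([el.tag for el in elements_with_mismatch(DOC)])
```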

Xpath - sum multiplication

I'm crawling a webpage using XPath and I need to write the deposit as a number.
The deposit needs to be ("monthly rent" x "number of months of prepaid rent").
The result should be 15450 in this case:
<table>
<tr>
<td>monthly rent: </td>
<td>5.150,00</td>
</tr>
<tr>
<td>deposit: </td>
<td>3 mdr.</td>
</tr>
</table>
I am currently using the following XPath to find the info:
//td[contains(.,'Depositum') or contains(.,'Husleje ')]/following-sibling::td/text()
But I don't know how to remove "mdr." from the deposit, or how to multiply the two numbers and return only one number to the database.
You can use the following query which is compatible with XPath 1.0 and upwards:
substring-before(//td[contains(.,'deposit:')]/following-sibling::td/text(), ' mdr.') * translate(//td[contains(.,'monthly rent:')]/following-sibling::td/text(), ',.', '') div 100
Output:
15450
Step by Step Explanation:
// Get the deposit and remove mdr. from it using substring-before
substring-before(//td[contains(.,'deposit:')]/following-sibling::td/text(), ' mdr.')
// Arithmetic multiply operator
*
// The number format 5.150,00 can't be used for arithmetic calculations.
// Therefore we get the monthly rent and remove . and , chars from it.
// Note that this is equal to multiply it by factor 100. That's why we divide
// by 100 later on.
translate(//td[contains(.,'monthly rent:')]/following-sibling::td/text(), ',.', '')
// Divide by 100
div 100
You can refer to the List of Functions and Operators supported by XPath 1.0 and 2.0
Pure XPath solution:
translate(
/table/tr/td[contains(., 'monthly rent')]/following-sibling::td[1],
',.',
'.'
)
*
substring-before(
/table/tr/td[contains(., 'deposit')]/following-sibling::td[1],
' mdr'
)
I ended up with a solution quite similar to hek2mgl's correct answer, but with two differences. There is no need to divide by 100, because the comma is converted to a dot and the dot is removed. Also, the <td> elements containing numeric data are selected with positional predicates, to avoid matching extra elements if the actual table is not as simple as the given example. The XPath number format requires a dot as the decimal separator and no thousands separators.
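The arithmetic is easy to verify outside XPath. The sketch below repeats the same two text clean-ups in Python; the function name deposit is hypothetical, and the cell texts are passed in as plain strings rather than scraped from the page.

```python
from decimal import Decimal


def deposit(rent_text, months_text):
    """Multiply a rent such as '5.150,00' by a month count such as '3 mdr.'."""
    # Like translate(., ',.', '.'): the comma becomes a dot, the dot is dropped,
    # so '5.150,00' turns into '5150.00'.
    rent = Decimal(rent_text.strip().translate(str.maketrans(",", ".", ".")))
    # Like substring-before(., ' mdr'): keep only the text before ' mdr'.
    months = Decimal(months_text.partition(" mdr")[0].strip())
    return rent * months


print(deposit("5.150,00", "3 mdr."))
```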

How to join the results of two XPath expressions using the concat function?

I have following XML:
<root>
<chp id='1'>
<sent id='1'>hello</sent>
<sent id='2'>world</sent>
</chp>
<chp id='2'>
<sent id='1'>the</sent>
<sent id='2'>best</sent>
<sent id='3'>world</sent>
</chp>
</root>
Using the XPath expression
root/chp/sent[contains(.,'world')]/@id
I have the result 2 3 (which is exactly what I want), but when I run
concat('sentence ', /root/chp/sent[contains(.,'world')]/@id, ' - chap ' , /root/chp/sent[contains(.,'world')]/../@id )
the result breaks at the first result:
sentence 2 - chap 1
The last argument does not contain a single value, but a sequence. You cannot use XPath 1.0 to join this sequence to a single string. If you're able to use XPath 2.0, use string-join($sequence, $divisor):
concat(
'sentence ',
string-join(/root/chp/sent[contains(.,'world')]/@id, ' '),
' - chap ',
string-join(/root/chp/sent[contains(.,'world')]/../@id, ' ')
)
which will return
sentence 2 3 - chap 1 2
Most probably you want to loop over the result set yourself (also requires XPath 2.0):
for $sentence in /root/chp/sent[contains(.,'world')]
return concat('sentence ', $sentence/@id, ' - chap ', $sentence/../@id)
which will return
sentence 2 - chap 1
sentence 3 - chap 2
If you cannot use XPath 2.0 but are able to process the results further in an outside programming language (such as Java or PHP) using DOM: use /root/chp/sent[contains(.,'world')] to find the sentence nodes, loop over them, then query each node's @id and its parent (chapter) @id using DOM and construct the result.
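As one possible shape for that host-language loop, here is a minimal sketch in Python with the standard library's xml.etree.ElementTree. The function name describe is made up for illustration. ElementTree elements do not keep a parent pointer, so the sketch walks chapters first and sentences second instead of stepping up from each sentence.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
<chp id='1'>
<sent id='1'>hello</sent>
<sent id='2'>world</sent>
</chp>
<chp id='2'>
<sent id='1'>the</sent>
<sent id='2'>best</sent>
<sent id='3'>world</sent>
</chp>
</root>"""


def describe(xml_text, word="world"):
    """One 'sentence N - chap M' line per sentence whose text contains word."""
    root = ET.fromstring(xml_text)
    lines = []
    for chp in root.findall("chp"):
        for sent in chp.findall("sent"):
            # Like contains(., 'world'): test the element's full text content.
            if word in "".join(sent.itertext()):
                lines.append(f"sentence {sent.get('id')} - chap {chp.get('id')}")
    return lines


print("\n".join(describe(DOC)))
```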

XPath - find first occurrence of string

I'm trying to select an anchor element by first containing the text "To Be Coded", then extracting a number from a string using substring, then using the greater than comparison operator (>0). This is what I have thus far:
/a[number(substring(text(),???,string-length()-1))>0]
An example of the HTML is:
<a class="" href="javascript:submitRequest('getRec','30', '63', 'Z')">
To Be Coded (23)
</a>
My issue right now is I don't know how to find the first occurrence of the open parenthesis. I'm also not sure how to combine what I have with the contains(text(),"To Be Coded") function.
So my criteria for the selection is:
Must be an anchor element
Must include the text "To Be Coded"
Must contain a number greater than 0 in the parentheses
Edit: I suppose I could just "hard code" the starting position for the substring, but I'm not sure what that would be - will XPath count the white space before the text in the element? How would it handle/count the characters?
Try this:
a[contains(., 'To Be Coded') and number(substring-before(substring-after(., '('), ')')) > 0]
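Nesting substring-after inside substring-before removes the need to know where the parenthesis sits, so the leading whitespace in the element does not matter. The Python sketch below models the same test on the element's text. The function names are hypothetical, and float() is only a rough stand-in for XPath's number(), since it accepts some spellings (such as exponents) that XPath 1.0 does not.

```python
def count_in_parens(text):
    """Number between the first '(' and the next ')', or NaN if there is none."""
    inner = text.partition("(")[2].partition(")")[0]
    try:
        return float(inner)
    except ValueError:
        return float("nan")   # number('') is NaN in XPath as well


def is_wanted_link(text):
    """Model of: contains(., 'To Be Coded') and number(...) > 0."""
    # Any comparison with NaN is false, so links without a number are rejected.
    return "To Be Coded" in text and count_in_parens(text) > 0


print(is_wanted_link("\n    To Be Coded (23)\n"))
```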
