xpath get value attributes concat string - xpath

I have an XML document with several attributes like this:
<results value="1">
<result value="111"></result>
<result value="222"></result>
<result value="333"></result>
<result value="444"></result>
</results>
Is there any way to get all the attribute values and prepend a constant string to each one? The output should look like this:
Value: 111
Value: 222
Value: 333
Value: 444
Thank you very much

Using only pure XPath (2.0 or later, since a function call is used as a path step):
//results//result/concat("Value: ", @value)
Output:
Value: 111
Value: 222
Value: 333
Value: 444

Yes, that sounds feasible if you rely on XPath 2.0's function library. You could write something like:
concat('Value: ',string-join((//result/@value), '\nValue: '))
or:
concat('Value: ',string-join((//result/@value), '
Value: '))
depending on how you encode the newline character.
(tested using https://www.freeformatter.com/xpath-tester.html)
Another solution, if you only have an XPath 1.0 parser at hand, would be to evaluate just //result/@value and then post-process the resulting node list in the programming language that you use.
EDIT: if you still need a node list as the result and want to rely only on XPath, you should prefer @JackFleeting's answer over my first suggestion.
(BTW, I had first thought of the same solution as Jack's and tested it on http://www.xpathtester.com/xpath, but it didn't work there, probably because that online parser is buggy.)
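The XPath 1.0 fallback mentioned above (select the values, then post-process in the host language) could look roughly like the following in Python. This is only a sketch: the function name prefixed_values is made up for illustration, and because the standard library's ElementTree supports just a subset of XPath and cannot select attribute nodes directly, the sketch selects the result elements and reads the attribute in Python.

```python
import xml.etree.ElementTree as ET

XML = """<results value="1">
<result value="111"></result>
<result value="222"></result>
<result value="333"></result>
<result value="444"></result>
</results>"""


def prefixed_values(xml_text, prefix="Value: "):
    """Return each result/@value with a constant prefix (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # ElementTree cannot return attribute nodes from a path expression,
    # so select the elements and read the attribute afterwards.
    return [prefix + el.get("value") for el in root.findall(".//result")]


if __name__ == "__main__":
    print("\n".join(prefixed_values(XML)))
```

With a full XPath 1.0 engine the selection step would simply be //result/@value; the string handling stays the same.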

Related

xpath return default value, if value of attribute not found using text()

sample_xml='<employees>\
<person id="p1">\
<name value="Alice">ALICE</name>\
</person>\
<person id="p2">\
<name value="Alice">BOB</name>\
</person>\
<person id="p3">\
<name value="Alice"/>\
</person>\
</employees>'
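# Assumes an existing SparkSession named `spark` (as in a notebook or pyspark shell).
data = [[sample_xml]]
df = spark.createDataFrame(data, ['data'])
# xpath() returns an array of the text nodes matched by the expression.
df = df.selectExpr(
    'xpath(data, "/employees/person/name[@value=\'Alice\']/text()") AS test'
)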
This gives the expected ["ALICE", "BOB"].
Problem:
I want my result to be ["ALICE", "BOB", "NA"],
i.e. for an empty element like the one below
<name value="Alice"/>
I want to return a default NA.
Is it possible to achieve this?
Regards
With XPath itself this is not possible. It can only return the actual values of the matching nodes, or nothing if there is no match.
In order to get NA or any other data that is not actually contained in the XML, you should wrap the basic XPath request with some additional, external code to return the customized output in case of no match.
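As a rough illustration of such wrapping code, here is a sketch in plain Python using the standard library's ElementTree rather than Spark. The helper name names_with_default is hypothetical, and the sample is the question's XML with the line continuations removed. The idea is to select the name elements themselves instead of their text nodes, then substitute the default wherever an element has no text.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<employees>
<person id="p1"><name value="Alice">ALICE</name></person>
<person id="p2"><name value="Alice">BOB</name></person>
<person id="p3"><name value="Alice"/></person>
</employees>"""


def names_with_default(xml_text, default="NA"):
    """Return the text of each matching <name>, or `default` when it is empty."""
    root = ET.fromstring(xml_text)
    # Select the elements, not text(): an empty element still matches here,
    # which is what allows a placeholder to be filled in for it.
    names = root.findall("./person/name[@value='Alice']")
    return [el.text or default for el in names]


if __name__ == "__main__":
    print(names_with_default(SAMPLE_XML))
```

In Spark, the same logic could presumably be wrapped in a user-defined function applied to the data column, at the cost of leaving the built-in xpath() expression.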
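In XPath 2.0, use /employees/person/name[@value='Alice']/(text()/string(), 'NA')[1]. The first item must be text()/string() rather than string(text()), because string() applied to an empty sequence returns a zero-length string, which would then be selected instead of 'NA'.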
It can't be done in XPath 1.0. In XPath 1.0 there's no such thing as a sequence of strings; you can only return a sequence of nodes, and you can only return nodes that are actually present in the input document.

xpath multiple nodes query with custom strings

I have a working multiple node xpath query and I want to add some custom strings between the results.
<FooBar>
<Foo>
<Fooid>A</Fooid>
<Booid>222</Booid>
<Wooid>Z</Wooid>
</Foo>
<Foo>
<Fooid>B</Fooid>
<Booid>333</Booid>
<Wooid>Y</Wooid>
</Foo>
<Foo>
<Fooid>C</Fooid>
<Booid>444</Booid>
<Wooid>X</Wooid>
</Foo>
</FooBar>
I have tried different combinations of string-join and/or concat, but the result was always wrong or ended in a syntax error. My XPath version is 2.0.
//Foo/Fooid | //Foo/Booid | Foo/Wooid
The above xpath results in:
A
222
Z
My preferred result would be:
(A)
{222}
[Z]
What is the correct usage of string-join in order to get the brackets around the three ids?
After doing some research, and with your comments, I was able to achieve the desired result with this line:
//Foo/concat('(', Fooid, ')'), //Foo/concat('{', Booid, '}'),Foo/concat('[', Wooid, ']')
The '|' was replaced by a comma.
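For environments that only offer XPath 1.0, where concat() cannot be used as a path step, the same kind of output can be assembled in the host language. The following Python sketch assumes the standard library's ElementTree; the WRAPPERS table and the wrapped_ids helper are illustrative names, not part of the original solution. It walks the three element names in turn, so the results come out grouped by element name, as with the comma-separated expression above.

```python
import xml.etree.ElementTree as ET

XML = """<FooBar>
<Foo><Fooid>A</Fooid><Booid>222</Booid><Wooid>Z</Wooid></Foo>
<Foo><Fooid>B</Fooid><Booid>333</Booid><Wooid>Y</Wooid></Foo>
<Foo><Fooid>C</Fooid><Booid>444</Booid><Wooid>X</Wooid></Foo>
</FooBar>"""

# Format template per element name; "{{{}}}" renders as "{value}".
WRAPPERS = {"Fooid": "({})", "Booid": "{{{}}}", "Wooid": "[{}]"}


def wrapped_ids(xml_text):
    """Return every id wrapped in its bracket style, grouped by element name."""
    root = ET.fromstring(xml_text)
    out = []
    for tag, template in WRAPPERS.items():
        for el in root.findall(f"./Foo/{tag}"):
            out.append(template.format(el.text))
    return out


if __name__ == "__main__":
    print("\n".join(wrapped_ids(XML)))
```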
To concatenate these characters, use their HTML entities instead.
concat('&lpar;', //Fooid, '&rpar;')
for parentheses use
&lpar;
&rpar;
for brackets
&lbrack;
&rbrack;
for braces
&lbrace;
&rbrace;
See full character entity sets here

XPath 1.0 lowest value regardless of ordering

I have this data, and I'm looking for the lowest bid.
<root>
<current_bid>$1.00</current_bid>
<current_bid>$2.00</current_bid>
<current_bid>$3.00</current_bid>
<current_bid>$4.00</current_bid>
<current_bid>$5.00</current_bid>
</root>
This is my XPath 1.0 attempt:
//current_bid[not(translate (., '$,.','') > translate(//current_bid, '$,.',''))]
And it works fine (returns only the $1.00 bid) with the data above, but if I change the ordering of the data to let's say this here:
<root>
<current_bid>$5.00</current_bid>
<current_bid>$1.00</current_bid>
<current_bid>$2.00</current_bid>
<current_bid>$3.00</current_bid>
<current_bid>$4.00</current_bid>
</root>
Then it gives a wrong output (returns all values).
Shouldn't the order be irrelevant when I use //current_bid, since it queries the whole document?
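Also: how would I go about getting the second lowest bid?
XPath 1.0 processes nodes in document order, so there is no way to sort them with pure XPath. It can be done with XSL processing.
This approach works only if the minimum is at the first position: translate() expects a string, so the node-set //current_bid is converted to the string value of its first node in document order, and every bid is compared against that single value.
XPath: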
'//current_bid[(position()<=last()) and not(translate (., "$,.","") > translate(//current_bid, "$,.",""))]'
Sample:
<root>
<current_bid>$1.00</current_bid>
<current_bid>$5.00</current_bid>
<current_bid>$2.00</current_bid>
<current_bid>$4.00</current_bid>
<current_bid>$3.00</current_bid>
</root>
Testing on command line with xmllint
xmllint --xpath '//current_bid[(position()<=last()) and not(translate (., "$,.","") > translate(//current_bid, "$,.",""))]' test.xml ; echo
Result:
<current_bid>$1.00</current_bid>
If the number of nodes is known in advance, it could perhaps be done with nested conditions, but that would give a very complex XPath expression.
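Since pure XPath 1.0 cannot sort, one pragmatic route is to select all bids and order them in the host language. The sketch below uses Python's standard library and is only one way to do it; sorted_bids is a hypothetical helper, and the input is the question's reordered sample. It returns the bids in ascending order, so the lowest and second lowest fall out by index regardless of document order.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

XML = """<root>
<current_bid>$5.00</current_bid>
<current_bid>$1.00</current_bid>
<current_bid>$2.00</current_bid>
<current_bid>$3.00</current_bid>
<current_bid>$4.00</current_bid>
</root>"""


def sorted_bids(xml_text):
    """Return the bid strings sorted by numeric value, lowest first."""
    root = ET.fromstring(xml_text)
    bids = [el.text for el in root.findall("./current_bid")]
    # Drop the currency sign and thousands separators, then compare as decimals.
    return sorted(bids, key=lambda s: Decimal(s.replace("$", "").replace(",", "")))


if __name__ == "__main__":
    bids = sorted_bids(XML)
    print("lowest:", bids[0])
    print("second lowest:", bids[1])
```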

xpath expression wild-cards

I have a requirement to specify a wildcard in the following XPath:
Field[@name="/Root/Table[i]/FirstName"]
Basically, the "i" is a variable that can hold either a GUID or a running number. I would like to pick up all elements whose attribute matches the pattern
"/Root/Table[*]/FirstName"
i.e. starting with "/Root/Table[" and ending with "]/FirstName". Any ideas as to how this can be done?
Here is a sample payload:
<Package>
<Input>
<Data id="36e9f0fe3f8d4508ac20710e07cfddd4">
<Input>
<Field name="/Root/Table[1]/FirstName">Thomas</Field>
</Input>
</Data>
</Input>
</Package>
You should be able to do this using starts-with() and a makeshift ends-with() (since XPath 1.0 doesn't actually have an ends-with() function):
//*[starts-with(@name, '/Root/Table[') and
substring(@name, string-length(@name) - 11 + 1) = ']/FirstName']
Here, 11 is the length of ]/FirstName.
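The same prefix and suffix test can be mirrored in host-language code, which also makes it easy to check the length arithmetic. This is a sketch using Python's standard library; matching_fields is a hypothetical helper, and the second Field in the sample is an invented non-matching entry added purely to show the filter at work.

```python
import xml.etree.ElementTree as ET

XML = """<Package>
<Input>
<Data id="36e9f0fe3f8d4508ac20710e07cfddd4">
<Input>
<Field name="/Root/Table[1]/FirstName">Thomas</Field>
<!-- hypothetical non-matching field, not in the original payload -->
<Field name="/Root/Table[1]/LastName">Example</Field>
</Input>
</Data>
</Input>
</Package>"""

PREFIX = "/Root/Table["
SUFFIX = "]/FirstName"  # 11 characters, the constant used in the XPath above


def matching_fields(xml_text):
    """Return Field elements whose name starts with PREFIX and ends with SUFFIX."""
    root = ET.fromstring(xml_text)
    return [
        el
        for el in root.iter("Field")
        if el.get("name", "").startswith(PREFIX)
        and el.get("name", "").endswith(SUFFIX)
    ]


if __name__ == "__main__":
    print(len(SUFFIX))
    print([el.text for el in matching_fields(XML)])
```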

HTMLAgilityPack and separating on <br/>

I have some html, which is separated by <br/> e.g.:
Jack Janson
<br/>
309 123 456
<br/>
My Special Street 43
What is the easiest way to retrieve the information in 3 columns?
I am not an XPath expert, so another approach would be to separate the string on the line break, and just work with the array. Is there a smarter way to do it?
Update: Forgot to format the code.
In pure XPATH over XML, you would use an XPATH expression like this: //preceding-sibling::br or //following-sibling::br (see here for help on XPATH Axes)
But the XPATH over HTML implementation that you'll find in Html Agility Pack does not support selecting pure text nodes or attribute nodes in XPATH selection expressions (//br/text() or //br/@blah do not work, for example). Note that they do work in filters, so //br[text()='blah'] or //br[@att='blah'] work.
So, back to the question, you need to combine XPATH and code, something like this:
HtmlDocument doc = new HtmlDocument();
doc.Load(myHtmlFile);
foreach (HtmlNode p in doc.DocumentNode.SelectNodes("//br"))
{
Console.WriteLine(p.PreviousSibling.InnerText.Trim());
}
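That will output the following (the street line after the last <br/> is not printed, because only the previous sibling of each <br/> is read):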
Jack Janson
309 123 456
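The alternative raised in the question, splitting the string on the line break and working with the resulting array, can be sketched as follows. This is written in Python rather than C# purely for illustration, and split_on_br is a hypothetical helper. It assumes the fragment contains no markup other than the br separators, and unlike the sibling-based loop it also returns the text after the last separator.

```python
import re
from html import unescape

HTML = """Jack Janson
<br/>
309 123 456
<br/>
My Special Street 43"""


def split_on_br(html_text):
    """Split an HTML fragment on <br>, <br/> or <br /> and trim each piece."""
    parts = re.split(r"<br\s*/?>", html_text, flags=re.IGNORECASE)
    # Decode entities such as &amp; and drop pieces that are only whitespace.
    return [unescape(p).strip() for p in parts if p.strip()]


if __name__ == "__main__":
    name, phone, street = split_on_br(HTML)
    print(name, phone, street, sep=" | ")
```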

Resources