Navigating XML via Xpath - xpath

I am new to Xpath and have been trying to get my head around some basic examples using xpath testing sites before I tackle a much more complex piece.
I am trying to understand exactly how to use the contains function in conjunction with other condition(s), but struggling a bit.
Here is my xml:
<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
<actors>
<actor id="1">Christian Bale</actor>
<actor id="2">Liam Neeson</actor>
<actor id="3">Michael Caine</actor>
</actors>
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="4">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
</root>
To replicate the type of xml (or html to be more precise) I am trying to parse, I have got one of the singer attributes repeated.
So I am trying to use contains to find the foo:singer id = 4 and contains "Tom Waits".
From what I have read and examples I have seen, you can use this type of expression:
.//*[#id =4 and //foo:singer[contains(text(),'Tom Waits')]]/text()
However, this returns both Tom Waits and BB King.
If I use these two expressions separately, they get the expected results, so not sure exactly if/how they can be combined.
Many thanks if you can assist me.
Andrew

Be sure to pay attention to context. There's no need for the nested predicate.
.//*[#id =4 and contains(.,'Tom Waits')]/text()
So I am trying to use contains to find the foo:singer id = 4 and contains "Tom Waits".
Since you're using //foo:singer for the contains test, the entire document is in context so it's always true.

Use
//foo:singer[contains(text(),'Tom Waits')]/text()

Related

ms365 power automate online: how to get xml attribute values for a particular text value

I am looking to extract only the wow:rank values at which the text in between the > and < starts with BOOM-. I was able to get the text values in between the > and < that start with BOOM- using filters in power automate 365 online, but i just couldn't simultaneously get the wow:rank attribute values. Any help would be greatly appreciated!
<category wow:rank="0">EIGEGenderEquality</category>
<category wow:rank="9" >BOOM-A0304-DiscriminatoryPracticesOnTheBasisOfSex</category>
<category wow:rank="0">EIGEGenderEquality</category>
<category wow:rank="5" url = "www.google.ca">BOOM-EIGEGenderEquality</category>
Have a great day! :)
It looks as though the PowerAutomate xpath expression (coupled with xml) doesn't handle the extraction of attributes. It merely provides the ability to filter on all elements based on the given xquery expression provided.
In that case, you can use the below expression on each element within your XML to get your result ...
replace(split(split(variables('XML String'), 'wow:rank="')[1], '"')[0], '"', '')
... that was applied in the Initialize wow:rank step below.
Outside of that, you could make it a lot more complex by making an API call to a service that does it for you but it's potentially a lot more work and not really worth it.
If your scenario was more complex, then I'd look into doing that.
You can get to attributes easily by using the # symbol, like #wow:rank.
In a "Apply to each", you would use:
xpath(xml(item()), 'string(/category/#wow:rank)')
to loop through each of them.

Can't select XML attributes with Oxygen XQuery implementation; Oxygen XPath emits result

I learned that every Xpath expression is also a valid Xquery expression. I'm using Oxygen 16.1 with this sample XML:
<actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors>
My expression is:
//actor/#id
When I evaluate this expression in Oxygen with Xpath 3.0, I get exactly what I expect:
15
38
However, when I evaluate this expression with Xquery 3.0 (also 1.0), I get the message: "Your query returned an empty sequence.
Can anyone provide any insight as to why this is, and how I can write the equivalent Xquery statement to get what the Xpath statement did above?
Other XQuery implementations do support this query
If you want to validate that your query (as corrected per discussion in comments) does in fact work with other XQuery implementations when entered exactly as given in the question, you can run it as follows (tested in BaseX):
declare context item := document { <actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors> };
//actor/#id
Oxygen XQuery needs some extra help
Oxygen XML doesn't support serializing attributes, and consequently discards them from a result sequence when that sequence would otherwise be provided to the user.
Thus, you can work around this with a query such as the following:
//actor/#id/string(.)
data(//actor/#id)
Below applies to a historical version of the question.
Frankly, I would not expect //actors/#id to return anything against that data with any valid XPath or XQuery engine, ever.
The reason is that there's only one place you're recursing -- one // -- and that's looking for actors. The single / between the actors and the #id means that they need to be directly connected, but that's not the case in the data you give here -- there's an actor element between them.
Thus, you need to fix your query. There are numerous queries you could write that would find the data you wanted in this document -- knowing which one is appropriate would require more information than you've provided:
//actor/#id - Find actor elements anywhere, and take their id attribute values.
//actors/actor/#id - Find actors elements anywhere; look for actor elements directly under them, and take the id attribute of such actor elements.
//actors//#id - Find all id attributes in subtrees of actors elements.
//#id - Find id attributes anywhere in the document.
...etc.

Find HTML Tags in Properties

My current issue is to find HTML-Tags inside of property values. I thought it would be easy to search with a query like /jcr:root/content/xgermany//*[jcr:contains(., '<strong>')] order by #jcr:score
It looks like there is a problem with the chars < and > because this query finds everything which has strong in it's property. It finds <strong>Some Text</strong> but also This is a strong man.
Also the Query Builder API didn't helped me.
Is there a possibility to solve it with a XPath or SQL Query or do I have to iterate through the whole content?
I don't fully understand why it finds This is a strong man as a result for '<strong>', but it sounds like the unexpected behavior comes from the "simple search-engine syntax" for the second argument to jcr:contains(). Apparently the < > are just being ignored as "meaningless" punctuation.
You could try quoting the search term:
/jcr:root/content/xgermany//*[jcr:contains(., '"<strong>"')]
though you may have to tweak that if your whole XPath expression is enclosed in double quotes.
Of course this will not be very robust even if it works, since you're trying to find HTML elements by searching for fixed strings, instead of actually parsing the HTML.
If you have an specific jcr:primaryType and the targeted properties you can do something like this
select * from nt:unstructured where text like '%<strong>%'
I tested it , but you need to know the properties you are intererested in.
This is jcr-sql syntax
Start using predicates like a champ this way all of this will make sense to you!
HTML Encode <strong>
HTML Decimal <strong>
Query builder is your friend:
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%3Cstrong%3E%25
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%26lt%3Bstrong%26gt%3B%25
XPath:
/jcr:root/content/geometrixx//element(*, nt:unstructured)
[
jcr:like(#text, '%<strong>%')
]
SQL2 (already covered... NASTY YUK..)
SELECT * FROM [nt:unstructured] AS s WHERE ISDESCENDANTNODE([/content/geometrixx]) and text like '%<strong>%'
Although I'm sure it's entirely possible with a string of predicates, it's possibly heading down the wrong route. Ideally it would be better to parse the HTML when it is stored or published.
The required information would be stored on simple properties on the node in question. The query will then be a lot simpler with just a property = value query, than lots of overly complex query syntax.
It will probably be faster too.
So if you read in your HTML with something like HTMLClient and then parse it with a OSGI service, that can accurately save these properties for you. Every time the HTML is changed the process would update these properties as necessary. Just some thoughts if your SQL is getting too much.

Retrieve an xpath text contains using text()

I've been hacking away at this one for hours and I just can't figure it out. Using XPath to find text values is tricky and this problem has too many moving parts.
I have a webpage with a large table and a section in this table contains a list of users (assignees) that are assigned to a particular unit. There is nearly always multiple users assigned to a unit and I need to make sure a particular user is assigned to any of the units on the table. I've used XPath for nearly all of my selectors and I'm half way there on this one. I just can't seem to figure out how to use contains with text() in this context.
Here's what I have so far:
//td[#id='unit']/span [text()='asdfasdfasdfasdfasdf (Primary); asdfasdfasdfasdfasdf, asdfasdfasdfasdf; 456, 3456'; testuser]
The XPath Query above captures all text in the particular section I am looking at, which is great. However, I only need to know if testuser is in that section.
text() gets you a set of text nodes. I tend to use it more in a context of //span//text() or something.
If you are trying to check if the text inside an element contains something you should use contains on the element rather than the result of text() like this:
span[contains(., 'testuser')]
XPath is pretty good with context. If you know exactly what text a node should have you can do:
span[.='full text in this span']
But if you want to do something like regular expressions (using exslt for example) you'll need to use the string() function:
span[regexp:test(string(.), 'testuser')]

XPath concat multiple nodes

I'm not very familiar with xpath. But I was working with xpath expressions and setting them in a database. Actually it's just the BAM tool for biztalk.
Anyway, I have an xml which could look like:
<File>
<Element1>element1<Element1>
<Element2>element2<Element2>
<Element3>
<SubElement>sub1</SubElement>
<SubElement>sub2</SubElement>
<SubElement>sub3</SubElement>
<Element3>
</File>
I was wondering if there is a way to use an xpath expression of getting all the SubElements concatted? At the moment, I am using:
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement']
This works if it only has one index. But apparently my xml sometimes has more nodes, so it gives NULL. I could just use
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement'][0]
but I need all the nodes. Is there a way to do this?
Thanks a lot!
Edit: I changed the XML, I was wrong, it's different, it should look like this:
<item>
<element1>el1</element1>
<element2>el2</element2>
<element3>el3</element3>
<element4>
<subEl1>subel1a</subEl1>
<subEl2>subel2a</subEl2>
</element4>
<element4>
<subEl1>subel1b</subEl1>
<subEl2>subel2b</subEl2>
</element4>
</item>
And I need to have a one line code to get a result like: "subel2a subel2b";
I need the one line because I set this xpath expression as an xml attribute (not my choice, it's specified). I tried string-join but it's not really working.
string-join(/file/Element3/SubElement, ',')
/File/Element3/SubElement will match all of the SubElement elements in your sample XML. What are you using to evaluate it?
If your evaluation method is subject to the "first node rule", then it will only match the first one. If you are using a method that returns a nodeset, then it will return all of them.
You can get all SubElements by using:
//SubElement
But this won't keep them grouped together how you want. You will want to do a query for all elements that contain a SubElement (basically do a search for the parent of any SubElements).
//parent::SubElement
Once you have that, you could (depending on your programming language) loop through the parents and concatenate the SubElements.

Resources