XPath expression evaluation on attributes with any namespace prefix

Could you please help me with this XPath expression evaluation?
I am working on fetching the proxy references. In the XML file, the references are stored in one of two forms.
One form of the XML file has the reference as below:
<con1:service ref="MyProject/ProxyServices/service1"
xsi:type="con2:PipelineRef" xmlns:ref="http://www.bea.com/wli/sb/reference"/>
Here the namespaces in the XML file are:
xmlns:con1="http://www.bea.com/wli/sb/stages/config"
xmlns:con2="http://www.bea.com/wli/sb/pipeline/config"
The other form of the XML file has the reference as below:
<con1:service ref="MyProject/ProxyServices/service2"
xsi:type="ref:ProxyRef" xmlns:ref="http://www.bea.com/wli/sb/reference"/>
Here the namespaces in the XML file are:
xmlns:con1="http://www.bea.com/wli/sb/stages/config"
xmlns:ref="http://www.bea.com/wli/sb/reference"
I have used this XPath expression, but it does not fetch the reference service values. Could you please tell me what is wrong with it?
"//service[@type= @*[local-name() ='ProxyRef' or @type=@*[local-name() ='PipelineRef']]/@ref"
When I write it like this it works, but the namespace prefix keeps changing when there are multiple references in the XML file:
"//service[@type='ref:ProxyRef' or @type='con:PipelineRef' or @type='con1:PipelineRef' or @type='con2:PipelineRef' or @type='con3:PipelineRef' ...@type='con20:PipelineRef' ]/@ref";
Basically, the namespace prefix in the type attribute value for PipelineRef keeps changing from con to con(n). I am looking for something like @type='*:PipelineRef' or @type='con*:PipelineRef', or for the best way to fetch the ref attribute value of the service element.
Thanks in advance.

Try using contains() like so:
//service[contains(@type,':ProxyRef') or contains(@type,':PipelineRef')]
Another alternative would be the ends-with() function, which is more precise for this purpose than contains(). However, ends-with() isn't available in XPath 1.0, so you may need to emulate it yourself (feasible, but I find the resulting XPath less intuitive).
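For reference, the usual XPath 1.0 emulation of ends-with() is substring(@type, string-length(@type) - string-length(':ProxyRef') + 1) = ':ProxyRef'. If the expression is evaluated from a host language, the suffix check can also be done there. Below is a minimal Python sketch using only the standard library. The wrapping root element, the third service entry, and the name service_refs are assumptions made for the example, and the xsi prefix is assumed to be bound to the standard XMLSchema-instance namespace.

```python
import xml.etree.ElementTree as ET

# Standard namespace URI for xsi:type (assumed to be what the file declares).
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical sample: the root element and the third entry are invented.
SAMPLE = """<root xmlns:con1="http://www.bea.com/wli/sb/stages/config"
      xmlns:con2="http://www.bea.com/wli/sb/pipeline/config"
      xmlns:ref="http://www.bea.com/wli/sb/reference"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <con1:service ref="MyProject/ProxyServices/service1" xsi:type="con2:PipelineRef"/>
  <con1:service ref="MyProject/ProxyServices/service2" xsi:type="ref:ProxyRef"/>
  <con1:service ref="MyProject/BusinessServices/other" xsi:type="ref:BusinessServiceRef"/>
</root>"""


def service_refs(xml_text, type_names=("ProxyRef", "PipelineRef")):
    """Return the ref values of service elements whose xsi:type ends in one of type_names."""
    root = ET.fromstring(xml_text)
    refs = []
    for elem in root.iter():
        # ElementTree stores qualified names as "{namespace-uri}local".
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name != "service":
            continue
        xsi_type = elem.get(f"{{{XSI}}}type", "")
        # Ignore whatever prefix precedes the colon (con, con1, ref, ...).
        type_local = xsi_type.rsplit(":", 1)[-1]
        if type_local in type_names and elem.get("ref") is not None:
            refs.append(elem.get("ref"))
    return refs


print(service_refs(SAMPLE))
# ['MyProject/ProxyServices/service1', 'MyProject/ProxyServices/service2']
```

Comparing the part after the last colon avoids the false positives that contains() could give for a type name such as ProxyRefList.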

Related

XPath to retrieve XML tag value without namespace and prefix

I have the following XML -
<d><m:properties xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
<d:AllTexts/>
<d:BomFlag/>
<d:OrderNumber>9489</d:OrderNumber>
<d:LineNumber>000000</d:LineNumber>
<d:VcFlag>Y</d:VcFlag>
<d:PricingFlag/>
<d:TextType>H</d:TextType>
<d:TextId>ZC01</d:TextId>
<d:TextLineNo>1</d:TextLineNo>
<d:TextLine>ecom header text 1</d:TextLine>
and I am trying to retrieve the TextLine node list based on TextId = ZC01:
<TextLine>ecom header text 1</TextLine>
When I apply the XPath //m:properties[d:TextId = 'ZC01']/d:TextLine
I get this output:
<d:TextLine xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">ecom header text 1</d:TextLine>
How can I remove the prefix and namespace? I tried using local-name(), but that didn't work.
Maybe I used it the wrong way.
Thank you for your help!
Sugata
XPath is a selection language: it can only retrieve nodes that are actually there, it can't change them in any way. If the selected element has a prefix and namespace in the original, then it will have a prefix and namespace in the result.
However, you need to distinguish what the XPath selects (a node) from the way the result is displayed. This depends on the application that is evaluating the XPath. The two popular ways of displaying a node selected by an XPath expression are (a) by serialising the node as XML (which is what we see in your case), and (b) by showing a path to the selected node, such as /d/m:properties/d:TextLine. You haven't told us how you are evaluating the XPath expression or displaying its result, and you may have options here.
But perhaps you should consider XSLT or XQuery, which (unlike XPath) allow you to construct new XML that differs from your original.
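If the XPath is evaluated from a general-purpose language, the host code can also build the new, namespace-free element itself. The following is only a sketch in Python with the standard library; the entry wrapper, the shortened property list, and the function name text_lines_without_namespace are assumptions, not part of the original XML.

```python
import xml.etree.ElementTree as ET

NS = {
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Assumed, shortened, well-formed version of the XML in the question.
SAMPLE = """<entry>
<m:properties xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
              xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
<d:OrderNumber>9489</d:OrderNumber>
<d:TextId>ZC01</d:TextId>
<d:TextLine>ecom header text 1</d:TextLine>
</m:properties>
</entry>"""


def text_lines_without_namespace(xml_text, text_id):
    """Select d:TextLine for the given d:TextId and re-create it without a namespace."""
    root = ET.fromstring(xml_text)
    results = []
    for props in root.iterfind(".//m:properties", NS):
        # Same condition as the predicate [d:TextId = 'ZC01'] in the XPath.
        if props.findtext("d:TextId", namespaces=NS) != text_id:
            continue
        for node in props.iterfind("d:TextLine", NS):
            # The selected node keeps its namespace; build a new element instead.
            plain = ET.Element("TextLine")
            plain.text = node.text
            results.append(ET.tostring(plain, encoding="unicode"))
    return results


print(text_lines_without_namespace(SAMPLE, "ZC01"))
# ['<TextLine>ecom header text 1</TextLine>']
```

The selection step is unchanged; only the construction of the output element happens outside XPath, which is the same division of labour XSLT or XQuery would give you.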

Expression to read XML attribute value in Logic Apps

In a Logic Apps For Each, I am iterating over part of an XML document that has, in part:
<Part ref="1">
I want to read out the attribute value only. In this case, "1". I have tried:
xpath(xml(item()),'Part/@ref')
and I get
["ref=\"1\""]
With
first(xpath(xml(item()),'Part/@ref'))
I get
ref="1"
I have tried incorporating string() and value() functions to no avail. What is the proper way to read out just the value?
Try this XPath expression:
'string(//Part/@ref)'
You have to use this expression in Code View:
@xpath(xml(item()), 'string(/*[local-name()=\"Part\" and namespace-uri()=\"\"]/@*[local-name()=\"ref\" and namespace-uri()=\"\"])')

Find HTML Tags in Properties

My current issue is to find HTML tags inside property values. I thought it would be easy to search with a query like /jcr:root/content/xgermany//*[jcr:contains(., '<strong>')] order by @jcr:score
It looks like there is a problem with the characters < and >, because this query finds everything that has strong in its property. It finds <strong>Some Text</strong> but also This is a strong man.
The Query Builder API didn't help me either.
Is it possible to solve this with an XPath or SQL query, or do I have to iterate through the whole content?
I don't fully understand why it finds This is a strong man as a result for '<strong>', but it sounds like the unexpected behavior comes from the "simple search-engine syntax" for the second argument to jcr:contains(). Apparently the < > are just being ignored as "meaningless" punctuation.
You could try quoting the search term:
/jcr:root/content/xgermany//*[jcr:contains(., '"<strong>"')]
though you may have to tweak that if your whole XPath expression is enclosed in double quotes.
Of course this will not be very robust even if it works, since you're trying to find HTML elements by searching for fixed strings, instead of actually parsing the HTML.
If you have a specific jcr:primaryType and know the targeted properties, you can do something like this:
select * from nt:unstructured where text like '%<strong>%'
I tested it, but you need to know the properties you are interested in.
This is JCR-SQL syntax.
Start using predicates like a champ; this way all of this will make sense to you!
HTML Encode &lt;strong&gt;
HTML Decimal &#60;strong&#62;
Query builder is your friend:
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%3Cstrong%3E%25
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%&lt;strong&gt;%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%26lt%3Bstrong%26gt%3B%25
XPath:
/jcr:root/content/geometrixx//element(*, nt:unstructured)
[
jcr:like(@text, '%<strong>%')
]
SQL2 (already covered... NASTY YUK..)
SELECT * FROM [nt:unstructured] AS s WHERE ISDESCENDANTNODE([/content/geometrixx]) and text like '%<strong>%'
Although I'm sure it's entirely possible with a string of predicates, it's possibly heading down the wrong route. Ideally it would be better to parse the HTML when it is stored or published.
The required information would be stored in simple properties on the node in question. The query then becomes a lot simpler: just a property = value query instead of lots of overly complex query syntax.
It will probably be faster too.
So you could read in your HTML with something like HTMLClient and then parse it with an OSGi service that can accurately save these properties for you. Every time the HTML is changed, the process would update these properties as necessary. Just some thoughts if your SQL is getting too much.
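To illustrate why parsing is more reliable than a fixed-string search, here is a small sketch. It is written in Python with the standard-library html.parser purely for illustration (an OSGi service would be Java), and the names TagCollector and html_tags_in are hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of all start tags seen in a fragment."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lower case.
        self.tags.add(tag)


def html_tags_in(value):
    """Return the set of HTML tag names that occur in a property value."""
    collector = TagCollector()
    collector.feed(value)
    collector.close()
    return collector.tags


print("strong" in html_tags_in("<strong>Some Text</strong>"))  # True
print("strong" in html_tags_in("This is a strong man"))        # False
```

The resulting set could be stored as a simple multi-value property on the node, so that the later query only has to compare a property to a value.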

Getting attribute via XPath query successful in browser, but not in Robot Framework

I have an XPath query that I use to get the height of a certain HTML element. It returns exactly the desired value when I execute it in Chrome via the XPath Helper plugin.
//*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"]/@height
However, when I use the same query via the Get Element Attribute keyword in Robot Framework
Get Element Attribute    //*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"]/@height
... then I get an InvalidSelectorException about this XPath.
InvalidSelectorException: Message: u'invalid selector: Unable to locate an
element with the xpath expression `//*/div[@class="BarChart"]/*[name()="svg"]/*
[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"]/`
So, Robot Framework or Selenium removed the @ sign and everything after it. I thought it was an escaping problem and added and removed some slashes before the @height, but without success. I also tried wrapping the query in the string() function, but this was also unsuccessful.
Does somebody have an idea how to prevent my XPath query from getting broken?
It looks like you can't include the attribute axis in the XPath itself when you're using Robot. You need to retrieve the element by XPath, and then specify the attribute name outside that. It seems like the syntax is something like this:
Get Element Attribute    xpath=(//*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"])@height
or perhaps (I've never used Robot):
Get Element Attribute    xpath=(//*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"])[1]@height
This documentation says
attribute_locator consists of element locator followed by an @ sign and attribute name, for example "element_id@class".
so I think what I've posted above is on the right track.
You are correct in your observation that the keyword seems to remove everything after the final @. More precisely, it uses the @ to separate the element locator from the attribute name, and does this by splitting the string at that final @ character.
No amount of escaping will solve the problem, as the code isn't doing any parsing at this point. This is the exact code (as of this writing...) that performs that operation:
def _parse_attribute_locator(self, attribute_locator):
    parts = attribute_locator.rpartition('@')
    ...
The simple solution is to drop that trailing slash, so your XPath will look like this:
//*/div[@class="BarChart"]/... and @class="bar bar1"]@height
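The effect of that split can be reproduced in plain Python. The sketch below only mirrors the rpartition call quoted above; the function name split_attribute_locator and the shortened XPath strings are made up for the illustration.

```python
def split_attribute_locator(attribute_locator):
    """Split at the LAST '@', the way str.rpartition does."""
    element_locator, _, attribute_name = attribute_locator.rpartition("@")
    return element_locator, attribute_name


# Shortened, hypothetical versions of the locator from the question.
with_slash = '//*[name()="rect" and @class="bar bar1"]/@height'
without_slash = '//*[name()="rect" and @class="bar bar1"]@height'

print(split_attribute_locator(with_slash))
# ('//*[name()="rect" and @class="bar bar1"]/', 'height')
print(split_attribute_locator(without_slash))
# ('//*[name()="rect" and @class="bar bar1"]', 'height')
```

In the first case the element locator is left with a dangling slash, which matches the truncated expression shown in the InvalidSelectorException; in the second it is a complete XPath. The earlier @class inside the predicate is unaffected because only the final @ is used as the separator.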

XPath concat multiple nodes

I'm not very familiar with XPath, but I have been working with XPath expressions and storing them in a database. Actually, it's just the BAM tool for BizTalk.
Anyway, I have an XML document which could look like:
<File>
<Element1>element1</Element1>
<Element2>element2</Element2>
<Element3>
<SubElement>sub1</SubElement>
<SubElement>sub2</SubElement>
<SubElement>sub3</SubElement>
</Element3>
</File>
I was wondering if there is a way to use an XPath expression to get all the SubElements concatenated. At the moment, I am using:
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement']
This works if there is only one SubElement. But apparently my XML sometimes has more nodes, in which case it gives NULL. I could just use
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement'][0]
but I need all the nodes. Is there a way to do this?
Thanks a lot!
Edit: I changed the XML, I was wrong, it's different, it should look like this:
<item>
<element1>el1</element1>
<element2>el2</element2>
<element3>el3</element3>
<element4>
<subEl1>subel1a</subEl1>
<subEl2>subel2a</subEl2>
</element4>
<element4>
<subEl1>subel1b</subEl1>
<subEl2>subel2b</subEl2>
</element4>
</item>
And I need a one-line expression that gives a result like "subel2a subel2b".
It has to be one line because I set this XPath expression as an XML attribute (not my choice, it's specified). I tried string-join, but it's not really working.
string-join(/file/Element3/SubElement, ',')
/File/Element3/SubElement will match all of the SubElement elements in your sample XML. What are you using to evaluate it?
If your evaluation method is subject to the "first node rule", then it will only match the first one. If you are using a method that returns a nodeset, then it will return all of them.
You can get all SubElements by using:
//SubElement
But this won't keep them grouped together the way you want. Instead, query for all elements that contain a SubElement (that is, the parent of any SubElement):
//SubElement/parent::*
Once you have that, you could (depending on your programming language) loop through the parents and concatenate the SubElements.
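As a rough illustration of doing the concatenation in the host language, here is a Python sketch based on the edited XML from the question. It assumes the expression can be evaluated from code at all, which may not hold inside the BAM tool, and the function name join_texts is hypothetical. Note also that string-join() is an XPath 2.0 function, so an XPath 1.0 engine will not accept it.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the edited XML from the question.
SAMPLE = """<item>
<element1>el1</element1>
<element4><subEl1>subel1a</subEl1><subEl2>subel2a</subEl2></element4>
<element4><subEl1>subel1b</subEl1><subEl2>subel2b</subEl2></element4>
</item>"""


def join_texts(xml_text, path, separator=" "):
    """Concatenate the text of every node matched by path, in document order."""
    root = ET.fromstring(xml_text)
    return separator.join(node.text or "" for node in root.iterfind(path))


print(join_texts(SAMPLE, "./element4/subEl2"))  # subel2a subel2b
```

The path selects every matching node rather than only the first, and the joining that XPath 1.0 cannot express is left to the surrounding code.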

Resources