How to ignore folder names in JCR when searching by XPath

I want an XPath search expression in Jackrabbit that returns all subfolders of the third-level folders, regardless of the names of the first- and second-level folders.
I am thinking of something like this: "/jcr:root/*/*/element(*,nt:folder)"
but this is wrong: I am getting an empty list. Does anyone have something like this?

The string '/jcr:root/*/*/element(*,nt:folder)' is a valid XPath query.
But reading further in your description, you say you want all subfolders of the 3rd-level folders, so you may be missing one level of depth in your query, which would explain the empty list:
3rd-level folders are resolved as jcr:root -> level 1 -> level 2 -> level 3, so your query should target /jcr:root/*/*/* instead of /jcr:root/*/*
A subfolder is then just a direct child of that path, so the last step of your query, /element(*,nt:folder), is fine as it is.
So the following query may do it: /jcr:root/*/*/*/element(*,nt:folder).
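The depth counting can be checked outside a repository. The following is only a sketch using Python's standard xml.etree.ElementTree on a made-up tree: the <root> element stands in for jcr:root, and the "type" attribute is an assumed stand-in for the nt:folder node type, since ElementTree has no element(*, nt:folder) test.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a repository: <root> plays the part of jcr:root,
# and the "type" attribute mimics a node type such as nt:folder.
TREE = ET.fromstring("""
<root>
  <a type="folder">
    <b type="folder">
      <c type="folder">
        <d type="folder"/>
        <e type="file"/>
      </c>
    </b>
  </a>
</root>
""")

# Two wildcards reach level 2, so the final step selects level-3 folders.
level3_folders = [n.tag for n in TREE.findall("./*/*/*[@type='folder']")]

# Three wildcards reach level 3, so the final step selects its subfolders.
subfolders_of_level3 = [n.tag for n in TREE.findall("./*/*/*/*[@type='folder']")]

print(level3_folders)        # ['c']
print(subfolders_of_level3)  # ['d']
```

Each wildcard step consumes exactly one level, which is why the query for subfolders of level 3 needs three of them before the folder test.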

Related

Xpath expression pulling multiple items despite specifying item with [ ]

I am trying to write an XPath expression which can return the URL associated with the next page of a search.
The URL which leads to the next page of the search is always the href in the a tag following the tag span class="navCurrentPage". I have been trying to use a following-sibling step to pull the next URL. My search in the Chrome console is:
$x('//span[@class="navCurrentPage"][1]/following-sibling::a/@href[1]')
I thought that by specifying @href[1] I would only get back one URL (thinking the [1] chooses the first element in the list), but instead Chrome (and Scrapy) return four URLs. I don't understand why. Please help me understand how to select the one URL that I am looking for.
Here is the URL where you can find the HTML giving me trouble:
https://www.yachtworld.com/core/listing/cache/searchResults.jsp?cit=true&slim=quick&ybw=&sm=3&searchtype=advancedsearch&Ntk=boatsEN&Ntt=&is=false&man=&hmid=102&ftid=101&enid=0&type=%28Sail%29&fromLength=35&toLength=50&fromYear=1985&toYear=2010&fromPrice=&toPrice=&luom=126&currencyid=100&city=&rid=100&rid=101&rid=104&rid=105&rid=107&rid=108&rid=112&rid=114&rid=115&rid=116&rid=128&rid=130&rid=153&pbsint=&boatsAddedSelected=-1
Thank you for the help.
Operator precedence: //x[1] means /descendant-or-self::node()/child::x[1], which finds every descendant x that is the first x child of its parent. You want (//x)[1], which finds the first node among all the descendants named x.
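The difference can be seen on a tiny document. This is a sketch with Python's standard xml.etree.ElementTree, whose XPath subset applies [1] per parent just like //x[1]; it has no parenthesised (//x)[1] form, so the first node overall is taken by indexing the result list instead. The sample markup and hrefs are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample: two parents, each with <a> children.
DOC = ET.fromstring("""
<div>
  <p><a href="/one"/><a href="/two"/></p>
  <p><a href="/three"/></p>
</div>
""")

# Like //a[1]: the predicate is applied per parent, so every <a> that is
# the first <a> child of its own parent is returned.
per_parent = [a.get("href") for a in DOC.findall(".//a[1]")]

# Like (//a)[1]: collect all <a> in document order, then take the first.
first_overall = DOC.findall(".//a")[0].get("href")

print(per_parent)     # ['/one', '/three']
print(first_overall)  # /one
```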
An XPath index applies to every matching node, so if you want only the first item, take the first instance of the result.
response.xpath('//span[@class="navCurrentPage"][1]/following-sibling::a/@href[1]').extract_first()
Just add .extract_first() or .get() to fetch the first item.
See the Scrapy documentation.
I've found this very helpful to make sure you have the brackets in the right place:
What is the XPath expression to find only the first occurrence?
Also, note that XPath positions are 1-based, so the first occurrence is [1]; [0] only applies when indexing the returned list in the host language.

Choosing specific element in XPath

I have 2 elements with the same name, "reason". When I use //*:reason/text() it gives me both of the elements, but I need the first one (not the one inside "details"). Please help.
<xml xmlns:gob="http://osb.yes.co.il/GoblinAudit">
<fault>
<ctx:fault xmlns:ctx="http://www.bea.com/wli/sb/context">
<ctx:errorCode>BEA-382500</ctx:errorCode>
<ctx:reason>OSB Service Callout action received SOAP Fault response</ctx:reason>
<ctx:details>
<ns0:ReceivedFaultDetail xmlns:ns0="http://www.bea.com/wli/sb/stages/transform/config">
<ns0:faultcode xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">soapenv:Server</ns0:faultcode>
<ns0:faultstring>BEA-380001: Internal Server Error</ns0:faultstring>
<ns0:detail>
<con:fault xmlns:con="http://www.bea.com/wli/sb/context">
<con:errorCode>BEA-380001</con:errorCode>
<con:reason>Internal Server Error</con:reason>
<con:location>
<con:node>RouteTo_FinancialControllerBS</con:node>
<con:path>response-pipeline</con:path>
</con:location>
</con:fault>
</ns0:detail>
</ns0:ReceivedFaultDetail>
</ctx:details>
<ctx:location>
<ctx:node>PipelinePairNode2</ctx:node>
<ctx:pipeline>PipelinePairNode2_request</ctx:pipeline>
<ctx:stage>set maintain offer</ctx:stage>
<ctx:path>request-pipeline</ctx:path>
</ctx:location>
</ctx:fault>
</fault>
</xml>
You are using the // abbreviation, which descends into every subtree and finds all occurrences of reason. You can try to be more specific about the subpath:
//fault/*:fault/*:reason/text()
This will only match the outer reason, not the inner one.
"...but i need the first one"
You can use position index to get the first matched reason element :
(//*:reason)[1]/text()
" (not the one inside "details")"
The above can be expressed as finding reason element which doesn't have ancestor details :
//*:reason[not(ancestor::*:details)]/text()
For a large XML document, using a more specific path, i.e. avoiding // at the beginning, would result in a more efficient XPath:
/xml/fault/*:fault/*:reason/text()
But for a small XML, it's just a matter of personal preference, since the improvement is likely to be negligible.
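To compare the broad and the specific path, here is a sketch using Python's standard xml.etree.ElementTree on a trimmed copy of the XML from the question. It assumes Python 3.8 or later, where {*} acts as a namespace wildcard roughly comparable to *:reason; ElementTree has no ancestor axis, so only the path-based variants are shown.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question (unrelated elements removed).
XML = """
<xml>
  <fault>
    <ctx:fault xmlns:ctx="http://www.bea.com/wli/sb/context">
      <ctx:reason>OSB Service Callout action received SOAP Fault response</ctx:reason>
      <ctx:details>
        <con:fault xmlns:con="http://www.bea.com/wli/sb/context">
          <con:reason>Internal Server Error</con:reason>
        </con:fault>
      </ctx:details>
    </ctx:fault>
  </fault>
</xml>
"""
root = ET.fromstring(XML)

# Broad search, like //*:reason: descends everywhere and finds both elements.
all_reasons = [r.text for r in root.findall(".//{*}reason")]

# Specific path, like /xml/fault/*:fault/*:reason: only the outer element.
outer_reason = root.find("./fault/{*}fault/{*}reason").text

print(len(all_reasons))  # 2
print(outer_reason)      # OSB Service Callout action received SOAP Fault response
```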

xpath - matching value of child in current node with value of element in parent

Edit: I think I found the answer, but I'll leave the question open for a bit to see if someone has a correction/improvement.
I'm using xpath in Talend's etl tool. I have xml like this:
<root>
<employee>
<benefits>
<benefit>
<benefitname>CDE</benefitname>
<benefit_start>2/3/2004</benefit_start>
</benefit>
<benefit>
<benefitname>ABC</benefitname>
<benefit_start>1/1/2001</benefit_start>
</benefit>
</benefits>
<dependent>
<benefits>
<benefit>
<benefitname>ABC</benefitname>
</benefit>
</dependent>
When parsing benefits for dependents, I want to get elements present in the employee's
benefit element. So in the example above, I want to get 1/1/2001 for the dependent's
start date. I want 1/1/2001, not 2/3/2004, because the dependent's benefit has benefitname ABC, matching the employee's benefit with the same benefitname.
What xpath, relative to /root/employee/dependent/benefits/benefit, will yield the value of
benefit_start for the benefit under parent employee that has the same benefit name as the
dependent benefit name? (Note I don't know ahead of time what the literal value will be; I can't just look for 'ABC', I have to match whatever value is in the dependent's benefitname element.)
I'm trying:
../../../benefits/benefit[benefitname=??what??]/benefit_start
I don't know how to refer to the current node's ancestor in the middle of
the xpath (since I think "." at the point where I have ??what?? will refer to
the benefit node of the employee/benefits).
EDIT: I think what I want is "current()/benefitname" where the ??what?? is. It seems to work with Saxon; I haven't tried it in the etl tool yet.
Your XML is malformed, and I don't think you've described your situation very well (the XPath you're trying has a bunch of ../../s at the beginning, but you haven't said what the context node is, whether you're iterating through certain nodes, or what).
Supposing the current context node were an employee element, you could select benefit_starts that match dependent benefits with
benefits/benefit[benefitname = ../../dependent/benefits/benefit/benefitname]
/benefit_start
If the current context node is a benefit element in a dependents section, and you want to get the corresponding benefit_start for just the current benefit element, you can do:
../../../benefits/benefit[benefitname = current()/benefitname]/benefit_start
Which is what I think you've already discovered.
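Where current() is not available (it is an XSLT function, so support depends on the tool), the same lookup can be done in two steps from host code. The sketch below uses Python's standard xml.etree.ElementTree on a well-formed variant of the sample, with the missing closing tags added as an assumption; the helper name start_for is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the sample (missing closing tags added).
XML = """
<root>
  <employee>
    <benefits>
      <benefit><benefitname>CDE</benefitname><benefit_start>2/3/2004</benefit_start></benefit>
      <benefit><benefitname>ABC</benefitname><benefit_start>1/1/2001</benefit_start></benefit>
    </benefits>
    <dependent>
      <benefits>
        <benefit><benefitname>ABC</benefitname></benefit>
      </benefits>
    </dependent>
  </employee>
</root>
"""
root = ET.fromstring(XML)
employee = root.find("employee")

def start_for(dep_benefit):
    """Return the employee's benefit_start matching a dependent's benefit."""
    # Step 1: read the name from the dependent's benefit (the "current" node).
    name = dep_benefit.findtext("benefitname")
    # Step 2: look it up among the employee's own benefits.
    # Assumes the name contains no quote characters.
    return employee.findtext(f"benefits/benefit[benefitname='{name}']/benefit_start")

starts = [start_for(b) for b in employee.findall("dependent/benefits/benefit")]
print(starts)  # ['1/1/2001']
```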

Prefix the result of a XPATH query

I use libxmljs to parse some html.
I have an XPath query which uses an "or" to retrieve, in effect, the results of two queries.
Example:
doc.find("//div[contains(@class,'important') or contains(@class,'overdue')]")
This returns all the divs with either important or overdue.
Can I prefix my result set, or otherwise see which result comes from which condition?
The result could be an array with an index for the match: 0 for the first condition and 1 for the second. Is this possible?
Or how else can I find out which result comes from which query condition?
Thanks for any help.
P.S.: this is a simplified example of a sequence of elements which each have an important item, an overdue item, both, or neither, so I cannot go by looking at every second entry, etc.
This is the result I want to get:
message:{},
message:{
.....
important: "some important text",
overdue: "overdue date",
.....
}
There is no way to know which clause of an or XPath query caused a particular result to be included. It's simply not information that's kept around.
You'll either need to do entirely separate queries for important and overdue, or do one large query to get the entire result set (as you are now) and then further test each result's class to find out which one it is.
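The second option, one query followed by a per-result test, can be sketched as follows. The question uses libxmljs; this sketch uses Python's standard xml.etree.ElementTree instead, on invented markup, and the helper name matched_conditions is hypothetical. The substring test mirrors contains(@class, ...) from the original query.

```python
import xml.etree.ElementTree as ET

# Invented sample markup covering: one condition, the other, none, both.
HTML = """
<body>
  <div class="message important">some important text</div>
  <div class="message overdue">overdue date</div>
  <div class="message">plain</div>
  <div class="important overdue">both</div>
</body>
"""
root = ET.fromstring(HTML)

def matched_conditions(div):
    """List which of the two class conditions a div satisfies."""
    cls = div.get("class", "")
    # Substring test, the same semantics as contains(@class, '...').
    return [name for name in ("important", "overdue") if name in cls]

# Keep only divs matching at least one condition, tagged with which ones.
tagged = [
    (div.text, matched_conditions(div))
    for div in root.iter("div")
    if matched_conditions(div)
]

for text, conditions in tagged:
    print(text, conditions)
```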

Sitecore xpath query not working

Query 1: /sitecore/content/FR/Cabinet/New Category/Attributes//*[@@TemplateID = '{95793C69-3E37-4CEB-9AF4-FD88276D85AA}']
Query 2: /sitecore/content/FR/Cabinet/New Category/Child Category 1/Attributes//*[@@TemplateID = '{95793C69-3E37-4CEB-9AF4-FD88276D85AA}']
Query 1 works with no problem. Query 2 doesn't work; it says expected ::. What's the difference, other than one being one level deeper? It also happens that /Child Category 1/ doesn't have any children in its Attributes folder, while /New Category/Attributes does.
UPDATE: so it seems that "Child" is a keyword in XPath. What is the workaround here?
You can escape the category by wrapping it in hashes:
/sitecore/content/FR/Cabinet/New Category/#Child Category 1#/ ...
This also comes in handy when selecting items with fields that contain spaces:
... //*[@#My Spaced Out Field# = '%Hey Yo!%']
If you're building the query dynamically, you may want to consider escaping each token separately, using the .Axes API, or selecting the items using Lucene.
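For the dynamic case, escaping each token separately might look like the sketch below. The helper escape_sitecore_path is hypothetical, not part of any Sitecore API, and it rests on two assumptions: that wrapping any item name in hashes is harmless, and that the input is a plain item path with no predicates or axes (a segment such as *[@@TemplateID = '...'] must be appended afterwards, not passed through the helper).

```python
def escape_sitecore_path(path: str) -> str:
    """Wrap path segments that contain spaces or hyphens in #...#.

    Hypothetical helper: expects a plain item path only, without
    predicates, axes, or wildcards.
    """
    escaped = []
    for segment in path.split("/"):
        already_escaped = segment.startswith("#") and segment.endswith("#")
        needs_escape = (" " in segment or "-" in segment) and not already_escaped
        escaped.append(f"#{segment}#" if needs_escape else segment)
    return "/".join(escaped)


item_path = "/sitecore/content/FR/Cabinet/New Category/Child Category 1/Attributes"
print(escape_sitecore_path(item_path))
# /sitecore/content/FR/Cabinet/#New Category#/#Child Category 1#/Attributes
```

Escaping every segment that contains a space, rather than only those that start with a keyword such as "Child", avoids having to maintain a keyword list.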
