The following xpath query gets nodes except where the ancestor is a particular type:
(/def:Image|…|//def:TextBox)[not(ancestor::clpm:EditableText)]
However, I want to be able to exclude all nodes that have an ancestor that is in the clpm namespace.
Can't work it out guys, any ideas?
Thanks
Use the following as predicate:
not(ancestor::*[starts-with(name(),'clpm:')])
Do note, however, that namespace and prefix are quite different things. In a single XML document many different prefixes may be bound to the same namespace and a single prefix may be bound (redefined) to more than one namespace.
In your question you say namespace, when you mean prefix.
The XPath expression above is true if the current node doesn't have any ancestors with prefix clpm.
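To make the prefix versus namespace distinction concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The sample document, the URI urn:example:clpm and the helper name are made up for illustration; none of them come from the question. In XPath itself, a test on the namespace rather than the prefix would look something like not(ancestor::*[namespace-uri()='urn:example:clpm']), with the real URI substituted.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the prefixes "clpm" and "x" are both bound to the
# same (made-up) namespace URI, so a test on the prefix would miss <x:Note>.
DOC = """
<root xmlns:clpm="urn:example:clpm" xmlns:x="urn:example:clpm"
      xmlns:def="urn:example:def">
  <def:Image id="free"/>
  <clpm:EditableText>
    <def:TextBox id="inside-clpm"/>
  </clpm:EditableText>
  <x:Note>
    <def:TextBox id="inside-x"/>
  </x:Note>
</root>
"""

CLPM_NS = "urn:example:clpm"  # assumed URI, not taken from the question


def without_ancestor_in_namespace(root, tags, ns_uri):
    """Return elements whose tag is in `tags` and that have no ancestor in `ns_uri`."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    selected = []
    for elem in root.iter():
        if elem.tag not in tags:
            continue
        ancestor = parent_of.get(elem)
        excluded = False
        while ancestor is not None:
            # ElementTree stores names as "{namespace-uri}local-name".
            if ancestor.tag.startswith("{" + ns_uri + "}"):
                excluded = True
                break
            ancestor = parent_of.get(ancestor)
        if not excluded:
            selected.append(elem)
    return selected


root = ET.fromstring(DOC)
wanted = {"{urn:example:def}Image", "{urn:example:def}TextBox"}
result_ids = [e.get("id") for e in without_ancestor_in_namespace(root, wanted, CLPM_NS)]
print(result_ids)
```

Because the check compares namespace URIs, the TextBox under x:Note is excluded as well, even though its ancestor does not use the clpm prefix.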
For reasons out of scope for this question I need to be able to handle multiple xml documents of the same structure but belonging to different namespaces (don't ask).
To achieve this I've become very accustomed to using an xpath like the following for many of my value selections:
//*[local-name()='apple']/*[local-name()='flavor']/text()
My lack of understanding of predicates is preventing me from selecting a node's value based upon a sibling node's value. Consider the following xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fruit>
<apple>
<kind>Red Delicious</kind>
<flavor>starchy</flavor>
</apple>
<apple>
<kind>Granny Smith</kind>
<flavor>tart</flavor>
</apple>
<apple>
<kind>Pink Lady</kind>
<flavor>sweet</flavor>
</apple>
</fruit>
Let's say I want to write an xpath that will select the flavor of a Granny Smith apple. While I would normally do something like:
//apple[kind/text()='Granny Smith']/flavor/text()
I cannot figure out how to merge the concept of utilizing local-name() to be namespace agnostic while still selecting a node based upon a sibling's value.
In short, what is the xpath necessary to return "tart" regardless of what namespace the input fruit xml document belongs to?
I need to be able to handle multiple xml documents of the same
structure but belonging to different namespaces (don't ask)
My preferred way to handle this is to first transform the data to use a single namespace, and then do the transformation proper. Doing it this way (a) keeps the real transformation much simpler, (b) puts all the namespace conversion logic in one place, and (c) makes the namespace conversion logic reusable: you can use the same transformation regardless of how the data will subsequently be used.
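As a rough illustration of the "normalize first, then query" idea, here is a small Python sketch with the standard library's xml.etree.ElementTree. The namespace urn:example:fruit-v2 and the helper strip_namespaces are invented for the example. If you would rather stay in pure XPath 1.0, an expression along the lines of //*[local-name()='apple'][*[local-name()='kind']='Granny Smith']/*[local-name()='flavor']/text() should also work.

```python
import xml.etree.ElementTree as ET

# The same fruit structure, placed in a made-up namespace.
DOC = """<fruit xmlns="urn:example:fruit-v2">
  <apple><kind>Red Delicious</kind><flavor>starchy</flavor></apple>
  <apple><kind>Granny Smith</kind><flavor>tart</flavor></apple>
  <apple><kind>Pink Lady</kind><flavor>sweet</flavor></apple>
</fruit>"""


def strip_namespaces(root):
    """Normalization step: rewrite every element name to its local name, in place."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ET.fromstring(DOC))

# After normalization the ordinary, namespace-free path works.
flavor = root.find("apple[kind='Granny Smith']/flavor").text
print(flavor)
```

The query at the end no longer needs to know which namespace the input used; only strip_namespaces does.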
I have a structure like this:
<sv:a>
<sv:b sv:name="one"/>
<sv:b sv:name="two"/>
<sv:c sv:name="exclude"/>
<sv:b sv:name="error"/>
</sv:a>
I am trying to get all a's and b's but exclude from my search the content of any c.
I have this structure so far for my xpath query
//*[not(name()='error') and jcr:contains(*, 'searchInput')]
I want to add something to this to essentially say, "do not give me any node named exclude" or maybe a better way to put it is "exclude any node named exclude from the search". I am not sure if I can do that using the path initially used of //* and just filtering a different way. I know I cannot just say not(name()='exclude') because it is only looking at one level below root and only excludes nodes at that level.
Is there a way to search 1 more level below and exclude certain nodes by their name or search everything in the entire document and exclude those nodes of a particular name?
I'm not sure it matters, but I am working with the CMS Magnolia and trying to make a site search. I hit a limitation using JCR SQL2 and, as far as I have found in my research, cannot do what I am trying to do here with it.
EDIT:
Based on answers and comments, here is what I am looking at now:
//*[not(@sv:name='exclude' or @sv:name='error') and jcr:contains(*, 'searchInput')]
I still seem to be getting the 'exclude' results so I must either not be registering 'sv:' correctly or missing something in the query needed to exclude some of the results from the search.
I want to add something to this to essentially say, "do not give me any node named exclude"
That's easy: Nodes (elements) named exclude can be selected via the self axis,
using *[self::exclude]. Corollary: An element not named exclude is *[not(self::exclude)].
But I think you don't refer to element names. You don't have any <exclude> elements in your input.
You actually seem to refer to attributes.
//*[not(@sv:name = 'error' or @sv:name = 'exclude') and jcr:contains(*, 'searchInput')]
I am trying to get all a's and b's but exclude from my search the content of any c.
You can't, at least not with pure XPath. XPath is a language for selecting nodes out of an XML tree, not for building new trees that are different. An XPath expression can either select the a or not select the a, but it can't give you a new a element that has only some of the children of the original a and not others.
I'm currently trying to figure out how to shorten my extremely long xpath.
//div[@class='m_set_part'][1]/div/div[2]/div[@class='row']/div[@class='col details detail-head']/div[@class='detail-body']/div[2]/div/div[@class='size']/div/div[@class='m_product_finder_size']/ul/li[1]/span[@class='size-btn']/a
This is the one I have right now and it's way too long. The problem is that I need the first node to differentiate between products. Is there a way to shorten it, like
//div[@class='m_set_part']/*/span[@class='size-btn']/a
Or do I have to go through all childnodes to reach the last nodes?
I want to find the size buttons for each product. The only way to differentiate them, I guess, is by adding a [1] or [2] to the m_set_part node.
You are basically correct. As said in the comments, you can use // to select descendant or self nodes. Hence, this will give you all the size links:
//span[@class='size-btn']/a
As you suggest, you can select the specific product using a positional predicate. However, if you prefer you could also use another detail, e.g. the name. This would simply be
//div[@class="m_set_part"][.//label="Vælg"]
to give you the Vælg product.
Now combine them both and you can get the size link for this specific product using
//div[@class="m_set_part"][.//label="Vælg"]//span[@class='size-btn']/a
or, using the positional predicate, it would be
//div[@class="m_set_part"][1]//span[@class='size-btn']/a
Also, please make sure you use a proper namespace as this is an actual XHTML document. One other thing is that you might prefer to use contains(@class, 'm_set_part') instead of @class="m_set_part" and the like, because the query will still work even if they add new CSS classes to this element.
To answer to your question: No you don't have to go through all nodes.
You may use the // descendant-or-self selector to 'skip' zero or more nodes in between the preceding and the next part of the expression. So //div[@class='m_set_part']//span[@class='size-btn']/a might give you exactly what you want. * on the other hand matches any element, but exactly one. Therefore
//div[@class='m_set_part'][1]/*/*[2]/*[@class='row']/*[@class='col details detail-head']/*[@class='detail-body']/*[2]/*/*[@class='size']/*/*[@class='m_product_finder_size']/*/*[1]/*[@class='size-btn']/a
is another way to shorten your original expression. Whether it still returns only the node you are interested in, or more, depends solely on the document you apply the expression to.
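Here is a small Python sketch of the // shortcut, using the standard library's xml.etree.ElementTree. The markup is a made-up, heavily simplified stand-in for the real page. ElementTree only implements a subset of XPath, so the sketch picks the product in Python rather than relying on a positional predicate such as [1].

```python
import xml.etree.ElementTree as ET

# Made-up, heavily simplified stand-in for the product page markup.
PAGE = """<body>
  <div class="m_set_part">
    <div><div class="row"><ul><li>
      <span class="size-btn"><a>S</a></span>
    </li><li>
      <span class="size-btn"><a>M</a></span>
    </li></ul></div></div>
  </div>
  <div class="m_set_part">
    <div><div class="row"><ul><li>
      <span class="size-btn"><a>L</a></span>
    </li></ul></div></div>
  </div>
</body>"""

root = ET.fromstring(PAGE)

# "//" skips the intermediate wrappers, whatever their depth.
all_links = root.findall(".//div[@class='m_set_part']//span[@class='size-btn']/a")

# Pick one product first, then search beneath it only.
products = root.findall(".//div[@class='m_set_part']")
first_product_links = products[0].findall(".//span[@class='size-btn']/a")

print([a.text for a in all_links])
print([a.text for a in first_product_links])
```

The first query returns the size links of every product; the second is limited to the first m_set_part block, which is what the [1] predicate does in the full XPath expressions above.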
I am using Google Docs for web scraping. More specifically, I am using the Google Sheets built-in IMPORTXML function, in which I use XPath to select nodes to scrape data from.
What I am trying to do is basically check if a particular node exists, if YES, select some other random node.
/*IF THIS NODE EXISTS*/
if(exists(//table/tr/td[2]/a/img[@class='special'])){
/*SELECT THIS NODE*/
//table/tr/td[2]/a
}
You don't have logic quite like that in XPath, but you might be able to do something like what you want.
If you want to select //table/tr/td[2]/a but only if it has an img[@class='special'] in it, then you can use //table/tr/td[2]/a[img[@class='special']].
If you want to select some other node in some other circumstance, you could union two paths (the | operator), and just make sure that each has a filter (within []) that is mutually exclusive, like having one be a path and the other be not() of that path. I'd give an example, but I'm not sure what "other random node" you'd want… Perhaps you could clarify?
The key thing is to think of XPath as a querying language, not a procedural one, so you need to be thinking of selectors and filters on them, which is a rather different way of thinking about problems than most programmers are used to. But the fact that the filters don't need to specifically be related to the selector (you can have a filter that starts looking at the root of the document, for instance) leads to some powerful (if hard-to-read) possibilities.
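The "selector plus filter" way of thinking can be sketched in Python with the standard library's xml.etree.ElementTree. The table below is made up for the example. ElementTree does not support nested predicates like a[img[@class='special']], so the filter is written as a list comprehension instead.

```python
import xml.etree.ElementTree as ET

# Made-up table: only the first row's link wraps an <img class="special">.
DOC = """<html><body><table>
  <tr><td>1</td><td><a href="/a"><img class="special"/></a></td></tr>
  <tr><td>2</td><td><a href="/b"><img class="plain"/></a></td></tr>
  <tr><td>3</td><td><a href="/c">text only</a></td></tr>
</table></body></html>"""

root = ET.fromstring(DOC)

# Selector: every link in the second cell of a row.
candidates = root.findall(".//table/tr/td[2]/a")

# Filter: keep a link only if it has an <img class="special"> child,
# mirroring //table/tr/td[2]/a[img[@class='special']].
special_links = [a for a in candidates if a.find("img[@class='special']") is not None]

print([a.get("href") for a in special_links])
```

There is no if statement around the selection: the condition lives inside the filter, and links that fail it simply drop out of the result.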
Use:
/self::node()[//table/tr/td[2]/a/img[@class='special']]//table/tr/td[2]/a
I have some XML that is structured like this:
<whatson>
<productions>
<production>
<category>Film</category>
</production>
<production>
<category>Business</category>
</production>
<production>
<category>Business training</category>
</production>
</productions>
</whatson>
And I need to select every production with a category that doesn't contain "Business" (so just the first production in this example).
Is this possible with XPath? I tried working along these lines but got nowhere:
//production[not(contains(category,'business'))]
XPath queries are case sensitive. Having looked at your example (which, by the way, is awesome, nobody seems to provide examples anymore!), I can get the result you want just by changing "business" to "Business":
//production[not(contains(category,'Business'))]
I have tested this by opening the XML file in Chrome and using the Developer Tools to execute that XPath query, and it gave me just the Film category back.
I need to select every production with a category that doesn't contain "Business"
Although I upvoted @Arran's answer as correct, I would also add this...
Strictly interpreted, the OP's specification would be implemented as
//production[category[not(contains(., 'Business'))]]
rather than
//production[not(contains(category, 'Business'))]
The latter selects every production whose first category child doesn't contain "Business". The two XPath expressions will behave differently when a production has no category children, or more than one.
It doesn't make any difference in practice as long as every <production> has exactly one <category> child, as in your short example XML. Whether you can always count on that being true or not, depends on various factors, such as whether you have a schema that enforces that constraint. Personally, I would go for the more robust option, since it doesn't "cost" much... assuming your requirement as stated in the question is really correct (as opposed to e.g. 'select every production that doesn't have a category that contains "Business"').
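To see where the readings diverge, here is a Python sketch using the standard library's xml.etree.ElementTree. The extra productions (one with no category, one with two) are made up to exercise the edge cases. ElementTree does not implement contains() or not(), so each XPath expression is mimicked by a small hand-written function; these are my reading of the XPath 1.0 semantics, not output from an XPath engine.

```python
import xml.etree.ElementTree as ET

# Made-up edge cases: no category at all, and several categories.
DOC = """<whatson><productions>
  <production id="film"><category>Film</category></production>
  <production id="biz"><category>Business</category></production>
  <production id="none"/>
  <production id="mixed"><category>Film</category><category>Business training</category></production>
</productions></whatson>"""

root = ET.fromstring(DOC)
productions = root.findall(".//production")


def text_of(elem):
    """String value of an element: all descendant text concatenated."""
    return "".join(elem.itertext())


def first_category_lacks(prod, needle):
    """Mimics //production[not(contains(category, needle))]:
    only the first <category> is looked at; a missing one counts as ''."""
    first = prod.find("category")
    return needle not in (text_of(first) if first is not None else "")


def some_category_lacks(prod, needle):
    """Mimics //production[category[not(contains(., needle))]]:
    at least one <category> must exist and not contain the needle."""
    return any(needle not in text_of(c) for c in prod.findall("category"))


def no_category_contains(prod, needle):
    """Mimics //production[not(category[contains(., needle)])]:
    no <category> at all may contain the needle."""
    return all(needle not in text_of(c) for c in prod.findall("category"))


by_first = [p.get("id") for p in productions if first_category_lacks(p, "Business")]
by_some = [p.get("id") for p in productions if some_category_lacks(p, "Business")]
by_none = [p.get("id") for p in productions if no_category_contains(p, "Business")]
print(by_first, by_some, by_none)
```

All three agree on the question's sample data, but they give three different answers once a production has zero or several categories.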
You can use the not(expression) function.
not() is a function in XPath (as opposed to an operator).
Example:
//a[not(contains(@id, 'xx'))]
OR
expression != true()
It should be an XPath with not() and contains(), keeping in mind that the match is case sensitive: //production[not(contains(category,'Business'))]