XPath remove single node (via Saxon CLI)

I want to remove a node from an XML file (using SaxonHE9-8-0-11J):
<project name="Build">
<property name="src" value="src/main/resources" />
<property name="target" value="target/classes" />
<condition property="target.exists">
<available file="target" />
</condition>
</project>
Apparently there are two ways to do this:
XPath 1.0: using the not() function
XPath 2.0: using an except clause
But both simply return the entire document.
With the not() function:
saxonb-xquery -s:test.xml -qs:'*[not(local-name()="condition")]'
With an except clause:
saxonb-xquery -s:test.xml -qs:'* except condition'
With the -explain switch, the queries are:
<query>
<body>
<filterExpression>
<axis name="child" nodeTest="element()"/>
<operator op="ne (on empty return true())">
<functionCall name="local-name">
<dot/>
</functionCall>
<literal value="condition" type="xs:string"/>
</operator>
</filterExpression>
</body>
</query>
and
<query>
<body>
<operator op="except">
<axis name="child" nodeTest="element()"/>
<path>
<root/>
<axis name="descendant" nodeTest="element(condition, xs:anyType)"/>
</path>
</operator>
</body>
</query>

In general, XPath selects nodes from one or more input documents; it doesn't let you construct new ones. For that you need XSLT or XQuery. Your two queries are evaluated with the document node as the context item, so * selects the project root element itself; that element is not named condition, so the whole project element, children included, is returned. Removing the condition child of the project root, if that is what you want to achieve, requires XSLT or XQuery: even with the XPath expression /*/(* except condition) you get all children except the condition element, but as a bare sequence, not wrapped in a root element.
So with XQuery you could use
/*/element {node-name()} { * except condition }
as a compact but generic way to reconstruct any root with all child elements except the condition: https://xqueryfiddle.liberty-development.net/948Fn5b
Whether you can pass such an expression through a command-line shell is a separate problem; on Windows, in both a PowerShell window and the cmd shell, it works for me to use
-qs:"/*/element {node-name()} { * except condition }"
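For readers who want the same result outside Saxon, here is a rough Python analogue of the XQuery expression above, using only the standard library's xml.etree.ElementTree (an assumption; the question itself uses Saxon). The helper name rebuild_without is hypothetical. It builds a new element with the root's name and copies every child element except the excluded one, mirroring what element {node-name()} { * except condition } does.

```python
import copy
import xml.etree.ElementTree as ET

# The build file from the question.
BUILD = """<project name="Build">
<property name="src" value="src/main/resources" />
<property name="target" value="target/classes" />
<condition property="target.exists">
<available file="target" />
</condition>
</project>"""


def rebuild_without(root, excluded):
    """Hypothetical helper, a rough analogue of
    /*/element {node-name()} { * except condition }:
    create a new element with the same name and copy into it every
    child element except those named `excluded`. Like the XQuery
    expression, it does not copy the root's own attributes."""
    new_root = ET.Element(root.tag)
    for child in root:
        if child.tag != excluded:
            new_root.append(copy.deepcopy(child))
    return new_root


result = rebuild_without(ET.fromstring(BUILD), "condition")
print(ET.tostring(result, encoding="unicode"))
```

Like the XQuery expression, this drops the root's own attributes (name="Build"). In XQuery, including them in the content, e.g. { @*, * except condition }, should keep them; in the sketch, pass root.attrib to ET.Element.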

Related

How to skip paragraphs with comments in XPath expression?

I'm trying to scrape websites like this with the following XPath expression:
.//div[@class="tresc"]/p[not(starts-with(text(), "<!--"))]
The thing is that the first paragraph is a comment section, so I'd like to skip it:
<!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Normal</w:View>
<w:Zoom>0</w:Zoom>
<w:HyphenationZone>21</w:HyphenationZone>
<w:PunctuationKerning />
<w:ValidateAgainstSchemas />
<w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
<w:IgnoreMixedContent>false</w:IgnoreMixedContent>
<w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
<w:Compatibility>
<w:BreakWrappedTables />
<w:SnapToGridInCell />
<w:WrapTextWithPunct />
<w:UseAsianBreakRules />
<w:DontGrowAutofit />
</w:Compatibility>
<w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>
</w:WordDocument>
</xml><![endif]-->
Unfortunately, my expression does not skip the paragraph with comments. Anyone know what I'm doing wrong?
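Comments are not part of text(); they are nodes of their own, selected with comment(). Because text() only returns text-node children, starts-with(text(), "<!--") never sees the <!-- delimiter. To exclude p elements that contain comments, use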
p[not(comment())]
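A minimal Python sketch of the same idea, assuming a simplified, well-formed page fragment (the PAGE string and the has_comment_child helper are hypothetical). Python's xml.etree.ElementTree drops comments by default, so the parser is told to keep them (insert_comments, Python 3.8+). Each kept comment then appears as a child whose tag is ET.Comment, separate from the element's text, which is why a test on text() never catches it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: the first <p> holds only a Word
# conditional comment, like the one in the question.
PAGE = """<body>
<div class="tresc">
<p><!--[if gte mso 9]><xml><w:WordDocument/></xml><![endif]--></p>
<p>First real paragraph</p>
<p>Second real paragraph</p>
</div>
</body>"""

# Keep comments as nodes instead of discarding them (Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(PAGE, parser=parser)


def has_comment_child(elem):
    """Python analogue of the XPath predicate [comment()]."""
    return any(child.tag is ET.Comment for child in elem)


paragraphs = root.findall(".//div[@class='tresc']/p")
# Equivalent of .//div[@class="tresc"]/p[not(comment())]
kept = [p for p in paragraphs if not has_comment_child(p)]

# The comment is not text: the first <p> has no leading text at all.
print(paragraphs[0].text)       # None
print([p.text for p in kept])   # ['First real paragraph', 'Second real paragraph']
```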

XMLUnit 2: comparison ignoring element order with DiffBuilder and namespaces fails

I am trying to use DiffBuilder to ignore the order of XML elements when comparing two .xml files, but it fails. I have tried every possible combination and read many articles before posting this question.
For example:
<Data:Keys>
<Data:Value Key="1" Name="Example1" />
<Data:Value Key="2" Name="Example2" />
<Data:Value Key="3" Name="Example3" />
</Data:Keys>
<Data:Keys>
<Data:Value Key="2" Name="Example2" />
<Data:Value Key="1" Name="Example1" />
<Data:Value Key="3" Name="Example3" />
</Data:Keys>
I want these two treated as the same XML. Notice that the elements are empty; they only have attributes.
What I did so far:
def diff = DiffBuilder.compare(Input.fromString(xmlIN))
.withTest(Input.fromString(xmlOUT))
.ignoreComments()
.ignoreWhitespace()
.checkForSimilar()
.withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.conditionalBuilder()
.whenElementIsNamed("Data:Keys").thenUse(ElementSelectors.byXPath("./Data:Value",
ElementSelectors.byNameAndText))
.elseUse(ElementSelectors.byName)
.build()))
But it fails every time. I don't know whether the issue is the namespace or the fact that the elements are empty.
Any help would be appreciated. Thank you in advance.
If you aim to match the Data:Value elements by their attributes, you should start with this:
.withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.conditionalBuilder()
.whenElementIsNamed("Data:Value")
Since that element has no text, byNameAndText won't work; you can only match on names and attributes. My advice is to do it like this:
.thenUse(ElementSelectors.byNameAndAttributes("Key"))
or
.thenUse(ElementSelectors.byNameAndAllAttributes)
// equivalent here, because Data:Value only has the Key and Name attributes
.thenUse(ElementSelectors.byNameAndAttributes("Key", "Name"))
As for the namespace issue: with checkForSimilar() the result should be SIMILAR, which means the documents are not DIFFERENT, so this is what you need. Without checkForSimilar(), differences in namespace prefixes would also count as differences.
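To make the matching idea concrete without XMLUnit, here is a plain-Python sketch (not XMLUnit's API) of what a node matcher keyed on the Key attribute does: pair each Data:Value in the control document with the test element that has the same Key, regardless of position, then compare their attributes. The namespace URI is hypothetical because the question does not show the xmlns:Data declaration, and the sketch assumes Key values are unique.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the question does not show the xmlns:Data declaration.
NS = "urn:example:data"
VALUE_TAG = f"{{{NS}}}Value"  # ElementTree's {uri}local form of Data:Value

CONTROL = f"""<Data:Keys xmlns:Data="{NS}">
  <Data:Value Key="1" Name="Example1" />
  <Data:Value Key="2" Name="Example2" />
  <Data:Value Key="3" Name="Example3" />
</Data:Keys>"""

TEST = f"""<Data:Keys xmlns:Data="{NS}">
  <Data:Value Key="2" Name="Example2" />
  <Data:Value Key="1" Name="Example1" />
  <Data:Value Key="3" Name="Example3" />
</Data:Keys>"""


def similar_ignoring_order(control_xml, test_xml, key="Key"):
    """Pair Data:Value elements by their `key` attribute (roughly what a node
    matcher using byNameAndAttributes("Key") does), then compare all
    attributes of each matched pair. Assumes key values are unique."""
    control = ET.fromstring(control_xml).findall(VALUE_TAG)
    test = ET.fromstring(test_xml).findall(VALUE_TAG)
    if len(control) != len(test):
        return False
    test_by_key = {elem.get(key): elem for elem in test}
    for elem in control:
        partner = test_by_key.get(elem.get(key))
        # No partner: nothing to match against. Different attributes: the
        # matched pair differs in content.
        if partner is None or partner.attrib != elem.attrib:
            return False
    return True


print(similar_ignoring_order(CONTROL, TEST))  # True
```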

XPath 1.0 select first node back to ancestor

My XML is as follows:
<Query>
<Comp>
<Pers>
<Emp>
<Job>
<Code>Not selected</Code>
</Job>
</Emp>
<Emp>
<Job>
<Code>selected</Code>
</Job>
</Emp>
</Pers>
</Comp>
</Query>
I have this XPath: /Query/Comp/Pers/Emp/Job[Code='selected']/../../../..
The result should contain only the one <Emp> that meets the condition:
<Query>
<Comp>
<Pers>
<Emp>
<Job>
<Code>selected</Code>
</Job>
</Emp>
</Pers>
</Comp>
</Query>
How could I get the result?
The system doesn't support ancestor::*, so I have to use '/..' to reach the ancestor.
You shouldn't need ancestor here to get the <Emp> element; the following XPath selects every <Emp> element that meets your criteria:
/Query/Comp/Pers/Emp[Job[Code='selected']]
Note: you say the result should be a single node, which is correct for this input, but this expression returns all nodes that match your criteria.
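The limited XPath support in Python's standard xml.etree.ElementTree can illustrate the difference (a sketch; ElementTree has no nested predicates, so Job[Code='selected']/.. stands in for Emp[Job[Code='selected']]). One .. from the matching Job lands on its Emp, while four .. steps climb back to Query, which is why the expression in the question returns the whole document with both Emp elements.

```python
import xml.etree.ElementTree as ET

XML = """<Query>
<Comp>
<Pers>
<Emp><Job><Code>Not selected</Code></Job></Emp>
<Emp><Job><Code>selected</Code></Job></Emp>
</Pers>
</Comp>
</Query>"""

root = ET.fromstring(XML)  # root is <Query>; the paths below are relative to it

# One ".." from the matching Job gives its Emp, like Emp[Job[Code='selected']].
emps = root.findall("Comp/Pers/Emp/Job[Code='selected']/..")

# Four ".." steps climb all the way back to Query, as in the question's XPath.
tops = root.findall("Comp/Pers/Emp/Job[Code='selected']/../../../..")

print([e.findtext("Job/Code") for e in emps])  # ['selected']
print([t.tag for t in tops])                   # ['Query']
```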
Edit:
You've stated you're using XSLT and you've given me a bit of a snippet below, but I'm still not 100% sure of your actual structure. You can use the XPath to identify all the nodes that are not equal to selected, and then use XSLT to copy everything except those.
<!-- Copies all nodes in the input to the output -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<!-- Matches the Emp records whose Code is not 'selected' and applies
no action to them, so they do not appear in the output -->
<xsl:template match="/Query/Comp/Pers/Emp[Job[Code!='selected']]" />
The two templates above would transform your input to your desired output!
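The effect of the two templates can be sketched in Python with ElementTree (an illustration of the same copy-everything-except-unselected idea, not an XSLT processor). It assumes one Job/Code per Emp, as in the sample; the XSLT pattern Code!='selected' can behave differently when an Emp has several Code values.

```python
import xml.etree.ElementTree as ET

XML = """<Query>
<Comp>
<Pers>
<Emp><Job><Code>Not selected</Code></Job></Emp>
<Emp><Job><Code>selected</Code></Job></Emp>
</Pers>
</Comp>
</Query>"""

root = ET.fromstring(XML)

# Keep the tree as is (the identity template), but drop every Emp whose
# Job/Code is not 'selected' (the empty template). findall() returns lists,
# so removing children while looping is safe here.
for pers in root.findall(".//Pers"):
    for emp in pers.findall("Emp"):
        if emp.findtext("Job/Code") != "selected":
            pers.remove(emp)

print(ET.tostring(root, encoding="unicode"))
```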

XMLStarlet: selecting nodes using less than / greater than

Does XMLStarlet let you use a less-than/greater-than operator to filter on an attribute value? For example, consider a document like this:
<xml>
<list>
<node name="a" val="x" />
<node name="b" val="y" />
<node name="c" val="z" />
etc.
</list>
</xml>
Is there a way to select nodes whose value is greater than "x"? This XPath does not seem to work with XMLStarlet 1.5.0:
//node[@val > 'x']
Nor does this:
//node[@value gt 'x']
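Comparing characters as if they were numbers (ASCII values/Unicode code points) is unfortunately impossible in XPath 1.0: the < and > operators convert both operands to numbers, so 'y' > 'x' becomes NaN > NaN, which is false. The gt operator is XPath 2.0 syntax, which XMLStarlet (an XPath 1.0 tool) does not support. Look at this SO question if you are interested in more details.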
So if your @val attributes are sorted in the document, you can achieve this with a simple XPath expression that selects all node elements after the one whose val equals 'x':
//node[@val='x']/following-sibling::node
If not, you'd have to use an XSLT stylesheet. Luckily, XMLStarlet can apply XSLT stylesheets. From its overview:
Apply XSLT stylesheets to XML documents (including EXSLT support, and passing parameters to stylesheets)
So you have the possibility to apply an xsl:stylesheet to achieve the desired result using xsl:sort, which is capable of sorting by characters.
<xsl:template match="list"> <!-- list is a child of the xml root element, so /list would not match -->
<xsl:for-each select="//node"> <!-- all nodes sorted by their 'val' attribute -->
<xsl:sort select="@val" data-type="text" order="ascending" case-order="upper-first"/>
<xsl:value-of select="@name" /> <!-- or whatever output you desire -->
</xsl:for-each>
</xsl:template>
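To see why //node[@val > 'x'] selects nothing, the Python sketch below imitates XPath 1.0's conversion of both operands to numbers with a simplified, hypothetical xpath1_number helper, and contrasts it with the plain string comparison the question intends. It runs the comparison in Python, not in XMLStarlet.

```python
import math
import xml.etree.ElementTree as ET

DOC = """<xml>
<list>
<node name="a" val="x" />
<node name="b" val="y" />
<node name="c" val="z" />
</list>
</xml>"""


def xpath1_number(value):
    """Simplified stand-in for XPath 1.0 number(): numeric strings convert,
    anything else becomes NaN. (Python's float() also accepts forms such as
    'inf' that XPath would turn into NaN; irrelevant for this data.)"""
    try:
        return float(value)
    except ValueError:
        return math.nan


nodes = ET.fromstring(DOC).findall(".//node")

# What //node[@val > 'x'] does in XPath 1.0: both sides become NaN, and any
# comparison involving NaN is false, so nothing is selected.
xpath1_result = [n.get("name") for n in nodes
                 if xpath1_number(n.get("val")) > xpath1_number("x")]

# What was intended: compare the strings themselves, by code point.
string_result = [n.get("name") for n in nodes if n.get("val") > "x"]

print(xpath1_result)  # []
print(string_result)  # ['b', 'c']
```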

XPath to Select Nodes Starting with a Certain Value

Given the XML structure
<Doc>
<Other />
<Q1 />
<Q2 />
</Doc>
How can I select only nodes that begin with a "Q", e.g. /Doc/Q1 and /Doc/Q2?
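It seems like this can be done with starts-with, but I have only found examples that apply starts-with to the value of the node.
Apply starts-with() to the element's name instead of its value:
/Doc/*[starts-with(name(), 'Q')]
name() returns the qualified name, including any namespace prefix; use local-name() if you want to ignore the prefix.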
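The same name-based test in Python's xml.etree.ElementTree, as a minimal sketch: iterate over the children of Doc and check each element's tag rather than its text. For elements without a namespace, the tag is simply the element name; namespaced elements would have tags of the form {uri}local, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<Doc><Other /><Q1 /><Q2 /></Doc>")

# Same test as /Doc/*[starts-with(name(), 'Q')]: look at each child's
# element name (its tag), not at its text content.
q_nodes = [child for child in root if child.tag.startswith("Q")]

print([e.tag for e in q_nodes])  # ['Q1', 'Q2']
```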
