XMLUnit 2: comparison ignoring element order with DiffBuilder and namespaces fails

I am trying to use DiffBuilder to ignore the order of XML elements when comparing two .xml files, but it fails. I have tried every combination I could think of and read many articles before posting this question.
For example:
<Data:Keys>
<Data:Value Key="1" Name="Example1" />
<Data:Value Key="2" Name="Example2" />
<Data:Value Key="3" Name="Example3" />
</Data:Keys>
<Data:Keys>
<Data:Value Key="2" Name="Example2" />
<Data:Value Key="1" Name="Example1" />
<Data:Value Key="3" Name="Example3" />
</Data:Keys>
I want these two to be treated as the same XML. Note that the elements are empty; they only have attributes.
What I did so far:
def diff = DiffBuilder.compare(Input.fromString(xmlIN))
.withTest(Input.fromString(xmlOUT))
.ignoreComments()
.ignoreWhitespace()
.checkForSimilar()
.withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.conditionalBuilder()
.whenElementIsNamed("Data:Keys").thenUse(ElementSelectors.byXPath("./Data:Value",
ElementSelectors.byNameAndText))
.elseUse(ElementSelectors.byName)
.build()))
But it fails every time. I don't know whether the issue is the namespace or the fact that the elements are empty.
Any help will be appreciated. Thank you in advance.

If you aim to match the Data:Value tags with each other by their attributes, you should start with this:
.withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.conditionalBuilder()
.whenElementIsNamed("Data:Value")
Since that tag doesn't have any text, byNameAndText won't work. You can only match on names and attributes. My advice is to do it like this:
.thenUse(ElementSelectors.byNameAndAttributes("Key"))
or
.thenUse(ElementSelectors.byNameAndAllAttributes())
//equivalent
.thenUse(ElementSelectors.byNameAndAttributes("Key", "Name"))
As for the namespace issues, checkForSimilar() should report SIMILAR, which means the documents are not DIFFERENT, so this is what you need. If you didn't use checkForSimilar(), the differences in namespaces would be reported as DIFFERENT.
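To make the idea of "match Data:Value elements by their Key attribute, whatever their order" concrete without needing XMLUnit on the classpath, here is a minimal sketch using only the JDK's DOM parser. It is not an XMLUnit call and not a replacement for DiffBuilder; it only illustrates the matching rule. The namespace URI `urn:example:data` and the class and method names are assumptions, since the question does not show the real namespace declaration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class KeyedValueCompare {
    // Hypothetical namespace URI; the question does not show the real one.
    static final String NS = "urn:example:data";

    // Collects Key -> Name for every Value element in the namespace, ignoring document order.
    static Map<String, String> valuesByKey(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so "Data:Value" is seen as local name "Value"
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList values = doc.getElementsByTagNameNS(NS, "Value");
            Map<String, String> byKey = new TreeMap<>();
            for (int i = 0; i < values.getLength(); i++) {
                Element value = (Element) values.item(i);
                byKey.put(value.getAttribute("Key"), value.getAttribute("Name"));
            }
            return byKey;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Two documents are "the same" here if they hold the same Key/Name pairs.
    static boolean sameIgnoringOrder(String a, String b) {
        return valuesByKey(a).equals(valuesByKey(b));
    }

    // Builds a document like the one in the question, with Value elements in the given order.
    static String sample(int... keys) {
        StringBuilder sb = new StringBuilder("<Data:Keys xmlns:Data=\"" + NS + "\">");
        for (int k : keys) {
            sb.append("<Data:Value Key=\"").append(k)
              .append("\" Name=\"Example").append(k).append("\"/>");
        }
        return sb.append("</Data:Keys>").toString();
    }

    public static void main(String[] args) {
        System.out.println(sameIgnoringOrder(sample(1, 2, 3), sample(2, 1, 3)));
    }
}
```

Note that with namespace-aware parsing the element's local name is Value and its qualified name is Data:Value, which is worth keeping in mind when choosing the name passed to a selector.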

Related

How to skip paragraphs with comments in XPath expression?

I'm trying to scrape websites like this with the following Xpath expression:
.//div[@class="tresc"]/p[not(starts-with(text(), "<!--"))]
The thing is that the first paragraph is a comment section, so I'd like to skip it:
<!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Normal</w:View>
<w:Zoom>0</w:Zoom>
<w:HyphenationZone>21</w:HyphenationZone>
<w:PunctuationKerning />
<w:ValidateAgainstSchemas />
<w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
<w:IgnoreMixedContent>false</w:IgnoreMixedContent>
<w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
<w:Compatibility>
<w:BreakWrappedTables />
<w:SnapToGridInCell />
<w:WrapTextWithPunct />
<w:UseAsianBreakRules />
<w:DontGrowAutofit />
</w:Compatibility>
<w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>
</w:WordDocument>
</xml><![endif]-->
Unfortunately, my expression does not skip the paragraph with comments. Anyone know what I'm doing wrong?
Comments are not part of text(), they constitute a node of their own: comment(). To exclude p's that contain comments, use
p[not(comment())]
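A small sketch of that predicate, run with the JDK's built-in XPath 1.0 engine. The sample markup, class name, and method name are made up for illustration, and the sample is well-formed XML; real-world HTML would normally need an HTML-aware parser first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SkipCommentParagraphs {
    // Hypothetical input: the first paragraph holds only a comment, the second holds text.
    static final String SAMPLE =
            "<div class=\"tresc\">"
          + "<p><!-- [if gte mso 9] word junk --></p>"
          + "<p>Real text</p>"
          + "</div>";

    // Returns the text of every p under div.tresc that has no comment child.
    static List<String> paragraphsWithoutComments(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//div[@class='tresc']/p[not(comment())]", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                texts.add(hits.item(i).getTextContent());
            }
            return texts;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(paragraphsWithoutComments(SAMPLE));
    }
}
```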

xpath remove an attribute from an dynamic attribute list

I want to remove an XML attribute value via XPath, but
the XML element could have more attribute values in the future.
html code:
<p class="red, blue, green">test</p>
xpath:
<xpath expr="//p[contains(@class, 'green')]" position="attributes">
<attribute name="class">red, blue</attribute>
</xpath>
Is there a better way than hard-coding the text "red, blue"?
I'd like to support possible new versions of the HTML file, like
"<p class="red, blue, green, brown">test</p>", in the future without needing to change the XPath code again.
For instance, the actual attribute list as a variable plus an XPath function.
What about setting the @class to
concat(substring-before(@class, "green"), substring-after(@class, "green"))
You'll need to solve the abandoned commas, too, but as Björn Tantau commented, in real HTML the classes would be separated by spaces, so you can just wrap the result into normalize-space.
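Here is a hedged sketch of that expression wrapped in normalize-space, evaluated with the JDK's XPath 1.0 engine on space-separated classes. The class name, method name, and sample markup are illustrative assumptions; how the result is written back depends on the tool that processes the `<xpath>` element, which is not covered here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class RemoveClassToken {
    // Evaluates normalize-space(concat(substring-before(...), substring-after(...)))
    // against the class attribute of the first p element.
    // The token is spliced into the expression, so it must not contain a single quote.
    static String classWithout(String xml, String token) {
        String expr = "normalize-space(concat("
                + "substring-before(//p/@class, '" + token + "'), "
                + "substring-after(//p/@class, '" + token + "')))";
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(classWithout("<p class=\"red green blue brown\">test</p>", "green"));
    }
}
```

Two caveats with this approach: it matches substrings, so a class such as "greenish" would also be cut, and if the token is absent both substring functions return an empty string, so the expression should only be applied to elements already selected with contains(@class, 'green').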

XPath based on node indexes only

I have an XML :
<Section>
<Paragraph>
<Text>t1</Text>
<Text>t2</Text>
</Paragraph>
<Paragraph>
<Text>t3</Text>
<Text>t4</Text>
</Paragraph>
</Section>
and I know only the element indexes, e.g., /0/1/0, i.e. the first Section, its second Paragraph, and that Paragraph's first Text. How can I translate '0/1/0' into a valid XPath that returns the element containing t3?
Note that I don't know element names because they can differ but I only know sequence of indexes as in above example.
Many thanks
For the example given this will work.
/element()[1]/element()[2]/element()[1]/text()
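The element() test is XPath 2.0 syntax; in XPath 1.0 the equivalent element test is `*`. As a rough sketch, the zero-based index path can be turned into a one-based XPath mechanically. The class and method names below are made up, and the code uses the JDK's XPath 1.0 engine, hence `*` rather than element().

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class IndexPath {
    static final String SAMPLE =
            "<Section>"
          + "<Paragraph><Text>t1</Text><Text>t2</Text></Paragraph>"
          + "<Paragraph><Text>t3</Text><Text>t4</Text></Paragraph>"
          + "</Section>";

    // Converts a zero-based path such as "/0/1/0" into "/*[1]/*[2]/*[1]".
    static String toXPath(String indexPath) {
        StringBuilder xpath = new StringBuilder();
        for (String part : indexPath.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty piece produced by a leading slash
            }
            // XPath positions are one-based, the given indexes are zero-based.
            xpath.append("/*[").append(Integer.parseInt(part) + 1).append("]");
        }
        return xpath.toString();
    }

    // Returns the string value of the element addressed by the index path.
    static String textAt(String xml, String indexPath) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(toXPath(indexPath), new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXPath("/0/1/0") + " -> " + textAt(SAMPLE, "/0/1/0"));
    }
}
```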

Determine if any element with a given name has a particular value

Given this XML fragment (I've removed superfluous fluff):
<Event name="DataComplete">
<Task id="d20a0053-7678-43ba-bc8a-ece24dcff15b"/>
<DataItems>
<DataItem name="Survey" type="task">
<Value status="NotStarted" taskId="00000000-0000-0000-0000-000000000000" />
</DataItem>
<GroupDataItem name="CT_Visit"> --- this may repeat
<ItemGroup id="1" >
<DataItem name="Special Contractor" type="string">Yes</DataItem>
What xPath expression will determine if any DataItem with name="Special Contractor" has the value "Yes".
I'm trying something like this:
Yes = /Event/Task/DataItems/GroupDataItem/ItemGroup/DataItem/@[normalize-space() = 'Special Contractor']
and many variations usually resulting in "invalid xPath expression".
Any clues most welcome. Thanks!
[EDIT]
Thanks for the answers Jiri and Will. Will was close, but as my question states, I'm trying to determine if *any* element has the value Yes. I should have been more explicit in saying that I need a boolean, true or false. Adapting Will's answer led me to this:
"Yes" = //Event/Task/DataItems/GroupDataItem/ItemGroup/DataItem[@name='Special Contractor']
This returns a simple Boolean='true' or Boolean='false'.
Thanks guys!
/Event/DataItems/GroupDataItem/ItemGroup/DataItem[@name = "Special Contractor"][. = "Yes"]
Returns the DataItem in question. Note that this will be a sequence of matching DataItem elements if there are more than one. If you just want a boolean:
exists(/Event/DataItems/GroupDataItem/ItemGroup/DataItem[@name = "Special Contractor"][. = "Yes"])
(as an aside; I removed Task from the xpath, since it's not actually an ancestor of the DataItem in the XML fragment you posted, even though the indentation makes it look like it is.)
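exists() is an XPath 2.0 function. With an XPath 1.0 engine such as the one bundled with the JDK, boolean() around the same path gives the same true/false answer. The sketch below assumes that engine and a trimmed-down version of the fragment from the question; the class name, method names, and the shortened Task id are illustrative only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class AnySpecialContractor {
    // Builds a reduced version of the question's fragment; Task is an empty sibling of DataItems.
    static String sample(String answer) {
        return "<Event name=\"DataComplete\">"
             + "<Task id=\"t1\"/>"
             + "<DataItems><GroupDataItem name=\"CT_Visit\"><ItemGroup id=\"1\">"
             + "<DataItem name=\"Special Contractor\" type=\"string\">" + answer + "</DataItem>"
             + "</ItemGroup></GroupDataItem></DataItems>"
             + "</Event>";
    }

    // True if at least one matching DataItem has the string value "Yes".
    static boolean anyYes(String xml) {
        String expr = "boolean(/Event/DataItems/GroupDataItem/ItemGroup/"
                + "DataItem[@name='Special Contractor'][. = 'Yes'])";
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(anyYes(sample("Yes")));
        System.out.println(anyYes(sample("No")));
    }
}
```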
Use this xpath
/Event/Task/DataItems/GroupDataItem/ItemGroup/DataItem[@name='Special Contractor']
for following xml:
<Event name="DataComplete">
<Task id="d20a0053-7678-43ba-bc8a-ece24dcff15b">
<DataItems>
<DataItem name="Survey" type="task">
<Value status="NotStarted" taskId="00000000-0000-0000-0000-000000000000" />
</DataItem>
<GroupDataItem name="CT_Visit"> --- this may repeat
<ItemGroup id="1" >
<DataItem name="Special Contractor" type="string">Yes</DataItem>
</ItemGroup>
</GroupDataItem>
</DataItems>
</Task>
...
</Event>
If Task really is an empty (self-closing) element, then omit it from the XPath expression.

How to escape double quote for Ruby inside of XML

Although I know that I can use &quot;, I was wondering if there was a less blunt and long way, such as \", or the like.
Here is an example of the XML:
<root name="test" type="Node" action="{puts :ROOT.to_s}">
<leaf type="Node" decider="{print :VAL1.to_s; gets.chomp.to_i}" action="{puts :ONE.to_s}" />
<leaf type="Node" decider="{print :VAL2.to_s; gets.chomp.to_i}" action="{puts :TWO.to_s}" />
<branch type="Node" decider="{100}" action="{}">
<leaf type="LikelihoodNode" decider="{100}" action="{puts :HI.to_s}" arg="0"/>
</branch>
</root>
The attributes that need this are decider and action. Right now the embedded code is using a little :sym.to_s hack, but that is not a solution.
NOTE: Although the action attribute is only a block in brackets, the processing code pre-pends the lambda.
A double quote inside a double-quoted XML attribute is written as &quot; (or &#34; or &#x22;). You'll have similar issues with single quotes too, so you can't use those. However, you can use % as-is in an XML attribute, so %|...|, %Q|...|, and %q|...| are available, and they're as easy to read and type as quotes:
<root name="test" type="Node" action="{puts %|ROOT|}">
<leaf type="Node" decider="{print %|VAL1|; gets.chomp.to_i}" action="{puts %|ONE|}" />
<!-- ... -->
</root>
Choose whichever delimiters you find the easiest to type and read.
You can also use single quotes for your attributes in XML so you can have:
<leaf type='Node' decider='{print "VAL1"; gets.chomp.to_i}' ...
But then you'd have to use &apos; inside the attribute if you needed to include a single quote.
Alternatively, you could switch to elements instead of attributes:
<leaf type="Node">
<decider><![CDATA[
print "VAL1"
gets.chomp.to_i
]]></decider>
<action><![CDATA[
puts "ONE"
]]></action>
</leaf>
but that's a bit verbose, ugly, and not as easy to work with as attributes (IMHO).
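To see what the parser hands back for each quoting style, here is a small check using the JDK's DOM parser (Java rather than Ruby, only to keep the example dependency-free). The class name, method name, and the shortened decider values are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class QuotedAttribute {
    // Returns the parsed value of the decider attribute on the root element.
    static String decider(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("decider");
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Entity-escaped double quotes inside a double-quoted attribute.
        System.out.println(decider("<leaf decider=\"{print &quot;VAL1&quot;}\"/>"));
        // Literal double quotes inside a single-quoted attribute.
        System.out.println(decider("<leaf decider='{print \"VAL1\"}'/>"));
        // Ruby %|...| literal, which needs no escaping at all.
        System.out.println(decider("<leaf decider=\"{print %|VAL1|}\"/>"));
    }
}
```

The first two forms reach the application as the same string, so the choice between them is purely about how the XML file reads.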
