XMLUnit - Comparing two XMLs with differing nodes

I have 2 XMLs to be compared:
File1.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<member>
<SchoolID>1021</SchoolID>
<CandidateType>First Year</CandidateType>
<CandidateName>John</CandidateName>
</member>
File2.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<member>
<CandidateID>3147</CandidateID>
<SchoolID>1021</SchoolID>
<CandidateType>Second Year</CandidateType>
<CandidateName>Peter</CandidateName>
</member>
I am using XMLUnit to compare them; however, the output I get looks like this:
Similar? false
Identical? false
***********************
Expected number of child nodes '2' but was '3' - comparing <member...> at /member[1] to <member...> at /member[1]
***********************
***********************
Expected sequence of child nodes '0' but was '1' - comparing <SchoolID...> at /member[1]/SchoolID[1] to <SchoolID...> at /member[1]/SchoolID[1]
***********************
***********************
Expected text value 'John' but was 'Peter' - comparing <CandidateName ...>John</CandidateName> at /member[1]/CandidateName[1]/text()[1] to <CandidateName ...>Peter</CandidateName> at /member[1]/CandidateName[1]/text()[1]
***********************
***********************
Expected sequence of child nodes '1' but was '2' - comparing <CandidateName...> at /member[1]/CandidateName[1] to <CandidateName...> at /member[1]/CandidateName[1]
***********************
***********************
Expected presence of child node 'null' but was 'CandidateID' - comparing at null to <CandidateID...> at /member[1]/CandidateID[1]
I need the output to report only the following differences: node CandidateID is missing in File1.xml, and the data for node CandidateName differs. I don't need the extra details. Is there a way to tweak the output of detDiff.getAllDifferences()?
The code snippet looks like this:
try {
    // fr1 and fr2 are my two xml files.
    Diff diff = new Diff(fr1, fr2);
    System.out.println("Similar? " + diff.similar());
    System.out.println("Identical? " + diff.identical());
    DetailedDiff detDiff = new DetailedDiff(diff);
    List differences = detDiff.getAllDifferences();
    for (Object object : differences) {
        Difference difference = (Difference) object;
        System.out.println("***********************");
        System.out.println(difference);
        System.out.println("***********************");
    }
} catch (SAXException | IOException e) {
    e.printStackTrace();
}

Depending on your needs, you could either filter the differences after the comparison or override the DifferenceListener to ignore the differences you are not interested in.
For filtering, it seems you could just check isRecoverable() and strip away all recoverable differences.
A custom DifferenceListener would look something like this:
detDiff.overrideDifferenceListener(new DifferenceListener() {
    @Override
    public int differenceFound(Difference diff) {
        // Ignore differences in child-node order and child-node count
        if (diff.getId() == DifferenceConstants.CHILD_NODELIST_SEQUENCE_ID
                || diff.getId() == DifferenceConstants.CHILD_NODELIST_LENGTH_ID) {
            return RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL;
        }
        return RETURN_ACCEPT_DIFFERENCE;
    }
    @Override
    public void skippedComparison(Node control, Node test) { }
});
Register it before you call getAllDifferences().
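If you would rather build the short report yourself than filter XMLUnit's output, here is a minimal sketch that uses only the JDK's DOM API. It is a swapped-in approach, not an XMLUnit feature. It assumes each child element name appears at most once under the root, as in the two <member> files. Children are compared by name, so their order is ignored. The names SimpleXmlDiff, childText and diff are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SimpleXmlDiff {
    // Maps each direct child element of the root to its trimmed text.
    // Assumption: child names are unique (true for the <member> files).
    static Map<String, String> childText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> result = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) { // skips whitespace text nodes
                result.put(n.getNodeName(), n.getTextContent().trim());
            }
        }
        return result;
    }

    // Reports only missing nodes and differing text values.
    static List<String> diff(String controlXml, String testXml) throws Exception {
        Map<String, String> control = childText(controlXml);
        Map<String, String> test = childText(testXml);
        List<String> report = new ArrayList<>();
        for (String name : test.keySet()) {
            if (!control.containsKey(name)) {
                report.add("Node " + name + " is missing in File1.xml");
            }
        }
        for (Map.Entry<String, String> e : control.entrySet()) {
            String other = test.get(e.getKey());
            if (other == null) {
                report.add("Node " + e.getKey() + " is missing in File2.xml");
            } else if (!other.equals(e.getValue())) {
                report.add("Node " + e.getKey() + ": '" + e.getValue() + "' vs '" + other + "'");
            }
        }
        return report;
    }

    public static void main(String[] args) throws Exception {
        String file1 = "<member><SchoolID>1021</SchoolID>"
                + "<CandidateType>First Year</CandidateType>"
                + "<CandidateName>John</CandidateName></member>";
        String file2 = "<member><CandidateID>3147</CandidateID><SchoolID>1021</SchoolID>"
                + "<CandidateType>Second Year</CandidateType>"
                + "<CandidateName>Peter</CandidateName></member>";
        diff(file1, file2).forEach(System.out::println);
    }
}
```

With the two sample files, this prints three lines:
- CandidateID is missing in File1.xml.
- The text of CandidateType differs.
- The text of CandidateName differs.

Repeated or nested elements would need a path-based key instead of the bare element name.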

Related

How to use following in XPath to get siblings in a tag

I have the structure shown below. I am trying to build a robust method to extract the FT1_19_0 elements of the FT1_19 tags in the order they appear. However, in my results the elements are rearranged. How can I get my result in the correct order?
//*/FT1_19/FT1_19_0[contains(../FT1_19_2,'I10') and
not(.=../following::FT1_19/FT1_19_0)]
The result (rearranged):
X50.0XXA
M76.891
M17.11
M23.303
<?xml version="1.0" encoding="UTF-8"?>
<root>
<FT1>
<FT1_1>1</FT1_1>
<FT1_4>20180920130000</FT1_4>
<FT1_5>20180924110101</FT1_5>
<FT1_6>CG</FT1_6>
<FT1_7>99203</FT1_7>
<FT1_9/>
<FT1_10>1.00</FT1_10>
<FT1_13>NPI</FT1_13>
<FT1_16>
<FT1_16_1>Gavin, Matthew, MD</FT1_16_1>
<FT1_16_3>22</FT1_16_3>
</FT1_16>
<FT1_19 NO="1">
<FT1_19_0>M76.891</FT1_19_0>
<FT1_19_2>I10</FT1_19_2>
</FT1_19>
<FT1_19 NO="2">
<FT1_19_0>M17.11</FT1_19_0>
<FT1_19_2>I10</FT1_19_2>
</FT1_19>
<FT1_19 NO="3">
<FT1_19_0>M23.303</FT1_19_0>
<FT1_19_2>I10</FT1_19_2>
</FT1_19>
<FT1_19 NO="4">
<FT1_19_0>X50.0XXA</FT1_19_0>
<FT1_19_2>I10</FT1_19_2>
</FT1_19>
</FT1>
</root>
If you are using Java, use this:
List<WebElement> list = driver.findElements(By.xpath("//ft1_19//following::ft1_19_0"));
for(WebElement we:list) {
System.out.println(we.getText());
}
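For plain XML outside a browser, a different approach is to evaluate a simpler expression with the JDK's javax.xml.xpath API. The sketch below makes three assumptions:
- The document is parsed as XML. Element names are then case-sensitive, so the uppercase FT1_19 names are used.
- In the JDK's implementation, a NODESET result comes back in document order.
- OrderedCodes and SAMPLE are hypothetical names, and SAMPLE is a trimmed copy of the question's XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OrderedCodes {
    // Trimmed copy of the FT1 structure from the question.
    static final String SAMPLE = "<root><FT1>"
            + "<FT1_19 NO='1'><FT1_19_0>M76.891</FT1_19_0><FT1_19_2>I10</FT1_19_2></FT1_19>"
            + "<FT1_19 NO='2'><FT1_19_0>M17.11</FT1_19_0><FT1_19_2>I10</FT1_19_2></FT1_19>"
            + "<FT1_19 NO='3'><FT1_19_0>M23.303</FT1_19_0><FT1_19_2>I10</FT1_19_2></FT1_19>"
            + "<FT1_19 NO='4'><FT1_19_0>X50.0XXA</FT1_19_0><FT1_19_2>I10</FT1_19_2></FT1_19>"
            + "</FT1></root>";

    static List<String> codes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        // Select FT1_19_0 of every FT1_19 whose FT1_19_2 is 'I10'.
        NodeList nodes = (NodeList) xp.evaluate(
                "//FT1_19[FT1_19_2='I10']/FT1_19_0", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(codes(SAMPLE));
    }
}
```

If a particular tool still returns the nodes in a different order, sorting on the NO attribute would make the order explicit.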

XMLUnit empty elements

I am trying to compare two XML documents with XMLUnit and have the following problem: when I have two empty elements like the example below, XMLUnit reports them as a difference. Can I configure XMLUnit to ignore this?
</name> and <name></name>
I am only interested in differences like the next two examples:
<name>test1</name> and <name>test2</name>
difference: test1 and test2
or
<name>test1</name> and <name></name>
difference: test1 and ...
My code:
Diff diff = new Diff(fr1, fr2);
DetailedDiff detailedDiff = new DetailedDiff(diff);
List differences = detailedDiff.getAllDifferences();
for (Object object : differences) {
    Difference difference = (Difference) object;
    // getNode() is null when a node exists on only one side
    Node control = difference.getControlNodeDetail().getNode();
    Node test = difference.getTestNodeDetail().getNode();
    String node1 = control == null ? "null" : control.getNodeName() + " " + control.getNodeValue();
    String node2 = test == null ? "null" : test.getNodeName() + " " + test.getNodeValue();
}
Assuming your </name> is a typo and it is <name/> as per the comment, you could try the following:
XMLUnit.setIgnoreWhitespace(true);
It seems to work for me.
For example, when I compare <Carp1></Carp1> with <Carp1/> without the above setting, I get:
Expected text value '
' but was '
' - comparing <CfgDN ...>
</CfgDN> at /CfgDN[1]/text()[19] to <CfgDN ...>
</CfgDN> at /CfgDN[1]/text()[19]
With the above setting, all is similar and identical.
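The sketch below shows why whitespace matters here, using only the JDK's DOM API (not XMLUnit):
- <name/> and <name></name> parse to the same element with no children.
- A newline between the tags adds a whitespace-only text node.
- Removing such nodes first makes the two elements equal. This is roughly what an ignore-whitespace option does; XMLUnit's exact behaviour may differ.

EmptyElementCheck and stripWhitespaceText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EmptyElementCheck {
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Recursively removes text nodes that contain only whitespace.
    // Iterates backwards because the NodeList is live.
    static void stripWhitespaceText(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceText(child);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Both forms are an element with no child nodes.
        System.out.println(parse("<name/>").isEqualNode(parse("<name></name>")));
        Element padded = parse("<Carp1>\n</Carp1>"); // contains a "\n" text node
        Element empty = parse("<Carp1/>");
        System.out.println(padded.isEqualNode(empty));
        stripWhitespaceText(padded);
        System.out.println(padded.isEqualNode(empty));
    }
}
```

This should print true, false, true.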

Unable to findnodes() restricted just to current parent

I'm parsing a simple XML file to create a flat text file from it. The desired outcome is shown below the sample XML. The XML has sort of a header-detail structure (Assembly_Info and Part respectively), with a unique header node followed by any number of detail record nodes, all of which are siblings. After digging into the elements under the header, I can't then find a way back 'up' to then pick up all the sibling detail nodes.
XML file looks like this:
<?xml version="1.0" standalone="yes" ?>
<Wrapper>
<Record>
<Product>
<prodid>4094</prodid>
</Product>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0000</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0455</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>045A</dev_name>
</Part>
</Assembly>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0002</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0457</dev_name>
</Part>
</Assembly>
</Record>
</Wrapper>
For each Assembly I need to read the values of the two elements in Assembly_Info, which I do successfully. But I then want to read each of the Part records that are associated with that Assembly. The objective is to 'flatten' the file into this:
prodid id interface status dev_name
4094 DF-7A C N/A 0000
4094 DF-7A C Ready 0455
4094 DF-7A C Ready 045A
4094 DF-7A C N/A 0002
4094 DF-7A C Ready 0457
I'm attempting to use findnodes() to do this, as that's about the only tool I thought I understood. Unfortunately, my code reads all of the Part records from the entire file for each Assembly, since the only way I've been able to find the Part nodes is to start at the root. I don't know how to change 'where I am', if you will, to tell findnodes to begin at the current parent. The code looks like this:
my $parser = XML::LibXML -> new();
my $tree = $parser -> parse_file ('DEMO.XML');
for my $product ($tree->findnodes ('/Wrapper/Record/Product/prodid')) {
$prodid = $product->textContent();
}
foreach my $assembly ($tree->findnodes ('/Wrapper/Record/Assembly')){
$assemblies++;
$parts = 0;
for my $assembly ($tree->findnodes ('/Wrapper/Record/Assembly/Assembly_Info')) {
$id = $assembly->findvalue('id');
$interface = $assembly->findvalue('interface');
}
foreach my $part ($tree->findnodes ('/Wrapper/Record/Assembly/Part')) {
$parts++;
$status = $part->findvalue('status');
$dev_name = $part->findvalue('dev_name');
}
print "Assembly No: ", $assemblies, " Parts: ",$parts, "\n";
}
How do I get just the Part nodes for a given Assembly, after I've gone down to the Assembly_Info depths? There is quite a bit I'm not getting, and I think a problem may be that I'm thinking of this as 'navigating' or moving a cursor, if you will. Examples of XPath path expressions have not helped me.
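Instead of always using $tree as the starting point for findnodes, you can call it on any other node, including child nodes, and then use a relative XPath expression. For example:
for my $record ($tree->findnodes('/Wrapper/Record')) {
    my $prodid = $record->findvalue('Product/prodid');
    for my $assembly ($record->findnodes('./Assembly')) {
        # Relative paths: only this Assembly's Assembly_Info and Part children
        my $id        = $assembly->findvalue('Assembly_Info/id');
        my $interface = $assembly->findvalue('Assembly_Info/interface');
        for my $part ($assembly->findnodes('./Part')) {
            print join(' ', $prodid, $id, $interface,
                       $part->findvalue('status'), $part->findvalue('dev_name')), "\n";
        }
    }
}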
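Here is the same context-node idea in Java, as a sketch that uses the JDK's javax.xml.xpath API instead of XML::LibXML. Each relative expression is evaluated against the current Record, Assembly or Part node, so only that node's descendants are found. FlattenAssemblies is a hypothetical name, and the embedded XML is a trimmed copy of the sample with three Part records.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlattenAssemblies {
    static final String SAMPLE = "<Wrapper><Record><Product><prodid>4094</prodid></Product>"
            + "<Assembly><Assembly_Info><id>DF-7A</id><interface>C</interface></Assembly_Info>"
            + "<Part><status>N/A</status><dev_name>0000</dev_name></Part>"
            + "<Part><status>Ready</status><dev_name>0455</dev_name></Part></Assembly>"
            + "<Assembly><Assembly_Info><id>DF-7A</id><interface>C</interface></Assembly_Info>"
            + "<Part><status>N/A</status><dev_name>0002</dev_name></Part></Assembly>"
            + "</Record></Wrapper>";

    static List<String> flatten(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        List<String> rows = new ArrayList<>();
        NodeList records = (NodeList) xp.evaluate("/Wrapper/Record", doc, XPathConstants.NODESET);
        for (int r = 0; r < records.getLength(); r++) {
            Node record = records.item(r);
            // Relative path, evaluated against this Record rather than the root
            String prodid = xp.evaluate("Product/prodid", record);
            NodeList assemblies = (NodeList) xp.evaluate("Assembly", record, XPathConstants.NODESET);
            for (int a = 0; a < assemblies.getLength(); a++) {
                Node assembly = assemblies.item(a);
                String id = xp.evaluate("Assembly_Info/id", assembly);
                String iface = xp.evaluate("Assembly_Info/interface", assembly);
                // Only the Part children of this Assembly are returned
                NodeList parts = (NodeList) xp.evaluate("Part", assembly, XPathConstants.NODESET);
                for (int p = 0; p < parts.getLength(); p++) {
                    Node part = parts.item(p);
                    rows.add(String.join(" ", prodid, id, iface,
                            xp.evaluate("status", part), xp.evaluate("dev_name", part)));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        flatten(SAMPLE).forEach(System.out::println);
    }
}
```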

Is there something wrong with this xpath, or is Marklogic misbehaving?

I've got a weird problem with what should be a simple xpath not returning data when I'm sure the data is present in Marklogic. I can see the data in question through a more general xpath, but not get it specifically.
xpath that works:
/log:record[log:changes/log:change/@field = "moduleCodes"]
=> a long sequence of log:record elements with "moduleCodes" field changes
xpath that doesn't:
/log:record/log:changes/log:change[@field = "moduleCodes"]
=> empty sequence
(I've omitted the log namespace declaration for brevity.)
Trying to figure out what's going on, I started with the first, working, xpath and built on it:
/log:record[log:changes/log:change/@field = "moduleCodes"]/log:changes/log:change
=> sequence of log:change elements including some with @field = "moduleCodes"
/log:record[log:changes/log:change/@field = "moduleCodes"]/log:changes/log:change[@field]
=> sequence of log:change elements including some with @field = "moduleCodes"
/log:record[log:changes/log:change/@field = "moduleCodes"]/log:changes/log:change[@field = "moduleCodes"]
=> empty sequence
Am I misunderstanding something fundamental? I can't see any reason why the xpaths putting the predicate on the log:change would return an empty sequence when everything else works as I expect. This feels like Marklogic getting confused somehow to me, but I want to make sure it's not just me missing a subtlety of xpath before I start talking like that.
I just tried the paths with a different field-name. It works as I expect with (at least some) other values.
I just restarted the ML cluster; no change.
Edit:
All of the xpaths above work fine in Oxygen, so it seems to be just ML that's behaving like this. I tried adding fn:doc() to the front of all the paths, in case that helped, but it made no difference.
Here's an (anonymised) record that I believe should match all the xpaths:
<log:record id="00000001" date="2013-04-14T01:42:02.922+01:00" type="change" xmlns:log="some/namespace/definition">
<log:head>
<some-header-info/>
</log:head>
<log:changes>
<log:change field="dateModified">
<log:old-value>2012-11-06T00:00:00.0000000</log:old-value>
<log:new-value>2013-03-20T00:00:00.0000000</log:new-value>
</log:change>
<log:change field="moduleCodes">
<log:old-value>
<log:moduleCodes>
<log:moduleCodes-value code="AAA"/>
</log:moduleCodes>
</log:old-value>
<log:new-value>
<log:moduleCodes>
<log:moduleCodes-value code="AAA"/>
<log:moduleCodes-value code="BBB"/>
</log:moduleCodes>
</log:new-value>
</log:change>
</log:changes>
</log:record>
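One way to rule out the XPath itself, as the Oxygen test did, is to evaluate both paths in a stand-alone XPath 1.0 engine. Below is a sketch with the JDK's javax.xml.xpath API. The namespace URI urn:example:log is a hypothetical stand-in for the anonymised one, the record is trimmed, and LogPathCheck is a hypothetical name.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LogPathCheck {
    static final String LOG_NS = "urn:example:log"; // hypothetical namespace URI
    static final String SAMPLE = "<log:record id='00000001' type='change' xmlns:log='urn:example:log'>"
            + "<log:changes>"
            + "<log:change field='dateModified'><log:new-value>2013-03-20</log:new-value></log:change>"
            + "<log:change field='moduleCodes'><log:new-value/></log:change>"
            + "</log:changes></log:record>";

    static int count(String xml, String expr) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // prefixed XPath steps only match namespace-aware DOMs
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "log".equals(prefix) ? LOG_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
        });
        return ((NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count(SAMPLE, "/log:record[log:changes/log:change/@field = 'moduleCodes']"));
        System.out.println(count(SAMPLE, "/log:record/log:changes/log:change[@field = 'moduleCodes']"));
    }
}
```

Both counts should be 1. If a generic engine agrees but MarkLogic does not, that points away from the path expression.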
As best I can recreate your test with 6.0-2.3, this works for me.
When debugging database queries, one useful technique is to run the same path against an in-memory copy of the document. If that works while the database query does not, suspicion falls on how the database resolves the query. When I try that using 6.0-2.3, the results seem to be correct.
declare namespace log="some/namespace/definition" ;
document {
<log:record id="00000001" date="2013-04-14T01:42:02.922+01:00" type="change" xmlns:log="some/namespace/definition">
<log:head>
<some-header-info/>
</log:head>
<log:changes>
<log:change field="dateModified">
<log:old-value>2012-11-06T00:00:00.0000000</log:old-value>
<log:new-value>2013-03-20T00:00:00.0000000</log:new-value>
</log:change>
<log:change field="moduleCodes">
<log:old-value>
<log:moduleCodes>
<log:moduleCodes-value code="AAA"/>
</log:moduleCodes>
</log:old-value>
<log:new-value>
<log:moduleCodes>
<log:moduleCodes-value code="AAA"/>
<log:moduleCodes-value code="BBB"/>
</log:moduleCodes>
</log:new-value>
</log:change>
</log:changes>
</log:record> }
/log:record[log:changes/log:change/@field = "moduleCodes"]/xdmp:path(.)
=>
/log:record
declare namespace log="some/namespace/definition" ;
document {
<log:record id="00000001" date="2013-04-14T01:42:02.922+01:00" type="change" xmlns:log="some/namespace/definition">
<log:head>
<some-header-info/>
</log:head>
<log:changes>
<log:change field="dateModified">
<log:old-value>2012-11-06T00:00:00.0000000</log:old-value>
<log:new-value>2013-03-20T00:00:00.0000000</log:new-value>
</log:change>
<log:change field="moduleCodes">
<log:old-value>
<log:moduleCodes>
<log:moduleCodes-value code="AAA"/>
</log:moduleCodes>
</log:old-value>
<log:new-value>
<log:moduleCodes>
<log:moduleCodes-value code="AAA"/>
<log:moduleCodes-value code="BBB"/>
</log:moduleCodes>
</log:new-value>
</log:change>
</log:changes>
</log:record> }
/log:record/log:changes/log:change[@field = "moduleCodes"]/xdmp:path(.)
=>
/log:record/log:changes/log:change[2]
So the implication is that the problem is in the index or the way the index is queried. You can try to debug that using xdmp:query-trace(true()) at the start of your query. For example:
declare namespace log="some/namespace/definition" ;
xdmp:query-trace(true()),
/log:record[log:changes/log:change/@field = "moduleCodes"]/xdmp:describe(.),
/log:record/log:changes/log:change[@field = "moduleCodes"]/xdmp:describe(.)
With 6.0-2.3 these both return the expected results for me:
fn:doc("test")/log:record
fn:doc("test")/log:record/log:changes/log:change[2]
Here are the traces, from the ErrorLog.txt file:
Analyzing path: fn:collection()/log:record[log:changes/log:change/@field = "moduleCodes"]/xdmp:describe(.)
Step 1 is searchable: fn:collection()
Step 2 is searchable: log:record[log:changes/log:change/@field = "moduleCodes"]
Step 3 is unsearchable: xdmp:describe(.)
First 2 steps of path are searchable: fn:collection()/log:record[log:changes/log:change/@field = "moduleCodes"]
Gathering constraints.
Comparison contributed hash value constraint: log:change/@field = "moduleCodes"
Step 2 predicate 1 contributed 3 constraints: log:changes/log:change/@field = "moduleCodes"
Comparison contributed hash value constraint: log:change/@field = "moduleCodes"
Step 2 predicate 1 contributed 1 constraint: log:changes/log:change/@field = "moduleCodes"
Step 2 contributed 4 constraints: log:record[log:changes/log:change/@field = "moduleCodes"]
Executing search.
Selected 1 fragment to filter
xdmp:eval("declare namespace log="some/namespace/definition" ;&#1...", (), <options xmlns="xdmp:eval"><database>598453498912235799</database><root>/tmp</root><isolati...</options>)
Analyzing path: fn:collection()/log:record/log:changes/log:change[@field = "moduleCodes"]/xdmp:describe(.)
Step 1 is searchable: fn:collection()
Step 2 is searchable: log:record
Step 3 is searchable: log:changes
Step 4 is searchable: log:change[@field = "moduleCodes"]
Step 5 is unsearchable: xdmp:describe(.)
First 4 steps of path are searchable: fn:collection()/log:record/log:changes/log:change[@field = "moduleCodes"]
Gathering constraints.
Step 2 contributed 1 constraint: log:record
Comparison contributed hash value constraint: log:change/@field = "moduleCodes"
Step 4 predicate 1 contributed 1 constraint: @field = "moduleCodes"
Step 4 contributed 1 constraint: log:change[@field = "moduleCodes"]
Comparison contributed hash value constraint: log:change/@field = "moduleCodes"
Step 4 predicate 1 contributed 1 constraint: @field = "moduleCodes"
Comparison contributed hash value constraint: log:change/@field = "moduleCodes"
Step 4 predicate 1 contributed 1 constraint: @field = "moduleCodes"
Step 4 contributed 1 constraint: log:change[@field = "moduleCodes"]
Step 3 contributed 1 constraint: log:changes
Executing search.
Selected 1 fragment to filter

SimpleXML Reading node with a hyphenated name

I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<gnm:Workbook xmlns:gnm="http://www.gnumeric.org/v10.dtd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.gnumeric.org/v9.xsd">
<office:document-meta xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:ooo="http://openoffice.org/2004/office" office:version="1.1">
<office:meta>
<dc:creator>Mark Baker</dc:creator>
<dc:date>2010-09-01T22:49:33Z</dc:date>
<meta:creation-date>2010-09-01T22:48:39Z</meta:creation-date>
<meta:editing-cycles>4</meta:editing-cycles>
<meta:editing-duration>PT00H04M20S</meta:editing-duration>
<meta:generator>OpenOffice.org/3.1$Win32 OpenOffice.org_project/310m11$Build-9399</meta:generator>
</office:meta>
</office:document-meta>
</gnm:Workbook>
And I am trying to read the office:document-meta node to extract the various elements below it (dc:creator, meta:creation-date, etc.).
The following code:
$xml = simplexml_load_string($gFileData);
$namespacesMeta = $xml->getNamespaces(true);
$officeXML = $xml->children($namespacesMeta['office']);
var_dump($officeXML);
echo '<hr />';
gives me:
object(SimpleXMLElement)[91]
public 'document-meta' =>
object(SimpleXMLElement)[93]
public '@attributes' =>
array
'version' => string '1.1' (length=3)
public 'meta' =>
object(SimpleXMLElement)[94]
but if I try to read the document-meta element using:
$xml = simplexml_load_string($gFileData);
$namespacesMeta = $xml->getNamespaces(true);
$officeXML = $xml->children($namespacesMeta['office']);
$docMeta = $officeXML->document-meta;
var_dump($docMeta);
echo '<hr />';
I get
Notice: Use of undefined constant meta - assumed 'meta' in /usr/local/apache/htdocsNewDev/PHPExcel/Classes/PHPExcel/Reader/Gnumeric.php on line 273
int 0
I assume that SimpleXML is trying to extract a non-existent node "document" from $officeXML, then subtract the value of (non-existent) constant "meta", resulting in forcing the integer 0 result rather than the document-meta node.
Is there a way to resolve this using SimpleXML, or will I be forced to rewrite using XMLReader? Any help appreciated.
Your assumption is correct. Use
$officeXML->{'document-meta'}
to make it work.
Please note that the above applies to element nodes. Attribute nodes (those within the @attributes property when dumping the SimpleXMLElement) do not require any special syntax to be accessed when hyphenated. They are accessible via normal array notation, e.g.
$xml = <<< XML
<root>
<hyphenated-element hyphenated-attribute="bar">foo</hyphenated-element>
</root>
XML;
$root = new SimpleXMLElement($xml);
echo $root->{'hyphenated-element'}; // prints "foo"
echo $root->{'hyphenated-element'}['hyphenated-attribute']; // prints "bar"
See the SimpleXml Basics in the Manual for further examples.
I assume the best way to do it is to cast to an array.
Consider the following XML:
<subscribe hello-world="yolo">
<callback-url>example url</callback-url>
</subscribe>
You can access members, including attributes, using a cast:
<?php
$xml = (array) simplexml_load_string($input);
$callback = $xml["callback-url"];
$attribute = $xml['@attributes']['hello-world'];
It makes everything easier. Hope I helped.
