How to ignore XML declaration differences with XmlUnit? - xmlunit

How can I configure XmlUnit.Net to ignore the XML declaration when comparing two documents?
Assume I have the following control document:
<?xml version="1.0" encoding="utf-8"?>
<a><amount>1</amount></a>
Which I want to compare with:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<a><amount>1</amount></a>
The comparison should result in no differences.
My expectation would be that using a NodeFilter like so should work, but it doesn't:
var diff = DiffBuilder.Compare(control)
.WithTest(test)
.WithNodeFilter(n => n.NodeType != XmlNodeType.XmlDeclaration)
.Build();
diff.Differences.Count().Should().Be(0);
The assertion fails with two differences: one for the encoding (different casing) and another for the standalone attribute. I'm not interested in either of them.
Whether I say n.NodeType != XmlNodeType.XmlDeclaration or n.NodeType == XmlNodeType.XmlDeclaration makes no difference.
I am using XMLUnit.Core v2.5.1.

NodeFilter only applies to nodes that are children of other nodes (returned by XmlNode.ChildNodes). Unfortunately this is not the case for the document type declaration, which probably is a bug.
In your case you want to tweak the DifferenceEvaluator and downgrade the differences you are not interested in. Something like
DifferenceEvaluators.Chain(DifferenceEvaluators.Default,
DifferenceEvaluators.DowngradeDifferencesToEqual(ComparisonType.XML_STANDALONE, ComparisonType.XML_ENCODING))
would swallow the differences.
Maybe you don't want to just count the differences but also look at their severity. The difference in encoding would be a "similar" difference, while the different values of standalone are critical.

Related

Wildcard XPath nested element selection by example

I am being given XML that looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<singleton-list>
<fizzbuzz>
<amountId>123</amountId>
<countryId>456</countryId>
<action>Overwrite</action>
</fizzbuzz>
</singleton-list>
However, the outer-most element (in this case, singleton-list) will be different each time, and will be one of many different values. For instance I might get another XML message that looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<aggregateList>
<fizzbuzz>
<amountId>123</amountId>
<countryId>456</countryId>
<action>Overwrite</action>
</fizzbuzz>
</aggregateList>
I'm trying to write an XPath that selects the fizzbuzz out of each message I receive, regardless of what the outer-most element is (singleton-list, aggregateList, or otherwise).
I think I could do something involving a wildcard, but I'm not sure if asterisks have special meaning in XPath or if I'm going about this the wrong way.
My best attempt at an XPath to do this selection is:
/*/fizzbuzz
Is this correct or is there a better way to do this?
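You can use either /*/fizzbuzz or //fizzbuzz. The asterisk is a wildcard that matches any element, so /*/fizzbuzz selects fizzbuzz elements that are direct children of the root element, whatever it is called, while //fizzbuzz selects fizzbuzz elements at any depth in the document.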
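A minimal Ruby sketch of the wildcard path, assuming the REXML library that ships with Ruby (the question does not say which XPath engine is in use) and trimmed copies of the two messages:

```ruby
require 'rexml/document'

# Trimmed copies of the two messages; only the outer element name differs.
messages = [
  "<singleton-list><fizzbuzz><amountId>123</amountId></fizzbuzz></singleton-list>",
  "<aggregateList><fizzbuzz><amountId>123</amountId></fizzbuzz></aggregateList>"
]

ids = messages.map do |xml|
  doc = REXML::Document.new(xml)
  # "*" matches the root element regardless of its name.
  fizzbuzz = REXML::XPath.first(doc, "/*/fizzbuzz")
  fizzbuzz.elements["amountId"].text
end

puts ids.inspect
```

The same path finds fizzbuzz in both messages because the first step never names the outer element.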

How to properly add acronyms to CustomDictionary.xml

One of my classes has a public property named Ttl. This is supposed to follow the CA1709 rules:
By convention, two-letter acronyms use all uppercase letters, and
acronyms of three or more characters use Pascal casing. The following
examples use this naming convention: 'DB', 'CR', 'Cpa', and 'Ecma'.
The following examples violate the convention: 'Io', 'XML', and 'DoD',
and for nonparameter names, 'xp' and 'cpl'.
Now, code analysis complains about my property, telling me it violates CA1704 (bad spelling).
I tried adding it to my CustomDictionary.xml like this:
<?xml version="1.0" encoding="utf-8" ?>
<Dictionary xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="CodeAnalysisDictionary.xsd">
<!-- Some unimportant elements are here in the real file -->
<Acronyms>
<CasingExceptions>
<Acronym>Ttl</Acronym> <!--Time To Live-->
</CasingExceptions>
</Acronyms>
</Dictionary>
I tried putting lower, upper and camel case into the dictionary, but none of them will remove the spelling complaint. Is there a way to add this acronym to the XML or do I just have to suppress the message for my properly named property?
You added "Ttl" as a casing exception. In fact it's not. It's three letters in Pascal case.
What you did not do is add "Ttl" as a word.
<Words>
<Recognized>
<Word>Ttl</Word>
</Recognized>
</Words>
Make sure you need it at all. Most .NET languages have "no abbreviations" as a good naming convention.

Nokogiri not parsing XML in ruby - xmlns issue?

Given the following Ruby code:
require 'nokogiri'
xml = "<?xml version='1.0' encoding='UTF-8'?>
<ProgramList xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns='http://publisher.webservices.affili.net/'>
<TotalRecords>145</TotalRecords>
<Programs>
<ProgramSummary>
<ProgramID>6540</ProgramID>
<Title>Matalan</Title>
<Limitations>A bit of text
</Limitations>
<URL>http://www.matalan.co.uk</URL>
<ScreenshotURL>http://www.matalan.co.uk/</ScreenshotURL>
<LaunchDate>2009-11-02T00:00:00</LaunchDate>
<Status>1</Status>
</ProgramSummary>
<ProgramSummary>
<ProgramID>11787</ProgramID>
<Title>Club 18-30</Title>
<Limitations/>
<URL>http://www.club18-30.com/</URL>
<ScreenshotURL>http://www.club18-30.com</ScreenshotURL>
<LaunchDate>2013-05-16T00:00:00</LaunchDate>
<Status>1</Status>
</ProgramSummary>
</Programs>
</ProgramList>"
doc = Nokogiri::XML(xml)
p doc.xpath("//Programs")
gives:
=> []
Not what is expected.
On further investigation if I remove xmlns='http://publisher.webservices.affili.net/' from the initial <ProgramList> tag I get the expected output.
Indeed if I change xmlns='http://publisher.webservices.affili.net/' to xmlns:anything='http://publisher.webservices.affili.net/' I get the expected output.
So my question is what is going on here? Is this malformed XML? And what is the best strategy for dealing with it?
While it's hardcoded in this example the XML is (will be) coming from a web service.
Update
I realise I can use the remove_namespaces! method but the Nokogiri docs do say that it's "...probably is not a good thing in general" to do this. Also I'm interested in why it's happening and what the 'correct' XML should be.
The xmlns='http://publisher.webservices.affili.net/' indicates the default namespace for all elements under the one where it appears (including the element itself). That means that all elements that don’t otherwise have an explicit namespace fall under this namespace.
XPath queries don’t have default namespaces (at least in XPath 1.0), so any name that appears in one without a prefix refers to that element in no namespace.
In your code, you want to find Programs elements in the http://publisher.webservices.affili.net/ namespace (since that is the default namespace), but are looking (in your XPath query) for Programs elements in no namespace.
To explicitly specify the namespace in the query, you can do something like this:
doc.xpath("//pub:Programs", "pub" => "http://publisher.webservices.affili.net/")
Nokogiri makes this a little easier for namespaces declared on the root element (as in this case), declaring them for you with the same prefix. It will also declare the default namespace using the xmlns prefix, so you can also do:
doc.xpath("//xmlns:Programs")
which will give you the same result.
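Note that the prefix is needed on every step of a longer path, not just the first. The sketch below illustrates this on a trimmed copy of the document. It uses REXML rather than Nokogiri only so that it runs without installing a gem; that swap is an assumption of this example, and the prefix-to-URI hash plays the same role as the second argument in the Nokogiri call above.

```ruby
require 'rexml/document'

# Trimmed copy of the ProgramList document, keeping the default namespace.
xml = <<~XML
  <ProgramList xmlns='http://publisher.webservices.affili.net/'>
    <Programs>
      <ProgramSummary><Title>Matalan</Title></ProgramSummary>
      <ProgramSummary><Title>Club 18-30</Title></ProgramSummary>
    </Programs>
  </ProgramList>
XML

doc = REXML::Document.new(xml)

# Bind the "pub" prefix to the namespace URI for use in the query.
ns = { "pub" => "http://publisher.webservices.affili.net/" }

# Both ProgramSummary and Title live in the default namespace,
# so both steps carry the prefix.
titles = REXML::XPath.match(doc, "//pub:ProgramSummary/pub:Title", ns).map(&:text)

puts titles.inspect
```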

XPath query for node names matching a certain pattern

Is there a way to apply XPath's starts-with function to a node's name instead of its value? I want to select the FOObar and FOObaz nodes from the following XML document without selecting the notFOO node:
<?xml version="1.0" encoding="UTF-8" ?>
<RootNode>
<FOObar xmlns="http://sample.example.com">
<value>numOne</value>
</FOObar>
<FOObaz xmlns="http://sample.example.com">
<value>numTwo</value>
</FOObaz>
<notFOO xmlns="http://sample.example.com">
<value>numThree</value>
</notFOO>
</RootNode>
I get that it's possible to use starts-with to search based on text nodes, e.g.
//sample:value[starts-with(.,'num')]
Is there a way to write the following that is syntactically valid?
//sample:[starts-with(node(),'FOO')]
This question originally came with an SSCCE, but now that the question is answered, all that code is just clutter. It's still available in the revision history, of course.
Use the name() or local-name() functions to refer to nodes by name:
//*[starts-with(local-name(), 'FOO')]
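As a quick check of that expression against the sample document, here is a small Ruby sketch. It assumes REXML, which ships with Ruby; the question does not name an XPath engine.

```ruby
require 'rexml/document'

# The sample document from the question.
xml = <<~XML
  <RootNode>
    <FOObar xmlns="http://sample.example.com"><value>numOne</value></FOObar>
    <FOObaz xmlns="http://sample.example.com"><value>numTwo</value></FOObaz>
    <notFOO xmlns="http://sample.example.com"><value>numThree</value></notFOO>
  </RootNode>
XML

doc = REXML::Document.new(xml)

# local-name() ignores the namespace, so no prefix binding is needed here.
matches = REXML::XPath.match(doc, "//*[starts-with(local-name(), 'FOO')]")
names = matches.map(&:name)

puts names.inspect
```

Because local-name() drops the namespace, this matches FOO-prefixed elements in any namespace. To restrict the match to one namespace, add a test on namespace-uri() to the predicate.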

Any way to strip namespace garbage from XML file?

I need to select some nodes from an XML file (AppNamespace.xaml from a Silverlight XAP file, not that it matters), but the file has namespace stuff so XPath doesn't work. I could waste most of a day trial-and-erroring the bondage-and-discipline nightmare of XmlNamespaceManager and end up with hopelessly fragile code that can't tolerate the slightest variation in the input file (not a great idea in production code), or I could use the ludicrous local-name() syntax[1].
But it would be more convenient to use XPath as a human-readable query language that can be used to return specified nodes or attribute values from arbitrary XML files.
So is there any way to strip the line-noise out of the file? Or am I stuck? Is the labyrinthine imbecility of Linq-to-XML truly the lesser evil?
[1]
//*[local-name() = 'Deployment']/*[local-name() = 'Deployment.Parts']/*[local-name() = 'AssemblyPart']/@*[local-name()='Name']
Update
Five years down the road, I stand behind the term "labyrinthine imbecility" with every fiber of my being, except for a few fibers that want to use something much stronger.
Ed, here's an example of using namespaces with the System.Xml.XPath Extensions class. I've modified it to match the input you're looking at:
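using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// The "..." below stands for attributes elided from the real file; remove it to get parseable XML.
string markup = @"
<Deployment xmlns=""http://schemas.microsoft.com/client/2007/deployment""
xmlns:x=""http://schemas.microsoft.com/winfx/2006/xaml"" ...>
<Deployment.Parts>
<AssemblyPart x:Name=""xamlName"" Source=""assembly"" />
</Deployment.Parts>
</Deployment>
";
XmlReader reader = XmlReader.Create(new StringReader(markup));
// Load as a document so the absolute path below starts at the document root.
XDocument doc = XDocument.Load(reader);
XmlNamespaceManager nsm = new XmlNamespaceManager(reader.NameTable);
// "dep" is an arbitrary prefix bound to the document's default namespace.
nsm.AddNamespace("x", "http://schemas.microsoft.com/winfx/2006/xaml");
nsm.AddNamespace("dep", "http://schemas.microsoft.com/client/2007/deployment");
// The path ends in an attribute, so evaluate it and cast to XAttribute;
// XPathSelectElements only accepts paths that yield elements.
IEnumerable<XAttribute> names =
    ((IEnumerable)doc.XPathEvaluate("//dep:Deployment/dep:Deployment.Parts/dep:AssemblyPart/@x:Name", nsm)).Cast<XAttribute>();
foreach (XAttribute name in names)
    Console.WriteLine(name.Value);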
Not very complicated. Obviously you already know about XmlNamespaceManager, but I think you got a worse impression of it than it deserves.
When you say "hopelessly fragile code that can't tolerate the slightest variation in the input file", are you blaming namespaces in general, or XmlNamespaceManager? I don't see how either one makes it fragile... any more so than XML processing code without namespaces will not tolerate certain changes in the input document, but will tolerate others.
Have a little respect for other intelligent people in the industry, take a little time to understand the advantages behind a design before you dismiss it, and you will usually find that there are good reasons for what was done.
Not that XML namespaces couldn't be improved upon. However nobody has managed to produce a better standard and get it accepted by the community.
In XPath 2.0 you can use namespace wildcards (if you know what you are doing):
//*:Deployment/*:Deployment.Parts/*:AssemblyPart/@Name
btw. If an attribute doesn't have a prefix it is in no namespace at all. As this is most often the case, I guess, you don't need local-name() for the attribute.
I came here from a search, and I am adding an "Answer" to cheer on your "5 years on" update.
I was motivated to do this because I have an XML document that uses a tonne of namespaces -
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:x2="urn:schemas-microsoft-com:office:excel2" version="1.0" exclude-result-prefixes="msxsl">
and APPARENTLY I have to know what all those namespaces are in advance in order to hard code the XmlNamespaceManager, or write some code that parses the namespace declarations and adds the relevant name spaces myself. Why in the name of all that is holy does the XmlDocument not manage to do that all by itself?
XmlDocument databaseXml = new XmlDocument();
databaseXml.LoadXml(xslt.XslTransform);
var dbnsmgr = new XmlNamespaceManager(databaseXml.NameTable);
dbnsmgr.AddNamespace("xsl", "http://www.w3.org/1999/XSL/Transform");
dbnsmgr.AddNamespace("ss", "urn:schemas-microsoft-com:office:spreadsheet");
XmlElement databaseStylesElement = (XmlElement)databaseXml.DocumentElement.SelectSingleNode("/xsl:stylesheet/xsl:template", dbnsmgr);
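The "parse the declarations and add them myself" idea can be sketched briefly. The example below is in Ruby with REXML rather than C#, purely as an illustration of the approach, and it uses a trimmed-down, hypothetical stand-in for the stylesheet above with only two of its namespaces. It harvests the prefix-to-URI pairs declared on the root element and reuses them for the query instead of hard-coding each one.

```ruby
require 'rexml/document'

# Hypothetical, trimmed-down stand-in for the stylesheet in the post.
xml = <<~XML
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
                  version="1.0">
    <xsl:template match="/"><ss:Workbook/></xsl:template>
  </xsl:stylesheet>
XML

doc = REXML::Document.new(xml)

# Collect every prefix => URI pair declared on the root element.
namespaces = doc.root.namespaces

# Reuse the harvested map instead of listing each namespace by hand.
template = REXML::XPath.first(doc, "/xsl:stylesheet/xsl:template", namespaces)

puts template.attributes["match"]
```

This only picks up declarations on the root element. Namespaces declared deeper in the document, or prefixes that are rebound on descendants, would need a walk over the whole tree.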

Resources