Wildcard XPath nested element selection by example - xpath

I am being given XML that looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<singleton-list>
<fizzbuzz>
<amountId>123</amountId>
<countryId>456</countryId>
<action>Overwrite</action>
</fizzbuzz>
</singleton-list>
However, the outermost element (in this case, singleton-list) will be different each time; it can be any one of many different names. For instance, I might get another XML message that looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<aggregateList>
<fizzbuzz>
<amountId>123</amountId>
<countryId>456</countryId>
<action>Overwrite</action>
</fizzbuzz>
</aggregateList>
I'm trying to write an XPath that selects the fizzbuzz element out of each message I receive, regardless of what the outermost element is (singleton-list, aggregateList, or otherwise).
I think I could do something involving a wildcard, but I'm not sure if asterisks have special meaning in XPath or if I'm going about this the wrong way.
My best attempt at an XPath to do this selection is:
/*/fizzbuzz
Is this correct or is there a better way to do this?

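Yes, * is XPath's wildcard: it matches any element, whatever its name. Either of these works:
/*/fizzbuzz selects fizzbuzz elements that are direct children of the root element, whatever the root is called.
//fizzbuzz selects fizzbuzz elements at any depth in the document.
/*/fizzbuzz is the more precise of the two. //fizzbuzz also matches fizzbuzz elements nested deeper, which may or may not be what you want.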
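The following minimal sketch uses Python's standard-library xml.etree.ElementTree to show the difference between the two expressions. ElementTree implements only a limited subset of XPath, so the calls below are rough equivalents of /*/fizzbuzz and //fizzbuzz rather than those expressions themselves. The helper names children_of_root and anywhere are hypothetical.

```python
import xml.etree.ElementTree as ET

BODY = """
  <fizzbuzz>
    <amountId>123</amountId>
    <countryId>456</countryId>
    <action>Overwrite</action>
  </fizzbuzz>"""

# The two messages from the question differ only in the outermost element's name.
MESSAGES = [
    f"<?xml version='1.0' encoding='UTF-8'?>\n<singleton-list>{BODY}\n</singleton-list>",
    f"<?xml version='1.0' encoding='UTF-8'?>\n<aggregateList>{BODY}\n</aggregateList>",
]

def children_of_root(xml_text, name="fizzbuzz"):
    """Rough equivalent of /*/fizzbuzz: direct children of the root, whatever it is called."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return root.findall(name)

def anywhere(xml_text, name="fizzbuzz"):
    """Rough equivalent of //fizzbuzz: matching elements at any depth, root included."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return list(root.iter(name))

for message in MESSAGES:
    hits = children_of_root(message)
    print([hit.findtext("amountId") for hit in hits])  # ['123'] for both messages

# The two expressions only disagree once fizzbuzz sits deeper than the root's children.
nested = "<outer><wrapper><fizzbuzz/></wrapper></outer>"
print(len(children_of_root(nested)), len(anywhere(nested)))  # 0 1
```

The root element's name never appears in either helper, which is the point of the * in /*/fizzbuzz.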

Related

How to ignore XML declaration differences with XmlUnit?

How can I configure XmlUnit.Net to ignore the XML declaration when comparing two documents?
Assume I have the following control document:
<?xml version="1.0" encoding="utf-8"?>
<a><amount>1</amount></a>
Which I want to compare with:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<a><amount>1</amount></a>
The comparison should result in no differences.
My expectation would be that using a NodeFilter like so should work, but it doesn't:
var diff = DiffBuilder.Compare(control)
.WithTest(test)
.WithNodeFilter(n => n.NodeType != XmlNodeType.XmlDeclaration)
.Build();
diff.Differences.Count().Should().Be(0);
The assertion fails with two differences: one for the encoding (different casing) and another for the standalone attribute. I'm not interested in either.
Whether I say n.NodeType != XmlNodeType.XmlDeclaration or n.NodeType == XmlNodeType.XmlDeclaration makes no difference.
I am using XMLUnit.Core v2.5.1.
NodeFilter only applies to nodes that are children of other nodes (those returned by XmlNode.ChildNodes). Unfortunately, this is not the case for the XML declaration, which is probably a bug.
In your case you want to tweak the DifferenceEvaluator and downgrade the differences you are not interested in. Something like
var diff = DiffBuilder.Compare(control).WithTest(test)
.WithDifferenceEvaluator(DifferenceEvaluators.Chain(DifferenceEvaluators.Default,
DifferenceEvaluators.DowngradeDifferencesToEqual(ComparisonType.XML_STANDALONE, ComparisonType.XML_ENCODING)))
.Build();
would swallow both differences.
You may want to look at the severity of the differences rather than just counting them. The encoding difference is a "similar" difference, while the differing standalone values are critical.

Nokogiri not parsing XML in ruby - xmlns issue?

Given the following Ruby code:
require 'nokogiri'
xml = "<?xml version='1.0' encoding='UTF-8'?>
<ProgramList xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns='http://publisher.webservices.affili.net/'>
<TotalRecords>145</TotalRecords>
<Programs>
<ProgramSummary>
<ProgramID>6540</ProgramID>
<Title>Matalan</Title>
<Limitations>A bit of text
</Limitations>
<URL>http://www.matalan.co.uk</URL>
<ScreenshotURL>http://www.matalan.co.uk/</ScreenshotURL>
<LaunchDate>2009-11-02T00:00:00</LaunchDate>
<Status>1</Status>
</ProgramSummary>
<ProgramSummary>
<ProgramID>11787</ProgramID>
<Title>Club 18-30</Title>
<Limitations/>
<URL>http://www.club18-30.com/</URL>
<ScreenshotURL>http://www.club18-30.com</ScreenshotURL>
<LaunchDate>2013-05-16T00:00:00</LaunchDate>
<Status>1</Status>
</ProgramSummary>
</Programs>
</ProgramList>"
doc = Nokogiri::XML(xml)
p doc.xpath("//Programs")
gives:
=> []
which is not what I expected.
On further investigation if I remove xmlns='http://publisher.webservices.affili.net/' from the initial <ProgramList> tag I get the expected output.
Indeed if I change xmlns='http://publisher.webservices.affili.net/' to xmlns:anything='http://publisher.webservices.affili.net/' I get the expected output.
So my question is what is going on here? Is this malformed XML? And what is the best strategy for dealing with it?
While it's hardcoded in this example, the XML will be coming from a web service.
Update
I realise I can use the remove_namespaces! method but the Nokogiri docs do say that it's "...probably is not a good thing in general" to do this. Also I'm interested in why it's happening and what the 'correct' XML should be.
The xmlns='http://publisher.webservices.affili.net/' indicates the default namespace for all elements under the one where it appears (including the element itself). That means that all elements that don’t otherwise have an explicit namespace fall under this namespace.
XPath queries don’t have default namespaces (at least in XPath 1.0), so any name that appears in one without a prefix refers to that element in no namespace.
In your code, you want to find Programs elements in the http://publisher.webservices.affili.net/ namespace (since that is the default namespace), but your XPath query is looking for Programs elements in no namespace.
To explicitly specify the namespace in the query, you can do something like this:
doc.xpath("//pub:Programs", "pub" => "http://publisher.webservices.affili.net/")
Nokogiri makes this a little easier for namespaces declared on the root element (as in this case), declaring them for you with the same prefix. It will also declare the default namespace using the xmlns prefix, so you can also do:
doc.xpath("//xmlns:Programs")
which will give you the same result.
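The same rule applies outside Nokogiri. In this minimal sketch, Python's standard-library xml.etree.ElementTree (which supports only a subset of XPath) also treats an unprefixed name in a path as "no namespace", so the fix is the same: bind a prefix of your own choosing (pub here, an arbitrary name) to the publisher URI and use it in the path. The document is a trimmed version of the question's XML.

```python
import xml.etree.ElementTree as ET

PUB = "http://publisher.webservices.affili.net/"

# Trimmed version of the question's document, keeping its default namespace.
XML = f"""<?xml version='1.0' encoding='UTF-8'?>
<ProgramList xmlns='{PUB}'>
  <TotalRecords>145</TotalRecords>
  <Programs>
    <ProgramSummary><ProgramID>6540</ProgramID><Title>Matalan</Title></ProgramSummary>
    <ProgramSummary><ProgramID>11787</ProgramID><Title>Club 18-30</Title></ProgramSummary>
  </Programs>
</ProgramList>"""

root = ET.fromstring(XML.encode("utf-8"))

# An unprefixed name in the path means "no namespace", so this finds nothing.
unprefixed = root.findall(".//Programs")

# Bind an arbitrary prefix ("pub") to the document's default namespace and use it.
NS = {"pub": PUB}
prefixed = root.findall(".//pub:Programs", NS)
titles = [t.text for t in root.findall(".//pub:Title", NS)]

print(len(unprefixed), len(prefixed))  # 0 1
print(titles)                          # ['Matalan', 'Club 18-30']
```

The prefix in the query does not have to match anything in the document. Only the namespace URI it is bound to matters.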

XPath query for node names matching a certain pattern

Is there a way to apply XPath's starts-with function to a node's name instead of its value? I want to select the FOObar and FOObaz nodes from the following XML document without selecting the notFOO node:
<?xml version="1.0" encoding="UTF-8" ?>
<RootNode>
<FOObar xmlns="http://sample.example.com">
<value>numOne</value>
</FOObar>
<FOObaz xmlns="http://sample.example.com">
<value>numTwo</value>
</FOObaz>
<notFOO xmlns="http://sample.example.com">
<value>numThree</value>
</notFOO>
</RootNode>
I get that it's possible to use starts-with to search based on text nodes, e.g.
//sample:value[starts-with(.,'num')]
Is there a way to write the following that is syntactically valid?
//sample:[starts-with(node(),'FOO')]
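Use the name() or local-name() functions to refer to nodes by name. name() returns the qualified name, including any prefix, while local-name() returns the name without its prefix, so local-name() is the safer choice when namespaces are involved: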
//*[starts-with(local-name(), 'FOO')]
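The following minimal Python sketch uses the standard-library xml.etree.ElementTree to illustrate what that expression matches on the question's document. ElementTree's path syntax has no starts-with() or local-name(), so the predicate is written as a plain Python filter. The local_name and names_starting_with helpers are hypothetical. ElementTree stores namespaced tags as {uri}localname, so the namespace part is stripped before comparing, much as local-name() does.

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0" encoding="UTF-8" ?>
<RootNode>
  <FOObar xmlns="http://sample.example.com"><value>numOne</value></FOObar>
  <FOObaz xmlns="http://sample.example.com"><value>numTwo</value></FOObaz>
  <notFOO xmlns="http://sample.example.com"><value>numThree</value></notFOO>
</RootNode>"""

def local_name(tag):
    """Like XPath local-name(): drop the '{namespace-uri}' part ElementTree puts in tags."""
    return tag.rsplit("}", 1)[-1]

def names_starting_with(root, prefix):
    """Mimics //*[starts-with(local-name(), prefix)] over root and all its descendants."""
    return [el for el in root.iter() if local_name(el.tag).startswith(prefix)]

root = ET.fromstring(XML.encode("utf-8"))
matches = names_starting_with(root, "FOO")
print([local_name(el.tag) for el in matches])  # ['FOObar', 'FOObaz']
```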

How to find all nodes of a specific type in XPath

Let's say I have the following form data instance in my view.xml:
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xxforms="http://orbeon.org/oxf/xml/xforms"
xmlns:exforms="http://www.exforms.org/exf/1-0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xhtml:head>
<xforms:instance id="instanceData">
<form xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<fruits>
<fruit>
<fruit-name>Mango</fruit-name>
</fruit>
<fruit>
<fruit-name>Apple</fruit-name>
</fruit>
<fruit>
<fruit-name>Banana</fruit-name>
</fruit>
</fruits>
</form>
</xforms:instance>
</xhtml:head>
I want to select all the fruit names from the above instance.
I tried the following ways but it always selects the first fruit.
instance('instanceData')/fruits/fruit[*]/fruit-name
instance('instanceData')/fruits/fruit/fruit-name
instance('instanceData')/fruits/fruit[position()>0]/fruit-name
Please provide a way to overcome this in XPath.
Try this:
//fruit-name
It finds all fruit-name elements wherever they are in the document hierarchy.
If you want to select all the <fruit-name> from the instance instanceData (<xforms:instance id="instanceData">) that looks like the one you have in your question, the following should do it:
instance('instanceData')/fruits/fruit/fruit-name
If this doesn't work, one common reason is that you have a default namespace declaration in the document that contains your instance, like xmlns="http://www.w3.org/1999/xhtml". If you have this, you need to undo that default namespace declaration on the element where you declare the instance, with:
<xforms:instance xmlns="" id="instanceData">
(And if this is the issue, my advice is not to use default namespace declarations. Ever. Instead declare xmlns:xhtml="http://www.w3.org/1999/xhtml" and use the xhtml prefix everywhere.)
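The effect of an inherited default namespace can be reproduced outside XForms. This minimal sketch uses Python's standard-library xml.etree.ElementTree on a cut-down version of the question's document. It makes two assumptions. First, it uses the standard XForms namespace URI http://www.w3.org/2002/xforms, because the question's snippet uses the xforms prefix without declaring it. Second, it treats the first child of xforms:instance as what instance('instanceData') returns, namely the instance's root element. fruit_names is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
XFORMS = "http://www.w3.org/2002/xforms"  # assumed: standard XForms namespace URI

def fruit_names(instance_attrs=""):
    """Evaluate fruits/fruit/fruit-name relative to the instance's root element (<form>)."""
    doc = f"""<html xmlns="{XHTML}" xmlns:xforms="{XFORMS}">
  <head>
    <xforms:instance id="instanceData"{instance_attrs}>
      <form>
        <fruits>
          <fruit><fruit-name>Mango</fruit-name></fruit>
          <fruit><fruit-name>Apple</fruit-name></fruit>
          <fruit><fruit-name>Banana</fruit-name></fruit>
        </fruits>
      </form>
    </xforms:instance>
  </head>
</html>"""
    root = ET.fromstring(doc)
    instance = root.find(f".//{{{XFORMS}}}instance")
    form = instance[0]  # first child element, standing in for instance('instanceData')
    return [el.text for el in form.findall("fruits/fruit/fruit-name")]

# The default xmlns on <html> is inherited, so <form>, <fruits>, ... are XHTML elements.
print(fruit_names())             # []
# xmlns="" on the instance puts its content back into no namespace.
print(fruit_names(' xmlns=""'))  # ['Mango', 'Apple', 'Banana']
```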
First:
It may just be a typo, but note that your XML has a wrong node ending:
<service>
Second:
Your XPath is perfectly valid, but when you evaluate it you need to iterate over the result as a sequence of nodes, not treat it as a single value.
For example, in JDOM's XPath class this is the difference between selectSingleNode (one node) and selectNodes (a list of nodes).
Likewise, in your XForms you need to iterate over the result set to get the list of fruits.
If you want only the fruit names as text, you could try
instance('instanceData')/fruits/fruit/fruit-name/text()
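The "single value versus sequence" distinction can be sketched with Python's standard-library xml.etree.ElementTree, which supports only a limited XPath subset. find() and findtext() stop at the first match, like an API that returns a single node. findall() returns every match, and the result has to be iterated. The document is the question's form instance on its own, without the surrounding XHTML.

```python
import xml.etree.ElementTree as ET

# The <form> root element of the question's instance, standing alone.
FORM = """<form>
  <fruits>
    <fruit><fruit-name>Mango</fruit-name></fruit>
    <fruit><fruit-name>Apple</fruit-name></fruit>
    <fruit><fruit-name>Banana</fruit-name></fruit>
  </fruits>
</form>"""

form = ET.fromstring(FORM)
path = "fruits/fruit/fruit-name"

# Single-value style: only the first matching node is used.
first = form.findtext(path)

# Sequence style: all matching nodes, iterated to collect their text
# (the same strings the .../fruit-name/text() form of the path would yield).
all_names = [el.text for el in form.findall(path)]

print(first)      # Mango
print(all_names)  # ['Mango', 'Apple', 'Banana']
```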

XPath concat multiple nodes

I'm not very familiar with XPath, but I have been working with XPath expressions and storing them in a database. (Actually, it's just the BAM tool for BizTalk.)
Anyway, I have an xml which could look like:
<File>
<Element1>element1</Element1>
<Element2>element2</Element2>
<Element3>
<SubElement>sub1</SubElement>
<SubElement>sub2</SubElement>
<SubElement>sub3</SubElement>
</Element3>
</File>
I was wondering if there is a way to use an XPath expression to get all the SubElements concatenated. At the moment, I am using:
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement']
This works if there is only one SubElement, but apparently my XML sometimes has more of them, and then it gives NULL. I could just use
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement'][1]
but I need all the nodes. Is there a way to do this?
Thanks a lot!
Edit: I was wrong about the XML; it actually looks like this:
<item>
<element1>el1</element1>
<element2>el2</element2>
<element3>el3</element3>
<element4>
<subEl1>subel1a</subEl1>
<subEl2>subel2a</subEl2>
</element4>
<element4>
<subEl1>subel1b</subEl1>
<subEl2>subel2b</subEl2>
</element4>
</item>
I need a one-line expression that gives a result like "subel2a subel2b".
I need a single line because the XPath expression is set as an XML attribute (not my choice; it's specified). I tried string-join, but it isn't really working.
string-join(/file/Element3/SubElement, ',')
/File/Element3/SubElement will match all of the SubElement elements in your sample XML. What are you using to evaluate it?
If your evaluation method is subject to the "first node rule", it will only return the first one (in XPath 1.0, converting a node-set to a string uses only the first node in document order). If you are using a method that returns a node-set, it will return all of them.
You can get all SubElements by using:
//SubElement
But this won't keep them grouped together how you want. You will want to do a query for all elements that contain a SubElement (basically do a search for the parent of any SubElements).
//SubElement/parent::*
Once you have that, you could (depending on your programming language) loop through the parents and concatenate the SubElements.
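This minimal Python sketch shows the loop-and-concatenate approach using the standard-library xml.etree.ElementTree on the question's edited <item> document. It covers two cases: the flat join the question asks for ("subel2a subel2b", which string-join(/item/element4/subEl2, ' ') would produce in an XPath 2.0 processor) and a per-parent grouping. string-join() does not exist in XPath 1.0. If the tool evaluates only XPath 1.0 (an assumption, since the question does not say), that function will not be available there.

```python
import xml.etree.ElementTree as ET

# The edited <item> document from the question.
ITEM = """<item>
  <element1>el1</element1>
  <element2>el2</element2>
  <element3>el3</element3>
  <element4>
    <subEl1>subel1a</subEl1>
    <subEl2>subel2a</subEl2>
  </element4>
  <element4>
    <subEl1>subel1b</subEl1>
    <subEl2>subel2b</subEl2>
  </element4>
</item>"""

item = ET.fromstring(ITEM)

# Flat join of every match: the string that string-join(/item/element4/subEl2, ' ')
# would give in an XPath 2.0 processor.
joined = " ".join(el.text for el in item.findall("element4/subEl2"))

# Grouped join: loop over each parent and concatenate only its own children.
grouped = [" ".join(child.text for child in parent) for parent in item.findall("element4")]

print(joined)   # subel2a subel2b
print(grouped)  # ['subel1a subel2a', 'subel1b subel2b']
```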
