How to use xmlns declarations with XPath in Nokogiri - ruby

I'm using Nokogiri::XML to parse responses from Amazon SimpleDB. The response is something like:
<SelectResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07/">
<SelectResult>
<Item>
<Attribute><Name>Foo</Name><Value>42</Value></Attribute>
<Attribute><Name>Bar</Name><Value>XYZ</Value></Attribute>
</Item>
</SelectResult>
</SelectResponse>
If I just hand the response straight over to Nokogiri, all XPath queries (e.g. doc/"//Item/Attribute[Name='Foo']/Value") return an empty array. But if I remove the xmlns attribute from the SelectResponse tag, it works perfectly.
Is there some extra thing I need to do to account for the namespace declaration? This workaround feels horribly like a hack.

That XPath query looks for elements that are not in any namespace. You need to tell your XPath processor that you are looking for elements in the http://sdb.amazonaws.com/doc/2007-11-07/ namespace.
One way to do that with Nokogiri is:
doc = Nokogiri::XML.parse(...)
doc.xpath("//aws:Item/aws:Attribute[aws:Name='Foo']/aws:Value", {"aws" => "http://sdb.amazonaws.com/doc/2007-11-07/"})

I found "Namespaces in XML" really helpful in understanding what's going on.
Basically if you have a namespace defined via xmlns=, you must use a namespace in your XPath searches.
So in your case, you could do one of three things:
Remove the xmlns attribute from the root SelectResponse. In that case your original, namespace-less XPath query will work.
Use the default namespace in your XPath query:
doc/"//xmlns:Item/xmlns:Attribute[xmlns:Name='Foo']/xmlns:Value"
Define a custom namespace in the second argument of the xpath method and use that in your query, as shown in hrnt's solution above.

Related

xpath to select namespace from xml document

Using xslt 1.0 (BizTalk 2016) I'm looking for a generic way to select the namespace of any valid xml document
For example, I have the following xml document:
<?xml version="1.0" encoding="utf-8"?>
<PortfolioActivation xmlns="http://www.random.com/bo/request/portfolioactivation">
<Portfolio>
<ExternalId>PRT-00000450</ExternalId>
<InternalId>c8b0239c-1e98-e911-a8b1-00224800449b</InternalId>
<Version>8627558</Version>
<Type>001</Type>
</Portfolio>
</PortfolioActivation>
Given that the root element could be anything, what would be the XPath to select the value of the namespace, i.e. http://www.random.com/bo/request/portfolioactivation
I had hoped "/*/@xmlns" would work but it doesn't.
The namespace of the outermost element can be found using namespace-uri(/*).
Alternatively, the default namespace that's in scope for the outermost element is /*/namespace::*[name()=''].
These aren't the same thing. Consider
<p:root xmlns="a.ns" xmlns:p="b.ns"/>
The first expression will give you "b.ns", the second will give you "a.ns". It's not clear from your question which you want.
Note that namespaces are not attributes in the XDM data model, so you never access them using the attribute axis. @xmlns will therefore never work.

XPath expression to pluck out attribute value

I have the following XML:
<envelope>
<action>INSERT</action>
<auditId>123</auditId>
<payload class="vendor">
<fizz buzz="3"/>
</payload>
</envelope>
I am trying to write an XPath expression that will pluck out vendor (value for the payload's class attribute) or whatever its value is.
My best attempts are:
/dataEnvelope/payload[@class="vendor"]@class
But this requires the expression to already know that vendor is the value of the attribute. But if the XML is:
<dataEnvelope>
<action>INSERT</action>
<auditId>123</auditId>
<payload class="foobar">
<fizz buzz="3"/>
</payload>
</dataEnvelope>
Then I want the expression to pluck out the foobar. Any ideas where I'm going awry?
If you need the @class value from the payload node, you can use
/dataEnvelope/payload[@class]/@class
or just
/dataEnvelope/payload/@class
First, your two XML samples are out of sync: one uses envelope as the root element and the other uses dataEnvelope. Swap one for the other as needed.
To get the class attribute value of payload, you can use an XPath expression like this, which uses a child's attribute value to be more specific:
/envelope/payload[fizz[@buzz='3']]/@class
Output is:
vendor
If the document element can/will change, then you can keep the XPath more generic and select the value of the class attribute from the payload element that is a child of any element:
/*/payload/@class
If you know that it will always be a child of envelope, then this would be more specific (but the above would still work):
/envelope/payload/@class

How to get the actual Hyperlink element inside the main document part using docx4j

So I have a case where I need to be able to work on the actual Hyperlink element inside the body of the docx, not just the target URL or the internal/externality of the link.
As a possible additional wrinkle this hyperlink wasn't present in the docx when it was opened but instead was added by the docx4j-xhtmlImporter.
I've iterated the list of relationships here: wordMLPackage.getMainDocumentPart().getRelationshipsPart().getRelationships().getRelationship()
And found the relationship ID of the hyperlink I want. I'm trying to use an XPath query: List<Object> results = wordMLPackage.getMainDocumentPart().getJAXBNodesViaXPath("//w:hyperlink[@r:id='rId11']", false);
But the list is empty. I also thought that it might need a refresh because I added the hyperlink at runtime so I tried with the refreshXMLFirst parameter set to true. On the off chance it wasn't a real node because it's an inner class of P, I also tried getJAXBAssociationsForXPath with the same parameters as above and that doesn't return anything.
Additionally, even XPath like "//w:hyperlink" fails to match anything.
I can see the hyperlinks in the XML if I unzip it after saving to a file, so I know the ID is right: <w:hyperlink r:id="rId11">
Is XPath the right way to find this? If it is, what am I doing wrong? If it's not, what should I be doing?
Thanks
XPathHyperlinkTest.java is a simple test case which works for me
You might be having problems because of JAXB, or possibly because of the specific way in which the binder is being set up in your case (do you start by opening an existing docx, or creating a new one?). Which docx4j version are you using?
Which JAXB implementation are you using? If it's the Sun/Oracle implementation (the reference implementation, or the one included in their JDK/JRE), that might be what is causing the problem, in which case you might try using MOXy instead.
An alternative to using XPath is to traverse the docx; see finders/ClassFinder.java
Try without namespace binding
List<Object> results = wordMLPackage.getMainDocumentPart().getJAXBNodesViaXPath("//*:hyperlink[@*:id='rId11']", false);

Capybara field.has_css? matcher

I'm using the following spec with MiniTest::Spec and Capybara:
find_field('Email').must_have_css('[autofocus]')
to check if the field called 'Email' has the autofocus attribute. The docs say the following:
has_css?(path, options = {})
Checks if a given CSS selector is on the page or current node.
As far as I understand, the 'Email' field is a node, so calling must_have_css should definitely work! What am I doing wrong?
Got an answer by Jonas Nicklas:
No, it shouldn't work. has_css? will check if any of the descendants
of the element match the given CSS. It will not check the element
itself. Since the autofocus property is likely on the email field
itself, has_css? will always return false in this case.
You might try:
find_field('Email')[:autofocus].should be_present
this can also be done with XPath, but I can't recall the syntax off
the top of my head.
My solution:
find_field('Email')[:autofocus].must_equal('autofocus')
Off the top of my head: can you use has_selector?()? Using RSpec with Capybara:
page.should have_selector('email', autofocus: true)
Also check Capybara matchers http://rubydoc.info/github/jnicklas/capybara/master/Capybara/Node/Matchers
I've not used MiniTest before but your syntax for checking for the attribute looks correct.
My concern would be with your use of find_field. The docs say:
Find a form field on the page. The field can be found by its name, id or label text.
It looks like you are trying to find the field based on the label. If so, I would check that the label has a for attribute and that it holds the correct id of the form field you are looking for. To rule this out as the issue, you could temporarily slap an id on your form field and look for that explicitly.

xpath using firexpath

I have firexpath, and it doesn't seem to be working with XPath. Even something as simple as //div returns no results. Even if I click on an existing node, choose "copy XPath" and then paste that XPath into the filter input box, it says "no nodes found". //*[name()='div'] does work though. Am I missing a namespace or something? Here is what the root tag looks like (it's valid XHTML):
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" class="ff ff3">
I didn't find a support forum for FireXPath, so I'm posting it here.
In case you cannot register a namespace (for the default namespace) and then prefix every element name in the XPath expression with the registered prefix, then you can use an XPath expression like this:
/*[name()='x']/*[name()='y']/*[name()='z']
In case there are elements belonging to other namespaces (than the default namespace), you'll have to use a more specific XPath expression:
/*[name()='x' and namespace-uri()='http://www.w3.org/1999/xhtml']
/*[name()='y' and namespace-uri()='http://www.w3.org/1999/xhtml']
/*[name()='z' and namespace-uri()='http://www.w3.org/1999/xhtml']
If you could have registered the default namespace and the prefix was (say) "p", then the above would be equivalent to a now simpler expression:
/p:x/p:y/p:z
I have not used firexpath, but it looks as though the default namespace xmlns="http://www.w3.org/1999/xhtml" is what prevents XPath from finding the div: every element inside the element that declares that xmlns, div included, belongs to that namespace.
You would therefore need to register the namespace with firexpath in some way (I don't know whether it offers one); a query using the registered prefix should then work. Your name()-based expression is fine as well. If you want to take the namespace into account in the expression, you can add a check for it like so:
//*[name()='div' and namespace-uri()='http://www.w3.org/1999/xhtml']
EDIT:
I have downloaded firexpath, which is now called firepath, and it doesn't look possible to register a namespace, so it looks like you will have to use the name() function.

Resources