Read XML file with namespace , using java [duplicate] - xpath

This question already has answers here:
how to retrieve XML data using XPath which contains namespace in Java?
(2 answers)
Closed 9 years ago.
i'd like to know how i can browse this XML file to get some values (like VersionXSD, Support, etx...). Without namespaces i can do it but with namespace i get no result.
<?xml version="1.0" encoding="UTF-8"?>
<ROOT Id="myRefId"
xsi:schemaLocation="urn:ir:si_my_xsd_07.xsd"
xmlns="urn:ir:se_pelmed" xmlns:nmsps="urn:siram:nmsps"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Horodatage>2013-03-05T12:00:00</Horodatage>
<VersionXSD>01.01</VersionXSD>
<BASE>
<nmsps:Numero>NUM-123-456</nmsps:Numero>
<nmsps:Type>TATA</nmsps:Type>
<nmsps:Support>PASTIS</nmsps:Support>
<nmsps:DateLimite>2013-03-04</nmsps:DateLimite>
<nmsps:Serial>0</nmsps:Serial>
<nmsps:BASEA>
<nmsps:NomFamille>Paul</nmsps:NomFamille>
<nmsps:Prenom>Hilaire</nmsps:Prenom>
<nmsps:DateNai>1979-10-11</nmsps:DateNai>
<nmsps:Sexe>Masculin</nmsps:Sexe>
</nmsps:BASEA>
<nmsps:BASEC Id="1">
<nmsps:Validite>2013-03-04</nmsps:Validite>
<nmsps:Sub>true</nmsps:Sub>
<nmsps:Stup>true</nmsps:Stup>
</nmsps:BASEC>
<nmsps:BASED Id="1">
<nmsps:Numero R="DD1">numero1</nmsps:Numero>
<nmsps:Categorie>Categorie1</nmsps:Categorie>
</nmsps:BASED>
<nmsps:BASED Id="2">
<nmsps:Numero R="DD2">numero2</nmsps:Numero>
<nmsps:Categorie>Categorie2</nmsps:Categorie>
</nmsps:BASED>
<nmsps:BASED Id="3">
<nmsps:Numero R="DD3">numero3</nmsps:Numero>
<nmsps:Categorie>Categorie3</nmsps:Categorie>
</nmsps:BASED>
</BASE>
</ROOT>

You can either bypass the namespace using local-name() i.e. //*[local-name()='Support']
or instantiate one as flup suggested, which is how I would do it, it makes things much easier.

Related

ant.xslt/Xalan fails referencing namespaced nodes from an external file

In an .xsl file I want to use nodes from a separate file ("foo.xsd"). The .xsl file uses an explicit namespace prefix, the external file doesn't but rather relies on a default namespace. Their namespace URIs match up.
Reading in the nodes, with ant.xslt the following XPath expression results in an ArrayIndexOutOfBounds exception later
document('foo.xsd')/xsd:schema/xsd:*
while it works when removing the last reference to the namespace prefix
document('foo.xsd')/xsd:schema/*
Here is a minimal example that reproduces the issue. The transformation input file
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<element name="bar"/>
</schema>
and a transformation .xsl file
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
version="1.0">
<xsl:variable name="nodeSet" select="document('foo.xsd')/xsd:schema/xsd:*[#name]"/>
<xsl:template match="/xsd:schema/xsd:element">
<xsl:value-of select="$nodeSet/#name"/>
</xsl:template>
</xsl:stylesheet>
The referenced foo.xsd is just a copy of the input file, so in this cut down example I'm running over one instance of the file and reading in the other instance in the stylesheet.
Goold ol' xsltproc is extracting the right attribute value ("bar"). ant.xslt with the default processor (Xalan) throws an ArrayIndexOutOfBoundsException (I presume when looking for the colon insided the element name).The problem only arises when referencing nodeSet as in the <xsl:value-of> element.
The <xsl:template> matches in all cases, using prefixes.
My question is: Did I hit a bug in Xalan, or am I doing something generally wrong?
I'm aware of the various work-arounds concerning namespace prefixes, like using [local-name() = 'element'] and such, so please don't post answers in that vein. I'm looking for a general answer whether this should work (like, according to the specs).
Background Material
Stacktrace (part.) that hints at Xalan:
...
Caused by: javax.xml.transform.TransformerException: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 512
at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:783)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:370)
at org.apache.tools.ant.taskdefs.optional.TraXLiaison.transform(TraXLiaison.java:201)
at org.apache.tools.ant.taskdefs.XSLTProcess.process(XSLTProcess.java:870)
... 126 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 512
at java.xml/com.sun.org.apache.xml.internal.utils.SuballocatedIntVector.elementAt(SuballocatedIntVector.java:441)
at java.xml/com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase._firstch(DTMDefaultBase.java:523)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.access$200(SAXImpl.java:73)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl$NamespaceChildrenIterator.next(SAXImpl.java:1431)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.CurrentNodeListIterator.setStartNode(CurrentNodeListIterator.java:158)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.StepIterator.setStartNode(StepIterator.java:97)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.StepIterator.setStartNode(StepIterator.java:97)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.DupFilterIterator.setStartNode(DupFilterIterator.java:97)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.CachedNodeListIterator.setStartNode(CachedNodeListIterator.java:57)
at jdk.translet/die.verwandlung.test.topLevel()
at jdk.translet/die.verwandlung.test.transform()
at java.xml/com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.transform(AbstractTranslet.java:624)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:776)
... 129 more
Invokation is through Gradle via shell command gradlew mytask using the Gradle built-in Ant using its built-in ant.xslt task.
build.gradle:
tasks.register('mytask') {
doLast {
ant.xslt(
baseDir: '.',
in: 'input.xsd',
out: 'out.xml',
style: 'stylefile.xsl'
)
}
}
Your xslt/xml code is fine (works with Saxon). So it's either something in the way you're running the transformation, or it's a bug in the version of Xalan that you're using.
Xalan shouldn't be throwing an ArrayIndexOutOfBounds exception anyway. It's presumably Xalan code on the stack trace?

How to turn a file into a Nokogiri::XML object?

I have a sample XML file (let's call it example.xml for the sake of this question) and want to turn it into a Nokogiri object.
According to documentation and lots of other online sources, this should work:
xml = Nokogiri::XML(File.read("example.txt"))
But the value of xml.to_xml is only:
"<?xml version=\"1.0\"?>\n"
In other words, it's ignoring the rest of the file. There are many tags afterwards and none of them are in the xml object.
How do I get Nokogiri to get all the tags?
Here's the XML I'm using:
<? xml version="1.0" encoding="UTF-8" ?>
<Document>
<Test>Test</Test>
</Document>
It looks like you are trying to parse an invalid XML doc.
This can be fixed by removing the spaces in the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<Test>Test</Test>
</Document>
How I figured this out
By default, when Nokogiri has errors parsing a document it populates an errors array.
xml = Nokogiri::XML(File.read("example.txt"))
p xml.errors
# => [#<Nokogiri::XML::SyntaxError: xmlParsePI : no target name>, #<Nokogiri::XML::SyntaxError: Start tag expected, '<' not found>]
You can also configure Nokogiri to raise an exception of it has parsing errors:
xml = Nokogiri::XML(File.read("example.txt")) do |config|
config.strict
end
Both of these cases show that there were issues parsing the document

Cant Read xml file using sax Parser with Nokogiri

I am using ruby 1.9.3 with rails 3.1. My requirement is that there is a parser file like below. when i opened with browser; Tags are not aligned in order. After the <item>; the data are clubbed format. There is a presence of
<?xml version="1.0" encoding="utf-8"?>
when I opened in sublime text; it shows after the <item>
<![CDATA[<?xml version="1.0" encoding="utf-8"?>
also after the </item> there is ]]> present. The data needs to be parsed are inside this <item></item>. the method called parse_file form Nokogiri called only start_element, end_element. When we tried manually by editing the file via removing the above statements; then it will call the characters method to fetch the data. Below is the example code.is there any other way?.
<batch transactionType="HC"><item><?xml version="1.0" encoding="utf-8"?><C><CI><Ve>00501</Ve></CI></C></item></batch>
You can do it easily using "xml-simple". Assuming your XML file name is "test.xml", first install the gem:
gem install xml-simple
Then, you can try:
require "XmlSimple"
abc = XmlSimple.xml_in File.read("test.xml")
puts abc['item']
The output should be:
{"C"=>[{"CI"=>[{"Ve"=>["00501"]}]}]}

Why can't I get a result from an XPath with namespace in the root element? [duplicate]

This question already has answers here:
Nokogiri/Xpath namespace query
(3 answers)
Closed 8 years ago.
This is probably an XML namespace newbie question but I can't figure out how to get an XPath to work with the following trunctated XML with this particular root element:
<?xml version="1.0" encoding="UTF-8"?>
<CreateOrUpdateEventsRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://dhamma.org" version="3-0-0">
<LanguageKey>
<IsoCode>en</IsoCode>
</LanguageKey>
<Publish>
<Value>true</Value>
</Publish>
<Events>
<Event>
<EventKey>
<LocationKey>
<SubDomain>rasmi</SubDomain>
</LocationKey>
<EventId>10DayPDFStdTag</EventId>
</EventKey>
</Event>
</Events>
</LanguageKey>
</CreateOrUpdateEventsRequest>
Using Ruby and Nokogiri (with a just updated libxml2), it works fine with XPath only if I delete all the extra info in the root element, making it:
<CreateOrUpdateEventsRequest>
Otherwise nothing works:
$> #doc.xpath("//CreateOrUpdateEventsRequest") #=> [] with original header, an array of nodes with modified header
$> #doc.xpath("//LanguageKey") #=> [] with the original header, an array of nodes with modified header
$> #doc.xpath("//xmlns:LanguageKey") #=> undefined namespace prefix with the original
How do I address namespaces like this with XPath?
Many thanks for the help.
The answer seems to be that the XML re-declared XMLNS when it should have declared the namespace with a prefix as in xmlns:myns.
From www.w3.org:
The XML specification reserves all names beginning with the letters 'x', 'm', 'l' in any combination of upper- and lower-case for use by the W3C. To date three such names have been given definitions—although these names are not in the XML namespace, they are listed here as a convenience to readers and users:
xml: See http://www.w3.org/TR/xml/#NT-XMLDecl and http://www.w3.org/TR/xml-names/#xmlReserved
xmlns: See http://www.w3.org/TR/xml-names/#ns-decl
xml-stylesheet: See The xml-stylesheet processing instruction
I don't use Nokogiri nor Ruby,
but you need to register a prefix for namespace http://dhamma.org
When I read http://nokogiri.org/tutorials/searching_a_xml_html_document.html
I understand you must do something like
$> #doc.xpath('//dha:LanguageKey', 'dha' => 'http://dhamma.org')
Here's some code to consider. Starting with code to create a Nokogiri::XML::Document:
require 'nokogiri'
XML = <<EOT
<?xml version="1.0" encoding="UTF-8"?>
<CreateOrUpdateEventsRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://dhamma.org" version="3-0-0">
<LanguageKey>
<IsoCode>en</IsoCode>
</LanguageKey>
<Publish>
<Value>true</Value>
</Publish>
<Events>
<Event>
<EventKey>
<LocationKey>
<SubDomain>rasmi</SubDomain>
</LocationKey>
<EventId>10DayPDFStdTag</EventId>
</EventKey>
</Event>
</Events>
</LanguageKey>
</CreateOrUpdateEventsRequest>
EOT
doc = Nokogiri::XML(XML)
Here's the root node's name:
doc.root.name # => "CreateOrUpdateEventsRequest"
The docs say:
When using CSS, if the namespace is called “xmlns”, you can even omit the namespace name.
doc.at('CreateOrUpdateEventsRequest').name # => "CreateOrUpdateEventsRequest"
doc.at('LanguageKey').to_xml # => "<LanguageKey>\n <IsoCode>en</IsoCode>\n </LanguageKey>"
Using XPath, we can specify the default namespace as:
doc.at('//xmlns:LanguageKey').to_xml # => "<LanguageKey>\n <IsoCode>en</IsoCode>\n </LanguageKey>"
Sometimes, if there are a lot of namespaces it makes sense to use collect_namespaces and pass them in:
name_spaces = doc.collect_namespaces # =>
doc.at('//xmlns:LanguageKey', name_spaces).to_xml # => "<LanguageKey>\n <IsoCode>en</IsoCode>\n </LanguageKey>"
You'll need to look through the documentation for Nokogiri::XML::Node for more information on the various methods.
I recommend using CSS selectors for simplicity and readability over XPath, as a first try. I think XPath has more functionality but it makes my eyes bug out sometimes, so I prefer CSS.

XPATH Error : Unable to evaluate expression

I have an xml like this as mentioned below. I am trying to obtain value for Cardnumber using following expression.
XPATH :
paymentService/ns0:submit/ns0:order/ns0:paymentDetails/ns0:VISA-SSL/cardNumber
But it's giving me error. Can any1 guide me on this?
<?xml version="1.0" encoding="UTF-8"?>
<paymentService version="1.0">
<ns0:submit xmlns:ns0="http://www.tibco.com/ns/no_namespace_schema_location/Payment/PaymentProcessors/WorldPay_CC/SharedResources/Schemas/paymentService_v1.dtd">
<ns0:order>
<description>description</description>
<amount value="500" currencyCode="EUR" exponent="2"/>
<ns0:paymentDetails>
<ns0:VISA-SSL>
<cardNumber>00009875083428500</cardNumber>
<expiryDate>
<date month="02" year="2008"/>
</expiryDate>
<cardHolderName>test</cardHolderName>
</ns0:VISA-SSL>
<session shopperIPAddress="192.165.22.35" id=""/>
</ns0:paymentDetails>
<shopper>
<browser>
<acceptHeader>text/html</acceptHeader>
<userAgentHeader>mozilla 5.0</userAgentHeader>
</browser>
</shopper>
</ns0:order>
</ns0:submit>
</paymentService>
Thanks
Your xmlns:ns0 is misplaced, and (think of this way) because ns0 is defined after <ns0:submit> tag, ns0:submit is "undefined" and thus the parse error.
Edit:
If you need to use this XPath in PHP, you'll either have to declare the namespace before use:
<?xml version="1.0" encoding="UTF-8"?>
<paymentService version="1.0" xmlns:ns0="http://www.tibco.com/ns/no_namespace_schema_location/Payment/PaymentProcessors/WorldPay_CC/SharedResources/Schemas/paymentService_v1.dtd">
<ns0:submit>
Or register the namespace before evaluating your XPath (Thanks to #MiMo for pointing out):
$xml->registerXPathNamespace("ns0","http://www.tibco.com/ns/no_namespace_schema_location/Payment/PaymentProcessors/WorldPay_CC/SharedResources/Schemas/paymentService_v1.dtd");
Also add a slash before your XPATH:
/paymentService/ns0:submit/ns0:order/ns0:paymentDetails/ns0:VISA-SSL/cardNumber
Live demo with declaration first or Live demo with namespace registration (both are demonstrated in PHP).
The problem is the node
<paymentService version="1.0">
Since this is not ended you have to comment it or end it properly.
If you comment that try out with this XPATH
/ns0:submit/ns0:order/ns0:paymentDetails/ns0:VISA-SSL/cardNumber
You need to register the namespace before evaluating the XPath:
$xml->registerXPathNamespace('ns0', 'http://www.tibco.com/ns/no_namespace_schema_location/Payment/PaymentProcessors/WorldPay_CC/SharedResources/Schemas/paymentService_v1.dtd');
where $xml is a SimpleXMLElement variable containing your XML.
Your XPath needs to start with / as per Passerby answer:
/paymentService/ns0:submit/ns0:order/ns0:paymentDetails/ns0:VISA-SSL/cardNumber

Resources