Apache FOP 2.7 with Spring Boot 2.7.1: failed to generate PDF

I am generating a PDF document from an XML input file using Apache FOP 2.4.
To prevent XXE attacks, I need to set the secure processing feature (FEATURE_SECURE_PROCESSING) on the TransformerFactory:
// fop: an org.apache.fop.apps.Fop created elsewhere, e.g. fopFactory.newFop(MimeConstants.MIME_PDF, out)
InputStream xslTransformer = getClass().getClassLoader().getResourceAsStream("foo.xsl");
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
Transformer transformer = transformerFactory.newTransformer(new StreamSource(xslTransformer));
transformer.transform(new DOMSource(), new SAXResult(fop.getDefaultHandler()));
After setting this feature I can't generate any PDF document and I'm getting warnings:
SystemId Unknown; Line #49; Column #99; "master-name" attribute is not allowed on the fo:simple-page-master element!
SystemId Unknown; Line #49; Column #99; "initial-page-number" attribute is not allowed on the fo:simple-page-master element!
SystemId Unknown; Line #49; Column #99; "page-height" attribute is not allowed on the fo:simple-page-master element!
SystemId Unknown; Line #49; Column #99; "page-width" attribute is not allowed on the fo:simple-page-master element!
etc ...
Here is a section of XSL file (foo.xsl):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:pdf="http://xmlgraphics.apache.org/fop/extensions/pdf">
<xsl:template match="/">
<fo:root>
<fo:layout-master-set>
<fo:simple-page-master master-name="A4-portrait" initial-page-number="1"
page-height="29.7cm" page-width="21.0cm" margin-top="0cm"
margin-left="1cm" margin-right="1.3cm" margin-bottom="0cm">
<fo:region-body margin-top="2.2cm" margin-bottom="1.2cm" margin-left="1.3cm"/>
<fo:region-before region-name="xsl-region-before" extent="2.2cm"/>
<fo:region-after region-name="xsl-region-after" extent="1.2cm"/>
<fo:region-start region-name="xsl-region-start" extent="1.3cm"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="A4-portrait" font-family="Consolas" font-size="11">
<fo:flow flow-name="xsl-region-body">
<fo:block linefeed-treatment="preserve" font-weight="bold">
foo
</fo:block>
<fo:block linefeed-treatment="preserve">
bar
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
</xsl:template>
</xsl:stylesheet>
How should I use this feature and make it work? Java version is 8.

This is caused by xalan-2.7.2; it is a bug in Xalan-J.
Switching to xalan-2.7.1 or earlier will solve your problem.
You may have to add an exclusion for xalan on the Apache FOP dependency.
You can also override it with the ServiceMix bundle 2.7.2_3, which patches this problem:
<dependency>
<groupId>org.apache.servicemix.bundles</groupId>
<artifactId>org.apache.servicemix.bundles.xalan</artifactId>
<version>2.7.2_3</version><!--$NO-MVN-MAN-VER$-->
</dependency>
The <!--$NO-MVN-MAN-VER$--> comment only tells Eclipse (m2e) not to warn that you are overriding a managed version.
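Before changing dependencies, it can help to check which XSLT processor TransformerFactory.newInstance() actually returns, and whether literal fo: attributes survive secure processing with it. The sketch below is such a check, not part of FOP: the class SecureFoCheck and its one-element stylesheet are hypothetical stand-ins for foo.xsl, and the expectation that master-name and the other attributes are kept assumes the JDK's built-in processor (no xalan-2.7.2 jar on the classpath). If a Xalan jar is present, the printed factory class will typically be an org.apache.xalan class instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SecureFoCheck {

    // Hypothetical, cut-down stand-in for foo.xsl: one literal fo:simple-page-master
    // carrying attributes that the warnings in the question report as "not allowed".
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/'>"
        + "<fo:simple-page-master master-name='A4-portrait' page-height='29.7cm' page-width='21.0cm'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Name of the TransformerFactory implementation found on the classpath. */
    static String factoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    /** Runs XSL with FEATURE_SECURE_PROCESSING enabled and returns the serialized output. */
    static String transformSecurely() {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            // The template matches "/" and ignores the input, so any document will do.
            transformer.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TransformerFactory: " + factoryClassName());
        String fo = transformSecurely();
        System.out.println(fo);
        System.out.println("master-name kept: " + fo.contains("master-name=\"A4-portrait\""));
    }
}
```

Running it once with your real classpath (for example from a Spring Boot test) shows whether the Xalan jar pulled in by FOP is the one being used.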

Related

XPath fails when using XmlUtil (UFT 12.0)

Given the following XML:
<?xml version="1.0" encoding="UTF-8" ?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Header>
<WFContext xmlns="http://service.wellsfargo.com/entity/message/2003/" soapenv:actor="" soapenv:mustUnderstand="0">
<messageId>cci-sf-dev14.wellsfargo.com:425a9286:14998ac6245:-7e1e</messageId>
<sessionId>425a9286:14998ac6245:-7e1d</sessionId>
<sessionSequenceNumber>1</sessionSequenceNumber>
<creationTimestamp>2014-11-10T00:14:49.243-08:00</creationTimestamp>
<invokerId>cci-sf-dev14.wellsfargo.com</invokerId>
<activitySourceId>P7</activitySourceId>
<activitySourceIdType>FNC</activitySourceIdType>
<hostName>cci-sf-dev14.wellsfargo.com</hostName>
<billingAU>05426</billingAU>
<originatorId>287586861901211</originatorId>
<originatorIdType>ECN</originatorIdType>
<initiatorId>GTST0793</initiatorId>
<initiatorIdType>ACF2</initiatorIdType>
</WFContext>
</soapenv:Header>
<soapenv:Body>
<getCustomerInformation xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformation/2012/05/">
<initiatorInformation xmlns="http://service.wellsfargo.com/provider/ecpr/shared/common/2011/11/">
<channelInfo>
<initiatorCompanyNbr xmlns="http://service.wellsfargo.com/entity/message/2003/">114</initiatorCompanyNbr>
</channelInfo>
</initiatorInformation>
<custNbr xmlns="http://service.wellsfargo.com/entity/party/2003/">287586861901211</custNbr>
<customerViewList xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformationCommon/2012/05/">
<customerView>
<customerViewType>GENERAL_INFORMATION_201205</customerViewType>
<preferences>
<generalInformationPreferences201205 xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/common/2012/05/">
<formattedNameIndicator xmlns="">true</formattedNameIndicator>
<includeTaxCertificationIndicator xmlns="">true</includeTaxCertificationIndicator>
</generalInformationPreferences201205>
</preferences>
</customerView>
<customerView>
<customerViewType>SEGMENT_LIST</customerViewType>
</customerView>
<customerView>
<customerViewType>LIMITED_PROFILE_REQUIRED_DATA</customerViewType>
</customerView>
<customerView>
<customerViewType>INDIVIDUAL_CUSTOMER_GENERAL_INFORMATION_201205</customerViewType>
<preferences>
<individualGeneralInformationPreferences xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/common/2012/05/">
<includeMinorIndicator xmlns="">true</includeMinorIndicator>
</individualGeneralInformationPreferences>
</preferences>
</customerView>
</customerViewList>
</getCustomerInformation>
</soapenv:Body>
</soapenv:Envelope>
I am trying to access the getCustomerInformation tag using relative XPath in VBScript.
XMLDataFile = "C:\testReqfile.xml"
Set xmlDoc = XMLUtil.CreateXML()
xmlDoc.LoadFile(XMLDataFile)
Print xmlDoc.ToString
'xmlDoc.AddNamespace "ns","xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/"
Set childrenObj = xmlDoc.ChildElementsByPath("//*[contains(@xmlns,'getCustomerInformation')]")
msgbox childrenObj.Count
But it fails to return a node.
Your XPath expression does not work because xmlns as in
<getCustomerInformation xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformation/2012/05/">
is a default namespace declaration, not an attribute. Therefore, it cannot be accessed with @xmlns.
But it seems you do not have to rely on the namespace at all, because the element name ("getCustomerInformation") is already telling. To get around those elements being in a namespace, use local-name() to select elements by their name.
Set childrenObj = xmlDoc.ChildElementsByPath("//*[local-name() = 'getCustomerInformation']")
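The local-name() trick is plain XPath 1.0, so it is not specific to UFT. As a hedged illustration outside UFT, here is a sketch with the standard Java javax.xml.xpath API; the class LocalNameQuery and the trimmed-down sample (a SOAP envelope holding only getCustomerInformation and custNbr) are assumptions for the example. It also shows why a plain //getCustomerInformation finds nothing: the element is in a default namespace, and an unprefixed name in an XPath 1.0 expression only matches elements in no namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LocalNameQuery {

    // Trimmed-down version of the SOAP request from the question.
    static final String SAMPLE =
        "<soapenv:Envelope xmlns:soapenv='http://schemas.xmlsoap.org/soap/envelope/'>"
        + "<soapenv:Body>"
        + "<getCustomerInformation xmlns='http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformation/2012/05/'>"
        + "<custNbr>287586861901211</custNbr>"
        + "</getCustomerInformation>"
        + "</soapenv:Body>"
        + "</soapenv:Envelope>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // keep each element's namespace URI
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Number of nodes selected by an XPath 1.0 expression. */
    static int count(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // Matches by local name, whatever namespace the element is in.
        System.out.println(count(doc, "//*[local-name() = 'getCustomerInformation']")); // 1
        // Unprefixed name test: only matches elements in no namespace.
        System.out.println(count(doc, "//getCustomerInformation")); // 0
    }
}
```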
As @Mathias Müller already explained in his answer, xmlns defines a namespace and thus cannot be accessed like a regular attribute. I don't have experience with XmlUtil, but in standard VBScript you could select the node(s) like this:
Set xml = CreateObject("Msxml2.DOMDocument.6.0")
xml.async = False
xml.load "C:\path\to\your.xml"
If xml.ParseError Then
WScript.Echo xml.ParseError.Reason
WScript.Quit 1
End If
'define a namespace alias "ns"
uri = "http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformation/2012/05/"
xml.setProperty "SelectionNamespaces", "xmlns:ns='" & uri & "'"
'select nodes using the namespace alias
Set nodes = xml.SelectNodes("//ns:getCustomerInformation")

Why do I get errors with XML modified using Nokogiri?

I am having problems understanding Net::HTTP and Nokogiri.
I have a large number of jobs on my Jenkins server. I have to periodically update the branch name on these jobs. Doing it from the UI is a cumbersome process so I decided to update the Jenkins config.xml.
I use Nokogiri to parse the XML, traverse the XPath and update the value of the node. However, when I try to post the updated XML back to Jenkins, I get a 500 error saying:
Caused by: javax.xml.transform.TransformerException: org.xml.sax.SAXParseExceptionpublicId: -//W3C//DTD HTML 4.0 Transitional//EN; systemId: http://www.w3.org/TR/REC-html40/loose.dtd; lineNumber: 31; columnNumber: 3; The declaration for the entity "HTML.Version" must end with '>'.
Here is what I am doing:
require "net/http"
require "nokogiri"
uri = URI.parse("http://jenkins.my.domain.web:8080")
http = Net::HTTP.new(uri.host, uri.port)
getQueueRequest = Net::HTTP::Get.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
getQueue = http.request(getQueueRequest)
xml_doc = Nokogiri::HTML(getQueue.body)
# Get current branch name
branch_name=xml_doc.at_xpath('//hudson.plugins.git.branchspec/name')
# Get new branch name
print "Enter new branch name "
user_input = gets.chomp
new_branch_name = user_input.downcase
# Set branch name and create xml
branch_name.content=new_branch_name
new_config_xml=xml_doc.to_xml
puts "Logging into Jenkins"
update_branch = Net::HTTP::Post.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
update_branch.basic_auth 'username', 'password'
update_branch.body = new_config_xml
response = http.request(update_branch)
puts response.body
I understand it might have something to do with the XML that is being added to the request body, but I am not sure how to fix the issue.
Original XML:
<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin@1.504">
<actions/>
<description></description>
<keepDependencies>false</keepDependencies>
<properties>
<hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
<maxConcurrentPerNode>0</maxConcurrentPerNode>
<maxConcurrentTotal>0</maxConcurrentTotal>
<categories/>
<throttleEnabled>false</throttleEnabled>
<throttleOption>project</throttleOption>
<configVersion>1</configVersion>
</hudson.plugins.throttleconcurrents.ThrottleJobProperty>
</properties>
<scm class="hudson.plugins.git.GitSCM" plugin="git@1.4.0">
<configVersion>2</configVersion>
<userRemoteConfigs>
<hudson.plugins.git.UserRemoteConfig>
<name></name>
<refspec></refspec>
<url>git@github.com:<ORG_NAME>/<REPO_NAME>.git</url>
</hudson.plugins.git.UserRemoteConfig>
</userRemoteConfigs>
<branches>
<hudson.plugins.git.BranchSpec>
<name>release</name>
</hudson.plugins.git.BranchSpec>
</branches>
<disableSubmodules>false</disableSubmodules>
<recursiveSubmodules>false</recursiveSubmodules>
<doGenerateSubmoduleConfigurations>false</doGenerateSubmoduleConfigurations>
<authorOrCommitter>false</authorOrCommitter>
<clean>false</clean>
<wipeOutWorkspace>false</wipeOutWorkspace>
<pruneBranches>false</pruneBranches>
<remotePoll>false</remotePoll>
<ignoreNotifyCommit>false</ignoreNotifyCommit>
<useShallowClone>false</useShallowClone>
<buildChooser class="hudson.plugins.git.util.DefaultBuildChooser"/>
<gitTool>Default</gitTool>
<submoduleCfg class="list"/>
<relativeTargetDir></relativeTargetDir>
<reference></reference>
<excludedRegions></excludedRegions>
<excludedUsers></excludedUsers>
<gitConfigName></gitConfigName>
<gitConfigEmail></gitConfigEmail>
<skipTag>false</skipTag>
<includedRegions></includedRegions>
<scmName></scmName>
</scm>
<canRoam>true</canRoam>
<disabled>false</disabled>
<blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
<blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
<triggers class="vector">
<hudson.triggers.TimerTrigger>
<spec>0 22 * * 4</spec>
</hudson.triggers.TimerTrigger>
</triggers>
<concurrentBuild>false</concurrentBuild>
<rootModule>
<groupId>com.org.project.test</groupId>
<artifactId>functest</artifactId>
</rootModule>
<goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
<mavenName>apache-maven-3.0.4</mavenName>
<aggregatorStyleBuild>true</aggregatorStyleBuild>
<incrementalBuild>false</incrementalBuild>
<perModuleEmail>true</perModuleEmail>
<ignoreUpstremChanges>false</ignoreUpstremChanges>
<archivingDisabled>false</archivingDisabled>
<resolveDependencies>false</resolveDependencies>
<processPlugins>false</processPlugins>
<mavenValidationLevel>-1</mavenValidationLevel>
<runHeadless>false</runHeadless>
<disableTriggerDownstreamProjects>false</disableTriggerDownstreamProjects>
<settings class="jenkins.mvn.DefaultSettingsProvider"/>
<globalSettings class="jenkins.mvn.DefaultGlobalSettingsProvider"/>
<reporters/>
<publishers/>
<buildWrappers/>
<prebuilders/>
<postbuilders/>
<runPostStepsIfResult>
<name>FAILURE</name>
<ordinal>2</ordinal>
<color>RED</color>
</runPostStepsIfResult>
</maven2-moduleset>
After Editing and Massaging:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>
<maven2-moduleset plugin="maven-plugin@1.504">
<actions />
<description />
<keepdependencies>false</keepdependencies>
<properties>
<hudson.plugins.throttleconcurrents.throttlejobproperty plugin="throttle-concurrents@1.7.2">
<maxconcurrentpernode>0</maxconcurrentpernode>
<maxconcurrenttotal>0</maxconcurrenttotal>
<categories />
<throttleenabled>false</throttleenabled>
<throttleoption>project</throttleoption>
<configversion>1</configversion>
</hudson.plugins.throttleconcurrents.throttlejobproperty>
</properties>
<scm class="hudson.plugins.git.GitSCM" plugin="git@1.4.0">
<configversion>2</configversion>
<userremoteconfigs>
<hudson.plugins.git.userremoteconfig>
<name />
<refspec />
<url>git@github.com:<ORG_NAME>/<REPO_NAME>.git</url>
</hudson.plugins.git.userremoteconfig>
</userremoteconfigs>
<branches>
<hudson.plugins.git.branchspec>
<name>master</name>
</hudson.plugins.git.branchspec>
</branches>
<disablesubmodules>false</disablesubmodules>
<recursivesubmodules>false</recursivesubmodules>
<dogeneratesubmoduleconfigurations>false</dogeneratesubmoduleconfigurations>
<authororcommitter>false</authororcommitter>
<clean>false</clean>
<wipeoutworkspace>false</wipeoutworkspace>
<prunebranches>false</prunebranches>
<remotepoll>false</remotepoll>
<ignorenotifycommit>false</ignorenotifycommit>
<useshallowclone>false</useshallowclone>
<buildchooser class="hudson.plugins.git.util.DefaultBuildChooser" />
<gittool>Default</gittool>
<submodulecfg class="list" />
<relativetargetdir />
<reference />
<excludedregions />
<excludedusers />
<gitconfigname />
<gitconfigemail />
<skiptag>false</skiptag>
<includedregions />
<scmname />
</scm>
<canroam>true</canroam>
<disabled>false</disabled>
<blockbuildwhendownstreambuilding>false</blockbuildwhendownstreambuilding>
<blockbuildwhenupstreambuilding>false</blockbuildwhenupstreambuilding>
<triggers class="vector">
<hudson.triggers.timertrigger>
<spec>0 22 * * 4</spec>
</hudson.triggers.timertrigger>
</triggers>
<concurrentbuild>false</concurrentbuild>
<rootmodule>
<groupid>com.org.project.test</groupid>
<artifactid>functest</artifactid>
</rootmodule>
<goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
<mavenname>apache-maven-3.0.4</mavenname>
<aggregatorstylebuild>true</aggregatorstylebuild>
<incrementalbuild>false</incrementalbuild>
<permoduleemail>true</permoduleemail>
<ignoreupstremchanges>false</ignoreupstremchanges>
<archivingdisabled>false</archivingdisabled>
<resolvedependencies>false</resolvedependencies>
<processplugins>false</processplugins>
<mavenvalidationlevel>-1</mavenvalidationlevel>
<runheadless>false</runheadless>
<disabletriggerdownstreamprojects>false</disabletriggerdownstreamprojects>
<settings class="jenkins.mvn.DefaultSettingsProvider" />
<globalsettings class="jenkins.mvn.DefaultGlobalSettingsProvider" />
<reporters />
<publishers />
<buildwrappers />
<prebuilders />
<postbuilders />
<runpoststepsifresult>
<name>FAILURE</name>
<ordinal>2</ordinal>
<color>RED</color>
</runpoststepsifresult>
</maven2-moduleset>
</body>
</html>
When you use Nokogiri::HTML(some_html) or Nokogiri::XML(some_xml), Nokogiri will look to see if the content is valid. If it isn't, it will do fix-ups on the content in an attempt to make it so. For instance:
require 'nokogiri'
html_fragment = "<p>foo bar</p>"
Nokogiri::HTML(html_fragment).to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"
If the document is partially correct, Nokogiri still adds the DOCTYPE statement:
html = "<html><body><p>foo bar</p></body></html>"
Nokogiri::HTML(html).to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"
If you want Nokogiri to leave the document alone, because it's supposed to be a fragment, tell it so:
Nokogiri::HTML::DocumentFragment.parse(html_fragment).to_html
# => "<p>foo bar</p>"
Or:
xml_fragment = "<x>foo bar</x>"
Nokogiri::XML::DocumentFragment.parse(xml_fragment).to_xml
# => "<x>foo bar</x>"
Nokogiri is pretty smart about handling XML and HTML. You can try to confuse it and it'll generally do the right thing:
xml_fragment = "<x>foo bar</x>"
Nokogiri::HTML::DocumentFragment.parse(xml_fragment).to_xml
# => "<x>foo bar</x>"
That's parsing XML as an HTML fragment and telling it to emit it as XML.
Now, with all that said, it's pretty obvious Nokogiri isn't doing anything mysterious, so here's how to fix the problem. First, parse it as XML so Nokogiri doesn't think it should add the HTML DOCTYPE declaration; then, if the XML is syntactically correct, tell Nokogiri it's OK to parse it as a complete document:
require 'nokogiri'
xml = %{<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin@1.504">
<actions/>
<description></description>
<keepDependencies>false</keepDependencies>
<properties>
<hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
</hudson.plugins.throttleconcurrents.ThrottleJobProperty>
</properties>
</maven2-moduleset>
}
puts Nokogiri::XML.parse(xml).to_xml
# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <maven2-moduleset plugin="maven-plugin@1.504">
# >> <actions/>
# >> <description/>
# >> <keepDependencies>false</keepDependencies>
# >> <properties>
# >> <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
# >> </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >> </properties>
# >> </maven2-moduleset>
Or as a fragment, which, because it's complete, will result in the same thing:
puts Nokogiri::XML::DocumentFragment.parse(xml).to_xml
# >> <?xml version='1.0' encoding='UTF-8'?>
# >> <maven2-moduleset plugin="maven-plugin@1.504">
# >> <actions/>
# >> <description/>
# >> <keepDependencies>false</keepDependencies>
# >> <properties>
# >> <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
# >> </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >> </properties>
# >> </maven2-moduleset>
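For comparison, the same "parse as XML, then serialize" round trip can be done with Java's built-in XML APIs. This is a sketch assuming the JDK's default DOM parser and identity Transformer; ConfigRoundTrip and its shortened config are illustrative names and data, not part of Jenkins. Because the input is parsed as XML, no HTML DOCTYPE or <html><body> wrapper is added, and tag case such as keepDependencies is preserved.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ConfigRoundTrip {

    // Shortened version of the Jenkins config.xml from the question.
    static final String CONFIG =
        "<?xml version='1.0' encoding='UTF-8'?>"
        + "<maven2-moduleset plugin='maven-plugin@1.504'>"
        + "<actions/><description></description>"
        + "<keepDependencies>false</keepDependencies>"
        + "</maven2-moduleset>";

    /** Parses the input as XML and serializes it back with an identity transform. */
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints an XML declaration followed by maven2-moduleset with its original tag case.
        System.out.println(roundTrip(CONFIG));
    }
}
```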
Instead of using Net::HTTP, which provides the bare building blocks for HTTP, I'd recommend looking at something a bit higher-level, like HTTPClient. Here's code that is similar to yours:
require 'httpclient'
require 'nokogiri'
URL = 'http://jenkins.my.domain.web:8080/my/job/location/config.xml'
http_client = HTTPClient.new
xml_doc = Nokogiri::XML(
http_client.get_content(URL)
)
# Get current branch name. XML is case-sensitive, and the dots in the tag name
# would be read as CSS class selectors, so use XPath here:
branch_name = xml_doc.at_xpath('//hudson.plugins.git.BranchSpec/name')
# Get new branch name
print 'Enter new branch name '
new_branch_name = gets.chomp.downcase
# Set branch name and create xml
branch_name.content = new_branch_name
puts 'Logging into Jenkins'
http_client.set_auth(URL, 'user', 'password')
response = http_client.post(URL, :body => xml_doc.to_xml)
I can't test it but it looks close.
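In Java, the rough counterpart to a higher-level client like HTTPClient is java.net.http (Java 11 and later). The sketch below only builds the POST request, so it runs without a Jenkins server. The class name, the preemptive Basic Authorization header, and the application/xml content type are assumptions for the example rather than requirements taken from the question; the URL is the question's placeholder host.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class JenkinsConfigPost {

    // Placeholder job URL taken from the question.
    static final String URL = "http://jenkins.my.domain.web:8080/my/job/location/config.xml";

    /** Builds (but does not send) a POST that uploads configXml with Basic auth. */
    static HttpRequest buildPost(String configXml, String user, String password) {
        String credentials = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(URL))
            .header("Authorization", "Basic " + credentials) // sent up front, no 401 round trip
            .header("Content-Type", "application/xml")
            .POST(HttpRequest.BodyPublishers.ofString(configXml, StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildPost("<maven2-moduleset/>", "username", "password");
        System.out.println(request.method() + " " + request.uri());
        // Sending it needs a reachable Jenkins, e.g.:
        // java.net.http.HttpClient.newHttpClient()
        //     .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```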
I now find myself in another dilemma. I am seeing that the methods that allow moving to elements and editing values, like at_xpath and at_css, only work with Nokogiri::HTML or Nokogiri::HTML::DocumentFragment. They don't work when I use Nokogiri::XML. Using Nokogiri::HTML changes the case of the tags: <keepDependencies>false</keepDependencies> becomes <keepdependencies>false</keepdependencies>. Jenkins does accept the XML with changed case of tags. The methods to_html and to_xml basically return a string, so I cannot use the xpath or css methods to navigate the XML tree. Is there a way around this?
The at methods work with both XML and HTML, and allow CSS and XPath selectors; everything inside Nokogiri is really XML-based.
Nokogiri folds HTML tags to lower-case because HTML is case-insensitive, so at expects a lower-case value when dealing with HTML. XML is case-sensitive, so Nokogiri leaves the tag case alone, and at requires you to use the correct case when using CSS.
This is documented in the Nokogiri docs:
Note that the CSS query string is case-sensitive with regards to your document type. That is, if you’re looking for “H1” in an HTML document, you’ll never find anything, since HTML tags will match only lowercase CSS queries. However, “H1” might be found in an XML document, where tags names are case-sensitive (e.g., “H1” is distinct from “h1”).
When you are parsing the XML you are receiving from the service, you are declaring it as HTML:
xml_doc = Nokogiri::HTML(getQueue.body)
And this appears to cause Nokogiri to add HTML nodes.
Try parsing it as XML instead:
xml_doc = Nokogiri::XML(getQueue.body)
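Once the document is parsed as XML, element names keep their original case, so the lower-case path from the question (//hudson.plugins.git.branchspec/name) no longer matches; the XPath has to say BranchSpec. Here is a Java sketch of the same point using the JDK's XPath API; the class BranchSpecUpdate and its shortened config are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class BranchSpecUpdate {

    // Shortened Jenkins config.xml containing only the branch specification.
    static final String CONFIG =
        "<maven2-moduleset><scm class='hudson.plugins.git.GitSCM'><branches>"
        + "<hudson.plugins.git.BranchSpec><name>release</name></hudson.plugins.git.BranchSpec>"
        + "</branches></scm></maven2-moduleset>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** First node matching the expression, or null if nothing matches. */
    static Node find(Document doc, String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(CONFIG);
        // XML names are case-sensitive: the lower-cased path finds nothing.
        System.out.println(find(doc, "//hudson.plugins.git.branchspec/name")); // null
        Node name = find(doc, "//hudson.plugins.git.BranchSpec/name");
        name.setTextContent("master"); // the same edit as branch_name.content = ...
        System.out.println(name.getTextContent()); // master
    }
}
```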

Unable to extract XML element's value using Nokogiri

I'm trying to parse the following XML to extract the lat/long pair under //ns2:Point/ns2:pos using the Nokogiri XML parser, but without much luck.
<?xml version="1.0" encoding="UTF-8"?>
<ns1:XLS ns1:lang="en" rel="5.2.sp03" version="1.0" xmlns:ns1="http://www.opengis.net/xls">
<ns1:ResponseHeader sessionID="wrx-rails1370997540"/>
<ns1:Response numberOfResponses="1" requestID="10" version="1.0">
<ns1:GeocodeResponse>
<ns1:GeocodeResponseList numberOfGeocodedAddresses="1">
<ns1:GeocodedAddress>
<ns2:Point xmlns:ns2="http://www.opengis.net/gml">
<ns2:pos>38.898331 -77.117273</ns2:pos>
</ns2:Point>
<ns1:Address countryCode="US">
<ns1:StreetAddress>
<ns1:Building number="4400"/>
<ns1:Street>Lee Hwy</ns1:Street>
</ns1:StreetAddress>
<ns1:Place type="CountrySubdivision">VA</ns1:Place>
<ns1:Place type="CountrySecondarySubdivision">Arlington</ns1:Place>
<ns1:Place type="MunicipalitySubdivision">Arlington</ns1:Place>
<ns1:PostalCode>22207</ns1:PostalCode>
</ns1:Address>
<ns1:GeocodeMatchCode accuracy="1.0" matchType="ADDRESS POINT LOOKUP"/>
<ns1:SpatialKeys>
<ns1:SpatialKey priority="0" val="1663355010"/>
<ns1:SpatialKey priority="1" val="2563322400"/>
<ns1:SpatialKey priority="2" val="3325185160"/>
<ns1:SpatialKey priority="3" val="3784086306"/>
<ns1:SpatialKey priority="4" val="4033029320"/>
<ns1:SpatialKey priority="5" val="4162373938"/>
<ns1:SpatialKey priority="6" val="4228264524"/>
<ns1:SpatialKey priority="7" val="4261514387"/>
<ns1:SpatialKey priority="8" val="4278215460"/>
<ns1:SpatialKey priority="9" val="4286585033"/>
<ns1:SpatialKey priority="10" val="4290774578"/>
<ns1:SpatialKey priority="11" val="4292870540"/>
<ns1:SpatialKey priority="12" val="4293918819"/>
<ns1:SpatialKey priority="13" val="4294443032"/>
<ns1:SpatialKey priority="14" val="4294705158"/>
<ns1:SpatialKey priority="15" val="4294836224"/>
</ns1:SpatialKeys>
</ns1:GeocodedAddress>
</ns1:GeocodeResponseList>
</ns1:GeocodeResponse>
</ns1:Response>
</ns1:XLS>
I get back an empty array when I try the following:
doc = Nokogiri::XML(response.body);
pos = doc.xpath('//ns2:Point/ns2:pos');
I can access Geocoded address element however just fine using:
doc.xpath('//ns1:GeocodeResponseList/ns1:GeocodedAddress')
Any clues as to what I'm missing here? Is it the changing namespace that it doesn't like for some reason?
My Environment is as follows:
Nokogiri 1.5.9 Java
Rails 3.2.11
jRuby 1.7.4
Windows 7 Box
Your ns1 query works because ns1 is declared on the root element, which is where Nokogiri expects to find namespaces. The ns2 namespace is declared on a nested element (ns2:Point), so Nokogiri doesn't know about it when you search.
There are multiple ways to deal with this. The first is to gather the namespaces in the document and pass them to Nokogiri when you do your search. Nokogiri does this automatically for namespaces in the XML root, but not if they're sprinkled throughout the document, so we have to tell it to search everywhere, then pass them in:
namespaces = doc.collect_namespaces
namespaces # => {"xmlns:ns1"=>"http://www.opengis.net/xls", "xmlns:ns2"=>"http://www.opengis.net/gml"}
pos = doc.xpath('//ns2:Point/ns2:pos', namespaces);
pos # => [#<Nokogiri::XML::Element:0x3fe8c608ab30 name="pos" namespace=#<Nokogiri::XML::Namespace:0x3fe8c608aacc prefix="ns2" href="http://www.opengis.net/gml"> children=[#<Nokogiri::XML::Text:0x3fe8c608e1b8 "38.898331 -77.117273">]>]
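Java's XPath API has the same requirement: every prefix used in the expression must be bound explicitly, and the match is on the namespace URI, not on whichever prefix the document happens to use. A minimal sketch with javax.xml.xpath, assuming a trimmed copy of the response; GmlPosQuery and its bind helper are hypothetical names. Unlike collect_namespaces, which returns keys such as "xmlns:ns2", the NamespaceContext here maps bare prefixes to URIs.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class GmlPosQuery {

    static final String GML = "http://www.opengis.net/gml";

    // Trimmed copy of the geocoding response: ns2 is declared on a nested element.
    static final String RESPONSE =
        "<ns1:XLS xmlns:ns1='http://www.opengis.net/xls'><ns1:GeocodedAddress>"
        + "<ns2:Point xmlns:ns2='http://www.opengis.net/gml'>"
        + "<ns2:pos>38.898331 -77.117273</ns2:pos>"
        + "</ns2:Point></ns1:GeocodedAddress></ns1:XLS>";

    /** Binds a single prefix to a namespace URI for use in XPath expressions. */
    static NamespaceContext bind(String prefix, String uri) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String p) {
                return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String u) {
                return uri.equals(u) ? prefix : null;
            }

            @Override
            public Iterator<String> getPrefixes(String u) {
                return uri.equals(u) ? Collections.singletonList(prefix).iterator()
                                     : Collections.<String>emptyIterator();
            }
        };
    }

    /** Text of the GML pos element, queried through whichever prefix is given. */
    static String pos(String prefix) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(RESPONSE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(bind(prefix, GML));
            return xpath.evaluate("//" + prefix + ":Point/" + prefix + ":pos", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pos("ns2")); // 38.898331 -77.117273
        System.out.println(pos("gml")); // same result: only the URI has to match
    }
}
```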
An alternative is to tell Nokogiri to remove all namespaces from the document. You only want to do that if you're sure there are no collisions between tag names found in the various namespaces in the document:
doc.remove_namespaces!
pos = doc.xpath('//Point/pos');
pos # => [#<Nokogiri::XML::Element:0x3fe8c608ab30 name="pos" children=[#<Nokogiri::XML::Text:0x3fe8c608e1b8 "38.898331 -77.117273">]>]
The Nokogiri documentation has this to say about the use of remove_namespaces!:
But I’m Lazy and Don’t Want to Deal With Namespaces!
Lazy == Efficient, so no judgements. :)
If you have an XML document with namespaces, but would prefer to ignore them entirely (and query as if Tim Bray had never invented them), then you can call remove_namespaces! on an XML::Document to remove all namespaces. Of course, if the document had nodes with the same names but different namespaces, they will now be ambiguous. But you’re lazy! You don’t care!

Reading xml nodes using VB script

I have an XML file that I want to read using VBScript (a technology limitation). Below are the code and the XML file. I am able to read the file if no DTD is involved, but the code doesn't work for a file that has a DOCTYPE declaration and an xml-stylesheet processing instruction.
Code-
Dim xmlDoc1:Set xmlDoc1 = CreateObject("MSXML2.DomDocument")
xmlDoc1.async=False
xmlDoc1.load "C:\ABC.xml"
Dim xmlTCID:Set xmlTCID = xmlDoc1.selectNodes("//*")
For nNodeCount = 0 To xmlTCID.length - 1 ' node list indexes run from 0 to length - 1
MsgBox(xmlTCID(nNodeCount).nodeName)
Next
ABC.xml -
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE RESULT SYSTEM "Result.dtd"[]>
<?xml-stylesheet type="text/xsl" href="Result.xsl"?>
<SUMMARY>
<TITLE>Test</TITLE>
</SUMMARY>
<IDS>
<DATA>
<NAME>A</NAME>
<VALUE>PASS</VALUE>
</DATA>
<DATA>
<NAME>B</NAME>
<VALUE>PASS</VALUE>
</DATA
<DATA>
<NAME>C</NAME>
<VALUE>FAIL</VALUE>
</DATA
</IDS>
<IDS>
<DATA>
<NAME>A</NAME>
<VALUE>PASS</VALUE>
</DATA>
<DATA>
<NAME>B</NAME>
<VALUE>FAIL</VALUE>
</DATA
</IDS>
Note - If I avoid -
<!DOCTYPE RESULT SYSTEM "Result.dtd"[]>
<?xml-stylesheet type="text/xsl" href="Result.xsl"?>
the above code is able to read the nodes, but with those two lines in the XML file it gives an error.
Requirement - I need to read the name of the last DATA node with FAIL for each IDS node.
Any suggestions on how to get the code working even with -
<!DOCTYPE RESULT SYSTEM "Result.dtd"[]>
<?xml-stylesheet type="text/xsl" href="Result.xsl"?>
As there are problems with your XML (more than one top-level element, missing ">"), setting the ProhibitDTD property to False won't solve all of your problems.
xmlDoc.validateOnParse=False
worked for me.
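For the stated requirement (the name of the last DATA with FAIL in each IDS), the XPath DATA[VALUE='FAIL'][last()]/NAME, evaluated relative to each IDS node, should work in any XPath 1.0 engine: [last()] is applied to the DATA elements that already passed the FAIL filter. The sketch below shows it in Java rather than VBScript, under two stated assumptions: the XML has been made well-formed by wrapping everything in a RESULT root (the name the DOCTYPE declares) and closing the truncated </DATA tags, and Result.dtd is not needed, so an EntityResolver hands the parser an empty external subset instead of fetching the file. LastFailPerIds is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LastFailPerIds {

    // The question's ABC.xml, assumed fixed up: one RESULT root and closed </DATA> tags.
    static final String XML =
        "<?xml version='1.0' encoding='UTF-8'?>"
        + "<!DOCTYPE RESULT SYSTEM 'Result.dtd'[]>"
        + "<?xml-stylesheet type='text/xsl' href='Result.xsl'?>"
        + "<RESULT>"
        + "<SUMMARY><TITLE>Test</TITLE></SUMMARY>"
        + "<IDS>"
        + "<DATA><NAME>A</NAME><VALUE>PASS</VALUE></DATA>"
        + "<DATA><NAME>B</NAME><VALUE>PASS</VALUE></DATA>"
        + "<DATA><NAME>C</NAME><VALUE>FAIL</VALUE></DATA>"
        + "</IDS>"
        + "<IDS>"
        + "<DATA><NAME>A</NAME><VALUE>PASS</VALUE></DATA>"
        + "<DATA><NAME>B</NAME><VALUE>FAIL</VALUE></DATA>"
        + "</IDS>"
        + "</RESULT>";

    /** For each IDS element, the NAME of its last DATA whose VALUE is FAIL ("" if none). */
    static List<String> lastFailNames(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Skip Result.dtd: resolve every external entity to empty content.
            db.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList idsNodes = (NodeList) xpath.evaluate("/RESULT/IDS", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < idsNodes.getLength(); i++) {
                // Evaluated relative to this IDS element.
                names.add(xpath.evaluate("DATA[VALUE='FAIL'][last()]/NAME", idsNodes.item(i)));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lastFailNames(XML)); // [C, B]
    }
}
```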

Xmllint using XSD+XMLCatalog cannot find uri

I am trying to use xmllint with an XSD file that has one xs:import without the @schemaLocation attribute set. (In Oxygen XML Editor this setup works fine.)
XSD relevant section:
...
<!-- this one is the trouble -->
<xs:import namespace="http://www.w3.org/1999/xhtml" />
<!-- this one is resolved fine -->
<xs:import namespace="http://www.idpf.org/2007/ops" schemaLocation="conf/b.xsd"/>
...
Given this XML Catalog file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.1//EN" "http://www.oasis-open.org/committees/entity/release/1.1/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<uri name="http://www.w3.org/1999/xhtml" uri="file:///home/me/code/base5html.xsd"/>
</catalog>
Ok, then I run it:
$ export XML_CATALOG_FILES=~/code/mycatalog.xml
$ export XML_DEBUG_CATALOG=1
$ xmllint --load-trace --noout --schema myschema.xsd test_doc.xml
Result (approx):
Loaded URL="myschema.xsd" ID="(null)"
Loaded URL="conf/b.xsd" ID="(null)"
Loaded URL="c.xsd" ID="(null)"
c.xsd:17: element element: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': The QName value '{http://www.w3.org/1999/xhtml}p' does not resolve to a(n) element declaration.
c.xsd:18: element element: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': The QName value '{http://www.w3.org/1999/xhtml}cite' does not resolve to a(n) element declaration.
c.xsd:26: element attributeGroup: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/1999/xhtml}normAttrGrp' does not resolve to a(n) attribute group definition.
...
(more taken out)
...
WXS schema public/ctrl/lov.xsd failed to compile
Loaded URL="myschema.xml" ID="(null)"
Q: Isn't XSD+XMLCat a supported feature in libxml2?
(No hints to be found anywhere on the homepage: http://xmlsoft.org/catalog.html)
