Unable to extract XML element's value using Nokogiri - ruby

I'm trying to parse the following XML with Nokogiri to extract the latitude/longitude pair under //ns2:Point/ns2:pos, but without much luck.
<?xml version="1.0" encoding="UTF-8"?>
<ns1:XLS ns1:lang="en" rel="5.2.sp03" version="1.0" xmlns:ns1="http://www.opengis.net/xls">
<ns1:ResponseHeader sessionID="wrx-rails1370997540"/>
<ns1:Response numberOfResponses="1" requestID="10" version="1.0">
<ns1:GeocodeResponse>
<ns1:GeocodeResponseList numberOfGeocodedAddresses="1">
<ns1:GeocodedAddress>
<ns2:Point xmlns:ns2="http://www.opengis.net/gml">
<ns2:pos>38.898331 -77.117273</ns2:pos>
</ns2:Point>
<ns1:Address countryCode="US">
<ns1:StreetAddress>
<ns1:Building number="4400"/>
<ns1:Street>Lee Hwy</ns1:Street>
</ns1:StreetAddress>
<ns1:Place type="CountrySubdivision">VA</ns1:Place>
<ns1:Place type="CountrySecondarySubdivision">Arlington</ns1:Place>
<ns1:Place type="MunicipalitySubdivision">Arlington</ns1:Place>
<ns1:PostalCode>22207</ns1:PostalCode>
</ns1:Address>
<ns1:GeocodeMatchCode accuracy="1.0" matchType="ADDRESS POINT LOOKUP"/>
<ns1:SpatialKeys>
<ns1:SpatialKey priority="0" val="1663355010"/>
<ns1:SpatialKey priority="1" val="2563322400"/>
<ns1:SpatialKey priority="2" val="3325185160"/>
<ns1:SpatialKey priority="3" val="3784086306"/>
<ns1:SpatialKey priority="4" val="4033029320"/>
<ns1:SpatialKey priority="5" val="4162373938"/>
<ns1:SpatialKey priority="6" val="4228264524"/>
<ns1:SpatialKey priority="7" val="4261514387"/>
<ns1:SpatialKey priority="8" val="4278215460"/>
<ns1:SpatialKey priority="9" val="4286585033"/>
<ns1:SpatialKey priority="10" val="4290774578"/>
<ns1:SpatialKey priority="11" val="4292870540"/>
<ns1:SpatialKey priority="12" val="4293918819"/>
<ns1:SpatialKey priority="13" val="4294443032"/>
<ns1:SpatialKey priority="14" val="4294705158"/>
<ns1:SpatialKey priority="15" val="4294836224"/>
</ns1:SpatialKeys>
</ns1:GeocodedAddress>
</ns1:GeocodeResponseList>
</ns1:GeocodeResponse>
</ns1:Response>
</ns1:XLS>
I get back an empty array when I try the following:
doc = Nokogiri::XML(response.body);
pos = doc.xpath('//ns2:Point/ns2:pos');
However, I can access the GeocodedAddress element just fine using:
doc.xpath('//ns1:GeocodeResponseList/ns1:GeocodedAddress')
Any clues as to what I'm missing here? Is it the change of namespace that it doesn't like for some reason?
My Environment is as follows:
Nokogiri 1.5.9 Java
Rails 3.2.11
jRuby 1.7.4
Windows 7 Box

The first expression works because ns1 is declared on the root element, and Nokogiri automatically registers the namespaces it finds there. The ns2 namespace is declared on a nested element instead, so Nokogiri doesn't register it and the ns2 prefix in your query isn't bound to anything.
There are multiple ways to deal with this. The first is to collect the namespaces declared anywhere in the document and pass them to Nokogiri when you search. Nokogiri does this automatically for namespaces declared on the root element, but not for those declared deeper in the document, so we have to gather them ourselves and pass them in:
namespaces = doc.collect_namespaces
namespaces # => {"xmlns:ns1"=>"http://www.opengis.net/xls", "xmlns:ns2"=>"http://www.opengis.net/gml"}
pos = doc.xpath('//ns2:Point/ns2:pos', namespaces);
pos # => [#<Nokogiri::XML::Element:0x3fe8c608ab30 name="pos" namespace=#<Nokogiri::XML::Namespace:0x3fe8c608aacc prefix="ns2" href="http://www.opengis.net/gml"> children=[#<Nokogiri::XML::Text:0x3fe8c608e1b8 "38.898331 -77.117273">]>]
An alternative is to tell Nokogiri to remove all namespaces from the document. You only want to do that if you're sure there are no collisions between tag names found in the various namespaces in the document:
doc.remove_namespaces!
pos = doc.xpath('//Point/pos')
pos # => [#<Nokogiri::XML::Element:0x3fe8c608ab30 name="pos" children=[#<Nokogiri::XML::Text:0x3fe8c608e1b8 "38.898331 -77.117273">]>]
The Nokogiri documentation has this to say about the use of remove_namespaces!:
But I’m Lazy and Don’t Want to Deal With Namespaces!
Lazy == Efficient, so no judgements. :)
If you have an XML document with namespaces, but would prefer to ignore them entirely (and query as if Tim Bray had never invented them), then you can call remove_namespaces! on an XML::Document to remove all namespaces. Of course, if the document had nodes with the same names but different namespaces, they will now be ambiguous. But you’re lazy! You don’t care!

Related

How to loop xml nodes using ruby

I have an XML file of 50MB. I need to parse it and convert to CSV.
Following is the XML File
<?xml version="1.0" encoding="ISO-8859-1"?>
<IAPDFirmStateReport GenOn="2019-01-02">
<Firms>
<Firm>
<Info FirmCrdNb="146099" BusNm="PRINCIPA FINANCIAL ADVISORS" LegalNm="CHUNG, BUCK CHWEE"/>
<MainAddr Strt1="15111 WHITTIER BLVD" Strt2="STE 420" City="WHITTIER" State="CA" Cntry="United States" PostlCd="90603" PhNb="562-945-7888" FaxNb="562-968-1885"/>
<MailingAddr/>
<StateRgstn>
<Rgltrs>
<Rgltr Cd="CA" St="APPROVED" Dt="2008-03-13"/>
</Rgltrs>
</StateRgstn>
<ERA>
<Rgltrs/>
</ERA>
</Firm>
<Firm>
<Info FirmCrdNb="170562" SECNb="802-112318" BusNm="ALUMNI VENTURES GROUP" LegalNm="LAUNCH ANGELS MANAGEMENT COMPANY, LLC"/>
<MainAddr Strt1="788 ELM ST" City="MANCHESTER" State="NH" Cntry="United States" PostlCd="03101" PhNb="603-518-8112"/>
<MailingAddr Strt1="889 ELM ST" Strt2="3RD FLOOR" City="MANCHESTER" State="NH" Cntry="United States" PostlCd="03101"/>
<StateRgstn>
<Rgltrs/>
</StateRgstn>
<ERA>
<Rgltrs>
<Rgltr Cd="MA" St="ACTIVE" Dt="2014-02-24"/>
<Rgltr Cd="NH" St="ACTIVE" Dt="2018-07-23"/>
</Rgltrs>
</ERA>
</Firm>
</Firms>
------ (about 90 Firm tags in total)
So, I need to parse it dynamically using Ruby and convert it into CSV. How can I do this?
Look at this question - Parsing XML with Ruby
You may also use the Ox gem, which is not mentioned in that question; take a look here:
https://github.com/ohler55/ox#parsing-xml-into-a-hash-fast
Once you have your XML converted to a Hash, converting it to CSV should be straightforward.
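As a rough illustration of the overall XML-to-CSV flow, here is a minimal sketch that uses REXML and CSV (both ship with a standard Ruby install) rather than Ox. The choice of columns and the trimmed sample are my assumptions; adjust them to the attributes you actually need:

```ruby
require 'rexml/document'
require 'csv'

# Trimmed sample based on the XML in the question
xml = <<~XML
  <IAPDFirmStateReport GenOn="2019-01-02">
    <Firms>
      <Firm>
        <Info FirmCrdNb="146099" BusNm="PRINCIPA FINANCIAL ADVISORS"/>
        <MainAddr City="WHITTIER" State="CA"/>
      </Firm>
      <Firm>
        <Info FirmCrdNb="170562" BusNm="ALUMNI VENTURES GROUP"/>
        <MainAddr City="MANCHESTER" State="NH"/>
      </Firm>
    </Firms>
  </IAPDFirmStateReport>
XML

doc = REXML::Document.new(xml)

csv_text = CSV.generate do |csv|
  # Header row; the column selection is an assumption for this sketch
  csv << %w[FirmCrdNb BusNm City State]

  # One CSV row per <Firm> element
  doc.elements.each('IAPDFirmStateReport/Firms/Firm') do |firm|
    info = firm.elements['Info']
    addr = firm.elements['MainAddr']
    csv << [
      info.attributes['FirmCrdNb'],
      info.attributes['BusNm'],
      addr.attributes['City'],
      addr.attributes['State']
    ]
  end
end

puts csv_text
```

This loads the whole document into memory for simplicity; for a 50MB file a streaming parser may be a better fit.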

Read value from XML within another XML: Mule

I am making a SOAP web service call and get the response below. I want to read a value from the inner XML: 1234684, in the oa:ID element.
I was able to get the inner XML using #[xpath3('//*:processaResponse /return[2]')], store it in a flow variable, and then apply #[xpath3('/AckReg/DataArea/PRegistration/PRDet/Person/IDSet/*:ID[@schemeName="aid"]/text()')] to it.
This works when I try an online parser, but it doesn't read the value in Mule.
Is there any way to extract 1234684 in oa:ID tag using one XPath?
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Header>
<ns3:TXID xmlns:ns3="http://a.d.r.test.com/"></ns3:TXID>
<ns3:SESSIONID xmlns:ns3="http://a.d.r.test.com/"></ns3:SESSIONID>
</soapenv:Header>
<soapenv:Body>
<ns3:processaResponse xmlns:ns3="http://a.d.r.test.com/" xmlns:ns2="http://p.r.test.com/">
<return>Hi</return>
<return>
<?xml version="1.0" encoding="UTF-8"?>
<AckReg
xmlns="http://www.test.com/e/1" languageCode="en-US" releaseID="normalizedString" systemEnvironmentCode="test" versionID="normalizedString"
xmlns:oa="www.test.com/r/9"
xsi:schemaLocation="http://www.test.com/a/1 ../test/test.xsd">
<Apa>
<oa:CreationDateTime>2018-04-05</oa:CreationDateTime>
</Apa>
<DataArea>
<Ack>
<OArea>
<o:Sender>
<o:LID schemeAgencyName="testi" schemeName="Application ID">test</o:LID>
</o:Sender>
</OArea>
<OriginalActionVerb/>
</Ack>
<PRegistration>
<testids>
<IDSet schemeAgencyName="try">
<oa:ID schemeName="abcid">1234684</oa:ID>
</IDSet>
</testids>
<PRDet>
<Person>
<IDSet schemeAgencyName="try">
<oa:ID schemeName="aid">1364561</oa:ID>
</IDSet>
<IDSet schemeAgencyName="enada">
<oa:ID schemeName="Employee ID">adsad</oa:ID>
</IDSet>
</Person>
<User>
<oa:ID/>
</User>
</PRDet>
</PRegistration>
</DataArea>
</AckReg>
</return>
</ns3:processaResponse>
</soapenv:Body>
</soapenv:Envelope>
In your expressions you were missing namespace prefixes or namespace wildcards *: on some nodes - so your expressions failed.
Is there any way to extract 1234684 in oa:ID tag using one XPath?
Combining both of your partial expressions is possible with namespace wildcards:
//*:processaResponse/return[2]/*:AckReg/*:DataArea/*:PRegistration/*:testids/*:IDSet/*:ID[@schemeName='abcid']/text()
Or you can use an absolute path with namespace wildcards:
/*:Envelope/*:Body/*:processaResponse/return[2]/*:AckReg/*:DataArea/*:PRegistration/*:testids/*:IDSet/*:ID[@schemeName='abcid']/text()
Output in both cases:
1234684
You can also use the XmlSlurper class in a Groovy script to fetch that value.
root = new XmlSlurper( false, true).parseText(payload).declareNamespace('soapenv':"http://schemas.xmlsoap.org/soap/envelope/")

Parsing out contents of XML tag in Ruby

I have an XML document that, as I understand it, has already been parsed into tags. My goal is to extract all the information in the <GetResidentsContactInfoResult> tag. In the sample XML below, this tag contains two records, each beginning with the Lease PropertyId key. How can I iterate over the <GetResidentsContactInfoResult> tag and print the key/value pairs for each record? I'm new to Ruby and to working with XML files; is this something I can do with Nokogiri?
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Body>
<GetResidentsContactInfoResponse xmlns="http://tempuri.org/">
<GetResidentsContactInfoResult><PropertyResidents><Lease PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-01-07T00:00:00" MoveOutDate="2016-02-06T00:00:00" LeaseBeginDate="2016-01-07T00:00:00" LeaseEndDate="2017-01-31T00:00:00" MktgSource="DBY" PrimaryEmail="noemail1@fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" OccuSeqNo="3444755" OccuFirstName="Efren" OccuLastName="Cerda" Phone2No="(832) 693-9448" ResponsibleFlag="Responsible" /></Lease><Lease PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-02-20T00:00:00" MoveOutDate="2016-04-25T00:00:00" LeaseBeginDate="2016-02-20T00:00:00" LeaseEndDate="2017-02-28T00:00:00" MktgSource="PW" PrimaryEmail="noemail1@fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" OccuSeqNo="3451301" OccuFirstName="Donna" OccuLastName="Mclean" Phone2No="(713) 785-4240" ResponsibleFlag="Responsible" /></Lease></PropertyResidents></GetResidentsContactInfoResult>
</GetResidentsContactInfoResponse>
</soap:Body>
</soap:Envelope>
This uses Nokogiri to find all the GetResidentsContactInfoResponse elements, and then Active Support to convert the inner text to a hash of key-value pairs.
Read "sparklemotion/nokogiri" and "Tutorials" regarding installing and using Nokogiri.
Read "Active Support Core Extensions" about more capabilities of Active Support (though the guide does not include Hash.from_xml). To install it, run gem install activesupport.
I assume you're fine with Nokogiri as you mentioned it in your question.
If you don't want to use Active Support, consider looking into "Convert a Nokogiri document to a Ruby Hash" as an alternative to the line Hash.from_xml(elm.text):
require 'nokogiri'
# Needed in order to use `Hash.from_xml`
require 'active_support/core_ext/hash/conversions'
def find_key_values(str)
  doc = Nokogiri::XML(str)
  # Ignore namespaces for easier traversal
  doc.remove_namespaces!
  # The element's text is itself XML, so convert it into a Hash
  doc.css('GetResidentsContactInfoResponse').map do |elm|
    Hash.from_xml(elm.text)
  end
end
Usage:
# Option 1: if your XML above is stored in a variable called `string`
find_key_values string
# Option 2: if your XML above is stored in a file
find_key_values File.open('/path/to/file')
Which returns:
[{"PropertyResidents"=>
   {"Lease"=>
     [{"PropertyId"=>"21M",
       "BldgID"=>"00",
       "UnitID"=>"0903",
       "ResiID"=>"3",
       "occustatuscode"=>"P",
       "occustatuscodedescription"=>"Previous",
       "MoveInDate"=>"2016-01-07T00:00:00",
       "MoveOutDate"=>"2016-02-06T00:00:00",
       "LeaseBeginDate"=>"2016-01-07T00:00:00",
       "LeaseEndDate"=>"2017-01-31T00:00:00",
       "MktgSource"=>"DBY",
       "PrimaryEmail"=>"noemail1@fake.com",
       "Occupant"=>
        {"PropertyId"=>"21M",
         "BldgID"=>"00",
         "UnitID"=>"0903",
         "ResiID"=>"3",
         "OccuSeqNo"=>"3444755",
         "OccuFirstName"=>"Efren",
         "OccuLastName"=>"Cerda",
         "Phone2No"=>"(832) 693-9448",
         "ResponsibleFlag"=>"Responsible"}},
      {"PropertyId"=>"21M",
       "BldgID"=>"00",
       "UnitID"=>"0908",
       "ResiID"=>"2",
       "occustatuscode"=>"P",
       "occustatuscodedescription"=>"Previous",
       "MoveInDate"=>"2016-02-20T00:00:00",
       "MoveOutDate"=>"2016-04-25T00:00:00",
       "LeaseBeginDate"=>"2016-02-20T00:00:00",
       "LeaseEndDate"=>"2017-02-28T00:00:00",
       "MktgSource"=>"PW",
       "PrimaryEmail"=>"noemail1@fake.com",
       "Occupant"=>
        {"PropertyId"=>"21M",
         "BldgID"=>"00",
         "UnitID"=>"0908",
         "ResiID"=>"2",
         "OccuSeqNo"=>"3451301",
         "OccuFirstName"=>"Donna",
         "OccuLastName"=>"Mclean",
         "Phone2No"=>"(713) 785-4240",
         "ResponsibleFlag"=>"Responsible"}}]}}]

XPath fails when using XmlUtil (UFT 12.0)

Given the following XML:
<?xml version="1.0" encoding="UTF-8" ?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Header>
<WFContext xmlns="http://service.wellsfargo.com/entity/message/2003/" soapenv:actor="" soapenv:mustUnderstand="0">
<messageId>cci-sf-dev14.wellsfargo.com:425a9286:14998ac6245:-7e1e</messageId>
<sessionId>425a9286:14998ac6245:-7e1d</sessionId>
<sessionSequenceNumber>1</sessionSequenceNumber>
<creationTimestamp>2014-11-10T00:14:49.243-08:00</creationTimestamp>
<invokerId>cci-sf-dev14.wellsfargo.com</invokerId>
<activitySourceId>P7</activitySourceId>
<activitySourceIdType>FNC</activitySourceIdType>
<hostName>cci-sf-dev14.wellsfargo.com</hostName>
<billingAU>05426</billingAU>
<originatorId>287586861901211</originatorId>
<originatorIdType>ECN</originatorIdType>
<initiatorId>GTST0793</initiatorId>
<initiatorIdType>ACF2</initiatorIdType>
</WFContext>
</soapenv:Header>
<soapenv:Body>
<getCustomerInformation xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformation/2012/05/">
<initiatorInformation xmlns="http://service.wellsfargo.com/provider/ecpr/shared/common/2011/11/">
<channelInfo>
<initiatorCompanyNbr xmlns="http://service.wellsfargo.com/entity/message/2003/">114</initiatorCompanyNbr>
</channelInfo>
</initiatorInformation>
<custNbr xmlns="http://service.wellsfargo.com/entity/party/2003/">287586861901211</custNbr>
<customerViewList xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformationCommon/2012/05/">
<customerView>
<customerViewType>GENERAL_INFORMATION_201205</customerViewType>
<preferences>
<generalInformationPreferences201205 xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/common/2012/05/">
<formattedNameIndicator xmlns="">true</formattedNameIndicator>
<includeTaxCertificationIndicator xmlns="">true</includeTaxCertificationIndicator>
</generalInformationPreferences201205>
</preferences>
</customerView>
<customerView>
<customerViewType>SEGMENT_LIST</customerViewType>
</customerView>
<customerView>
<customerViewType>LIMITED_PROFILE_REQUIRED_DATA</customerViewType>
</customerView>
<customerView>
<customerViewType>INDIVIDUAL_CUSTOMER_GENERAL_INFORMATION_201205</customerViewType>
<preferences>
<individualGeneralInformationPreferences xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/common/2012/05/">
<includeMinorIndicator xmlns="">true</includeMinorIndicator>
</individualGeneralInformationPreferences>
</preferences>
</customerView>
</customerViewList>
</getCustomerInformation>
</soapenv:Body>
</soapenv:Envelope>
I am trying to access the getCustomerInformation tag using relative XPath in VBScript.
XMLDataFile = "C:\testReqfile.xml"
Set xmlDoc = XMLUtil.CreateXML()
xmlDoc.LoadFile(XMLDataFile)
Print xmlDoc.ToString
'xmlDoc.AddNamespace "ns","xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/"
Set childrenObj = xmlDoc.ChildElementsByPath("//*[contains(@xmlns,'getCustomerInformation')]")
msgbox childrenObj.Count
But it fails to return a node.
Your XPath expression does not work because xmlns as in
<getCustomerInformation xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformation/2012/05/">
is a default namespace declaration, not an ordinary attribute. Therefore, it cannot be accessed with @xmlns.
But it seems you do not have to rely on the namespace at all, because the element name ("getCustomerInformation") is already distinctive. To sidestep the problem of the element being in a namespace, use local-name() to select it by its name.
Set childrenObj = xmlDoc.ChildElementsByPath("//*[local-name() = 'getCustomerInformation']")
As @Mathias Müller already explained in his answer, xmlns defines a namespace and thus cannot be accessed like a regular attribute. I don't have experience with XmlUtil, but in standard VBScript you could select the node(s) like this:
Set xml = CreateObject("Msxml2.DOMDocument.6.0")
xml.async = False
xml.load "C:\path\to\your.xml"
If xml.ParseError Then
WScript.Echo xml.ParseError.Reason
WScript.Quit 1
End If
'define a namespace alias "ns"
uri = "http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformation/2012/05/"
xml.setProperty "SelectionNamespaces", "xmlns:ns='" & uri & "'"
'select nodes using the namespace alias
Set nodes = xml.SelectNodes("//ns:getCustomerInformation")

Get value of XML attribute with namespace

I'm parsing a pptx file and ran into an issue. This is a sample of the source XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<p:presentation xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<p:sldMasterIdLst>
<p:sldMasterId id="2147483648" r:id="rId2"/>
</p:sldMasterIdLst>
<p:sldIdLst>
<p:sldId id="256" r:id="rId3"/>
</p:sldIdLst>
<p:sldSz cx="10080625" cy="7559675"/>
<p:notesSz cx="7772400" cy="10058400"/>
</p:presentation>
I need to get the r:id attribute value in the sldMasterId tag.
doc = Nokogiri::XML(path_to_pptx)
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').attr('id').value
returns 2147483648 but I need rId2, which is the r:id attribute value.
I found the attribute_with_ns(name, namespace) method, but
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').attribute_with_ns('id', 'r')
returns nil.
You can reference the namespace of attributes in your xpath the same way you reference element namespaces:
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId/@r:id')
If you want to use attribute_with_ns, you need to pass the actual namespace URI, not just the prefix:
doc.at_xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId')
.attribute_with_ns('id', "http://schemas.openxmlformats.org/officeDocument/2006/relationships")
http://nokogiri.org/Nokogiri/XML/Node.html#method-i-attributes
If you need to distinguish attributes with the same name but different namespaces, use attribute_nodes instead.
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').each do |element|
  element.attribute_nodes.each do |node|
    puts node if node.namespace && node.namespace.prefix == "r"
  end
end

Resources