I have an XML file of 50MB. I need to parse it and convert to CSV.
Following is the XML File
<?xml version="1.0" encoding="ISO-8859-1"?>
<IAPDFirmStateReport GenOn="2019-01-02">
<Firms>
<Firm>
<Info FirmCrdNb="146099" BusNm="PRINCIPA FINANCIAL ADVISORS" LegalNm="CHUNG, BUCK CHWEE"/>
<MainAddr Strt1="15111 WHITTIER BLVD" Strt2="STE 420" City="WHITTIER" State="CA" Cntry="United States" PostlCd="90603" PhNb="562-945-7888" FaxNb="562-968-1885"/>
<MailingAddr/>
<StateRgstn>
<Rgltrs>
<Rgltr Cd="CA" St="APPROVED" Dt="2008-03-13"/>
</Rgltrs>
</StateRgstn>
<ERA>
<Rgltrs/>
</ERA>
</Firm>
<Firm>
<Info FirmCrdNb="170562" SECNb="802-112318" BusNm="ALUMNI VENTURES GROUP" LegalNm="LAUNCH ANGELS MANAGEMENT COMPANY, LLC"/>
<MainAddr Strt1="788 ELM ST" City="MANCHESTER" State="NH" Cntry="United States" PostlCd="03101" PhNb="603-518-8112"/>
<MailingAddr Strt1="889 ELM ST" Strt2="3RD FLOOR" City="MANCHESTER" State="NH" Cntry="United States" PostlCd="03101"/>
<StateRgstn>
<Rgltrs/>
</StateRgstn>
<ERA>
<Rgltrs>
<Rgltr Cd="MA" St="ACTIVE" Dt="2014-02-24"/>
<Rgltr Cd="NH" St="ACTIVE" Dt="2018-07-23"/>
</Rgltrs>
</ERA>
</Firm>
<`/Frims>`
------Almost having 90 Firm tags
So, I need to parse it dynamically using ruby and convert it into CSV. How can I figure out this?
Look at this question - Parsing XML with Ruby
You may also use Ox gem which is not mentioned in above question - take a look here:
https://github.com/ohler55/ox#parsing-xml-into-a-hash-fast
Once you will have your XML converted to Hash you should easily convert it to CSV.
Related
I am making a SOAP webservice call and I get the below response. I want to read the value in internal XML, the value is 12345684 in 1234684 in the below XML.
I was able to get internal XML using #[xpath3('//:processaResponse /return[2]')], store it in a flow variable and #[xpath3('/AckReg/DataArea/PRegistration/PRDet/Person/IDSet/:ID[#schemeName="aid"]/text()')].
This works when I try an online parser, but it doesn't read the value in Mule.
Is there any way to extract 1234684 in oa:ID tag using one XPath.
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Header>
<ns3:TXID xmlns:ns3="http://a.d.r.test.com/"></ns3:TXID>
<ns3:SESSIONID xmlns:ns3="http://a.d.r.test.com/"></ns3:SESSIONID>
</soapenv:Header>
<soapenv:Body>
<ns3:processaResponse xmlns:ns3="http://a.d.r.test.com/" xmlns:ns2="http://p.r.test.com/">
<return>Hi</return>
<return>
<?xml version="1.0" encoding="UTF-8"?>
<AckReg
xmlns="http://www.test.com/e/1" languageCode="en-US" releaseID="normalizedString" systemEnvironmentCode="test" versionID="normalizedString"
xmlns:oa="www.test.com/r/9"
xsi:schemaLocation="http://www.test.com/a/1 ../test/test.xsd">
<Apa>
<oa:CreationDateTime>2018-04-05</oa:CreationDateTime>
</Apa>
<DataArea>
<Ack>
<OArea>
<o:Sender>
<o:LID schemeAgencyName="testi" schemeName="Application ID">test</o:LID>
</o:Sender>
</OArea>
<OriginalActionVerb/>
</Ack>
<PRegistration>
<testids>
<IDSet schemeAgencyName="try">
<oa:ID schemeName="abcid">1234684</oa:ID>
</IDSet>
</testids>
<PRDet>
<Person>
<IDSet schemeAgencyName="try">
<oa:ID schemeName="aid">1364561</oa:ID>
</IDSet>
<IDSet schemeAgencyName="enada">
<oa:ID schemeName="Employee ID">adsad</oa:ID>
</IDSet>
</Person>
<User>
<oa:ID/>
</User>
</PRDet>
</PRegistration>
</DataArea>
</AckReg>
</return>
</ns3:processaResponse>
</soapenv:Body>
</soapenv:Envelope>
In your expressions you were missing namespace prefixes or namespace wildcards *: on some nodes - so your expressions failed.
Is there any way to extract 1234684 in oa:ID tag using one XPath.
Combining both of your partial expressions is possible with namespace wildcards:
//*:processaResponse/return[2]/*:AckReg/*:DataArea/*:PRegistration/*:testids/*:IDSet/*:ID[#schemeName='abcid']/text()
Or you can use an absolute path with namespace wildcards:
/*:Envelope/*:Body/*:processaResponse/return[2]/*:AckReg/*:DataArea/*:PRegistration/*:testids/*:IDSet/*:ID[#schemeName='abcid']/text()
Output in both cases:
1234684
You can even use XmlSlurper class using groovy script to fetch that respective value.
root = new XmlSlurper( false, true).parseText(payload).declareNamespace('soapenv':"http://schemas.xmlsoap.org/soap/envelope/")
I have an XML, that as I understand it has already been parsed by tags. My goal is to parse all the information that is in the <GetResidentsContactInfoResult> tag. In this tag of the sample xml below there are two records in here which begin each with the Lease PropertyId key. How can I iterate over the <GetResidentsContactInfoResult> tag and print out the key/value pairs for each record? I'm new to Ruby and working with XML files, is this something I can do with Nokogiri?
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Body>
<GetResidentsContactInfoResponse xmlns="http://tempuri.org/">
<GetResidentsContactInfoResult><PropertyResidents><Lease PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-01-07T00:00:00" MoveOutDate="2016-02-06T00:00:00" LeaseBeginDate="2016-01-07T00:00:00" LeaseEndDate="2017-01-31T00:00:00" MktgSource="DBY" PrimaryEmail="noemail1#fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" OccuSeqNo="3444755" OccuFirstName="Efren" OccuLastName="Cerda" Phone2No="(832) 693-9448" ResponsibleFlag="Responsible" /></Lease><Lease PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-02-20T00:00:00" MoveOutDate="2016-04-25T00:00:00" LeaseBeginDate="2016-02-20T00:00:00" LeaseEndDate="2017-02-28T00:00:00" MktgSource="PW" PrimaryEmail="noemail1#fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" OccuSeqNo="3451301" OccuFirstName="Donna" OccuLastName="Mclean" Phone2No="(713) 785-4240" ResponsibleFlag="Responsible" /></Lease></PropertyResidents></GetResidentsContactInfoResult>
</GetResidentsContactInfoResponse>
</soap:Body>
</soap:Envelope>
This uses Nokogiri to find all the GetResidentsContactInfoResponse elements, and then Active Support to convert the inner text to a hash of key-value pairs.
Read "sparklemotion/nokogiri" and "Tutorials" regarding installing and using Nokogiri.
Read "Active Support Core Extensions" about more capabilities of Active Support (though the guide does not include Hash.from_xml). To install it simply do gem install activesupport.
I assume you're fine with Nokogiri as you mentioned it in your question.
If you don't want to use Active Support, consider looking into "Convert a Nokogiri document to a Ruby Hash" as an alternative to the line Hash.from_xml(elm.text):
# Needed in order to use the `Hash.from_xml`
require 'active_support/core_ext/hash/conversions'
def find_key_values(str)
doc = Nokogiri::XML(str)
# Ignore namespaces for easier traversal
doc.remove_namespaces!
doc.css('GetResidentsContactInfoResponse').map do |elm|
Hash.from_xml(elm.text)
end
end
Usage:
# Option 1: if your XML above is stored in a variable called `string`
find_key_values string
# Option 2: if your XML above is stored in a file
find_key_values File.open('/path/to/file')
Which returns:
[{"PropertyResidents"=>
{"Lease"=>
[{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-01-07T00:00:00",
"MoveOutDate"=>"2016-02-06T00:00:00",
"LeaseBeginDate"=>"2016-01-07T00:00:00",
"LeaseEndDate"=>"2017-01-31T00:00:00",
"MktgSource"=>"DBY",
"PrimaryEmail"=>"noemail1#fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"OccuSeqNo"=>"3444755",
"OccuFirstName"=>"Efren",
"OccuLastName"=>"Cerda",
"Phone2No"=>"(832) 693-9448",
"ResponsibleFlag"=>"Responsible"}},
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-02-20T00:00:00",
"MoveOutDate"=>"2016-04-25T00:00:00",
"LeaseBeginDate"=>"2016-02-20T00:00:00",
"LeaseEndDate"=>"2017-02-28T00:00:00",
"MktgSource"=>"PW",
"PrimaryEmail"=>"noemail1#fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"OccuSeqNo"=>"3451301",
"OccuFirstName"=>"Donna",
"OccuLastName"=>"Mclean",
"Phone2No"=>"(713) 785-4240",
"ResponsibleFlag"=>"Responsible"}}]}}]
I am using soapui for automation testing. I am trying to write a xpath expression to do the property transfer with following xml
<snapshots query="after=2014-04-16 12:30:00-0700" mask="op" xmlns="http://ws.example.com/roma/201907" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<AsOf>2014-04-16T19:20:44-07:00</AsOf>
<offers>
<offer entityId="6500588"/>
<offer entityId="6500589"/>
<offer entityId="6500590"/>
<offer entityId="6557335">
<rubber>KJM</rubber>
<code>B44733</code>
<offerCode>PA</offerCode>
<status name="Completed">C</status>
<startDate basis="GMT-4">2013-04-01</startDate>
<endDate basis="GMT-4">2014-04-15</endDate>
<template>
<sourceKey-Ref entityId="36KTV" code="KPA513C36KTV" uri="/snapshot/sourcekey/36KTV"/>
<pTemplateCode>PPAKXC</pTemplateCode>
<panelCode>HPA5LTM</panelCode>
<itemCode>PA1467</itemCode>
</template>
<venue code="28">
<supervenue>5</supervenue>
</venue>
<currencyDetail currency="USD">
<unitPrice>29.95</unitPrice>
<numberPayments>1</numberPayments>
</currencyDetail>
<hData>
<legacyScriptCode>300</legacyScriptCode>
<hpKeycode>189161</hpKeycode>
<hpProductNumber>014399</hpProductNumber>
<hpMpgCode>300</hpMpgCode>
</hData>
</offer>
<offer entityId="6557336">
<rubber>KJM</rubber>
<code>B44734</code>
<offerCode>VY</offerCode>
<status name="Completed">C</status>
<startDate basis="GMT-4">2013-04-01</startDate>
<endDate basis="GMT-4">2014-04-15</endDate>
<template>
<sourceKey-Ref entityId="36KTV" code="KPA513C36KTV" uri="/snapshot/sourcekey/36KTV"/>
<pTemplateCode>PPAKXC</pTemplateCode>
<panelCode>HPA5LTM</panelCode>
<offerCode>OVYC8UM9</offerCode>
<itemCode>VY4023</itemCode>
</template>
<venue code="28">
<supervenue>5</supervenue>
</venue>
<currencyDetail currency="USD">
<unitPrice>0.00</unitPrice>
<numberPayments>1</numberPayments>
</currencyDetail>
<hData>
<legacyScriptCode>947</legacyScriptCode>
<hpKeycode>189162</hpKeycode>
<hpProductNumber>602185</hpProductNumber>
<hpMpgCode>947</hpMpgCode>
</hData>
</offer>
<offer entityId="6557337">
<rubber>KJM</rubber>
<code>B44736</code>
<offerCode>VY</offerCode>
<status name="Completed">C</status>
<startDate basis="GMT-4">2013-04-01</startDate>
<endDate basis="GMT-4">2014-04-15</endDate>
<template>
<sourceKey-Ref entityId="36KTV" code="KPA513C36KTV" uri="/snapshot/sourcekey/36KTV"/>
<pTemplateCode>PPAKXC</pTemplateCode>
<panelCode>HPA5LTM</panelCode>
<offerCode>OVYC8UMA</offerCode>
<itemCode>VY4012</itemCode>
</template>
<venue code="28">
<supervenue>5</supervenue>
</venue>
<currencyDetail currency="USD">
<unitPrice>0.00</unitPrice>
<numberPayments>1</numberPayments>
<firstPaymentAmount>0.00</firstPaymentAmount>
<firstShippingAmount>5.98</firstShippingAmount>
</currencyDetail>
<hData>
<legacyScriptCode>947</legacyScriptCode>
<hpKeycode>189163</hpKeycode>
<hpProductNumber>602094</hpProductNumber>
<hpMpgCode>947</hpMpgCode>
</hData>
</offer>
</offers>
</snapshots>
I would like to have all hpKeycode using local-name() in the XPath expression. I tried
//*[local-name()='hpKeycode']
but this gives me only the first node which is 189161. This
/*[local-name()='hpKeycode'][2]
also does not work. Any help would be greatly appreciated.
You can try with XQuery. In the property transfer step select Use XQuery checkbox as you see in the image below. And use this code:
declare namespace soap='http://schemas.xmlsoap.org/soap/envelope/';
declare namespace ns1='http://ws.example.com/roma/201907';
<ns1:offer>
{
for $id in //*[local-name()='hpKeycode'] return string($id)
}
</ns1:offer>
EDIT:
If you want to avoid the use of XQuery, you can add three Transfers on your property transfer, on each one use:
First id
(//*[local-name()='hpKeycode'])[1]
Second id
(//*[local-name()='hpKeycode'])[2]
Third id
(//*[local-name()='hpKeycode'])[3]
Hope this helps,
This would not work in the way you expect it to because you are pulling a single set of values. You would need to specify what node branch you want further up the tree.
Its like road ways... if you were giving someone directions you would need to inform them which way to go for each fork in the road. Lets say you wanted to tell them how to get to one of 3 possible airports each located in a different city. If you said "as you arrive at airports 1,2, and 3 enter the 2nd". They would be baffled and say something like "what city though?" and maybe "its not possible for them to exist in the same location".
Solution:
Here is what you would want given the xml you provided (both work in soapui).
Xpath 2.0
//*:offer[4]/*:hData/*:hpKeycode
//*:offer[5]/*:hData/*:hpKeycode
//*:offer[6]/*:hData/*:hpKeycode
Xpath 1.0
//*[local-name()='offer'][4]/*[local-name()='hData']/*[local-name()='hpKeycode']
//*[local-name()='offer'][5]/*[local-name()='hData']/*[local-name()='hpKeycode']
//*[local-name()='offer'][6]/*[local-name()='hData']/*[local-name()='hpKeycode']
Hope this more detailed explanation helps!
I'm trying to parse the following XML to extract out the Lat Long combination under //ns2:Point/ns2:pos using Nokogiri XML parser but without much luck.
<?xml version="1.0" encoding="UTF-8"?>
<ns1:XLS ns1:lang="en" rel="5.2.sp03" version="1.0" xmlns:ns1="http://www.opengis.net/xls">
<ns1:ResponseHeader sessionID="wrx-rails1370997540"/>
<ns1:Response numberOfResponses="1" requestID="10" version="1.0">
<ns1:GeocodeResponse>
<ns1:GeocodeResponseList numberOfGeocodedAddresses="1">
<ns1:GeocodedAddress>
<ns2:Point xmlns:ns2="http://www.opengis.net/gml">
<ns2:pos>38.898331 -77.117273</ns2:pos>
</ns2:Point>
<ns1:Address countryCode="US">
<ns1:StreetAddress>
<ns1:Building number="4400"/>
<ns1:Street>Lee Hwy</ns1:Street>
</ns1:StreetAddress>
<ns1:Place type="CountrySubdivision">VA</ns1:Place>
<ns1:Place type="CountrySecondarySubdivision">Arlington</ns1:Place>
<ns1:Place type="MunicipalitySubdivision">Arlington</ns1:Place>
<ns1:PostalCode>22207</ns1:PostalCode>
</ns1:Address>
<ns1:GeocodeMatchCode accuracy="1.0" matchType="ADDRESS POINT LOOKUP"/>
<ns1:SpatialKeys>
<ns1:SpatialKey priority="0" val="1663355010"/>
<ns1:SpatialKey priority="1" val="2563322400"/>
<ns1:SpatialKey priority="2" val="3325185160"/>
<ns1:SpatialKey priority="3" val="3784086306"/>
<ns1:SpatialKey priority="4" val="4033029320"/>
<ns1:SpatialKey priority="5" val="4162373938"/>
<ns1:SpatialKey priority="6" val="4228264524"/>
<ns1:SpatialKey priority="7" val="4261514387"/>
<ns1:SpatialKey priority="8" val="4278215460"/>
<ns1:SpatialKey priority="9" val="4286585033"/>
<ns1:SpatialKey priority="10" val="4290774578"/>
<ns1:SpatialKey priority="11" val="4292870540"/>
<ns1:SpatialKey priority="12" val="4293918819"/>
<ns1:SpatialKey priority="13" val="4294443032"/>
<ns1:SpatialKey priority="14" val="4294705158"/>
<ns1:SpatialKey priority="15" val="4294836224"/>
</ns1:SpatialKeys>
</ns1:GeocodedAddress>
</ns1:GeocodeResponseList>
</ns1:GeocodeResponse>
</ns1:Response>
</ns1:XLS>
I get back an empty array when i try the following:
doc = Nokogiri::XML(response.body);
pos = doc.xpath('//ns2:Point/ns2:pos');
I can access Geocoded address element however just fine using:
doc.xpath('//ns1:GeocodeResponseList/ns1:GeocodedAddress')
Any clues as to what i'm missing here. Is it the namespace changing which it doesn't like for some reason?
My Environment is as follows:
Nokogiri 1.5.9 Java
Rails 3.2.11
jRuby 1.7.4
Windows 7 Box
You can find the first expression because Nokogiri found the XML namespace where it expected one. The ns2 namespace isn't where we'd normally find it so Nokogiri doesn't know what to do.
There are multiple ways to deal with this. The first is to gather the namespaces in the document and pass them to Nokogiri when you do your search. Nokogiri does this automatically for namespaces in the XML root, but not if they're sprinkled throughout the document, so we have to tell it to search everywhere, then pass them in:
namespaces = doc.collect_namespaces
namespaces # => {"xmlns:ns1"=>"http://www.opengis.net/xls", "xmlns:ns2"=>"http://www.opengis.net/gml"}
pos = doc.xpath('//ns2:Point/ns2:pos', namespaces);
pos # => [#<Nokogiri::XML::Element:0x3fe8c608ab30 name="pos" namespace=#<Nokogiri::XML::Namespace:0x3fe8c608aacc prefix="ns2" href="http://www.opengis.net/gml"> children=[#<Nokogiri::XML::Text:0x3fe8c608e1b8 "38.898331 -77.117273">]>]
An alternate is to tell Nokogiri to remove all namespaces from the document. You only want to do that if you're sure there are no collisions between tag names found in the various namespaces in the document:
doc.remove_namespaces!
pos = doc.xpath('//Point/pos', namespaces);
pos # => [#<Nokogiri::XML::Element:0x3fe8c608ab30 name="pos" children=[#<Nokogiri::XML::Text:0x3fe8c608e1b8 "38.898331 -77.117273">]>]
The Nokogiri documentation has this to say about the use of remove_namespaces!:
But I’m Lazy and Don’t Want to Deal With Namespaces!
Lazy == Efficient, so no judgements. :)
If you have an XML document with namespaces, but would prefer to ignore them entirely (and query as if Tim Bray had never invented them), then you can call remove_namespaces on an XML::Document to remove all namespaces. Of course, if the document had nodes with the same names but different namespaces, they will now be ambiguous. But you’re lazy! You don’t care!
I'm having problems in decoding a string using Base64.decode64 in Ruby. As a test, I'm using this site that decodes strings in php: https://rnd.feide.no/simplesaml/module.php/saml2debug/debug.php.
As a test, I'm using this string:
fZJNT%2BMwEIbvSPwHy%2Fd8tMvHympSdUGISuwS0cCBm%2BtMUwfbk%2FU4zfLvSVMq2Euv45n3fd7xzOb%2FrGE78KTRZXwSp5yBU1hpV2f8ubyLfvJ5fn42I2lNKxZd2Lon%2BNsBBTZMOhLjQ8Y77wRK0iSctEAiKLFa%2FH4Q0zgVrceACg1ny9uMy7rCdaM2%2Bs0BWrtppK2UAdeoVjW2ruq1bevGImcvR6zpHmtJ1MHSUZAuDKU0vY7Si2h6VU5%2BiMuJuLx65az4dPql3SHBKaz1oYnEfVkWUfG4KkeBna7A%2Fxm6M14j1gZihZazBRH4MODcoKPOgl%2BB32kFz08PGd%2BG0JJIkr7v46%2BhRCaEpod17DCRivYZCkmkd4N28B3wfNyrGKP5bws9DS6PKDz%2FMpsl36Tyz%2F%2Fax1jeFmi0emcLY7C%2F8SDD0Z7dobcynHbbV3QVbcZW0TlqQemNhoqzJD%2B4%2Fn8Yw7l8AA%3D%3D
The output should be:
<?xml version="1.0" encoding="UTF-8"?>
<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" ID="agdobjcfikneommfjamdclenjcpcjmgdgbmpgjmo" Version="2.0" IssueInstant="2007-04-26T13:51:56Z" ProtocolBinding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" ProviderName="google.com" AssertionConsumerServiceURL="https://www.google.com/a/solweb.no/acs" IsPassive="true"><saml:Issuer xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">google.com</saml:Issuer><samlp:NameIDPolicy AllowCreate="true" Format="urn:oasis:names:tc:SAML:2.0:nameid-format:unspecified" /></samlp:AuthnRequest>
But in Ruby I keep getting a strange output:
}\222MO?0\206?H?\a??|\264???jRuA\210J????\233?LS\aۓ?8???IS*?K\257??}????\377\254a;??e|\022\247\234\201SXiWg????~?y~~6#iM+\026]غ'??6L:\022?C?;?J?$\234\264#\"(\261Z?~?8\255ǀ\n\rg??\214˺?u\2436??Z?i\244\255\224\001רV5\266\256?m??"g/G\254?kI???Q\220.\f\2454\275\216ҋhzUN~\210ˉ\270\274z??t???!?)\254????}Y\026Q?*G\201\235\256??\031\2723^#?b\205\226\263\005\021?0?ܠ\243_\201?i\005?O\017\031߆ВH\222\276??\241D&\204\246\207u?0?\212?
I\244w\203v??|ܫ\030\243?o
.\217(<\3772\233%ߤ?????X?h\264zg\vc\260\277? ??\236ݡ\2672\234v?Wt\025m?V?9jA鍆\212\263$?\270\376\177\030ù|\000
The code I use is:
require 'cgi'
require 'base64'
Base64::decode64(CGI::unescape('fZJNT%2BMwEIbvSPwHy%2Fd8tMvHympSdUGISuwS0cCBm%2BtMUwfbk%2FU4zfLvSVMq2Euv45n3fd7xzOb%2FrGE78KTRZXwSp5yBU1hpV2f8ubyLfvJ5fn42I2lNKxZd2Lon%2BNsBBTZMOhLjQ8Y77wRK0iSctEAiKLFa%2FH4Q0zgVrceACg1ny9uMy7rCdaM2%2Bs0BWrtppK2UAdeoVjW2ruq1bevGImcvR6zpHmtJ1MHSUZAuDKU0vY7Si2h6VU5%2BiMuJuLx65az4dPql3SHBKaz1oYnEfVkWUfG4KkeBna7A%2Fxm6M14j1gZihZazBRH4MODcoKPOgl%2BB32kFz08PGd%2BG0JJIkr7v46%2BhRCaEpod17DCRivYZCkmkd4N28B3wfNyrGKP5bws9DS6PKDz%2FMpsl36Tyz%2F%2Fax1jeFmi0emcLY7C%2F8SDD0Z7dobcynHbbV3QVbcZW0TlqQemNhoqzJD%2B4%2Fn8Yw7l8AA%3D%3D'))
What could possibly be wrong? Thanks in advance.
I have no idea where you got the idea that that string of yours is a base64-encoded version of your XML. If you pass the first bit of it (<?x) through Base64.encode64() then CGI.escape(), you get:
PD94
at the start, which is nothing like your string. In fact, your first four characters "fZJN" are values 31, 25, 9 and 13 in base 64 so will give you:
011111 011001 001001 001101
then, grouping them in octets instead of sextets (I guess that's the right word):
01111101 10010010 01001101
7D 92 4D
which are not the characters you're expecting to see.
Putting the whole string in gives you:
PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4gPHNhbWxw
OkF1dGhuUmVxdWVzdCB4bWxuczpzYW1scD0idXJuOm9hc2lzOm5hbWVzOnRj
OlNBTUw6Mi4wOnByb3RvY29sIiBJRD0iYWdkb2JqY2Zpa25lb21tZmphbWRj
bGVuamNwY2ptZ2RnYm1wZ2ptbyIgVmVyc2lvbj0iMi4wIiBJc3N1ZUluc3Rh
bnQ9IjIwMDctMDQtMjZUMTM6NTE6NTZaIiBQcm90b2NvbEJpbmRpbmc9InVy
bjpvYXNpczpuYW1lczp0YzpTQU1MOjIuMDpiaW5kaW5nczpIVFRQLVBPU1Qi
IFByb3ZpZGVyTmFtZT0iZ29vZ2xlLmNvbSIgQXNzZXJ0aW9uQ29uc3VtZXJT
ZXJ2aWNlVVJMPSJodHRwczovL3d3dy5nb29nbGUuY29tL2Evc29sd2ViLm5v
L2FjcyIgSXNQYXNzaXZlPSJ0cnVlIj48c2FtbDpJc3N1ZXIgeG1sbnM6c2Ft
bD0idXJuOm9hc2lzOm5hbWVzOnRjOlNBTUw6Mi4wOmFzc2VydGlvbiI+Z29v
Z2xlLmNvbTwvc2FtbDpJc3N1ZXI+PHNhbWxwOk5hbWVJRFBvbGljeSBBbGxv
d0NyZWF0ZT0idHJ1ZSIgRm9ybWF0PSJ1cm46b2FzaXM6bmFtZXM6dGM6U0FN
TDoyLjA6bmFtZWlkLWZvcm1hdDp1bnNwZWNpZmllZCIgLz48L3NhbWxwOkF1
dGhuUmVxdWVzdD4=
When you escape that, you get:
PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4gPHNhbWxw%0AOkF1dGhuUmVxdWVzdCB4bWxuczpzYW1scD0idXJuOm9hc2lzOm5hbWVzOnRj%0AOlNBTUw6Mi4wOnByb3RvY29sIiBJRD0iYWdkb2JqY2Zpa25lb21tZmphbWRj%0AbGVuamNwY2ptZ2RnYm1wZ2ptbyIgVmVyc2lvbj0iMi4wIiBJc3N1ZUluc3Rh%0AbnQ9IjIwMDctMDQtMjZUMTM6NTE6NTZaIiBQcm90b2NvbEJpbmRpbmc9InVy%0AbjpvYXNpczpuYW1lczp0YzpTQU1MOjIuMDpiaW5kaW5nczpIVFRQLVBPU1Qi%0AIFByb3ZpZGVyTmFtZT0iZ29vZ2xlLmNvbSIgQXNzZXJ0aW9uQ29uc3VtZXJT%0AZXJ2aWNlVVJMPSJodHRwczovL3d3dy5nb29nbGUuY29tL2Evc29sd2ViLm5v%0AL2FjcyIgSXNQYXNzaXZlPSJ0cnVlIj48c2FtbDpJc3N1ZXIgeG1sbnM6c2Ft%0AbD0idXJuOm9hc2lzOm5hbWVzOnRjOlNBTUw6Mi4wOmFzc2VydGlvbiI%2BZ29v%0AZ2xlLmNvbTwvc2FtbDpJc3N1ZXI%2BPHNhbWxwOk5hbWVJRFBvbGljeSBBbGxv%0Ad0NyZWF0ZT0idHJ1ZSIgRm9ybWF0PSJ1cm46b2FzaXM6bmFtZXM6dGM6U0FN%0ATDoyLjA6bmFtZWlkLWZvcm1hdDp1bnNwZWNpZmllZCIgLz48L3NhbWxwOkF1%0AdGhuUmVxdWVzdD4%3D%0A
So, the bottom line is that you're getting junk from the decode because the data is not of the correct format.
It appears that the data is also deflated/compressed.
require 'zlib'
inflated=Base64::decode64(CGI::unescape('fZJNT%2BMwEIbvSPwHy%2Fd8tMvHympSdUGISuwS0cCBm%2BtMUwfbk%2FU4zfLvSVMq2Euv45n3fd7xzOb%2FrGE78KTRZXwSp5yBU1hpV2f8ubyLfvJ5fn42I2lNKxZd2Lon%2BNsBBTZMOhLjQ8Y77wRK0iSctEAiKLFa%2FH4Q0zgVrceACg1ny9uMy7rCdaM2%2Bs0BWrtppK2UAdeoVjW2ruq1bevGImcvR6zpHmtJ1MHSUZAuDKU0vY7Si2h6VU5%2BiMuJuLx65az4dPql3SHBKaz1oYnEfVkWUfG4KkeBna7A%2Fxm6M14j1gZihZazBRH4MODcoKPOgl%2BB32kFz08PGd%2BG0JJIkr7v46%2BhRCaEpod17DCRivYZCkmkd4N28B3wfNyrGKP5bws9DS6PKDz%2FMpsl36Tyz%2F%2Fax1jeFmi0emcLY7C%2F8SDD0Z7dobcyHbbV3QVbcZW0TlqQemNhoqzJD%2B4%2Fn8Yw7l8AA%3D%3D'))
zlib = Zlib::Inflate.new(-Zlib::MAX_WBITS)
zlib.inflate(inflated)