Working with XML body from SOAP call with Nokogiri in Ruby

I'm writing a Ruby script that makes a SOAP POST call (generated from Postman) and then uses Nokogiri to parse the XML response. When I take the full SOAP response from Postman, copy it into my editor, manually extract the XML body, and decode and format it online, I can use the following Nokogiri script successfully:
doc = Nokogiri::XML(File.open("response.xml"))
property_ids = []
doc.css('Property').each do |property|
puts "Property ID: #{property['PropertyId']}"
property_ids << property['PropertyId']
end
property_ids.each_with_index do |property_id, index|
puts "index: #{index}"
puts "property id: #{property_id}"
end
The problem arises when I want to include the Ruby snippet of the Postman call in the script:
require 'nokogiri'
require 'uri'
require 'net/http'
require 'openssl'
url = URI("https://esite.thelyndco.com/AmsiWeb/eDexWeb/esite/leasing.asmx")
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Post.new(url)
request["content-type"] = 'application/soap+xml'
request["cache-control"] = 'no-cache'
request["postman-token"] = '916e3f3d-11ca-e8cf-2066-542b009a281d'
request.body = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n<soap12:Envelope xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:soap12=\"http://www.w3.org/2003/05/soap-envelope\">\r\n <soap12:Body>\r\n <GetPropertyList xmlns=\"http://tempuri.org/\">\r\n <UserID>updater</UserID>\r\n <Password>[password]</Password>\r\n <PortfolioName>[portfolio name]</PortfolioName>\r\n <XMLData> \r\n</XMLData>\r\n </GetPropertyList>\r\n </soap12:Body>\r\n</soap12:Envelope>"
response = http.request(request)
doc = Nokogiri::XML(response.body)
# doc = Nokogiri::XML(File.open("full-response.xml"))
# doc.at('GetPropertyListResponse').text
What I want is to take the full SOAP response, envelope included, and process it in my script without cutting and pasting or manually decoding and formatting it with online XML formatters.
Commented out are a couple of lines that I tried from Stack Overflow. Is it possible to decode and format the XML body with Nokogiri or to parse out the SOAP envelope?
edit:
By decoding the XML I mean taking:
&lt;GetPropertyListResult&gt;&lt;Properties&gt;&lt;Property PropertyId="11A" PropertyName1="1111 Austin Hwy" PropertyName2="" PropertyAddrLine1="The 1111" PropertyAddrLine2="1111 Austin Highway" PropertyAddrLine3="" PropertyAddrLine4="" PropertyAddrCity="San Antonio" PropertyAddrState="TX" PropertyAddrZipCode="78209" PropertyAddrCountry="" PropertyAddrEmail="" RemitToAddrLine1="The 1111" RemitToAddrLine2="1111 Austin Highway" RemitToAddrLine3="" RemitToAddrLine4="" RemitToAddrCity="San Antonio" RemitToAddrState="TX" RemitToAddrZipCode="78209" RemitToAddrCountry="" LiveDate="2013-12-04T00:00:00" MgrOffPhoneNo="210-804-1100" MgrFaxNo="" MgrSalutation="" MgrFirstName="" MgrMiName="" MgrLastName="" MonthEndInProcess="N"&gt;&lt;Amenity PropertyId="11A"
and decoding it with an online XML decoder into:
<GetPropertyListResult><Properties><Property PropertyId="11A" PropertyName1="1111 Austin Hwy" PropertyName2="" PropertyAddrLine1="The 1111" PropertyAddrLine2="1111 Austin Highway" PropertyAddrLine3="" PropertyAddrLine4="" PropertyAddrCity="San Antonio" PropertyAddrState="TX" PropertyAddrZipCode="78209" PropertyAddrCountry="" PropertyAddrEmail="" RemitToAddrLine1="The 1111" RemitToAddrLine2="1111 Austin Highway" RemitToAddrLine3="" RemitToAddrLine4="" RemitToAddrCity="San Antonio" RemitToAddrState="TX" RemitToAddrZipCode="78209" RemitToAddrCountry="" LiveDate="2013-12-04T00:00:00" MgrOffPhoneNo="210-804-1100" MgrFaxNo="" MgrSalutation="" MgrFirstName="" MgrMiName="" MgrLastName="" MonthEndInProcess="N"><Amenity PropertyId="11A"
then running it through an XML formatter so that nested elements are indented for legibility.

You can use this code to decode and format the XML:
require "nokogiri"
XML_CHAR_ENTITIES = {
"lt" => "<",
"gt" => ">",
"amp" => "&",
"num" => "#",
"comma" => ","
}
xsl =<<XSL
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
XSL
xml = '&lt;GetPropertyListResult&gt;&lt;Properties&gt;&lt;Property PropertyId="11A" PropertyName1="1111 Austin Hwy" PropertyName2="" PropertyAddrLine1="The 1111" PropertyAddrLine2="1111 Austin Highway" PropertyAddrLine3="" PropertyAddrLine4="" PropertyAddrCity="San Antonio" PropertyAddrState="TX" PropertyAddrZipCode="78209" PropertyAddrCountry="" PropertyAddrEmail="" RemitToAddrLine1="The 1111" RemitToAddrLine2="1111 Austin Highway" RemitToAddrLine3="" RemitToAddrLine4="" RemitToAddrCity="San Antonio" RemitToAddrState="TX" RemitToAddrZipCode="78209" RemitToAddrCountry="" LiveDate="2013-12-04T00:00:00" MgrOffPhoneNo="210-804-1100" MgrFaxNo="" MgrSalutation="" MgrFirstName="" MgrMiName="" MgrLastName="" MonthEndInProcess="N"&gt;&lt;Amenity PropertyId="11A"&gt;&lt;/GetPropertyListResult&gt;'
# Replace each known character entity; leave unknown entities untouched.
xml = xml.gsub(/&(\w+);/) do |match|
char_entity = XML_CHAR_ENTITIES[$1]
char_entity || match
end
doc = Nokogiri::XML(xml)
xslt = Nokogiri::XSLT(xsl)
xml = xslt.transform(doc)
puts "#{xml}"
The XML provided was incomplete, so this terminating string was appended to allow it to be parsed: &gt;&lt;/GetPropertyListResult&gt;
The XML_CHAR_ENTITIES provides a hash of encoded strings to decoded strings, and can be easily extended to include other XML character entities, such as those documented at the W3 Character Entity Reference Chart.
XSL is an embedded stylesheet that is used to format the XML for output with Nokogiri.
Decoding the XML character entities is done with the String#gsub call using the block option. The XML is then successfully parsed by Nokogiri. Once the XML is parsed, it is formatted using Nokogiri XSLT transformation.
The output of this code is:
<?xml version="1.0" encoding="UTF-8"?>
<GetPropertyListResult>
<Properties>
<Property PropertyId="11A" PropertyName1="1111 Austin Hwy" PropertyName2="" PropertyAddrLine1="The 1111" PropertyAddrLine2="1111 Austin Highway" PropertyAddrLine3="" PropertyAddrLine4="" PropertyAddrCity="San Antonio" PropertyAddrState="TX" PropertyAddrZipCode="78209" PropertyAddrCountry="" PropertyAddrEmail="" RemitToAddrLine1="The 1111" RemitToAddrLine2="1111 Austin Highway" RemitToAddrLine3="" RemitToAddrLine4="" RemitToAddrCity="San Antonio" RemitToAddrState="TX" RemitToAddrZipCode="78209" RemitToAddrCountry="" LiveDate="2013-12-04T00:00:00" MgrOffPhoneNo="210-804-1100" MgrFaxNo="" MgrSalutation="" MgrFirstName="" MgrMiName="" MgrLastName="" MonthEndInProcess="N">
<Amenity PropertyId="11A"/>
</Property>
</Properties>
</GetPropertyListResult>

Related

FreeMarker output being encoded to unicode when followed by a question mark

I'm parsing a JSON file with the following property
{
"xml": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
}
After I output the property it looks like this
obj.xml?json_string
{
"xml": "\u003C?xml version=\"1.0\" encoding=\"UTF-8\"?>"
}
How can I stop FreeMarker from escaping/encoding the "<" character when followed by a question mark?
I came up with two solutions...
The first one is more manual:
<#assign obj ={"xml": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}>
${(obj.xml?json_string)?replace("\\u003C","<")}
The second one is more direct:
<#assign obj ={"xml": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}>
${obj.xml?j_string}
Both give the desired output <?xml version=\"1.0\" encoding=\"UTF-8\"?>

Ruby nokogiri attribute selector in XML file

this is the xml file:
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body>
<ns1:putResponse
xmlns:ns1="urn:DmsManagerClient">
<result xsi:type="xsd:string">
<?xml version="1.0" encoding="ISO-8859-1"?>
<MESSAGE ID="11c73b9e-687c-4300-baba-b743c26f7c83" TYPE="CUSDMS">
<DELIVERY>
<FROM>
<SENDER>0072000</SENDER>
<SERVICE>eService</SERVICE>
<DATE>2019-03-08T12:27:25</DATE>
</FROM>
<TO>
<DEALER DEALERCODE="0072000" MARKETCODE="1000"/>
</TO>
</DELIVERY>
<CONTENT>
<dms:ComplexResponse ErrorCode="430" ErrorDescription="null : PrivacyUE Mancante" Return="false"
xmlns:dms="http://dmsmanagerservice">
<dms:Element Name="DMSVERSION">2.7</dms:Element>
</dms:ComplexResponse>
</CONTENT>
</MESSAGE>
</result>
</ns1:putResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
I am coding in Ruby and used Nokogiri's xpath method to extract the CONTENT element of the file.
This is the code:
def extrapolate_error(xml)
doc = Nokogiri::XML(File.open(xml))
doc.xpath('//CONTENT')
end
and this is the result:
[#<Nokogiri::XML::Element:0x1c5ba78 name="CONTENT" children=[
#<Nokogiri::XML::Text:0x1c5b940 "\n">,
#<Nokogiri::XML::Element:0x1c5b8bc name="ComplexResponse" namespace=#<Nokogiri::XML::Namespace:0x1c5b88c prefix="dms" href="http://dmsmanagerservice">
attributes=[
#<Nokogiri::XML::Attr:0x1c5b874 name="ErrorCode" value="430">,
#<Nokogiri::XML::Attr:0x1c5b868 name="ErrorDescription" value="null : PrivacyUE Mancante">,
#<Nokogiri::XML::Attr:0x1c5b85c name="Return" value="false">]
children=[#<Nokogiri::XML::Text:0x1c5b118 "\n">,
#<Nokogiri::XML::Element:0x1c5b094 name="Element" namespace=#<Nokogiri::XML::Namespace:0x1c5b88c prefix="dms" href="http://dmsmanagerservice">
attributes=[#<Nokogiri::XML::Attr:0x1c5b058 name="Name" value="DMSVERSION">]
children=[#<Nokogiri::XML::Text:0x1c5abe4 "2.7">]>,
#<Nokogiri::XML::Text:0x1c5aaac "\n">]>,
#<Nokogiri::XML::Text:0x1c5a974 "\n">]>]
Now I need to drill into it and select some attributes.
Specifically, I need these:
name="ErrorCode" value="430"
name="ErrorDescription" value="null : PrivacyUE Mancante"
I do not know how to proceed. Can you help me?
The following should work for you, assuming the dms namespace is always the same:
doc.xpath('//CONTENT/dms:ComplexResponse', dms: 'http://dmsmanagerservice')
.xpath('@ErrorCode | @ErrorDescription')
.each_with_object({}) do |e,obj|
obj[e.name] = e.text
end
#=> {"ErrorCode"=>"430", "ErrorDescription"=>"null : PrivacyUE Mancante"}
You already understand how you got to //CONTENT. From there we use dms:ComplexResponse to navigate deeper, but since this element is namespaced we have to provide the namespace reference, e.g. dms: 'http://dmsmanagerservice'.
Then we select the attributes we are interested in: @ErrorCode and @ErrorDescription.
In XPath the pipe | is the union operator, so the result contains the nodes matched by either expression, which here means both attributes.
Then we are just building a Hash using the name as the key and the text as the value.
XPath Cheatsheet - Useful resource if you need additional reference
Update
You asked about conditionals so this is what I would propose
ndoc = Nokogiri::XML(doc)
namespaces = ndoc.collect_namespaces
response = ndoc.xpath("//CONTENT/dms:ComplexResponse", namespaces)
if response.xpath("self::node()[@ErrorCode != '' and @ErrorDescription != '']").any?
response.xpath("@ErrorCode | @ErrorDescription")
.each_with_object({}) do |e,obj|
obj[e.name] = e.text
end
else
response.xpath('dms:Element/@Name | dms:Element/text()',namespaces)
.each_slice(2)
.map {|s| s.map(&:text)}.to_h
end
This checks whether both ErrorCode and ErrorDescription are present and non-empty. If so, it builds the Hash as originally proposed. If not, it returns all the dms:Element nodes as a Hash, so {"DMSVERSION"=>"2.7"} in this case.

How to add a comment before XML root node, in Nokogiri?

This is what I'm doing:
xml = Nokogiri::XML('<hello/>')
xml.root.add_previous_sibling(
Nokogiri::XML::Comment.new(
xml, '<!-- how are you? -->'
)
)
This is what I'm trying to achieve:
<?xml version="1.0"?>
<!-- how are you? -->
<hello/>
I'm getting:
ArgumentError: A document may not have multiple root nodes.
What is the right way?
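The comment should be added via the xml.children NodeSet, and the comment text should not include the <!-- --> delimiters, since Nokogiri adds them when serializing.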
Here is an example:
xml = Nokogiri::XML('<hello/>')
=> #<Nokogiri::XML::Document:0x3fe1db8d0ed0 name="document" children=[#<Nokogiri::XML::Element:0x3fe1db8d0584 name="hello">]>
xml.children.before(Nokogiri::XML::Comment.new(xml, 'how are you?'))
=> #<Nokogiri::XML::Element:0x3fe1db8d0584 name="hello">
xml.to_s
=> "<?xml version=\"1.0\"?>\n<!--how are you?-->\n<hello/>\n"

How to scrape data from a website using Nokogiri

When I try to scrape the table data from the following link, it displays nothing.
I wrote the following code, but it outputs nothing. I want the table data (last update, weather, temperature) from the link given. Please help.
url = "http://w1.weather.gov/xml/current_obs/KM89.xml"
docs = Nokogiri::HTML(open(url))
puts docs.css("table")
Go to that page, open your developer tools, and find the response to the KM89.xml request under the Network tab. You'll see that it returns XML, not HTML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>
<current_observation version="1.0"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.weather.gov/view/current_observation.xsd">
<credit>NOAA's National Weather Service</credit>
<credit_URL>http://weather.gov/</credit_URL>
<image>
<url>http://weather.gov/images/xml_logo.gif</url>
<title>NOAA's National Weather Service</title>
<link>http://weather.gov</link>
</image>
<suggested_pickup>15 minutes after the hour</suggested_pickup>
<suggested_pickup_period>60</suggested_pickup_period>
<location>Dexter B Florence Memorial Field Airport, AR</location>
<station_id>KM89</station_id>
<latitude>34.1</latitude>
<longitude>-93.07</longitude>
<observation_time>Last Updated on Nov 23 2012, 7:56 am CST</observation_time>
<observation_time_rfc822>Fri, 23 Nov 2012 07:56:00 -0600</observation_time_rfc822>
<weather>Light Rain</weather>
<temperature_string>57.0 F (13.8 C)</temperature_string>
<temp_f>57.0</temp_f>
<temp_c>13.8</temp_c>
<relative_humidity>87</relative_humidity>
<wind_string>Northeast at 8.1 MPH (7 KT)</wind_string>
<wind_dir>Northeast</wind_dir>
<wind_degrees>30</wind_degrees>
<wind_mph>8.1</wind_mph>
<wind_kt>7</wind_kt>
<pressure_string>1027.5 mb</pressure_string>
<pressure_mb>1027.5</pressure_mb>
<pressure_in>30.30</pressure_in>
<dewpoint_string>52.9 F (11.6 C)</dewpoint_string>
<dewpoint_f>52.9</dewpoint_f>
<dewpoint_c>11.6</dewpoint_c>
<windchill_string>55 F (13 C)</windchill_string>
<windchill_f>55</windchill_f>
<windchill_c>13</windchill_c>
<visibility_mi>10.00</visibility_mi>
<icon_url_base>http://forecast.weather.gov/images/wtf/small/</icon_url_base>
<two_day_history_url>http://www.weather.gov/data/obhistory/KM89.html</two_day_history_url>
<icon_url_name>ra1.png</icon_url_name>
<ob_url>http://www.weather.gov/data/METAR/KM89.1.txt</ob_url>
<disclaimer_url>http://weather.gov/disclaimer.html</disclaimer_url>
<copyright_url>http://weather.gov/disclaimer.html</copyright_url>
<privacy_policy_url>http://weather.gov/notice.html</privacy_policy_url>
</current_observation>
So you can scrape it like this:
require 'open-uri'
require 'nokogiri'
url = 'http://w1.weather.gov/xml/current_obs/KM89.xml'
# The feed is XML, so use the XML parser; Kernel#open no longer opens URLs as of Ruby 3.0.
doc = Nokogiri::XML(URI.open(url))
p doc.at_css('station_id').text

trying to get content inside cdata tags in xml file using nokogiri

I have seen several things on this, but nothing has seemed to work so far. I am parsing an xml via a url using nokogiri on rails 3 ruby 1.9.2.
A snippet of the xml looks like this:
<NewsLineText>
<![CDATA[
Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
]]>
</NewsLineText>
I am trying to parse this out to get the text associated with the NewsLineText
r = node.at_xpath('.//newslinetext') if node.at_xpath('.//newslinetext')
s = node.at_xpath('.//newslinetext').text if node.at_xpath('.//newslinetext')
t = node.at_xpath('.//newslinetext').content if node.at_xpath('.//newslinetext')
puts r
puts s.blank? ? 'NOTHING' : s
puts t.blank? ? 'NOTHING' : t
What I get in return is
<newslinetext></newslinetext>
NOTHING
NOTHING
So I know my tags are named/spelled correctly to get at the newslinetext data, but the cdata text never shows up.
What do I need to do with nokogiri to get this text?
You're trying to parse XML using Nokogiri's HTML parser. If node were from the XML parser, then r would be nil, since XML is case sensitive; your r is not nil, so you're using the HTML parser, which is case insensitive.
Use Nokogiri's XML parser and you will get things like this:
>> r = doc.at_xpath('.//NewsLineText')
=> #<Nokogiri::XML::Element:0x8066ad34 name="NewsLineText" children=[#<Nokogiri::XML::Text:0x8066aac8 "\n ">, #<Nokogiri::XML::CDATA:0x8066a9c4 "\n Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.\n ">, #<Nokogiri::XML::Text:0x8066a8d4 "\n">]>
>> r.text
=> "\n \n Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.\n \n"
and you'll be able to get at the CDATA through r.text or r.children.
Ah I see. What @mu said is correct. But to get at the cdata directly, maybe:
xml =<<EOF
<NewsLineText>
<![CDATA[
Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
]]>
</NewsLineText>
EOF
node = Nokogiri::XML xml
cdata = node.search('NewsLineText').children.find{|e| e.cdata?}
