FreeMarker output encoded as a Unicode escape when "<" is followed by a question mark

I'm parsing a JSON file with the following property
{
"xml": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
}
After I output the property with
obj.xml?json_string
it looks like this:
{
"xml": "\u003C?xml version=\"1.0\" encoding=\"UTF-8\"?>"
}
How can I stop FreeMarker from escaping/encoding the "<" character when followed by a question mark?

I came up with two solutions...
The first one is more manual:
<#assign obj ={"xml": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}>
${(obj.xml?json_string)?replace("\\u003C","<")}
The second one is more direct:
<#assign obj ={"xml": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}>
${obj.xml?j_string}
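Both give the desired output: <?xml version=\"1.0\" encoding=\"UTF-8\"?>
?j_string applies Java string-literal escaping rules instead of JSON ones, which is why it leaves the "<" alone.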
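Whichever workaround is used, the escaped and unescaped forms are equivalent JSON: \u003C is a standard JSON escape for "<", so any JSON parser reads both as the same string. A small illustration in Ruby with the standard json library (FreeMarker itself is not involved; the strings are the property from the question):

```ruby
require 'json'

# Illustration in Ruby only; FreeMarker is not involved here.
# The property as ?json_string emitted it, and as it was originally written.
escaped = '{"xml": "\u003C?xml version=\"1.0\" encoding=\"UTF-8\"?>"}'
literal = '{"xml": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}'

# A JSON parser decodes \u003C back to "<", so both texts yield the same value.
from_escaped = JSON.parse(escaped)['xml']
from_literal = JSON.parse(literal)['xml']
puts from_escaped
puts from_escaped == from_literal

# The same idea as the ?replace workaround: a plain textual substitution.
puts escaped.gsub('\u003C', '<') == literal
```

So the escaping only matters when something reads the output as raw text rather than as JSON.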

Related

How to add a comment before XML root node, in Nokogiri?

This is what I'm doing:
xml = Nokogiri::XML('<hello/>')
xml.root.add_previous_sibling(
Nokogiri::XML::Comment.new(
xml, '<!-- how are you? -->'
)
)
This is what I'm trying to achieve:
<?xml version="1.0"?>
<!-- how are you? -->
<hello/>
I'm getting:
ArgumentError: A document may not have multiple root nodes.
What is the right way?
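The comment should be added through the xml.children NodeSet. Pass only the comment text to Comment.new; Nokogiri adds the <!-- and --> delimiters itself when serializing, as the output shows.
Here is an example: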
xml = Nokogiri::XML('<hello/>')
=> #<Nokogiri::XML::Document:0x3fe1db8d0ed0 name="document" children=[#<Nokogiri::XML::Element:0x3fe1db8d0584 name="hello">]>
xml.children.before(Nokogiri::XML::Comment.new(xml, 'how are you?'))
=> #<Nokogiri::XML::Element:0x3fe1db8d0584 name="hello">
xml.to_s
=> "<?xml version=\"1.0\"?>\n<!--how are you?-->\n<hello/>\n"
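For comparison, here is a minimal sketch of the same idea in Ruby's bundled REXML (swapped in for Nokogiri only so the example runs without extra gems; this is not the answer's code): the comment is inserted into the document's top-level children, ahead of the root element, and only the comment text is passed in.

```ruby
require 'rexml/document'

# Assumption: REXML stands in for Nokogiri here; the Nokogiri answer is above.
doc = REXML::Document.new('<?xml version="1.0"?><hello/>')

# Insert the comment among the document's own children, before the root element.
# Only the text is given; the <!-- --> delimiters are added on output.
doc.insert_before(doc.root, REXML::Comment.new('how are you?'))

xml_out = doc.to_s
puts xml_out
```

The root element is untouched; the document simply gains a comment node between the XML declaration and the root.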

How to filter XML elements by date range in Ruby

I typically use Nokogiri as my XML parser.
I have the following XML:
<albums>
<aldo_nova album="aldo nova">
<release_date value="19820401"/>
</aldo_nova>
<classix_nouveaux album="Night People"/>
<release_date value="19820501"/>
</classix_nouveaux>
<engligh_beat album="I Just Can't Stop It"/>
<release_date value="19800501"/>
</engligh_beat>
</albums>
I want to get all albums that were released between 1/1/1980 and 4/15/1982:
<aldo_nova album="aldo nova">
<release_date value="19820401"/>
</aldo_nova>
<engligh_beat album="I Just Can't Stop It"/>
<release_date value="19800501"/>
</engligh_beat>
How do I filter/query the XML by a release_date range?
Your XML is malformed. After parsing, here's what Nokogiri has to say about it:
doc.errors
# => [#<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: albums line 1 and classix_nouveaux>,
# #<Nokogiri::XML::SyntaxError: Extra content at the end of the document>]
That's because:
<classix_nouveaux album="Night People"/>
and
<engligh_beat album="I Just Can't Stop It"/>
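are self-closing (they end with />), so the </classix_nouveaux> and </engligh_beat> tags that follow have nothing to close. Instead they should be: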
<classix_nouveaux album="Night People">
and
<engligh_beat album="I Just Can't Stop It">
You can use CSS or XPath selectors to find exact matches, or even substring matches, but neither CSS nor XPath understands "ranges" of dates, nor do they have any idea of what a Date is. So you have to extract all the nodes, convert each date value into a Date object (or, in this case, an integer), and then compare it to the range:
date_range = 19800101..19820415 # 1/1/1980 through 4/15/1982
selected_albums = doc.search('//release_date')
                     .select { |rd| date_range.include?(rd['value'].to_i) }
                     .map(&:parent)
selected_albums.map(&:to_xml)
# => ["<aldo_nova album=\"aldo nova\">\n" +
# " <release_date value=\"19820401\"/>\n" +
# "</aldo_nova>",
# "<engligh_beat album=\"I Just Can't Stop It\">\n" +
# " <release_date value=\"19800501\"/>\n" +
# "</engligh_beat>"]
I think your XML is poorly designed because you have varying tag names for what should be an album. <album> should be a child of <albums>. I'd recommend something like this:
<collection>
<albums>
<album band="aldo nova" title="aldo nova" release_date="19820401"/>
<album band="classix nouveaux" title="Night People" release_date="19820501"/>
<album band="english beat" title="I Just Can't Stop It" release_date="19800501"/>
</albums>
</collection>
Once the XML is in a standard form, then it becomes easier to navigate and search:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<collection>
<albums>
<album band="aldo nova" title="aldo nova" release_date="19820401"/>
<album band="classix nouveaux" title="Night People" release_date="19820501"/>
<album band="english beat" title="I Just Can't Stop It" release_date="19800501"/>
</albums>
</collection>
EOT
doc.search('album').last['title'] # => "I Just Can't Stop It"
band = 'aldo nova'
doc.search("//album[@band='#{band}']").map { |a| a['title'] } # => ["aldo nova"]
and searching for dates becomes more straightforward because it's not necessary to find the parent of the node:
date_range = 19800101..19820415 # 1/1/1980 through 4/15/1982
selected_albums = doc.search('album').select { |a| date_range.include?(a['release_date'].to_i) }
selected_albums.map(&:to_xml)
# => ["<album band=\"aldo nova\" title=\"aldo nova\" release_date=\"19820401\"/>",
# "<album band=\"english beat\" title=\"I Just Can't Stop It\" release_date=\"19800501\"/>"]
I'd recommend reading some tutorials on XML itself, as it's easy to paint yourself into a corner if the data isn't represented logically and correctly.
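The answer mentions converting the values into Date objects; here is a minimal sketch of that variant on the restructured XML, using the question's own range (1/1/1980 to 4/15/1982). It uses Ruby's bundled REXML in place of Nokogiri, an assumption made only so it runs without extra gems.

```ruby
require 'date'
require 'rexml/document'

# Assumption: REXML stands in for Nokogiri so this runs without extra gems.
xml = <<~XML
  <collection>
    <albums>
      <album band="aldo nova" title="aldo nova" release_date="19820401"/>
      <album band="classix nouveaux" title="Night People" release_date="19820501"/>
      <album band="english beat" title="I Just Can't Stop It" release_date="19800501"/>
    </albums>
  </collection>
XML

# The range the question asks for, inclusive at both ends.
wanted = Date.new(1980, 1, 1)..Date.new(1982, 4, 15)

doc = REXML::Document.new(xml)

# XPath only selects the <album> elements; the date comparison happens in Ruby.
selected = REXML::XPath.match(doc, '//album').select do |album|
  released = Date.strptime(album.attributes['release_date'], '%Y%m%d')
  wanted.cover?(released)
end

titles = selected.map { |album| album.attributes['title'] }
puts titles
```

Date.strptime raises an ArgumentError on a malformed release_date, so bad data fails loudly instead of being silently turned into a wrong integer by to_i.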

Working with XML body from Soap call with Nokogiri in Ruby

I'm writing a Ruby script to make a Postman SOAP POST call and then use Nokogiri to parse the XML response. When I take the full SOAP response from Postman, copy it into my editor, manually extract the XML body, and decode and format it online, I'm able to use the following Nokogiri script successfully:
doc = Nokogiri::XML(File.open("response.xml"))
property_ids = []
doc.css('Property').each do |property|
puts "Property ID: #{property['PropertyId']}"
property_ids << property['PropertyId']
end
property_ids.each_with_index do |property_id, index|
puts "index: #{index}"
puts "property id: #{property_id}"
end
The problem comes when I include the Ruby snippet that Postman generates for the call:
require 'nokogiri'
require 'uri'
require 'net/http'
require 'openssl'
url = URI("https://esite.thelyndco.com/AmsiWeb/eDexWeb/esite/leasing.asmx")
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Post.new(url)
request["content-type"] = 'application/soap+xml'
request["cache-control"] = 'no-cache'
request["postman-token"] = '916e3f3d-11ca-e8cf-2066-542b009a281d'
request.body = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n<soap12:Envelope xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:soap12=\"http://www.w3.org/2003/05/soap-envelope\">\r\n <soap12:Body>\r\n <GetPropertyList xmlns=\"http://tempuri.org/\">\r\n <UserID>updater</UserID>\r\n <Password>[password]</Password>\r\n <PortfolioName>[portfolio name]</PortfolioName>\r\n <XMLData> \r\n</XMLData>\r\n </GetPropertyList>\r\n </soap12:Body>\r\n</soap12:Envelope>"
response = http.request(request)
doc = Nokogiri::XML(response.body)
# doc = Nokogiri::XML(File.open("full-response.xml"))
# doc.at('GetPropertyListResponse').text
What I want is to take the full SOAP response, envelope included, and process it in my script without cutting and pasting and manually decoding and formatting it with online XML formatters.
Commented out are a couple of lines that I tried from Stack Overflow. Is it possible to decode and format the XML body with Nokogiri or to parse out the SOAP envelope?
edit:
By decoding the XML I mean taking:
<GetPropertyListResult><Properties><Property PropertyId="11A" PropertyName1="1111 Austin Hwy" PropertyName2="" PropertyAddrLine1="The 1111" PropertyAddrLine2="1111 Austin Highway" PropertyAddrLine3="" PropertyAddrLine4="" PropertyAddrCity="San Antonio" PropertyAddrState="TX" PropertyAddrZipCode="78209" PropertyAddrCountry="" PropertyAddrEmail="" RemitToAddrLine1="The 1111" RemitToAddrLine2="1111 Austin Highway" RemitToAddrLine3="" RemitToAddrLine4="" RemitToAddrCity="San Antonio" RemitToAddrState="TX" RemitToAddrZipCode="78209" RemitToAddrCountry="" LiveDate="2013-12-04T00:00:00" MgrOffPhoneNo="210-804-1100" MgrFaxNo="" MgrSalutation="" MgrFirstName="" MgrMiName="" MgrLastName="" MonthEndInProcess="N"><Amenity PropertyId="11A"
and decoding it with this online XML decoder into:
<GetPropertyListResult><Properties><Property PropertyId="11A" PropertyName1="1111 Austin Hwy" PropertyName2="" PropertyAddrLine1="The 1111" PropertyAddrLine2="1111 Austin Highway" PropertyAddrLine3="" PropertyAddrLine4="" PropertyAddrCity="San Antonio" PropertyAddrState="TX" PropertyAddrZipCode="78209" PropertyAddrCountry="" PropertyAddrEmail="" RemitToAddrLine1="The 1111" RemitToAddrLine2="1111 Austin Highway" RemitToAddrLine3="" RemitToAddrLine4="" RemitToAddrCity="San Antonio" RemitToAddrState="TX" RemitToAddrZipCode="78209" RemitToAddrCountry="" LiveDate="2013-12-04T00:00:00" MgrOffPhoneNo="210-804-1100" MgrFaxNo="" MgrSalutation="" MgrFirstName="" MgrMiName="" MgrLastName="" MonthEndInProcess="N"><Amenity PropertyId="11A"
then running it through an XML formatter so that nested elements are indented for legibility.
You can use this code to decode and format the XML:
require "nokogiri"
XML_CHAR_ENTITIES = {
"lt" => "<",
"gt" => ">",
"amp" => "&",
"num" => "#",
"comma" => ","
}
xsl =<<XSL
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
XSL
xml = '<GetPropertyListResult><Properties><Property PropertyId="11A" PropertyName1="1111 Austin Hwy" PropertyName2="" PropertyAddrLine1="The 1111" PropertyAddrLine2="1111 Austin Highway" PropertyAddrLine3="" PropertyAddrLine4="" PropertyAddrCity="San Antonio" PropertyAddrState="TX" PropertyAddrZipCode="78209" PropertyAddrCountry="" PropertyAddrEmail="" RemitToAddrLine1="The 1111" RemitToAddrLine2="1111 Austin Highway" RemitToAddrLine3="" RemitToAddrLine4="" RemitToAddrCity="San Antonio" RemitToAddrState="TX" RemitToAddrZipCode="78209" RemitToAddrCountry="" LiveDate="2013-12-04T00:00:00" MgrOffPhoneNo="210-804-1100" MgrFaxNo="" MgrSalutation="" MgrFirstName="" MgrMiName="" MgrLastName="" MonthEndInProcess="N"><Amenity PropertyId="11A"></GetPropertyListResult>'
xml = xml.gsub(/&(\w+);/) do |match|
char_entity = XML_CHAR_ENTITIES[$1]
char_entity ? char_entity : match
end
doc = Nokogiri::XML(xml)
xslt = Nokogiri::XSLT(xsl)
formatted = xslt.transform(doc)
puts formatted
The XML provided was incomplete, so this terminating string was appended to allow it to be parsed: ></GetPropertyListResult>. Elements that are still open at that point are closed by Nokogiri's default recovery mode.
The XML_CHAR_ENTITIES hash maps encoded entity names to their decoded characters and can easily be extended with other character entities, such as those in the W3C Character Entity Reference Chart. (XML itself predefines only lt, gt, amp, quot and apos; names such as num and comma come from the HTML entity list.)
XSL is an embedded stylesheet that is used to format the XML for output with Nokogiri.
Decoding the XML character entities is done with the block form of String#gsub. The XML is then successfully parsed by Nokogiri and formatted by applying the XSLT stylesheet with Nokogiri::XSLT.
The output of this code is:
<?xml version="1.0" encoding="UTF-8"?>
<GetPropertyListResult>
<Properties>
<Property PropertyId="11A" PropertyName1="1111 Austin Hwy" PropertyName2="" PropertyAddrLine1="The 1111" PropertyAddrLine2="1111 Austin Highway" PropertyAddrLine3="" PropertyAddrLine4="" PropertyAddrCity="San Antonio" PropertyAddrState="TX" PropertyAddrZipCode="78209" PropertyAddrCountry="" PropertyAddrEmail="" RemitToAddrLine1="The 1111" RemitToAddrLine2="1111 Austin Highway" RemitToAddrLine3="" RemitToAddrLine4="" RemitToAddrCity="San Antonio" RemitToAddrState="TX" RemitToAddrZipCode="78209" RemitToAddrCountry="" LiveDate="2013-12-04T00:00:00" MgrOffPhoneNo="210-804-1100" MgrFaxNo="" MgrSalutation="" MgrFirstName="" MgrMiName="" MgrLastName="" MonthEndInProcess="N">
<Amenity PropertyId="11A"/>
</Property>
</Properties>
</GetPropertyListResult>
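The String#gsub decoding step can be sketched on its own, together with the observation that an XML parser already decodes entities in text content (the idea the commented-out .text line in the question appears to be reaching for). This is a minimal sketch in core Ruby plus REXML, which stands in for Nokogiri only so it runs without gems; the encoded sample is a hypothetical, shortened stand-in for the real response, and the entity table is limited to the five entities XML predefines.

```ruby
require 'rexml/document'

# Assumption: REXML (bundled with Ruby) stands in for Nokogiri so this runs without gems.
# Only the five entities that XML predefines are mapped here.
PREDEFINED_ENTITIES = {
  'lt' => '<', 'gt' => '>', 'amp' => '&', 'quot' => '"', 'apos' => "'"
}.freeze

# Single left-to-right pass, so "&amp;lt;" becomes "&lt;" rather than "<".
# Unknown entity names are left as they are.
def decode_entities(text)
  text.gsub(/&(\w+);/) { |match| PREDEFINED_ENTITIES.fetch(Regexp.last_match(1), match) }
end

# Hypothetical, shortened stand-in for the encoded SOAP result body.
encoded = '&lt;Properties&gt;&lt;Property PropertyId="11A"/&gt;&lt;/Properties&gt;'
puts decode_entities(encoded)

# An XML parser decodes entities in text content by itself, so reading the text
# of the result element yields the same inner XML...
wrapper = "<GetPropertyListResult>#{encoded}</GetPropertyListResult>"
inner = REXML::Document.new(wrapper).root.text
puts inner

# ...which can then be parsed a second time as ordinary XML.
properties = REXML::Document.new(inner).root
puts properties.elements['Property'].attributes['PropertyId']
```

The second route avoids maintaining an entity table by hand, since the parser handles every predefined and numeric entity.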

Should Nokogiri::XML.parse be creating separate Text nodes for linefeeds?

I have an XML document created by an outside tool:
<?xml version="1.0" encoding="UTF-8"?>
<suite>
<id>S1</id>
<name>First Suite</name>
<description></description>
<sections>
<section>
<name>section 1</name>
<cases>
<case>
<id>C1</id>
<title>Test 1.1</title>
<type>Other</type>
<priority>4 - Must Test</priority>
<estimate></estimate>
<milestone></milestone>
<references></references>
</case>
<case>
<id>C2</id>
<title>Test 1.2</title>
<type>Other</type>
<priority>4 - Must Test</priority>
<estimate></estimate>
<milestone></milestone>
<references></references>
</case>
</cases>
</section>
</sections>
</suite>
From irb, I do the following: (Output suppressed until final command)
> require('nokogiri')
> doc = Nokogiri::XML.parse(open('./test.xml'))
> test_case = doc.search('case').first
=> #<Nokogiri::XML::Element:0x3ff75851bc44 name="case" children=[#<Nokogiri::XML::Text:0x3ff75851b8fc "\n ">, #<Nokogiri::XML::Element:0x3ff75851b7bc name="id" children=[#<Nokogiri::XML::Text:0x3ff75851b474 "C1">]>, #<Nokogiri::XML::Text:0x3ff75851b1cc "\n ">, #<Nokogiri::XML::Element:0x3ff75851b078 name="title" children=[#<Nokogiri::XML::Text:0x3ff75851ad58 "Test 1.1">]>, #<Nokogiri::XML::Text:0x3ff75851aa9c "\n ">, #<Nokogiri::XML::Element:0x3ff75851a970 name="type" children=[#<Nokogiri::XML::Text:0x3ff75851a6c8 "Other">]>, #<Nokogiri::XML::Text:0x3ff7585191d8 "\n ">, #<Nokogiri::XML::Element:0x3ff7585190d4 name="priority" children=[#<Nokogiri::XML::Text:0x3ff758518d64 "4 - Must Test">]>, #<Nokogiri::XML::Text:0x3ff758518ad0 "\n ">, #<Nokogiri::XML::Element:0x3ff7585189a4 name="estimate">, #<Nokogiri::XML::Text:0x3ff758518670 "\n ">, #<Nokogiri::XML::Element:0x3ff758518558 name="milestone">, #<Nokogiri::XML::Text:0x3ff7585182b0 "\n ">, #<Nokogiri::XML::Element:0x3ff758518184 name="references">, #<Nokogiri::XML::Text:0x3ff758517ef0 "\n ">]>
This results in a number of children that look like the following:
#<Nokogiri::XML::Text:0x3ff758517ef0 "\n ">
I want to iterate through these XML nodes without having to do something like:
> real_nodes = test_case.children.reject { |n| n.text? && n.content.strip.empty? }
I couldn't find a parse parameter in the Nokogiri docs to stop newlines from being treated as separate nodes. Is there a way to do this during the parse instead of after?
Check the documentation. You can just do this:
doc = Nokogiri::XML.parse(open('./test.xml')) do |config|
config.noblanks
end
That will load the file without the blank (whitespace-only) text nodes; empty elements such as <estimate></estimate> are still kept.
The text nodes are the result of pretty-printing the XML. The spec doesn't require whitespace between tags, and, for efficiency, a huge XML file could be stripped of inter-tag whitespace to save space and reduce transfer time, without sacrificing the data content.
This might show what's happening:
require 'nokogiri'
xml = '<foo></foo>'
Nokogiri::XML(xml).at('foo').child
=> nil
With no whitespace between the tags there's no text node either.
xml = '<foo>
</foo>'
Nokogiri::XML(xml).at('foo').child
=> #<Nokogiri::XML::Text:0x3fcee9436ff0 "\n">
Nokogiri::XML(xml).at('foo').child.class
=> Nokogiri::XML::Text
With whitespace added for pretty-printing, the foo element now contains a text node holding that whitespace.
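The same behavior can be reproduced without Nokogiri. A minimal sketch using Ruby's bundled REXML (an assumption made only so it runs without extra gems; its node classes differ from Nokogiri's) shows that whitespace between tags becomes a text node, and then filters whitespace-only text nodes after parsing:

```ruby
require 'rexml/document'

# Assumption: REXML stands in for Nokogiri here; its node classes differ,
# but the whitespace behavior being illustrated is the same.

# No whitespace between the tags: the element has no children at all.
compact = REXML::Document.new('<foo></foo>').root
puts compact.children.size

# A newline between the tags is kept as a text node child.
pretty = REXML::Document.new("<foo>\n</foo>").root
p pretty.children.map(&:class)

# Filtering after the parse: drop text nodes that contain only whitespace.
doc = REXML::Document.new("<case>\n  <id>C1</id>\n  <title>Test 1.1</title>\n</case>")
real_nodes = doc.root.children.reject do |node|
  node.is_a?(REXML::Text) && node.value.strip.empty?
end
p real_nodes.map(&:name)
```

In Nokogiri itself, Node#element_children returns only the element children, which is another after-the-parse option.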

trying to get content inside cdata tags in xml file using nokogiri

I have seen several answers on this, but nothing has worked so far. I am parsing XML from a URL using Nokogiri on Rails 3 with Ruby 1.9.2.
A snippet of the xml looks like this:
<NewsLineText>
<![CDATA[
Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
]]>
</NewsLineText>
I am trying to parse this to get the text inside the NewsLineText element:
r = node.at_xpath('.//newslinetext') if node.at_xpath('.//newslinetext')
s = node.at_xpath('.//newslinetext').text if node.at_xpath('.//newslinetext')
t = node.at_xpath('.//newslinetext').content if node.at_xpath('.//newslinetext')
puts r
puts s.blank? ? 'NOTHING' : s
puts t.blank? ? 'NOTHING' : t
What I get in return is
<newslinetext></newslinetext>
NOTHING
NOTHING
So I know my tags are named/spelled correctly to get at the newslinetext data, but the cdata text never shows up.
What do I need to do with nokogiri to get this text?
You're trying to parse XML using Nokogiri's HTML parser. If node were from the XML parser, r would be nil, since XML is case sensitive; your r is not nil, so you're using the HTML parser, which is case insensitive.
Use Nokogiri's XML parser and you will get things like this:
>> r = doc.at_xpath('.//NewsLineText')
=> #<Nokogiri::XML::Element:0x8066ad34 name="NewsLineText" children=[#<Nokogiri::XML::Text:0x8066aac8 "\n ">, #<Nokogiri::XML::CDATA:0x8066a9c4 "\n Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.\n ">, #<Nokogiri::XML::Text:0x8066a8d4 "\n">]>
>> r.text
=> "\n \n Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.\n \n"
and you'll be able to get at the CDATA through r.text or r.children.
Ah, I see. What @mu said is correct. But to get at the CDATA node directly, you could do:
require 'nokogiri'
xml = <<EOF
<NewsLineText>
<![CDATA[
Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
]]>
</NewsLineText>
EOF
doc = Nokogiri::XML(xml)
# The CDATA section is its own child node of <NewsLineText>
cdata = doc.search('NewsLineText').children.find { |e| e.cdata? }
puts cdata.content.strip
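A minimal sketch of both points using Ruby's bundled REXML in place of Nokogiri (an assumption, so it runs without extra gems): an XML parser matches element names case-sensitively, and the CDATA section is a separate child node whose value can be read directly.

```ruby
require 'rexml/document'

# Assumption: REXML stands in for Nokogiri so this runs without extra gems.
xml = <<~XML
  <NewsLineText>
  <![CDATA[
  Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
  ]]>
  </NewsLineText>
XML

doc = REXML::Document.new(xml)

# XML is case sensitive: the lower-case name used in the question finds nothing.
missing = REXML::XPath.first(doc, '//newslinetext')
element = REXML::XPath.first(doc, '//NewsLineText')

# The CDATA section is its own child node; its value is the raw text inside it.
cdata = element.cdatas.first
puts cdata.value.strip
```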
