Replace image src in vml markup with globally available images using Nokogiri - ruby

Is it possible to find Outlook-specific markup via Capybara/Nokogiri?
Given the following markup (the ERB <% %> tags have already been processed into regular HTML):
...
<div>
<!--[if gte mso 9]>
<v:rect
xmlns:v="urn:schemas-microsoft-com:vml" fill="true" stroke="false"
style="width:<%= card_width %>px;height:<%= card_header_height %>px;"
>
<v:fill type="tile"
src="<%= avatar_background_url.split('?')[0] %>"
color="<%= background_color %>" />
<v:textbox inset="0,0,0,0">
<![endif]-->
<div>
How can I get the list of <v:fill .../> tags? (Or, failing that, how can I get the whole comment, if finding a tag inside a conditional comment is a problem?)
I have tried the following:
doc.xpath('//v:fill')
*** Nokogiri::XML::XPath::SyntaxError Exception: ERROR: Undefined namespace prefix: //v:fill
Do I need to somehow register the VML namespace?
EDIT: following @ThomasWalpole's approach:
doc.xpath('//comment()').each do |comment_node|
# Match against the comment's text rather than the node object itself
vml_node_match = /<v\:fill.*src=\"(?<url>http\:[^"]*)"[^>]*\/>/.match(comment_node.content)
if vml_node_match
original_image_uri = URI.parse(vml_node_match['url'])
vml_tag = vml_node_match[0]
handle_vml_image_replacement(original_image_uri, comment_node, vml_tag)
end
end
My handle_vml_image_replacement then ends up calling the following replace_comment_image_src:
def self.replace_comment_image_src(node:, comment:, old_url:, new_url:)
new_url = new_url.split('?').first # VML does not support URL with query params
puts "Replacing comment src URL in #{comment} by #{new_url}"
node.content = node.content.gsub(old_url, new_url)
end
But then the comment no longer seems to behave as a comment, and I sometimes see the HTML as if it had been escaped. Am I using the wrong method to change the comment text with Nokogiri?

Here's the final code that I used for my email interceptor, thanks to @Thomas Walpole and @sschmeck for their help along the way.
My goal was to replace images (linking to localhost) in VML markup with globally available images, for testing with services like MOA or Litmus:
doc.xpath('//comment()').each do |comment_node|
# Note : cannot capture beginning of tag, since it might span across several lines
src_attr_match = /.*src=\"(?<url>http\:[^"]*)"[^>]*\/>/.match(comment_node)
next unless src_attr_match
original_image_uri = URI.parse(src_attr_match['url'])
handle_comment_image_replacement(original_image_uri, comment_node)
end
Which later calls the following (after picking a URL replacement strategy depending on the source image type):
def self.replace_comment_image_src(node:, old_url:, new_url:)
new_url = new_url.split('?').first
node.native_content = node.content.gsub(old_url, new_url)
end

Related

Parse quoted-printable encoding content from .mht file

I am trying to get all the images from an .mht file using the Nokogiri gem. But since the .mht file uses quoted-printable encoding, all the image tags I get back have weird characters in them:
<img alt='3D"AFC-Logo' src="3D%22https://upload.=" width='3D"75"' height='3D"75"'>
<img src="3D%22https://en.wikipedia.org/static/images/footer/wikimedia-butto=" width='3D"88"' height='3D"31"' alt='3D"Wikimedia'>
<img src="3D%22https://en.wikipedia.org/static/images/footer/poweredby_mediawiki_8=" alt='3D"Powered' width='3D"88"' height='3D"31"'>
This is the link to that .mht file: https://drive.google.com/file/d/1DtbgrFyCEcggAk1nqpZSluNhRt-k3t95/view?usp=sharing
Below is the code I am using to get all the images from the .mht file:
require 'nokogiri'
# The method must be defined before it is called in a plain script
def get_image_links(html)
html_doc = Nokogiri::HTML(html)
nodes = html_doc.xpath("//img[@src]")
raise "No <img .../> tags!" if nodes.empty?
nodes.inject([]) do |uris, node|
puts node.to_s
uris << node.attr('src').strip
end.uniq
end
html = File.read("1646037951.mht")
image_links = get_image_links(html)
I have tried decoding it with .unpack('M').first, but it still doesn't work: it just returns the same result as above.
Or does Rails maybe have something for this?

How to delete tags with Nokogiri in SVG file

I have an SVG document that has image tags inside it. I am trying to parse it and remove all image tags with Nokogiri and then rewrite it to another file:
doc = File.open("cartes_des_risques.svg") { |f| Nokogiri::XML(f) }
doc.search('//image').each do |node|
node.remove
end
file = File.open("formmatted_map.svg", "w")
file.write(doc)
However doc.search('//image') returns an empty array.
The doc I am trying to modify is a big SVG file that embeds image tags that look like this:
<image a:adobe-opacity-share="1" width="338" height="338" xlink:href="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAVYAAAFWCAYAAAAyr7WDAAAACXBIWXMAAC4jAAAuIwF4pT92AAAA...SUVORK5CYII=" transform="matrix(0.24 0 0 0.24 839.2388 2258.6909)">
</image>
How can I modify this SVG file with Nokogiri?
Take a close look at the <image> tag. The image data is embedded in the xlink:href attribute as a base64-encoded PNG (a data: URI). If you delete that tag, you delete that image data and end up with a document that no longer contains it, so that's probably not what you want to do.
Here's untested code showing how I'd read the file, remove the node, then write the content:
doc = Nokogiri::XML(File.read('cartes_des_risques.svg'))
doc.search('//image').remove
File.write('formmatted_map.svg', doc.to_xml)
But, again, removing the node will result in a file with no image information:
require 'nokogiri'
doc = Nokogiri::XML('<image/>')
doc.search('//image').remove
doc.to_xml # => "<?xml version=\"1.0\"?>\n"

How to read parameter from rb file in erb file

I'm using the RhoMobile platform.
I'm trying to get a parameter from an .rb file into my .erb file.
I have a properties file, and in my app.rb file I'm getting values from keys in this properties file.
This value is saved in application.rb, and I want to use it in my app.erb.
Here is some code:
myFunc(<%= Rho::RhoConfig.getValue %>)
I am not going to question whether you're doing things right, but this should work:
myFunc("<%= Rho::RhoConfig.getValue %>")
Try this:
<script type="text/javascript" charset="utf-8">
var rho_config_value = <%= Rho::RhoConfig.getValue || 'null' %>;
myFunc(rho_config_value)
</script>
myFunc('<%= Rho.get_app.getValue('key')%>')
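A sketch of the quoting problem these answers work around, using Ruby's standard ERB and JSON libraries rather than RhoMobile itself (`config_value` is a hypothetical stand-in for whatever `Rho::RhoConfig` returns). Rendering the value with `to_json` yields a valid JavaScript literal, with quotes escaped and `nil` turned into `null`:

```ruby
require 'erb'
require 'json'

TEMPLATE = 'myFunc(<%= config_value.to_json %>)'

# Renders the template with the given Ruby value bound to config_value.
def render_call(config_value)
  ERB.new(TEMPLATE).result_with_hash(config_value: config_value)
end

puts render_call('blue')       # myFunc("blue")
puts render_call(nil)          # myFunc(null)
puts render_call(%q(it's "x")) # myFunc("it's \"x\"")
```

Inside a `<script>` element, a value containing `</script>` would still need extra escaping.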

gh-pages - Jekyll fails to build - "did not find expected node content while parsing"

I'm trying to get a gh-pages site up and running. First time using Jekyll.
I have a super basic layout (default.html) in /_layouts:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<div class="wrapper">
<section id="main">
{{ content }}
</section>
</div>
</body>
</html>
And a single content page (index.html)
---
layout: default
---
Hello World
My _config.yml file is simply
pygments: true
When running jekyll --no-auto --server I get the following error. No files are generated.
.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/psych.rb:203:in `parse':
(<unknown>): did not find expected node content while parsing a flow
node at line 3 column 1 (Psych::SyntaxError)
Anyone know what's wrong here?
Since line 3 is <head>, it is possible that some basic metadata is missing, like <title>.
All the templates I've seen have a title (zinga, Symplicity, ..., either fixed or generated), and the most basic template has one too (see "Hello World, I'm Jekyll"):
<html>
<head>
<title>Hello world!</title>
</head>
<body>
<h1>Hello world!</h1>
<p>This is my first Jekyll website.</p>
</body>
</html>
You should check that what it's parsing is YAML at all.
The way I check this is by putting some debug output directly in the gem and re-running.
Edit psych.rb, which for me is at /home/user/.rbenv/versions/2.0.0-p0/lib/ruby/2.0.0/psych.rb. Look for def self.load and change it from
def self.load yaml, filename = nil
result = parse(yaml, filename)
result ? result.to_ruby : result
end
to
def self.load yaml, filename = nil
puts "****************#{filename}"
result = parse(yaml, filename)
result ? result.to_ruby : result
end
and look for the output in your terminal when you re-run the command.
I am currently dealing with deploying a Rails app with Capistrano (no Jekyll at all). In my case, the output was blank, which is obviously not a filename, so now I'm investigating further up the chain. I hope that gets you started.
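A less invasive variant of the same idea, sketched under the assumption that the YAML Jekyll reads is `_config.yml` plus the `---`-delimited front matter of each page (it does not reproduce Jekyll's exact rules): parse each candidate with Psych yourself and print the failing file, instead of patching psych.rb inside the Ruby installation. `YAML.parse` only builds the syntax tree, so it reports syntax errors without instantiating any Ruby objects.

```ruby
require 'yaml'

# Returns the YAML front matter between the opening and closing '---' lines,
# or nil when the text has no front matter.
def front_matter(text)
  match = text.match(/\A---\s*\n(.*?)^---\s*$/m)
  match && match[1]
end

# Returns Psych's error message for +text+, or nil if its YAML parses.
# .yml/.yaml files are parsed whole; other files only by their front matter.
def yaml_error(name, text)
  yaml = name.end_with?('.yml', '.yaml') ? text : front_matter(text)
  return nil if yaml.nil?

  YAML.parse(yaml, filename: name)
  nil
rescue Psych::SyntaxError => e
  e.message # includes the file name, line and column
end

# Prints every file under +root+ whose YAML fails to parse.
def report_yaml_errors(root = '.')
  Dir.glob(File.join(root, '**', '*.{html,md,markdown,yml,yaml}')).each do |path|
    error = yaml_error(path, File.read(path))
    puts error if error
  end
end
```

Running `report_yaml_errors` from the site root should point at the file whose YAML triggers the "did not find expected node content" error, rather than the `(<unknown>)` shown in the trace.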

How to use rake to insert/replace html section in each file?

I'm using rake to create a Table of contents from a bunch of static HTML files.
The question is how do I insert it into all files from within rake?
I have a <ul id="toc"> in each file to aim for. The entire content of that I want to replace.
I was thinking about using Nokogiri or similar to parse the document and replace the DOM node ul#toc. However, I don't like the idea of having to write the parser's DOM back to the HTML file. What if it changes my layout, indentation, etc.?
Any thoughts/ideas? Or perhaps links to working examples?
Could you rework the files to .rhtml, where
<ul id="toc">
is replaced with an erb directive, such as
<%= get_toc() %>
where get_toc() is defined in some library module. Write the transformed files as .html (to another directory if you like) and you're in business and the process is repeatable.
Or, come to that, why not just use gsub? Something like:
File.open(out_filename,'w+') do |output_file|
output_file.puts File.read(filename).gsub(/\<ul id="toc"\>/, get_toc())
end
I ended up with an idea similar to what Mike Woodhouse suggested, only without ERB templates, since I wanted the source files to stay freely editable by non-Ruby-lovers too:
def update_toc(filename)
raise "FATAL: Requires self.toc= ... before replacing TOC in files!" if @toc.nil?
content = File.read(filename)
# /m lets .+? match across line breaks, since the TOC usually spans several lines
content.gsub(/<h2 class="toc">.+?<\/ul>/m, @toc)
end
def replace_toc_in_all_files
@file_names.each do |name|
content = update_toc(name)
File.open(name, "w") do |io|
io.write content
end
end
end
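A sketch of the same in-place approach aimed at the `<ul id="toc">` element from the question, assuming the list contains no nested `<ul>`: a regular expression swaps only what lies between the opening tag and its `</ul>`, so indentation and layout elsewhere in the file stay exactly as they were. `update_toc_in_files` and `build_toc` are illustrative names, not part of Rake.

```ruby
# Replaces the contents of <ul id="toc">...</ul> with +toc_items+ and leaves
# every other byte of the page untouched. Assumes the TOC list has no nested
# <ul>, because the lazy .*? stops at the first </ul>.
def replace_toc(html, toc_items)
  html.sub(%r{(<ul id="toc">).*?(</ul>)}m) do
    "#{Regexp.last_match(1)}#{toc_items}#{Regexp.last_match(2)}"
  end
end

# Rewrites each file in place. From a Rakefile this could be called as, e.g.:
#   task(:toc) { update_toc_in_files(FileList['*.html'], build_toc) }
# where build_toc is your own (hypothetical) TOC builder.
def update_toc_in_files(paths, toc_items)
  paths.each do |path|
    File.write(path, replace_toc(File.read(path), toc_items))
  end
end
```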
You can manipulate the document directly and save the resulting output. If you confine your manipulations to a particular element, you won't alter the overall structure and should be fine.
A library like Nokogiri or Hpricot should leave well-formed markup largely as it is; it mostly rewrites documents that are malformed, although serializing can still normalize small details. I know that Hpricot can be coached into a more relaxed parsing mode, or can operate in a stricter XML/XHTML manner.
Simple example:
require 'rubygems'
require 'hpricot'
document = <<END
<html>
<body>
<ul id="tag">
</ul>
<h1 class="indexed">Item 1</h1>
<h2 class="indexed">Item 1.1</h2>
<h1 class="indexed">Item 2</h1>
<h2 class="indexed">Item 2.1</h2>
<h2 class="indexed">Item 2.2</h2>
<h1>Remarks</h1>
<!-- Test Comment -->
</body>
</html>
END
parsed = Hpricot(document)
ul_tag = (parsed / 'ul#tag').first
sections = (parsed / '.indexed')
ul_tag.inner_html = sections.collect { |i| "<li>#{i.inner_html}</li>" }.join
puts parsed.to_html
This will yield:
<html>
<body>
<ul id="tag"><li>Item 1</li><li>Item 1.1</li><li>Item 2</li><li>Item 2.1</li><li>Item 2.2</li></ul>
<h1 class="indexed">Item 1</h1>
<h2 class="indexed">Item 1.1</h2>
<h1 class="indexed">Item 2</h1>
<h2 class="indexed">Item 2.1</h2>
<h2 class="indexed">Item 2.2</h2>
<h1>Remarks</h1>
<!-- Test Comment -->
</body>
</html>
