I'm using rake to create a table of contents from a bunch of static HTML files.
The question is: how do I insert it into all the files from within rake?
Each file has a <ul id="toc"> to target, and I want to replace its entire content.
I was thinking about using Nokogiri or similar to parse the document and replace the DOM node ul#toc. However, I don't like the idea of writing the parser's DOM back to the HTML file. What if it changes my layout, indentation, etc.?
Any thoughts/ideas? Or perhaps links to working examples?
Could you rework the files to .rhtml, where
<ul id="toc">
is replaced with an erb directive, such as
<%= get_toc() %>
where get_toc() is defined in some library module. Write the transformed files as .html (to another directory if you like) and you're in business and the process is repeatable.
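A minimal sketch of that erb idea, using only the standard library. The get_toc body and the template text are made up for illustration; in practice the template would be read from the .rhtml file and the result written out as .html.

```ruby
require 'erb'

# Hypothetical helper: returns the TOC markup to inject.
def get_toc
  %w[Intro Usage].map { |title| "<li>#{title}</li>" }.join("\n")
end

# Stand-in for the contents of one .rhtml source file.
template = <<~HTML
  <ul id="toc">
  <%= get_toc %>
  </ul>
HTML

# Render the template; everything outside the erb tag is copied verbatim.
html = ERB.new(template).result(binding)
puts html
```

Because erb only touches the directive itself, the surrounding layout and indentation come through unchanged.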
Or, come to that, why not just use gsub? Something like:
File.open(out_filename, 'w+') do |output_file|
  output_file.puts File.read(filename).gsub(/<ul id="toc">/, get_toc())
end
I ended up with an idea similar to what Mike Woodhouse suggested, only without erb templates (as I wanted the source files to be freely editable by non-Ruby-lovers too).
def update_toc(filename)
  raise "FATAL: Requires self.toc= ... before replacing TOC in files!" if @toc.nil?
  content = File.read(filename)
  # /m lets .+? span line breaks, since the TOC block covers several lines
  content.gsub(/<h2 class="toc">.+?<\/ul>/m, @toc)
end

def replace_toc_in_all_files
  @file_names.each do |name|
    content = update_toc(name)
    File.open(name, "w") do |io|
      io.write content
    end
  end
end
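For comparison, here is a rough sketch of the same regex approach aimed at the <ul id="toc"> from the question, keeping the tag itself and swapping only what is inside it. The replace_toc name and the sample page are invented for the example, and it assumes the TOC list never contains a nested <ul>.

```ruby
# Replace only what sits between <ul id="toc"> and the first </ul> after it.
# Assumption: no nested <ul> inside the TOC list.
def replace_toc(html, toc_items)
  pattern = %r{(<ul id="toc">).*?(</ul>)}m
  # Block form avoids backslash escapes in toc_items being interpreted.
  html.sub(pattern) { "#{$1}\n#{toc_items}\n#{$2}" }
end

page = <<~HTML
  <body>
      <ul id="toc">
        <li>stale</li>
      </ul>
      <p>Untouched   text</p>
  </body>
HTML

updated = replace_toc(page, "<li>Intro</li>")
puts updated
```

Everything outside the matched span is passed through byte for byte, which is the property the question was worried about losing with a parser round trip.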
You can manipulate the document directly and save the resulting output. If you confine your manipulations to a particular element, you won't alter the overall structure and should be fine.
A library like Nokogiri or Hpricot will only adjust your document if it's malformed. I know that Hpricot can be coached to have a more relaxed parsing method, or can operate in a more strict XML/XHTML manner.
Simple example:
require 'rubygems'
require 'hpricot'
document = <<END
<html>
<body>
<ul id="tag">
</ul>
<h1 class="indexed">Item 1</h1>
<h2 class="indexed">Item 1.1</h2>
<h1 class="indexed">Item 2</h1>
<h2 class="indexed">Item 2.1</h2>
<h2 class="indexed">Item 2.2</h2>
<h1>Remarks</h1>
<!-- Test Comment -->
</body>
</html>
END
parsed = Hpricot(document)
ul_tag = (parsed / 'ul#tag').first
sections = (parsed / '.indexed')
ul_tag.inner_html = sections.collect { |i| "<li>#{i.inner_html}</li>" }.join
puts parsed.to_html
This will yield:
<html>
<body>
<ul id="tag"><li>Item 1</li><li>Item 1.1</li><li>Item 2</li><li>Item 2.1</li><li>Item 2.2</li></ul>
<h1 class="indexed">Item 1</h1>
<h2 class="indexed">Item 1.1</h2>
<h1 class="indexed">Item 2</h1>
<h2 class="indexed">Item 2.1</h2>
<h2 class="indexed">Item 2.2</h2>
<h1>Remarks</h1>
<!-- Test Comment -->
</body>
</html>
I am trying to get all the images from an .mht file using the Nokogiri gem. But since the .mht file is quoted-printable encoded, all the image tags I get back have weird characters in them:
<img alt='3D"AFC-Logo' src="3D%22https://upload.=" width='3D"75"' height='3D"75"'>
<img src="3D%22https://en.wikipedia.org/static/images/footer/wikimedia-butto=" width='3D"88"' height='3D"31"' alt='3D"Wikimedia'>
<img src="3D%22https://en.wikipedia.org/static/images/footer/poweredby_mediawiki_8=" alt='3D"Powered' width='3D"88"' height='3D"31"'>
This is the link to that .mht file: https://drive.google.com/file/d/1DtbgrFyCEcggAk1nqpZSluNhRt-k3t95/view?usp=sharing
And below is the code that I am using to get all the images from the .mht file:
require 'nokogiri'

def get_image_links(html)
  html_doc = Nokogiri::HTML(html)
  nodes = html_doc.xpath("//img[@src]")
  raise "No <img .../> tags!" if nodes.empty?
  nodes.inject([]) do |uris, node|
    puts node.to_s
    uris << node.attr('src').strip
  end.uniq
end

# The method must be defined before it is called at the top level
html = File.read("1646037951.mht")
image_links = get_image_links(html)
I have tried decoding with .unpack('M').first, but it's still not working; it just returns the same result as above.
Or maybe Rails has something for this?
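For reference, a small sketch of what quoted-printable decoding with the standard library looks like on a fragment shaped like the tags above. The sample string is invented, and this only shows the decoding step on text that really is quoted-printable; it does not by itself explain why the attempt on the full .mht file made no difference.

```ruby
# Invented fragment: "=3D" encodes "=", and "=" at end of line is a soft line break.
raw = "<img src=3D\"https://example.org/logo=\n.png\" width=3D\"75\">"

# String#unpack1('M') decodes quoted-printable; the result is binary-encoded,
# so relabel it before handing it to an HTML parser.
decoded = raw.unpack1('M').force_encoding('UTF-8')
puts decoded
```

The decoded string would then be the input to Nokogiri::HTML rather than the raw file contents.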
Is it possible to find outlook specific markup via Capybara/Nokogiri ?
Given the following markup (erb <% %> tags are processed into regular HTML)
...
<div>
<!--[if gte mso 9]>
<v:rect
xmlns:v="urn:schemas-microsoft-com:vml" fill="true" stroke="false"
style="width:<%= card_width %>px;height:<%= card_header_height %>px;"
>
<v:fill type="tile"
src="<%= avatar_background_url.split('?')[0] %>"
color="<%= background_color %>" />
<v:textbox inset="0,0,0,0">
<![endif]-->
<div>
How can I get the list of <v:fill ../> tags ? (or eventually how can I get the whole comment if finding the tag inside a conditional comment is a problem)
I have tried the following
doc.xpath('//v:fill')
*** Nokogiri::XML::XPath::SyntaxError Exception: ERROR: Undefined namespace prefix: //v:fill
Do I need to somehow register the VML namespace?
EDIT - following @ThomasWalpole's approach
doc.xpath('//comment()').each do |comment_node|
  vml_node_match = /<v\:fill.*src=\"(?<url>http\:[^"]*)"[^>]*\/>/.match(comment_node)
  if vml_node_match
    original_image_uri = URI.parse(vml_node_match['url'])
    vml_tag = vml_node_match[0]
    handle_vml_image_replacement(original_image_uri, comment_node, vml_tag)
  end
end
My handle_vml_image_replacement then ends up calling the following replace_comment_image_src
def self.replace_comment_image_src(node:, comment:, old_url:, new_url:)
new_url = new_url.split('?').first # VML does not support URL with query params
puts "Replacing comment src URL in #{comment} by #{new_url}"
node.content = node.content.gsub(old_url, new_url)
end
But then it feels like the comment is no longer a "comment", and I can sometimes see the HTML as if it was escaped... I am most likely using the wrong method to change the comment text with Nokogiri?
Here's the final code that I used for my email interceptor, thanks to @Thomas Walpole and @sschmeck for help along the way.
My goal was to replace images (linking to localhost) in VML markup with globally available images, for testing with services like MOA or Litmus.
doc.xpath('//comment()').each do |comment_node|
# Note : cannot capture beginning of tag, since it might span across several lines
src_attr_match = /.*src=\"(?<url>http\:[^"]*)"[^>]*\/>/.match(comment_node)
next unless src_attr_match
original_image_uri = URI.parse(src_attr_match['url'])
handle_comment_image_replacement(original_image_uri, comment_node)
end
This later calls the following (after picking a URL replacement strategy depending on the source image type):
def self.replace_comment_image_src(node:, old_url:, new_url:)
new_url = new_url.split('?').first
node.native_content = node.content.gsub(old_url, new_url)
end
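The matching and substitution can be tried on a plain string, without Nokogiri, to see what the regex captures. This is only a sketch: the comment text, the localhost URL, and the cdn.example.com host are invented for the example.

```ruby
require 'uri'

# Invented stand-in for the text of one conditional comment node.
comment_text = <<~VML
  [if gte mso 9]>
  <v:fill type="tile"
    src="http://localhost:3000/avatar.png?v=2"
    color="#ffffff" />
  <![endif]
VML

# Capture only the src attribute, since the tag itself spans several lines.
match = /src="(?<url>http:[^"]*)"/.match(comment_text)
old_url = match[:url]

# Hypothetical strategy: same path on a public host, query string dropped.
new_url = "https://cdn.example.com" + URI.parse(old_url).path
updated = comment_text.gsub(old_url, new_url)
puts updated
```

In the interceptor itself, the updated string is what gets assigned through native_content=, as in the method above.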
I'm using the RhoMobile platform.
I'm trying to get a parameter from my .rb file into my .erb file.
I have a properties file, and in my app.rb file I read values from keys in this properties file.
The value is saved in application.rb, and I want to use it in my app.erb.
Here is some code:
myFunc(<%= Rho::RhoConfig.getValue %>)
I am not going to question whether you're doing things right, but this should work:
myFunc("<%= Rho::RhoConfig.getValue %>")
Try this:
<script type="text/javascript" charset="utf-8">
var rho_config_value = <%= Rho::RhoConfig.getValue || 'null' %>;
myFunc(rho_config_value)
</script>
myFunc('<%= Rho.get_app.getValue('key')%>')
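One way to avoid hand-placing the quotes is to let to_json produce the JavaScript literal. The sketch below uses plain erb and json from the standard library, with a hypothetical value local in place of the Rho call; whether the json library is available inside a Rho app is an assumption to check.

```ruby
require 'erb'
require 'json'

# "value" stands in for whatever the config lookup returns.
template = 'myFunc(<%= value.to_json %>);'
render = ->(value) { ERB.new(template).result_with_hash(value: value) }

puts render.call('abc')   # a string becomes a quoted JS string literal
puts render.call(nil)     # nil becomes the JS literal null
```

This covers both the quoted-string case and the null fallback from the answers above with a single expression.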
My HTML code; here I am passing an xlsx file for parsing:
<form method="post" action="/home/parse_xlsx" enctype="multipart/form-data">
Upload XSLX File <input type="file" name="xlsx_file" id="xlsx_file" />
<input type="submit" value="Post"/>
</form>
My controller code:
def parse_xlsx
xlsxFile = params[:xlsx_file]
prefix_tmp_path = xlsxFile.path
filename = xlsxFile.original_filename
fullname = File.join(prefix_tmp_path,filename)
require 'roo'
s = Roo::Excelx.new(fullname)
for i in 1..14
puts s.cell(i,3)
end
end
This gives me the error:
file /tmp/RackMultipart20130910-10043-u4nqsc/CMS.xlsx does not exist
When I run the following code in the console, with my 'CMS.xlsx' file in the Rails root folder, it runs without any errors.
require 'roo'
s = Roo::Excelx.new("CMS.xlsx")
for i in 1..14
puts s.cell(i,3)
end
Please explain where I am going wrong.
xlsxFile.path is already the full path of the uploaded temp file, so you shouldn't join the filename onto it. If you need to save the file, you can rename it to the original filename when you move it to its final location.
Try:
s = Roo::Excelx.new(xlsxFile.path)
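The effect can be reproduced without Rails or roo. In this sketch a Tempfile stands in for the uploaded file object (an assumption made for illustration; the real object in the controller comes from the multipart upload).

```ruby
require 'tempfile'

# Stand-in for the uploaded file: a temp file with a generated name.
upload = Tempfile.new(['RackMultipart', '.xlsx'])
original_filename = 'CMS.xlsx'

# The temp file's path already points at the file itself...
path_exists = File.exist?(upload.path)

# ...so joining the original filename treats that file as a directory.
joined = File.join(upload.path, original_filename)
joined_exists = File.exist?(joined)

puts "path:   #{upload.path} exists? #{path_exists}"
puts "joined: #{joined} exists? #{joined_exists}"

upload.close!  # close and delete the temp file
```

The joined path has the same shape as the one in the error message: a temp file name followed by /CMS.xlsx.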
I'm trying to get a gh-pages site up and running. First time using Jekyll.
I have a super basic layout (default.html) in /_layouts:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<div class="wrapper">
<section id="main">
{{ content }}
</section>
</div>
</body>
</html>
And a single content page (index.html)
---
layout: default
---
Hello World
My _config.yml file is simply
pygments: true
When running jekyll --no-auto --server I get the following error. No files are generated.
.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/psych.rb:203:in `parse':
(<unknown>): did not find expected node content while parsing a flow
node at line 3 column 1 (Psych::SyntaxError)
Anyone know what's wrong here?
Since line 3 is <head>, it is possible that some basic metadata, such as <title>, is missing.
All the templates I have seen have a title (zinga, Symplicity, ... either fixed or generated), and the most basic template has one too (see "Hello World, I'm Jekyll"):
<html>
<head>
<title>Hello world!</title>
</head>
<body>
<h1>Hello world!</h1>
<p>This is my first Jekyll website.</p>
</body>
</html>
You should check whether what it's parsing is YAML at all.
The way I'm checking this is by putting some debug commands directly in the gem and re-running.
Edit psych.rb, which for me is at /home/user/.rbenv/versions/2.0.0-p0/lib/ruby/2.0.0/psych.rb. Look for def self.load and change it from
def self.load yaml, filename = nil
result = parse(yaml, filename)
result ? result.to_ruby : result
end
to
def self.load yaml, filename = nil
puts "****************#{filename}"
result = parse(yaml, filename)
result ? result.to_ruby : result
end
and look for the output in your terminal when you re-run the command.
I am currently dealing with deploying a rails app with capistrano (no jekyll at all). In my case, the output was blank, which is obviously not a filename. So now I'm investigating further up the chain. I hope that gets you started.
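A less invasive variant of the same check is to feed each candidate YAML source to the parser yourself and see which one raises. This is only a sketch: the yaml_errors helper and the sample sources are invented, and the broken entry is not the asker's actual file.

```ruby
require 'yaml'

# Returns a hash of source name => parser message for every source that
# fails to parse as YAML.
def yaml_errors(sources)
  sources.each_with_object({}) do |(name, text), errors|
    begin
      YAML.safe_load(text)
    rescue Psych::SyntaxError => e
      errors[name] = e.message
    end
  end
end

# Hypothetical inputs: a config file and a page's front matter.
sources = {
  '_config.yml' => "pygments: true\n",
  'index.html front matter' => "layout: default\ntitle: [unclosed\n"
}

errors = yaml_errors(sources)
errors.each { |name, message| puts "#{name}: #{message}" }
```

In a real site the strings would come from reading _config.yml and the block between the --- lines of each page.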