I'm trying to use Nokogiri to get a page's full HTML but with all of the text stripped out.
I tried this:
require 'nokogiri'
x = "<html> <body> <div class='example'><span>Hello</span></div></body></html>"
y = Nokogiri::HTML.parse(x).xpath("//*[not(text())]").each { |a| a.children.remove }
puts y.to_s
This outputs:
<div class="example"></div>
I've also tried running it without the children.remove part:
y = Nokogiri::HTML.parse(x).xpath("//*[not(text())]")
puts y.to_s
But then I get:
<div class="example"><span>Hello</span></div>
But what I actually want is:
<html><body><div class='example'><span></span></div></body></html>
NOTE: This is a very aggressive approach. Tags like <script>, <style>, and <noscript> also have child text() nodes containing CSS, HTML, and JS that you might not want to filter out depending on your use case.
If you operate on the parsed document instead of capturing the return value of your iterator, you'll be able to remove the text nodes, and then return the document:
require 'nokogiri'
html = "<html> <body> <div class='example'><span>Hello</span></div></body></html>"
# Parse HTML
doc = Nokogiri::HTML.parse(html)
puts doc.inner_html
# => "<html> <body> <div class=\"example\"><span>Hello</span></div>\n</body>\n</html>"
# Remove text nodes from parsed document
doc.xpath("//text()").each { |t| t.remove }
puts doc.inner_html
# => "<html><body><div class=\"example\"><span></span></div></body></html>"
I am currently using a helper method to render a nested HAML partial/template in Sinatra. For simplicity I wrote a minimal example, but my real code is much larger. The point is that I want to factor out inner_template instead of copy/pasting it everywhere:
require 'sinatra'
require 'haml'
helpers do
  def inner_template(&block)
    haml_tag('div', :class => 'title', &block)
  end
end
get '/' do
  haml :index
end
__END__
@@ layout
%html
  = yield
@@ index
%div.div1
  - inner_template do
    %span Item 1
  - inner_template do
    %span Item 2
This correctly renders the page like this:
<html>
<div class='div1'>
<div class='title'>
<span>Item 1</span>
</div>
<div class='title'>
<span>Item 2</span>
</div>
</div>
</html>
The problem is that my real-life inner_template method has a considerable number of tags, and I find it quite inelegant (and cumbersome when I must edit it). I also want to keep all the innermost blocks (my items) on the same page; otherwise I'll end up with dozens of small files that will be a headache to maintain.
I am under the impression that, since I can use a helper method that only calls haml_tag, there must be a way to make everything work in pure HAML. But I cannot figure out how to do it properly. For instance, if I just try the obvious way:
require 'sinatra'
require 'haml'
get '/' do
  haml :index
end
__END__
@@ layout
%html
  = yield
@@ index
%div.div1
  = haml :inner_template do
    %span Item 1
  = haml :inner_template do
    %span Item 2
@@ inner_template
%div.title
  = yield
This doesn't work, the inner_template is rendered after its "inner" block (as to what exactly yields the value 1, I do not know):
<html>
<div class='div1'>
<span>Item 1</span>
<div class='title'>
1
</div>
<span>Item 2</span>
<div class='title'>
1
</div>
</div>
</html>
I tried a long list of hacks, and I also tried content_for and similar solutions (which only seem to apply to old versions of Sinatra anyway), but I cannot find examples that match my approach (which could be a hint that it's just not possible). It also looks like the block is not passed to the HAML renderer at all when calling haml.
So I would like to know whether or not this can be done in pure HAML (and how, or why not)?
sample.erb.html
<p>Page 1</p>
<p>Page 2</p>
So, everything after "Page 1" I want to print on the 2nd page.
How can I do this?
There's one solution on SO, but it didn't work for me.
For comparison, Prawn has a nice feature for this called start_new_page.
In your CSS:
p{page-break-after: always;}
Update
After a couple of questions, I will expand my answer to show how I use it in my apps.
1 Sometimes the wicked_pdf helpers don't work, so I add an initializer
_config/initializers/wiked_pdf.rb_
module WickedPdfHelper
  def wicked_pdf_stylesheet_link_tag(*sources)
    sources.collect { |source|
      "<style type='text/css'>#{Rails.application.assets.find_asset("#{source}.css")}</style>"
    }.join("\n").gsub(/url\(['"](.+)['"]\)(.+)/,%[url("#{wicked_pdf_image_location("\\1")}")\\2]).html_safe
  end
  def wicked_pdf_image_tag(img, options={})
    image_tag wicked_pdf_image_location(img), options
  end
  def wicked_pdf_image_location(img)
    "file://#{Rails.root.join('app', 'assets', 'images', img)}"
  end
  def wicked_pdf_javascript_src_tag(source)
    "<script type='text/javascript'>#{Rails.application.assets.find_asset("#{source}.js").body}</script>"
  end
  def wicked_pdf_javascript_include_tag(*sources)
    sources.collect{ |source| wicked_pdf_javascript_src_tag(source) }.join("\n").html_safe
  end
  WickedPdf.config = {
  }
end
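The gsub in wicked_pdf_stylesheet_link_tag is the least obvious part of this initializer: it rewrites url(...) references in the inlined CSS to absolute file:// paths so that wkhtmltopdf can load the images from disk. A standalone sketch of that substitution, using a block instead of the "\\1" backreference trick and a made-up application root (/srv/app) in place of Rails.root:

```ruby
# Hypothetical stand-in for Rails.root; an assumption for this sketch.
APP_ROOT = "/srv/app"

def image_location(img)
  "file://#{File.join(APP_ROOT, 'app', 'assets', 'images', img)}"
end

# Same pattern as in the initializer: capture the quoted path inside
# url(...) and the rest of the line after the closing parenthesis.
def absolutize_css_urls(css)
  css.gsub(/url\(['"](.+)['"]\)(.+)/) do
    %[url("#{image_location($1)}")#{$2}]
  end
end

puts absolutize_css_urls("body { background: url('logo.png') no-repeat; }")
```

Because both capture groups are greedy, a line that contains two url(...) references is matched as a single one; that limitation comes from the pattern itself, not from the sketch.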
2 In the application controller, create a config method with the general config params
_app/controllers/application_controller.rb_
class ApplicationController < ActionController::Base
  def pdf_config
    WickedPdf.config = {
      :wkhtmltopdf => "/usr/local/bin/wkhtmltopdf",
      :orientation => 'Landscape',
      :layout => "pdf.html",
      :footer => {
        :left => "Rectores Lideres Transformadores",
        #:left => "#{Entidad.find(@current_user.entidad).nombre}",
        :right => "#{Time.now}",
        :font_size => 5,
        :center => '[page] de [topage]'
      },
      :disposition => 'attachment'
    }
  end
end
3 Create a common layout for all of your PDF files. Here I use my application CSS so the PDF reports keep the same look and feel as the web pages; I only have to use the same classes and ids
_app/views/layouts/pdf.html.erb_
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<%# your application CSS goes here %>
<%= wicked_pdf_stylesheet_link_tag "application" %>
</head>
<body>
<div id="content">
<%= yield %>
</div>
</body>
</html>
4 Add a PDF format handler in your controllers
_app/controllers/users_controller.rb_
def index
  @users = User.all
  respond_to do |format|
    format.pdf do
      pdf_config
      render :pdf => "filename"
    end
  end
end
5 Now, in your CSS, choose which HTML id marks the page break
#brake{page-break-after: always;}
I had the same problem and I discovered something that might help. This was my page break CSS code:
.page-break {
display: block;
clear: both;
page-break-after: always;
}
This didn't work for two reasons:
I. In one of the imported SASS files I had this line of code:
html, body
  overflow-x: hidden !important
II. The other problem was Bootstrap
@import "bootstrap"
It looks like because of the float: left in:
.col-xs-1, .col-xs-2, .col-xs-3, .col-xs-4, .col-xs-5, .col-xs-6, .col-xs-7, .col-xs-8, .col-xs-9, .col-xs-10, .col-xs-11, .col-xs-12 {
float: left;
}
the page break is no longer working. So, just add this after you import bootstrap.
.col-xs-1, .col-xs-2, .col-xs-3, .col-xs-4, .col-xs-5, .col-xs-6, .col-xs-7, .col-xs-8, .col-xs-9, .col-xs-10, .col-xs-11, .col-xs-12 {
float: initial !important;
}
For anyone still having this problem for whom none of these answers work, as was the case for me:
I gave up on using CSS to fix the page breaks and instead generated 2 PDFs, merged them together, and used the resulting file; that way the page break is guaranteed to exist.
To merge the files (files being an array of PDF file names), I used
system("pdftk #{files.join(' ')} cat output merged_file_name.pdf")
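Because the file names are interpolated into a single shell string, a name containing spaces or shell metacharacters would break the command. A slightly more defensive sketch passes each argument separately, which bypasses the shell. Here merge_command is a hypothetical helper and the file names are placeholders:

```ruby
# Build the argument list: pdftk a.pdf b.pdf cat output merged.pdf
def merge_command(files, output)
  ["pdftk", *files, "cat", "output", output]
end

files = ["report part 1.pdf", "report part 2.pdf"]
cmd = merge_command(files, "merged_file_name.pdf")
p cmd

# With multiple arguments, Kernel#system runs the program directly,
# so no shell quoting is needed. Uncomment once pdftk is installed:
# system(*cmd) or raise "pdftk failed"
```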
Update
I don't remember where I generated 2 pdfs but I did manage to do these page breaks in a single pdf file by manually counting the pixels in the .html.erb files.
I initialize <% @pixel_height = 0 %> and <% @page_height = 980 %>, then view the PDF as HTML to see how many pixels each section takes up and add that to @pixel_height.
In places where a page break would make sense, I check @pixel_height + 20 >= @page_height (20 being the number of pixels a <tr> took up in most of our PDFs); if so, I render a manual page break and reset @pixel_height to 0. The manual page break closes all the HTML tags, adds a 0-pixel-tall div with page-break-after: always, and opens the HTML tags again.
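The bookkeeping described above can be sketched outside the template as plain Ruby. This only illustrates the counting logic and is not the author's code: the method name paginate and the section heights are made up, the 980 page height comes from the description, and the fixed 20 is generalized to each section's own height.

```ruby
PAGE_HEIGHT = 980 # pixels per page, as in the description

# Takes [label, height_in_pixels] pairs and returns an array of pages,
# starting a new page whenever the next section would reach the limit.
def paginate(sections, page_height: PAGE_HEIGHT)
  pages = [[]]
  pixel_height = 0
  sections.each do |label, height|
    if !pages.last.empty? && pixel_height + height >= page_height
      pages << []      # manual page break
      pixel_height = 0 # reset the counter for the new page
    end
    pages.last << label
    pixel_height += height
  end
  pages
end

p paginate([["header", 200], ["table A", 500], ["table B", 400], ["footer", 100]])
```

As the answer notes, this only works as well as the height estimates: text that wraps onto an extra line makes a section taller than assumed.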
I've only had 2 problems with this method:
If some text in the PDF is too long, it will wrap and throw off @pixel_height, causing an automatic page break in an odd spot and a manual page break also in an odd spot
WickedPdf is slow
To combat these 2 issues, we've been slowly migrating our pdfs to Prawn, specifically Prawn::Table. It is much faster and it calculates the height of each row before it draws the pdf so page breaks are more predictable and consistent
One quick method is to use the HTML below:
<div style="page-break-before: always;"></div>
From this line onwards, the HTML content will appear on the next page.
Problem:
I want to pass the return value into an HTML ERB page (see the result form below). I have tried many of the solutions on here and on other sites and have yet to find one that fixes my problem. I included the full code in case I missed something.
NOTE: I get the return value and can bring up the results form, but the return value is not passed through. I already posted this, but the solution given did NOT help. The code below includes the change recommended to me. It appears that adding #{output} to the redirect in the main code causes a "The connection was reset" error, which makes no sense whatsoever.
Main Code:
file name: */projects/webhosted_custom_fibonacci_calculator.rb
require "rubygems"
require "sinatra"
require_relative 'fibonacci_calculator.rb'
require "erb"
include Calculator
get '/' do
  redirect ("/calculate")
end
get '/calculate' do
  erb :calculator_form, :locals => {:calculator => session[:calculator]}
end
post '/calculate' do
  num1 = params['firstnum'].to_i
  num2 = params['secondnum'].to_i
  output = Calculator.run(num1, num2)
  redirect ("/results_form?results=#{output}")
end
get '/results_form' do
  erb :results_form, :locals => {:results => params[:results]}
end
Result form:
File name: */projects/views/results_form.erb
<html>
<head>
<title>Fibonacci Calculator</title>
</head>
<body>
<h1>Results</h1>
Result: <%= results %>
</body>
</html>
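One thing worth checking, as an assumption about the cause rather than a confirmed diagnosis: if Calculator.run returns something whose string form contains spaces or brackets (an array, for example), interpolating it directly into the redirect produces an invalid URL. A minimal sketch that escapes the value with the standard library before redirecting; results_url is a hypothetical helper:

```ruby
require "uri"

# Build the redirect target with a properly escaped query string.
def results_url(output)
  "/results_form?" + URI.encode_www_form(results: output.to_s)
end

puts results_url([1, 1, 2, 3, 5])
puts results_url(13)
```

In the route this would be used as redirect results_url(output). Sinatra decodes query parameters, so params[:results] in the /results_form route should receive the original string.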
I'd like to show a message only on a specific route/page. Essentially, if on /route, display a message.
I tried going through the Sinatra Docs, but I can't find a specific way to do it. Is there a Ruby method that will make this work?
EDIT: Here's an example of what I'd like to do.
get '/' do
  erb :index
end
get '/page1' do
  erb :page1
end
get '/page2' do
  erb :page2
end
*******************
<!-- Layout File -->
<html>
<head>
<title></title>
</head>
<body>
<% if this page is 'page1' do something %>
<% else do something else %>
<% end %>
<%= yield %>
</body>
</html>
I have no idea how to target the current page using Ruby/Sinatra and structure it into an if statement.
There are several ways to approach this (and BTW, I'm going to use Haml even though you've used ERB because it's less typing for me and plainly an improvement). Most of them rely on the request helper, most often via request.path_info.
Conditional within a view.
Within any view, not just a layout:
%p
  - if request.path_info == "/page1"
    = "You are on page1"
  - else
    = "You are not on page1, but on #{request.path_info[1..]}"
%p= request.path_info == "/page1" ? "PAGE1!!!" : "NOT PAGE1!!!"
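Since the question uses ERB, the same conditional can be written as an ERB template. The sketch below renders it with the standard library's ERB class and a stand-in request object. FakeRequest and render_message are assumptions made so the example runs outside Sinatra; in a real view, request is provided by Sinatra.

```ruby
require "erb"

# Stand-in for Sinatra's request object; only path_info is needed here.
FakeRequest = Struct.new(:path_info)

TEMPLATE = <<~TEMPLATE_END
  <% if request.path_info == "/page1" %>
  <p>You are on page1</p>
  <% else %>
  <p>You are not on page1, but on <%= request.path_info[1..-1] %></p>
  <% end %>
TEMPLATE_END

# Renders the template with `request` available as a local variable.
def render_message(path)
  ERB.new(TEMPLATE, trim_mode: "<>")
     .result_with_hash(request: FakeRequest.new(path))
end

puts render_message("/page1")
puts render_message("/page2")
```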
A conditional with a route.
get "/page1" do
  # you are on page1
  message = "This is page 1"
  # you can use an instance variable if you want,
  # but reducing scope is a best practice and very easy.
  erb :page1, :locals => { message: message }
end
get "/page2" do
  message = nil # not needed, but this is a silly example
  erb :page2, :locals => { message: message }
end
get %r{/page(\d+)} do |digits|
  # you'd never reach this with a 1 as the digit, but again, this is an example
  message = "Page 1" if digits == "1"
  erb :page_any, :locals => { message: message }
end
# page1.erb
%p= message unless message.nil?
A before block.
before do
  @message = "Page1" if request.path_info == "/page1"
end
# page1.erb
%p= @message unless @message.nil?
or even better
before "/page1" do
  @message = "Hello, this is page 1"
end
or better again
before do
  @message = request.path_info == "/page1" ? "PAGE 1!" : "NOT PAGE 1!!"
end
# page1.erb
%p= @message
I would also suggest you take a look at Sinatra Partial if you're looking to do this, as it's a lot easier to handle splitting up views when you have a helper ready made for the job.
Sinatra has no Rails-like "controller#action" concept, so you won't find a built-in way to identify the current route. In any case, you can check request.path.split('/').last to get a rough idea of what the current route is.
However, if you want something to be shown only if request.path == "x", a much better way is to put that content in the template itself, unless that content has to be rendered in a different place within your layout. In that case you can use something like Rails' content_for. Check sinatra-content-for.
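To see what the request.path.split('/').last approach returns for a few paths, here is a small standalone sketch; last_segment is a hypothetical helper that takes the path as a plain string:

```ruby
# Returns the last segment of a request path, or nil for the root path.
def last_segment(path)
  path.split('/').last
end

p last_segment("/page1")        # => "page1"
p last_segment("/admin/users/") # => "users"
p last_segment("/")             # => nil
```

Note the nil for the root path, which a layout would have to handle explicitly.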