Render PDF with page-specific footnotes in Ruby

I'm looking for a way to render a PDF document with footnotes, where the footnote text appears in the footer of the same page as the footnote reference (as opposed to the end of the document). This kind of footnote appears in books, e.g. for translator's comments. Being able to link the footnote reference to the footnote text is optional.
So far I have looked at Prawn and PDFKit, but I cannot find a straightforward solution.
This is a sample of what I'm trying to do: (image: footnote sample)
And this is what I've come up with using Prawn. Obviously this needs more work to cover corner cases. I'm wondering whether it's worth going further down that route or trying something completely different, like LaTeX (which I haven't used with Ruby).
Prawn::Document.generate("foo.pdf", :page_size => 'A4') do
  # read 150 lorem ipsum records of various length
  records = File.read(
    Rails.root.join("lib/assets/lorem_ipsum_paragraphs.txt")
  ).split("\n").reject(&:blank?)
  # assign some random footnotes to the paragraphs
  footnotes = {
    1 => '* ' + records[120],
    2 => '* ' + records[54],
    11 => '* ' + records[2]
  }
  # this array will hold the current footnotes to draw
  # the assumption being we'll draw them on the current page as soon as we
  # reach the part of the page where we need to fit those
  footnotes_to_draw = []
  # this will hold the amount of space required to draw the footnotes
  space_needed = 0
  records.each_with_index do |record, i|
    str = record
    if footnotes.key?(i)
      str += '*' # this one has a footnote attached
      footnotes_to_draw << footnotes[i]
      space_needed += (height_of(footnotes[i]) + 15)
    end
    text str
    if space_needed > 0
      # that means we will need to draw a footer on this page
      space_available = cursor
      puts "space needed: #{space_needed}, space available: #{space_available}"
      # check if we can still draw the next record, or now's the time
      # (after the last record, pending footnotes are always drawn)
      next_record = records[i + 1]
      unless next_record && space_available - space_needed > height_of(next_record)
        puts "draw footer now"
        bounding_box [0, space_needed], :width => bounds.width, :height => space_needed do
          stroke_horizontal_rule
          footnotes_to_draw.each do |footnote|
            pad(10) { text footnote }
          end
        end
        # reset current footnotes
        footnotes_to_draw = []
        space_needed = 0
      end
    end
  end
end
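One way to make the corner cases easier to reason about is to separate the page-assignment logic from the drawing. The following is only a sketch in plain Ruby, not Prawn code: `plan_pages`, `FOOTNOTE_GAP`, and the numeric heights are hypothetical stand-ins for values that `height_of` would supply, and it assumes no single paragraph needs to be split across pages.

```ruby
# Plan which paragraphs (and their footnotes) go on which page.
# Heights are plain numbers here; in Prawn they could come from height_of.
FOOTNOTE_GAP = 15 # assumed padding per footnote, as in the code above

def plan_pages(paragraph_heights, footnote_heights, page_height)
  pages = [{ paragraphs: [], footnotes: [] }]
  used = 0 # body height plus reserved footer height on the current page

  paragraph_heights.each_with_index do |height, i|
    footer = footnote_heights.key?(i) ? footnote_heights[i] + FOOTNOTE_GAP : 0
    needed = height + footer

    # Start a new page when the paragraph and its footnote do not both fit.
    if used + needed > page_height && !pages.last[:paragraphs].empty?
      pages << { paragraphs: [], footnotes: [] }
      used = 0
    end

    pages.last[:paragraphs] << i
    pages.last[:footnotes] << i if footnote_heights.key?(i)
    used += needed
  end
  pages
end

# Four paragraphs of height 40; paragraphs 1 and 3 carry a footnote of height 20.
pages = plan_pages([40, 40, 40, 40], { 1 => 20, 3 => 20 }, 120)
pages.each_with_index do |page, n|
  puts "page #{n + 1}: paragraphs #{page[:paragraphs]}, footnotes #{page[:footnotes]}"
end
```

With a plan like this, each page can be drawn in one pass: body paragraphs first, then a bounding box at the bottom for that page's footnotes.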

Related

Ruby + Prawn: How do I make text stick to bottom of page?

I have footer text that needs to stay at the bottom of the page: "If you have any questions regarding your order, you may contact us". How would I position it absolutely?
Here's one way from the docs:
file = "lazy_bounding_boxes.pdf"
Prawn::Document.generate(file, :skip_page_creation => true) do
  point = [bounds.right-50, bounds.bottom + 25]
  page_counter = lazy_bounding_box(point, :width => 50) do
    text "Page: #{page_count}"
  end
  10.times do
    start_new_page
    text "Some text"
    page_counter.draw
  end
end

Controlling content flow with Prawn

Let's say we want to display a title on the first page that takes up the top half of the page. The bottom half of the page should then fill up with our article text, and the text should continue to flow over into the subsequent pages until it runs out:
This is a pretty basic layout scenario but I don't understand how one would implement it in Prawn.
Here's some example code derived from their online documentation:
pdf = Prawn::Document.new do
  text "The Prince", :align => :center, :size => 48
  text "Niccolò Machiavelli", :align => :center, :size => 20
  move_down 42
  column_box([0, cursor], :columns => 3, :width => bounds.width) do
    text((<<-END.gsub(/\s+/, ' ') + "\n\n") * 20)
      All the States and Governments by which men are or ever have been ruled,
      have been and are either Republics or Princedoms. Princedoms are either
      hereditary, in which the bla bla bla bla .....
    END
  end
end.render
but that will just continue to show the title space for every page:
What's the right way to do this?
I have been fighting with this same problem. I ended up subclassing ColumnBox and adding a helper to invoke it like so:
module Prawn
  class Document
    def reflow_column_box(*args, &block)
      init_column_box(block) do |parent_box|
        map_to_absolute!(args[0])
        @bounding_box = ReflowColumnBox.new(self, parent_box, *args)
      end
    end
    private
    class ReflowColumnBox < ColumnBox
      def move_past_bottom
        @current_column = (@current_column + 1) % @columns
        @document.y = @y
        if 0 == @current_column
          @y = @parent.absolute_top
          @document.start_new_page
        end
      end
    end
  end
end
Then it is invoked exactly like a normal column box, but on the next page break it will reflow to the parent's bounding box. Change your line:
column_box([0, cursor], :columns => 3, :width => bounds.width) do
to
reflow_column_box([0, cursor], :columns => 3, :width => bounds.width) do
Hope it helps you. Prawn is pretty low-level, which is a double-edged sword: it sometimes fails to do what you need, but the tools are there to extend it and build more complicated structures.
I know this is old, but I thought I'd share that a new option has been added to fix this in v0.14.0.
:reflow_margins is an option that sets column boxes to fill their parent boxes on new page creation.
column_box(reflow_margins: true, columns: 3)
So, the column_box method creates a bounding box. The documented behavior of a bounding box is that, when it continues onto the next page, it starts at the same position as on the previous page. So the behavior you are seeing is basically correct, although it is not what you want. The workaround I found by googling is to use a span instead, because spans do not have this behavior.
The problem now is how to build text columns with spans, since there doesn't seem to be native support for that. I tried to build a small script that mimics columns with spans. It creates one span for each column and aligns them accordingly. Then the text is written with text_box, which has the overflow: :truncate option. This makes the method return the text that did not fit in the text box, so that this text can then be rendered in the next column. The code probably needs some tweaking, but it should be enough to demonstrate how to do this.
require 'prawn'
text_to_write = ((<<-END.gsub(/\s+/, ' ') + "\n\n") * 20)
All the States and Governments by which men are or ever have been ruled,
have been and are either Republics or Princedoms. Princedoms are either
hereditary, in which the bla bla bla bla .....
END
pdf = Prawn::Document.generate("test.pdf") do
  text "The Prince", :align => :center, :size => 48
  text "Niccolò Machiavelli", :align => :center, :size => 20
  move_down 42
  starting_y = cursor
  starting_page = page_number
  span(bounds.width / 3, position: :left) do
    text_to_write = text_box text_to_write, at: [bounds.left, 0], overflow: :truncate
  end
  go_to_page(starting_page)
  move_cursor_to(starting_y)
  span(bounds.width / 3, position: :center) do
    text_to_write = text_box text_to_write, at: [bounds.left, 0], overflow: :truncate
  end
  go_to_page(starting_page)
  move_cursor_to(starting_y)
  span(bounds.width / 3, position: :right) do
    text_box text_to_write, at: [bounds.left, 0]
  end
end
I know this is not an ideal solution. However, this was the best I could come up with.
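The core idea in the script above is chaining: each column takes what fits and hands the leftover to the next one. Here is a minimal plain-Ruby sketch of that pattern, independent of Prawn. The helpers `fill_column` and `flow_into_columns` are hypothetical, and capacity is measured in characters purely for illustration, whereas Prawn measures the rendered text against the box.

```ruby
# Take as many whole words as fit into `capacity` characters and
# return [fitted_text, leftover_text].
def fill_column(text, capacity)
  fitted = []
  words = text.split
  until words.empty?
    candidate = (fitted + [words.first]).join(' ')
    break if candidate.length > capacity
    fitted << words.shift
  end
  [fitted.join(' '), words.join(' ')]
end

# Feed the leftover of each column into the next one, the same way the
# return value of one text_box call is passed to the next call above.
def flow_into_columns(text, column_count, capacity)
  columns = []
  remaining = text
  column_count.times do
    fitted, remaining = fill_column(remaining, capacity)
    columns << fitted
  end
  [columns, remaining]
end

columns, leftover = flow_into_columns('one two three four five six seven', 3, 9)
puts columns.inspect
puts leftover.inspect
```

Whatever is left after the last column is what would have to continue on the following page.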
Use floats.
float do
  span((bounds.width / 3) - 20, :position => :left) do
    # Row Table Code
  end
end
float do
  span((bounds.width / 3) - 20, :position => :center) do
    # Row Table Code
  end
end
float do
  span((bounds.width / 3) - 20, :position => :right) do
    # Row Table Code
  end
end
Use Prawn's grid layout instead. It is very well documented and makes it easier to control your layout.

Get text of a paragraph with all the markup (and their content) removed

How can I get only the text of the node <p> which has other tags in it like:
<p>hello my website is <a>click here</a> <b>test</b></p>
I only want "hello my website is"
This is what I tried:
begin
node = html_doc.css('p')
node.each do |node|
node.children.remove
end
return (node.nil?) ? '' : node.text
rescue
return ''
end
Update 2: all right, well you are removing all children with node.children.remove, including the text nodes, a proposed solution might look like:
# 1. select all <p> nodes
doc.css('p').
# 2. map children, and flatten
map { |node| node.children }.flatten.
# 3. select text nodes only
select { |node| node.text? }.
# 4. get text and join
map { |node| node.text }.join(' ').strip
This sample returns "hello my website is", but note that doc.css('p') also finds <p> tags within <p> tags.
Update: sorry, misread your question, you only want "hello my website is", see solution above, original answer:
Not directly with nokogiri, but the sanitize gem might be an option: https://github.com/rgrove/sanitize/
Sanitize.clean(html, {}) # => " hello my website is click here test "
FYI, it uses nokogiri internally.
Your test case did not include any interesting text interleaved with the markup.
If you want to turn <p>Hello <b>World</b>!</p> into "Hello !", then removing the children is one way to do it. Simpler (and less destructive) is to just find all the text nodes and join them:
require 'nokogiri'
html = Nokogiri::HTML('<p>Hello <b>World</b>!</p>')
# Find the first paragraph (in this case the only one)
para = html.at('p')
# Find all the text nodes that are children (not descendants),
# change them from nodes into the strings of text they contain,
# and then smush the results together into one big string.
p para.search('text()').map(&:text).join
#=> "Hello !"
If you want to turn <p>Hello <b>World</b>!</p> into "Hello " (no exclamation point) then you can simply do:
p para.children.first.text # if you know that text is the first child
p para.at('text()').text # if you want to find the first text node
As @Iwe showed, you can use the String#strip method to remove leading/trailing whitespace from the result, if you like.
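For comparison, the same "direct text children only" idea can be sketched without Nokogiri. This example swaps in REXML, which ships with standard Ruby installs, and assumes the markup is well-formed XML (REXML is not an HTML parser). The helper name `own_text` is hypothetical.

```ruby
require 'rexml/document'

# Return the text that sits directly inside the root element, skipping
# the text of nested elements such as <b> or <a>.
def own_text(xml_fragment)
  root = REXML::Document.new(xml_fragment).root
  # Element#texts lists only the Text nodes that are direct children.
  root.texts.map(&:value).join
end

puts own_text('<p>Hello <b>World</b>!</p>')
puts own_text('<p>hello my website is <a>click here</a> <b>test</b></p>').strip
```

For real-world HTML that is not well-formed, the Nokogiri versions above remain the safer choice.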
There's a different way to go about this. Rather than bother with removing nodes, remove the text that those nodes contain:
require 'nokogiri'
doc = Nokogiri::HTML('<p>hello my website is <a>click here</a> <b>test</b></p>')
text = doc.search('p').map{ |p|
  p_text = p.text
  a_text = p.at('a').text
  p_text[a_text] = ''
  p_text
}
puts text
>>hello my website is test
This is a simple example, but the idea is to find the <p> tags, then scan inside those for the tags that contain the text you don't want. For each of those unwanted tags, grab their text and delete it from the surrounding text.
In the sample code, you'd have a list of undesirable nodes at the a_text assignment, loop over them, and iteratively remove the text, like so:
text = doc.search('p').map{ |p|
  p_text = p.text
  %w[a].each do |bad_nodes|
    bad_nodes_text = p.at(bad_nodes).text
    p_text[bad_nodes_text] = ''
  end
  p_text
}
You get back text which is an array of the tweaked text contents of the <p> nodes.

How to override or edit the last printed lines in a ruby CLI script?

I am trying to build a script that gives me feedback about progress on the command line. At the moment it just prints a new line for every n-th progress step, so the console looks like this:
10:30:00 Parsed 0 of 1'000'000 data entries (0 %)
10:30:10 Parsed 1'000 of 1'000'000 data entries (1 %)
10:30:20 Parsed 2'000 of 1'000'000 data entries (2 %)
[...] etc [...]
11:00:00 Parsed 1'000'000 of 1'000'000 data entries (100 %)
Even though the timestamps and progress numbers are fictional, you should see the problem.
What I want is to do it "wget-style", with a progress bar that is updated in place on the command line and takes the line width into account.
First I thought about using curses, because I had some hands-on experience with it from when I tried to learn C, but I never warmed to it, and I think it is bloated for the purpose of manipulating just a few lines. I also don't need any coloring, and most other libraries I found seemed to specialize in coloring.
Can someone help me with this problem?
A while ago I created a class for a status text in which you can change part of the content within the line. It might be useful to you.
The class and an example of its use:
class StatusText
  def initialize(parms={})
    @previous_size = 0
    @stream = parms[:stream]==nil ? $stdout : parms[:stream]
    @parms = parms
    @parms[:verbose] = true if parms[:verbose] == nil
    @header = []
    @on_change = nil
    pushHeader(@parms[:base]) if @parms[:base]
  end
  def setText(complement)
    text = "#{@header.join(" ")}#{@parms[:before]}#{complement}#{@parms[:after]}"
    printText(text)
  end
  def cleanAll
    printText("")
  end
  def cleanContent
    printText "#{@parms[:base]}"
  end
  def nextLine(text=nil)
    if @parms[:verbose]
      @previous_size = 0
      @stream.print "\n"
    end
    if text!=nil
      line(text)
    end
  end
  def line(text)
    printText(text)
    nextLine
  end
  #Callback in the case the status text changes
  #might be useful to log the status changes
  #The callback function receives the new text
  def onChange(&block)
    @on_change = block
  end
  def pushHeader(head)
    @header.push(head)
  end
  def popHeader
    @header.pop
  end
  def setParm(parm, value)
    @parms[parm] = value
    if parm == :base
      #replace the most recent header, if there is one
      @header[-1] = value unless @header.empty?
    end
  end
  private
  def printText(text)
    #If not verbose leave without printing
    if @parms[:verbose]
      if @previous_size > 0
        #go back
        @stream.print "\033[#{@previous_size}D"
        #clean
        @stream.print(" " * @previous_size)
        #go back again
        @stream.print "\033[#{@previous_size}D"
      end
      #print
      @stream.print text
      @stream.flush
      #store size
      @previous_size = text.gsub(/\e\[\d+m/,"").size
    end
    #Call callback if existent
    @on_change.call(text) if @on_change
  end
end
a = StatusText.new(:before => "Evolution (", :after => ")")
(1..100).each {|i| a.setText(i.to_s); sleep(1)}
a.nextLine
Just copy it, paste it into a Ruby file, and try it out. I use escape sequences to reposition the cursor.
The class has lots of features I needed at the time (like piling up elements in the status bar) that you can use to complement your solution, or you can just clean it up to its core.
I hope it helps.
In the meantime I found some gems that provide a progress bar. I will list them here:
ProgressBar from paul at github
a more recent version from pgericson at github
ruby-progressbar from jfelchner at github
simple_progressbar from bitboxer at github
I tried the ones from pgericson and jfelchner. They both have pros and cons, but both fit my needs. I will probably fork and extend one of them in the future.
I hope this helps others find more quickly what I spent months searching for.
Perhaps replace your outputting to this:
print "Progress #{progress_var}%\r"
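To expand on the carriage-return idea: "\r" moves the cursor back to the start of the line, so the next write overwrites the previous one. Below is a minimal sketch of a fixed-width bar built on that. The helper names `progress_line` and `draw_progress` and the default width of 40 columns are assumptions for illustration, not part of any library.

```ruby
# Build one progress line that is exactly `width` characters wide,
# so each redraw fully covers the previous one.
def progress_line(done, total, width = 40)
  percent = total.zero? ? 100 : (done * 100 / total).clamp(0, 100)
  label = format(' %3d%%', percent)
  bar_width = width - label.length - 2 # leave room for the brackets
  filled = bar_width * percent / 100
  "[#{'=' * filled}#{' ' * (bar_width - filled)}]#{label}"
end

# Redraw the line in place: "\r" returns the cursor to column 0, so the
# new line is written over the old one instead of below it.
def draw_progress(done, total, out: $stdout, width: 40)
  out.print "\r#{progress_line(done, total, width)}"
  out.flush
end

(0..100).step(25) { |i| draw_progress(i, 100) }
puts # finish with a newline once the work is done
```

Because every line has the same width, no explicit clearing is needed. If the text can shrink between updates, pad it with spaces or clear the line first.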

How to make an element take up all-the-width-that-is-left?

Shoes.app do
  flow do
    file = "something with variable length"
    para "Loading #{file}: "
    progress :width => -300
  end
end
end
As you can see from the code, I am trying to display a progress bar that runs from the end of the text to the right edge of the application window.
When the text has a fixed length this solution works, but it doesn't once the text changes length, as in the fragment above: there will be either too little or too much space for the progress bar.
Is there a solution to this problem?
I tried asking the para element for its width, but it is 0.
As I mentioned before, you have to get the width of the textblock after it is calculated. Try this:
Shoes.app do
  flow do
    file = "something with variable length"
    @p = para "Loading #{file}: "
    @prog = progress
    start do
      @prog.width = @prog.parent.width - @p.width
    end
  end
  button 'Change text!' do
    text = @p.text
    @p.text = text + '1'
    @prog.width = @prog.parent.width - @p.width
  end
end
