PDF generation in Ruby - a block should not be split by a page break - ruby

My PDF consists of a number of blocks (actually, a list of quotations) that follow one another until the end of the document. If the text of a quotation does not fit on the page, the whole quotation should start at the top of the next page instead of being torn apart. How can I implement that with any Ruby library?

Try PrinceXML - this is a standalone executable that generates PDF from HTML or XML. It supports a lot of special CSS properties, including ones that help you control page breaks. Refer to http://www.princexml.com/doc/6.0/page-breaks/
This application is available for Windows and Linux. I used it to generate fairly complicated PDF documents with headers and footers on every page except the first one. And since you don't need to output a PDF with precise positioning of elements, it might be a perfect solution for you.

I haven't tried it, but in Prawn I would try using either the Document#text_box method or looking up the table methods and putting your text in cells with invisible borders. The documentation's unclear on how page break functionality fits in with the bounding box models, but it's worth a shot.
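The keep-together rule itself does not depend on any particular library, so it can be sketched in plain Ruby. The method below is a hypothetical helper, not part of Prawn: it takes the measured height of each block and the usable page height, and decides which blocks go on which page. With Prawn, the heights could presumably come from `height_of`, compared against `cursor`, with `start_new_page` called at each planned break, but that wiring is an untested assumption.

```ruby
# Plan page breaks so that no block is split across pages.
# Heights and page_height use the same unit (for example PDF points).
# Returns an array of pages, each an array of block indexes.
def plan_pages(block_heights, page_height)
  pages = [[]]
  remaining = page_height
  block_heights.each_with_index do |height, index|
    # Start a new page when the block does not fit in the space left,
    # unless the page is still empty (an oversized block has to go somewhere).
    if height > remaining && !pages.last.empty?
      pages << []
      remaining = page_height
    end
    pages.last << index
    remaining -= height
  end
  pages
end

# Four quotations on a page with 700 units of usable height.
p plan_pages([300, 300, 300, 100], 700)
```

A block taller than a whole page is still placed on an empty page and will overflow; handling that case (for example by allowing a split only for oversized blocks) is left open here.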

HTMLDoc, which converts HTML to PDF, has a page break facility.

Related

What is the best way to convert PDF pairs into single pages?

I need to take an existing PDF (created with Prawn), and combine pairs after page 1 (the cover) into single pages. I would also like to add a vertical line in the center of the joined pages. The pages are to be printed in books, and the goal is to make single PDF pages that are similar to the side by side view in Acrobat. I know I can convert them to images, do what I need to with ImageMagick, then put them back into a PDF format, but I am trying to minimize the number of conversions so I can save as much quality as possible.
I also realize I can do this from the start with Prawn, but I am trying to avoid that as it would require a very large change to our application.
It is possible to do this with Ghostscript and the pdfwrite device, but it's by no means simple. You need to write some PostScript to do the job.
You would need to add BeginPage and EndPage procedures. The BeginPage procedure would need to check the current page number (which you would need to track yourself). If it's page 1, process normally. If it's an even page, throw away the current PageSize and replace it with one which covers a pair of pages. Process the even page. Do not transmit the content.
If the page is odd (and not 1), translate the origin so that it's offset to the right by the width of the page. Process the odd page. Use moveto, lineto and stroke to draw the required line between the two pages. Transmit the page.
This assumes that all the pages are the same size and orientation, or at least that the size of each page is known in advance. It would be possible to retrieve those programmatically as well, but that is more complex.
It's definitely non-trivial, but if you rummage through my answers in the PostScript tags and look for anything with the word 'imposition' you'll probably find program outlines to do the job.
I did a quick look and here's an answer I wrote some time back. It uses a different approach to that outlined above, it copies some of the guts of the PDF interpreter and repurposes them. It does a chunk of what you want though.
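The pairing logic described above (cover alone, then even page on the left and the following odd page shifted right by one page width) can be checked separately from the PostScript. The following is only a sketch in Ruby of that bookkeeping, assuming every page has the same width; `impose_pairs` and the hash keys are made-up names, and nothing here calls Ghostscript.

```ruby
# Describe where each source page lands when pairs after the cover are merged.
# Returns one hash per output sheet: its width, the source pages with their
# x offsets, and the x position of the centre line (nil for the cover).
def impose_pairs(page_count, page_width)
  sheets = []
  if page_count >= 1
    sheets << { width: page_width, pages: [{ number: 1, x_offset: 0 }], line_x: nil }
  end
  (2..page_count).step(2) do |left|
    placed = [{ number: left, x_offset: 0 }]
    right = left + 1
    # The odd page is shifted right by one page width, if it exists.
    placed << { number: right, x_offset: page_width } if right <= page_count
    sheets << { width: page_width * 2, pages: placed, line_x: page_width }
  end
  sheets
end

# A 5-page document, 595 points wide (A4): cover, then 2+3 and 4+5.
impose_pairs(5, 595).each { |sheet| p sheet }
```

When the document has an even number of pages, the last sheet holds a single left-hand page with an empty right half.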

C# PDFsharp end of page detection

Is it possible to detect the end of a page, or text overflowing the page, when creating a PDF file with the PDFsharp library? How? I am generating a PDF file with a list of users, and if the list is too long, I need to add a new page and continue on it. I don't want to write ugly code; I want it to be as automatic as possible.
I am aware of MigraDoc library, but I already have a lot of code written in PDFsharp, so if it's not necessary to use MigraDoc (which seems to be better), I would rather stay with PDFsharp. Thanks.
When using PDFsharp, you are responsible for detecting the end of the page and creating a new page for the continuation.
We always say that PDFsharp is low level: no automatic page breaks, but anything can be drawn anywhere.
Still you can write clean code with PDFsharp that handles page breaks properly.
You always have a current page, a current gfx, and a current y position on the page. So when you have to start a new page, re-initialize those variables.
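The bookkeeping described here is not specific to PDFsharp, so a language-neutral sketch (written in Ruby to match the rest of this page) may make it concrete. `PageCursor` and its parameters are hypothetical names, not PDFsharp API; in a real PDFsharp program the marked spot is where a new page and a new graphics object would be created.

```ruby
# Minimal layout cursor: tracks the current page and y position, and starts
# a new page when the next line would run past the bottom margin.
class PageCursor
  attr_reader :page, :y

  def initialize(page_height:, margin:, line_height:)
    @page_height = page_height
    @margin = margin
    @line_height = line_height
    @page = 1
    @y = margin
  end

  # Returns [page, y] where the next line should be drawn, then advances.
  def next_line
    if @y + @line_height > @page_height - @margin
      @page += 1     # a real program would add a page and a new gfx here
      @y = @margin   # re-initialize the y position for the new page
    end
    position = [@page, @y]
    @y += @line_height
    position
  end
end

cursor = PageCursor.new(page_height: 100, margin: 10, line_height: 20)
positions = Array.new(6) { cursor.next_line }
p positions
```

Keeping the break check inside one method means the code that draws the user list never has to think about pages; it just asks for the next position.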

Splitting Ruby strings into pages

I've been thinking about this problem for a while, and I'm not quite sure of the best way to go about it.
In a rails app I have books, which have many chapters, which have many sections. Chapters are basically just containers for sections, though may contain strings of text themselves. The sections hold most of the book text.
I'm planning to build an HTML 5 ebook reader that works in a mobile browser, and I don't want the user to have to scroll down -- I want the text to break at the end of the page.
I'd assumed using split might be the way to go, but I'm not sure there's a way to break at regular intervals? Would a javascript option work better here?
I'd looked at this: Dividing text article to smaller parts with paging in Ruby on Rails but can't feasibly insert manual break marks in the text, some of which are 90,000+ words.
Any ideas would be appreciated.
I think the main problem here is that the page length will depend on the device (and possibly the text size, if that is a feature of your app). You should probably send large chunks at a time, each sure to be at least, say, 5 pages long, and then let the JavaScript do the paging. Rails has no access to the size of the display, nor should it.
Text requires very little data, you shouldn't worry about transmitting more than you need or keeping too much in memory.
You may use a blank line ("\n" or "") as the separator.
I'd send enough of the page content down to easily fill a page and more, then use JavaScript on the client side to remove sentences from the page until the scroll bar disappears.
Resize.js is something similar I wrote a while ago. I wanted to enlarge or reduce the font size used on a screen until the screen was just full (for a dashboard monitor). Yours would be similar, but instead of changing the font size, you are trimming off sentences.
Let me know if you can't see how to adapt this code.
Note: I would also make the javascript note the amount of text it ends up displaying, and pass that to the server in the 'next page' request, so the server knows where to start the next page from.
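On the server side, the "send large chunks and let the client paginate" idea from the answers above could look roughly like this. It is a sketch only: `chunk_paragraphs` and `min_words` are made-up names, it splits on blank lines as suggested, and the right chunk size for "at least 5 pages" on a given device is something you would have to tune.

```ruby
# Split a long text into chunks of whole paragraphs, each at least
# min_words long (except possibly the last), for the client to paginate.
def chunk_paragraphs(text, min_words)
  chunks = []
  current = []
  count = 0
  text.split(/\n\s*\n/).each do |paragraph|
    paragraph = paragraph.strip
    next if paragraph.empty?
    current << paragraph
    count += paragraph.split.size
    next if count < min_words
    # Enough words collected: close this chunk and start a new one.
    chunks << current.join("\n\n")
    current = []
    count = 0
  end
  chunks << current.join("\n\n") unless current.empty?
  chunks
end

section = "one two three\n\nfour five\n\nsix seven eight nine\n\nten"
p chunk_paragraphs(section, 4)
```

Because chunks always end on a paragraph boundary, the client never receives half a sentence, and the 'next page' request can refer to a chunk index plus an offset within it.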

Modifying existing pdf elements (particularly images)

I am reading in template PDFs, customizing them, and appending pages before outputting the final document. What I want to do is modify the elements in the template I load before I append it to the output.
In particular I want to hide or remove images (and potentially other elements). I'm not even sure if elements in the imported page can be modified directly, if I can only add images (I haven't seen any sign of a removeImage() function) or what.
A little guidance would be greatly appreciated.
You should get hold of the book accompanying iText: iText in Action (1st or 2nd edition). It has some great examples of most things that can be done in iText.
First edition
Second edition
I believe you need to iterate through the pdf references in the reader to be able to identify images. I am not sure how one would replace them, but it's probably possible.
There are other libraries that do this better, pdfnet being one of them, but this is commercial.

Printing Reports and invoices with Ruby?

I am just learning Ruby, and I wonder how to generate reports and invoices (with a logo, address field, footer, a variable number of invoice items (sometimes resulting in more than one page), carry-over of the amount to pay from one page to the next, and free-floating 2-column text (left-and-right-justified) below the resulting cash information).
Currently I get a canvas to print and draw on from the OperatingSystem (matching the printer specifications) and use some draw-, move-, line-, text- and formfeed-API-Functions and do some heavy calculations for textblock-moving (a bit TeX-like).
How will this be done in Ruby?
Build an .odt and throw it at OpenOffice, or a .tex and throw it at LaTeX?
Or are there any free libraries that do all these kinds of things for me, so I only have to feed in the relevant parts and let Ruby do the text formatting?
EDIT:
To be more specific: I want to put a corporate logo on the first page (DIN A4 format, but may also be letter) at a specific position, plus the footer on every page and the address box on the first page. All the rest should be free-floating text blocks with left-right justification and bold words in the middle of texts.
something like
pdf.column.blocktext("Hello Mr. P\nwe have [b]good news[/b] for you. bla bla bla and so on. Please keep this text together (no page break)...");
pdf.column.floatingblock("This is another block, that should be printed, and can be broken over more than one column...");
which should render the text in the corporate font on the paper, justified, and wrapping neatly to the next column/page if it reaches the bottom of the page.
Thinking about it, this is exactly, what LaTeX is for.
I suggest you consider PDF generation. In Rails, it's pretty simple with the Prawn library.
There is also a fresh new Railscast about that.
Official web site.
You could also check out HtmlDoc for generating PDFs, it just takes in HTML and generates a PDF from it. This approach is nice because it lets you very easily reuse a partial for an on-screen and hard copy invoice.
http://blog.adsdevshop.com/2007/11/20/easy-pdf-generation-with-ruby-rails-and-htmldoc/
The Ruport library (Ruby Reports) makes it pretty easy to spit report tables out in multiple formats, including PDF. There's also an ActiveRecord hook, acts_as_reportable, that gives your models a reporting interface.
