What is the best way to convert PDF pairs into single pages? - ruby

I need to take an existing PDF (created with Prawn), and combine pairs after page 1 (the cover) into single pages. I would also like to add a vertical line in the center of the joined pages. The pages are to be printed in books, and the goal is to make single PDF pages that are similar to the side by side view in Acrobat. I know I can convert them to images, do what I need to with ImageMagick, then put them back into a PDF format, but I am trying to minimize the number of conversions so I can save as much quality as possible.
I also realize I can do this from the start with Prawn, but I am trying to avoid that as it would require a very large change to our application.

It is possible to do this with Ghostscript and the pdfwrite device, but its by no means simple. You need to write some PostScript to do the job.
You would need to add BeginPage and EndPage procedures, the BeginPage would need to check the current page number (and you would need to track this yourself). If its page 1, process normally. If its an even page, throw away the current PageSize and replace it with one which covers a pair of pages. Process the even page. Do not transmit the content.
If the page is odd (and not 1) then translate the origin so that its offset to the right by the width of the page. Process the odd page. use moveto, lineto and stroke to draw the required line between the two pages. Transmit the page.
This assumes that all the pages are the same size and orientation, or least that the sizes of each page are known in advance. It would be possible to retrieve those programmatically as well, but more complex.
Its definitely non-trivial, but if you rummage through my answers in the PostScript tags and look for anything with the word 'imposition' you'll probably find program outlines to do the job.
I did a quick look and here's an answer I wrote some time back. It uses a different approach to that outlined above, it copies some of the guts of the PDF interpreter and repurposes them. It does a chunk of what you want though.

Related

Mosaicking PDF documents?

I have (or, rather, will soon have) a number of maps created in ArcGIS 10.0 and exported as PDF documents. The maps all show contiguous areas, being rather like the pages in a map book. There will also be a smaller-scale map depicting the entire area (let's call it the "study area"), but with less detail, rather like that page of a map atlas that shows what page depicts what area.
I wonder if there is any way to create thumbnails of the larger-scale maps and mosaic them such as to create an index map of the study area. A user would then be able to see, for a particular point on the smaller-scale map, which of the larger-scale maps depicts that part of the study area. (And perhaps see that map by clicking on the larger map?) Does anyone have any ideas I can implement this? I would prefer exporting the maps in PDF format, but, if I can't do all of the above with PDF, then any other format to which a map can be exported from ArcGIS, such as JPG or TIF, will work.
You should be able to create a PDF which does this.
What you need to do is render each page to a small image.
Then collect each of these images and add them as a mosaic to an index page.
Then put links from each small image back to the original PDF page.
If the hierarchy was more than one level deep you could repeat the process.
You need a PDF component to do this. What you want in terms of features is something which does decent PDF rendering. It's an easy thing to do badly and a difficult thing to do well.
ABCpdf .NET does good quality rendering so it's what I would suggest, but then I would because I work on it. :-)

Splitting Ruby strings into pages

I've been thinking about this problem for a while, and not quite sure the best way to go about it.
In a rails app I have books, which have many chapters, which have many sections. Chapters are basically just containers for sections, though may contain strings of text themselves. The sections hold most of the book text.
I'm planning to build an HTML 5 ebook reader that works in a mobile browser, and I don't want the user to have to scroll down -- I want the text to break at the end of the page.
I'd assumed using split might be the way to go, but I'm not sure there's a way to break at regular intervals? Would a javascript option work better here?
I'd looked at this: Dividing text article to smaller parts with paging in Ruby on Rails but can't feasibly insert manual break marks in the text, some of which are 90,000+ words.
Any ideas would be appreciated.
I think the main problem here is that the page length will depend on the device (and possibly the text size, if that is feature of your app). You should probably send large chunks that are sure to be at least say 5 pages long, at a time and then let the javascript do the paging. Rails has no access, nor should it, to the size of the display.
Text requires very little data, you shouldn't worry about transmitting more than you need or keeping too much in memory.
You may use blank line("\n" or "") as the separator.
I'd send enough of the page content down to easily fill a page and more, then use javascript on the client slide to remove sentences from the page until the scroll-bar disappears.
Resize.js is something similar I wrote a while ago. I wanted to enlarge/reduce the font size used on a screen until the screen was just full (for a dashboard monitor).. Yours would be similar, but instead of changing the font size, you are trimming off sentences.
Let me know if you can't see how to adapt this code.
Note: I would also make the javascript note the amount of text it ends up displaying, and pass that to the server in the 'next page' request, so the server knows where to start the next page from.

Optimize display of a large number of images (1000+) for performance and ease of use in a web application

In our web application, the users need to review a large number of images. This is my current layout. 20 images will be displayed at a time, with a pagination bar above the thumbnails. Clicking a thumbnail will show the enlarged image to the left. The enlarged image will follow the scrollbar so it's always visible. Quite simple actually.
I was wondering what the best interface would be in this scenario:
One option is to implement an infinite scroll script which will lazy load thumbnails as the user scrolls. The thumbnails not visible will be removed from the DOM. But my concern with this approach is the number of changes in the DOM slowing down the page.
Another option could be something like Google's Fastflip.
What do you think is the best approach for this application? Radical ideas welcomed.
I think the question you have to ask is: what action is user supposed to do? What's the purpose of the site?
If "review images" entails rating every image, I'd rather go with a Fastflip approach where the focus is on the single image. A thumbnail gallery will distract from the desired action and might result in a smaller amount of pics rated/reviewed.
If the focus should rather be on the comparison of a given image against others, I'd say try the gallery approach, although I wouldn't impement an infinite scroll with thumbnails because user can quickly get lost in the abundance of choices. I think a standard pagination (whether static or ajaxified) would be better if you choose to go this route.
Just my 2c.
If you paginate thumbnails, you can pre-generate a single image containing all thumbnails for each page, then use an image map to handle mouseover text and clicking. This will reduce the number of HTTP requests and possibly lead to fewer bytes sent. The separation distance between images should be minimized for this to be most efficient. This would have some disadvantages.
To reduce image download size at the expense of preprocessing, you can try to save each image in the format (PNG or JPG) most efficient for its contents using an algorithm like the one in ImageGuide. Similarly, if the images are poorly compressed (like JPEGs from a cell phone camera), they can be recompressed at the cost of some quality.
Once the site has some testers, you can analyze patterns in which images tend to be clicked (if a pattern exists) and preload the full-size images, or even pre-load all of them once the thumbs are loaded.
You might play with JPEG2000 images (you did say "radical ideas welcomed"), which thumb very easily, because the thumbnail and main image needn't be sent as if they are separate files. This is an advantage of the compression format -- it isn't the same as the hack of telling the browser to resize the full size image to represent its own thumbnail.
You can take a look at Google's WebP image format.
At the server side, a separate image server optimized for static content delivery, perhaps using NginX or the Tux webserver.
I would show the thumbnails, since the user might want to skip some of the pictures. I would also stay away of pagination in the terms of
<<first <previuos n of x next> last>>
and go for something more easy to implement and efficient. A
load x more pictures.
No infinite scroll whatsoever and why not, even no scroll at all. Just load x more, previous x.
Although this answer might be a bit unradical and boring, I'd go with exactly your suggestion of asynchronously loading the thumbnails (and of course main picture), if they come into view. A similar technique is used by Google+ in the pane to add persons to circles. This way, you keep the server resources and bandwidth on the pictures that are needed by the client. As Google+ shows, the operations on the DOM tree are fast enough and don't slow down a computer of the past years.
You might also prebuilt a few lines of the thumbnail table ahead with a dummy image (e.g. a "loading circle" animated gif) and replace the image. That way, the table in view is already built and does not need to be rerendered, as the flow elements following the table would have to be, if no images are in there during scrolling.
Instead of paginating the thumbnails (as suggested by your layout scheme), you could also think about letting users filter the images by tag, theme, category, size or any other way to find their results faster.

Very large images in web browser

We would like to display very large (50mb plus) images in Internet Explorer. We would like to avoid compression as compression algorithms are not what CSI would have us believe that they are and the resulting files are too lossy.
As a result, we have come up with two options: Silverlight Deep Zoom or a Flash based solution (such as Zoomify). The issue is that both of these require conversion to a tiled output and/or conversion to a specific file type (Zoomify supports a single proprietary file type, PFF).
What we are wondering is if a solution exists which will allow us to view the image without a conversion before hand.
PS: I know that you can write an application to tile the images (as needed or after the load process) and output them; however, we would like to do this without chopping up the file.
The tiled approach really is the right way to do it.
Your users don't want to download a 50mb file before they can start viewing the image. You don't want to spend the bandwidth to serve 50 megs to every user who might only view a fraction of your image.
If you serve the whole file, users will eventually be able to load and view it, but it won't run smoothly for most of them.
There is no simple non-tiled way to serve just a portion of an image unless you want to use a server-side library like imagemagik or PIL to extract a specific subset of the image for each user. You probably don't want to do that because it will place a significant load on your server.
Alternatively, you might use something like google's map tool to provide zooming and scaling. Some comments on doing that are available here:
http://webtide.wordpress.com/2008/08/27/custom-google-maps/
Take a look at OpenSeadragon. To make a image can work with OpenSeadragon, you should generate a zoomable image format which mentioned here. Then follow starting guide here
The browser isn't going to smoothly load a 50 meg file; if you don't chop it up, there's no reasonable way to make it not lag.
If you dont want to tile, you could have the server open the file and render a screen sized view of the image for display in the browser at the particular zoom resolution requested. This way you arent sending 50 meg files across the line when someone only wants to get an overview of the image. That is, the browser requests a set of coordinates and an output size in pixels, the server opens the larger image and creates a smaller image that fits the desired view, and sends that back to the web browser.
As far as compression, you say its too lossy, but if thats what you are seeing you are probably using the wrong compression algorithm or setting for the type of image you have. The jpg format has quality settings to control lossiness, and PNG compression is lossless (the pixels you get after decompressing are the exact values you had prior to compression). So consider changing what you are using as compression, and dont just rely on the default settings in an image editor.

PDF generation under ruby - block should not cut by page separator

My PDF consists of a number of blocks (actually, a list of quotations), they go one after another till the end of the document. If the text of a quotation
does not fit on the page, the whole quotation should start from the top of the next page, instead of being torn apart. How can I implement that on any library under ruby?
Try PrinceXML - this is a standalone executable that generates PDF out of HTML or XML. It supports a lot of special CSS properties that will even help you to control page breaks. Refer to http://www.princexml.com/doc/6.0/page-breaks/
This application is available for windows and linux. I was using it for generation of a pretty complicated PDF documents with headers and footers on every page except first one. And since you don't need to output a PDF with precise positioning of elements, it might be a perfect solution for you.
I haven't tried it, but in Prawn I would try using either the Document#text_box method or looking up the table methods and putting your text in cells with invisible borders. The documentation's unclear on how page break functionality fits in with the bounding box models, but it's worth a shot.
HTMLDoc which converts HTML to PDF has a page break facility.

Resources