Mosaicking PDF documents? - image

I have (or, rather, will soon have) a number of maps created in ArcGIS 10.0 and exported as PDF documents. The maps all show contiguous areas, being rather like the pages in a map book. There will also be a smaller-scale map depicting the entire area (let's call it the "study area"), but with less detail, rather like that page of a map atlas that shows what page depicts what area.
I wonder if there is any way to create thumbnails of the larger-scale maps and mosaic them so as to create an index map of the study area. A user would then be able to see, for a particular point on the smaller-scale map, which of the larger-scale maps depicts that part of the study area. (And perhaps see that map by clicking on the larger map?) Does anyone have any ideas how I can implement this? I would prefer exporting the maps in PDF format, but, if I can't do all of the above with PDF, then any other format to which a map can be exported from ArcGIS, such as JPG or TIF, will work.

You should be able to create a PDF which does this.
What you need to do is render each page to a small image.
Then collect each of these images and add them as a mosaic to an index page.
Then put links from each small image back to the original PDF page.
If the hierarchy were more than one level deep you could repeat the process.
You need a PDF component to do this. What you want in terms of features is something which does decent PDF rendering. It's an easy thing to do badly and a difficult thing to do well.
ABCpdf .NET does good quality rendering so it's what I would suggest, but then I would because I work on it. :-)
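Since the maps in the question cover contiguous areas, the thumbnails can be placed on the index page by geography rather than in a plain grid. Below is a minimal sketch of that placement step only, assuming you know each sheet's ground extent and the study area's extent in the same map units. The names Extent, place_on_index and build_index are made up for illustration; rendering the thumbnails and writing the link annotations is left to whichever PDF component you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Axis-aligned box: map units for a sheet, points for a page area."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def place_on_index(sheet: Extent, study_area: Extent, index_box: Extent) -> Extent:
    """Map a sheet's ground extent to a rectangle inside index_box.

    A single uniform scale keeps the aspect ratio, and the study area is
    centred in index_box. PDF user space has its origin at the bottom-left
    with y increasing upward, so no axis flip is needed here.
    """
    area_w = study_area.xmax - study_area.xmin
    area_h = study_area.ymax - study_area.ymin
    box_w = index_box.xmax - index_box.xmin
    box_h = index_box.ymax - index_box.ymin
    scale = min(box_w / area_w, box_h / area_h)
    # Offsets that centre the scaled study area inside the box.
    off_x = index_box.xmin + (box_w - area_w * scale) / 2
    off_y = index_box.ymin + (box_h - area_h * scale) / 2
    return Extent(
        off_x + (sheet.xmin - study_area.xmin) * scale,
        off_y + (sheet.ymin - study_area.ymin) * scale,
        off_x + (sheet.xmax - study_area.xmin) * scale,
        off_y + (sheet.ymax - study_area.ymin) * scale,
    )


def build_index(sheets, study_area, index_box):
    """Return (target_page_number, rectangle) pairs for thumbnails and links.

    sheets is a list of (page_number, Extent) pairs, one per detailed map.
    """
    return [(page_no, place_on_index(ext, study_area, index_box))
            for page_no, ext in sheets]
```

Each returned rectangle serves twice: it is where the thumbnail of that page is drawn, and it is the clickable area of the link back to that page.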

Related

Why are images in pdf sometimes sliced into multiple images?

I noticed that images are sometimes sliced up in PDFs.
Steps:
Insert an image with a high resolution (3000x1800) into a .docx
Use the "Microsoft Print to PDF" option of Word to convert to PDF
Extract all images with pdfimages or pymupdf
Result:
Image is sliced horizontally into three images
Questions:
What exactly happens in the transition from .docx to PDF (or in general in the conversion to PDF) that makes the converter slice it up into three images instead of one?
Do the individual XObjects of the sliced images contain information which says that these three images originally belonged to one?
How do I know how the images are sliced (horizontally / vertically)? And what if two images were originally inserted into the .docx file and both of them are sliced: can you tell if slice x belongs to original image y or z?
So, as you have found out: because the code which generates the PDF chose to do so.
The technical reasons may be various - it could be that historically there were printers which only had so much memory and needed to receive limited-size images when printing, and someone at some point, when writing the PDF export code present in Microsoft Office, chose to apply this limit.
Anyway, technically, as put in the comments, an image in a PDF file could be composed of an unlimited number of smaller images collated together.
Now, the second part, and your actual question: to know whether images in a PDF file belong together in a single original image, one would need a custom extractor tool to check the geometry of all images in the document and find out which images have no margins or gaps between them. That would not be hard to do for well-behaved files (and we can't know whether files generated by MS Office are well behaved: there are ways to obfuscate image positioning by making it indirect). The metadata in the image parts may or may not contain information that would allow one to recompose the original image: it is up to the code generating the PDF to include this metadata or not. But the geometry can't lie in this case: if the final document presents a single image visually, it is possible to detect that when fetching the images.
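A minimal sketch of the geometry check described above, assuming your extractor can give you the placement rectangle of each image on the page as (x0, y0, x1, y1). The function names and the tolerance are assumptions for illustration, not part of any library. Two rectangles are treated as slices of the same image when they share a full edge, and slices are then merged transitively.

```python
def _touch(a, b, tol):
    """True if rectangles a and b share a full edge (within tol)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    same_cols = abs(ax0 - bx0) <= tol and abs(ax1 - bx1) <= tol
    same_rows = abs(ay0 - by0) <= tol and abs(ay1 - by1) <= tol
    # Stacked: same left/right edges, one ends where the other starts.
    stacked = same_cols and (abs(ay1 - by0) <= tol or abs(by1 - ay0) <= tol)
    # Side by side: same top/bottom edges, touching left/right.
    side_by_side = same_rows and (abs(ax1 - bx0) <= tol or abs(bx1 - ax0) <= tol)
    return stacked or side_by_side


def group_slices(rects, tol=0.5):
    """Group indices of rectangles that tile into one larger image."""
    parent = list(range(len(rects)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if _touch(rects[i], rects[j], tol):
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(rects)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Whether a group was sliced horizontally or vertically follows from which of the two conditions matched. With two sliced originals on one page, each slice lands in the group of the original it touches, as long as the two originals do not themselves abut with identical edges.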

Best way to create a Layout and generate a PNG from it. Example inside

I am searching for a performant way to generate a PNG based on a layout. These layouts will mostly consist of text and 1-2 icons. The data source for this information is JSON. However, the JSON won't be normalized to fit the layout/screen size. Let me clarify: the JSON will contain an attribute "Title". The title may be too long, so the font size has to be decreased. Or the description has too many attributes and only some of them need to be displayed, and so on.
We currently have a system in place for creating these layouts and generating a PNG, but creating new layouts is very time consuming and, frankly speaking, a pain. However, the current solution is extremely performant, as it can generate a PNG in around 1-2ms. For my PoC to be deemed successful, I need to reach 10ms or lower. If there is a solution that takes slightly longer to generate, but can be scaled horizontally, that's fine as well.
TL;DR:
I'm searching for a way to generate a PNG based on a layout I create. The PNG generation needs to be performant (< 10ms) and the implementation of new layouts should be as hassle-free as possible.
What technologies are suited for this use case?
Here is an example, of what a layout might look like:
Edit: I can't post images yet, but please search for "electronic shelf labeling" on Google Images.
Also:
I already asked a similar question yesterday, but it was pointed out that my way of trying to achieve this probably won't lead to success. Original Post

What is the best way to convert PDF pairs into single pages?

I need to take an existing PDF (created with Prawn), and combine pairs after page 1 (the cover) into single pages. I would also like to add a vertical line in the center of the joined pages. The pages are to be printed in books, and the goal is to make single PDF pages that are similar to the side by side view in Acrobat. I know I can convert them to images, do what I need to with ImageMagick, then put them back into a PDF format, but I am trying to minimize the number of conversions so I can save as much quality as possible.
I also realize I can do this from the start with Prawn, but I am trying to avoid that as it would require a very large change to our application.
It is possible to do this with Ghostscript and the pdfwrite device, but it's by no means simple. You need to write some PostScript to do the job.
You would need to add BeginPage and EndPage procedures. The BeginPage procedure would need to check the current page number (and you would need to track this yourself). If it's page 1, process normally. If it's an even page, throw away the current PageSize and replace it with one which covers a pair of pages. Process the even page. Do not transmit the content.
If the page is odd (and not 1), then translate the origin so that it is offset to the right by the width of the page. Process the odd page. Use moveto, lineto and stroke to draw the required line between the two pages. Transmit the page.
This assumes that all the pages are the same size and orientation, or at least that the sizes of each page are known in advance. It would be possible to retrieve those programmatically as well, but that is more complex.
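The page bookkeeping above can be sketched independently of PostScript. The following is only a planning helper, assuming all pages share one width and height as stated; the function name and the dictionary layout are made up for illustration. It does not touch a PDF, it just lists what each output sheet should contain.

```python
def impose_pairs(page_count, width, height):
    """Plan the output sheets for 2-up imposition after a single cover page.

    Returns a list of sheets. Each sheet is a dict with the output page
    size, the placements as (source_page, x_offset) pairs, and the centre
    line as ((x0, y0), (x1, y1)) or None. Pages are numbered from 1.
    """
    sheets = []
    if page_count >= 1:
        # Page 1 (the cover) is passed through unchanged.
        sheets.append({"size": (width, height), "pages": [(1, 0)], "line": None})
    for left in range(2, page_count + 1, 2):
        pages = [(left, 0)]  # even page at the original origin
        right = left + 1
        if right <= page_count:
            pages.append((right, width))  # odd page shifted right by one width
        sheets.append({
            "size": (2 * width, height),
            "pages": pages,
            "line": ((width, 0), (width, height)),
        })
    return sheets
```

For a six-page document this yields the cover, then the pairs (2, 3) and (4, 5), then page 6 alone on the left half of a double-width sheet.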
It's definitely non-trivial, but if you rummage through my answers in the PostScript tags and look for anything with the word 'imposition' you'll probably find program outlines to do the job.
I did a quick look and here's an answer I wrote some time back. It uses a different approach to that outlined above, it copies some of the guts of the PDF interpreter and repurposes them. It does a chunk of what you want though.

Collapsible images inside pdf

Is it possible to create a collapsible image (similar to reddit) inside a PDF document?
The book is around 500 pages and has many large images. Hyperlinking wouldn't work very well because it will have to lead the reader to another page. Going back and finding where you stopped can be annoying.
If this isn't possible, I'm open to other suggestions. The goal is streamlined reading and viewing images.

Optimize display of a large number of images (1000+) for performance and ease of use in a web application

In our web application, the users need to review a large number of images. This is my current layout. 20 images will be displayed at a time, with a pagination bar above the thumbnails. Clicking a thumbnail will show the enlarged image to the left. The enlarged image will follow the scrollbar so it's always visible. Quite simple actually.
I was wondering what the best interface would be in this scenario:
One option is to implement an infinite scroll script which will lazy load thumbnails as the user scrolls. The thumbnails not visible will be removed from the DOM. But my concern with this approach is the number of changes in the DOM slowing down the page.
Another option could be something like Google's Fastflip.
What do you think is the best approach for this application? Radical ideas welcomed.
I think the question you have to ask is: what action is the user supposed to take? What's the purpose of the site?
If "review images" entails rating every image, I'd rather go with a Fastflip approach where the focus is on the single image. A thumbnail gallery will distract from the desired action and might result in a smaller amount of pics rated/reviewed.
If the focus should rather be on the comparison of a given image against others, I'd say try the gallery approach, although I wouldn't implement an infinite scroll with thumbnails because the user can quickly get lost in the abundance of choices. I think a standard pagination (whether static or ajaxified) would be better if you choose to go this route.
Just my 2c.
If you paginate thumbnails, you can pre-generate a single image containing all thumbnails for each page, then use an image map to handle mouseover text and clicking. This will reduce the number of HTTP requests and possibly lead to fewer bytes sent. The separation distance between images should be minimized for this to be most efficient. This would have some disadvantages.
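A minimal sketch of the image map part, assuming the thumbnails on a page all have the same size and were pasted into the combined image row by row. The function name and parameters are made up for illustration; building the combined image itself is left to your image library.

```python
from html import escape


def sprite_areas(items, thumb_w, thumb_h, columns, gap=0):
    """Return <area> tags for thumbnails packed row by row into one sprite.

    items is a list of (href, title) pairs in the order the thumbnails
    were pasted into the combined image. Image map coordinates are in
    pixels from the top-left corner of the image.
    """
    tags = []
    for i, (href, title) in enumerate(items):
        row, col = divmod(i, columns)
        left = col * (thumb_w + gap)
        top = row * (thumb_h + gap)
        coords = f"{left},{top},{left + thumb_w},{top + thumb_h}"
        tags.append(
            f'<area shape="rect" coords="{coords}" '
            f'href="{escape(href, quote=True)}" '
            f'title="{escape(title, quote=True)}" '
            f'alt="{escape(title, quote=True)}">'
        )
    return tags
```

The tags go inside a map element referenced by the usemap attribute of the combined image. A gap of 0 matches the advice above about keeping the separation between thumbnails small.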
To reduce image download size at the expense of preprocessing, you can try to save each image in the format (PNG or JPG) most efficient for its contents using an algorithm like the one in ImageGuide. Similarly, if the images are poorly compressed (like JPEGs from a cell phone camera), they can be recompressed at the cost of some quality.
Once the site has some testers, you can analyze patterns in which images tend to be clicked (if a pattern exists) and preload the full-size images, or even pre-load all of them once the thumbs are loaded.
You might play with JPEG2000 images (you did say "radical ideas welcomed"), which thumb very easily, because the thumbnail and main image needn't be sent as if they are separate files. This is an advantage of the compression format -- it isn't the same as the hack of telling the browser to resize the full size image to represent its own thumbnail.
You can take a look at Google's WebP image format.
On the server side, consider a separate image server optimized for static content delivery, perhaps using nginx or the Tux web server.
I would show the thumbnails, since the user might want to skip some of the pictures. I would also stay away from pagination in the form of
<<first <previous n of x next> last>>
and go for something easier to implement and more efficient: a
load x more pictures.
No infinite scroll whatsoever and, why not, even no scroll at all. Just load x more, previous x.
Although this answer might be a bit unradical and boring, I'd go with exactly your suggestion of asynchronously loading the thumbnails (and of course the main picture) when they come into view. A similar technique is used by Google+ in the pane for adding people to circles. This way, you spend server resources and bandwidth only on the pictures that the client actually needs. As Google+ shows, the operations on the DOM tree are fast enough and don't slow down a computer from the past few years.
You might also prebuild a few rows of the thumbnail table ahead of time with a dummy image (e.g. a "loading circle" animated GIF) and replace the image later. That way, the table in view is already built and does not need to be re-rendered, as the flow elements following the table would have to be if no images were in there during scrolling.
Instead of paginating the thumbnails (as suggested by your layout scheme), you could also think about letting users filter the images by tag, theme, category, size or any other way to find their results faster.

Resources