Unit of measure for image dimensions in AsciiDoc/Asciidoctor documents

Neither the docs nor the first three pages of Google search results indicate what unit of measure is used for image dimensions in AsciiDoc documents. This is a significant topic, because AsciiDoc converts both to HTML (where the native unit is the pixel) and to PDF (where the native unit is either cm or pt).
So, can someone answer and/or add a section to the documentation about the following:
When I write image::path/to/image.png["Caption",300,200], what are those 300 and 200? Pixels?
When I produce a PDF from the AsciiDoc document, what DPI does it use, if we are using pixels? Can I control it in some configuration setting?
If those dimensions are in some abstract units, then what are the widths of the HTML and PDF pages in those units?
Is there any way to make an output-agnostic AsciiDoc document with images? I don't need a pixel-perfect identical layout in both formats, but it would be great to at least preserve the relative sizes of images.
I am using asciidoctor -b html for HTML output and asciidoctor-fopub for PDF. I will not use asciidoctor-pdf until it supports at least the same feature set as fopub.
Thank you in advance.

For HTML, it's pixels: https://github.com/asciidoctor/asciidoctor/blob/master/lib/asciidoctor/converter/html5.rb#L520-L523
For PDF, I'm not really sure, but it probably depends on Apache FOP, as that's what's being used. The XSLT seems to indicate inches without any scaling. You'll have to dig into that whole mess to actually figure out what's being used and how to modify it.
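To make the unit question concrete, here is a minimal sketch of the arithmetic involved when a pixel dimension has to become a physical size on a PDF page. The function names and the DPI values used in the example (96 and 72) are assumptions for illustration only; they are not settings read from Asciidoctor, the DocBook XSLT, or Apache FOP.

```python
def px_to_inches(pixels: float, dpi: float) -> float:
    """Physical size in inches of `pixels` rendered at `dpi` dots per inch."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return pixels / dpi


def px_to_points(pixels: float, dpi: float) -> float:
    """Same size in PDF points (1 pt = 1/72 inch)."""
    return px_to_inches(pixels, dpi) * 72.0


def relative_width(pixels: float, container_px: float) -> float:
    """Image width as a fraction of a container width given in the same unit."""
    if container_px <= 0:
        raise ValueError("container_px must be positive")
    return pixels / container_px


if __name__ == "__main__":
    # A 300 px wide image, assuming two hypothetical DPI values.
    for dpi in (72, 96):
        print(dpi, "dpi:", px_to_inches(300, dpi), "in =", px_to_points(300, dpi), "pt")
```

The same 300 px image comes out at different physical widths depending on the assumed DPI, which is why a relative width (a fraction of the page or column) is the more portable way to think about preserving proportions across outputs.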

Related

Why are images in pdf sometimes sliced into multiple images?

I noticed that images are sometimes sliced up in PDFs.
Steps:
Insert an image with a high resolution (3000x1800) into a .docx
Use the "Microsoft Print to PDF" option of Word to convert to PDF
Extract all images with pdfimages or pymupdf
Result:
Image is sliced horizontally into three images
Questions:
What exactly happens in the transition from .docx to PDF (or, in general, in the process of producing a PDF) that makes the converter slice the image into three images instead of one?
Do the individual XObjects of the sliced images contain information saying that these three images originally belonged to one image?
How do I know how the images are sliced (horizontally / vertically)? And what if two images were originally inserted into the .docx file and both of them are sliced: can you tell whether slice x belongs to original image y or z?
So, as you have found out: because the code which generates the PDF chose to do so.
The technical reasons may vary. It could be that historically there were printers with only so much memory, which needed to receive limited-size images when printing, and someone writing the PDF export code in Microsoft Office at some point chose to apply this limit.
Anyway, technically, as put in the comments, an image in a PDF file can be composed of any number of smaller images tiled together.
Now, the second part, and your actual question: to know whether images in a PDF file belong together in a single original image, one would need a custom extractor tool that checks the geometry of all images in the document and finds which images have no margins or gaps between them. That would not be hard to do for well-behaved files (and we can't know whether files generated by MS Office are well behaved: there are ways to obfuscate image positioning by specifying it indirectly). The metadata in the image parts may or may not contain information that would allow one to recompose the original image; it is up to the code generating the PDF to include this metadata or not. But the geometry can't lie in this case: if the final document visually presents a single image, it is possible to detect that when fetching the images.
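The geometry check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the input is a plain list of image bounding boxes (x0, y0, x1, y1) on one page, which you would have to obtain from your PDF library of choice; the names `touches` and `group_slices` and the tolerance value are hypothetical, and real files may need a more forgiving comparison.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units


def touches(a: Rect, b: Rect, tol: float = 0.5) -> bool:
    """True if a and b share a full edge, i.e. they look like adjacent slices."""
    same_columns = abs(a[0] - b[0]) <= tol and abs(a[2] - b[2]) <= tol
    stacked = abs(a[3] - b[1]) <= tol or abs(b[3] - a[1]) <= tol
    same_rows = abs(a[1] - b[1]) <= tol and abs(a[3] - b[3]) <= tol
    side_by_side = abs(a[2] - b[0]) <= tol or abs(b[2] - a[0]) <= tol
    return (same_columns and stacked) or (same_rows and side_by_side)


def group_slices(rects: List[Rect], tol: float = 0.5) -> List[List[int]]:
    """Group indices of rectangles that abut each other (simple union-find)."""
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if touches(rects[i], rects[j], tol):
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(rects)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Three stacked slices of one picture plus one unrelated image (made-up numbers).
    boxes = [(0, 0, 300, 60), (0, 60, 300, 120), (0, 120, 300, 180), (0, 300, 200, 400)]
    print(group_slices(boxes))
```

Each resulting group is a candidate "original image"; slices that share the same left and right edges and stack vertically correspond to horizontal cuts, and the reverse holds for vertical cuts.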

Does AsciiDoc support layers or text on images?

I'm making a poster (sort of) and would like to do these things, but I'm not sure if AsciiDoc or AsciiDoctor can do them, and if so, how:
Background image that can be stretched to the poster's dimensions
A rectangle with some transparency and a border, basically a bright frame, with text in it.
An image with text in it.
Text inside an image inside a rectangle.
(Bonus question: Is it possible to free-form specify where something goes, e.g. x=80%, y = 20% for something in the top right corner?)
I'm not sure that it makes sense to use AsciiDoc to source poster output, as opposed to a desktop publishing tool or a graphics program.
But if you are converting to HTML, you should be able to accomplish most of this with clever sourcing and some CSS/JavaScript on the front end. That is, you can source some of the metadata you want to impose on the final image, then have front-end code do the manipulation and imposition. For instance, you can provide a caption, classes, a title, and other info in the source, but AsciiDoc is intentionally agnostic about how that stuff is handled in output.
However, unless you need to create these things as part of technical documents, especially ones that are built or generated repeatedly with automation, you're likely better off with a specialized tool.

Difference between text as image and graphics as image

The question may seem weird, but I need to ask it, since I am seeing quite interesting output when I compare text stored as an image with graphics stored as an image.
I am in the process of identifying a tool or algorithm to compare two PDFs and generate output that highlights the differences between them.
Some PDFs contain text in image form (legacy text on paper that was converted to PDF).
We are migrating those legacy PDFs, and at the end we compare the legacy PDF with the converted PDF output.
I am evaluating a couple of tools, such as Adobe DC Pro, i-net PDFC, and Power PDF, for comparing two PDFs.
While evaluating, I can see that graphic images are compared (though not accurately) between the two PDFs, whereas text as image is completely ignored, with the same result in all the tools.
But I am more interested in text as image, since we mostly deal with legacy text PDFs.
Attached below is a graphic image comparison result, where the tool was able to capture the differences between the images.
But when I compare text images, the differences are not highlighted in the tool.
What I understand from this is that text is not compared as image graphics, and the tool completely skips the comparison. I would like clarification on whether my assumption is correct.
Secondly, I would like to know how to compare text images in PDFs to generate the differences.
I work for the company that develops i-net PDFC, so I'll answer your first question as well:
Your assumption is correct. i-net PDFC is able to compare images and shapes, but it cannot detect when some content has completely changed its meaning, e.g. a line shape that is used to draw a letter or, in your case, an image that has to be recognized as text. Recognizing ASCII art as an image won't work for the same reason either. Such cases will always be detected as differences even though their visual appearance is similar.
On your second question: Using an OCR conversion tool for one or both documents is a common solution to this problem. A simple image comparison of the compared pages is unlikely to work due to the different font styles and line wrappings in the converted file.
Please note that most OCR applications will use the rendered page images for the recognition. This may lead to incorrect recognition results even if there are no images in the PDF file.
i-net Software is aware of this general issue and an OCR module is currently in development. It'll provide an option to apply the recognition solely to the images in the PDF files.
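Once both documents have been run through OCR, the comparison itself can be done on the recognized text. The following is a minimal sketch, assuming the OCR output of each page is already available as a plain string (the OCR step itself is not shown, and `word_diff` is a hypothetical helper name). It compares word by word so that different line wrappings in the converted file do not show up as differences.

```python
import difflib
from typing import List, Tuple


def word_diff(legacy_text: str, converted_text: str) -> List[Tuple[str, str, str]]:
    """Return (operation, legacy words, converted words) for every difference.

    Splitting on any whitespace makes the comparison insensitive to line
    wrapping and repeated spaces, which typically differ after conversion.
    """
    a = legacy_text.split()
    b = converted_text.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


if __name__ == "__main__":
    # Made-up sample strings standing in for OCR output of one page each.
    legacy = "Payment is due within\n30 days of the invoice date."
    converted = "Payment is due within 60 days of the invoice date."
    for change in word_diff(legacy, converted):
        print(change)
```

OCR misreads will appear as differences too, so in practice some tolerance or manual review of the reported changes is probably still needed.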

AsciiDoc: How can I place graphical hints on an image

I am using AsciiDoc with Asciidoctor Gradle Plugin to generate technical documentation as PDF.
When I used M$ Word, I could easily place forms on an image, for example
colored rectangles,
boxes with numbers or
even links to sections within the document,
to better point out interesting areas within the image.
Example:
On the example image I have placed two rectangles, each containing a link (starting with the word «Dialogbereich») that leads to another section within the document.
Is it possible to achieve something like this (directly) in AsciiDoc?
Note that the answers to asciidoc: how to add callouts asciidoc to image do not apply here as the Asciidoctor PDF backend does not use DocBook to generate the PDF.
I know I could create a layered image in GIMP to at least place the rectangles. However, that wouldn't help me with the links.

Mosaicking PDF documents?

I have (or, rather, will soon have) a number of maps created in ArcGIS 10.0 and exported as PDF documents. The maps all show contiguous areas, being rather like the pages in a map book. There will also be a smaller-scale map depicting the entire area (let's call it the "study area"), but with less detail, rather like that page of a map atlas that shows what page depicts what area.
I wonder if there is any way to create thumbnails of the larger-scale maps and mosaic them so as to create an index map of the study area. A user would then be able to see, for a particular point on the smaller-scale map, which of the larger-scale maps depicts that part of the study area. (And perhaps see that map by clicking on the larger map?) Does anyone have any ideas on how I can implement this? I would prefer exporting the maps in PDF format, but, if I can't do all of the above with PDF, then any other format to which a map can be exported from ArcGIS, such as JPG or TIF, will work.
You should be able to create a PDF which does this.
What you need to do is render each page to a small image.
Then collect each of these images and add them as a mosaic to an index page.
Then put links from each small image back to the original PDF page.
If the hierarchy was more than one level deep you could repeat the process.
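The layout part of those steps can be sketched independently of any particular PDF component. The sketch below only computes where each thumbnail (and its clickable link rectangle) would go on the index page; rendering the thumbnails and writing the links is left to whatever PDF library is used. The function name `index_layout`, the simple row-major grid, and the page numbering are assumptions for illustration. For maps of contiguous areas you would more likely position each thumbnail according to its geographic extent instead.

```python
import math
from typing import Dict, List


def index_layout(n_pages: int, page_w: float, page_h: float,
                 columns: int, margin: float = 0.0) -> List[Dict]:
    """Place n_pages thumbnails on a page_w x page_h index page in a grid.

    Returns one entry per detail page with the rectangle (x0, y0, x1, y1)
    that the thumbnail and its link back to that page would occupy.
    """
    if n_pages <= 0 or columns <= 0:
        raise ValueError("n_pages and columns must be positive")
    rows = math.ceil(n_pages / columns)
    cell_w = (page_w - 2 * margin) / columns
    cell_h = (page_h - 2 * margin) / rows
    cells = []
    for k in range(n_pages):
        row, col = divmod(k, columns)  # row-major order: an assumption
        x0 = margin + col * cell_w
        y0 = margin + row * cell_h
        cells.append({
            "target_page": k + 1,  # hypothetical numbering of the detail pages
            "rect": (x0, y0, x0 + cell_w, y0 + cell_h),
        })
    return cells


if __name__ == "__main__":
    for cell in index_layout(n_pages=6, page_w=600, page_h=400, columns=3):
        print(cell)
```

For a hierarchy deeper than one level, the same function could be applied again to the index pages themselves.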
You need a PDF component to do this. What you want in terms of features is something which does decent PDF rendering. It's an easy thing to do badly and a difficult thing to do well.
ABCpdf .NET does good quality rendering so it's what I would suggest, but then I would because I work on it. :-)

Resources