Converting TIFF to PDF with GraphicsMagick MediaBox / CropBox resolution - imagemagick-convert

We are currently converting TIFF files to PDF using GraphicsMagick. The TIFF comes from an eFax and has a pixel resolution of 1728x2200.
If you do the conversion with tiff2pdf, or open the file in Preview and export it to PDF, the result has a MediaBox of 612x792 points, which is what is expected.
However, GraphicsMagick generates a MediaBox of 1728x4400 and a CropBox of 610x792. It looks fine in a PDF viewer because the viewer uses the CropBox, but if you then feed the PDF to Ghostscript, the image does not fill the page; it appears as a small square inside the document.
The lazy solutions would be to switch to tiff2pdf or to add -dUseCropBox to our Ghostscript command, but I'd like to know which GraphicsMagick option produces a PDF with the correct MediaBox. It's as if it doesn't understand that the resolution is in pixels and not in points. Hope somebody has insights.
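For reference, the pixel/point relationship mentioned above can be sketched in a few lines of Python. This is only the arithmetic (1 point = 1/72 inch), not a GraphicsMagick fix. The 204 x 200 DPI values are an assumption chosen because they reproduce the 610x792 CropBox; the question does not state the resolution stored in the TIFF.

```python
def pixels_to_points(pixels, dpi):
    """Convert a pixel dimension to PDF points (1 point = 1/72 inch)."""
    return pixels * 72.0 / dpi


# Assumed fax resolution: 204 DPI horizontally, 200 DPI vertically.
width_pt = pixels_to_points(1728, 204)
height_pt = pixels_to_points(2200, 200)

print(round(width_pt), round(height_pt))  # 610 792
```

Treating the pixel counts as if they were already points (in effect 72 DPI) gives a page of 1728 points across, which matches the MediaBox width seen in the question.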

Related

converting pdf to image but after zooming in

This link shows how PDFs can be converted to images. Is there a way to zoom my PDFs before converting them to images? In my project, I am converting PDFs to PNGs and then using the Python-tesseract library to extract text. I noticed that if I zoom the PDFs and then save parts as PNGs, OCR provides much better results. So is there a way to zoom PDFs before converting them to PNGs?
I think that raising the resolution of your image is a better solution than zooming into the PDF.
Using pdf2image you can accomplish this quite easily.
Install pdf2image: pip install pdf2image
Then, in Python, convert your PDF into a high-resolution image:
from pdf2image import convert_from_path
pages = convert_from_path('sample.pdf', 400)  # 400 is the rendering resolution in DPI (default 200)
pages[0].save("sample.png")
By playing around with the DPI parameter you should get the result you desire.
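To see why raising the DPI has the same effect as zooming, here is a small sketch of the arithmetic. It does not call pdf2image; the function names and the Letter-sized page (612x792 points) are assumptions made for illustration.

```python
def rendered_size(width_pt, height_pt, dpi):
    """Pixel size of a page rendered at `dpi` (1 point = 1/72 inch)."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


def dpi_for_zoom(zoom, base_dpi=200):
    """DPI that yields `zoom` times the pixels of a render at `base_dpi`."""
    return round(base_dpi * zoom)


print(rendered_size(612, 792, 200))  # (1700, 2200)
print(rendered_size(612, 792, 400))  # (3400, 4400)
print(dpi_for_zoom(2))               # 400
```

So a 2x zoom of the default render corresponds to passing 400 as the DPI value in the snippet above.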

Dicom image not converting with dcmtk

The image http://www.barre.nom.fr/medical/samples/files/MR-MONO2-16-head.gz on http://www.barre.nom.fr/medical/samples/ is not converting to other image formats. I tried the following commands (after extracting the DICOM file):
dcm2pnm --write-png MR-MONO2-16-head out.png
dcm2pnm +obr MR-MONO2-16-head out.bmp
dcm2pnm MR-MONO2-16-head out.pnm
It also did not work with dcmj2pnm and dcml2pnm. All of them just produce a gray box. The image is otherwise OK and is read correctly by proper DICOM viewers. Where is the problem and how can it be solved?
The problem is that no windowing settings are present in the header. You need to instruct dcm2pnm to calculate a window from the pixel data (+Wm uses the min-max algorithm, +Wh uses the histogram) or specify windowing values to apply.
dcm2pnm +Wm +obr MR-MONO2-16-head MR-MONO2-16-head.bmp
yields a bitmap image that looks fine to me.
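To illustrate what windowing does, here is a simplified Python sketch of a min-max window that maps raw pixel values to the 0-255 display range. It is a plain linear mapping written for illustration; it is neither DCMTK's implementation nor the exact DICOM VOI LUT formula, and the function names and sample values are made up.

```python
def apply_window(pixels, low, high):
    """Linearly map values in [low, high] to 0..255, clamping the rest."""
    if high <= low:
        raise ValueError("window must have positive width")
    out = []
    for value in pixels:
        clamped = min(max(value, low), high)
        out.append((clamped - low) * 255 // (high - low))
    return out


def min_max_window(pixels):
    """Use the smallest and largest pixel values as the window limits."""
    low, high = min(pixels), max(pixels)
    if low == high:
        return [0] * len(pixels)
    return apply_window(pixels, low, high)


samples = [0, 500, 1000]  # hypothetical 16-bit pixel values
print(apply_window(samples, 0, 65535))  # [0, 1, 3]
print(min_max_window(samples))          # [0, 127, 255]
```

With the full 16-bit range as the window, the three sample values land on almost the same output level; with a window computed from the data they spread over the whole display range.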

Create small high quality PDF embedding optimized PNG?

I'm trying to create a small PDF file, embedding one optimized PNG image displayed as a header and footer on a 3 page PDF (same image must appear 6x in the PDF)
My optimized PNG image is only 2.3KB. It looks very sharp.
Failed with LibreOffice
When I insert just one instance of the 2.3KB PNG image into a LibreOffice Writer doc containing only text and then export it as PDF, I can see that the image gets recompressed to JPEG and the resulting PDF file grows by about 40KB. It also loses quality: the PNG gets fuzzy JPEG edges.
If I right-click the image and select compression, there is no way to disable recompressing the image (it's already optimized better than LibreOffice could do it). I've tried setting a compression level of 0, 1, 9, etc., and choosing JPEG, no resize, lossless, etc., but there was no improvement.
Failed with wkhtmltopdf
I also made a test page and used wkhtmltopdf, but it did the same thing. Adding the low quality flag made no difference.
PDF Spec suggests PNG is supported?
From skimming the PDF spec, it looks like PNG images are supported.
Even plain text PDF files are surprisingly large
It is also disappointing that when I take a 7KB HTML file which is basically just <html><body><p>foo...</p><p>bar...</p> (only about 15 paragraphs) with no CSS, the resulting 2-page PDF file is 30KB. Why should a 7KB (almost plain text) file become 30KB as a PDF?
Suggestions?
Can someone please suggest how to make a small PDF file in Linux?
I need to include 7KB of text and repeat one PNG image 6 times.
Manually or programmatically. I'll take whatever I can get at this point.
PDF Spec suggests PNG is supported?
PNG isn't supported per se; PDF allows embedding JPEG images as-is, but not PNG images. PDF does borrow a set of features of the PNG format, however.
rinohtype (full disclosure: I'm the author) tries to embed as much as possible from PNG images as-is into the PDF. This does involve some bit-juggling to separate the alpha channel from the color data for example, but no reencoding of the image is performed. It does not (yet) support interlaced PNGs.
rinohtype should be able to do what you want to achieve. But please note that it currently is in a beta stage, so you might encounter some bugs.
Even plain text PDF files are surprisingly large
To keep the PDF size as small as possible, make sure not to embed/subset any of the fonts. Use only the fonts from the base 14 PDF fonts which are provided by PDF readers.
What you want is certainly achievable. Regarding the image quality, I would recommend making your image twice the size that you want it to actually display at in the PDF to keep it looking sharp.
As to the size, I've just modified a test in my PDF writer module (a work in progress) to include a 7.2K PNG, 200px x 70px, in a PDF twice, and the PDF came out at 6.8K. There's not much text included, but more text will only add its own size plus a small percentage.
You can see the module and original test here: https://github.com/DoccaPDF/docca-pdf-writer/blob/master/src/tests/writer.js#L40
That test adds ~112K of images to the PDF and results in a 103K PDF.
Of course not all images are created equal, so your mileage may vary.
*The images are only actually added to the PDF once, but are displayed multiple times.
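As a rough illustration of the two points made in these answers (an image stored once but drawn several times, and text set in a base 14 font that needs no embedding), here is a minimal hand-written sketch in Python. It is not how rinohtype or docca-pdf-writer work internally. The 2x2 grayscale image, the object layout, and the name build_pdf are assumptions made for the example, and the image is stored as raw Flate-compressed samples rather than as a PNG file.

```python
import zlib


def build_pdf():
    """Build a one-page PDF that stores one image and draws it twice."""
    # Hypothetical 2x2 grayscale image: one byte per pixel, row by row.
    image_data = zlib.compress(bytes([0, 255, 255, 0]))

    # Page content: draw the same image object (/Im0) as header and footer,
    # then one line of text in Helvetica, one of the base 14 fonts.
    content = (
        b"q 100 0 0 100 50 680 cm /Im0 Do Q\n"
        b"q 100 0 0 100 50 20 cm /Im0 Do Q\n"
        b"BT /F1 12 Tf 50 400 Td (foo) Tj ET\n"
    )

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /XObject << /Im0 4 0 R >> /Font << /F1 5 0 R >> >> "
        b"/Contents 6 0 R >>",
        b"<< /Type /XObject /Subtype /Image /Width 2 /Height 2 "
        b"/ColorSpace /DeviceGray /BitsPerComponent 8 /Filter /FlateDecode "
        b"/Length %d >>\nstream\n" % len(image_data)
        + image_data + b"\nendstream",
        # No font program is embedded; the reader supplies Helvetica.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"endstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_position)
    return bytes(out)


pdf = build_pdf()
print(len(pdf), "bytes")
```

Writing the result with open("small.pdf", "wb").write(pdf) should give a file you can inspect in a viewer. The image data appears once in the file no matter how many times the page content refers to /Im0.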

Can a PNG image contain multiple pages?

On OSX I converted a multi-page PDF file to PNG and (somehow) it created a multi-page PNG file.
Is there an extension to the PNG format that allows this? Or is this not something I can validly create?
~~~~
To clarify, this is a PNG file, according to both the built-in file command and the identify command from ImageMagick.
$ file algorithms-combined-print.png
algorithms-combined-print.png: PNG image data, 1275 x 1650, 8-bit/color RGBA, non-interlaced
$ identify algorithms-combined-print.png
algorithms-combined-print.png PNG 1275x1650 1275x1650+0+0 8-bit sRGB 3.537MB 0.000u 0:00.000
And here is a pastebin of the command identify -verbose algorithms-combined-print.png: http://pastebin.com/hw1yuRKa
What is notable from that output is that the pixel count is Number pixels: 2.104M which corresponds to one page. However, the file size is 3.537MB, which is clearly sufficient to hold all the pages.
Per request, here is the output of pngcheck: http://pastebin.com/aCRMEd9L
PNG does not support "multipage" images.
MNG is a PNG variant that supports multiple images, mostly for animations, but it's not a real PNG image (different signature/header) and has never become popular.
APNG is a similar attempt, but more focused on animations. It's more popular and alive, though less official. It's also PNG compatible (a standard PNG viewer, unaware of APNG, will display it as a single PNG image).
Another possible explanation is that your image is actually a TIFF image with a wrong .png extension, and the viewer ignores the extension.
The only way to know for sure is to look inside the image file itself (at least at the first bytes).
Update: given the pngcheck output, it seems to be an APNG file.
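If pngcheck is not at hand, looking inside the file can be sketched in Python: check the 8-byte PNG signature, then walk the chunks and look for acTL, the animation control chunk that APNG adds. The helper make_chunk and the tiny in-memory files exist only to make the example self-contained; they are not valid images.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk_types(data):
    """Return the chunk type names of a PNG byte string, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file (wrong signature)")
    types = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        types.append(ctype.decode("latin-1"))
        pos += 12 + length
    return types


def is_apng(data):
    """APNG files carry an acTL (animation control) chunk."""
    return "acTL" in chunk_types(data)


def make_chunk(ctype, payload=b""):
    """Build one chunk; used here only to fabricate test input."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


plain = PNG_SIGNATURE + make_chunk(b"IHDR", b"\0" * 13) + make_chunk(b"IDAT") + make_chunk(b"IEND")
animated = (PNG_SIGNATURE + make_chunk(b"IHDR", b"\0" * 13) + make_chunk(b"acTL", b"\0" * 8)
            + make_chunk(b"IDAT") + make_chunk(b"IEND"))

print(chunk_types(plain))                # ['IHDR', 'IDAT', 'IEND']
print(is_apng(plain), is_apng(animated)) # False True
```

For a real file, pass open(path, "rb").read() to is_apng. A TIFF with a .png extension would fail the signature check instead.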

JPEG Shows in Firefox but Not IE8

I'm working on a Sidebar Gadget and cannot get my JPEGs to show up (PNGs work). When I try to open the file by itself in IE8 it doesn't work. Firefox, of course, can open it fine.
JPEG Details:
Dimensions: 1080X900
180 dpi
Bit depth 24
Color representation: uncalibrated
I've found some things talking about the images being compressed incorrectly (?) but I haven't been able to get it working...
Any clues?
IE8 drops support for CMYK JPEG and renders them as the infamous red X without so much as a warning.
If you have ImageMagick:
identify -verbose image.jpg
will show you the image colorspace. If it's CMYK, you can convert to RGB with:
convert broken.jpg -colorspace RGB fixed.jpg
If you need to do CMYK to RGB conversion on a whole batch of JPEG-images, this command may be helpful to you:
for i in *.jpg; do convert "$i" -colorspace RGB "$i"; done
PS: If you'd like to see what is going on, just add -verbose:
for i in *.jpg; do convert "$i" -colorspace RGB -verbose "$i"; done
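If ImageMagick is not installed, a rough check is possible in plain Python by reading the component count from the JPEG start-of-frame segment. This is a heuristic sketch: four components usually means CMYK (or YCCK), three means YCbCr/RGB, one means grayscale. The function name and the fabricated header bytes are assumptions for the example, and the parser only handles the segments that normally precede the frame header.

```python
import struct

# Start-of-frame markers SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8), DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_component_count(data):
    """Return the number of color components declared in a JPEG header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        length = struct.unpack(">H", data[pos + 2:pos + 4])[0]
        if marker in SOF_MARKERS:
            if pos + 9 >= len(data):
                raise ValueError("truncated start-of-frame segment")
            # Payload: precision (1), height (2), width (2), components (1).
            return data[pos + 9]
        pos += 2 + length  # skip marker plus segment (length includes itself)
    raise ValueError("no start-of-frame segment found")


def fake_header(components):
    """Fabricated SOI + APP0 + SOF0 bytes for the demo; not a real image."""
    app0 = b"\xff\xe0" + struct.pack(">H", 16) + b"\0" * 14
    sof0 = b"\xff\xc0" + struct.pack(">HBHHB", 8 + 3 * components, 8, 900, 1080, components)
    return b"\xff\xd8" + app0 + sof0 + b"\0" * (3 * components)


print(jpeg_component_count(fake_header(4)))  # 4
print(jpeg_component_count(fake_header(3)))  # 3
```

For a real file, pass open("image.jpg", "rb").read(); a result of 4 suggests the image is worth converting as described above.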
I had a similar issue with IE8 not displaying two JPEG images. FF, Safari, Chrome all displayed them without complaint but IE acted as if the files were not there. I have no idea what was going on, but a quick image conversion to gif or png fixed the problem. Just another in a long line of confirmations that IE sucks.
I had similar problems with existing images that would not show up in IE8.
The problem is, as converter42 says, CMYK images.
Convert them to the RGB colorspace and all is good.
The PNG solution is not the best, because PNG files can be much larger than JPEGs.
If you are using Photoshop to create the JPEGs, try the following:
Open the file and go to 'Image' menu
Go to Mode
Select RGB
Save and upload to server.
This should work.
Why are you dealing with the image at 180 dpi and not the 72dpi screen resolution? At screen resolution the image will be roughly double that size. Still, the size is manageable for any browser.
When creating a gadget, you should be using PNGs for all the elements of the gadgets. Are you having issues displaying JPEG photos?
Have you looked for the yellow bar at the top of IE that blocks certain suspicious content from being loaded (popups, activex, javascript, etc.)? If it appears, try telling it to "allow".
Lastly, what are you using to compress your images to JPEG?
EDIT: If you want to do batch conversion, use the batch converter in Photoshop, or use the Actions panel to record the conversion process for a single image and then replay the action on an entire folder. Additionally, you can save this action to a "droplet", which is a small application containing the action that you can drop an image or folder onto.
Alternatively, if you don't feel like learning Actions, XnView is an excellent image viewer and converter that supports something like 160 different image formats and can batch convert and batch rename huge lists of files.
I fixed this issue by opening the CMYK JPEG file in Windows Paint and then saving as a JPEG, which Paint encodes as RGB by default. Not a great solution because I'm sure that Paint's converter is not as robust as Photoshop's, but this can be a quick fix if the job needs to be done now and there's no access to the tools above.
