I've got a hundred-page, high-quality PDF (66 MB) that needs to be converted to TIFF format (300 dpi, as high quality as possible :P).
I've tried ImageMagick/Ghostscript, JPedal, Poppler and XPDF, but they all produce different results due to the strange gradients in the PDF itself (blame the designers), and some actually take forever...
Does anyone know of any alternatives I can try?
Thanks in advance,
M.
The GIMP does a pretty good job reading PDF files. Here's a Windows installer.
The GIMP will load each page of the PDF into its own layer, so you'll have to export each one as a TIFF. Luckily, this can be automated. I have no experience with the linked script, but if it doesn't suit your needs then it shouldn't be hard to modify it or write a quick script of your own to export each layer as a TIFF image.
Try using the Adobe Reader to flatten your PDF to PostScript (newer versions do that from the command line, but since it's just one file, you can print to a PS file) before going through Ghostscript. That might remove some weird PDF issues.
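If you do go via Ghostscript for the final step, a minimal sketch of the PS-to-TIFF conversion would be something like the line below; the device choice, the alpha-bits settings and the file names are just assumptions (and on Windows the executable is gswin64c rather than gs):
gs -dNOPAUSE -dBATCH -r300 -sDEVICE=tiff24nc -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -sOutputFile=page_%03d.tif flattened.ps
tiff24nc writes 24-bit RGB TIFFs, one file per page thanks to the %03d in the output name, and -r300 gives you the 300 dpi you're after.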
As I understand it, a computer stores images as channels and pixels within those channels, and a pixel value is something like "00110101", which fills 8 bits of memory. I want to know where exactly those bits are stored in memory, and how I can perform operations on them.
Thanks!
Well, the standard book is Digital Image Processing by Gonzalez and Woods.
Another book, whose PDF you can pick up for free, is Image Processing in C by Dwayne Phillips - PDF here.
First, you need to get a decent C compiler and development system - personally I use Mac OSX, but I guess you would want Visual Studio free edition on Windows.
Then you need to get started with some simple reading and writing of files and memory allocation. I would go with greyscale images in the NetPBM format - probably just PGM files, described here - as they are the easiest. You can download the NetPBM programs, run them in a Windows Command Prompt to see how they work, and then try to implement them yourself in C. You can also download ImageMagick for Windows and try converting images from colour to greyscale and resizing them like this:
convert input.png -colorspace gray result.jpg
convert input.tif -resize 400x400 result.pgm
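If you want to try the same thing with the NetPBM tools themselves, the rough equivalents look like the lines below (a sketch, assuming the standard netpbm programs are on your PATH; the file names are just examples):
pngtopnm input.png | ppmtopgm > result.pgm
tifftopnm input.tif | pnmscale -xysize 400 400 > result.pnm
ppmtopgm does the colour-to-greyscale step and pnmscale -xysize scales the image to fit inside a 400x400 box, much like the ImageMagick commands above.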
When you have got that, I would move on to colour PPM format and then maybe PNG and/or JPEG. Remember there are libraries for TIF/JPEG/PNG/BMP so don't be afraid to use them.
Finally, move on to displaying images yourself with Windows GDI etc.
Come back to StackOverflow if you get stuck - questions are free!
tl;dr: it varies wildly with different encodings/filesystems/OSes/drivers.
Well, that depends on the image format. BMP is one of the easier formats; details on what these files look like can be found, for instance, on Wikipedia.
And to answer "where it's stored": it is stored on permanent storage (hard drive/SSD), and where exactly depends on the filesystem (FAT/NTFS/ext etc.).
When an image is to be displayed, it's read into memory, where it can be manipulated, and through some APIs this data can be put into a memory region specifically meant for displaying the current images on your screen.
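If you want to see those stored bytes yourself, a hex dump of a small BMP makes the layout visible: the 'BM' signature and the headers come first, followed by the raw pixel values. This is just an illustrative one-liner for Linux/macOS (use any hex editor on Windows); the file name is an example:
xxd image.bmp | head -n 8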
I use Abbyy FineReader for ScanSnap to OCR a couple of scanned PDF files. The software claims it retains the original PDF images. The PDF file sizes pre-OCR and post-OCR are almost identical, which is good.
After the software is done, all PDF images appear anti-aliased in Acrobat X. Page navigation is much slower than before, and when I zoom in/out, the images first go to what looks like the pre-anti-aliasing version before quickly changing to anti-aliased images.
Left: Scanned PDF / Right: after OCR with Abbyy
I would like to get the original images without anti-aliasing back. Interestingly, when I open a single page from the anti-aliased PDF in Photoshop, there is no anti-aliasing and the image looks like the left one.
My limited PDF programming experience leads me to believe that Abbyy likely sets some kind of anti-alias flag for each image during OCR processing. How do I un-set this flag?
Any pointers to useful ideas would be much appreciated.
After the software is done, all PDF images appear anti-aliased in Acrobat X. Page navigation is much slower than before, and when I zoom in/out, the images first go to what looks like the pre-anti-aliasing version before quickly changing to anti-aliased images.
Actually, the original file 2013_11_15_22_51_31.pdf contains a JPEG image, while the OCR'ed file 2013_11_15_22_51_31_OCR.pdf contains a JPEG2000 image.
Comparing them in third-party viewers, it becomes clear that the image in the OCR'ed file is not inherently anti-aliased. Furthermore, there is no evident flag in the PDF instructing PDF viewers to apply anti-aliasing to the JPEG2000 image. Thus, Adobe Reader seems to render JPEG and JPEG2000 images differently by default, applying anti-aliasing to the latter but not to the former.
Comparing both images in detail, though, it becomes clear that they are not identical; instead, the image in the OCR'ed PDF is slightly rotated.
I assume Abbyy FineReader recognized that the original scanned image is not correctly oriented. Thus, it rotated it slightly to correct this orientation.
Thus, replacing the image in the OCR'ed version with the one from the original file is not an option: due to the rotation, the OCR information would be partially off.
What you might want to try is to recode the JPEG2000 image to JPEG and replace the image in the OCR'ed version with this recoded one. This will mean some loss of quality but most likely you can get rid of the anti-aliasing this way.
Be aware, though, that the JPEG2000 image is slightly larger than the JPEG image to accommodate the rotation.
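One hedged way to attempt that recode is to push the OCR'ed file through Ghostscript's pdfwrite device and force DCT (JPEG) compression for the images; the switches below are standard distiller parameters, but the quality/size trade-off and whether the OCR text layer comes through untouched are things you would have to verify yourself:
gs -o recoded.pdf -sDEVICE=pdfwrite -dAutoFilterColorImages=false -dColorImageFilter=/DCTEncode -dAutoFilterGrayImages=false -dGrayImageFilter=/DCTEncode 2013_11_15_22_51_31_OCR.pdf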
PS: As @VadimR pointed out, there is indeed an /Interpolate true entry in the image dictionary of the OCR-ed version, which I missed when looking at the file. It does not seem to be the major issue slowing down the rendering, though.
There is an /Interpolate true entry in the image dictionary of the OCR-ed version, and that's what causes the 'anti-aliasing'. Whether that (and not JPEG2000 instead of JPEG compression) is also the cause of the slow-down is something you can check on large enough files.
To un-set this key, the best approach would be to turn it off while creating the file; if that's not possible, write and run a small program in a suitable language.
But, since your file doesn't sport 'compressed objects' and the offending key is in plain view inside the file, in the spirit of 'job done quickly' you can simply process the file directly, overwriting the key with the same number of space characters so that the byte offsets in the cross-reference table stay valid, e.g. like this:
perl -M-encoding -0777pe "s!/Interpolate true!' 'x17!ge" <in.pdf >out.pdf
I am printing a monochrome image on a portable thermal printer (model: Brother PJ-623) from a Raspberry Pi. The image is stored in EPS format.
Problem: it takes ~9-12 seconds before printing starts. I think the issue may be the format of the image. Is there any format the printer can read faster than EPS?
Or is there any other possible solution?
EDIT: I'm using gnuplot in Qt-creator to create the image file.
Any help would be highly appreciated.
Best regards,
Hammad Tariq
I tried other formats like PNG and PDF using the following command in gnuplot:
set term png size H,W
where H and W were the resolution of the generated image.
But that did not help at first, as the print contained distortions. It turned out the problem was the dpi.
So after some research I found the following link:
GNUPLOT pdf terminal
I found the following commands using the above link:
set terminal postscript enhanced color
set output '| ps2pdf - plot.pdf'
It converts the PS output to PDF, and the PDF gives the same result as the PS file did.
And it took the printer only 3 seconds to read it.
I hope it would help someone in future.
Regards,
Hammad
If your EPS is vector-based, you don't really have a choice of format. The only other option is a PDF, which is likely to be less efficient than an EPS. It's likely that the driver you're using is not very efficient at printing vector graphics. You can do a test by converting the image to a high-resolution JPEG and seeing if there's a difference in print time.
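For that test, an ImageMagick one-liner along these lines should be enough (it needs the Ghostscript delegate to rasterise the EPS; the 300 dpi and the file names are just assumptions):
convert -density 300 input.eps -quality 95 test.jpg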
However, if you're using the EPS format for raster images, then you're better off using a high-quality JPEG file anyway. EPS is inefficient for storing pixel information compared to TIFF and JPEG.
Of course I have to wonder what's so bad about a 10 second print time. Is it just a curiosity thing, or is it critical that these print super fast?
Is there any way to stamp or overlay a TIFF image on an existing PDF file and output the result using Ghostscript?
I have two PDFs which I want to merge into one result PDF, one on top of the other, using Ghostscript. I want to know if this can be done and how, or whether it would work with a TIFF image stamped on top of the base PDF.
Can ghostscript make this stamp using layers in the PDF?
Thank you for your answers
The pdfwrite device in Ghostscript doesn't really support layers, so you can't use that. Also, it's unclear why you think layers would help.
TIFF isn't part of PostScript (or PDF), so you can't directly read a TIFF file into GS. I have elsewhere posted a PostScript program which reads TIFF files and renders them for output. You could use that to read a TIFF file.
However, you would have to mess about with either the PDF interpreter or a custom EndPage procedure in order to read and render the TIFF file. And unless you take specific kinds of action, it will be opaque, which may well not be what you want.
The Ghostscript PDF interpreter doesn't really lend itself to this kind of manipulation; have you considered using pdftk instead?
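If you do try pdftk, the usual pattern is to turn the TIFF into a one-page PDF first (ImageMagick will do that) and then stamp it on top; the file names here are placeholders:
convert stamp.tif stamp.pdf
pdftk base.pdf stamp stamp.pdf output stamped.pdf
pdftk's stamp operation puts the first page of stamp.pdf on top of every page of base.pdf (multistamp does it page by page), and as noted above the stamp will cover whatever is underneath unless it has transparent areas.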
What is the preferred way to convert various images, bitmap and vector, for use in a LaTeX and PDFLaTeX document?
There are many ways to do this, some make use of standard inclusions in the various LaTeX packages, others give better results.
You can include a PDF image directly into a LaTeX document if you want to produce your final output using pdflatex, but not if you want to produce a dvi file.
pdflatex can use PDF, PNG, and JPEG
latex/dvips can use PS, EPS
See more details:
Including images in LaTeX files
Watch what you name graphics files in LaTeX
I convert bitmaps into PNG, and vector graphics (e.g. SVG) into PDF. pdflatex understands both PNG and PDF.
If you have an image "as PDF", and you don't want to include it as PDF, you may want to extract the complete image data first with pdfimages. Other conversions may render the image at reduced resolution.
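For concreteness, this is roughly what that looks like on the command line (rsvg-convert comes with librsvg and pdfimages with poppler-utils; the file names are just examples, and -all needs a reasonably recent poppler):
rsvg-convert -f pdf -o figure.pdf figure.svg
pdfimages -all input.pdf extracted
The first line is one way of turning an SVG into a PDF; the second pulls the embedded images out of a PDF in their native formats, using "extracted" as the output prefix.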
My current preferred way is to use bmeps and epstopdf, both included in MiKTeX, to generate PDF and EPS versions of a PNG.
In a file called convertimage.bat,
bmeps -p3 -c -e8f -tpng %1.png > %1.eps
epstopdf %1.eps
Use it by putting the script on your PATH and running convertimage.bat filenameminusextension.
Include in the documents using,
\begin{figure}[h]
\begin{center}
\includegraphics[scale=0.25]{path/to/filenameminusextension}
\caption{My caption here}
\label{somelabelforreference}
\end{center}
\end{figure}
I only use Encapsulated PostScript (.eps) figures (converting bitmaps with NetPBM first), since I always use dvips + ps2pdf anyway, and then I do \includegraphics{file}.
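In case it helps, the NetPBM step for a bitmap looks something like this (a sketch; pnmtops writes PostScript with a %%BoundingBox, and -noturn just stops it rotating landscape images):
pngtopnm figure.png | pnmtops -noturn > figure.eps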
As John D. Cook says, your available image formats depend on whether you are using latex or pdflatex.
I find ImageMagick a useful tool for converting images between formats. It handles bitmap images, plus PS/PDF/EPS (with Ghostscript) and a zillion others. Available through apt, MacPorts, etc.
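For the PS/PDF/EPS cases, remember to set a density before the input file, otherwise you get the default 72 dpi rasterisation; for example (the file names and resolution are arbitrary):
convert -density 300 figure.pdf figure.png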
I use a mac so I use GraphicConverter to load images and export as PDFs.
When I draw diagrams, I use Omnigraffle which lets me export as PDFs.
On Windows I used to use Visio, which supported EPS, and I had no problems embedding those either.
The basic issues are that a) you want to handle raster and vector images differently and b) this introduces potential pitfalls.
The "right" thing to do depends a bit on your final output.
If your final output is going to be a .pdf file, and you don't need pstricks or anything else that requires a pass through PostScript, these days you're probably better off just using pdflatex to produce the file directly.
In this case:
store all vector figures as .pdf
store all raster figures as .png (or jpeg if they were originally jpeg)
use graphicx package and \includegraphics{filename-without-suffix}
If you don't do the above, your raster figures may end up converted to JPEGs and gain compression artifacts. PNG is the best bet if you can choose the output format.
If you are headed for a .dvi file, you're going to want .eps for everything. (You can gzip these files as long as you generate a bounding-box file.)
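If you do go the gzipped-EPS route, one common trick is to save the %%BoundingBox line before compressing; the exact .bb naming convention depends on how your graphicx/dvips setup is configured, so treat this as a sketch:
grep -m1 "%%BoundingBox:" figure.eps > figure.eps.bb
gzip figure.eps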
If you're careful you can do both. I store all vector figures as (compressed) .eps because there are a few things .pdf can't do that .eps can. I store all raster figures as .png. Using make, I can have temporary copies of these canonical versions generated on the fly for .dvi or .pdf output as needed.
Someone above pointed out the filename issue. You want to avoid "." in the file names, and always leave off the suffix in the LaTeX file itself.
I always include images in PNG format.
If you compile your code with pdflatex, then you can also use \includegraphics to include images in PDF format (you have to include the graphicx package).