Bash for splitting and auto-cropping PDF files

I have a long PDF file made up of many scanned pages. Each page contains a small ticket that I would like to extract and save to a separate file.
I assume I need to split the PDF into individual pages and auto-crop each one so that only the ticket remains.
Is there a bash script (or another tool) that could help with this?
Regards

Assuming you are cropping in the same place on each page, we can use cpdf to do this:
First, crop all the pages...
cpdf -crop "200 300 150 200" in.pdf -o out.pdf
Then, split them out...
cpdf -split out.pdf -o file%%%.pdf
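If you need to do this regularly, the two cpdf steps can be wrapped in a small bash function. This is a sketch, not part of the original answer: the function name extract_tickets, the intermediate file cropped.pdf, and the DRY_RUN switch (which prints the commands instead of running them, so you can check the crop box first) are my own assumptions. The box uses the same "x y w h" form as above, in PDF points (1/72 inch).

```shell
#!/usr/bin/env bash
# Crop every page of a PDF to the same box, then split it into one file per page.
# Assumes, like the answer above, that the ticket sits at the same position on every page.

# Hypothetical convenience: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# extract_tickets INPUT.pdf "X Y W H" OUTDIR   (hypothetical helper name)
extract_tickets() {
  local input=$1 box=$2 outdir=$3
  local cropped="$outdir/cropped.pdf"   # intermediate file, kept next to the tickets
  run mkdir -p "$outdir"
  # Step 1: apply the same crop box to all pages
  run cpdf -crop "$box" "$input" -o "$cropped"
  # Step 2: one file per page; cpdf replaces %%% with a zero-padded number (001, 002, ...)
  run cpdf -split "$cropped" -o "$outdir/ticket%%%.pdf"
}
```

For example, DRY_RUN=1 extract_tickets in.pdf "200 300 150 200" tickets prints the three commands; drop DRY_RUN=1 to actually run them.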

Related

Image compression for website

I develop a website that uses a lot of images, and it's starting to get really slow. On some pages I have to show hundreds of images, so it gets messy.
I have a folder structure with one folder per ID, each containing 4-5 images, and each image is around 300 KB. I realize that isn't compressed enough for the web.
I need some tips for compressing all these pictures. The problem with the folder structure is that I would have to open each folder and replace the pictures inside; I can't just compress them all at once with a compression program.
Also, is there a way to compress the pictures on the server side before they are sent to the client?
I'm definitely not an expert in image compression, so I need a lot of help!
Thanks
If you are using macOS or Linux, you can use ImageMagick's convert on the command line:
# single image (overwrites the file in place)
convert -quality 70 image.jpg image.jpg
# every JPEG in the current folder
for f in *.jpg; do convert -quality 70 "$f" "$f"; done
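The loop above only handles the current folder, while the question has one folder per ID. A minimal recursive variant, assuming ImageMagick's convert is on the PATH (ImageMagick 7 prefers the magick command): it uses find to visit every subfolder and recompresses each .jpg/.jpeg file in place. The helper name recompress_tree and the DRY_RUN switch are hypothetical conveniences, not part of the answer.

```shell
#!/usr/bin/env bash
# Recompress every JPEG below a directory tree, in place, with ImageMagick's convert.
# WARNING: this overwrites the originals, so run it on a copy first.

# Hypothetical convenience: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# recompress_tree DIR [QUALITY]   (hypothetical helper name; quality defaults to 70 as above)
recompress_tree() {
  local dir=$1 quality=${2:-70} f
  # -print0 with read -d '' keeps file names containing spaces intact;
  # -iname matches .jpg, .JPG, .jpeg and .JPEG alike.
  while IFS= read -r -d '' f; do
    run convert -quality "$quality" "$f" "$f"
  done < <(find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -print0)
}
```

For example, DRY_RUN=1 recompress_tree images 70 lists every file that would be rewritten without touching anything.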
On Windows you will need to download an external batch-editing program (ImageMagick also has a Windows build).
Also, if you have a lot of images on a single page, you could use the 'lazy loading' technique: load parts of the page on demand. You load a couple of images when the page loads, and when the user scrolls down, an Ajax request fetches the next chunk of images.
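Give the images some bash love:
find image/path -type f -iname '*.png' -exec sips -s formatOptions 40 {} \;
This goes through every subfolder under image/path and recompresses each matching file in place. Note that sips ships only with macOS, and the percentage form of formatOptions is a JPEG quality setting, so point the command at JPEG files if your PNGs do not shrink.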

ghostscript option to make a pdf with flattened images

Is there a Ghostscript option to print a PDF as images?
I can use:
gs -dNOPAUSE -dBATCH -sDEVICE=pngalpha -r300 -sOutputFile=p%03d.png my.pdf
Then use imagemagick to make a pdf out of them with:
convert *.png new.pdf
PDF printers seem to have an option that does the same thing: a checkbox labelled "print as image". I could not find anything in the Ghostscript docs that sounded like that option. There may be a term for it that I just don't know to look for.
It is hard to explain why anyone would want to take a text PDF and turn it into a PDF of images of that text, four times the size of the original, but that is what I want to do.
Currently the only way to do that is to start with a PDF that contains transparency operations and select a CompatibilityLevel of 1.3 or lower: PDF 1.3 has no transparency, so pdfwrite has to flatten the transparent content into images.
I have an idea to implement this feature, but I have not had time to work on it.
You can do it as a two-pass approach: use Ghostscript to render an image, then use the view* scripts (such as viewjpeg.ps) to read the image back into Ghostscript and produce a PDF. No better than using convert, of course.
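The render-then-rebuild approach from the question (Ghostscript to PNG, ImageMagick back to PDF) can be wrapped so the intermediate PNGs go to a temporary directory instead of the current folder, where convert *.png would also pick up unrelated images. A sketch using the same device and options as the question; the function name pdf_to_image_pdf and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Two-pass "print as image": rasterize each page with Ghostscript,
# then rebuild an image-only PDF from the PNGs with ImageMagick.

# Hypothetical convenience: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# pdf_to_image_pdf IN.pdf OUT.pdf [DPI]   (hypothetical helper name)
pdf_to_image_pdf() {
  local in=$1 out=$2 dpi=${3:-300} tmp status=0
  tmp=$(mktemp -d) || return 1
  # Pass 1: one PNG per page; %03d zero-pads the page number (p001.png, p002.png, ...)
  # so the glob in pass 2 expands in page order.
  run gs -dNOPAUSE -dBATCH -sDEVICE=pngalpha -r"$dpi" -sOutputFile="$tmp/p%03d.png" "$in" || status=$?
  if [ "$status" -eq 0 ]; then
    # Pass 2: stitch the page images into a new PDF.
    run convert "$tmp"/p*.png "$out" || status=$?
  fi
  rm -rf "$tmp"   # always clean up the intermediate images
  return "$status"
}
```

For example, pdf_to_image_pdf my.pdf new.pdf 300 reproduces the question's two commands without leaving p001.png and friends behind.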

Extract 2nd page of each document and merge into a single document with Ghostscript

I have a set of pdf files from which I would like to:
extract the 2nd page of each
merge all the 2nd pages into a single document
I know how to do each of these independently with Ghostscript (generating a bunch of temporary 1-page PDF files on the way), but is there any way to do it in one command?
What have you tried?
Provided you want the same page(s) from every file then this:
gs -sDEVICE=pdfwrite -o out.pdf \
-dFirstPage=2 -dLastPage=2 \
input1.pdf input2.pdf
should work.
Please note that my usual caveats apply; pdfwrite is not 'manipulating' the source PDF files, it is fully interpreting them to produce lists of drawing primitives, which are then reassembled to form a brand new PDF file. At no point are you 'extracting' or 'merging' PDF files, the content of the output file(s) bears no relation, other than visual appearance, to the input file(s).
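To apply this to a whole folder without typing every file name, here is a sketch that collects the PDFs with a bash glob and passes them to the same gs command. The helper name second_pages and the DRY_RUN switch are assumptions; like the answer, it relies on -dFirstPage/-dLastPage being applied to each input file in turn.

```shell
#!/usr/bin/env bash
# Collect page 2 of every PDF in a directory into one output file, in name order.

# Hypothetical convenience: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# second_pages SRC_DIR OUT.pdf   (hypothetical helper name)
second_pages() {
  local src=$1 out=$2 f
  local files=()
  for f in "$src"/*.pdf; do
    [ -e "$f" ] && files+=("$f")   # skip the literal pattern when nothing matches
  done
  if [ "${#files[@]}" -eq 0 ]; then
    echo "no PDF files in $src" >&2
    return 1
  fi
  # Same options as the answer: page 2 only, from each input file.
  run gs -sDEVICE=pdfwrite -o "$out" -dFirstPage=2 -dLastPage=2 "${files[@]}"
}
```

Write the output outside SRC_DIR (for example second_pages scans merged.pdf); otherwise a second run would pick up the previous output as an input.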

Preserving page dimensions when converting PDF to TIFF with Ghostscript

I'm converting a folder full of print-ready PDFs into 600 dpi TIFFs, using CCITT Group IV compression (bitonal) on the TIFFs (one TIFF per page). My problem is that the PDFs, which begin with a page dimension of 9x6 inches, are converted into 8.5x11 inch TIFFs (5100 x 6600 px at 600 dpi). Here is the command I'm using to convert PDFs to TIFF files (using bash in Mac OS X):
for folder in $(find * -maxdepth 0 -type d ); \
do gs -dBATCH -dNOPAUSE -q -sDEVICE=tiffg4 -r600 "-sOutputFile=$folder/tiff/%04d.tif" "$folder/pdf/$folder.pdf";
done;
Is there a way to preserve the original page dimensions in my output files?
Thanks in advance!
Ghostscript preserves the media size of the PDF when creating the TIFF files, so if it's not what you expected, then either it's a bug (you don't say which version of GS you are using, so it might be something that has since been fixed) or, more likely, the PDF file has a CropBox that is different from the MediaBox. Screen viewers tend to use the CropBox, while Ghostscript defaults to the MediaBox (because it is at heart a printing application).
You can use the -dUseCropBox switch to have Ghostscript use the CropBox instead, if this is the problem. If it isn't I'd need to see a specimen PDF file. Probably the easiest way is to open a bug report at bugs.ghostscript.com where you can attach a file.
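If the CropBox is the cause, the question's loop could be adjusted along these lines. This sketch adds -dUseCropBox, creates each tiff/ folder before Ghostscript writes into it, and iterates over */ instead of $(find ...) so folder names containing spaces survive. The folder layout (NAME/pdf/NAME.pdf) is taken from the question; pdfs_to_tiff and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Convert NAME/pdf/NAME.pdf into NAME/tiff/0001.tif, 0002.tif, ... (600 dpi, CCITT G4),
# sizing each page by its CropBox rather than its MediaBox.

# Hypothetical convenience: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# pdfs_to_tiff [ROOT]   (hypothetical helper name; ROOT defaults to the current folder)
pdfs_to_tiff() {
  local root=${1:-.} dir name
  for dir in "$root"/*/; do
    dir=${dir%/}          # drop the trailing slash from the glob match
    name=${dir##*/}       # folder name, which is also the PDF's base name
    [ -f "$dir/pdf/$name.pdf" ] || continue   # skip folders without the expected PDF
    run mkdir -p "$dir/tiff"
    run gs -dBATCH -dNOPAUSE -q -sDEVICE=tiffg4 -r600 -dUseCropBox \
      "-sOutputFile=$dir/tiff/%04d.tif" "$dir/pdf/$name.pdf"
  done
}
```

Run it from the same directory as the original loop with pdfs_to_tiff, or preview the commands with DRY_RUN=1 pdfs_to_tiff.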

How can I generate a thumbnail of a specific page in a PDF on the command line in OS X?

I need to be able to generate a png thumbnail of a specific page of a PDF document in OS X.
I can use 'qlmanage -p MyFile.pdf -o outputDir -s1000' to get a 1000-pixel wide PNG of the first page. This works perfectly, and is almost exactly what I need. The only missing piece is being able to specify a certain page number of the PDF.
Can this be done with qlmanage, or some other command-line utility?
ImageMagick ought to be able to help:
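convert -density 150 'MyFile.pdf[1]' -resize 1000x MyOutput.png
The number in brackets is the zero-based page index, so [1] selects the second page; the quotes stop the shell from treating the brackets as a glob pattern. -density 150 rasterizes the page at 150 dpi before -resize scales it to 1000 pixels wide. Enjoy!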
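ImageMagick's [N] suffix on a PDF file name is a zero-based page index, so it is easy to be off by one. A small sketch that takes a 1-based page number instead, renders at 150 dpi with -density, and scales to a given width (1000 px by default, matching the width in the question). pdf_page_thumbnail and DRY_RUN are hypothetical names; ImageMagick also needs Ghostscript installed to read PDFs.

```shell
#!/usr/bin/env bash
# Thumbnail one PDF page with ImageMagick, using a 1-based page number.

# Hypothetical convenience: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# pdf_page_thumbnail IN.pdf PAGE OUT.png [WIDTH]   (hypothetical helper name)
pdf_page_thumbnail() {
  local in=$1 page=$2 out=$3 width=${4:-1000}
  if ! [[ $page =~ ^[1-9][0-9]*$ ]]; then
    echo "page must be a positive integer" >&2
    return 1
  fi
  # Convert the 1-based page number to ImageMagick's 0-based index;
  # "WIDTHx" resizes to that width and keeps the aspect ratio.
  run convert -density 150 "${in}[$((page - 1))]" -resize "${width}x" "$out"
}
```

For example, pdf_page_thumbnail MyFile.pdf 2 MyOutput.png renders the second page at 1000 pixels wide.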
You can use Aspose.Pdf to generate a thumbnail (or image) of any page. It is very reliable and generates a perfect image (as good as Acrobat). The only downside is that it takes about 20 seconds to generate a single thumbnail, which is painfully slow. The code is as follows:
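using Aspose.Pdf;
using Aspose.Pdf.Devices;

// pageNum is 1-based: the Pages collection starts at 1
using (Document document = new Document(pdfPath))
{
    Page page = document.Pages[pageNum];
    document.RemoveMetadata();
    page.Flatten();
    // PngDevice takes integer pixel sizes, while PageInfo.Width/Height are doubles
    page.SendTo(new PngDevice((int)page.PageInfo.Width, (int)page.PageInfo.Height), pngPath);
}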
