Extract 2 PDF images and combine on one PDF page - imagemagick-convert

I have two scanned PDF pages with miscellaneous writings all around. There are two main segments in the middle of both pages that I want to extract and then place onto one page as a PDF image.
-------
|     |
| xxx |
| xxx |
|     |
-------
I need to extract just the middle portions marked by x's, then place those images one above the other to make a new PDF page. Because of the miscellaneous markings all around, I need to get as close as possible to the text boxes. The page size from identify is 612x786. If I know the best command to use, I can experiment to get as close as possible. I have no prior experience with ImageMagick and need to get this done as quickly as possible. Any help is greatly appreciated.

convert in1.pdf -crop widthxheight+woffset+hoffset! out1.pdf
convert in2.pdf -crop widthxheight+woffset+hoffset! out2.pdf
convert out1.pdf out2.pdf -append out3.pdf
The '!' sets the new canvas size to the widthxheight size.
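To make the geometry string less abstract, here is a small Python sketch that assembles the three commands above from crop boxes. The box values are made up for a 612x786 page and would need adjusting by trial and error; `crop_geometry` and `build_commands` are names invented for this illustration, not part of ImageMagick. Offsets are counted from the top-left corner of the page.

```python
import shlex

def crop_geometry(width, height, x_offset, y_offset):
    """Return an ImageMagick geometry string such as '400x150+106+200!'."""
    return f"{width}x{height}+{x_offset}+{y_offset}!"

def build_commands(boxes, inputs=("in1.pdf", "in2.pdf")):
    """Build one crop command per input plus the final -append command (argv lists)."""
    commands = []
    cropped = []
    for index, (pdf, box) in enumerate(zip(inputs, boxes), start=1):
        out = f"out{index}.pdf"
        commands.append(["convert", pdf, "-crop", crop_geometry(*box), out])
        cropped.append(out)
    # -append stacks the cropped images vertically, first one on top.
    commands.append(["convert", *cropped, "-append", "out3.pdf"])
    return commands

# Hypothetical boxes: (width, height, x offset, y offset) in pixels.
example_boxes = [(400, 150, 106, 200), (400, 150, 106, 420)]
for argv in build_commands(example_boxes):
    # shlex.join quotes the geometry so the shell does not interpret the '!'.
    print(shlex.join(argv))
```

The script only prints the commands, so you can inspect them before running anything.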

Related

PostScript/PCL - Get document page info: page size, bw/color

I need to determine document page information from a PostScript or PCL file. Preferably in Java, but Ghostscript/GhostPCL would be fine too.
Here is what I have tried so far to get the following information:
Page color
This can be achieved with ghostscript/ghostpcl using the device called inkcov.
PostScript
gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=inkcov -o- input.ps
PCL6
gpcl6win64 -dNOPAUSE -dBATCH -sDEVICE=inkcov -o- input.pcl
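As a rough sketch of how the inkcov output could be turned into a per-page colour/bw decision: the Python below assumes each page produces a line of four coverage fractions (C, M, Y, K) followed by `CMYK OK`. The sample text is made up, and `classify_pages` is a name introduced here for illustration.

```python
import re

# Assumed line shape: four fractions (C, M, Y, K) followed by "CMYK OK".
INKCOV_LINE = re.compile(
    r"^\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+CMYK\s+OK\s*$"
)

def classify_pages(inkcov_output):
    """Return 'color' or 'bw' for every coverage line found in the output."""
    result = []
    for line in inkcov_output.splitlines():
        match = INKCOV_LINE.match(line)
        if not match:
            continue  # skip banners, "Page N" lines and blank lines
        cyan, magenta, yellow, _black = (float(v) for v in match.groups())
        # Any cyan, magenta or yellow coverage means the page is not pure black.
        result.append("color" if (cyan or magenta or yellow) else "bw")
    return result

# Made-up sample output for a two-page document.
sample = """Page 1
 0.00000  0.00000  0.00000  0.04520 CMYK OK
Page 2
 0.10210  0.08330  0.00000  0.02110 CMYK OK
"""
print(classify_pages(sample))
```

One caveat: a grey that is rendered as a mix of C, M and Y would be reported as colour by this test, so treat the result as a first approximation.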
Page size
There is a device called bbox which gives me the bounding box per page for PostScript or PCL6 documents.
PostScript
gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=bbox -o- input.ps
PCL6
gpcl6win64 -dNOPAUSE -dBATCH -sDEVICE=bbox -o- input.pcl
But in the end the bounding box is only an inaccurate approximation of the page size.
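For reference, a minimal Python sketch for pulling the numbers out of the bbox output. It assumes the device prints one `%%BoundingBox: llx lly urx ury` line per page (Ghostscript usually writes these to stderr, so you may need to capture that stream); the sample text is made up, and the function names are mine.

```python
import re

# Matches only the integer form; %%HiResBoundingBox lines are deliberately ignored.
BBOX_LINE = re.compile(
    r"^%%BoundingBox:\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s*$"
)

def parse_bounding_boxes(bbox_output):
    """Return one (llx, lly, urx, ury) tuple per %%BoundingBox line, in points."""
    boxes = []
    for line in bbox_output.splitlines():
        match = BBOX_LINE.match(line.strip())
        if match:
            boxes.append(tuple(int(v) for v in match.groups()))
    return boxes

def box_size(box):
    """Width and height of the marked area, which is not the media size."""
    llx, lly, urx, ury = box
    return urx - llx, ury - lly

# Made-up sample: a little text near the top-left of a page.
sample = """%%BoundingBox: 72 650 300 720
%%HiResBoundingBox: 72.000000 650.500000 299.900000 719.800000
"""
for box in parse_bounding_boxes(sample):
    print(box, "->", box_size(box))
```

In this made-up case the marked area is only 228 x 70 points, which shows why the bounding box cannot stand in for the page size.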
I checked the following post, but the solution does not seem to work with my Ghostscript version 9.5:
Getting the page sizes of a PostScript document
The bbox device should provide accurate information; in what way is it inaccurate? I'd test it myself, but you haven't supplied a file to demonstrate this.
You need to bear in mind that it's possible some objects (e.g. images) might mark the page with white space. That still counts as marking the page for the purposes of the bbox device. If you want to only count non-white output samples, then you need to render the document (at the final resolution you intend to use) and actually count the non-white pixels. That's a potentially very slow operation because it needs to read every output colour sample of what could be a very large image.
It's not hard to code though, and you could use the inkcov device as a basis for doing both operations in the same pass.
Or you could just have GhostPDL deliver the rendered bitmap for you and code a solution to the bounding box using some other tool/language.
Ah, are you actually looking for the requested media size, rather than the Bounding Box? That's not the same thing at all. The bounding box returns the smallest rectangle which encloses all the marks on the output; it doesn't tell you how big the requested media was. So a small rectangle in the bottom left would give you a tiny BBox, even if the media was large.
You can reasonably easily get the media size requests from PostScript by writing a small PostScript program, but you can't do that with PCL. Perhaps the easiest solution in both cases is to render the content to a file at 72 dpi, then read the width/height of the rendered output, and that gives you the media size in points.
Or use the pdfwrite device to convert the input into PDF and then the pdf_info.ps PostScript program can be used to give you the sizes of the pages from the PDF file.
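The 72 dpi suggestion works because a point is 1/72 inch, so at 72 dpi one pixel corresponds to one point. If you render at another resolution, the conversion is a one-liner; here is a small Python sketch (the function names and sample sizes are just for illustration):

```python
def pixels_to_points(pixels, dpi):
    """Convert a pixel count at the given resolution to PostScript points (1/72 inch)."""
    return pixels * 72.0 / dpi

def media_size_points(width_px, height_px, dpi=72):
    """Media size in points for a page rendered to width_px x height_px at dpi."""
    return pixels_to_points(width_px, dpi), pixels_to_points(height_px, dpi)

# A US Letter page rendered at 72 dpi and at 300 dpi gives the same media size.
print(media_size_points(612, 792))
print(media_size_points(2550, 3300, dpi=300))
```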
Indeed I am looking for the requested media size, rather than the Bounding Box.
Maybe I should have been more specific.
Here is some ascii art to brighten up your day.
y
^
|
|
+-----------+
| +----+    |
| |bbox|    |
| +----+    |
|           |
|           |
|           |
|           |
|           |
+-----------+----> x
A simple document with some text in the upper left corner.
KenS: "The bounding box returns the smallest rectangle which encloses all the marks on the output, it doesn't tell you how big the requested media was."
So for the time being the "easiest" solution was really to transform the ps/pcl file into a pdf and read the media size from there.
Conversion to PDF
PostScript
gswin64c.exe -dBATCH -dNOPAUSE -dNOOUTERSAVE -sDEVICE=pdfwrite -sOutputFile=output.pdf input.ps
PCL6
gpcl6win64 -dBATCH -dNOPAUSE -dNOOUTERSAVE -sDEVICE=pdfwrite -sOutputFile=output.pdf input.pcl
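Once you have output.pdf, one quick and dirty way to read the media size is to scan the raw bytes for `/MediaBox` entries. The Python sketch below is only that: it assumes the page dictionaries are stored uncompressed in the file and that every page carries its own `/MediaBox` (a box inherited from a parent node would show up only once). If the scan finds nothing, a proper PDF library is the safer route. The sample bytes are made up.

```python
import re

MEDIABOX = re.compile(
    rb"/MediaBox\s*\[\s*(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s*\]"
)

def media_boxes(pdf_bytes):
    """Return (width, height) in points for every /MediaBox found in the raw bytes."""
    sizes = []
    for match in MEDIABOX.finditer(pdf_bytes):
        llx, lly, urx, ury = (float(v.decode("ascii")) for v in match.groups())
        sizes.append((urx - llx, ury - lly))
    return sizes

# Made-up fragment standing in for the contents of output.pdf.
sample = (
    b"<< /Type /Page /MediaBox [0 0 612 792] >>\n"
    b"<< /Type /Page /MediaBox [0 0 595.28 841.89] >>\n"
)
print(media_boxes(sample))

# With a real file (path is an example):
# with open("output.pdf", "rb") as handle:
#     print(media_boxes(handle.read()))
```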

Huge whitespace appearing when converting .png to .gif with bash convert

I have a directory called "plots_for_gifs", which contains 105 files whose names are identical except for their endings: ...000.png, ...001.png, and so on up to ...104.png. I am trying to convert them to a .gif using:
convert -density 150 -trim -delay 35 -loop 0 ./plots_for_gifs/*.png ./river_diff.gif
The image files are 491x411 pixels, however the gif produced is 7017x4958 pixels! Even though I am including "-trim", and the same occurs even if I add "-size 491x411"... any ideas?
I am running this in a bash shell in Ubuntu 16.04.3.
Mmmmm.... a couple of things.
You don't need -density at all with PNG files because it only sets the density to be used when rasterising vector files such as SVG. So, you can omit that.
If, as you say, your images are already the correct size, you don't need -trim. So, you can omit that too.
You don't need to prefix filenames with ./, as that just means "the current directory" which is the default anyway, so you can omit that.
Now to the actual problem. I guess your PNG files have been cropped from some larger images and have "remembered" their previous canvas size. The best way to make them forget, is to use +repage after loading them.
So, without seeing your files, I suspect you want something more like:
convert -delay 35 -loop 0 plots_for_gifs/*.png +repage river_diff.gif
If you find you do need -trim, add it into the above command before +repage.
If that doesn't work, please run the following command and paste the output in your original question - by clicking edit underneath it:
identify plots_for_gifs/*000.png
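If you want to check many files at once, here is a small Python sketch that looks at one line of identify output and reports a leftover canvas. It assumes the usual default layout (file name, format, pixel size, then page geometry such as 7017x4958+X+Y) and file names without spaces; the sample line and its offsets are made up, and `leftover_canvas` is just a name I picked.

```python
import re

# Assumed shape of one identify line: name, format, WxH, canvas WxH+X+Y, ...
IDENTIFY = re.compile(
    r"^(\S+)\s+(\S+)\s+(\d+)x(\d+)\s+(\d+)x(\d+)[+-]\d+[+-]\d+"
)

def leftover_canvas(identify_line):
    """Return (pixel_size, canvas_size) if the canvas differs from the pixels, else None."""
    match = IDENTIFY.match(identify_line)
    if match is None:
        raise ValueError("line does not look like identify output")
    width, height, canvas_w, canvas_h = (int(match.group(i)) for i in range(3, 7))
    if (canvas_w, canvas_h) == (width, height):
        return None
    return (width, height), (canvas_w, canvas_h)

# Made-up line for an image that still remembers a larger canvas.
sample = "plot_000.png PNG 491x411 7017x4958+3263+2273 8-bit sRGB 20.1KB 0.000u 0:00.000"
print(leftover_canvas(sample))
```

A result other than None suggests the file would benefit from +repage.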

Bash for splitting and auto cropping PDF files

I have a long PDF file made up of various scanned pages. On each page there's a small ticket that I would like to extract and save in a new, separate file.
I assume that I have to split the long PDF file into individual pages and auto-crop each one to keep only the small ticket.
Is there a bash script (or other tool) that could help me with that?
Regards
Assuming you are cropping in the same place on each page, we can use cpdf to do this:
First, crop all the pages...
cpdf -crop "200 300 150 200" in.pdf -o out.pdf
Then, split them out...
cpdf -split out.pdf -o file%%%.pdf

Add background to bitonal djvu file

I have a few black and white djvu files that I would like to add a few different background images to at random. This is to make it seem more book like and I think looks better.
Using the command line I can extract each image and then write some code to add the background however this bloats the file a lot because of duplication. I would like to add the background to the file once and then include it using the INCL chunk for the other pages. However it is very confusing how to do this through the DjvuLibre command set.
The current djvu file also has a text layer that I would like to extract and then reapply.
I wrote some code to automate the steps here.
Which are listed below:
In order to successfully add a background image to a foreground image, I have to follow these steps (using a DOS Cmd window):
1- extract the bitonal RLE image from the DjVu file
ddjvu -format=rle -v myfile.djvu myfile.rle
2- extract (or create) the background image. Be sure that the size of this image is equal to or greater than that of the foreground image, so that its dimensions are still integers after the reduction:
e.g. I have a 2592 x 3508 300dpi foreground image, and I want a background image of 100dpi. So I create a 2592 x 3510 100dpi image (I added 2 pixels to the height in order to have 3510 modulo 3 = 0).
After a 1/3 resampling, I have a 864 x 1170 image.
3- (do something with this background image) and save it as myfile.ppm (24 bits per pixel)
4- join the 2 images into a single file:
copy /b myfile.rle + myfile.ppm myfile.mix (using a brave old DOS command)
5- encode the new page into a DjVu file:
csepdjvu -vv -d 300 myfile.mix myNewFile.DjVu
Bingo: It works!!!
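The size arithmetic in step 2 can be written down as a tiny Python helper: pad each foreground dimension up to the next multiple of the reduction factor, then divide. This is only a sketch of that calculation; `background_size` is a name made up for the example.

```python
def background_size(fg_width, fg_height, factor):
    """Pad the foreground size up to multiples of factor, then divide by factor.

    Returns (padded_size, reduced_size), each as a (width, height) tuple.
    """
    def round_up(value):
        # Ceiling division, then back to a multiple of factor.
        return -(-value // factor) * factor

    padded = (round_up(fg_width), round_up(fg_height))
    reduced = (padded[0] // factor, padded[1] // factor)
    return padded, reduced

# 300 dpi foreground, 100 dpi background: reduction factor 3.
print(background_size(2592, 3508, 3))  # ((2592, 3510), (864, 1170))
```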

Divide large image into A4 sized images

I would like to split a large PNG file into A4 pages so they can be printed out easily.
I would like to use a Linux command line script to do this:
shell> split-into-a4-sized-pages some-big.png
I assume you have ImageMagick & pdfposter installed.
A) convert your .png to .pdf (using ImageMagick)
convert input0.png input1.pdf
B) tile your image using pdfposter:
pdfposter -s4 input1.pdf out.pdf
This command enlarges the input exactly 4 times, prints on the default A4
media, and lets pdfposter determine the number of pages required.
Try using imagemagick's crop to your desired size.
Say you have a 640x962 image:
and you want to crop it into 4 320x481 images:
Use:
convert pexels-adonyi-gábor-1400172.jpg -crop 320x481+0+0 cropped1.jpg
convert pexels-adonyi-gábor-1400172.jpg -crop 320x481+320+0 cropped2.jpg
convert pexels-adonyi-gábor-1400172.jpg -crop 320x481+0+481 cropped3.jpg
convert pexels-adonyi-gábor-1400172.jpg -crop 320x481+320+481 cropped4.jpg
Now you'd have to find out how many pixels fit into an A4 page in your printer, and the dimensions of the image, and it is a very simple script from here.
Photo by Adonyi Gábor from Pexels.
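To sketch the "very simple script" part: A4 is 210 mm x 297 mm, so the pixel count of a page depends only on the print resolution. The Python below computes that and generates the -crop geometries for any image size. The function names are made up for this example, and the DPI is something you have to choose for your printer.

```python
def a4_pixels(dpi):
    """A4 is 210 mm x 297 mm; return its size in whole pixels at the given dpi."""
    return round(210 / 25.4 * dpi), round(297 / 25.4 * dpi)

def crop_geometries(image_w, image_h, tile_w, tile_h):
    """Return WxH+X+Y geometry strings covering the image row by row.

    Tiles on the right and bottom edges are narrowed to what is left of the image.
    """
    geometries = []
    for y in range(0, image_h, tile_h):
        for x in range(0, image_w, tile_w):
            width = min(tile_w, image_w - x)
            height = min(tile_h, image_h - y)
            geometries.append(f"{width}x{height}+{x}+{y}")
    return geometries

print(a4_pixels(300))
# The 640x962 example from above, cut into four 320x481 tiles.
for geometry in crop_geometries(640, 962, 320, 481):
    print("-crop", geometry)
```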
You can use convert of ImageMagick to scale the image; there are probably other tools in ImageMagick to clip the image if you want.
I don't know of any ready-made command line tool to do this. Unless you use it all the time, ImageMagick may take longer to figure out the right combination of commands and options, than to write a quickie program.
An easy way, if you know Python at all, is write a few-line program using PIL (Python Imaging Library). To read an image takes one line. To extract chunks of some width and height at specified location to save as new image files, is also easy. Add a couple for loops to scan rows and columns of A4-sized chunks, and you're done.
If you don't know Python, just about all quick-to-write programming languages have a similar capability. The GD library comes to mind; it has bindings for several languages.
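A minimal sketch of the Python route described above, assuming Pillow (the maintained PIL fork) is installed. The box calculation is plain Python; Pillow is only imported inside `split_image`, and the output names (`page_000.png` and so on) are an arbitrary choice for this example.

```python
def tile_boxes(image_w, image_h, tile_w, tile_h):
    """Yield (left, upper, right, lower) boxes covering the image row by row."""
    for upper in range(0, image_h, tile_h):
        for left in range(0, image_w, tile_w):
            right = min(left + tile_w, image_w)
            lower = min(upper + tile_h, image_h)
            yield left, upper, right, lower

def split_image(path, tile_w, tile_h, stem="page"):
    """Cut the image at path into tiles and save them as stem_000.png, stem_001.png, ..."""
    from PIL import Image  # Pillow; imported here so tile_boxes works without it

    with Image.open(path) as img:
        boxes = tile_boxes(img.width, img.height, tile_w, tile_h)
        for number, box in enumerate(boxes):
            img.crop(box).save(f"{stem}_{number:03d}.png")

# Example call (file name and tile size are placeholders; 2480x3508 is A4 at 300 dpi):
# split_image("some-big.png", 2480, 3508)
print(list(tile_boxes(640, 962, 320, 481)))
```

Tiles on the right and bottom edges come out smaller than a full page; pad them afterwards if you need every page to be exactly A4.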
NetPBM's pamdice will do the splitting into multiple pages. You'll have to set the -width and -height options according to the DPI of your desired A4 images.
And you'll also have to convert the input image to netpbm format first with pngtopam:
pngtopam big.png | pamdice -outstem tile -height h -width w
That will leave you with a bunch of files called tile_x_y.ppm.
Convert each one of those to PNG with pnmtopng.
