PDF to JPEG conversion using Ghostscript - image

I'm using Ghostscript to convert my PDF files to JPEGs, which works great.
For my output images I'm using %03d in the file name, so the file names come out 001, 002 ... and so on according to the page numbers.
But in some cases I want the numbering to start from a higher number.
For example I process a file with two pages so the output images are page001.jpg, page002.jpg
Now I want to process another PDF and instead of replacing those files, I want to create page003.jpg, page004.jpg.
How can this be done?
This is my full command line I'm using now:
'C:\gs\gs9.14\bin \gswin64c -dNOPAUSE -sDEVICE=png16m \
-sOutputFile=page-%03d.jpg -r100x100 -q' . $pdf_file. '-c quit'

Here is a workaround trick, that you could use:
gswin64c.exe ^
-sDEVICE=jpeg ^
-sOutputFile=page-%03d.jpg ^
-r100x100 ^
-c "showpage showpage" ^
-f filename.pdf
The -c "showpage showpage" inserts two empty pages into the output. The output files will be named
page-001.jpg + page-002.jpg + page-003.jpg + page-004.jpg
So the first two are white-only JPEGs and should be deleted afterwards.
You can extend this command with any number of empty pages you want.
Update
Of course, if you know in advance that you want to convert several different PDF files to images where you want the counting for a new PDF to continue exactly from where the last PDF ended, you could do this:
gswin64c.exe ^
-sDEVICE=jpeg ^
-sOutputFile=page-%03d.jpg ^
-r100x100 ^
-f file1.pdf ^
-f file2.pdf ^
-f file3.pdf ^
-f [...]
BTW, your original command requests .jpg file suffixes, while the Ghostscript device is png16m. This doesn't match. Initially I blindly copied your command, but now I've corrected it.

You cannot do that with the standard version of Ghostscript. The output file numbers follow the emitted page number (so if you had a 10-page file with /NumCopies 2, you would get files numbered 1 to 20).
Of course, you can process the two files on the same command line, I think that will give you the second file with page numbers beginning where the first set left off.
Otherwise you will have to modify the source code of the Ghostscript device.
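If each PDF has to be rendered in a separate run, one way to continue the numbering without touching the Ghostscript source is to render into a scratch directory and then move the files into place with an offset. What follows is only a minimal sketch, assuming a POSIX-like shell (on Windows, for example, Git Bash or WSL) and the page-NNN.jpg naming from the question. The function names to_decimal, next_page_number and move_with_offset are made up for this illustration.

```shell
#!/bin/sh
# Strip leading zeros so that e.g. 008 is not parsed as an invalid octal number.
to_decimal() {
    printf '%s\n' "$1" | sed 's/^0*\([0-9]\)/\1/'
}

# Print the number the next page should get: highest existing page-NNN.jpg plus 1.
next_page_number() {
    local dir="$1" max=0 f n
    for f in "$dir"/page-[0-9][0-9][0-9].jpg; do
        [ -e "$f" ] || continue              # the glob matched nothing
        n="${f##*/page-}"                    # drop directory and prefix
        n="$(to_decimal "${n%.jpg}")"        # drop suffix and leading zeros
        if [ "$n" -gt "$max" ]; then max="$n"; fi
    done
    echo $((max + 1))
}

# Move freshly rendered pages from a scratch directory into the target directory,
# renumbering them so that they continue after the files already there.
# The scratch and target directories are assumed to be different.
move_with_offset() {
    local scratch="$1" target="$2" start f n
    start="$(next_page_number "$target")"
    for f in "$scratch"/page-[0-9][0-9][0-9].jpg; do
        [ -e "$f" ] || continue
        n="${f##*/page-}"
        n="$(to_decimal "${n%.jpg}")"
        mv -- "$f" "$(printf '%s/page-%03d.jpg' "$target" $((start + n - 1)))"
    done
}
```

The idea would be to point -sOutputFile at scratch/page-%03d.jpg for each run and call move_with_offset scratch . afterwards.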

Related

Add part of filename as PDF metadata using bash script and exiftool

I have about 600 books in PDF format where the filename is in the format:
AuthorForename AuthorSurname - Title (Date).pdf
For example:
Foo Z. Bar - Writing Scripts for Idiots (2017)
Bar Foo - Fun with PDFs (2016)
The metadata is unfortunately missing for pretty much all of them so when I import them into Calibre the Author field is blank.
I'm trying to write a script that will take everything that appears before the '-', remove the trailing space, and then add it as the author in the PDF metadata using exiftool.
So far I have the following:
for i in "*.pdf";
do exiftool -author=$(echo $i | sed 's/-.*//' | sed 's/[ \t]*$//') "$i";
done
When trying to run it, however, the following is returned:
Error: File not found - Z.
Error: File not found - Bar
Error: File not found - *.pdf
0 image files updated
3 files weren't updated due to errors
What about the -author= phrase is breaking here? Please could someone enlighten me?
You don't need to script this. In fact, doing so will be much slower than letting exiftool do it by itself, because exiftool would have to start up once for every file.
Try this
exiftool -ext pdf '-author<${filename;s/\s+-.*//}' /path/to/target/directory
Breakdown:
-ext pdf process only PDF files
-author the tag to copy to
< The copy from another tag option. In this case, the filename will be treated as a pseudo-tag
${filename;s/\s+-.*//} Copying from the filename, but first performing a regex on it. In this case, looking for 1 or more spaces, a dash, and the rest of the name and removing it.
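Add -r if you want to recurse into subdirectories. Add -overwrite_original to avoid creating backup files with _original added to the filename.
The error in your first command was twofold. The value you wanted to assign contains spaces, so it needed to be enclosed in quotes; without them, everything after the first word was treated as a file name. In addition, the quoted "*.pdf" in the for loop is not expanded by the shell, so the loop runs only once with the literal string *.pdf.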
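For completeness, here is how the original loop could be repaired if you do want to keep it as a script. This is a minimal sketch, assuming every file name contains " - " between author and title; the function names author_from_filename and tag_authors are made up for this illustration, and the single exiftool command above remains the faster option.

```shell
#!/bin/sh
# Print the author part of "Author - Title (Date).pdf":
# everything before the first " - " (the whole name if there is none).
author_from_filename() {
    local name="${1##*/}"                    # drop any leading directory
    printf '%s\n' "${name%% - *}"
}

# Corrected loop: the glob is unquoted so the shell expands it, and the
# -author=... argument is quoted so it reaches exiftool as a single word.
tag_authors() {
    local f
    for f in *.pdf; do
        [ -e "$f" ] || continue              # no PDF files in this directory
        exiftool "-author=$(author_from_filename "$f")" "$f"
    done
}
```

Running tag_authors in the directory that holds the books would then call exiftool once per file.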

Append to PDF file with convert in bash

I'm basically downloading some images from a website using wget to then append them into a PDF file using the command line program "convert". But this last thing seems not to work.
I'm getting all the .jpg images and storing them in one folder with no problems, but when I try to merge them into the PDF file, it always ends up containing only the last image. I've read about convert's -append argument, but it still won't work.
This is what my code looks like:
for file in *.jpg
do
convert "${file}" -append "myfile.pdf"
done
But as logical as it seems, myfile.pdf always ends up having only the last jpg appended image.
I know that using convert like:
convert img1.jpg img2.jpg img3.jpg myfile.pdf
would do the trick. But as I don't know how many images I will have in the download directory, I cannot hardcode the arguments, so I guess looping over each image in that directory, as I'm trying to do, would be the best solution.
Does anybody know how to achieve my goal? Any help will be much appreciated.
Thanks in advance.
bash automatically expands wildcard arguments (unless they are quoted or escaped), so even if convert does not support wildcard expansion, bash does. So you could just do
convert *.jpg myfile.pdf
Note that if there are too many files, this can fail with "Argument list too long". But that should be OK for several hundred files.
If your file names follow a pattern like img1.jpg, img2.jpg, and so on, you may also use a bash range:
convert img{1..5}.jpg myfile.pdf
This will work for img1.jpg img2.jpg img3.jpg img4.jpg img5.jpg. You can change the range as required.
For converting all the jpg files, see the other answer by @Jean-François Fabre.
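One thing to watch with the wildcard approach: the shell sorts the expanded names lexicographically, so img10.jpg comes before img2.jpg and the pages may end up in an unexpected order. A minimal sketch of a workaround, assuming a sort that supports version ordering (-V, as in GNU coreutils) and file names without newlines; the function name list_images_natural is made up for this illustration.

```shell
#!/bin/sh
# Print the .jpg files of a directory in natural (version) order, one per line,
# so that img2.jpg is listed before img10.jpg.
list_images_natural() {
    local f
    for f in "$1"/*.jpg; do
        [ -e "$f" ] || continue              # the glob matched nothing
        printf '%s\n' "$f"
    done | sort -V
}

# Usage (bash 4 or later): collect the sorted names, then call convert once.
#   mapfile -t images < <(list_images_natural .)
#   convert "${images[@]}" myfile.pdf
```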

ffmpeg scale image with unrelated number in the name

I'm attempting to scale image(s) that have any given name. I've found my script is failing on files that have numbers in the name. "0% financing", "24 Hour", etc. The other files are working fine, so it's not the script itself. I get:
[image2 @ 0x7fbce2008000] Could find no file with path '/path/to/0% image.jpeg' and index in the range 0-4
How can I tell ffmpeg that this isn't a search pattern, or sequential numbered files ? There's only 1 jpeg in each location, and I do not have control of the file names to change them.
-update-
I've figured out the command
ffmpeg -pattern_type none -i /path/to/0%\ image/0%\ image.jpeg -vf scale=320:-1 /path/to/0%\ image/0%\ image.out.jpeg
gets me past the initial problem, but the output won't work because I can't get it now to escape the final argument. If I am in the directory (so no path) and change the output to just out.jpeg it will work, so I'm confident the first error is corrected.
Now I need to figure out how to use spaces in the path in the output argument? I've tried surrounding it in quotes:
"0% image.out.jpeg"
regular escapes:
0%\ image.out.jpeg
and surrounding it in quotes and using escapes at the same time:
"0%\ image.out.jpeg"

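No answer to this question is recorded here. One likely cause, offered as an assumption rather than a confirmed diagnosis: ffmpeg's image2 muxer also treats the output name as a sequence pattern, so the % in the output path is parsed as the start of a %d-style placeholder no matter how the spaces are quoted. The image2 muxer documents an -update option that makes it treat the name as a plain file name, and a doubled %% is read as a literal percent sign in patterns. A minimal sketch; the names escape_percent and scale_image are made up for this illustration.

```shell
#!/bin/sh
# Double every '%' so that a pattern-aware muxer reads it as a literal percent sign.
# Intended as an alternative for cases where -update cannot be used.
escape_percent() {
    printf '%s\n' "$1" | sed 's/%/%%/g'
}

# Scale one image to a width of 320 pixels, keeping the aspect ratio.
#   -pattern_type none : read the input name literally (from the question's update)
#   -update 1          : write the output name literally instead of as a pattern
# Quoting "$1" and "$2" keeps paths with spaces intact.
scale_image() {
    ffmpeg -pattern_type none -i "$1" -vf scale=320:-1 -update 1 "$2"
}
```

A call would then look like scale_image "/path/to/0% image/0% image.jpeg" "/path/to/0% image/0% image.out.jpeg".
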
Ghostscript PDF to PNG: output is always 595x842 (A4)

I'm trying to convert a PDF to PNG, but the output image is always A4, even though the source PDF is very large. Here are my commands:
-dNOPAUSE ^
-dBATCH ^
-dSAFER ^
-sDEVICE=png16m ^
-dFirstPage=1 ^
-sOutputFile="D:\PDF.png" ^
"D:\PDF.pdf" ^
-sPAPERSIZE=a1
I tried several options (-r, -g, -sDEFAULTPAPERSIZE), but none worked.
How can I force the output image dimensions?
P.S: my PDF file
Your linked-to PDF file has only 1 page. That means your commandline parameter -dFirstPage=1 doesn't have any influence.
Also, your -sPAPERSIZE=a1 parameter should not be last (it doesn't have any influence here -- so Ghostscript takes the default size from the pagesize of the input PDF, which is A4). Instead it should appear somewhere before the "D:\PDF.pdf" (which must be last).
It looks like you want a PNG with the size of A1, and your OS is Windows (guessing from the partial commandline you provided)?
Try this instead (it adds -dPDFFitPage=true to the commandline and puts the arguments into a correct order, while also shortening it a bit using the -o trick):
gswin32c.exe ^
-o "D:\PDF.png" ^
-sDEVICE=png16m ^
-sPAPERSIZE=a1 ^
-dPDFFitPage=true ^
"D:\PDF.pdf"
This should give you a PNG of 1684x2384 pixels at 72 dpi (the built-in default for Ghostscript image output, used if no other resolution is specified). For different combinations of resolution and page size, add your variation of -rXXX and -gNNNxMMM (instead of -sPAPERSIZE=a1), but by all means keep -dPDFFitPage=true.
You can also keep the -sPAPERSIZE=a1 and add -r100 or -r36 or -r200 if you want a different resolution only. Be aware that increasing resolution may not improve the image quality compared to the default output of 72dpi. That depends on the resolution of the images that were embedded in the PDF page. But it surely increases the file size...
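To work out what to pass to -g for a given resolution, note that page sizes are measured in points (1/72 inch), so pixels = points x dpi / 72. A minimal sketch of that arithmetic, taking the A1 size of 1684x2384 points from the figures above; the function name points_to_pixels is made up for this illustration, and Ghostscript's own rounding may differ by a pixel.

```shell
#!/bin/sh
# Convert a length in points (1/72 inch) to pixels at a given resolution,
# rounded to the nearest whole pixel.
points_to_pixels() {
    local points="$1" dpi="$2"
    echo $(( (points * dpi + 36) / 72 ))
}

# Example: width and height of an A1 page (1684x2384 points) at 144 dpi.
a1_width_px="$(points_to_pixels 1684 144)"
a1_height_px="$(points_to_pixels 2384 144)"
echo "-r144 -g${a1_width_px}x${a1_height_px}"
```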
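# Alternative using MuPDF's mutool (zsh function)
function pdf2png-mutool() {
    # Underlying tool: "mutool draw [options] file [pages]"
    # pages: Comma separated list of page numbers and ranges (for example: 1,5,10-15,20-N), where
    #        the character N denotes the last page. If no pages are specified, then all pages
    #        will be included.
    local i="$1"
    # Default output name: input name without extension, plus a page counter
    local out="${pdf2png_o:-${i:r}_%03d.png}"
    [[ "$out" == *.png ]] || out+='.png'
    # Pass any remaining arguments (page selection) through to mutool
    command mutool draw -o "$out" -F png "$i" "${@[2,-1]}"
    # Add `-r 300` to the mutool options to set the dpi
}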

Can Ghostscript start numbering pages from zero?

I am using Ghostscript to convert a multi-page PDF to individual JPEG files and can get it to output the files numbered like page_%03d.jpg.
But it always starts at page_001.jpg and I need it to start numbering the output files starting from page_000.jpg.
Is there a setting I can use to get Ghostscript to start at zero or am I going to have to rename all the files after processing?
Hmm... tricky question. I don't think there is a way to tweak the -sOutputFile=string_%03d.jpeg-syntax to start at zero.
However, what about trying it with a little workaround?
The trick is to use 2 passes for processing your PDF file
First pass: make processing by Ghostscript start with page 2 through the end. Your page numbering for this pass will still start at 1. But each consecutive page will now have a filename which is offset by -1.
Second pass: make processing by Ghostscript stop after page 1, and hardcode the output filename to include your desired zero numbering.
Here are the two commands spelled out explicitly:
First pass:
gswin32c.exe ^
-o c:/path/to/output/page_%03d.jpg ^
-sDEVICE=jpeg ^
[...more options as needed...] ^
-dFirstPage=2 ^
-f c:/path/to/input.pdf
This will result in:
first page processed, page 2 ....... named as page_001.jpg
second page processed, page 3 ....... named as page_002.jpg
third page processed, page 4 ....... named as page_003.jpg
[...]
Second pass:
gswin32c.exe ^
-o c:/path/to/output/page_000.jpg ^
-sDEVICE=jpeg ^
[...more options as needed...] ^
-dLastPage=1 ^
-f c:/path/to/input.pdf
This will result in:
only page processed, page 1 ....... named as page_000.jpg
Voila!
This little trick can spare you a lot of work renaming all the pages. It's surely faster as soon as you have more than a few pages to process. And of course, this basic approach can easily be scripted.
Enjoy...
☺
To close this question, I will answer it myself: no, Ghostscript cannot start numbering from zero. I had to rename all the files after Ghostscript was done processing.
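For anyone who ends up renaming as well, the shift can be scripted. A minimal sketch, assuming a POSIX-like shell, the page_NNN.jpg naming from the question, and that page_000.jpg does not exist yet; the function name shift_pages_to_zero is made up for this illustration.

```shell
#!/bin/sh
# Rename page_001.jpg -> page_000.jpg, page_002.jpg -> page_001.jpg, and so on.
# The glob is expanded once, in ascending order, so each target name has
# already been vacated by the time it is needed.
shift_pages_to_zero() {
    local dir="$1" f n
    if [ -e "$dir/page_000.jpg" ]; then
        echo "page_000.jpg already exists, refusing to overwrite it" >&2
        return 1
    fi
    for f in "$dir"/page_[0-9][0-9][0-9].jpg; do
        [ -e "$f" ] || continue              # the glob matched nothing
        n="${f##*/page_}"                    # drop directory and prefix
        n="${n%.jpg}"                        # drop suffix
        # Strip leading zeros so that e.g. 008 is not parsed as octal.
        n="$(printf '%s\n' "$n" | sed 's/^0*\([0-9]\)/\1/')"
        mv -- "$f" "$(printf '%s/page_%03d.jpg' "$dir" $((n - 1)))"
    done
}
```

Calling shift_pages_to_zero c:/path/to/output after the conversion would then leave the files numbered from page_000.jpg.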
