Complete noob: how do I make a script to convert individual pages of a PDF to images AND save them in folders with the same name as the PDF?

For example, if I have book1.pdf and book2.pdf, I would like to create a script where the pages of the PDFs are converted to images and saved in their own separate folders: a book1 folder and a book2 folder.
It's something this program does but I do not want to pay 27 bucks just for this.
I'm a complete noob when it comes to coding. I installed Ghostscript and added a printer that runs ghostscript, so now I do have the option of opening a PDF (or any document), and print using the Ghostscript printer, and it outputs the resulting images to a folder.
This is the code for printer properties->ports->arguments for this program:
-sDEVICE=jpeg -r300 -dJPEGQ=100 -o -dSAFER -sOutputFile="C:\IMAGEfiles\image%%03d.jpg" -
My goal now is to automate the system so that I can take a list of PDFs, convert their pages into images, and sort the images into folders named after the PDFs. Thank you

This isn't a Ghostscript question really, this is a shell script programming problem.
Since you are using C: I'm assuming you are on Windows. I'm also going to assume you have created the folders in advance.
If you then open a command shell and do :
for %s in (*.pdf) do "c:\program files\gs\gs9.52\bin\gswin64c" -sDEVICE=jpeg -r300 -dJPEGQ=100 -dBATCH -dNOPAUSE -sOutputFile=c:/%~ns/image%03d.jpg %s
That will find all the files with names of the form *.pdf, execute Ghostscript (you may have to alter the paths and executable name, it depends on the version you installed) and output the resulting JPG files to a folder whose name is the '*' part of the input filename.
Note that your original command line has both -o and -sOutputFile, you should modify it to remove one or the other. -o is supposed to be followed by the name of the output file and includes -dBATCH and -dNOPAUSE all wrapped up as one. Whereas -sOutputFile= just sets the output filename. Using both is a bad idea, if it works I'm surprised, and it certainly wouldn't surprise me if it stopped working at some point, or had unexpected side effects.
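The answer above assumes the folders already exist. As a minimal sketch of the same loop that also creates each folder, assuming a POSIX shell (macOS, Linux, WSL or Git Bash) rather than cmd, and assuming the Ghostscript executable is reachable as gs (override the GS variable otherwise); the function names are made up for illustration:

```shell
#!/bin/sh
# Sketch only: GS, output_pattern and convert_all are illustrative names,
# not part of Ghostscript. Point GS at your own executable if needed.
GS="${GS:-gs}"

# Print the output pattern for one PDF, e.g. book1.pdf -> book1/image%03d.jpg
output_pattern() {
    printf '%s/image%%03d.jpg\n' "$(basename "$1" .pdf)"
}

# Convert every PDF in the current directory, one folder per PDF.
convert_all() {
    for pdf in *.pdf; do
        [ -e "$pdf" ] || continue              # no PDFs: the glob stays literal
        mkdir -p "$(basename "$pdf" .pdf)"     # create the folder if it is missing
        # -o sets the output file and implies -dBATCH -dNOPAUSE
        "$GS" -sDEVICE=jpeg -r300 -dJPEGQ=100 \
            -o "$(output_pattern "$pdf")" "$pdf"
    done
}
```

Run convert_all from the directory that holds the PDFs. Only -o is used here, not -o together with -sOutputFile.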

Related

Delete the input file after GhostScript finishes converting to PDF

Can someone show me how to use the PostScript deletefile operator to delete the input file after GhostScript finishes converting the input file to a PDF file.
This appears to work for me, first creating the PDF file, then setting the permissions on the input file, and finally deleting the input file.
"C:/Program Files/gs/gs9.55.0/bin/gswin64c.exe" -q -sDEVICE#pdfwrite
-o "C:/Temp/Temp_0001.pdf"
-f "C:/Temp/Temp_0001.ps"
--permit-file-all=C:/Temp/Temp_0001.ps
-c (C:/Temp/Temp_0001.ps) deletefile
NOTE: Since I had to switch to Unix-style path separators (even though I am running this on Windows) for the permit-file-all and the deletefile, I decided to use the same convention for both the output and input files as well. Windows seems to be OK with that, and the convention was uniformly used for all paths/files.

How to append to the output file name using the input file name without its extension in Ghostscript?

I'm using GS to "compress" PDFs with 2 clicks. I've added a context menu entry in the Windows registry with the code below.
For instance, if I use it on test.pdf the output file will be test.pdf-compressed.pdf. It works, but I would like to get rid of the extension in the filename. Is there any way to do so?
I've tried to use cmd arguments, but they do not seem to work with the PostScript.
C:\Program Files\gs\gs9.27\bin\gswin64c.exe -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dCompatibilityLevel=1.4 -sOutputFile=%1%-compressed.pdf -c .setpdfwrite -f %1
This isn't a Ghostscript question. If you get the arguments correct then the parameters passed to Ghostscript will be correct and the output file will be what you want.
You haven't said what you've tried, so that makes it pretty hard to make suggestions. However you should be able to use %~dp1 and/or %~n1 instead of simply %1 to expand to just a path or file. There are other variations, typing "help for" at the Windows command line will give you more details.
Note as always that Ghostscript does not compress PDF files; by using -dPDFSETTINGS=/ebook you are producing a brand-new PDF file whose content has been altered from the original (images will be downsampled, for example).
Also the sequence -c .setpdfwrite -f has been redundant for years, you don't need it.
[EDIT]
This batch file demonstrates the use of the command shell variable expansion in a batch file
@ECHO OFF
ECHO Input file is %1
ECHO Input directory is %~dp1
\ghostpdl\debugbin\gswin32c -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=%~dp1\new.pdf %1
When saved as test.bat and then test d:\temp\input.ps the following output is generated:
Input file is d:\temp\input.ps
Input directory is d:\temp\
GPL Ghostscript GIT PRERELEASE 9.28 (2019-04-04)
Copyright (C) 2019 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
And a file new.pdf is created in the directory d:\temp
So the variable expansion works as expected, because it happens before the command line is executed.
If it still doesn't work for you, then you're going to have to provide more information. In your place I would start by removing the -dNOPAUSE and -dBATCH switches from the command line, at least that way you'll be able to see if Ghostscript is trying to tell you something.
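For readers who drive Ghostscript from a POSIX shell (macOS, Linux, WSL or Git Bash) instead of cmd, the counterpart of %~dp1 and %~n1 is dirname, basename and the ${var%.*} expansion. A small sketch, with a hypothetical function name:

```shell
#!/bin/sh
# Sketch only: compressed_name is an illustrative helper, not a Ghostscript feature.
# It mirrors %~dp1 (directory) and %~n1 (name without extension) from cmd.
compressed_name() {
    dir=$(dirname "$1")       # directory part, "." when none is given
    base=$(basename "$1")     # file name with extension
    printf '%s/%s-compressed.pdf\n' "$dir" "${base%.*}"   # strip the last extension
}
```

For example, compressed_name /tmp/test.pdf prints /tmp/test-compressed.pdf, which can then be passed to -sOutputFile.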
I've clearly stumbled upon this posting a bit late.
However, I wanted to post my answer in case someone comes looking for a solution to a similar issue, in the future.
I started by creating a new Folder on my Desktop, titled "PDF", in which I placed the "test.pdf" File.
I then created a .BAT File, titled "CompressPDF.bat", to which I added the Script below.
This Script will Loop through and Compress Any/All .PDF Files, that are placed in the "PDF" Folder.
It then correctly appends the "-compressed.pdf" string to the File Name, thereby saving the "test.pdf" File as "test-compressed.pdf", per the request of the OP.
As you will Notice, I have Added the "PAUSE" Command at the very end of the Script.
This will keep the Window from Automatically Closing until you Press Any Key, which will allow you to review any Errors that may have arisen, during the compression process.
@echo off
cd "%USERPROFILE%\Desktop\PDF"
for %%f in (*.pdf) do (
gswin64c.exe -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="%%~nf-compressed.pdf" "%%f"
)
PAUSE
I hope this post is able to help others, who may be looking for an answer to a similar issue.
Please feel free to respond or to reach out to me if anyone has any questions, as I am always happy to help.

How to convert images from TIFF to JPG preserving the comments and tags

I am using the Preview app (that comes with OS X El Capitan) to convert a file from TIFF format into JPG, for example. I expected the export process to include the original comments, but that doesn't happen (the same applies to the tag fields).
The generated JPG file has no comment
The compression and change image format work, but the META INFO such as comment or tags are not exported.
Any suggestion or workaround about how to include that information. I need to convert about 500 images so manually copy/paste doesn't work for me.
Updated Answer
In the light of your comments, I think the best way forward is to try and identify how/where the comments are stored for each platform (Windows vs macOS) and then to decide which method you want to use going forward.
macOS Finder/Spotlight comments will not be legible on Windows, so if you want Windows compatibility, you need to standardise on JPEG or EXIF comments.
I recommend using exiftool which you can install with homebrew, using:
brew install exiftool
Then I suggest you try extracting the comments from your files to see how/where they are stored:
exiftool -a image.jpg
will show you all tags in image.jpg. Your comments may be under:
comment - which is the JPEG comment, or
EXIF:UserComment - which is the EXIF comment
If you find your comments in the JPEG or the EXIF section, you can extract just the comments with:
exiftool -comment image.jpg # extract JPEG comment
exiftool -EXIF:UserComment image.jpg # extract EXIF UserComment
Add the option -s3 to suppress the field-names in the above to save having to parse them out.
Likewise, you can set the comments with:
exiftool -comment="FUNKY JPEG COMMENT" image.jpg # set JPEG comment
exiftool -EXIF:UserComment="FUNKY EXIF USER COMMENT" image.jpg # set EXIF UserComment
You can also extract the EXIF user comments to a CSV with:
exiftool -EXIF:UserComment -csv *.jpg
SourceFile,UserComment
a.jpg,FUNKY EXIF:UserComment
b.jpg,b FUNKY EXIF:UserComment
You can also apply comments from a CSV.
You should also be able to extract macOS/Spotlight/Finder comments using the script in my main answer:
$HOME/macOSGetFinderComment "/Users/someone/soneFile.tif"
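If the comments turn out to live in EXIF:UserComment, a batch copy from each TIFF to its converted JPEG could be sketched as follows. This assumes the JPEG sits next to the TIFF with the same base name; the EXIFTOOL variable and the function name are illustrative, and it is worth trying on copies first:

```shell
#!/bin/sh
# Sketch only: EXIFTOOL and copy_user_comments are illustrative names.
EXIFTOOL="${EXIFTOOL:-exiftool}"

# For every DIR/name.tif with a matching DIR/name.jpg, copy the EXIF
# UserComment from the TIFF into the JPEG.
copy_user_comments() {
    for tif in "$1"/*.tif; do
        [ -e "$tif" ] || continue            # no TIFFs: the glob stays literal
        jpg="${tif%.tif}.jpg"
        if [ -e "$jpg" ]; then
            "$EXIFTOOL" -overwrite_original -tagsFromFile "$tif" \
                -EXIF:UserComment "$jpg"
        else
            echo "skipping $tif: no matching JPEG" >&2
        fi
    done
}
```

Call it as copy_user_comments "$HOME/NEW" after the conversion has produced the JPEGs.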
Original Answer
I would suggest you try the following using ImageMagick.
First, use the Finder, or any other tool you are familiar with, to make a copy of your photos including the entire directory structure to some new place where we cannot damage your existing photos. So, let's say you copy (NOT move) the entire tree of TIFs to a subdirectory called "NEW" inside your HOME directory.
Then start the Terminal and change directory to "NEW":
cd NEW
Easy Method
If all the TIFs are in a single directory or two, just use mogrify:
mogrify -format jpg *.tif
Harder Method
If the TIF files are in multiple directories, you will need to work a bit harder. Inside Terminal copy and paste this:
find . -name \*.tif -exec sh -c 'new="${1%.tif}.jpg"; convert "$1" "$new"' _ {} \;
That starts looking in the current directory (which is "NEW" after the cd above) for files named "*.tif". When it finds one, it starts a new shell (sh), passing it the filename of the TIF as $1. It then works out the new filename by replacing a trailing ".tif" with ".jpg" and invokes ImageMagick convert to do the conversion.
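Before converting hundreds of files, it can help to preview which names the suffix replacement would produce. A minimal sketch that uses the same find and ${1%.tif} technique but only prints the pairs; the function name is hypothetical:

```shell
#!/bin/sh
# Sketch only: preview_conversions is an illustrative name.
# Prints "source -> target" for every .tif below the given directory
# (default: the current directory) without converting anything.
preview_conversions() {
    find "${1:-.}" -name '*.tif' -exec sh -c \
        'printf "%s -> %s\n" "$1" "${1%.tif}.jpg"' _ {} \;
}
```

Running preview_conversions with no argument lists the pairs for the current directory tree.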
As regards the Finder/Spotlight comments, here is a little script to get the Finder comment of a file:
#!/bin/bash
# macOSGetFinderComment
# Pass an absolute path to the file!
file=$1
osascript<<EOF
tell application "Finder" to get comment of item POSIX file "$file"
EOF
And here is one to set the Finder/Spotlight comment:
#!/bin/bash
# macOSSetFinderComment
# Pass an absolute path to the file!
file=$1
comm=$2
osascript<<EOF
tell application "Finder" to set comment of item POSIX file "$file" to "$comm"
EOF
So, I would save those 2 scripts in your HOME directory and then make them executable with:
cd
chmod +x macOS*FinderComment
Then save this file in your HOME directory under $HOME/CopyComments:
#!/bin/bash
shopt -s nullglob
# Quote every path so folders and files with spaces in their names still work
for f in "$(pwd)"/*.tif; do
comment=$("$HOME/macOSGetFinderComment" "$f")
new="${f%.tif}.jpg"
echo "Setting comment of $new to $comment"
"$HOME/macOSSetFinderComment" "$new" "$comment"
done
and make it executable with:
chmod +x $HOME/CopyComments
and run it with:
cd NEW
$HOME/CopyComments
I have posted this problem also in Apple Community, here is the solution proposed by VikingOSX. It is a big piece of code, so better download it from here or directly from the Apple Community Link mentioned. Here is a description about the solution as described in the original post:
Prompts for a source folder, and a destination folder.
Duplicates folder hierarchy from source to destination folder.
Selects all TIFF images in the folder hierarchy and converts them to JPEG.
For sub-folders and their files, transfers the original Finder comments, color tags and tag name(s) to the destination hierarchy.
The compression level for the JPG file is high; it can be changed to medium or low in the line: save this_img as JPEG in outfile_name with compression level medium with icon
Limitation: Source folder can only contain one-level of sub-folders. Ignoring this will result in unplanned results.
Additional Comments
Uses a with timeout clause to allow for a large number of files. AppleScript does not yet support Finder tag names, so this script uses AppleScript/Objective-C to get and set those tag name(s). Due to this extension, the script now requires AppleScript 2.4 and must be run on OS 10.10 or later.
Due to the AppleScript/Objective-C code, the script cannot be run interactively as a script/script bundle without using the control+command+R keyboard shortcut. A test is made when the script starts, and will warn appropriately. It is best to save the script as an application to avoid this keyboard shortcut altogether.
Usage
Save the script and then copy and paste the file contents into the Script Editor (you can find the application in the Utilities folder under the name Script Editor), compile and save the file with the format Application, then double click on it to run the script application.
I have tested the script on a Mac Air 2010 with OS El Capitan, on a folder with 884 TIFF files totalling 2.25GB, and it takes about 18 minutes to convert them into JPG files with medium compression level. The generated files contain the tags and comments from the corresponding original TIFF files.
Disclaimer
Comments and tags created on one platform, for example Windows or macOS, are not displayed on the other platform. Tags created in Windows are treated in macOS as keywords (Command+I to view them), but comments created in Windows are not displayed in macOS. This is a general incompatibility problem that applies to photos in any format (for example TIFF or JPG).
EDIT (updated solution for solving cross-platform problem with comments)
Taking the idea from @MarkSetchell, I adapted the original script to at least solve the cross-platform problem from macOS to Windows, i.e. a comment from macOS can be seen on Windows. The idea is to use EXIF metadata: the AppleScript runs a shell command that invokes exiftool:
set uxFilepath to POSIX path of NewIMG
do shell script "/usr/local/bin/exiftool -overwrite_original -EXIF:UserComment=\"" & cmtstr & "\" " & uxFilepath
Windows treats the EXIF UserComment metadata as a regular file comment. The comment from the TIF file is now on the JPG as well, and because it was copied into EXIF metadata, the same information is visible under Windows. The same idea can be used for other file properties, provided both Windows and macOS read them.
The EXIF metadata can be viewed on macOS from the command line, as @MarkSetchell suggests, but also from Finder: Command+O (to open the file in Preview), then Command+I (to open the inspector). Then click the "More Info" tab, then the EXIF tab.
The opposite direction requires a script that does the reverse, i.e. copies the EXIF comment, read with exiftool, into the macOS comment. I have verified that in that case the Windows comment appears under the label XPComment. The script uses UserComment, but it also works with XPComment as the label in both directions.

Ghostscript: ps2pdf doesn't work with Win 7 32-bit

I have the latest GPL Ghostscript v9.05 and I am running it on Win 32 bit systems. On my XP machine, both commands
ps2pdf -v -
and
rungs -v (used internally by TeXLive)
report of Ghostscript 9.05 being available as follows:
GPL Ghostscript 9.05 (2012-02-08)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
On another Win 7 computer, the command
ps2pdf -v -
at the command line is not recognised as valid ps2pdf syntax, but rungs -v works fine.
I have read on the internet about setting temporary directories for Ghostscript here:
http://schlingel.bplaced.net/?p=54
and it says basically to edit the gssetgs.bat file in the lib folder of Ghostscript and add the lines:
set path=%ProgramFiles%\gs\gs9.02\lib;%ProgramFiles%\gs\gs9.02\bin;%windir%\SysWOW64
set TMP=%YOUR_TEMP%
set TEMP=%TMP%
This needs to be modified appropriately by replacing 9.02 with 9.05 in my case. Now on Win 7, how should the last two lines regarding the temporary folder be? Can I have for Set TMP as follows:
set TMP=%"C:\Users\hihi\AppData\Local\Temp"%
Is it the right syntax?
I have also put the tmp and temp variables in my Environment variables.
Update
A. Using ps2pdf, here is how I convert a PS to a PDF file on my Win 7 machine and I get an error:
C:\work\misc>ps2pdf -dNOSAFER -sDEVICE=pdfwrite -r720 -dCompatibilityLevel=1.5 -dUseFlateCompression=true -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -dEPSCrop "%1.ps" "%1.pdf"
Unknown device:
Unrecoverable error: undefined in .uninstallpagedevice
Operand stack:
defaultdevice
Note: the above command works fine on my Win XP machine!
B. Using gswin32c here is how I convert a PS to a PDF file on my Win 7 machine and this works:
C:\work\misc>gswin32c.exe -o "%1.pdf" -dNOSAFER -sDEVICE=pdfwrite -r720 dCompatibilityLevel=1.5 -dUseFlateCompression=true -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -dNumRenderingThreads=2 -c "60000000 setvmthreshold" -f -dEPSCrop "%1.ps"
GPL Ghostscript 9.05 (2012-02-08)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Loading NimbusSanL-Regu font from %rom%Resource/Font/NimbusSanL-Regu... 2837152
1454727 4413848 3026018 1 done.
Loading Dingbats font from %rom%Resource/Font/Dingbats... 2837152 1510862 457461
6 3180865 1 done.
Loading NimbusSanL-Bold font from %rom%Resource/Font/NimbusSanL-Bold... 2857248
1553499 4655000 3251960 1 done.
Note: the above command also works fine on my Win XP machine
To summarise: I have problem with ps2pdf on my Win 7 machine.
New (May 09, 2012)
This is what I get when I rem the last two lines as Ken suggested:
C:\work\misc>ps2pdf -dNOSAFER -r720 -dCompatibilityLevel=1.5 -dUseFlateCompression=true -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -dEPSCrop "%1.ps" "%1.pdf"
Unrecoverable error: typecheck in .putdeviceprops
avoiding clean up
The temp folder has those temporary files you mentioned.
The contents of _.at:
-dCompatibilityLevel#1.4
-dNOSAFER
-r720
-dCompatibilityLevel
The contents of _.at2:
-q -P- -dSAFER -dNOPAUSE -dBATCH -sDEVICE#pdfwrite
-sOutputFile#-dUseFlateCompression
-dCompatibilityLevel#1.4
-dNOSAFER
-r720
-dCompatibilityLevel
-c .setpdfwrite -f1.5
Update May 11
Note: This is working fine now on my win 7 machine
C:\work\misc>ps2pdf -dNOSAFER -r720 -dCompatibilityLevel#1.5 -dUseFlateCompression#true -dMaxSubsetPct#100 -dSubsetFonts#true -dEmbedAllFonts#true -dEPSCrop "defense.ps" "defense.pdf"
avoiding clean up
The contents of _.at:
-dCompatibilityLevel#1.4
-dNOSAFER
-r720
-dCompatibilityLevel#1.5
-dUseFlateCompression#true
-dMaxSubsetPct#100
-dSubsetFonts#true
-dEmbedAllFonts#true
-dEPSCrop
The contents of _.at2:
-q -P- -dSAFER -dNOPAUSE -dBATCH -sDEVICE#pdfwrite
-sOutputFile#"defense.pdf"
-dCompatibilityLevel#1.4
-dNOSAFER
-r720
-dCompatibilityLevel#1.5
-dUseFlateCompression#true
-dMaxSubsetPct#100
-dSubsetFonts#true
-dEmbedAllFonts#true
-dEPSCrop
-c .setpdfwrite -f"defense.ps"
Thanks.
Far more likely than a temporary file problem is permissions on the directory where you are trying to write the destination file. The error message you quote occurs long before any temporary files are used, but is a very common error if you try to write to a directory which does not exist, or which the process has no write permission for.
First thing to do is post the actual gswin32 command line you are using.
The syntax you are querying is incorrect. %value% is a Windows scripting operation which says 'replace the stuff between the % signs by the named value'. So if I declare 'set VAL=c:/temp' Then I can say 'set NEWVAL=%VAL%/New' which will make NEWVAL 'c:/temp/new'. You can find more about windows scripting in the Windows help, or by a quick Google.
Given that 'ps2pdf' (which is a Windows script) can't be found on your Windows 7 machine (at least I assume that's what you mean by 'is not recognised as being a valid ps2pdf command') it does seem like you need to add the Ghostscript paths to your environment. Simply altering gssetgs.bat on its own will do nothing, you need to ensure that this script file is called from your autoexec.bat script, so that the additional environment settings are applied.
If you aren't sure what autoexec.bat is, or how to modify it, then again Google should help you pretty quickly.
Did you actually install Ghostscript, or simply copy it ?
Given that Ghostscript works correctly, the problem must be in the shell script 'ps2pdf', or more accurately some change in Windows 7 is causing the old script not to work.
This script is (unfortunately) rather more complex than I would like (I didn't write it). It actually uses about three different scripts to do the work. You really need to find out what is being sent to GS.
Probably the simplest way to do this is to edit 'ps2pdfxx.bat'. At the :end label you'll see 'rem Clean up' followed by two lines beginning 'if exist'. Put rem in front of those two. Add a line which says 'echo avoiding clean up' The end of the file should look like this:
:end
rem Clean up.
rem if exist "%TEMP%_.at" erase "%TEMP%_.at"
rem if exist "%TEMP%_.at"2 erase "%TEMP%_.at"2
echo avoiding clean up
Now run your command line (by the way you really don't need to put -sDEVICE= when using ps2pdf.....)
In your TEMP directory you should have files called _.at and possibly _.at2 which will contain the actual commands being sent to GS.
OK, the file _.at is copied into the file _.at2, and then _.at2 is used as the list of arguments to Ghostscript. Commenting up the file you got:
---This line added by the batch file ps2pdfxx.bat
-q -P- -dSAFER -dNOPAUSE -dBATCH -sDEVICE#pdfwrite
---These lines come from _.at
-sOutputFile#-dUseFlateCompression
-dCompatibilityLevel#1.4
-dNOSAFER
-r720
-dCompatibilityLevel
--This line added by the batch file ps2pdfxx.bat
-c .setpdfwrite -f1.5
There are a number of problems with this:
-sOutputFile#-dUseFlateCompression
This in effect sets the output file to '-dUseFlateCompression'
-c .setpdfwrite -f1.5
I'm not completely sure what this will do. Either it will handle the -f properly and terminate the PostScript input, or it will ignore it as an unrecognised switch (probably the latter). The '1.5' ought to be the input filename, without that Ghostscript doesn't know which file to use..... Even if it did, it will try to write the output to a bogus filename.
To be honest I would suggest that, if you want to set all these switches, you simply invoke Ghostscript directly rather than trying to use the script. In fact I'd recommend that anyway, every time I look at these scripts I shudder more.
Almost everything that the ps2pdf script is doing is being overridden by your command line, or is not required in the first place.

How to split PDFs (with applescript)

Does anyone know how to use PDFKit to split PDFs in AppleScript? I would like to split my PDF documents into pairs of uncoloured pages and some colour pages.
I have tried pdftk, as I was originally writing a bash script, but it fails on my document, which was produced from LaTeX.
I'd look at installing Ghostscript via MacPorts or Fink. Ghostscript has pretty simple command line arguments for doing what you want. You can then control it from within an AppleScript script.
Typically to split a pdf with ghostscript you do the following:
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dFirstPage=m -dLastPage=n -sOutputFile=out.pdf in.pdf
Where m and n are page numbers.
You can merge pdfs with
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=out.pdf *.pdf
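The split command above can be wrapped in a small shell function so it can be called once per page range, for example from a bash script or from AppleScript's do shell script. A minimal sketch, assuming Ghostscript is reachable as gs; GS and extract_pages are illustrative names:

```shell
#!/bin/sh
# Sketch only: GS and extract_pages are illustrative names.
GS="${GS:-gs}"

# extract_pages INPUT FIRST LAST OUTPUT
# Writes pages FIRST..LAST of INPUT to OUTPUT using the pdfwrite device.
extract_pages() {
    src=$1
    first=$2
    last=$3
    out=$4
    "$GS" -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH \
        -dFirstPage="$first" -dLastPage="$last" \
        -sOutputFile="$out" "$src"
}
```

For pairs of pages, that would be calls such as extract_pages in.pdf 1 2 pair1.pdf and extract_pages in.pdf 3 4 pair2.pdf.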
Automator has a "PDF to Images" choice which extracts all of the pages into individual pdf files... try that.
