How can I merge pdf files together and take only the first page from each file? - qpdf

I am using qpdf to merge all PDF files in a directory, and I would like to merge only the first page of each of multiple input files. According to the qpdf documentation on page selection this should be possible. I have tried a couple of variants without luck:
qpdf --empty --pages *.pdf 1-1 -- "output.pdf"
qpdf --empty --pages *.pdf 1 -- "output.pdf"
What can I do?

As explained in this qpdf issue,
the shell expands *.pdf in the command qpdf --empty --pages *.pdf 1 -- "output.pdf", that is, it replaces *.pdf
with the list of PDF files in the current directory. Assuming you have the following PDF files in the current directory:
file1.pdf
file2.pdf
file3.pdf
the command becomes:
qpdf --empty --pages file1.pdf file2.pdf file3.pdf 1 -- "output.pdf"
so the page selection is applied only to the last PDF. On a Mac or Linux you can script the command to add a 1 after
each PDF filename, so that the first page of each PDF file is taken and they are all put together, like so:
qpdf --empty --pages $(for i in *.pdf; do echo $i 1; done) -- output.pdf
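If any of the filenames contain spaces, the word splitting inside $(...) will break them apart; a slightly more robust sketch (assuming bash) builds the argument list in an array instead:
# Build "file.pdf 1" pairs in an array so filenames with spaces stay intact.
# Note: if output.pdf already exists in this directory, *.pdf will pick it up too.
args=()
for f in *.pdf; do
    args+=("$f" 1)
done
qpdf --empty --pages "${args[@]}" -- output.pdf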

Related

ImageMagick Combining JPGs in folders and subfolders into PDFs

I have a script that, when I right click on a folder, combines all pngs/jpgs/tifs inside the folder into a PDF and renames the PDF to the name of the folder it resides in.
cd %~dpnx1
for %%a in (.) do set currentfolder=%%~na
start cmd /k magick "*.{png,jpg,tif}" "%currentfolder%.pdf"
However, I have quite a lot of folders and currently have to do this one by one.
How can I create a function where I can right click on a folder, which searches subfolders and combines the jpgs to PDF?
So in the example below, I want to create 3 PDFs (Folder A, Folder B and Folder C) by right-clicking and running the script on the parent folder.
Example:
Parent folder (one that I would right click and run script from)
|- Folder A
||- test1.jpg
||- test2.jpg
||- test3.jpg
|- Folder B
||- example1.jpg
||- example2.jpg
|- Folder C
||- Folder D
|||- temp.jpg
|||- temp2.jpg
I have also recently moved to Mac, so I'm looking to use zsh. With some help I've attempted the following myself, but with no luck:
#!/bin/bash
# Set the output directory
output_dir='./pdfs/'
# Make the output directory if it doesn't exist
mkdir -p "$output_dir"
# Check if an input directory was provided as a command-line argument
if [ $# -eq 0 ]
then
    # Use the current directory as the input directory if none was provided
    input_dir='./'
else
    # Use the first command-line argument as the input directory
    input_dir="$1"
fi
# Find all the directories in the input directory
find "$input_dir" -type d | while read dir; do
    # Extract the base directory name
    dirname=$(basename "$dir")
    # Create a PDF file with the same name as the base directory name
    output_file="$output_dir/$dirname.pdf"
    # Find all the JPEG files in the current directory
    find "$dir" -type f -name '*.jpg' | while read file; do
        # Convert the JPEG file to PDF and append it to the output file
        convert "$file" "$file.pdf"
    done
    # Concatenate all the PDF files in the current directory into a single PDF
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$output_file" "$dir"/*.pdf
    # Remove the temporary PDF files
    rm "$dir"/*.pdf
done
Hope you can help. Thank you.
There are several aspects to this question, and judging by your attempted solution, they will all be non-trivial for you. It is more than a normal question so I'll just give you an outline so you can tackle it in chunks. You'll need to:
install homebrew
install ImageMagick
use Automator to make a workflow for right-click
learn some bash scripting to recurse through directories
learn some ImageMagick to make PDFs
Install homebrew
Go to here and follow instructions to install homebrew. I am not repeating the instructions here as they may change.
You'll likely need to install Xcode command-line tools with:
xcode-select --install
You'll need to set your PATH properly afterwards. Don't omit this step.
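The installer prints the exact commands for your machine at the end; on an Apple Silicon Mac (Homebrew prefix /opt/homebrew) they typically look like this:
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"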
Install ImageMagick
You'll need to do:
brew install imagemagick
Setup workflow with Automator for right-click
Next you need to make a script that will be executed when you right-click on a directory. It will look like this when we have done it. I right-clicked on the Junk directory on my desktop and went down to Quick Actions and across to makePDFs.
So, in order to do that, you need to start Automator by pressing ⌘SPACE, typing Automator and hitting ENTER when it guesses.
Then select New Document and Quick Action. Now navigate the orange areas in the diagram till you find Run Shell Script, then drag Run Shell Script over to the right side and drop it in the blue zone. Go to the File menu, choose Save, and enter makePDFs in the box. This is the name that will appear in future on your right-click menu.
Now set the options in the green box like I have done.
Now replace all the code in the blue box with the code copied from below:
#!/bin/bash
################################################################################
# Recurse into all subdirectories specified in parameter and make PDF in each
# directory of all images found in there.
################################################################################
# Add ImageMagick from homebrew to PATH
PATH=$PATH:/opt/homebrew/bin
# Check we got a directory as parameter
if [ $# -ne 1 ] ; then
    >&2 echo "Usage: $0 DIRECTORY"
    exit 1
fi
# Find and process all subdirectories
shopt -s nullglob
while read -rd $'\0' dir; do
    # Start a subshell so we don't have to cd back somewhere
    (
        cd "$dir" || exit 1
        # Make list of all images in directory
        declare -a images
        for image in *.jpg *.png *.tif ; do
            images+=("$image")
        done
        numImages=${#images[@]}
        if [ $numImages -gt 0 ] ; then
            pdfname=${PWD##*/}
            magick "${images[@]}" "${pdfname}.pdf"
        fi
    )
done < <(find "$1" -type d -print0)
Finally, set the options like I did in the cyan coloured box.
Now save the whole workflow again and everything should work nicely.
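If you want to test the script outside Automator first, you can paste it into a file and run it on a folder straight from Terminal (a quick check, assuming you save it as makepdfs.sh):
chmod +x makepdfs.sh
./makepdfs.sh ~/Desktop/Junk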
For bash, try this (tested on Linux):
for d in */; do convert "$d"/*.{png,jpg,tif} "$d/${d%/}.pdf" ; done
In slo-mo:
for d in */: loop on all directories in the current directory (the / restricts matches to directories). Variable $d contains the directory name, which keeps its trailing /.
"$d"/*.{png,jpg,tif}: all files with png, jpg or tif extension in the directory "$d"
"$d/${d%/}.pdf": the directory name, a slash, the directory name with the ending slash removed, and .pdf`
If you look carefully, the explicit /s in the code aren't necessary since there is already one at the end of $d, but leaving them in makes the code a bit more readable and the multiple '//' are coalesced into a single one.
This code may however complain that there are no png/jpg/tif. A slightly different form makes it behave more nicely:
shopt -s extglob # this is possibly already set by default
for d in */; do convert "$d"/*.@(png|jpg|tif) "$d/${d%/}".pdf ; done
for zsh this could be (untested!):
# shopt -s extglob # no shopt necessary
for d in */; do convert "$d"/*.png(N) "$d"/*.jpg(N) "$d"/*.tif(N) "$d/${d%/}".pdf ; done
with the caveat that if no pattern matches, the command will just be convert whatever_output.pdf and you will get the built-in help.
The difference is that *.{png,jpg,tif} is expanded to *.png *.jpg *.tif before any pattern matching is done, so this represents three file patterns and the shell tries to match each pattern in turn (and leaves the literal pattern in case there is no match), while *.@(png|jpg|tif) is a single file pattern that matches any of the three extensions. This can also make a difference for you because the files do not appear in the same order: *.{png,jpg,tif} lists all the PNGs, then the JPGs, then the TIFs, while *.@(png|jpg|tif) has them all sorted in alphabetical order without regard for the extension.
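You can see the ordering difference for yourself in an empty test directory (bash):
$ touch 1.tif 2.png 3.jpg
$ echo *.{png,jpg,tif}
2.png 3.jpg 1.tif
$ shopt -s extglob
$ echo *.@(png|jpg|tif)
1.tif 2.png 3.jpg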

write xmp data to all jpeg files in a folder

Can someone point me in the right direction if I need this
exiftool -tagsfromfile XYZ.xmp -all:all XYZ.jpg
to work for hundreds of JPGs? I have a folder with hundreds of JPEGs and XMPs with the same names but different file endings (xmp and jpeg). What would be an elegant way to go through all of them and replace XYZ with the actual filename?
I want/need to do this in a shell on OS X.
Do I need something like a for loop, or is there a direct way in the shell?
Thank you so much in advance!!
Your command will be
exiftool -r --ext xmp -tagsfromfile %d%f.xmp -all:all /path/to/files/
See Metadata Sidecar Files example #15.
The -r (-recurse) option allows recursion into subdirectories. Remove it if recursion is not desired.
The -ext (-extension) option is used to prevent the copying from the XMP files back onto themselves.
The %d variable is the directory of the file currently being processed. The %f variable is the base filename without the extension of that file. Then xmp is used as the extension. The result creates a file path to a corresponding XMP file in the same directory for every image file found. This will work for any writable file found (see FAQ #16).
This command creates backup files. Add -overwrite_original to suppress the creation of backup files.
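To make the %d%f substitution concrete: for a hypothetical file /path/to/files/shoot1/IMG_0001.jpg, the tags are copied from /path/to/files/shoot1/IMG_0001.xmp, so the full command with backups suppressed would be:
exiftool -r --ext xmp -overwrite_original -tagsfromfile %d%f.xmp -all:all /path/to/files/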
You do not want to loop exiftool as shown in the other answers. Exiftool's biggest performance hit is the startup time and looping it will increase the processing time. This is Exiftool Common Mistake #3.
Solved it by doing this:
#!/bin/bash
FILES="$1/*.jpg"
for f in $FILES; do
    if [ -f "$f" ]; then
        echo "Processing $f file..."
        #cat "$f"
        FILENAME="${f%%.*}"
        echo $FILENAME
        # exiftool -tagsfromfile "$FILENAME".xmp -all:all "$FILENAME".jpg
        exiftool -overwrite_original_in_place -ext jpg -tagsFromFile "$FILENAME".xmp -@ xmp2exif.args -@ xmp2iptc.args '-all:all' '-FileCreateDate<XMP-photoshop:DateCreated' '-FileModifyDate<XMP-photoshop:DateCreated' "$FILENAME".jpg
    else
        echo "Warning: Some problem with \"$f\""
    fi
done
An elegant and easy way, IMHO, is to use GNU Parallel to do them all in parallel:
parallel --dry-run exiftool -tagsfromfile {.}.xmp -all:all {} ::: *.jpg
If that looks correct, remove --dry-run and run again to do it for real.
{} just means "the current file"
{.} just means "the current file without its extension"
::: is just a separator followed by the names of the files you want GNU Parallel to process
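So, for a hypothetical file photo1.jpg in the directory, the dry run would print a job along the lines of:
exiftool -tagsfromfile photo1.xmp -all:all photo1.jpg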
You can install GNU Parallel on macOS with homebrew:
brew install parallel

Is my bash script accurate enough to check whether the listed images are referenced anywhere in the directory?

I have a list of images which I want to delete if they are not referenced anywhere. My directory contains multiple subdirectories, and within them there are .js files. I need to search for each image name in those files. If an image is referenced anywhere, I need to output it so I will retain that image.
My script goes like this: I am trying to check each image against the .js or .json files in the entire directory (which includes multiple subdirectories) and output it to c.out if any of those files contain the image name. Am I doing it right? I can still see that some images do not appear in the output even though they are being used.
#!/bin/bash
filename='images.txt'
echo Start
while read p; do
    echo $p
    find -name "*.js" | xargs grep -i $p > c.out
done < $filename
images.txt contains:
a.png
b.png
c.jpeg
....
Step 1: Keep a text file with the list of images (one name per line); use dos2unix file_name if the file was generated/created on a Windows machine.
Step 2: Run find /path/to/proj/dir -name '*.js' -o -name '*.json' | xargs grep -Ff pic_list.txt
You get the list of paths where those images are referenced.
Thanks @shelter for the answer
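If you also want the inverse, i.e. the images that are not referenced anywhere so they can be deleted, a sketch along the same lines (assuming a grep that supports --include, as GNU grep and macOS grep do):
# Report every image from the list that no .js/.json file mentions.
while read -r img; do
    if ! grep -rqF --include='*.js' --include='*.json' "$img" /path/to/proj/dir; then
        echo "UNUSED: $img"
    fi
done < images.txt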

How can I only list files with complementary extensions?

In a directory, I have a bunch of files like file1.tex, file1.pdf, file2.tex, file2.pdf etc. along with other files. I want, preferably as a one-liner in bash (to include in a makefile as a dependency), to
list all tex files if their pdf versions are also available.
list all pdf files if their tex versions are also available.
EDIT
I had tried
find . \( -name '*.pdf' -name '*.tex' \)
but it did not work. I guess the above approach is useful if I want to find files with pdf or tex extensions by using the -o option between two name switches.
thanks
suresh
The Solution
This isn't very readable; it would be much better to make this a script or shell function with some decent line wrapping, but you can force the loop to be a one-liner if you really want to.
# List TeX files with matching PDFs.
for file in *.pdf; do [ -f "${file/.pdf}.tex" ] && ls "${file/.pdf/.tex}"; done
# List PDFs with matching TeX files.
for file in *.tex; do [ -f "${file/.tex}.pdf" ] && ls "${file/.tex/.pdf}"; done
Validating the Solution
You can quickly test that the solution works properly with some sample data.
$ touch file1.pdf file1.tex file2.pdf file2.tex file3.pdf file4.tex
$ for file in *.pdf; do [ -f "${file/.pdf}.tex" ] && ls "${file/.pdf/.tex}"; done
file1.tex
file2.tex
$ for file in *.tex; do [ -f "${file/.tex}.pdf" ] && ls "${file/.tex/.pdf}"; done
file1.pdf
file2.pdf
Note that in both cases, files without complements in the other format are silently ignored.

basic shell script

I have some video files, all ending in .wmv, .mov or .mpg. I have music files ending in .mp3 and .wma. And finally I have some text files, all ending in the extension .txt.
I wrote a shell script that generates subfolders, one for the music files, one for the video files, and one for the text files, and then organizes all of the files into the correct subfolders.
But I ran into a little problem ...
I would like the script to be interactive and to prompt the user whether he/she wants to organize the files. Also, I would like the script to write a log file that contains, for each file, the original file name as well as the new file path/name it was moved to. And I would like the script to accept a command-line argument, which is the folder that contains the unorganized files. This should allow the script to be located and run from anywhere in the file system, and to accept any folder of unorganized files.
Example:
organizefiles.sh mystuff/media
where the subfolders would go inside "media"
Any ideas on how to do that?
Thank you!
Here's a partial implementation for you to start with; try to do the rest yourself.
find /path -type f \( -iname "*.mp3" -o -iname "*.txt" \) -exec file -N "{}" + | while IFS=":" read -r filename type
do
    case "$type" in
        *[vV]ideo*|*AVI* ) echo "Video: $filename";;
        *[Aa]udio*|*MPEG*ADTS*) echo "Audio file: $filename";;
        *[Aa]scii*[tT]ext*) echo "Text: $filename" ;;
        * ) echo "No type: $filename -> type: $type";;
    esac
done
Please read a bash tutorial or this to get familiar with shell scripting.
I would recommend switching to Python; it is much easier to write and maintain such code.
As for bash:
Reading input: read
Reading command line arguments: see here
To write to a log file, simply do echo "something" >> mylogfile.log (see the sketch below that ties these together)
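A minimal sketch tying those three pieces together (command-line argument, prompt, logging), covering only the music extensions; the video and text files follow the same pattern:
#!/bin/bash
# Usage: organizefiles.sh /path/to/unorganized/folder
target="${1:-.}"                               # default to the current directory
read -r -p "Organize files in $target? [y/N] " answer
[ "$answer" = "y" ] || exit 0
mkdir -p "$target/songs"
for f in "$target"/*.mp3 "$target"/*.wma; do
    [ -f "$f" ] || continue                    # skip unmatched glob patterns
    mv "$f" "$target/songs/"
    echo "$f -> $target/songs/$(basename "$f")" >> "$target/log.txt"
done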
Here is the script I have so far
#!/bin/bash
mkdir movies
mkdir songs
mkdir textfiles
mv *.wmv movies
mv *.mov movies
mv *.mpg movies
mv *.mp3 songs
mv *.wma songs
mv *.txt textfiles
ls -l movies >> log.txt
ls -l songs >> log.txt
ls -l textfiles >> log.txt
