Regexp lines from file and run command - bash

I have a file with output from the identify command, looks like this (following format: FILENAME FORMAT SIZE METADATA)
/foo/bar.jpg JPEG 2055x1381 2055x1381+0+0 8-bit DirectClass
/foo/ham spam.jpg JPEG 855x781 855x781+0+0 8-bit DirectClass
...
Note that the filenames can contain spaces! What I want to do is to basically run this on each of those lines:
convert -size <SIZE> -colors 1 xc:black <FILENAME>
In other words, creating blank images of existing ones. I've tried doing this with cat/sed/xargs but it's making my head explode. Any hints? And preferably a command-line solution..

Assuming that the file name is the string before " JPEG":
LINE="/foo/ham spam.jpg JPEG 855x781 855x781+0+0 8-bit DirectClass"
you can get the file name with:
FILENAME=$(echo "$LINE" | sed 's/\(.*\) JPEG.*/\1/')

cat data_file | sed -e 's/\(.*\) JPEG \([^ ]*\) .*/convert -size \2 -colors 1 xc:black "\1"/' | bash

You can do what Michał suggests. Also, if the metadata has a fixed number of words, you could do this easily like the following (supposing you process every line):
FILENAME=`echo "$LINE" | rev | cut -d' ' -f 6- | rev`
(that is, reverse the line, and take the name from the sixth parameter on, then you have to reverse to obtain the filename proper.)
If not, you can use the fact that all the images have an extension and that the extension itself doesn't have spaces, and match up to the first space after the extension (this assumes the first dot in the line is the one before the extension):
FILENAME=`echo "$LINE" | sed -E 's/^([^.]+\.[^ ]+) .*$/\1/'`
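The parsing described in these answers can also be sketched with shell parameter expansion alone, which keeps spaces in file names intact. This is only a sketch: the function name parse_identify is made up for illustration, and it assumes every line contains " JPEG " exactly as in the question and that no file name contains that string.

```shell
#!/bin/sh
# Sketch: split each line of identify output into size and file name.
# parse_identify is a hypothetical helper, not part of ImageMagick.
# Assumes the marker " JPEG " separates the file name from the metadata.
parse_identify() {
    while IFS= read -r line; do
        name=${line%% JPEG *}   # text before the first " JPEG "
        rest=${line#* JPEG }    # text after the first " JPEG "
        size=${rest%% *}        # first word of the remaining metadata
        printf '%s\t%s\n' "$size" "$name"
    done
}

# Dry run on a sample line: prints the size, a tab, then the file name.
printf '%s\n' '/foo/ham spam.jpg JPEG 855x781 855x781+0+0 8-bit DirectClass' |
    parse_identify

# To create the blank images, feed the pairs to convert, for example:
#   parse_identify < data_file |
#       while IFS="$(printf '\t')" read -r size name; do
#           convert -size "$size" -colors 1 xc:black "$name"
#       done
```

Printing the pairs first makes it easy to check the parsing before anything is overwritten.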

Related

How to sort images by aspect ratio

I want to sort images by aspect ratio, then use MPV to browse them. I found some code via Google:
identify * |
gawk '{split($3,sizes,"x"); print $1,sizes[1]/sizes[2]}' |
sed 's/\[.\]//' | sort -gk 2
This is the output:
28.webp 0.698404
1.webp 0.699544
27.webp 0.706956
10.webp 0.707061
25.webp 0.707061
9.webp 0.707061
2.webp 0.707241
22.webp 1.41431
23.webp 1.41431
24.webp 1.41431
Then I made some adaptations to fit my need:
identify * |
gawk '{split($3,sizes,"x"); print $1,sizes[1]/sizes[2]}' |
sed 's/\[.\]//' | sort -gk 2 |
gawk '{print $1}' |
mpv --no-resume-playback --really-quiet --playlist=-
It works, but it isn't perfect. It can't deal with file names that contain spaces, and identify is much slower than exiftool, especially when handling the WebP format. Besides, exiftool has a -r option, so I want to use exiftool to get this output instead, but I don't know how to deal with the output of exiftool -r -s -ImageSize. Could anyone help me?
Using exiftool you could use
exiftool -p '$filename ${ImageSize;m/(\d+)x(\d+)/;$_=$1/$2}' /path/to/files | sort -gk 2
This will format the output the same as your example and I assume the same sort command will work with that. If not, then the sort part would need editing.
To display the aspect ratio and image file name with identify, without additional calculations:
identify -format '%f %[fx:w/h]\n' *.jpg | sort -n -k2,2
file1.jpg 1
file2.jpg 1.46789
file6.jpg 1.50282
file5.jpg 1.52
file7.jpg 1.77778
file3.jpg 1.90476
Regarding the performance of identify vs. exiftool: identify makes fewer system calls, but exiftool looks faster:
strace -c identify -format '%f %[fx:w/h]\n' *.jpg 2>&1 | grep -E 'syscall|total'
% time seconds usecs/call calls errors syscall
100.00 0.001256 867 43 total
strace -c exiftool -r -s -ImageSize *.jpg 2>&1 | grep -E 'syscall|total'
% time seconds usecs/call calls errors syscall
100.00 0.000582 1138 311 total
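The question also mentions file names with spaces. One way around that, sketched below, is to print the ratio first and the name last, so that everything after the first space is the name. The helper name names_by_ratio is hypothetical; the input format matches what identify -format '%[fx:w/h] %f\n' would print.

```shell
#!/bin/sh
# Sketch: order file names by aspect ratio, keeping spaces in names intact.
# names_by_ratio is a hypothetical helper. It expects "RATIO NAME" lines on
# stdin, with the ratio as the first space-separated field.
names_by_ratio() {
    # LC_ALL=C makes sort read "." as the decimal point in any locale.
    LC_ALL=C sort -n -k1,1 | cut -d' ' -f2-
}

# Demo with canned input; prints "tall.webp" and then "wide shot.webp".
printf '%s\n' '1.41431 wide shot.webp' '0.698404 tall.webp' | names_by_ratio

# Possible use with the tools from this thread:
#   identify -format '%[fx:w/h] %f\n' * | names_by_ratio |
#       mpv --no-resume-playback --really-quiet --playlist=-
```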

How to extract "Create Date" in a faster way than with "identify"

I have done a short and ugly script to create a list of photos and datetime of when it was taken.
identify -verbose *.JPG | grep "Image:\|CreateDate:" | sed ':a;N;$!ba;s/JPG\n/JPG/g' | sed 's[^ ]* \([^ ]*\)[^0-9]*\(.*\)$/\1 \2/'
The output looks like
photo1.JPG 2018-11-28T16:11:44.06
photo2.JPG 2018-11-28T16:11:48.32
photo3.JPG 2018-11-28T16:13:23.01
It works pretty well, but my last folder had 3000 images and the script ran for a few hours before completing the task. This is mostly because identify is very slow. Does anyone have an alternative method? Preferably (but not exclusively) using native tools, because it's a server and it is not so easy to convince the admin to install new tools.
Lose the grep and sed and such and use -format. This took about 10 seconds for 500 jpgs:
$ for i in *jpg ; do identify -format '%f %[date:create]\n' "$i" ; done
Output:
image1.jpg 2018-01-19T04:53:59+02:00
image2.jpg 2018-01-19T04:53:59+02:00
...
If you want to modify the output, put the command after the done to avoid forking a process after each image, like:
$ for i in *jpg ; do identify -format '%f %[date:create]\n' "$i" ; done | awk '{gsub(/\+.*/,"",$NF)}1'
image1.jpg 2018-01-19T04:53:59
image2.jpg 2018-01-19T04:53:59
...
Native tools? identify is the best tool for this job (I would call ImageMagick a native tool). I don't think you'll find a faster method. Run it for the 3000 images in parallel and you should get something like an n-fold speedup.
find . -maxdepth 1 -name '*.JPG' |
xargs -n1 -P0 -- sh -c "
identify -verbose \"\$1\" |
grep 'Image:\|CreateDate:' |
sed ':a;N;\$!ba;s/JPG\n/JPG/g' |
sed 's[^ ]* \([^ ]*\)[^0-9]*\(.*\)$/\1 \2/'
" --
Or you can just use bash: for f in *.JPG; do ( identify -verbose "$f" | .... ) & done.
Your seds look strange and output "unmatched ]" on my platform. I don't know what they are supposed to do, but I think cut -d: -f2 | tr -d '\n' would suffice. Grepping for the image name is also strange: you already know the image name...
find . -maxdepth 1 -name '*.JPG' |
xargs -n1 -P0 -- sh -c "
echo \"\$1 \$(
identify -verbose \"\$1\" |
grep 'CreateDate:' |
tr -d '[:space:]' |
cut -d: -f2-
)\"
" --
This will work for filenames without any spaces in them. I think it will be ok with you, as your output is space separated, so you assume your filenames have no special characters.
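If file names with spaces do turn up, a NUL-separated variant of the same find | xargs idea avoids the problem. The sketch below is not the answer's exact command: per_file_parallel is a made-up helper name, the limit of four parallel jobs is an arbitrary choice, and -print0, -0 and -P are extensions found in GNU and BSD find/xargs rather than POSIX.

```shell
#!/bin/sh
# Sketch: run a command once per .JPG file in a directory, several at a time.
# per_file_parallel is a hypothetical helper.
# Usage: per_file_parallel DIR COMMAND [ARGS...]
per_file_parallel() {
    dir=$1
    shift
    # NUL-separated names survive spaces and other odd characters.
    find "$dir" -maxdepth 1 -name '*.JPG' -print0 |
        xargs -0 -n1 -P4 "$@"
}

# Possible use with the -format approach shown earlier in this thread:
#   per_file_parallel . identify -format '%f %[date:create]\n'
```

Because the jobs run in parallel, the output order is not fixed; pipe the result through sort if the order matters.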
jhead is small, fast and a stand-alone utility. Sample output:
jhead ~/sample/images/iPhoneSample.JPG
Sample Output
File name    : /Users/mark/sample/images/iPhoneSample.JPG
File size    : 2219100 bytes
File date    : 2013:03:09 08:59:50
Camera make  : Apple
Camera model : iPhone 4
Date/Time    : 2013:03:09 08:59:50
Resolution   : 2592 x 1936
Flash used   : No
Focal length : 3.8mm (35mm equivalent: 35mm)
Exposure time: 0.0011 s (1/914)
Aperture     : f/2.8
ISO equiv.   : 80
Whitebalance : Auto
Metering Mode: pattern
Exposure     : program (auto)
GPS Latitude : N 20d 50.66m 0s
GPS Longitude: E 107d 5.46m 0s
GPS Altitude : 1.13m
JPEG Quality : 96
I did 5,000 iPhone images like this in 0.13s on a MacBook Pro:
jhead *jpg | awk '/^File name/{f=substr($0,16)} /^Date\/Time/{print f,substr($0,16)}'
In case you are unfamiliar with awk, that says "Look out for lines starting with File name and if you see one, save characters 16 onwards as f, the filename. Look out for lines starting with Date/Time and if you see any, print the last filename you remembered and the 16th character of the current line onwards".
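The same awk idea can be written without the hard-coded column 16 by cutting each line after the first ": " instead. This is only a sketch under the assumption that the input looks like the jhead sample above; jhead_dates is a hypothetical helper name, and the demo feeds it canned text rather than real jhead output.

```shell
#!/bin/sh
# Sketch: pair each file name with its Date/Time line from jhead-style output.
# jhead_dates is a hypothetical helper; it reads the report from stdin.
jhead_dates() {
    awk '
        # Remember the value after the first ": " on a "File name" line.
        /^File name/  { f = substr($0, index($0, ": ") + 2) }
        # On a "Date/Time" line, print the remembered name and the date.
        /^Date\/Time/ { print f, substr($0, index($0, ": ") + 2) }
    '
}

# Demo with canned input; prints "/tmp/a b.JPG 2013:03:09 08:59:50".
printf '%s\n' \
    'File name    : /tmp/a b.JPG' \
    'Camera make  : Apple' \
    'Date/Time    : 2013:03:09 08:59:50' |
    jhead_dates
```

With real data this would be used as jhead *jpg | jhead_dates.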

Insert sed in a one liner to strip a string from the filename

This should be an easy question. Using the Linux terminal, I want to crop images in a folder using convert and change their names using sed. For example,
the following one-liner crops the images in a folder as I expect:
for file in Screenshot*.png; do convert -crop 1925x1060+10+1 $file newname_$file; done
However, I want to strip the "Screenshot-" string from the filename. With sed I could use sed -n 's/Screenshot-//p' :
echo "Screenshot-1.png" | sed -n 's/Screenshot\-//p'
But how can I insert sed in the for loop above?
For example, if I have a folder with these images:
Screenshot.png Screenshot-1.png Screenshot-2.png do_not_crop.png
I expect to see these files:
Screenshot.png 1.png 2.png do_not_crop.png
Bonus points for anyone who can tell me how to convert Screenshot.png to 0.png.
Edit: based on hek2mgl's answer, this script works:
for file in Screen*png; do
  # Screenshot.png becomes 0.png; the others lose the "Screenshot-" prefix
  if [[ "$file" == "Screenshot.png" ]]; then
    out="0.png"
  else
    out="${file#Screenshot-}"
  fi
  convert -crop 1925x1060+10+1 "$file" "$out"
done
and it also produces 0.png.
I would use bash's parameter expansion rather than sed.
Example:
for file in Screenshot.png Screenshot-1.png Screenshot-2.png do_not_crop.png ; do
echo "${file#Screenshot-}"
done
Output:
Screenshot.png
1.png
2.png
do_not_crop.png
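To also cover the Screenshot.png to 0.png case from the question, the prefix stripping can be wrapped in a small function with a case statement. This is a sketch; new_name is a made-up helper name, and it only computes the target name without touching any files.

```shell
#!/bin/sh
# Sketch: compute the output name for one screenshot file.
# new_name is a hypothetical helper.
new_name() {
    case $1 in
        Screenshot.png) printf '%s\n' '0.png' ;;              # special case
        *)              printf '%s\n' "${1#Screenshot-}" ;;   # strip prefix
    esac
}

# Demo: prints 0.png, 1.png, 2.png and do_not_crop.png, one per line.
for file in Screenshot.png Screenshot-1.png Screenshot-2.png do_not_crop.png; do
    new_name "$file"
done

# Possible use in the crop loop from the question:
#   for file in Screenshot*.png; do
#       convert -crop 1925x1060+10+1 "$file" "$(new_name "$file")"
#   done
```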

Pass .txt list of .jpgs to convert (bash)

I'm currently working on an exercise that requires me to write a shell script whose function is to take a single command-line argument that is a directory. The script takes the given directory, and finds all the .jpgs in that directory and its sub-directories, and creates an image-strip of all the .jpgs in order of modification time (newest on bottom).
So far, I've written:
#!/bin/bash
dir=$1 #the first argument given will be saved as the dir variable
#find all .jpgs in the given directory
#then ls is run for the .jpgs, with the date format %s (in seconds)
#sed lets the 'cut' process ignore the spaces in the columns
#fields 6 and 7 (the name and the time stamp) are then cut and sorted by modification date
#then, field 2 (the file name) is selected from that input
#Finally, the entire sorted output is saved in a .txt file
find "$dir" -name "*.jpg" -exec ls -l --time-style=+%s {} + | sed 's/  */ /g' | cut -d' ' -f6,7 | sort -n | cut -d' ' -f2 > jgps.txt
The script correctly outputs the directory's .jpgs in order of modification time. The part that I am currently struggling with is how to give the list in the .txt file to the convert -append command that will create an image-strip for me. (For those who aren't aware of that command, what would be entered is: convert -append image1.jpg image2.jpg image3.jpg IMAGESTRIP.jpg, with IMAGESTRIP.jpg being the name of the completed image strip file made up of the previous 3 images.)
I can't quite figure out how to pass the .txt list of files and their paths to this command. I've been scouring the man pages to find a possible solution but no viable ones have arisen.
xargs is your friend:
find "$dir" -name "*.jpg" -exec ls -l --time-style=+%s {} + | sed 's/  */ /g' | cut -d' ' -f6,7 | sort -n | cut -d' ' -f2 | xargs sh -c 'convert -append "$@" IMAGESTRIP.jpg' sh
Explanation
The basic use of xargs is:
find . -type f | xargs rm
That is, you specify a command to xargs, it appends the arguments it receives from standard input and then executes it. The above line would execute:
rm file1 file2 ...
But here you also need a final argument after the file list. The xargs -I option does not help with that, because it runs the command once per input line with a single file name substituted each time.
So, we let xargs call a small shell command instead and use "$@" to mark where the arguments read from standard input go. The trailing sh only fills $0, so every file name ends up in "$@", resulting in:
xargs sh -c 'convert -append "$@" IMAGESTRIP.jpg' sh
Put the list of filenames in a file called filelist.txt and call convert with the filename prefixed by an at sign (@):
convert @filelist.txt -append result.jpg
Here's a little example:
# Create three blocks of colour
convert xc:red[200x100] red.png
convert xc:lime[200x100] green.png
convert xc:blue[200x100] blue.png
# Put their names in a file called "filelist.txt"
echo "red.png green.png blue.png" > filelist.txt
# Tell ImageMagick to make a strip
convert @filelist.txt +append strip.png
As there's always some image with a pesky space in its name...
# Make the pesky one
convert -background black -pointsize 128 -fill white label:"Pesky" -resize x100 "image with pesky space.png"
# Whack it in the list for IM
echo "red.png green.png blue.png 'image with pesky space.png'" > filelist.txt
# IM do your stuff
convert @filelist.txt +append strip.png
By the way, it is generally poor practice to parse the output of ls in case there are spaces in your filenames. If you want to find a list of images, across directories and sort them by time, look at something like this:
# Find image files only - ignoring case, so "JPG", "jpg" both work
find . -type f -iname \*.jpg
# Now exec `stat` to get the file ages and quoted names
... -exec stat --format "%Y:%N" {} \;
# Now sort that, and strip the times and colon at the start
... | sort -n | sed 's/^.*://'
# Put it all together
find . -type f -iname \*.jpg -exec stat --format "%Y:%N" {} \; | sort -n | sed 's/^.*://'
Now you can either redirect all that to filelist.txt and call convert like this:
find ...as above... > filelist.txt
convert @filelist.txt +append strip.jpg
Or, if you want to avoid intermediate files and do it all in one go, you can make this monster where convert reads the filelist from its standard input stream:
find ...as above... | sed 's/^.*://' | convert @- +append strip.jpg
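A similar list can be built without stat by letting find print the modification time itself. This is a sketch under a few assumptions: it relies on the GNU find -printf extension, jpgs_by_mtime is a made-up helper name, and wrapping each name in single quotes breaks for names that themselves contain a single quote.

```shell
#!/bin/sh
# Sketch: list .jpg files under a directory, oldest first, one quoted
# name per line, ready for a file list.
# jpgs_by_mtime is a hypothetical helper; requires GNU find for -printf.
jpgs_by_mtime() {
    # %T@ is the modification time in seconds since the epoch, %p the path.
    find "$1" -type f -iname '*.jpg' -printf "%T@ '%p'\n" |
        LC_ALL=C sort -n |     # oldest first
        cut -d' ' -f2-         # drop the timestamp, keep the quoted name
}

# Possible use for the image strip from the question (newest on the bottom):
#   jpgs_by_mtime "$dir" > filelist.txt
#   convert @filelist.txt -append IMAGESTRIP.jpg
```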

Build sorted and annotated pdf from images

I am trying to build a pdf from a set of image files in the same folder from the bash. So far I've got this code:
ls *.jpg | sort > files.txt
ls *.jpg | sort | tr '\n' ' ' | sed 's/$/\ data_graphs.pdf/' | xargs convert -gravity North -annotate #files.txt
rm files.txt
This code collapses the image, but they are not properly sorted, and the annotation is the same for every image (the first one in the list).
Here is the ls *.jpg | sort output for reference.
$ ls *.jpg | sort
01.20.2014_A549_void.jpg
01.20.2014_EPOR_full_sorter.jpg
01.20.2014_EPOR_trunc_sorter.jpg
01.20.2014_WTGFP_sorter.jpg
01.27.2014_A549_void.jpg
01.27.2014_EPOR_full_I10412.jpg
01.27.2014_EPOR_full_sorter.jpg
01.27.2014_EPOR_trunc_I10412.jpg
01.27.2014_EPOR_trunc_sorter.jpg
01.27.2014_WTGFP_I10412.jpg
01.27.2014_WTGFP_sorter.jpg
02.03.2014_A549_void.jpg
02.03.2014_EPOR_full_sorter.jpg
02.03.2014_EPOR_trunc_sorter.jpg
02.03.2014_WTGFP_sorter.jpg
How about this? There is no need to generate the temporary file files.txt:
convert -gravity North -annotate `ls *.jpg | sort -t . -k3,3n -k1,1n -k2,2n` data_graphs.pdf
According to the comments, these jpg files have a date stamp in the file name (MM.DD.YYYY), so I updated the sort command to order by year, then month, then day.
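The date ordering can be checked on its own before convert is involved. The sketch below sorts names of the form MM.DD.YYYY_rest.jpg by year, then month, then day; sort_by_date is a made-up helper name, and the comma form of -k (for example -k3,3n) limits each key to a single dot-separated field.

```shell
#!/bin/sh
# Sketch: sort file names that start with MM.DD.YYYY by year, month, day.
# sort_by_date is a hypothetical helper; it reads names from stdin.
sort_by_date() {
    # Fields are split on "."; field 3 starts with the year, field 1 is
    # the month and field 2 the day. The n modifier compares numerically.
    LC_ALL=C sort -t . -k3,3n -k1,1n -k2,2n
}

# Demo with canned names; the 2013 file is printed first, then the
# 2014 files in month/day order.
printf '%s\n' \
    '02.03.2014_A549_void.jpg' \
    '01.27.2014_A549_void.jpg' \
    '12.31.2013_A549_void.jpg' \
    '01.20.2014_A549_void.jpg' |
    sort_by_date
```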
Another way is to convert each jpg file to pdf first, then use pdftk to merge them. I have used pdftk for many years and know the software can do the job easily. Here is the pdftk server URL: pdflabs.com/tools/pdftk-server.
The script below converts the jpg files to pdf one by one:
for file in *jpg
do
convert "$file" -gravity North -annotate +0+0 "$file" "$file".pdf
done
Then run the pdftk command. If you have a huge number of PDFs, you can use pdftk to merge every 10~20 into a small PDF, then merge the small PDFs into the final PDF. For example:
pdftk 1.pdf 2.pdf 3.pdf output m1.pdf
You will then have mXXX.pdf files; run pdftk again:
pdftk m1.pdf m2.pdf m3.pdf output final.pdf
