How to extract "Create Date" in a faster way than with "identify" - bash

I have written a short and ugly script to create a list of photos and the date and time each one was taken.
identify -verbose *.JPG | grep "Image:\|CreateDate:" | sed ':a;N;$!ba;s/JPG\n/JPG/g' | sed 's[^ ]* \([^ ]*\)[^0-9]*\(.*\)$/\1 \2/'
The output looks like
photo1.JPG 2018-11-28T16:11:44.06
photo2.JPG 2018-11-28T16:11:48.32
photo3.JPG 2018-11-28T16:13:23.01
It works pretty well, but my last folder had 3000 images and the script ran for a few hours before completing the task. This is mostly because identify is very slow. Does anyone have an alternative method? Preferably (but not exclusively) using native tools, because it's a server and it is not so easy to convince the admin to install new tools.

Lose the grep and sed and such and use -format. This took about 10 seconds for 500 JPGs:
$ for i in *jpg ; do identify -format '%f %[date:create]\n' "$i" ; done
Output:
image1.jpg 2018-01-19T04:53:59+02:00
image2.jpg 2018-01-19T04:53:59+02:00
...
If you want to modify the output, put the command after the done to avoid forking an extra process for each image, like:
$ for i in *jpg ; do identify -format '%f %[date:create]\n' "$i" ; done | awk '{gsub(/\+.*/,"",$NF)}1'
image1.jpg 2018-01-19T04:53:59
image2.jpg 2018-01-19T04:53:59
...
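If you don't need to handle the files one at a time, identify also accepts the whole glob in a single invocation, which avoids forking per image entirely. A sketch with the same output format, assuming the glob fits within the shell's argument limit:
$ identify -format '%f %[date:create]\n' *jpg | awk '{gsub(/\+.*/,"",$NF)}1'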

Native tools? identify is the best tool for this job ("native" - I would call ImageMagick a native tool). I don't think you'll find a faster method. Run it for the 3000 images in parallel and you will get roughly an N-fold speedup for N parallel jobs.
find . -maxdepth 1 -name '*.JPG' |
xargs -P0 -n1 -- sh -c "
identify -verbose \"\$1\" |
grep 'Image:\|CreateDate:' |
sed ':a;N;\$!ba;s/JPG\n/JPG/g' |
sed 's[^ ]* \([^ ]*\)[^0-9]*\(.*\)$/\1 \2/'
" --
Or you can just use bash: for f in *.JPG; do ( identify -verbose "$f" | .... ) & done.
Your seds look strange and output "unmatched ]" on my platform; I don't know what they are supposed to do, but I think cut -d: -f2 | tr -d '\n' would suffice. Grepping for the image name is also strange - you already know the image name...
find . -maxdepth 1 -name '*.JPG' |
xargs -P0 -n1 -- sh -c "
echo \"\$1 \$(
identify -verbose \"\$1\" |
grep 'CreateDate:' |
tr -d '[:space:]' |
cut -d: -f3-
)\"
" --
This will work for filenames without any spaces in them. I think that is OK for you, as your output is space-separated, so you are assuming your filenames contain no special characters.
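If you do need to cope with spaces in filenames, a NUL-delimited pipeline combined with -format (from the other answer) is one option. A sketch; the -P4 job count is arbitrary and the output order is not deterministic:
find . -maxdepth 1 -name '*.JPG' -print0 |
xargs -0 -n1 -P4 identify -format '%f %[date:create]\n'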

jhead is a small, fast, stand-alone utility. For example:
jhead ~/sample/images/iPhoneSample.JPG
Sample output:
File name : /Users/mark/sample/images/iPhoneSample.JPG
File size : 2219100 bytes
File date : 2013:03:09 08:59:50
Camera make : Apple
Camera model : iPhone 4
Date/Time : 2013:03:09 08:59:50
Resolution : 2592 x 1936
Flash used : No
Focal length : 3.8mm (35mm equivalent: 35mm)
Exposure time: 0.0011 s (1/914)
Aperture : f/2.8
ISO equiv. : 80
Whitebalance : Auto
Metering Mode: pattern
Exposure : program (auto)
GPS Latitude : N 20d 50.66m 0s
GPS Longitude: E 107d 5.46m 0s
GPS Altitude : 1.13m
JPEG Quality : 96
I did 5,000 iPhone images like this in 0.13s on a MacBook Pro:
jhead *jpg | awk '/^File name/{f=substr($0,16)} /^Date\/Time/{print f,substr($0,16)}'
In case you are unfamiliar with awk, that says: "Look out for lines starting with File name and, if you see one, save characters 16 onwards as f, the filename. Look out for lines starting with Date/Time and, if you see one, print the last filename you remembered followed by characters 16 onwards of the current line."
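If the photos are spread across subdirectories, the same awk filter can be fed from find. A sketch, assuming jhead keeps the column layout shown above:
find . -iname '*.jpg' -exec jhead {} + |
awk '/^File name/{f=substr($0,16)} /^Date\/Time/{print f,substr($0,16)}'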

Related

How to sort images by aspect ratio

I want to sort images by aspect ratio and then use mpv to browse them, and I got some code from Google:
identify * |
gawk '{split($3,sizes,"x"); print $1,sizes[1]/sizes[2]}' |
sed 's/\[.\]//' | sort -gk 2
This is the output:
28.webp 0.698404
1.webp 0.699544
27.webp 0.706956
10.webp 0.707061
25.webp 0.707061
9.webp 0.707061
2.webp 0.707241
22.webp 1.41431
23.webp 1.41431
24.webp 1.41431
Then I made some adaptations to fit my need:
identify * |
gawk '{split($3,sizes,"x"); print $1,sizes[1]/sizes[2]}' |
sed 's/\[.\]//' | sort -gk 2 |
gawk '{print $1}' |
mpv --no-resume-playback --really-quiet --playlist=-
It works, but isn't perfect. It can't deal with filenames containing spaces, and identify is much slower than exiftool, especially when handling the WebP format. Besides, exiftool has a -r option, so I want to use exiftool to get this output instead, but I don't know how to deal with the output of exiftool -r -s -ImageSize. Could anyone help me?
Using exiftool you could use
exiftool -p '$filename ${ImageSize;m/(\d+)x(\d+)/;$_=$1/$2}' /path/to/files | sort -gk 2
This will format the output the same as your example and I assume the same sort command will work with that. If not, then the sort part would need editing.
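If filenames may contain spaces, one option is to print the ratio first, sort on it, then strip it off, so that the rest of each line (the filename, spaces and all) goes straight to the mpv playlist. A sketch built on the same -p expression; $directory and $filename are standard exiftool tags:
exiftool -r -p '${ImageSize;m/(\d+)x(\d+)/;$_=$1/$2} $directory/$filename' /path/to/files |
sort -n -k1,1 |
cut -d' ' -f2- |
mpv --no-resume-playback --really-quiet --playlist=-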
You can display the aspect ratio and image filename with identify, without additional calculations:
identify -format '%f %[fx:w/h]\n' *.jpg | sort -n -k2,2
file1.jpg 1
file2.jpg 1.46789
file6.jpg 1.50282
file5.jpg 1.52
file7.jpg 1.77778
file3.jpg 1.90476
Regarding the performance of identify vs. exiftool: identify makes fewer syscalls, but exiftool looks faster.
strace -c identify -format '%f %[fx:w/h]\n' *.jpg 2>&1 | grep -E 'syscall|total'
% time seconds usecs/call calls errors syscall
100.00 0.001256 867 43 total
strace -c exiftool -r -s -ImageSize *.jpg 2>&1 | grep -E 'syscall|total'
% time seconds usecs/call calls errors syscall
100.00 0.000582 1138 311 total

Get total size of a list of files in UNIX

I want to run a find command that will find a certain list of files and then iterate through that list of files to run some operations. I also want to find the total size of all the files in that list.
I'd like to make the list of files FIRST, then do the other operations. Is there an easy way I can report just the total size of all the files in the list?
In essence I am trying to find a one-liner for the 'total_size' variable in the code snippet below:
#!/bin/bash
loc_to_look='/foo/bar/location'
file_list=$(find $loc_to_look -type f -name "*.dat" -size +100M)
total_size=???
echo 'total size of all files is: '$total_size
for file in $file_list; do
# do a bunch of operations
done
You should simply be able to pass $file_list to du:
du -ch $file_list | tail -1 | cut -f 1
du options:
-c display a total
-h human readable (i.e. 17M)
du will print an entry for each file, followed by the total (with -c), so we use tail -1 to trim to only the last line and cut -f 1 to trim that line to only the first column.
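If the paths may contain spaces, reading the find output NUL-delimited into an array keeps them intact. A minimal sketch, assuming bash 4.4+ for mapfile -d (it is still subject to the command-length limit discussed in the next answer):
mapfile -d '' -t file_list < <(find "$loc_to_look" -type f -name '*.dat' -size +100M -print0)
total_size=$(du -ch "${file_list[@]}" | tail -1 | cut -f1)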
The methods explained here have a hidden bug: when the file list is long, it exceeds the shell's command-size limit. It is better to use du like this:
find <some_directories> <filters> -print0 | du <options> --files0-from=- --total -s|tail -1
find produces a null-terminated file list, and du reads it from stdin and counts.
This is independent of the shell's command-size limit.
Of course, you can add some switches to du to get the logical file size, because by default du tells you how much physical space the files will take.
But I think that is a question for Unix admins rather than programmers :) so for Stack Overflow it is off topic.
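Filled in with the variables from the question, and with GNU du's --apparent-size and --block-size=1 to count logical bytes, that could look like this (a sketch, GNU du assumed):
total_size=$(find "$loc_to_look" -type f -name '*.dat' -size +100M -print0 |
             du --apparent-size --block-size=1 --files0-from=- --total -s | tail -1 | cut -f1)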
This code adds up all the bytes, using the trusty ls, for all files (it excludes all directories... apparently they're 8 KB per folder/directory):
cd /; find -type f -exec ls -s \; | awk '{sum+=$1;} END {print sum/1000;}'
Note: Execute as root. Result in megabytes.
The problem with du is that it adds up the size of the directory nodes as well. That is an issue when you want to sum up only the file sizes. (By the way, I find it strange that du has no option for ignoring directories.)
In order to add the size of files under the current directory (recursively), I use the following command:
ls -laUR | grep -e "^\-" | tr -s " " | cut -d " " -f5 | awk '{sum+=$1} END {print sum}'
How it works: it lists all the files recursively ("R"), including hidden files ("a"), showing their file size ("l") and without sorting them ("U"). (That can matter when you have many files in the directories.) Then we keep only the lines that start with "-" (these are the regular files, so we ignore directories and other entries). Then we squeeze consecutive spaces into one so that the lines of the tabular, aligned output of ls become a single-space-separated list of fields. Then we cut the 5th field of each line, which holds the file size. The awk script sums these values into the sum variable and prints the result.
ls -l | tr -s ' ' | cut -d ' ' -f <field number> is something I use a lot.
The 5th field is the size. Put that command in a for loop and add the size to an accumulator and you'll get the total size of all the files in a directory. Easier than learning AWK. Plus in the command substitution part, you can grep to limit what you're looking for (^- for files, and so on).
total=0
for size in $(ls -l | tr -s ' ' | cut -d ' ' -f 5) ; do
total=$(( ${total} + ${size} ))
done
echo ${total}
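The same accumulator idea can be written without parsing ls at all, if GNU coreutils' stat is available. A sketch:
total=0
for f in ./*; do
    [ -f "$f" ] || continue         # regular files only, skip directories
    size=$(stat -c %s "$f")         # file size in bytes (GNU stat)
    total=$(( total + size ))
done
echo ${total}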
The method provided by @Znik helps with the bug encountered when the file list is too long.
However, on Solaris (which is a Unix), du does not have the -c or --total option, so it seems there is a need for a counter to accumulate file sizes.
In addition, if your file names contain special characters, this will not go too well through the pipe (see "Properly escaping output from pipe in xargs").
Based on the initial question, the following works on Solaris (with a small amendment to the way the variable is created):
file_list=($(find $loc_to_look -type f -name "*.dat" -size +100M))
printf '%s\0' "${file_list[@]}" | xargs -0 du -k | awk '{total=total+$1} END {print total}'
The output is in KiB.

Scripting: get number of root files in RAR archive

I'm trying to write a bash script that determines whether a RAR archive has more than one root file.
The unrar command provides the following type of output if I run it with the v option:
[...#... dir]$ unrar v my_archive.rar
UNRAR 4.20 freeware Copyright (c) 1993-2012 Alexander Roshal
Archive my_archive.rar
Pathname/Comment
Size Packed Ratio Date Time Attr CRC Meth Ver
-------------------------------------------------------------------------------
file1.foo
2208411 2037283 92% 08-08-08 08:08 .....A. 00000000 m3g 2.9
file2.bar
103 103 100% 08-08-08 08:08 .....A. 00000000 m0g 2.9
baz/file3.qux
9911403 9003011 90% 08-08-08 08:08 .....A. 00000000 m3g 2.9
-------------------------------------------------------------------------------
3 12119917 11040397 91%
and since RAR is proprietary I'm guessing this output is as close as I'll get.
If I can get just the file list part (the lines between ------), and then perhaps filter out all even lines or lines beginning with multiple spaces, then I could do num_root_files=$(list of files | cut -d'/' -f1 | uniq | wc -l) and see whether [ $num_root_files -gt 1 ].
How do I do this? Or is there a saner approach?
I have searched for and found ways to grep text between two words, but then I'd have to include those "words" in the command, and doing that with entire lines of dashes is just too ugly. I haven't been able to find any solutions for "grep text between lines beginning with".
What I need this for is to decide whether to create a new directory or not before extracting RAR archives.
The unrar program does provide the x option to extract with full path and e for extracting everything to the current path, but I don't see how that could be useful in this case.
SOLUTION using the accepted answer:
num_root_files=$(unrar v "$file" | sed -n '/^----/,/^----/{/^----/!p}' | grep -v '^ ' | cut -d'/' -f1 | uniq | wc -l)
which seems to be the same as the shorter:
num_root_files=$(unrar v "$file" | sed -n '/^----/,/^----/{/^----/!p}' | grep -v '^ ' | grep -c '^ *[^/]*$')
OR using 7z as mentioned in a comment below:
num_root_files=$(7z l -slt "$file" | grep -c 'Path = [^/]*$')
# check if value is gt 2 rather than gt 1 - the archive itself is also listed
Oh no... I didn't have a man page for unrar so I looked one up online, which seems to have lacked some options that I just discovered with unrar --help. Here's the real solution:
unrar vb "$file" | grep -c '^[^/]*$'
I haven't been able to find any solutions for "grep text between lines beginning with".
In order to get the lines between ----, you can say:
unrar v my_archive.rar | sed -n '/^----/,/^----/{/^----/!p}'
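The same "between the dashed lines" extraction can also be written as an awk toggle, if you find that easier to read than the sed range:
unrar v my_archive.rar | awk '/^----/{inside = !inside; next} inside'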

Batch to rename files with metadata name

I recently accidentally formatted a 2 TB hard drive (Mac OS journaled)!
I was able to recover the files with Data Rescue 3; the only problem is that the program didn't give me the files back as they were, with their original tree structure and names.
For example I had
|-Music
||-Enya
|||-Sonadora.mp3
|||-Now we are free.mp3
|-Documents
||-CV.doc
||-LetterToSomeone.doc
...and so on
And now I got
|-MP3
||-M0001.mp3
||-M0002.mp3
|-DOCUMENTS
||-D0001.doc
||-D0002.doc
So with a huge amount of data it would take me centuries to manually open each file, see what it is, and rename it.
Is there some batch script which can scan all my subfolders and recover the previous names? By metadata, perhaps?
Or do you know a better tool which will keep the original names and paths of the files? (It doesn't matter if I must pay; there's always a solution for that :P)
Thank you
My contribution, for your music at least...
The idea is to go through all of the MP3 files found and distribute them based on their ID3 tags.
I'd do something like:
for i in `find /MP3 -type f -iname "*.mp3"`;
do
ARTIST=`id3v2 -l "$i" | grep TPE1 | cut -d":" -f2 | sed -e 's/^[[:space:]]*//'`; # This gets you the Artist
ALBUM=`id3v2 -l "$i" | grep TALB | cut -d":" -f2 | sed -e 's/^[[:space:]]*//'`; # This gets you the Album title
TRACK_NUM=`id3v2 -l "$i" | grep TRCK | cut -d":" -f2 | sed -e 's/^[[:space:]]*//'`; # This gets the track ID/position, like "2/13"
TR_TITLE=`id3v2 -l "$i" | grep TIT2 | cut -d":" -f2 | sed -e 's/^[[:space:]]*//'`; # Track title
mkdir -p "/MUSIC/$ARTIST/$ALBUM/";
cp "$i" "/MUSIC/$ARTIST/$ALBUM/$TRACK_NUM.$TR_TITLE.mp3"
done
Basically:
* It looks for all ".mp3" files in /MP3
* then analyses each file's ID3 tags and parses them to fill 4 variables, using the "id3v2" tool (you'll need to install it first). The tags are cleaned to keep only the value; sed is used to trim the leading spaces that might pollute it.
* then creates (if needed) a tree in /MUSIC/ with the artist name and album name
* then copies each input file into the new tree, renaming it according to the tags.
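A variation of the same script that survives spaces anywhere in the paths or tags, using a NUL-delimited find instead of the for loop. A sketch; the track number is left out here because values like "2/13" contain a slash and would be treated as a directory:
find /MP3 -type f -iname '*.mp3' -print0 |
while IFS= read -r -d '' i; do
    ARTIST=$(id3v2 -l "$i" | grep TPE1 | cut -d: -f2 | sed 's/^[[:space:]]*//')
    ALBUM=$(id3v2 -l "$i" | grep TALB | cut -d: -f2 | sed 's/^[[:space:]]*//')
    TR_TITLE=$(id3v2 -l "$i" | grep TIT2 | cut -d: -f2 | sed 's/^[[:space:]]*//')
    mkdir -p "/MUSIC/$ARTIST/$ALBUM"
    cp "$i" "/MUSIC/$ARTIST/$ALBUM/$TR_TITLE.mp3"
done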

Regexp lines from file and run command

I have a file with output from the identify command; it looks like this (format: FILENAME FORMAT SIZE METADATA):
/foo/bar.jpg JPEG 2055x1381 2055x1381+0+0 8-bit DirectClass
/foo/ham spam.jpg JPEG 855x781 855x781+0+0 8-bit DirectClass
...
Note that the filenames can contain spaces! What I want to do is to basically run this on each of those lines:
convert -size <SIZE> -colors 1 xc:black <FILENAME>
In other words, creating blank images of the existing ones. I've tried doing this with cat/sed/xargs but it's making my head explode. Any hints? Preferably a command-line solution.
Assuming that the filename is the string before " JPEG":
LINE="/foo/ham spam.jpg JPEG 855x781 855x781+0+0 8-bit DirectClass"
You can get file name as:
FILENAME=$(echo "$LINE" | sed 's/\(.*\) JPEG.*/\1/')
cat data_file | sed -e 's/\(.*\) JPEG \([^ ]*\) .*/convert -size \2 -colors 1 xc:black "\1"/' | bash
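A sketch that does the same thing with plain parameter expansion instead of piping generated commands into bash, assuming every line really has the form FILENAME JPEG SIZE ...:
while IFS= read -r line; do
    file=${line% JPEG *}      # everything before " JPEG "
    rest=${line#* JPEG }      # everything after " JPEG "
    size=${rest%% *}          # first word of the remainder, e.g. 855x781
    convert -size "$size" -colors 1 xc:black "$file"
done < data_file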
You can do what Michał suggests. Also, if the metadata has a fixed number of words, you could easily do it like the following (supposing you process every line):
FILENAME=`echo $LINE | rev | cut -d\ -f 6- | rev`
(that is, reverse the line, take everything from the sixth field on, then reverse again to obtain the filename proper.)
If not, you can use the fact that all the images have an extension and that the extension itself doesn't have spaces, and search for the extension till the first space afterwards:
FILENAME=`echo "$LINE" | sed -e 's/\(.*\.[^. ]*\) .*$/\1/'`
