Exiftool: batch-write metadata to JPEGs from text file - shell

I'd like to use ExifTool to batch-write metadata that has previously been saved in a text file.
Say I have a directory containing the following JPEG files:
001.jpg 002.jpg 003.jpg 004.jpg 005.jpg
I then create the file metadata.txt, which contains the file names, each followed by a colon, and hand it to a coworker, who will fill in the needed metadata (in this case, comma-separated IPTC keywords). The finished file would look like this:
001.jpg: Keyword, Keyword, Keyword
002.jpg: Keyword, Keyword, Keyword
003.jpg: Keyword, Keyword, Keyword
004.jpg: Keyword, Keyword, Keyword
005.jpg: Keyword, Keyword, Keyword
How would I go about feeding this file to ExifTool and making sure that the right keywords get saved to the right file? I'm also open to changing the structure of the file if that helps, for example by formatting it as CSV, JSON or YAML.

If you can change the format to a CSV file, then exiftool can directly read it with the -csv option.
You would have to reformat it in this way: the first row has to contain the header "SourceFile" above the filenames and "Keywords" above the keywords. If the filenames don't include the path to the files, then the command has to be run from the same directory as the files. The whole keywords string needs to be enclosed in quotes so the keywords aren't read as separate columns. The result would look like this:
SourceFile,Keywords
001.jpg,"KeywordA, KeywordB, KeywordC"
002.jpg,"KeywordD, KeywordE, KeywordF"
003.jpg,"KeywordG, KeywordH, KeywordI"
004.jpg,"KeywordJ, KeywordK, KeywordL"
005.jpg,"KeywordM, KeywordN, KeywordO"
At that point, your command would be
exiftool -csv=/path/to/file.csv -sep ", " /path/to/files
The -sep option is needed to make sure the keywords are treated as separate keywords rather than a single, long keyword.
This has an advantage over a script that loops over the file contents and runs exiftool once for each line. ExifTool's biggest performance hit is its startup time, so running it in a loop will be very slow, especially on a large number of files (see Common Mistake #3).
See ExifTool FAQ #26 for more details on reading from a CSV file.
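If the colon-separated file from the question already exists, a small conversion sketch (assuming exactly the "filename: keyword, keyword" layout shown above) could generate the CSV:
# prepend the header row, then turn 'file: kw, kw' into file,"kw, kw"
{ echo 'SourceFile,Keywords'; sed 's/: /,"/; s/$/"/' metadata.txt; } > metadata.csv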

I believe the answer by @StarGeek is superior to mine, but I will leave mine for completeness and as a reference for a more basic, Luddite approach :-)
I think you want this:
#!/bin/bash
# split each "filename: keyword, keyword, ..." line on the colon
while IFS=': ' read -r file keywords ; do
    exiftool -sep ", " -iptc:Keywords="$keywords" "$file"
done < list.txt
Here is the list.txt:
001.jpg: KeywordA, KeywordB, KeywordC
002.jpg: KeywordD, KeywordE, KeywordF
003.jpg: KeywordG, KeywordH, KeywordI
And here is a result:
exiftool -b -keywords 002.jpg
KeywordD
KeywordE
KeywordF
Many thanks to StarGeek for his corrections and explanations.

Related

Converting a TXT file with double quotes to a pipe-delimited format using sed

I'm trying to convert TXT files into pipe-delimited text files.
Let's say I have a file called sample.csv:
aaa",bbb"ccc,"ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","nnn"ooo,ppp"qqq",rrr" sss,"ttt,""uuu",Z
I'd like to convert this into an output that looks like this:
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
Now after tons of searching, I have come the closest using this sed command:
sed -r 's/""/\v/g;s/("([^"]+)")?,/\2\|/g;s/"([^"]+)"$/\1/;s/\v/"/g'
However, the output that I received was:
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|pppqqq|rrr" sss|ttt,"uuu|Z
The expected result for the 9th column should have been ppp"qqq", but the output removed the double quotes and what I got was pppqqq.
I have been playing around with this for a while, but to no avail.
Any help regarding this would be highly appreciated.
As suggested in the comments, sed or any other Unix text tool is not recommended for this kind of complex CSV string. It is much better to use a dedicated CSV parser, like str_getcsv in PHP:
$s = 'aaa",bbb"ccc,"ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","nnn"ooo,ppp"qqq",rrr" sss,"ttt,""uuu",Z';
echo implode('|', str_getcsv($s));
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|nnnooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
The problem with sample.csv is that it mixes non-quoted fields (containing quotes) with fully quoted fields (that should be treated as such).
You can't have both at the same time. Either all fields are (treated as) unquoted and quotes are preserved, or all fields containing a quote (or separator) are fully quoted and the quotes inside are escaped with another quote.
So, sample.csv should become:
"aaa""","bbb""ccc","ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","""nnn""ooo","ppp""qqq""","rrr"" sss","ttt,""uuu",Z
to give you the desired result (using a csv parser):
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
I had the same problem.
I got the correct result with https://www.papaparse.com/demo
It is FOSS on GitHub, so you can check how it works.
With the source of [ "aaa""","bbb""ccc","ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","""nnn""ooo","ppp""qqq""","rrr"" sss","ttt,""uuu",Z ]
The result appears in the browser console (screenshot: https://i.stack.imgur.com/OB5OM.png).

Update the tar.bz2 compressed file

We have hundreds of files in the compressed archive trx_date.tar.bz2, which holds requests and responses. The structure of trx_date.tar.bz2 is: trx_date.tar contains trx_date, which contains the files log1, log2 and log3. These hold XML requests with some sensitive info that I would like to mask to a default value. The request has a tag like <number>1234567</number>, and I want to mask it, i.e. update it in the log file to <number>3333333</number>.
I am able to grep it using the following:
Number1=bzcat $LOGDIR/$LOG_FORMAT | grep "<number>[0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9]"
How can we overwrite those values in the log files using a shell script?
The log file contains requests and responses, with a tag like <number>123456</number> among other tags. I want to read all the lines of the log file, replace the value of that specific tag with 333333, and save the result back to the same file. We also have an info tag containing 333333, but I don't want to touch that one.
In principle, you cannot do directly what you want (without extracting the file from your .tar.bz2 compressed archive), since a .tar.bz2 file is a bzip2-ed compression of a tar archive. So the only good solution would be to extract files from the archive, do the modification on the extracted files (e.g. with sed(1) or awk), and recreate an archive from it. Using sed on one particular textual file to replace a pattern like <number>[0-9]*</number> by <number>0000000</number> is easy. Writing a bash for loop to iterate that on several files is easy. So combine both approaches, or write a tiny shell or Python script doing that (on the extracted files).
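For illustration, a minimal sketch of that extract-modify-repack approach (archive and file names taken from the question; the exact <number> pattern is an assumption):
mkdir work && cd work
# extract the bzip2-ed tar archive
tar xjf ../trx_date.tar.bz2
# mask the sensitive tag in every extracted log file
for f in trx_date/log*; do
    sed -i 's|<number>[0-9]*</number>|<number>3333333</number>|g' "$f"
done
# recreate the archive from the modified files
tar cjf ../trx_date.masked.tar.bz2 trx_date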
In practice (but that is risky and I don't recommend it) you could hope that <number> digits </number> happens only in the files of the tar archive that you want to modify in place, and then you could perhaps replace, directly in the uncompressed tar archive, e.g. using sed(1), such sequences with other sequences of the same byte length (read more about the tar format: metadata such as file sizes appear in textual form, padded with NUL bytes).
You might also consider using tardy, a tar post-processor (that you need to install).
I strongly recommend extracting the tar archive, operating on the extracted files, then recreating the archive. Of course, you need enough disk space, and you have to estimate that. But tell your manager that disk space is cheap, generally cheaper than your labor costs.
PS. The command given in your question is really wrong and does not do what you dream of. Read more about redirection, pipelines, globbing, and Unix shells. Read carefully the documentation of Bash (notably basic shell features, shell expansion, command substitution). Read also the documentation of each command that you want to use (e.g. tar(1), grep(1), sed(1), etc.). Read the relevant man-pages(7), perhaps with the man(1) command.
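For reference, the assignment in the question was presumably meant as a command substitution, something like this sketch (the exact grep pattern is an assumption):
# count the lines containing a 7-digit <number> tag
Number1=$(bzcat "$LOGDIR/$LOG_FORMAT" | grep -c '<number>[0-9]\{7\}</number>')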

Add part of filename as PDF metadata using bash script and exiftool

I have about 600 books in PDF format where the filename is in the format:
AuthorForename AuthorSurname - Title (Date).pdf
For example:
Foo Z. Bar - Writing Scripts for Idiots (2017)
Bar Foo - Fun with PDFs (2016)
The metadata is unfortunately missing for pretty much all of them so when I import them into Calibre the Author field is blank.
I'm trying to write a script that will take everything that appears before the '-', removes the trailing space, and then adds it as the author in the PDF metadata using exiftool.
So far I have the following:
for i in "*.pdf";
do exiftool -author=$(echo $i | sed 's/-.*//' | sed 's/[ \t]*$//') "$i";
done
When trying to run it, however, the following is returned:
Error: File not found - Z.
Error: File not found - Bar
Error: File not found - *.pdf
0 image files updated
3 files weren't updated due to errors
What about the -author= phrase is breaking here? Please could someone enlighten me?
You don't need to script this. In fact, doing so will be much slower than letting exiftool do it by itself, as you would require exiftool to start up once for every file.
Try this
exiftool -ext pdf '-author<${filename;s/\s+-.*//}' /path/to/target/directory
Breakdown:
-ext pdf process only PDF files
-author the tag to copy to
< The copy from another tag option. In this case, the filename will be treated as a pseudo-tag
${filename;s/\s+-.*//} Copying from the filename, but first performing a regex on it. In this case, looking for 1 or more spaces, a dash, and the rest of the name and removing it.
Add -r if you want to recurse into subdirectories. Add -overwrite_original to avoid making backup files with _original added to the filename.
The error with your first command was that the value you wanted to assign had spaces in it and needed to be enclosed in quotes; the quoted "*.pdf" also prevented the glob from expanding, which is where the last "File not found" error came from.
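For reference, a corrected version of the original loop might look like this sketch (though the single exiftool invocation above remains much faster):
for i in *.pdf; do
    # strip the first dash and everything after it, plus the space before it
    exiftool -author="$(echo "$i" | sed 's/ *-.*//')" "$i"
done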

Howto find specifc metadata content and delete it from pictures?

I'm trying to find specific metadata content like "Odeon" in my *.jpg files with the help of exiftool.exe, and then delete that specific tag from the file.
I can't find "Odeon" with the command
exiftool -if "$keywords =~ /Odeon/" .
I know it's there, for this example, the content is stored in the tag "Location".
Can someone please tell me how to
a) find the content, wherever it is stored inside a *.jpg, and
b) delete exactly the tag that was found from the file (without a backup of the file)?
The reason that your command doesn't work is that you're only checking the Keywords tag. Location is a different tag, and you would have to check there for that info.
Unfortunately, exiftool doesn't have the ability to list only the tags that have matching data. You can pipe the output through another command-line program like Find (since you're on Windows) or grep (on other platforms). In that case, your command line would look like this:
exiftool -g1 -a -s FileOrDir | Find "Odeon"
That would list all the tags that have your info.
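On non-Windows platforms, the grep equivalent would simply be:
exiftool -g1 -a -s FileOrDir | grep "Odeon"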
After you found the tag, you could then remove it without having a backup file with this command, replacing TAG with the name of the tag:
exiftool -overwrite_original -TAG= FileOrDir
Take note that this command would remove that tag from all the files if you specify a dir. If you want to be more selective and the tag contains ONLY the text "Odeon", then you could use this command. Note that this command is case sensitive. It would not remove "oDeON" or other variations:
exiftool -overwrite_original -TAG-="Odeon" FileOrDir
If you wanted to remove a certain tag that contains "Odeon" as part of a longer string and be case insensitive, then you could add the -if option.
exiftool -overwrite_original -if "$TAG=~/odeon/i" -TAG= FileOrDir
Finally, there is the shotgun approach using the -api "Filter=…" option. This requires version 10.05 or greater. This command:
exiftool -overwrite_original -api "Filter=s/odeon//gi" -tagsfromfile # -all:all FileOrDir
would remove "odeon" (case insensitive) from all tags in the file. It would not remove the tag and if odeon was part of a longer string, the rest of the string would remain. For example, if Location was equal to "Odeon", it would become a blank string. If Description was "This is Odeon", it would become "This is ". The part after "Filter=" is a perl regex substitution and you could further refine it by looking into regex.

Testing "framework" for scripts with nonstandard filenames

There are many comments on some questions (especially for shell) that say basically one or more of the following:
This will fail on file names that contain spaces, newlines, etc,
This will fail if the file is a symbolic link (or not),
This will fail if $filename is a directory and not a regular file,
and so on.
While I understand that every script needs its own testing environment,
these are some common things that scripts should be immune to.
So my intention is to write a script that will create a directory hierarchy
with "specially crafted" file names for testing purposes.
The question is: which "special" file names are good for this test?
Currently the script creates files and directories with:
space in the file name
newline in the file name
file name that starts with one of:
- (like command argument)
# (comment char)
! (command history)
file name that contains one of:
| char (pipe)
() chars
* and ? (wildcards)
file name with unicode characters
all above for the directories
symbolic link to the directory
symbolic link to the file
Any other idea what I shouldn't miss?
What comes to my mind:
quotes in the filename single and double
the $ character at the start
several redirection characters like > < << <<<
the ~ char ($HOME)
the ';' (as command delimiter)
backslash in the filename \
basically, go through the ASCII table and test all the chars, if you think you need that :)
Some other comments:
If you want to test scripts for Stack Overflow questions, you should create one file with the OP's content (call it the "base file").
All the above "special files" should then be symlinks to that base file. With this method you can easily modify the content of all the files at once (you need to change only one, the base).
Or, if symlinks are not a solution for you, use hard links.
Not directly about special characters in filenames, but it is also good to take care of:
filenames that differ only in case, especially for images, like image.jpg and image.JPG, and the same filename with only a different extension
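A minimal sketch of such a generator, covering several of the cases above (all names are just examples):
mkdir -p testdir && cd testdir
touch 'name with spaces' $'name\nwith\nnewlines'
touch -- '-dash' '#comment' '!history'          # leading special characters
touch 'pipe|char' 'paren(s)' 'glob*?'           # shell metacharacters
touch 'single'\''quote' 'double"quote' '$dollar' '~tilde' ';semicolon' 'back\slash'
mkdir -- 'dir with spaces'
ln -s 'name with spaces' link-to-file           # symlink to a file
ln -s 'dir with spaces' link-to-dir             # symlink to a directory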
EDIT: Ideas from the comments:
Very long filenames, lots and lots of files, and very deep directory hierarchies (tripleee)

Resources