Add part of filename as PDF metadata using bash script and exiftool

I have about 600 books in PDF format where the filename is in the format:
AuthorForename AuthorSurname - Title (Date).pdf
For example:
Foo Z. Bar - Writing Scripts for Idiots (2017)
Bar Foo - Fun with PDFs (2016)
The metadata is unfortunately missing for pretty much all of them so when I import them into Calibre the Author field is blank.
I'm trying to write a script that takes everything that appears before the '-', removes the trailing space, and then adds it as the author in the PDF metadata using exiftool.
So far I have the following:
for i in "*.pdf";
do exiftool -author=$(echo $i | sed 's/-.*//' | sed 's/[ \t]*$//') "$i";
done
When trying to run it, however, the following is returned:
Error: File not found - Z.
Error: File not found - Bar
Error: File not found - *.pdf
0 image files updated
3 files weren't updated due to errors
What about the -author= phrase is breaking here? Could someone please enlighten me?

You don't need to script this. In fact, doing so will be much slower than letting exiftool do it by itself, as a script would require exiftool to start up once for every file.
Try this
exiftool -ext pdf '-author<${filename;s/\s+-.*//}' /path/to/target/directory
Breakdown:
-ext pdf process only PDF files
-author the tag to copy to
< The copy from another tag option. In this case, the filename will be treated as a pseudo-tag
${filename;s/\s+-.*//} Copying from the filename, but first performing a regex on it. In this case, looking for 1 or more spaces, a dash, and the rest of the name and removing it.
Add -r if you want to recurse into subdirectories. Add -overwrite_original to avoid making backup files with _original appended to the filename.
The error with your first command was that the value you wanted to assign had spaces in it and needed to be enclosed in quotes. Quoting "*.pdf" also stopped the glob from expanding, which is why you additionally saw Error: File not found - *.pdf.
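If you'd still prefer the loop form, here is a fixed sketch; the exiftool invocation itself is shown commented out so the extraction step can be checked on its own with plain printf/sed:

```shell
# Fixed version of the loop from the question: the glob is unquoted so it
# expands, and the author value is quoted so its spaces survive.
#
#   for i in *.pdf; do
#     exiftool -author="$(printf '%s' "$i" | sed 's/ *-.*//')" "$i"
#   done
#
# The extraction step alone, run against one of the sample filenames
# (' *-.*' removes the dash, everything after it, and any spaces before it):
f="Foo Z. Bar - Writing Scripts for Idiots (2017).pdf"
author=$(printf '%s' "$f" | sed 's/ *-.*//')
echo "$author"    # Foo Z. Bar
```

Note this still breaks on author names that contain a dash, which is one more reason to prefer the single exiftool command above.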


rename files keeping basename adding extra word and changing extension

I have a script to encode video files. When I run this script, I would like to keep the basename of the file, add a 'modified' string to the filename, and also change the extension.
The following for loop does that with the exception of changing the extension:
for file in *.mkv;
do
encode $file "${file%%.*}_modified.${i#*.}";
done
I'd like my_file.mkv to become my_file_modified.mp4. The previous loop just converts my_file.mkv into my_file_modified.mkv.
How do I also change the extension from .mkv to .mp4?
Thanks in advance.
I'm not totally sure I got your question right, but if I did, you should just do this:
for file in *.mkv;
do
encode "$file" "${file%.*}_modified.mp4";
done
The ${i#*.} part in your previous command actually took the original extension from the file name; you can just omit it and set your own extension instead.
Also, as @M.NejatAydin pointed out in the comments, you should use ${file%.*} instead of ${file%%.*}, to keep the entire original filename if it has a dot inside it.
For example:
$ file="test.file.mkv"
$ echo "${file%%.*}_modified.mp4"
test_modified.mp4 # This is probably NOT what you want
$ echo "${file%.*}_modified.mp4"
test.file_modified.mp4 # This is probably what you want

Extract image URI from markdown files using sed/grep containing duplicates in a single line

I have some markdown files to process which contain links to images that I wish to download, e.g. this markdown file:
[![](https://imgs.xkcd.com/comics/git.png)](https://imgs.xkcd.com/comics/git.png)
a lot of text
some more text...
[![](https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s320/take_a_break_git.gif)](https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s1600/take_a_break_git.gif)
some more text
another URL but not image
[https://github.com]
so on
I am trying to parse through this file and extract the list of image URLs, which I can later pass to wget to download.
So far I have used grep and sed and have got results:
$ sed -nE "/https?:\/\/[^ ]+.(jpg|png|gif)/p" $path
[![](https://imgs.xkcd.com/comics/git.png)](https://imgs.xkcd.com/comics/git.png)
[![](https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s320/take_a_break_git.gif)](https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s1600/take_a_break_git.gif)
$ grep -Eo "https?://[^ ]+.(jpg|png|gif)" $path
https://imgs.xkcd.com/comics/git.png)](https://imgs.xkcd.com/comics/git.png
https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s320/take_a_break_git.gif)](https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s1600/take_a_break_git.gif
The regex essentially works, but because the same URL appears twice in the same line, the match runs from the first occurrence of https to the last occurrence of jpg|png|gif. I want it to stop at the first occurrence of jpg|png|gif instead.
How can I fix this?
P.S. I have also tried lynx -dump -image_links -listonly $path but this prints the entire file.
I am also open to other options that solve the purpose, and as long as I can hook the code up in my current shell script.
You may add square brackets into the negated bracket expression:
grep -Eo "https?://[^][ ]+\.(jpg|png|gif)"
Details:
https?:// - http:// or https://
[^][ ]+ - one or more chars other than ], [ and space
\. - a dot
(jpg|png|gif) - either of the three alternative substrings.
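A quick check on the duplicate-URL line from the question shows why this works: the `](` between the two copies can no longer be crossed, so each URL matches separately:

```shell
# Sample markdown line in which the same image URL appears twice.
line='[![](https://imgs.xkcd.com/comics/git.png)](https://imgs.xkcd.com/comics/git.png)'
# [^][ ]+ cannot match ] or [, so each match stops before the ]( separator.
printf '%s\n' "$line" | grep -Eo 'https?://[^][ ]+\.(jpg|png|gif)'
# prints https://imgs.xkcd.com/comics/git.png on two separate lines
```

Piping the output through sort -u afterwards gives each URL only once before handing the list to wget.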

Exiftool: batch-write metadata to JPEGs from text file

I'd like to use ExifTool to batch-write metadata that have been previously saved in a text file.
Say I have a directory containing the following JPEG files:
001.jpg 002.jpg 003.jpg 004.jpg 005.jpg
I then create the file metadata.txt, which contains the file names followed by a colon, and I hand it out to a coworker, who will fill it with the needed metadata — in this case comma-separated IPTC keywords. The file would look like this after being finished:
001.jpg: Keyword, Keyword, Keyword
002.jpg: Keyword, Keyword, Keyword
003.jpg: Keyword, Keyword, Keyword
004.jpg: Keyword, Keyword, Keyword
005.jpg: Keyword, Keyword, Keyword
How would I go about feeding this file to ExifTool and making sure that the right keywords get saved to the right file? I'm also open to changing the structure of the file if that helps, for example by formatting it as CSV, JSON or YAML.
If you can change the format to a CSV file, then exiftool can directly read it with the -csv option.
You would have to reformat it this way: the first row must contain the header "SourceFile" above the filenames and "Keywords" above the keywords. If the filenames don't include the path to the files, then the command has to be run from the same directory as the files. Each keywords string needs to be enclosed in quotes so the commas aren't read as separate columns. The result would look like this:
SourceFile,Keywords
001.jpg,"KeywordA, KeywordB, KeywordC"
002.jpg,"KeywordD, KeywordE, KeywordF"
003.jpg,"KeywordG, KeywordH, KeywordI"
004.jpg,"KeywordJ, KeywordK, KeywordL"
005.jpg,"KeywordM, KeywordN, KeywordO"
At that point, your command would be
exiftool -csv=/path/to/file.csv -sep ", " /path/to/files
The -sep option is needed to make sure the keywords are treated as separate keywords rather than a single, long keyword.
This has an advantage over a script looping over the file contents and running exiftool once for each line. Exiftool's biggest performance hit is in its startup, and running it in a loop will be very slow, especially on a large number of files (see Common Mistake #3).
See ExifTool FAQ #26 for more details on reading from a csv file.
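If the colon-separated file already exists, a small awk sketch can rewrite it into that CSV layout. This is a hypothetical conversion: it assumes the exact metadata.txt format from the question and that no keyword ever contains ': ':

```shell
# Convert "file: kw, kw, kw" lines into the SourceFile,Keywords CSV layout
# that `exiftool -csv=metadata.csv -sep ", "` expects.
cd "$(mktemp -d)"
cat > metadata.txt <<'EOF'
001.jpg: KeywordA, KeywordB, KeywordC
002.jpg: KeywordD, KeywordE, KeywordF
EOF
{
  echo 'SourceFile,Keywords'
  # Split each line on the first ": "; quote the keyword list so the
  # commas inside it are not read as extra CSV columns.
  awk -F': ' '{printf "%s,\"%s\"\n", $1, $2}' metadata.txt
} > metadata.csv
cat metadata.csv
```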
I believe the answer by @StarGeek is superior to mine, but I will leave mine for completeness and reference of a more basic, Luddite approach :-)
I think you want this:
#!/bin/bash
while IFS=': ' read -r file keywords ; do
exiftool -sep ", " -iptc:Keywords="$keywords" "$file"
done < list.txt
Here is the list.txt:
001.jpg: KeywordA, KeywordB, KeywordC
002.jpg: KeywordD, KeywordE, KeywordF
003.jpg: KeywordG, KeywordH, KeywordI
And here is a result:
exiftool -b -keywords 002.jpg
KeywordD
KeywordE
KeywordF
Many thanks to StarGeek for his corrections and explanations.

How to find specific metadata content and delete it from pictures?

I'm trying to find specific metadata content, like "Odeon", in my *.jpg files with the help of exiftool.exe, and then delete that specific tag from the files.
I can't find "Odeon" with the command
exiftool -if "$keywords =~ /Odeon/" .
I know it's there, for this example, the content is stored in the tag "Location".
Can someone please tell me how to
a) find the content, wherever it is stored inside a *.jpg, and
b) delete exactly that tag from the file (without a backup of the file)?
The reason that your command doesn't work is that you're only checking the Keywords tag. Location is a different tag, and you would have to check there for that info.
Unfortunately, Exiftool doesn't have the ability to list only the tags that have matching data. You can pipe the output through another command line program like Find (since you're on Windows) or Grep (other platforms). In that case, your command line would look like this:
exiftool -g1 -a -s FileOrDir | Find "Odeon"
That would list all the tags that have your info.
After you found the tag, you could then remove it without having a backup file with this command, replacing TAG with the name of the tag:
exiftool -overwrite_original -TAG= FileOrDir
Take note that this command would remove that tag from all the files if you specify a dir. If you want to be more selective and the tag contains ONLY the text "Odeon", then you could use this command. Note that this command is case sensitive. It would not remove "oDeON" or other variations:
exiftool -overwrite_original -TAG-="Odeon" FileOrDir
If you wanted to remove a certain tag that contains "Odeon" as part of a longer string and be case insensitive, then you could add the -if option.
exiftool -overwrite_original -if "$TAG=~/odeon/i" -TAG= FileOrDir
Finally, there is the shotgun approach using the -api "Filter=…" option. This requires version 10.05 or greater. This command:
exiftool -overwrite_original -api "Filter=s/odeon//gi" -tagsfromfile @ -all:all FileOrDir
would remove "odeon" (case insensitive) from all tags in the file. It would not remove the tag and if odeon was part of a longer string, the rest of the string would remain. For example, if Location was equal to "Odeon", it would become a blank string. If Description was "This is Odeon", it would become "This is ". The part after "Filter=" is a perl regex substitution and you could further refine it by looking into regex.

Use ffmpeg to edit metadata titles for multiple files

I'd like to be able to add/edit video metadata titles to multiple files at once or with a single command, but I don't know how to tell ffmpeg to do this.
I read a similar post on the Ubuntu Forums, but I have never used string manipulation in Linux before, so the commands I'm seeing in the post are way out of my comprehension at the moment, and much of the discussion goes over my head.
I've got all of my video files in a filename format that includes the show name, the episode number, and episode title. For example:
show_name - episode_number - episode_title.extension
Bleach - 001 - A Shinigami Is Born!.avi
Is there a simple way to read the title and episode number from the filename and put it into a metadata tag without having to go through each and every file manually?
EDIT 1: So I found out that I can iterate through files in a directory, and echo the filename, and I was told by a friend to try bash to parse the strings and return values from that to use in the ffmpeg command line. The problem is, I have absolutely no idea how to do this. The string manipulation in bash is very confusing on first look, and I can't seem to get it to output what I want into my variables. My test bash:
for file in "Bleach - 206 - The Past Chapter Begins! The Truth from 110 Years Ago.mkv"; do
  extension=${file##*.}
  showName=${file%% *}
  episode=${file:9:3}
  echo Extension: $extension Show: $showName Episode: $episode
done
That outputs
Extension: mkv Show: Bleach Episode: 206
Which are all the variables I'm going to need, I just don't know how to move those to be run in ffmpeg now.
EDIT 2: I believe I was able, through much trial and error, to find a bash command that would do exactly what I wanted.
for file in *; do
  newname=${file:0:-4}_2
  ext=${file##*.}
  filename=${file}
  showname=${file%% *}
  episode=${file:9:3}
  nameext=${file##*- }
  title=${nameext%.*}
  ffmpeg -i "$filename" -metadata title="$title" -metadata track=$episode -metadata album=$showname -c copy "$newname.$ext"
  mv -f "$newname.$ext" "$filename"
done
This lets me parse the information from the filename, copy it into variables, and then run ffmpeg using those variables. It outputs to a second file, then moves that file over the original, overwriting it. You could remove that last part if you're not sure how it will parse your files, but I'm glad I was able to get a solution that works for me.
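One caveat with that loop: ${file:9:3} only works because "Bleach - " happens to be exactly nine characters. A sketch that splits on the " - " separators instead handles show names of any length; the ffmpeg call is shown commented out and would slot in where the echo is:

```shell
# Split "Show - NNN - Title.ext" on the " - " separators rather than on
# fixed character offsets, so any show name parses correctly.
file="Bleach - 206 - The Past Chapter Begins! The Truth from 110 Years Ago.mkv"
ext=${file##*.}         # text after the last dot
base=${file%.*}         # filename without the extension
showname=${base%% - *}  # up to the first " - "
rest=${base#* - }       # after the first " - "
episode=${rest%% - *}   # up to the next " - "
title=${rest#* - }      # everything after that
echo "$showname | $episode | $ext"
# ffmpeg -i "$file" -metadata title="$title" -metadata track="$episode" \
#        -metadata album="$showname" -c copy "${base}_2.$ext"
```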
