Recursive wildcards in GNU make? - makefile

It's been a while since I've used make, so bear with me...
I've got a directory, flac, containing .FLAC files. I've got a corresponding directory, mp3 containing MP3 files. If a FLAC file is newer than the corresponding MP3 file (or the corresponding MP3 file doesn't exist), then I want to run a bunch of commands to convert the FLAC file to an MP3 file, and copy the tags across.
The kicker: I need to search the flac directory recursively, and create corresponding subdirectories in the mp3 directory. The directories and files can have spaces in the names, and are named in UTF-8.
And I want to use make to drive this.

I would try something along these lines:
FLAC_FILES = $(shell find flac/ -type f -name '*.flac')
MP3_FILES = $(patsubst flac/%.flac, mp3/%.mp3, $(FLAC_FILES))
.PHONY: all
all: $(MP3_FILES)
mp3/%.mp3: flac/%.flac
	@mkdir -p "$(@D)"
	@echo convert "$<" to "$@"
A couple of quick notes for make beginners:
The @ in front of the commands prevents make from printing the command before actually running it.
$(@D) is the directory part of the target file name ($@)
Make sure that the lines with shell commands in them start with a tab, not with spaces.
Even if this should handle all UTF-8 characters and stuff, it will fail at spaces in file or directory names, as make uses spaces to separate stuff in the makefiles and I am not aware of a way to work around that. So that leaves you with just a shell script, I am afraid :-/
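Such a shell script is not hard to write, for the record. Here is a minimal sketch, assuming the flac/ and mp3/ layout from the question and that ffmpeg is installed (oggenc or lame would slot in similarly; ffmpeg carries most tags across by default). The -print0/read -d '' pairing is what keeps spaces and UTF-8 names intact:

#!/bin/bash
# Sketch only: convert flac/**.flac to mp3/**.mp3, skipping up-to-date targets.
find flac -type f -name '*.flac' -print0 | while IFS= read -r -d '' src; do
    dst="mp3/${src#flac/}"     # swap the top-level directory
    dst="${dst%.flac}.mp3"     # swap the extension
    # -nt is true if src is newer than dst, or if dst does not exist
    if [ "$src" -nt "$dst" ]; then
        mkdir -p "$(dirname "$dst")"
        ffmpeg -i "$src" -qscale:a 2 "$dst" </dev/null
    fi
done

The </dev/null matters: without it, ffmpeg reads from the same stdin that the while loop is using for filenames.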

You can define your own recursive wildcard function like this:
rwildcard=$(foreach d,$(wildcard $(1:=/*)),$(call rwildcard,$d,$2) $(filter $(subst *,%,$2),$d))
The first parameter ($1) is a list of directories, and the second ($2) is a list of patterns you want to match.
Examples:
To find all the C files in the current directory:
$(call rwildcard,.,*.c)
To find all the .c and .h files in src:
$(call rwildcard,src,*.c *.h)
This function is based on the implementation from this article, with a few improvements.
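If you want to sanity-check what a call to rwildcard returns, the rough find equivalents (run from the shell) are:

# roughly $(call rwildcard,.,*.c)
find . -name '*.c'

# roughly $(call rwildcard,src,*.c *.h)
find src -name '*.c' -o -name '*.h'

They are only roughly equivalent: rwildcard will also match directories whose names fit the pattern, and these find commands match any file type unless you add -type f.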

If you're using Bash 4.x, you can use a new globbing option, for example:
SHELL:=/bin/bash -O globstar
list:
	@echo Flac: $(shell ls flac/**/*.flac)
	@echo MP3: $(shell ls mp3/**/*.mp3)
This kind of recursive wildcard can find all the files you are interested in (.flac, .mp3 or whatever).
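The same globstar option works in an ordinary shell session too, which makes it easy to test a pattern before wiring it into a makefile; a quick sketch:

#!/bin/bash
shopt -s globstar nullglob   # ** recurses; nullglob drops patterns with no match
for f in flac/**/*.flac; do
    echo "found: $f"
done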

FWIW, I've used something like this in a Makefile:
RECURSIVE_MANIFEST = `find . -type f -print`
The example above will search from the current directory ('.') for all "plain files" ('-type f') and set the RECURSIVE_MANIFEST make variable to every file it finds. You can then use pattern substitutions to reduce this list, or alternatively, supply more arguments to find to narrow what it returns. See the man page for find.
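For example, to narrow the manifest to FLAC files you can either hand the test to find directly or post-filter the list (sketches):

# narrow at the find level
find . -type f -name '*.flac' -print

# or post-filter an existing list
find . -type f -print | grep '\.flac$'

The grep variant is line-oriented, so it has the usual caveat of misbehaving on filenames that contain newlines.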

My solution is based on the one above, but uses sed instead of patsubst to mangle the output of find and to escape the spaces.
Going from flac/ to ogg/
OGGS = $(shell find flac -type f -name "*.flac" | sed 's/ /\\ /g;s/flac\//ogg\//;s/\.flac/\.ogg/' )
Caveats:
Still barfs if there are semi-colons in the filename, but they're pretty rare.
The $(@D) trick won't work (outputs gibberish), but oggenc creates directories for you!
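To see what the sed expression does to one path, you can feed it a sample by hand:

$ echo 'flac/Some Artist/01 Track One.flac' | sed 's/ /\\ /g;s/flac\//ogg\//;s/\.flac/\.ogg/'
ogg/Some\ Artist/01\ Track\ One.ogg

The spaces come out backslash-escaped, so make treats each path as a single target rather than splitting it into words.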

Here's a Python script I quickly hacked together to solve the original problem: keep a compressed copy of a music library. The script will convert .m4a files (assumed to be ALAC) to AAC format, unless the AAC file already exists and is newer than the ALAC file. MP3 files in the library will be linked, since they are already compressed.
Just beware that aborting the script (ctrl-c) will leave behind a half-converted file.
I originally also wanted to write a Makefile to handle this, but since it cannot handle spaces in filenames (see the accepted answer) and because writing a bash script is guaranteed to put me in a world of pain, Python it is. It's fairly straightforward and short, and thus should be easy to tweak to your needs.
from __future__ import print_function
import glob
import os
import subprocess

UNCOMPRESSED_DIR = 'Music'
COMPRESSED = 'compressed_'
UNCOMPRESSED_EXTS = ('m4a', )  # files to convert to lossy format
LINK_EXTS = ('mp3', )  # files to link instead of convert

for root, dirs, files in os.walk(UNCOMPRESSED_DIR):
    out_root = COMPRESSED + root
    if not os.path.exists(out_root):
        os.mkdir(out_root)
    for file in files:
        file_path = os.path.join(root, file)
        file_root, ext = os.path.splitext(file_path)
        if ext[1:] in LINK_EXTS:
            if not os.path.exists(COMPRESSED + file_path):
                print('Linking {}'.format(file_path))
                link_source = os.path.relpath(file_path, out_root)
                os.symlink(link_source, COMPRESSED + file_path)
            continue
        if ext[1:] not in UNCOMPRESSED_EXTS:
            print('Skipping {}'.format(file_path))
            continue
        out_file_path = COMPRESSED + file_path
        if (os.path.exists(out_file_path)
                and os.path.getctime(out_file_path) > os.path.getctime(file_path)):
            print('Up to date: {}'.format(file_path))
            continue
        print('Converting {}'.format(file_path))
        subprocess.call(['ffmpeg', '-y', '-i', file_path,
                         '-c:a', 'libfdk_aac', '-vbr', '4',
                         out_file_path])
Of course, this can be enhanced to perform the encoding in parallel. That is left as an exercise to the reader ;-)
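If you would rather not touch the Python, one way to get the parallelism from the shell is xargs -P. This is a sketch, not the author's method; it assumes GNU or BSD xargs and writes the .aac next to each source file instead of mirroring the compressed_ tree:

find Music -type f -name '*.m4a' -print0 |
    xargs -0 -P 4 -I{} sh -c 'ffmpeg -y -i "$1" -c:a libfdk_aac -vbr 4 "${1%.m4a}.aac" </dev/null' _ {}

The -P 4 runs four ffmpeg processes at a time; the sh -c wrapper exists so the per-file .m4a to .aac rename can happen safely even with spaces in names.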

Related

Wildcard on mv folder destination

I'm writing a small piece of code that checks a specific folder for .mov files over 4GB and writes their names (without the extension) to a log.txt file. I'm then reading the names into a while loop line by line, which triggers some archiving and copying commands.
Consider a file named abcdefg.mov (new) and a corresponding folder somewhere else named abcdefg_20180525 (underscore plus timestamp) that also contains a file named abcdefg.mov (old).
When reading in the filename from the log.txt, I strip the extension to store the variable "abcdefg" ($in1), and I'm using that variable to locate a folder elsewhere whose name begins with that string.
My problem is with how the mv command seems to support a wild card in the "source" string, but not in the "destination" string.
For example I can write:
mv -f /Volumes/Myshare/SourceVideo/$in1*/$in1.mov /Volumes/Myshare/Archive
However, a wildcard on the destination doesn't work in the same way. For example:
mv -f /Volumes/Myshare/Processed/$in1.mov Volumes/Myshare/SourceVideo/$in1*/$in1.mov
Is there an easy fix here that doesn't involve using another method?
Cheers for any help.
mv accepts a single destination path. Suppose that $in1 is abcdefg, and that $in1* expands to abcdefg_20180525 and abcdefg_20180526. Then the command
mv -f /dir1/$in1.mov /dir2/$in1*/$in1.mov
will be equivalent to:
mv -f /dir1/abcdefg.mov /dir2/abcdefg_20180526/abcdefg.mov
mv -f /dir2/abcdefg_20180525/abcdefg.mov /dir2/abcdefg_20180526/abcdefg.mov
Moreover, because the destination file is the same in both cases, the first file will be overwritten by the second.
You should create a precise list and do a precise copy instead of using wild cards.
This is what I would probably do: generate a list of results in a file with full path information, then read those results in another function. I could have used arrays, but I wanted to keep it simple. At the bottom of this script is a function call that scans for files with extension mp4 (case insensitive) and writes the results to a file in /tmp; the script then reads those results in another function and performs some operation (mv, etc.). Note: if functions are confusing, you can just remove the function names and braces and it becomes a normal script again. Functions are really handy; learn to love them!
#!/usr/bin/env bash
readonly SIZE_CHECK_LIMIT_MB="10M"
readonly FOLDER="/tmp"
readonly DESTINATION_FOLDER="/tmp/archive"
readonly SAVE_LIST_FILE="/tmp/$(basename $0)-save-list.txt"
readonly EXT="mp4"
readonly CASE="-iname" #change to -name for exact ext type upper/lower
function find_files_too_large() {
> ${SAVE_LIST_FILE}
find "${FOLDER}" -maxdepth 1 -type f "${CASE}" "*.${EXT}" -size +${SIZE_CHECK_LIMIT_MB} -print0 | while IFS= read -r -d $'\0' line ; do
echo "FOUND => $line"
echo "$line" >> ${SAVE_LIST_FILE}
done
}
function archive_large_files() {
local read_file="${SAVE_LIST_FILE}"
local write_folder="$DESTINATION_FOLDER"
if [ ! -s "${read_file}" ] || [ ! -f "${read_file}" ] ;then
echo "No work to be done ... "
return
fi
while IFS= read -r line ;do
echo "mv $line $write_folder" ;sleep 1
done < "${read_file}"
}
# MAIN (this is where the script starts) We just call two functions.
find_files_too_large
archive_large_files
It might be easier, I think, to change the filenames to the folder name initially. So abcdefg.mov would be abcdefg_timestamp.mov. I can always strip the timestamp from the filename easily enough after it's copied to the right location. I was hoping I had a small syntax issue, but I think there is no easy way of doing what I thought I could...
I think you have a basic misunderstanding of how wildcards work here. The mv command doesn't support wildcards at all; the shell expands all wildcards into lists of matching files before they get passed to mv. Furthermore, mv doesn't know whether the list of arguments it got came from wildcards or not, and the shell doesn't know anything about what the command is going to do with them. For instance, if you run the command grep *, the grep command just gets a list of names of files in the current directory as arguments, and will treat the first of them as a regex pattern ('cause that's what the first argument to grep is) to search the rest of the files for. If you ran mv * (note: don't do this!), it would interpret all but the last filename as sources, and the last one as a destination.
I think there's another source of confusion as well: when the shell expands a string containing a wildcard, it tries to match the entire thing to existing files and/or directories. So when you use Volumes/Myshare/SourceVideo/$in1*/$in1.mov, it looks for an already-existing file in a matching directory; since the file isn't there yet, there's no match. What the shell does in that case is pass the raw (unexpanded) wildcard-containing string to mv as an argument, which looks for that exact name, doesn't find it, and gives you an error.
(BTW, should there be a "/" at the front of that pattern? I assume so below.)
If I understand the situation correctly, you might be able to use this:
mv -f /Volumes/Myshare/Processed/$in1.mov /Volumes/Myshare/SourceVideo/$in1*/
Since the filename isn't supplied in the second string, it doesn't look for existing files by that name, just directories with the right prefix; mv will automatically retain the filename from the source.
However, I'll echo @Sergio's warning about chaos from multiple matches. In this case, it won't overwrite files (well, it might, but for other reasons), but if it gets multiple matching target directories it'll move all but the last one into the last one (along with the file you meant to move). You say you're 100% certain this won't be a problem, but in my experience that means that there's at least a 50% chance that something you'd never have thought of will go ahead and make it happen anyway. For instance, is it possible that $in1 could wind up empty, or contain a space, or...?
Speaking of spaces, I'd also recommend double-quoting all variable references. You want the variables inside double-quotes, but the wildcards outside them (or they won't be expanded), like this:
mv -f "/Volumes/Myshare/Processed/$in1.mov" "/Volumes/Myshare/SourceVideo/$in1"*/
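If you want to fail safely rather than rely on being 100% certain, you can expand the glob into an array first and insist on exactly one match; a sketch:

shopt -s nullglob
dirs=( /Volumes/Myshare/SourceVideo/"$in1"*/ )
if [ "${#dirs[@]}" -eq 1 ]; then
    mv -f "/Volumes/Myshare/Processed/$in1.mov" "${dirs[0]}"
else
    echo "expected exactly one match for $in1, found ${#dirs[@]}" >&2
fi

With nullglob set, a pattern that matches nothing produces an empty array instead of being passed through literally, so both the zero-match and many-match cases are caught.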

Iterating a group of folders and files while removing certain files that are contained in a list

I have a set of files that I download that contains files I want to remove. I would like to create a list of some form; the script should support globbing so I can be fairly aggressive with file removal without getting into the complexities of using regex within the list of files.
I am also stumped in that I put a sleep command within the loop of my script, and it is not getting run after each iteration, but only once at the end of the run.
Here is the script
# Get to the place where all the durty work happens
cd /Volumes/Videos
FILES=".DS_Store
*.txt
*.sample
*.sample.*
*.samples"
if [ "$(pwd)" == "/Volumes/Videos" ]; then
echo "You are currently in $(pwd)"
echo "You would not have read the above if this script were operating anywhere else"
# Delete files from list above
for f in "$FILES"
do
echo "Removing $f";
rm -f "$f";
echo "$f has been deleted";
sleep 10;
echo "";
echo "";
done
# See if dir is empty, ask if we want to delete it or keep it
# Iterate every movie file, see if we want to nuke contents. Maybe use part of last opened to help find those files fast
else
# Not in the correct directory
echo "This script is trying to alter files in a location that it should not be working"
echo "Script is currently trying to work in $(pwd)"
exit 1
fi
The main thing that has me completely stumped is the sleep command. It runs once, not once per file iteration. If I have 100 files to go through I get 10 seconds of sleep, not 100*10.
I will be adding in some other features, like if a file is smaller than x bytes, go ahead and delete it too. These files will have spaces and other odd characters in the filenames; am I creating my variables correctly so that this script handles those scenarios and stays as POSIX compliant as possible? I will change the shebang to sh instead of bash and try to add in set -o nounset and set -o errexit, though I tend to have a lot of trouble when I do that.
Is there a better form of list I should be using? I am not opposed to storing the pattern-match list in a separate file; I can include it, or read it in with any of a few commands.
These are also nested files, a dir, that contains files, or a dir that contains a dir that contains some files. Something like this:
/Volumes/Videos:
The Great guy in a tree
The Great guy in a tree S01e01
sample.avi
readme.txt
The Great guy in a tree S01e01.mpg
The Great guy in a tree S01e02
The Great guy in a tree S01e02.mpg
The Great guy in a tree S01e03
The Great guy in a tree S01e03.mpg
The Great guy in a tree S01e04
The Great guy in a tree S01e04.mpg
Thank you.
The reason that your script is not working as you expect is because your for loop is written incorrectly. This example shows what is going on:
$ i=0
$ FILES=".DS_Store
*.txt
*.sample
*.sample.*
*.samples"
$ for f in "$FILES"; do echo $((++i)) "$f"; done
1 .DS_Store
*.txt
*.sample
*.sample.*
*.samples
Note that only one number is output, indicating that the loop is only going around once. Also, no pathname expansion has occurred.
In order to make your script work as you expect, you can remove the quotes around "$FILES". This means that each word in your string will be evaluated separately, rather than all at once. It also means that pathname expansion of the wildcards that you are using will occur, so all files ending in .txt will be removed, which I guess is what you meant.
Instead of using a string to store your list of expressions, you might prefer to make use of an array:
FILES=( '.DS_Store' '*.txt' '*.sample' '*.sample.*' '*.samples' )
The quotes around each element prevent expansion (so the array only has 5 elements, not the fully expanded list). You could then change your loop to for f in ${FILES[@]} (again, no double quotes results in each element of the list being expanded).
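Put together, the array version of the loop might look like this (a sketch; nullglob keeps unmatched patterns from being passed to rm literally):

shopt -s nullglob
FILES=( '.DS_Store' '*.txt' '*.sample' '*.sample.*' '*.samples' )
for f in ${FILES[@]}; do    # deliberately unquoted so each pattern expands
    echo "Removing $f"
    rm -f -- "$f"
    sleep 10
done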
Although removing the quotes fixes your script, I would agree with @hek2mgl's suggestion of using find. It allows you to find files by name, size, date modified and a lot more in one line. If you want to pause between the deletion of each file, you could use something like this:
find . \( -name "*.sample" -o -name "*.txt" \) -delete -exec sleep 10 \;
You can use find:
find . -type f \( -name '.DS_Store' -o -name '*.txt' -o -name '*.sample.*' -o -name '*.samples' \) -delete

Copying unique files to a new location

I am a total newbie to Linux and bash scripting and am currently stumped with this problem!
I have a directory containing many images from which I need to copy the unique images to a new location. I know there are numerous options for how to go about doing this but have very limited knowledge at the moment so appreciate I may be going about this the wrong way.
I used find and cat to create this list and have attempted to copy the files across with the intention of comparing them (using md5 and checking file names) when they are there.
However, the text file has 30 files on it but only 18 have been copied over. Can anyone advise?
My code to find files is -
find $1 -name "IMG_****.JPG" | cat > list.txt
and my code to copy from the list is
for image in $(cat list.txt);
do
cp $image $2
done
You're doing this much too complicated. Do not pipe the output of find through cat into a list; this is an unnecessary use of cat. You can redirect the output of every program directly:
find "$1" -name "IMG_*.JPG" > list.txt
Also, do not use for to read lines from a file. Better use while with read:
while read -r filename; do
cp "$filename" "$2"
done < list.txt
But it's even easier. You can just work with the files directly from find:
find "$1" -name "IMG_*.JPG" -exec cp {} "$2" \;
Here, {} will be replaced by each filename that find finds. Don't forget to quote your variables, so that spaces in file paths are no problem.
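If you have GNU find and cp, you can also batch many files into a single cp invocation with the + form of -exec, which avoids forking one cp per file (a sketch; the -t option is explained just below):

find "$1" -name "IMG_*.JPG" -exec cp -t "$2" {} +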
Another much simpler method with Bash options:
shopt -s nullglob globstar
cp -t "$2" -- "$1"/**/IMG_*.JPG
Here, globstar enables recursive matching of directories through **. The -t option to cp specifies the target of the copy operation.* The command will be expanded to cp -t target -- source1/IMG_foo.JPG source2/IMG_bar.JPG et cetera.
Now, as to your original issue, it could have been that some images have a space in their name. This would have broken your original script. If your image files contained a newline in their name, it also wouldn't have worked with while read … – but you would have gotten an error in that case of a file not being found.
Also note that cp overwrites files with the same name. Without asking for confirmation. So if in your subdirectories there are images with the same filename, you'd only get one result, with the latest overwriting the existing one.
* The -- isn't strictly necessary, but it's a good habit to include it to tell the command when the options arguments are over.

how to change the extension of multiple files using bash script?

I am very new to Linux, maybe this is my first time, so I hope for some detailed help please.
I have more than 500 files in multiple directories on my server (Linux) and I want to change their extensions to .xml using a bash script.
I tried a lot of code but none of it worked. Some code I used:
for file in *.txt
do
mv ${file} ${file/.txt}/.xml
done
or
for file in *.*
do
mv ${file} ${file/.*}/.xml
done
I do not even know if the second one is valid code or not. I tried to change the .txt extension because the prompt said no such file '.txt'.
I hope for some good help with that, thank you.
Explanation
For recursivity you need Bash >= 4 with ** (i.e. globstar) enabled;
First, I use parameter expansion to remove the string .txt, which must be anchored at the end of the filename (%):
the # anchors the pattern (plain word or glob) to the beginning,
and the % anchors it to the end.
Then I append the new extension .xml
Be extra cautious with filename, you should always quote parameters expansion.
Code
This should do it in Bash (note that I only echo the old/new filename; to actually rename the files, use mv instead of echo):
shopt -s globstar # enable ** globstar/recursivity
for i in **/*.txt; do
[[ -d "$i" ]] && continue; # skip directories
echo "$i" "${i/%.txt}.xml";
done
If it's a matter of one or two sub-directories, you can use the rename command:
rename .txt .xml *.txt
This will rename all the .txt to .xml files in the directory from which the command is executed.
If all the files are in the same directory, it can be done with a single command. For example, to convert all .jpg files to .png, go to the relevant directory and run:
rename .jpg .png *
I wanted to rename "file.txt" to "file.jpg.txt", used rename easy peezy:
rename 's/.txt$/.jpg.txt/' *.txt
man rename will tell you everything you need to know.
Got to love Linux, there's a tool for everything :-)
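One caveat before reaching for rename: it is not the same tool everywhere. The two-argument form above comes from util-linux, while the s/.../.../ form is the Perl rename; a given system usually ships only one of them. If in doubt, a plain loop needs nothing beyond the shell (a sketch):

for f in *.txt; do
    mv -- "$f" "${f%.txt}.xml"
done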
passing command line argument for dir path
#!/bin/sh
cd "$1" || exit 1
for file in *
do
mv "$file" "${file}.jpg"
done

Bash: Check all files in a location against another for existence

I'm after a little help with some Bash scripting (on OSX). I want to create a script that takes two parameters - source folder and target folder - and checks all files in the source hierarchy to see whether or not they exist in the target hierarchy. i.e. Given a data DVD check whether the files contained on it are already on the internal drive.
What I've come up with so far is
#!/bin/bash
if [ $# -ne 2 ]
then
echo "Usage is command sourcedir targetdir"
exit 0
fi
source="$1"
target="$2"
for f in "$( find $source -type f -name '*' -print )"
do
I'm now not sure how it's best to obtain the filename without its path and then see if it exists. I am really a beginner at scripting.
Edit: The answers given so far are all very efficient in terms of compact code. However I need to be able to look for files found within the total source hierarchy anywhere within the target hierarchy. If found I would like to compare checksums and last modified dates etc and comment or, if not found, I would like to note this. The purpose is to check whether files on external media have been uploaded to a file server.
This should give you some ideas:
#!/bin/bash
DIR1="tmpa"
DIR2="tmpb"
function sorted_contents
{
cd "$1"
find . -type f | sort
}
DIR1_CONTENTS=$(sorted_contents "$DIR1")
DIR2_CONTENTS=$(sorted_contents "$DIR2")
diff -y <(echo "$DIR1_CONTENTS") <(echo "$DIR2_CONTENTS")
In my test directories, the output was:
[user@host so]$ ./dirdiff.sh
./address-book.dat ./address-book.dat
./passwords.txt ./passwords.txt
./some-song.mp3 <
./the-holy-grail.info ./the-holy-grail.info
> ./victory.wav
./zzz.wad ./zzz.wad
If it's not clear, "some-song.mp3" was only in the first directory while "victory.wav" was only in the second. The rest of the files were common.
Note that this only compares the file names, not the contents. If you like where this is headed, you could play with the diff options (maybe --suppress-common-lines if you want cleaner output).
But this is probably how I'd approach it -- offload a lot of the work onto diff.
EDIT: I should also point out that something as simple as:
[user@host so]$ diff tmpa tmpb
would also work:
Only in tmpa: some-song.mp3
Only in tmpb: victory.wav
... but not feel as satisfying as writing a script yourself. :-)
To list only files in $source_dir that do not exist in $target_dir:
comm -23 <(cd "$source_dir" && find .|sort) <(cd "$target_dir" && find .|sort)
You can limit it to just regular files with -f on the find commands, etc.
The comm command (short for "common") finds lines in common between two text files and outputs three columns: lines only in the first file, lines only in the second file, and lines common to both. The numbers suppress the corresponding column, so the output of comm -23 is only the lines from the first file that don't appear in the second.
The process substitution syntax <(command) is replaced by the pathname to a named pipe connected to the output of the given command, which lets you use a "pipe" anywhere you could put a filename, instead of only stdin and stdout.
The commands in this case generate lists of files under the two directories - the cd makes the output relative to the directories being compared, so that corresponding files come out as identical strings, and the sort ensures that comm won't be confused by the same files listed in different order in the two folders.
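With the regular-files restriction applied, the command becomes:

comm -23 <(cd "$source_dir" && find . -type f | sort) <(cd "$target_dir" && find . -type f | sort)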
A few remarks about the line for f in "$( find $source -type f -name '*' -print )":
Make that "$source". Always use double quotes around variable substitutions. Otherwise the result is split into words that are treated as wildcard patterns (a historical oddity in the shell parsing rules); in particular, this would fail if the value of the variable contains spaces.
You can't iterate over the output of find that way. Because of the double quotes, there would be a single iteration through the loop, with $f containing the complete output from find. Without double quotes, file names containing spaces and other special characters would trip the script.
-name '*' is a no-op, it matches everything.
As far as I understand, you want to look for files by name independently of their location, i.e. you consider /dvd/path/to/somefile to be a match to /internal-drive/different/path-to/somefile. So make a list of files on each side indexed by name. You can do this by massaging the output of find a little. The code below can cope with any character in file names except newlines.
list_files () {
find . -type f -print |
sed 's:^\(.*\)/\(.*\)$:\2/\1/\2:' |
sort
}
source_files="$(cd "$1" && list_files)"
dest_files="$(cd "$2" && list_files)"
join -t / -v 1 <(echo "$source_files") <(echo "$dest_files") |
sed 's:^[^/]*/::'
The list_files function generates a list of file names with paths, and prepends the file name in front of the files, so e.g. /mnt/dvd/some/dir/filename.txt will appear as filename.txt/./some/dir/filename.txt. It then sorts the files.
The join command prints out lines like filename.txt/./some/dir/filename.txt when there is a file called filename.txt in the source hierarchy but not in the destination hierarchy. We finally massage its output a little since we no longer need the filename at the beginning of the line.
