Filter folders that do not contain any audio files with bash

Given a root folder, how do I filter down to the subfolders that do not contain any audio files (mp3, wav and flac)? Do I need to set a variable, like
folders=$(find /parentfolder/ -type d)
and then apply some expression to ${folders}, or is there a one-liner for this?

All the subdirectories of . (we write that into a file):
find . -type d | sort > all_dirs.txt
All subdirectories that do contain an mp3 file (goes into another file):
find . -name "*.mp3" | xargs dirname | sort | uniq > music_dirs.txt
And these are the lines that are contained in the first file but not the second:
diff --new-line-format="" --unchanged-line-format="" all_dirs.txt music_dirs.txt
If you think one-liners are cool and you are working in bash, here it is, a bit more condensed:
diff --new-line-format="" --unchanged-line-format="" <(find . -type d | sort) <(find . -name "*.mp3" | xargs dirname | sort | uniq)
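Since the question also mentions wav and flac, the same approach extends by grouping several -iname patterns inside the inner find; here is a sketch on a hypothetical two-directory tree (the directory and file names are made up for the demo, and GNU diff is assumed, as above):

```shell
# Hypothetical demo tree: one dir with audio, one without
demo=$(mktemp -d)
cd "$demo"
mkdir -p music silent
touch music/song.mp3 silent/readme.txt

# Dirs that contain no mp3/wav/flac file directly
diff --new-line-format="" --unchanged-line-format="" \
    <(find . -type d | sort) \
    <(find . -type f \( -iname '*.mp3' -o -iname '*.wav' -o -iname '*.flac' \) \
        -exec dirname {} \; | sort -u)
```

Note that a directory whose audio lives only in subdirectories still shows up in the output (here `.` is listed alongside `./silent`), which matches the literal question but may or may not be what you want.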


Bash find: exec in reverse order

I am iterating over files like so:
find $directory -type f -exec codesign {} \;
Now the problem here is that files on a higher hierarchy are signed first.
Is there a way to iterate over a directory tree and handle the deepest files first?
So that
/My/path/to/app/bin
is handled before
/My/path/mainbin
Yes, just use -depth:
-depth
The primary shall always evaluate as true; it shall cause descent of the directory hierarchy to be done so that all entries in a directory are acted on before the directory itself. If a -depth primary is not specified, all entries in a directory shall be acted on after the directory itself. If any -depth primary is specified, it shall apply to the entire expression even if the -depth primary would not normally be evaluated.
For example:
$ mkdir -p top/a/b/c/d/e/f/g/h
$ find top -print
top
top/a
top/a/b
top/a/b/c
top/a/b/c/d
top/a/b/c/d/e
top/a/b/c/d/e/f
top/a/b/c/d/e/f/g
top/a/b/c/d/e/f/g/h
$ find top -depth -print
top/a/b/c/d/e/f/g/h
top/a/b/c/d/e/f/g
top/a/b/c/d/e/f
top/a/b/c/d/e
top/a/b/c/d
top/a/b/c
top/a/b
top/a
top
Note that at a particular level, ordering is still arbitrary.
Using GNU utilities, and decorate-sort-undecorate pattern (aka Schwartzian transform):
find . -type f -printf '%d %p\0' |
sort -znr |
sed -z 's/[0-9]* //' |
xargs -0 -I# echo codesign #
Drop the echo if the output looks ok.
Using find's -depth option as in my other answer, or a naive sort as in some other answers, only ensures that subdirectories of a directory are processed before the directory itself; it does not ensure that the deepest level overall is processed first.
For example:
$ mkdir -p top/a/b/d/f/h top/a/c/e/g
$ find top -depth -print
top/a/c/e/g
top/a/c/e
top/a/c
top/a/b/d/f/h
top/a/b/d/f
top/a/b/d
top/a/b
top/a
top
For overall deepest level to be processed first, the ordering should be something like:
top/a/b/d/f/h
top/a/c/e/g
top/a/b/d/f
top/a/c/e
top/a/b/d
top/a/c
top/a/b
top/a
top
To determine this ordering, the entire list must be known, and the number of levels (i.e. the number of / separators) in each path counted to enable ranking.
A simple-ish Perl script (assigned to a shell function for this example) to do this ordering is:
$ dsort(){
perl -ne '
BEGIN { $/ = "\0" } # null-delimited i/o
$fname[$.] = $_;
$depth[$.] = tr|/||;
END {
print
map { $fname[$_] }
sort { $depth[$b] <=> $depth[$a] }
keys @fname
}
'
}
Then:
$ find top -print0 | dsort | xargs -0 -I# echo #
top/a/b/d/f/h
top/a/c/e/g
top/a/b/d/f
top/a/c/e
top/a/b/d
top/a/c
top/a/b
top/a
top
How about sorting the output of find in descending order:
while IFS= read -d "" -r f; do
codesign "$f"
done < <(find "$directory" -type f -print0 | sort -zr)
<(command ..) is a process substitution which feeds the output
of the command to the read command in while loop via the redirect.
The -print0, sort -z and read -d "" combo uses the null character
as the delimiter, which protects filenames that include
special characters such as whitespace or newlines.
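A quick way to convince yourself the null-delimited combo is safe is a throwaway file whose name contains an embedded newline (a disposable sketch; the filenames are made up):

```shell
dir=$(mktemp -d)
touch "$dir/plain" "$dir/with"$'\n'"newline"

count=0
while IFS= read -d "" -r f; do
    count=$((count + 1))          # each real file is counted exactly once
done < <(find "$dir" -type f -print0 | sort -zr)

echo "$count"                     # 2: the newline did not split a name
```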
I don't know if there is a native way in find, but you may pipe its output into a loop and process it line by line as you wish this way:
find . | while IFS= read -r file; do echo "filename: $file"; done
In your case, if you are happy just reversing the output of find, you may go with something like:
find "$directory" -type f | tac | while IFS= read -r file; do codesign "$file"; done

list subdirectories recursively

I want to write a shell script so as to recursively list all different subdirectories which contain NEW subdirectory (NEW is a fixed name).
Dir1/Subdir1/Subsub1/NEW/1.jpg
Dir1/Subdir1/Subsub1/NEW/2.jpg
Dir1/Subdir1/Subsub2/NEW/3.jpg
Dir1/Subdir1/Subsub2/NEW/4.jpg
Dir1/Subdir2/Subsub3/NEW/5.jpg
Dir1/Subdir2/Subsub4/NEW/6.jpg
I want to get
Dir1/Subdir1/Subsub1
Dir1/Subdir1/Subsub2
Dir1/Subdir2/Subsub3
Dir1/Subdir2/Subsub4
How can I do that?
find . -type d -name NEW | sed 's|/NEW$||'
--- EDIT ---
Regarding your comment: sed does not have a -print0 option. There are various ways of handling this (most of which are wrong). One possible solution is:
find . -type d -name NEW -print0 |
while IFS= read -rd '' subdirNew; do
    subdir=$(sed 's|/NEW$||' <<< "$subdirNew")
    echo "$subdir"
done
which should be tolerant of spaces and newlines in filenames.
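Since what gets stripped is a fixed /NEW suffix, bash parameter expansion can replace the sed call entirely and avoid one fork per directory; a sketch on a made-up directory name with a space in it:

```shell
cd "$(mktemp -d)"
mkdir -p "Dir1/Sub dir1/Subsub1/NEW"

# Same null-delimited read, but "${d%/NEW}" strips the suffix in the shell
find . -type d -name NEW -print0 |
while IFS= read -rd '' d; do
    printf '%s\n' "${d%/NEW}"
done
```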
ls -R will list things recursively.
Or find . | grep "/NEW/" should give you the type of list you are looking for.
You could try this:
find . -type d -name "NEW" -exec dirname {} \;
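If you have GNU find, the per-match dirname fork can be avoided with -printf '%h\n', which prints each match's parent directory directly (GNU-only, so this is a portability assumption); a sketch rebuilding part of the question's tree:

```shell
# Rebuild part of the sample layout from the question
cd "$(mktemp -d)"
mkdir -p Dir1/Subdir1/Subsub1/NEW Dir1/Subdir1/Subsub2/NEW

# %h is the path of each match minus its last component,
# i.e. the parent directory of each NEW
find . -type d -name NEW -printf '%h\n' | sort -u
```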

bash script to list duplicate hash files [duplicate]

I want to create a bash script that searches a given directory for pictures to copy. The pictures have to have the name format IMG_\d\d\d\d.JPG. If a picture has a duplicate filename, copy it to /images/archives and append .JPG to the end of its name, so duplicates end in .JPG.JPG. There are also duplicate pictures, so I want to hash each picture and check whether it is a duplicate. If it is a duplicate picture, do not copy it into /archives but store its file path in a file called output.txt.
I am struggling with trying to get the duplicate hashes to display the filenames as well. This is what I had so far:
if [ -d "$1" ]
then echo "using directory $1 as source"
else echo "Sorry, not a valid drive"
    exit
fi
if [ -d "$2" ]
then echo "$2 target location already exists"
else mkdir -p "$2"
fi
cd "$1"
myList=$(find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG")
echo "$myList"
ImagesToCopy=$(find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -exec md5sum {} \; | cut -f1 -d" " | sort | uniq)
echo "$ImagesToCopy"
This gives me a list of the files I need to copy and their hashes. In the command line if I type in the command:
# find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -exec md5sum {} \; | sort | cut -f1 -d" "| uniq -d
I receive the results:
266ab54fd8a6dbc7ba61a0ee526763e5
88761da2c2a0e57d8aab5327a1bb82a9
cc640e50f69020dd5d2d4600e20524ac
This is the list of duplicate files that I do not want to copy but I want to also display the file path and filenames alongside this, like this:
# find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -exec md5sum {} \; | sort -k1 | uniq -u
043007387f39f19b3418fcba67b8efda ./IMG_1597.JPG
05f0c10c49983f8cde37d65ee5790a9f ./images/IMG_2012/IMG_2102.JPG
077c22bed5e0d0fba9e666064105dc72 ./DCIM/IMG_0042.JPG
1a2764a21238aaa1e28ea6325cbf00c2 ./images/IMG_2012/IMG_1403.JPG
1e343279cd05e8dbf371331314e3a2f6 ./images/IMG_1959.JPG
2226e652bf5e3ca3fbc63f3ac169c58b ./images/IMG_0058.JPG
266ab54fd8a6dbc7ba61a0ee526763e5 ./images/IMG_0079.JPG
266ab54fd8a6dbc7ba61a0ee526763e5 ./images/IMG_2012/IMG_0079.JPG
2816dbcff1caf70aecdbeb934897fd6e ./images/IMG_1233.JPG
451110cc2aff1531e64f441d253b7fec ./DCIM/103canon/IMG_0039.JPG
45a00293c0837f10e9ec2bfd96edde9f ./DCIM/103canon/IMG_0097.JPG
486f9dd9ee20ba201f0fd9a23c8e7289 ./images/IMG_2013/IMG_0060.JPG
4c2054c57a2ca71d65f92caf49721b4e ./DCIM/IMG_1810.JPG
53313e144725be3993b1d208c7064ef6 ./IMG_2288.JPG
5ac56dcddd7e0fd464f9b243213770f5 ./images/IMG_2012/favs/IMG_0039.JPG
65b15ebd20655fae29f0d2cf98588fc3 ./DCIM/IMG_2564.JPG
88761da2c2a0e57d8aab5327a1bb82a9 ./images/IMG_2012/favs/IMG_1729.JPG
88761da2c2a0e57d8aab5327a1bb82a9 ./images/IMG_2013/IMG_1729.JPG
8fc75b0dd2806d5b4b2545aa89618eb6 ./DCIM/103canon/IMG_2317.JPG
971f0a4a064bb1a2517af6c058dc3eb3 ./images/IMG_2012/favs/IMG_2317.JPG
aad617065e46f97d97bd79d72708ec10 ./images/IMG_2013/IMG_1311.JPG
c937509b5deaaee62db0bf137bc77366 ./DCIM/IMG_1152.JPG
cc640e50f69020dd5d2d4600e20524ac ./images/IMG_2012/favs/IMG_2013.JPG
cc640e50f69020dd5d2d4600e20524ac ./images/IMG_2013/IMG_2013.JPG
d8edfcc3f9f322ae5193e14b5f645368 ./images/IMG_2012/favs/IMG_1060.JPG
dcc1da7daeb8507f798e4017149356c5 ./DCIM/103canon/IMG_1600.JPG
ded2f32c88796f40f080907d7402eb44 ./IMG_0085.JPG
Thanks in advance.
Let's suppose that you have the results of md5sum. For example:
$ cat file
266ab54fd8a6dbc7ba61a0ee526763e5 /path/to/file1a
88761da2c2a0e57d8aab5327a1bb82a9 /path/to/file2a
266ab54fd8a6dbc7ba61a0ee526763e5 /path/to/file1b
cc640e50f69020dd5d2d4600e20524ac /path/to/file3
88761da2c2a0e57d8aab5327a1bb82a9 /path/to/file2b
To remove duplicates from the list, use awk:
$ awk '!($1 in a){a[$1]; print}' file
266ab54fd8a6dbc7ba61a0ee526763e5 /path/to/file1a
88761da2c2a0e57d8aab5327a1bb82a9 /path/to/file2a
cc640e50f69020dd5d2d4600e20524ac /path/to/file3
This uses the array a to keep track of which md5 sums we have seen so far. For each line, if the md5 has not appeared before, !($1 in a), we mark that md5 as having been seen and print the line.
Alternative
A shorter version of the code is:
$ awk '!a[$1]++' file
266ab54fd8a6dbc7ba61a0ee526763e5 /path/to/file1a
88761da2c2a0e57d8aab5327a1bb82a9 /path/to/file2a
cc640e50f69020dd5d2d4600e20524ac /path/to/file3
This uses array a to count the number of times that md5sum $1 has appeared. If the count is initially zero, then the line is printed.
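Going the other way, the question's actual goal was to see every path involved in a duplicate, hash and filename together. A two-pass awk over the same list does that; a sketch on the sample data above (the /path/to/... names are placeholders):

```shell
cd "$(mktemp -d)"
# "hash  path" lines, as produced by md5sum (sample data from above)
cat > hashes.txt <<'EOF'
266ab54fd8a6dbc7ba61a0ee526763e5 /path/to/file1a
88761da2c2a0e57d8aab5327a1bb82a9 /path/to/file2a
266ab54fd8a6dbc7ba61a0ee526763e5 /path/to/file1b
cc640e50f69020dd5d2d4600e20524ac /path/to/file3
88761da2c2a0e57d8aab5327a1bb82a9 /path/to/file2b
EOF

# Pass 1 (NR==FNR) counts each hash; pass 2 prints only the lines
# whose hash occurred more than once, keeping hash and path together
awk 'NR==FNR { c[$1]++; next } c[$1] > 1' hashes.txt hashes.txt
```

Unlike uniq -d on the hash column alone, this prints all members of every duplicate group, paths included.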

using find sort and wc -l in the one command

This is how I find files using find and show the number of lines in each file:
$ find ./ -type f -name "data*.csv" -exec wc -l {} +
380723 ./data_2016-07-07-10-41-13.csv
369869 ./data_2016-07-11-10-42-01.csv
363941 ./data_2016-07-08-10-41-50.csv
378981 ./data_2016-07-12-10-41-28.csv
1493514 total
how do I sort the results by file name? Below is my attempt, but it is not working.
$ find ./ -type f -name "data*.csv" -exec wc -l {} + | sort
1493514 total
363941 ./data_2016-07-08-10-41-50.csv
369869 ./data_2016-07-11-10-42-01.csv
378981 ./data_2016-07-12-10-41-28.csv
380723 ./data_2016-07-07-10-41-13.csv
$ find ./ -type f -name "data*.csv" | sort -exec wc -l {} +
sort: invalid option -- 'e'
Try `sort --help' for more information.
$ find ./ -type f -name "data*.csv" -exec sort | wc -l {} +
find: wc: {}missing argument to `-exec'
: No such file or directory
wc: +: No such file or directory
0 total
$
Can someone offer a solution and correct me so I understand it better?
EDIT1
from man sort
-k, --key=POS1[,POS2]
start a key at POS1 (origin 1), end it at POS2 (default end of line). See POS syntax below
POS is F[.C][OPTS], where F is the field number and C the character position in the field; both are origin 1. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.
Ismail's suggestion of using sort -k is correct. However, I'm often too lazy to learn (or relearn) how -k works, so here's a cheap solution:
find . -name 'data*.csv' -print0 | sort -z | xargs -0 wc -l
Edit: after some experimentation, I did figure out how -k works:
find . -name 'data*.csv' -exec wc -l {} + | sort -k 2
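The same -k syntax also lets you sort the report numerically by line count instead of by name, by restricting the key to field 1 and adding the n (numeric) option; a sketch with two made-up CSVs of known length:

```shell
# Hypothetical sample CSVs with known line counts
cd "$(mktemp -d)"
printf 'a\nb\nc\n' > data_big.csv
printf 'a\n' > data_small.csv

# -k1,1n: key is field 1 only, compared numerically
find . -type f -name 'data*.csv' -exec wc -l {} + | sort -k1,1n
```

The "total" line from wc still tags along, as in the question's output; it simply sorts to wherever its count lands.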

Find and replace filename recursively in a directory with various names

I have the following file structure
./lib/mylib-client.c
./lib/mylib-object.h
./lib/mylib-rpc-wrapper.c
./lib/mylib-session-base.c
./lib/mylib-session-base.lo
./lib/mylib-session-base.o
./lib/mylibobj.c
./lib/mylibobj.lo
./lib/mylibobj.o
./lib/mylibobj.vala
./lib/mylibrpc-transport.c
./net/cluster/.deps/mylib-config.Po
./net/cluster/.deps/mylib-db.Po
./net/common/mylib-config.c
./net/common/mylib-config.h
./net/common/mylib-db.c
./net/common/mylib-db.h
./net/daemon/.deps/mylib-config.Po
./net/daemon/.deps/mylib-daemon.Po
./net/daemon/.deps/mylib-test.Po
./net/daemon/mylib
./net/daemon/mylib-config.o
./net/daemon/mylib-daemon.c
./net/daemon/mylib-daemon.o
That I want to recursively rename to :
./lib/libvertio-client.c
./lib/libvertio-object.h
./lib/libvertio-rpc-wrapper.c
./lib/libvertio-session-base.c
./lib/libvertio-session-base.lo
./lib/libvertio-session-base.o
./lib/libvertioobj.c
./lib/libvertioobj.lo
./lib/libvertioobj.o
./lib/libvertioobj.vala
./lib/libvertiorpc-transport.c
./net/cluster/.deps/libvertio-config.Po
./net/cluster/.deps/libvertio-db.Po
./net/common/libvertio-config.c
./net/common/libvertio-config.h
./net/common/libvertio-db.c
./net/common/libvertio-db.h
./net/daemon/.deps/libvertio-config.Po
./net/daemon/.deps/libvertio-daemon.Po
./net/daemon/.deps/libvertio-test.Po
./net/daemon/libvertio
./net/daemon/libvertio-config.o
./net/daemon/libvertio-daemon.c
./net/daemon/libvertio-daemon.o
I have found this : Find and replace filename recursively in a directory
But I can't figure out what to change.
find . -name "mylib*" | awk '{a=$1; gsub(/mylib/,"libvertio"); printf "mv \"%s\" \"%s\"\n", a, $1}' | sh
It doesn't work. What am I missing here?
You can try the rename script:
find . -name "mylib*" -exec rename 's/mylib/libvertio/' '{}' \;
Without rename:
find . -name "mylib*" -exec bash -c 'mv "$1" "${1/\/mylib//libvertio}"' - '{}' \;
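Before running either variant for real, it is worth previewing the moves. Prefixing mv with echo in the bash -c form prints each command instead of executing it; a sketch on a couple of the question's filenames:

```shell
# Recreate a couple of the sample files from the question
cd "$(mktemp -d)"
mkdir -p lib
touch lib/mylib-client.c lib/mylibobj.vala

# Dry run: echo shows each mv without performing it;
# "${1/mylib/libvertio}" replaces the first occurrence of mylib
find . -name "mylib*" -exec bash -c 'echo mv "$1" "${1/mylib/libvertio}"' _ '{}' \;
```

Drop the echo once the printed commands look right, the same habit as in the codesign answer above.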
