Randomly select a file with a given extension and copy it to a given directory


I would like to write a simple bash script that randomly selects a .flac file from the /music/ folder (it will have to be recursive because there are many subfolders within that folder), and copy that file to /test/random.flac
This post is very close to what I want to do but I'm not sure how to change the script to do what I want.
I tried this:
ls /music |sort -R |tail -1 |while read file; do
cp $file /test/random.flac
done
But I'm missing how to tell ls to do a recursive search of all the .flac inside the subfolders.

You can put all the files into an array, then select one at a random index to copy. This requires the globstar option to be enabled so you can use the **/* glob pattern:
shopt -s globstar
flacfiles=(/music/**/*.flac)
cp "${flacfiles[RANDOM % ${#flacfiles[@]}]}" /test/random.flac
To select a random index, we take $RANDOM modulo the number of elements in the flacfiles array.
If you want an error message in case the glob doesn't match anything, you can use the failglob shell option:
shopt -s globstar failglob
if flacfiles=(/music/**/*.flac); then
cp "${flacfiles[RANDOM % ${#flacfiles[@]}]}" /test/random.flac
fi
This fails with an error message
-bash: no match: /music/**/*.flac
in case there are no matching files and doesn't try to copy anything.
If you know for sure that there are .flac files, you can ignore this.
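The array approach above can be wrapped in a small function. This is only a sketch: the name copy_random_file and its three parameters are made up for illustration, and it uses nullglob instead of failglob so that an empty match can be tested explicitly. Keep in mind that $RANDOM only yields values from 0 to 32767, so with more files than that, some could never be picked.

```shell
#!/usr/bin/env bash
# copy_random_file is a hypothetical helper, not part of the original answer.
# Usage: copy_random_file SRC_DIR EXTENSION DEST_FILE
copy_random_file() (
    # The ( ... ) body runs in a subshell, so the shopt changes stay local.
    shopt -s globstar nullglob
    files=("$1"/**/*."$2")
    if ((${#files[@]} == 0)); then
        echo "no .$2 files found under $1" >&2
        exit 1
    fi
    # Pick one array element at a random index and copy it.
    cp -- "${files[RANDOM % ${#files[@]}]}" "$3"
)
```

It would be called as copy_random_file /music flac /test/random.flac.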

Recurse
Use find instead of ls:
find /music/ -iname '*.flac' -print0 | shuf -z -n1
The find part will find all flac files inside the directory /music/ and list them.
shuf shuffles that list and prints the first entry (which is random, because the list was shuffled).
-print0 and -z are there to use NUL (the null byte) for separating the file names. Without these options, \n (newline) would be used, which is unsafe, because filenames can contain newlines (even though it is very uncommon).
Copy
If you want to copy just one random file, there's no need for a while loop. Use command substitution $() instead.
cp "$(find /music/ -iname '*.flac' -print0 | shuf -z -n1)" /test/random.flac
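A shell variable or command substitution cannot hold a NUL byte, and newer versions of bash print a warning when one is dropped. A possible variation, sketched here under the assumption that GNU find, shuf and xargs are available, keeps the NUL-separated name inside the pipeline all the way to cp. The function name copy_random_flac is hypothetical.

```shell
#!/usr/bin/env bash
# copy_random_flac is a hypothetical name; assumes GNU find, shuf and xargs.
# Usage: copy_random_flac SRC_DIR DEST_FILE
copy_random_flac() {
    # find lists the candidates, shuf keeps one at random,
    # xargs hands that single NUL-terminated name to cp.
    find "$1" -type f -iname '*.flac' -print0 |
        shuf -z -n1 |
        xargs -0 -r -I{} cp -- {} "$2"
}
```

For the question's layout this would be copy_random_flac /music/ /test/random.flac. If nothing matches, cp is simply never run.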

Related

Replace the complete filenames for files with their MD5 hash string of the content in bash

Problem:
I have a bunch of files in a folder,i want to rename all of them to the md5 of the content of the file.
What i tried:
This is the command i tried.
for i in $(find /home/admin/test -type f);do mv $i $(md5sum $i|cut -d" " -f 1);done
But this is failing after sometime with the error and only some files are getting renamed leaving rest untouched.
mv: missing destination file operand after /home/admin/test/help.txt
Try `mv --help' for more information.
Is the implementation correct? Am i doing something wrong in the script.

Make things simple by making use of the glob patterns that the shell provides, instead of using external utilities like find. Also see Why you don't read lines with "for"
Navigate inside the folder /home/admin/test and do the following which should be sufficient
for file in *; do
[ -f "$file" ] || continue
md5sum -- "$file" | { read -r sum _; mv -- "$file" "$sum"; }
done
Try using echo in place of mv first, to check that the files would be renamed as expected.
To descend into sub-directories, which I assume is also a requirement, enable globstar, one of the extended globbing options provided by the shell:
shopt -s globstar
for file in **/*; do

If you want to recursively rename all files with their md5 hash, you could try this:
find /home/admin/test -type f -exec bash -c 'md5sum "$1" | while read -r s f; do mv "${f#*./}" "$(dirname "${f#*./}")/$s"; done' _ {} \;
The hash and the filename are read into the s and f variables. The ${f#*./} expansion removes the prefix added by the md5sum and find commands.
Note that if several files have exactly the same content, only one file will remain.

How to remove files from a directory if their names are not in a text file? Bash script

I am writing a bash script and want it to tell me if the names of the files in a directory appear in a text file and if not, remove them.
Something like this:
counter = 1
numFiles = ls -1 TestDir/ | wc -l
while [$counter -lt $numFiles]
do
if [file in TestDir/ not in fileNames.txt]
then
rm file
fi
((counter++))
done
So what I need help with is the if statement, which is still pseudo-code.

You can simplify your script logic a lot :
#!/bin/bash
# for loop to iterate over all files in the testdir
for file in TestDir/*
do
# if grep exits non-zero (the file name is not listed in the text document), we delete the file
# (assumes fileNames.txt lists bare file names, one per line, without the TestDir/ prefix)
grep -qxF -- "${file##*/}" fileNames.txt || rm -- "$file"
done

It looks like you've got a solution that works, but I thought I'd offer this one as well, as it might still be of help to you or someone else.
find /Path/To/TestDir -type f ! -name '.*' -exec basename {} + | grep -xvF -f /Path/To/filenames.txt
Breakdown
find: This gets file paths in the specified directory (which would be TestDir) that match the given criteria. In this case, I've specified it return only regular files (-type f) whose names don't start with a period (! -name '.*'). It then uses its own builtin utility to execute the next command:
basename: Given a file path (which is what find spits out), it will return the base filename only, or, more specifically, everything after the last /.
|: This is a command pipe, that takes the output of the previous command to use as input in the next command.
grep: This is a regular-expression matching utility that, in this case, is given two lists of files: one fed in through the pipe from find—the files of your TestDir directory; and the files listed in filenames.txt. Ordinarily, the filenames in the text file would be used to match against filenames returned by find, and those that match would be given as the output. However, the -v flag inverts the matching process, so that grep returns those filenames that do not match.
What results is a list of files that exist in the directory TestDir, but do not appear in the filenames.txt file. These are the files you wish to delete, so you can use this line of code inside a command substitution $(...) to supply rm with the files to delete. Because the result is split on whitespace, this only works for file names that contain no spaces or glob characters.
The full command chain—after you cd into TestDir—looks like this:
rm $(find . -type f ! -name '.*' -exec basename {} + | grep -xvF -f filenames.txt)
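For names that may contain spaces, a loop that never word-splits the names is safer than rm $(...). The following is a sketch only: remove_unlisted is a hypothetical name, it needs bash 4 or later for associative arrays, and it assumes the list holds one bare file name per line. Very unusual names (for example a name that is just @ or *) are not handled specially.

```shell
#!/usr/bin/env bash
# remove_unlisted is a hypothetical helper (needs bash 4+ for local -A).
# Usage: remove_unlisted DIR LIST_FILE   (LIST_FILE: one file name per line)
remove_unlisted() {
    local dir=$1 list=$2 name file
    local -A keep=()
    # Load the allowed names into a lookup table, one read per line.
    while IFS= read -r name || [[ -n $name ]]; do
        [[ -n $name ]] && keep[$name]=1
    done < "$list"
    for file in "$dir"/*; do
        [[ -f $file ]] || continue
        name=${file##*/}
        # Delete only regular files whose base name is not in the table.
        [[ -n ${keep[$name]+set} ]] || rm -- "$file"
    done
}
```

Replacing rm with echo gives a dry run that only prints what would be deleted.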

Trying to rename certain file types within recursive directories

I have a bunch of files within a directory structure as such:
Dir
SubDir
File
File
Subdir
SubDir
File
File
File
Sorry for the messy formatting, but as you can see there are files at all different directory levels. All of these file names have a string of 7 digits prepended to them, like this: 1234567_filename.ext. I am trying to remove the number and underscore at the start of the filename.
Right now I am using bash and using this oneliner to rename the files using mv and cut:
for i in *; do mv "$i" "$(echo $i | cut -d_ -f2-10)"; done
This is being run while I am CD'd into the directory. I would love to find a way to do this recursively, so that it only renamed files, not folders. I have also used a foreach loop in the shell, outside of bash for directories that have a bunch of folders with files in them and no other subdirectories as such:
foreach$ set p=`echo $f | cut -d/ -f1`
foreach$ set n=`echo $f | cut -d/ -f2 | cut -d_ -f2-10`
foreach$ mv $f $p/$n
foreach$ end
But that only works when there are no other subdirectories within the folders.
Is there a loop or oneliner I can use to rename all files within the directories? I even tried using find but couldn't figure out how to incorporate cut into the code.
Any help is much appreciated.

With Perl's rename (standalone command):
shopt -s globstar
rename -n 's|/[0-9]{7}_([^/]+$)|/$1|' **/*
If everything looks fine remove -n.
globstar: If set, the pattern ** used in a pathname expansion context will
match all files and zero or more directories and subdirectories. If
the pattern is followed by a /, only directories and subdirectories
match.

bash does provide functions, and these can be recursive, but you don't need a recursive function for this job. You just need to enumerate all the files in the tree. The find command can do that, but turning on bash's globstar option and using a shell glob to do it is safer:
#!/bin/bash
shopt -s globstar
# enumerate all the files in the tree rooted at the current working directory
for f in **; do
# ignore directories
test -d "$f" && continue
# separate the base file name from the path
name=$(basename "$f")
dir=$(dirname "$f")
# perform the rename, using a pattern substitution on the name part
mv "$f" "${dir}/${name/#???????_/}"
done
Note that that does not verify that file names actually match the pattern you specified before performing the rename; I'm taking you at your word that they do. If such a check were wanted then it could certainly be added.
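If the check mentioned above is wanted, one way to add it is a pattern test on the base name before calling mv. This is a sketch, not the answer's own code: strip_numeric_prefix is a hypothetical name, and mv -n is used so an existing target is never overwritten.

```shell
#!/usr/bin/env bash
# strip_numeric_prefix is a hypothetical helper.
# Usage: strip_numeric_prefix DIR
strip_numeric_prefix() (
    # Subshell body: the cd and shopt changes stay local.
    shopt -s globstar nullglob
    cd -- "$1" || exit
    for f in **; do
        [[ -f $f ]] || continue             # rename files only, not folders
        name=${f##*/}
        # Only touch names that start with seven digits and an underscore.
        [[ $name == [0-9][0-9][0-9][0-9][0-9][0-9][0-9]_?* ]] || continue
        dir=.
        [[ $f == */* ]] && dir=${f%/*}
        # ${name#*_} drops everything up to the first underscore only.
        mv -n -- "$f" "$dir/${name#*_}"
    done
)
```

Names with further underscores, such as 1234567_my_file.ext, keep everything after the first one.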

How about this small tweak to what you have already:
for i in `find . -type f`; do mv "$i" "$(echo $i | cut -d_ -f2-10)"; done
Basically just swapping the * with `find . -type f`

Should be possible to do this using find...
find -E . -type f \
-regex '.*/[0-9]{7}_.*\.txt' \
-exec sh -c 'f="${0##*/}"; mv -v "$0" "${0%/*}/${f#*_}"' {} \;
Your find options may be different -- I'm doing this in FreeBSD. The idea here is:
-E instructs find to use extended regular expressions.
-type f causes only normal files (not directories or symlinks) to be found.
-regex ... matches the files you're looking for. You can make this more specific if you need to.
-exec ... \; runs a command, using {} (the file we've found) as an argument.
The command we're running uses parameter expansion first to grab the target directory and second to strip the filename. Note the temporary variable $f, which is used to address the possibility of extra underscores being part of the filename.
Note that this is NOT a bash command, though you can of course run it from the bash shell. If you want a bash solution that does not require use of an external tool like find, you may be able to do the following:
$ shopt -s extglob # use extended glob format
$ shopt -s globstar # recurse using "**"
$ for f in **/+([0-9])_*.txt; do f="./$f"; b="${f##*/}"; echo mv "$f" "${f%/*}/${b#*_}"; done
This uses the same logic as the find solution, but uses extglob to provide better filename matching and globstar (bash v4) to recurse through subdirectories. Remove the echo once the printed commands look right.
Hope these help.

Filenames with wildcards in variables

#!/bin/bash
outbound=/home/user/outbound/
putfile=DATA_FILE_PUT_*.CSV
cd $outbound
filecnt=0
for file in $putfile; do let filecnt=filecnt+1; done
echo "Filecount: " $filecnt
So this code works well when there are files located in the outbound directory. I can place files into the outbound path and as long as they match the putfile mask then the files are incremented as expected.
Where the problem comes in is if I run this while there are no files located in $outbound.
If there are zero files there $filecnt still returns a 1 but I'm looking to have it return a 0 if there are no files there.
Am I missing something simple?

Put set -x just below the #! line to watch what your script is doing.
If there is no matching file, then the wildcard is left unexpanded, and the loop runs once, with file having the value DATA_FILE_PUT_*.CSV.
To change that, set the nullglob option. Note that this only works in bash, not in sh.
shopt -s nullglob
putfile=DATA_FILE_PUT_*.CSV
for file in $putfile; do let filecnt=filecnt+1; done
Note that the putfile variable contains the wildcard pattern, not the list of file names. It might make more sense to put the list of matches in a variable instead. This needs to be an array variable, and you need to change the current directory first. The number of matching files is then the length of the array.
#!/bin/bash
shopt -s nullglob
outbound=/home/user/outbound/
cd "$outbound"
putfiles=(DATA_FILE_PUT_*.CSV)
echo "Filecount: " ${#putfiles[@]}
If you need to iterate over the files, take care to protect the expansion of the array with double quotes, otherwise if a file name contains whitespace then it will be split over several words (and if a filename contains wildcard characters, they will be expanded).
#!/bin/bash
shopt -s nullglob
outbound=/home/user/outbound/
cd "$outbound"
putfiles=(DATA_FILE_PUT_*.CSV)
for file in "${putfiles[@]}"; do
echo "Processing $file"
done
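The counting logic can also be kept in a function whose body runs in a subshell, so that neither nullglob nor the cd affects the rest of the script. This is a sketch: the name count_put_files is hypothetical, while the file mask is the one from the question.

```shell
#!/usr/bin/env bash
# count_put_files is a hypothetical helper.
# The ( ... ) body runs in a subshell, so nullglob and cd stay local to it.
# Usage: count_put_files DIR
count_put_files() (
    shopt -s nullglob
    cd -- "$1" || exit
    putfiles=(DATA_FILE_PUT_*.CSV)
    # The array length is the number of matching files (0 if none).
    echo "${#putfiles[@]}"
)
```

A caller could then write filecnt=$(count_put_files /home/user/outbound/).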

You could test if file exists first
for file in $putfile; do
if [ -f "$file" ] ; then
let filecnt=filecnt+1
fi
done
Or look for your files with find
for file in $(find . -type f -name "$putfile"); do
let filecnt=filecnt+1
done
or simply
filecnt=$(find . -type f -name "$putfile" | wc -l); echo $filecnt

This is because when no matches are found, bash by default expands the wildcard DATA_FILE_PUT_*.CSV to the word DATA_FILE_PUT_*.CSV and therefore you end up with a count of 1.
To disable this behavior, use shopt -s nullglob
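The difference is easy to see by counting how many words the pattern expands to in an empty directory. The snippet below is a small demonstration under the assumption that failglob is not set. The scratch directory and the count_words helper exist only for this example.

```shell
#!/usr/bin/env bash
# Count how many words a non-matching pattern expands to, in an empty
# scratch directory created only for this demonstration.
scratch=$(mktemp -d)
count_words() { echo "$#"; }

shopt -u nullglob
default_count=$(cd "$scratch" && count_words DATA_FILE_PUT_*.CSV)

shopt -s nullglob
nullglob_count=$(cd "$scratch" && count_words DATA_FILE_PUT_*.CSV)
shopt -u nullglob

# prints "default: 1, nullglob: 0"
echo "default: $default_count, nullglob: $nullglob_count"
rmdir "$scratch"
```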

Not sure why you need a script here. The following one-liner should do the job.
ls ${outbound}/${putfile} | wc -l
Or
find ${outbound} -maxdepth 1 -type f -name "${putfile}" | wc -l

How can I use inverse or negative wildcards when pattern matching in a unix/linux shell?

Say I want to copy the contents of a directory excluding files and folders whose names contain the word 'Music'.
cp [exclude-matches] *Music* /target_directory
What should go in place of [exclude-matches] to accomplish this?

In Bash you can do it by enabling the extglob option, like this (replace ls with cp and add the target directory, of course)
~/foobar> shopt extglob
extglob off
~/foobar> ls
abar afoo bbar bfoo
~/foobar> ls !(b*)
-bash: !: event not found
~/foobar> shopt -s extglob # Enables extglob
~/foobar> ls !(b*)
abar afoo
~/foobar> ls !(a*)
bbar bfoo
~/foobar> ls !(*foo)
abar bbar
You can later disable extglob with
shopt -u extglob

The extglob shell option gives you more powerful pattern matching in the command line.
You turn it on with shopt -s extglob, and turn it off with shopt -u extglob.
In your example, you would initially do:
$ shopt -s extglob
$ cp !(*Music*) /target_directory
The full available extended globbing operators are (excerpt from man bash):
If the extglob shell option is enabled using the shopt builtin, several extended
pattern matching operators are recognized. A pattern-list is a list of one or more patterns separated by a |. Composite patterns may be formed using one or more of the following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
So, for example, if you wanted to list all the files in the current directory that are not .c or .h files, you would do:
$ ls -d !(*@(.c|.h))
Of course, normal shell globbing works, so the last example could also be written as:
$ ls -d !(*.[ch])

Not in bash (that I know of), but:
cp `ls | grep -v Music` /target_directory
I know this is not exactly what you were looking for, but it will solve your example.

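If you want to avoid the cost of -exec ... \;, which starts a new cp process for each file, I believe xargs can be an alternative to
find foo -type f ! -name '*Music*' -exec cp {} bar \; # new proc for each exec
The xargs version is shown next (-I{} replaces the deprecated -i). Note that with -I, xargs also runs one cp per file name, so there is only a gain if cp is handed several names at once:
find . -maxdepth 1 -name '*Music*' -prune -o -print0 | xargs -0 -I{} cp {} dest/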

A trick I haven't seen on here yet that doesn't use extglob, find, or grep is to treat two file lists as sets and "diff" them using comm:
comm -23 <(ls) <(ls *Music*)
comm is preferable over diff because it doesn't have extra cruft.
This returns all elements of set 1, ls, that are not also in set 2, ls *Music*. This requires both sets to be in sorted order to work properly. No problem for ls and glob expansion, but if you're using something like find, be sure to invoke sort.
comm -23 <(find . | sort) <(find . | grep -i '.jpg' | sort)
Potentially useful.

You can also use a pretty simple for loop:
for f in `find . -not -name "*Music*"`
do
cp $f /target/dir
done

In bash, an alternative to shopt -s extglob is the GLOBIGNORE variable. It's not really better, but I find it easier to remember.
An example that may be what the original poster wanted:
GLOBIGNORE="*techno*"; cp *Music* /only_good_music/
When done, unset GLOBIGNORE to be able to rm *techno* in the source directory.
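For the exclusion case from the question, GLOBIGNORE can be set to the pattern that should be left out. A sketch, with the hypothetical name list_except_music: the subshell body means GLOBIGNORE never has to be unset by hand. Note that setting GLOBIGNORE also makes * match names starting with a dot (except . and ..).

```shell
#!/usr/bin/env bash
# list_except_music is a hypothetical helper.
# Usage: list_except_music DIR
list_except_music() (
    # Subshell body: GLOBIGNORE, nullglob and the cd all stay local.
    cd -- "$1" || exit
    shopt -s nullglob
    GLOBIGNORE='*Music*'
    items=(*)                     # every name except those matching GLOBIGNORE
    if ((${#items[@]} > 0)); then
        printf '%s\n' "${items[@]}"
    fi
)
```

To copy instead of list, the printf line could be replaced with a cp of "${items[@]}" to the target directory.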

My personal preference is to use grep and the while command. This allows one to write powerful yet readable scripts ensuring that you end up doing exactly what you want. Plus by using an echo command you can perform a dry run before carrying out the actual operation. For example:
ls | grep -v "Music" | while read filename
do
echo $filename
done
will print out the files that you will end up copying. If the list is correct the next step is to simply replace the echo command with the copy command as follows:
ls | grep -v "Music" | while read filename
do
cp "$filename" /target_directory
done

One solution for this can be found with find.
$ mkdir foo bar
$ touch foo/a.txt foo/Music.txt
$ find foo -type f ! -name '*Music*' -exec cp {} bar \;
$ ls bar
a.txt
Find has quite a few options, you can get pretty specific on what you include and exclude.
Edit: Adam in the comments noted that this is recursive. find options mindepth and maxdepth can be useful in controlling this.
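To keep find from descending into subdirectories, -maxdepth 1 can be added. The sketch below wraps that in a function with the made-up name copy_top_level_except_music. It assumes a find that supports -maxdepth (GNU and BSD find do; it is not in POSIX).

```shell
#!/usr/bin/env bash
# copy_top_level_except_music is a hypothetical helper; -maxdepth is a
# GNU/BSD find extension rather than POSIX.
# Usage: copy_top_level_except_music SRC_DIR DEST_DIR
copy_top_level_except_music() {
    # Only regular files directly inside SRC_DIR, skipping *Music* names.
    find "$1" -maxdepth 1 -type f ! -name '*Music*' -exec cp -- {} "$2"/ \;
}
```

With the foo and bar directories from the example, the call would be copy_top_level_except_music foo bar.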

The following lists all *.txt files in /some/dir, except those that begin with a number.
This works in bash, dash, zsh and all other POSIX compatible shells.
for FILE in /some/dir/*.txt; do # for each *.txt file
case "${FILE##*/}" in # if file basename...
[0-9]*) continue ;; # starts with digit: skip
esac
## otherwise, do stuff with $FILE here
done
In line one the pattern /some/dir/*.txt will cause the for loop to iterate over all files in /some/dir whose names end with .txt.
In line two a case statement is used to weed out undesired files. The ${FILE##*/} expression strips off any leading dir name component from the filename (here /some/dir/) so that patterns can match against only the basename of the file. (If you're only weeding out filenames based on suffixes, you can shorten this to $FILE instead.)
In line three, all files matching the case pattern [0-9]*) will be skipped (the continue statement jumps to the next iteration of the for loop). If you want to, you can do something more interesting here, such as skipping all files which do not start with a letter (a-z) using [!a-z]*, or you could use multiple patterns to skip several kinds of filenames, e.g. [0-9]*|*.bak to skip both .bak files and files which start with a number.
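Put together as a reusable function, the loop described above might look like the following sketch. The function name is hypothetical, and an extra -e test is added so that nothing is printed when the pattern matches no files (in that case the shell leaves the pattern unexpanded).

```shell
#!/bin/sh
# list_txt_without_leading_digit is a hypothetical helper name.
# Usage: list_txt_without_leading_digit DIR
list_txt_without_leading_digit() {
    for FILE in "$1"/*.txt; do
        [ -e "$FILE" ] || continue      # pattern matched nothing
        case "${FILE##*/}" in
            [0-9]*) continue ;;         # basename starts with a digit: skip
        esac
        printf '%s\n' "$FILE"
    done
}
```

Only POSIX features are used, so this should behave the same in bash, dash and zsh.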

this would do it excluding exactly 'Music'
cp -a ^'Music' /target
this and that for excluding things like Music?* or *?Music
cp -a ^\*?'complete' /target
cp -a ^'complete'?\* /target
