Interpreting shell commands within Bash if/else statements

While trying to write a conditional statement that checks the number of files in a directory, I have stumbled upon a problem. My initial way of writing this script:
ELEM=`ls -l $DIR | wc -l`
if [ $ELEM -lt 5 ] ; then
works. However, I want to move the $ELEM assignment into the conditional expression itself so it is evaluated when that if statement is reached. I have tried various combinations of single quotes, double quotes, backticks, and parentheses.
Does anyone know how to solve this problem?

Never parse the output of ls in scripts; use globbing instead. Also avoid backquotes, unquoted variables, and the [ builtin:
shopt -s nullglob # expand to an empty array if directory is empty
shopt -s dotglob # also glob dotfiles
files=("$DIR"/*)
count=${#files[@]}
if (( count < 5 ))
then
...
fi
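Putting the pieces together with the original threshold, a minimal sketch (assuming $DIR holds the directory in question):
shopt -s nullglob dotglob
files=("$DIR"/*)
if (( ${#files[@]} < 5 )); then
    printf 'fewer than 5 entries in %s\n' "$DIR"
fi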

For some reason it didn't occur to me until just now to use this:
ELEM="`ls -l $DIR | wc -l`"
Thanks for your time.

What you seem to want to be able to do is something that C allows:
if ((elem = countfiles(dir)) < 5)
Bash can do it like this:
if (( (elem = $(ls "$DIR" 2>/dev/null | wc -l) ) < 5))
Or you could create a function called countfiles and use it like this:
if (( ( elem = $(countfiles "$DIR") ) < 5))
where the function might be something like:
countfiles() {
find "$1" -maxdepth 1 -type f | wc -l
}

Most of the solutions posted so far are (IMHO) overcomplicated. As long as the ls command on your system supports the -b option (print C-style escape sequences for nongraphic characters), you can just use:
if [[ $(ls -b "$DIR" | wc -l) -lt 5 ]]; then
Note that while parsing ls is generally a bad idea, in this case all we're trying to do is count the entries, so there's a lot less to go wrong than usual. In fact, the only thing wc -l cares about is the number of lines, so the only failure mode is a filename with an embedded newline being mistaken for two filenames; the -b option to ls protects against this, so it winds up being safe (at least as far as I can see).
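For instance, here is a quick demonstration of the failure mode and the fix, assuming GNU ls and a scratch directory:
mkdir /tmp/lsdemo && cd /tmp/lsdemo
touch $'bad\nname' good
ls | wc -l      # 3 -- the embedded newline splits one name into two lines
ls -b | wc -l   # 2 -- the newline is printed as the escape \n, one line per file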

Improved countfiles function (which doesn't get confused by newlines in file names):
countfiles() {
find "$1" -maxdepth 1 -type f -print0 | tr -dc '\0' | wc -c
}
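A hedged usage sketch, tying the function back to the original test:
count=$(countfiles "$DIR")
if [ "$count" -lt 5 ]; then
    echo "fewer than 5 regular files in $DIR"
fi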

Related

How to create a txt file with a list of directory names if directories have a certain file

I have a parent directory with over 800 directories, each of which has a unique name. Some of these directories contain a subdirectory called y, in which a file called z (if it exists) can be found.
I need to script a loop that will check each of the 800+ for z, and if it's there, I need to append the name of the directory (the directory before y) into a text file. I'm not sure how to do this.
This is what I have
#!/bin/bash
for d in *; do
if [ -d "y"]; then
for f in *; do
if [ -f "x"]
echo $d >> IDlist.txt
fi
fi
done
Let's assume that any foo/y/z is a file (that is, you do not have directories with such names). If you had a really large number of such files, storing all the paths in a bash variable could lead to memory issues and would call for another solution, but about 800 paths is not large. So something like this should be OK:
declare -a names=(*/y/z)
printf '%s\n' "${names[@]%%/*}" > IDlist.txt
Explanation: the paths of all z files are first stored in the array names, thanks to the glob pattern */y/z. Then a pattern substitution is applied to each array element to strip the /y/z part: "${names[@]%%/*}". The result is printed, one name per line, by printf '%s\n'.
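A small illustration with hypothetical directory names:
names=( alpha/y/z beta/y/z )
printf '%s\n' "${names[@]%%/*}"   # prints alpha and beta, one per line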
If you also had directories named z, or if you had millions of files, find could be used, instead, with a bit of awk to retain only the leading directory name:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
awk -F/ '{print $2}' > IDlist.txt
If you prefer sed for the post-processing:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
sed 's|^\./\(.*\)/y/z|\1|' > IDlist.txt
These two are probably also more efficient (faster).
Note: your initial attempt could also work, even if using bash loops is far less efficient, but it needs several changes:
#!/bin/bash
for d in *; do
if [ -d "$d/y" ]; then
for f in "$d"/y/*; do
if [ "$f" = "$d/y/z" ]; then
printf '%s\n' "$d" >> IDlist.txt
fi
done
fi
done
As noted by @LéaGris, printf is better than echo because if $d is, for instance, the string -e, then echo "$d" interprets it as an option of the echo command and does not print it.
But a simpler and more efficient version (even if not as efficient as the first proposal or the find-based ones) would be:
#!/bin/bash
for d in *; do
if [ -f "$d/y/z" ]; then
printf '%s\n' "$d"
fi
done > IDlist.txt
As you can see, there is another improvement (also suggested by @LéaGris), which consists of redirecting the output of the entire loop to the IDlist.txt file. This opens and closes the file only once, instead of once per iteration.
This should solve it:
for f in */y/z; do
[ -f "$f" ] && echo ${f%%/*}
done
Note:
If there is a possibility of weird top level directory name like "-e", use printf instead of echo, as in the comment below.
This should do it:
shopt -s nullglob
outfile=IDlist.txt
>"$outfile"
for found in */y/z
do
[[ -f $found ]] && echo "${found%%/*}" >>"$outfile" # Drop the /y/z part
done
The nullglob ensures that the loop is skipped if there is no match, and the quotes in the echo ensure that the directory name is output correctly even if it contains two successive spaces.
You can first do some filtering using find. For instance, this lists all z files recursively within the current directory:
find . -type f -name z
Suppose one of the output lines is:
./dir001/y/z
You can then extract the required part in multiple ways, using grep, sed, awk, etc. For example, with grep:
find . -type f -name z | grep -E -o "y.*$"
will give
y/z
The first example doesn't check that z is a file, but I think it's worth showing compgen:
#!/bin/bash
compgen -G '*/y/z' | sed 's|/.*||' > IDlist.txt
Doing glob expansion, file check and path splitting with perl only:
perl -E 'foreach $p (glob "*/y/z") {say substr($p, 0, index($p, "/")) if -f $p}' > IDlist.txt

How can I loop through specific elements of a list in Bash?

I have a list of files stored in a variable obtained by entering
files="./*.fasta"
I would like to create a for loop that will loop through the first 200 elements, then elements 201-400, then elements 401-578, for example.
How can I achieve this? I tried something like
for file in $files[1-200]; do
echo $file
done
but clearly this does not work.
Using a variable to populate a list of files is not recommended. The best way to do it would be using arrays!
You should enable the nullglob shell option (shopt -s nullglob) so that if no files match, the glob expands to nothing and the for loop exits gracefully. The example below iterates over the first 200 files; change the indices as needed to process elements 200-400 and 400-600 in the for loop.
shopt -s nullglob
files=(*.fasta)
if (( "${#files}" >= 200 )); then
for ((i=0; i<200; i++)); do
printf '%s\n' "${files[i]}"
done
fi
Put them in an array, then use substring expansion to get batches of files.
files=(./*.fasta)
for ((i=0; i<${#files[@]}; i+=200)); do
process "${files[#]:i:200}" &
done
The problem may be approached differently. Instead of using a for loop you may use find and xargs:
find * -maxdepth 0 -name '*.fasta' -print0 | xargs -0 -n 200 -P 0 echo
find passes every file name to xargs which in turn spawns a process (-P 0) for every 200 input files (-n 200).
This one-liner uses -print0 and -0 flags just in case your filenames contain whitespace.
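For instance, replacing echo with a hypothetical per-batch worker function (exported so the xargs-spawned shells can see it):
process_batch() { printf 'batch of %d files\n' "$#"; }
export -f process_batch
find * -maxdepth 0 -name '*.fasta' -print0 |
    xargs -0 -n 200 -P 0 bash -c 'process_batch "$@"' _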
The for loop construct is less than ideal in this scenario.
Alternatively, you might use a while loop and a readarray builtin:
find * -maxdepth 0 -name '*.fasta' | while readarray -n 3 a && [[ ${#a[@]} -ne 0 ]]
do
echo "${a[@]}"
done

Errors from if and else statements in shell

I am just new to programming in Unix and have a small issue that I am unsure of how to solve. The objective of this piece of my script is to offer the user various options as to the type of scan they would like to use. This scan detects duplicate files with specified variables depending on the option chosen.
I am unable to get it working at all and am unsure why.
Also could you please offer me advice on how I could better display the selection screen if possible. I have only pasted part of my code as I would like to figure out the rest of my objective myself.
#!/bin/bash
same_name="1"
filesize="2"
md5sum="3"
different_name="4"
echo "The list of choices are, same_name=1, filesize=2, md5sum=3 and different name=4"
echo "Search for files with the:"
read choice
if [$choice == "$same_name" ];then
find /home/user/OSN -type f -exec basename '{}' \; | sort > filelist.txt
find /home/user/OSN -type f -exec basename '{}' \; | sort | uniq -d > repeatlist.txt
else
ls -al /home/user/OSN >2filelist.txt
fi
The shell command [, also known as test, needs a space after it for the shell to parse it correctly. For example:
if [ "x$choice" == "x$same_name" ] ; then
is equivalent to
if test "x$choice" == "x$same_name" ; then
prepending "x" to the variables is an idiom to prevent test from seeing too few arguments. Test would complain if called as test 5 == so if $choice and $same_name were empty the call to expr is syntactically correct.
You can also use the construct ${choice:-default} or ${choice:=default} to guard against unset or null shell variables.
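For example, a sketch of the default-value idiom applied to this script:
read -r choice
if [ "${choice:-1}" = "$same_name" ]; then
    echo "same-name scan selected"
fi
Here an empty reply falls back to "1", so test always sees a non-empty operand.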
It would help if you included the error messages you were receiving. When I tried this, I got an error of:
./foo: line 9: [1: command not found
This makes the problem fairly clear. The [ operator in the if statement is, in Unix's "never use something complicated when some simple hack will work" style, just another program. (See ls /bin/[ for proof!) As such, it needs to be treated like any other program with command-line options; you separate it from its options with whitespace. Otherwise, bash will think that "[$choice", concatenated, is the name of a program to execute and will try to execute it. Thus, that line needs to be:
if [ $choice == "$same_name" ];then
After I changed that, it worked.
Also, as a style suggestion, I'd note that the case construct is a much easier way to write this code than using if statements, when you've got more than one test. And, as noted in other answers, you should put " marks around $choice, to guard against the case where the user input is empty or contains spaces -- $choice, unquoted, will expand to a list of zero or more tokens separated by whitespace, whereas "$choice" always expands to a single token.
Can't believe nobody's picked up this error: if you use [ (or test), the operator for string equality is =, not ==.
You can do it like this:
while true
do
cat <<EOF
The list of choices are:
1) same name
2) filesize
3) md5sum
4) different name
5) exit
EOF
read -r -p "Enter your choice: " choice
case "$choice" in
1)
find /home/user/OSN -type f -exec basename '{}' \; | sort > filelist.txt
find /home/user/OSN -type f -exec basename '{}' \; | sort | uniq -d > repeatlist.txt
;;
5) exit ;;
*) ls -al /home/user/OSN >2filelist.txt ;;
esac
done
Bash's double square brackets are much more forgiving of quoting and null or unset variables.
if [[ $choice == "$same_name" ]]; then
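A quick illustration of the difference, assuming $choice is unset:
unset choice
[ $choice == "1" ]     # error: bash: [: ==: unary operator expected
[[ $choice == "1" ]]   # no error; simply evaluates to false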
You should take a look at Bash's select and case statements:
choices="same_name filesize md5sum different_name exit"
PS3="Make a selection: " # this is the prompt that the select statement will display
select choice in $choices
do
case $choice in
same_name)
find ...
;;
filesize)
do_something
;;
.
.
.
exit)
break
;;
esac
done

How can I select random files from a directory in bash?

I have a directory with about 2000 files. How can I select a random sample of N files through using either a bash script or a list of piped commands?
Here's a script that uses GNU sort's random option:
ls | sort -R | tail -n "$N" | while read -r file; do
# Something involving $file, or you can leave
# off the while to just get the filenames
done
You can use shuf (from the GNU coreutils package) for that. Just feed it a list of file names and ask it to return the first line from a random permutation:
ls dirname | shuf -n 1
# probably faster and more flexible:
find dirname -type f | shuf -n 1
# etc..
Adjust the -n, --head-count=COUNT value to return the desired number of lines. For example, to return 5 random filenames you would use:
find dirname -type f | shuf -n 5
Here are a few possibilities that don't parse the output of ls and that are 100% safe regarding files with spaces and funny symbols in their names. All of them will populate an array randf with a list of random files. This array is easily printed with printf '%s\n' "${randf[@]}" if needed.
This one will possibly output the same file several times, and N needs to be known in advance. Here I chose N=42.
a=( * )
randf=( "${a[RANDOM%${#a[#]}]"{1..42}"}" )
This feature is not very well documented.
If N is not known in advance, but you really liked the previous possibility, you can use eval. But it's evil, and you must really make sure that N doesn't come directly from user input without being thoroughly checked!
N=42
a=( * )
eval randf=( \"\${a[RANDOM%\${#a[@]}]\"\{1..$N\}\"}\" )
I personally dislike eval and hence this answer!
The same using a more straightforward method (a loop):
N=42
a=( * )
randf=()
for((i=0;i<N;++i)); do
randf+=( "${a[RANDOM%${#a[#]}]}" )
done
If you don't want to possibly have several times the same file:
N=42
a=( * )
randf=()
for((i=0;i<N && ${#a[@]};++i)); do
((j=RANDOM%${#a[@]}))
randf+=( "${a[j]}" )
a=( "${a[@]:0:j}" "${a[@]:j+1}" )
done
Note. This is a late answer to an old post, but the accepted answer links to an external page that shows terrible bash practice, and the other answer is not much better as it also parses the output of ls. A comment to the accepted answer points to an excellent answer by Lhunath which obviously shows good practice, but doesn't exactly answer the OP.
ls | shuf -n 10 # ten random files
A simple solution for selecting 5 random files that avoids parsing ls. It also works with files containing spaces, newlines and other special characters:
shuf -ezn 5 * | xargs -0 -n1 echo
Replace echo with the command you want to execute for your files.
This is an even later response to @gniourf_gniourf's late answer, which I just upvoted because it's by far the best answer, twice over. (Once for avoiding eval and once for safe filename handling.)
But it took me a few minutes to untangle the "not very well documented" feature(s) this answer uses. If your Bash skills are solid enough that you saw immediately how it works, then skip this comment. But I didn't, and having untangled it I think it's worth explaining.
Feature #1 is the shell's own file globbing. a=(*) creates an array, $a, whose members are the files in the current directory. Bash understands all the weirdnesses of filenames, so that list is guaranteed correct, guaranteed escaped, etc. No need to worry about properly parsing textual file names returned by ls.
Feature #2 is Bash parameter expansions for arrays, one nested within another. This starts with ${#ARRAY[@]}, which expands to the length of $ARRAY.
That expansion is then used to subscript the array. Taking a random value modulo N yields a number between 0 and N-1, so RANDOM%length gives a valid random index into the array. Here's the approach, broken into two lines for clarity's sake (using a scratch variable; don't assign to RANDOM itself, as that reseeds the generator):
LENGTH=${#ARRAY[@]}
PICK=${a[RANDOM%LENGTH]}
But this solution does it in a single line, removing the unnecessary variable assignment.
Feature #3 is Bash brace expansion, although I have to confess I don't entirely understand it. Brace expansion is used, for instance, to generate a list of 25 filenames like filename1.txt, filename2.txt, and so on: echo "filename"{1..25}".txt".
The expression in the randf assignment above, "${a[RANDOM%${#a[@]}]"{1..42}"}", uses that trick to produce 42 separate expansions. The brace expansion places a number in between the ] and the }, which at first I thought was subscripting the array, but if so it would be preceded by a colon. (It would also have returned 42 consecutive items from a random spot in the array, which is not at all the same thing as returning 42 random items from the array.) I think it's just making the shell run the expansion 42 times, thereby returning 42 random items from the array. (But if someone can explain it more fully, I'd love to hear it.)
The reason N has to be hardcoded (to 42) is that brace expansion happens before variable expansion.
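A quick way to see that ordering, as a sketch you can run interactively:
N=5
echo {1..$N}          # prints {1..5}: brace expansion ran first and saw no number
eval "echo {1..$N}"   # prints 1 2 3 4 5: eval forces a second round of expansion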
Finally, here's Feature #4, if you want to do this recursively for a directory hierarchy:
shopt -s globstar
a=( ** )
This turns on a shell option that causes ** to match recursively. Now your $a array contains every file in the entire hierarchy.
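Note that ** matches directories as well as plain files; here is a hedged sketch that keeps only regular files in the array:
shopt -s globstar nullglob
a=()
for f in **; do
    [[ -f $f ]] && a+=( "$f" )
done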
If you have Python installed (works with either Python 2 or Python 3):
To select one file (or line from an arbitrary command), use
ls -1 | python -c "import sys; import random; print(random.choice(sys.stdin.readlines()).rstrip())"
To select N files/lines, use the following (note the N at the end of the command; replace it with a number):
ls -1 | python -c "import sys; import random; print(''.join(random.sample(sys.stdin.readlines(), int(sys.argv[1]))).rstrip())" N
If you want to copy a sample of those files to another folder:
ls | shuf -n 100 | xargs -I % cp % ../samples/
Create the samples directory first, obviously (e.g. with mkdir -p ../samples).
macOS does not ship the sort -R and shuf commands, so I needed a bash-only solution that randomizes all files without duplicates, and I did not find one here. This solution is similar to gniourf_gniourf's solution #4, but hopefully adds better comments.
The script should be easy to modify to stop after N samples, using a counter with if, or gniourf_gniourf's for loop with N. Note that $RANDOM only yields values from 0 to 32767, so this approach is limited to roughly 32000 files, but that should do for most cases.
#!/bin/bash
array=(*) # this is the array of files to shuffle
# echo "${array[@]}"
for dummy in "${array[@]}"; do # loop length(array) times; once for each file
length=${#array[@]}
randomi=$(( $RANDOM % $length )) # select a random index
filename=${array[$randomi]}
echo "Processing: '$filename'" # do something with the file
unset -v "array[$randomi]" # set the element at index $randomi to NULL
array=("${array[#]}") # remove NULL elements introduced by unset; copy array
done
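For completeness, a hedged sketch of the counter variant mentioned above (stop after N unique samples, with N assumed set by you):
N=10
count=0
array=(*)
while (( count < N && ${#array[@]} > 0 )); do
    length=${#array[@]}
    randomi=$(( RANDOM % length ))      # select a random index
    echo "Processing: '${array[randomi]}'"
    unset -v "array[randomi]"           # drop the picked element
    array=("${array[@]}")               # re-pack the array
    (( ++count ))
done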
If you have more files in your folder, you can use the piped command below, which I found on Unix Stack Exchange.
find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/
Here I wanted to copy the files, but if you want to move files or do something else, just change the last command where I have used cp.
This is the only script I can get to play nice with bash on MacOS. I combined and edited snippets from the following two links:
ls command: how can I get a recursive full-path listing, one line per file?
http://www.linuxquestions.org/questions/linux-general-1/is-there-a-bash-command-for-picking-a-random-file-678687/
#!/bin/bash
# Reads a given directory and picks a random file.
# The directory you want to use. You could use "$1" instead if you
# wanted to parametrize it.
DIR="/path/to/"
# DIR="$1"
# Internal Field Separator set to newline, so file names with
# spaces do not break our script.
IFS='
'
if [[ -d "${DIR}" ]]
then
# Runs ls on the given dir, and dumps the output into a matrix,
# it uses the new lines character as a field delimiter, as explained above.
# file_matrix=($(ls -LR "${DIR}"))
file_matrix=($(ls -R "$DIR" | awk '/:$/&&f{s=$0;f=0} /:$/&&!f{sub(/:$/,"");s=$0;f=1;next} NF&&f{print s"/"$0}'))
num_files=${#file_matrix[*]}
# This is the command you want to run on a random file.
# Change "ls -l" by anything you want, it's just an example.
ls -l "${file_matrix[$((RANDOM%num_files))]}"
fi
exit 0
I use this: it uses a temporary file, and it descends into the directory tree until it finds a regular file, which it then returns.
# find for a quasi-random file in a directory tree:
# directory to start search from:
ROOT="/";
tmp=/tmp/mytempfile
TARGET="$ROOT"
FILE="";
n=
r=
while [ -e "$TARGET" ]; do
TARGET="$(readlink -f "${TARGET}/$FILE")" ;
if [ -d "$TARGET" ]; then
ls -1 "$TARGET" 2> /dev/null > $tmp || break;
n=$(cat $tmp | wc -l);
if [ $n != 0 ]; then
FILE=$(shuf -n 1 $tmp)
# or if you dont have/want to use shuf:
# r=$(($RANDOM % $n)) ;
# FILE=$(tail -n +$(( $r + 1 )) $tmp | head -n 1);
fi ;
else
if [ -f "$TARGET" ] ; then
rm -f "$tmp"
echo "$TARGET"
break;
else
# is not a regular file, restart:
TARGET="$ROOT"
FILE=""
fi
fi
done;
How about a Perl solution slightly doctored from Mr. Kang over here:
How can I shuffle the lines of a text file on the Unix command line or in a shell script?
$ ls | perl -MList::Util=shuffle -e '@lines = shuffle(<>); print @lines[0..4]'
