Bash: trim a parameter from both ends

Greetings!
These are well-known Bash parameter expansion patterns:
${parameter#word}, ${parameter##word}
and
${parameter%word}, ${parameter%%word}
I need to chop one part from the beginning and another part from the end of the parameter. Could you advise me, please?

If you're using Bash version >= 3.2, you can use regular expression matching with a capture group to retrieve the value in one command:
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ [[ $path =~ ^.*/([^/]*)/.*$ ]]
$ echo ${BASH_REMATCH[1]}
ABC
This would be equivalent to:
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ path=$(echo "$path" | sed -n 's|^.*/\([^/]*\)/.*$|\1|p')
$ echo "$path"
ABC
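The regex approach can be wrapped in a small function that checks whether the match succeeded, so that a path with fewer than two slashes fails instead of leaving a stale BASH_REMATCH behind. This is a sketch assuming bash 3.2 or newer; the name penultimate_dir is hypothetical, and the regex is a slightly stricter variant that pins the capture to the last two slashes.

```bash
#!/usr/bin/env bash
# penultimate_dir is a hypothetical helper name, used only for illustration.
# It prints the path component between the last two slashes.
penultimate_dir() {
    local path=$1
    # Keeping the regex in a variable avoids quoting problems with =~.
    local re='^.*/([^/]*)/[^/]*$'
    if [[ $path =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1    # fewer than two slashes: nothing to extract
    fi
}

penultimate_dir '/xxx/yyy/zzz/ABC/abc.txt'    # prints ABC
```

Everything runs inside the current shell. The only process created is the command substitution, if you capture the result with $(...).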

I don't know that there's an easy way to do this without resorting to sub-shells, something you probably want to avoid for efficiency. I would just use:
> xx=hello_there
> yy=${xx#he}
> zz=${yy%re}
> echo ${zz}
llo_the
If you're not fussed about efficiency and just want a one-liner:
> zz=$(echo ${xx%re} | sed 's/^he//')
> echo ${zz}
llo_the
Keep in mind that this second method starts a sub-shell plus an external sed process. I wouldn't do that often if your script has to run fast.
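The two-step, subshell-free approach generalizes to a small function. This is a sketch, and trim_both is a hypothetical name. Quoting the prefix and suffix inside the expansion makes bash match them literally, so characters such as * or ? in them are not treated as glob patterns.

```bash
#!/usr/bin/env bash
# trim_both is a hypothetical helper: trim_both STRING PREFIX SUFFIX
# It removes PREFIX from the start and SUFFIX from the end, using only
# parameter expansion (no external commands).
trim_both() {
    local s=$1
    s=${s#"$2"}    # quoted: $2 is matched literally, not as a glob
    s=${s%"$3"}
    printf '%s\n' "$s"
}

trim_both hello_there he re    # prints llo_the
trim_both 'a*b*c' 'a*' '*c'    # prints b
```

Without the quotes, ${s#a*} would treat a* as a pattern and strip only the leading "a".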

This solution uses what Andrey asked for and does not employ any external tools. Strategy: use the % parameter expansion to remove the file name, then use ## to remove everything but the last directory:
$ path=/path/to/my/last_dir/filename.txt
$ dir=${path%/*}
$ echo $dir
/path/to/my/last_dir
$ dir=${dir##*/}
$ echo $dir
last_dir
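One caveat with this strategy: if the path contains no slash at all, ${path%/*} leaves it unchanged, and the file name itself comes back as the "directory". A minimal sketch with a guard for that case follows. The name last_dir is hypothetical. A path such as /file.txt still yields an empty string, since its parent is the root.

```bash
#!/usr/bin/env bash
# last_dir is a hypothetical helper name.
# It prints the name of the directory that contains the final path component.
last_dir() {
    local path=$1 dir
    # Without a slash there is no parent directory component to extract.
    [[ $path == */* ]] || return 1
    dir=${path%/*}    # drop the last component:   /path/to/my/last_dir
    dir=${dir##*/}    # keep what follows the last remaining slash: last_dir
    printf '%s\n' "$dir"
}

last_dir /path/to/my/last_dir/filename.txt    # prints last_dir
```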

I would highly recommend going with bash arrays, as they perform just over 3x faster than regular expression matching.
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ IFS='/' arr=( $path )
$ echo "${arr[${#arr[@]}-2]}"
ABC
This works by telling bash that each element of the array is separated by a forward slash / via IFS='/'. We access the penultimate element of the array by first determining how many elements are in the array via ${#arr[@]}, then subtracting 2 and using that as the index into the array.
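The unquoted arr=( $path ) form has two side effects. IFS stays set to / for the rest of the shell, and each component is also subject to glob expansion. Below is a sketch of the same array idea using read -a with a here-string, where IFS applies only to the read command. It assumes the path contains no newline, because read stops at the first one. The negative index ${parts[-2]} would need bash 4.3 or newer, so the length arithmetic is kept.

```bash
#!/usr/bin/env bash
path='/xxx/yyy/zzz/ABC/abc.txt'

# IFS applies only to this read; -r keeps backslashes literal.
# The leading slash yields an empty first element: ("" xxx yyy zzz ABC abc.txt)
IFS='/' read -r -a parts <<< "$path"

# Second-to-last element; works in bash 3.2 as well as newer versions.
echo "${parts[${#parts[@]}-2]}"    # prints ABC
```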

Related

Bash: Extract filenames by pattern and insert them into an array

I have a list of files in a folder, and I want to extract the file names matching the following pattern and insert them into an array.
The pattern is that the file name always begins with either "MCABC_" or "MCBBC_", followed by a date, and ends with ".csv".
Examples would be "MCABC_20110101.csv" and "MCBBC_20110304.csv".
Right now, the only solution I can come up with is the following, which works but is not ideal:
ls | grep -E "MCABC_[ A-Za-z0-9]*|MC221_[ A-Za-z0-9]*"
I read that it is bad to parse the output of ls, and that I should use a glob instead.
I am completely new to bash scripting. How could I extract the file names matching the patterns above and insert them into an array? Thanks.
Update: Thanks for the answers, I really appreciate them. I now have the following code:
#!/bin/bash
shopt -s nullglob
files=(MC[1-2]21_All_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv)
echo ${#files[*]}
echo ${files[0]}
And this is the result that I got when I ran bash testing.sh.
: invalid shell option namesh: line 2: shopt: nullglob
1
(MC[1-2]21_All_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv)
However, if I just run files=(MC[1-2]21_All_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv) on the command line and then echo ${files[*]}, I get the output:
MC121_All_20180301.csv MC121_All_20180302.csv MC121_All_20180305.csv MC221_All_20180301.csv MC221_All_20180302.csv MC221_All_20180305.csv
I am very confused. Why is this happening? (Please note that I am running this on Ubuntu within Windows 10.)
I think you can just populate the array directly using a glob:
files=( MC[AB]BC_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv )
The "date" part can certainly be improved, since it matches completely invalid dates like 98765432, but maybe that's not a problem.
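A sketch of the glob approach with nullglob enabled follows. Without nullglob, a pattern that matches nothing stays in the array as literal text, which is exactly the one-element "(MC[1-2]21_...csv)" result shown in the question. The helper name find_reports is hypothetical, and, as noted above, the eight digits are not validated as a real date.

```bash
#!/usr/bin/env bash
# nullglob: a pattern that matches nothing expands to nothing, instead of
# staying in the array as the literal pattern text.
shopt -s nullglob

# find_reports is a hypothetical helper name: find_reports DIR
# It fills the global array "files" with the matching paths under DIR.
find_reports() {
    files=( "$1"/MC[AB]BC_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv )
}

find_reports .
printf 'Found %d file(s)\n' "${#files[@]}"
for f in "${files[@]}"; do
    printf '%s\n' "$f"
done
```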
This will work in BASH.
#!/bin/bash
for file_name in M*
do
line="$line ${file_name%_*}"
done
array=( $line )
echo "${array[2]}"
Another way :
#!/bin/bash
declare -a files_array
i=0
for file_name in M*
do
files_array[$i]="${file_name%_*}"
(( i++ ))
done
echo "${files_array[2]}"
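Both loops can append straight to an array with +=. That way, names containing spaces are not broken apart by a later unquoted $line, and no file name is ever passed to printf as a format string. This is a sketch, and strip_suffixes is a hypothetical name. Like the loops above, it stores ${file_name%_*}, i.e. the name with its last "_..." part removed.

```bash
#!/usr/bin/env bash
# strip_suffixes is a hypothetical helper name.
# It stores ${name%_*} for every argument in the global array "prefixes".
strip_suffixes() {
    prefixes=()
    local file_name
    for file_name in "$@"; do
        prefixes+=( "${file_name%_*}" )
    done
}

shopt -s nullglob
strip_suffixes M*    # the same glob the answers above use
printf '%s\n' "${prefixes[@]}"
```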
Regards!

Extracting a string between last two slashes in Bash

I know this can easily be done with a regex, as in my answer at https://stackoverflow.com/a/33379831/3962126, but I need to do this in bash.
The closest question I found on Stack Overflow is "bash: extracting last two dirs for a pathname", but the difference is that if
DIRNAME = /a/b/c/d/e
then I need to extract
d
This may be relatively long, but it's also much faster to execute than most preceding answers (other than the zsh-only one and that by j.a.), since it uses only string manipulations built into bash and uses no subshell expansions:
string='/a/b/c/d/e' # initial data
dir=${string%/*} # trim everything past the last /
dir=${dir##*/} # ...then remove everything before the last / remaining
printf '%s\n' "$dir" # demonstrate output
printf is used in the above because echo doesn't work reliably for all values (think about what it would do on a GNU system with /a/b/c/-n/e).
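The echo problem can be seen with a component that is literally -n. Bash's builtin echo, in its default mode, treats that argument as an option and prints nothing at all. printf '%s\n' prints its argument verbatim. This is a minimal demonstration, not part of the original answer.

```bash
#!/usr/bin/env bash
dir=-n

# bash's builtin echo takes "-n" as an option, so nothing is printed.
echo "$dir"

# printf prints its argument verbatim, whatever it looks like.
printf '%s\n' "$dir"    # prints -n
```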
Here is a pure bash solution:
[[ $DIRNAME =~ /([^/]+)/[^/]*$ ]] && printf '%s\n' "${BASH_REMATCH[1]}"
Compared to some of the other answers:
It matches the string between the last two slashes. So, for example, it doesn't match d if DIRNAME=d/e.
It's shorter and faster (it uses only built-ins and doesn't create subprocesses).
It supports any character between the last two slashes (see Charles Duffy's answer for more on this).
Also note that this is not how to assign a variable in bash:
DIRNAME = /a/b/c/d/e
       ^ ^
Those spaces are wrong, so remove them:
DIRNAME=/a/b/c/d/e
Using awk:
echo "/a/b/c/d/e" | awk -F / '{ print $(NF-1) }' # d
Edit: This does not work when the path contains newlines, and it still gives output when there are fewer than two slashes; see the comments below.
Using sed:
If you want to get the fourth element:
DIRNAME="/a/b/c/d/e"
echo "$DIRNAME" | sed -r 's_^(/[^/]*){3}/([^/]*)/.*$_\2_g'
If you want to get the second-to-last element:
DIRNAME="/a/b/c/d/e"
echo "$DIRNAME" | sed -r 's_^.*/([^/]*)/[^/]*$_\1_g'
OMG, maybe this was obvious, but not to me initially. I got the right result with:
dir=$(basename -- "$(dirname -- "$DIRNAME")")
echo "$dir"
Using zsh parameter substitution is pretty cool too
echo ${${DIRNAME%/*}##*/}
I think it's faster than the double $() as well, because it won't need any subprocesses.
Basically it slices off the right side first, and then all the remaining left side second.

Match exact word in bash script, extract number from string

I'm trying to create a very simple bash script that opens a new link based on the input command.
Use case #1
$ ./myscript longname55445
It should take the number 55445 and assign it to a variable, which will later be used to open a new link based on the given number.
Use case #2
$ ./myscript l55445
It should do the exact same thing as above by taking the number and then open the same link.
Use case #3
$ ./myscript 55445
If no prefix is given, we simply open the same link as a fallback.
So far, this is what I have:
#!/bin/sh
BASE_URL=http://api.domain.com
input=$1
command=${input:0:1}
if [ "$command" == "longname" ]; then
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
elseif [ "$command" == "l" ]; then
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
else
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
fi
But this always falls back to the elseif.
I'm using zsh at the moment.
input=$1
command=${input:0:1}
sets command to the first character of the first argument. A one-character string can never be equal to an eight-character string ("longname"), so the if condition must always fail.
Furthermore, both your elseif and your else clauses set
number=${input:1:${#input}}
Which you could have written more simply as
number=${input:1}
But in both cases, you're dropping the first character of input. Presumably in the else case, you wanted the entire first argument.
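Below is a sketch that puts these points together, assuming bash rather than the question's #!/bin/sh. It matches the prefix with case and strips it with ${input#prefix} instead of slicing by character count. echo stands in for open, which is macOS-specific, and build_url is a hypothetical name. It does not check that what remains is numeric.

```bash
#!/usr/bin/env bash
BASE_URL=http://api.domain.com

# build_url is a hypothetical helper name.
# It accepts longname55445, l55445 or 55445 and prints the URL to open.
build_url() {
    local input=$1 number
    case $input in
        longname*) number=${input#longname} ;;  # test the longer prefix first
        l*)        number=${input#l} ;;
        *)         number=$input ;;             # no prefix: keep the whole argument
    esac
    printf '%s\n' "$BASE_URL?id=$number"
}

if (( $# > 0 )); then
    url=$(build_url "$1")
    echo "$url"    # on macOS, open "$url" would open it in the browser
fi
```

The order of the case branches matters: l* would also match longname55445, so longname* has to come first.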
See whether this construct is helpful for your purpose:
#!/bin/bash
name="longname55445"
echo "${name##*[A-Za-z]}"
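This assumes a letter immediately precedes the number.
The following is NOT another way to write the same thing, because it is wrong; please see the comments below by mklement0, who noticed this. Mea culpa. (The working form puts a valid class inside a bracket expression: ${name##*[[:alpha:]]}.)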
echo "${name##*[:letter:]}"
You have command=${input:0:1}
It takes only the first character, and you compare that to "longname", so of course it fails and goes to the elseif.
The key problem is to check whether the input begins with l, with longname, or with nothing. In each of the 3 cases, take the trailing numbers.
A single grep can do it; just grep the input and use the returned text:
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"l234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"longname234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"foobar234"
<we got nothing>
You can use regex matching in bash.
[[ $1 =~ [0-9]+ ]] && number=$BASH_REMATCH
You can also use regex matching in zsh.
[[ $1 =~ [0-9]+ ]] && number=$MATCH
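An unanchored [0-9]+ also accepts input such as foobar234. A sketch of an anchored bash regex follows. Like the grep answer, it accepts only longname<digits>, l<digits> or bare digits. The name extract_id is hypothetical. When the optional prefix group does not participate, BASH_REMATCH[1] is simply empty, and the number stays in BASH_REMATCH[2].

```bash
#!/usr/bin/env bash
# extract_id is a hypothetical helper name.
# It accepts only longname<digits>, l<digits> or <digits>; anything else fails.
extract_id() {
    local re='^(longname|l)?([0-9]+)$'
    [[ $1 =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[2]}"
}

extract_id longname55445    # prints 55445
```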
Based on the OP's following clarification in a comment,
I'm only looking for the numbers [...] given in the input.
the solution can be simplified as follows:
#!/bin/bash
BASE_URL='http://api.domain.com'
# Strip all non-digits from the 1st argument to get the desired number.
number=$(tr -dC '[:digit:]' <<<"$1")
open "$BASE_URL?id=$number"
Note the use of a bash shebang, given the use of 'bashism' <<< (which could easily be restated in a POSIX-compliant manner).
Similarly, the OP's original code should use a bash shebang, too, due to use of non-POSIX substring extraction syntax.
However, judging by the use of open to open a URL, the OP appears to be on OSX, where sh is essentially bash (though invocation as sh does change behavior), so it'll still work there. Generally, though, it's safer to be explicit about the required shell.

How can I select random files from a directory in bash?

I have a directory with about 2000 files. How can I select a random sample of N files through using either a bash script or a list of piped commands?
Here's a script that uses GNU sort's random option:
ls | sort -R | tail -n "$N" | while read -r file; do
    # Something involving $file, or you can leave
    # off the while to just get the filenames
    printf '%s\n' "$file"
done
You can use shuf (from the GNU coreutils package) for that. Just feed it a list of file names and ask it to return the first line from a random permutation:
ls dirname | shuf -n 1
# probably faster and more flexible:
find dirname -type f | shuf -n 1
# etc..
Adjust the -n, --head-count=COUNT value to return the number of wanted lines. For example to return 5 random filenames you would use:
find dirname -type f | shuf -n 5
Here are a few possibilities that don't parse the output of ls and that are 100% safe regarding files with spaces and funny symbols in their name. All of them will populate an array randf with a list of random files. This array is easily printed with printf '%s\n' "${randf[@]}" if needed.
This one will possibly output the same file several times, and N needs to be known in advance. Here I chose N=42.
a=( * )
randf=( "${a[RANDOM%${#a[@]}]"{1..42}"}" )
This feature is not very well documented.
If N is not known in advance, but you really liked the previous possibility, you can use eval. But it's evil, and you must really make sure that N doesn't come directly from user input without being thoroughly checked!
N=42
a=( * )
eval randf=( \"\${a[RANDOM%\${#a[@]}]\"\{1..$N\}\"}\" )
I personally dislike eval and hence this answer!
The same using a more straightforward method (a loop):
N=42
a=( * )
randf=()
for((i=0;i<N;++i)); do
randf+=( "${a[RANDOM%${#a[@]}]}" )
done
If you don't want the same file to appear more than once:
N=42
a=( * )
randf=()
for((i=0;i<N && ${#a[@]};++i)); do
((j=RANDOM%${#a[@]}))
randf+=( "${a[j]}" )
a=( "${a[@]:0:j}" "${a[@]:j+1}" )
done
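Rebuilding the array after every pick copies it on each iteration. A different technique, swapped in here, avoids that: a partial Fisher-Yates shuffle picks N distinct entries by swapping each chosen element to the front. This is a sketch, and pick_distinct is a hypothetical name. Like the loops above, it uses $RANDOM, so it assumes fewer than 32768 entries and accepts a slight modulo bias.

```bash
#!/usr/bin/env bash
# pick_distinct is a hypothetical helper name: pick_distinct N item...
# It fills the global array "randf" with up to N distinct items in random order.
pick_distinct() {
    local n=$1
    shift
    local -a pool=( "$@" )
    local size=$# i j tmp
    if (( n > size )); then n=$size; fi
    randf=()
    for (( i = 0; i < n; ++i )); do
        # Pick j from the not-yet-chosen range i..size-1.
        j=$(( i + RANDOM % (size - i) ))
        # Swap it into position i; pool[0..i] now holds every pick so far.
        tmp=${pool[i]}
        pool[i]=${pool[j]}
        pool[j]=$tmp
        randf+=( "${pool[i]}" )
    done
}

a=( * )
pick_distinct 42 "${a[@]}"
printf '%s\n' "${randf[@]}"
```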
Note. This is a late answer to an old post, but the accepted answer links to an external page that shows terrible bash practice, and the other answer is not much better as it also parses the output of ls. A comment to the accepted answer points to an excellent answer by Lhunath which obviously shows good practice, but doesn't exactly answer the OP.
ls | shuf -n 10 # ten random files
A simple solution for selecting 5 random files that avoids parsing ls. It also works with files containing spaces, newlines and other special characters:
shuf -ezn 5 * | xargs -0 -n1 echo
Replace echo with the command you want to execute for your files.
This is an even later response to @gniourf_gniourf's late answer, which I just upvoted because it's by far the best answer, twice over. (Once for avoiding eval and once for safe filename handling.)
But it took me a few minutes to untangle the "not very well documented" feature(s) this answer uses. If your Bash skills are solid enough that you saw immediately how it works, then skip this comment. But I didn't, and having untangled it I think it's worth explaining.
Feature #1 is the shell's own file globbing. a=(*) creates an array, $a, whose members are the files in the current directory. Bash understands all the weirdnesses of filenames, so that list is guaranteed correct, guaranteed escaped, etc. No need to worry about properly parsing textual file names returned by ls.
Feature #2 is Bash parameter expansion for arrays, one nested within another. This starts with ${#ARRAY[@]}, which expands to the length of $ARRAY.
That expansion is then used to subscript the array. The standard way to get a random number between 0 and N-1 is to take a random number modulo N. We want a random index between 0 and the length of our array minus 1. Here's the approach, broken into two lines for clarity's sake:
LENGTH=${#a[@]}
file=${a[RANDOM%LENGTH]}
But this solution does it in a single line, removing the unnecessary variable assignment.
Feature #3 is Bash brace expansion, although I have to confess I don't entirely understand it. Brace expansion is used, for instance, to generate a list of 25 files named filename1.txt, filename2.txt, etc: echo "filename"{1..25}".txt".
The expression inside the array assignment above, "${a[RANDOM%${#a[@]}]"{1..42}"}", uses that trick to produce 42 separate expansions. The brace expansion places a number between the ] and the }, which at first I thought was subscripting the array, but if so it would be preceded by a colon. (It would also have returned 42 consecutive items from a random spot in the array, which is not at all the same thing as returning 42 random items from the array.) I think it's just making the shell run the expansion 42 times, thereby returning 42 random items from the array. (But if someone can explain it more fully, I'd love to hear it.)
The reason N has to be hardcoded (to 42) is that brace expansion happens before variable expansion.
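The ordering is easy to see directly. With a variable as the bound, the braces are left untouched and only $N gets substituted afterwards. A C-style arithmetic loop, by contrast, does see the value. This is a minimal demonstration; the array name nums is only illustrative.

```bash
#!/usr/bin/env bash
N=3

echo {1..$N}    # prints {1..3}: brace expansion ran before $N was expanded
echo {1..3}     # prints 1 2 3

# A C-style loop, by contrast, does see the value of N.
nums=()
for (( i = 1; i <= N; ++i )); do
    nums+=( "$i" )
done
echo "${nums[@]}"    # prints 1 2 3
```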
Finally, here's Feature #4, if you want to do this recursively for a directory hierarchy:
shopt -s globstar
a=( ** )
This turns on a shell option that causes ** to match recursively. Now your $a array contains every file and directory in the entire hierarchy.
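Because ** also matches directories, you may want to keep only regular files before sampling. Below is a sketch assuming bash 4.0 or newer, which globstar requires; the stock /bin/bash on macOS is 3.2. The name collect_files is hypothetical. Hidden files are skipped unless dotglob is also set.

```bash
#!/usr/bin/env bash
# globstar needs bash 4.0 or newer; nullglob makes an empty match expand to nothing.
shopt -s globstar nullglob

# collect_files is a hypothetical helper name: collect_files DIR
# ** matches directories as well as files, so only regular files are kept.
collect_files() {
    files=()
    local f
    for f in "$1"/**; do
        if [[ -f $f ]]; then
            files+=( "$f" )
        fi
    done
}

collect_files .
printf '%d regular file(s)\n' "${#files[@]}"
```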
If you have Python installed (works with either Python 2 or Python 3):
To select one file (or line from an arbitrary command), use
ls -1 | python -c "import sys; import random; print(random.choice(sys.stdin.readlines()).rstrip())"
To select N files/lines, use the following (note that N is at the end of the command; replace it with a number):
ls -1 | python -c "import sys; import random; print(''.join(random.sample(sys.stdin.readlines(), int(sys.argv[1]))).rstrip())" N
If you want to copy a sample of those files to another folder:
ls | shuf -n 100 | xargs -I % cp % ../samples/
Create the samples directory first, obviously.
macOS does not have the sort -R and shuf commands, so I needed a bash-only solution that randomizes all files without duplicates, and I did not find one here. This solution is similar to gniourf_gniourf's solution #4, but hopefully adds better comments.
The script should be easy to modify to stop after N samples, using a counter with if, or gniourf_gniourf's for loop with N. $RANDOM only ranges from 0 to 32767, so this is limited to about 32000 files, but that should do for most cases.
#!/bin/bash
array=(*) # this is the array of files to shuffle
# echo "${array[@]}"
for dummy in "${array[@]}"; do # do loop length(array) times; once for each file
length=${#array[@]}
randomi=$(( $RANDOM % $length )) # select a random index
filename=${array[$randomi]}
echo "Processing: '$filename'" # do something with the file
unset -v "array[$randomi]" # remove the element at index $randomi, leaving a gap in the indices
array=("${array[@]}") # re-index: copy the remaining elements into a gap-free array
done
If you have many files in your folder, you can use the piped command below, which I found on Unix & Linux Stack Exchange:
find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/
Here I wanted to copy the files, but if you want to move files or do something else, just change the last command where I have used cp.
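Without GNU shuf (for example on macOS), the same NUL-safe file list can still be loaded into a bash array, and then sampled with any of the array-based loops above. Below is a sketch that relies only on find -print0, read -d '' and process substitution, all available in bash 3.2. The name read_files is hypothetical.

```bash
#!/usr/bin/env bash
# read_files is a hypothetical helper name: read_files DIR
# It stores every regular file under DIR in the global array "files".
# -print0 and read -d '' use NUL as the separator, the one byte that
# cannot occur in a path, so spaces and newlines in names are safe.
read_files() {
    files=()
    local f
    while IFS= read -r -d '' f; do
        files+=( "$f" )
    done < <(find "$1" -type f -print0)
}

read_files .
printf '%d file(s) found\n' "${#files[@]}"
```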
This is the only script I could get to play nicely with bash on macOS. I combined and edited snippets from the following two links:
ls command: how can I get a recursive full-path listing, one line per file?
http://www.linuxquestions.org/questions/linux-general-1/is-there-a-bash-command-for-picking-a-random-file-678687/
#!/bin/bash
# Reads a given directory and picks a random file.
# The directory you want to use. You could use "$1" instead if you
# wanted to parametrize it.
DIR="/path/to/"
# DIR="$1"
# Internal Field Separator set to newline, so file names with
# spaces do not break our script.
IFS='
'
if [[ -d "${DIR}" ]]
then
# Runs ls on the given dir, and dumps the output into a matrix,
# it uses the new lines character as a field delimiter, as explained above.
# file_matrix=($(ls -LR "${DIR}"))
file_matrix=($(ls -R $DIR | awk '; /:$/&&f{s=$0;f=0}; /:$/&&!f{sub(/:$/,"");s=$0;f=1;next}; NF&&f{ print s"/"$0 }'))
num_files=${#file_matrix[*]}
# This is the command you want to run on a random file.
# Change "ls -l" by anything you want, it's just an example.
ls -l "${file_matrix[$((RANDOM%num_files))]}"
fi
exit 0
I use this: it uses a temporary file, but descends through the directory tree until it finds a regular file and returns it.
# find for a quasi-random file in a directory tree:
# directory to start search from:
ROOT="/";
tmp=/tmp/mytempfile
TARGET="$ROOT"
FILE="";
n=
r=
while [ -e "$TARGET" ]; do
TARGET="$(readlink -f "${TARGET}/$FILE")" ;
if [ -d "$TARGET" ]; then
ls -1 "$TARGET" 2> /dev/null > $tmp || break;
n=$(cat $tmp | wc -l);
if [ $n != 0 ]; then
FILE=$(shuf -n 1 $tmp)
# or if you don't have (or don't want to use) shuf:
# r=$(($RANDOM % $n)) ;
# FILE=$(tail -n +$(( $r + 1 )) $tmp | head -n 1);
fi ;
else
if [ -f "$TARGET" ] ; then
rm -f $tmp
echo $TARGET
break;
else
# is not a regular file, restart:
TARGET="$ROOT"
FILE=""
fi
fi
done;
How about a Perl solution slightly doctored from Mr. Kang over here:
How can I shuffle the lines of a text file on the Unix command line or in a shell script?
$ ls | perl -MList::Util=shuffle -e '@lines = shuffle(<>); print @lines[0..4]'
