Bash script defining a variable based on time slot [duplicate]

If I have 3 different scripts to run, each time a file is written to, how would a bash script be written to run a specific script only during a specific time window? This is not as simple as a cron job (though cron could probably swap out the .sh file based on time); I am looking for the time variables.
For instance:
If 9am-11:30 am run scriptA.sh if file.txt is changed.
If 11:30am-5:45pm run scriptB.sh if file is changed.
If 5:46pm-8:59am run scriptC.sh if file is changed.
I asked a similar question, but I don't think I was clear enough about the variables I am seeking or how to define them.

The traditional tool for comparing time stamps to determine whether work needs to be performed or not is make. Its default behavior is to calculate a dependency chain for the specified target(s) and determine whether any of the dependent files have changed; if not, the target does not need to be remade. This is a great tool for avoiding recompilation, but it easily extends to other tasks.
In concrete terms, you'd create a flag file (say, .made) and specify it as dependent on your file. Now, if file has changed, .made needs to be recreated, and so make will run the commands you specify to do so. In this scenario, we would run a simple piece of shell script, then touch .made to communicate the latest (successful) run time to future runs.
What remains is for the recipe to run different commands at different times. My approach to that would be a simple case statement. Notice that make interprets dollar signs, so we need to double any dollar signs that should be passed through to the shell.
.made: file
	case $$(date +%H:%M) in \
	09:* | 10:* | 11:[0-2]? ) \
	    scriptA.sh ;; \
	11:[3-5]? | 1[2-6]:* | 17:[0-3]? | 17:4[0-5] ) \
	    scriptB.sh ;; \
	17:4[6-9] | 17:5? | 1[89]:* | 2?:* | 0[0-8]:* ) \
	    scriptC.sh ;; \
	esac
	touch $@  # $@ expands to the current target
The entire case statement needs to be passed as a single logical line to the shell, so we end up with those pesky backslashes to escape the newlines.
Also notice that make is picky about indentation; each (logical) line in the recipe should be preceded by a literal tab character.
The default behavior of make is to run the first target in the file; this Makefile only contains one target, so make is equivalent to make .made.
Also notice that make cares about exit codes; if scriptA, scriptB, or scriptC could exit with a non-zero exit status, that is regarded by make as a fatal error, and the .made file will not be updated. (You can easily guard against this, though.)
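For instance, a minimal guard (a sketch reusing the recipe above) is to append || true to each script invocation, so a non-zero exit no longer counts as fatal:
	    scriptA.sh || true ;; \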

I see two issues here: (1) how to determine whether the current time is within a particular range, and (2) how to determine whether a file has been modified recently.
Here's how I would approach that:
#!/bin/bash
now=$( date +%s )
current_hour=$(( 10#$( date +%H ) ))  # force base 10 so 08 and 09 aren't read as invalid octal
file_mod=$( ls -l --time-style=+%s file.txt | awk '{print $(NF-1)}' )
file_age=$(( now - file_mod ))
if [[ $current_hour -ge 9 ]] && \
   [[ $current_hour -lt 11 ]] && \
   [[ $file_age -lt 3600 ]]
then
    ./scriptA.sh
fi
Here I'm using the bash operators for numeric comparison: -ge, -lt.
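If GNU stat is available, the modification time can also be read without parsing ls at all — a sketch (GNU coreutils assumed; on BSD/macOS the rough equivalent is stat -f %m):
file_mod=$( stat -c %Y file.txt )  # mtime in seconds since the epoch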
For greater granularity with time, you would need to compute the amount of time since midnight. Perhaps something like:
hour=$( date +%H )
min=$( date +%M )
minutes=$( echo "( 60 * $hour ) + $min" | bc -l )
eleven_thirty=$( echo "( 60 * 11 ) + 30" | bc -l )
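To finish the thought, a sketch of the full comparison in plain bash arithmetic (the 10# prefix stops 08 and 09 from being parsed as invalid octal numbers):
hour=$( date +%H )
min=$( date +%M )
minutes=$(( 10#$hour * 60 + 10#$min ))
if (( minutes >= 9 * 60 && minutes < 11 * 60 + 30 )); then
    ./scriptA.sh   # we are between 09:00 and 11:29
fi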

Well, since Bash variables can store only strings or integers, "date" values are just strings to manipulate, as in the example below (the 10# prefix keeps 08 and 09 from being treated as invalid octal numbers):
hours=$(date +%H)
minutes=$(date +%M)
sum=$(( 10#$hours + 10#$minutes ))
digit=$(echo $sum | sed -e 's/\(^.*\)\(.$\)/\2/') # extracts the last digit
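For what it's worth, the same last digit can be had without sed, using a bash substring expansion (negative offsets need the space before the minus sign):
digit=${sum: -1}   # last character of $sum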

Related

How to parse out specific filenames from basename in bash

I'm working on the following script for a research project with my school
for f in $(ls Illumina_Data/Aphyllon/PE150_2016_04_05* ); do
if [[ "${f}" == *"_R1"* ]] ;then
echo "INITIALIZE THE SEQUENCE"
echo `basename " ${f%%_R1*}"`
get_organelle_from_reads.py -1 ${f%%_R1*}_R1_001.fastq.gz \
-2 ${f%%_R1*}_R2_001.fastq.gz \
-o Sequenced_Aphyllon_Data/`basename "${f%%_R1*}"` \
-R 15 -k 21,45,65,85,105 -F embplant_pt
fi
done
What we're getting with this script right now is kind of a long name, and we want it to be shorter for organization's sake. If you take a look at the -o option and the part that says Sequenced_Aphyllon_Data/`basename "${f%%_R1*}"`, what this spits out is the entire fastq file name that we originally used, in the following format:
A_speciesname_IDtag_(some set of numbers and letters)_(some set of numbers and letters)_(some set of numbers and letters)_(some set of numbers and letters)
The issue is that we want the A_speciesname_IDtag section to remain, though sometimes our reads don't contain the IDtag section, so we'd need to split at either the second or third _ from the left. However, there are always four _ from the right, without fail.
So is there a way to specifically target an _ from the right of a string? The number of _ separating what we need will always stay the same counting from the right, but varies counting from the left.
grep with a lookahead assertion?
$ s1=dog_ID1_a000_b111_c222_d333
$ s2=cat_a000_b111_c222_d333
$ grep -oP ".+(?=_\w+_\w+_\w+_\w+)" <<<$s1
dog_ID1
$ grep -oP ".+(?=_\w+_\w+_\w+_\w+)" <<<$s2
cat
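For what it's worth, the same cut can be made in pure bash with a shortest-suffix deletion: ${s%_*_*_*_*} strips everything from the fourth underscore counting from the right, no grep needed. A quick sketch using the sample names above:
s1=dog_ID1_a000_b111_c222_d333
echo "${s1%_*_*_*_*}"   # -> dog_ID1
s2=cat_a000_b111_c222_d333
echo "${s2%_*_*_*_*}"   # -> cat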

How can I select random files from a directory in bash?

I have a directory with about 2000 files. How can I select a random sample of N files using either a bash script or a list of piped commands?
Here's a script that uses GNU sort's random option:
ls | sort -R | tail -n "$N" | while read -r file; do
    # Something involving $file, or you can leave
    # off the while to just get the filenames
done
You can use shuf (from the GNU coreutils package) for that. Just feed it a list of file names and ask it to return the first line from a random permutation:
ls dirname | shuf -n 1
# probably faster and more flexible:
find dirname -type f | shuf -n 1
# etc..
Adjust the -n, --head-count=COUNT value to return the number of wanted lines. For example to return 5 random filenames you would use:
find dirname -type f | shuf -n 5
Here are a few possibilities that don't parse the output of ls and that are 100% safe regarding files with spaces and funny symbols in their names. All of them will populate an array randf with a list of random files. This array is easily printed with printf '%s\n' "${randf[@]}" if needed.
This one will possibly output the same file several times, and N needs to be known in advance. Here I chose N=42.
a=( * )
randf=( "${a[RANDOM%${#a[@]}]"{1..42}"}" )
This feature is not very well documented.
If N is not known in advance, but you really liked the previous possibility, you can use eval. But it's evil, and you must really make sure that N doesn't come directly from user input without being thoroughly checked!
N=42
a=( * )
eval randf=( \"\${a[RANDOM%\${#a[@]}]\"\{1..$N\}\"}\" )
I personally dislike eval and hence this answer!
The same using a more straightforward method (a loop):
N=42
a=( * )
randf=()
for ((i=0; i<N; ++i)); do
    randf+=( "${a[RANDOM%${#a[@]}]}" )
done
If you don't want to possibly have several times the same file:
N=42
a=( * )
randf=()
for ((i=0; i<N && ${#a[@]}; ++i)); do
    ((j=RANDOM%${#a[@]}))
    randf+=( "${a[j]}" )
    a=( "${a[@]:0:j}" "${a[@]:j+1}" )
done
Note. This is a late answer to an old post, but the accepted answer links to an external page that shows terrible bash practice, and the other answer is not much better as it also parses the output of ls. A comment to the accepted answer points to an excellent answer by Lhunath which obviously shows good practice, but doesn't exactly answer the OP.
ls | shuf -n 10 # ten random files
A simple solution for selecting 5 random files that avoids parsing ls. It also works with files containing spaces, newlines and other special characters:
shuf -ezn 5 * | xargs -0 -n1 echo
Replace echo with the command you want to execute for your files.
This is an even later response to @gniourf_gniourf's late answer, which I just upvoted because it's by far the best answer, twice over. (Once for avoiding eval and once for safe filename handling.)
But it took me a few minutes to untangle the "not very well documented" feature(s) this answer uses. If your Bash skills are solid enough that you saw immediately how it works, then skip this comment. But I didn't, and having untangled it I think it's worth explaining.
Feature #1 is the shell's own file globbing. a=(*) creates an array, $a, whose members are the files in the current directory. Bash understands all the weirdnesses of filenames, so that list is guaranteed correct, guaranteed escaped, etc. No need to worry about properly parsing textual file names returned by ls.
Feature #2 is Bash parameter expansions for arrays, one nested within another. This starts with ${#ARRAY[@]}, which expands to the length of $ARRAY.
That expansion is then used to subscript the array. The standard way to get a random number between 0 and N-1 is to take a random value modulo N; here we want a random index between 0 and the length of our array minus one. Here's the approach, broken into two lines for clarity's sake (assigning the result to RANDOM itself would reseed the generator, so use a different variable):
LENGTH=${#a[@]}
CHOICE=${a[RANDOM%LENGTH]}
But this solution does it in a single line, removing the unnecessary variable assignment.
Feature #3 is Bash brace expansion, although I have to confess I don't entirely understand it. Brace expansion is used, for instance, to generate a list of 25 files named filename1.txt, filename2.txt, etc: echo "filename"{1..25}".txt".
The expression in the array assignment above, "${a[RANDOM%${#a[@]}]"{1..42}"}", uses that trick to produce 42 separate expansions. The brace expansion places a single number between the ] and the }, which at first I thought was subscripting the array, but if so it would be preceded by a colon. (It would also have returned 42 consecutive items from a random spot in the array, which is not at all the same thing as returning 42 random items from the array.) I think it's just making the shell run the expansion 42 times, thereby returning 42 random items from the array. (But if someone can explain it more fully, I'd love to hear it.)
The reason N has to be hardcoded (to 42) is that brace expansion happens before variable expansion.
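A quick demonstration of that ordering (hypothetical values; the braces survive literally because the range isn't numeric at brace-expansion time):
N=42
echo {1..3}    # -> 1 2 3
echo {1..$N}   # -> {1..42}   ($N is substituted only after brace expansion has given up)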
Finally, here's Feature #4, if you want to do this recursively for a directory hierarchy:
shopt -s globstar
a=( ** )
This turns on a shell option that causes ** to match recursively. Now your $a array contains every file in the entire hierarchy.
If you have Python installed (works with either Python 2 or Python 3):
To select one file (or line from an arbitrary command), use
ls -1 | python -c "import sys; import random; print(random.choice(sys.stdin.readlines()).rstrip())"
To select N files/lines, use the following (note N is at the end of the command; replace it with a number):
ls -1 | python -c "import sys; import random; print(''.join(random.sample(sys.stdin.readlines(), int(sys.argv[1]))).rstrip())" N
If you want to copy a sample of those files to another folder:
ls | shuf -n 100 | xargs -I % cp % ../samples/
Make the samples directory first, obviously.
MacOS does not have the sort -R and shuf commands, so I needed a bash-only solution that randomizes all files without duplicates, and I did not find one here. This solution is similar to gniourf_gniourf's solution #4, but hopefully adds better comments. The script should be easy to modify to stop after N samples, using a counter with if or gniourf_gniourf's for loop with N. $RANDOM is limited to ~32000 files, but that should do for most cases.
#!/bin/bash
array=(*)  # this is the array of files to shuffle
# echo ${array[@]}
for dummy in "${array[@]}"; do  # loop length(array) times, once for each file
    length=${#array[@]}
    randomi=$(( RANDOM % length ))  # select a random index
    filename=${array[$randomi]}
    echo "Processing: '$filename'"  # do something with the file
    unset -v "array[$randomi]"      # drop the element at index $randomi
    array=("${array[@]}")           # re-index to close the gap left by unset
done
If you have many files in your folder, you can use the piped command below, which I found on Unix Stack Exchange.
find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/
Here I wanted to copy the files, but if you want to move files or do something else, just change the last command where I have used cp.
This is the only script I can get to play nice with bash on MacOS. I combined and edited snippets from the following two links:
ls command: how can I get a recursive full-path listing, one line per file?
http://www.linuxquestions.org/questions/linux-general-1/is-there-a-bash-command-for-picking-a-random-file-678687/
#!/bin/bash
# Reads a given directory and picks a random file.
# The directory you want to use. You could use "$1" instead if you
# wanted to parametrize it.
DIR="/path/to/"
# DIR="$1"
# Internal Field Separator set to newline, so file names with
# spaces do not break our script.
IFS='
'
if [[ -d "${DIR}" ]]
then
    # Runs ls on the given dir, and dumps the output into a matrix,
    # using the newline character as a field delimiter, as explained above.
    # file_matrix=($(ls -LR "${DIR}"))
    file_matrix=($(ls -R "${DIR}" | awk '/:$/&&f{s=$0;f=0} /:$/&&!f{sub(/:$/,"");s=$0;f=1;next} NF&&f{print s"/"$0}'))
    num_files=${#file_matrix[*]}
    # This is the command you want to run on a random file.
    # Replace "ls -l" with anything you want; it's just an example.
    ls -l "${file_matrix[$((RANDOM%num_files))]}"
fi
exit 0
I use this: it uses a temporary file, but it descends into a directory tree until it finds a regular file, and returns it.
# find for a quasi-random file in a directory tree:
# directory to start search from:
ROOT="/";
tmp=/tmp/mytempfile
TARGET="$ROOT"
FILE="";
n=
r=
while [ -e "$TARGET" ]; do
    TARGET="$(readlink -f "${TARGET}/$FILE")"
    if [ -d "$TARGET" ]; then
        ls -1 "$TARGET" 2> /dev/null > $tmp || break
        n=$(cat $tmp | wc -l)
        if [ $n != 0 ]; then
            FILE=$(shuf -n 1 $tmp)
            # or if you don't have/want to use shuf:
            # r=$(($RANDOM % $n))
            # FILE=$(tail -n +$(( $r + 1 )) $tmp | head -n 1)
        fi
    else
        if [ -f "$TARGET" ]; then
            rm -f $tmp
            echo $TARGET
            break
        else
            # not a regular file, restart:
            TARGET="$ROOT"
            FILE=""
        fi
    fi
done
How about a Perl solution slightly doctored from Mr. Kang over here:
How can I shuffle the lines of a text file on the Unix command line or in a shell script?
$ ls | perl -MList::Util=shuffle -e '@lines = shuffle(<>); print @lines[0..4]'


Bourne Shell Scripting -- simple for loop syntax

I'm not entirely new to programming, but I'm not exactly experienced. I want to write small shell script for practice.
Here's what I have so far:
#!/bin/sh
name=$0
links=$3
owner=$4
if [ $# -ne 1 ]
then
    echo "Usage: $0 <directory>"
    exit 1
fi
if [ ! -e $1 ]
then
    echo "$1 not found"
    exit 1
elif [ -d $1 ]
then
    echo "Name\t\tLinks\t\tOwner\t\tDate"
    echo "$name\t$links\t$owner\t$date"
    exit 0
fi
Basically what I'm trying to do is have the script go through all of the files in a specified directory and then display the name of each file along with the number of links it has, its owner, and the date it was created. What would be the syntax for displaying the date of creation, or at least the date of last modification, of the file?
Another thing is, what is the syntax for creating a for loop? From what I understand I would have to write something like for $1 in $1 ($1 being all of the files in the directory the user typed in, correct?) and then go through checking each file and displaying the information for each one. How would I start and end the for loop? What is the syntax for this?
As you can see I'm not very familiar with Bourne shell programming. If you have any helpful websites, or a better way of approaching this, please show me!
Syntax for a for loop:
for var in list
do
    echo $var
done
for example:
for var in *
do
    echo $var
done
What you might want to consider however is something like this:
ls -l | while read perms links owner group size date1 date2 time filename
do
    echo $filename
done
which splits the output of ls -l into fields on-the-fly so you don't need to do any splitting yourself.
The field splitting is controlled by the shell variable IFS, which by default contains a space, a tab and a newline. If you change this in a shell script, remember to change it back. By changing the value of IFS you can, for example, parse CSV files by setting it to a comma. This example reads three fields from a CSV and prints the 2nd and 3rd only (it's effectively the shell equivalent of cut -d, -f2,3 inputfile.csv):
oldifs=$IFS
IFS=","
while read field1 field2 field3
do
    echo $field2 $field3
done < inputfile.csv
IFS=$oldifs
(Note: you don't strictly need to revert IFS, but I generally do, to make sure that further text processing in the script isn't affected after I'm done with it.)
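Alternatively, IFS can be scoped to the read command alone, so there is nothing to revert — a sketch of the same CSV loop:
while IFS="," read -r field1 field2 field3
do
    echo $field2 $field3
done < inputfile.csv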
Plenty of documentation is out there on both for and while loops; just google for it :-)
$1 is the first positional parameter, so $3 is the third and $4 is the fourth. They have nothing to do with the directory (or its files) the script was started from. If your script was started using this, for example:
./script.sh apple banana cherry date elderberry
then the variable $1 would equal "apple" and so on. The special parameter $# is the count of positional parameters, which in this case would be five.
The name of the script is contained in $0, and $* and $@ expand to all the positional parameters; they behave differently depending on whether they appear in quotes.
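A quick illustration of that quoting difference, with hypothetical arguments:
set -- apple banana cherry
printf '[%s]\n' "$*"   # one word:           [apple banana cherry]
printf '[%s]\n' "$@"   # one word per param: [apple] [banana] [cherry]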
You can refer to the positional parameters using a substring-style index:
${@:2:1}
would give "banana" using the example above. And:
${@: -1}
or
${@:$#}
would give the last ("elderberry"). Note that the space before the minus sign is required in this context.
You might want to look at Advanced Bash-Scripting Guide. It has a section that explains loops.
I suggest using find with the option -printf "%P\t%n\t%u\t%t".
for x in "$@"; do
    echo "$x"
done
The "$@" protects any whitespace in the supplied file names. Obviously, do your real work in place of echo "$x", which isn't doing much. But $@ is all the junk supplied on the command line to your script.
But also, your script bails out if $# is not equal to 1, yet you apparently expect up to four arguments (hence the $4 you reference in the early part of your script).
Assuming you have GNU find on your system:
find /path -type f -printf "filename: %f | hardlinks: %n | owner: %u | time: %TH %Tb %TY\n"

