I have multiple SVG files, and I need to rename them by adding 1 to the number in each name:
0.svg --> 1.svg
1.svg --> 2.svg
2.svg --> 3.svg
etc...
What would be the best way to do this using the linux terminal?
The trick is to process the files backwards so you don't overwrite existing files while renaming. Use parameter expansion to extract the numbers from the file names.
#!/bin/bash
files=(?.svg)
for (( i = ${#files[@]} - 1; i >= 0; --i )) ; do
n=${files[i]%.svg}
mv -- "$n.svg" "$(( n + 1 )).svg"
done
If the files can have names of different length (e.g. 9.svg, 10.svg) the solution will be more complex, as you need to sort the files numerically rather than lexicographically.
If the numbers in the file names can have multiple digits, try the following:
while IFS= read -r num; do
new="$(( num + 1 )).svg"
mv -- "$num.svg" "$new"
done < <(
for f in *.svg; do
n=${f%.svg}
echo "$n"
done | sort -rn
)
This Shellcheck-clean code is intended to operate safely and cleanly no matter what is in the current directory:
#! /bin/bash -p
shopt -s nullglob # Globs that match nothing expand to nothing
shopt -s extglob # Enable extended globbing (+(...), ...)
# Put the file base numbers in a sparse array.
# (Bash automatically keeps such arrays sorted by increasing indices.)
sparse_basenums=()
for svgfile in +([0-9]).svg ; do
# Skip files with extra leading zeros (e.g. '09.svg')
[[ $svgfile == 0[0-9]*.svg ]] && continue
basenum=${svgfile%.svg}
sparse_basenums[$basenum]=$basenum
done
# Convert the sparse array to a non-sparse array (preserving order)
# so it can be processed in reverse order with a 'for' loop
basenums=( "${sparse_basenums[@]}" )
# Process the files in reverse (i.e. decreasing) order by base number
for ((i=${#basenums[*]}-1; i>=0; i--)) ; do
basenum=${basenums[i]}
mv -i -- "$basenum.svg" "$((basenum+1)).svg"
done
shopt -s nullglob prevents bad behaviour if the directory doesn't contain any files whose names are a decimal number followed by '.svg'. Without it the code would try to process a file called '+([0-9]).svg'.
shopt -s extglob enables a richer set of globbing patterns than the default. See the 'extglob' section in glob - Greg's Wiki for details.
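A minimal sketch of how the two options behave together, using a scratch directory from mktemp and made-up file names. Only names made entirely of digits followed by .svg match, and with nullglob an unmatched pattern produces an empty array instead of the literal pattern.

```shell
#!/bin/bash
shopt -s nullglob extglob

# Scratch directory with illustrative (made-up) file names.
demo_dir=$(mktemp -d)
trap 'rm -rf -- "$demo_dir"' EXIT
touch "$demo_dir"/{0,12,a,1x}.svg

# +([0-9]) means "one or more digits"; a.svg and 1x.svg do not match.
matches=( "$demo_dir"/+([0-9]).svg )
matches=( "${matches[@]##*/}" )    # keep only the base names
printf 'matched: %s\n' "${matches[@]}"

# After deleting the files, nullglob makes the same glob expand to nothing.
rm -- "$demo_dir"/*.svg
none=( "$demo_dir"/+([0-9]).svg )
echo "matches after cleanup: ${#none[@]}"
```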
The usefulness of sparse_basenums depends on the fact that Bash arrays can have arbitrary non-negative integer indices, that arrays with gaps in their indices are stored efficiently (sparse arrays), and that elements in arrays are always stored in order of increasing index. See Arrays (Bash Reference Manual) for more information.
The code skips files whose names have extra leading zeros ('09.svg', but not '0.svg') because it can't handle them safely as it is now. Trying to treat '09' as a number causes an error because it's treated as an illegal octal number. That is easily fixable, but there could still be problems if, for instance, you had both '9.svg' and '09.svg' (they would both naturally be renamed to '10.svg').
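A sketch of the easy part of that fix, assuming the names are plain digit strings followed by .svg: the 10# prefix forces base 10 in arithmetic expansion, so a leading zero is no longer read as an octal marker. The helper name next_name is hypothetical. It also shows the collision described above, since 09.svg and 9.svg both map to 10.svg.

```shell
#!/bin/bash
# Print the name a file would be renamed to, treating the number as decimal.
# 10# requires an unsigned digit string after it.
next_name() {
  local base=${1%.svg}
  printf '%s.svg\n' "$(( 10#$base + 1 ))"
}

next_name 09.svg   # 10.svg
next_name 9.svg    # also 10.svg, hence the collision risk
next_name 0.svg    # 1.svg
```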
The code uses mv -i to prompt for user input in case something goes wrong and it tries to rename a file to one that already exists.
Note that the code will silently do the wrong thing (due to arithmetic overflow) if the numbers are too big (e.g. '99999999999999999999.svg'). The problem is fixable.
I have a directory with about 2000 files. How can I select a random sample of N files through using either a bash script or a list of piped commands?
Here's a script that uses GNU sort's random option:
ls | sort -R | tail -n "$N" | while IFS= read -r file; do
# Something involving $file, or you can leave
# off the while to just get the filenames
printf '%s\n' "$file"
done
You can use shuf (from the GNU coreutils package) for that. Just feed it a list of file names and ask it to return the first line from a random permutation:
ls dirname | shuf -n 1
# probably faster and more flexible:
find dirname -type f | shuf -n 1
# etc..
Adjust the -n, --head-count=COUNT value to return the number of wanted lines. For example to return 5 random filenames you would use:
find dirname -type f | shuf -n 5
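If the names may contain newlines, a NUL-delimited variant of the find | shuf pipeline keeps each name intact. This sketch assumes GNU find and GNU shuf (for -z); pick_random_files is a hypothetical helper name and the demo files are made up.

```shell
#!/bin/bash
# Print N random regular files under a directory, NUL-separated.
pick_random_files() {
  local dir=$1 n=$2
  find "$dir" -type f -print0 | shuf -z -n "$n"
}

# Demo on a scratch directory with awkward (made-up) names.
demo_dir=$(mktemp -d)
trap 'rm -rf -- "$demo_dir"' EXIT
touch "$demo_dir"/{a,b,c,d} "$demo_dir/with space" "$demo_dir/$(printf 'new\nline')"

# Read the NUL-separated result back into an array, one name per element.
picked=()
while IFS= read -r -d '' f; do
  picked+=( "$f" )
done < <(pick_random_files "$demo_dir" 3)
printf '%q\n' "${picked[@]}"
```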
Here are a few possibilities that don't parse the output of ls and that are 100% safe regarding files with spaces and funny symbols in their names. All of them will populate an array randf with a list of random files. This array is easily printed with printf '%s\n' "${randf[@]}" if needed.
This one will possibly output the same file several times, and N needs to be known in advance. Here I chose N=42.
a=( * )
randf=( "${a[RANDOM%${#a[@]}]"{1..42}"}" )
This feature is not very well documented.
If N is not known in advance, but you really liked the previous possibility, you can use eval. But it's evil, and you must really make sure that N doesn't come directly from user input without being thoroughly checked!
N=42
a=( * )
eval randf=( \"\${a[RANDOM%\${#a[@]}]\"\{1..$N\}\"}\" )
I personally dislike eval and hence this answer!
The same using a more straightforward method (a loop):
N=42
a=( * )
randf=()
for((i=0;i<N;++i)); do
randf+=( "${a[RANDOM%${#a[@]}]}" )
done
If you don't want the same file to appear more than once:
N=42
a=( * )
randf=()
for((i=0;i<N && ${#a[@]};++i)); do
((j=RANDOM%${#a[@]}))
randf+=( "${a[j]}" )
a=( "${a[@]:0:j}" "${a[@]:j+1}" )
done
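A variation on the loop above that swaps in a different technique, a partial Fisher-Yates shuffle: each step swaps a random not-yet-chosen element into position i, so the array is never rebuilt. sample_distinct is a hypothetical helper name; like the other $RANDOM-based snippets it assumes fewer than 32768 elements, and RANDOM % k is slightly biased.

```shell
#!/bin/bash
# Print up to N distinct elements of the remaining arguments, NUL-separated,
# using a partial Fisher-Yates shuffle.
sample_distinct() {
  local n=$1; shift
  local a=( "$@" )
  local len=${#a[@]} i j tmp
  if (( n > len )); then n=$len; fi
  for (( i = 0; i < n; ++i )); do
    (( j = i + RANDOM % (len - i) ))          # random index in [i, len-1]
    tmp=${a[i]}; a[i]=${a[j]}; a[j]=$tmp      # swap it into position i
  done
  if (( n > 0 )); then printf '%s\0' "${a[@]:0:n}"; fi
}

# Usage: fill randf with 4 distinct names from a (made-up) list.
a=( alpha beta gamma delta epsilon "with space" )
randf=()
while IFS= read -r -d '' f; do
  randf+=( "$f" )
done < <(sample_distinct 4 "${a[@]}")
printf '%s\n' "${randf[@]}"
```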
Note: This is a late answer to an old post, but the accepted answer links to an external page that shows terrible Bash practice, and the other answer is not much better, as it also parses the output of ls. A comment on the accepted answer points to an excellent answer by Lhunath, which shows good practice but doesn't exactly answer the OP.
ls | shuf -n 10 # ten random files
A simple solution for selecting 5 random files without parsing ls. It also works with file names containing spaces, newlines and other special characters:
shuf -ezn 5 * | xargs -0 -n1 echo
Replace echo with the command you want to execute for your files.
This is an even later response to @gniourf_gniourf's late answer, which I just upvoted because it's by far the best answer, twice over. (Once for avoiding eval and once for safe filename handling.)
But it took me a few minutes to untangle the "not very well documented" feature(s) this answer uses. If your Bash skills are solid enough that you saw immediately how it works, then skip this comment. But I didn't, and having untangled it I think it's worth explaining.
Feature #1 is the shell's own file globbing. a=(*) creates an array, $a, whose members are the files in the current directory. Bash understands all the weirdnesses of filenames, so that list is guaranteed correct, guaranteed escaped, etc. No need to worry about properly parsing textual file names returned by ls.
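A quick way to see this, using a scratch directory with made-up names: the glob keeps a name containing a space or a newline as a single array element.

```shell
#!/bin/bash
demo_dir=$(mktemp -d)
trap 'rm -rf -- "$demo_dir"' EXIT
touch "$demo_dir/plain" "$demo_dir/has space" "$demo_dir/$(printf 'has\nnewline')"

# Each match becomes exactly one array element, whatever characters it contains.
a=( "$demo_dir"/* )
echo "glob found ${#a[@]} files"
printf '%q\n' "${a[@]##*/}"     # %q shows the embedded newline unambiguously
```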
Feature #2 is Bash parameter expansions for arrays, one nested within another. This starts with ${#ARRAY[@]}, which expands to the length of $ARRAY.
That expansion is then used to subscript the array. The standard way to get a random number between 0 and N-1 is to take a random number modulo N. We want a random index between 0 and the length of our array minus 1. Here's the approach, broken into two lines for clarity's sake:
LENGTH=${#ARRAY[@]}
ITEM=${ARRAY[RANDOM%LENGTH]}
But this solution does it in a single line, removing the unnecessary variable assignment.
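A runnable version of the two-line approach with a made-up ARRAY. It stores the element in ITEM, because assigning to RANDOM would reseed the generator rather than hold the result. Since RANDOM % LENGTH always lies in 0..LENGTH-1, the index is valid for any non-empty array whose indices are contiguous (not sparse).

```shell
#!/bin/bash
ARRAY=( alpha beta gamma delta )   # illustrative contents
LENGTH=${#ARRAY[@]}
ITEM=${ARRAY[RANDOM % LENGTH]}     # one random element
echo "$ITEM"

# The single-expression form used in the answer does the same thing inline.
echo "${ARRAY[RANDOM % ${#ARRAY[@]}]}"
```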
Feature #3 is Bash brace expansion, although I have to confess I don't entirely understand it. Brace expansion is used, for instance, to generate a list of 25 files named filename1.txt, filename2.txt, etc: echo "filename"{1..25}".txt".
The expression inside the array assignment above, "${a[RANDOM%${#a[@]}]"{1..42}"}", uses that trick to produce 42 separate expansions. The brace expansion places a number in between the ] and the }, which at first I thought was subscripting the array, but if so it would be preceded by a colon. (It would also have returned 42 consecutive items from a random spot in the array, which is not at all the same thing as returning 42 random items from the array.) I think it's just making the shell run the expansion 42 times, thereby returning 42 random items from the array. (But if someone can explain it more fully, I'd love to hear it.)
The reason N has to be hardcoded (to 42) is that brace expansion happens before variable expansion.
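A small demonstration of that ordering, plus a C-style loop that does accept a variable count (the loop is a substitute technique, not part of the answer):

```shell
#!/bin/bash
N=3
# Brace expansion sees the text {1..$N} before $N is expanded, finds no valid
# range, and leaves it alone; parameter expansion then yields "{1..3}".
literal=$(echo {1..$N})

# A C-style loop reads N at run time, so a variable count works.
counted=()
for (( i = 1; i <= N; ++i )); do
  counted+=( "$i" )
done

echo "$literal"        # {1..3}
echo "${counted[*]}"   # 1 2 3
```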
Finally, here's Feature #4, if you want to do this recursively for a directory hierarchy:
shopt -s globstar
a=( ** )
This turns on a shell option that causes ** to match recursively. Now your $a array contains every file and directory in the entire hierarchy.
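Since ** matches directories as well as files, a sketch that keeps only regular files might look like this (the scratch tree is made up; nullglob keeps the array empty if nothing matches):

```shell
#!/bin/bash
shopt -s globstar nullglob

demo_dir=$(mktemp -d)
trap 'rm -rf -- "$demo_dir"' EXIT
mkdir -p "$demo_dir/sub/deeper"
touch "$demo_dir/top.txt" "$demo_dir/sub/mid.txt" "$demo_dir/sub/deeper/low.txt"

# ** walks the whole tree; the -f test drops the directories it also returns.
a=()
for path in "$demo_dir"/**; do
  if [[ -f $path ]]; then a+=( "$path" ); fi
done
printf '%s\n' "${a[@]}"
```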
If you have Python installed (works with either Python 2 or Python 3):
To select one file (or line from an arbitrary command), use
ls -1 | python -c "import sys; import random; print(random.choice(sys.stdin.readlines()).rstrip())"
To select N files/lines, use (note N is at the end of the command, replace this by a number)
ls -1 | python -c "import sys; import random; print(''.join(random.sample(sys.stdin.readlines(), int(sys.argv[1]))).rstrip())" N
If you want to copy a sample of those files to another folder:
ls | shuf -n 100 | xargs -I % cp % ../samples/
Create the samples directory first, obviously.
macOS does not have the sort -R and shuf commands, so I needed a Bash-only solution that randomizes all files without duplicates, and I did not find one here. This solution is similar to gniourf_gniourf's solution #4, but hopefully adds better comments.
The script should be easy to modify to stop after N samples, using a counter with if, or gniourf_gniourf's for loop with N. $RANDOM only ranges from 0 to 32767, so this works for up to 32768 files, which should do for most cases.
#!/bin/bash
array=(*) # this is the array of files to shuffle
# echo "${array[@]}"
for dummy in "${array[@]}"; do # do loop length(array) times; once for each file
length=${#array[@]}
randomi=$(( $RANDOM % $length )) # select a random index
filename=${array[$randomi]}
echo "Processing: '$filename'" # do something with the file
unset -v "array[$randomi]" # remove the element at index $randomi (leaves a gap in the indices)
array=("${array[@]}") # re-pack the array so the indices are contiguous again
done
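The script above indexes with $RANDOM % $length, which can only reach index 32767. A common workaround (not part of the answer) joins two $RANDOM values into a 30-bit number. random30 is a hypothetical helper; it sets a global R instead of printing, so no command substitution is needed.

```shell
#!/bin/bash
# Set the global R to a random integer in 0..1073741823 (2^30 - 1)
# by joining two 15-bit $RANDOM values.
random30() {
  R=$(( (RANDOM << 15) | RANDOM ))
}

# Pick a random index into an illustrative range of 50000 items.
length=50000
random30
idx=$(( R % length ))
echo "random index: $idx"
```

The modulo bias is negligible as long as length is far below 2^30.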
If you have many files in your folder, you can use the piped command below, which I found on Unix Stack Exchange.
find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/
Here I wanted to copy the files, but if you want to move files or do something else, just change the last command where I have used cp.
This is the only script I can get to play nice with bash on MacOS. I combined and edited snippets from the following two links:
ls command: how can I get a recursive full-path listing, one line per file?
http://www.linuxquestions.org/questions/linux-general-1/is-there-a-bash-command-for-picking-a-random-file-678687/
#!/bin/bash
# Reads a given directory and picks a random file.
# The directory you want to use. You could use "$1" instead if you
# wanted to parametrize it.
DIR="/path/to/"
# DIR="$1"
# Internal Field Separator set to newline, so file names with
# spaces do not break our script.
IFS='
'
if [[ -d "${DIR}" ]]
then
# Runs ls on the given dir, and dumps the output into a matrix,
# it uses the new lines character as a field delimiter, as explained above.
# file_matrix=($(ls -LR "${DIR}"))
file_matrix=($(ls -R $DIR | awk '; /:$/&&f{s=$0;f=0}; /:$/&&!f{sub(/:$/,"");s=$0;f=1;next}; NF&&f{ print s"/"$0 }'))
num_files=${#file_matrix[*]}
# This is the command you want to run on a random file.
# Change "ls -l" by anything you want, it's just an example.
ls -l "${file_matrix[$((RANDOM%num_files))]}"
fi
exit 0
I use this: it uses a temporary file, but it descends into the directory tree until it finds a regular file and prints it.
# find for a quasi-random file in a directory tree:
# directory to start search from:
ROOT="/";
tmp=/tmp/mytempfile
TARGET="$ROOT"
FILE="";
n=
r=
while [ -e "$TARGET" ]; do
TARGET="$(readlink -f "${TARGET}/$FILE")" ;
if [ -d "$TARGET" ]; then
ls -1 "$TARGET" 2> /dev/null > $tmp || break;
n=$(cat $tmp | wc -l);
if [ $n != 0 ]; then
FILE=$(shuf -n 1 $tmp)
# or if you dont have/want to use shuf:
# r=$(($RANDOM % $n)) ;
# FILE=$(tail -n +$(( $r + 1 )) $tmp | head -n 1);
fi ;
else
if [ -f "$TARGET" ] ; then
rm -f $tmp
echo $TARGET
break;
else
# is not a regular file, restart:
TARGET="$ROOT"
FILE=""
fi
fi
done;
How about a Perl solution slightly doctored from Mr. Kang over here:
How can I shuffle the lines of a text file on the Unix command line or in a shell script?
$ ls | perl -MList::Util=shuffle -e '@lines = shuffle(<>); print @lines[0..4]'
I have a bunch of files in the same directory with names like:
IMG_20160824_132614.jpg
IMG_20160824_132658.jpg
IMG_20160824_132738.jpg
The middle section is the date and the last section is the time the photo was taken. So sorting these files by name gives the same result as sorting them by date/time modified.
I'd like to batch rename these files using bash to something of the form:
1-x-3.jpg
Where the x represents the place of the file in the sequential ordering (ordered by name/time modified)
So the 3 examples above would be renamed to:
1-1-3.jpg
1-2-3.jpg
1-3-3.jpg
Is there a bash command that can achieve this? Or is a script required?
Try:
i=1; for f in *.jpg; do mv "$f" "1-$((i++))-3.jpg"; done
For example, using your file names:
$ ls
IMG_20160824_132614.jpg IMG_20160824_132658.jpg IMG_20160824_132738.jpg
$ i=1; for f in *.jpg; do mv "$f" "1-$((i++))-3.jpg"; done
$ ls
1-1-3.jpg 1-2-3.jpg 1-3-3.jpg
Notes:
When expanding *.jpg, the shell lists the files in alphanumeric order. This seems to be what you want. Note, though, that alphanumeric order can depend on locale.
The sequential numbering is done with $((i++)). Here, $((...)) represents arithmetic expansion. ++ simply means increment the variable by 1.
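A self-contained sketch of the same loop on a scratch copy of the example names. The directory comes from mktemp, and each file's content records its original time stamp so the ordering can be checked. nullglob and mv -- are additions, not part of the answer: nullglob stops the loop from trying to rename a literal *.jpg when nothing matches, and -- protects names that begin with '-'.

```shell
#!/bin/bash
shopt -s nullglob                      # no literal '*.jpg' if nothing matches
demo_dir=$(mktemp -d)
trap 'rm -rf -- "$demo_dir"' EXIT
for t in 132614 132658 132738; do
  echo "$t" > "$demo_dir/IMG_20160824_$t.jpg"   # file content = original time
done

# The glob is expanded once, in sorted order, before the loop starts.
i=1
for f in "$demo_dir"/*.jpg; do
  mv -- "$f" "$demo_dir/1-$((i++))-3.jpg"
done
ls "$demo_dir"
```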
I have the following bash script:
if ls /Users/david/Desktop/empty > /dev/null
then
echo 'yes -- files'
else
echo 'no -- files'
fi
How would I modify the top line such that it evaluates true if there are one or more files in the /Users/david/Desktop/empty dir?
This is covered in detail in BashFAQ #004. Notably, use of ls for this purpose is an antipattern and should be avoided.
shopt -s dotglob # if including hidden files is desired
files=( "$dir"/* )
[[ -e $files || -L $files ]] && echo "Directory is not empty"
[[ -e $files ]] doesn't actually check if the entire array's contents exist; rather, it checks the first name returned -- which handles the case when no files match, wherein the glob expression itself is returned as the sole result.
Notably:
This is far faster than invoking ls, which requires using fork() to spawn a subshell, execve() to replace that subshell with /bin/ls, the operating system's dynamic linker to load shared libraries used by the ls binary, etc, etc. [An exception to this is extremely large directories, of tens of thousands of files -- a case in which ls will also be slow; see the find-based solution below for those].
This is more correct than invoking ls: The list of files returned by globbing is guaranteed to exactly match the literal names of files, whereas ls can munge names with hidden characters. If the first entry is a valid filename, "${files[@]}" can be safely iterated over with assurance that each returned value will be a name, and there's no need to worry about filesystems with literal newlines in their names inflating the count if the local ls implementation does not escape them.
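An alternative sketch uses nullglob instead of the -e/-L test: an unmatched glob then expands to nothing, so the array length alone answers the question. dir_is_empty is a hypothetical name; defining the function body with ( ... ) runs it in a subshell, so the shopt settings do not leak into the caller.

```shell
#!/bin/bash
# Succeeds if the directory has no entries at all (hidden ones included).
dir_is_empty() (
  shopt -s nullglob dotglob
  files=( "$1"/* )
  (( ${#files[@]} == 0 ))
)

demo_dir=$(mktemp -d)
trap 'rm -rf -- "$demo_dir"' EXIT
dir_is_empty "$demo_dir" && echo "empty"
touch "$demo_dir/.hidden"
dir_is_empty "$demo_dir" || echo "not empty (dotglob counts the hidden file)"
```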
That said, an alternative approach is to use find, if you have one with the -empty extension (available both from GNU find and from modern BSDs including Mac OS):
[[ $(find -H "$dir" -maxdepth 0 -type d -empty) ]] || echo "Directory is not empty"
...if any result is given, the directory is nonempty. While slower than globbing on directories which are not unusually large, this is faster than either ls or globbing for extremely large directories not present in the direntry cache, as it can return results without a full scan.
Robust pure Bash solutions:
For background on why a pure Bash solution with globbing is superior to using ls, see Charles Duffy's helpful answer, which also contains a find-based alternative, which is much faster and less memory-intensive with large directories.[1]
Also consider anubhava's equally fast and memory-efficient stat-based answer, which, however, requires distinct syntax forms on Linux and BSD/OSX.
Updated to a simpler solution, gratefully adapted from this answer.
# EXCLUDING hidden files and folders - note the *quoted* use of glob '*'
if compgen -G '*' >/dev/null; then
echo 'not empty'
else
echo 'empty, but may have hidden files/dirs.'
fi
compgen -G is normally used for tab completion, but it is useful in this case as well:
Note that compgen -G does its own globbing, so you must pass it the glob (filename pattern) in quotes for it to output all matches. In this particular case, even passing an unquoted pattern up front would work, but the difference is worth noting.
If nothing matches, compgen -G always produces no output (irrespective of the state of the nullglob option), and it indicates via its exit code whether at least 1 match was found, which is what the conditional takes advantage of (while suppressing any stdout output with >/dev/null).
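The same test applied to an arbitrary directory rather than the current one, as a minimal sketch; has_visible_entries is a hypothetical helper name. Glob characters in the directory name itself would also be interpreted by compgen.

```shell
#!/bin/bash
# Succeeds if "$1" contains at least one non-hidden entry.
# The pattern is quoted so compgen, not the shell, performs the matching.
has_visible_entries() {
  compgen -G "$1/*" >/dev/null
}

demo_dir=$(mktemp -d)
trap 'rm -rf -- "$demo_dir"' EXIT
has_visible_entries "$demo_dir" || echo 'empty (ignoring hidden entries)'
touch "$demo_dir/file.txt"
has_visible_entries "$demo_dir" && echo 'not empty'
```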
# INCLUDING hidden files and folders - note the *unquoted* use of glob *
if (shopt -s dotglob; compgen -G * >/dev/null); then
echo 'not empty'
else
echo 'completely empty'
fi
compgen -G never matches hidden items (irrespective of the state of the dotglob option), so a workaround is needed to find hidden items too:
(...) creates a subshell for the conditional; that is, the commands executed in the subshell don't affect the current shell's environment, which allows us to set the dotglob option in a localized way.
shopt -s dotglob causes * to match hidden items too (except for . and ..).
compgen -G * with unquoted *, thanks to up-front expansion by the shell, is passed either at least one filename, hidden or not (additional filenames are ignored), or, if neither hidden nor non-hidden items exist, the unexpanded pattern * itself (assuming nullglob is off), which compgen then fails to match. In the former case the exit code is 0 (signaling success and therefore a nonempty directory), in the latter 1 (signaling a truly empty directory).
[1]
This answer originally falsely claimed to offer a Bash-only solution that is efficient with large directories, based on the following approach: (shopt -s nullglob dotglob; for f in "$dir"/*; do exit 0; done; exit 1).
This is NOT more efficient, because, internally, Bash still collects all matches in an array first before entering the loop - in other words: for * is not evaluated lazily.
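Here is a solution based on the stat command, which can return the number of hard links of a directory (or, with -L, of the directory a symlink points to). An empty directory has 2 links (its entry in the parent and its own . entry), so the answer subtracts 2 from the count. Note, however, that on traditional Unix filesystems only subdirectories add to this count (through their .. entries); regular files and symlinks do not, so a count above 2 only shows that the directory contains at least one subdirectory.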
So putting it all together:
(( ($(stat -Lc '%h' "$dir") - 2) > 0)) && echo 'not empty' || echo 'empty'
Per man stat, the options used are:
%h number of hard links
-L --dereference, follow links
EDIT: To make it BSD/OSX compatible use:
(( ($(stat -Lf '%l' "$dir") - 2) > 0)) && echo 'not empty' || echo 'empty'
I have a shell script that adds up the sizes of all my files and directories using a recursive function.
Here's my code:
#!/bin/bash
count() {
local file
total=$2
files=(`ls $1`)
for file in "$files"
do
if [ -d "$1/$file" ]
then
count "$1/$file" $total
else
#size=`du $file | grep -o [0-9]*`
#total=$(($2 + $size))
echo "$1/$file"
fi
done
}
total=0
count . $total
echo "$total"
I have an error somewhere: it just goes into the first directory, prints the file, and stops. Where's my error? :)
This line is wrong:
for file in "$files"
It should be:
for file in "${files[@]}"
$files just expands to the first element of the array.
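A small demonstration with a made-up array: "$files" refers only to element 0, while "${files[@]}" yields every element as its own word, even one containing a space.

```shell
#!/bin/bash
files=( "one" "two words" "three" )

first_only=()
for file in "$files"; do first_only+=( "$file" ); done      # same as "${files[0]}"

all=()
for file in "${files[@]}"; do all+=( "$file" ); done         # every element

printf 'first only: %s\n' "${first_only[@]}"
printf 'all: %s\n' "${all[@]}"
```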
Note: The recursive shell-function approach is suboptimal in real life (given that there are specific utilities such as du that do the job), but needed to satisfy the OP's specific requirements.
Update: The original answer mistakenly simply counted files instead of determining the combined file size - this has been corrected.
A revised version of your code that demonstrates several advanced bash techniques; note that the function has been renamed to sumFileSizes to better reflect its purpose:
Declares local variables, including one with -i to type it as an integer
Uses a string composed of quoted and unquoted elements (wildcards) for safe globbing (pathname expansion) - "$1/"*
Uses stdout output to "return" the desired result and captures it with command substitution ($(...)) rather than trying to pass a variable "by reference" (which bash doesn't directly support).
Uses process substitution via stdin (< <(...)) to provide a command's output as input to another command.
Shows relevant shell options (set with shopt) that govern globbing (pathname expansion) behavior.
#!/bin/bash
# Recursive function to report the *combined size of all files*
# in the specified directory's *subtree*.
sumFileSizes() {
# Declare the variable as an integer (-i) to ensure
# that += assignments perform *arithmetic*.
local -i size=0
local file
# Loop over all files/subdirectories
for file in "$1/"*; do
if [[ -d $file ]]; then # item is a *directory*
# Recurse, adding to the size so far.
size+=$($FUNCNAME "$file")
else # a *file*
# Add this file's size to the size so far.
# Note: `du` reports `{size} {file}`, so we need to
# extract the 1st token, which we do with `read` and
# process substitution, followed by printing the 1st token
# and capturing the output via command substitution.
size+=$(read thisSize unused < <(du -- "$file"); printf $thisSize)
fi
done
# Output combined size.
printf $size
}
# Ensure that:
# - globs expand to *nothing* in case there are *no* matching files/directories:
# option `nullglob`
# - hidden files/directories (those whose names start with '.') are included:
# option `dotglob`
shopt -s nullglob dotglob
# Make `du` report sizes in KB (this is the default on Linux).
export BLOCKSIZE=1024
# Invoke the recursive function and capture
# its output.
totalSize=$(sumFileSizes .)
# Output combined size of all files
# in multiples of 1KB.
echo "$totalSize"
If you want to get the size of a directory recursively, try this:
du -sh
-h is for human readable and -s is for summarize