Best way to choose a random file from a directory in a shell script - bash

What is the best way to choose a random file from a directory in a shell script?
Here is my solution in Bash, but I would be very interested in a more portable (non-GNU) version for use on Unix proper.
dir='some/directory'
file=`/bin/ls -1 "$dir" | sort --random-sort | head -1`
path=`readlink --canonicalize "$dir/$file"` # Converts to full path
echo "The randomly-selected file is: $path"
Anybody have any other ideas?
Edit: lhunath makes a good point about parsing ls. I guess it comes down to whether you want to be portable or not. If you have the GNU findutils and coreutils then you can do:
find "$dir" -maxdepth 1 -mindepth 1 -type f -print0 \
| sort --zero-terminated --random-sort \
| sed 's/\d000.*//g'
Whew, that was fun! Also it matches my question better since I said "random file". Honestly though, these days it's hard to imagine a Unix system deployed out there having GNU installed but not Perl 5.

files=(/my/dir/*)
printf "%s\n" "${files[RANDOM % ${#files[@]}]}"
And don't parse ls. Read http://mywiki.wooledge.org/ParsingLs
Edit: Good luck finding a non-bash solution that's reliable. Most will break for certain types of filenames, such as filenames with spaces or newlines or dashes (it's pretty much impossible in pure sh). To do it right without bash, you'd need to fully migrate to awk/perl/python/... without piping that output for further processing or such.
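The array approach above can be wrapped in a small function that also copes with an empty directory. This is only a sketch: the name pick_random_file is made up for illustration, and it assumes bash. Note that $RANDOM only ranges from 0 to 32767, so the choice is slightly biased and cannot reach every entry of a very large directory.

```shell
#!/usr/bin/env bash
# pick_random_file DIR: print one randomly chosen entry of DIR.
# (Hypothetical helper name; requires bash arrays.)
pick_random_file() {
    local dir=$1 files
    files=("$dir"/*)
    # With no matches the pattern stays literal, so make sure the first
    # entry really exists (a dangling symlink still counts as an entry).
    [ -e "${files[0]}" ] || [ -L "${files[0]}" ] || return 1
    printf '%s\n' "${files[RANDOM % ${#files[@]}]}"
}
```

Usage: file=$(pick_random_file some/directory) || echo "nothing to pick" >&2. Like the one-liner, it skips dot files.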

Is "shuf" not portable?
shuf -n1 -e /path/to/files/*
or find if files are deeper than one directory:
find /path/to/files/ -type f | shuf -n1
it's part of coreutils but you'll need 6.4 or newer to get it... so RH/CentOS does not include it.

# ******************************************************************
# ******************************************************************
function randomFile {
    local tmpFile total randomNumber i line
    tmpFile=$(mktemp)
    # List every regular file below the current directory, one per line
    find . -type f > "$tmpFile"
    total=$(wc -l < "$tmpFile")
    randomNumber=$((RANDOM % total))
    i=0
    while IFS= read -r line; do
        if [ "$i" -eq "$randomNumber" ]; then
            # Do stuff with file
            amarok "$line"
            break
        fi
        i=$((i + 1))
    done < "$tmpFile"
    rm -- "$tmpFile"
}

Something like:
files=("$dir"/*)
let x="$RANDOM % ${#files[@]}"
echo "The randomly-selected file is ${files[$x]}"
$RANDOM in bash is a special variable that returns a random number. I use modulus division to turn it into a valid index, then reference that index in the array of file names.

This boils down to: How can I create a random number in a Unix script in a portable way?
Because if you have a random number N between 1 and the number of files, you can use head -n "$N" | tail -n 1 to pick the Nth line of a listing. Unfortunately, I know no portable way to do this with the shell alone. If you have Python or Perl, you can easily use their random support, but AFAIK there is no standard rand(1) command.

I think Awk is a good tool to get a random number. According to the Advanced Bash Guide, Awk is a good random number replacement for $RANDOM.
Here's a version of your script that avoids Bash-isms and GNU tools.
#! /bin/sh
dir='some/directory'
n_files=`/bin/ls -1 "$dir" | wc -l | cut -f1`
rand_num=`awk "BEGIN{srand();print int($n_files * rand()) + 1;}"`
file=`/bin/ls -1 "$dir" | sed -ne "${rand_num}p"`
path=`cd "$dir" && echo "$PWD/$file"` # Converts to full path.
echo "The randomly-selected file is: $path"
It inherits the problems other answers have mentioned should files contain newlines.

Spaces in file names (though not newlines) can be handled by splitting on newlines only, as in the following Bash snippet:
#!/bin/bash
OLDIFS=$IFS
IFS=$(echo -en "\n\b")
DIR="/home/user"
for file in $(ls -1 "$DIR")
do
    echo "$file"
done
IFS=$OLDIFS

Here's a shell snippet that relies only on POSIX features and copes with arbitrary file names (but omits dot files from the selection). The random selection uses awk, because that's all you get in POSIX. It's a very poor random number generator, since awk's RNG is seeded with the current time in seconds (so it's easily predictable, and returns the same choice if you call it multiple times per second).
set -- *
n=$(echo $# | awk '{srand(); print int(rand()*$0) + 1}')
eval "file=\$$n"
echo "Processing $file"
If you don't want to ignore dot files, the file name generation code (set -- *) needs to be replaced by something more complicated.
set -- *; [ -e "$1" ] || shift
set .[!.]* "$@"; [ -e "$1" ] || shift
set ..?* "$@"; [ -e "$1" ] || shift
if [ $# -eq 0 ]; then echo 1>&2 "empty directory"; exit 1; fi
If you have OpenSSL available, you can use it to generate random bytes. If you don't, but your system has /dev/urandom, replace the call to openssl with dd if=/dev/urandom bs=3 count=1 2>/dev/null. Here's a snippet that sets n to a random value between 1 and $#, taking care not to introduce a bias. This snippet assumes that $# is at most 2^23-1.
while
n=$(($(openssl rand 3 | od -An -t u4) + 1))
[ $n -gt $((16777216 / $# * $#)) ]
do :; done
n=$((n % $# + 1))
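The same rejection idea can be packaged as a function. The following is a sketch, not the answer's own code: the function name rand_between_1_and and the variable names are invented for illustration, and it assumes /dev/urandom is available. It reads the three bytes individually with od -tu1, so the result does not depend on the machine's byte order.

```shell
# rand_between_1_and N: print an unbiased random integer in 1..N.
# (Hypothetical helper; assumes /dev/urandom exists and N <= 16777215.)
rand_between_1_and() {
    max=$1
    # Largest multiple of max that fits in 24 bits; draws at or above
    # it are rejected so that every remainder is equally likely.
    limit=$((16777216 / max * max))
    while :; do
        # Read 3 random bytes as three unsigned decimal numbers.
        set -- $(dd if=/dev/urandom bs=3 count=1 2>/dev/null | od -An -tu1)
        r=$(( ($1 * 256 + $2) * 256 + $3 ))
        [ "$r" -lt "$limit" ] && break
    done
    echo $((r % max + 1))
}
```

With the positional parameters holding the file names, n=$(rand_between_1_and $#) followed by eval "file=\$$n" selects one of them. The variables max, limit and r are global, because POSIX sh has no local.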

BusyBox (used on embedded devices) is usually configured to support $RANDOM but it doesn't have bash-style arrays or sort --random-sort or shuf. Hence the following:
#!/bin/sh
FILES="/usr/bin/*"
for f in $FILES; do echo "$RANDOM $f" ; done | sort -n | head -n1 | cut -d' ' -f2-
Note trailing "-" in cut -f2-; this is required to avoid truncating files that contain spaces (or whatever separator you want to use).
It won't handle filenames with embedded newlines correctly.

Put each line of output from the command 'ls' into an awk array named line and then choose one of those at random. The srand() call seeds the generator; without it, awk makes the same choice on every run.
ls | awk 'BEGIN { srand() } { line[NR]=$0 } END { print line[int(rand()*NR)+1] }'

My 2 cents, with a version that should not break when filenames with special chars exist:
#!/bin/bash --
dir='some/directory'
let number_of_files=$(find "${dir}" -type f -print0 | grep -zc .)
let rand_index=$((1+(RANDOM % number_of_files)))
printf "the randomly-selected file is: "
find "${dir}" -type f -print0 | head -z -n "${rand_index}" | tail -z -n 1
printf "\n"

Batch Renaming files to a sequence [duplicate]

I want to rename the files in a directory to sequential numbers, based on the creation date of the files.
For Example sadf.jpg to 0001.jpg, wrjr3.jpg to 0002.jpg and so on, the number of leading zeroes depending on the total amount of files (no need for extra zeroes if not needed).
Beauty in one line:
ls -v | cat -n | while read n f; do mv -n "$f" "$n.ext"; done
You can change .ext with .png, .jpg, etc.
Try to use a loop, let, and printf for the padding:
a=1
for i in *.jpg; do
new=$(printf "%04d.jpg" "$a") #04 pad to length of 4
mv -i -- "$i" "$new"
let a=a+1
done
using the -i flag prevents automatically overwriting existing files, and using -- prevents mv from interpreting filenames with dashes as options.
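The question asks for the number of leading zeroes to depend on how many files there are. A sketch of that, assuming bash: rename_sequential is a made-up name, and the files are numbered in glob (name) order rather than by creation date. The width is simply the number of digits in the file count, passed to printf through the * width specifier.

```shell
#!/usr/bin/env bash
# rename_sequential EXT: rename *.EXT in the current directory to
# 1.EXT, 2.EXT, ... zero-padded to the width of the file count.
# (Hypothetical helper; numbering follows glob order, not creation date.)
rename_sequential() {
    local ext=$1 files width n=1 f
    files=(*."$ext")
    [ -e "${files[0]}" ] || return 1   # nothing matched
    width=${#files[@]}                 # e.g. 120 files
    width=${#width}                    # -> 3 digits
    for f in "${files[@]}"; do
        # -n refuses to overwrite an existing target
        mv -n -- "$f" "$(printf '%0*d' "$width" "$n").$ext"
        n=$((n + 1))
    done
}
```

Run it inside the target directory, e.g. rename_sequential jpg. Because of mv -n, a file whose target name is already taken is skipped rather than overwritten.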
I like gauteh's solution for its simplicity, but it has two important drawbacks. First, when running on thousands of files, you can get an "argument list too long" message; second, the script can get really slow. In my case, running it on roughly 36.000 files, the script moved approx. one item per second! I'm not really sure why this happens, but the rule I got from colleagues was "find is your friend".
find -name '*.jpg' | # find jpegs
gawk 'BEGIN{ a=1 }{ printf "mv %s %04d.jpg\n", $0, a++ }' | # build mv command
bash # run that command
To count items and build command, gawk was used. Note the main difference, though. By default find searches for files in current directory and its subdirectories, so be sure to limit the search on current directory only, if necessary (use man find to see how).
A very simple bash one liner that keeps the original extensions, adds leading zeros, and also works in OSX:
num=0; for i in *; do mv "$i" "$(printf '%04d' $num).${i#*.}"; ((num++)); done
Simplified version of http://ubuntuforums.org/showthread.php?t=1355021
using Pero's solution on OSX required some modification. I used:
find . -name '*.jpg' \
| awk 'BEGIN{ a=0 }{ printf "mv \"%s\" %04d.jpg\n", $0, a++ }' \
| bash
note: the backslashes are there for line continuation
edit July 20, 2015:
incorporated @klaustopher's feedback to quote the \"%s\" argument of the mv command in order to support filenames with spaces.
with "rename" command
rename -N 0001 -X 's/.*/$N/' *.jpg
or
rename -N 0001 's/.*/$N.jpg/' *.jpg
To work in all situations, put a \" for files that have space in the name
find . -name '*.jpg' | gawk 'BEGIN{ a=1 }{ printf "mv \"%s\" %04d.jpg\n", $0, a++ }' | bash
On OSX, install the rename script from Homebrew:
brew install rename
Then you can do it really ridiculously easily:
rename -e 's/.*/$N.jpg/' *.jpg
Or to add a nice prefix:
rename -e 's/.*/photo-$N.jpg/' *.jpg
NOTE Add -n to the rename commands here to preview the rename without performing it.
If your rename doesn't support -N, you can do something like this:
ls -1 --color=never -c | xargs rename -n 's/.*/our $i; sprintf("%04d.jpg", $i++)/e'
NOTE The rename commands here include -n, which previews the rename. To actually perform the renaming, remove the -n
Edit To start with a given number, you can use the (somewhat ugly-looking) code below, just replace 123 with the number you want:
ls -1 --color=never -c | xargs rename -n 's/.*/our $i; if(!$i) { $i=123; } sprintf("%04d.jpg", $i++)/e'
This lists files in order of ctime, the time of the last status change, which is the closest ls -c gets to creation time (newest first; add -r to ls to reverse the sort), then sends this list of files to rename. rename uses Perl code in the regex to format and increment the counter.
However, if you're dealing with JPEG images with EXIF information, I'd recommend exiftool
This is from the exiftool documentation, under "Renaming Examples"
exiftool '-FileName<CreateDate' -d %Y%m%d_%H%M%S%%-c.%%e dir
Rename all images in "dir" according to the "CreateDate" date and time, adding a copy number with leading '-' if the file already exists ("%-c"), and preserving the original file extension (%e). Note the extra '%' necessary to escape the filename codes (%c and %e) in the date format string.
The following command renames all files to a sequence and also lowercases the extension:
rename --counter-format 000001 --lower-case --keep-extension --expr='$_ = "$N" if @EXT' *
find . | grep 'avi' | nl -nrz -w3 -v1 | while read n f; do mv "$f" "$n.avi"; done
find . lists all files in the folder and its subfolders.
grep 'avi' keeps only the names that contain avi (such as files with the avi extension).
nl -nrz -w3 -v1 prefixes each file name with a zero-padded sequence number: 001, 002, and so on.
while read n f; do mv "$f" "$n.avi"; done renames each file to its sequence number.
I spent 3-4 hours developing this solution for an article on this:
https://www.cloudsavvyit.com/8254/how-to-bulk-rename-files-to-numeric-file-names-in-linux/
if [ ! -r _e -a ! -r _c ]; then echo 'pdf' > _e; echo 1 > _c ;find . -name "*.$(cat _e)" -print0 | xargs -0 -t -I{} bash -c 'mv -n "{}" $(cat _c).$(cat _e);echo $[ $(cat _c) + 1 ] > _c'; rm -f _e _c; fi
This works for any type of filename (spaces, special chars) by using correct \0 escaping by both find and xargs, and you can set a start file naming offset by increasing echo 1 to any other number if you like.
Set extension at start (pdf in example here). It will also not overwrite any existing files.
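The same NUL-delimited idea can be written without temporary counter files by reading find's output in a bash loop. This is a sketch under assumptions: number_files is an invented name, it needs bash for read -d '', and -maxdepth and sort -z are GNU/BSD extensions rather than POSIX.

```shell
#!/usr/bin/env bash
# number_files DIR EXT [START]: rename DIR/*.EXT to START.EXT,
# START+1.EXT, ... in name order. (Hypothetical helper.)
number_files() {
    local dir=$1 ext=$2 n=${3:-1} f
    # -print0 and read -d '' keep names with spaces or newlines intact;
    # sort -z buffers the whole list before the first rename happens.
    while IFS= read -r -d '' f; do
        mv -n -- "$f" "$dir/$n.$ext"
        n=$((n + 1))
    done < <(find "$dir" -maxdepth 1 -type f -name "*.$ext" -print0 | sort -z)
}
```

For example, number_files . pdf 1 numbers the PDF files in the current directory starting at 1. As with the one-liner, mv -n leaves any existing target untouched.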
Let us assume we have these files in a directory, listed in order of creation, the first being the oldest:
a.jpg
b.JPG
c.jpeg
d.tar.gz
e
then ls -1cr outputs exactly the list above. You can then use rename:
ls -1cr | xargs rename -n 's/^[^\.]*(\..*)?$/our $i; sprintf("%03d$1", $i++)/e'
which outputs
rename(a.jpg, 000.jpg)
rename(b.JPG, 001.JPG)
rename(c.jpeg, 002.jpeg)
rename(d.tar.gz, 003.tar.gz)
Use of uninitialized value $1 in concatenation (.) or string at (eval 4) line 1.
rename(e, 004)
The warning “use of uninitialized value […]” is displayed for files without an extension; you can ignore it.
Remove -n from the rename command to actually apply the renaming.
This answer is inspired by Luke’s answer of April 2014. It ignores Gnutt’s requirement of setting the number of leading zeroes depending on the total amount of files.
I had a similar issue and wrote a shell script for that reason. I've decided to post it regardless that many good answers were already posted because I think it can be helpful for someone. Feel free to improve it!
numerate
@Gnutt The behavior you want can be achieved by typing the following:
./numerate.sh -d <path to directory> -o modtime -L 4 -b <startnumber> -r
If the option -r is left out, the renaming will only be simulated (which should be helpful for testing).
The option -L sets the length of the target number (which will be filled with leading zeros).
it is also possible to add a prefix/suffix with the options -p <prefix> -s <suffix>.
In case somebody wants the files to be sorted numerically before they get numbered, just remove the -o modtime option.
a=1
for i in *.jpg; do
mv -- "$i" "$a.jpg"
a=`expr $a + 1`
done
Again using Pero's solution with a small modification, because find traverses the directory tree in the order in which items are stored within the directory entries. This will (mostly) be consistent from run to run on the same machine and will essentially be "file/directory creation order" if there have been no deletes.
However, in some case you need to get some logical order, say, by name, which is used in this example.
find -name '*.jpg' | sort -n | # find jpegs
gawk 'BEGIN{ a=1 }{ printf "mv %s %04d.jpg\n", $0, a++ }' | # build mv command
bash # run that command
The majority of the other solutions will overwrite existing files already named as a number. This is particularly a problem if running the script, adding more files, and then running the script again.
This script renames existing numerical files first:
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw/tempfile/;
my $dir = $ARGV[0]
or die "Please specify directory as first argument";
opendir(my $dh, $dir) or die "can't opendir $dir: $!";
# First rename any files that are already numeric
while (my @files = grep { /^[0-9]+(\..*)?$/ } readdir($dh))
{
for my $old (@files) {
my $ext = $old =~ /(\.[^.]+)$/ ? $1 : '';
my ($fh, $new) = tempfile(DIR => $dir, SUFFIX => $ext);
close $fh;
rename "$dir/$old", $new;
}
}
rewinddir $dh;
my $i;
while (my $file = readdir($dh))
{
next if $file =~ /\A\.\.?\z/;
my $ext = $file =~ /(\.[^.]+)$/ ? $1 : '';
rename "$dir/$file", sprintf("%s/%04d%s", $dir, ++$i, $ext);
}
Sorted by time, limited to jpg, leading zeroes and a basename (in case you likely want one):
ls -t *.jpg | cat -n | \
while read n f; do mv "$f" "$(printf thumb_%04d.jpg $n)"; done
(all on one line, without the \)
Not related to creation date but numbered based on sorted names:
python3 -c \
'ext="jpg"
start_num=0
pad=4
import os,glob
files=glob.glob(f"*.{ext}")
files.sort()
renames=list(zip(files,range(start_num,len(files)+start_num)))
for r in renames:
    oname=r[0]
    nname=f"{r[1]:0{pad}}.{ext}"
    print(oname,"->",nname)
    os.rename(oname,nname)
'
This script will sort the files by creation date on Mac OS bash. I use it to mass rename videos. Just change the extension and the first part of the name.
ls -trU *.mp4| awk 'BEGIN{ a=0 }{ printf "mv %s lecture_%03d.mp4\n", $0, a++ }' | bash
ls -1tr | rename -vn 's/.*/our $i;if(!$i){$i=1;} sprintf("%04d.jpg", $i++)/e'
rename -vn - remove n for off test mode
{$i=1;} - control start number
"%04d.jpg" - control count zero 04 and set output extension .jpg
To me this combination of answers worked perfectly:
ls -v | gawk 'BEGIN{ a=1 }{ printf "mv %s %04d.jpg\n", $0, a++ }' | bash
ls -v helps with ordering 1 10 9 in correct: 1 9 10 order, avoiding filename extension problems with jpg JPG jpeg
gawk 'BEGIN{ a=1 }{ printf "mv %s %04d.jpg\n", $0, a++ }' renumbers with 4 characters and leading zeros. By avoiding mv I do not accidentally try to overwrite anything that is there already by accidentally having the same number.
bash executes
Be aware of what @xhienne said: piping unknown content to bash is a security risk. But this was not the case for me, as I was using my scanned photos.
Here is what worked for me.
I used the rename command first so that file names containing spaces do not confuse mv.
Here I replace each space ' ' in a file name with '_' for all jpg files:
#! /bin/bash
rename 'y/ /_/' *jpg #replacing spaces with _
x=0
for i in *.jpg;do
    x=$((x + 1))
    mv -- "$i" "$x.jpg"
done
Nowadays there is an option after you select multiple files for renaming (I have seen in thunar file manager).
select multiple files
check options
select rename
A prompt comes with all files in that particular dir
just check with the category section
Using sed :
ls -tr | sed "s/(.*)/mv '\1' \=printf('%04s',line('.').jpg)/" > rename.sh
bash rename.sh
This way you can check the script before executing it to avoid big mistakes
Here a another solution with "rename" command:
find -name 'access.log.*.gz' | sort -Vr | rename 's/(\d+)/$1+1/ge'
Pero's answer got me here :)
I wanted to rename files relative to time as the image viewers did not display images in time order.
ls -tr *.jpg | # list jpegs relative to time
gawk 'BEGIN{ a=1 }{ printf "mv %s %04d.jpg\n", $0, a++ }' | # build mv command
bash # run that command
To renumber 6000 files in one folder, you could use the 'Rename' option of the ACDSee program.
For defining a prefix use this format: ####"*"
Then set the start number and press Rename and the program will rename all 6000 files with sequential numbers.

Extract date from filename using bash script

I know that similar things have been asked before, but I haven't been able to really make hand and foot out of what's been posted.
I've got a whole bunch of files that contain the date in the format YYYYMMDD at some point in the filename. Luckily this is the only 8 digit substring in all the filenames!
I will need to write the dates into another file later, but that should be fine. I'm struggling to extract the date into a variable first...
I know I can get it with grep:
for d in $( ls *.csv | grep -Po "\d{8}" ); do
    echo $d
done
However, as I want to get the full filename into a variable too while I iterate through them, that's not an option right now.
I've tried using sed, but I don't think I know how to use it:
for f in $( ls *.csv ); do
d=$( $f | sed -e 's/^.*\(\d{8}\).*$')
echo $d
done
Thanks for pointing me in the right direction!
Loop through your csv files like this (don't parse ls):
for f in *.csv; do
echo "$f"
d=$(echo "$f" | grep -oE '[0-9]{8}')
done
I've used grep in extended mode (-E), but Perl mode (-P) is equally valid.
As you have tagged the question with bash, you can use d=$(grep -oE '[0-9]{8}' <<<"$f") instead if you prefer. You can also use built-in regular expression support, which is slightly more verbose but saves calling an external tool:
re='[0-9]{8}'
[[ $f =~ $re ]] && d="${BASH_REMATCH[0]}"
The array BASH_REMATCH contains the matches to the regular expression. If there is a match, we assign it to d.
#!/bin/bash
# ^-- important: bash, not /bin/sh
for f in *.csv; do # Don't use ls for iterating over filenames
[[ $f =~ [[:digit:]]{8} ]] && { # native built-in regex matching
number=${BASH_REMATCH[0]} # ...refer to the matched content...
echo "Found $number in filename $f" # ...and emit output.
}
done
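Since the goal is to have both the date and the full file name available in each iteration, the regex match can be wrapped in a function whose exit status says whether a date was found. A sketch, assuming bash; extract_date is an invented name.

```shell
#!/usr/bin/env bash
# extract_date NAME: print the first run of 8 digits in NAME and
# succeed, or print nothing and fail. (Hypothetical helper.)
extract_date() {
    local re='[0-9]{8}'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[0]}"
}

# Emit "date<TAB>filename" for every csv file that carries a date.
for f in *.csv; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if d=$(extract_date "$f"); then
        printf '%s\t%s\n' "$d" "$f"
    fi
done
```

Redirecting the loop with done > dates.txt would write the pairs to a file instead of the terminal.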

Bash script to store list of files in an array with number of occurrences of each word in all files

So far, my bash script takes in two arguments: input, which can be a file or a directory, and output, which is the output file. It finds all files recursively, and if the input is a file it finds all occurrences of each word in all the files found and lists them in the output file, with the count on the left and the word on the right, sorted from greatest to least. Right now it is also counting numbers as words, which it shouldn't do. How can I have it find only occurrences of valid words and no numbers? Also, in the last if statement, if the input is a directory, I am having trouble getting it to do the same thing I had it do for the file. It needs to find all files in that directory, and if there is another directory in that directory, it needs to find all files in it, and so on. Then it needs to count all occurrences of each word in all files and store them in the output file just as in the case for a file. I was thinking of storing them in an array, but I'm not sure if it's the best way, and my syntax is off because it's not working. How can I do this? Thanks!
#!/bin/bash
INPUT="$1"
OUTPUT="$2"
ARRAY=();
# Check that there are two arguments
if [ "$#" -ne 2 ]
then
echo "Usage: $0 {dir-name}";
exit 1
fi
# Check that INPUT is different from OUTPUT
if [ "$INPUT" = "$OUTPUT" ]
then
echo "$INPUT must be different from $OUTPUT";
fi
# Check if INPUT is a file...if so, find number of occurrences of each word
# and store in OUTPUT file sorted in greatest to least
if [ -f "$INPUT" ]
then
for name in $INPUT; do
if [ -f "$name" ]
then
xargs grep -hoP '\b\w+\b' < "$name" | sort | uniq -c | sort -n -r > "$OUTPUT"
fi
done
# If INPUT is a directory, find number of occurrences of each word
# and store in OUTPUT file sorted in greatest to least
elif [ -d "$INPUT" ]
then
find $name -type f > "${ARRAY[@]}"
for name in "${ARRAY[@]}"; do
if [ -f "$name" ]
then
xargs grep -hoP '\b\w+\b' < "$name" | sort | uniq -c | sort -n -r > "$OUTPUT"
fi
done
fi
I don't recommend specifying the output file as an argument, because you then have to do more validity checking for it, e.g.
the output shouldn't exist (if you don't want to allow overwriting)
if you do want to allow overwriting and the output exists, it must be a plain file
and so on...
It is also better to be able to pass several input directories/files as arguments.
Therefore it is better (and more bash-ish) to write the output to standard output, which you can redirect to a file at invocation, like
bash wordcounter.sh files or directories more than one to count words > to_some_file
e.g.
bash wordcounter.sh some_dir >result.txt
#or
bash wordcounter.sh file1.txt file2.txt .... fileN.txt > result2.txt
#or
bash wordcounter.sh dir1 file1 dir2 file2 >result2.txt
the whole wordcounter.sh could be the next:
for arg
do
find "$arg" -type f -print0
done |xargs -0 grep -hoP '\b[[:alpha:]]+\b' |sort |uniq -c |sort -nr
where:
find searches for plain files under each of the arguments
and the counting pipeline runs on the generated file list
The script still has some drawbacks, e.g. it will try to count words in image files too and the like; maybe in the next question in this series you will ask about that ;)
EDIT
If you really want a two-argument script, e.g. script where_to_search output (which isn't very bash-like), put the above script into a function and do whatever you want, e.g.:
#!/bin/bash
wordcounter() {
for arg
do
find "$arg" -type f -print0
done |xargs -0 grep -hoP '\b[[:alpha:]]+\b' |sort |uniq -c |sort -nr
}
where="$1"
output="$2"
#do here the necessary checks
#...
#and run the function
wordcounter "$where" > "$output"
#end of script
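One way to address the image-file drawback is grep's -I option, which treats binary files as non-matching. The variant below is a sketch rather than the answer's own script: count_words is an invented name, it swaps -P for the more widely available -E, and it adds a secondary sort on the word so that ties come out in a stable order.

```shell
# count_words PATH...: count alphabetic words in all text files found
# under the given files and directories. (Hypothetical helper; -I skips
# binary files, -h drops file names, -o prints each match on its own line.)
count_words() {
    for arg in "$@"; do
        find "$arg" -type f -print0
    done | xargs -0 grep -hoIE '[[:alpha:]]+' | sort | uniq -c | sort -k1,1nr -k2
}
```

Usage stays the same as before, for example count_words dir1 file1 > result.txt. Words are case-sensitive here; piping through tr '[:upper:]' '[:lower:]' before the first sort would fold case.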

Why isn't this BASH array building?

Why isn't this bash array populating? I believe I've done them like this in the past. Echoing ${#XECOMMAND[@]} shows no data..
DIR=$1
TEMPFILE=/tmp/dir.tmp
ls -l $DIR | tail -n +2 | sed 's/\s\+/ /g' | cut -d" " -f5,9 > $TEMPFILE
i=0
cat $TEMPFILE | while read line ;do
if [[ $(echo $line | cut -d" " -f1) == 0 ]]; then
XECOMMAND[$i]="$(echo "$line" | cut -d" " -f2)"
(( i++ ))
fi
done
When you run the while loop like
somecommand | while read ...
then the while loop is executed in sub-shell, i.e. a different process than the main script. Thus, all variable assignments that happen in the loop, will not be reflected in the main process. The workaround is to use input redirection and/or command substitution, so that the loop executes in the current process. For example if you want to read from a file you do
while read ....
do
# do stuff
done < "$filename"
or if you want the output of a process you can do
while read ....
do
# do stuff
done < <(some command)
Finally, in bash 4.2 and above, you can set shopt -s lastpipe, which causes the last command in the pipeline to be executed in the current process. This only takes effect when job control is not active, as is the case in non-interactive scripts.
I think you're trying to construct an array consisting of the names of all zero-length files and directories in $DIR. If so, you can do it like this:
mapfile -t ZERO_LENGTH < <(find "$DIR" -maxdepth 1 -size 0)
(Add -type f to the find command if you're only interested in regular files.)
This sort of solution is almost always better than trying to parse ls output.
The use of process substitution (< <(...)) rather than piping (... |) is important, because it means that the shell variable will be set in the current shell, not in an ephemeral subshell.
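The difference is easy to see side by side. This is a small demonstration, assuming bash with lastpipe left at its default (off); the array names piped and kept are just for illustration.

```shell
#!/usr/bin/env bash
# Pipeline: the while loop runs in a subshell, so the elements added
# there are gone once the pipeline finishes.
piped=()
printf '%s\n' one two three | while IFS= read -r line; do
    piped+=("$line")
done
echo "after pipeline: ${#piped[@]} element(s)"

# Process substitution: the loop runs in the current shell, so the
# array keeps its elements.
kept=()
while IFS= read -r line; do
    kept+=("$line")
done < <(printf '%s\n' one two three)
echo "after process substitution: ${#kept[@]} element(s)"
```

The first echo reports 0 elements and the second reports 3, which is exactly the symptom in the question.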
