How to zero pad numbers in file names in Bash?

What is the best way, using Bash, to rename files in the form:
(foo1, foo2, ..., foo1300, ..., fooN)
With zero-padded file names:
(foo00001, foo00002, ..., foo01300, ..., fooN)

It's not pure bash, but much easier with the Perl version of rename:
rename 's/\d+/sprintf("%05d",$&)/e' foo*
Where 's/\d+/sprintf("%05d",$&)/e' is the Perl replace regular expression.
\d+ will match the first set of numbers (at least one number)
sprintf("%05d",$&) will pass the matched numbers to Perl's sprintf, and %05d will pad to five digits

In case N is not a priori fixed:
for f in foo[0-9]*; do
mv "$f" "$(printf 'foo%05d' "${f#foo}")"
done
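One caveat with the printf loop above: Bash treats a number with a leading zero as octal, so a name such as foo08 makes printf '%05d' 08 fail. A minimal sketch of a workaround, assuming the names are foo followed only by digits; pad_foo is a hypothetical helper name, not something from the answer:

```shell
#!/usr/bin/env bash
# pad_foo (hypothetical helper): print the zero-padded form of a name
# such as foo7. It only prints the new name; nothing is renamed.
pad_foo() {
  local name=$1 width=${2:-5}
  local digits=${name#foo}          # strip the literal prefix
  # 10# forces base 10, so digits like 08 or 09 are not parsed as octal
  printf 'foo%0*d\n' "$width" "$((10#$digits))"
}

# Dry run: print the mv commands instead of executing them.
# for f in foo[0-9]*; do echo mv -- "$f" "$(pad_foo "$f")"; done
```

Removing the echo in the commented loop would perform the renames.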

I had a more complex case where the file names had a postfix as well as a prefix. I also needed to perform a subtraction on the number from the filename.
For example, I wanted foo56.png to become foo00000055.png.
I hope this helps if you're doing something more complex.
#!/bin/bash
prefix="foo"
postfix=".png"
targetDir="../newframes"
paddingLength=8
for file in ${prefix}[0-9]*${postfix}; do
# strip the prefix off the file name
postfile=${file#$prefix}
# strip the postfix off the file name
number=${postfile%$postfix}
# subtract 1 from the resulting number
i=$((number-1))
# copy to a new name with padded zeros in a new folder
cp "$file" "$targetDir/$(printf "${prefix}%0${paddingLength}d${postfix}" "$i")"
done

Pure Bash, no external processes other than 'mv':
for file in foo*; do
newnumber='00000'${file#foo} # get number, pack with zeros
newnumber=${newnumber:(-5)} # the last five characters
mv "$file" "foo$newnumber" # rename
done
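Note that taking the last five characters also truncates: a six-digit number would lose its leading digit. A possible variant that pads only when the number is shorter than the target width could look like this (pad_keep is a hypothetical name, and the input is assumed to be digits only):

```shell
#!/usr/bin/env bash
# pad_keep (hypothetical helper): left-pad a digit string with zeros to
# the given width, but never cut digits off a longer number.
pad_keep() {
  local digits=$1 width=${2:-5}
  if (( ${#digits} >= width )); then
    printf '%s\n' "$digits"            # already wide enough: keep as is
  else
    local zeros
    printf -v zeros '%*s' "$width" ''  # a string of $width spaces
    zeros=${zeros// /0}                # turn the spaces into zeros
    local padded=$zeros$digits
    printf '%s\n' "${padded: -width}"  # keep the last $width characters
  fi
}
```

Because the digits are handled as a string, leading zeros in the input are not interpreted as octal.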

The oneline command that I use is this:
ls * | cat -n | while read i f; do mv "$f" `printf "PATTERN" "$i"`; done
PATTERN can be for example:
rename with increment counter: %04d.${f#*.} (keep original file extension)
rename with increment counter with prefix: photo_%04d.${f#*.} (keep original extension)
rename with increment counter and change extension to jpg: %04d.jpg
rename with increment counter with prefix and file basename: photo_$(basename $f .${f#*.})_%04d.${f#*.}
...
You can filter the file to rename with for example ls *.jpg | ...
You have available the variable f that is the file name and i that is the counter.
For your question the right command is:
ls * | cat -n | while read i f; do mv "$f" `printf "foo%05d" "$i"`; done

To left-pad numbers in filenames:
$ ls -l
total 0
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:24 010
-rw-r--r-- 1 victoria victoria 0 Mar 28 18:09 050
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:23 050.zzz
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:24 10
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:23 1.zzz
$ for f in [0-9]*.[a-z]*; do tmp=`echo $f | awk -F. '{printf "%04d.%s\n", $1, $2}'`; mv "$f" "$tmp"; done;
$ ls -l
total 0
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:23 0001.zzz
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:23 0050.zzz
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:24 010
-rw-r--r-- 1 victoria victoria 0 Mar 28 18:09 050
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:24 10
Explanation
for f in [0-9]*.[a-z]*; do tmp=`echo $f | \
awk -F. '{printf "%04d.%s\n", $1, $2}'`; mv "$f" "$tmp"; done;
Note the backticks: `echo ... $2}'` runs the enclosed pipeline and captures its output. (The backslash, \, at the end of the first line above just splits the one-liner over two lines for readability.)
in a loop find files that are named as numbers with lowercase alphabet extensions: [0-9]*.[a-z]*
echo that filename ($f) to pass it to awk
-F. : awk field separator, a period (.): if matched, separates the file names as two fields ($1 = number; $2 = extension)
format with printf: print first field ($1, the number part) as 4 digits (%04d), then print the period, then print the second field ($2: the extension) as a string (%s). All of that is assigned to the $tmp variable
lastly, move the source file ($f) to the new filename ($tmp)
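The same split on the first period can be done with parameter expansion instead of echo and awk. This is only a sketch under the same assumption as above (names of the form number.extension); pad_stem is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# pad_stem (hypothetical helper): print NNNN.ext for a name like 1.zzz.
pad_stem() {
  local name=$1
  local stem=${name%%.*}   # text before the first period (the number)
  local ext=${name#*.}     # text after the first period (the extension)
  # 10# forces base 10 so that a stem such as 050 is read as fifty
  printf '%04d.%s\n' "$((10#$stem))" "$ext"
}

# Dry run: print the mv commands instead of executing them.
# for f in [0-9]*.[a-z]*; do echo mv -- "$f" "$(pad_stem "$f")"; done
```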

The following will do it:
for ((i=1; i<=N; i++)) ; do mv foo$i `printf foo%05d $i` ; done
EDIT: changed to use ((i=1,...)), thanks mweerden!

My solution replaces numbers, everywhere in a string
for f in * ; do
number=`echo $f | sed 's/[^0-9]*//g'`
padded=`printf "%04d" $number`
echo $f | sed "s/${number}/${padded}/";
done
You can easily try it, since it just prints transformed file names (no filesystem operations are performed).
Explanation:
Looping through list of files
A loop: for f in * ; do ;done, lists all files and passes each filename as $f variable to loop body.
Grabbing the number from string
With echo $f | sed we pipe variable $f to sed program.
In the command sed 's/[^0-9]*//g', the part [^0-9]* uses the ^ modifier to match anything that is not a digit 0-9, and the empty replacement // removes it. Why not just remove [a-z]? Because a filename can contain dots, dashes, etc. So we strip everything that is not a digit and are left with the number.
Next, we assign the result to number variable. Remember to not put spaces in assignment, like number = …, because you get different behavior.
We assign execution result of a command to variable, wrapping the command with backtick symbols `.
Zero padding
Command printf "%04d" $number changes format of a number to 4 digits and adds zeros if our number contains less than 4 digits.
Replacing number to zero-padded number
We use sed again with replacement command like s/substring/replacement/. To interpret our variables, we use double quotes and substitute our variables in this way ${number}.
The script above just prints transformed names, so, let's do actual renaming job:
for f in *.js ; do
number=`echo $f | sed 's/[^0-9]*//g'`
padded=`printf "%04d" $number`
new_name=`echo $f | sed "s/${number}/${padded}/"`
mv "$f" "$new_name";
done
Hope this helps someone.
I spent several hours to figure this out.
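The sed pipeline above concatenates every digit in the name, so it only behaves as intended when the name contains a single run of digits. A sketch that pads just the first run, using Bash's regex matching; pad_first_number is a hypothetical helper name and the default width of 4 mirrors the %04d used above:

```shell
#!/usr/bin/env bash
# pad_first_number (hypothetical helper): zero-pad the first run of
# digits in a name and leave everything else untouched.
pad_first_number() {
  local name=$1 width=${2:-4}
  if [[ $name =~ ^([^0-9]*)([0-9]+)(.*)$ ]]; then
    # BASH_REMATCH[1] = text before the number, [2] = number, [3] = rest
    printf '%s%0*d%s\n' "${BASH_REMATCH[1]}" "$width" \
      "$((10#${BASH_REMATCH[2]}))" "${BASH_REMATCH[3]}"
  else
    printf '%s\n' "$name"   # no digits at all: name is unchanged
  fi
}

# Dry run: print the mv commands instead of executing them.
# for f in *.js; do echo mv -- "$f" "$(pad_first_number "$f")"; done
```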

This answer is derived from Chris Conway's accepted answer but assumes your files have an extension (unlike Chris' answer). Just paste this (rather long) one liner into your command line.
for f in foo[0-9]*; do mv "$f" "$(printf 'foo%05d' "${f#foo}" 2> /dev/null)"; done; for f in foo[0-9]*; do mv "$f" "$f.ext"; done;
OPTIONAL ADDITIONAL INFO
This script will rename
foo1.ext > foo00001.ext
foo2.ext > foo00002.ext
foo1300.ext > foo01300.ext
To test it on your machine, just paste this one liner into an EMPTY directory.
rm * 2> /dev/null; touch foo1.ext foo2.ext foo1300.ext; for f in foo[0-9]*; do mv "$f" "$(printf 'foo%05d' "${f#foo}" 2> /dev/null)"; done; for f in foo[0-9]*; do mv "$f" "$f.ext"; done;
This deletes the content of the directory, creates the files in the above example and then does the batch rename.
For those who don't need a one liner, the script indented looks like this.
for f in foo[0-9]*;
do mv "$f" "$(printf 'foo%05d' "${f#foo}" 2> /dev/null)";
done;
for f in foo[0-9]*;
do mv "$f" "$f.ext";
done;

Here's a quick solution that assumes a fixed length prefix (your "foo") and fixed length padding. If you need more flexibility, maybe this will at least be a helpful starting point.
#!/bin/bash
# some test data
files="foo1
foo2
foo100
foo200
foo9999"
for f in $files; do
prefix=`echo "$f" | cut -c 1-3` # chars 1-3 = "foo"
number=`echo "$f" | cut -c 4-` # chars 4-end = the number
printf "%s%04d\n" "$prefix" "$number"
done
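The two cut calls can also be replaced by Bash substring expansion, which avoids starting external processes. A minimal sketch under the same assumption of a three-character prefix; pad_fixed_prefix is a hypothetical name:

```shell
#!/usr/bin/env bash
# pad_fixed_prefix (hypothetical helper): same idea as the cut version,
# assuming the prefix is always exactly three characters long.
pad_fixed_prefix() {
  local f=$1
  local prefix=${f:0:3}    # characters 1-3, like cut -c 1-3
  local number=${f:3}      # character 4 to the end, like cut -c 4-
  printf '%s%04d\n' "$prefix" "$((10#$number))"
}

for f in foo1 foo2 foo100 foo200 foo9999; do
  pad_fixed_prefix "$f"
done
```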

Related

sort by name on bash same as graphical on windows

I have this folder in Windows.
If I run a simple ls or find, either in Bash (Cygwin) or MS-DOS, it lists the files like this:
$ ls -1
su-01-01.jpg
su-01-02-03.jpg
su-01-12-13.jpg
su-01-14.jpg
su-01-15.jpg
su-01-16.jpg
su-01-18.jpg
su-01-19.jpg
su-01-20.jpg
su-01-21.jpg
su-01-31.jpg
su-01-34.jpg
su-01-35.jpg
su-01-38.jpg
su-01-39.jpg
su-01-42-43.jpg
su-01-44.jpg
su-01-45.jpg
su-01-47.jpg
su-01-48.jpg
su01-00.jpg
su01-04.jpg
su01-05.jpg
su01-06.jpg
su01-07.jpg
su01-08.jpg
I have tried ordering and it does not take into account 0 00 1
$ ls -1 |sort -V
su01-00.jpg
su01-04.jpg
su01-05.jpg
su01-06.jpg
su01-07.jpg
su01-08.jpg
su01-09.jpg
su01-10.jpg
su01-11.jpg
su01-22-23.jpg
su01-24.jpg
su01-25.jpg
su01-26.jpg
su01-27.jpg
su01-28-29.jpg
su01-30.jpg
su01-32.jpg
su01-33.jpg
su01-40-41.jpg
su-01-01.jpg
su-01-02-03.jpg
su-01-12-13.jpg
su-01-14.jpg
su-01-15.jpg
but how do I make it ignore the (-)?
thank you very much for your help
find doesn't guarantee alphabetical ordering; ls and sort do, but the character - has value 45 while the character 0 has value 48, so su- will come ahead of su0 in an alphabetical sort.
While a printf '%s\n' su* | LANG=en_US.utf8 sort -n seems to display the files the way you want, the best thing to do for making your life easier would be to rename some of the files:
#!/bin/bash
for f in su0*
do
mv "$f" "su-0${f#su0}"
done
Update
renaming the files to 001.jpg 002.jpg ...
#!/bin/bash
shopt -s nullglob
n=1
while IFS='' read -r file
do
printf -v newname '%03d.%s' "$((n++))" "${file##*.}"
printf '%q %q %q\n' mv "$file" "$newname"
done < <(
printf '%s\n' su* |
sed -nE 's,su-?([^/]*)$,\1/&,p' |
LANG=C sort -nt '-' |
sed 's,[^/]*/,,'
)
The simplest way to control the sort order in Bash, both for ls and sort, is to set your LANG variable to the locale you want.
In your .bashrc or .profile, add
export LANG=en_US.utf8
and then
ls -1
or
ls -1 | sort
will output the order you're looking for.
If you want to test with different locales and see their effect, you can set LANG one command at a time. For example, compare the output of these commands:
LANG=en_US.utf8 ls -1 # what you're looking for
LANG=C ls -1 # "ASCIIbetic" order
LANG=fr_FR.utf8 ls -1 # would consider é as between e and f
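If relying on the locale is not an option, one way to ignore the hyphens regardless of locale is to sort on a copy of each name with the hyphens removed and then throw that copy away. This is only a sketch: sort_ignoring_hyphens is a hypothetical name, and it assumes the file names contain no tabs or newlines.

```shell
#!/usr/bin/env bash
# sort_ignoring_hyphens (hypothetical helper): read names on stdin, one
# per line, and print them ordered as if the hyphens were not there.
sort_ignoring_hyphens() {
  # 1. prefix each name with a hyphen-free key and a tab
  # 2. sort bytewise on the whole line (the key comes first)
  # 3. drop the key again
  awk '{ key = $0; gsub(/-/, "", key); print key "\t" $0 }' |
    LC_ALL=C sort |
    cut -f2-
}

# Usage: printf '%s\n' su* | sort_ignoring_hyphens
```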

Find groups of files that end with the same 17 characters

I'm grabbing files that have a unique and common pattern. I'm trying to match on the common. Currently trying with bash. I can use python or whatever.
file1_02_01_2021_002244.mp4
file2_02_01_2021_002244.mp4
file3_02_01_2021_002244.mp4
# _02_01_2021_002244.mp4 should be the 'match all files that contain this string'
file1_03_01_2021_092200.mp4
file2_03_01_2021_092200.mp4
file3_03_01_2021_092200.mp4
# _03_01_2021_092200.mp4 is the match
...
file201_01_01_2022_112230.mp4
file202_01_01_2022_112230.mp4
file203_01_01_2022_112230.mp4
# _01_01_2022_112230.mp4 is the match
the goal is to find all that are matching from the very end of the file back to the first uniq character, then move them into a folder. The actionable part will be easy. I just need help with the matching.
find -type f $("all that match the same last 17 characters of the file name"); do
do things
done
this is my example directory:
total 28480
drwxr-xr-x 2 user user 64B Feb 24 10:49 dir1
drwxr-xr-x 2 user user 64B Feb 24 10:49 dir2
-rw-r--r-- 2 user user 6.8M Feb 24 08:59 file1_02_01_2021_002244.mp4
-rw-r--r-- 2 user user 468K Feb 24 09:06 file1_03_01_2021_092200.mp4
-rw-r--r-- 2 user user 4.5M Feb 24 08:59 file2_02_01_2021_002244.mp4
-rw-r--r-- 2 user user 665K Feb 24 09:06 file2_03_01_2021_092200.mp4
-rw-r--r-- 1 user user 0B Feb 24 10:49 otherfile1
-rw-r--r-- 1 user user 0B Feb 24 10:49 otherfile2
I've got it to work with suggestions from the answer marked as correct. The Python method could probably work better (especially with file names that have spaces in them), but I'm not proficient enough with Python to make it do everything I want. The script in full is found below:
#!/usr/local/bin/bash
# this is my solution
# create array with patterns
aPATTERN=($(find . -type f -name "*.mp4" | sed 's/^[^_]*//'|sort -u ))
# iterate through all patterns, do things
for each in "${aPATTERN[@]}"; do
# create a temp working directory for files that match the pattern
vDIR=`gmktemp -d -p $(pwd)`
# create array of all files found matching the pattern
aFIND+=(`find . -mindepth 1 -maxdepth 1 -type f -iname \*$each`)
# move all files that match the match to the working temp directory
for file in "${aFIND[@]}"; do
mv -iv "$file" "$vDIR"
done
# reset the found files array, get ready for next pattern
aFIND=()
done
In python:
import os
os.chdir("folder_path")
data = {}
data = [[file[-22:], file] for file in os.listdir()]
output = {}
for pattern, filename in data:
output.setdefault(pattern, []).append(filename)
print(output)
This will create a dict associating each file with the corresponding pattern.
Output:
{
'_03_01_2021_092200.mp4': ['file1_03_01_2021_092200.mp4', 'file3_03_01_2021_092200.mp4', 'file2_03_01_2021_092200.mp4'],
'_01_01_2022_112230.mp4': ['file202_01_01_2022_112230.mp4', 'file201_01_01_2022_112230.mp4', 'file203_01_01_2022_112230.mp4'],
'_02_01_2021_002244.mp4': ['file1_02_01_2021_002244.mp4', 'file2_02_01_2021_002244.mp4', 'file3_02_01_2021_002244.mp4']
}
Try to play with this
first get all pattern sorted and uniq
find ./data -type f -name "*.mp4" | sed 's/^[^_]*//'|sort -u
or with regex
find ./data -type f -regextype sed -regex '.*_[0-9]\{2\}_[0-9]\{2\}_[0-9]\{4\}_[0-9]\{6\}\.mp4$'| sed 's/^[^_]*//'|sort -u
then iterate over the patterns with a while loop to find the files for each pattern
while read pattern
do
# find and exec
find ./data -type f -name "*$pattern" -exec mv {} /to/whatever/you/want/ \;
#or find and xargs
find ./data -type f -name "*$pattern" | xargs -I {} mv {} /to/whatever/you/want/
done < <(find ./data -type f -name "*.mp4" | sed 's/^[^_]*//'|sort -u)
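The grouping step can also be sketched with a Bash associative array (Bash 4 or later), which is roughly what the Python dict above does. The helper names suffix_of and group_by_suffix are hypothetical, and every file name is assumed to contain at least one underscore:

```shell
#!/usr/bin/env bash
# suffix_of (hypothetical helper): print everything from the first
# underscore of the file name onward, ignoring any leading directory.
suffix_of() {
  local name=${1##*/}
  printf '_%s\n' "${name#*_}"
}

# group_by_suffix (hypothetical helper): print one line per suffix in
# the form "<suffix>: <file> <file> ...". Line order is unspecified.
group_by_suffix() {
  local -A groups=()
  local f key
  for f in "$@"; do
    key=$(suffix_of "$f")
    # append the file, separated by a space if the group is not empty
    groups[$key]+="${groups[$key]:+ }$f"
  done
  for key in "${!groups[@]}"; do
    printf '%s: %s\n' "$key" "${groups[$key]}"
  done
}

# Usage: group_by_suffix ./data/*.mp4
```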
There are several ways to approach this, including writing a bash script, but if it were me, I'd take the quick and easy road. Use grep and read:
PATTERN=_02_01_2021_002244.mp4
find . -name '*.mp4' | grep "$PATTERN" | while read -r A; do echo "$A"; done
There are probably better ways that I haven't thought of but this gets the job done.
Try this:
#!/bin/bash
while IFS= read -r line
do
if [[ "$line" == *_+([0-9])_+([0-9])_+([0-9])_+([0-9])\.mp4 ]]
then
echo "MATCH: $line"
else
echo "no match: $line"
fi
done < <(/bin/ls -c1)
Remember that it uses globbing, not regex, when you build your pattern.
That is why I did not use [0-9]{2} to match 2 digits, {} does not do that in globbing, like it does in regex.
To use regex, use:
#!/bin/bash
while IFS= read -r line
do
if [[ $(echo "$line" | grep -cE '_[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4') -ne 0 ]]
then
echo "MATCH: $line"
else
echo "no match: $line"
fi
done < <(/bin/ls -c1)
This is a more precise match since you can specify how many digits to accept in each sub-pattern.
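Bash can also apply the regular expression itself with the =~ operator inside [[ ]], which avoids running grep once per line. A small sketch; is_dated_mp4 is a hypothetical name, and the trailing $ anchor is my addition so that the pattern must sit at the end of the name:

```shell
#!/usr/bin/env bash
# is_dated_mp4 (hypothetical helper): succeed if the name ends in
# _DD_DD_DDDD_DDDDDD.mp4, where each D is a digit.
is_dated_mp4() {
  # keep the regex in a variable so the shell does not reinterpret it
  local re='_[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4$'
  [[ $1 =~ $re ]]
}

for name in file1_02_01_2021_002244.mp4 otherfile1; do
  if is_dated_mp4 "$name"; then
    echo "MATCH: $name"
  else
    echo "no match: $name"
  fi
done
```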

Correct way of quoting command substitution

I have simple bash script which only outputs the filenames that are given to the script as positional arguments:
#!/usr/bin/env bash
for file; do
echo "$file"
done
Say I have files with spaces (say "f 1" and "f 2"). I can call the script with a wildcard and get the expected output:
$ ./script f*
> f 1
> f 2
But if I use command substitution it doesn't work:
$ ./script $(echo f*)
> f
> 1
> f
> 2
How can I get the quoting right when my command substitution outputs multiple filenames with spaces?
Edit: What I ultimately want is to pass filenames to a script (that is slightly more elaborate than just echoing their names) in a random order, e.g. something like this:
./script $(ls f* | shuf)
With GNU shuf and Bash 4.4+:
readarray -d '' files < <(shuf --zero-terminated --echo f*)
./script "${files[@]}"
where the --zero-terminated can handle any filenames, and readarray also uses the null byte as the delimiter.
With older Bash where readarray doesn't support the -d option:
while IFS= read -r -d '' f; do
files+=("$f")
done < <(shuf --zero-terminated --echo f*)
./script "${files[@]}"
In extreme cases with many files, this might run into command line length limitations; in that case,
shuf --zero-terminated --echo f*
could be replaced by
printf '%s\0' f* | shuf --zero-terminated
Hat tip to Socowi for pointing out --echo.
It's very difficult to get this completely correct. A simple attempt would be to use %q specifier to printf, but I believe that is a bashism. You still need to use eval, though. eg:
$ cat a.sh
#!/bin/sh
for x; do echo $((i++)): "$x"; done
$ ./a.sh *
0: a.sh
1: name
with
newlines
2: name with spaces
$ eval ./a.sh $(printf "%q " *)
0: a.sh
1: name
with
newlines
2: name with spaces
This feels like an XY Problem. Maybe you should explain the real problem, someone might have a much better solution.
Nonetheless, working with what you posted, I'd say read this page on why you shouldn't try to parse ls as it has relevant points; then I suggest an array.
lst=(f*)
./script "${lst[@]}"
This will still fail if you reparse it as the output of a subshell, though -
./script $( echo "${lst[@]}" ) # still fails same way
./script "$( echo "${lst[@]}" )" # *still* fails same way
Thinking about how we could make it work...
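One way to see what each form actually passes is to count the arguments. A minimal sketch, with count_args as a hypothetical stand-in for the script and a hard-coded array standing in for the files matched by f*:

```shell
#!/usr/bin/env bash
# count_args (hypothetical stand-in for ./script): print how many
# arguments were received.
count_args() { echo "$#"; }

names=("f 1" "f 2")                  # stand-ins for the files matched by f*

count_args "${names[@]}"             # prints 2: each name stays one word
count_args $(echo "${names[@]}")     # prints 4: the output is split on spaces
count_args "$(echo "${names[@]}")"   # prints 1: everything is a single word
```

Only the quoted array expansion keeps the original word boundaries; once the names have passed through echo, the information about where one name ends and the next begins is gone.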
You can use xargs:
$ ls -l
total 4
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 1'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 2'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 3'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 4'
-rwxr-xr-x 1 root root 35 2021-08-13 00:25 script
$ ./script *file*
file 1
file 2
file 3
file 4
$ ls *file* | shuf | xargs -d '\n' ./script
file 4
file 2
file 1
file 3
If your xargs does not support -d:
$ ls *file* | shuf | tr '\n' '\0' | xargs -0 ./script
file 3
file 1
file 4
file 2

Increment of multiple file prefixes?

I am looking for a way in Bash to rename my file prefixes.
These files are all in one folder. No other files will be in it.
00 - Artist - Title.mp3
01 - Artist - Title.mp3
... and so on
to
01 - Artist - Title.mp3
02 - Artist - Title.mp3
... and so on
The prefix can also be only a single (0, 1, 2, ...), double(00, 01, 02, ...), triple, ... prefixes.
Perl solution:
perl -we 'for (@ARGV) {
my ($n, $r) = /^([0-9]+)(.*)/;
rename $_, sprintf("%0" . length($n) . "d", 1 + $n) . $r;
}' *.mp3
The regular expression match extracts the number to $n and the rest to $r.
$n + 1 is then formatted by sprintf to be zero padded, having the same length as the original number.
Note that it changes the length of the number for 9, 99, etc.
It's risky business, but here's a sh solution that seems to work:
ls *.mp3 | sort -rn | while read f
do
number=`echo "$f" | sed 's/ .*//'`
rest=`echo "$f" | sed 's/^[^ ]* //'`
number2=`expr $number + 1`
number2f=`printf %02d $number2`
mv -i "$number $rest" "$number2f $rest"
done
sort -rn so that it won't try to overwrite anything if there are adjacently-numbered files with the same artist and title (which probably won't happen, although it does if I take your example literally).
mv -i so it will ask you before it overwrites anything if there are any of those cases that manage to come up anyway.
If you have a cleaner way you like to break things like $f up into $number and $rest, be my guest.
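One possibly cleaner split, sketched in Bash rather than plain sh, uses regex matching and keeps the original prefix width. bump_prefix is a hypothetical name; like the Perl version, it widens the prefix when 9 or 99 rolls over:

```shell
#!/usr/bin/env bash
# bump_prefix (hypothetical helper): print the name with its leading
# number increased by one, zero-padded to the original width.
bump_prefix() {
  local name=$1
  if [[ $name =~ ^([0-9]+)(.*)$ ]]; then
    local n=${BASH_REMATCH[1]} rest=${BASH_REMATCH[2]}
    # ${#n} is the original width; 10# stops 08 and 09 being read as octal
    printf '%0*d%s\n' "${#n}" "$((10#$n + 1))" "$rest"
  else
    printf '%s\n' "$name"   # no numeric prefix: name is unchanged
  fi
}

# Dry run, highest number first so nothing is overwritten (this assumes
# all prefixes have the same width, so glob order equals numeric order):
# files=(*.mp3)
# for ((i=${#files[@]}-1; i>=0; i--)); do
#   echo mv -i -- "${files[i]}" "$(bump_prefix "${files[i]}")"
# done
```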

Bash: test mutual equality of multiple variables?

What is the right way to test if several variables are all equal?
if [[ $var1 = $var2 = $var3 ]] # syntax error
Is it necessary to write something like the following?
if [[ $var1 = $var2 && $var1 = $var3 && $var2 = $var3 ]] # cumbersome
if [[ $var1 = $var2 && $var2 = $var3 && $var3 = $var4 ]] # somewhat better
Unfortunately, the otherwise excellent Advanced Bash Scripting Guide and other online sources I could find don't provide such an example.
My particular motivation is to test if several directories all have the same number of files, using ls -1 $dir | wc -l to count files.
Note
"var1" etc. are example variables. I'm looking for a solution for arbitrary variable names, not just those with a predictable numeric ending.
Update
I've accepted Richo's answer, as it is the most general. However, I'm actually using Kyle's because it's the simplest and my inputs are guaranteed to avoid the caveat.
Thanks for the suggestions, everyone.
If you want to test the equality of an arbitrary number of items (let's call them $item1 through $item5, but they could just as well be an array):
st=0
for i in $item2 $item3 $item4 $item5; do
[ "$item1" = "$i" ]
st=$(( $? + st ))
done
if [ $st -eq 0 ]; then
echo "They were all the same"
fi
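The same loop can be wrapped in a function that takes any number of values and stops at the first mismatch. This is a sketch; all_equal is a hypothetical name:

```shell
#!/usr/bin/env bash
# all_equal (hypothetical helper): succeed if every argument is the same
# string as the first one. Zero or one argument counts as "all equal".
all_equal() {
  (( $# < 2 )) && return 0
  local first=$1 item
  shift
  for item in "$@"; do
    # quoting "$first" forces a literal comparison, not a pattern match
    [[ $item == "$first" ]] || return 1
  done
  return 0
}

# Usage: if all_equal "$var1" "$var2" "$var3"; then echo "all the same"; fi
```

Because the values are passed as separate quoted arguments, they may contain spaces or glob characters.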
If they are single words you can get really cheap about it.
varUniqCount=`printf '%s\n' "${var1}" "${var2}" "${var3}" "${var4}" | sort -u | wc -l`
if [ ${varUniqCount} -gt 1 ]; then
echo "Do not match"
fi
Transitive method of inspection.
#!/bin/bash
var1=10
var2=10
var3=10
if [[ ($var1 == $var2) && ($var2 == $var3) ]]; then
echo "yay"
else
echo "nay"
fi
Output:
[jaypal:~/Temp] ./s.sh
yay
Note:
Since you have stated in your question that your objective is to test whether several directories have the same number of files, I thought of the following solution. I know this isn't something you requested, so please feel free to disregard it.
Step1:
Identify the number of files in a given directory. This command will look inside sub-directories too, but that can be controlled using the -maxdepth option of find.
[jaypal:~/Temp] find . -type d -exec sh -c "printf {} && ls -1 {} | wc -l " \;
. 9
./Backup 7
./bash 2
./GTP 22
./GTP/ParserDump 11
./GTP/ParserDump/ParserDump 1
./perl 7
./perl/p1 2
./python 1
./ruby 0
./scripts 22
Step2:
This can be combined with Step1 as we are just redirecting the content to a file.
[jaypal:~/Temp] find . -type d -exec sh -c "printf {} && ls -1 {} | wc -l " \; > file.temp
Step3:
Using the following command we will look in the file.temp twice and it will give us a list of directories that have same number of files.
[jaypal:~/Temp] awk 'NR==FNR && a[$2]++ {b[$2];next} ($2 in b)' file.temp file.temp | sort -k2
./GTP/ParserDump/ParserDump 1
./python 1
./bash 2
./perl/p1 2
./Backup 7
./perl 7
./GTP 22
./scripts 22
(edited to include delimiters to fix the problem noted by Keith Thompson)
Treating the variable values as strings, you can concatenate them along with a suitable delimiter and do one comparison:
if [[ "$var1|$var2|$var3" = "$var1|$var1|$var1" ]]
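I used = rather than ==, though inside [[ ]] the two are equivalent. Both perform pattern matching when the right-hand side is unquoted, which is why the right-hand side is quoted here: that forces a literal string comparison.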
For your specific case, this should work:
distinct_values=$(for dir in this_dir that_dir another_dir ; do ls -l "$dir" | wc -l ; done | uniq | wc -l)
if [ $distinct_values -eq 1 ] ; then
echo All the same
else
echo Not all the same
fi
Explanation:
ls -l "$dir" lists the files and subdirectories in the directory, one per line (omitting dot files).
Piping the output through wc -l gives you the number of files in the directory.
Doing that consecutively for each directory in the list gives you a list consisting of the number of files in each directory; if there are 7 in each, this gives 3 lines each consisting of the number 7
Piping that through uniq eliminates consecutive duplicate lines.
Piping that through wc -l gives you the number of distinct lines, which will be 1 if and only if all the directories contain the same number of files.
Note that the output of the 4th stage doesn't necessarily give you the number of distinct numbers of files in the directories; uniq only removes adjacent duplicates, so if the inputs are 7 6 7, the two 7s won't be merged. But it will merge all lines into 1 only if they're all the same.
This is the power of the Unix command line: putting small tools together to do interesting and useful things. (Show me a GUI that can do that!)
For values stored in variables, replace the first line by:
distinct_values=$(echo "$this_var" "$that_var" "$another_var" | fmt -1 | uniq | wc -l)
This assumes that the values of the variables don't contain spaces.
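Counting lines of ls output miscounts names that contain newlines. A possible alternative is to let the shell expand a glob and count the resulting words. This is a sketch; count_entries is a hypothetical name and, like plain ls, it ignores dot files:

```shell
#!/usr/bin/env bash
# count_entries (hypothetical helper): print the number of non-hidden
# entries in a directory. The body runs in a subshell, so the nullglob
# setting and the positional parameters do not leak into the caller.
count_entries() (
  shopt -s nullglob     # an empty directory expands to nothing
  set -- "$1"/*         # one positional parameter per entry
  echo "$#"
)

# Usage: count_entries this_dir
```

The counts for several directories can then be compared with any of the equality tests shown above.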
