Add leading zeros to integer section of filename - bash

I have folders of images with names like HolidaySnapsJune-1.tif, HolidaySnapsMay-12.tif and HolidaySnaps2018-005.tif
I want to add one leading 0 to the integer section of the filename if it is 2 digits long, and I want to add two leading 00s if it is just one digit long.
I have tried variations of
find . -name '*\_[0-9][0-9].tif' -exec sh -c '
for fpath do
echo mv "$fpath" "${fpath%/*}/${fpath##*/}"
done' _ {} +
But these put the leading zeros in front of the full file name instead of in front of the integer section.
I would love to do this in a bash script that works recursively on folders, so it's important that the differences in the names preceding the '-' are ignored or worked around.
I'm on Windows and only have access to whatever is built into git-bash, so bash, sed, awk etc.

You could use the rename.ul command from util-linux.
rename [options] expression replacement file...
replaces the first occurrence of expression with replacement in all names of files passed to the command.
Assuming your filenames contain exactly one hyphen -, you could simply run both of the following commands in a shell that supports the **/* glob syntax (alternatively, use find with the -exec option or something alike) to recursively rename all files:
rename.ul -- - -00 **/*-?.tif
rename.ul -- - -0 **/*-??.tif
There are several options to rename.ul to prevent you from accidentally renaming unintended files (Watch out! The consequences could be quite drastic):
-v, --verbose
Show which files were renamed, if any.
-n, --no-act
Do not make any changes; add --verbose to see what would
be made.
-i, --interactive
Ask before overwriting existing files.
So you could either run the commands with the -nv options to perform a dry-run and see what changes the program would make, or add -i to be asked for confirmation each time a file would be renamed.

If you don't want to use non-standard commands and would rather write a small script, this would be one way to do it.
while read -r line; do
num=$(sed 's/\..*//' <<<${line/*-})
printf -v new_name '%s-%03d.%s' "${line/-*}" "${num}" "${line/*\.}"
mv -v "${line}" "${new_name}"
done < <(printf '%s\n' HolidaySnapsJune-1.tif HolidaySnapsMay-12.tif)
Using HolidaySnapsJune-1.tif for explanation below:
${line/*-} removes everything up to and including the dash -, leaving 1.tif
${line/*\.} removes everything besides the extension.
sed 's/\..*//' <<<${line/*-} also removes everything after the first period ., so now we have simply 1
'%s-%03d.%s' the %03d part of that tells printf to print digits with leading zeroes up to 3 digits.
I used while read because it is easy to mock up with. You will probably want to feed the loop from a find command or something similar.
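As a rough sketch of how the same idea could be fed from find, the padding can be done with parameter expansion alone, applied only to the file name so that hyphens in folder names are left alone. The function name pad_path is made up for illustration, and the sketch assumes names of the form <anything>-<digits>.<ext>.

```bash
#!/usr/bin/env bash
# pad_path PATH: print PATH with the digits after the last "-" in the
# file name zero-padded to three places. Paths that do not look like
# "<anything>-<digits>.<ext>" are printed unchanged.
pad_path() {
    local path=$1 dir="" name stem ext num
    [[ $path == */* ]] && dir=${path%/*}/   # keep the directory part as-is
    name=${path##*/}
    stem=${name%.*}     # e.g. HolidaySnapsMay-12
    ext=${name##*.}     # e.g. tif
    num=${stem##*-}     # e.g. 12
    if [[ $name != *-*.* || ! $num =~ ^[0-9]+$ ]]; then
        printf '%s\n' "$path"
        return 0
    fi
    # 10# forces base 10 so that 08 or 005 is not read as octal
    printf '%s%s-%03d.%s\n' "$dir" "${stem%-*}" "$((10#$num))" "$ext"
}

# Dry run over a tree (remove "echo" to rename for real):
# find . -type f -name '*-*.tif' -print0 |
#     while IFS= read -r -d '' f; do echo mv -- "$f" "$(pad_path "$f")"; done
```

Names that already have three digits come back unchanged, so a real run would want to skip the mv when the old and new names are equal.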

So, after looking through the answers submitted here and elsewhere on SO this is what I came up with:
find . -name '*\-[0-9][0-9].tif' -exec sh -c '
for f do
mv "$f" "${f//\-/\-0}";
echo "$f"
done' _ {} +
This works on files with two digits in the integer section, bringing them up to 3 digits. For single digit files I alter slightly and run again.
One nice thing about this script is that it works on subfolders.
I do have to admit to not understanding it completely. I have no real idea why done is followed by ' _ {} + . I guess that's the next thing I'll have to look up :-).
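On the trailing ' _ {} + part, a small experiment may help. The closing quote ends the script text handed to sh -c, and whatever follows is passed to that script as arguments: the first one lands in $0 (so _ is just a throwaway placeholder), and find replaces {} + with as many matched paths as fit on one command line. The file names below are invented stand-ins for what find would supply.

```bash
# Everything after the script string becomes that script's arguments:
# the first fills $0 (hence the throwaway "_"), the rest are $1, $2, ...
args_seen=$(sh -c '
printf "%s\n" "\$0=$0"
for f do            # "for f do" is shorthand for: for f in "$@"; do
    printf "%s\n" "arg=$f"
done' _ "a-12.tif" "sub dir/b-34.tif")

printf '%s\n' "$args_seen"
```

Without the placeholder, the first file name would be swallowed by $0 and the loop would silently skip it.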

Related

how list just one file from a (bash) shell directory listing

A bit lowly a query but here goes:
bash shell script. POSIX, Mint 21
I just want one/any (mp3) file from a directory. As a sample.
In normal execution, a full run, the code would be such
for f in *.mp3; do
#statements
done
This works fine but if I wanted to sample just one file of such an array/glob (?) without looping, how might I do that? I don't care which file, just that it is an mp3 from the directory I am working in.
Should I just start this for-loop and then exit (break) after one statement, or is there a neater, more tailored-for-the-job way?
for f in *.mp3; do
#statement
break
done
Ta (can not believe how dopey I feel asking this one, my forehead will hurt when I see the answers )
Since you are using Linux (Mint) you've got GNU find so one way to get one .mp3 file from the current directory is:
mp3file=$(find . -maxdepth 1 -mindepth 1 -name '*.mp3' -printf '%f' -quit)
-maxdepth 1 -mindepth 1 causes the search to be restricted to one level under the current directory.
-printf '%f' prints just the filename (e.g. foo.mp3). The -print option would print the path to the filename (e.g. ./foo.mp3). That may not matter to you.
-quit causes find to exit as soon as one match is found and printed.
Another option is to use the Bash : (colon) command and $_ (dollar underscore) special variable:
: *.mp3
mp3file=$_
: *.mp3 runs the : command with the list of .mp3 files in the current directory as arguments. The : command ignores its arguments and does nothing.
mp3file=$_ sets the value of the mp3file variable to the last argument supplied to the previous command (:).
The second option should not be used if the number of .mp3 files is large (hundreds or more) because it will find all of the files and sort them by name internally.
In both cases $mp3file should be checked to ensure that it really exists (e.g. [[ -e $mp3file ]]) before using it for anything else, in case there are no .mp3 files in the directory.
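A closely related sketch, not taken from the answer above: the positional parameters can hold the expanded glob, which yields the first match rather than the last and stays within POSIX sh. The function name first_mp3 is made up for illustration.

```bash
# first_mp3: print the first *.mp3 name in the current directory,
# or return 1 if the glob matched nothing.
first_mp3() {
    set -- *.mp3             # only this function's positional parameters change
    [ -e "$1" ] || return 1  # an unmatched glob stays as the literal string *.mp3
    printf '%s\n' "$1"
}

if mp3file=$(first_mp3); then
    printf 'sample file: %s\n' "$mp3file"
else
    printf 'no .mp3 files here\n' >&2
fi
```

Like the : *.mp3 approach, this still expands and sorts the whole glob, so the same caveat about very large directories applies.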
I would do it like this in POSIX shell:
mp3file=
for f in *.mp3; do
if [ -f "$f" ]; then
mp3file=$f
break
fi
done
# At this point, the variable mp3file contains a filename which
# represents a regular file (or a symbolic link) with the .mp3
# extension, or empty string if there is no such a file.
The fact that you use
for f in *.mp3; do
suggests to me that the MP3s are named without too many strange characters in the filename.
In that case, if you really don't care which MP3, you could:
f=$(ls *.mp3|head -n 1)
statement
Or, if you want a different one every time:
f=$(ls *.mp3|sort -R | tail -1)
Note: if your filenames get more complicated (including spaces or other special characters), this will not work anymore.
Assuming you don't have spaces in your filenames, (and I don't understand why the collective taboo is against using ls in scripts at all, rather than not having spaces in filenames, personally) then:-
ls *.mp3 | tr ' ' '\n' | sed -n '1p'

How to remove unknown file extensions from files using script

I can remove file extensions if I know the extensions, for example to remove .txt from files:
foreach file (`find . -type f`)
mv $file `basename $file .txt`
end
However if I don't know what kind of file extension to begin with, how would I do this?
I tried:
foreach file (`find . -type f`)
mv $file `basename $file .*`
end
but it wouldn't work.
What shell is this? At least in bash you can do:
find . -type f | while read -r; do
mv -- "$REPLY" "${REPLY%.*}"
done
(The usual caveats apply: This doesn't handle files whose name contains newlines.)
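One more caveat worth checking before running this on a real tree: ${REPLY%.*} cuts at the last dot anywhere in the path, so an extensionless file inside a directory such as ./v1.2 would be affected too. A minimal sketch that applies the expansion to the file name only follows; strip_ext is a hypothetical helper name.

```bash
# strip_ext PATH: print PATH with the last extension of the file name
# removed. Directory components and dot-files such as .bashrc are kept.
strip_ext() {
    local path=$1 dir="" base
    [[ $path == */* ]] && dir=${path%/*}/
    base=${path##*/}
    case $base in
        ?*.*) base=${base%.*} ;;   # needs at least one character before the dot
    esac
    printf '%s%s\n' "$dir" "$base"
}

strip_ext ./docs/report.final.txt
strip_ext ./v1.2/README
```

Inside the loop above, the rename would then read mv -- "$REPLY" "$(strip_ext "$REPLY")", guarded by a check that the two names differ.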
You can use sed to compute base file name.
foreach file (`find . -type f`)
mv $file `echo $file | sed -e 's/^\(.*\)\.[^.]\+$/\1/'`
end
Be cautious: The command you seek to run could cause loss of data!
If you don't think your file names contain newlines or double quotes, then you could use:
find . -type f -name '?*.*' |
sed 's/\(.*\)\.[^.]*$/mv "&" "\1"/' |
sh
This generates your list of files (making sure that the names contain at least one character plus a .), runs each file name through the sed script to convert it into an mv command by effectively removing the material from the last . onwards, and then running the stream of commands through a shell.
Clearly, you test this first by omitting the | sh part. Consider running it with | sh -x to get a trace of what the shell's doing. Consider making sure you capture the output of the shell, standard output and standard error, into a log file so you've got a record of the damage that occurred.
Do make sure you've got a backup of the original set of files before you start playing with this. It need only be a tar file stored in a different part of the directory hierarchy, and you can remove it as soon as you're happy with the results.
You can choose any shell; this doesn't rely on any shell constructs except pipes and single quotes and double quotes (pretty much common to all shells), and the sed script is version neutral too.
Note that if you have files xyz.c and xyz.h before you run this, you'll only have a file xyz afterwards (and what it contains depends on the order in which the files are processed, which needn't be alphabetic order).
If you think your file names might contain double quotes (but not single quotes), you can play with changing the quotes in the sed script. If you might have to deal with both, you need a more complex sed script. If you need to deal with newlines in file names, then it is time to (a) tell your user(s) to stop being silly and (b) fix the names so they don't contain newlines. Then you can use the script above. If that isn't feasible, you have to work a lot harder to get the job done accurately: you probably need to make sure you've got a find that supports -print0, a sed that supports -z and an xargs that supports -0 (installing the most recent GNU versions if you don't already have the right support in place).
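For the awkward-name case, one possible shape is sketched below. It swaps the sed -z and xargs -0 stages for a bash read -d '' loop, so it assumes bash plus a find with -print0, and it refuses to overwrite an existing target, which sidesteps the xyz.c / xyz.h collision described above. The function name strip_extensions is invented for this example.

```bash
# strip_extensions ROOT: remove the last extension from every regular
# file under ROOT. NUL-delimited names keep quotes and newlines intact.
strip_extensions() {
    local root=$1 f new
    find "$root" -type f -name '?*.*' -print0 |
    while IFS= read -r -d '' f; do
        new=${f%.*}     # safe here: -name guarantees a dot in the file name
        if [[ -e $new ]]; then
            printf 'skip (target exists): %s\n' "$f" >&2
        else
            mv -- "$f" "$new"
        fi
    done
}

# Usage, after taking a backup:
# strip_extensions /path/to/tree
```

Because the renames happen while find is still walking the tree, a file such as a.tar.gz may or may not be visited a second time as a.tar; collecting the list first would make that deterministic.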
It's very simple:
$ set filename=/home/foo/bar.dat
$ echo ${filename:r}
/home/foo/bar
See more in man tcsh, in "History substitution":
r
Remove a filename extension '.xxx', leaving the root name.

How to add leading zero's to sequential file names

I have image files that are created with file names like these:
Name of file-1.jpg
Name of file-2.jpg
Name of file-3.jpg
Name of file-4.jpg
..etc
This causes problems for sorting between Windows and Cygwin Bash. When I process these files in Cygwin Bash, they get processed out of order because the Windows file system and Cygwin Bash sort them differently. However, if the files get manually renamed and numbered with leading zeroes, this issue isn't a problem. How can I use Bash to rename these files automatically so I don't have to manually process them? I'd like to add a few lines of code to my Bash script to rename them and add the leading zeroes before they are processed by the rest of the script.
Since I use this Bash script interchangeably between Windows Cygwin and Mac, I would like something that works in both environments, if possible. Also all files will have names with spaces.
You could use something like this:
files="*.jpg"
regex="(.*-)(.*)(\.jpg)"
for f in $files
do
if [[ "$f" =~ $regex ]]
then
number=`printf %03d ${BASH_REMATCH[2]}`
name="${BASH_REMATCH[1]}${number}${BASH_REMATCH[3]}"
mv "$f" "${name}"
fi
done
Put that in a script, like rename.sh, and run that in the folder where you want to convert the files. Modify as necessary...
Shamelessly ripped from here:
Capturing Groups From a Grep RegEx
and here:
How to Add Leading Zeros to Sequential File Names
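A slightly hardened variant of the regex approach, offered as a sketch rather than a drop-in replacement: it only touches names whose last hyphen is followed by digits, and it forces base 10 so that a number like 08 is not rejected by printf as invalid octal. The function name zero_pad_name is made up for this example.

```bash
# zero_pad_name NAME: print NAME with the digits between the last "-"
# and the extension padded to three places; other names are unchanged.
zero_pad_name() {
    local regex='^(.*-)([0-9]+)(\.[^.]+)$'
    if [[ $1 =~ $regex ]]; then
        printf '%s%03d%s\n' "${BASH_REMATCH[1]}" \
            "$((10#${BASH_REMATCH[2]}))" "${BASH_REMATCH[3]}"
    else
        printf '%s\n' "$1"
    fi
}

# Possible use inside the script (spaces in names are fine):
# for f in *.jpg; do
#     new=$(zero_pad_name "$f")
#     [[ $new != "$f" ]] && mv -- "$f" "$new"
# done
```

Iterating over the glob directly, instead of over an unquoted $files variable, also keeps names with spaces in one piece.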
#!/bin/bash
#cygcheck (cygwin) 2.3.1
#GNU bash, version 4.3.42(4)-release (i686-pc-cygwin)
namemodify()
{
bname="${1##*/}"
dname="${1%/*}"
mv "$1" "${dname}/00${bname}" # Add any number of leading zeroes.
}
export -f namemodify
find . -type f -iname "*jpg" -exec bash -c 'namemodify "$1"' _ {} \;
I hope this won't break on Mac too :) good luck

Bash: Check all files in a location against another for existence

I'm after a little help with some Bash scripting (on OSX). I want to create a script that takes two parameters - source folder and target folder - and checks all files in the source hierarchy to see whether or not they exist in the target hierarchy. i.e. Given a data DVD check whether the files contained on it are already on the internal drive.
What I've come up with so far is
#!/bin/bash
if [ $# -ne 2 ]
then
echo "Usage is command sourcedir targetdir"
exit 0
fi
source="$1"
target="$2"
for f in "$( find $source -type f -name '*' -print )"
do
I'm now not sure how it's best to obtain the filename without its path and then see if it exists. I am really a beginner at scripting.
Edit: The answers given so far are all very efficient in terms of compact code. However I need to be able to look for files found within the total source hierarchy anywhere within the target hierarchy. If found I would like to compare checksums and last modified dates etc and comment or, if not found, I would like to note this. The purpose is to check whether files on external media have been uploaded to a file server.
This should give you some ideas:
#!/bin/bash
DIR1="tmpa"
DIR2="tmpb"
function sorted_contents
{
cd "$1"
find . -type f | sort
}
DIR1_CONTENTS=$(sorted_contents "$DIR1")
DIR2_CONTENTS=$(sorted_contents "$DIR2")
diff -y <(echo "$DIR1_CONTENTS") <(echo "$DIR2_CONTENTS")
In my test directories, the output was:
[user@host so]$ ./dirdiff.sh
./address-book.dat ./address-book.dat
./passwords.txt ./passwords.txt
./some-song.mp3 <
./the-holy-grail.info ./the-holy-grail.info
> ./victory.wav
./zzz.wad ./zzz.wad
If it's not clear, "some-song.mp3" was only in the first directory while "victory.wav" was only in the second. The rest of the files were common.
Note that this only compares the file names, not the contents. If you like where this is headed, you could play with the diff options (maybe --suppress-common-lines if you want cleaner output).
But this is probably how I'd approach it -- offload a lot of the work onto diff.
EDIT: I should also point out that something as simple as:
[user@host so]$ diff tmpa tmpb
would also work:
Only in tmpa: some-song.mp3
Only in tmpb: victory.wav
... but not feel as satisfying as writing a script yourself. :-)
To list only files in $source_dir that do not exist in $target_dir:
comm -23 <(cd "$source_dir" && find .|sort) <(cd "$target_dir" && find .|sort)
You can limit it to just regular files with -type f on the find commands, etc.
The comm command (short for "common") finds lines in common between two text files and outputs three columns: lines only in the first file, lines only in the second file, and lines common to both. The numbers suppress the corresponding column, so the output of comm -23 is only the lines from the first file that don't appear in the second.
The process substitution syntax <(command) is replaced by the pathname to a named pipe connected to the output of the given command, which lets you use a "pipe" anywhere you could put a filename, instead of only stdin and stdout.
The commands in this case generate lists of files under the two directories - the cd makes the output relative to the directories being compared, so that corresponding files come out as identical strings, and the sort ensures that comm won't be confused by the same files listed in different order in the two folders.
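Wrapped up as a function, the same pipeline might look like the sketch below. The name only_in_source is invented here, and LC_ALL=C is an added assumption so that sort and comm agree on ordering regardless of locale.

```bash
# only_in_source SRC DST: list regular files (relative paths) that
# exist under SRC but not at the same relative path under DST.
only_in_source() {
    LC_ALL=C comm -23 \
        <(cd "$1" && find . -type f | LC_ALL=C sort) \
        <(cd "$2" && find . -type f | LC_ALL=C sort)
}

# Usage:
# only_in_source /Volumes/DVD "$HOME/Documents"
```

This compares relative paths, so a file that was moved to a different folder on the target still shows up as missing.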
A few remarks about the line for f in "$( find $source -type f -name '*' -print )":
Make that "$source". Always use double quotes around variable substitutions. Otherwise the result is split into words that are treated as wildcard patterns (a historical oddity in the shell parsing rules); in particular, this would fail if the value of the variable contains spaces.
You can't iterate over the output of find that way. Because of the double quotes, there would be a single iteration through the loop, with $f containing the complete output from find. Without double quotes, file names containing spaces and other special characters would trip the script.
-name '*' is a no-op, it matches everything.
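The single-iteration effect from the second remark is easy to see in a throwaway directory. The sketch below counts loop iterations for the quoted form and for a NUL-delimited read loop, which is one common bash way to iterate over find output safely; the variable names are made up for the demonstration.

```bash
demo_dir=$(mktemp -d)
touch "$demo_dir/one file.txt" "$demo_dir/two.txt"

# Quoted command substitution: the whole find output is ONE word.
quoted_count=0
for f in "$(find "$demo_dir" -type f)"; do
    quoted_count=$((quoted_count + 1))
done

# NUL-delimited read loop: one iteration per file, spaces intact.
read_count=0
while IFS= read -r -d '' f; do
    read_count=$((read_count + 1))
done < <(find "$demo_dir" -type f -print0)

printf 'quoted: %d iteration(s), read loop: %d iteration(s)\n' \
    "$quoted_count" "$read_count"
rm -rf "$demo_dir"
```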
As far as I understand, you want to look for files by name independently of their location, i.e. you consider /dvd/path/to/somefile to be a match to /internal-drive/different/path-to/somefile. So make a list of files on each side indexed by name. You can do this by massaging the output of find a little. The code below can cope with any character in file names except newlines.
list_files () {
find . -type f -print |
sed 's:^\(.*\)/\(.*\)$:\2/\1/\2:' |
sort
}
source_files="$(cd "$1" && list_files)"
dest_files="$(cd "$2" && list_files)"
join -t / -v 1 <(echo "$source_files") <(echo "$dest_files") |
sed 's:^[^/]*/::'
The list_files function generates a list of file names with paths, and prepends the file name in front of the files, so e.g. /mnt/dvd/some/dir/filename.txt will appear as filename.txt/./some/dir/filename.txt. It then sorts the files.
The join command prints out lines like filename.txt/./some/dir/filename.txt when there is a file called filename.txt in the source hierarchy but not in the destination hierarchy. We finally massage its output a little since we no longer need the filename at the beginning of the line.
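The edit to the question also asks about comparing checksums once a name match is found. One rough way to sketch that, assuming bash and the POSIX cksum utility, is shown below. The function name check_uploaded and the same/differs/missing labels are invented for this example, and because the file name is handed to find -name as a pattern, names containing *, ? or [ would need escaping first.

```bash
# check_uploaded SRC DST: for every regular file under SRC, look for a
# file with the same name anywhere under DST and compare checksums.
# Prints one line per source file: same, differs or missing, then the path.
check_uploaded() {
    local src=$1 dst=$2 f name sum match found
    find "$src" -type f -print0 |
    while IFS= read -r -d '' f; do
        name=${f##*/}
        sum=$(cksum < "$f")     # CRC and size only, no file name
        found=missing
        while IFS= read -r -d '' match; do
            found=differs       # name exists; content not yet confirmed
            if [[ $(cksum < "$match") == "$sum" ]]; then
                found=same
                break
            fi
        done < <(find "$dst" -type f -name "$name" -print0)
        printf '%s\t%s\n' "$found" "$f"
    done
}

# Usage:
# check_uploaded /Volumes/DVD /Volumes/FileServer
```

Running find once per source file is slow on large trees; the join-based listing above scales better, with the checksum applied only to the pairs it reports.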

Recursive batch rename

I have several folders with some files that I would like to rename from
Foo'Bar - Title
to
Title
I'm using OS X 10.7. I've looked at other solutions, but none that address recursion very well.
Any suggestions?
There are two parts to your problem: Finding files to operate on recursively, and renaming them.
For the first, if everything is exactly one level below the current directory, you can just list the contents of every directory in the current directory (as in Mattias Wadman's answer above), but more generally (and possibly easier to understand, to boot), you can just use the find command.
For the second, you can use sed and work out how to get the quoting and piping right (which you should definitely eventually learn), but it's much simpler to use the rename command. Unfortunately, this one isn't built in on Mac, but you can install it with, e.g., Homebrew, or just download the perl script and sudo install -m755 rename /usr/local/bin/rename.
So, you can do this:
find . -exec rename 's|[^/]* - ||' {} +
If you want to do a "dry run" to make sure it's right, add the "-n" flag to rename:
find . -exec rename -n 's|[^/]* - ||' {} +
To understand how it works, you really should read the tutorial for find, and the manpage for rename, but breaking it down:
find . means 'find all files recursively under the current directory'.
You can add additional tests to filter things (e.g., -type f if you want to skip everything but regular files, or -name '*Title' if you want to only change files that end in 'Title'), but that isn't necessary for your use.
-exec … + means to batch up the found files, and pass as many of them as possible in place of any {} in the command that appears in the '…'.
rename 's|[^/]* - ||' {} means for each file in {}, apply the perl expression s|[^/]* - || to the filename, and, if the result is different, rename it to that result.
s|[^/]* - || means to match the regular expression '[^/]* - ' and replace the match with '' (the empty string).
[^/]* - means to match any string of non-slash characters that ends with ' - '. So, in './A/FooBar - Title', it'll match the 'FooBar - '.
I should mention that, when I have something complicated to do like this, if after a few minutes and a couple attempts to get it right with find/sed/awk/rename/etc., I still haven't got it, I often just code it up imperatively with Python and os.walk. If you know Python, that might be easier for you to understand (although more verbose and less simple), and easier for you to modify to other use cases, so if you're interested, ask for that.
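If installing rename is not an option, the substitution can be approximated in plain bash. This is only a sketch under the assumption that just the final path component should change. Unlike the rename expression above, it never touches a directory name that happens to contain ' - '. The function name strip_prefix is made up for illustration.

```bash
# strip_prefix PATH: print PATH with everything up to the last " - "
# removed from the file name; the directory part is left untouched.
strip_prefix() {
    local path=$1 dir="" base
    [[ $path == */* ]] && dir=${path%/*}/
    base=${path##*/}
    printf '%s%s\n' "$dir" "${base##* - }"
}

# Dry run over a tree (remove "echo" to rename for real):
# find . -depth -name '* - *' -print0 |
#     while IFS= read -r -d '' f; do echo mv -- "$f" "$(strip_prefix "$f")"; done
```

The -depth option makes find report a directory's contents before the directory itself, so renaming a folder does not invalidate paths that are still queued.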
Try this:
ls -1 * | while read f ; do mv "$f" "`echo $f | sed 's/^.* - //'`" ; done
I recommend adding an echo before mv before running it, to make sure the commands look OK. And as abarnert noted in the comments, this command will only work for one directory at a time.
Detailed explanation of the various commands:
ls -1 * will output a line for each file (and directory) in the current directory (except .-files). So this will be expanded in to ls -1 file1 file2 ..., -1 to ls tells it to list the filename only and one file per line.
The output is then piped into while read f ; ... ; done which will loop while read f returns zero, which it does until it reaches end of file. read f reads one line at a time from standard input (which in this case is the output from ls -1 ...) and stores it in the variable specified, in this case f.
In the while loop we run a mv command with two arguments, first "$f" as the source file (note the quotes to handle filenames with spaces etc) and second the destination filename, which uses sed and ` (backticks) to do what is called command substitution: the command inside the backticks is run and the whole expression is replaced with its standard output.
echo $f | sed 's/^.* - //' pipes the current file $f into sed that will match a regular expression and do substitution (the s in s/) and output the result on standard output. The regular expression is ^.* - which will match from the start of the string ^ (called anchoring) and then any characters .* followed by - and replace it with the empty string (the string between //).
I know you asked for batch rename, but I suggest you use Automator.
It works perfectly, and if you create it as a service you will have the option in your contextual menu :)
After some trial and error, I came across this solution that worked for me to solve the same problem.
find <dir> -name '*.<oldExt>' -exec rename -S .<oldExt> .<newExt> {} \;
Basically, I leverage the find and rename utilities. The trick here is figuring out where to place the '{}' (which represents the files that need to be processed by rename) of rename.
P.S. rename is not a built-in linux utility. I work with OS X and used homebrew to install rename.
