Replacing multiple preceding numbers from files - bash

Good day,
I have a bunch of files that need to be batch renamed like so:
01-filename1.txt > filename1.txt
02-filename2.txt > filename2.txt
32-filename3.txt > filename3.txt
322-filename4.txt > filename4.txt
31112-filename5.txt > filename5.txt
I ran into an example that does this with bash's ${string#substring} string operation, and it almost works:
for i in `ls`; do mv $i ${i#[0-9]}; done
However, this removes only a single digit, and adding a regex '+' does not seem to work. Is there a way to strip ALL leading digit characters?
Thank you!

With Perl's standalone rename command:
rename -n 's/.*?-//' *.txt
If output looks okay, remove -n.
See: The Stack Overflow Regular Expressions FAQ

If you have a single character that always marks the end of the prefix, Pattern Matching makes it very simple.
for f in *; do
mv -nv "$f" "${f#*-}";
done;
Things worth noting:
In your case, the use of ls does not cause problems, but for a more generalized solution, certain filenames would break it. Additionally, the lack of quotes around parameter expansions would cause issues for files with newlines, spaces or tabs in them.
The pattern *- matches any string ending with -. Combined with lazy prefix removal (one # instead of two), this means ${f#*-} evaluates to "$f" with the shortest prefix ending in - removed (if one exists).
Bash's pattern matching is different from and inferior to RegEx, but you can get a little more power by enabling extended pattern matching with shopt -s extglob. Some distributions have this enabled by default.
Also, I threw the -nv flags in mv to ensure no mishaps when playing around with parameter expansion.
More Pattern Matching tricks I often use:
If you want to remove all leading digits and don't always have a single character terminating the prefix, extended pattern matching is helpful: "${f##+([0-9])}"
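As a rough sketch, here is how the two expansions above behave on names like the ones in the question. Nothing is renamed here. strip_to_dash and strip_digits are made-up helper names, and the trailing - in the second pattern is my assumption based on the sample names:

```bash
#!/usr/bin/env bash
shopt -s extglob   # needed for +(...); enable it before the pattern is used

# Hypothetical helper: drop the shortest prefix ending in "-".
strip_to_dash() {
  printf '%s\n' "${1#*-}"
}

# Hypothetical helper: drop all leading digits plus the "-" that follows them.
strip_digits() {
  printf '%s\n' "${1##+([0-9])-}"
}

# Print each sample name next to the result of both helpers.
for name in 01-filename1.txt 31112-filename5.txt 7-my-notes.txt draft-2.txt; do
  printf '%s | %s | %s\n' "$name" "$(strip_to_dash "$name")" "$(strip_digits "$name")"
done
```

A name such as draft-2.txt shows the difference: the first helper cuts it down to 2.txt, while the second leaves it alone because it does not start with digits.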

for i in *
do
name=$( echo "$i" | cut -d "-" -f 2- )  # 2- keeps everything after the first "-", so later hyphens survive
mv "$i" "$name" 2>/dev/null
done

Related

How to remove all file extensions in bash?

x=./gandalf.tar.gz
noext=${x%.*}
echo $noext
This prints ./gandalf.tar, but I need just ./gandalf.
I might have even files like ./gandalf.tar.a.b.c which have many more extensions.
I just need the part before the first .
If you want to give sed a chance then:
x='./gandalf.tar.a.b.c'
sed -E 's~(.)\..*~\1~g' <<< "$x"
./gandalf
Or 2 step process in bash:
y="${x#./}"
echo "./${y%%.*}"
./gandalf
Using extglob shell option of bash:
shopt -s extglob
x=./gandalf.tar.a.b.c
noext=${x%%.*([!/])}
echo "$noext"
This deletes the longest trailing substring that starts with a . and contains no / character, i.e. everything from the first . of the file name onward. It also works for x=/pq.12/r/gandalf.tar.a.b.c
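A small sketch to try the extglob expansion on a few paths. strip_all_ext is a made-up helper name, not part of the answer:

```bash
#!/usr/bin/env bash
shopt -s extglob   # enables *(...) inside the pattern

# Hypothetical helper: remove every extension from the last path component.
strip_all_ext() {
  printf '%s\n' "${1%%.*([!/])}"
}

strip_all_ext ./gandalf.tar.gz
strip_all_ext /pq.12/r/gandalf.tar.a.b.c
strip_all_ext ./noextension
strip_all_ext ./.bashrc
```

One thing to watch: for a hidden file like ./.bashrc the whole name counts as the "extension", so only ./ is left.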
Perhaps a regexp is the best way to go if your bash version supports it, as it doesn't fork new processes.
This regexp works with any prefix path and takes into account files with a dot as first char in the name (hidden files):
[[ "$x" =~ ^(.*/|)(.[^.]*).*$ ]] && \
noext="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
Regexp explained
The first group captures everything up to the last / included (regexp are greedy in bash), or nothing if there are no / in the string.
Then the second group captures everything up to the first ., excluded.
The rest of the string is not captured, as we want to get rid of it.
Finally, we concatenate the path and the stripped name.
Note
It's not clear what you want to do with files beginning with a . (hidden files). I modified the regexp to preserve that . if present, as it seemed the most reasonable thing to do. E.g.
x="/foo/bar/.myinitfile.sh"
becomes /foo/bar/.myinitfile.
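Wrapped in a function, the regexp approach might look like the sketch below. strip_ext_regex is a made-up name, and I wrote the first group as (.*/)? instead of (.*/|), which I assume behaves the same here but avoids an empty alternative:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the path with all extensions removed,
# keeping a leading dot of hidden files.
strip_ext_regex() {
  # group 1: optional directory part up to the last "/"
  # group 2: first character of the name plus everything up to the next "."
  local re='^(.*/)?(.[^.]*).*$'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
  else
    printf '%s\n' "$1"   # no match (e.g. empty string): return input unchanged
  fi
}

strip_ext_regex ./gandalf.tar.a.b.c
strip_ext_regex /foo/bar/.myinitfile.sh
strip_ext_regex /pq.12/r/gandalf.tar.gz
```

Keeping the regexp in a variable and using it unquoted on the right side of =~ avoids quoting surprises inside [[ ]].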
If performance is not an issue, something like this works too:
fil=$(basename "$x")
noext="$(dirname "$x")"/${fil%%.*}

Rename all files including a bracket (

I have a directory filled with a lot of files, some of which have brackets in them; e.g. a(file).ext,an(other)file.ext, etc. My goal is to rename them into something like this: a_file_.ext or an?other?file.ext. (doesn't matter what character).
The reason for this is that certain console applications can't deal with these brackets and think they are some kind of command.
Things I already tried:
$ rename ( ? *(*
$ for f in *(*; do mv $f ${f//(/?}; done
$ for f in "*(*"; do mv $f ${f//\"(\"/\"?\"}; done
and the like.
It could be that I'm not understanding these rename functions. (I do know that these only work for "(" and that I have to do them again for ")")
So could someone also give some more explanation about the syntax and why my attempts won't work?
All in Bash.
Consider:
(shopt -s nullglob; for f in *[\(\)]*; do mv "$f" "${f//[()]/_}"; done)
( and ) are syntax, and need to be escaped to be unambiguously referred to as literals.
Setting the nullglob option makes the glob expand to nothing at all, rather than itself, if no files match. Putting the code in a subshell prevents this configuration change from persisting beyond the single command.
Using quotes around expansions is mandatory if you don't want those expansions to be subject to string-splitting and globbing.
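To try this without risking real files, something like the following sketch could be used. It works in a scratch directory created with mktemp; demo_dir is just a name I picked:

```bash
#!/usr/bin/env bash
# Scratch directory so the demo never touches real files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a(file).ext" "$demo_dir/an(other)file.ext" "$demo_dir/plain.ext"

(
  cd "$demo_dir" || exit 1
  shopt -s nullglob                 # only lasts for this subshell
  for f in *[\(\)]*; do             # names containing ( or )
    mv -- "$f" "${f//[()]/_}"       # replace every ( and ) with _
  done
)

ls "$demo_dir"
```

The -- after mv is an extra precaution of mine so that a name starting with - is not taken as an option.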

Mass renaming of files in folder

I need to rename all the files in a folder that follow the formats below. The trailing _2.txt stays the same, and apac, emea, mds stay the same, but everything before _XXX_2.txt should be replaced with logs_date in all the files.
ABC_xyz_123_apac_2.txt
POR5_emea_2.txt
qw_1_0_122_mds_2.txt
to
logs_date_apac_2.txt
logs_date_emea_2.txt
logs_date_mds_2.txt
I'm not sure but maybe this is what you want:
#!/bin/bash
for file in *_2.txt;do
# remove echo to rename the files once you check it does what you expect
echo mv -v "$file" "$(sed 's/.*\(_.*_2\.txt\)$/logs_date\1/' <<<"$file")"
done
Do you have to use bash?
Bulk Rename Utility is an awesome tool that can easily rename multiple files in an intuitive way.
http://www.bulkrenameutility.co.uk/Main_Intro.php
Using mmv command should be easy.
mmv '*_*_2.txt' 'logs_date_#2_2.txt' *.txt
You could also use the rename tool:
rename 's/.+(_[a-z]+_[0-9].)/logs_date$1/' files
This will give you the desired output.
If you don't want to or can't use sed, you can also try this, which might even run faster. No matter which solution you use, be sure to back up first if possible.
shopt -s extglob # turn on the extglob shell option, which enables several extended pattern matching operators
set +H # turn off ! style history substitution
for file in *_2.txt;do
# remove echo to rename the files once you check it does what you expect
echo mv -v "$file" "${file/?(*_)!(*apac*|*emea*|*mds*)_/logs_date_}"
done
${parameter/pattern/string} performs pattern substitution. First optionally a number of characters ending with an underscore are matched, then a following number of characters not containing apac, emea or mds and ending with an underscore are matched, then the match is replaced with "logs_date_".
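For comparison, the same target names can probably be built with only the basic suffix and prefix removal operators, without extglob. This is a different technique from the substitution above, sketched under the assumption that every name ends in _<region>_2.txt; logs_name is a made-up helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the new name from the token right before _2.txt.
logs_name() {
  local stem=${1%_2.txt}      # e.g. ABC_xyz_123_apac
  local region=${stem##*_}    # e.g. apac
  printf 'logs_date_%s_2.txt\n' "$region"
}

for file in ABC_xyz_123_apac_2.txt POR5_emea_2.txt qw_1_0_122_mds_2.txt; do
  echo mv -v "$file" "$(logs_name "$file")"   # echo keeps this a dry run
done
```

Unlike the extglob version, this does not check that the region is one of apac, emea or mds; it simply takes whatever sits between the last two underscores.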
Copied from the bash man page:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
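A quick way to get a feel for these operators is to test them with [[ ]]. The sketch below uses a made-up helper called matches and a few made-up sample strings:

```bash
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical helper: succeeds when string $1 matches pattern $2.
# $2 is left unquoted on purpose so that it is treated as a pattern.
matches() {
  [[ $1 == $2 ]]
}

matches "files.txt" "file?(s).txt"     && echo "?(s)  zero or one s"
matches "a"         "a*([0-9])"        && echo "*(..) zero or more digits"
matches "a123"      "a+([0-9])"        && echo "+(..) one or more digits"
matches "emea"      "@(apac|emea|mds)" && echo "@(..) exactly one of the list"
matches "other"     "!(apac|emea|mds)" && echo "!(..) anything but the list"
```

Each line prints its label only if the match succeeds, so a missing line points at a pattern that did not match.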

In Bash, how to strip out all numbers in the file names in a directory while leaving the file extension intact

I have files in a directory like this:
asdfs54345gsdf.pdf
gsdf6456wer.pdf
oirt4534724wefd.pdf
I want to rename all the files to just the numbers + .pdf so the above files would be renamed to:
54345.pdf
6456.pdf
4534724.pdf
The best would be a native Bash command or script (OSX 10.6.8)
Some clues I picked up include
sed 's/[^0-9]*//g' input.txt
sed 's/[^0-9]*//g' input.txt > output.txt
sed -i 's/[^0-9]*//g' input.txt
echo ${A//[0-9]/}
rename 's/[0-9] //' *.pdf
This should do it:
for f in *.pdf
do
mv "$f" "${f//[^0-9]/}.pdf"
done
but you'd better do a dry run first:
for f in *.pdf
do
echo mv "$f" "${f//[^0-9]/}.pdf"
done
Note that abc4.pdf and zzz4.pdf will both be renamed to 4.pdf, so you may want to use mv -i instead of plain mv.
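If you want to spot such collisions before renaming anything, a dry run along these lines might help. It only works on a list of sample names, and digits_name is a made-up helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only the digits of a name, then add .pdf back.
digits_name() {
  local base=${1%.pdf}
  printf '%s.pdf\n' "${base//[^0-9]/}"
}

# Dry run over sample names; a target that was already used is flagged.
seen=" "
for f in asdfs54345gsdf.pdf abc4.pdf zzz4.pdf; do
  target=$(digits_name "$f")
  case $seen in
    *" $target "*) echo "collision: $f -> $target" ;;
    *)             echo "mv -i $f $target" ;;
  esac
  seen+="$target "
done
```

The seen string is a simple space-separated list, which is enough here because the generated targets never contain blanks.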
Update: explanation:
I guess the first part is clear: *.pdf is called globbing and matches all files ending with .pdf. for f in ... just iterates over them, setting f to one of them each time.
for f in *.pdf
do
mv "$f" "${f//[^0-9]/}.pdf"
done
I guess
mv source target
is clear as well. If a file is named "Unnamed File1", you need to quote it, because otherwise mv will read
mv Unnamed File1 1.pdf
which means, it has multiple files to move, Unnamed and File1, and will interpret 1.pdf to be a directory to move both files to.
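You can watch the splitting happen with a tiny helper that only counts its arguments (count_args is a made-up name for this sketch):

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments the shell actually passed.
count_args() {
  echo "$#"
}

f="Unnamed File1.pdf"
count_args $f      # unquoted: the blank splits the name into two words
count_args "$f"    # quoted: the name stays one word
```

With the default IFS the first call reports 2 and the second reports 1, which is exactly the difference mv sees.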
Okay, I guess the real issue is here:
"${f//[^0-9]/}.pdf"
There is an outer gluing together of characters. Let
foo=bar
be some variable assignment. Then
$foo
${foo}
"$foo"
"${foo}"
are four legitimate ways to refer to it. The last two are used to protect blanks and such, so in some cases this makes no difference and in some cases it does.
If we glue something together
$foo4
${foo}4
"$foo"4
"${foo}"4
the first form will not work: the shell will look for a variable named foo4. The other three expressions evaluate to bar4: first $foo is expanded to bar, and then 4 is appended. For some characters the braces or quotes are not needed:
$foo/fool
${foo}/fool
"$foo"/fool
"${foo}"/fool
will all be interpreted in the same way. So whatever "${f//[^0-9]/}" is, "${f//[^0-9]/}.pdf" is ".pdf" appended to it.
Now we approach the heart of the mystery:
${f//[^0-9]/}
This is a substitution expression of the form
${variable//pattern/replacement}
variable is f (we can omit the $ inside the braces here), the same $f as above. That was easy!
replacement is empty - that was even easier.
But [^0-9] is something really complicated, isn't it?
[0-9]
is just the group of all digits from 0 to 9, other groups could be:
[0-4] digits below 5
[02468] even digits
[a-z] lower case letters
[a-zA-Z] all (common latin) letters
[;:,/] semicolon, colon, comma, slash
The Caret ^ in front as first character is the negation of the group:
[^0-9]
means everything except 0 to 9 (including dot, comma, colon, ...) is in the group. Together:
${f//[^0-9]/}
remove all non-digits from $f, and
"${f//[^0-9]/}.pdf"
append .pdf - the whole thing quoted.
${v//p/r}
and its friends (there are many useful ones) are explained in man bash in the section Parameter Expansion. For the groups I don't have a source for further reading at hand.

access series of numbers higher than 9 with [ ]

I want to change files with sed for a series of numbered filenames (more than 10).
Files 0 - 9 I can easily access using sed -i 's/old/new/g' myfile[0-9], but this doesn't seem to work for numbers above 9. How can I do this instead? Something like [0-50]?
A character class, like [0-9] or [a-f], matches a single character only, by definition. It doesn't match "numbers", per se -- even if the characters given are digits, they're being viewed only as character codepoints, not numeric values. This holds for fnmatch()-style patterns (used here by the shell) just as it does for regular expressions.
If you want 0-50, with globbing behavior (matching only files that exist) that can be done by composing multiple character classes, like so:
shopt -s nullglob # if no files match, return empty result
files=( myfile[0-9] myfile[1-4][0-9] myfile5[0] )
# if list of files is nonzero, run sed:
(( ${#files[@]} )) && sed -i -e 's/old/new/g' "${files[@]}"
To explain how that works:
myfile[0-9] matches 0-9 (if they exist)
myfile[1-4][0-9] matches 10-49 (if they exist)
myfile5[0] matches 50, if and only if it exists.
Putting the list of files into the array and checking the array's length makes sure you don't run sed without any filenames at all listed, which could happen otherwise because of nullglob. (Why use nullglob at all here? Because you don't want myfile5[0] being passed as a literal filename if no myfile50 exists, which is the default behavior otherwise).
There are some additional extensions to the POSIX sh standard available as well:
If you don't care if files exist (and want to put contents on the command line even if they don't), you can use brace expansion:
sed -i -e 's/old/new/g' myfile{0..50}
Alternately, if you simply care about matching one-or-more numeric digits at the end of the filename, you can use extglobs:
shopt -s extglob
sed -i -e 's/old/new/g' myfile+([0-9])
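To see what the composed character classes and the extglob pattern actually pick up, here is a sketch that counts matches in a scratch directory instead of running sed. demo_dir, upto50 and anydigits are names I made up for the demo:

```bash
#!/usr/bin/env bash
shopt -s nullglob extglob

demo_dir=$(mktemp -d)                                  # scratch directory
touch "$demo_dir"/myfile{0..60} "$demo_dir/myfile007"  # 61 numbered files plus one odd name

# 0-9, 10-49 and 50, composed from single-character classes
upto50=( "$demo_dir"/myfile[0-9] "$demo_dir"/myfile[1-4][0-9] "$demo_dir"/myfile5[0] )

# one or more digits after "myfile", whatever the value
anydigits=( "$demo_dir"/myfile+([0-9]) )

echo "0-50: ${#upto50[@]} files, any digits: ${#anydigits[@]} files"
rm -r "$demo_dir"
```

The first array holds the 51 files from myfile0 to myfile50. The extglob pattern matches all 62 files, including myfile007 and myfile51 to myfile60, so it is only the right choice when any number is acceptable.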

Resources