access series of numbers higher than 9 with [ ] - bash

I want to change files with sed for a series of numbered filenames (more than 10).
Files 0-9 I can easily access using sed -i 's/old/new/g' myfile[0-9], but this doesn't seem to work for numbers of 10 and higher. How can I do this instead, with something like [0-50]?

A character class, like [0-9] or [a-f], matches a single character only, by definition. It doesn't match "numbers" per se: even if the characters given are digits, they're treated only as character codepoints, not as numeric values. This is true in fnmatch()-style patterns (used here by the shell) just as it is in regular expressions.
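To see that a bracket expression matches exactly one character, you can test a few strings against [0-9] with bash's [[ ... == pattern ]], which uses the same pattern-matching rules as globbing. The helper name matches_one_digit is hypothetical, just for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the WHOLE string matches the glob [0-9].
matches_one_digit() {
  [[ $1 == [0-9] ]]   # unquoted right-hand side is treated as a pattern
}

for s in 7 12 50; do
  if matches_one_digit "$s"; then
    printf '%s: matches [0-9]\n' "$s"
  else
    printf '%s: does not match [0-9]\n' "$s"
  fi
done
```

Only 7 matches; 12 and 50 are two characters long, so a single bracket expression cannot match them.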
If you want 0-50 with globbing behavior (matching only files that exist), you can compose multiple character classes, like so:
shopt -s nullglob # if no files match, return empty result
files=( myfile[0-9] myfile[1-4][0-9] myfile5[0] )
# if list of files is nonzero, run sed:
(( ${#files[@]} )) && sed -i -e 's/old/new/g' "${files[@]}"
To explain how that works:
myfile[0-9] matches 0-9 (if they exist)
myfile[1-4][0-9] matches 10-49 (if they exist)
myfile5[0] matches 50, if and only if it exists.
Putting the list of files into an array and checking the array's length ensures you never run sed with no filenames at all, which could otherwise happen because of nullglob. (Why use nullglob at all here? Because without it, if no myfile50 exists, the unmatched pattern myfile5[0] is passed to sed as a literal filename, which is the default behavior.)
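A quick way to convince yourself the composed globs select exactly 0-50 is to run them in a scratch directory. This sketch creates hypothetical files myfile0 through myfile60 in a temporary directory (so some names fall outside the range), collects the matches with the same three patterns, and leaves sed out so nothing is modified.

```shell
#!/usr/bin/env bash
# Sketch: check which names the composed globs select, in a throwaway directory.
# The directory and the myfile0..myfile60 files are created only for this demo.
shopt -s nullglob

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create names both inside (0-50) and outside (51-60) the wanted range.
for i in {0..60}; do
  : > "myfile$i"
done

# Same composition as in the answer: 0-9, then 10-49, then 50.
files=( myfile[0-9] myfile[1-4][0-9] myfile5[0] )
printf 'matched %d files, from %s to %s\n' \
  "${#files[@]}" "${files[0]}" "${files[${#files[@]}-1]}"

# Clean up; the array still holds the matched names.
cd / && rm -rf "$demo_dir"
```

The last element is read with an arithmetic index rather than ${files[-1]}, since negative indices need bash 4.3 or later.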
Bash also provides some extensions beyond the POSIX sh standard that help here:
If you don't care if files exist (and want to put contents on the command line even if they don't), you can use brace expansion:
sed -i -e 's/old/new/g' myfile{0..50}
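Brace expansion is purely textual, which is why it is only suitable when you don't care whether the files exist. This sketch expands the same braces into an array in whatever directory you run it from, without creating or touching any file; with sed -i, any generated name that does not exist would typically make sed print an error for it.

```shell
#!/usr/bin/env bash
# Brace expansion builds the names whether or not the files exist.
names=( myfile{0..50} )
printf '%d names: %s ... %s\n' "${#names[@]}" "${names[0]}" "${names[50]}"
```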
Alternatively, if you simply want to match one or more digits at the end of the filename, you can use extglobs:
shopt -s extglob
sed -i -e 's/old/new/g' myfile+([0-9])
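To see what +([0-9]) does and does not match, this sketch creates four hypothetical files in a temporary directory and collects the matches instead of running sed. Note that extglob has to be enabled on a line before the one that uses the pattern, because bash decides how to parse +(...) when it reads the line.

```shell
#!/usr/bin/env bash
# extglob must be enabled before any line that uses +(...) is parsed.
shopt -s extglob nullglob

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical file names: two should match, two should not.
: > myfile7
: > myfile123
: > myfile       # no digits at all
: > myfile1a     # trailing non-digit

# +([0-9]) means "one or more digits", so everything after "myfile" must be digits.
matched=( myfile+([0-9]) )
printf '%s\n' "${matched[@]}"

cd / && rm -rf "$demo_dir"
```

Unlike the composed character classes, this also picks up myfile123, so it is not a numeric range.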

Related

Replacing multiple preceding numbers from files

Good day,
I have a bunch of files that need to be batch renamed like so:
01-filename1.txt > filename1.txt
02-filename2.txt > filename2.txt
32-filename3.txt > filename3.txt
322-filename4.txt > filename4.txt
31112-filename5.txt > filename5.txt
I ran into an example of achieving this using bash's ${string#substring} string operation, and this almost works:
for i in `ls`; do mv $i ${i#[0-9]}; done
However, this removes only a single digit, and adding the regex '+' does not seem to work. Is there a way to strip ALL leading digit characters?
Thank you!
With Perl's standalone rename command:
rename -n 's/.*?-//' *.txt
If output looks okay, remove -n.
See: The Stack Overflow Regular Expressions FAQ
If you have a single character that always marks the end of the prefix, Pattern Matching makes it very simple.
for f in *; do
mv -nv "$f" "${f#*-}";
done;
Things worth noting:
In your case, the use of ls does not cause problems, but for a more generalized solution, certain filenames would break it. Additionally, the lack of quotes around parameter expansions would cause issues for files with newlines, spaces or tabs in them.
The pattern *- matches any string ending with -. Combined with shortest-match prefix removal (one # instead of two), ${f#*-} evaluates to "$f" with the shortest prefix ending in - removed (if one exists).
Bash's pattern matching is different from, and less powerful than, regular expressions, but you can get a little more power by enabling extended pattern matching with shopt -s extglob. Some distributions have this enabled by default.
Also, I added the -nv flags to mv (-n: don't overwrite existing files; -v: print each rename) to avoid mishaps while playing around with parameter expansion.
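The difference between one # and two matters once a name contains more than one dash. This sketch uses a hypothetical name with two dashes to compare shortest and longest prefix removal.

```shell
#!/usr/bin/env bash
# Hypothetical name with more than one dash.
f=01-my-file.txt
printf '%s\n' "${f#*-}"    # shortest prefix ending in "-" removed: my-file.txt
printf '%s\n' "${f##*-}"   # longest prefix ending in "-" removed: file.txt
```

For the renaming in the question, the single # is the right choice, since it keeps any dashes that belong to the real file name.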
More Pattern Matching tricks I often use:
If you want to remove all leading digits and don't always have a single character terminating the prefix, extended pattern matching (shopt -s extglob) is helpful: "${f##+([0-9])}"
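A minimal dry-run sketch of the extglob trick, applied to some of the example names from the question: it strips all leading digits, then one leading dash if present, and only prints the mv commands it would run. The helper name strip_prefix is hypothetical.

```shell
#!/usr/bin/env bash
shopt -s extglob   # must be on before the function below is parsed

# Hypothetical helper: drop all leading digits, then a single leading dash.
strip_prefix() {
  local name=${1##+([0-9])}   # longest run of leading digits removed
  printf '%s\n' "${name#-}"   # one leading "-" removed, if present
}

# Dry run: print the commands instead of renaming anything.
for f in 01-filename1.txt 32-filename3.txt 31112-filename5.txt; do
  printf 'mv -nv -- %q %q\n' "$f" "$(strip_prefix "$f")"
done
```

Names without leading digits pass through unchanged, because +([0-9]) requires at least one digit to match.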
for i in *
do
name=$( echo "$i" | cut -d "-" -f 2- )
mv "$i" "$name" 2>/dev/null
done
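The cut field selection matters when a name contains more than one dash: -f 2 keeps only the second dash-separated field, while -f 2- keeps everything after the first dash. This sketch uses a hypothetical name to compare the two; note also that cut prints a line containing no dash unchanged, so for such a file mv would be asked to move it onto itself, and the error is hidden by 2>/dev/null.

```shell
#!/usr/bin/env bash
# Hypothetical file name with two dashes.
i=01-my-file.txt
echo "$i" | cut -d "-" -f 2     # only the second field: my
echo "$i" | cut -d "-" -f 2-    # second field through the end: my-file.txt
```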

How to keep/remove numbers in a variable in shell?

I have a variable such as:
disk=/dev/sda1
I want to extract:
only the non numeric part (i.e. /dev/sda)
only the numeric part (i.e. 1)
I'm gonna use it in a script where I need the disk and the partition number.
How can I do that in shell (bash and zsh mostly)?
I was thinking about using Shell parameters expansions, but couldn't find working patterns in the documentation.
Basically, I tried:
echo ${disk##[:alpha:]}
and
echo ${disk##[:digit:]}
But neither worked; both returned /dev/sda1.
With bash and zsh and Parameter Expansion:
disk="/dev/sda12"
echo "${disk//[0-9]/} ${disk//[^0-9]/}"
Output:
/dev/sda 12
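Wrapping the two substitutions in a small function makes it easy to try them on several names. The helper name split_all_digits is hypothetical, and the NVMe-style name is only an illustration of a device name with digits in the middle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the non-digit part and the digit part of a name.
split_all_digits() {
  local disk=$1
  printf '%s %s\n' "${disk//[0-9]/}" "${disk//[^0-9]/}"
}

split_all_digits /dev/sda12       # /dev/sda 12
split_all_digits /dev/nvme0n1p2   # /dev/nvmenp 012
```

Because every digit is removed or kept, not only the trailing ones, names with digits in the middle come apart; the trailing-number approaches in the other answers handle that case.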
The expansions kind of work the other way round. A bracket expression such as [0-9] (or [[:digit:]]) matches only a single digit. You need to match everything up to, or from, a digit, so you need to use *.
The following looks ok:
$ echo ${disk%%[0-9]*} ${disk##*[^0-9]}
/dev/sda 1
To use [:digit:] you need double brackets, because the character class is written [:class:] and it must itself be inside [ ]. That's why I prefer 0-9: less typing*. The following is the same as above:
echo ${disk%%[[:digit:]]*} ${disk##*[^[:digit:]]}
* Strictly speaking they may not be equivalent: a range such as [0-9] can be affected by the current locale, so it is not guaranteed to equal [0123456789].
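A small sketch wrapping the two expansions from this answer in a hypothetical helper, split_disk, so you can see where each one cuts: %% removes everything from the first digit onward, and ## removes everything up to the last non-digit.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the two expansions from this answer.
split_disk() {
  local disk=$1
  # %%[[:digit:]]*  : drop everything from the FIRST digit onward
  # ##*[^[:digit:]] : drop everything up to and including the LAST non-digit
  printf '%s %s\n' "${disk%%[[:digit:]]*}" "${disk##*[^[:digit:]]}"
}

split_disk /dev/sda1        # /dev/sda 1
split_disk /dev/sdb12       # /dev/sdb 12
split_disk /dev/nvme0n1p2   # /dev/nvme 2
```

The last call shows the limitation: the first part stops at the first digit, so for names with digits in the middle it is not "everything before the partition number".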
You have to be careful when using patterns in parameter substitution. These patterns are not regular expressions but pathname expansion patterns, or glob patterns.
The idea is to remove the last number, so you want to use Remove matching suffix pattern (${parameter%%word}), which removes the longest match of the pattern described by word. Representing single-digit numbers is easy with the pattern [0-9]; multi-digit numbers, however, are harder. For this you need extended glob expressions:
*(pattern-list): Matches zero or more occurrences of the given patterns
So if you want to remove the last number, you use:
$ shopt -s extglob
$ disk="/dev/sda1"
$ echo "${disk#${disk%%*([0-9])}}" "${disk%%*([0-9])}"
1 /dev/sda
$ disk="/dev/dsk/c0t2d0s0"
$ echo "${disk#${disk%%*([0-9])}}" "${disk%%*([0-9])}"
0 /dev/dsk/c0t2d0s
We have to use ${disk#${disk%%*([0-9])}} to get the number: the inner expansion strips the trailing number, and the outer expansion then removes that remainder as a prefix of $disk, leaving only the number.
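The same two steps can be written with an intermediate variable, which makes the "strip, then remove the remainder" logic explicit. This sketch is a slight variation on the answer: it quotes the inner result ("$head") so that it is matched as a literal prefix even if it happens to contain glob characters. The helper name split_trailing is hypothetical.

```shell
#!/usr/bin/env bash
shopt -s extglob   # must be on before the *(...) pattern below is parsed

# Hypothetical helper: print the trailing number, then everything before it.
split_trailing() {
  local disk=$1
  local head=${disk%%*([0-9])}                 # strip the trailing run of digits
  printf '%s %s\n' "${disk#"$head"}" "$head"   # quoted $head is matched literally
}

split_trailing /dev/sda1           # 1 /dev/sda
split_trailing /dev/dsk/c0t2d0s0   # 0 /dev/dsk/c0t2d0s
split_trailing /dev/nvme0n1p2      # 2 /dev/nvme0n1p
```

Unlike the %%[0-9]* approach, only the trailing number is split off, so digits in the middle of the name survive.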
You can also use pattern substitution (${parameter/pattern/string}) with the anchors % and # to anchor the pattern to the end or the beginning of the parameter (see man bash for more information). This is completely equivalent to the previous solution:
$ shopt -s extglob
$ disk="/dev/sda1"
$ echo "${disk/${disk/%*([0-9])}/}" "${disk/%*([0-9])}"
1 /dev/sda
$ disk="/dev/dsk/c0t2d0s0"
$ echo "${disk/${disk/%*([0-9])}/}" "${disk/%*([0-9])}"
0 /dev/dsk/c0t2d0s
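To check the claimed equivalence, this sketch wraps both forms from this answer, unchanged, in two hypothetical helper functions and prints their results side by side for a few device names.

```shell
#!/usr/bin/env bash
shopt -s extglob   # must be on before the functions below are parsed

# Hypothetical helpers wrapping the two forms from this answer.
via_suffix_removal() {
  local disk=$1
  printf '%s %s\n' "${disk#${disk%%*([0-9])}}" "${disk%%*([0-9])}"
}
via_substitution() {
  local disk=$1
  printf '%s %s\n' "${disk/${disk/%*([0-9])}/}" "${disk/%*([0-9])}"
}

for disk in /dev/sda1 /dev/dsk/c0t2d0s0 /dev/sdb12; do
  printf '%-20s %s | %s\n' "$disk" "$(via_suffix_removal "$disk")" "$(via_substitution "$disk")"
done
```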

How to capture Filename Expansion? (expanding globs)

--Disclaimer--
I am open to better titles for this question.
I am trying to get the full name of a file matching: "target/cs-*.jar".
The * stands for the version number.
Right now the version is 0.0.1-SNAPSHOT.
So, below, I would like jar_location to evaluate to cs-0.0.1-SNAPSHOT.jar
I've tried a few solutions, some of them work, some don't and I'm not sure what I'm missing.
Works
jar_location=( $( echo "target/cs-*.jar") )
echo "${jar_location[0]}"
Doesn't work
jar_location=$( echo "target/cs-*.jar")
echo "$jar_location"
jar_location=( "/target/cs-*.jar" )
echo "${jar_location}"
jar_location=$( ls "target/cs-*.jar" )
echo "${jar_location}"
--EDIT--
Added Filename Expansion to the title
Link to Bash Globbing / Filename Expansion
Similar question: The best way to expand glob pattern?
If you're using bash, the best option is to use an array to expand the glob:
shopt -s nullglob
jar_locations=( target/cs-*.jar )
if [[ ${#jar_locations[@]} -gt 0 ]]; then
  jar_location=${jar_locations##*/}
fi
Enabling nullglob means that the array will be empty if there are no matches; without this shell option enabled, the array would contain the literal string target/cs-*.jar in the case of no matches.
If the length of the array is greater than zero, then set the variable, using the expansion to remove everything up to the last / from the first element of the array. This uses the fact that ${jar_locations[0]} and $jar_locations get you the same thing, namely the first element of the array. If you don't like that, you can always assign to a temporary variable.
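A self-contained run of the array approach, in a temporary directory containing a target/ folder with the jar name from the question. It uses the explicit ${jar_locations[0]} form, which the answer notes is equivalent to $jar_locations, and starts jar_location out empty so the no-match case leaves it empty.

```shell
#!/usr/bin/env bash
shopt -s nullglob

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir target
: > target/cs-0.0.1-SNAPSHOT.jar   # jar name taken from the question

jar_location=
jar_locations=( target/cs-*.jar )
if [[ ${#jar_locations[@]} -gt 0 ]]; then
  jar_location=${jar_locations[0]##*/}   # strip everything up to the last "/"
fi
printf '%s\n' "$jar_location"

cd / && rm -rf "$demo_dir"
```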
An alternative for those with GNU find:
jar_location=$(find target -name 'cs-*.jar' -printf '%f' -quit)
This prints the filename of the first result and quits.
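Note that if more than one file matches, these two commands may return different files: the glob expands to a sorted list, while find reports files in the order it encounters them in the directory.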

Match a range of file names with variable end, in a Bash script

Let's say I have a number of files named file1, file2, file3, and so on. I'm trying to find a way to match the first N files, in a Bash script, where N is a variable. Here are the options I've considered so far:
Brace expansion, i.e. file{1..3}, doesn't allow variable end. In other words, file{1..$N} doesn't work.
A range expression can be used to match numeric characters. It allows a variable end, i.e. file[1-$N], but this only works while N <= 9.
$(seq 1 $N) can be used to create a sequence of numbers, but it doesn't help, since the problem is to match a sequence of numbers in a file name. Were the files named simply 1, 2, 3, and so on, this would work.
Here is another solution. I'm not advocating it, but then again there can be legitimate uses for eval ;) Also, I think not being able to use a variable in a range is an annoying and unintuitive shortcoming.
N=5
eval echo {1..$N}
So you could do
eval ls file{1..$N}
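A sketch that captures the eval expansion in an array rather than passing it to ls, next to a swapped-in alternative: a C-style for loop that builds the same list without eval. Since eval re-parses its argument as shell code, N should only ever come from a trusted source when using the eval form.

```shell
#!/usr/bin/env bash
N=12

# eval version: after $N is substituted, eval sees file{1..12} and brace-expands it.
eval "via_eval=( file{1..$N} )"

# Alternative without eval: build the same names with an arithmetic for loop.
via_loop=()
for (( i = 1; i <= N; i++ )); do
  via_loop+=( "file$i" )
done

printf '%s\n' "${via_eval[*]}"
```

Both approaches generate names whether or not the files exist, just like plain brace expansion.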
I found a solution using extended globs. They need to be enabled with the shopt -s extglob command. @(...) can be used to match any one of a set of patterns separated by the | character, e.g. file@(1|2|3). Now I just need to generate the number sequence with | as the separator instead of a newline:
shopt -s extglob
range=$(seq 1 $N)
ls file@(${range//$'\n'/|})
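A self-contained check of the extglob approach: it creates hypothetical files file1 through file15 in a temporary directory, builds the @(1|2|...|N) pattern at run time, and collects the matches into an array instead of calling ls. extglob is enabled on an earlier line than the pattern, and nullglob makes the array empty if nothing matches.

```shell
#!/usr/bin/env bash
shopt -s extglob nullglob   # enable before the @(...) line is parsed

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
for i in {1..15}; do
  : > "file$i"   # hypothetical files, some beyond N
done

N=12
range=$(seq 1 "$N")                         # "1\n2\n...\n12"
matched=( file@(${range//$'\n'/|}) )        # pattern becomes file@(1|2|...|12)
printf '%s\n' "${matched[@]}"

cd / && rm -rf "$demo_dir"
```

The | characters come from the unquoted expansion, so bash still treats them as alternation separators when matching.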
Could you simply do the following, for file01.txt, file02.txt, file345.txt, file678.txt, ...:
cat file*.txt > file_all.txt
or am I missing the point?

In Bash, how to strip out all numbers in the file names in a directory while leaving the file extension intact

I have files in a directory like this:
asdfs54345gsdf.pdf
gsdf6456wer.pdf
oirt4534724wefd.pdf
I want to rename all the files to just the numbers + .pdf so the above files would be renamed to:
54345.pdf
6456.pdf
4534724.pdf
The best would be a native Bash command or script (OSX 10.6.8)
Some clues I picked up include:
sed 's/[^0-9]*//g' input.txt
sed 's/[^0-9]*//g' input.txt > output.txt
sed -i 's/[^0-9]*//g' input.txt
echo ${A//[0-9]/}
rename 's/[0-9] //' *.pdf
This should do it:
for f in *.pdf
do
mv "$f" "${f//[^0-9]/}.pdf"
done
but you'd better try it first as a dry run:
for f in *.pdf
do
echo mv "$f" "${f//[^0-9]/}.pdf"
done
Note that abc4.pdf and zzz4.pdf would both be renamed to 4.pdf, so you may want to use mv -i instead of plain mv.
Update: explanation.
I guess the first part is clear: *.pdf is called globbing, and matches all files ending with .pdf. for f in ... just iterates over them, setting f to each one in turn.
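A self-contained run of the rename loop on the three example files from the question, inside a temporary directory so nothing real is touched. It uses mv -i as suggested and adds a small guard (an addition, not part of the original answer) that skips files whose name is already digits-only, since moving a file onto itself makes mv report an error.

```shell
#!/usr/bin/env bash
shopt -s nullglob

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# The three example names from the question.
: > asdfs54345gsdf.pdf
: > gsdf6456wer.pdf
: > oirt4534724wefd.pdf

for f in *.pdf; do
  new="${f//[^0-9]/}.pdf"          # keep only the digits, re-add the extension
  [ "$f" = "$new" ] && continue    # already digits-only: nothing to do
  mv -i -- "$f" "$new"
done

renamed=( *.pdf )
printf '%s\n' "${renamed[@]}"

cd / && rm -rf "$demo_dir"
```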
for f in *.pdf
do
mv "$f" "${f//[^0-9]/}.pdf"
done
I guess
mv source target
is clear as well. If a file is named "Unnamed File1", you need to protect it with quotes, because otherwise mv will read
mv Unnamed File1 1.pdf
which means it has two files to move, Unnamed and File1, and will interpret 1.pdf as a directory to move both files into.
Okay, I guess the real issue is here:
"${f//[^0-9]/}.pdf"
The outer part just glues strings together. Given the variable assignment
foo=bar
then
$foo
${foo}
"$foo"
"${foo}"
are four legitimate ways to refer to it. The last two are quoted, which protects blanks and other special characters, so in some cases it makes no difference and in others it does.
If we glue something together
$foo4
${foo}4
"$foo"4
"${foo}"4
the first form will not work: the shell will look for a variable named foo4. The other three expressions all give bar4: first $foo expands to bar, and then 4 is appended. For some characters, the braces or quotes are not needed:
$foo/fool
${foo}/fool
"$foo"/fool
"${foo}"/fool
will all be interpreted the same way, because / cannot be part of a variable name. So whatever "${f//[^0-9]/}" is, "${f//[^0-9]/}.pdf" is that with ".pdf" appended.
Now we approach the heart of the mystery:
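The gluing rules can be checked directly. This sketch sets foo=bar, makes sure no variable named foo4 exists, and prints each form in brackets so an empty expansion is visible.

```shell
#!/usr/bin/env bash
foo=bar
unset foo4                      # make sure no variable named foo4 exists

printf '[%s]\n' "$foo4"         # looks up foo4, which is unset: []
printf '[%s]\n' "${foo}4"       # braces end the name: [bar4]
printf '[%s]\n' "$foo"4         # the quote ends the name: [bar4]
printf '[%s]\n' "$foo/fool"     # "/" cannot be part of a name: [bar/fool]
```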
${f//[^0-9]/}
This is a substitution expression of the form
${variable//pattern/replacement}
variable is f (inside the braces the $ is omitted), i.e. the $f from above. That was easy!
replacement is empty - that was even more easy.
But [^0-9] is something really complicated, isn't it?
[0-9]
is just the group of all digits from 0 to 9; other groups could be:
[0-4] digits below 5
[02468] even digits
[a-z] lower case letters
[a-zA-Z] all (basic Latin) letters
[;:,/] semicolon, colon, comma, slash
A caret ^ as the first character negates the group (bash also accepts the POSIX spelling !):
[^0-9]
means everything except 0 to 9 (including dot, comma, colon, ...) is in the group. Together:
${f//[^0-9]/}
remove all non-digits from $f, and
"${f//[^0-9]/}.pdf"
append .pdf, with the whole thing quoted.
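To see the groups in action, this sketch applies a few of them to a hypothetical string mixing letters, digits and punctuation, using the same ${f//pattern/} form with an empty replacement.

```shell
#!/usr/bin/env bash
f='ab-12,c;3.pdf'   # hypothetical string mixing letters, digits and punctuation

printf '%s\n' "${f//[^0-9]/}"       # remove non-digits: 123
printf '%s\n' "${f//[!0-9]/}"       # POSIX spelling of the negation: 123
printf '%s\n' "${f//[02468]/}"      # remove even digits: ab-1,c;3.pdf
printf '%s\n' "${f//[;:,]/}"        # remove semicolon, colon, comma: ab-12c3.pdf
printf '%s\n' "${f//[^0-9]/}.pdf"   # the rename target: 123.pdf
```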
${v//p/r}
and its friends (there are many useful ones) are explained in man bash, in the section Parameter Expansion. For the groups I don't have a source for further reading at hand.

Resources