How to get the wildcard matched portion from a filename - shell

for name in file*.txt
do
echo ${name%.txt} | grep -o -E '[0-9]+'
done
Is there a better way than using grep? I have file1.txt, file2.txt, ..., and want to extract just the numbers.

If your shell is bash, you might also consider replacing all non-numeric characters with the empty string with a parameter expansion, as follows:
for name in file*.txt; do
echo "${name//[![:digit:]]/}"
done
By contrast, if you need to work with POSIX-compatible shells (and the file prefix is hardcoded), consider trimming prefix and suffix using the following POSIX-compliant PEs:
for name in file*.txt; do
num=${name%.txt}; num=${num#file}
echo "$num"
done
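If the prefix and suffix vary between scripts, the two trimming steps above can be wrapped in a small function. This is only a sketch: wildcard_part is a made-up name, and it assumes the prefix and suffix are literal strings rather than glob patterns.

```bash
# wildcard_part NAME PREFIX SUFFIX (hypothetical helper, not a standard command)
# Prints NAME with the literal PREFIX and SUFFIX removed, using POSIX expansions only.
wildcard_part() {
    part=${1#"$2"}       # quoting "$2" makes the prefix match literally
    part=${part%"$3"}    # same for the suffix
    printf '%s\n' "$part"
}

for name in file1.txt file2.txt file10.txt; do   # sample names, not read from disk
    wildcard_part "$name" file .txt
done
```

In a real script the loop would iterate over file*.txt instead of a hard-coded list.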

In bash:
$ for f in test* ; do f=${f#test} ; echo "${f%.txt}" ; done
1
2
3
Add some quotes if filenames have spaces in them.
Using GNU awk:
$ awk 'BEGINFILE {
gsub(/^test|\.txt/,"",FILENAME)
print FILENAME
nextfile
}' test*
1
2
3

Related

How can I save only a substring of file names from a directory without the file extension?

I have a directory that I'm reading from and I want to save only the date representation as a string.
I am close to getting it, although I know there is probably an easier way. Here is what I have so far:
#files are in the format of "THIS_20200420.csv" so I want only "20200420"
declare -a arr
declare -a arr2
FILES=test2/*.csv
for file in $FILES
do
arr=(${arr[*]} "${file##*/}")
done
for i in "${arr[@]}"
do
arr2+=$(echo $i | cut -c6-13)
done
for item in "${arr2[@]}"
do
echo $item
done
The output shows the array having only one element, which is all the strings concatenated:
20200110202001202020021920200220202004202020042220200110202001202020021920200220202004202020042220200219202002202020042020200422
I'm bashing my head against my computer at this point.
arr=(
"THIS_20200420.csv"
"THIS_20200421.csv"
"THIS_20200422.csv"
"THIS_20200423.csv"
"THIS_20200424.csv"
"THIS_20200425.csv"
"THIS_20200426.csv"
"THIS_20200427.csv"
"THIS_20200428.csv"
"THIS_20200429.csv"
"THIS_20200430.csv" )
arr=( ${arr[@]//*_} )
arr=( ${arr[@]//.*} )
echo "arr: ${arr[@]}"
Explanation:
arr=( ${arr[@]//*_} ) matches everything up to and including '_' in each element and replaces it with the empty string.
arr=( ${arr[@]//.*} ) matches the '.' and everything after it in each element and replaces it with the empty string.
For more information on parameter expansion, a good reference is TLDP's guide on parameter expansion.
Try this
declare -a arrayname=($(ls -1 test2/*.csv | grep -oE '[0-9]{8}'))
Demo:
$ ls -1 *csv
THIS_20200420.csv
THIS_20200421.csv
THIS_20200422.csv
THIS_20200423.csv
THIS_20200424.csv
THIS_20200425.csv
THIS_20200426.csv
THIS_20200427.csv
THIS_20200428.csv
THIS_20200429.csv
THIS_20200430.csv
$ declare -a arrayname=($(ls -1 *csv | grep -o '[0-9]*'))
$ echo ${arrayname[@]}
20200420 20200421 20200422 20200423 20200424 20200425 20200426 20200427 20200428 20200429 20200430
$ echo ${arrayname[2]}
20200422
$
You could achieve this using a loop with awk:
$ for file in *.csv; do echo $file | awk -F '[^[:alnum:]]' '{print $2}'; done
The -F '[^[:alnum:]]' tells awk to use non alphanumeric characters as the delimiter.
Another way to do this is to use bash shell parameter expansion to echo only the part of the filename you want. This obviously only works if your filenames have consistent formatting:
$ for file in *.csv; do echo "${file:5:8}"; done
I thought it would be nice to use bash parameter expansion to strip the unwanted prefix and suffix but you can't have nested expansion (afaict) so this is the best I could come up with:
$ for file in *.csv; do echo "$(tmp=${file%.csv}; echo ${tmp#THIS_})"; done
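If the names might not always start with a five-character prefix, a regex match is another option. A sketch, assuming the date is always eight digits between an underscore and .csv; date_part is a made-up helper name:

```bash
#!/usr/bin/env bash
# date_part PATH (hypothetical helper)
# Prints the 8 digits that sit between "_" and ".csv" at the end of PATH;
# returns non-zero if PATH does not have that shape.
date_part() {
    local re='_([0-9]{8})\.csv$'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

# Sample paths in the question's format, not read from disk
for file in test2/THIS_20200420.csv test2/THIS_20200421.csv; do
    date_part "$file"
done
```

Because the regex is anchored on the underscore and the extension, digits elsewhere in the path (such as the 2 in test2) are ignored.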
Meet Cut! A good friend of Linux Users
for file in ./*.csv; do echo $file | cut -d "_" -f 2 | cut -d "." -f 1 ; done
This one line should do the trick!
Example:
Use an array for the files assignment and parameter expansion.
#!/usr/bin/env bash
shopt -s nullglob
##: Save the files ending in .csv in an array
## so it expands properly; a plain variable assignment does not expand the glob *
files=(test2/*.csv)
##: Strip the leading pathname from each element (longest match of */)
files=("${files[@]##*/}")
##: Strip the .csv extension
files=("${files[@]%.csv}")
##: Strip everything up to the first _ (shortest match from the beginning)
files=("${files[@]#*_}")
printf '%s ' "${files[@]}"
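For what it's worth, the single concatenated element in the question seems to come from arr2+=$(...): without parentheses, += appends text to element 0 instead of adding a new element. A minimal sketch with two sample names from the question (the array names joined and dates are made up here):

```bash
#!/usr/bin/env bash
names=(THIS_20200420.csv THIS_20200421.csv)   # sample names from the question

joined=()
dates=()
for name in "${names[@]}"; do
    joined+="${name:5:8}"     # string append: keeps growing element 0
    dates+=("${name:5:8}")    # array append: adds a new element each time
done

echo "${#joined[@]} element(s): ${joined[*]}"   # 1 element(s): 2020042020200421
echo "${#dates[@]} element(s): ${dates[*]}"     # 2 element(s): 20200420 20200421
```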

How can I increment an infix variable in Bash?

I have a string foo-0 that I want to convert to bar1baz, i.e., parse the trailing index and add a prefix/suffix. The part before the trailing index (in this case foo-) can also contain numeric characters, but those should not be changed.
I tried the following:
echo foo-0 | cut -d'-' -f 2 | sed 's/.*/bar&baz/'
but that gives me only a partial solution (bar0baz). How can I increment the infix variable?
EDIT: the solutions below only work partially for what I am trying to achieve. This is my fault because I simplified the example above too much for the sake of clarity.
The final goal is to set an environmental variable (let's call it MY_ENV) to the output value using bash with the following syntax:
/bin/sh -c "echo $var | ... (some bash magic to replace the trailing index) | ... (some bash magic to set MY_ENV=the output of the pipe)"
Side note: The reason I am using /bin/sh -c "..." is because I want to use the command in a Kubernetes YAML.
Partial solution (using awk)
This works:
echo foo-0 | awk -F- '{print "bar" $2+1 "baz"}'
This doesn't (output is 1baz):
/bin/sh -c "echo foo-0 | awk -F- '{print \"bar\" $2+1 \"baz\"}'"
Partial solution (using arithmetic context and parameter expansion)
$ var=foo-0
$ echo "bar$((${var//[![:digit:]]}+1))baz"
This does not work if var contains other numeric characters before the trailing index (e.g., for var=r2a-foo-0).
You may use awk:
awk -F- '{print "bar" $2+1 "baz"}' <<< 'foo-0'
bar1baz
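For the /bin/sh -c "..." form from the question, the 1baz output is most likely because the calling shell expands $2 inside the double quotes before awk ever sees it. Escaping the dollar sign is one way around that. The sketch below also swaps $2 for $NF, on the assumption that the index always follows the last hyphen, so that a value like r2a-foo-0 works too:

```bash
# Sketch only: \$NF stops the calling shell from expanding $NF inside the
# double quotes, so awk receives it literally. $NF is the last '-'-separated field.
var=r2a-foo-0
MY_ENV=$(/bin/sh -c "echo $var | awk -F- '{print \"bar\" \$NF+1 \"baz\"}'")
echo "$MY_ENV"   # bar1baz
```

Note that $var is deliberately left unescaped so the calling shell fills it in.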
You could use an arithmetic context and parameter expansion:
$ var=foo-0
$ echo "bar$((${var//[![:digit:]]}+1))baz"
bar1baz
Unrolled, from the inside:
${var//[![:digit:]]} removes all non-digits from var:
$ echo "${var//[![:digit:]]}"
0
$((blah+1)) adds 1 to the variable blah:
$ blah=0
$ echo "$((blah+1))"
1
or, instead of blah, we can use the result of the inner substitution:
$ echo "$(( ${var//[![:digit:]]} + 1 ))"
1
and finally, putting this between bar and baz, you get bar1baz.
Amending for the other case brought up: assuming there might be other digits and we want to increment only the trailing ones, e.g.,
var=2a-foo-21
To do this, we can use nested parameter expansion with extended globs (shopt -s extglob) and the +(pattern) pattern, which matches one or more of pattern. Observe:
$ echo "${var#"${var%%+([[:digit:]])}"}"
21
The outer expansion is ${var#pattern}, which removes the shortest match of pattern from the beginning of $var. For pattern, we use
"${var%%+([[:digit:]])}"
which is "remove the longest match of +([[:digit:]]) (one or more digits) from the end of $var". This leaves us with just the trailing digits, and incrementing them and adding string before and after looks something like this:
$ echo "bar$((${var#"${var%%+([[:digit:]])}"}+1))baz"
bar22baz
This is so unreadable that I'd suggest using regex instead:
$ re='([[:digit:]]+)$'
$ [[ $var =~ $re ]]
$ echo "bar$((${BASH_REMATCH[1]}+1))baz"
bar22baz
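The regex version can be put in a function so it is easy to reuse. A sketch, where bump_trailing is a made-up name and the 10# prefix is an addition to keep an index with leading zeros (such as 009) from being read as octal:

```bash
#!/usr/bin/env bash
# bump_trailing STRING (hypothetical helper)
# Increments the run of digits at the end of STRING and wraps it in bar...baz.
bump_trailing() {
    local re='([[:digit:]]+)$'
    [[ $1 =~ $re ]] || return 1      # no trailing digits: report failure
    printf 'bar%dbaz\n' "$((10#${BASH_REMATCH[1]} + 1))"
}

bump_trailing foo-0         # bar1baz
bump_trailing r2a-foo-21    # bar22baz
bump_trailing foo-009       # bar10baz
```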

Cut a substring in bash

Suppose I have the following string:
some letters foo/substring/goo/some additional letters
I need to extract this substring supposing that foo/ and /goo are constant strings that are known in advance. How can I do that?
This sed one-liner does it.
sed 's#.*foo/##;s#/goo/.*##' file
Besides sed, awk and grep can do the job too. Or with zsh:
kent$ v="some letters foo/substring/goo/some additional letters"
kent$ echo ${${v##*foo/}%%/goo/*}
substring
Note that:
comment by @Nahuel Fouilleul
in ${var%%/goo/*} var must be a variable name, and can't be the result of expansion
In bash, the line has to be split into two statements.
$ echo $0
bash
$ v="some letters foo/substring/goo/some additional letters"
$ v=${v##*foo/}
$ v=${v%%/goo/*}
$ echo $v
substring
The line I executed worked in zsh, but when I tested it in bash, it didn't work.
$ echo $0
-zsh
$ v="some letters foo/substring/goo/some additional letters"
$ echo ${${v##*foo/}%%/goo/*}
substring
With variable expansion
line='some letters foo/substring/goo/some additional letters'
line=${line%%/goo*} # remove suffix /goo*
line=${line##*foo/} # remove prefix *foo/
echo "$line"
or bash regular expression
line='some letters foo/substring/goo/some additional letters'
if [[ $line =~ foo/([^/]*)/goo ]]; then
echo "${BASH_REMATCH[1]}"
fi
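The parameter-expansion variant generalizes to arbitrary markers. A sketch, assuming both markers are literal strings that occur in the input; between is a made-up helper name:

```bash
#!/usr/bin/env bash
# between STRING LEFT RIGHT (hypothetical helper)
# Prints the text after the first LEFT and before the next RIGHT.
between() {
    local s=$1 left=$2 right=$3
    s=${s#*"$left"}      # drop everything through the first LEFT
    s=${s%%"$right"*}    # drop everything from the first remaining RIGHT
    printf '%s\n' "$s"
}

between 'some letters foo/substring/goo/some additional letters' 'foo/' '/goo'
# substring
```

Quoting "$left" and "$right" inside the expansions keeps characters like * or ? in the markers from being treated as glob patterns.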
If you know there are no other / in your "other letters", you can use cut :
> echo "some letters foo/substring/goo/some additional letters" | cut -d'/' -f2
In terms of readability I think awk is a good solution
echo "some letters foo/substring/goo/some additional letters" | awk -v FS="(foo/|/goo)" '{print $2}'

How do I avoid the usage of the "for" loop in this bash function?

I am creating this function to make multiple grep's over every line of a file. I run it as following:
cat file.txt | agrep string1 string2 ... stringN
function agrep () {
for a in $@; do
cmd+=" | grep '$a'";
done ;
while read line ; do
eval "echo "\'"$line"\'" $cmd";
done;
}
The idea is to print every line that contains all the strings: string1, string2, ..., stringN. This already works but I want to avoid the usage of the for to construct the expression:
| grep string1 | grep string2 ... | grep stringN
And if it's possible, also the usage of eval. I tried to make some expansion as follows:
echo "| grep $"{1..3}
And I get:
| grep $1 | grep $2 | grep $3
This is almost what I want but the problem is that when I try:
echo "| grep $"{1..$#}
The expansion doesn't occur because bash can't expand {1..$#} due to the $#; brace expansion only works with literal numbers. I would like to construct some expansion that works in order to avoid the usage of the for in the agrep function.
agrep () {
if [ $# = 0 ]; then
cat
else
pattern="$1"
shift
grep -e "$pattern" | agrep "$@"
fi
}
Instead of running multiple greps on each line, just get all the lines that match string1, then pipe that to grep for string2, etc. One way to do this is to make agrep recursive.
agrep () {
if (( $# == 0 )); then
cat # With no arguments, just output everything
else
grep "$1" | agrep "${@:2}"
fi
}
It's not the most efficient solution, but it's simple.
(Be sure to note Rob Mayoff's answer, which is the POSIX-compliant version of this.)
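Since the question talks about strings rather than patterns, a fixed-string variant of the same recursion might be closer to what is wanted. This is only a sketch; agrep_fixed is a made-up name, and whether literal matching is actually desired is an assumption:

```bash
#!/usr/bin/env bash
# agrep_fixed is a hypothetical variant of the recursive agrep above:
# -F makes grep treat each argument as a fixed string, not a regex.
agrep_fixed() {
    if (( $# == 0 )); then
        cat                                   # no filters left: pass lines through
    else
        grep -F -e "$1" | agrep_fixed "${@:2}"
    fi
}

printf '%s\n' 'a.b c' 'axb c' 'a.b' | agrep_fixed 'a.b' c
```

This prints only a.b c. Without -F, the dot in a.b would match any character, so axb c would be printed as well.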
awk to the rescue!
you can avoid multiple grep calls and constructing the command by switching to awk
awk -v pat='string1 string2 string3' 'BEGIN{n=split(pat,p)}
{for(i=1;i<=n;i++) if($0!~p[i]) next}1 ' file
enter your space delimited strings as in the example above.
Not building a string for the command is definitely better (see chepner's and Rob Mayoff's answers). However, just as an example, you can avoid the for by using printf:
agrep () {
cmd=$(printf ' | grep %q' "$@")
sh -c "cat $cmd"
}
Using printf also helps somewhat with special characters in the patterns. From help printf:
In addition to the standard format specifications described in printf(1),
printf interprets:
%b expand backslash escape sequences in the corresponding argument
%q quote the argument in a way that can be reused as shell input
%(fmt)T output the date-time string resulting from using FMT as a format
string for strftime(3)
Since the aim of %q is providing output suitable for shell input, this should be safe.
Also: You almost always want to use "$@" with the quotes, not just plain $@.
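A quick way to see the difference, using two throwaway functions (count_args and show are made-up names) and assuming the default IFS:

```bash
count_args() { echo "$#"; }   # prints how many arguments it received

show() {
    count_args $@      # unquoted: arguments are split again on whitespace
    count_args "$@"    # quoted: each argument is passed through intact
}

show 'two words' one
# 3
# 2
```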

Batch rename multiple numbers in filename with different padding

I am trying to rename a batch of files of the form:
test1_run1
test1_run2
...
test1_run10
...
test10_run1
test10_run2
...
test10_run10
to the form with multiple paddings. For the first number I need padding with zeros of length 5 and for the second with length 3.
The final result should be of the form:
test00001_run001
test00001_run002
...
test00001_run010
...
test00010_run001
test00010_run002
...
test00010_run010
How can I do this in bash for all the files in a particular folder?
We can convert the string into test + 5 digits + _run + 3 digits formats by saying:
$ awk -F"test" '{split($2,a,"_run"); printf "%s%0.5d%s%0.3d\n", FS, a[1], "_run", a[2]}' a
test00001_run001
test00001_run002
test00001_run010
test00010_run001
test00010_run002
test00010_run010
This works by using test as field separator and splitting the 2nd field in two parts: before and after _run. Then it uses the printf format specifiers %0.5d and %0.3d to zero-pad the two numbers.
Then, you can print mv together with the previous value and say:
$ awk -F"test" '{split($2,a,"_run"); printf "mv %s %s%0.5d%s%0.3d\n", $0, FS, a[1], "_run", a[2]}' a
mv test1_run1 test00001_run001
mv test1_run2 test00001_run002
mv test1_run10 test00001_run010
mv test10_run1 test00010_run001
mv test10_run2 test00010_run002
mv test10_run10 test00010_run010
If you then pipe it to sh, it will get executed.
If you don't want to use perl or awk, and strictly use bash and some utility programs that are available in most distributions, you can try something like this:
for i in * ; do
testpart=$(echo "$i" | cut -d_ -f1)
testnum=${testpart#test}
runpart=$(echo "$i" | cut -d_ -f2)
runnum=${runpart#run}
destfile=test$(printf %05d "$testnum")_run$(printf %03d "$runnum")
mv "$i" "$destfile"
done
In bash:
#!/bin/bash
shopt -s nullglob extglob
for file in test+([[:digit:]])_run+([[:digit:]]); do
[[ $file =~ ^test([[:digit:]]+)_run([[:digit:]]+)$ ]]
printf -v newfile 'test%05d_run%03d' "$((10#${BASH_REMATCH[1]}))" "$((10#${BASH_REMATCH[2]}))"
echo mv "$file" "$newfile"
done
Run this from within the folder you want to process. This will only echo the mv commands to be performed. Remove the echo if you're happy with the result.
we're using the shell option nullglob so that non-matching globs expand to nothing;
we're using the shell option extglob because the for loop will use extended globs;
the extended glob test+([[:digit:]])_run+([[:digit:]]) will expand to the files matching this pattern (if any)
we're using a regex to get the digits from the file names; the first number will be in BASH_REMATCH[1] and the second in BASH_REMATCH[2].
we're using printf to format the new file name; the modifiers %05d and %03d will format the numbers according to your wishes (with appropriate leading zeroes). Observe that we're using $((10#${BASH_REMATCH[1]})) to explicitly specify that the number is in radix 10, in case you have a file test09_run001: the 09 part would otherwise make bash interpret the number in radix 8 (because of the leading 0) and you'd get a complaint;
the -v switch tells printf to not output on standard output, but to store the output in variable newfile;
finally we perform the mv.
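The renaming logic can be tried without touching any files by putting the regex and printf steps into a function. A sketch, assuming the target form from the question (test + 5 digits + _run + 3 digits); padded_name is a made-up helper name:

```bash
#!/usr/bin/env bash
# padded_name NAME (hypothetical helper)
# Prints the zero-padded form of a testN_runM name; returns non-zero otherwise.
padded_name() {
    local re='^test([[:digit:]]+)_run([[:digit:]]+)$'
    [[ $1 =~ $re ]] || return 1
    # 10# forces base 10 so that inputs like 09 are not parsed as octal
    printf 'test%05d_run%03d\n' "$((10#${BASH_REMATCH[1]}))" "$((10#${BASH_REMATCH[2]}))"
}

padded_name test1_run10      # test00001_run010
padded_name test09_run008    # test00009_run008
```

Inside the loop this would be used as something like newfile=$(padded_name "$file") before the mv.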

Resources