bash pattern substitution to remove an arbitrarily long sequence of letters

My script deals with filenames that are padded with the letter x to a certain length, so a file may be abcdxxxxxx or fooxxxxxxx. I have the filename stored in a variable fn, and I want to extract just the "stem", i.e. abcd or foo.
I can obviously do this by forking a sed or tr process and feeding the filename into it, but bash also has a feature called pattern substitution for variables, and I was wondering whether it could be used here.
From the bash man page:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string. Normally only the first match is replaced.... If pattern begins with %, it must match at the end of the expanded value of parameter.
Now, a pattern denoting the letter x is just x, and since the pattern should match at the end, I need %x:
echo ${fn/%x/}
This does indeed return the filename with the last x removed. But I want all the x's removed, i.e. all occurrences of the pattern, which according to the man page requires that the pattern start with a slash. I understand this to turn %x into either /%x or %/x. However, neither echo ${fn//%x/} nor echo ${fn/%/x/} produces the expected result.
Did I misunderstand something in the description of pattern substitution?

Regarding the substring replacements (/, //, /%, /#), see the section towards the end of the guide here:
${var/Pattern/Replacement}
First match of Pattern, within var replaced with Replacement.
${var//Pattern/Replacement}
Global replacement. All matches of Pattern, within var replaced with Replacement.
${var/#Pattern/Replacement}
If prefix of var matches Pattern, then substitute Replacement for Pattern.
${var/%Pattern/Replacement}
If suffix of var matches Pattern, then substitute Replacement for Pattern.
So the choices are first match, all matches, prefix, or suffix. And since, as with globbing, you can't write x* in the regular-expression sense of "any number of x's", you are left with the options described in the other answers.
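To make the four forms concrete, here is a small sketch run on the filename from the question (`abcdxxxxxx` is only an example value). The extra `xenonxxxxx` case is a made-up name showing why the global form is not a safe way to strip padding: it also removes any x inside the stem.

```bash
#!/usr/bin/env bash
fn=abcdxxxxxx

first=${fn/x/}     # first match only:          abcdxxxxx
all=${fn//x/}      # every match:               abcd
prefix=${fn/#a/}   # must match at the start:   bcdxxxxxx
suffix=${fn/%x/}   # must match at the end:     abcdxxxxx (only one x)

# Hypothetical name whose stem contains an x: "//" removes that one too.
tricky=xenonxxxxx
tricky_all=${tricky//x/}   # enon

printf '%s\n' "$first" "$all" "$prefix" "$suffix" "$tricky_all"
```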

Try:
echo "${fn%${fn##*[^x]}}"
Examples
$ fn=abcdxxxxxx; echo "${fn%${fn##*[^x]}}"
abcd
$ fn=fooxxxxxxx; echo "${fn%${fn##*[^x]}}"
foo
How it works
For starters, ${parameter##word} is prefix removal. It removes the longest match of word from the beginning of parameter. In our case, ${fn##*[^x]} is the filename with everything removed from the front up to and including the last character that is not x. This leaves only the trailing x's. For example:
$ fn=abcdxxxxxx; echo "${fn##*[^x]}"
xxxxxx
${parameter%word} is suffix removal. It removes word from the end of $parameter. In our case, we want to remove the trailing x's (as found above) from $fn. Thus we want ${fn%${fn##*[^x]}}.
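The same two-step expansion can be wrapped in a small function to check a few edge cases. `stem` is a hypothetical helper name, not something from the answer; the inner expansion is quoted so its result is always matched literally.

```bash
#!/usr/bin/env bash
# Hypothetical helper wrapping ${fn%${fn##*[^x]}}.
stem() {
  local fn=$1
  local pad=${fn##*[^x]}        # trailing run of x's (all of fn if it is only x's)
  printf '%s\n' "${fn%"$pad"}"  # strip exactly that run from the end
}

stem abcdxxxxxx   # abcd
stem fooxxxxxxx   # foo
stem xenonxxxxx   # xenon  (an x inside the stem is kept)
stem abcd         # abcd   (no padding, nothing removed)
stem xxxx         # empty line: a name made only of x's has no stem
```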

Doubling the percent sign will do what you want:
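echo "${fn%%x*}"
"Remove, from the end of the string, the longest suffix that starts with x", i.e. the first x and everything after it. Note that this cuts at the first x anywhere in the name, so it only works if the stem itself contains no x.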
Or you can use extended globs:
shopt -s extglob
echo "${fn/%+(x)/}"
"Replace, at the end of the string, a sequence of one or more x's with nothing"
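The difference between the two answers only shows up when the stem itself contains an x. A minimal sketch, using a made-up name `xenonxxxxx`; `${fn%%+(x)}` is an additional variant (suffix removal with the same extglob pattern) that is not in the original answer.

```bash
#!/usr/bin/env bash
# extglob must be enabled before the lines that use +(x) are parsed.
shopt -s extglob

fn=xenonxxxxx             # hypothetical name whose stem contains an x

cut_first=${fn%%x*}       # cuts at the FIRST x anywhere: empty string
ext_sub=${fn/%+(x)/}      # removes only the trailing run of x's: xenon
ext_suffix=${fn%%+(x)}    # same idea with suffix removal: xenon

printf '[%s] [%s] [%s]\n' "$cut_first" "$ext_sub" "$ext_suffix"
```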

Assuming you have the filename in the variable fn, then in bash you can do:
if [[ $fn =~ x+$ ]]; then
  echo "${fn%"$BASH_REMATCH"}"
fi
This will print the filename with the matched part removed. If you want it to also work when there are no x's at the end of the filename, replace x+$ with x*$ above, in which case it will always match.
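With the x*$ variant the test always succeeds, so the `if` can be dropped. A sketch as a function; `strip_trailing_x` is a hypothetical name. Because the regex is unanchored at the start, the leftmost position where x*$ can match is the start of the trailing run, so the whole run ends up in BASH_REMATCH.

```bash
#!/usr/bin/env bash
# Hypothetical helper based on the =~ answer, using x*$ so it always matches.
strip_trailing_x() {
  local fn=$1
  [[ $fn =~ x*$ ]]                        # x*$ also matches an empty tail
  printf '%s\n' "${fn%"$BASH_REMATCH"}"   # BASH_REMATCH holds the matched x's
}

strip_trailing_x abcdxxxxxx   # abcd
strip_trailing_x xenonxxxxx   # xenon
strip_trailing_x abcd         # abcd
```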
As for the pattern substitution, my guess is that it only attempts to replace matches once at a given location, even if you add the / to replace all matches. So when it matches the last x at the end of the string, it does not go back to an earlier location in the string to see whether it matches again. Basically, this means you cannot combine % and /. If my guess is correct, that is :)
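One way to probe that guess, assuming a reasonably recent bash: if % is not treated as an anchor after //, then ${fn//%x/} should look for the literal two-character text %x. The value `50%xoff` below is made up purely to show that.

```bash
#!/usr/bin/env bash
fn=abcdxxxxxx
r1=${fn//%x/}      # searches for the literal text "%x": no match, fn unchanged

odd='50%xoff'      # hypothetical value that really contains "%x"
r2=${odd//%x/}     # every literal "%x" is removed: 50off

printf '%s\n' "$r1" "$r2"
```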

Related

What does output_dir="${1%/}" mean in a .sh file?

I have not seen this usage before. Can anyone provide relevant information? The source code is from im2txt.
See the bash manual:
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the ‘%’ case) or the longest matching pattern (the ‘%%’ case) deleted. [...]
The relevant alternative here is the shortest match (the ‘%’ case). The parameter in question is $1, i.e. the first command-line argument the script was called with. The pattern is a simple /, which will be removed if present. In other words, the expansion removes an optional trailing slash.
Demonstration (the y case shows that it's just a trailing pattern, z demonstrates no match):
$ x=aaa/; y=aaa/bbb; z=aaa; echo "$x -> ${x%/}"; echo "$y -> ${y%/}"; echo "$z -> ${z%/}"
aaa/ -> aaa
aaa/bbb -> aaa/bbb
aaa -> aaa
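A sketch of how a script might use this on its first argument; `normalize_dir` is a hypothetical function name, and the real im2txt script may differ. It also shows two edge cases: only one trailing slash is removed, and the root directory / becomes an empty string.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the expansion from the question.
normalize_dir() {
  local output_dir="${1%/}"   # drop ONE trailing slash, if present
  printf '%s\n' "$output_dir"
}

normalize_dir /home/users/    # /home/users
normalize_dir /home/users     # /home/users  (nothing to remove)
normalize_dir /home/users//   # /home/users/ (only one slash is removed)
normalize_dir /               # empty line: the root loses its only slash
```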
It removes a trailing "/" character, if there is one, from the first argument passed to the script.
If you had "/home/users/" as a string, then output_dir would become "/home/users"
You can find more details on string manipulation in bash here.

How can I find the position of the first letter [A-Z] in a line of a file when the line starts with spaces?

I have a file where a line has 5 spaces and then a letter. How can I show the position of the first letter in the line?
Something like
"     version"
And it should return 5, since 5 spaces come before it.
In bash you can do it relatively easily with parameter expansion that trims all non-space characters from the right, leaving only the spaces which you store in a variable and then use the length parameter expansion to get the count, e.g.
$ line="     version"; nspaces="${line%%[![:space:]]*}"; echo "${#nspaces} spaces"
5 spaces
Explanation
nspaces="${line%%[![:space:]]*}" which uses the parameter expansion form ${parameter%%word} with the inverted POSIX character-class [:space:] to trim all non-space characters from the right leaving only leading spaces;
echo "${#nspaces} spaces" simply uses the ${#parameter} form to obtain the number of characters in nspaces.
Let me know if that is what you were looking for and if you have any questions.
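Since the question is about lines of a file, here is a sketch that applies the same trimming idea to every line on stdin. Assumptions, not from the answer: "letter" means [[:alpha:]], and a line with no letter is reported as -1; `first_letter_offsets` is a hypothetical name. Note the IFS= on read: without it, read strips exactly the leading spaces we want to count.

```bash
#!/usr/bin/env bash
# Print, for each input line, how many characters precede the first letter.
first_letter_offsets() {
  local line prefix
  # IFS= keeps leading whitespace; -r keeps backslashes literal.
  while IFS= read -r line; do
    prefix=${line%%[[:alpha:]]*}   # everything before the first letter
    if [[ $prefix == "$line" ]]; then
      printf '%s\n' -1             # no letter on this line
    else
      printf '%s\n' "${#prefix}"
    fi
  done
}

printf '     version\nabc\n   \n' | first_letter_offsets   # 5, 0, -1
```

To use it on a file, redirect the file into it, e.g. `first_letter_offsets < file.txt`.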

bash shell: reworking a variable to replace dots with underscores

I can't seem to get it working:
echo $VERSIONNUMBER
I get: v0.9.3-beta
VERSIONNUMBERNAME=${VERSIONNUMBER:1}
echo $VERSIONNUMBERNAME
I get: 0.9.3-beta
VERSION=${VERSIONNUMBERNAME/./_}
echo $VERSION
I get: 0_9.3-beta
I want: 0_9_3-beta
I've been googling my brains out and I can't make heads or tails of it.
Ideally I'd like to remove the v and replace the periods with underscores in one line.
Let's create your variables:
$ VERSIONNUMBER=v0.9.3-beta
$ VERSIONNUMBERNAME=${VERSIONNUMBER:1}
This form only replaces the first occurrence of .:
$ echo "${VERSIONNUMBERNAME/./_}"
0_9.3-beta
To replace all occurrences of ., use:
$ echo "${VERSIONNUMBERNAME//./_}"
0_9_3-beta
Because this approach avoids the creation of pipelines and subshells and the use of external executables, this approach is efficient. This approach is also unicode-safe.
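The question also asks for removing the v and replacing the dots in one go. Bash cannot apply one expansion directly to the result of another (unlike zsh), so it takes two assignments, which can still sit on one line separated by `;`. This sketch swaps `${VERSIONNUMBER:1}` for `${VERSIONNUMBER#v}`, which removes the leading v only if it is actually there; `ver` is a hypothetical intermediate name.

```bash
#!/usr/bin/env bash
VERSIONNUMBER=v0.9.3-beta

ver=${VERSIONNUMBER#v}; VERSION=${ver//./_}   # strip "v", then replace every "."

echo "$VERSION"   # 0_9_3-beta
```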
Documentation
From man bash:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern
just as in pathname expansion. Parameter is expanded and the longest
match of pattern against its value is replaced with
string. If pattern begins with /, all matches of pattern are replaced
with string. Normally only the first match is replaced. If
pattern begins with #, it must match at the beginning of the expanded
value of parameter. If pattern begins with %, it must match at the
end of the expanded value of parameter. If string
is null, matches of pattern are deleted and the / following pattern
may be omitted. If the nocasematch shell option is enabled, the
match is performed without regard to the case of alphabetic
characters. If parameter is @ or *, the substitution operation is
applied to each positional parameter in turn, and the
expansion is the resultant list. If parameter is an array variable
subscripted with @ or *, the substitution operation is
applied to each member of the array in turn, and the expansion is the
resultant list.
(Emphasis added.)
You can combine substring expansion with tr:
VERSION=$( echo "${VERSIONNUMBER:1}" | tr '.' '_' )

How to rename multiple files?

In a folder I have several files with the following name-structure (I write just three examples):
F_001_4837_blabla1.doc
F_045_8987_blabla2.doc
F_168_9092_blabla3.doc
What I would like to do is use a Bash command to rename all the files in my folder by deleting the first underscore and the series of zeros before the first number code, obtaining:
F1_4837_blabla1.doc
F45_8987_blabla2.doc
F168_9092_blabla3.doc
shopt -s extglob
for f in *; do
echo "$f: ${f/_*(0)/}"
# mv "$f" "${f/_*(0)/}" # for the actual rename
done
Output:
F_001_4837_blabla1.doc: F1_4837_blabla1.doc
F_045_8987_blabla2.doc: F45_8987_blabla2.doc
F_168_9092_blabla3.doc: F168_9092_blabla3.doc
Parameter Expansion
Parameter expansion can be used to replace the content of a variable. In this case, we replace the pattern _*(0) with nothing.
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string. Normally only the first match is replaced. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.
Extended pattern matching
Extended pattern matching allows us to use the pattern *(0) to match zero or more 0 characters. It needs to be enabled using the extglob setting.
If the extglob shell option is enabled using the shopt builtin, several extended pattern matching operators are recognized. In the following description, a pattern-list is a list of one or more patterns separated by a |. Composite patterns may be formed using one or more of the following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
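A slightly more defensive version of the loop above, as a sketch under a few assumptions not in the answer: only names starting with F_ are touched, names that would not change are skipped, and `mv -n` refuses to overwrite an existing file. `rename_strip_zeros` is a hypothetical name, and the demo runs in a throwaway directory from `mktemp -d`.

```bash
#!/usr/bin/env bash
shopt -s extglob nullglob

# Rename F_001_... to F1_... in the current directory.
rename_strip_zeros() {
  local f new
  for f in F_*; do
    new=${f/_*(0)/}              # drop the first "_" and the zeros right after it
    [[ $new == "$f" ]] && continue
    mv -n -- "$f" "$new"         # -n: never overwrite an existing file
  done
}

# Try it in a scratch directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch F_001_4837_blabla1.doc F_045_8987_blabla2.doc F_168_9092_blabla3.doc
rename_strip_zeros
ls
```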

Bash last index of

Sorry for the lame bash question, but I can't seem to work it out.
I have the following simple case:
I have a variable like artifact-1.2.3.zip
I would like to get a substring between the hyphen and the last index of the dot (both exclusive).
My bash skills are not too strong. I have the following:
a="artifact-1.2.3.zip"; b="-"; echo ${a:$(( $(expr index "$a" "$b" + 1) - $(expr length "$b") ))}
Producing:
1.2.3.zip
How do I remove the .zip part as well?
The bash man page section titled "Parameter Expansion" describes using ${var#pattern}, ${var##pattern}, ${var%pattern}, and ${var%%pattern}.
Assuming that you have a variable called filename, e.g.,
filename="artifact-1.2.3.zip"
then, the following are pattern-based extractions:
% echo "${filename%-*}"
artifact
% echo "${filename##*-}"
1.2.3.zip
Why did I use ## instead of #?
If the filename could possibly contain dashes within, such as:
filename="multiple-part-name-1.2.3.zip"
then compare the two following substitutions:
% echo "${filename#*-}"
part-name-1.2.3.zip
% echo "${filename##*-}"
1.2.3.zip
Once you have extracted the version and extension together, separate them with:
% verext="${filename##*-}"
% ver="${verext%.*}"
% ext="${verext##*.}"
% echo $ver
1.2.3
% echo $ext
zip
$ a="artifact-1.2.3.zip"; a="${a#*-}"; echo "${a%.*}"
‘#pattern’ removes pattern so long as it matches the beginning of $a.
The syntax of pattern is similar to that used in filename matching.
In our case,
* is any sequence of characters.
- means a literal dash.
Thus #*- matches everything up to, and including, the first dash.
Thus ${a#*-} expands to whatever $a would expand to,
except that artifact- is removed from the expansion,
leaving us with 1.2.3.zip.
Similarly, ‘%pattern’ removes pattern so long as it matches the end of the expansion.
In our case,
. is a literal dot.
* is any sequence of characters.
Thus %.* matches everything from the last dot up to the end of the string.
Thus if $a expands to 1.2.3.zip,
then ${a%.*} expands to 1.2.3.
Job done.
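Both removals can be combined into one small function. This sketch swaps the answer's #*- for ##*- (as in the previous answer), so names that contain dashes before the version also work; `version_of` is a hypothetical name. It still assumes the version itself contains no dash, so something like 1.2.3-beta would not be handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: text between the last dash and the last dot.
version_of() {
  local a=${1##*-}        # drop everything up to and including the LAST dash
  printf '%s\n' "${a%.*}" # drop the last dot and whatever follows it
}

version_of artifact-1.2.3.zip              # 1.2.3
version_of multiple-part-name-1.2.3.zip    # 1.2.3
```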
The man page content for this is as follows (at least on my machine, YMMV):
${parameter#word}
${parameter##word}
The word is expanded to produce a pattern just as in pathname
expansion. If the pattern matches the beginning of the value of
parameter, then the result of the expansion is the expanded
value of parameter with the shortest matching pattern (the ``#''
case) or the longest matching pattern (the ``##'' case) deleted.
If parameter is @ or *, the pattern removal operation is applied
to each positional parameter in turn, and the expansion is the
resultant list. If parameter is an array variable subscripted
with @ or *, the pattern removal operation is applied to each
member of the array in turn, and the expansion is the resultant
list.
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern just as in pathname
expansion. If the pattern matches a trailing portion of the
expanded value of parameter, then the result of the expansion is
the expanded value of parameter with the shortest matching pat-
tern (the ``%'' case) or the longest matching pattern (the
``%%'' case) deleted. If parameter is @ or *, the pattern
removal operation is applied to each positional parameter in
turn, and the expansion is the resultant list. If parameter is
an array variable subscripted with @ or *, the pattern removal
operation is applied to each member of the array in turn, and
the expansion is the resultant list.
HTH!
EDIT
Kudos to @x4d for the detailed answer.
Still think people should RTFM though.
If they don't understand the manual,
then post another question.
Using Bash RegEx feature:
str="artifact-1.2.3.zip"
[[ "$str" =~ -(.*)\.[^.]*$ ]] && echo "${BASH_REMATCH[1]}"
I think you can do this:
a="artifact-1.2.3.zip"; b="-"; string=${a:$(( $(expr index "$a" "$b" + 1) - $(expr length "$b") ))}
substring=${string:0:${#string}-4}
The last step keeps all but the last 4 characters of the string, i.e. it drops ".zip". There's some more info here.
