Linux: shell builtin string matching - bash

I am trying to become more familiar with the builtin string matching features available in Linux shells. I came across someone's post that showed this example:
a="abc|def"
echo ${a#*|} # will yield "def"
echo ${a%|*} # will yield "abc"
I tried it out and it does what it's advertised to do, but I don't understand what the $, {}, #, *, and | are doing. I looked for a reference online and in the manuals but couldn't find anything. Can anyone explain what's going on here?

This article in the Linux Journal says that the # operator deletes the shortest possible match on the left, while the % operator deletes the shortest possible match on the right.
So ${a#*|} returns everything after the |, and ${a%|*} returns everything before the |.
If you had a situation that called for greedy matching, you'd use ## or %%.
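As a small sketch of the shortest/longest distinction described above, here are the four operators applied to a value with two | characters. The sample value and the variable names are made up for illustration; they are not from the original post.

```shell
a="abc|def|ghi"

front_short=${a#*|}    # remove the shortest prefix matching "*|"
front_long=${a##*|}    # remove the longest prefix matching "*|"
back_short=${a%|*}     # remove the shortest suffix matching "|*"
back_long=${a%%|*}     # remove the longest suffix matching "|*"

printf '%s\n' "$front_short" "$front_long" "$back_short" "$back_long"
# prints:
# def|ghi
# ghi
# abc|def
# abc
```

With only one | in the value, as in the question, the single and doubled forms give the same result.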

Take a look at this.
${string%substring}
Deletes shortest match of $substring from back of $string.
${string#substring}
Deletes shortest match of $substring from front of $string.
EDIT:
"I don't understand what the $,{},#,*,| are doing"
I recommend reading this

Typically, ${somename} will substitute the contents of a defined parameter:
mystring="1234567"
echo ${mystring} # produces '1234567'
The % and # symbols are operators that modify this default behavior: they remove part of the value before it is substituted.
The asterisk '*' is a wildcard, while the pipe '|' is simply a literal character to match. Let me do the same thing using '4' as the matching character.
mystring="1234567"
echo ${mystring#*4} # produces '567'

Those features and other similarly useful ones are documented in the Shell Parameter Expansion section of the Bash Reference Manual. Here's another really good reference.

Related

What does this variable assignment do?

I'm having to code a subversion hook script, and I found a few examples online, mostly python and perl. I found one or two shell scripts (bash) as well. I am confused by a line and am sorry this is so basic a question.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
The script later uses this to perform a test, such as (assume EXT=ex):
if [[ "$FILTER" == *"$EXT"* ]]; then blah
My problem is the above test is true. However, I'm not asking you to assist in writing the script, just explaining the initial assignment of FILTER. I don't understand that line.
Editing in a closer example FILTER line. Of course the script, as written, does not work, because 'ex' returns true, and not just 'exe'. My problem here, however, is only that I don't understand the layout of the variable assignment itself.
Why is there a period at the beginning? ".(sh..."
Why is there a dollar sign at the end? "...BAT)$"
Why are there pipes between each pattern? "sh|SH|exe"
You are probably looking for something like this:
FILTER='\.(sh|SH|exe|EXE|bat|BAT)$'
# with no "in" list, for loops over the script's arguments
for EXT
do
    if [[ "$EXT" =~ $FILTER ]]
    then
        echo "$EXT extension disallowed"
    else
        echo "$EXT is allowed"
    fi
done
Save it to myscript.sh and run it as
myscript.sh bash ba.sh
and you will get
bash is allowed
ba.sh extension disallowed
If you don't escape the dot, e.g. with FILTER=".(sh|SH|exe|EXE|bat|BAT)$", you will get
bash extension disallowed
ba.sh extension disallowed
which is (of course) wrong.
For the questions:
Why is there a period at the beginning? ".(sh..."
Because you want to match .sh (as an extension) and not, for example, bash (without the dot). The . must therefore be escaped as \. because an unescaped . in a regex means "any character".
Why is there a dollar sign at the end? "...BAT)$"
The $ means end of string. You want to match file.sh and not file.sh.jpg; the .sh should be at the end of the string.
Why are there pipes between each pattern? "sh|SH|exe"
In a regex, the (...|...|...) construction delimits the alternatives, as you surely guessed.
You really need to read a regex tutorial; the topic is more complicated than this and can't be explained in one answer.
PS: NEVER use UPPERCASE variable names; they can collide with environment variables.
This just assigns a string to FILTER; the contents of that string have no special meaning to the shell. When you match it against the pattern *ex*, the result is true whenever the value of $FILTER contains the string ex with anything on either side. That is the case here: ex is a substring of exe.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
                ^^
                |
                +---- here is the "ex" from the pattern.
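To make the difference concrete, here is a minimal sketch that runs both kinds of test side by side. It assumes the escaped form of the filter from the other answer; the lowercase variable names and the file names notes.ex and setup.exe are invented for the example.

```shell
#!/usr/bin/env bash
filter='\.(sh|SH|exe|EXE|bat|BAT)$'
ext=ex

# == with unquoted * is a glob test: "does the text of $filter contain $ext?"
if [[ "$filter" == *"$ext"* ]]; then glob_test=true; else glob_test=false; fi

# =~ treats the unquoted right-hand side as an extended regular expression
if [[ "notes.ex" =~ $filter ]]; then regex_ex=true; else regex_ex=false; fi
if [[ "setup.exe" =~ $filter ]]; then regex_exe=true; else regex_exe=false; fi

echo "glob: $glob_test, regex .ex: $regex_ex, regex .exe: $regex_exe"
# prints: glob: true, regex .ex: false, regex .exe: true
```

The glob test only asks whether the letters "ex" appear somewhere in the filter text, which is why it succeeds for a partial extension.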
As far as I can tell, this is similar to a regular expression pattern:
In regular expressions, ^ marks the start of the string; the leading . here does not do that. Unescaped, it matches any single character, and it was probably meant as the literal dot before the extension.
Inside the parentheses are the exact file extensions to match, combined with 'or' by the '|'.
The '$' at the end anchors the match to the end of the string, so nothing may follow the extension.
I would guess that is how the original author looked at it and implemented it.

Using pattern in Shell Parameter Expansion

I am reading a page and trying to extract some data from it. I am interested in using bash, and after going through a few links I learned that 'Shell Parameter Expansion' might help; however, I am having difficulty using it in my script. I know that using sed might be easier, but for my own knowledge I want to know how I can achieve this in bash.
shopt -s extglob
str='My work</u><br /><span style="color: rgb(34,34,34);"></span><span>abc-X7-27ABC | </span><span style="color: rgb(34,34,34);">build'
echo "${str//<.*>/|}"
I want my output to be like this: My work|abc-X7-27ABC |build
I checked whether it accepts only a literal word instead of a pattern, and it does seem to work with words.
For instance,
echo "${str//span style/|}" works but
echo "${str//span.*style/|}" doesn't
On the other hand, I saw in one of the links that it does accept a pattern. I am confused about why it's not working with the pattern I am using above.
How to make sed do non-greedy match?
(User konsolebox's solution)
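One mistake you're making is mixing shell globbing and regex. In a shell glob, a dot is a literal dot character; it does not mean "any character" as it does in a regex, so .* is not "zero or more of any character". In a glob, * on its own plays that role.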
If you try this code instead:
echo "${str//<*>/|}"
then it will print:
My work|build
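The result above comes from the glob * matching greedily in ${var//pattern/repl}. The following sketch contrasts that with an extglob pattern that cannot cross a >. The short sample string is invented here to keep the output easy to check; on long strings, extglob patterns in bash can be slow.

```shell
#!/usr/bin/env bash
shopt -s extglob   # needed for the *(...) pattern below

s='one<b>two</b>three'

greedy=${s//<*>/|}          # glob * runs from the first < to the last >
per_tag=${s//<*([^>])>/|}   # *([^>]) cannot cross a >, so each tag matches on its own

echo "$greedy"    # prints: one|three
echo "$per_tag"   # prints: one|two|three
```

Note that this replaces every tag with its own |, so adjacent tags would produce adjacent | characters.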
This is not an answer, so much as a demonstration of why pattern-matching is not recommended for this kind of HTML editing. I attempted the following.
shopt -s extglob
set +H # Turn off history expansion, if necessary, to allow the !(...) pattern
echo ${str//+(<+(!(>))>)/|}
First: it didn't work, even for a simpler string like str='My work</u><br />bob<foo>build'. Second, for the string in the original question, it appeared to lock up the shell; I suspect such a complex pattern triggers exponential backtracking.
Here's how it's intended to work:
!(>) is anything other than a single >
+(!(>)) is one or more non-> characters.
<+(!(>))> is one or more non-> characters enclosed in < and >
+(<+(!(>))>) is one or more groups of <...>-enclosed non->s.
My theory is that since !(>) can match a multi-character string as well as a single character, there is a ton of backtracking required.

what does ## mean inside ${}

I am reading a shell script from GitHub.
Two lines of code in it confused me. I have never seen ## used in bash like this before.
Could anyone explain how this works? Thanks.
branch_name=$(git symbolic-ref -q HEAD)
branch_name=${branch_name##refs/heads/}
Note: the first line produces something like 'refs/heads/master',
and the next line removes the leading refs/heads/ so that branch_name becomes master.
From the bash(1) man page, EXPANSION section, Parameter Expansion subsection:
${parameter#word}
${parameter##word}
Remove matching prefix pattern. The word is expanded to produce
a pattern just as in pathname expansion. If the pattern matches
the beginning of the value of parameter, then the result of the
expansion is the expanded value of parameter with the shortest
matching pattern (the ``#'' case) or the longest matching pattern
(the ``##'' case) deleted.
Also available in the manual, of course (but it doesn't seem to support linking to this exact text; search the page for ##).
Have a look here, where a lot of other string manipulation tricks are described. In short:
${string##substring}
Deletes longest match of $substring from front of $string.
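A short sketch of how # and ## behave on a ref name. The branch name feature/login is a made-up example. With a literal prefix such as refs/heads/ there is only one possible match, so # and ## give the same result; the two differ once the pattern contains a wildcard.

```shell
ref='refs/heads/feature/login'

branch=${ref##refs/heads/}   # literal prefix removed (same result with a single #)
leaf=${ref##*/}              # longest prefix matching "*/" removed
rest=${ref#*/}               # shortest prefix matching "*/" removed

printf '%s\n' "$branch" "$leaf" "$rest"
# prints:
# feature/login
# login
# heads/feature/login
```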

Case-insensitive Parameter Expansion

My problem: I am trying to strip a string from the start of a variable. I have run shopt -s extglob to get extended pattern matching.
a="HelloDolly"
echo "${a#[A-Z]+([a-z])}"
I thought that +([a-z]) means as many lowercase letters as possible, so [A-Z]+([a-z]) should match Hello
and the expansion should return Dolly, but I get lloDolly back. If I try / instead of #,
echo "${a/[A-Z]+([a-z])}"
I get nothing back. It looks like parameter expansion is case-insensitive.
Thanks to everybody who can give me a hint.
Using a single #, you get the shortest possible match. "He" is the shortest possible match of one uppercase letter followed by one or more lowercase letters. Switch to a double # to get the longest possible match, "Hello":
echo "${a##[A-Z]+([a-z])}"
To avoid issues with locale-based interpretation of character ranges, use character classes instead:
echo "${a##[[:upper:]]+([[:lower:]])}"

Search and replace in Shell

I am writing a shell (bash) script and I'm trying to figure out an easy way to accomplish a simple task.
I have some string in a variable.
I don't know if this is relevant, but it can contain spaces, newlines, because actually this string is the content of a whole text file.
I want to replace the last occurrence of a certain substring with something else.
Perhaps I could use a regexp for that, but there are two points that confuse me:
I need to match from the end, not from the start
the substring that I want to scan for is fixed, not variable.
For truncating at the start: ${var#pattern}
For truncating at the end: ${var%pattern}
For general replacement: ${var/pattern/repl}
The patterns use 'filename'-style (glob) expansion, and in the last form the pattern can be prefixed with # or % to match only at the start or end (respectively).
It's all in the (long) bash manpage; check the "Parameter Expansion" section.
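One way to combine the operators above to replace only the last occurrence of a fixed substring is sketched below. This is an illustration under assumptions, not something taken from the manpage: the variable names and sample text are invented, and quoting "$old" inside the pattern makes the shell treat it literally rather than as a glob.

```shell
text='stuff ttto be edittted'
old='ttt'
new='xxx'

case $text in
  *"$old"*)
    head=${text%"$old"*}    # drop the shortest suffix that starts at an occurrence of $old
    tail=${text##*"$old"}   # drop the longest prefix that ends at an occurrence of $old
    result=$head$new$tail
    ;;
  *)
    result=$text            # no occurrence: leave the text unchanged
    ;;
esac

printf '%s\n' "$result"   # prints: stuff ttto be edixxxed
```

Because the expansions work on the whole value rather than line by line, embedded newlines in the text do not change which occurrence counts as the last one.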
An expression like this
s/match string here$/new string/
should do the trick: s is for substitute, the / characters separate the parts of the command, and $ is the end-of-line marker. You can try this in vi to see if it does what you need.
I would look up the man pages for awk or sed.
Javier's answer is shell specific and won't work in all shells.
The sed answers that MrTelly and epochwolf alluded to are incomplete and should look something like this:
MyString="stuff ttto be edittted"
NewString=`echo "$MyString" | sed -e 's/\(.*\)ttt\(.*\)/\1xxx\2/'`
The reason this works without having to use the $ to mark the end is that the first '.*' is greedy and will attempt to gather up as much as possible while allowing the rest of the regular expression to be true.
This sed command should work fine in any shell context used.
Usually when I get stuck with Sed I use this page,
http://sed.sourceforge.net/sed1line.txt
