Using pattern in Shell Parameter Expansion - bash

I am reading a page and trying to extract some data from it. I am interested in using bash and after going through few links, i came to know that 'Shell Parameter Expansion' might help however, i am finding difficulty using it in my script. I know that using sed might be easier but just for my knowledge i want to know how can i achieve this in bash.
shopt -s extglob
str='My work</u><br /><span style="color: rgb(34,34,34);"></span><span>abc-X7-27ABC | </span><span style="color: rgb(34,34,34);">build'
echo "${str//<.*>/|}"
I want my output to be like this: My work|abc-X7-27ABC |build
I thought of checking whether it accepts only word instead of pattern and it seems to be working with words.
For instance,
echo "${str//span style/|}" works but
echo "${str//span.*style/|}" doesn't
On the other hand, i saw in one of the link that it does accept pattern. I am confused why it's not working with the patern i am using above.
How to make sed do non-greedy match?
(User konsolebox's solution)

One mistake you're making is by mixing shell globbing and regex. In shell glob dot is taken literally as dot character not as 0 or more of any character.
If you try this code instead:
echo "${str//<*>/|}"
then it will print:
My work|build

This is not an answer, so much as a demonstration of why pattern-matching is not recommended for this kind of HTML editing. I attempted the following.
shopt -s extglob
set +H # Turn off history expansion, if necessary, to allow the !(...) pattern
echo ${str//+(<+(!(>))>)/|}
First: it didn't work, even for a simpler string like str='My work</u><br />bob<foo>build'. Second, for the string in the original question, it appeared to lock up the shell; I suspect such a complex pattern triggers exponential backtracking.
Here's how it's intended to work:
!(>) is any thing other than a single >
+(!(>)) is one or more non-> characters.
<+(!(>))> is one or more non-> characters enclosed in < and >
+(<+(!(>))>) is one or more groups of <...>-enclosed non->s.
My theory is that since !(>) can match a multi-character string as well as a single character, there is a ton of backtracking required.

Related

Bash pattern matching 'or' using shell parameter expansion

hashrate=${line//*:/}
hashrate=${hashrate//H\/s/}
I'm trying to unify this regex replace into a single command, something like:
hashrate=${line//*:\+/H\/s/}
However, this last option doesn't work. I also tried with \|, but it doesn't seem to work and I haven't found anything useful in bash manuals and documentation. I need to use ${} instead of sed, even if using it solves my problem.
The alternation for shell patterns (assuming extended globbing, shopt -s extglob is enabled), is #(pattern|pattern...). For your case:
${line//#(*:|H\/s)}
The trailing / is optional if you just remove a pattern instead of replacing it.
Notice that because of the double slash, //, all occurrences of the patterns will be removed, one at a time. If you used *(...) (see randomir's answer), consecutive patterns would be removed all in one go. Unless you have giant string, the difference should be negligible. (If you have giant strings, you don't want to use globbing anyway, as it's not optimized for this kind of thing.)
If you enable extended globbing (extglob via shopt), you can use the *(pattern1|pattern2|...) operator to match zero or more glob patterns:
hashrate="${line//*(*:|H\/s)/}"

Weird issue when running grep with the --include option

Here is the code at the bash shell. How is the file mask supposed to be specified, if not this way? I expected both commands to find the search expression, but it's not happening. In this example, I know in advance that I prefer to restrict the search to python source code files only, because unqualified searches are silly time wasters.
So, this works as expected:
grep -rni '/home/ga/projects' -e 'def Pr(x,u,v)'
/home/ga/projects/anom/anom.py:27:def Pr(x,u,v): blah, blah, ...
but this won't work:
grep --include=\*.{py} -rni '/home/ga/projects' -e 'def Pr(x,u,v)'
I'm using GNU grep version 2.16.
--include=\*.{py} looks like a broken attempt to use brace expansion (an unquoted {...} expression).
However, for brace expansion
to occur in bash (and ksh and zsh), you must either have:
a list of at least 2 items, separated with ,; e.g. {py,txt}, which expands to 2 arguments, py and txt.
or, a range of items formed from two end points, separated with ..; e.g., {1..3}, which expands to 3 arguments, 1, 2, and 3.
Thus, with a single item, simply do not use brace expansion:
--include=\*.py
If you did have multiple extensions to consider, e.g., *.py as well as *.pyc files, here's a robust form that illustrates the underlying shell features:
'--include=*.'{py,pyc}
Here:
Brace expansion is applied, because {...} contains a 2-item list.
Since the {...} directly follows the literal (single-quoted) string --include=*., the results of the brace expansion include the literal part.
Therefore, 2 arguments are ultimately passed to grep, with the following literal content:
--include=*.py
--include=*.pyc
Your command fails because of the braces '{}'. It will search for it in the file name. You can create a file such as 'myscript.{py}' to convince yourself. You'll see it will appear in the results.
The correct option parameter would be '*.py' or the equivalent \*.py. Either way will protect it from being (mis)interpreted by the shell.
On the other side, I can only advise to use the command find for such jobs :
find /home/ga/projects -regex '.*\.py$' -exec grep -e "def Pr(x,u,v)" {} +
That will protect you from hard to understand shell behaviour.
Try like this (using quotes to be safe; also better readability than backslash escaping IMHO):
grep --include='*.py' ...
your \*.{py} brace expansion usage isn't supported at all by grep. Please see the comments below for the full investigation regarding this. For the record, blame this answer for the resulting brace wars ;)
By the way, the brace expansion works generally fine in Bash. See mklement0 answer for more details.
Ack. As an alternative, you might consider switching to ack instead from now on. It's a tool just like grep, but fully optimized for programmers.
It's a great fit for what you are doing. A nice quote about it:
Every once in a while something comes along that improves an idea so much, you can't ignore it. Such a thing is ack, the grep replacement.

confusing case of a bash completion script

I'm having trouble understanding what the following code does in a bash completion script:
case "$last" in
+\(--import|-i\))
_filedir '+(txt|html)';;
When is that case ever met? I thought the second line above would be something like
--import|-i)
which does make sense to me. I grepped my bash_completion.d directory for '+\\(' but that one was the only one that came up so I guess it's not that common.
This code is indeed puzzling without context. As it is, it matches two literal strings -
$ case "+(--import" in +\(--import|-i\)) echo match ;; esac
match
$ case "-i)" in +\(--import|-i\)) echo match ;; esac
match
It looks similar to the extended glob pattern +(--import|-i), but in this form it's neither a match for the literal pattern (would need to escape the pipe) nor the actual pattern (would need to unescape the parentheses). I'd guess "bug", but bash completion is a minefield of crazy metaprogramming, so it's impossible to say without seeing the entire script.
From bash(1)
If the extglob shell option is enabled using the shopt builtin,
several extended pattern matching operators are recognized. In the
following description, a pattern-list is a list of one or more
patterns separated by a |. Composite patterns may be formed using one
or more of the following sub-patterns:
[...]
+(pattern-list)
Matches one or more occurrences of the given patterns

11*(...) as a bash parameter without quotation marks

I'm trying to write a small piece of code that passes a small formula to another program, however i've found that something strange happens when the formula starts with 11*(:
$ echo 11*15
Neatly prints '11*15'
$ echo 21*(15)
Neatly prints '21*(15)', while
echo 11*(15)
Only gives '11'. As far as I've found this only happens with '11*('. I know that this can be solved by using proper quotation marks, but I'm still curious as to why this happens.
Does anyone know?
How is your program coded? If its coded to take in parameters, then pass your formula like
./myprogram "11*15"
or
echo '11*15' | myprogram
If you do echo just like that on the command line, you may inadvertently display files that has 11 in its file name
11*(15) uses a Bash-specific extended glob syntax. You've stumbled across it accidentally, emphasizing why quotation marks are a good idea. (I also learned a lot tracking down why it was working differently for me; thanks for that.)
The behavior of
echo 11*(15)
in bash is going to vary depending on whether extglob is enabled. If it's enabled *(PATTERN-LIST) matches zero or more occurrences of the patterns. If it's disabled, it doesn't, and the resulting ( is likely to cause a syntax error.
For example:
$ ls
11 115 1155 11555 115555
$ shopt -u extglob
$ echo 11*(55)
bash: syntax error near unexpected token `('
$ shopt -s extglob
$ echo 11*(55)
11 1155 115555
$
(This explains the odd behavior I discussed in comments.)
Quoting from the bash 4.2.8 documentation (info bash):
If the `extglob' shell option is enabled using the `shopt' builtin,
several extended pattern matching operators are recognized. In the
following description, a PATTERN-LIST is a list of one or more
patterns separated by a `|'. Composite patterns may be formed using
one or more of the following sub-patterns:
`?(PATTERN-LIST)'
Matches zero or one occurrence of the given patterns.
`*(PATTERN-LIST)'
Matches zero or more occurrences of the given patterns.
`+(PATTERN-LIST)'
Matches one or more occurrences of the given patterns.
`#(PATTERN-LIST)'
Matches one of the given patterns.
`!(PATTERN-LIST)'
Matches anything except one of the given patterns.

Search and replace in Shell

I am writing a shell (bash) script and I'm trying to figure out an easy way to accomplish a simple task.
I have some string in a variable.
I don't know if this is relevant, but it can contain spaces, newlines, because actually this string is the content of a whole text file.
I want to replace the last occurence of a certain substring with something else.
Perhaps I could use a regexp for that, but there are two moments that confuse me:
I need to match from the end, not from the start
the substring that I want to scan for is fixed, not variable.
for truncating at the start: ${var#pattern}
truncating at the end ${var%pattern}
${var/pattern/repl} for general replacement
the patterns are 'filename' style expansion, and the last one can be prefixed with # or % to match only at the start or end (respectively)
it's all in the (long) bash manpage. check the "Parameter Expansion" chapter.
amn expression like this
s/match string here$/new string/
should do the trick - s is for sustitute, / break up the command, and the $ is the end of line marker. You can try this in vi to see if it does what you need.
I would look up the man pages for awk or sed.
Javier's answer is shell specific and won't work in all shells.
The sed answers that MrTelly and epochwolf alluded to are incomplete and should look something like this:
MyString="stuff ttto be edittted"
NewString=`echo $MyString | sed -e 's/\(.*\)ttt\(.*\)/\1xxx\2/'`
The reason this works without having to use the $ to mark the end is that the first '.*' is greedy and will attempt to gather up as much as possible while allowing the rest of the regular expression to be true.
This sed command should work fine in any shell context used.
Usually when I get stuck with Sed I use this page,
http://sed.sourceforge.net/sed1line.txt

Resources