For the following variable:
var="/path/to/my/document-001_extra.txt"
I need only the part between the last / [slash] and the _ [underscore].
Also, the - [dash] needs to be stripped.
In other words: document 001
This is what I have so far:
var="${var##*/}"
var="${var%_*}"
var="${var/-/ }"
which works fine, but I'm looking for a more compact substitution pattern that would spare me the triple var=...
Use of sed, awk, cut, etc. would perhaps make more sense for this, but I'm looking for a pure bash solution.
Needs to work under GNU bash, version 3.2.51(1)-release
After editing your question to talk about patterns instead of regular expressions, I'll now show you how to actually use regular expressions in bash :)
[[ $var =~ ^.*/(.*)-(.*)_ ]] && var="${BASH_REMATCH[@]:1:2}"
Parameter expansions like you were using previously unfortunately cannot be nested in bash (unless you use ill-advised eval hacks, and even then it will be less clear than the line above).
The =~ operator performs a match between the string on the left and the regular expression on the right. Parentheses in the regular expression define match groups. If a match is successful, the exit status of [[ ... ]] is zero, and so the code following the && is executed. (Reminder: don't confuse the "0=success, non-zero=failure" convention of process exit statuses with the common Boolean convention of "0=false, 1=true".)
BASH_REMATCH is an array parameter that bash sets following a successful regular-expression match. The first element of the array contains the full text matched by the regular expression; each of the following elements contains the contents of the corresponding capture group.
The ${foo[@]:x:y} parameter expansion produces y elements of the array, starting with index x. In this case, it's just a short way of writing ${BASH_REMATCH[1]} ${BASH_REMATCH[2]}. (Also, while var=${BASH_REMATCH[*]:1:2} would have worked as well, I tend to use @ anyway to reinforce the fact that you almost always want to use @ instead of * in other contexts.)
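As a minimal sketch of the match-and-read-back technique described above, the regular expression can be kept in a variable and the capture groups taken from BASH_REMATCH. The function name extract_name and the variable re are illustrative assumptions, not part of the original answer.

```bash
# Hypothetical helper: prints "<name> <number>" for paths shaped like
# /dir/name-number_rest, and returns non-zero when the input does not match.
extract_name() {
    # Keeping the regex in a variable sidesteps quoting differences on the
    # right-hand side of =~ between bash versions.
    local re='^.*/(.*)-(.*)_'
    [[ $1 =~ $re ]] || return 1
    # Element 0 is the whole match; elements 1 and 2 are the capture groups.
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}

extract_name "/path/to/my/document-001_extra.txt"
```

Returning non-zero on a failed match lets the caller decide what to do, instead of silently leaving a variable unchanged.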
Both of the following should work correctly, though the second is sensitive to misplaced characters: it will fail if there is a / or - after the last _.
var=$(IFS=_ read s _ <<<"$var"; IFS=-; echo ${s##*/})
var=$(IFS=/-_; a=($var); echo "${a[@]:${#a[@]} - 3:2}")
I read in the release notes of Bash 5.1:
p. BASH_REMATCH is no longer readonly.
As explained in the Bash Reference Manual:
The array variable BASH_REMATCH records which parts of the string matched the pattern. The element of BASH_REMATCH with index 0 contains the portion of the string matching the entire regular expression. Substrings matched by parenthesized subexpressions within the regular expression are saved in the remaining BASH_REMATCH indices. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.
And yes, it is very useful to access the matches of a regular expression:
$ DESERT=pie-cake_berry_cream-sirup
$ [[ $DESERT =~ _(.*)_ ]] && echo "${BASH_REMATCH[1]}"
berry
However, I cannot see what this change in Bash 5.1 is good for. That is, what is the point of BASH_REMATCH not being readonly?
In this bug-bash mailing list thread, the maintainer of bashdb (a Bash debugger) explains that the value of BASH_REMATCH might change inside a debug hook, and since the variable is read-only, resetting it happens by running the command that initially set it again, which is tricky and fragile:
Currently, in bash 5.0 and earlier, the value of BASH_REMATCH might be changed inside a debug hook.
Since BASH_REMATCH is read-only, resetting the value on hook return to the debugged program is a bit tricky and fragile...
[...]
The way that bashdb currently resets BASH_REMATCH is to reissue the command that caused the value to get initially set. That is fragile since this set on exit between stepping from the time BASH_REMATCH was set until the time it is last used. In between variables used in the regular expression may have changed.
[...]
Restoring it is just as tricky. As I hope you see all of this is a bit fragile.
In the response, Chet offers to make it not read-only:
How about we just make it not read-only? The shell will still set it when it does regexp matching.
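A rough sketch of what this change makes possible: saving the match results before other code runs and putting them back afterwards. Copying the array works in any version; the direct assignment to BASH_REMATCH is only attempted here on Bash 5.1 or newer. The name saved_rematch is an assumption for illustration, and this is not taken from bashdb's actual code.

```bash
# Run a match and keep a private copy of the result; copying works in any
# bash version that provides BASH_REMATCH.
[[ "pie-cake_berry_cream-sirup" =~ _(.*)_ ]]
saved_rematch=("${BASH_REMATCH[@]}")

# A later match (for example inside a debug hook) overwrites BASH_REMATCH.
[[ "something else" =~ (else) ]]

# Assumption: direct assignment is only attempted on bash 5.1 or newer,
# where BASH_REMATCH is no longer readonly.
if (( BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 1) )); then
    BASH_REMATCH=("${saved_rematch[@]}")
fi

echo "${saved_rematch[1]}"
```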
To rephrase - I want to use Bash command substitution and string substitution in the same line.
My actual commands are longer; the ridiculous use of echo here is just a short stand-in that behaves the same way, with the same errors ;)
I know we can use the output of one Bash command as a parameter for another command like this:
echo "$(echo "aahahah</ddd>")"
aahahah</ddd>
I also know we can remove a known trailing part of a string like this:
var="aahahah</ddd>"; echo "${var%</ddd>}"
aahahah
I am trying to write a single command in which one command outputs a string and the known last part of that string is removed:
echo "${$(echo "aahahah</ddd>")%</ddd>}"
-bash: ${$(echo "aahahah</ddd>")%</ddd>}: bad substitution
It might be the order in which things happen, or perhaps substitution only works on variables or hardcoded strings. But I suspect I am just missing something and it is possible.
How do I make it work?
Why doesn't it work?
When a dollar sign as in $word or equivalently ${word} is used, it asks for word's content. This is called parameter expansion, as per man bash.
You may write var="aahahah</ddd>"; echo "${var%</ddd>}": That expands var and performs a special suffix operation before returning the value.
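However, you may not write echo "${$(echo "aahahah</ddd>")%</ddd>}" because ${...} expects a parameter name before the %, and $(echo "aahahah</ddd>") is a command substitution, not a parameter: once it is evaluated, there is no parameter left to expand.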
From man bash (my emphasis):
${parameter%word}
Remove matching suffix pattern. The word is expanded to produce a
pattern just as in pathname expansion. If the pattern
matches a trailing portion of the expanded value of parameter, then
the result of the expansion is the expanded value of parameter
with the shortest matching pattern (the ''%'' case) or the longest matching pattern (the ''%%'' case) deleted.
Combine your commands like this:
var=$(echo "aahahah</ddd>")
echo ${var/'</ddd>'}
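If the intermediate variable is the main annoyance, one possible sketch is a small function that receives the command output as an argument, so the suffix removal has a real parameter to work on. The function name strip_suffix and its argument order are assumptions for illustration.

```bash
# Hypothetical helper: print $1 with the literal suffix $2 removed.
strip_suffix() {
    local value=$1 suffix=$2
    # Quoting "$suffix" makes it match literally rather than as a glob.
    printf '%s\n' "${value%"$suffix"}"
}

# The command substitution is expanded first and passed in as an argument,
# so the % operator is applied to an ordinary parameter (value).
strip_suffix "$(echo "aahahah</ddd>")" "</ddd>"
```

This keeps the call site on one line while still going through a named parameter, which is what the ${parameter%word} form requires.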
I am trying to print a bunch of strings in a zsh script, and it doesn't seem to work. The code works if I put the array in a variable and use that instead. Any ideas why it doesn't work otherwise?
for string in (some random strings to print) ; echo $string
The default form of the for command in zsh does not use parentheses (if there are any they are not interpreted as part of the for statement):
for string in some random strings to show
do
echo _$string
done
This results in the following output:
_some
_random
_strings
_to
_show
So, echo _$string was run for each word after in. The list ends with the newline.
It is possible to write the whole statement in a single line:
for string in some random strings to show; do echo _$string; done
As usual when putting multiple shell commands in the same line, newlines just need to be replaced by ;. The exception here is the newline after do; while zsh allows a ; to be placed after do, it is usually not done, and in bash it would be a syntax error.
There are also several short forms available for for, all of which are equivalent to the default form above and produce the same output:
for single commands (to be exact: single pipelines or multiple pipelines linked with && or ||, where a pipeline can also be just a single command), there are two options:
the default form, just without do or done:
for string in some random strings to show ; echo _$string
without in but with parentheses, also without do or done
for string (some random strings to show) ; echo _$string
for a list of commands (like in the default form), foreach instead of for, no in, with parentheses and terminated by end:
foreach string (some random strings to show) echo _$string ; end
In your case, you mixed the two short forms for single commands. Due to the presence of in, zsh did not take the parentheses as a syntactic element of the for command. Instead they are interpreted as a glob qualifier. Aside from the fact that you did not intend any filename expansions, this fails for two reasons:
there is no pattern (with or without actual globs) before the glob qualifier. So any matching filename would have to exactly match an empty string, which is just not possible
but mainly "some random strings to print" is not a valid glob qualifier. You probably get an error like "zsh: unknown file attribute: i" (at least with zsh 5.0.5, it may depend on the zsh version).
Check the zsh for loop documentation:
for x (1 2 3); do echo $x; done
for x in 1 2 3; do echo $x; done
You are probably trying to do this:
for string in some random strings to print; do
echo $string
done
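The question mentions that the loop works once the words are stored in an array variable. A minimal sketch of that variant follows, written in the default for form so that it should be accepted by bash as well as zsh; the names print_prefixed and strings are assumptions for illustration.

```bash
# Hypothetical helper: print each argument on its own line with a "_" prefix.
print_prefixed() {
    local string
    for string in "$@"; do
        echo "_$string"
    done
}

# Hypothetical array holding the words to print.
strings=(some random strings to print)

# Quoting "${strings[@]}" passes each array element as a separate word.
print_prefixed "${strings[@]}"
```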
I'm having to code a subversion hook script, and I found a few examples online, mostly python and perl. I found one or two shell scripts (bash) as well. I am confused by a line and am sorry this is so basic a question.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
The script later uses this to perform a test, such as (assume EXT=ex):
if [[ "$FILTER" == *"$EXT"* ]]; then blah
My problem is the above test is true. However, I'm not asking you to assist in writing the script, just explaining the initial assignment of FILTER. I don't understand that line.
Editing in a closer example FILTER line. Of course the script, as written, does not work, because 'ex' returns true, and not just 'exe'. My problem here, however, is only that I don't understand the layout of the variable assignment itself.
Why is there a period at the beginning? ".(sh..."
Why is there a dollar sign at the end? "...BAT)$"
Why are there pipes between each pattern? "sh|SH|exe"
You are probably looking for something like this:
FILTER="\.(sh|SH|exe|EXE|bat|BAT)$"
for EXT
do
if [[ "$EXT" =~ $FILTER ]];
then
echo $EXT extension disallowed
else
echo $EXT is allowed
fi
done
Save it to myscript.sh and run it as
bash myscript.sh bash ba.sh
and you will get
bash is allowed
ba.sh extension disallowed
If you don't escape the dot, that is, with FILTER=".(sh|SH|exe|EXE|bat|BAT)$", you will get
bash extension disallowed
ba.sh extension disallowed
which is (of course) wrong.
For the questions:
Why is there a period at the beginning? ".(sh..."
Because you want to match .sh (as an extension) and not, for example, bash (without the dot). The . must therefore be escaped as \. because an unescaped . in a regex means "any character".
Why is there a dollar sign at the end? "...BAT)$"
The $ means "end of string". You want to match file.sh and not file.sh.jpg: the .sh must be at the end of the string.
Why are there pipes between each pattern? "sh|SH|exe"
In the regex, the (...|...|...) construction delimits the alternatives, as you surely guessed.
You really need to read a regex tutorial; the topic is more complicated and can't be explained in one answer.
PS: Avoid UPPERCASE variable names; they can collide with environment variables.
This just assigns a string to FILTER; the contents of that string have no special meaning. When you try to match it against the pattern *ex*, the result is true if the value of $FILTER contains the string ex, surrounded by anything on either side. That is the case here; ex is a substring of exe.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
                ^^
                |
                +---- here is the "ex" from the pattern.
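To make the difference concrete, here is a small sketch contrasting the substring test from the question with a regular-expression test. It assumes the filter is written with an escaped dot; the function names are illustrative, and the variable is lowercase only to keep it clear of environment variables.

```bash
# Assumed filter: same alternatives as in the question, with the dot escaped.
filter='\.(sh|SH|exe|EXE|bat|BAT)$'

# Substring test as in the question: asks whether the filter TEXT contains $1.
matches_as_substring() { [[ $filter == *"$1"* ]]; }

# Regex test: asks whether $1 ends in one of the listed extensions.
matches_as_regex() { [[ $1 =~ $filter ]]; }

matches_as_substring "ex" && echo "substring test: ex matches"
matches_as_regex "ex" || echo "regex test: ex does not match"
matches_as_regex "setup.exe" && echo "regex test: setup.exe matches"
```

The substring test inspects the text of the filter itself, which is why "ex" passes; the regex test inspects the file name.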
As far as I can tell, this resembles a regular expression pattern:
Just as ^ marks the start of the string in a regular expression, the . here seems intended to mark where the extension begins.
The parentheses hold the exact strings for the file extensions to be matched, combined as alternatives ("or") with |.
The $ at the end means the match must stop at the end of the string, with nothing following the extension.
I would guess that is how the original author looked at it and implemented it.
I'm having trouble understanding what the following code does in a bash completion script:
case "$last" in
+\(--import|-i\))
_filedir '+(txt|html)';;
When is that case ever met? I thought the second line above would be something like
--import|-i)
which does make sense to me. I grepped my bash_completion.d directory for '+\\(' but that one was the only one that came up so I guess it's not that common.
This code is indeed puzzling without context. As it is, it matches two literal strings:
$ case "+(--import" in +\(--import|-i\)) echo match ;; esac
match
$ case "-i)" in +\(--import|-i\)) echo match ;; esac
match
It looks similar to the extended glob pattern +(--import|-i), but in this form it's neither a match for the literal pattern (would need to escape the pipe) nor the actual pattern (would need to unescape the parentheses). I'd guess "bug", but bash completion is a minefield of crazy metaprogramming, so it's impossible to say without seeing the entire script.
From bash(1)
If the extglob shell option is enabled using the shopt builtin,
several extended pattern matching operators are recognized. In the
following description, a pattern-list is a list of one or more
patterns separated by a |. Composite patterns may be formed using one
or more of the following sub-patterns:
[...]
+(pattern-list)
Matches one or more occurrences of the given patterns
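If the extended glob was what the script's author had in mind, the case pattern might look like the sketch below. This is only a guess at the intent; the function name classify_option is an assumption, and extglob has to be enabled before bash parses the case statement.

```bash
# extglob must be enabled before bash parses the case statement below.
shopt -s extglob

# Hypothetical helper classifying the last word on the command line.
classify_option() {
    case "$1" in
        # One or more occurrences of --import or -i.
        +(--import|-i)) echo "import" ;;
        *)              echo "other" ;;
    esac
}

classify_option "--import"
classify_option "-i"
classify_option "+(--import"
```

With the unescaped pattern, --import and -i are matched, while the literal string +(--import from the earlier example no longer is.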