What does this variable assignment do? - bash

I have to write a Subversion hook script, and I found a few examples online, mostly Python and Perl. I found one or two shell scripts (bash) as well. I am confused by one line, and I'm sorry this is so basic a question.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
The script later uses this to perform a test, such as (assume EXT=ex):
if [[ "$FILTER" == *"$EXT"* ]]; then blah
My problem is the above test is true. However, I'm not asking you to assist in writing the script, just explaining the initial assignment of FILTER. I don't understand that line.
Editing in a closer example FILTER line. Of course the script, as written, does not work, because 'ex' returns true, and not just 'exe'. My problem here, however, is only that I don't understand the layout of the variable assignment itself.
Why is there a period at the beginning? ".(sh..."
Why is there a dollar sign at the end? "...BAT)$"
Why are there pipes between each pattern? "sh|SH|exe"

You are probably looking for something like this:
FILTER="\.(sh|SH|exe|EXE|bat|BAT)$"
# With no "in" list, the loop iterates over the script's arguments.
for EXT
do
    if [[ "$EXT" =~ $FILTER ]]
    then
        echo "$EXT extension disallowed"
    else
        echo "$EXT is allowed"
    fi
done
Save it as myscript.sh and run it as
myscript.sh bash ba.sh
and you will get
bash is allowed
ba.sh extension disallowed
If you don't escape the dot, e.g. with FILTER=".(sh|SH|exe|EXE|bat|BAT)$", you will get
bash extension disallowed
ba.sh extension disallowed
Which is (of course) wrong.
For the questions:
Why is there a period at the beginning? ".(sh..."
Because you want to match .sh (as an extension) and not, for example, bash (without the dot). The . must therefore be escaped as \. because an unescaped . in a regex means "any character".
Why is there a dollar sign at the end? "...BAT)$"
The $ means "end of string". You want to match file.sh and not file.sh.jpg, so the .sh must be at the end of the string.
Why are there pipes between each pattern? "sh|SH|exe"
In a regex, the (...|...|...) construction delimits the alternatives, as you have surely guessed.
You really need to read a regex tutorial; the topic is more complicated than this and can't be explained in one answer.
PS: NEVER use UPPERCASE names for your own variables; they can collide with environment variables.

This just assigns a string to FILTER; the contents of that string have no special meaning in the assignment. When you match it against the pattern *ex*, the result is true if the value of $FILTER contains the string ex with anything (or nothing) on either side. That is the case here: ex is a substring of exe.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
                ^^
                |
                +---- here is the "ex" from the pattern.
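A minimal sketch of why the test from the question succeeds for ex. FILTER and EXT are the names from the question; contains_ext is a hypothetical helper introduced only for this illustration.

```shell
#!/usr/bin/env bash
# FILTER is just a string here. With ==, the right-hand side *"$EXT"* is a
# glob pattern, so this is a substring test, not a regular-expression match.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"

# contains_ext (hypothetical helper): succeeds if $1 occurs anywhere in FILTER.
contains_ext() {
    [[ "$FILTER" == *"$1"* ]]
}

for EXT in ex exe txt; do
    if contains_ext "$EXT"; then
        echo "$EXT: found inside the FILTER string"
    else
        echo "$EXT: not found"
    fi
done
```

Both ex and exe are reported as found, because the test only asks whether the text occurs somewhere inside the string; the parentheses, pipes and dollar sign are never interpreted.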

As far as I can tell, this is meant as a regular expression pattern:
The leading . is meant to match the dot before the file extension. (In a regular expression an unescaped . matches any character, and the start of a string is marked with ^, so the dot here is not an anchor.)
The parentheses hold the exact file extensions to be matched; they are combined with "or" by the | character.
The $ at the end anchors the match to the end of the string, so nothing may follow the extension.
I would guess that is how the original author looked at it when implementing it.

Related

How to create new names for files with problematic characters for use in an existing bash scripted environment?

The goal is to get rid of filenames that give headaches for scripting by translating them to something else. The reason is that this nearly 30-year-old Unix / Linux environment has a lot of existing scripts that may not be "written correctly", and a new, large and important cache of files has arrived that has to be managed. A colleague has therefore asked me to write a script that translates the "problematic filenames". They've got one list of characters to turn into dots (the comma, for example) and another list to turn into underscores (whitespace, for example). I ran into problems, which I asked about over here.
I was using tr to do it, but commenters there said I should perhaps ask just about this instead of how to get tr to work. So, I have!
Parameter expansion can do this for you.
Note that unlike when using tr (as requested on your other question), when using parameter expansion you don't need to use backslashes inside your character class definitions: put the expansion in double quotes and bash will treat the results of that expansion as literal.
#!/usr/bin/env bash
toDots='\,;:|+##$%^&*~'
toUnderscores='}{]['"'"'="()`!'
# requires bash 5+: if debug=1, then print what we would do instead of doing it
runOrDebug() {
    if (( debug )); then
        # ${*@Q} quotes each argument so the printed command can be pasted back into a shell
        printf '%s\n' "${*@Q}"
    else
        "$@"
    fi
}
renameFiles() {
    local name subDots subBoth
    for name; do
        # replace every character listed in toDots with a dot
        subDots=${name//["$toDots"]/.}
        # then replace every character listed in toUnderscores with an underscore
        subBoth=${subDots//["$toUnderscores"]/_}
        if [[ $subBoth != "$name" ]]; then
            runOrDebug mv -- "$name" "$subBoth"
        fi
    done
}
debug=1 renameFiles '[/a],/;[p:r|o\b+lem#a#t$i%c]/#(%$^!/(e^n&t*ry)~='
Note that toUnderscores is (except for the single quote in the middle) in single quotes, so all the backslashes in it are part of the variable's data rather than being syntax; because globs use character class syntax from REs, they're parsed as POSIX regular expression character class syntax.
See a demonstration of the technique running at https://ideone.com/kKE7IJ

multiple replacements on a single variable

For the following variable:
var="/path/to/my/document-001_extra.txt"
I need only the part between the last / [slash] and the _ [underscore].
Also, the - [dash] needs to be replaced by a space.
In other words: document 001
This is what I have so far:
var="${var##*/}"
var="${var%_*}"
var="${var/-/ }"
which works fine, but I'm looking for a more compact substitution pattern that would spare me the triple var=...
Use of sed, awk, cut, etc. would perhaps make more sense for this, but I'm looking for a pure bash solution.
Needs to work under GNU bash, version 3.2.51(1)-release
Since you edited your question to talk about patterns instead of regular expressions, I'll now show you how to actually use regular expressions in bash :)
[[ $var =~ ^.*/(.*)-(.*)_ ]] && var="${BASH_REMATCH[@]:1:2}"
Parameter expansions like you were using previously unfortunately cannot be nested in bash (unless you use ill-advised eval hacks, and even then it will be less clear than the line above).
The =~ operator performs a match between the string on the left and the regular expression on the right. Parentheses in the regular expression define match groups. If a match is successful, the exit status of [[ ... ]] is zero, and so the code following the && is executed. (Reminder: don't confuse the "0=success, non-zero=failure" convention of process exit statuses with the common Boolean convention of "0=false, 1=true".)
BASH_REMATCH is an array parameter that bash sets following a successful regular-expression match. The first element of the array contains the full text matched by the regular expression; each of the following elements contains the contents of the corresponding capture group.
The ${foo[@]:x:y} parameter expansion produces y elements of the array, starting with index x. In this case, it's just a short way of writing ${BASH_REMATCH[1]} ${BASH_REMATCH[2]}. (Also, while var=${BASH_REMATCH[*]:1:2} would have worked as well, I tend to use @ anyway to reinforce the fact that you almost always want to use @ instead of * in other contexts.)
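To make the contents of BASH_REMATCH visible, here is a small sketch that prints each element for the example value from the question. The names re and result are illustrative assumptions, not part of the answer above.

```shell
#!/usr/bin/env bash
var="/path/to/my/document-001_extra.txt"

# Keeping the regex in a variable (hypothetical name: re) lets it be used
# unquoted on the right-hand side of =~.
re='^.*/(.*)-(.*)_'

if [[ $var =~ $re ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"
    echo "group 1:     ${BASH_REMATCH[1]}"
    echo "group 2:     ${BASH_REMATCH[2]}"
    # Join the two capture groups with a space.
    result="${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
    echo "result:      $result"
fi
```

Element 0 holds everything the expression matched (up to and including the underscore), while elements 1 and 2 hold the two parenthesised groups.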
Both of the following should work correctly, though the second is sensitive to misplaced characters (if you have a / or - after the last _ it will fail).
var=$(IFS=_ read s _ <<<"$var"; IFS=-; echo ${s##*/})
var=$(IFS=/-_; a=($var); echo "${a[@]:${#a[@]} - 3:2}")

extract as if a key value pair in bash

The other day I stumbled upon a question on SO. If I wanted to extract the value of HOSTNAME in /etc/sysconfig/network which contains
NETWORKING=yes
HOSTNAME=foo
Now I can use grep and cut to get foo, but there was some bash magic involved in the answer to a similar issue. I don't know what to search for, and I can't seem to find the question now. It involved something like #{HOSTNAME}, as if it was treating HOSTNAME as a key and foo as a value.
If that configuration file is compatible with shell syntax, simply include it as a shell script. IIRC the files in /etc/sysconfig on Red Hat-like distributions are indeed designed to be parsable by a shell. Note that this implies two things:
If shell special characters may end up in a variable's value, they must be properly quoted. For example, var="value with spaces" requires the quotes. var="with\$dollar" requires the backslash.
The script may run arbitrary code that will be executed, so this is only ok if you trust its content.
If these assumptions are valid, then you can go the simple route:
. /etc/sysconfig/network
echo "$HOSTNAME"
Regarding the quoting and braces, see $VAR vs ${VAR} and to quote or not to quote.
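If the file's content cannot be trusted enough to source it, one alternative (a different technique from the one above, sketched here under assumptions) is to read the file line by line and split each line at the first = sign. The helper name get_value and the temporary file are hypothetical; the sketch does not handle quoting, comments or spaces around the = sign.

```shell
#!/usr/bin/env bash
# get_value (hypothetical helper): print the value stored for key $1 in file $2.
get_value() {
    local wanted=$1 file=$2 key value
    # IFS='=' splits at the first "="; the rest of the line lands in $value.
    while IFS='=' read -r key value; do
        if [[ $key == "$wanted" ]]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "$file"
    return 1
}

# Demonstrate on a temporary file holding the example content from the question.
tmpfile=$(mktemp)
printf 'NETWORKING=yes\nHOSTNAME=foo\n' > "$tmpfile"
host=$(get_value HOSTNAME "$tmpfile")
echo "$host"
rm -f "$tmpfile"
```

Unlike sourcing, this never executes anything from the file, at the cost of not understanding shell quoting in the values.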

Linux: shell builtin string matching

I am trying to become more familiar with the builtin string matching features available in shells on Linux. I came across this guy's post, and he showed an example
a="abc|def"
echo ${a#*|} # will yield "def"
echo ${a%|*} # will yield "abc"
I tried it out and it does what it's advertised to do, but I don't understand what the $,{},#,*,| are doing. I tried looking for some reference online or in the manuals but I couldn't find anything. Can anyone explain to me what's going on here?
This article in the Linux Journal says that the # operator deletes the shortest possible match on the left, while the % operator deletes the shortest possible match on the right.
So ${a#*|} returns everything after the |, and ${a%|*} returns everything before the |.
If you had a situation that called for greedy matching, you'd use ## or %%.
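The difference between the shortest-match and longest-match forms only shows up when the delimiter occurs more than once, so this sketch uses a three-part value. The variable names are illustrative assumptions.

```shell
#!/usr/bin/env bash
a="abc|def|ghi"

shortest_front=${a#*|}     # remove the shortest prefix matching *|  -> def|ghi
longest_front=${a##*|}     # remove the longest prefix matching *|   -> ghi
shortest_back=${a%|*}      # remove the shortest suffix matching |*  -> abc|def
longest_back=${a%%|*}      # remove the longest suffix matching |*   -> abc

printf '%s\n' "$shortest_front" "$longest_front" "$shortest_back" "$longest_back"
```

With the two-part value abc|def from the question, the single and doubled operators give the same result, which is why the distinction is easy to miss.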
Take a look at this.
${string%substring}
Deletes shortest match of $substring from back of $string.
${string#substring}
Deletes shortest match of $substring from front of $string.
EDIT:
"I don't understand what the $,{},#,*,| are doing"
I recommend reading this
Typically, ${somename} will substitute the contents of a defined parameter:
mystring="1234567"
echo ${mystring} # produces '1234567'
The % and # symbols introduce operators that modify what is substituted: # removes a matching prefix of the value and % removes a matching suffix.
The asterisk '*' is a wildcard, while the pipe '|' is simply a literal character to match. Let me do the same thing using '4' as the character to match.
mystring="1234567"
echo ${mystring#*4} # produces '567'
Those features and other similarly useful ones are documented in the Shell Parameter Expansion section of the Bash Reference Manual. Here's another really good reference.

Search and replace in Shell

I am writing a shell (bash) script and I'm trying to figure out an easy way to accomplish a simple task.
I have some string in a variable.
I don't know if this is relevant, but it can contain spaces, newlines, because actually this string is the content of a whole text file.
I want to replace the last occurrence of a certain substring with something else.
Perhaps I could use a regexp for that, but there are two points that confuse me:
I need to match from the end, not from the start
the substring that I want to scan for is fixed, not variable.
For truncating at the start: ${var#pattern}
For truncating at the end: ${var%pattern}
For general replacement: ${var/pattern/repl}
The patterns are filename-style (glob) patterns, and the pattern in the last form can be prefixed with # or % to match only at the start or end (respectively).
It's all in the (long) bash manpage; check the "Parameter Expansion" section.
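One way these expansions can be combined to replace only the last occurrence of a fixed substring is sketched below. The helper name replace_last and its argument order are assumptions made for this illustration; quoting "$needle" inside the expansion makes it match literally instead of as a glob.

```shell
#!/usr/bin/env bash
# replace_last (hypothetical helper): replace the last occurrence of the
# fixed string $2 in $1 with $3 and print the result.
replace_last() {
    local text=$1 needle=$2 repl=$3 before after
    if [[ $text == *"$needle"* ]]; then
        before=${text%"$needle"*}    # everything before the last occurrence
        after=${text##*"$needle"}    # everything after the last occurrence
        printf '%s' "$before$repl$after"
    else
        # needle not present: return the text unchanged
        printf '%s' "$text"
    fi
}

new=$(replace_last "stuff ttto be edittted" "ttt" "xxx")
echo "$new"
```

Because the work is done by the shell itself, embedded newlines in the text are not a problem, although command substitution as used in the example strips trailing newlines from the result.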
An expression like this
s/match string here$/new string/
should do the trick: s is for substitute, the / characters separate the parts of the command, and $ is the end-of-line marker. You can try this in vi to see if it does what you need.
I would look up the man pages for awk or sed.
Javier's answer is shell specific and won't work in all shells.
The sed answers that MrTelly and epochwolf alluded to are incomplete and should look something like this:
MyString="stuff ttto be edittted"
NewString=$(echo "$MyString" | sed -e 's/\(.*\)ttt\(.*\)/\1xxx\2/')
The reason this works without having to use the $ to mark the end is that the first '.*' is greedy and will attempt to gather up as much as possible while allowing the rest of the regular expression to be true.
This sed command should work fine in any shell context used.
Usually when I get stuck with sed I use this page:
http://sed.sourceforge.net/sed1line.txt

Resources