Skip items in bash when in exclusion array - bash

I have a script that loops over database names and if the name of the current database is in my exclusion array I want to skip it. How would I accomplish this in bash?
excluded_databases=("template1" "template0")
for database in $databases
do
if ...; then
# perform something on the database...
fi
done

You can do it by testing each name in turn, but you might be better off filtering the list in one operation. (The following assumes that no name in $databases contains whitespace, which is implicit given your for loop).
for database in $(printf %s\\n $databases |
grep -Fvx "${excluded_databases[@]/#/-e}"); do
# something
done
Explanation of the idioms:
printf %s\\n ... prints each of its arguments on a single line.
grep -Fvx searches for fixed-string matches (-F) of the whole line (-x) and inverts the match result (-v).
"${array[@]/#/-e}" prepends -e to each element of the array array, which is useful when you need to provide each element of the array as a (repeated) command-line option to a utility. In this case, the utility is grep and the -e flag is used to provide a match pattern.
I've been criticized in the past for printf %s\\n -- some people prefer printf '%s\n' -- but I find the first one easier to type. YMMV.
As a comment, it seems like it would be better to make $databases an array as well as $excluded_databases, which would allow for names including whitespace. The printf | grep solution still doesn't allow newlines in names; it's complicated to work around that. If you were to make that change, you'd only need to change the printf to printf %s\\n "${databases[@]}".

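You can use this condition to check whether an element occurs in an array. The two sides are equal only when $database matches nothing in the array, so the test is true for databases that are not excluded:
if [[ "${excluded_databases[@]/$database}" == "${excluded_databases[@]}" ]]
Another option using case:
case "${excluded_databases[@]}" in *"$database"*) echo "found in array" ;; esac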
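Both conditions above also succeed on partial names (for example, a database called template would be treated as excluded because it is a substring of template1). If exact matches are needed, a small loop-based helper is one option. This is only a sketch: in_array, the databases array and the kept array are names made up for illustration.

```shell
#!/usr/bin/env bash

# in_array NEEDLE ELEMENT...: succeed only if NEEDLE equals one ELEMENT exactly.
in_array() {
    local needle=$1 element
    shift
    for element in "$@"; do
        [[ $element == "$needle" ]] && return 0
    done
    return 1
}

excluded_databases=("template1" "template0")
databases=("template0" "template10" "mydb")   # sample data, assumed

kept=()
for database in "${databases[@]}"; do
    # Skip the database when it is in the exclusion array.
    in_array "$database" "${excluded_databases[@]}" && continue
    kept+=("$database")
done

printf '%s\n' "${kept[@]}"
```

With the sample data, template10 is kept even though it contains template1 as a substring.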

If you are using bash 4 or greater then using an associative array will help you here.
declare -A excluded_databases=(["template1"]=1 ["template0"]=1)
for database in $databases
do
if [ -n "${excluded_databases[$database]}" ]; then
continue
fi
# ... do something with $database
done


Extracting a string between last two slashes in Bash

I know this can be easily done using regex like I answered on https://stackoverflow.com/a/33379831/3962126, however I need to do this in bash.
So the closest question on Stackoverflow I found is this one bash: extracting last two dirs for a pathname, however the difference is that if
DIRNAME = /a/b/c/d/e
then I need to extract
d
This may be relatively long, but it's also much faster to execute than most preceding answers (other than the zsh-only one and that by j.a.), since it uses only string manipulations built into bash and uses no subshell expansions:
string='/a/b/c/d/e' # initial data
dir=${string%/*} # trim everything past the last /
dir=${dir##*/} # ...then remove everything before the last / remaining
printf '%s\n' "$dir" # demonstrate output
printf is used in the above because echo doesn't work reliably for all values (think about what it would do on a GNU system with /a/b/c/-n/e).
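The two expansions above can be wrapped in a function. The sketch below does that and, as an added assumption about the desired behaviour, returns failure when the path contains fewer than two slashes; second_to_last is a hypothetical name.

```shell
#!/usr/bin/env bash

# second_to_last PATH: print the component between the last two slashes.
second_to_last() {
    local path=$1 dir
    # Require at least two slashes; otherwise there is nothing "between" them.
    [[ $path == */*/* ]] || return 1
    dir=${path%/*}                 # trim the last / and everything after it
    printf '%s\n' "${dir##*/}"     # then drop everything up to the last / remaining
}

second_to_last /a/b/c/d/e
```

Because it uses only parameter expansion and printf, it handles a component such as -n without treating it as an option.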
Here is a pure bash solution:
[[ $DIRNAME =~ /([^/]+)/[^/]*$ ]] && printf '%s\n' "${BASH_REMATCH[1]}"
Compared to some of the other answers:
It matches the string between the last two slashes. So, for example, it doesn't match d if DIRNAME=d/e.
It's shorter and fast (just uses built-ins and doesn't create subprocesses).
It supports any character between the last two slashes (see Charles Duffy's answer for more on this).
Also note that this is not the way to assign a variable in bash:
DIRNAME = /a/b/c/d/e
^ ^
Those spaces are wrong, so remove them:
DIRNAME=/a/b/c/d/e
Using awk:
echo "/a/b/c/d/e" | awk -F / '{ print $(NF-1) }' # d
Edit: This does not work when the path contains newlines, and it still gives output when there are fewer than two slashes; see the comments below.
Using sed
if you want to get the fourth element
DIRNAME="/a/b/c/d/e"
echo "$DIRNAME" | sed -r 's_^(/[^/]*){3}/([^/]*)/.*$_\2_g'
if you want to get the before last element
DIRNAME="/a/b/c/d/e"
echo "$DIRNAME" | sed -r 's_^.*/([^/]*)/[^/]*$_\1_g'
OMG, maybe this was obvious, but not to me initially. I got the right result with:
dir=$(basename -- "$(dirname -- "$str")")
echo "$dir"
Using zsh parameter substitution is pretty cool too
echo ${${DIRNAME%/*}##*/}
I think it's faster than the double $() as well, because it won't need any subprocesses.
Basically it slices off the right side first, and then all the remaining left side second.

Shell: extract words matching pattern, but ignore circumventing expression

I am currently trying to extract ALL matching expressions from a text which e.g. looks like this and put them into an array.
aaaaaaaaa${bbbbbbb}ccccccc${dddd}eeeee
ssssssssssssssssss${TTTTTT}efhsekfh ej
348653jlk3jß1094utß43t59ßgöelfl,-s-fko
The matching expressions are similar to this: ${}. Beware that I need the full expression, not only the word in between this expression! So in this case the result should be an array which contains:
${bbbbbbb}
${dddd}
${TTTTTT}
Problems I have stumbled upon and couldn't solve:
It should NOT recognize this as a whole
${bbbbbbb}ccccccc${dddd} but each for its own
grep -o is not installed on the old machine, Perl is not allowed either!
Many commands e.g. BASH_REMATCH only deliver the whole line or the first occurrence of the expression, instead of all matching expressions in the line!
The mentioned pattern \${[^}]*} seems to work partly, as it can extract the first occurrence of the expression; however, it always omits the ones following after that if they are in the same text line. What I need is ALL matching expressions found in the line, not only the first one.
You could split the string on any of the characters $,{,}:
$ s='...blaaaaa${blabla}bloooo${bla}bluuuuu...'
$ echo "$s"
...blaaaaa${blabla}bloooo${bla}bluuuuu...
$ IFS='${}' read -ra words <<< "$s"
$ for ((i=0; i<${#words[@]}; i++)); do printf "%d %s\n" $i "${words[i]}"; done
0 ...blaaaaa
1
2 blabla
3 bloooo
4
5 bla
6 bluuuuu...
So if you're trying to extract the words inside the braces:
$ for ((i=2; i<${#words[@]}; i+=3)); do printf "%d %s\n" $i "${words[i]}"; done
2 blabla
5 bla
If the above doesn't suit you, grep will work:
$ echo '...blaaaaa${blabla}bloooo${bla}bluuuuu...' | grep -o '\${[^}]\+}'
${blabla}
${bla}
You still haven't told us exactly what output you want.
Since it bugged me a lot I have asked directly on www.unix.com and was kindly provided with a solution which fits for my ancient shell. So if anyone got the same problem here is the solution:
line='aaaa$aa{yyy}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
IFS=\$ read -a words <<< "$line"
regex='^(\{[^}]+})'
for e in "${words[@]}"; do
if [[ $e =~ $regex ]]; then
echo "\$${BASH_REMATCH[0]}";
fi;
done
which prints then the following - without even getting disturbed by random occurrences of $ and { or } between the syntactically correct expressions:
${important}
${important2}
${importantstring3}
I have updated the full solution after I got another update from the forums: now it also ignores this: aaa$aa{yyy}aaaa - which it previously printed as ${yyy} - but which it should completely ignore as there are characters between $ and {. Now with the additional anchoring on the beginning of the regexp it works as expected.
I just found another issue: theoretically using the above approach I would still get a wrong output if the read line looks like this line='{ccc}aaaa${important}aaa'. The IFS would split it and the REGEX would match {ccc} although this hadn't the $ sign in front. This is suboptimal.
However, the following approach could solve it: after getting the BASH_REMATCH I would need to search the original line - the one I gave to the IFS - for this exact expression ${ccc} - with the difference that the $ is included! Only if it finds this exact match does it count as a valid match; otherwise it should be ignored. Kind of a reverse search method...
Updated - add this reverse search to ignore the trap on the beginning of the line:
pattern="\$${BASH_REMATCH[0]}";
searchresult="";
searchresult=`echo "$line" | grep "$pattern"`;
if [ "$searchresult" != "" ]; then echo "It was found!"; fi;
Negligible issue: If the line looks like this line='{ccc}aaaaaa${ccc}bbbbb' it would recognize the first {ccc} as a valid match (although it isn't) and print it, because the reverse search found the second ${ccc}. Although this is not intended, it's irrelevant for my specific purpose as it implies that this pattern does in fact exist at least once in the same line.
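Another way to collect every occurrence without grep -o is to match with =~ repeatedly and cut the matched part off the front of the string after each hit. The following is a sketch only, assuming a bash with [[ =~ ]] and BASH_REMATCH (which the solution above already relies on); extract_placeholders and the matches array are hypothetical names.

```shell
#!/usr/bin/env bash

# extract_placeholders LINE: fill the global array "matches" with every ${...} in LINE.
extract_placeholders() {
    local rest=$1
    # Bracket expressions keep $, { and } literal: a $ immediately followed by {...}.
    local re='[$][{][^}]*[}]'
    matches=()
    while [[ $rest =~ $re ]]; do
        matches+=("${BASH_REMATCH[0]}")
        # Drop everything up to and including this match, then search again.
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

line='aaaa$aa{yyy}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
extract_placeholders "$line"
printf '%s\n' "${matches[@]}"
```

Since the regex requires the $ directly in front of the brace, a leading {ccc} without a $ is never matched, so no reverse search is needed in this variant.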

Using Variables with grep, and an IF statement regarding this

I am looking to search for strings within a file using variables.
I have a script that will accept 3 or 4 parameters: 3 are required; the 4th isn't mandatory.
I would like to search the text file for the 3 parameters matching within the same line, and if they do match then I want to remove that line and replace it with my new one - basically it would update the 4th parameter if set, and avoid duplicate entries.
Currently this is what I have:
input=$(egrep -e '$domain\s+$type\s+$item' ~/etc/security/limits.conf)
if [ "$input" == "" ]; then
echo $domain $type $item $value >>~/etc/security/limits.conf
echo \"$domain\" \"$type\" \"$item\" \"$value\" has been successfully added to your limits.conf file.
else
cat ~/etc/security/limits.conf | egrep -v "$domain|$type|$item" >~/etc/security/limits.conf1
rm -rf ~/etc/security/limits.conf
mv ~/etc/security/limits.conf1 ~/etc/security/limits.conf
echo $domain $type $item $value >>~/etc/security/limits.conf
echo \"$domain\" \"$type\" \"$item\" \"$value\" has been successfully added to your limits.conf file.
exit 0
fi
Now I already know that the input=egrep etc.. will not work; it works if I hard code some values, but it won't accept those variables. Basically I have domain=$1, type=$2 and so on.
I would like it so that if all 3 variables are not matched within one line, then it will just append the parameters to the end of the file, but if the parameters do match, then I want them to be deleted, and appended to the file. I know I can use other things like sed and awk, but I have yet to learn them.
This is for a school assignment, and all help is very much appreciated, but I'd also like to learn why and how it works/doesn't, so if you can provide answers to that as well that would be great!
Three things:
To assign the output of a command, use var=$(cmd).
Don't put spaces around the = in assignments.
Expressions don't expand in single quotes: use double quotes.
To summarize:
input=$(egrep -e "$domain\s+$type\s+$item" ~/etc/security/limits.conf)
Also note that ~ is your home directory, so if you meant /etc/security/limits.conf and not /home/youruser/etc/security/limits.conf, leave off the ~
You have several bugs in your script. Here's your script with some comments added
input=$(egrep -e '$domain\s+$type\s+$item' ~/etc/security/limits.conf)
# use " not ' in the string above or the shell can't expand your variables.
# some versions of egrep won't understand '\s'. The safer, POSIX character class is [[:blank:]].
if [ "$input" == "" ]; then
# the shell equality test operator is =, not ==. Some shells will also take == but don't count on it.
# the normal way to check for a variable being empty in shell is with `-z`
# you can have problems with tests in some shells if $input is empty, in which case you'd use [ "X$input" = "X" ].
echo $domain $type $item $value >>~/etc/security/limits.conf
# echo is unsafe and non-portable, you should use printf instead.
# the above calls echo with 4 args, one for each variable - you probably don't want that and should have double-quoted the whole thing.
# always double-quote your shell variables to avoid word splitting and file name expansion (google those - you don't want them happening here!)
echo \"$domain\" \"$type\" \"$item\" \"$value\" has been successfully added to your limits.conf file.
# the correct form would be:
# printf '"%s" "%s" "%s" "%s" has been successfully added to your limits.conf file.\n' "$domain" "$type" "$item" "$value"
else
cat ~/etc/security/limits.conf | egrep -v "$domain|$type|$item" >~/etc/security/limits.conf1
# Useless Use Of Cat (UUOC - google it). [e]grep can open files just as easily as cat can.
rm -rf ~/etc/security/limits.conf
# -r is for recursively removing files in a directory - inappropriate and misleading when used on a single file.
mv ~/etc/security/limits.conf1 ~/etc/security/limits.conf
# pointless to remove the file above when you're overwriting it here anyway
# If your egrep above failed to create your temp file (e.g. due to memory or permissions issues) then the "mv" above would zap your real file. the correct way to do this is:
# egrep regexp file > tmp && mv tmp file
# i.e. use && to only do the mv if creating the tmp file succeeded.
echo $domain $type $item $value >>~/etc/security/limits.conf
# see previous echo comments.
echo \"$domain\" \"$type\" \"$item\" \"$value\" has been successfully added to your limits.conf file.
# ditto
exit 0
# pointless and misleading having an explicit "exit <success>" when that's what the script will do by default anyway.
fi
This line:
input=$(egrep -e '$domain\s+$type\s+$item' ~/etc/security/limits.conf)
requires double quotes around the regex to allow the shell to interpolate the variable values.
input=$(egrep -e "$domain\s+$type\s+$item" ~/etc/security/limits.conf)
You need to be careful with backslashes; you probably don't have to double them up in this context, but you should be sure you know why.
You should be aware that your first egrep command is much more restrictive in what it selects than the second egrep, which is used to delete data from the file. The first requires an entry with all three fields on a single line; the second only requires a match with any one of the words (and that could be part of a larger word) to delete the line.
Since ~/etc/security/limits.conf is a file, there is no need to use the -r option of rm; it is advisable not to use the -r unless you intend to remove directories.
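Putting the points from the answers above together, the whole "replace the matching line or append a new one" step might look like the sketch below. It is not the original script: update_limit is a hypothetical helper, mktemp is assumed to be available, and the three fields are assumed to contain no regex metacharacters (a domain of * would need escaping first).

```shell
#!/usr/bin/env bash

# update_limit FILE DOMAIN TYPE ITEM VALUE
# Remove any line whose first three fields match, then append the new entry.
update_limit() {
    local file=$1 domain=$2 type=$3 item=$4 value=$5
    # Double quotes let the shell expand the variables; [[:blank:]] is the
    # POSIX class for space/tab. Assumption: no regex metacharacters in the fields.
    local pattern="^[[:blank:]]*${domain}[[:blank:]]+${type}[[:blank:]]+${item}([[:blank:]]|\$)"
    local tmp status
    tmp=$(mktemp) || return 1
    grep -Ev -- "$pattern" "$file" > "$tmp"
    status=$?
    # grep exits 1 when it selects no lines; only a status above 1 is an error.
    if [ "$status" -gt 1 ]; then
        rm -f -- "$tmp"
        return 1
    fi
    printf '%s %s %s %s\n' "$domain" "$type" "$item" "$value" >> "$tmp"
    # Copy back instead of mv so the original file keeps its permissions.
    cat -- "$tmp" > "$file" && rm -f -- "$tmp"
}
```

A call such as update_limit ~/etc/security/limits.conf "$domain" "$type" "$item" "$value" covers both branches of the original if, because appending after filtering gives the same result whether or not a matching line existed.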

Manipulating data text file with bash command?

I was given this text file, called stock.txt. The content of the text file is:
pepsi;drinks;3
fries;snacks;6
apple;fruits;9
baron;drinks;7
orange;fruits;2
chips;snacks;8
I will need to use a bash script to come up with this output:
Total amount for drinks: 10
Total amount for snacks: 14
Total amount for fruits: 11
Total of everything: 35
My gut tells me I will need to use sed, group, grep and something else.
Where should I start?
I would break the exercise down into steps
Step 1: Read the file one line at a time
while read -r line
do
# do something with $line
done
Step 2: Pattern match (drinks, snacks, fruits) and do some simple arithmetic. This step requires that you tokenize each line, which I'll leave as an exercise for you to figure out.
if [[ "$line" =~ "drinks" ]]
then
echo "matched drinks"
.
.
.
fi
Pure Bash. A nice application for an associative array:
declare -A category # associative array
IFS=';'
while read name cate price ; do
((category[$cate]+=price))
done < stock.txt
sum=0
for cate in ${!category[@]}; do # loop over the indices
printf "Total amount of %s: %d\n" $cate ${category[$cate]}
((sum+=${category[$cate]}))
done
printf "Total amount of everything: %d\n" $sum
There is a short description of processing comma-separated files in bash here:
http://www.cyberciti.biz/faq/unix-linux-bash-read-comma-separated-cvsfile/
You could do something similar. Just change IFS from comma to semicolon.
Oh yeah, and a general hint for learning bash: man is your friend. Use this command to see manual pages for all (or most) of commands and utilities.
Example: man read shows the manual page for read command. On most systems it will be opened in less, so you should exit the manual by pressing q (may be funny, but it took me a while to figure that out)
The easy way to do this is using a hash table, which is supported directly by bash 4.x and of course can be found in awk and perl. If you don't have a hash table then you need to loop twice: once to collect the unique values of the second column, once to total.
There are many ways to do this. Here's a fun one which doesn't use awk, sed or perl. The only external utilities I've used here are cut, sort and uniq. You could even replace cut with a little more effort. In fact lines 5-9 could have been written more easily with grep, (grep $kind stock.txt) but I avoided that to show off the power of bash.
for kind in $(cut -d\; -f 2 stock.txt | sort | uniq) ; do
total=0
while read d ; do
total=$(( total+d ))
done < <(
while read line ; do
[[ $line =~ $kind ]] && echo $line
done < stock.txt | cut -d\; -f3
)
echo "Total amount for $kind: $total"
done
We lose the strict ordering of your original output here. An exercise for you might be to find a way not to do that.
Discussion:
The first line describes a sub-shell with a simple pipeline using cut. We read the second field from the stock.txt file, with fields delineated by ;, written \; here so the shell does not interpret it. The result is a newline-separated list of values from stock.txt. This is piped to sort, then uniq. This performs our "grouping" step, since the pipeline will output an alphabetic list of items from the second column but will only list each item once no matter how many times it appeared in the input file.
Also on the first line is a typical for loop: For each item resulting from the sub-shell we loop once, storing the value of the item in the variable kind. This is the other half of the grouping step, making sure that each "Total" output line occurs once.
On the second line total is initialized to zero so that it always resets whenever a new group is started.
The third line begins the 'totaling' loop, in which for the current kind we find the sum of its occurrences. Here we declare that we will read the variable d in from stdin on each iteration of the loop.
On the fourth line the totaling actually occurs: Using shell arithmetic we add the value in d to the value in total.
Line five ends the while loop and then describes its input. We use shell input redirection via < to specify that the input to the loop, and thus to the read command, comes from a file. We then use process substitution to specify that the file will actually be the results of a command.
On the sixth line the command that will feed the while-read loop begins. It is itself another while-read loop, this time reading into the variable line. On the seventh line the test is performed via a conditional construct. Here we use [[ for its =~ operator, which is a pattern matching operator. We are testing to see whether $line matches our current $kind.
On the eighth line we end the inner while-read loop and specify that its input comes from the stock.txt file, then we pipe the output of the entire loop, which by now is simply all lines matching $kind, to cut and instruct it to show only the third field, which is the numeric field. On line nine we then end the process substitution command, the output of which is a newline-delineated list of numbers from lines which were of the group specified by kind.
Given that the total is now known and the kind is known it is a simple matter to print the results to the screen.
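For comparison, the hash-table approach can keep the categories in first-seen order by recording each new key in a plain indexed array. This is a sketch under the assumption that bash 4 or later is available; summarize_stock, total and order are names made up for the example.

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.

# summarize_stock FILE: print per-category totals in first-seen order, then the grand total.
summarize_stock() {
    local file=$1 name cate price sum=0
    local -A total=()     # category -> running total
    local -a order=()     # categories in the order they first appear
    while IFS=';' read -r name cate price; do
        [[ -n $cate ]] || continue          # skip blank or malformed lines
        if [[ -z ${total[$cate]+set} ]]; then
            order+=("$cate")                # first time this category is seen
        fi
        total[$cate]=$(( ${total[$cate]:-0} + price ))
        sum=$(( sum + price ))
    done < "$file"
    for cate in "${order[@]}"; do
        printf 'Total amount for %s: %d\n' "$cate" "${total[$cate]}"
    done
    printf 'Total of everything: %d\n' "$sum"
}
```

Running summarize_stock stock.txt reads the file once, and setting IFS only for the read command means there is nothing to restore afterwards.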
The below answer is OP's. As it was edited in the question itself and OP hasn't come back for 6 years, I am editing out the answer from the question and posting it as wiki here.
My answer, to get the total price, I use this:
...
PRICE=0
IFS=";" # new field separator: the semicolon
while read name cate price
do
let PRICE=PRICE+$price
done < stock.txt
echo $PRICE
When I echo it, the result is 35, which is correct. Now I will move on to using awk to get the sub-category result.
Whole Solution:
Thanks guys, I managed to do it myself. Here is my code:
#!/bin/bash
INPUT=stock.txt
PRICE=0
DRINKS=0
SNACKS=0
FRUITS=0
old_IFS=$IFS # save the field separator
IFS=";" # new field separator: the semicolon
while read name cate price
do
if [ $cate = "drinks" ]; then
let DRINKS=DRINKS+$price
fi
if [ $cate = "snacks" ]; then
let SNACKS=SNACKS+$price
fi
if [ $cate = "fruits" ]; then
let FRUITS=FRUITS+$price
fi
# Total
let PRICE=PRICE+$price
done < $INPUT
echo -e "Drinks: " $DRINKS
echo -e "Snacks: " $SNACKS
echo -e "Fruits: " $FRUITS
echo -e "Price " $PRICE
IFS=$old_IFS

Bourne Shell Scripting -- simple for loop syntax

I'm not entirely new to programming, but I'm not exactly experienced. I want to write a small shell script for practice.
Here's what I have so far:
#!/bin/sh
name=$0
links=$3
owner=$4
if [ $# -ne 1 ]
then
echo "Usage: $0 <directory>"
exit 1
fi
if [ ! -e $1 ]
then
echo "$1 not found"
exit 1
elif [ -d $1 ]
then
echo "Name\t\tLinks\t\tOwner\t\tDate"
echo "$name\t$links\t$owner\t$date"
exit 0
fi
Basically what I'm trying to do is have the script go through all of the files in a specified directory and then display the name of each file with the amount of links it has, its owner, and the date it was created. What would be the syntax for displaying the date of creation or at least the date of last modification of the file?
Another thing is, what is the syntax for creating a for loop? From what I understand I would have to write something like for $1 in $1 ($1 being all of the files in the directory the user typed in correct?) and then go through checking each file and displaying the information for each one. How would I start and end the for loop (what is the syntax for this?).
As you can see, I'm not very familiar with Bourne shell programming. If you have any helpful websites or a better way of approaching this, please show me!
Syntax for a for loop:
for var in list
do
echo $var
done
for example:
for var in *
do
echo $var
done
What you might want to consider however is something like this:
ls -l | while read perms links owner group size date1 date2 time filename
do
echo $filename
done
which splits the output of ls -l into fields on-the-fly so you don't need to do any splitting yourself.
The field-splitting is controlled by the shell variable IFS, which by default contains a space, tab and newline. If you change this in a shell script, remember to change it back. Thus by changing the value of IFS you can, for example, parse CSV files by setting this to a comma. This example reads three fields from a CSV and spits out the 2nd and 3rd only (it's effectively the shell equivalent of cut -d, -f2,3 inputfile.csv)
oldifs=$IFS
IFS=","
while read field1 field2 field3
do
echo $field2 $field3
done < inputfile.csv
IFS=$oldifs
(note: you don't need to revert IFS, but I generally do to make sure that further text processing in a script isn't affected after I'm done with it).
Plenty of documentation out there on both for and while loops; just google for it :-)
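Combining the for loop with the read-based field splitting shown above gives one possible shape for the directory listing. This is a bash sketch rather than strict Bourne shell (it uses local and process substitution), list_links is a hypothetical name, and parsing ls output is fragile if owner names contain whitespace.

```shell
#!/usr/bin/env bash

# list_links DIR: print name, hard-link count and owner for each entry in DIR.
list_links() {
    local dir=$1 f perms links owner rest
    printf '%s\t%s\t%s\n' "Name" "Links" "Owner"
    for f in "$dir"/*; do
        [ -e "$f" ] || continue    # the glob did not expand (empty directory)
        # ls -ld prints: permissions, link count, owner, then the remaining fields.
        read -r perms links owner rest < <(ls -ld -- "$f")
        printf '%s\t%s\t%s\n' "${f##*/}" "$links" "$owner"
    done
}
```

It would be called as list_links "$1" after the argument checks in the original script.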
$1 is the first positional parameter, so $3 is the third and $4 is the fourth. They have nothing to do with the directory (or its files) the script was started from. If your script was started using this, for example:
./script.sh apple banana cherry date elderberry
then the variable $1 would equal "apple" and so on. The special parameter $# is the count of positional parameters, which in this case would be five.
The name of the script is contained in $0. $* and $@ are arrays that contain all the positional parameters and behave differently depending on whether they appear in quotes.
You can refer to the positional parameters using a substring-style index:
${@:2:1}
would give "banana" using the example above. And:
${@: -1}
or
${@:$#}
would give the last ("elderberry"). Note that the space before the minus sign is required in this context.
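These expansions can be tried out inside a function, where the positional parameters are the function's own arguments. A small bash sketch (show_params is a made-up name):

```shell
#!/usr/bin/env bash

# show_params ARG...: demonstrate the count and the substring-style indexing.
show_params() {
    printf 'count=%s\n' "$#"
    printf 'second=%s\n' "${@:2:1}"     # one parameter, starting at position 2
    printf 'last=%s\n' "${@: -1}"       # the space before the minus sign is required
    printf 'last_again=%s\n' "${@:$#}"  # start at position $#, i.e. the last one
}

show_params apple banana cherry date elderberry
```

With those five arguments it prints count=5, second=banana, and elderberry for both of the last two lines.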
You might want to look at Advanced Bash-Scripting Guide. It has a section that explains loops.
I suggest to use find with the option -printf "%P\t%n\t%u\t%t"
for x in "$@"; do
echo "$x"
done
The "$@" protects any whitespace in supplied file names. Obviously, do your real work in place of echo "$x", which isn't doing much. But $@ is all the junk supplied on the command line to your script.
But also, your script bails out if $# is not equal to 1, but you're apparently fully expecting up to 4 arguments (hence the $4 you reference in the early part of your script).
assuming you have GNU find on your system
find /path -type f -printf "filename: %f | hardlinks: %n| owner: %u | time: %TH %Tb %TY\n"
