Removing a block of a certain character in pure bash - bash

Hey, so let's say you have the string "aabbaabbbaab". As you can see, it has 3 blocks of "b". How do I remove, for example, the 2nd block of b (that is, "bbb")? It should turn the string into "aabbaaaab". I have tried looking everywhere but I just couldn't work out the right syntax for my specific question. I need to do this in pure bash, so no awk, sed, etc.

Here's pure bash: it iterates over the string, character by character. When it detects that it is entering the n'th block of the specified char, we know that the entire string up to here is the first part of the output we want. When we get to the end of the n'th block, we know that the rest of the string is wanted.
remove_nth_block () {
local str=$1 char=$2 n=$3
local i count=0 prev prefix
for ((i=0; i<${#str}; i++)); do
if [[ ${str:i:1} = $char && $prev != $char ]]; then
((++count == n)) && prefix=${str:0:i}
else
if [[ ${str:i:1} != $char && $prev = $char && $count -eq $n ]]; then
echo "$prefix${str:i}"
return
fi
fi
prev=${str:i:1}
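done
# Reached the end of the string without printing: either the n'th block
# ran to the end of the string, or there were fewer than n blocks.
if ((count == n)); then echo "$prefix"; else echo "$str"; fi
}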
Then
$ remove_nth_block aabbaabbbaab b 2
aabbaaaab
$ remove_nth_block aabbaabbbaab a 2
aabbbbbaab

This should print myString after replacing all occurrences of bbb with nothing. For some really useful tips and examples of string manipulation in bash, check out this site.
myString="aabbaabbbaab"
echo "${myString//bbb/}"
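Note that this expansion works on a fixed substring, not on "blocks" of a character. A small sketch of the difference between the single-slash and double-slash forms (the sample string and the variable names `first_only` and `all` are made up for illustration):

```bash
#!/usr/bin/env bash
myString="abbbabbbb"

# ${var/pattern/} removes only the first occurrence of the substring.
first_only=${myString/bbb/}

# ${var//pattern/} removes every non-overlapping occurrence.
all=${myString//bbb/}

printf '%s\n' "$first_only" "$all"
```

With a block of four b's, the double-slash form still leaves one b behind, so it is not a general way to remove the n'th block.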

Related

Bash script with multiline variable

Here is my code
vmname="$1"
EXCEPTLIST="desktop-01|desktop-02|desktop-03|desktop-04"
if [[ $vmname != @(${EXCEPTLIST}) ]]; then
echo "${vmname}"
else
echo "It's in the exceptlist"
fi
The above code works perfectly, but my question is: the EXCEPTLIST can be a long line, say 100 server names. In that case it's hard to put all those names on one line. Is there any way to make the variable EXCEPTLIST a multiline variable? Something like the following:
EXCEPTLIST="desktop-01|desktop-02|desktop-03| \n
desktop-04|desktop-05|desktop-06| \n
desktop-07|desktop-08"
I am not sure but was thinking of possibilities.
I would also like to know the terminology for @(${}). Is this called variable expansion or what? Can anyone point me to the documentation or explain how this works in bash?
One can declare an array if the data/string is long/large. Use IFS and printf for the format string, something like:
#!/usr/bin/env bash
exceptlist=(
desktop-01
desktop-02
desktop-03
desktop-04
desktop-05
desktop-06
)
pattern=$(IFS='|'; printf '@(%s)' "${exceptlist[*]}")
[[ "$vmname" != $pattern ]] && echo good
In that situation is there any way to make the variable EXCEPTLIST to be a multiline variable ?
With your given input/data, an array is also a good option, something like:
exceptlist=(
'desktop-01|desktop-02|desktop-03'
'desktop-04|desktop-05|desktop-06'
'desktop-07|desktop-08'
)
One way to check the value of the $pattern variable is:
declare -p pattern
Output:
declare -- pattern="@(desktop-01|desktop-02|desktop-03|desktop-04|desktop-05|desktop-06)"
You also need to check whether $vmname is an empty string: an empty string never matches the pattern, so the != test is always true for it.
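Putting the pieces above together, here is a minimal sketch that builds the pattern from an array and rejects an empty name. The helper name `matches_any` is hypothetical, and the list entries are assumed to contain no glob characters of their own:

```bash
#!/usr/bin/env bash
shopt -s extglob   # only needed outside [[ ]] or on older bash versions

exceptlist=(
    desktop-01
    desktop-02
    desktop-03
)

# Hypothetical helper: succeeds if $1 is non-empty and equals one of the
# remaining arguments, using an @(a|b|c) extended glob.
matches_any() {
    local name=$1
    shift
    local pattern
    pattern=$(IFS='|'; printf '@(%s)' "$*")   # join arguments with |
    [[ -n $name && $name == $pattern ]]
}

if matches_any "desktop-02" "${exceptlist[@]}"; then
    echo "desktop-02 is in the exceptlist"
fi
```

The pattern is left unquoted on the right of == so that it is treated as a pattern rather than a literal string.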
On a side note, don't use all upper case variables for purely internal purposes.
The $(...) is called Command Substitution.
See LESS=+'/\ *Command Substitution' man bash
In addition to what was mentioned in the comments about pattern matching
See LESS=+/'(pattern-list)' man bash
See LESS=+/' *\[\[ expression' man bash
Is there any way to make the variable EXCEPTLIST to be a multiline variable ?
I see no reason to use matching. Use a bash array and just compare.
exceptlist=(
desktop-01
desktop-02
desktop-03
desktop-04
desktop-05
desktop-06
)
is_in_list() {
local i
for i in "${@:2}"; do
if [[ "$1" = "$i" ]]; then
return 0
fi
done
return 1
}
if is_in_list "$vmname" "${exceptlist[@]}"; then
echo "is in exception list ${vmname}"
fi
@(${})- Is this called variable expansion or what ? Does anyone know the documentation/explain to me about how this works in bash. ?
${var} is a variable expansion.
@(...) are just the characters @ ( ).
From man bash in Compound commands:
[[ expression ]]
When the == and != operators are used, the string to the right of the operator is considered a pattern and matched according to the rules
described below under Pattern Matching, as if the extglob shell option were enabled. ...
From Pattern Matching in man bash:
@(pattern-list)
Matches one of the given patterns
The [[ command receives the @(a|b|c) string and then matches the left-hand argument against it.
There is no need for Bash-specific patterns, arrays, or a loop if you use grep to match a fixed string on word boundaries.
The exception list can be multi-line, it will work as well:
#!/usr/bin/sh
exceptlist='
desktop-01|desktop-02|desktop-03|
desktop-04|desktop-05|desktop-06|
desktop-07|desktop-08'
if printf %s "$exceptlist" | grep -qwF "$1"; then
printf '%s is in the exceptlist\n' "$1"
fi
I wouldn't bother with multiple lines of text. This would be just fine:
EXCEPTLIST='desktop-01|desktop-02|desktop-03|'
EXCEPTLIST+='desktop-04|desktop-05|desktop-06|'
EXCEPTLIST+='desktop-07|desktop-08'
The @(...) construct is called an extended globbing pattern. It is an extension of something you probably already know, wildcards:
VAR='foobar'
if [[ "$VAR" == fo?b* ]]; then
echo "Yes!"
else
echo "No!"
fi
A quick walkthrough on extended globbing examples: https://www.linuxjournal.com/content/bash-extended-globbing
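To make the extended operators concrete, here is a short sketch using them inside [[ ]]. The function name `describe` and the sample words are invented for illustration:

```bash
#!/usr/bin/env bash
shopt -s extglob   # [[ ]] enables this implicitly in recent bash; harmless here

# Hypothetical helper that reports which extended glob matched first.
describe() {
    if   [[ $1 == @(foo|bar) ]]; then echo "exactly one of foo or bar"
    elif [[ $1 == +(foo|bar) ]]; then echo "one or more of foo or bar"
    elif [[ $1 == ?(foo)baz ]];  then echo "baz, optionally preceded by foo"
    elif [[ $1 == !(*.txt) ]];   then echo "anything not ending in .txt"
    else                              echo "a name ending in .txt"
    fi
}

describe foo
describe foobar
describe foobaz
describe notes.txt
```

Here @(...) matches exactly one alternative, +(...) one or more, ?(...) zero or one, and !(...) anything that does not match.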
#!/bin/bash
set +o posix
shopt -s extglob
vmname=$1
EXCEPTLIST=(
desktop-01 desktop-02 desktop-03
...
)
if IFS='|' eval '[[ ${vmname} == @(${EXCEPTLIST[*]}) ]]'; then
...
Here's one way to load a multiline string into a variable:
fn() {
cat <<EOF
desktop-01|desktop-02|desktop-03|
desktop-04|desktop-05|desktop-06|
desktop-07|desktop-08
EOF
}
exceptlist="$(fn)"
echo "$exceptlist"
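A value loaded this way still contains the newlines from the heredoc, so they have to be removed before the string can serve as an @(...) pattern. A minimal sketch, assuming hypothetical helper names `load_list` and `is_excepted`:

```bash
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical helper: prints the list exactly as written in the heredoc.
load_list() {
cat <<'EOF'
desktop-01|desktop-02|desktop-03|
desktop-04|desktop-05|desktop-06|
desktop-07|desktop-08
EOF
}

exceptlist=$(load_list)
exceptlist=${exceptlist//$'\n'/}   # drop the embedded newlines
pattern="@($exceptlist)"

# Hypothetical helper: succeeds if $1 is one of the listed names.
is_excepted() {
    [[ $1 == $pattern ]]
}

if is_excepted "desktop-05"; then
    echo "desktop-05 is in the exceptlist"
fi
```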
As to solving your specific problem, I can think of a variety of approaches.
Solution 1: since all the desktops share the desktop-0 prefix and differ only in the last digit, we can make use of brace expansion ({,} or {..}) as follows:
vmname="$1"
found=0
for d in desktop-{01..08}
do
if [[ "$vmname" == $d ]]; then
echo "It's in the exceptlist"
found=1
break
fi
done
if (( !found )); then
echo "Not found"
fi
Solution 2: sometimes it is good to keep the list as maintainable plain text. We can use a while loop to iterate through the list:
vmname="$1"
found=0
while IFS= read -r d
do
if [[ "$vmname" == $d ]]; then
echo "It's in the exceptlist"
found=1
break
fi
done <<EOF
desktop-01
desktop-02
desktop-03
desktop-04
desktop-05
desktop-06
desktop-07
desktop-08
EOF
if (( !found )); then
echo "Not found"
fi
Solution 3: we can match the servers using a regular expression:
vmname="$1"
if [[ "$vmname" =~ ^desktop-0[1-8]$ ]]; then
echo "It's in the exceptlist"
else
echo "Not found"
fi
Solution 4, we populate an array, then iterate through an array:
vmname="$1"
exceptlist=()
exceptlist+=(desktop-01 desktop-02 desktop-03 desktop-04)
exceptlist+=(desktop-05 desktop-06 desktop-07 desktop-08)
found=0
for d in "${exceptlist[@]}"
do
if [[ "$vmname" == "$d" ]]; then
echo "It's in the exceptlist"
found=1
break;
fi
done
if (( !found )); then
echo "Not found"
fi

In Bash, is it possible to match a string variable containing wildcards to another string

I am trying to compare strings against a list of other strings read from a file.
However some of the strings in the file contain wildcard characters (both ? and *) which need to be taken into account when matching.
I am probably missing something but I am unable to see how to do it
Eg.
I have strings from file in an array which could be anything alphanumeric (and include commas and full stops) with wildcards : (a?cd, *x*y, q?h*z, j*,h-??)
and I have another string I wish to compare with each item in the list in turn. Any of the strings may contain spaces.
so what I want is something like
teststring="abcdx.rubb ish,y"
matchstrings=("a?cd" "*x*y" "q?h*z" "j*,h-??")
for i in "${matchstrings[@]}" ; do
if [[ "$i" == "$teststring" ]]; then # this test here is the problem
<do something>
else
<do something else>
fi
done
This should match on the second "matchstring" but not any others
Any help appreciated
Yes; you just have the two operands to == reversed; the glob goes on the right (and must not be quoted):
if [[ $teststring == $i ]]; then
Example:
$ i=f*
$ [[ foo == $i ]] && echo pattern match
pattern match
If you quote the parameter expansion, the operation is treated as a literal string comparison, not a pattern match.
$ [[ foo == "$i" ]] || echo "foo != f*"
foo != f*
Spaces in the pattern are not a problem:
$ i="foo b*"
$ [[ "foo bar" == $i ]] && echo pattern match
pattern match
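Applied to the loop from the question, the fix could look like the sketch below. The `matched` array is an illustrative addition for collecting the patterns that match:

```bash
#!/usr/bin/env bash
teststring="abcdx.rubb ish,y"
matchstrings=("a?cd" "*x*y" "q?h*z" "j*,h-??")

matched=()
for pattern in "${matchstrings[@]}"; do
    # String on the left, unquoted pattern on the right.
    if [[ $teststring == $pattern ]]; then
        matched+=("$pattern")
    fi
done

printf 'matched: %s\n' "${matched[@]}"
```

Only the second pattern matches, since the test string contains an x and ends in y.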
You can do this even completely within POSIX, since case alternatives undergo parameter substitution:
#!/bin/sh
teststring="abcdx.rubbish,y"
while IFS= read -r matchstring; do
case $teststring in
($matchstring) echo "$matchstring";;
esac
done << "EOF"
a?cd
*x*y
q?h*z
j*,h-??
EOF
This outputs only *x*y as desired.

How to tokenise string and call a function on each token in bash?

I have a text file with comma delimiter like following
for example str_data.txt
aaa,111,bbb
ccc,222,ddd
eee,333,fff
I have a bash function to validate each token: it echoes true or false depending on whether the token follows some rule. (It could also just be [[ XYZ == "$1" ]] instead of echoing the result.)
for example
function validate_token {
local _rule=XYZ
if [[ XYZ == "$1" ]]; then
echo "true"
else
echo "false"
fi
}
I want to write a bash script (one-liner or multi-line) to validate all these tokens separately (i.e. validate_token "aaa" then validate_token "111") and finally answer "true" or "false" based on ANDing of each token's results.
Would you please try the following:
validate_token() {
local rule="???" # matches a three-character string
if [[ $1 == $rule ]]; then
echo 1
else
echo 0
fi
}
final=1 # final result
while IFS=',' read -ra ary; do
for i in "${ary[@]}"; do
final=$(( final & $(validate_token "$i") ))
# take AND with the individual test result
done
done < "str_data.txt"
(( $final )) && echo "true" || echo "false"
I've also modified your function for several reasons.
When defining a bash function, the form name() { .. } is preferred.
It is not recommended to start a user variable name with an underscore. You have localized it, so you don't have to worry about variable name collisions.
When evaluating a conditional expression with the == or = operator within [[ .. ]], the pattern or rule has to be on the right of the operator to be treated as a pattern.
It is convenient to return 1 or 0 rather than true or false for further calculation.
Hope this helps.
You can try the below, reading line by line and storing the values into an array, then iterating the array calling the function for each value :
IFS=","
while read line
do
read -ra lineValues <<< "$line"
for value in "${lineValues[@]}"
do
validate_token "$value"
done
done < your_txt_file
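The loop above calls the function for every token but does not combine the results. One possible way to get the final AND is to let validate_token report through its exit status and stop at the first failure. This is a sketch only: the `???` rule and the helper name `all_valid` are assumptions, and the data is fed on standard input instead of from a file:

```bash
#!/usr/bin/env bash

# Assumed rule for illustration: a token is valid if it has exactly 3 characters.
validate_token() {
    [[ $1 == ??? ]]
}

# Hypothetical helper: reads comma-separated lines from stdin and succeeds
# only if every token on every line passes validate_token.
all_valid() {
    local token
    local -a tokens
    while IFS=',' read -ra tokens; do
        for token in "${tokens[@]}"; do
            validate_token "$token" || return 1   # short-circuit the AND
        done
    done
    return 0
}

if all_valid <<< $'aaa,111,bbb\nccc,222,ddd\neee,333,fff'; then
    echo "true"
else
    echo "false"
fi
```

To read from the file in the question, redirect it into the function: all_valid < str_data.txt.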

Struggling with while loop with two conditions and a for loop

I'm trying to learn how to use arrays in bash. I'm writing a script that asks the user for three numbers and figures out what the biggest number is. I want the script to force the user to enter only numeric values. Furthermore, I want the three numbers to be different. I'm using an array to store user input. This is what I have so far:
## declare variables
order=("first" "second" "third")
answers=()
RED='\033[0;31m'
NC='\033[0m' # No Color
numbersonly() {
if [[ ! $1 =~ ^[0-9]+$ ]]; then
echo "${RED}$1 is not a valid number.${NC}"
else
answers+=("$input")
break
fi
}
inarray(){
for e in "${answers[@]}"; do
if [[ $1 == $e ]]; then
echo "${RED}Warning.${NC}"
fi
done
}
readnumber(){
for i in {1..3}; do
j=$(awk "BEGIN { print $i-1 }")
while read -p "Enter the ${order[$j]} number: " input ; do
inarray $input
numbersonly $input
done
done
}
displayanswers(){
echo "Your numbers are: ${answers[@]}"
}
biggestnumber(){
if (( ${answers[0]} >= ${answers[1]} )); then
biggest=${answers[0]}
else
biggest=${answers[1]}
fi
if (( $biggest <= ${answers[2]} )); then
biggest=${answers[2]}
fi
echo "The biggest number is: $biggest"
}
main(){
readnumber
displayanswers
biggestnumber
}
main
Right now, I can get the script to display a warning when the user enters a number that was previously entered, but I can't seem to find the proper syntax to stay in the while loop if the user input has already been entered. Thoughts?
I found a way around it. My problem was twofold: 1) I didn't realize that if you have a for loop nested in a while loop, you'll need two break statements to exit the while loop; 2) having two functions within the same while loop made it hard to control what was happening. By merging inarray() and numbersonly() into a new function, I solved the double conditional issue. The new function looks like this:
testing(){
for item in ${answers[*]}
do
test "$item" == "$1" && { inlist="yes"; break; } || inlist="no"
done
if [[ $inlist == "yes" ]]; then
echo "${RED}$1 is already in list.${NC}"
else
if [[ ! $1 =~ ^[0-9]+$ ]]; then
echo "${RED}$1 is not a valid number.${NC}"
else
answers+=("$input")
break
fi
fi
}
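An alternative that avoids calling break from inside a function is to have the check report its result through its exit status and let the calling loop decide what to do. A rough sketch, where the function name `is_new_number` and the simulated input list are invented for illustration:

```bash
#!/usr/bin/env bash

# Hypothetical helper. Returns 0 if $1 is numeric and not among the
# remaining arguments, 1 if it is not numeric, 2 if it is a duplicate.
is_new_number() {
    local candidate=$1
    shift
    local seen
    [[ $candidate =~ ^[0-9]+$ ]] || return 1
    for seen in "$@"; do
        [[ $seen == "$candidate" ]] && return 2
    done
    return 0
}

answers=()
# Simulated user input; an interactive script would use read -p here.
for input in 7 x 7 12 3; do
    if is_new_number "$input" "${answers[@]}"; then
        answers+=("$input")
    fi
    (( ${#answers[@]} == 3 )) && break
done

echo "Your numbers are: ${answers[*]}"
```

The caller can inspect $? to print a different warning for the "not a number" and "already entered" cases.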
Without much study, here is what leapt off the screen at me. Beware, I haven't actually tested it... debugging is an exercise left to the student.
Recommend declaring loop variables local so they don't clobber the caller's variables. In bash, local works with both the name() and the "function name" definition styles.
function inarray
{
local e; #don't muck up any variable e in caller
...
}
To calculate values avoid extra awk and use j=$(( i - 1 ));
biggestnumber should likely use a loop.
Overall comment:
nummax=3; #maximum value defined in just one place
# loop this way... showing optional {} trick for marking larger loops
for (( n = 0; n < nummax; ++n )); do
{
nx=$(( 1 + n )); #1 based index
} done;
Hint: should stop input loop once all input present. Could also add:
if [ "" == "${input:-}" ]; then break; fi
for (( a = 0; a < ${#answer[*]}; ++a )); do
Note the extensive use of double quotes to avoid syntax errors if the variable value is empty or contains many shell metacharacters, like spaces. I can't tell you how many bug reports I've fixed by adding the quotes to existing code.
Inside [[ ... ]], the == and != operators match shell glob patterns, while =~ matches a POSIX extended regular expression, so [[ ! $1 =~ ^[0-9]+$ ]] is fine in bash. A glob-only way to say "not a number" is [[ -z "$1" || "$1" == *[^0-9]* ]].
If the script also has to run in shells without =~, ! expr >/dev/null "$i" : '[0-9][0-9]*$'; is an alternative, as "expr" uses regular expressions. Don't enclose it in []s. I used [0-9][0-9]* rather than [0-9]+ because "+" has given me mixed success across the dialects of UNIX regular expressions.
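The earlier hint that biggestnumber should use a loop could be sketched as follows. The function name `largest` is hypothetical, and the inputs are assumed to be digit-only strings that have already been validated:

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints the largest of its arguments.
largest() {
    local max=$1 n
    shift
    for n in "$@"; do
        # 10# forces base 10 so input such as 08 is not parsed as octal.
        (( 10#$n > 10#$max )) && max=$n
    done
    printf '%s\n' "$max"
}

answers=(3 10 7)
echo "The biggest number is: $(largest "${answers[@]}")"
```

Because it loops over all arguments, the same function works for any number of answers, not just three.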

Multiple matches in a string using regex in bash

Been looking for some more advanced regex info on regex with bash and have not found much information on it.
Here's the concept, with a simple string:
myString="DO-BATCH BATCH-DO"
if [[ $myString =~ ([[:alpha:]]*)-([[:alpha:]]*) ]]; then
echo ${BASH_REMATCH[1]} #first parens
echo ${BASH_REMATCH[2]} #second parens
echo ${BASH_REMATCH[0]} #full match
fi
outputs:
DO
BATCH
DO-BATCH
So fine it does the first match (BATCH-DO) but how do I pull a second match (DO-BATCH)? I'm just drawing a blank here and can not find much info on bash regex.
OK so one way I did this is to put it in a for loop:
myString="DO-BATCH BATCH-DO"
for aString in ${myString[@]}; do
if [[ ${aString} =~ ([[:alpha:]]*)-([[:alpha:]]*) ]]; then
echo ${BASH_REMATCH[1]} #first parens
echo ${BASH_REMATCH[2]} #second parens
echo ${BASH_REMATCH[0]} #full match
fi
done
which outputs:
DO
BATCH
DO-BATCH
BATCH
DO
BATCH-DO
Which works but I kind of was hoping to pull it all from one regex if possible.
In your answer, myString is not an array, but you use an array reference to access it. This works in Bash because the 0th element of an array can be referred to by just the variable name and vice versa. What that means is that you could use:
for aString in $myString; do
to get the same result in this case.
In your question, you say the output includes "BATCH-DO". I get "DO-BATCH" so I presume this was a typo.
The only way to get the extra strings without using a for loop is to use a longer regex. By the way, I recommend putting Bash regexes in a variable. It makes certain types much easier to use (those that contain whitespace or special characters, for example).
pattern='(([[:alpha:]]*)-([[:alpha:]]*)) +(([[:alpha:]]*)-([[:alpha:]]*))'
[[ $myString =~ $pattern ]]
declare -p BASH_REMATCH #dump the array
Outputs:
declare -ar BASH_REMATCH='([0]="DO-BATCH BATCH-DO" [1]="DO-BATCH" [2]="DO" [3]="BATCH" [4]="BATCH-DO" [5]="BATCH" [6]="DO")'
The extra set of parentheses is needed if you want to capture the individual substrings as well as the hyphenated phrases. If you don't need the individual words, you can eliminate the inner sets of parentheses.
Notice that you don't need to use if if you only need to extract substrings. You only need if to take conditional action based on a match.
Also notice that ${BASH_REMATCH[0]} will be quite different with the longer regex since it contains the whole match.
Per @Dennis Williamson's post I messed around and ended up with the following:
myString="DO-BATCH BATCH-DO"
pattern='(([[:alpha:]]*)-([[:alpha:]]*)) +(([[:alpha:]]*)-([[:alpha:]]*))'
[[ $myString =~ $pattern ]] && { read -a myREMatch <<< ${BASH_REMATCH[@]}; }
echo "\${myString} -> ${myString}"
echo "\${#myREMatch[@]} -> ${#myREMatch[@]}"
for (( i = 0; i < ${#myREMatch[@]}; i++ )); do
echo "\${myREMatch[$i]} -> ${myREMatch[$i]}"
done
This works fine, except that myString must contain exactly the 2 values. I'm posting it because it's kind of interesting and I had fun messing with it. But to make this more generic and handle any number of paired groups (i.e. DO-BATCH), I'm going to go with a modified version of my original answer:
myString="DO-BATCH BATCH-DO"
myRE="([[:alpha:]]*)-([[:alpha:]]*)"
read -a myString <<< $myString
for aString in ${myString[@]}; do
echo "\${aString} -> ${aString}"
if [[ ${aString} =~ ${myRE} ]]; then
echo "\${BASH_REMATCH[@]} -> ${BASH_REMATCH[@]}"
echo "\${#BASH_REMATCH[@]} -> ${#BASH_REMATCH[@]}"
for (( i = 0; i < ${#BASH_REMATCH[@]}; i++ )); do
echo "\${BASH_REMATCH[$i]} -> ${BASH_REMATCH[$i]}"
done
fi
done
I would have liked a perlre like multiple match but this works fine.
Although this is a year old question (without accepted answer), could the regex pattern be simplified to:
myRE="([[:alpha:]]*-[[:alpha:]]*)"
by removing the inner parentheses to find a smaller (more concise) set of the words DO-BATCH and BATCH-DO?
It works for me in your 18:10 answer. ${BASH_REMATCH[0]} and ${BASH_REMATCH[1]} result in the 2 words being found.
In case you don't actually know how many matches there will be ahead of time, you can use this:
#!/bin/bash
function handle_value {
local one=$1
local two=$2
echo "i found ${one}-${two}"
}
function match_all {
local current=$1
local regex=$2
local handler=$3
while [[ ${current} =~ ${regex} ]]; do
"${handler}" "${BASH_REMATCH[@]:1}"
# trim off the portion already matched
current="${current#${BASH_REMATCH[0]}}"
done
}
match_all \
"DO-BATCH BATCH-DO" \
'([[:alpha:]]*)-([[:alpha:]]*)[[:space:]]*' \
'handle_value'
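The loop above trims the match from the front of the string, which assumes every match starts at the beginning of what is left. A variant that also copes with unrelated text between matches could look like this sketch (the function name `find_all` and the sample input are invented; it prints only the full matches, not the capture groups):

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints every non-overlapping match of regex $2
# in string $1, one per line.
find_all() {
    local rest=$1 regex=$2
    while [[ -n $rest && $rest =~ $regex ]]; do
        # An empty match would never shorten $rest, so stop instead of looping.
        [[ -n ${BASH_REMATCH[0]} ]] || break
        printf '%s\n' "${BASH_REMATCH[0]}"
        # Drop everything up to and including this match.
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

find_all "x DO-BATCH, BATCH-DO y" '[[:alpha:]]+-[[:alpha:]]+'
```

Quoting the match inside the expansion makes it a literal string, so any glob characters in the matched text are not reinterpreted.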

Resources