Check if a string contains "-" and "]" at the same time - bash

I have the following two regexes in Bash:
1. ^[-a-zA-Z0-9\,\.\;\:]*$
2. ^[]a-zA-Z0-9\,\.\;\:]*$
The first matches when the string contains a "-" and the other allowed characters.
The second matches when it contains a "]".
I put these characters at the beginning of the bracket expression because I can't escape them.
How can I match both characters at the same time?

You can also place the - at the end of the bracket expression, since a range must be closed on both ends.
^[]a-zA-Z0-9,.;:-]*$
You don't have to escape any of the other characters, either. Commas, semicolons, and colons have no special meaning here, and a period loses its special meaning inside a bracket expression.
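As a minimal sketch of that rule, the bracket expression can be kept in a variable and used with bash's own =~ operator. The function name is_allowed is made up for illustration and is not part of the question.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the string consists only of the
# allowed characters. "]" comes first and "-" comes last inside the
# bracket expression, so both are taken literally.
is_allowed() {
    local pattern='^[]a-zA-Z0-9,.;:-]*$'
    [[ $1 =~ $pattern ]]
}

is_allowed 'ab-c]d,e' && echo "allowed"
is_allowed 'ab c'     || echo "rejected"
```

Storing the pattern in a variable keeps the shell from trying to parse the brackets itself.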

Basically you can use this:
grep -E '^.*-.*]|].*-.*$'
It matches either a - followed by zero or more arbitrary characters and a ], or a ] followed by zero or more characters and a -.
However, since you don't accept arbitrary characters, you need to change it to:
grep -E '^[a-zA-Z0-9,.;:]*(-[a-zA-Z0-9,.;:]*]|][a-zA-Z0-9,.;:]*-)[a-zA-Z0-9,.;:]*$'

Maybe this can help you:
#!/bin/bash
while IFS= read -r p; do
printf '%s\n' "$p" | grep -E '[-].*[]]|[]].*[-]' | grep '^[]a-zA-Z0-9,.;:-]*$'
done < "$1"
user-host:/tmp$ cat test
-i]string
]adfadfa-
string-
]string
str]ing
]123string
123string-
?????
++++++
user-host:/tmp$ ./test.sh test
-i]string
]adfadfa-

There are two questions in your post.
One is in the description:
How I can get match the two values at the same time?
That is an OR match, which can be done with a single bracket expression that combines your two:
pattern='^[]a-zA-Z0-9,.;:-]*$'
That will match a line made up only of -, ] and the other included characters, in any number and any order. That is every line except ?????, ++++++ and as df gh in the test script below.
Two is in the title:
… a string contains “-” and “]” at the same time
That is an AND match. The simplest (and slowest) way to do it is:
echo "$line" | grep '-' | grep ']' | grep '^[]a-zA-Z0-9,.;:-]*$'
The first two calls to grep select only the lines that contain both (one or several) - and (one or several) ].
The third call keeps only the lines made up entirely of the allowed characters.
Test script:
#!/bin/bash
printlines(){
cat <<-\_test_lines_
asdfgh
asdfgh-
asdfgh]
as]df
as,df
as.df
as;df
as:df
as-df
as]]]df
as---df
asAS]]]DFdf
as123--456DF
as,.;:-df
as-dfg]h
as]dfg-h
a]s]d]f]g]h
a]s]d]f]g]h-
s-t-r-i-n-g]
as]df-gh
123]asdefgh
123asd-fgh-
?????
++++++
as df gh
_test_lines_
}
pattern='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing the simple pattern of $pattern"
while read line; do
resultgrep="$( echo "$line" | grep "$pattern" )"
printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
p1='-'; p2=']'; p3='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing a 'grep AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ $resultgrep ]] && printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing an 'AWK AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultawk="$( echo "$line" |
awk -v p1="$p1" -v p2="$p2" -v p3="$p3" '$0~p1 && $0~p2 && $0~p3' )"
[[ $resultawk ]] && printf '%13s %-13s\n' "$line" "$resultawk"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing a 'bash AND' of '$p1', '$p2' and '$p3'."
while read line; do
rgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ ( $line =~ $p1 ) && ( $line =~ $p2 ) && ( $line =~ $p3 ) ]]
rbash=${BASH_REMATCH[0]}
[[ $rbash ]] && printf '%13s %-13s %-13s\n' "$line" "$rgrep" "$rbash"
done < <(printlines)
echo "#############################################################"
echo
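The same AND test can be sketched as a single function that avoids starting grep for every line. The name has_both is hypothetical; the allowed-character pattern is the one used in the script.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when the string contains at least one
# "-", at least one "]", and nothing outside the allowed character set.
has_both() {
    local allowed='^[]a-zA-Z0-9,.;:-]*$'
    [[ $1 == *-* && $1 == *']'* && $1 =~ $allowed ]]
}

for s in 'as-dfg]h' 'as]dfg-h' 'asdfgh-' 'as]df' '?-]?'; do
    if has_both "$s"; then
        printf '%s: match\n' "$s"
    else
        printf '%s: no match\n' "$s"
    fi
done
```

The two glob tests handle the "contains" part, and the regex handles the "only these characters" part.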

Related

How to cut variables which are between quotes from a string

I have a problem cutting values that sit between " quotes out of a string. I have some scripts to write for my sys classes, and in one of them I have to read input from the user in the form of (a="var1", b="var2")
I tried the code below
#!/bin/bash
read input
a=$($input | cut -d '"' -f3)
echo $a
It returns an error "not found a command" on line 3. I tried double brackets like
a=$(($input | cut -d '"' -f3)
but it's still wrong.
In a comment the OP gave a working answer (should post it as an answer):
#!/bin/bash
read input
a=$(echo $input | cut -d '"' -f2)
b=$(echo $input | cut -d '"' -f4)
echo sum: $(( a + b))
echo difference: $(( a - b))
This will work for user input that is exactly like a="8", b="5".
Never trust input.
You might want to add the check
if [[ ${input} =~ ^[a-z]+=\"[0-9]+\",\ [a-z]+=\"[0-9]+\"$ ]]; then
echo "Use your code"
else
echo "Incorrect input"
fi
And when you add a check, you might want to execute the input (after replacing the comma with a semicolon).
input='testa="8", testb="5"'
if [[ ${input} =~ ^[a-z]+=\"[0-9]+\",\ [a-z]+=\"[0-9]+\"$ ]];
then
eval $(tr "," ";" <<< ${input})
set | grep -E "^test[ab]="
else
echo no
fi
EDIT:
@PesaThe commented correctly about BASH_REMATCH:
When you use bash and a test on the input you can use
if [[ ${input} =~ ^[a-z]+=\"([0-9]+)\",\ [a-z]+=\"([0-9]+)\"$ ]];
then
a="${BASH_REMATCH[1]}"
b="${BASH_REMATCH[2]}"
fi
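Putting the validation and the capture groups together, a minimal sketch could look like this. The function name sum_and_diff is invented for illustration, and it assumes input of exactly the form a="8", b="5".

```shell
#!/bin/bash
# Hypothetical helper: validates input such as a="8", b="5" and prints the
# sum and difference of the two quoted numbers.
sum_and_diff() {
    local re='^[a-z]+="([0-9]+)", [a-z]+="([0-9]+)"$'
    if [[ $1 =~ $re ]]; then
        local a=${BASH_REMATCH[1]} b=${BASH_REMATCH[2]}
        # 10# forces base 10, so a leading zero is not read as octal
        echo "sum: $(( 10#$a + 10#$b ))"
        echo "difference: $(( 10#$a - 10#$b ))"
    else
        echo "Incorrect input" >&2
        return 1
    fi
}

sum_and_diff 'a="8", b="5"'
```

Because the numbers come from the capture groups, neither cut nor eval is needed.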
To extract the digit 1 from a string "var1" you would use a Bash substring replacement most likely:
$ s="var1"
$ echo "${s//[^0-9]/}"
1
Or,
$ a="${s//[^0-9]/}"
$ echo "$a"
1
This works by replacing every non-digit in the string with nothing. That works in your example, where the string holds a single number field, but it may not be what you need if there are multiple number fields:
$ s2="1 and a 2 and 3"
$ echo "${s2//[^0-9]/}"
123
In this case, you would use sed, grep, awk, or a Bash regex to capture the individual number fields and keep them distinct:
$ echo "$s2" | grep -o -E '[[:digit:]]+'
1
2
3
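The Bash regex variant could be sketched like this. The name number_fields is hypothetical; the loop peels one run of digits off the front of the remaining text on each pass.

```shell
#!/bin/bash
# Hypothetical helper: prints each run of digits in the argument on its own
# line, using only bash's =~ operator.
number_fields() {
    local rest=$1 re='([0-9]+)(.*)'
    while [[ $rest =~ $re ]]; do
        printf '%s\n' "${BASH_REMATCH[1]}"   # the digits just found
        rest=${BASH_REMATCH[2]}              # continue with what follows them
    done
}

number_fields "1 and a 2 and 3"
```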

Wrapping hunspell to stem a large number of words efficiently?

I have written a script for stemming English words. It does a decent job, but it takes forever when I use it on big files that have more than 1000 words, one per line. Are there ways to speed it up? Maybe a different approach altogether? Different programming language? Different stemmer?
file=$1
while read -r a
do
b="$(echo "$a" | hunspell -s -d en_US | wc -l)"
if [[ "$b" -eq 2 ]]
then
g="$(echo "$a" | hunspell -s -d en_US | wc -w)"
if [[ "$g" -eq 1 ]]
then
echo "$a" | hunspell -s -d en_US | awk 'FNR==1 {print $1}'
else
echo "$a" | hunspell -s -d en_US | awk 'FNR==1 {print $2}'
fi
else
if [[ "$a" == *ing ]] || [[ "$a" == *ed ]]
then
echo "$a" | hunspell -s -d en_US | awk 'FNR==2 {print $2}'
else
echo "$a" | hunspell -s -d en_US | awk 'FNR==1 {print $1}'
fi
fi
done < "$file"
Here's an example of what it does.
input file
cliché
womb
range
strain
fiddle
coup
earnest
touched
gave
dazzling
blindfolded
stagger
buying
insignia
output
cliché
womb
range
strain
fiddle
coup
earnest
touch
give
dazzle
blindfold
stagger
buy
insignia
How it works
If you run hunspell -s -d en_US word, it can give you different results depending on the word. The possible results, and the action to take for each, follow:
One line with one word (print that word)
One line with two words (print second word)
Two lines with two words; ends with "ing" or "ed" (print second word on second line)
Two lines with two words; not ending with "ing" or "ed" (print first word on first line)
The following emits the exact same output (but for changing gave to give, which my hunspell appears not to have in its dictionary) -- and far, far faster:
last_word=; stems=( )
while read -r word stem _; do
if [[ $word ]]; then
last_word=$word
[[ $stem ]] && stems+=( "$stem" )
else
if (( ${#stems[@]} == 0 )); then
printf '%s\n' "$last_word" # no stems available; print input word
elif (( ${#stems[@]} == 1 )); then
printf '%s\n' "${stems[0]}" # found one stem; print it.
else
case $last_word in
*ing|*ed) printf '%s\n' "${stems[1]}" ;; # "ing" or "ed": print the 2nd stem
*) printf '%s\n' "${stems[0]}" ;; # otherwise: print the 1st stem
esac
fi
stems=( )
fi
done < <(hunspell -s -d en_US <"$1")
Note that this runs hunspell only once for the whole file, not once per word. Your script spends all its time restarting hunspell over and over; that has nothing to do with bash.

How to check if string contains more than one special character

I have this
if [[ ! $newstring == *['!'@#\$%^\&*()_+]* ]]
then
echo Error - Does not contain One Special Character - $newstring
i=$((i+1))
fi
This checks whether the string has at least one character from the set. I want to check whether it has more than one.
What would be the best way?
Either add a second class
if [[ "$newstring" != *['!'@#\$%^\&*\(\)_+]*['!'@#\$%^\&*\(\)_+]* ]]
or strip anything else out and check length
t="${newstring//[^!@#\$%^\&*()_+]}"
if [ ${#t} -lt 2 ]
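The strip-and-count idea can be sketched as a small function. The name count_special is hypothetical, and the set of special characters is assumed to be !@#$%^&*()_+ as in the question.

```shell
#!/bin/bash
# Hypothetical helper: prints how many characters of the argument belong to
# the special set !@#$%^&*()_+
count_special() {
    local special='!@#$%^&*()_+'
    local only=${1//[^"$special"]/}   # delete everything outside the set
    echo "${#only}"
}

if (( $(count_special 'pa$$w0rd!') >= 2 )); then
    echo "at least two special characters"
fi
```

Quoting "$special" inside the bracket expression makes every character in it literal, so no individual escaping is needed.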
We can use tr to solve it.
$ string='Hello-World_12#$##*&%)(!####'
$ number=$(( $(tr -d '[:alnum:]' <<< "$string" | wc -m) - 1 ))
$ echo "We have $number special characters"
We have 16 special characters
This should be shorter and faster.
#!/bin/bash
a='!*#%6789';
if [[ `echo $a | sed "s/\(.\)/\1\n/g"|grep -c "[[:punct:]]"` -gt 1 ]]; then echo shenzi; else echo koba; fi
grep can be used to do the match:
grep -oP "^[^'\!'@#\$%^\&*()_+]*['\!'@#\$%^\&*()_+][^'\!'@#\$%^\&*()_+]+$"
Test:
$ echo "#asdfasdf234" | grep -oP "^[^'\!'@#\$%^\&*()_+]*['\!'@#\$%^\&*()_+][^'\!'@#\$%^\&*()_+]+$"
matches the string and prints
#asdfasdf234
$ echo "#asdf#asdf234" | grep -oP "^[^'\!'@#\$%^\&*()_+]*['\!'@#\$%^\&*()_+][^'\!'@#\$%^\&*()_+]+$"
does not match the string.
The if construct can be
if echo "$newstring" | grep -qP "^[^'\!'@#\$%^\&*()_+]*['\!'@#\$%^\&*()_+][^'\!'@#\$%^\&*()_+]+$"
then
echo Error - Does not contain One Special Character - $newstring
i=$((i+1))
fi
Here the regex
^[^'\!'@#\$%^\&*()_+]*['\!'@#\$%^\&*()_+][^'\!'@#\$%^\&*()_+]+$
matches all strings with exactly one occurrence of a special character, followed by at least one other character

How to check if word is in alphabetical order

I'd like to find a bash-only way (no sed, awk, perl, ...) to find out whether a word is in alphabetical order, in other words whether every letter comes after the one before it.
example:
bdjkz is true,
ahjmno is true,
sdgla is false.
I'm already struggling just comparing ascii values for characters, so if anyone could point me in the right direction for that it would help a lot!
Thanks
Pure bash solution (no external tool used), using Parameter Expansion to address characters inside strings:
function compare () {
word=$1
for (( pos=0; pos<${#word}-1; pos++ )) ; do
[[ ${word:pos:1} < ${word:pos+1:1} ]] || return 1
done
return 0
}
Tested with
for word in bdjkz ahjmno sdgla ; do
if compare $word ; then
echo $word ordered
else
echo $word not ordered
fi
done
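Since the question mentions comparing ASCII values, here is a sketch of that route. The name is_sorted is hypothetical. Unlike the function above it accepts repeated letters (it uses >= instead of a strict comparison), and it compares character codes, so uppercase letters sort before lowercase ones.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when every character's code is greater than
# or equal to the code of the character before it.
is_sorted() {
    local word=$1 prev=0 code i
    for (( i = 0; i < ${#word}; i++ )); do
        # A leading quote makes printf output the character's numeric code
        printf -v code '%d' "'${word:i:1}"
        (( code >= prev )) || return 1
        prev=$code
    done
    return 0
}

for word in bdjkz ahjmno sdgla; do
    if is_sorted "$word"; then
        echo "$word ordered"
    else
        echo "$word not ordered"
    fi
done
```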
If you can utilize other command line tools (but not awk, sed, perl), you can try:
[[ "YOURSTRING" = "$(echo "YOURSTRING" | grep -o '.' | sort | tr -d '\n')" ]] && \
echo "Alphabetic order"
1. [[ ... ]] tests the expression
2. "YOURSTRING" = is a string comparison
3. "$( ... )" captures the output of the inner commands in a string
4. echo "YOURSTRING" | grep -o '.' prints every character of "YOURSTRING" on its own line (-o '.': print only the matches for any single character; NOTE: you might need a newer version of grep for this option)
5. ... sort | sorts the output from 4.
6. ... tr -d '\n' rejoins the characters from 5. (by deleting the newline characters)
You can use:
p='bdjkz'
q=$(fold -w1 <<< "$p"|sort|tr -d "\n")
[[ "$p" == "$q" ]] && echo "in alphabetical order" || echo "not in alphabetical order"
s=($(echo "existingString" | grep -o .)) # put each character of input string in an array.
k=($(printf '%s\n' "${s[@]}" | sort)) # sorts the input string
if [[ "${s[*]}" == "${k[*]}" ]]; then # comparing the input string array with sorted array
echo "alphabetical"
else
echo "not alphabetical"
fi

count words in a file without using wc

Working in a shell script here, trying to count the number of words/characters/lines in a file without using the wc command. I can get the file broken into lines and count those easy enough, but I'm struggling here to get the words and the characters.
#define word_count function
count_stuff(){
c=0
w=0
l=0
local f="$1"
while read Line
do
l=`expr $line + 1`
# now that I have a line I want to break it into words and characters???
done < "$f"
echo "Number characters: $chars"
echo "Number words: $words"
echo "Number lines: $line"
}
As for characters, try this (adjust echo "test" to where you get your output from):
expr `echo "test" | sed "s/./ + 1/g;s/^/0/"`
As for lines, try this:
expr `echo -e "test\ntest\ntest" | sed "s/^.*$/./" | tr -d "\n" | sed "s/./ + 1/g;s/^/0/"`
===
As for your code, you want something like this to count words (if you want to go at it completely raw):
while read line ; do
set -- $line ;
while true ; do
[ -z "$1" ] && break
w=`expr $w + 1`
shift ;
done ;
done
You can do this with the following Bash shell script:
count=0
for var in `cat $1`
do
count=`echo $count+1 | bc`
done
echo $count
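For all three counts in one pass, a sketch using only bash builtins might look like this. It reuses the count_stuff name from the question, assumes the file ends with a newline, and builds a temporary file only for the demonstration.

```shell
#!/bin/bash
# Sketch: prints character, word and line counts for a file without
# calling wc. Assumes the file ends with a newline.
count_stuff() {
    local file=$1 line chars=0 words=0 lines=0
    local -a fields
    while IFS= read -r line; do
        lines=$(( lines + 1 ))
        chars=$(( chars + ${#line} + 1 ))   # +1 for the newline
        read -r -a fields <<< "$line"       # split the line on whitespace
        words=$(( words + ${#fields[@]} ))
    done < "$file"
    echo "Number characters: $chars"
    echo "Number words: $words"
    echo "Number lines: $lines"
}

# Demonstration on a throwaway file
tmp=$(mktemp)
printf 'one two\nthree\n' > "$tmp"
count_stuff "$tmp"
rm -f "$tmp"
```

The inner read splits each line into an array, so the word count is simply the array length.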
