Pig Latin Punctuation - bash

I have a script that takes the text of another file and converts it to Pig Latin. It works fine, except for punctuation inside words:
what's becomes atwhay'say, when what I want is for
what's to become at'swhay.
Code:
echo $(
for i in `cat $1`
do
if [[ $i =~ ^[B-DF-HJ-NP-TV-XZ] ]]
then
echo $i | sed -e 's/\(\<[B-DF-HJ-NP-TV-XZ]\+\)\([[:alpha:]]\)\([[:alpha:]]*\>\)/\U\2\L\3\L\1ay/g'
elif [[ $i =~ ^[b-df-hj-np-tv-xz] ]]
then
echo $i | sed -e 's/\(\<[b-df-hj-np-tv-xz]\+\)\([[:alpha:]]*\>\)/\2\1ay/g'
else
echo $i | sed -e 's/\(\<[[:alpha:]]*\>\)/\1yay/g'
fi
done
)
Thanks to anyone who can help me with this punctuation error
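
A likely cause, offered as a sketch rather than a definitive fix: the \< and \> word boundaries in the sed expressions treat the apostrophe as the end of a word, so what and s are converted separately. The example below is pure bash instead of sed and lets the apostrophe count as part of the word. The function name piglatin_word is hypothetical, and the sketch assumes lowercase input (the capitalisation handling of the original script is left out).

```bash
#!/bin/bash
# Hypothetical helper: convert one lowercase word to Pig Latin,
# treating an apostrophe as part of the word.
piglatin_word() {
    local word=$1
    # Leading consonant cluster, rest of the word (letters and '), trailing punctuation.
    local cons="^([b-df-hj-np-tv-xz]+)([[:alpha:]']*)(.*)\$"
    # Any other word that starts with a letter, then trailing punctuation.
    local other="^([[:alpha:]][[:alpha:]']*)(.*)\$"
    if [[ $word =~ $cons ]]; then
        printf '%s%say%s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}"
    elif [[ $word =~ $other ]]; then
        printf '%syay%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        printf '%s\n' "$word"    # no leading letter: leave untouched
    fi
}

piglatin_word "what's"
```

Trailing punctuation such as a comma is captured separately and appended after the ay suffix, so only apostrophes inside the word move with it.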

Related

Check if a string contains "-" and "]" at the same time

I have the following two regexes in Bash:
1. ^[-a-zA-Z0-9\,\.\;\:]*$
2. ^[]a-zA-Z0-9\,\.\;\:]*$
The first matches when the string contains a "-" and the other characters.
The second matches when it contains a "]".
I put these characters at the beginning of the bracket expression because I can't escape them.
How can I match both characters at the same time?
You can also place the - at the end of the bracket expression, since a range must be closed on both ends.
^[]a-zA-Z0-9,.;:-]*$
You don't have to escape any of the other characters, either. Colons, semicolons, and commas have no special meaning in any part of a regular expression, and a period loses its special meaning inside a bracket expression.
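
As a minimal sketch of the placement rule described above, the pattern can be stored in a variable and used with bash's =~ operator. The helper name only_allowed is hypothetical and not part of the original answer.

```bash
#!/bin/bash
# "]" is literal when it is the first character of the bracket expression,
# and "-" is literal when it is the last one.
pattern='^[]a-zA-Z0-9,.;:-]*$'

# Hypothetical helper: succeeds when $1 contains only allowed characters.
only_allowed() {
    [[ $1 =~ $pattern ]]
}

only_allowed 'a-b]c,d.e;f:g' && echo "accepted"
only_allowed 'a+b'           || echo "rejected"
```

Keeping the regex in a variable avoids quoting surprises on the right-hand side of =~.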
Basically you can use this:
grep -E '^.*-.*]|].*-.*$'
It matches either a - followed by zero or more arbitrary characters and a ], or a ] followed by zero or more characters and a -.
However, since you don't accept arbitrary characters, you need to change it to:
grep -E '^[]a-zA-Z0-9,.;:-]*(-[]a-zA-Z0-9,.;:-]*]|][]a-zA-Z0-9,.;:-]*-)[]a-zA-Z0-9,.;:-]*$'
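Maybe this can help you:
#!/bin/bash
# Print the lines of file $1 that contain both - and ] and only allowed characters.
while IFS= read -r p; do
printf '%s\n' "$p" | grep -E -e '-.*]|].*-' | grep '^[]a-zA-Z0-9,.;:-]*$'
done < "$1"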
user-host:/tmp$ cat test
-i]string
]adfadfa-
string-
]string
str]ing
]123string
123string-
?????
++++++
user-host:/tmp$ ./test.sh test
-i]string
]adfadfa-
There are two questions in your post.
One is in the description:
How I can get match the two values at the same time?
That is an OR match, which can be done with one bracket expression that merges your two:
pattern='^[]a-zA-Z0-9,.;:-]*$'
That will match any line made up only of -, ] and the other listed characters, in any combination. That is every line in the test script below except ?????, ++++++ and as df gh.
Two is in the title:
… a string contains “-” and “]” at the same time
That is an AND match. The simplest (and slowest) way to do it is:
echo "$line" | grep '-' | grep ']' | grep '^[]a-zA-Z0-9,.;:-]*$'
The first two calls to grep select only the lines that contain both - and ] (one or several of each); the third keeps only the lines made up entirely of the allowed characters.
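
The same AND test can be sketched without spawning grep at all, using two glob matches and one regex match inside [[ ]]. The helper name has_dash_and_bracket is hypothetical.

```bash
#!/bin/bash
# Allowed characters: "]" must come first and "-" last inside the brackets.
allowed='^[]a-zA-Z0-9,.;:-]*$'

# Hypothetical helper: succeeds only if $1 contains both "-" and "]"
# and consists solely of allowed characters.
has_dash_and_bracket() {
    [[ $1 == *-* && $1 == *']'* && $1 =~ $allowed ]]
}

for s in '-i]string' 'string-' ']string' 'a]b-c?'; do
    if has_dash_and_bracket "$s"; then
        printf 'match:    %s\n' "$s"
    else
        printf 'no match: %s\n' "$s"
    fi
done
```

Only the first sample string satisfies all three conditions; the last one contains both characters but also a disallowed ?.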
Test script:
#!/bin/bash
printlines(){
cat <<-\_test_lines_
asdfgh
asdfgh-
asdfgh]
as]df
as,df
as.df
as;df
as:df
as-df
as]]]df
as---df
asAS]]]DFdf
as123--456DF
as,.;:-df
as-dfg]h
as]dfg-h
a]s]d]f]g]h
a]s]d]f]g]h-
s-t-r-i-n-g]
as]df-gh
123]asdefgh
123asd-fgh-
?????
++++++
as df gh
_test_lines_
}
pattern='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing the simple pattern of $pattern"
while read line; do
resultgrep="$( echo "$line" | grep "$pattern" )"
printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
p1='-'; p2=']'; p3='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing a 'grep AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ $resultgrep ]] && printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing an 'AWK AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultawk="$( echo "$line" |
awk -v p1="$p1" -v p2="$p2" -v p3="$p3" '$0~p1 && $0~p2 && $0~p3' )"
[[ $resultawk ]] && printf '%13s %-13s\n' "$line" "$resultawk"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing a 'bash AND' of '$p1', '$p2' and '$p3'."
while read line; do
rgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ ( $line =~ $p1 ) && ( $line =~ $p2 ) && ( $line =~ $p3 ) ]]
rbash=${BASH_REMATCH[0]}
[[ $rbash ]] && printf '%13s %-13s %-13s\n' "$line" "$rgrep" "$rbash"
done < <(printlines)
echo "#############################################################"
echo

How to check if string contains more than one special character

I have this
if [[ ! $newstring == *['!'##\$%^\&*()_+]* ]]
then
echo Error - Does not contain One Special Character - $newstring
i=$((i+1))
fi
This checks whether the string contains at least one character from the set. I want to check whether it contains more than one.
What would be the best way?
Either add a second bracket expression:
if [[ "$newstring" != *['!'##\$%^\&*\(\)_+]*['!'##\$%^\&*\(\)_+]* ]]
or strip everything else out and check the length:
t="${newstring//[^!##\$%^\&*()_+]}"
if [ "${#t}" -lt 2 ]
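
The strip-and-count idea can be wrapped in a small function, sketched here under the assumption that the set of special characters is the one shown in the question. The names count_special and specials are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print how many characters of $1 belong to the set.
count_special() {
    local s=$1
    local specials='!#$%^&*()_+'      # set taken from the question's bracket expression
    local only=${s//[^$specials]/}    # delete every character that is not in the set
    printf '%s\n' "${#only}"
}

newstring='pa$$w0rd!'
if (( $(count_special "$newstring") < 2 )); then
    echo "Error - fewer than two special characters - $newstring"
else
    echo "OK - $(count_special "$newstring") special characters"
fi
```

Counting the remaining length gives the exact number, so the same helper also works for thresholds other than two.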
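We can use tr to delete the alphanumeric characters and count what is left (the - 1 discounts the trailing newline):
$ string='Hello-World_12#$##*&%)(!####'
$ number=$(( $(tr -d '[:alnum:]' <<< "$string" | wc -m) - 1 ))
$ echo "We have $number special characters"
We have 16 special characters
This should be shorter and faster.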
#!/bin/bash
a='!*#%6789'
# Put each character on its own line, then count the lines holding punctuation.
n=$(printf '%s\n' "$a" | sed 's/\(.\)/\1\n/g' | grep -c '[[:punct:]]')
if [[ $n -gt 1 ]]; then echo shenzi; else echo koba; fi
grep can be useful to provide the match
grep -oP "^[^'\!'##\$%^\&*()_+]*['\!'##\$%^\&*()_+][^'\!'##\$%^\&*()_+]+$"
test
$ echo "#asdfasdf234" | grep -oP "^[^'\!'##\$%^\&*()_+]*['\!'##\$%^\&*()_+][^'\!'##\$%^\&*()_+]+$"
will match the string as
#asdfasdf234
$ echo "#asdf#asdf234" | grep -oP "^[^'\!'##\$%^\&*()_+]*['\!'##\$%^\&*()_+][^'\!'##\$%^\&*()_+]+$"
will not match the string
The if construct can be
if echo "$newstring" | grep -qP "^[^'\!'##\$%^\&*()_+]*['\!'##\$%^\&*()_+][^'\!'##\$%^\&*()_+]+$"
then
echo "Error - Does not contain One Special Character - $newstring"
i=$((i+1))
fi
Here the regex
^[^'\!'##\$%^\&*()_+]*['\!'##\$%^\&*()_+][^'\!'##\$%^\&*()_+]+$
matches all strings with exactly one occurrence of a special character.

Finding patterns within array

I have an array of elements and I would like to find all elements that have one of the following forms:
$i or ${i}
where i can be any natural number.
Can this be achieved without using AWK?
You can do this using grep if you prefer. For instance:
a=('$1' '$3' '$(4)' '5' 'a' '$a' '$1' '${52}')
for i in "${a[@]}"; do
if echo "$i" | grep -Eq '^[$][0-9]+$'; then # First possible pattern
echo "$i"
elif echo "$i" | grep -Eq '^[$][{][0-9]+[}]$'; then # Second possible pattern
echo "$i"
fi
done
Output:
$1
$3
$1
${52}
#!/bin/bash
ARRAY=('a' '1' '$1' '${1}')
FOUND=()
for __ in "${ARRAY[@]}"; do
[[ $__ =~ ^[$]([0-9]+|[{][0-9]+[}])$ ]] && FOUND+=("$__")
done
echo "Found: ${FOUND[*]}"
Output:
Found: $1 ${1}

count words in a file without using wc

Working in a shell script here, trying to count the number of words/characters/lines in a file without using the wc command. I can get the file broken into lines and count those easy enough, but I'm struggling here to get the words and the characters.
#define word_count function
count_stuff(){
c=0
w=0
l=0
local f="$1"
while read -r line
do
l=$((l + 1))
# now that I have a line I want to break it into words and characters???
done < "$f"
echo "Number characters: $c"
echo "Number words: $w"
echo "Number lines: $l"
}
As for characters, try this (adjust echo "test" to where you get your output from):
expr `echo "test" | sed "s/./ + 1/g;s/^/0/"`
As for lines, try this:
expr `echo -e "test\ntest\ntest" | sed "s/^.*$/./" | tr -d "\n" | sed "s/./ + 1/g;s/^/0/"`
===
As for your code, you want something like this to count words (if you want to go at it completely raw):
while read -r line ; do
set -- $line   # unquoted on purpose: split the line into positional parameters
while [ $# -gt 0 ] ; do
w=`expr $w + 1`
shift
done
done
You can count the words with the following Bash shell script:
count=0
for var in $(cat "$1")   # unquoted on purpose: word splitting yields one word per iteration
do
count=$((count + 1))
done
echo "$count"
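
Putting the pieces together, here is one possible pure-bash version of the question's count_stuff function that counts characters, words, and lines in a single pass. It is a sketch under stated assumptions: the file ends with a newline, words are separated by spaces or tabs, and characters are counted with ${#line} in the current locale. The local variable names are hypothetical.

```bash
#!/bin/bash
# Count characters, words and lines of the file named in $1 without wc.
count_stuff() {
    local file=$1 line
    local chars=0 words=0 lines=0
    local -a fields
    # The "|| [[ -n $line ]]" part also processes a last line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        lines=$((lines + 1))
        chars=$((chars + ${#line} + 1))     # +1 for the newline read stripped
        read -r -a fields <<< "$line"       # split the line on whitespace
        words=$((words + ${#fields[@]}))
    done < "$file"
    echo "Number characters: $chars"
    echo "Number words: $words"
    echo "Number lines: $lines"
}

# Usage: count_stuff /path/to/file
```

Splitting with read -a avoids the filename expansion that an unquoted $line would trigger for words such as *.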

Handling wildcard expansion from a string array in a bash shell scripting

Following is a sample script that I have written
line="/path/IntegrationFilter.java:150: * <td>http://abcd.com/index.do</td>"
echo "$line"   # "$line" prints the text correctly
result_array=( `echo "$line"| sed 's/:/\n/1' | sed 's/:/\n/1'`)
echo "${result_array[0]}"
echo "${result_array[1]}"
echo "${result_array[2]}"   # prints the first filename in the directory because of the wildcard character *
How can I get the text "* http://abcd.com/index.do " printed instead of the filename when it is retrieved from the array?
Assuming bash is the right tool, there are a few ways:
disable filename expansion temporarily
use read with IFS
use the substitution feature of bash expansion
Disabling expansion:
line="/path/IntegrationFilter.java:150: * <td>http://abcd.com/index.do</td>"
set -f
OIFS=$IFS
IFS=$'\n'
result_array=( `echo "$line"| sed 's/:/\n/1' | sed 's/:/\n/1'`)
IFS=$OIFS
set +f
echo "${result_array[0]}"
echo "${result_array[1]}"
echo "${result_array[2]}"
(note we also had to set IFS, otherwise each part of the contents ends up in result_array[2], [3], [4], etc.)
Using read:
line="/path/IntegrationFilter.java:150: * <td>http://abcd.com/index.do</td>"
echo "$line"
IFS=: read file number match <<<"$line"
echo "$file"
echo "$number"
echo "$match"
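
Since the question asks for the parts to end up in an array, the read approach can be sketched as a small function that fills result_array. The function name split_grep_line is hypothetical. Because no word splitting or filename expansion is applied to the quoted values, the * stays literal.

```bash
#!/bin/bash
# Hypothetical helper: split "file:number:match" at the first two colons
# and store the three parts in the global array result_array.
split_grep_line() {
    local file number match
    # With three variable names, the last one receives the rest of the line,
    # including any further colons.
    IFS=: read -r file number match <<< "$1"
    result_array=("$file" "$number" "$match")
}

split_grep_line '/path/IntegrationFilter.java:150: * <td>http://abcd.com/index.do</td>'
printf '%s\n' "${result_array[2]}"
```

The third element keeps its leading space, exactly as it appears after the second colon in the input.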
Using bash parameter expansion/substitution:
line="/path/IntegrationFilter.java:150: * <td>http://abcd.com/index.do</td>"
rest="$line"
file=${rest%%:*}
[ "$file" = "$line" ] && echo "Error"
rest=${line#"$file":}
number=${rest%%:*}
[ "$number" = "$rest" ] && echo "Error"
rest=${rest#"$number":}
match=$rest
echo "$file"
echo "$number"
echo "$match"
How about:
$ line='/path/IntegrationFilter.java:150: * <td>http://abcd.com/index.do</td>'
$ echo "$line" | cut -d: -f3-
* <td>http://abcd.com/index.do</td>
