Need help matching a pattern using grep/egrep in bash scripting - bash

I am trying to match all the characters of a given string, and they must appear in the same order in which they are passed to the bash script.
while [[ $# -gt 0 ]]; do
case $1 in
-i)
arg=$2
egrep "*[$arg]*" words.txt
shift ;;
esac
shift
done
$ sh match_the_pattern.sh -i aei words.txt
Should return words like
abstentious
adventitious
sacrilegiousness
Notice that a is matched first, then e, then i, all in order. Plus, the whole word is matched and printed.

You may use getopts with some bash parameter substitution to construct the query string.
#!/bin/bash
while getopts 'i:' choice
do
case "${choice}" in
i)
length=${#OPTARG}
for((count=0;count<length;count++))
do
if [ $count -eq 0 ]
then
pattern="${pattern}.*${OPTARG:count:1}.*"
else
pattern="${pattern}${OPTARG:count:1}.*"
fi
done
;;
esac
done
# The remaining parameter should be our filename
shift $(($OPTIND - 1))
filename="$1"
# Some error checking based on the parsed values
# Ideally user input should not be trusted, so a security check should
# also be done; it is omitted here for brevity.
if [ -z "$pattern" ] || [ -z "$filename" ]
then
echo "-i is required, and the filename cannot be empty"
echo "Run the script like ./scriptname -i 'value' -- filename"
else
grep -i "${pattern}" "$filename"
fi
Refer to this to learn more about parameter substitution, and to this for getopts.

Change this:
arg=$2
egrep "*[$arg]*" words.txt
to this:
arg=$(sed 's/./.*[&]/g' <<< "$2")
grep "$arg" words.txt
If that's not all you need then edit your question to clarify your requirements and provide more truly representative sample input/output.

Your regex is matching 'a' or 'e' or 'i' because they are in a character set ([]).
I think the regular expression you are looking for is
a+.*e+.*i+.*
which matches 'a' one or more times, then anything, then 'e' one or more times, then anything, then 'i' one or more times.
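The same idea can be generated from any input string. The following is only a minimal sketch, not code from the answers above: ordered_pattern is a hypothetical helper that joins the characters of its argument with .*, and it assumes the input contains only letters and digits, because regex metacharacters are not escaped.

```shell
# Hypothetical helper: turn "aei" into the regex "a.*e.*i".
# Assumes the input has no regex metacharacters (nothing is escaped).
ordered_pattern() {
    rest=$1
    pattern=
    while [ -n "$rest" ]; do
        tail=${rest#?}            # everything after the first character
        char=${rest%"$tail"}      # the first character itself
        pattern="${pattern:+$pattern.*}$char"
        rest=$tail
    done
    printf '%s\n' "$pattern"
}

# Usage: keep only the words that contain a, e and i in that order.
printf '%s\n' abstentious banana adventitious | grep -i "$(ordered_pattern aei)"
```

Because grep prints the whole matching line, no leading or trailing .* is needed when words.txt holds one word per line.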

Related

Bash command to specify which egrep to search

I'm trying to create two separate options to search through a file, one for phone numbers and one for emails. Nothing seems to happen when I run the file at the moment.
#!/bin/sh
while getopts ":-e:-p:" option; do
case $option in
-e) egrep -o "\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+.[a-zA-Z0-9.-]+\b" $2 ;;
-p) egrep -o "^((\([0-9]{3}\) )|([0-9]{3}\-))[0-9]{3}\-[0-9]{4}$" $2 ;;
esac
done
Consider using variables to reduce duplication, and enforce that the two options are mutually exclusive (which I think is what your description says). I also escaped the period in your email regex with \.:
#!/bin/sh
while getopts 'e:p:' option
do
case "$option" in
e)
regex="\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b"
file=$OPTARG
;;
p)
regex="^((\([0-9]{3}\) )|([0-9]{3}\-))[0-9]{3}\-[0-9]{4}$"
file=$OPTARG
;;
esac
done
if [ -z "$regex" ]
then
# error handling
echo "one of -e or -p is required" >&2
exit 1
fi
egrep -o "$regex" "$file"
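The script above lets the last option win if both are given. If the two options really must be mutually exclusive, a check along the following lines could be added. This is a sketch under assumptions: pick_mode is a hypothetical function name, and it only reports which option and file were chosen instead of running egrep.

```shell
# Hypothetical sketch: accept exactly one of -e FILE or -p FILE.
pick_mode() {
    mode=
    file=
    OPTIND=1                       # reset so the function can be called again
    while getopts 'e:p:' option; do
        case "$option" in
            e|p)
                if [ -n "$mode" ]; then
                    echo "use only one of -e or -p" >&2
                    return 1
                fi
                mode=$option
                file=$OPTARG
                ;;
            *) return 1 ;;
        esac
    done
    if [ -z "$mode" ]; then
        echo "one of -e or -p is required" >&2
        return 1
    fi
    printf '%s %s\n' "$mode" "$file"
}

pick_mode -e input.txt     # prints: e input.txt
```

Inside a function, getopts parses the function's own arguments, so the script would call it as pick_mode "$@" (with an at sign) to pass its command line through.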
Use arguments instead of options if email or phone is required (sneaked in alternative long names for documentation):
#!/bin/sh
case "$1" in
e|email)
regex="\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b"
;;
p|phone)
regex="^((\([0-9]{3}\) )|([0-9]{3}\-))[0-9]{3}\-[0-9]{4}$"
;;
esac
if [ -z "$regex" ]
then
echo e or p required
exit 1
fi
if [ -z "$2" ]
then
echo file required
exit 1
fi
file=$2
egrep -o "$regex" "$file"
I reverse engineered your regex to generate the sample data that I asked you for earlier:
cat >input.txt
a bc#de.fg h
(123) 456-7890
123-456-7890
^D
and here is the result of running the 2nd script over the test data:
./Myfile.sh e input.txt
bc#de.fg
./Myfile.sh p input.txt
(123) 456-7890
123-456-7890
Btw, it's a good idea to leave out the .sh suffix. This allows you to seamlessly rewrite your program in another language should the need arise.

sh: Is it safe to use a variable as a command if the command contains only letters, number and underscores?

I'm writing a POSIX compliant script in dash so I am having to get creative with using fake arrays.
Contents of fake_array.sh
fake_array_job() {
array="$1"
job_name="$2"
comma_count="$(echo "$array" | grep -o -F ',' | wc -l)"
if [ "$comma_count" -lt '1' ]; then
echo 'You gave a fake array to fake_array_job that does not contain at least one comma. Exiting...'
exit
fi
array_count="$(( comma_count + 1 ))"
position=1
while [ "$position" -le "$array_count" ]; do
item="$(echo "$array" | cut -d ',' -f "$position")"
"$job_name" || exit
position="$(( position + 1 ))"
done
}
Contents of script.sh
#!/bin/sh
. fake_array.sh
job_to_do() {
echo "$item"
}
fake_array_job 'goat,pig,sheep' 'job_to_do'
second_job() {
echo "$item"
}
fake_array_job 'apple,orange' 'second_job'
I am aware that it may seem silly to use a unique name for each job I pass to fake_array_job, but I like that I have to type it twice because it helps to reduce human error.
I keep reading that it is a bad idea to use a variable as a command. Does my use of "$job_name" to run a function have any negative implications as it concerns stability, security or efficiency?
(Read to the end for a good suggestion by Charles Duffy. I'm too lazy to completely rewrite my answer to mention it earlier...)
You can iterate over the "array" using simple parameter expansions without requiring multiple elements in the array.
fake_array_job() {
args=${1%,}, # Ensure the array ends with a comma
job_name=$2
while [ -n "$args" ]; do
item=${args%%,*}
"$job_name" || exit
args=${args#*,}
done
}
One problem with the above is that it ensures the array is comma-terminated by assuming that foo,bar, is not a comma-delimited array with an empty last element. A better (though uglier) solution is to use read to break up the array.
fake_array_job () {
args=$1
job_name=$2
rest=$args
while [ -n "$rest" ]; do
IFS=, read -r item rest <<EOF
$rest
EOF
"$job_name" || exit
done
}
(You can use <<-EOF and make sure the here doc is indented with tabs, but it's hard to convey that here, so I'll just leave the ugly version.)
There's also Charles Duffy's good suggestion of using case to pattern match on the array to see if there are any commas left or not:
var=$args
while [ -n "$var" ]; do
case $var in
*,*) next=${var%%,*}; var=${var#*,}; "$cmd" "$next";;
*) "$cmd" "$var"; break;;
esac
done
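On the original question of whether running "$job_name" is safe: quoted like that, the variable is used as a single command name, so it cannot smuggle in extra arguments or shell syntax. If the name ever comes from outside the script, a guard such as the following sketch could reject anything unexpected. Both is_safe_name and run_job are hypothetical names, and the check only restricts the characters in the name; it does not decide which commands are acceptable to run.

```shell
# Hypothetical guard: succeed only for non-empty names made of
# letters, digits and underscores.
is_safe_name() {
    case $1 in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical wrapper: run "$1" with its arguments only if the name passes.
run_job() {
    is_safe_name "$1" || { echo "refusing to run: $1" >&2; return 1; }
    "$@"
}

run_job echo sheep          # prints: sheep
```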

How to concatenate the arguments and store it in a variable in unix? [duplicate]

I would like to concatenate all the arguments passed to my bash script except the flag.
So for example, If the script takes inputs as follows:
./myBashScript.sh -flag1 exampleString1 exampleString2
I want the result to be "exampleString1_exampleString2"
I can do this for a predefined number of inputs (i.e. 2), but how can I do it for an arbitrary number of inputs?
function concatenate_args
{
string=""
for a in "$@" # Loop over arguments
do
if [[ "${a:0:1}" != "-" ]] # Ignore flags (first character is -)
then
if [[ "$string" != "" ]]
then
string+="_" # Delimiter
fi
string+="$a"
fi
done
echo "$string"
}
# Usage:
args="$(concatenate_args "$@")"
This is an ugly but simple solution:
echo $* | sed -e "s/ /_/g;s/[^_]*_//"
You can also use formatted strings to concatenate args.
# assuming flag is first arg and optional
flag=$1
[[ $1 = "${1#-}" ]] && unset flag || shift
concat=$(printf '%s_' "$@")
echo "${concat%_}" # to remove the trailing _
nJoy!
Here's a piece of code that I'm actually proud of (it is very shell-style I think)
#!/bin/sh
firsttime=yes
for i in "$@"
do
test "$firsttime" && set -- && unset firsttime
test "${i%%-*}" && set -- "$@" "$i"
done
IFS=_ ; echo "$*"
I've interpreted your question so as to remove all arguments beginning with -
If you only want to remove the leading sequence of arguments beginning with -:
#!/bin/sh
while ! test "${1%%-*}"
do
shift
done
IFS=_ ; echo "$*"
If you simply want to remove the first argument:
#!/bin/sh
shift
IFS=_ ; printf %s\\n "$*"
flag="$1"
shift
oldIFS="$IFS"
IFS="_"
the_rest="$*"
IFS="$oldIFS"
In this context, "$*" is exactly what you're looking for, it seems. It is seldom the correct choice, but here's a case where it really is the correct choice.
Alternatively, simply loop and concatenate:
flag="$1"
shift
the_rest=""
pad=""
for arg in "$@"
do
the_rest="${the_rest}${pad}${arg}"
pad="_"
done
The $pad variable ensures that you don't end up with a stray underscore at the start of $the_rest.
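The IFS approach can also be wrapped so that the caller's IFS is never touched. This is a minimal sketch, assuming a hypothetical helper named join_args whose first argument is a single-character separator; the parenthesised function body runs in a subshell, which keeps the IFS change local.

```shell
# Hypothetical helper: join_args SEP ARG... prints the ARGs joined by SEP.
# Only the first character of SEP is used, because "$*" joins with the
# first character of IFS.
join_args() (
    IFS=$1
    shift
    printf '%s\n' "$*"
)

join_args _ exampleString1 exampleString2   # prints: exampleString1_exampleString2
```

In the script itself, the flag would be dropped with shift first, and the remaining positional parameters passed to join_args.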
#!/bin/bash
paramCat () {
for s in "$@"
do
case $s in
-*)
;;
*)
echo -n _${s}
;;
esac
done
}
catted="$(paramCat "$@")"
echo "${catted/_/}"

How to handle "--" in the shell script arguments?

This question has 3 parts, and each alone is easy, but combined together is not trivial (at least for me) :)
I need to write a script that takes as its arguments:
one name of another command
several arguments for the command
list of files
Examples:
./my_script head -100 a.txt b.txt ./xxx/*.txt
./my_script sed -n 's/xxx/aaa/' *.txt
and so on.
Inside my script, for some reason, I need to distinguish:
what is the command
what are the arguments for the command
what are the files
so probably the most standard way to write the above examples is:
./my_script head -100 -- a.txt b.txt ./xxx/*.txt
./my_script sed -n 's/xxx/aaa/' -- *.txt
Question 1: Is there any better solution?
Processing in ./my_script (first attempt):
command="$1";shift
args=`echo $* | sed 's/--.*//'`
filenames=`echo $* | sed 's/.*--//'`
#... some additional processing ...
"$command" "$args" $filenames #execute the command with args and files
This solution will fail when the filenames contain spaces and/or '--', e.g.
/some--path/to/more/idiotic file name.txt
Question 2: How do I properly get $command, its $args and $filenames for later execution?
Question 3: How do I achieve the following style of execution?
echo $filenames | $command $args #but want one filename = one line (like ls -1)
Is there a nice shell solution, or do I need to use, for example, perl?
First of all, it sounds like you're trying to write a script that takes a command and a list of filenames and runs the command on each filename in turn. This can be done in one line in bash:
$ for file in a.txt b.txt ./xxx/*.txt;do head -100 "$file";done
$ for file in *.txt; do sed -n 's/xxx/aaa/' "$file";done
However, maybe I've misinterpreted your intent so let me answer your questions individually.
Instead of using "--" (which already has a different meaning), the following syntax feels more natural to me:
./my_script -c "head -100" a.txt b.txt ./xxx/*.txt
./my_script -c "sed -n 's/xxx/aaa/'" *.txt
To extract the arguments in bash, use getopts:
SCRIPT=$0
while getopts "c:" opt; do
case $opt in
c)
command=$OPTARG
;;
esac
done
shift $((OPTIND-1))
if [ -z "$command" ] || [ -z "$*" ]; then
echo "Usage: $SCRIPT -c <command> file [file..]"
exit
fi
If you want to run a command for each of the remaining arguments, it would look like this:
for target in "$#";do
eval $command \"$target\"
done
If you want to read the filenames from STDIN, it would look more like this:
while IFS= read -r target; do
eval $command \"$target\"
done
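If eval is a concern, the command can be passed as separate words instead of a single string. The sketch below is an alternative to the -c design above, not a drop-in replacement: each_line is a hypothetical helper that takes the command and its arguments as its own parameters and appends each line read from standard input as the final argument.

```shell
# Hypothetical helper: each_line CMD [ARGS...]
# Runs CMD ARGS... LINE once for every line on standard input.
each_line() {
    while IFS= read -r target; do
        # </dev/null stops CMD from consuming the remaining file names.
        "$@" "$target" </dev/null || return
    done
}

# Usage: file names with spaces stay intact, and nothing is re-parsed.
printf '%s\n' 'a b.txt' 'c.txt' | each_line printf '<%s>\n'
```

IFS= and -r keep leading whitespace and backslashes in the file names as they are.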
The $@ variable, when quoted, groups the parameters as they should be:
for parameter in "$@"
do
echo "The parameter is '$parameter'"
done
If given:
head -100 test this "File name" out
it will print
The parameter is 'head'
The parameter is '-100'
The parameter is 'test'
The parameter is 'this'
The parameter is 'File name'
The parameter is 'out'
Now all you have to do is classify each parameter inside the loop. You can use some very simple rules:
The first parameter is always the command name
The parameters that follow it and start with a dash are options for the command
After the "--", or once a parameter doesn't start with a "-", the rest are all file names.
You can check to see if the first character in the parameter is a dash by using this:
if [[ "x${parameter}" == "x${parameter#-}" ]]
If you haven't seen this syntax before, it's a left filter (prefix removal). The # separates the two parts of the expansion. The first part is the name of the variable, and the second is the glob pattern (not a regular expression) to cut off the front of its value. In this case, it's a single dash. As long as this comparison isn't true, the value starts with a dash and you know you have a parameter. BTW, the x may or may not be needed in this case. When you run a test on a string that starts with a dash, the test might mistake it for one of its own operators rather than a value.
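A short illustration of that expansion may help. This is only a sketch; starts_with_dash is a hypothetical helper name, not something the rest of this answer relies on.

```shell
# Hypothetical helper: succeeds when its argument begins with a dash.
starts_with_dash() {
    [ "x$1" != "x${1#-}" ]
}

parameter=-100
echo "${parameter#-}"     # the leading dash is stripped: 100
parameter=a.txt
echo "${parameter#-}"     # nothing to strip: a.txt

starts_with_dash -100 && echo "-100 looks like an option"
starts_with_dash a.txt || echo "a.txt looks like a file name"
```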
Putting it together would be something like this:
command=$1 #The first parameter is the command
shift
parameterFlag=""
for parameter in "$@" #Quotes are important!
do
if [[ "x${parameter}" == "x--" ]]
then
echo "Parameter \"$parameter\" ends the parameter list"
parameterFlag="TRIPPED!"
continue #The "--" itself is neither a parameter nor a file
fi
if [[ "x${parameter}" == "x${parameter#-}" ]]
then
parameterFlag="Tripped!"
fi
if [[ -n "$parameterFlag" ]]
then
echo "\"$parameter\" is a file"
else
echo "The parameter \"$parameter\" is a parameter"
fi
done
Question 1
I don't think so, at least not if you need to do this for arbitrary commands.
Question 3
command=$1
shift
while [ $# -gt 0 ] && [ "$1" != '--' ]; do
args="$args $1"
shift
done
shift
while [ $# -gt 0 ]; do
echo "$1"
shift
done | $command $args
Question 2
How does that differ from question 3?
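Building $args as one string loses the boundaries of arguments that contain spaces, such as the sed expression in the question. The following sketch keeps every argument intact by working only on the positional parameters, so it also runs in a plain POSIX sh. run_on_files is a hypothetical name, and the sketch assumes the first "--" on the command line is the separator.

```shell
# Hypothetical sketch: run_on_files CMD [ARGS...] -- FILE...
# Feeds the file names to CMD on stdin, one per line.
run_on_files() {
    cmd=$1
    shift
    nargs=0
    for a in "$@"; do
        [ "$a" = "--" ] && break
        nargs=$((nargs + 1))
    done
    if [ "$nargs" -eq "$#" ]; then
        echo "missing -- separator" >&2
        return 1
    fi
    # Rotate the command's arguments to the end of the list, then drop "--".
    i=0
    while [ "$i" -lt "$nargs" ]; do
        set -- "$@" "$1"
        shift
        i=$((i + 1))
    done
    shift
    nfiles=$(( $# - nargs ))
    # The list is now FILE... ARGS...: print the files, pass the rest to CMD.
    i=0
    for f in "$@"; do
        [ "$i" -lt "$nfiles" ] || break
        printf '%s\n' "$f"
        i=$((i + 1))
    done | ( shift "$nfiles"; "$cmd" "$@" )
}

run_on_files head -n 1 -- "a b.txt" c.txt    # prints: a b.txt
```

File names that themselves contain newlines would still be split by the one-name-per-line format that Question 3 asks for.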

How to detect a filename within a case statement - in unix shell?

I wrote the code below. If no -l or -L option is passed to the script, I need to detect whether a filename was passed as a parameter. The third pattern below only matches if the filename is a single lowercase character. How can I make it flexible enough to match upper- and lowercase letters and strings of variable length?
while [ $# -gt 0 ]
do
case $1 in
-l) echo ls;;
-L) echo ls -l;;
[a-z]) echo filename;;
*) echo usage
exit 1;;
esac
shift
done
Also, how can I include a condition in that case statement that would react to empty $1?
For example, if the script is called without any options or filenames.
You can match an empty string with a '') or "") case.
A file name can contain any character, even weird ones like symbols, spaces, newlines, and control characters, so trying to figure out if you have a file name by looking for letters and numbers isn't the right way to do it. Instead you can use the [ -e filename ] test to check whether a string names an existing file.
You should, by the way, put "$1" in double quotes so your script will work if the file name does contain spaces.
case "$1" in
'') echo empty;;
-l) echo ls;;
-L) echo ls -l;;
*) if [ -e "$1" ]; then
echo filename
else
echo usage >&2 # echo to stderr
exit 1
fi;;
esac
Use getopts to parse options, then treat remaining non-option arguments however you like (such as by testing if they're a file).
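A minimal sketch of that approach follows. It is an illustration rather than a finished script: classify_args is a hypothetical function name, and the messages it prints are placeholders modelled on the echo statements in the question.

```shell
# Hypothetical sketch: parse -l and -L with getopts, then test what is left.
classify_args() {
    OPTIND=1                         # reset so the function can be reused
    while getopts 'lL' opt; do
        case $opt in
            l) echo ls ;;
            L) echo "ls -l" ;;
            *) echo usage >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))            # drop the options that were parsed
    if [ $# -eq 0 ]; then
        echo "no filename"
        return 0
    fi
    for name in "$@"; do
        if [ -e "$name" ]; then
            echo "filename: $name"
        else
            echo "missing: $name"
        fi
    done
}

classify_args -l /
```

With this layout the empty-argument case needs no pattern of its own: it is simply the case where nothing remains after the shift.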
You can find the information in the bash man page if you search for "Pattern Matching" (without the quotes). This does the trick: [a-zA-Z0-9]*)
You should probably read up on pattern matching, regular expressions and so on.
Furthermore, you should heed John Kugelman's hint about the double quotes.
The following code snippet shows how to check whether no parameter was passed.
#!/bin/sh
case "$1" in
[a-zA-Z0-9]*)
echo "filename"
;;
"")
echo "no data"
;;
esac
@OP, generally, if you are using bash, you can match case-insensitively by using shopt to set nocasematch
$ shopt -s nocasematch
To check for an empty value in your case statement:
case "$var" in
"") echo "empty value";;
esac
You are better off falling into the default case *) and there you can at least check if the file exists with [ -e "$1" ] ... if it doesn't then echo usage and exit 1.
