Reading lines in a text file containing the special characters '<' and '>' in bash

I have a text file that is the diff output of two grepped files. It has lines like the ones below. I need to read the file (loop through its lines) and do something based on the text to the left of '<' or to the right of '>'.
Edit to add details:
LHS of < OR RHS of >
In either case, I need to store the content in a variable, take the 1st field (ABCDEF) and the 3rd field (10), and search (with grep) for them in one of two other files. If found, print a message and include the matching file name(s) in an email to a distribution list (DL). All the file names and directories are stored in separate variables.
How do I do that?
PS: I have basic knowledge of text formatting and bash/shell commands, but I'm still learning the scripting syntax. Thanks.
ABCDEF,20200101,10 <
PQRSTU,20200106,11 <
LMNOPQ,20200101,12 <
EFGHIJ,20200102,13 <
KLMNOP,20200103,14 <
STUVWX,20200104,15 <
PQRSTU,20200105,16 <
> LMNOPQ,20200101,10
ABCDEF,20200107,17 <
What am I doing wrong now?
while IFS= read -r line; do
if $line =~ ([^[:blank:]]+)[[:blank:]]+\<
then
IFS=, read -r f1 f2 f3 <<< "${BASH_REMATCH[1]}"
#echo "f1=$f1 f2=$f2 f3=$f3"
zgrep "$f1" file1 | grep "with seq $f3" || zgrep "$f1" file2 | grep "with seq $f3"
elif $line =~ \>[[:blank:]]+([^[:blank:]]+)
then
IFS=, read -r g1 g2 g3 <<< "${BASH_REMATCH[1]}"
#echo "g1=$g1 g2=$g2 g3=$g3"
zgrep "$g1" file3 | grep "with seq $g3" || zgrep "$g1" file3 | grep "with seq $g3"
fi

Would you please try something like:
#!/bin/bash
while IFS= read -r line; do
if [[ $line =~ ([^[:blank:]]+)[[:blank:]]+\< || $line =~ \>[[:blank:]]+([^[:blank:]]+) ]]; then
IFS=, read -r f1 f2 f3 <<< "${BASH_REMATCH[1]}"
echo "f1=$f1 f2=$f2 f3=$f3"
# do something here with "$f1", "$f2" and "$f3"
fi
done < file.txt
Output:
f1=ABCDEF f2=20200101 f3=10
f1=PQRSTU f2=20200106 f3=11
f1=LMNOPQ f2=20200101 f3=12
f1=EFGHIJ f2=20200102 f3=13
f1=KLMNOP f2=20200103 f3=14
f1=STUVWX f2=20200104 f3=15
f1=PQRSTU f2=20200105 f3=16
f1=LMNOPQ f2=20200101 f3=10
f1=ABCDEF f2=20200107 f3=17
Please replace the echo "f1=$f1 f2=$f2 f3=$f3" line with your desired
command, such as grep.
The regex ([^[:blank:]]+)[[:blank:]]+\< matches a line that contains <
and captures the LHS (the non-blank text before the blanks and <) in the bash variable ${BASH_REMATCH[1]}.
The regex \>[[:blank:]]+([^[:blank:]]+) does the same for a line that contains >,
capturing the RHS.
The backslashes are needed because an unescaped < or > inside [[ ]] is treated by the shell as an operator rather than as part of the regex.
The statement IFS=, read -r f1 f2 f3 <<< "${BASH_REMATCH[1]}" splits the captured text
on , and assigns the three fields to f1, f2 and f3.
Please note that if the input file is very large, a bash solution may be slow,
because the loop processes one line at a time. I used bash only because it makes it convenient
to pass the variables to your grep command.
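The combined || test in the script above cannot tell which side matched. A minimal sketch that keeps the two cases apart, assuming every relevant line is either "F1,F2,F3 <" or "> F1,F2,F3" with nothing else on it (the anchored ^...$ regexes are an assumption, and classify_line is a hypothetical name). Storing each regex in a variable also avoids having to backslash-escape < and > inside [[ ]].

```bash
#!/usr/bin/env bash
# classify_line (hypothetical helper): prints "left F1 F3" for lines like
# "F1,F2,F3 <" and "right F1 F3" for lines like "> F1,F2,F3".
# Returns 1 for any other line.
classify_line() {
    local line=$1 side f1 f2 f3
    # Regexes kept in variables: < and > need no escaping this way.
    local re_left='^([^[:blank:]]+)[[:blank:]]+<$'
    local re_right='^>[[:blank:]]+([^[:blank:]]+)$'
    if [[ $line =~ $re_left ]]; then
        side=left
    elif [[ $line =~ $re_right ]]; then
        side=right
    else
        return 1
    fi
    # Split the captured record on commas; f2 (the date) is read but unused.
    IFS=, read -r f1 f2 f3 <<< "${BASH_REMATCH[1]}"
    printf '%s %s %s\n' "$side" "$f1" "$f3"
}

classify_line 'ABCDEF,20200101,10 <'   # left ABCDEF 10
classify_line '> LMNOPQ,20200101,10'   # right LMNOPQ 10
```

Inside the read loop, the side can then pick which files to search (for example, file1/file2 for "left" and file3 for "right").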
EDIT
Regarding the updated script in your question, please refer to the following modification:
while IFS= read -r line; do
result=  # reset, so a line matching neither pattern does not reprint the previous result
if [[ $line =~ ([^[:blank:]]+)[[:blank:]]+\< ]]; then
IFS=, read -r f1 f2 f3 <<< "${BASH_REMATCH[1]}"
# echo "f1=$f1 f2=$f2 f3=$f3"
result=$(zgrep "$f1" file1 | grep "with seq $f3" || zgrep "$f1" file2 | grep "with seq $f3")
elif [[ $line =~ \>[[:blank:]]+([^[:blank:]]+) ]]; then
IFS=, read -r g1 g2 g3 <<< "${BASH_REMATCH[1]}"
# echo "g1=$g1 g2=$g2 g3=$g3"
result=$(zgrep "$g1" file3 | grep "with seq $g3" || zgrep "$g1" file3 | grep "with seq $g3")
fi
if [[ -n $result ]]; then
echo "result = $result"
fi
done < file.txt
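The question also asks for the names of the files where a match was found, so they can go into an email. A minimal sketch of that step, assuming uncompressed files (swap grep for zgrep on .gz files); files_with_match and hits are hypothetical names, and the email step itself is left out.

```bash
#!/usr/bin/env bash
# files_with_match (hypothetical helper): prints the name of each file
# (3rd argument onward) that has a line containing both KEY ($1) and the
# text "with seq SEQ" ($2).
files_with_match() {
    local key=$1 seq=$2 file
    shift 2
    for file in "$@"; do
        # -F: fixed strings; -w: "with seq 1" must not match "with seq 10".
        if grep -F -- "$key" "$file" | grep -Fw -- "with seq $seq" > /dev/null; then
            printf '%s\n' "$file"
        fi
    done
}

# Example use inside the loop (bash 4+ for mapfile), collecting names in an array:
# mapfile -t hits < <(files_with_match "$f1" "$f3" file1 file2)
# (( ${#hits[@]} )) && echo "found $f1 seq $f3 in: ${hits[*]}"
```

Unlike grep -l, this checks that both strings occur on the same line, which matches the two-grep pipeline in the script above.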

Related

Reverse the words but keep the order in Bash

I have a file with lines. I want to reverse each word, but keep the words in the same order.
For example: "Test this word"
Result: "tseT siht drow"
I'm using a Mac, so awk doesn't seem to work.
What I have so far:
input=FILE_PATH
while IFS= read -r line || [[ -n $line ]]
do
echo $line | rev
done < "$input"
Here is a solution that completely avoids awk:
#!/bin/bash
input=./data
while read -r line ; do
for word in $line ; do
output=$(rev <<< "$word")
printf "%s " "$output"
done
printf "\n"
done < "$input"
Using xargs (also available on macOS):
echo "Test this word" | xargs -n 1 | rev | xargs
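Inside your read loop, you can iterate over the words of your string and pass each one to rev. Leave $line unquoted in the for loop so it is split into words; "$line" would give a single iteration and reverse the whole line.
line="Test this word"
for word in $line; do
printf '%s ' "$(rev <<< "$word")"
done
echo # Add final newline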
output
tseT siht drow
You are actually in fairly good shape with bash. You can use string indexes, string length and C-style for loops to loop over the characters in each word, building a reversed string to output. You can control the spacing between words in a number of ways, but a simple flag first=1 is about as easy as anything else. You can do the following with your read loop:
#!/bin/bash
while read -r line || [[ -n $line ]]; do ## read line
first=1 ## flag to control space
read -ra a <<< "$line" ## put words of line in array
for i in "${a[@]}"; do ## for each word
tmp= ## clear temp
len=${#i} ## get length
for ((j = 0; j < len; j++)); do ## loop length times
tmp="${tmp}${i:$((len-j-1)):1}" ## add char len - j to tmp
done
if [ "$first" -eq '1' ]; then ## if first word
printf '%s' "$tmp"; first=0; ## output w/o space (%s: don't treat $tmp as a format)
else
printf ' %s' "$tmp" ## output w/space
fi
done
echo "" ## output newline
done
Example Input
$ cat dat/lines2rev.txt
my dog has fleas
the cat has none
Example Use/Output
$ bash revlines.sh <dat/lines2rev.txt
ym god sah saelf
eht tac sah enon
Look things over and let me know if you have questions.
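The same idea (string indexes and a C-style loop, no rev or awk) can be packaged as two small functions. A sketch; reverse_string and reverse_words are hypothetical names, and, as in the answer above, runs of spaces between words collapse to a single space.

```bash
#!/usr/bin/env bash
# Print $1 with its characters in reverse order (pure bash).
reverse_string() {
    local s=$1 out= i
    for (( i = ${#s} - 1; i >= 0; i-- )); do
        out+=${s:i:1}
    done
    printf '%s' "$out"
}

# Read lines from stdin; reverse each word but keep the word order.
reverse_words() {
    local line word sep
    local -a words
    while IFS= read -r line || [[ -n $line ]]; do
        read -ra words <<< "$line"      # split on whitespace, no globbing
        sep=
        for word in "${words[@]}"; do
            printf '%s%s' "$sep" "$(reverse_string "$word")"
            sep=' '
        done
        printf '\n'
    done
}

reverse_words <<< 'Test this word'   # tseT siht drow
```

Usage with a file: reverse_words < dat/lines2rev.txt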
Using rev and awk
Consider this as the sample input file:
$ cat file
Test this word
Keep the order
Try:
$ rev <file | awk '{for (i=NF; i>=2; i--) printf "%s%s",$i,OFS; print $1}'
tseT siht drow
peeK eht redro
(This uses awk but, because it uses no advanced awk features, it should work on MacOS.)
Using in a script
If you need to put the above in a script, then create a file like:
$ cat script
#!/bin/bash
input="/Users/Anastasiia/Desktop/Tasks/test.txt"
rev <"$input" | awk '{for (i=NF; i>=2; i--) printf "%s%s",$i,OFS; print $1}'
And, run the file:
$ bash script
tseT siht drow
peeK eht redro
Using bash
while read -ra arr
do
x=" "
for ((i=0; i<${#arr[@]}; i++))
do
((i == ${#arr[@]}-1)) && x=$'\n'
printf "%s%s" "$(rev <<<"${arr[i]}")" "$x"
done
done <file
Applying the above to our same test file (note ${#arr[@]}, the number of words; ${#arr} would be the length of the first word):
$ while read -ra arr; do x=" "; for ((i=0; i<${#arr[@]}; i++)); do ((i == ${#arr[@]}-1)) && x=$'\n'; printf "%s%s" "$(rev <<<"${arr[i]}")" "$x"; done; done <file
tseT siht drow
peeK eht redro

How can I assign expressions to a variable for read line?

I have a bash while read line block reading from a text file specified by $filename:
IFS=''
while read -r line
do
...
done < $filename
Instead of reading the whole file each time, I would like to supply different inputs in the redirect depending on the arguments supplied to the script.
Whole file: done < "$filename"
start at line x: done < <(tail -n +"$x" "$filename")
line x to line y: done < <(tail -n +"$x" "$filename" | head -n "$((y - x + 1))")
start to line y: done < <(head -n "$y" "$filename")
How can I assign these inputs to a variable ahead of time to be read by the while loop?
My input file is ~4GB with some 58M lines (all with different lengths), and may grow or shrink from time to time. Reading https://unix.stackexchange.com/questions/47407/cat-line-x-to-line-y-on-a-huge-file it appears that tail | head is the fastest method to read from the middle of a file, so given the file size, I'm deliberately avoiding awk and sed for the most part.
Your data is too big to read in whole. The good news is that a process substitution can contain any shell code, including an if/elif chain, so you can write:
while IFS= read -r line; do
...
done < <(
if [[ $x && $y ]]; then tail -n +"$x" "$filename" | head -n "$((y - x + 1))"
elif [[ $x ]]; then tail -n +"$x" "$filename"
elif [[ $y ]]; then head -n "$y" "$filename"
else cat "$filename"
fi
)
One thing I don't like about process substitutions is that the code appears after the loop it feeds. It would be nicer if it came first. I think the following will work, but it is untested:
# set up file descriptor 3
exec 3< <(
if [[ $x && $y ]]; then tail -n +"$x" "$filename" | head -n "$((y - x + 1))"
elif [[ $x ]]; then tail -n +"$x" "$filename"
elif [[ $y ]]; then head -n "$y" "$filename"
else cat "$filename"
fi
)
# iterate over lines read from fd 3
while IFS= read -u3 -r line; do
...
done
# close fd 3
exec 3<&-
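Rather than a variable, a function defined before the loop can hold the input-selection logic, which also puts that code first. A sketch under these assumptions: select_lines is a hypothetical name, Y is treated as a line number (so lines X..Y are printed), and X <= Y when both are given.

```bash
#!/usr/bin/env bash
# select_lines FILE [X] [Y]: print lines X..Y of FILE; from X to the end if
# Y is empty, from the start to Y if X is empty, the whole file if both are.
select_lines() {
    local file=$1 x=${2-} y=${3-}
    if [[ $x && $y ]]; then
        tail -n +"$x" "$file" | head -n "$(( y - x + 1 ))"
    elif [[ $x ]]; then
        tail -n +"$x" "$file"
    elif [[ $y ]]; then
        head -n "$y" "$file"
    else
        cat "$file"
    fi
}

# Usage, with filename, x and y set as in the question:
# while IFS= read -r line; do
#     ...
# done < <(select_lines "$filename" "$x" "$y")
```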
I might handle all of these as part of the loop condition, with an explicitly maintained line counter.
start=10
end=30
i=0 # number of lines read so far
while (( i < end )) && IFS= read -r line; do
(( ++i >= start )) || continue # skip lines before $start
...
done < "$filename"
However, if you might skip a significant number of lines at the beginning, it may be more efficient to use sed:
while IFS= read -r line; do
...
done < <(sed -n "${start},${end}p; ${end}q" "$filename") # q: stop reading after line $end
or awk:
while IFS= read -r line; do
...
done < <(awk -v start="$start" -v end="$end" 'NR > end {exit} NR >= start' "$filename")
This then raises the question of how much of the body of the while loop can be moved into awk itself.
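As a sketch of that idea: both the range selection and the per-line work can live in a single awk program that exits once it passes the end of the range, so the rest of a large file is never read. process_range is a hypothetical name, and numbering each line is only a placeholder for whatever the loop body really does.

```bash
#!/usr/bin/env bash
# process_range FILE START END: run a placeholder "loop body" on lines
# START..END inside awk, stopping as soon as line END has been handled.
process_range() {
    local file=$1 start=$2 end=$3
    awk -v start="$start" -v end="$end" '
        NR > end    { exit }                        # past the range: stop reading
        NR >= start { printf "%d: %s\n", NR, $0 }   # placeholder for the loop body
    ' "$file"
}
```

Usage: process_range "$filename" 10 30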

Name (and set) variables in current shell, based on line input data

I have a SQL*Plus output written into a text file in the following format:
3459906| |2|X1|WAS1| Output1
334596| |2|X1|WAS2| Output1
3495792| |1|X1|WAS1| Output1
687954| |1|X1|WAS2| Output1
I need a shell script to extract the count at the beginning of each line, based on the text that follows it.
For example, if the text is |2|X1|WAS1|, then 3459906 should be assigned to the variable x1was12, and if the text is |2|X1|WAS2|, then 334596 should be assigned to the variable x1was22.
I tried writing a while loop with if conditions to assign the counts, but was unsuccessful:
export filename1='file1.dat'
while read -r line ; do
if [[ grep -i "*|2|X1|WAS1| Output1*" | wc -l -eq 0 ]] ; then
export xwas12=sed -n ${line}p $filename1 | \
sed 's/[^0-9]*//g' | sed 's/..$//'
elif [[ grep -i "*|2|X1|WAS2| Output1*" | wc -l -eq 0 ]] ; then
export x1was22=sed -n ${line}p $filename1 | \
sed 's/[^0-9]*//g' | sed 's/..$//'
elif [[ grep -i "*|1|X1|WAS1| Output1*" | wc -l -eq 0 ]] ; then
export x1was11=sed -n ${line}p $filename1 | \
sed 's/[^0-9]*//g' | sed 's/..$//'
elif [[ grep -i "*|1|X1|WAS2| Output1*" | wc -l -eq 0 ]]
export x1was21=sed -n ${line}p $filename1 | \
sed 's/[^0-9]*//g' | sed 's/..$//'
fi
done < "$filename1"
echo '$x1was12' > output.txt
echo '$x1was22' >> output.txt
echo '$x1was11' >> output.txt
echo '$x1was21' >> output.txt
What I was trying to do was:
Go to the first line in the file
-> Search for the text and if found then assign the sed output to the variable
Then go to the second line of the file
-> Search for the texts in the if commands and assign the sed output to another variable.
The same goes for the other lines.
while IFS='|' read -r count _ n x was _; do
# remove spaces from all variables
count=${count// /}; n=${n// /}; x=${x// /}; was=${was// /}
varname="${x}${was}${n}"
printf -v "${varname,,}" %s "$count"
done <<'EOF'
3459906| |2|X1|WAS1| Output1
334596| |2|X1|WAS2| Output1
3495792| |1|X1|WAS1| Output1
687954| |1|X1|WAS2| Output1
EOF
With the above executed:
$ echo "$x1was12"
3459906
Of course, the redirection from a heredoc could be replaced with a redirection from a file as well.
How does this work? Let's break it down:
Every time IFS='|' read -r count _ n x was _ is run, it reads a single line, separating it by |s, putting the first column into count, discarding the second by assigning it to _, reading the third into n, the fourth into x, the fifth into was, and the sixth and all following content into _. This practice is discussed in detail in BashFAQ #1.
count=${count// /} is a parameter expansion which prunes spaces from the variable count, by replacing all such spaces with empty strings. See also BashFAQ #100.
"${varname,,}" is another parameter expansion, this one converting a variable's contents to all-lowercase. (This requires bash 4.0; in prior versions, consider "$(tr '[:upper:]' '[:lower:]' <<<"$varname")" as a less-efficient alternative.)
printf -v "$varname" %s "value" is a mechanism for doing an indirect assignment to the variable named in the variable varname.
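A tiny demonstration of the two features together, plus ${!name} (indirect expansion) to read the value back by name. A sketch that needs bash 4+ for ${1,,}; set_var is a hypothetical helper. Inside a function, printf -v assigns to a global unless a local with that name is in scope.

```bash
#!/usr/bin/env bash
# set_var NAME VALUE: assign VALUE to the variable named by NAME, lowercased.
set_var() {
    printf -v "${1,,}" %s "$2"
}

set_var X1WAS12 3459906
varname=x1was12
echo "$x1was12 ${!varname}"   # 3459906 3459906
```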
If not for the variable names, the whole thing could be done with two commands:
cut -d '|' -f1 file1.dat | tr -d ' ' > output.txt
The variable names make it more interesting. Two bash methods follow, plus a POSIX method...
The following bash code ought to do what the OP's sample code was
meant to do:
declare $(while IFS='|' read a b c d e f ; do
echo $a 1>&2 ; echo x1${e,,}$c=${a/ /}
done < file1.dat 2> output.txt )
Notes:
The bash shell is needed for ${e,,} (turns "WAS" into "was"), ${a/ /} (removes the first space that might be in $a), and declare.
The while loop parses file1.dat and outputs a bunch of variable assignments. Without the declare this code:
while IFS='|' read a b c d e f ; do
echo x1${e,,}$c=${a/ /} ;
done < file1.dat
Outputs:
x1was12=3459906
x1was22=334596
x1was11=3495792
x1was21=687954
The while loop outputs to two separate streams: stdout (for the declare), and stderr (using the 1>&2 and 2> redirects for
output.txt).
Using bash associative arrays:
declare -A x1was="( $(while IFS='|' read a b c d e f ; do
echo $a 1>&2 ; echo [${e/WAS/}$c]=${a/ /}
done < file1.dat 2> output.txt ) )"
In which case the variable names require brackets:
echo ${x1was[21]}
687954
POSIX shell code (tested using dash):
eval $(while IFS='|' read a b c d e f ; do
echo $a 1>&2; echo x1$(echo $e | tr '[A-Z]' '[a-z]')$c=$(echo $a)
done < file1.dat 2> output.txt )
eval should not be used if there's any doubt about what's in file1.dat. The above code assumes the data in file1.dat is
uniformly dependable.
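If the data cannot be fully trusted, one hedged way to keep eval in POSIX sh is to validate the name and the value first and never paste the value into the eval'd text. A sketch, not the answer's code; load_counts is a hypothetical name, and since POSIX sh has no local, name and count stay set afterwards.

```sh
#!/bin/sh
# load_counts: read lines like "3459906| |2|X1|WAS1| Output1" on stdin and
# set x1was12=3459906, and so on. Bad names or non-numeric counts are skipped.
load_counts() {
    while IFS='|' read -r count _ n x was _; do
        count=$(printf '%s' "$count" | tr -d ' ')
        name=$(printf '%s%s%s' "$x" "$was" "$n" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
        case $count in ''|*[!0-9]*) continue ;; esac              # digits only
        case $name in ''|[!a-z_]*|*[!a-z0-9_]*) continue ;; esac  # valid shell name
        eval "$name=\$count"   # eval sees "x1was12=$count", not the raw value
    done
}
```

Usage: load_counts < file1.dat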

Check if a string contains "-" and "]" at the same time

I have the following two regexes in Bash:
1. ^[-a-zA-Z0-9\,\.\;\:]*$
2. ^[]a-zA-Z0-9\,\.\;\:]*$
The first matches when the string contains a "-" along with the other characters.
The second matches when it contains a "]".
I put these characters at the beginning of the bracket expression because I can't escape them.
How I can get match the two values at the same time?
You can also place the - at the end of the bracket expression, since a range must be closed on both ends.
^[]a-zA-Z0-9,.;:-]*$
You don't have to escape any of the other characters, either. Colons, semicolons, and commas have no special meaning anywhere in a regular expression, and a period loses its special meaning inside a bracket expression. In fact, a backslash inside a bracket expression is itself a literal character, so \, adds both \ and , to the set.
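A quick way to check the combined bracket expression in bash, and to require both characters at once, is sketched below. is_valid is a hypothetical name; the regex is kept in a variable so bash does not need extra quoting inside [[ ]].

```bash
#!/usr/bin/env bash
# Accept a string only if every character is in the allowed set AND it
# contains at least one '-' and at least one ']'.
allowed='^[]a-zA-Z0-9,.;:-]*$'   # ']' first and '-' last are taken literally

is_valid() {
    [[ $1 =~ $allowed && $1 == *-* && $1 == *"]"* ]]
}

for s in '-i]string' ']adfadfa-' 'string-' 'as df]-'; do
    if is_valid "$s"; then echo "match:    $s"; else echo "no match: $s"; fi
done
```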
Basically you can use this:
grep -E '^.*-.*].*$|^.*].*-.*$'
It matches either a - followed by zero or more arbitrary characters and a ], or a ] followed by zero or more characters and a -.
However, since you don't accept arbitrary characters, you need to restrict every position to the allowed set (with ] first and - last inside the brackets):
grep -E '^[]a-zA-Z0-9,.;:-]*-[]a-zA-Z0-9,.;:-]*][]a-zA-Z0-9,.;:-]*$|^[]a-zA-Z0-9,.;:-]*][]a-zA-Z0-9,.;:-]*-[]a-zA-Z0-9,.;:-]*$'
Maybe this can help you:
#!/bin/bash
while IFS= read -r p; do
printf '%s\n' "$p" | grep -E -e '-.*]|].*-' | grep "^[]a-zA-Z0-9,.;:-]*$"
done < "$1"
user-host:/tmp$ cat test
-i]string
]adfadfa-
string-
]string
str]ing
]123string
123string-
?????
++++++
user-host:/tmp$ ./test.sh test
-i]string
]adfadfa-
There are two questions in your post.
One is in the description:
How I can get match the two values at the same time?
That is an OR match, which could be done with a bracket expression that merges your two:
pattern='^[]a-zA-Z0-9,.;:-]*$'
That will match any line made up only of the listed characters, whether it contains -, ], both, or neither. That would be all the lines (except ?????, ++++++ and as df gh) in the test script below.
Two is in the title:
… a string contains “-” and “]” at the same time
That is an AND match. The simplest (and slowest) way to do it is:
echo "$line" | grep -e '-' | grep ']' | grep '^[]a-zA-Z0-9,.;:-]*$'
The first two calls to grep select only the lines that
contain both (one or several) - and (one or several) ]; the third keeps only lines made up entirely of the allowed characters.
Test script:
#!/bin/bash
printlines(){
cat <<-\_test_lines_
asdfgh
asdfgh-
asdfgh]
as]df
as,df
as.df
as;df
as:df
as-df
as]]]df
as---df
asAS]]]DFdf
as123--456DF
as,.;:-df
as-dfg]h
as]dfg-h
a]s]d]f]g]h
a]s]d]f]g]h-
s-t-r-i-n-g]
as]df-gh
123]asdefgh
123asd-fgh-
?????
++++++
as df gh
_test_lines_
}
pattern='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing the simple pattern of $pattern"
while read line; do
resultgrep="$( echo "$line" | grep "$pattern" )"
printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
p1='-'; p2=']'; p3='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing a 'grep AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ $resultgrep ]] && printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing an 'AWK AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultawk="$( echo "$line" |
awk -v p1="$p1" -v p2="$p2" -v p3="$p3" '$0~p1 && $0~p2 && $0~p3' )"
[[ $resultawk ]] && printf '%13s %-13s\n' "$line" "$resultawk"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing a 'bash AND' of '$p1', '$p2' and '$p3'."
while read line; do
rgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ ( $line =~ $p1 ) && ( $line =~ $p2 ) && ( $line =~ $p3 ) ]]
rbash=${BASH_REMATCH[0]}
[[ $rbash ]] && printf '%13s %-13s %-13s\n' "$line" "$rgrep" "$rbash"
done < <(printlines)
echo "#############################################################"
echo
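The per-line loops in the test script start new grep processes for every line. A sketch of the same three-grep AND applied once to a whole stream (filter_and is a hypothetical name):

```bash
#!/usr/bin/env bash
# Keep lines that contain '-', contain ']', and use only allowed characters.
filter_and() {
    grep -e '-' | grep -e ']' | grep -e '^[]a-zA-Z0-9,.;:-]*$'
}

printf '%s\n' 'as-dfg]h' 'as-df' 'as df]-' 'a]s]d]f]g]h-' | filter_and
```

With a file: filter_and < test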

Comment/uncomment a line where a word is matched without using sed/awk

How can I comment out lines in a file where a certain word appears, using piped UNIX commands in a bash script (no sed/awk)?
The comment character is #.
Here is how it could start:
cat $file | grep $word | ...
With GNU bash.
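#!/bin/bash
keyword="foo"
while IFS= read -r line; do
if [[ $line =~ $keyword ]]; then
line="#$line"           # comment out: prepend the comment character
# line="${line#\#}"     # or, to uncomment: strip one leading '#'
fi
printf "%s\n" "$line"
done < /etc/network/interfaces > /tmp/interfaces_modified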
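A variation that handles both directions with bash builtins only, sketched as a function that filters stdin. comment_lines is a hypothetical name; the keyword is treated as a regex, and a line that is already commented is not commented twice.

```bash
#!/usr/bin/env bash
# comment_lines MODE KEYWORD: MODE is "comment" or "uncomment".
comment_lines() {
    local mode=$1 keyword=$2 line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $keyword ]]; then
            case $mode in
                comment)   [[ $line == \#* ]] || line="#$line" ;;  # avoid "##"
                uncomment) line=${line#\#} ;;                      # drop one leading '#'
            esac
        fi
        printf '%s\n' "$line"
    done
}
```

Usage: comment_lines comment foo < /etc/network/interfaces > /tmp/interfaces_modified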
