How can I assign expressions to a variable for read line? - bash

I have a bash while read line block reading from a text file specified by $filename:
IFS=''
while read -r line
do
...
done < "$filename"
Instead of reading the whole file each time, I would like to supply different inputs in the redirect depending on the arguments supplied to the script.
Whole file: done < "$filename"
start at line x: done < <(tail -n +"$x" "$filename")
line x to line y: done < <(tail -n +"$x" "$filename" | head -n "$y")
start to line y: done < <(head -n "$y" "$filename")
How can I assign these inputs to a variable ahead of time to be read by the while loop?
My input file is ~4GB with some 58M lines (all with different lengths), and may grow or shrink from time to time. Reading https://unix.stackexchange.com/questions/47407/cat-line-x-to-line-y-on-a-huge-file it appears that tail | head is the fastest method to read from the middle of a file, so given the file size, I'm deliberately avoiding awk and sed for the most part.

Your data is too big to read in whole. The good news is that the body of a process substitution is ordinary shell code, so you can write:
while IFS= read -r line; do
...
done < <(
if [[ $x && $y ]]; then tail -n +"$x" "$filename" | head -n "$((y - x + 1))"  # head takes a count: lines x..y is y-x+1 lines
elif [[ $x ]]; then tail -n +"$x" "$filename"
elif [[ $y ]]; then head -n "$y" "$filename"
else cat "$filename"
fi
)
One thing I don't like about process substitutions is that the code producing the input comes after the loop that consumes it. It would be nicer if it appeared first. I think this will work, but it is untested:
# set up file descriptor 3
exec 3< <(
if [[ $x && $y ]]; then tail -n +"$x" "$filename" | head -n "$((y - x + 1))"  # head takes a count: lines x..y is y-x+1 lines
elif [[ $x ]]; then tail -n +"$x" "$filename"
elif [[ $y ]]; then head -n "$y" "$filename"
else cat "$filename"
fi
)
# iterate over lines read from fd 3
while IFS= read -u3 -r line; do
...
done
# close fd 3
exec 3<&-
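To decide the input ahead of the loop, the same if/elif chain could also be wrapped in a function, which is the usual way to "store" a pipeline without eval. This is only a sketch: select_lines and the sample file are made-up names, and it assumes y is an absolute line number, so "line x to line y" means y - x + 1 lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines x..y of a file.
# An empty x means "from the start"; an empty y means "to the end".
select_lines() {
    local file=$1 x=$2 y=$3
    if [[ $x && $y ]]; then
        tail -n +"$x" "$file" | head -n "$(( y - x + 1 ))"
    elif [[ $x ]]; then
        tail -n +"$x" "$file"
    elif [[ $y ]]; then
        head -n "$y" "$file"
    else
        cat "$file"
    fi
}

# Small throwaway file standing in for the real 4GB input.
demo=$(mktemp)
printf '%s\n' one two three four five > "$demo"

collected=()
while IFS= read -r line; do
    collected+=("$line")
done < <(select_lines "$demo" 2 4)

printf '%s\n' "${collected[@]}"
rm -f "$demo"
```

The loop body stays the same for every case; only the arguments passed to select_lines change.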

I might handle all of these as part of the loop condition, with an explicitly maintained line counter.
start=10
end=30
i=0
while (( i < end )) && IFS= read -r line; do
(( ++i >= start )) || continue   # i is now the 1-based number of the line just read
...
done < "$filename"
However, if you might skip a significant number of lines at the beginning, it might be more efficient to use sed:
while IFS= read -r line; do
...
done < <(sed -n "${start},${end}p" "$filename")
or awk:
while IFS= read -r line; do
...
done < <(awk -v start="$start" -v end="$end" 'NR >= start && NR <= end' "$filename")
This then raises the question of how much of the body of the while loop can be moved into awk itself.
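On a file this large it may also help to stop reading as soon as the last wanted line has been printed, since sed and awk otherwise scan to the end of the file. A minimal sketch, using a throwaway file built with seq in place of the real input:

```shell
#!/usr/bin/env bash
# Sample input: the numbers 1..100, one per line.
sample=$(mktemp)
seq 1 100 > "$sample"

start=10
end=12

# sed: print lines start..end, then quit at line end instead of reading on.
from_sed=$(sed -n "${start},${end}p; ${end}q" "$sample")

# awk: exit once past the end line; otherwise print lines from start onward.
from_awk=$(awk -v start="$start" -v end="$end" \
    'NR > end { exit } NR >= start' "$sample")

printf 'sed:\n%s\nawk:\n%s\n' "$from_sed" "$from_awk"
rm -f "$sample"
```

Whether this beats tail | head on the real data is something to measure rather than assume.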

Related

How to compare the 3rd octet of an IP to the 3rd octet in an array of IP in bash

There is probably an easier way to do this, but I tried the following and could not get it to work:
ip=$1 #got the IP input 10.100.251.2
#broken into 4 octets via read
IFS="." read -r octet1 octet2 octet3 octet4 <<<"${ip}"
#the list to compare against
cat tmp_brd
10.100.123.255
10.100.127.255
10.100.239.255
10.100.255.255
#the list for the output which corresponds line by line to tmp_brd
cat tmp_sm
10.100.120.0
10.100.124.0
10.100.224.0
10.100.240.0
#the tmp_brd and tmp_sm arrays are collected with grep/awk from the same file, and their elements correspond one to one
mapfile -t sm_array <tmp_sm
mapfile -t brd_array <tmp_brd
brd_ln=${#brd_array[@]}
for ((i = 0; i < ${brd_ln}; i++)); do
IFS="." read -r octet1$i octet2$i octet3$i octet4$i <<<"${brd_array[$i]}"
if [[ $octet3 -lt $octet3$i ]]; then
echo "${sm_array[$i]}" >>subnet
fi
done
so basically ip=10.100.251.2 will match subnet=10.100.240.0 (from tmp_sm)
Solved it!
Thanks
mapfile -t sm_array <tmp_sm
mapfile -t brd_array <tmp_brd
brd_ln=${#brd_array[@]}
for ((i = 0; i < ${brd_ln}; i++)); do
IFS="." read -r oct1 oct2 oct3 oct4 <<<"${brd_array[$i]}"
if [[ $octet3 -lt $oct3 ]]; then
subnet=${sm_array[$i]}
break
fi
done
I think you would be better off using ipcalc for this:
#!/bin/bash
mapfile -t network <tmp_sm
mapfile -t broadcast <tmp_brd
let l=${#network[@]}-1
declare -a networks
declare -a masks
for i in $(seq 0 $l)
do
networks[$i]=$(ipcalc ${network[$i]} - ${broadcast[$i]} | tail -1)
masks[$i]=$(echo ${networks[$i]} | cut -d/ -f2)
done
ip=10.100.251.2 # just hardcoding one for test
for i in $(seq 0 $l)
do
echo "checking $ip against ${network[$i]} - ${broadcast[$i]} (${masks[$i]} bit mask)"
n=$(ipcalc ${ip}/${masks[$i]} | grep ^Network | awk '{print $2}' )
if [[ ${networks[$i]} == $n ]]
then
echo "$ip is in network ${networks[$i]}"
ipcalc $ip/${masks[$i]}
break;
fi
done
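If ipcalc is not available, the same range test can be sketched in plain bash by converting each dotted address to an integer and comparing. This is an illustration rather than a replacement for the answers above: ip_to_int is a made-up helper, and the arrays are filled inline with the sample data from the question instead of being read from tmp_sm and tmp_brd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
# The 10# prefix forces base 10 so an octet like 010 is not read as octal.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Sample data copied from the question (network and broadcast addresses).
sm_array=(10.100.120.0 10.100.124.0 10.100.224.0 10.100.240.0)
brd_array=(10.100.123.255 10.100.127.255 10.100.239.255 10.100.255.255)

ip=10.100.251.2
subnet=
target=$(ip_to_int "$ip")

for i in "${!sm_array[@]}"; do
    low=$(ip_to_int "${sm_array[i]}")
    high=$(ip_to_int "${brd_array[i]}")
    # The address belongs to the first range that contains it.
    if (( target >= low && target <= high )); then
        subnet=${sm_array[i]}
        break
    fi
done

echo "$ip -> ${subnet:-no match}"
```

Comparing whole addresses avoids false matches that can occur when only the third octet is compared.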

reading lines in a text file with special characters specifically as quoted '<', '>' in bash shell

I have a text file containing the diff output of two grepped files. It has lines like the ones below. I need to loop through the lines and act on the text to the left of '<' or to the right of '>'.
Edit, to add details:
Each line has content either on the LHS of < or on the RHS of >.
In either case I need to store the content in a variable, take the 1st field (e.g. ABCDEF) and the 3rd field (e.g. 10), and grep for them in one of two other files. If they are found, I print a message and attach those file names to an email sent to a DL. All the file names and directories are already stored in separate variables.
How do I do that?
PS: I have basic knowledge of text formatting and bash/shell commands but am still learning the scripting syntax. Thanks.
ABCDEF,20200101,10 <
PQRSTU,20200106,11 <
LMNOPQ,20200101,12 <
EFGHIJ,20200102,13 <
KLMNOP,20200103,14 <
STUVWX,20200104,15 <
PQRSTU,20200105,16 <
> LMNOPQ,20200101,10
ABCDEF,20200107,17 <
What am I doing wrong now?
while IFS= read -r line; do
if $line =~ ([^[:blank:]]+)[[:blank:]]+\<
then
IFS=, read -r f1 f2 f3 <<< "${BASH_REMATCH[1]}"
#echo "f1=$f1 f2=$f2 f3=$f3"
zgrep "$f1" file1 | grep "with seq $f3" || zgrep "$f1" file2 | grep "with seq $f3"
elif $line =~ \>[[:blank:]]+([^[:blank:]]+)
then
IFS=, read -r g1 g2 g3 <<< "${BASH_REMATCH[1]}"
#echo "g1=$g1 g2=$g2 g3=$g3"
zgrep "$g1" file3 | grep "with seq $g3" || zgrep "$g1" file3 | grep "with seq $g3"
fi
Would you please try something like:
#!/bin/bash
while IFS= read -r line; do
if [[ $line =~ ([^[:blank:]]+)[[:blank:]]+\< || $line =~ \>[[:blank:]]+([^[:blank:]]+) ]]; then
IFS=, read -r f1 f2 f3 <<< "${BASH_REMATCH[1]}"
echo "f1=$f1 f2=$f2 f3=$f3"
# do something here with "$f1", "$f2" and "$f3"
fi
done < file.txt
Output:
f1=ABCDEF f2=20200101 f3=10
f1=PQRSTU f2=20200106 f3=11
f1=LMNOPQ f2=20200101 f3=12
f1=EFGHIJ f2=20200102 f3=13
f1=KLMNOP f2=20200103 f3=14
f1=STUVWX f2=20200104 f3=15
f1=PQRSTU f2=20200105 f3=16
f1=LMNOPQ f2=20200101 f3=10
f1=ABCDEF f2=20200107 f3=17
Please replace the echo "f1=$f1 f2=$f2 f3=$f3" line with the command you want, such as grep.
The regex ([^[:blank:]]+)[[:blank:]]+\< matches a line that contains < and stores the text on its LHS in ${BASH_REMATCH[1]}.
Likewise, the regex \>[[:blank:]]+([^[:blank:]]+) matches a line that contains > and stores the text on its RHS in ${BASH_REMATCH[1]}.
The statement IFS=, read -r f1 f2 f3 <<< "${BASH_REMATCH[1]}" splits that value on , and assigns the fields to f1, f2 and f3.
Please note that if the input file is very large, a bash solution may be slow. I used bash only because it makes it convenient to pass the variables to your grep command.
EDIT
Regarding the updated script in your question, please refer to the following modification:
while IFS= read -r line; do
result=   # reset so a match from an earlier line is not reported again
if [[ $line =~ ([^[:blank:]]+)[[:blank:]]+\< ]]; then
IFS=, read -r f1 f2 f3 <<< "${BASH_REMATCH[1]}"
# echo "f1=$f1 f2=$f2 f3=$f3"
result=$(zgrep "$f1" file1 | grep "with seq $f3" || zgrep "$f1" file2 | grep "with seq $f3")
elif [[ $line =~ \>[[:blank:]]+([^[:blank:]]+) ]]; then
IFS=, read -r g1 g2 g3 <<< "${BASH_REMATCH[1]}"
# echo "g1=$g1 g2=$g2 g3=$g3"
result=$(zgrep "$g1" file3 | grep "with seq $g3" || zgrep "$g1" file3 | grep "with seq $g3")
fi
if [[ -n $result ]]; then
echo "result = $result"
fi
done < file.txt

Intermittent piping failure in bash

I have a code snippet that looks like this
while grep "{{SECRETS}}" /tmp/kubernetes/$basefile | grep -v "#"; do
grep -n "{{SECRETS}}" /tmp/kubernetes/$basefile | grep -v "#" | head -n1 | while read -r line ; do
lineno=$(echo $line | cut -d':' -f1)
spaces=$(sed "${lineno}!d" /tmp/kubernetes/$basefile | awk -F'[^ \t]' '{print length($1)}')
spaces=$((spaces-1))
# Delete line that had {{SECRETS}}
sed -i -e "${lineno}d" /tmp/kubernetes/$basefile
while IFS='' read -r secretline || [[ -n "$secretline" ]]; do
newline=$(printf "%*s%s" $spaces "" "$secretline")
sed -i "${lineno}i\ ${newline}" /tmp/kubernetes/$basefile
lineno=$((lineno+1))
done < "/tmp/secrets.yaml"
done
done
In /tmp/kubernetes/$basefile, the string {{SECRETS}} appears twice 100% of the time.
Almost every time, this completes fine. Very infrequently, however, the script fails on its second pass through the file. According to set -x, the failure looks like this:
...
IFS=
+ read -r secretline
+ [[ -n '' ]]
+ read -r line
exit code 1
When it works, the set -x output looks like this, and the script continues processing the file correctly.
...
+ IFS=
+ read -r secretline
+ [[ -n '' ]]
+ read -r line
+ grep '{{SECRETS}}' /tmp/kubernetes/deployment.yaml
+ grep -v '#'
I have no answer for how this can only happen occasionally, so I think there's something about bash piping's parallelism I don't understand. Is there something in grep -n "{{SECRETS}}" /tmp/kubernetes/$basefile | grep -v "#" | head -n1 | while read -r line ; do that could lead to out-of-order execution somehow? Based on the error, it seems like it's trying to read a line, but can't because previous commands didn't work. But there's no indication of that in the set -x output.
A likely cause of the problem is that the pipeline containing the inner loop both reads and writes the "basefile" at the same time. See How to make reading and writing the same file in the same pipeline always “fail”?.
One way to fix the problem is to do a full read of the file before trying to update it. Try:
basepath=/tmp/kubernetes/$basefile
secretspath=/tmp/secrets.yaml
while
line=$(grep -n "{{SECRETS}}" "$basepath" | grep -v "#" | head -n1)
[[ -n $line ]]
do
lineno=$(echo "$line" | cut -d':' -f1)
spaces=$(sed "${lineno}!d" "$basepath" \
| awk -F'[^ \t]' '{print length($1)}')
spaces=$((spaces-1))
# Delete line that had {{SECRETS}}
sed -i -e "${lineno}d" "$basepath"
while IFS='' read -r secretline || [[ -n "$secretline" ]]; do
newline=$(printf "%*s%s" $spaces "" "$secretline")
sed -i "${lineno}i\ ${newline}" "$basepath"
lineno=$((lineno+1))
done < "$secretspath"
done
(I introduced the variables basepath and secretspath to make the code easier to test.)
As an aside, it's also possible to do this with pure Bash code:
basepath=/tmp/kubernetes/$basefile
secretspath=/tmp/secrets.yaml
updated_lines=()
is_updated=0
while IFS= read -r line || [[ -n $line ]] ; do
if [[ $line == *'{{SECRETS}}'* && $line != *'#'* ]] ; then
spaces=${line%%[^[:space:]]*}
while IFS= read -r secretline || [[ -n $secretline ]]; do
updated_lines+=( "${spaces}${secretline}" )
done < "$secretspath"
is_updated=1
else
updated_lines+=( "$line" )
fi
done <"$basepath"
(( is_updated )) && printf '%s\n' "${updated_lines[@]}" >"$basepath"
The whole updated file is stored in memory (in the updated_lines array), but that shouldn't be a problem because any file that's too big to store in memory will almost certainly be too big to process line by line with Bash. Bash is generally extremely slow.
In this code spaces holds the actual space characters for indentation, not the number of them.
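Another way to make sure the file is never read and written at the same time is to write the result to a temporary file and copy it back only after the loop has finished. A rough sketch with made-up sample data; the temp files stand in for /tmp/kubernetes/$basefile and /tmp/secrets.yaml:

```shell
#!/usr/bin/env bash
# Throwaway files standing in for the real base file and secrets file.
base=$(mktemp)
secrets=$(mktemp)
out=$(mktemp)
printf '%s\n' 'env:' '  {{SECRETS}}' '# {{SECRETS}} commented' 'end' > "$base"
printf '%s\n' 'KEY1: a' 'KEY2: b' > "$secrets"

# Read the base file once; write every output line to a separate file.
while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == *'{{SECRETS}}'* && $line != *'#'* ]]; then
        # Keep the leading whitespace of the placeholder line.
        indent=${line%%[^[:space:]]*}
        while IFS= read -r secretline || [[ -n $secretline ]]; do
            printf '%s%s\n' "$indent" "$secretline"
        done < "$secrets"
    else
        printf '%s\n' "$line"
    fi
done < "$base" > "$out"

# Copy back only after reading has finished (cat keeps the original
# file's permissions, unlike mv of a mktemp file).
cat "$out" > "$base"
result=$(<"$base")
printf '%s\n' "$result"
rm -f "$base" "$secrets" "$out"
```

Unlike the array version, this keeps only one line in memory at a time.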

Bash - How can I execute a variable

I am reading a file with lines like:
folder=abc
name=xyz
For some lines I would like to set a variable, e.g. name=xyz, corresponding to the line I have read.
Cutting it down, with name=xyz and folder=abc, I have tried:
while read -r line; do
$line
echo $name
done < /etc/testfile.conf
This gives an error message ./test: line 4: folder=abc: command not found etc.
I have tried "$line" and $($line) and it is the same. Is it possible to do what I want?
I have succeeded by doing:
while read -r line; do
if [[ "$line" == 'folder'* ]]; then
folder="$(echo "$line" | cut -d'=' -f 2)"
fi
if [[ "$line" == 'name'* ]]; then
name="$(echo "$line" | cut -d'=' -f 2)"
fi
done < /etc/testfile.conf
but this seems messy
For your sample, declare is the safest option:
while read -r line; do
declare "$line"
done < /etc/testfile.conf
$ echo "$folder"
abc
$ echo "$name"
xyz
The direct approach is to use eval.
A different approach is to use source or .:
$ echo "$line"
folder=abc
$ . <(echo "$line")
$ echo "$folder"
abc
But probably the good answer will be to tackle the problem in a different way.
You can clean up your approach a bit without resorting to eval.
while IFS="=" read -r name value; do
case $name in
folder) folder=$value ;;
name) name=$value ;;
esac
done < /etc/testfile.conf
Why not just source the file?
$ . infile ; echo "$name"
xyz
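If the file may contain keys you do not control, one option is to collect the pairs in an associative array instead of creating shell variables, so nothing in the file is ever executed. A sketch only, assuming bash 4 or later; the config array name and the sample file contents are made up for illustration:

```shell
#!/usr/bin/env bash
# Throwaway file standing in for /etc/testfile.conf.
conf=$(mktemp)
printf '%s\n' 'folder=abc' 'name=xyz' '# a comment' 'path=/a=b' > "$conf"

declare -A config=()
while IFS='=' read -r key value; do
    # Skip comments, blank lines and anything that is not a plain identifier.
    [[ $key =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    # Everything after the first = ends up in value, including later = signs.
    config[$key]=$value
done < "$conf"

echo "folder=${config[folder]} name=${config[name]} path=${config[path]}"
rm -f "$conf"
```

Values are then read back as "${config[name]}" rather than "$name".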

Grep command in array

For a homework assignment I have to take the results from the grep command and write out up to the first 5 of them, numbering them from 1 to 5. (Print the number, then a space, then the line from grep.) If there are no lines, print a message saying so. So far I have managed to store the grep output in an array, but this is where I've gotten stuck. Can anyone provide guidance on how to print the results as described above?
pattern="*.c"
fileList=$(grep -l "main" $pattern)
IFS=$"\n"
declare -a array
array=$fileList
for x in "${array[@]}"; do
echo "$x"
done
You can use grep's -c and -l options:
pattern="*.c"
searchPattern="main"
counter=1
while read -r line ; do
IFS=':' read -r -a lineInfo <<< "$line"
if (( counter > 5 )); then   # numeric comparison; > inside [[ ]] compares strings
break
fi
if (( lineInfo[1] > 0 )); then
numsOfLine=""
while read -r fileline ; do
IFS=':' read -r -a fileLineInfo <<< "$fileline"
numsOfLine="$numsOfLine ${fileLineInfo[0]} "
done < <(grep -n $searchPattern ${lineInfo[0]})
echo "$counter ${lineInfo[0]} match on lines: $numsOfLine"
let "counter += 1"
else
echo "${lineInfo[0]} no match lines"
fi
done < <(grep -c $searchPattern $pattern)
If you're only allowed to use grep and bash(?):
pattern="*.c"
fileList=($(grep -l "main" $pattern))
if test ${#fileList[@]} = 0 ; then
echo "No results"
else
n=0
while test $n -lt ${#fileList[@]} -a $n -lt 5 ; do
i=$n
n=$(( n + 1 ))
echo "$n ${fileList[$i]}"
done
fi
If you are allowed to use commands in addition to grep, you can pipe the results through nl to add line numbers, then head to limit the results to the first 5 lines, then a second grep to test if there were any lines. For example:
if ! grep -l "main" $pattern | \
nl -s ' ' | sed -e 's/^ *//' | \
head -n 5 | grep '' ; then
echo "No results"
fi
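The array-based approach can also be sketched with mapfile, which reads one array element per line and so avoids word splitting on file names that contain spaces. This is an illustration, not the assignment's required form: number_first_five is a made-up name and the .c files are created in a temporary directory just for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines from stdin and print up to the first
# five, each prefixed with its number; print a message if there are none.
number_first_five() {
    local -a lines
    local i
    mapfile -t lines
    if (( ${#lines[@]} == 0 )); then
        echo "No results"
        return
    fi
    for (( i = 0; i < ${#lines[@]} && i < 5; i++ )); do
        printf '%d %s\n' "$(( i + 1 ))" "${lines[i]}"
    done
}

# Demo data: seven files that contain "main" and one that does not.
dir=$(mktemp -d)
for n in 1 2 3 4 5 6 7; do
    printf 'int main(void) { return 0; }\n' > "$dir/prog$n.c"
done
printf 'int helper;\n' > "$dir/util.c"

output=$(cd "$dir" && grep -l "main" *.c | number_first_five)
printf '%s\n' "$output"
rm -rf "$dir"
```

Because the function reads from stdin, the same helper works for any command whose output should be numbered.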
