How to extend a variable outside a multi-threaded while loop - bash

I am writing a shell script that contains a multi-threaded while loop (each iteration starts a background job). The loop iterates over the values of an array and calls a function on each one. At the end of the function I save the result in a string variable. I want to append that string to an array on each iteration and then read the array's contents once the while loop completes.
As I understand it, running the function in the background is what leaves the array empty after the loop completes: each job runs in its own environment, and the array value does not propagate outside it. I would like to make the array available outside the job if possible. Currently I write each string to a temp file and, after the while loop, read the file back into my array. This works, as the file generally isn't "too" large, but I would like to avoid writing to a file if possible.
My code: doDeepLookup is actually an API call, but for the sake of argument let's say it prepends some text to the line read by the while loop.
#!/bin/bash
n=0
maxjobs=20
resultsArray=""
while IFS= read -r line
do
IPaddress="$(echo $line | sed 's/ /\n/g' | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}")"
doDeepLookup "$line" "$IPaddress" &
if(( $(($((++n)) % $maxjobs)) == 0 )) ; then
wait
fi
done <<< "$(printf '%s\n' "${SomeOtherArray[@]}")"
printf '%s\n' "${resultsArray[@]}" #Returns NULL
doDeepLookup() {
results="$(echo "help me : $line")"
resultsArray+=($results)
}

Thanks to William
#!/bin/bash
n=0
maxjobs=20
WhileLoopFunction() {
resultsArray=""
while IFS= read -r line
do
IPaddress="$(echo $line | sed 's/ /\n/g' | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}")"
doDeepLookup "$line" "$IPaddress" &
if(( $(($((++n)) % $maxjobs)) == 0 )) ; then
wait
fi
done <<< "$(printf '%s\n' "${SomeOtherArray[@]}")"
}
doDeepLookup() {
results="$(echo "help me : $line")"
echo $results
}
resultsArray=( $(WhileLoopFunction "${DeepArray[@]}") )
printf '%s\n' "${resultsArray[@]}"
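One caveat with the version above is that the unquoted $(...) splits each result on whitespace, so "help me : foo" becomes several array elements. A minimal sketch of a variant that keeps one element per line, assuming Bash 4+ for mapfile. doDeepLookup, runLookups and the sample DeepArray values are hypothetical stand-ins, not the real API call:

```bash
#!/usr/bin/env bash
# doDeepLookup is a hypothetical stand-in for the real API call.
doDeepLookup() {
    printf 'help me : %s\n' "$1"
}

# Hypothetical sample input.
DeepArray=("10.0.0.1 alpha" "10.0.0.2 beta" "10.0.0.3 gamma")
maxjobs=2

# Start one background job per argument, at most $maxjobs at a time.
runLookups() {
    local n=0 line
    for line in "$@"; do
        doDeepLookup "$line" &
        if (( ++n % maxjobs == 0 )); then
            wait    # let the current batch finish
        fi
    done
    wait            # collect the final, possibly partial, batch
}

# mapfile runs in the current shell, so resultsArray survives;
# only runLookups runs inside the process substitution.
mapfile -t resultsArray < <(runLookups "${DeepArray[@]}")
printf '%s\n' "${resultsArray[@]}"
```

Because the jobs run concurrently, the order of the collected lines is not guaranteed to match the input order.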

With parset from GNU Parallel you would do something like:
parset resultsArray doDeepLookup ::: "${DeepArray[@]}"
printf '%s\n' "${resultsArray[#]}"

Related

Can't add a new element to an array in bash [duplicate]

In the following program, if I set the variable $foo to the value 1 inside the first if statement, it works in the sense that its value is remembered after the if statement. However, when I set the same variable to the value 2 inside an if which is inside a while statement, it's forgotten after the while loop. It's behaving like I'm using some sort of copy of the variable $foo inside the while loop and I am modifying only that particular copy. Here's a complete test program:
#!/bin/bash
set -e
set -u
foo=0
bar="hello"
if [[ "$bar" == "hello" ]]
then
foo=1
echo "Setting \$foo to 1: $foo"
fi
echo "Variable \$foo after if statement: $foo"
lines="first line\nsecond line\nthird line"
echo -e $lines | while read line
do
if [[ "$line" == "second line" ]]
then
foo=2
echo "Variable \$foo updated to $foo inside if inside while loop"
fi
echo "Value of \$foo in while loop body: $foo"
done
echo "Variable \$foo after while loop: $foo"
# Output:
# $ ./testbash.sh
# Setting $foo to 1: 1
# Variable $foo after if statement: 1
# Value of $foo in while loop body: 1
# Variable $foo updated to 2 inside if inside while loop
# Value of $foo in while loop body: 2
# Value of $foo in while loop body: 2
# Variable $foo after while loop: 1
# bash --version
# GNU bash, version 4.1.10(4)-release (i686-pc-cygwin)
echo -e $lines | while read line
...
done
The while loop is executed in a subshell because it is part of a pipeline, so any changes you make to the variable are lost once the subshell exits.
Instead you can use a here string to rewrite the while loop so that it runs in the main shell process; only echo -e $lines will run in a subshell:
while read line
do
if [[ "$line" == "second line" ]]
then
foo=2
echo "Variable \$foo updated to $foo inside if inside while loop"
fi
echo "Value of \$foo in while loop body: $foo"
done <<< "$(echo -e "$lines")"
You can get rid of the rather ugly echo in the here-string above by expanding the backslash sequences immediately when assigning lines. The $'...' form of quoting can be used there:
lines=$'first line\nsecond line\nthird line'
while read line; do
...
done <<< "$lines"
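Put together, the here-string version can be sketched as a complete script. This is only a condensed illustration of the answer above, with the loop body reduced to the assignment:

```bash
#!/usr/bin/env bash
foo=0
lines=$'first line\nsecond line\nthird line'

# The loop reads from a here-string, so it runs in the current shell
# and the assignment to foo is still visible afterwards.
while IFS= read -r line; do
    if [[ $line == "second line" ]]; then
        foo=2
    fi
done <<< "$lines"

echo "Variable \$foo after while loop: $foo"
```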
UPDATED#2
Explanation is in Blue Moons's answer.
Alternative solutions:
Eliminate echo
while read line; do
...
done <<EOT
first line
second line
third line
EOT
Add the echo inside the here-is-the-document
while read line; do
...
done <<EOT
$(echo -e $lines)
EOT
Run echo in background:
coproc echo -e $lines
while read -u ${COPROC[0]} line; do
...
done
Redirect to a file handle explicitly (Mind the space in < <!):
exec 3< <(echo -e $lines)
while read -u 3 line; do
...
done
Or just redirect to the stdin:
while read line; do
...
done < <(echo -e $lines)
And one for chepner (eliminating echo):
arr=("first line" "second line" "third line");
for((i=0;i<${#arr[*]};++i)) { line=${arr[i]};
...
}
The variable $lines can be converted to an array without starting a new subshell. The two-character sequence \n (a backslash followed by n) has to be converted to some single character (e.g. a real newline character), and the IFS (Internal Field Separator) variable is then used to split the string into array elements. This can be done like this:
lines="first line\nsecond line\nthird line"
echo "$lines"
OIFS="$IFS"
IFS=$'\n' arr=(${lines//\\n/$'\n'}) # Conversion
IFS="$OIFS"
echo "${arr[@]}", Length: ${#arr[*]}
set|grep ^arr
Result is
first line\nsecond line\nthird line
first line second line third line, Length: 3
arr=([0]="first line" [1]="second line" [2]="third line")
You are asking this bash FAQ. The answer also describes the general case of variables set in subshells created by pipes:
E4) If I pipe the output of a command into read variable, why
doesn't the output show up in $variable when the read command finishes?
This has to do with the parent-child relationship between Unix
processes. It affects all commands run in pipelines, not just
simple calls to read. For example, piping a command's output
into a while loop that repeatedly calls read will result in
the same behavior.
Each element of a pipeline, even a builtin or shell function,
runs in a separate process, a child of the shell running the
pipeline. A subprocess cannot affect its parent's environment.
When the read command sets the variable to the input, that
variable is set only in the subshell, not the parent shell. When
the subshell exits, the value of the variable is lost.
Many pipelines that end with read variable can be converted
into command substitutions, which will capture the output of
a specified command. The output can then be assigned to a
variable:
grep ^gnu /usr/lib/news/active | wc -l | read ngroup
can be converted into
ngroup=$(grep ^gnu /usr/lib/news/active | wc -l)
This does not, unfortunately, work to split the text among
multiple variables, as read does when given multiple variable
arguments. If you need to do this, you can either use the
command substitution above to read the output into a variable
and chop up the variable using the bash pattern removal
expansion operators or use some variant of the following
approach.
Say /usr/local/bin/ipaddr is the following shell script:
#! /bin/sh
host `hostname` | awk '/address/ {print $NF}'
Instead of using
/usr/local/bin/ipaddr | read A B C D
to break the local machine's IP address into separate octets, use
OIFS="$IFS"
IFS=.
set -- $(/usr/local/bin/ipaddr)
IFS="$OIFS"
A="$1" B="$2" C="$3" D="$4"
Beware, however, that this will change the shell's positional
parameters. If you need them, you should save them before doing
this.
This is the general approach -- in most cases you will not need to
set $IFS to a different value.
Some other user-supplied alternatives include:
read A B C D << HERE
$(IFS=.; echo $(/usr/local/bin/ipaddr))
HERE
and, where process substitution is available,
read A B C D < <(IFS=.; echo $(/usr/local/bin/ipaddr))
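In Bash, another way to split a value among several variables without a pipeline is a here-string with IFS set only for the read command. A minimal sketch, where the address is a hypothetical sample value rather than the output of the ipaddr script:

```bash
#!/usr/bin/env bash
addr="192.168.1.10"    # hypothetical sample value

# IFS=. applies to this read only; the shell's own IFS is unchanged,
# and the positional parameters are not touched.
IFS=. read -r A B C D <<< "$addr"

echo "$A $B $C $D"
```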
Hmmm... I would almost swear that this worked for the original Bourne shell, but don't have access to a running copy just now to check.
There is, however, a very trivial workaround to the problem.
Change the first line of the script from:
#!/bin/bash
to
#!/bin/ksh
Et voila! A read at the end of a pipeline works just fine, assuming you have the Korn shell installed.
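If switching shells is not an option, Bash 4.2 and later offer a similar behaviour through the lastpipe option, which runs the last command of a pipeline in the current shell. A minimal sketch, assuming a non-interactive script (lastpipe only takes effect when job control is off):

```bash
#!/usr/bin/env bash
# Requires Bash 4.2+; has no effect while job control is active.
shopt -s lastpipe

count=0
printf '%s\n' a b c | while IFS= read -r line; do
    (( count += 1 ))
done

# With lastpipe the loop ran in the current shell, so count is kept.
echo "$count"
```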
This is an interesting question that touches on a very basic concept in the Bourne shell: the subshell. Here I provide a solution that differs from the previous ones by doing some filtering, with an example that may be useful in real life. This is a fragment for checking that downloaded files conform to a known checksum. The checksum file looks like the following (showing just 3 lines):
49174 36326 dna_align_feature.txt.gz
54757 1 dna.txt.gz
55409 9971 exon_transcript.txt.gz
The shell script:
#!/bin/sh
.....
failcnt=0 # this variable is only valid in the parent shell
#variable xx captures all the outputs from the while loop
xx=$(cat ${checkfile} | while read -r line; do
num1=$(echo $line | awk '{print $1}')
num2=$(echo $line | awk '{print $2}')
fname=$(echo $line | awk '{print $3}')
if [ -f "$fname" ]; then
res=$(sum $fname)
filegood=$(sum $fname | awk -v na=$num1 -v nb=$num2 -v fn=$fname '{ if (na == $1 && nb == $2) { print "TRUE"; } else { print "FALSE"; }}')
if [ "$filegood" = "FALSE" ]; then
failcnt=$(expr $failcnt + 1) # only in subshell
echo "$fname BAD $failcnt"
fi
fi
done | tail -1) # I am only interested in the final result
# you can capture a whole bunch of texts and do further filtering
failcnt=${xx#* BAD } # I am only interested in the number
# this variable is in the parent shell
echo failcnt $failcnt
if [ $failcnt -gt 0 ]; then
echo $failcnt files failed
else
echo download successful
fi
The parent and subshell communicate through the echo command. You can pick some easy to parse text for the parent shell. This method does not break your normal way of thinking, just that you have to do some post processing. You can use grep, sed, awk, and more for doing so.
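The same idea can be reduced to a smaller sketch: the subshell keeps its own counter and prints only the final value, which the parent captures with command substitution. The file names and statuses here are hypothetical sample data, not real checksum output:

```bash
#!/usr/bin/env bash
# Everything inside $( ... ) runs in a subshell; only the text it
# prints reaches the parent.
failcnt=$(
    printf '%s\n' "a.gz OK" "b.gz BAD" "c.gz BAD" | {
        n=0
        while read -r fname status; do
            if [ "$status" = "BAD" ]; then
                n=$((n + 1))
            fi
        done
        echo "$n"    # the single value handed back to the parent
    }
)

echo "failcnt $failcnt"
```

Printing just the number avoids the tail and pattern-removal step, at the cost of losing the per-file messages.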
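I store the value in a file inside the loop and read it back outside. (Note that > 2 below redirects to a file literally named 2 in the current directory; it is not stderr, which would be written >&2.)
Here var i is initially set and read inside the loop as 1.
# reading lines of content from 2 files concatenated
# inside loop: write value of var i to the file named 2 (before incrementing)
# outside: read var i back from that file, has last iterative value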
f=/tmp/file1
g=/tmp/file2
i=1
cat $f $g | \
while read -r s;
do
echo $s > /dev/null; # some work
echo $i > 2
let i++
done;
read -r i < 2
echo $i
Or use the heredoc method to reduce the amount of code in a subshell.
Note the iterative i value can be read outside the while loop.
i=1
while read -r s;
do
echo $s > /dev/null
let i++
done <<EOT
$(cat $f $g)
EOT
let i--
echo $i
How about a very simple method
+call your while loop in a function
- set your value inside (nonsense, but shows the example)
- return your value inside
+capture your value outside
+set outside
+display outside
#!/bin/bash
# set -e
# set -u
# No idea why you need this, not using here
foo=0
bar="hello"
if [[ "$bar" == "hello" ]]
then
foo=1
echo "Setting \$foo to $foo"
fi
echo "Variable \$foo after if statement: $foo"
lines="first line\nsecond line\nthird line"
function my_while_loop
{
echo -e $lines | while read line
do
if [[ "$line" == "second line" ]]
then
foo=2;
echo "Variable \$foo updated to $foo inside if inside while loop"
return 2;
fi
# Code below won't be executed since we returned from function in 'if' statement
# We already reported the $foo var being set to 2 anyway
echo "Value of \$foo in while loop body: $foo"
done
}
my_while_loop; foo="$?"
echo "Variable \$foo after while loop: $foo"
Output:
Setting $foo to 1
Variable $foo after if statement: 1
Value of $foo in while loop body: 1
Variable $foo after while loop: 2
bash --version
GNU bash, version 3.2.51(1)-release (x86_64-apple-darwin13)
Copyright (C) 2007 Free Software Foundation, Inc.
Though this is an old question that has been asked several times, here is what I ended up doing after hours of fidgeting with here strings: the only option that worked for me was to store the value in a file during the while loop's subshell and then retrieve it afterwards. Simple.
Use an echo statement to store and a cat statement to retrieve. The user running the script must own the directory or have read-write permission on it.
#write to file
echo "1" > foo.txt
while condition; do
if (condition); then
#write again to file
echo "2" > foo.txt
fi
done
#read from file
echo "Value of \$foo in while loop body: $(cat foo.txt)"
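A runnable sketch of the same approach, using mktemp so the script does not depend on write access to the current directory. The input lines are hypothetical sample data:

```bash
#!/usr/bin/env bash
tmpfile=$(mktemp)      # private temporary file
echo 1 > "$tmpfile"    # initial value

# The loop runs in a pipeline subshell, but the file is shared.
printf '%s\n' "first line" "second line" "third line" |
while IFS= read -r line; do
    if [[ $line == "second line" ]]; then
        echo 2 > "$tmpfile"
    fi
done

foo=$(< "$tmpfile")    # read the value back in the parent shell
rm -f "$tmpfile"
echo "Variable \$foo after while loop: $foo"
```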

here string in nested loop

I got such a piece of bash code:
var="empty"
find $path1 -maxdepth 3 | while read line; do
find $path2 -maxdepth 1 | while read line2; do
if [[ $line2 != $var ]]; then
echo "new value"
fi
var=$line2
done <<< "$line2"
done
The question is... how to make var stay changed? Because I would like to echo on every new value found by loops but it doesn't work ;( var="empty" every time that the second loop starts iteration.
How to make var=$line2 for every iteration?
You are reading the value into line2 with a read from stdin, and feeding the value of line2 into the loop at the done with a here-string on stdin. The here-string redirection replaces the pipe as the loop's stdin, so line2 is only ever assigned from its own (initially empty) value, which means it is never set to anything useful.
echo -e "one\nthree\nfive" | while read num
do echo $num
done <<< "two"
Output is two. The input stream is totally ignored.
You are also defining a nested loop for no reason, since you are never using the outer loop. Clean your code before posting please.
find ~ | while read f; do var=$f; echo $f; done
This works fine.
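If var also has to keep its value after the loop ends, the loop can read from a process substitution instead of a pipe. A minimal sketch, where the printf list is a hypothetical stand-in for the find output and changes is an illustrative counter:

```bash
#!/usr/bin/env bash
var="empty"
changes=0

# No pipeline here, so the loop runs in the current shell and both
# var and changes persist across iterations and after the loop.
while IFS= read -r line; do
    if [[ $line != "$var" ]]; then
        echo "new value: $line"
        (( changes += 1 ))
    fi
    var=$line
done < <(printf '%s\n' a a b b c)

echo "last value: $var, changes: $changes"
```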

printing line numbers that are multiple of 5

Hi, I am trying to print/echo the lines whose line numbers are multiples of 5. I am doing this in a shell script. I am getting errors and am unable to proceed. Below is the script:
#!/bin/bash
x=0
y=$wc -l $1
while [ $x -le $y ]
do
sed -n `$x`p $1
x=$(( $x + 5 ))
done
When executing above script i get below errors
#./echo5.sh sample.h
./echo5.sh: line 3: -l: command not found
./echo5.sh: line 4: [: 0: unary operator expected
Please help me with this issue.
For efficiency, you don't want to be invoking sed multiple times on your file just to select a particular line. You want to read through the file once, filtering out the lines you don't want.
#!/bin/bash
i=0
while IFS= read -r line; do
(( ++i % 5 == 0 )) && echo "$line"
done < "$1"
Demo:
$ i=0; while read line; do (( ++i % 5 == 0 )) && echo "$line"; done < <(seq 42)
5
10
15
20
25
30
35
40
A funny pure Bash possibility:
#!/bin/bash
mapfile ary < "$1"
printf "%.0s%.0s%.0s%.0s%s" "${ary[@]}"
This slurps the file into an array ary, with each line of the file in a field of the array. Then printf takes care of printing one every 5 lines: %.0s takes a field, but does nothing, and %s prints the field. Since mapfile is used without the -t option, the newlines are included in the array. Of course this really slurps the file into memory, so it might not be good for huge files. For large files you can use a callback with mapfile:
#!/bin/bash
callback() {
printf '%s' "$2"
ary=()
}
mapfile -c 5 -C callback ary < "$1"
We're removing all the elements of the array during the callback, so that the array doesn't grow too large, and the printing is done on the fly, as the file is read.
Another funny possibility, in the spirit of glenn jackman's solution, yet without a counter (and still pure Bash):
#!/bin/bash
while read && read && read && read && IFS= read -r line; do
printf '%s\n' "$line"
done < "$1"
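The %.0s trick from the first variant relies on printf reusing its format string until all arguments are consumed. A small sketch with a three-field format and six hypothetical arguments makes the mechanism visible:

```bash
#!/usr/bin/env bash
# The format consumes three arguments per pass: the first two are
# printed with zero width (nothing), the third is printed in full.
out=$(printf '%.0s%.0s%s\n' a b c d e f)

printf '%s\n' "$out"
# prints:
# c
# f
```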
Use sed.
sed -n '0~5p' "$1"
This prints every fifth line in the file (lines 5, 10, 15, ...). The first~step address form is a GNU sed extension.
Also,
y=$wc -l $1
won't work. Use
y=$(wc -l < "$1")
You need command substitution here; without it, bash sees the space as the end of the assignment and then tries to run -l as a command. Also, if you just want the number, it's best to redirect the file into wc so that the file name is not printed.
This line is valid as written:
x=$(( $x + 5 ))
but inside an arithmetic context you can drop the $ and write it more compactly as
(( x = x + 5 ))
Hope this helps
There are cleaner ways to do it, but what you're looking for is this.
#!/bin/bash
x=5
y=`wc -l $1`
y=`echo $y | cut -f1 -d\ `
while [ "$y" -ge "$x" ]
do
sed -n "${x}p" "$1"
x=$(( $x + 5 ))
done
Initialize x to 5, since there is no "line zero" in your file $1.
Also, wc -l $1 prints the line count followed by the name of the file. Use cut to strip the file name and keep just the first word.
In shell conditionals, an exit status of zero is interpreted as "true".
There should be no space between your $x and your p in the sed command. You can put them right next to each other using curly braces.
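For comparison, the original loop with these fixes applied can be sketched as a function. print_every_fifth and the 12-line sample file are hypothetical names and data used only for illustration:

```bash
#!/usr/bin/env bash
# Print lines 5, 10, 15, ... of the file given as the first argument.
print_every_fifth() {
    local file=$1 x=5 y
    # Redirecting into wc yields just the count; the arithmetic
    # expansion strips any padding some wc implementations add.
    y=$(( $(wc -l < "$file") ))
    while [ "$x" -le "$y" ]; do
        sed -n "${x}p" "$file"
        x=$(( x + 5 ))
    done
}

sample=$(mktemp)
printf '%s\n' {1..12} > "$sample"    # 12 lines containing 1..12
result=$(print_every_fifth "$sample")
rm -f "$sample"
printf '%s\n' "$result"
```

This still runs sed once per selected line, so the single-pass answers remain preferable for large files.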
You can do this quite succinctly using awk:
awk 'NR % 5 == 0' "$1"
NR is the record number (line number in this case). Whenever it is a multiple of 5, the expression is true, so the line is printed.
You might also like the even shorter but slightly less readable:
awk '!(NR%5)' "$1"
which does the same thing.

Incrementing a variable inside a Bash loop

I'm trying to write a small script that will count entries in a log file, and I'm incrementing a variable (USCOUNTER) which I'm trying to use after the loop is done.
But at that moment USCOUNTER looks to be 0 instead of the actual value. Any idea what I'm doing wrong? Thanks!
FILE=$1
tail -n10 mylog > $FILE
USCOUNTER=0
cat $FILE | while read line; do
country=$(echo "$line" | cut -d' ' -f1)
if [ "US" = "$country" ]; then
USCOUNTER=`expr $USCOUNTER + 1`
echo "US counter $USCOUNTER"
fi
done
echo "final $USCOUNTER"
It outputs:
US counter 1
US counter 2
US counter 3
..
final 0
You are using USCOUNTER in a subshell, that's why the variable is not showing in the main shell.
Instead of cat $FILE | while ..., just do while ... done < $FILE. This way, you avoid the common problem described in the Bash FAQ entry "I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?":
while read country _; do
if [ "US" = "$country" ]; then
USCOUNTER=$(expr $USCOUNTER + 1)
echo "US counter $USCOUNTER"
fi
done < "$FILE"
Note I also replaced the `` expression with a $().
I also replaced while read line; do country=$(echo "$line" | cut -d' ' -f1) with while read country _. This allows you to say while read var1 var2 ... varN, where $var1 contains the first word in the line, $var2 the second, and so on, with $varN containing all the remaining content.
Always use -r with read.
There is no need to use cut, you can stick with pure bash solutions.
In this case passing read a 2nd var (_) to catch the additional "fields"
Prefer [[ ]] over [ ].
Use arithmetic expressions.
Do not forget to quote variables! Link includes other pitfalls as well
while read -r country _; do
if [[ $country = 'US' ]]; then
((USCOUNTER++))
echo "US counter $USCOUNTER"
fi
done < "$FILE"
minimalist
counter=0
((counter++))
echo $counter
You're getting final 0 because your while loop is being executed in a sub (shell) process and any changes made there are not reflected in the current (parent) shell.
Correct script:
while read -r country _; do
if [ "US" = "$country" ]; then
((USCOUNTER++))
echo "US counter $USCOUNTER"
fi
done < "$FILE"
I had the same issue of a $count variable in a while loop getting lost.
@fedorqui's answer (and a few others) are accurate answers to the actual question: the subshell is indeed the problem.
But it led me to another issue: I wasn't piping a file's content, but the output of a series of pipes and greps.
My failing sample code:
count=0
cat /etc/hosts | head | while read line; do
((count++))
echo $count $line
done
echo $count
and my fix thanks to the help of this thread and the process substitution:
count=0
while IFS= read -r line; do
((count++))
echo "$count $line"
done < <(cat /etc/hosts | head)
echo "$count"
USCOUNTER=$(grep -c "^US " "$FILE")
Incrementing a variable can be done like this (the $[...] form is deprecated; $((...)) is the modern equivalent):
_my_counter=$[$_my_counter + 1]
Counting the number of occurrences of a pattern in a column can be done with grep:
grep -cE "^([^ ]* ){2}US"
-c count matching lines
([^ ]* ) matches one column
{2} the number of columns to skip
US your pattern
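A quick sketch of that grep pattern on three hypothetical log lines, where the country code is the third space-separated column:

```bash
#!/usr/bin/env bash
# Hypothetical sample log lines; the pattern skips two columns and
# then requires US at the start of the third.
count=$(printf '%s\n' \
    "2024-01-01 host1 US" \
    "2024-01-02 host2 UK" \
    "2024-01-03 host3 US" |
    grep -cE '^([^ ]* ){2}US')

echo "$count"
```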
Use the following one-line command to rename many files in Linux based on a specific phrase:
find -type f -name '*.jpg' | rename 's/holiday/honeymoon/'
For all files with the extension ".jpg", if they contain the string "holiday", replace it with "honeymoon". For instance, this command would rename the file "ourholiday001.jpg" to "ourhoneymoon001.jpg".
This example also illustrates how to use the find command to send a list of files (-type f) with the extension .jpg (-name '*.jpg') to rename via a pipe (|). rename then reads its file list from standard input.

Unexpected behaviour of for

Script:
#!/bin/bash
IFS=','
i=0
for j in `cat database | head -n 1`; do
variables[$i]=$j
i=`expr $i + 1`
done
k=0
for l in `cat database | tail -n $(expr $(cat database | wc -l) - 1)`; do
echo -n $k
k=`expr $k + 1`
if [ $k -eq 3 ]; then
k=0
fi
done
Input file
a,b,c
d,e,f
g,e,f
Output
01201
Expected output
012012
The question is why the for skips last echo? It is weird, because if I change $k to $l echo will run 6 times.
Update:
@thom's analysis is correct. You can fix the problem by changing IFS=',' to IFS=$',\n'.
My original statements below may be of general interest, but do not address the specific problem.
If accidental shell expansions were a concern, here's how the loop could be rewritten (assuming it's practical to read everything into an array variable first):
IFS=$',\n' read -d '' -r -a fields < <(echo $'*,b,c\nd,e,f\ng,h,i')
for field in "${fields[@]}"; do
  # $field is '*' in 1st iteration, then 'b', 'c', 'd',...
done
Original statements:
Just a few general pointers:
You should use a while loop rather than for to read command output - see http://mywiki.wooledge.org/BashFAQ/001; the short of it: with for, the input lines are subject to various shell expansions.
A missing iteration typically stems from the last input line missing a terminating \n (or a separator as defined in $IFS). With a while loop, you can use the following approach to address this: while read -r line || [[ -n $line ]]; do …
For instance, your 2nd for loop could be rewritten as (using process substitution as input to avoid creating a subshell with a separate variable scope):
while read -r l || [[ -n $l ]]; do …; done < <(cat database | tail -n $(expr $(cat database | wc -l) - 1))
Finally, you could benefit from using modern bashisms: for instance,
k=`expr $k + 1`
could be rewritten much more succinctly as (( ++k )) (which will run faster, too).
Your code expects a comma after EVERY value it reads, but you only give it this:
a,b,c
d,e,f
g,e,f
instead of this:
a,b,c,
d,e,f,
g,e,f,
so it reads:
d,e,f'\n'g,e,f
and that is 5 values, not 6, because f'\n'g is treated as a single value
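The difference can be checked with a small sketch that counts the fields produced by word splitting under each IFS value. count_fields is a hypothetical helper, and data mimics the last two lines of the input file:

```bash
#!/usr/bin/env bash
# Count the words produced by splitting $2 with IFS set to $1.
count_fields() {
    local IFS="$1" fields
    fields=( $2 )    # unquoted on purpose: split on $IFS
    echo "${#fields[@]}"
}

data=$'d,e,f\ng,e,f'

comma_only=$(count_fields ',' "$data")        # "f<newline>g" stays one field
comma_newline=$(count_fields $',\n' "$data")  # newline also separates

echo "IFS=',' gives $comma_only fields; IFS=\$',\\n' gives $comma_newline"
```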
