Slow running script. How can I increase its speed?

How can I speed this up? It's taking about 5 minutes to make one file.
It runs correctly, but I have a little more than 100,000 files to make.
Is my use of awk or sed slowing it down? I could break it into several smaller loops and run them on multiple processors, but a single script is much easier.
#!/bin/zsh
#1000 configs per file
alpha=( a b c d e f g h i j k l m n o p q r s t u v w x y z )
m=1000 # number of configs per file
t=1 #file number
for (( i=1; i<=4; i++ )); do
for (( j=i; j<=26; j++ )); do
input="arc"${alpha[$i]}${alpha[$j]}
n=1 #line number
#length=`sed -n ${n}p $input| awk '{printf("%d",$1)}'`
#(( length= $length + 1 ))
length=644
for ((k=1; k<=$m; k++ )); do
echo "$hmbi" >> ~/Glycine_Tinker/configs/config$t.in
echo "jobtype = energy" >> ~/Glycine_Tinker/configs/config$t.in
echo "analyze_only = false" >> ~/Glycine_Tinker/configs/config$t.in
echo "qm_path = qm_$t" >> ~/Glycine_Tinker/configs/config$t.in
echo "mm_path = aiff_$t" >> ~/Glycine_Tinker/configs/config$t.in
cat head.in >> ~/Glycine_Tinker/configs/config$t.in
water=4
echo $k
for (( l=1; l<=$length; l++ )); do
natom=`sed -n ${n}p $input| awk '{printf("%d",$1)}'`
number=`sed -n ${n}p $input| awk '{printf("%d",$6)}'`
if [[ $natom -gt 10 && $number -gt 0 ]]; then
symbol=`sed -n ${n}p $input| awk '{printf("%s",$2)}'`
x=`sed -n ${n}p $input| awk '{printf("%.10f",$3)}'`
y=`sed -n ${n}p $input| awk '{printf("%.10f",$4)}'`
z=`sed -n ${n}p $input| awk '{printf("%.10f",$5)}'`
if [[ $water -eq 4 ]]; then
echo "--" >> ~/Glycine_Tinker/configs/config$t.in
echo "0 1 0.4638" >> ~/Glycine_Tinker/configs/config$t.in
water=1
fi
echo "$symbol $x $y $z" >> ~/Glycine_Tinker/configs/config$t.in
(( water= $water + 1 ))
fi
(( n= $n + 1 ))
done
cat tail.in >> ~/Glycine_Tinker/configs/config$t.in
(( t= $t + 1 ))
done
done
done

One thing that is going to be killing you here is the sheer number of processes being created. Especially when they are doing the exact same thing.
Consider doing the sed -n ${n}p $input once per loop iteration.
Also consider doing the equivalent of awk as a shell array assignment, then accessing the individual elements.
With these two things you should be able to get the 12 or so processes (and the shell invocation via back quotes) down to a single shell invocation and the backquote.
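As a rough sketch of that idea (the sample file and its six-column layout are assumptions inferred from the awk field numbers in the question, not the real arc files), the shell's own read can split each line into fields, so no sed or awk process is started per line:

```shell
#!/usr/bin/env bash
export LC_ALL=C   # keep "." as the decimal separator for printf

# Hypothetical sample input, one atom per line: natom symbol x y z number
input=$(mktemp)
printf '%s\n' '11 O 0.1 0.2 0.3 1' '5 H 1.0 1.1 1.2 0' '12 H 2.5 2.6 2.7 3' > "$input"

selected=()
# One pass over the file: read splits each line into fields,
# so no sed or awk process is started per line.
while read -r natom symbol x y z number _; do
    if (( natom > 10 && number > 0 )); then
        # printf is a builtin, so the formatting costs no extra process.
        printf -v line '%s %.10f %.10f %.10f' "$symbol" "$x" "$y" "$z"
        selected+=("$line")
    fi
done < "$input"

printf '%s\n' "${selected[@]}"
rm -f "$input"
```

In the real script the loop body would write to the config file instead of collecting lines in an array; the point is only that the file is read once rather than re-scanned by sed for every field.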

Obviously, Ed's advice is far preferable, but if you don't want to follow that, I had a couple of thoughts...
Thought 1
Rather than run echo 5 times and cat head.in onto the Glycine file, each of which causes the file to be opened, seeked (or sought maybe) to the end, and appended, you could do that in one go like this:
# Instead of
hmbi=3
echo "$hmbi" >> ~/Glycine_thing
echo "jobtype = energy" >> ~/Glycine_thing
echo "somethingelse" >> ~/Glycine_thing
echo ... >> ~/Glycine_thing
echo ... >> ~/Glycine_thing
cat ... >> ~/Glycine_thing
# Try this
{
echo "$hmbi"
echo "jobtype = energy"
echo "somethingelse"
echo ...
echo ...
cat head.in
} >> ~/Glycine_thing
# Or, better still, this
echo -e "$hmbi\njobtype = energy\nsomethingelse" >> ~/Glycine_thing
# Or, use a here-document, as suggested by @mklement0
cat <<EOF >> ~/Glycine_thing
$hmbi
jobtype = energy
next thing
EOF
Thought 2
Rather than invoke sed and awk 5 times to find 5 parameters, just let awk do what sed was doing, and also do all 5 things in one go:
read symbol x y z < <(awk '...{printf "%s %.10f %.10f %.10f\n", $2, $3, $4, $5}' "$input")
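Taking that one step further, a single awk call can handle a whole configuration's worth of lines, including the "--" header that the question emits before every third atom. This is only a sketch: the sample data, the start and length values, and the variable names are made up for illustration, and it assumes the same six-column layout the question's awk calls imply.

```shell
#!/usr/bin/env bash
input=$(mktemp)
# Hypothetical sample: natom symbol x y z number
printf '%s\n' \
  '1 C 0 0 0 1' \
  '11 O 0.1 0.2 0.3 1' \
  '12 H 1.0 1.1 1.2 1' \
  '13 H 2.0 2.1 2.2 1' \
  '14 O 3.0 3.1 3.2 1' > "$input"

start=1     # first line of this configuration
length=5    # lines per configuration (644 in the question)

# awk selects the line range, filters the atoms, and formats them in one process.
body=$(awk -v start="$start" -v len="$length" '
    NR < start        { next }
    NR >= start + len { exit }
    $1 > 10 && $6 > 0 {
        if (count % 3 == 0) { print "--"; print "0 1 0.4638" }
        printf "%s %.10f %.10f %.10f\n", $2, $3, $4, $5
        count++
    }' "$input")

printf '%s\n' "$body"
rm -f "$input"
```

The output of such a call could be appended to the config file in one redirection, between the head.in and tail.in parts.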


How do I create large CSVs in seconds?

I am trying to create 1000s of large CSVs rapidly. This function generates the CSVs:
function csvGenerator () {
for ((i=1; i<=$NUMCSVS; i++)); do
CSVNAME=$DIRNAME"-"$CSVPREFIX$i$CSVEXT
HEADERARRAY=()
if [[ ! -e $CSVNAME ]]; then #Only create csv file if it not exist
touch $CSVNAME
echo "file: "$CSVNAME "created at $(date)" >> ../status.txt
fi
for ((j=1; j<=$NUMCOLS; j++)); do
if (( j < $NUMCOLS )) ; then
HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j", "
elif (( j == $NUMCOLS )) ; then
HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j
fi
HEADERARRAY+=$HEADERNAME
done
echo $HEADERARRAY > $CSVNAME
for ((k=1; k<=$NUMROWS; k++)); do
ROWARRAY=()
for ((l=1; l<=$NUMCOLS; l++)); do
if (( l < $NUMCOLS )) ; then
ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l", "
elif (( l == $NUMCOLS )) ; then
ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l
fi
ROWARRAY+=$ROWVALUE
done
echo $ROWARRAY >> $CSVNAME
done
done
}
The script takes ~3 mins to generate a CSV with 100k rows and 70 cols. What do I need to do to generate these CSVs at the rate of 1 CSV/~10 seconds?
Let me start by saying that bash and "performant" don't usually go together in the same sentence. As other commentators suggested, awk may be a good choice that's adjacent in some senses.
I haven't yet had a chance to run your code, but it opens and closes the output file once per row — in this example, 100,000 times. Each time it must seek to the end of the file so that it can append the latest row.
Try pulling the actual generation (everything after for ((j=1; j<=$NUMCOLS; j++)); do) into a new function, like generateCsvContents. In that new function, don't reference $CSVNAME, and remove the redirections on the echo statements. Then, in the original function, call the new function and redirect its output to the filename. Roughly:
function csvGenerator () {
for ((i=1; i<=NUMCSVS; i++)); do
CSVNAME=$DIRNAME"-"$CSVPREFIX$i$CSVEXT
if [[ ! -e $CSVNAME ]]; then #Only create csv file if it not exist
echo "file: $CSVNAME created at $(date)" >> ../status.txt
fi
# This will create $CSVNAME if it doesn't yet exist
generateCsvContents > "$CSVNAME"
done
}
function generateCsvContents() {
HEADERARRAY=()
for ((j=1; j<=NUMCOLS; j++)); do
if (( j < NUMCOLS )) ; then
HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j", "
elif (( j == NUMCOLS )) ; then
HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j
fi
HEADERARRAY+=$HEADERNAME
done
echo $HEADERARRAY
for ((k=1; k<=NUMROWS; k++)); do
ROWARRAY=()
for ((l=1; l<=NUMCOLS; l++)); do
if (( l < NUMCOLS )) ; then
ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l", "
elif (( l == NUMCOLS )) ; then
ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l
fi
ROWARRAY+=$ROWVALUE
done
echo "$ROWARRAY"
done
}
"Not this way" is I think the answer.
There are a few problems here.
You're not using your arrays as arrays. When you treat them like strings, you affect only the first element in the array, which is misleading.
The way you're using >> causes the output file to be opened and closed once for every line. That's potentially wasteful.
You're not quoting your variables. In fact, you're quoting the stuff that doesn't need quoting, and not quoting the stuff that does.
Upper case variable names are not recommended, due to the risk of collision with system variables.
Bash isn't good at this. Really.
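To illustrate the first point, here is a small sketch (the array name cols is made up for the demo) of how += behaves with and without parentheses:

```shell
#!/usr/bin/env bash
cols=()
cols+=a        # string append: same as cols[0]+=a
cols+=b        # cols[0] is now "ab"; still a single element
echo "${#cols[@]} element(s): ${cols[0]}"

cols=()
cols+=(a)      # array append: adds a new element each time
cols+=(b)
echo "${#cols[@]} element(s): ${cols[*]}"

# Join the elements with ", " to build a CSV header line.
printf -v header '%s, ' "${cols[@]}"
header=${header%, }
echo "$header"
```

The original function happens to work because everything is glued into element 0 and echo prints that element, but nothing array-like is going on.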
A cleaned up version of your function might look like this:
csvGenerator2() {
for (( i=1; i<=NUMCSVS; i++ )); do
CSVNAME="$DIRNAME-$CSVPREFIX$i$CSVEXT"
# Only create csv file if it not exist
[[ -e "$CSVNAME" ]] && continue
touch "$CSVNAME"
date "+[%F %T] created: $CSVNAME" | tee -a status.txt >&2
HEADER=""
for (( j=1; j<=NUMCOLS; j++ )); do
printf -v HEADER '%s, %s-csv-%s-header-%s' "$HEADER" "$DIRNAME" "$i" "$j"
done
echo "${HEADER#, }" > "$CSVNAME"
for (( k=1; k<=NUMROWS; k++ )); do
ROW=""
for (( l=1; l<=NUMCOLS; l++ )); do
printf -v ROW '%s, %s-csv-%s-r%sc%s' "$ROW" "$DIRNAME" "$i" "$k" "$l"
done
echo "${ROW#, }"
done >> "$CSVNAME"
done
}
(Note that I haven't switched the variables to lower case because I'm lazy, but it's still a good idea.)
And if you were to make something functionally equivalent in awk:
csvGenerator3() {
awk -v NUMCSVS="$NUMCSVS" -v NUMCOLS="$NUMCOLS" -v NUMROWS="$NUMROWS" -v DIRNAME="$DIRNAME" -v CSVPREFIX="$CSVPREFIX" -v CSVEXT="$CSVEXT" '
BEGIN {
for ( i=1; i<=NUMCSVS; i++) {
out=sprintf("%s-%s%s%s", DIRNAME, CSVPREFIX, i, CSVEXT)
if (!system("test -e " out)) continue
system("date '\''+[%F %T] created: " out "'\'' | tee -a status.txt >&2")
comma=""
for ( j=1; j<=NUMCOLS; j++ ) {
printf "%s%s-csv-%s-header-%s", comma, DIRNAME, i, j > out
comma=", "
}
printf "\n" >> out
for ( k=1; k<=NUMROWS; k++ ) {
comma=""
for ( l=1; l<=NUMCOLS; l++ ) {
printf "%s%s-csv-%s-r%sc%s", comma, DIRNAME, i, k, l >> out
comma=", "
}
printf "\n" >> out
}
}
}
'
}
Note that awk does not suffer from the same open/close overhead mentioned earlier with bash; when a file is used for output or as a pipe, it gets opened once and is left open until it is closed.
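One caveat worth sketching: because awk leaves every output file open, a run that creates thousands of files may hit the limit on open files in some awk implementations. A minimal sketch, assuming a made-up scratch directory and file names, that calls close() once each file is finished:

```shell
#!/usr/bin/env bash
outdir=$(mktemp -d)   # hypothetical scratch directory for the demo

awk -v n=3 -v rows=2 -v dir="$outdir" '
BEGIN {
    for (i = 1; i <= n; i++) {
        out = dir "/demo" i ".csv"
        print "header-" i > out        # first write opens (and truncates) the file
        for (k = 1; k <= rows; k++)
            print "r" k "c1" > out     # same open stream; the file is not reopened
        close(out)                     # release the file before starting the next one
    }
}'

created=$(ls "$outdir" | wc -l)
lines=$(wc -l < "$outdir/demo1.csv")
echo "files: $created, lines in demo1.csv: $lines"
rm -rf "$outdir"
```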
Comparing the two really highlights the choice you need to make:
$ time bash -c '. file; NUMCSVS=1 NUMCOLS=10 NUMROWS=100000 DIRNAME=2 CSVPREFIX=x CSVEXT=.csv csvGenerator2'
[2019-03-29 23:57:26] created: 2-x1.csv
real 0m30.260s
user 0m28.012s
sys 0m1.395s
$ time bash -c '. file; NUMCSVS=1 NUMCOLS=10 NUMROWS=100000 DIRNAME=3 CSVPREFIX=x CSVEXT=.csv csvGenerator3'
[2019-03-29 23:58:23] created: 3-x1.csv
real 0m4.994s
user 0m3.297s
sys 0m1.639s
Note that even my optimized bash version is only a little faster than your original code.
Refactoring your two inner for-loops to loops like this will save time:
for ((j=1; j<$NUMCOLS; ++j)); do
HEADERARRAY+=$DIRNAME"-csv-"$i"-header-"$j", "
done
HEADERARRAY+=$DIRNAME"-csv-"$i"-header-"$NUMCOLS

Bash Bug Fix: Reading a text file line by line and acting upon the lines

Here's what my script looks like
#!/bin/bash
touch input.txt
touch output.txt
seq -w 0 999 >>input.txt
input=$(cat input.txt)
for i in $input
do
if [ $(($i%2)) -eq 0 ]; then
echo $i "even" >> output.txt
else
echo $i "odd" >> output.txt
fi
done
Here's the result of running the script and viewing the output.txt file created
000 even
001 odd
002 even
003 odd
004 even
005 odd
006 even
007 odd
I would like the script to do this for all 1,000 lines of the file, but I get an error message on line 9 saying
./tester.sh: line 9: 008: value too great for base (error token is "008")
My end goal is for the script to add each number on a line, and then tell if the number is even or odd, outputting to output.txt for all 1000 lines of the file.
End goal output file:
000 even
001 odd
002 even
003 odd
...
505 even
506 odd
507 even
508 odd
...
998 even
999 odd
From 000 all the way to 999
Use seq as seq and use printf to print your number in the format you like.
Bash arithmetic expansion interprets strings with leading zeros as octal numbers. You can force base 10 by prefixing the number with 10#, as in (( 10#$i % 2 )).
for i in $input
do
if [ $(( 10#$i % 2)) -eq 0 ]; then
echo $i "even" >> output.txt
else
echo $i "odd" >> output.txt
fi
done
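A quick way to see the difference, as a minimal sketch (the variable names are only for the demo):

```shell
#!/usr/bin/env bash
plain=$(( 010 ))        # a leading zero means octal, so this is 8
forced=$(( 10#010 ))    # the 10# prefix forces base 10, so this is 10
echo "$plain $forced"

# 008 is not a valid octal number, so only the 10# form can evaluate it.
i=008
if (( 10#$i % 2 == 0 )); then parity=even; else parity=odd; fi
echo "$i $parity"
```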
Keep in mind that the arithmetic command (( .. )) can do comparisons. It's clearer to write if (( 10#$i % 2 == 0 )); then.
I find printf "%03d" "$i" to be just clearer in this case.
No need to touch a file before >>; the redirection should create the file automatically (this can be turned off with some bash set -o option, but I haven't seen anyone use it).
input=$(cat ...); for i in $input is just bad. Don't read lines with for.
I don't like temp files.
How to read file line by line.
Your script is just:
seq 0 999 | xargs -I{} bash -c 'printf "%03d " "$1"; (( $1 % 2 == 0 )) && echo even || echo odd' _ {} >>output.txt
If you prefer while read:
seq 0 999 | while IFS= read -r num; do
printf "%03d " "$num";
if (( num % 2 == 0 )); then
echo even
else
echo odd
fi
done >>output.txt
Or if you have to have your input.txt file containing 000\n001\n002\n and so on it's time for a tee:
seq -w 0 999 | tee -a input.txt | while IFS= read -r num; do
echo -n "$num "
if (( 10#$num % 2 == 0 )); then
echo even
else
echo odd
fi
done >>output.txt
This is a skeleton code for reading a text file line by line and acting upon the lines... Fill the missing part according to your own needs.
#!/bin/bash
{
while read -r line; do
if (( line % 2 == 0 )); then
# ...
else
# ...
fi
done < input.txt
} > output.txt
You may also apply pre-processing to the input file with <(cmd ...) notation:
#!/bin/bash
{
while read -r line; do
...
done < <(arbitrary-cmd input.txt | another-cmd ... )
} > output.txt
The pipeline form below looks nicer, but it runs the while loop in a subshell, so the code inside the while block cannot modify variables defined outside it, should you have any.
#!/bin/bash
{
arbitrary-cmd input.txt | another-cmd ... | while read -r line; do
...
done
} > output.txt
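To see the difference between the two forms, here is a minimal sketch with made-up variable names. It assumes bash defaults (the lastpipe option is not enabled):

```shell
#!/usr/bin/env bash
piped=0
printf '%s\n' 1 2 3 | while read -r n; do
    piped=$(( piped + n ))               # runs in a subshell; the change is lost
done

substituted=0
while read -r n; do
    substituted=$(( substituted + n ))   # runs in the current shell
done < <(printf '%s\n' 1 2 3)

echo "pipeline: $piped, process substitution: $substituted"
```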
Your script could be something like
#!/bin/ksh
input=$(seq -w 0 999)
for i in $input
do
if [ $(($i%2)) -eq 0 ];then
echo $i "even" >> output.txt
else
echo $i "odd" >> output.txt
fi
done
then your output will be something like
000 even
001 odd
002 even
003 odd
004 even
005 odd
006 even
007 odd
Then you could grep "even" or "odd" and execute what you need or you could execute your command directly inside the if/else statement.
This works too:
#!/bin/bash
for x in `seq -w 0 999`; do
if [ $((10#$x%2)) == 0 ]; then
echo $x even
else
echo $x odd
fi
done > output.txt

Use of associative array with a variable name under bash 4.1

I am trying to parse multiple files like this under bash-4.1
$cat hostname_abc.txt
host_type type_foo
SoftA version123
SoftB version456
to obtain output showing how many times each version of Soft[A,B] is used, grouped by host type:
$./list_versions.sh
[type_foo] 11 times
SoftA:
[version123] 1 times
[version444] 5 times
[version567] 5 times
SoftB:
[version456] 9 times
[version777] 2 times
[type_bar] 6 times
SoftA:
[version444] 6 times
SoftB:
[version111] 4 times
[version777] 2 times
I don't know in advance the list of host_type and the versions.
So I tried to store the count of each host_type in an associative array, and to dynamically create the names of the associative arrays that store the count of each version of Soft[A,B] per host_type, based on the template host_type_Soft[A,B].
I tried many variations of syntax and indirection, so below is a more readable script that shows what I am aiming for:
#!/usr/bin/env bash
# ----- generated test conditions -----
echo -e "host_type typeA\nSoftA v2\nSoftB v1" > hostname_1.txt
echo -e "host_type typeB\nSoftA v1\nSoftB v1" > hostname_2.txt
echo -e "host_type typeB\nSoftA v1\nSoftB v0" > hostname_3.txt
echo -e "host_type typeA\nSoftA v0\nSoftB v0" > hostname_4.txt
echo -e "host_type typeA\nSoftA v3\nSoftB v2" > hostname_5.txt
echo -e "host_type typeB\nSoftA v3\nSoftB v1" > hostname_6.txt
echo -e "host_type typeB\nSoftA v2\nSoftB v2" > hostname_7.txt
echo -e "host_type typeA\nSoftA v1\nSoftB v2" > hostname_8.txt
echo -e "host_type typeC\nSoftA v0\nSoftB v4" > hostname_9.txt
list_hostname() {
for i in {1..9}; do
echo "hostname_${i}.txt"
done
}
declare -A list_host_type
while read f; do
#parse the hostname files
while read l; do
[[ $l = *"host_type"* ]] && host_type="$( echo $l | cut -d' ' -f2)"
[[ $l = *"SoftA"* ]] && versionA="$( echo $l | cut -d' ' -f2)"
[[ $l = *"SoftB"* ]] && versionB="$( echo $l | cut -d' ' -f2)"
done < <( cat "$f" )
#count the number of hosts by host_type
[[ ${list_host_type[$host_type]} ]] && ((list_host_type[$host_type]++)) || list_host_type[$host_type]='1'
#create associative arrays with a name only know at runtime
declare -A "${host_type}_SoftA"
declare -A "${host_type}_SoftB"
#count the number of host for the couple host_type and Soft[A,B], stored on the dynamically named assiociative array
[[ ${${host_type}_SoftA[$versionA]} ]] && ((${host_type}_SoftA[$versionA]++)) || ${host_type}_SoftA[$versionA]='1'
[[ ${${host_type}_SoftB[$versionB]} ]] && ((${host_type}_SoftB[$versionB]++)) || ${host_type}_SoftB[$versionB]='1'
done < <( list_hostname )
#print a non pretty-formated output
echo '==== result ====='
for m in "${!list_host_type[@]}"; do
echo "host type: $m count: ${list_model[$m]}"
for versionA in "${!${m}_softA[@]}"; do
echo " SoftA version: $versionA count: ${${m}_SoftA[$versionA]}"
done
for versionB in "${!${m}_softB[@]}"; do
echo " SoftB version: $versionB count: ${${m}_SoftB[$versionB]}"
done
done
I know there are other methods to achieve my goal, but I want to know if I can use associative arrays this way with bash-4.1.
I don't think you can use dynamic variable names with arrays in Bash.
(I tried a few things but couldn't figure out the syntax.)
Even if possible, I think it would be extremely difficult to understand.
A possible workaround could be using a single associative array,
with "composite keys".
That is, for example use a comma separated value of host type, soft and version:
while read f; do
line=0
while read col1 col2; do
if [[ $line = 0 ]]; then
host_type=$col2
else
soft=$col1
version=$col2
index=$host_type,$soft,$version
((list_host_type[$index]++))
fi
((line++))
done < <( cat "$f" )
done < <( list_hostname )
for m in "${!list_host_type[@]}"; do
echo $m = ${list_host_type[$m]}
done
For your sample data this would produce:
typeA,SoftA,v2 = 1
typeA,SoftA,v3 = 1
typeA,SoftA,v0 = 1
typeA,SoftA,v1 = 1
typeB,SoftA,v3 = 1
typeB,SoftA,v2 = 1
typeB,SoftA,v1 = 2
typeA,SoftB,v2 = 2
typeA,SoftB,v1 = 1
typeA,SoftB,v0 = 1
typeC,SoftB,v4 = 1
typeB,SoftB,v2 = 1
typeB,SoftB,v0 = 1
typeB,SoftB,v1 = 2
typeC,SoftA,v0 = 1
And then work with this associative array to compute the statistics you need. Here's a rough example implementation:
get_host_types() {
local names=("${!list_host_type[@]}")
printf "%s\n" "${names[@]%%,*}" | sort -u
}
get_soft() {
local host_type=$1
local names=("${!list_host_type[@]}")
for name in "${names[@]}"; do
[[ ${name%%,*} = $host_type ]] && echo $name
done | cut -d, -f2 | sort -u
}
get_versions() {
local prefix=$1
local names=("${!list_host_type[@]}")
for name in "${names[@]}"; do
[[ ${name%,*} = $prefix ]] && echo $name
done | cut -d, -f3 | sort -u
}
indent=" "
for host_type in $(get_host_types); do
echo "[$host_type]"
for soft in $(get_soft $host_type); do
echo "$indent$soft:"
for version in $(get_versions $host_type,$soft); do
index=$host_type,$soft,$version
echo "$indent$indent[$version] ${list_host_type[$index]} times"
done
done
done
Producing as output:
[typeA]
SoftA:
[v0] 1 times
[v1] 1 times
[v2] 1 times
[v3] 1 times
SoftB:
[v0] 1 times
[v1] 1 times
[v2] 2 times
[typeB]
SoftA:
[v1] 2 times
[v2] 1 times
[v3] 1 times
SoftB:
[v0] 1 times
[v1] 2 times
[v2] 1 times
[typeC]
SoftA:
[v0] 1 times
SoftB:
[v4] 1 times
All in all, it would be better to implement this using a proper programming language.
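As one hedged illustration of that last point, awk can do the counting with the same composite-key idea in a single pass. This sketch assumes each file starts with its host_type line, as in the sample data, and uses a made-up scratch directory holding three of the sample files:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)    # hypothetical scratch directory with three sample files
printf 'host_type typeA\nSoftA v2\nSoftB v1\n' > "$dir/hostname_1.txt"
printf 'host_type typeB\nSoftA v1\nSoftB v1\n' > "$dir/hostname_2.txt"
printf 'host_type typeB\nSoftA v1\nSoftB v0\n' > "$dir/hostname_3.txt"

report=$(awk '
    $1 == "host_type" { host = $2; next }            # remember the current host type
    NF == 2           { count[host "," $1 "," $2]++ } # key: host_type,soft,version
    END { for (key in count) print key " = " count[key] }
' "$dir"/hostname_*.txt | sort)

printf '%s\n' "$report"
rm -rf "$dir"
```

The order in which awk walks an array is unspecified, which is why the result is piped through sort.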

To Continuously loop using for in shell scripting

for m in $count
do
`cat $op ${arr[$m]} > $op1`
`rm -f $op`
`touch $op`
`cat $op1 ${arr[$m+1]} > $op`
if [ $m ge $count ]; then
`rm -f $op1`
`touch $op1`
fi
m=$((m+1))
done
I want to loop continuously from a start count of 2 to an end count of 10; $count is 10 here. But the code above executes the for loop only once.
Rainy sunday - having much free time - long answer ;)
There are many issues with your script, so here are some recommended solutions. Because you used the construction m=$((m+1)), I will assume bash as the "shell". (Consider adding the bash tag.)
For the cycle - several possibilities
count=10
m=2 #start with 2
while (( $m <= $count )) #while m is less or equal to 10
do #do
echo $m #this action
let m++ #increment m (add one to m)
done #end of while
or, if the count is a constant (not a variable), you can write
for m in {2..10} #REMEMBER: this does not work with variables, like {2..$count}
do
echo "$m"
done
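A small sketch of why the variable form fails: brace expansion happens before parameter expansion, so {2..$count} is left as a single literal word. The variable names here are only for the demo:

```shell
#!/usr/bin/env bash
count=4

# The braces are not expanded, so the loop runs once with the literal word.
literal=$(for m in {2..$count}; do echo "$m"; done)
echo "$literal"

# The C-style loop evaluates the variable as expected.
expanded=$(for (( m = 2; m <= count; m++ )); do printf '%s ' "$m"; done)
echo "$expanded"
```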
another variant - using the seq (man seq) command for counting
for m in $(seq 2 ${count:=10}) # ${count:=10} - defaults the $count to 10 if it is undefined
do
echo $m
done
or C-like for loop
let count=10
for ((m=2; m<=count; m++))
do
echo $m
done
All 4 loops produce:
2
3
4
5
6
7
8
9
10
So now we have a correct loop. Now add your specific actions.
The:
rm -f $op
touch $op
can be replaced by one command
echo -n > $op #echo nothing and write the "nothing" into the file
it is faster, because echo is a bash builtin (it doesn't start two external commands)
So your actions could look like
cat $op ${arr[$m]} > $op1
echo -n > $op
cat $op1 ${arr[$m+1]} > $op
In this case the echo is useless, because the second cat writes its output to $op anyway (and the > redirection truncates the file to zero size before writing), so the following is identical to the above:
cat $op ${arr[$m]} > $op1
cat $op1 ${arr[$m+1]} > $op
Those two cat commands can be shortened to one, using bash's >> append-to-file redirection
cat ${arr[$m]} ${arr[m+1]} >> $op
The whole script could look like this:
#making a testing environment
for f in $(seq 12) #create 12 files opdata-N
do
arr[$f]="opdata-$f" #store the filenames in the array "arr"
echo "data-$f" > ${arr[$f]} #each file contains one line "data-N"
done
#echo ${arr[@]}
#setting the $op and $op1 filenames
#consider choosing more descriptive variable names
op="file_op"
#op1="file_op1" #not needed
#add some initial (old) value to $op
echo "initial value" > $op
#end of creating the testing environment
#the script
count=10
for m in $(seq 2 $count)
do
cat ${arr[$m]} ${arr[m+1]} >> $op
done
at the end, file $op will contain:
initial value
data-2
data-3
data-3
data-4
data-4
data-5
data-5
data-6
data-6
data-7
data-7
data-8
data-8
data-9
data-9
data-10
data-10
data-11
BTW, are you sure about the result? Because if you only want to add file-2 .. file-10 to the end of $op (without duplicating entries), you can simply write:
cat file-{2..10} >> $op #the '>>' adds to the end of file...
or by using your array:
startpos=2
count=10
cat ${arr[@]:$startpos:$count} >> $op
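A minimal sketch of how that slice syntax behaves, using a made-up six-element array. Note that the third field is a length rather than an end index, so it has to be computed if you are thinking in terms of "first" and "last":

```shell
#!/usr/bin/env bash
arr=()
for f in 1 2 3 4 5 6; do
    arr[f]="opdata-$f"      # indices start at 1, as in the test setup above
done

startpos=2
last=4    # hypothetical last index wanted
# ${arr[@]:offset:length}: both fields are arithmetic expressions.
slice=("${arr[@]:startpos:last-startpos+1}")
echo "${slice[@]}"
```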
Ufff.. ;)
Ps: usually it is good practice to enclose variables in double quotes like "$filename" - in the above examples for better readability I omitted them.
Any loop needs a "condition to keep looping". When you use a
for m in $count
type of loop, the condition is "if there are more elements in the collection count, pick the next one and keep going". This doesn't seem to be what you want. You are looking for the bash equivalent of
for(m = 0; m < 10; m++)
I think. The best way to do this is - with exactly that kind of loop (but note - an extra pair of parentheses, and a semicolon):
#!/bin/bash
# Display message 5 times
for ((i = 0 ; i < 5 ; i++)); do
echo "Welcome $i times."
done
see nix craft for original
I think you can extend this to your situation… if I understood your question correctly you need something like this:
for ((m = 2; m <= 10; m++))
do
cat $op ${arr[$m]} > $op1
rm -f $op
touch $op
cat $op1 ${arr[$m+1]} > $op
if [ $m -ge $count ]; then
rm -f $op1
touch $op1
fi
done
Use a while loop instead.
The for loop is when you have multiple objects to iterate against. You have only one, i.e. $count.

Comparison between array items

I've written a script to calculate the bandwidth usage of an OpenVZ container over time and suspend it if it uses too much too quickly. Here is the script so far:
#!/bin/bash
# Thresholds are in bytes per second
LOGDIR="/var/log/outbound_ddos"
THRESHOLD1=65536
THRESHOLD2=117964
while [ 1 ]
do
for veid in $(/usr/sbin/vzlist -o veid -H)
do
# Create the log file if it doesn't already exist
if ! test -e $LOGDIR/$veid.log; then
touch $LOGDIR/$veid.log
fi
# Parse out the inbound/outbound traffic and assign them to the corresponding variables
eval $(/usr/sbin/vzctl exec $veid "grep venet0 /proc/net/dev" | \
awk -F: '{print $2}' | awk '{printf"CTOUT=%s\n", $9}')
# Print the output and a timestamp to a log file
echo $(date +%s) $CTOUT >> $LOGDIR/$veid.log
# Read last 10 entries into arrays
i=0
tail $LOGDIR/$veid.log | while read time byte
do
times[i]=$time
bytes[i]=$byte
let ++i
done
# Time checks & calculations for higher threshold
counter=0
for (( i=0; i<9; i++ ))
do
# If we have roughly the right timestamp
if (( times[9-i] < times[8-i] + 20 ))
then
# If the user has gone over the threshold
if (( bytes[9-i] > bytes[8-i] + THRESHOLD2 * 10 ))
then let ++counter
fi
fi
done
# Now check counter
if (( counter == 9 ))
then vzctl stop $veid
fi
# Same for lower threshold
counter=0
for (( i=0; i<3; i++ ))
do
# If we have roughly the right timestamp
if (( times[3-i] < times[2-i] + 20 ))
then
# If the user has gone over the threshold
if (( bytes[3-i] > bytes[2-i] + THRESHOLD1 * 10 ))
then let ++counter
fi
fi
done
# Now check counter
if (( counter == 2 ))
then vzctl stop $veid
fi
done
sleep 10
done
I've checked the numbers in /var/log/outbound_ddos/vm101.log and they're increasing by more than the threshold, but nothing is happening.
I added some echo statements to try and figure out where the problem is and it seems to be this comparison that's returning false:
if (( bytes[9-i] > bytes[8-i] + THRESHOLD2 * 10 ))
So then I tried the following, which printed out nothing:
echo ${bytes[9-i]}
Could anyone point me in the right direction? I think the script is nearly done, probably something very simple.
Your shell runs the while read loop in a subshell, because the loop is the right-hand side of a pipeline, so your array assignments do not propagate outside the tail | while construct.
Feed the loop from a redirection or process substitution instead of a pipe, and fix accordingly :-)
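A minimal sketch of that fix, assuming a made-up log file in place of $LOGDIR/$veid.log: reading from a process substitution keeps the loop in the current shell, so times and bytes survive the loop.

```shell
#!/usr/bin/env bash
log=$(mktemp)    # hypothetical stand-in for $LOGDIR/$veid.log
printf '%s\n' '1000 500' '1010 900' '1020 1500' > "$log"

times=()
bytes=()
i=0
# Redirecting from <(...) keeps the loop in the current shell,
# so the arrays are still populated after "done".
while read -r time byte; do
    times[i]=$time
    bytes[i]=$byte
    (( ++i ))
done < <(tail "$log")

echo "read $i entries; last byte count: ${bytes[i-1]}"
rm -f "$log"
```

The later threshold checks index the arrays as if ten entries were always present, so they would also need to allow for a log that has fewer lines than that.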
