I am trying to create 1000s of large CSVs rapidly. This function generates the CSVs:
function csvGenerator () {
    for ((i=1; i<=$NUMCSVS; i++)); do
        CSVNAME=$DIRNAME"-"$CSVPREFIX$i$CSVEXT
        HEADERARRAY=()
        if [[ ! -e $CSVNAME ]]; then # Only create csv file if it doesn't exist
            touch $CSVNAME
            echo "file: "$CSVNAME "created at $(date)" >> ../status.txt
        fi
        for ((j=1; j<=$NUMCOLS; j++)); do
            if (( j < $NUMCOLS )) ; then
                HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j", "
            elif (( j == $NUMCOLS )) ; then
                HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j
            fi
            HEADERARRAY+=$HEADERNAME
        done
        echo $HEADERARRAY > $CSVNAME
        for ((k=1; k<=$NUMROWS; k++)); do
            ROWARRAY=()
            for ((l=1; l<=$NUMCOLS; l++)); do
                if (( l < $NUMCOLS )) ; then
                    ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l", "
                elif (( l == $NUMCOLS )) ; then
                    ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l
                fi
                ROWARRAY+=$ROWVALUE
            done
            echo $ROWARRAY >> $CSVNAME
        done
    done
}
The script takes ~3 mins to generate a CSV with 100k rows and 70 cols. What do I need to do to generate these CSVs at the rate of 1 CSV/~10 seconds?
Let me start by saying that bash and "performant" don't usually go together in the same sentence. As other commenters suggested, awk, which sits close to shell scripting in spirit, may be a better fit for this job.
I haven't yet had a chance to run your code, but it opens and closes the output file once per row — in this example, 100,000 times. Each time it must seek to the end of the file so that it can append the latest row.
Try pulling the actual generation (everything after for ((j=1; j<=$NUMCOLS; j++)); do) into a new function, like generateCsvContents. In that new function, don't reference $CSVNAME, and remove the redirections on the echo statements. Then, in the original function, call the new function and redirect its output to the filename. Roughly:
function csvGenerator () {
    for ((i=1; i<=NUMCSVS; i++)); do
        CSVNAME=$DIRNAME"-"$CSVPREFIX$i$CSVEXT
        if [[ ! -e $CSVNAME ]]; then # Only log creation if the file doesn't exist yet
            echo "file: $CSVNAME created at $(date)" >> ../status.txt
        fi
        # This will create $CSVNAME if it doesn't yet exist
        generateCsvContents > "$CSVNAME"
    done
}
function generateCsvContents() {
    HEADERARRAY=()
    for ((j=1; j<=NUMCOLS; j++)); do
        if (( j < NUMCOLS )) ; then
            HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j", "
        elif (( j == NUMCOLS )) ; then
            HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j
        fi
        HEADERARRAY+=$HEADERNAME
    done
    echo $HEADERARRAY
    for ((k=1; k<=NUMROWS; k++)); do
        ROWARRAY=()
        for ((l=1; l<=NUMCOLS; l++)); do
            if (( l < NUMCOLS )) ; then
                ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l", "
            elif (( l == NUMCOLS )) ; then
                ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l
            fi
            ROWARRAY+=$ROWVALUE
        done
        echo "$ROWARRAY"
    done
}
"Not this way" is I think the answer.
There are a few problems here.
You're not using your arrays as arrays. When you treat them like strings, you affect only the first element in the array, which is misleading.
The way you're using >> causes the output file to be opened and closed once for every line. That's potentially wasteful.
You're not quoting your variables. In fact, you're quoting the stuff that doesn't need quoting, and not quoting the stuff that does.
Upper case variable names are not recommended, due to the risk of collision with system and environment variables (see the example after this list).
Bash isn't good at this. Really.
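To see why that collision risk matters, here's a contrived example (the assigned value is made up; PATH is a real environment variable):

PATH="some value"   # silently clobbers the real PATH
ls                  # now fails in a fresh shell: ls: command not found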
A cleaned up version of your function might look like this:
csvGenerator2() {
    for (( i=1; i<=NUMCSVS; i++ )); do
        CSVNAME="$DIRNAME-$CSVPREFIX$i$CSVEXT"

        # Only create csv file if it doesn't exist
        [[ -e "$CSVNAME" ]] && continue

        touch "$CSVNAME"
        date "+[%F %T] created: $CSVNAME" | tee -a status.txt >&2

        HEADER=""
        for (( j=1; j<=NUMCOLS; j++ )); do
            printf -v HEADER '%s, %s-csv-%s-header-%s' "$HEADER" "$DIRNAME" "$i" "$j"
        done
        echo "${HEADER#, }" > "$CSVNAME"

        for (( k=1; k<=NUMROWS; k++ )); do
            ROW=""
            for (( l=1; l<=NUMCOLS; l++ )); do
                printf -v ROW '%s, %s-csv-%s-r%sc%s' "$ROW" "$DIRNAME" "$i" "$k" "$l"
            done
            echo "${ROW#, }"
        done >> "$CSVNAME"
    done
}
(Note that I haven't switched the variables to lower case because I'm lazy, but it's still a good idea.)
And if you were to make something functionally equivalent in awk:
csvGenerator3() {
    awk -v NUMCSVS="$NUMCSVS" -v NUMCOLS="$NUMCOLS" -v NUMROWS="$NUMROWS" \
        -v DIRNAME="$DIRNAME" -v CSVPREFIX="$CSVPREFIX" -v CSVEXT="$CSVEXT" '
        BEGIN {
            for ( i=1; i<=NUMCSVS; i++ ) {
                out=sprintf("%s-%s%s%s", DIRNAME, CSVPREFIX, i, CSVEXT)
                if (!system("test -e " out)) continue
                system("date '\''+[%F %T] created: " out "'\'' | tee -a status.txt >&2")
                comma=""
                for ( j=1; j<=NUMCOLS; j++ ) {
                    printf "%s%s-csv-%s-header-%s", comma, DIRNAME, i, j > out
                    comma=", "
                }
                printf "\n" >> out
                for ( k=1; k<=NUMROWS; k++ ) {
                    comma=""
                    for ( l=1; l<=NUMCOLS; l++ ) {
                        printf "%s%s-csv-%s-r%sc%s", comma, DIRNAME, i, k, l >> out
                        comma=", "
                    }
                    printf "\n" >> out
                }
            }
        }
    '
}
Note that awk does not suffer from the same open/close overhead mentioned earlier with bash; when a file is used for output or as a pipe, it gets opened once and is left open until it is explicitly closed or the program exits.
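A quick illustration of that claim (a standalone sketch; demo.txt is just a scratch filename):

# awk opens demo.txt once on the first print and keeps the descriptor open;
# close() (or program exit) is what finally closes it
seq 3 | awk '{ print $0 > "demo.txt" } END { close("demo.txt") }'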
Comparing the two really highlights the choice you need to make:
$ time bash -c '. file; NUMCSVS=1 NUMCOLS=10 NUMROWS=100000 DIRNAME=2 CSVPREFIX=x CSVEXT=.csv csvGenerator2'
[2019-03-29 23:57:26] created: 2-x1.csv
real 0m30.260s
user 0m28.012s
sys 0m1.395s
$ time bash -c '. file; NUMCSVS=1 NUMCOLS=10 NUMROWS=100000 DIRNAME=3 CSVPREFIX=x CSVEXT=.csv csvGenerator3'
[2019-03-29 23:58:23] created: 3-x1.csv
real 0m4.994s
user 0m3.297s
sys 0m1.639s
Note that even my optimized bash version is only a little faster than your original code.
Refactoring your two inner for-loops to loops like this will save time:
for ((j=1; j<$NUMCOLS; ++j)); do
    HEADERARRAY+=$DIRNAME"-csv-"$i"-header-"$j", "
done
HEADERARRAY+=$DIRNAME"-csv-"$i"-header-"$NUMCOLS
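The row loop follows the same pattern; a sketch using the same variable names as the original:

for ((l=1; l<$NUMCOLS; ++l)); do
    ROWARRAY+=$DIRNAME"-csv-"$i"-r"$k"c"$l", "
done
ROWARRAY+=$DIRNAME"-csv-"$i"-r"$k"c"$NUMCOLS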
Like the title implies, I'm currently having trouble debugging a .bash file. As you can tell, I'm still very new to bash and unix in general. The input file that hw3.bash executes on contains the following strings in two columns:
jack 80
mary 95
michael 60
jeffrey 90
The script will print out the names only, then scores only, then highest score, then lowest, then rank, etc.
Both lines 72 and 150 are the hiccups in the program that I cannot seem to debug. Is this an indentation issue or simply a grammatical one?
line 72: unexpected EOF while looking for matching `''
line 150: syntax error: unexpected end of file
#1.The first positional argument ($1) provides the file name. Assign it to a
# variable named inputfile.
inputfile=$1
#2.Test if the variable inputfile refers to a file. If not, print out a proper
# message and exit with an exit code of 1.
if [[ -e inputfile ]]; then
echo "inputfile is not a file'
exit 1
fi
#3.Suppose each line of the input file is made up of the first name of a
#student followed by a score (the first name and score are separate by a
# space). Put the first names in the input file into an array called names.
record=( $(cat $inputfile | awk 'END{print NR}' ))
names=( $(cat $inputfile | awk'{print $1}'))
echo ${names[*]}
#4.Using a similar approach, put the corresponding scores into an
# array called scores.
scores=( $(cat $inputfile | awk'{print $2}'))
echo ${scores[*]}
#5.Print names and corresponding scores (i.e. output should look the same as
# the input file). You must use a loop to do this.
names=( $(cat $inputfile | awk'{print $1}'))
echo${names[*]}
scores=( $(cat $inputfile | awk'{print $2}'))
echo${names[*]}
for((i=0;i<${names[*] && $i<scores[*];i++))
do
echo{$names[i]} {$scores[*]}
done
#6.Find the highest scorer, and print the name and the score. You may assume
#that there is only one highest score.
maxVal=1
maxIndex=1
for(( i=0; i<$#names[*]} && i<${#scores[*]}; i++ ))
do
if [[ (-n ${scores[$i]}) ]]
then
if(( ${scores[$i]} > $maxValue ))
then
maxVal=${scores[$i]}
maxIndex=$i
fi
fi
done
echo "Highest Scorer:" ${names[$maxIndex]} " " $maxVal
#7.Find the lowest scorer, and print the name and the score. You may assume
#that there is only one lowest score.
minVal=10000
maxIndex=1
for(( i=0; i<${#names[*]} && i<${#scores[*]}; i++ ))
do
if [[ (-n ${scores[$i]}) ]]
then
if(( $minVal > ${scores[i]} ))
then
minVal=${scores[$i]}
minIndex=$i;
fi
done
echo "Lower Scorer:' ${names[$minIndex]} " " $minVal
#8.Calculate the average of all scores, and print it.
avg=0
total=0;
for(( i=0;i<${#names[*]} && i<${$#scores[*]};i++ ))
do
if [[ (-n ${scores[$i]} ]]
then
total=$(( $total + ${scores[$i]} ))
fi
#9.Sort the arrays in terms of scores (in descending order). The final
#position of each name in the array names must match with the position of the
# corresponding score in the array scores.
m=${names[*]}
n=${scores[*]}
for(( i=0; i<$n && i<$m; i++ ))
do
maxValue=1
maxIndex=0
for(( i=0; j<$n && j<$m; j++))
do
if [[ !(-z ${scores[$j]} ) ]]
then
if (( $maxVal < ${scores[$j]} ))
then
maxVal=${scores[$j]}
maxIndex=$j
fi
fi
done
a1[${#a1[*]}]=$maxVal
a2[${#a2[*]}]=${names[$maxIndex]}
unset scores[$maxIndex]
done
for(( i=0; j<${#a2[*]} && i < ${#a1[*]; i++ ))
do
echo ${a2[i]} ${a1[i]}
done
#10.Print sorted names and scores. Put the rank of each student before the
m=${#names[*]}
n=${#scores[*]}
for (( i=0; i<$n && i<$m; i++ ))
do
maxVal=1
maxIndex=0
for (( j=0: j<$n && j<$m; j++ ))
do
if [[ !_-z ${scores[$j]}) ]]
then
if (( $maxVal < ${scores[$j]}
maxIndex=$j
fi
fi
done
a1[${#a1[*]}]=$maxVal
a2[${#a2[*]}=${names[$maxIndex]}
unset scores[$maxIndex]
done
k=1
while [[ $k -lt ${#a2[*]} ]]
do
for(( i=0;i < ${#a2[*]} && i < ${#a1[*]};i++ ))
do
echo "Ranking" $k "is:" ${a2[i]}
k=$(($k+1))
done
done
As you can see in the line just before the then, you mixed single and double quotes:
echo "inputfile is not a file'
Just replace the final single quote with a double one.
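For reference, here's that block with matching quotes (and, going slightly beyond the quote fix, with the ! and $ that the comment above the test implies it needs):

if [[ ! -e $inputfile ]]; then
    echo "inputfile is not a file"
    exit 1
fi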
How can I speed this up? It's taking about 5 minutes to make one file...
It runs correctly, but I have a little more than 100,000 files to make.
Is my use of awk and sed slowing it down? I could break the work into several smaller loops and run it on multiple processors, but one script is much easier.
#!/bin/zsh
# 1000 configs per file
alpha=( a b c d e f g h i j k l m n o p q r s t u v w x y z )
m=1000 # number of configs per file
t=1    # file number
for (( i=1; i<=4; i++ )); do
    for (( j=i; j<=26; j++ )); do
        input="arc"${alpha[$i]}${alpha[$j]}
        n=1 # line number
        #length=`sed -n ${n}p $input| awk '{printf("%d",$1)}'`
        #(( length= $length + 1 ))
        length=644
        for (( k=1; k<=$m; k++ )); do
            echo "$hmbi" >> ~/Glycine_Tinker/configs/config$t.in
            echo "jobtype = energy" >> ~/Glycine_Tinker/configs/config$t.in
            echo "analyze_only = false" >> ~/Glycine_Tinker/configs/config$t.in
            echo "qm_path = qm_$t" >> ~/Glycine_Tinker/configs/config$t.in
            echo "mm_path = aiff_$t" >> ~/Glycine_Tinker/configs/config$t.in
            cat head.in >> ~/Glycine_Tinker/configs/config$t.in
            water=4
            echo $k
            for (( l=1; l<=$length; l++ )); do
                natom=`sed -n ${n}p $input| awk '{printf("%d",$1)}'`
                number=`sed -n ${n}p $input| awk '{printf("%d",$6)}'`
                if [[ $natom -gt 10 && $number -gt 0 ]]; then
                    symbol=`sed -n ${n}p $input| awk '{printf("%s",$2)}'`
                    x=`sed -n ${n}p $input| awk '{printf("%.10f",$3)}'`
                    y=`sed -n ${n}p $input| awk '{printf("%.10f",$4)}'`
                    z=`sed -n ${n}p $input| awk '{printf("%.10f",$5)}'`
                    if [[ $water -eq 4 ]]; then
                        echo "--" >> ~/Glycine_Tinker/configs/config$t.in
                        echo "0 1 0.4638" >> ~/Glycine_Tinker/configs/config$t.in
                        water=1
                    fi
                    echo "$symbol $x $y $z" >> ~/Glycine_Tinker/configs/config$t.in
                    (( water= $water + 1 ))
                fi
                (( n= $n + 1 ))
            done
            cat tail.in >> ~/Glycine_Tinker/configs/config$t.in
            (( t= $t + 1 ))
        done
    done
done
One thing that is going to be killing you here is the sheer number of processes being created, especially when they are all doing the exact same thing.
Consider doing the sed -n ${n}p $input once per loop iteration.
Also consider doing the equivalent of awk as a shell array assignment, then accessing the individual elements.
With these two changes you should be able to get the twelve or so processes per line (plus the subshells created by the backquotes) down to a single sed invocation, as sketched below.
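A minimal sketch of that idea, keeping the variable names from the original script (the field order is assumed from the awk column numbers: $1 natom, $2 symbol, $3 to $5 x/y/z, $6 number):

# one sed process per input line instead of up to twelve processes
line=$(sed -n "${n}p" "$input")
read -r natom symbol x y z number <<< "$line"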
Obviously, Ed's advice is far preferable, but if you don't want to follow that, I had a couple of thoughts...
Thought 1
Rather than run echo 5 times and cat head.in onto the Glycine file, each of which causes the file to be opened, seeked (or sought maybe) to the end, and appended, you could do that in one go like this:
# Instead of
hmbi=3
echo "$hmbi" >> ~/Glycine_thing
echo "jobtype = energy" >> ~/Glycine_thing
echo "somethingelse" >> ~/Glycine_thing
echo ... >> ~/Glycine_thing
echo ... >> ~/Glycine_thing
cat ... >> ~/Glycine_thing

# Try this
{
    echo "$hmbi"
    echo "jobtype = energy"
    echo "somethingelse"
    echo
    echo
    cat head.in
} >> ~/Glycine_thing

# Or, better still, this
echo -e "$hmbi\njobtype = energy\nsomethingelse" >> ~/Glycine_thing

# Or, use a here-document, as suggested by @mklement0
cat <<EOF >> ~/Glycine_thing
$hmbi
jobtype = energy
next thing
EOF
Thought 2
Rather than invoke sed and awk six times per line to find six parameters, just let awk do what sed was doing, and pull all six values in one go:
read -r natom symbol x y z number < <(awk -v n="$n" 'NR==n {printf "%d %s %.10f %.10f %.10f %d", $1, $2, $3, $4, $5, $6}' "$input")
I have to make a script that sums the PIDs of all processes whose PID is greater than 20. It kind of works, but something is going wrong. I know it's something simple and basic, but I can't get the logic right.
For example, with two processes "1504" and "1405", my result of the sum is 15041405. How do I have to rewrite sum+=${proc[$i]} to get an actual sum rather than string concatenation?
proc=( $(ps | cut -d ' ' -f1) )
nr=$(ps | cut -d ' ' -f1 | wc -l)
sum=0
for (( i=0 ; i<=nr ; i++ )); do
    [[ ${proc[$i]} -gt 20 ]] && sum+=${proc[$i]}
done
echo $sum
Use a math context. In POSIX sh:
sum=$(( sum + val ))
...or, also valid POSIX:
: "$(( sum += val ))"
...or, in bash:
(( sum += val ))
You can also use much easier-to-read comparison operations in a math context, rather than using -gt inside of a non-math test context. In bash:
(( ${proc[$i]} > 20 )) && (( sum += ${proc[$i]} ))
...or in POSIX shell (which doesn't support arrays, and so cannot exactly reproduce your sample code):
: "$(( sum += ( val < 20 ) ? 0 : val ))"
If you were trying to do the (more sensible) operation of counting PIDs, I'd consider an implementation more like the following (bash-only and Linux-only, but considerably more efficient):
count=0
for pid_file in /proc/[0-9]*; do
    pid=${pid_file##*/}
    (( pid > 20 )) && (( count++ ))
done
printf '%s\n' "$count"
...or, to put more of the effort on the glob engine:
# avoid inflating the result with non-matching globs
shopt -s nullglob

# define a function so "set --" doesn't overwrite the global "$@" array
count_procs() {
    set -- /proc/2[1-9] /proc/[3-9][0-9] /proc/[0-9][0-9][0-9]*
    echo "$#"
}

# ...used as follows:
count_procs
How would you implement tail in bash? It's a question I was asked in an interview, and I could think of answers in high-level languages but not in shell.
As I understand it, the real implementation of tail seeks to the end of the file and then reads backwards.
The main idea is to keep a fixed-size buffer and to remember the last lines. Here's a quick way to do a tail using the shell:
#!/bin/bash
SIZE=5
idx=0
while read line
do
    arr[$idx]=$line
    idx=$(( ( idx + 1 ) % SIZE ))
done < text
for ((i=0; i<SIZE; i++))
do
    echo ${arr[$idx]}
    idx=$(( ( idx + 1 ) % SIZE ))
done
If all non-tail commands are allowed, why not be whimsical?
#!/bin/sh
[ -r "$1" ] && exec < "$1"
tac | head | tac
Use wc -l to count the number of lines in the file. Subtract the number of lines you want from this, and add 1, to get the starting line number. Then use this with sed or awk to start printing the file from that line number, e.g.
sed -n "$start,\$p"
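A sketch of that approach ($file and $n are placeholder names of mine, not from the question):

n=10
total=$(wc -l < "$file")
start=$(( total - n + 1 ))
(( start < 1 )) && start=1   # a short file: just print it all
sed -n "${start},\$p" "$file"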
There's this:
#!/bin/bash
readarray file
lines=$(( ${#file[@]} - 1 ))
for (( line = lines - $1, i = ${1:-$lines}; line < lines && i > 0; line++, i-- )); do
    echo -ne "${file[$line]}"
done
Based on this answer: https://stackoverflow.com/a/8020488/851273
You pass in the number of lines from the end of the file that you want to see, then send the file via stdin; the script puts the entire file into an array and prints only the last N lines of it.
The only way I can think of in “pure” shell is to do a while read linewise on the whole file into an array variable with indexing modulo n, where n is the number of tail lines (default 10), i.e. a circular buffer, then iterate over the circular buffer from where you left off when the while read ends. It's not efficient or elegant in any sense, but it'll work, and it avoids holding the whole file in memory. For example:
#!/bin/bash
incmod() {
    local i=$(( $1 + 1 ))
    if [ $i -ge $2 ]; then
        echo 0
    else
        echo $i
    fi
}

n=10
i=0
buffer=()
while read -r line; do
    buffer[$i]=$line
    i=$(incmod $i $n)
done < "$1"

j=$i
echo "${buffer[$i]}"
i=$(incmod $i $n)
while [ $i -ne $j ]; do
    echo "${buffer[$i]}"
    i=$(incmod $i $n)
done
This script somehow imitates tail:
#!/bin/bash
shopt -s extglob

LENGTH=10
while [[ $# -gt 0 ]]; do
    case "$1" in
    --)
        FILES+=("${@:2}")
        break
        ;;
    -+([0-9]))
        LENGTH=${1#-}
        ;;
    -n)
        if [[ $2 != +([0-9]) ]]; then
            echo "Invalid argument to '-n': $2"
            exit 1
        fi
        LENGTH=$2
        shift
        ;;
    -*)
        echo "Unknown option: $1"
        exit 1
        ;;
    *)
        FILES+=("$1")
        ;;
    esac
    shift
done

PRINTHEADER=false
case "${#FILES[@]}" in
0)
    FILES=("/dev/stdin")
    ;;
1)
    ;;
*)
    PRINTHEADER=true
    ;;
esac

IFS=
for I in "${!FILES[@]}"; do
    F=${FILES[I]}
    if [[ $PRINTHEADER == true ]]; then
        [[ I -gt 0 ]] && echo
        echo "==> $F <=="
    fi
    if [[ LENGTH -gt 0 ]]; then
        LINES=()
        COUNT=0
        while read -r LINE; do
            LINES[COUNT++ % LENGTH]=$LINE
        done < "$F"
        # print the buffered lines oldest-first
        for (( I = COUNT >= LENGTH ? COUNT - LENGTH : 0; I < COUNT; ++I )); do
            echo "${LINES[I % LENGTH]}"
        done
    fi
done
Example run:
> bash script.sh -n 12 <(yes | sed 20q) <(yes | sed 5q)
==> /dev/fd/63 <==
y
y
y
y
y
y
y
y
y
y
y
y
==> /dev/fd/62 <==
y
y
y
y
y
> bash script.sh -4 <(yes | sed 200q)
y
y
y
y
Here's the answer I would give if I were actually asked this question in an interview:
What environment is this where I have bash but not tail? Early boot scripts, maybe? Can we get busybox in there so we can use the full complement of shell utilities? Or maybe we should see if we can squeeze a stripped-down Perl interpreter in, even without most of the modules that would make life a whole lot easier. You know dash is much smaller than bash and perfectly good for scripting use, right? That might also help. If none of that is an option, we should check how much space a statically linked C mini-tail would need, I bet I can fit it in the same number of disk blocks as the shell script you want.
If that doesn't convince the interviewer that it's a silly question, then I go on to observe that I don't believe in using bash extensions, because the only good reason to write anything complicated in shell script nowadays is if total portability is an overriding concern. By avoiding anything that isn't portable even in one-offs, I don't develop bad habits, and I don't get tempted to do something in shell when it would be better done in a real programming language.
Now the thing is, in truly portable shell, arrays may not be available. (I don't actually know whether the POSIX shell spec has arrays, but there certainly are legacy-Unix shells that don't have them.) So, if you have to emulate tail using only shell builtins and it's got to work everywhere, this is the best you can do, and yes, it's hideous, because you're writing in the wrong language:
#! /bin/sh
a=""
b=""
c=""
d=""
e=""
f=""
while read x; do
    a="$b"
    b="$c"
    c="$d"
    d="$e"
    e="$f"
    f="$x"
done
printf '%s\n' "$a"
printf '%s\n' "$b"
printf '%s\n' "$c"
printf '%s\n' "$d"
printf '%s\n' "$e"
printf '%s\n' "$f"
Adjust the number of variables to match the number of lines you want to print.
The battle-scarred will note that printf is not 100% available either. Unfortunately, if all you have is echo, you are up a creek: some versions of echo cannot print the literal string "-n", and others cannot print the literal string "\n", and even figuring out which one you have is a bit of a pain, particularly as, if you don't have printf (which is in POSIX), you probably don't have user-defined functions either.
(N.B. The code in this answer, sans rationale, was originally posted by user 'Nirk' but then deleted under downvote pressure from people whom I shall charitably assume were not aware that some shells do not have arrays.)
I've written a script to calculate the bandwidth usage of an OpenVZ container over time and suspend it if it uses too much too quickly. Here is the script so far:
#!/bin/bash
# Thresholds are in bytes per second
LOGDIR="/var/log/outbound_ddos"
THRESHOLD1=65536
THRESHOLD2=117964

while [ 1 ]
do
    for veid in $(/usr/sbin/vzlist -o veid -H)
    do
        # Create the log file if it doesn't already exist
        if ! test -e $LOGDIR/$veid.log; then
            touch $LOGDIR/$veid.log
        fi

        # Parse out the inbound/outbound traffic and assign them to the corresponding variables
        eval $(/usr/sbin/vzctl exec $veid "grep venet0 /proc/net/dev" | \
            awk -F: '{print $2}' | awk '{printf"CTOUT=%s\n", $9}')

        # Print the output and a timestamp to a log file
        echo $(date +%s) $CTOUT >> $LOGDIR/$veid.log

        # Read last 10 entries into arrays
        i=0
        tail $LOGDIR/$veid.log | while read time byte
        do
            times[i]=$time
            bytes[i]=$byte
            let ++i
        done

        # Time checks & calculations for higher threshold
        counter=0
        for (( i=0; i<9; i++ ))
        do
            # If we have roughly the right timestamp
            if (( times[9-i] < times[8-i] + 20 ))
            then
                # If the user has gone over the threshold
                if (( bytes[9-i] > bytes[8-i] + THRESHOLD2 * 10 ))
                then let ++counter
                fi
            fi
        done

        # Now check counter
        if (( counter == 9 ))
        then vzctl stop $veid
        fi

        # Same for lower threshold
        counter=0
        for (( i=0; i<3; i++ ))
        do
            # If we have roughly the right timestamp
            if (( times[3-i] < times[2-i] + 20 ))
            then
                # If the user has gone over the threshold
                if (( bytes[3-i] > bytes[2-i] + THRESHOLD1 * 10 ))
                then let ++counter
                fi
            fi
        done

        # Now check counter
        if (( counter == 2 ))
        then vzctl stop $veid
        fi
    done
    sleep 10
done
I've checked the numbers in /var/log/outbound_ddos/vm101.log and they're increasing by more than the threshold, but nothing is happening.
I added some echo statements to try and figure out where the problem is and it seems to be this comparison that's returning false:
if (( bytes[9-i] > bytes[8-i] + THRESHOLD2 * 10 ))
So then I tried the following, which printed out nothing:
echo ${bytes[9-i]}
Could anyone point me in the right direction? I think the script is nearly done, probably something very simple.
Your shell runs the while read loop in a subshell (see here for why it does not work as expected), so your array magic does not propagate outside the tail | while construct.
Read this and fix accordingly :-)
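A minimal sketch of the usual fix, using process substitution so the loop runs in the current shell and the arrays survive:

# Read last 10 entries into arrays -- the loop now runs in the current shell
i=0
while read time byte
do
    times[i]=$time
    bytes[i]=$byte
    let ++i
done < <(tail $LOGDIR/$veid.log)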