Get old value from a loop - bash

I want to check every 3 seconds whether the difference between the numbers is higher than 1000. How can I get the old value (for example, 16598) each time?
while true; do
    testbro=$(wc -l < /home/web/log/access.log)
    echo "$testbro"
    sleep 3
done
It outputs as wanted:
16414
16471
16533
16598
16666

If your intention is to detect when your file grows >= 1000 lines in a 3 second period, you could do this:
#!/bin/bash
last_size=$(wc -l < /home/web/log/access.log)
while true; do
    sleep 3
    curr_size=$(wc -l < /home/web/log/access.log)
    if ((curr_size - last_size >= 1000)); then
        echo "$curr_size"
    fi
    last_size=$curr_size
done
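If you also want to see how quickly the log grows, a small variant of the same loop (same log path assumed) prints the per-interval difference instead:
#!/bin/bash
last_size=$(wc -l < /home/web/log/access.log)
while true; do
    sleep 3
    curr_size=$(wc -l < /home/web/log/access.log)
    echo "grew by $((curr_size - last_size)) lines"
    last_size=$curr_size
done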

Related

BASH: How to write values generated by a for loop to a file quickly

I have a for loop in bash that writes values to a file. However, because there are a lot of values, the process takes a long time, which I think could be reduced by improving the code.
nk=1152
nb=24
for k in $(seq 0 $((nk-1))); do
    for i in $(seq 0 $((nb-1))); do
        for j in $(seq 0 $((nb-1))); do
            echo -e "$k\t$i\t$j"
        done
    done
done > file.dat
I've moved the output redirection to after the entire loop, rather than using echo -e "$k\t$i\t$j" >> file.dat, to avoid opening and closing the file many times. However, the script still writes to the file rather slowly, ~10 kbps.
Is there a better way to improve the IO?
Many thanks
Jacek
It looks like the seq calls are fairly punishing, since each one is a separate process. Try this using shell arithmetic instead:
for ((k=0; k<nk; k++)); do
    for ((i=0; i<nb; i++)); do
        for ((j=0; j<nb; j++)); do
            echo -e "$k\t$i\t$j"
        done
    done
done > file.dat
It takes just 7.5s on my machine.
Another way is to compute the sequences just once and use them repeatedly, saving a lot of shell calls:
nk=1152
nb=24
kseq=$(seq 0 $((nk-1)))
bseq=$(seq 0 $((nb-1)))
for k in $kseq; do
    for i in $bseq; do
        for j in $bseq; do
            echo -e "$k\t$i\t$j"
        done
    done
done > file.dat
This is not really "better" than the first option, but it shows how much of the time is spent spinning up instances of seq versus actually getting stuff done.
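As an aside, for fixed bounds like these, brace expansion can generate the whole cartesian product with a single call to the printf builtin; this is a sketch assuming you can afford the memory for the ~660,000 words bash expands up front (the bounds 1151 and 23 correspond to nk-1 and nb-1):
printf '%s\n' {0..1151}$'\t'{0..23}$'\t'{0..23} > file.dat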
Bash isn't always the best for this. Consider this Ruby equivalent which runs in 0.5s:
#!/usr/bin/env ruby
nk=1152
nb=24
nk.times do |k|
  nb.times do |i|
    nb.times do |j|
      puts "%d\t%d\t%d" % [k, i, j]
    end
  end
end
The most time-consuming part is calling seq in a nested loop. Keep in mind that each time you call seq, the shell loads the command from disk, forks a process to run it, captures its output, and stores the whole output sequence in memory.
Instead of calling seq you could use an arithmetic loop:
#!/usr/bin/env bash
declare -i nk=1152
declare -i nb=24
declare -i i j k
for ((k=0; k<nk; k++)); do
    for ((i=0; i<nb; i++)); do
        for ((j=0; j<nb; j++)); do
            printf '%d\t%d\t%d\n' "$k" "$i" "$j"
        done
    done
done > file.dat
Running seq in a subshell consumes most of the time.
Switch to a different language that provides all the needed features without shelling out. For example, in Perl:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $nk = 1152;
my $nb = 24;
for my $k (0 .. $nk - 1) {
    for my $i (0 .. $nb - 1) {
        for my $j (0 .. $nb - 1) {
            say "$k\t$i\t$j";
        }
    }
}
The original bash solution runs for 22 seconds, the Perl one finishes in 0.1 seconds. The output is identical.
@Jacek: I don't think the I/O is the problem, but rather the number of child processes spawned. I would store the result of seq 0 $((nb-1)) in an array and loop over the array, i.e.
nb_seq=( $(seq 0 $((nb-1))) )
...
for i in "${nb_seq[@]}"; do
    for j in "${nb_seq[@]}"; do
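Put together, a complete version of that suggestion might look like this (kseq is added by analogy for the outer loop; a sketch, not the commenter's exact code):
nb_seq=( $(seq 0 $((nb-1))) )
kseq=( $(seq 0 $((nk-1))) )
for k in "${kseq[@]}"; do
    for i in "${nb_seq[@]}"; do
        for j in "${nb_seq[@]}"; do
            echo -e "$k\t$i\t$j"
        done
    done
done > file.dat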
seq is bad :) I once wrote this function specially for this case:
$ que () { printf -v _N %$1s; _N=(${_N// / 1}); printf "${!_N[*]}"; }
$ que 10
0 1 2 3 4 5 6 7 8 9
And you can try writing everything to a variable first, then the whole variable to the file:
store+="$k\t$i\t$j\n"
printf "$store" > file
No, it's even worse that way :)

bash: manager processes run

I have multiple files in $tmpdir/$i.dirlist with entries command rsync.
Each file has (depending on the amount) 10, sometimes 50, and even 150 rsync entries.
I'm now wondering how to manage this with a for or while loop and an if check: run only 2 entries at a time from each file ($tmpdir/$i.dirlist, say we have 100 files), wait for some processes to complete, and whenever the total number of running rsync processes is below 200, launch new entries, maintaining the fixed number of processes defined by a parameter (200 in this case).
Any idea? how to do it?
Edit: about the rsync entries.
In each file $tmpdir/*.dirlist there are entries (200 in this example) with directory paths like:
==> /tmp/rsync.23611/0.dirlist <==
system/root/etc/ssl
system/root/etc/dbus-1
system/root/etc/lirc
system/root/etc/sysctl.d
==> /tmp/rsync.23611/1.dirlist <==
system/root/etc/binfmt.d
system/root/etc/cit
system/root/etc/gdb
==> /tmp/rsync.23611/2.dirlist <==
system/root/usr/local
system/root/usr/bin
system/root/usr/lib
Now, to run it, I simply use a for loop:
for i in $(seq 1 $rsyncs); do
    while read r; do
        rsync $rsyncopts backup@$host:$remotepath/$ri $r 2>&1 |
            tee $tmpdir/$i.dirlist.log
    done < $tmpdir/$i.dirlist &
done
And here is an example of use I found for limiting the number of processes:
for ARG in $*; do
    command $ARG &
    NPROC=$((NPROC+1))
    if [ "$NPROC" -ge 4 ]; then
        wait
        NPROC=0
    fi
done
Assuming the maximum value of $i is 100, with your code above you are still below the maximum of 200 processes you want to allow.
So a solution would be to run twice as many processes. I suggest dividing your main loop for i in $(seq 1 $rsyncs); do ... into two loops running concurrently, introduced respectively by for i in $(seq 1 2 $rsyncs); do ... for the odd values of $i, and for i in $(seq 2 2 $rsyncs); do ... for the even values of $i.
for i in $(seq 1 2 $rsyncs); do # i = 1 3 5 ...
    while read r; do
        rsync $rsyncopts backup@$host:$remotepath/$ri $r 2>&1 |
            tee $tmpdir/$i.dirlist.log
    done < $tmpdir/$i.dirlist &
done & # added an ampersand here
for i in $(seq 2 2 $rsyncs); do # i = 2 4 6 ...
    while read r; do
        rsync $rsyncopts backup@$host:$remotepath/$ri $r 2>&1 |
            tee $tmpdir/$i.dirlist.log
    done < $tmpdir/$i.dirlist &
done
Edit: Since my approach above doesn't convince you, let us try something completely different. First, create a list of all the processes you want to run and store these in an array:
processes=() # create an empty bash array
for i in $(seq 1 $rsyncs); do
    while read r; do
        # add the full rsync command line to the array
        processes+=("rsync $rsyncopts backup@$host:$remotepath/$ri $r 2>&1 | tee $tmpdir/$i.dirlist.log")
    done < $tmpdir/$i.dirlist
done
Once you have that array, launch, say, 200 processes, then enter a loop that waits for a process to finish and launches the next one (eval is needed here because each array element is a full pipeline):
for ((j=0; j<200; j++)); do
    eval "${processes[j]}" &   # launch processes in the background
done
while [ -n "${processes[j]}" ]; do
    wait -n                    # wait for one process to finish (bash 4.3+)
    eval "${processes[j++]}" & # launch one more process
done
Please try this and tell us.
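If your bash is 4.3 or newer, wait -n also allows a simpler fixed-size pool without the odd/even split. Here is a minimal sketch using the question's variables and $tmpdir layout (max_jobs is my name, the tee logging is omitted, and I assume each .dirlist line is a source path passed to rsync as in the question):
#!/bin/bash
max_jobs=200
for f in "$tmpdir"/*.dirlist; do
    while read -r r; do
        # if the pool is full, block until any one job exits (bash 4.3+)
        while (( $(jobs -rp | wc -l) >= max_jobs )); do
            wait -n
        done
        rsync $rsyncopts "backup@$host:$remotepath/$r" "$r" &
    done < "$f"
done
wait # let the remaining jobs finish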

using for loop when scripting bash shell in linux

I have this script:
#!/bin/bash
echo Id,Name,Amount,TS > lfs.csv
I want to insert values that match the columns I created (as above in the script). For example, I want to insert: 56,"Danny",579,311413567
I want to insert the values using a 'for' loop that keeps inserting without stopping, but changes the values on each insert.
More detail on what exactly you'd like to achieve would be useful, so for now I made an infinite for loop which writes a line into the csv with the numbers incremented via $i. (I cannot comment yet to ask you.)
Update:
I'm still using an infinite loop to get a number counting up endlessly, plus a variable (u_id) that counts from 1 to 100 and resets back to 1 when it reaches 100.
#!/bin/bash
echo 'Id,Name,Amount,TS,unique_ID' > lfs.csv
u_id=0
for (( id=1 ; ; id++ ))
do
    [[ $u_id == 100 ]] && (( u_id = 1 )) || (( u_id += 1 ))
    echo "$id,Danny_$id,$id,$id,$u_id" >> lfs.csv
done
If you'd like Amount and TS to start from a bigger number, you can do that by changing $id to $(( id + 50000 )), for example:
echo "$id,Danny_$id,$(( id + 300 )),$(( id + 50000 )),$u_id" >> lfs.csv
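If TS is meant to be a real timestamp rather than a counter, here is a variant sketch (assuming Unix epoch seconds are acceptable for TS, with a random Amount purely for illustration):
#!/bin/bash
echo 'Id,Name,Amount,TS' > lfs.csv
for (( id=1; id<=100; id++ )); do
    amount=$(( RANDOM % 1000 + 1 )) # random amount, for illustration only
    ts=$(date +%s)                  # current Unix time for the TS column
    echo "$id,Danny_$id,$amount,$ts" >> lfs.csv
done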

How to sum a row of numbers from text file-- Bash Shell Scripting

I'm trying to write a bash script that calculates the average of numbers by rows and columns. An example of a text file that I'm reading in is:
1 2 3 4 5
4 6 7 8 0
There is an unknown number of rows and unknown number of columns. Currently, I'm just trying to sum each row with a while loop. The desired output is:
1 2 3 4 5 Sum = 15
4 6 7 8 0 Sum = 25
And so on and so forth with each row. Currently this is the code I have:
while read i
do
    echo "num: $i"
    (( sum=$sum+$i ))
    echo "sum: $sum"
done < $2
To call the program it's stats -r test_file ("-r" indicates rows; I haven't started columns quite yet). My current code actually just takes the first number of each column and adds them together, and then the rest of the numbers error out as syntax errors. It says the error comes from line 16, which is the (( sum=$sum+$i )) line, but I honestly can't figure out what the problem is. I should tell you I'm extremely new to bash scripting, and I have googled and searched high and low for the answer and can't find it. Any help is greatly appreciated.
You are reading the file line by line, and a whole line is not something you can sum in one arithmetic operation. Try this:
while read i
do
    sum=0
    for num in $i
    do
        sum=$(($sum + $num))
    done
    echo "$i Sum: $sum"
done < $2
Just split each line into its numbers with the inner for loop. I hope this helps.
Another non-bash way (con: OP asked for bash; pro: does not depend on bashisms and works with floats):
awk '{c=0;for(i=1;i<=NF;++i){c+=$i};print $0, "Sum:", c}'
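For the column direction the question mentions but hasn't attempted yet, a companion awk sketch along the same lines accumulates per-column sums and prints them at the end:
awk '{for(i=1;i<=NF;++i){c[i]+=$i; if(i>n) n=i}} END{printf "Col sums ="; for(i=1;i<=n;++i) printf " %s", c[i]; print ""}'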
Another way (not pure bash):
while read line
do
    sum=$(sed 's/[ ]\+/+/g' <<< "$line" | bc -q)
    echo "$line Sum = $sum"
done < filename
Using the numsum -r util covers the row addition, but the output format needs a little glue, by inefficiently paste-ing a few utils:
paste "$2" \
<(yes "Sum =" | head -$(wc -l < "$2") ) \
<(numsum -r "$2")
Output:
1 2 3 4 5 Sum = 15
4 6 7 8 0 Sum = 25
Note -- to run the above line on a given file foo, first initialize $2 like so:
set -- "" foo
paste "$2" <(yes "Sum =" | head -$(wc -l < "$2") ) <(numsum -r "$2")

Is there a shell command to delay a buffer?

I am looking for a shell command X such that, when I execute:
command_a | X 5000 | command_b
the stdout of command_a is written to the stdin of command_b (at least) 5 seconds later.
A kind of delaying buffer.
As far as I know, buffer/mbuffer can write at a constant rate (a fixed number of bytes per second). Instead, I would like a constant delay in time (if t=0 is when X reads a chunk of command_a's output, then it must write that chunk to command_b at t=5000).
[edit] I've implemented it: https://github.com/rom1v/delay
I know you said you're looking for a shell command, but what about using a subshell to your advantage? Something like:
command_a | (sleep 5; command_b)
So to grep a file cat-ed through (I know, I know, bad use of cat, but just an example):
cat filename | (sleep 5; grep pattern)
A more complete example:
$ cat testfile
The
quick
brown
fox
$ cat testfile | (sleep 5; grep brown)
# A 5-second sleep occurs here
brown
Or even, as Michael Kropat recommends, a group command with sleep would also work (and is arguably more correct), like so:
$ cat testfile | { sleep 5; grep brown; }
Note: don't forget the semicolon after your command (here, the grep brown), as it is necessary!
As it seemed such a command did not exist, I implemented it in C:
https://github.com/rom1v/delay
delay [-b <dtbufsize>] <delay>
Something like this?
#!/bin/bash
# delay each line by 5 seconds; exits at end of input
while read -r line
do
    sleep 5
    echo "$line"
done
Save the file as "slowboy", then do
chmod +x slowboy
and run as
command_a | ./slowboy | command_b
This might work:
time_buffered () {
    delay=$1
    while read -r line; do
        printf "%d %s\n" "$(date +%s)" "$line"
    done | while read -r ts line; do
        now=$(date +%s)
        if (( now - ts < delay )); then
            # sleep for the time remaining until $delay seconds have passed
            sleep $(( delay - (now - ts) ))
        fi
        printf "%s\n" "$line"
    done
}
commandA | time_buffered 5 | commandB
The first loop tags each line of its input with a timestamp and immediately feeds it to the second loop. The second loop checks the timestamp of each line, and will sleep if necessary until $delay seconds after it was first read before outputting the line.
Your question intrigued me, and I decided to come back and play with it. Here is a basic implementation in Perl. It's probably not portable (ioctl), tested on Linux only.
The basic idea is:
read available input every X microseconds
store each input chunk in a hash, with current timestamp as key
also push current timestamp on a queue (array)
lookup oldest timestamps on queue and write + discard data from the hash if delayed long enough
repeat
Max buffer size
There is a max size for stored data. If reached, additional data will not be read until space becomes available after writing.
Performance
It is probably not fast enough for your requirements (several Mb/s). My max throughput was 639 Kb/s, see below.
Testing
# Measure max throughput:
$ pv < /dev/zero | ./buffer_delay.pl > /dev/null
# Interactive manual test, use two terminal windows:
$ mkfifo data_fifo
terminal-one $ cat > data_fifo
terminal-two $ ./buffer_delay.pl < data_fifo
# now type in terminal-one and see it appear delayed in terminal-two.
# It will be line-buffered because of the terminals, not a limitation
# of buffer_delay.pl
buffer_delay.pl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(gettimeofday usleep);
require 'sys/ioctl.ph';
$|++;
my $delay_usec = 3 * 1000000; # (3s) delay in microseconds
my $buffer_size_max = 10 * 1024 * 1024 ; # (10 Mb) max bytes our buffer is allowed to contain.
# When buffer is full, incoming data will not be read
# until space becomes available after writing
my $read_frequency = 10; # Approximate read frequency in Hz (will not be exact)
my %buffer; # the data we are delaying, saved in chunks by timestamp
my @timestamps; # keys to %buffer, used as a queue
my $buffer_size = 0; # num bytes currently in %buffer, compare to $buffer_size_max
my $time_slice = 1000000 / $read_frequency; # microseconds, min time for each discrete read-step
my $sel = IO::Select->new([\*STDIN]);
my $overflow_unread = 0; # Num bytes waiting when $buffer_size_max is reached
while (1) {
my $now = sprintf "%d%06d", gettimeofday; # timestamp, used to label incoming chunks
# input available?
if ($overflow_unread || $sel->can_read($time_slice / 1000000)) {
# how much?
my $available_bytes;
if ($overflow_unread) {
$available_bytes = $overflow_unread;
}
else {
$available_bytes = pack("L", 0);
ioctl (STDIN, FIONREAD(), $available_bytes);
$available_bytes = unpack("L", $available_bytes);
}
# will it fit?
my $remaining_space = $buffer_size_max - $buffer_size;
my $try_to_read_bytes = $available_bytes;
if ($try_to_read_bytes > $remaining_space) {
$try_to_read_bytes = $remaining_space;
}
# read input
if ($try_to_read_bytes > 0) {
my $input_data;
my $num_read = read (STDIN, $input_data, $try_to_read_bytes);
die "read error: $!" unless defined $num_read;
exit if $num_read == 0; # EOF
$buffer{$now} = $input_data; # save input
push @timestamps, $now; # save the timestamp
$buffer_size += length $input_data;
if ($overflow_unread) {
$overflow_unread -= length $input_data;
}
elsif (length $input_data < $available_bytes) {
$overflow_unread = $available_bytes - length $input_data;
}
}
}
# write + delete any data old enough
my $then = $now - $delay_usec; # when data is old enough
while (scalar @timestamps && $timestamps[0] < $then) {
my $ts = shift @timestamps;
print $buffer{$ts} if defined $buffer{$ts};
$buffer_size -= length $buffer{$ts};
die "Serious problem\n" unless $buffer_size >= 0;
delete $buffer{$ts};
}
# usleep any remaining time up to $time_slice
my $time_left = (sprintf "%d%06d", gettimeofday) - $now;
usleep ($time_slice - $time_left) if $time_slice > $time_left;
}
Feel free to post comments and suggestions below!
