Efficient way of sending the same data to multiple dynamic processes - shell

I have a stream of line-buffered data and many readers in other processes.
The readers need to attach to the system dynamically; they are not known to the process writing the stream.
First I tried to read every line and simply send it to a lot of pipes:
#writer
command | while read -r line; do
    printf '%s\n' "$line" | tee listeners/*
done
#reader
mkfifo listeners/1
cat listeners/1
But that consumes a lot of CPU.
So I thought about writing to a file and truncating it repeatedly:
#writer
command >> file &
while true; do
    : > file
    sleep 1
done
#reader
tail -f -n0 file
But sometimes a line is not read by one or more readers before the truncation, which creates a race condition.
Is there a better way to implement this?

Sounds like pub/sub to me - see Wikipedia.
Basically, new interested parties come along whenever they like and "subscribe" to your channel. The process receiving the data then "publishes" it, line by line, to that channel.
You can do it with MQTT using mosquitto, or with Redis. Both have command-line interfaces/bindings, as well as Python, C/C++, Ruby, PHP etc. Client and server need not be on the same machine, and some clients could be elsewhere on the network.
Mosquitto example here.
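For the mosquitto route, a minimal sketch of the same idea (assuming the mosquitto broker and its command-line clients are installed, and reusing the myStream channel name from the Redis example below) could look like this:
# reader: subscribe to the channel; this blocks and prints every published line (assumes a broker on localhost)
mosquitto_sub -h localhost -t myStream

# writer: -l publishes each line read from stdin as its own message
command | mosquitto_pub -h localhost -t myStream -l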
I did a few tests on my Mac with Redis pub/sub. The client code in Terminal to subscribe to a channel called myStream looks like this:
redis-cli SUBSCRIBE myStream
I then ran a process to synthesise 10,000 lines like this:
time seq 10000 | while read a ; do redis-cli PUBLISH myStream "$a" >/dev/null 2>&1 ; done
And that takes 40s, so it does around 250 lines per second, but it has to start a whole new process for each line and create and tear down the connection to Redis... and we don't want to send your CPU mad.
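If you want to stay in the shell, one way to avoid that per-line overhead (a sketch, relying on redis-cli reading commands from stdin when it is not attached to a terminal) is to feed all the PUBLISH commands to a single redis-cli process:
# one redis-cli process and one connection for all 10,000 lines
seq 10000 | awk '{ print "PUBLISH myStream " $0 }' | redis-cli > /dev/null
Note that for arbitrary lines containing spaces you would need to quote or escape the payload, since redis-cli splits each input line into arguments on whitespace.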
More appropriately for your situation then, here is how you can create a file with 100,000 lines, and read them one at a time, and send them to all your subscribers in Python:
# Make a "BigFile" with 100,000 lines
seq 100000 > BigFile
and read the lines and publish them with:
#!/usr/bin/env python3
import redis

if __name__ == '__main__':
    # Redis connection
    r = redis.Redis(host='localhost', port=6379, db=0)

    # Read file line by line...
    with open('BigFile', 'r') as infile:
        for line in infile:
            # Publish the current line to subscribers
            r.publish('myStream', line)
The entire 100,000 lines were sent and received in 4s, so 25,000 lines per second. Here is a little recording of it in action. At the top you can see the CPU is not unduly troubled by it. The second window from the top is a client, receiving 100,000 lines and the next window down is a second client. The bottom window shows the server running the Python code above and sending all 100,000 lines in 4s.
Keywords: Redis, mosquitto, pub/sub, publish, subscribe.

Related

bash asynchronous variable setting (dns lookup)

Let's say we had a loop that we want to have run as quickly as possible. Let's say something was being done to a list of hosts inside that loop; just for the sake of argument, let's say it was a redis query. Let's say that the list of hosts may change occasionally due to hosts being added/removed from a pool (not load balanced); however, the list is predictable (e.g., they all start with "foo" and end with 2 digits). So we want to run this occasionally, say once every 15 minutes:
listOfHosts=$(dig +noall +ans foo{00..99}.domain | while read -r n rest; do printf '%s\n' ${n%.}; done)
to get the list of hosts. Let's say our loop looked something like this:
while :; do
    for i in $listOfHosts; do
        redis-cli -h "$i" llen something
    done
    (( $(date +%s) % (60 * 15) == 0 )) && callFunctionThatSetslistOfHosts
done
(now obviously there's some things missing, like testing to see if we've already run callFunctionThatSetslistOfHosts in the current minute and only running it once, and doing something with the redis output, and maybe the list of hosts should be an array, but basically this is it.)
How can we run callFunctionThatSetslistOfHosts asynchronously so that it doesn't slow down the loop? I.e., have it running in the background, setting listOfHosts occasionally (e.g. once every 15 minutes), so that the next time the inner loop runs it gets a potentially different set of hosts to run the redis query on?
My major problem seems to be that in order to set listOfHosts in a loop, that loop has to be a subshell, and listOfHosts is local to that subshell, and setting it doesn't affect the global listOfHosts.
I may resort to pipes, but will have to poll the reader before generating a new list — not that that's terribly bad if I poll slowly, but I thought I'd present this as a problem.
Thanks.
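One pattern that fits this (a sketch only, not from the original post; the refreshListOfHosts helper, the /tmp/listOfHosts path and the use of bash 4's mapfile are illustrative assumptions) is to let a background loop write the list to a file and have the main loop re-read that file on every pass, so nothing has to cross a subshell boundary:
#!/bin/bash
hostfile=/tmp/listOfHosts    # illustrative path

refreshListOfHosts() {
    # same dig pipeline as in the question, but written to a file instead of a variable
    dig +noall +ans foo{00..99}.domain |
        while read -r n rest; do printf '%s\n' "${n%.}"; done > "$hostfile.tmp" &&
        mv "$hostfile.tmp" "$hostfile"    # atomic replace, so readers never see a partial list
}

refreshListOfHosts                                    # build the initial list
while :; do sleep 900; refreshListOfHosts; done &     # refresh every 15 minutes in the background

while :; do
    mapfile -t hosts < "$hostfile"                    # pick up the (possibly updated) list; needs bash 4
    for i in "${hosts[@]}"; do
        redis-cli -h "$i" llen something
    done
done
Because the refresh runs in a separate process and only a file is shared, the main loop never waits on dig.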

ruby read lines which updates latest only from file

There is a script which writes its output to a file, and I am planning to set up monitoring of that process and send the latest output information to a central server.
For example, the monitoring script will look at the output file every minute, and it has to read only whatever data was newly added to the output file and send it to the central server. That means it should not re-read the old data it has already read. Is there any way to make this happen with Ruby version >= 1.8.7?
Here is my scenario,
The script test.sh is running and writes its output to a file called /tmp/script_output.txt.
I have another monitoring script, jobmonitor.rb, which is triggered by test.sh and is used to monitor the job status and send information to the central server. Right now it reads the entire contents of the script_output.txt file and sends the details, and it has to send the information every 2 minutes for live monitoring.
In some cases the output file script_output.txt will have more than 500 lines, so every 2 minutes the monitoring script sends the entire content, which includes content that is more than 2 minutes old.
So I am looking for a way to read, every 2 minutes, only the newly updated content of the output file.
For example:
at present the script_output.txt file has contents,
line1
line2
.
.
.
line10
and the monitoring script sent this information.
Now, after 2 minutes, the output file has the contents below:
line1
line2
.
.
.
line10
line11
.
.
.
line 20
Now the monitoring script captures all 20 lines and sends the information; instead, I need to send only lines 11-20.
Update:
I tried the following:
open(filepath, 'r') do |f|
  while true
    puts "#{f.readline()}"
    sleep 60
  end
end
It reads only one line, but I want to read however many lines were added in the last 120 seconds, i.e. everything from the line after the last one read up to now.
Note: I already raised this question at https://stackoverflow.com/questions/43446881/ruby-read-lines-from-file-only-latest-updates but I initially missed adding the detailed information, so the question was closed. Sorry for that.

How to resume reading a file?

I'm trying to find the best and most efficient way to resume reading a file from a given point.
The given file is being written frequently (this is a log file).
This file is rotated on a daily basis.
In the log file I'm looking for the pattern 'slow transaction'. Such lines end with a number in parentheses. I want to have the sum of those numbers.
Example of log line:
Jun 24 2015 10:00:00 slow transaction (5)
Jun 24 2015 10:00:06 slow transaction (1)
This is the easy part, which I could do with an awk command to get a total of 6 with the above example.
Now my challenge is that I want to get the values from this file on a regular basis. I've an external system that polls a custom OID using SNMP. When hitting this OID the Linux host runs a couple of basic commands.
I want this SNMP polling event to get the number of events since the last polling only. I don't want to have the total every time, just the total of the newly added lines.
Just to mention that only bash can be used, or basic commands such as awk, sed, tail etc. No Perl or other advanced programming language.
I hope my description is clear enough. Apologies if this is a duplicate. I did some research before posting but did not find anything that precisely corresponds to my need.
Thank you for any assistance.
In addition to the methods in the comment link, you can also simply use dd and stat: read the logfile size, save it, sleep 300, then check the logfile size again. If the file size has changed, skip over the old information with dd and read only the new information.
Note: you can add a test to handle the case where the logfile is deleted and then restarted with size 0 (e.g. if (( newsize < size )), then read the whole file).
Here is a short example with 5 minute intervals:
#!/bin/bash

lfn=${1:-/path/to/logfile}

size=$(stat -c "%s" "$lfn")    ## save original log size

while :; do
    newsize=$(stat -c "%s" "$lfn")    ## get new log size
    if ((size != newsize)); then      ## if it changed, use the new info
        ## use dd to skip over existing text to the new text
        newtext=$(dd if="$lfn" bs="$size" skip=1 2>/dev/null)
        ## process newtext however you need
        printf "\nnewtext:\n\n%s\n" "$newtext"
        size=$newsize                 ## update size to newsize
    fi
    sleep 300
done
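To connect this back to the question (a sketch of the "process newtext however you need" step, assuming the log format shown above), the sum of the numbers in parentheses on the new 'slow transaction' lines could be computed with awk:
# add up the trailing (N) values on the newly appended 'slow transaction' lines
printf '%s\n' "$newtext" | awk '/slow transaction/ { gsub(/[()]/, "", $NF); sum += $NF } END { print sum + 0 }'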

Errors in UDP sending in a sub-script (bash)

Using a Raspi/Debian - I have a script that parses the results from an iwlist scan and sends them via UDP to a Pure Data patch. This runs fine in gui mode, but now I'm trying to automate the whole process in another script with the following:
pd-extended -nogui /home/pi/patch.pd & /home/pi/libOSC/scan.sh && fg
But when I run this new script, the UDP message appears to reach Pure Data only once; the scanning continues, but Pd does not receive any further packets. Any help with this would be appreciated.
What happens when you run /home/pi/libOSC/scan.sh? It sends the results only once? Then maybe you need to do it differently, like calling that script from within Pd using the 'shell' or 'popen' objects, for instance. Or you could implement a polling command via UDP that returns the values.
What does your scan.sh script look like?
You probably want to make it something like:
pdhost=localhost
pdport=9999

do_scan() {
    ## some code here that does the scan and prints the result to stdout
}

do_scan | while read -r line
do
    echo "${line};" | pdsend ${pdhost} ${pdport}
done
rather than the following:
do_scan | pdsend ${pdhost} ${pdport}

"cat" bit-stream with no EOF

I have a file opened both for reading and writing and have associated it with file descriptor 3, i.e. exec 3<>/dev/udp/10.10.10.1/161. When I redirect a crafted UDP packet to file descriptor 3 and receive a reply, how can I read it from file descriptor 3? Usual tools like cat or read do not work well, as the UDP packet (essentially just a bit stream) received as a reply does not have a newline or EOF, and for example cat does not know that there is no more data to expect. For example, here you can see how I had to SIGINT the cat:
$ cat <&3
0Gpublic�:�0,0+C1841.local^C
$
I would like to check whether any UDP data was received from 10.10.10.1; in other words, if file descriptor 3 contains some data (even a single bit), then a reply was received.
Your problem is that you cannot recognize an end-of-packet properly. There is no EOF signifier (as you noticed), such as a special character, a file-closed event, or similar. Instead, all you can do is either
read a fixed number of characters (in case your packets are fixed in size), or
read single tokens (maybe bytes) until your packet's syntax states that it is complete, or
read until a timeout occurs.
The first two are your responsibility, in case they are possible at all.
The last one can be achieved using a cat in a subshell which you kill after a certain amount of time:
cat <&3 & pid=$!
sleep 0.1
kill "$pid" 2>/dev/null
Put this in a function and each call will last about 0.1s and output whatever could be read in that time.
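A sketch of that suggestion (the function name and the use of a command substitution to capture the output are illustrative; fd 3 is assumed to be the one opened with exec 3<> above):
read_udp_reply() {
    cat <&3 &              # read whatever arrives on fd 3, in the background
    local pid=$!
    sleep 0.1              # give the reply time to arrive
    kill "$pid" 2>/dev/null
}

reply=$(read_udp_reply)    # the command substitution inherits fd 3 from the shell
[ -n "$reply" ] && echo "reply received from 10.10.10.1"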
