"cat" bit-stream with no EOF - bash

I have a file opened for both reading and writing and have associated it with file descriptor 3, i.e. exec 3<>/dev/udp/10.10.10.1/161. When I redirect a crafted UDP packet to file descriptor 3 and receive a reply, how can I read that reply from file descriptor 3? Usual tools like cat or read do not work well, because the UDP packet received as a reply (essentially just a bit stream) has no newline or EOF, so cat, for example, does not know that there is no more data to expect. Here you can see how I had to SIGINT the cat:
$ cat <&3
0Gpublic�:�0,0+C1841.local^C
$
I would like to check whether any UDP data was received from 10.10.10.1; that is, if file descriptor 3 contains any data (even a single bit), then a reply was received.

Your problem is that you cannot properly recognize the end of a packet. There is, as you noticed, no EOF signifier (no special character, file-closed event, or similar). Instead, all you can do is either
read a fixed number of characters (in case your packets are fixed in size), or
read single tokens (perhaps bytes) until your packet's syntax states that it is complete, or
read until a timeout occurs.
The first two are your responsibility, in case they are possible at all.
The last one can be achieved using a cat in a subshell which you kill after a certain amount of time:
cat <&3 & pid=$!
sleep 0.1
kill "$pid" 2>/dev/null
Put this in a function and each call will last 0.1 s and output whatever could be read in that time.
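For example, a minimal sketch of such a function; the function name, the 0.1 s window and the final check are only illustrative:
read_udp_reply() {
    # read whatever arrives on fd 3 within the window, then stop the reader
    cat <&3 & pid=$!
    sleep 0.1
    kill "$pid" 2>/dev/null
}
reply=$(read_udp_reply)
if [ -n "$reply" ]; then
    echo "reply received from 10.10.10.1"
fi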

Related

Removing progress bar from program output redirected into log file

I am running a program that outputs a progress bar. I started it like this:
python train.py |& tee train.log
The train.log looks like the following.
This is line 1
Training ...
This is line 2
...
[000] valid: 100%|█████████████████████████████████████████████████████████████▉| 2630/2631 [15:24<00:00, 2.98 track/s]
[000] valid: 100%|██████████████████████████████████████████████████████████████| 2631/2631 [15:25<00:00, 3.02 track/s]
Epoch 000: train=0.11940351 valid=0.10640465 best=0.1064 duration=0.79 days
This is line 3
...
[001] valid: 100%|█████████████████████████████████████████████████████████████▉| 2629/2631 [15:11<00:00, 2.90
[001] valid: 100%|█████████████████████████████████████████████████████████████▉| 2630/2631 [15:11<00:00, 2.89
[001] valid: 100%|██████████████████████████████████████████████████████████████| 2631/2631 [15:12<00:00, 2.88
Epoch 001: train=0.10971066 valid=0.09931737 best=0.0993 duration=0.79 days
On the terminal each progress bar is supposed to replace itself, so in the log file there are a lot of repetitions. When I ran wc -l train.log, it returned only 3 lines. However, when I opened this 5 MB text file in a text editor, there were something like 20000 lines.
My objective is to only get these details:
Epoch 000: train=0.11940351 valid=0.10640465 best=0.1064 duration=0.79 days
Epoch 001: train=0.10971066 valid=0.09931737 best=0.0993 duration=0.79 days
My questions are:
How do I, without stopping my current training run, extract my desired details from the supposedly "3" lines of train.log? Keep in mind that training will continue for 10 more epochs, so I don't want to open the whole pile of progress-bar junk in an editor.
In the future, how should I store my log file (instead of calling python train.py |& tee train.log) such that while I can see the progress bar in the terminal, I only keep the important information in a text file?
Edit 1:
Here's a link to the file train.log
The progress bars are probably written to stderr, which you send to tee together with stdout by using |&.
To write only stdout to the file, use the normal pipe | instead.
The progress bar was generated by writing one line and then a carriage return character (\r) but no newline character (\n). To fix that and to be able to process the file further, you can use for example sed 's/\r/\n/g'.
The following works with the file linked in the question:
$ sed 's/\r/\n/g' train.log | grep Epoch
Epoch 000: train=0.11940351 valid=0.10640465 best=0.1064 duration=0.79 days
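For the second question, a hedged sketch that combines both ideas (it assumes GNU sed and grep, whose -u and --line-buffered flags keep the pipeline flowing line by line): everything, progress bars included, still goes to the terminal, while only the Epoch summary lines end up in train.log:
# show the full output on the terminal, keep only the Epoch lines in the file
python train.py 2>&1 | tee /dev/tty |
    sed -u 's/\r/\n/g' |
    grep --line-buffered '^Epoch' > train.log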
Ok, I solved it already.
According to this question,
You make a progress bar by doing echo -ne "your text \r" > log.file.
Because some editors I used (Notepad, Sublime Text 3) recognise \r as a line break, you see the updates as separate lines, but in fact they are stored on a single line.
To reverse engineer it, you can turn them into actual line breaks with sed -i "s,\r,\n,g" train.log, and then grep accordingly.
Anyhow, thanks @mkrieger1 for helping me out anyway!

Efficient way of sending the same data to multiple dynamic processes

I have a stream of line-buffered data and many readers in other processes.
The readers need to attach to the system dynamically; they are not known to the process writing the stream.
First I tried to read every line and simply send it to a lot of pipes:
#writer
command | while read -r line; do
    printf '%s\n' "$line" | tee listeners/*
done
#reader
mkfifo listeners/1
cat listeners/1
But that consumes a lot of CPU.
So I thought about writing to a file and truncating it repeatedly:
#writer
command >> file &
while true; do
    : > file
    sleep 1
done
#reader
tail -f -n0 file
But sometimes a line is not read by one or more readers before the truncation, which creates a race condition.
Is there a better way I could implement this?
Sounds like pub/sub to me - see Wikipedia.
Basically, new interested parties come along whenever they like and "subscribe" to your channel. The process receiving the data then "publishes" it, line by line, to that channel.
You can do it with MQTT using mosquitto, or with Redis. Both have command-line interfaces as well as bindings for Python, C/C++, Ruby, PHP etc. Client and server need not be on the same machine; some clients could be elsewhere on the network.
Mosquitto example here.
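As a rough illustration of the Mosquitto route with its command-line clients (the broker host and topic name here are assumptions):
# each reader subscribes to the channel
mosquitto_sub -h localhost -t myStream
# the writer publishes; -l turns each line read from stdin into its own message,
# so there is no per-line process start-up
command | mosquitto_pub -h localhost -t myStream -l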
I did a few tests on my Mac with Redis pub/sub. The client code in Terminal to subscribe to a channel called myStream looks like this:
redis-cli SUBSCRIBE myStream
I then ran a process to synthesise 10,000 lines like this:
time seq 10000 | while read a ; do redis-cli PUBLISH myStream "$a" >/dev/null 2>&1 ; done
That takes 40 s, so it does around 250 lines per second, but it has to start a whole new process for each line and create and tear down the connection to Redis... and we don't want to drive your CPU mad.
More appropriately for your situation then, here is how you can create a file with 100,000 lines, and read them one at a time, and send them to all your subscribers in Python:
# Make a "BigFile" with 100,000 lines
seq 100000 > BigFile
and read the lines and publish them with:
#!/usr/bin/env python3
import redis
if __name__ == '__main__':
    # Redis connection
    r = redis.Redis(host='localhost', port=6379, db=0)

    # Read file line by line...
    with open('BigFile', 'r') as infile:
        for line in infile:
            # Publish the current line to subscribers
            r.publish('myStream', line)
The entire 100,000 lines were sent and received in 4s, so 25,000 lines per second. Here is a little recording of it in action. At the top you can see the CPU is not unduly troubled by it. The second window from the top is a client, receiving 100,000 lines and the next window down is a second client. The bottom window shows the server running the Python code above and sending all 100,000 lines in 4s.
Keywords: Redis, mosquitto, pub/sub, publish, subscribe.

How to resume reading a file?

I'm trying to find the best and most efficient way to resume reading a file from a given point.
The given file is being written frequently (this is a log file).
This file is rotated on a daily basis.
In the log file I'm looking for the pattern 'slow transaction'. Such lines end with a number in parentheses. I want the sum of those numbers.
Example of log line:
Jun 24 2015 10:00:00 slow transaction (5)
Jun 24 2015 10:00:06 slow transaction (1)
This is the easy part, which I can do with an awk command to get a total of 6 for the example above.
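For reference, the summation could look like this minimal awk sketch (it assumes the count is always the last field, as in the example lines, and the log path is a placeholder):
awk '/slow transaction/ { gsub(/[()]/, "", $NF); total += $NF } END { print total + 0 }' /path/to/logfile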
Now my challenge is that I want to get the values from this file on a regular basis. I've an external system that polls a custom OID using SNMP. When hitting this OID the Linux host runs a couple of basic commands.
I want this SNMP polling event to get the number of events since the last polling only. I don't want to have the total every time, just the total of the newly added lines.
Just to mention that only bash can be used, or basic commands such as awk sed tail etc. No perl or advanced programming language.
I hope my description will be clear enough. Apologies if this is a duplicate. I did some research before posting but did not find anything that precisely corresponds to my need.
Thank you for any assistance
In addition to the methods in the comment link, you can also simply use stat to read the logfile size, save it, sleep 300, and then check the logfile size again. If the file size has changed, skip over the old information with dd and read only the new information.
Note: you can add a test to handle the case where the logfile is deleted and then restarted at size 0 (e.g. if ((newsize < size)), then read the whole file).
Here is a short example with 5 minute intervals:
#!/bin/bash
lfn=${1:-/path/to/logfile}
size=$(stat -c "%s" "$lfn") ## save original log size
while :; do
    newsize=$(stat -c "%s" "$lfn")   ## get new log size
    if ((size != newsize)); then     ## if changed, use the new info
        ## use dd to skip over the existing text to the new text
        newtext=$(dd if="$lfn" bs="$size" skip=1 2>/dev/null)
        ## process newtext however you need
        printf "\nnewtext:\n\n%s\n" "$newtext"
        size=$newsize                ## update size to newsize
    fi
    sleep 300
done
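Since the SNMP poll runs a fresh command each time rather than a long-lived loop, here is a hedged sketch along the same lines: it remembers the previous size in a state file and prints only the slow-transaction total for the newly appended lines (the paths and the state-file name are assumptions):
#!/bin/bash
logfile=/path/to/logfile
state=/var/tmp/slowtx.offset            # remembers the size at the last poll

size=$(stat -c "%s" "$logfile")
old=$(cat "$state" 2>/dev/null)
old=${old:-0}
((size < old)) && old=0                 # log was rotated/truncated: start over

## sum the (n) counts only in the bytes appended since the last poll
tail -c +"$((old + 1))" "$logfile" |
    awk '/slow transaction/ { gsub(/[()]/, "", $NF); total += $NF } END { print total + 0 }'

echo "$size" > "$state"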

Read fails after tcpreplay with error: 0: Resource temporarily unavailable

I have a very simple script to run. It calls tcpreplay and then asks the user to type in something. Then read fails with read: read error: 0: Resource temporarily unavailable.
Here is the code
#!/bin/bash
tcpreplay -ieth4 SMTP.pcap
echo TEST
read HANDLE
echo $HANDLE
And the output is
[root@vse1 quick_test]# ./test.sh
sending out eth4
processing file: SMTP.pcap
Actual: 28 packets (4380 bytes) sent in 0.53 seconds. Rated: 8264.2 bps, 0.06 Mbps, 52.83 pps
Statistics for network device: eth4
Attempted packets: 28
Successful packets: 28
Failed packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
TEST
./test.sh: line 6: read: read error: 0: Resource temporarily unavailable
[root@vse1 quick_test]#
I am wondering if I need to close or clear up any handles or pipes after I run tcpreplay?
Apparently tcpreplay sets O_NONBLOCK on stdin and then doesn't remove it. I'd say it's a bug in tcpreplay. To work around it you can run tcpreplay with stdin redirected from /dev/null, like this:
tcpreplay -i eth4 SMTP.pcap </dev/null
Addition: note that this tcpreplay behavior breaks non-interactive shells only.
Another addition: alternatively, if you really need tcpreplay to receive your input, you can write a short program which resets O_NONBLOCK, like this one (reset-nonblock.c):
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main(void)
{
    if (fcntl(STDIN_FILENO, F_SETFL,
              fcntl(STDIN_FILENO, F_GETFL) & ~O_NONBLOCK) < 0) {
        perror(NULL);
        return 1;
    }
    return 0;
}
Make it with "make reset-nonblock", then put it in your PATH and use like this:
tcpreplay -i eth4 SMTP.pcap
reset-nonblock
While the C solution works, you can turn off nonblocking input in one line from the command-line using Python. Personally, I alias it to "setblocking" since it is fairly handy.
$ python3 -c $'import os\nos.set_blocking(0, True)'
You can also have Python print the previous state so that it may be changed only temporarily:
$ o=$(python3 -c $'import os\nprint(os.get_blocking(0))\nos.set_blocking(0, True)')
$ somecommandthatreadsstdin
$ python3 -c $'import os\nos.set_blocking(0, '$o')'
Resource temporarily unavailable is EAGAIN (or EWOULDBLOCK), the error code for a nonblocking file descriptor when no further data is available (the call would block if the descriptor weren't in nonblocking mode). The previous command (tcpreplay in this case) erroneously left STDIN in nonblocking mode. The shell will not correct it, and the following process isn't prepared to work with a non-default, nonblocking STDIN.
In your script, you can also turn off nonblocking with:
perl -MFcntl -e 'fcntl STDIN, F_SETFL, fcntl(STDIN, F_GETFL, 0) & ~O_NONBLOCK'
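Applied to the script from the question, the fix would sit between tcpreplay and read; a sketch using the perl one-liner above:
#!/bin/bash
tcpreplay -ieth4 SMTP.pcap
# tcpreplay may have left stdin nonblocking; restore blocking mode before reading
perl -MFcntl -e 'fcntl STDIN, F_SETFL, fcntl(STDIN, F_GETFL, 0) & ~O_NONBLOCK'
echo TEST
read HANDLE
echo "$HANDLE"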

Is echo atomic when writing single lines

I am currently trying to get a script to write the output of other commands it starts correctly into a log file. The script writes its own messages to the log file using echo, and there is a function to which I can pipe the lines from the other program.
The main problem is that the program which produces the output is started in the background, so my function that does the reading may write to the log file concurrently. Could this be a problem? echo only ever writes a single line, so it should not be too hard to ensure atomicity. However, I have searched Google and found no way to make sure it actually is atomic.
Here is the current script:
LOG_FILE=/path/to/logfile
write_log() {
    echo "$(date +%Y%m%d%H%M%S);$1" >> ${LOG_FILE}
}
write_output() {
    while read data; do
        write_log "Message from SUB process: [ $data ]"
    done
}
write_log "Script started"
# do some stuff
call_complicated_program 2>&1 | write_output &
SUB_PID=$!
#do some more stuff
write_log "Script exiting"
wait $SUB_PID
As you can see, the script may write both on its own and because of the redirected output. Could this cause havoc in the file?
echo is just a simple wrapper around write (this is a simplification; see the edit below for the gory details), so to determine whether echo is atomic, it's useful to look up write. From the Single UNIX Specification:
Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of IEEE Std 1003.1-2001 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.
You can check PIPE_BUF on your system with a simple C program. If you're just printing a single line of output that is not ridiculously long, it should be atomic.
Here is a simple program to check the value of PIPE_BUF:
#include <limits.h>
#include <stdio.h>
int main(void) {
    printf("%d\n", PIPE_BUF);
    return 0;
}
On Mac OS X, that gives me 512 (the minimum allowed value for PIPE_BUF). On Linux, I get 4096. So if your lines are fairly long, make sure you check it on the system in question.
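If you'd rather not compile anything, getconf should report the same limit for a given path; here / is just an example path:
getconf PIPE_BUF /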
edit to add: I decided to check the implementation of echo in Bash, to confirm that it will print atomically. It turns out that echo uses putchar or printf depending on whether you use the -e option. These are buffered stdio operations, which means that they fill up a buffer and actually write it out only when a newline is reached (in line-buffered mode), the buffer is filled (in block-buffered mode), or you explicitly flush the output with fflush. By default, a stream is in line-buffered mode if it is an interactive terminal and block-buffered mode if it is any other file. Bash never sets the buffering type, so for your log file it should default to block-buffered mode. At the end of the echo builtin, Bash calls fflush to flush the output stream. Thus the output will always be flushed at the end of echo, but may be flushed earlier if it doesn't fit into the buffer.
The size of the buffer used may be BUFSIZ, though it may be different; BUFSIZ is the default size if you set the buffer explicitly using setbuf, but there's no portable way to determine the actual size of your buffer. There are also no portable guidelines for what BUFSIZ is, but when I tested it on Mac OS X and Linux, it was twice the size of PIPE_BUF.
What does this all mean? Since the output of echo is all buffered, it won't actually call the write until the buffer is filled or fflush is called. At that point, the output should be written, and the atomicity guarantee I mentioned above should apply. If the stdout buffer size is larger than PIPE_BUF, then PIPE_BUF will be the smallest atomic unit that can be written out. If PIPE_BUF is larger than the stdout buffer size, then the stream will write the buffer out when the buffer fills up.
So, echo is only guaranteed to atomically write sequences shorter than the smaller of PIPE_BUF and the size of the stdout buffer, which is most likely BUFSIZ. On most systems, BUFSIZ is larger than PIPE_BUF.
tl;dr: echo will atomically output lines, as long as those lines are short enough. On modern systems, you're probably safe up to 512 bytes, but it's not possible to determine the limit portably.
There is no involuntary file locking, but the >> (append) operator is safe, while the > operator is unsafe. So your practice is safe.
I tried the approach from user:pizza and could not get it to work like the answer from user:Brian Campbell. Let me know if I am doing something wrong and I'll update the answer. (And yes, this is an answer, because I'm actually giving a complete working demo.)
basic concurrency
This just illustrates the problem
$ for n in {1..5}; do (curl -svo /dev/null example.com 2>&1 &) done | grep GET
> GET / HTTP/1.1
>> GET / HTTP/1.1
GET / HTTP/1.1
>>> GET / HTTP/1.1
>>GET / HTTP/1.1
using echo on each line of output
This solves the problem using Brian Campbell's method. (Note that the length of the line for which this works is limited.)
$ for n in {1..5}; do (curl -svo /dev/null example.com 2>&1 | while read; do echo "${REPLY}"; done &) done | grep GET
> GET / HTTP/1.1
> GET / HTTP/1.1
> GET / HTTP/1.1
> GET / HTTP/1.1
> GET / HTTP/1.1
redirecting the for loop to append to stdout
Instinct should tell you that this is not going to work, because it redirects only after all the output of the forked curls has been merged.
$ for n in {1..5}; do (curl -svo /dev/null example.com 2>&1 &) done >> /dev/stdout | grep GET
> GET / HTTP/1.1
> GET / HTTP/1.1
>> >GET / HTTP/1.1
> GET / HTTP/1.1
GET / HTTP/1.1
redirecting each curl to append to stdout
I suspect this failure is due to the fact that the entire content of each curl is being redirected and the size is greater than what the kernel is willing to block for. I have not taken the time to confirm that, but what Brian Campbell shared seems to support it.
$ for n in {1..5}; do (curl -svo /dev/null example.com >>/dev/stdout 2>&1 &) done | grep GET
>> GET / HTTP/1.1
GET / HTTP/1.1
> GET / HTTP/1.1
GET / HTTP/1.1
> GET / HTTP/1.1
