Tail a log file and send rows to curl in 100-line batches - bash

I have a bash script that looks like this:
tail -f -n +1 my.log | \
awk -f influx.awk | \
xargs \
-I '{}' \
curl \
-XPOST 'http://influxdb/write?db=foo' \
--data-binary '{}'
What can I change so that instead of creating a curl request for each row, it batches them up into, say, 100 rows (see the InfluxDB curl docs)?
The problem I'm having is that each InfluxDB "point" needs to be separated by a newline, which is also the delimiter xargs uses, so e.g. adding -L 100 to xargs doesn't work.
Bonus: how would I also make this terminate if no new lines have been added to the file after, say, 10s?

Rather than xargs, you want to use split, with its --filter option. For example, the following batches lines into groups of two:
$ seq 5 | split -l 2 --filter='echo begin; cat; echo end'
begin
1
2
end
begin
3
4
end
begin
5
end
In your case, you could try something like
tail -f -n +1 my.log | \
awk -f influx.awk | \
split -l 100 --filter='\
curl \
-XPOST "http://influxdb/write?db=foo" \
--data-binary @-'
The @- makes curl read data from standard input.
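For the bonus question, one hedged approach (not from the split or curl docs) is to insert a small bash loop that copies lines through with a read timeout: if no line arrives within 10 seconds, read times out, the loop ends, and split flushes its final batch and exits.
tail -f -n +1 my.log | \
awk -f influx.awk | \
while IFS= read -r -t 10 line; do printf '%s\n' "$line"; done | \
split -l 100 --filter='curl -XPOST "http://influxdb/write?db=foo" --data-binary @-'
Note that tail itself only notices the broken pipe the next time it tries to write, so it may linger until one more line is appended to the log.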

Related

How to process continuous stream output with grep utility?

I have a requirement where my curl command receives continuous output from a streaming HTTP service. The stream never ends. I want to grep for a string in the stream and pipe this command's output to another utility, such as xargs with echo for example, for further continuous processing.
This is the output of the continuous stream, which I only stop receiving when I end the curl command.
curl -X "POST" "http://localhost:8088/query" --header "Content-Type: application/json" -d $'{"ksql": "select * from SENSOR_S EMIT CHANGES;","streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"}}' -s -N
[{"header":{"queryId":"none","schema":"`ROWTIME` BIGINT, `ROWKEY` STRING, `SENSOR_ID` STRING, `TEMP` BIGINT, `HUM` BIGINT"}},
{"row":{"columns":[1599624891102,"S2","S2",40,20]}},
{"row":{"columns":[1599624891113,"S1","S1",90,80]}},
{"row":{"columns":[1599624909117,"S2","S2",40,20]}},
{"row":{"columns":[1599624909125,"S1","S1",90,80]}},
{"row":{"columns":[1599625090320,"S2","S2",40,20]}},
Now when I pipe the output to grep, it works as expected and I keep receiving any new events.
curl -X "POST" "http://localhost:8088/query" --header "Content-Type: application/json" -d $'{"ksql": "select * from SENSOR_S EMIT CHANGES;","streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"}}' -s -N | grep S1
{"row":{"columns":[1599624891113,"S1","S1",90,80]}},
{"row":{"columns":[1599624909125,"S1","S1",90,80]}},
But when I pipe this grep output to xargs and echo, the output doesn't move at all.
curl -X "POST" "http://localhost:8088/query" --header "Content-Type: application/json" -d $'{"ksql": "select * from SENSOR_S EMIT CHANGES;","streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"}}' -s -N | grep S1 | xargs -I {} echo {}
^C
When I remove grep from the middle, it works as expected.
curl -X "POST" "http://localhost:8088/query" --header "Content-Type: application/json" -d $'{"ksql": "select * from SENSOR_S EMIT CHANGES;","streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"}}' -s -N | xargs -I {} echo {}
[{header:{queryId:none,schema:`ROWTIME` BIGINT, `ROWKEY` STRING, `SENSOR_ID` STRING, `TEMP` BIGINT, `HUM` BIGINT}},
{row:{columns:[1599624891102,S2,S2,40,20]}},
{row:{columns:[1599624891113,S1,S1,90,80]}},
{row:{columns:[1599624909117,S2,S2,40,20]}},
{row:{columns:[1599624909125,S1,S1,90,80]}},
{row:{columns:[1599625090320,S2,S2,40,20]}},
It looks like grep is waiting for its input to end before it passes anything further along. When I tested the same thing with a finite input, it worked as expected.
ls | grep sh | xargs -I {} echo {};
abcd.sh
123.sh
pqr.sh
xyz.sh
So, the questions are: Is my understanding correct? Is there a way for grep to keep passing output to subsequent commands in real time? I want to keep some basic filtering logic out of the further scripting, hence wanting grep to work.
Thanks in advance!
Anurag
As suggested by @larsks, grep's --line-buffered option ("flush output on every line") works fine when I test a similar requirement to yours.
So the command would be
curl -X "POST" "http://localhost:8088/query" --header "Content-Type: application/json" -d $'{"ksql": "select * from SENSOR_S EMIT CHANGES;","streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"}}' -s -N | grep S1 --line-buffered | xargs -I {} echo {}
I tested on /var/log/messages, which gets continuously updated, as follows:
[root@project1-master ~]# tail -f /var/log/messages | grep journal --line-buffered | xargs -I {} echo {}
Sep 11 11:15:47 project1-master journal: I0911 15:15:47.448254 1 node_lifecycle_controller.go:1429] Initializing eviction metric for zone:
Sep 11 11:15:52 project1-master journal: I0911 15:15:52.448704 1 node_lifecycle_controller.go:1429] Initializing eviction metric for zone:
Sep 11 11:15:54 project1-master journal: 2020-09-11 15:15:54.006 [INFO][46] felix/int_dataplane.go 1300: Applying dataplane updates
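If a filter in the middle of the pipeline has no --line-buffered flag of its own, GNU coreutils' stdbuf can usually impose line buffering from the outside. A hedged sketch of the same test, adjusting the buffering externally instead of via grep's flag (stdbuf availability assumed):
tail -f /var/log/messages | stdbuf -oL grep journal | xargs -I {} echo {}
This works for most stdio-based filters, though not for programs that explicitly set their own buffering.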

How do I detect a failed subprocess in a bash read statement?

In bash we can set an environment variable from a sequence of commands using read and a pipe to a subprocess. But I'm having trouble detecting errors in my processing in one edge case - a part of the subprocess pipeline producing some output before erroring.
A simplified example which takes an input file, looks for a line starting with "foo" and sets var to the first word on that line is:
set -e
set -o pipefail
set -o nounset
die() {
echo "DEAD: $1" >&2
exit 1
}
read -r var rest < <( \
cat data.txt \
| grep foo \
|| die "PIPELINE" \
) || die "OUTER"
echo "var=$var"
Running this with data.txt like
blah
zap foo awesome
bang foo
will output
var=zap
Running this on a data.txt file that doesn't contain foo outputs (to stderr)
DEAD: PIPELINE
DEAD: OUTER
This is all as expected.
We can introduce another no-op stage, like cat, at the end of the pipeline
...
read -r var rest < <( \
cat data.txt \
| grep foo \
| cat \
|| die "PIPELINE" \
) || die "OUTER"
...
and everything continues to work.
But if the additional stage is paste -s -d' ' and the input does not contain "foo" the output is
var=
DEAD: PIPELINE
This seems to show that the pipeline fails, but read succeeds with an empty line. (It looks like paste -s -d' ' outputs a line of output even when its input is empty.)
Is there a simple way to detect this failure of the pipeline, and cause the main script to error out?
I guess I could check that the variable is not empty - but this is a simplified version - I'm actually using sed and paste to join multiple lines to set multiple variables, like
read -r v1 v2 v3 rest < <( \
cat data.txt \
| grep "^foo=" \
| sed -e 's/foo=//' \
| paste -s -d' ' \
|| die "PIPELINE"
) || die "OUTER"
You could use another grep to check whether the output of paste contains something:
read -r var rest < <( \
cat data.txt \
| grep foo \
| paste -s -d' ' \
| grep . \
|| die "PIPELINE" \
) || die "OUTER"
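A quick way to see why the extra grep helps, relying on the behaviour observed in the question (GNU coreutils assumed): paste -s still emits a newline on empty input, whereas grep . prints nothing and exits non-zero, which is what triggers the || die.
$ printf '' | paste -s -d' ' | wc -l
1
$ printf '' | paste -s -d' ' | grep . ; echo "exit: $?"
exit: 1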
In the end I went with two different solutions depending on the context.
The first was to pipe the results to a temporary file. This will process the entire file before performing the read, and thus any failures in the pipe will cause the script to fail.
cat data.txt \
| grep "^foo=" \
| sed -e 's/foo=//' \
| paste -s -d' ' \
> "$TMP/result.txt" \
|| die "PIPELINE"
read -r var rest < "$TMP/result.txt" || die "OUTER"
The second was to just test that the variables were set. While this meant
there was a bunch of duplication that I wanted to avoid, it seemed the most bullet-proof solution.
read -r var rest < <( cat data.txt \
| grep "^foo=" \
| sed -e 's/foo=//' \
| paste -s -d' ' \
|| die "PIPELINE"
) || die "OUTER"
[ ! -z "$var" ] || die "VARIABLE NOT SET"
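For the multi-variable case, the duplication can be kept down with an indirect check; a minimal sketch using the variable names from the question (bash-specific ${!name} expansion):
for name in v1 v2 v3; do
[ -n "${!name:-}" ] || die "VARIABLE $name NOT SET"
done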

Rate limiting the number of cURL requests made from a bash script [duplicate]

This question already has answers here:
Bash script processing limited number of commands in parallel
(4 answers)
Closed 6 years ago.
I wrote a bash script that calculates the hashes of the files under a given path and queries one of the hashes (the SHA-1) against an API service to get the result for that hash (parsing the response with jq).
#Scanner.sh
#!/bin/bash
for i in $(find $1 -type f);do
md5_checksum=$(md5sum $i|cut -d' ' -f 1)
sha1_checksum=$(sha1sum $i|cut -d' ' -f 1)
sha256_checksum=$(sha256sum $i|cut -d' ' -f 1)
json_result="$(curl --silent -H 'Authorization:Basic XXXX' 'https://XXXXX.com/api/databrowser/malware_presence/query/sha1/'$sha1_checksum'?format=json&extended=true'| jq -r '.rl.malware_presence.name,.rl.malware_presence.level' | awk -vORS=, '{print $1}' |sed 's/,$/\n/')"
echo "$md5_checksum,$sha1_checksum,$sha256_checksum,$json_result"
done
#Result :
c63a1576e4b416e6944a1c39dbdfc3b4,fd55dfdd9a432e34ce6c593cd09a48f457f7aab6,e2d1c1975c2406f60f1c0fe5255560c2cd06434e669933536a86296a2eb70020,Malware,5
Now it's taking too much time (about 10 seconds) to process and get the result for a single file hash. How can I send 5 requests per second and get the results faster?
Any suggestions, please?
You could put your code in a function and run it in the background, with something like this:
runCurl() {
md5_checksum=$(md5sum $1|cut -d' ' -f 1)
sha1_checksum=$(sha1sum $1|cut -d' ' -f 1)
sha256_checksum=$(sha256sum $1|cut -d' ' -f 1)
json_result="$(curl --silent -H 'Authorization:Basic XXXX' 'https://XXXXX.com/api/databrowser/malware_presence/query/sha1/'$sha1_checksum'?format=json&extended=true'| jq -r '.rl.malware_presence.name,.rl.malware_presence.level' | awk -vORS=, '{print $1}' |sed 's/,$/\n/')"
echo "$md5_checksum,$sha1_checksum,$sha256_checksum,$json_result"
}
for i in $(find $1 -type f);do
runCurl $i &
done
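Note that this starts one curl per file with no upper bound on concurrency. A minimal sketch that keeps the same function but processes files in batches of five, waiting for each batch to finish before starting the next (the batch size of 5 is illustrative, and like the original loop this splits on whitespace in file names):
count=0
for i in $(find "$1" -type f); do
runCurl "$i" &
(( ++count % 5 == 0 )) && wait   # wait for each batch of 5 background jobs
done
wait   # wait for any remaining jobs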

Need help escaping from awk quotations in bash script

I have an alias in my bashrc file that outputs the current folder contents and the system's available storage, updated continuously by the watch command.
alias wtch='watch -n 0 -t "du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000| head -2 | awk '{print \$4}'"'
The string worked fine until I put in the awk part. I know I need to escape the single quotation marks while still staying inside the double quotation marks, and escape the $4, but I haven't been able to get it to work. What am I doing wrong?
This is the error I get
-bash: alias: $4}": not found
Since the quoting for the alias is making it tough, you could just make it a function instead:
wtch() {
watch -n 0 -t "du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000| head -2 | awk '{print \$4}'"
}
This is a lot like issue 2 in the BashFAQ/050
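For completeness, the alias itself can also be made to work by closing the single-quoted body, inserting an escaped quote, and reopening it (the '\'' idiom), although it is harder to read than the function; a hedged sketch:
alias wtch='watch -n 0 -t "du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000| head -2 | awk '\''{print \$4}'\''"'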
Also, a minor thing, but you can skip the head process at the end and just have awk do it, exiting after the second row, like
wtch() {
watch -n 0 -t "du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000| awk '{print \$4} NR >= 2 {exit}'"
}
In this case you can use cut instead of awk, with the same effect.
alias wtch="watch -n 0 -t 'du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000| head -2 | cut -d\ -f4'"
Explaining cut:
-d option defines a delimiter
-d\ means that my delimiter is space
-f selects a column
-f4 gives you the fourth column

BASH: How do I make output like the watch command?

My Linux watch command is quite old and doesn't support the --color option. How can I get the same kind of output it produces? In my script the loop prints each run's output one after another (of course), but I need each run to replace the previous one. Are there any tricks with terminal output?
#!/bin/bash
while true
do
/usr/sbin/asterisk -rx "show queue My_Compain" \
| grep Agent \
| grep -v \(Unavailable\) \
| sort -t"(" -k 2 \
| GREP_COLOR='01;31' egrep -i --color=always '^.*[0-9] \(Not in use.*$|$' \
| GREP_COLOR='01;36' egrep -i --color=always '^.*\(Busy*$|$'
sleep 2
done
You can use clear to clear the screen before dumping your output, to give the appearance of in-place updates.
To reduce blinking, you can use the age-old technique of double buffering:
#!/bin/bash
while true
do
buffer=$(
clear
/usr/sbin/asterisk -rx "show queue My_Compain" \
| grep Agent \
| grep -v \(Unavailable\) \
| sort -t"(" -k 2 \
| GREP_COLOR='01;31' egrep -i --color=always '^.*[0-9] \(Not in use.*$|$' \
| GREP_COLOR='01;36' egrep -i --color=always '^.*\(Busy*$|$'
)
echo "$buffer"
sleep 2
done
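If clearing the screen still flickers on some terminals, another hedged variant is to repaint in place: move the cursor home, print the new frame, then erase whatever is left of the previous one (assumes tput with the standard cup and ed capabilities; pipeline abbreviated here):
#!/bin/bash
while true
do
buffer=$(/usr/sbin/asterisk -rx "show queue My_Compain" | grep Agent)
tput cup 0 0      # move the cursor to the top-left corner
printf '%s\n' "$buffer"
tput ed           # erase from the cursor to the end of the screen
sleep 2
done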
