My curl command receives continuous output from a streaming HTTP service, and the stream never ends. I want to grep for a string in the stream and pipe the matches to another utility, such as xargs with echo as a simple example, for further continuous processing.
This is the output of the continuous stream, which only stops when I terminate the curl command.
curl -X "POST" "http://localhost:8088/query" --header "Content-Type: application/json" -d $'{"ksql": "select * from SENSOR_S EMIT CHANGES;","streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"}}' -s -N
[{"header":{"queryId":"none","schema":"`ROWTIME` BIGINT, `ROWKEY` STRING, `SENSOR_ID` STRING, `TEMP` BIGINT, `HUM` BIGINT"}},
{"row":{"columns":[1599624891102,"S2","S2",40,20]}},
{"row":{"columns":[1599624891113,"S1","S1",90,80]}},
{"row":{"columns":[1599624909117,"S2","S2",40,20]}},
{"row":{"columns":[1599624909125,"S1","S1",90,80]}},
{"row":{"columns":[1599625090320,"S2","S2",40,20]}},
Now when I pipe the output to grep, it works as expected and I keep receiving any new events.
curl -X "POST" "http://localhost:8088/query" --header "Content-Type: application/json" -d $'{"ksql": "select * from SENSOR_S EMIT CHANGES;","streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"}}' -s -N | grep S1
{"row":{"columns":[1599624891113,"S1","S1",90,80]}},
{"row":{"columns":[1599624909125,"S1","S1",90,80]}},
But when I pipe this grep output to xargs and echo, no output appears at all.
curl -X "POST" "http://localhost:8088/query" --header "Content-Type: application/json" -d $'{"ksql": "select * from SENSOR_S EMIT CHANGES;","streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"}}' -s -N | grep S1 | xargs -I {} echo {}
^C
When I remove grep from the middle, it works as expected.
curl -X "POST" "http://localhost:8088/query" --header "Content-Type: application/json" -d $'{"ksql": "select * from SENSOR_S EMIT CHANGES;","streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"}}' -s -N | xargs -I {} echo {}
[{header:{queryId:none,schema:`ROWTIME` BIGINT, `ROWKEY` STRING, `SENSOR_ID` STRING, `TEMP` BIGINT, `HUM` BIGINT}},
{row:{columns:[1599624891102,S2,S2,40,20]}},
{row:{columns:[1599624891113,S1,S1,90,80]}},
{row:{columns:[1599624909117,S2,S2,40,20]}},
{row:{columns:[1599624909125,S1,S1,90,80]}},
{row:{columns:[1599625090320,S2,S2,40,20]}},
It looks like grep waits for its input to end before passing anything further down the pipe. When I test the same thing with a finite input, it works as expected.
ls | grep sh | xargs -I {} echo {};
abcd.sh
123.sh
pqr.sh
xyz.sh
So, the questions are: Is my understanding correct? Is there a way to make grep keep passing its output to subsequent commands in real time? I want to keep some basic filtering logic out of the later scripting, which is why I want grep to work here.
Thanks in Advance !
Anurag
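As suggested by @larsks, grep's --line-buffered option ("flush output on every line") works fine when I test it against a requirement similar to yours. When grep writes to a pipe instead of a terminal, it buffers its output in blocks, so on a slow, endless stream the matches can sit in the buffer instead of reaching xargs.
So the command would be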
curl -X "POST" "http://localhost:8088/query" --header "Content-Type: application/json" -d $'{"ksql": "select * from SENSOR_S EMIT CHANGES;","streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"}}' -s -N | grep S1 --line-buffered | xargs -I {} echo {}
I tested it on the /var/log/messages file, which is continuously updated, as follows:
[root@project1-master ~]# tail -f /var/log/messages | grep journal --line-buffered | xargs -I {} echo {}
Sep 11 11:15:47 project1-master journal: I0911 15:15:47.448254 1 node_lifecycle_controller.go:1429] Initializing eviction metric for zone:
Sep 11 11:15:52 project1-master journal: I0911 15:15:52.448704 1 node_lifecycle_controller.go:1429] Initializing eviction metric for zone:
Sep 11 11:15:54 project1-master journal: 2020-09-11 15:15:54.006 [INFO][46] felix/int_dataplane.go 1300: Applying dataplane updates
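The effect can be reproduced without the streaming service. The following is only a sketch under assumptions: slow_stream and filter_stream are made-up names, slow_stream stands in for the endless curl output, and a while read loop is used in place of xargs so that the JSON quotes are passed through unchanged (xargs -I {} echo {} strips them, as the output in the question shows).

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the endless curl stream: emits a few events,
# pausing $1 seconds (default 1) between pairs of events.
slow_stream() {
  local delay=${1:-1} i
  for i in 1 2 3; do
    printf '{"row":{"columns":[%d,"S1"]}}\n' "$i"
    printf '{"row":{"columns":[%d,"S2"]}}\n' "$i"
    sleep "$delay"
  done
}

# Keep only lines matching $1 and hand each one on as soon as grep sees it.
filter_stream() {
  grep --line-buffered -- "$1" | while IFS= read -r line; do
    printf 'match: %s\n' "$line"
  done
}
```

Running slow_stream 1 | filter_stream S1 should show one match per second as it arrives. If --line-buffered is dropped, the matches would typically appear only once grep's buffer fills or its input ends.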
In bash we can set a variable from the output of a sequence of commands using read and process substitution. But I'm having trouble detecting errors in my processing in one edge case: a part of the pipeline producing some output before erroring.
A simplified example which takes an input file, looks for a line starting with "foo" and sets var to the first word on that line is:
set -e
set -o pipefail
set -o nounset
die() {
  # Prefix matches the "DEAD: ..." output shown below; write to stderr
  echo "DEAD: $1" >&2
  exit 1
}
read -r var rest < <( \
cat data.txt \
| grep foo \
|| die "PIPELINE" \
) || die "OUTER"
echo "var=$var"
Running this with data.txt like
blah
zap foo awesome
bang foo
will output
var=zap
Running this on a data.txt file that doesn't contain foo outputs (to stderr)
DEAD: PIPELINE
DEAD: OUTER
This is all as expected.
We can introduce another no-op stage like cat at the end of the process
...
read -r var rest < <( \
cat data.txt \
| grep foo \
| cat \
|| die "PIPELINE" \
) || die "OUTER"
...
and everything continues to work.
But if the additional stage is paste -s -d' ' and the input does not contain "foo" the output is
var=
DEAD: PIPELINE
Which seems to show that the pipeline errors, but read succeeds with an empty line. (It looks like paste -s -d' ' outputs a line of output even when its input is empty.)
Is there a simple way to detect this failure of the pipeline, and cause the main script to error out?
I guess I could check that the variable is not empty - but this is a simplified version - I'm actually using sed and paste to join multiple lines to set multiple variables, like
read -r v1 v2 v3 rest < <( \
cat data.txt \
| grep "^foo=" \
| sed -e 's/foo=//' \
| paste -s -d' ' \
|| die "PIPELINE"
) || die "OUTER"
You could use another grep to see if the output of paste contained something:
read -r var rest < <( \
cat data.txt \
| grep foo \
| paste -s -d' ' \
| grep . \
|| die "PIPELINE" \
) || die "OUTER"
In the end I went with two different solutions depending on the context.
The first was to pipe the results to a temporary file. This will process the entire file before performing the read, and thus any failures in the pipe will cause the script to fail.
cat data.txt \
| grep "^foo=" \
| sed -e 's/foo=//' \
| paste -s -d' ' \
> "$TMP/result.txt" \
|| die "PIPELINE"
read -r var rest < "$TMP/result.txt" || die "OUTER"
The second was to just test that the variables were set. While this meant there was a bunch of duplication that I wanted to avoid, it seemed the most bullet-proof solution.
read -r var rest < <( cat data.txt \
| grep "^foo=" \
| sed -e 's/foo=//' \
| paste -s -d' ' \
|| die "PIPELINE"
) || die "OUTER"
[ -n "$var" ] || die "VARIABLE NOT SET"
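A third variant, which is not one of the two above, is to capture the pipeline's output with command substitution, because the assignment then carries the pipeline's exit status back into the main shell. This is only a sketch assuming bash with pipefail; extract_fields and parse_fields are hypothetical names.

```shell
#!/usr/bin/env bash
set -o pipefail

# Hypothetical helper: join the values of all "foo=" lines in file $1.
extract_fields() {
  grep '^foo=' "$1" | sed -e 's/foo=//' | paste -s -d' ' -
}

# Sets v1, v2, v3 and rest; returns non-zero if any pipeline stage failed.
parse_fields() {
  local out
  # Declared separately from the assignment so that "local" does not
  # mask the exit status of the command substitution.
  out=$(extract_fields "$1") || return 1
  read -r v1 v2 v3 rest <<< "$out"
}
```

It would be called as parse_fields data.txt || die "PIPELINE". Since die then runs in the main shell rather than inside the process substitution, its exit stops the script.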
I wrote a bash script that, given a path, calculates the hashes of each file under it and queries an API service with one of those hashes (the SHA-1) to get the result for that file, parsing the response with jq.
#Scanner.sh
#!/bin/bash
for i in $(find $1 -type f);do
md5_checksum=$(md5sum $i|cut -d' ' -f 1)
sha1_checksum=$(sha1sum $i|cut -d' ' -f 1)
sha256_checksum=$(sha256sum $i|cut -d' ' -f 1)
json_result="$(curl --silent -H 'Authorization:Basic XXXX' 'https://XXXXX.com/api/databrowser/malware_presence/query/sha1/'$sha1_checksum'?format=json&extended=true'| jq -r '.rl.malware_presence.name,.rl.malware_presence.level' | awk -vORS=, '{print $1}' |sed 's/,$/\n/')"
echo "$md5_checksum,$sha1_checksum,$sha256_checksum,$json_result"
done
#Result :
c63a1576e4b416e6944a1c39dbdfc3b4,fd55dfdd9a432e34ce6c593cd09a48f457f7aab6,e2d1c1975c2406f60f1c0fe5255560c2cd06434e669933536a86296a2eb70020,Malware,5
Right now it takes too long: about 10 seconds to process one file hash and get its result. How can I send 5 requests per second and get the results faster?
Any suggestions, please?
You could put your code in a function and run it in the background, with something like this:
runCurl() {
  md5_checksum=$(md5sum "$1"|cut -d' ' -f 1)
  sha1_checksum=$(sha1sum "$1"|cut -d' ' -f 1)
  sha256_checksum=$(sha256sum "$1"|cut -d' ' -f 1)
  json_result="$(curl --silent -H 'Authorization:Basic XXXX' 'https://XXXXX.com/api/databrowser/malware_presence/query/sha1/'"$sha1_checksum"'?format=json&extended=true'| jq -r '.rl.malware_presence.name,.rl.malware_presence.level' | awk -vORS=, '{print $1}' |sed 's/,$/\n/')"
  echo "$md5_checksum,$sha1_checksum,$sha256_checksum,$json_result"
}
# Note: as in the question, this loop still splits file names on whitespace
for i in $(find "$1" -type f);do
  runCurl "$i" &
done
wait  # block until all background jobs have finished
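Backgrounding every file at once starts as many processes as there are files. Here is a minimal sketch of capping the number of concurrent jobs instead, assuming an xargs that supports -0 and -P (GNU and BSD xargs both do). The names scan_file and scan_dir are hypothetical, and the API lookup is reduced to a comment so the sketch runs offline.

```shell
#!/usr/bin/env bash
# Hash one file and print md5,sha1,sha256 as one CSV line.
scan_file() {
  local f=$1 md5 sha1 sha256
  md5=$(md5sum "$f" | cut -d' ' -f1)
  sha1=$(sha1sum "$f" | cut -d' ' -f1)
  sha256=$(sha256sum "$f" | cut -d' ' -f1)
  # The curl | jq lookup keyed on "$sha1" from the question would go here.
  printf '%s,%s,%s\n' "$md5" "$sha1" "$sha256"
}
export -f scan_file   # make the function visible to the bash started by xargs

# Scan every file under directory $1 with at most $2 (default 5) jobs at once.
scan_dir() {
  find "$1" -type f -print0 |
    xargs -0 -n 1 -P "${2:-5}" bash -c 'scan_file "$1"' _
}
```

It would be called as scan_dir /some/path 5. This limits how many requests are in flight at the same time, which is not the same as a strict rate of 5 requests per second. Output lines may also arrive in a different order than find lists the files.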
I have an alias in my bashrc file that outputs current folder contents and system available storage, updated continuously by the watch function.
alias wtch='watch -n 0 -t "du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000| head -2 | awk '{print \$4}'"'
The string worked fine until I put in the awk part. I know I need to escape the single quotation marks, while still staying in the double quotation marks and the $4 but I haven't been able to get it to work. What am I doing wrong?
This is the error I get
-bash: alias: $4}": not found
Since the quoting for the alias is making it tough, you could just make it a function instead:
wtch() {
  # \$4 keeps the shell from expanding $4 inside the double quotes
  watch -n 0 -t "du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000| head -2 | awk '{print \$4}'"
}
This is a lot like issue 2 in the BashFAQ/050
Also, a minor thing, but you can skip the head process at the end and have awk do it, exiting after the second row, like
wtch() {
  watch -n 0 -t "du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000| awk '{print \$4} NR >= 2 {exit}'"
}
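Inside a function, a double-quoted string is still expanded by the shell when the function runs, so an unescaped $4 refers to the function's own fourth argument rather than to awk's fourth field. A small sketch of the difference follows; unescaped and escaped are hypothetical helpers that only print the command string watch would receive.

```shell
#!/usr/bin/env bash
# $4 is expanded by the shell before the string is used.
unescaped() { printf '%s\n' "df -h | awk '{print $4}'"; }

# \$4 survives as a literal $4 for awk to interpret.
escaped() { printf '%s\n' "df -h | awk '{print \$4}'"; }

unescaped one two three four   # prints: df -h | awk '{print four}'
escaped one two three four     # prints: df -h | awk '{print $4}'
```

When the function is called with no arguments, the unescaped form becomes awk '{print }', which prints whole lines instead of the fourth column.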
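In this case you could use cut instead of awk, which avoids the nested quoting. Note that cut treats every single space as a delimiter, so on df's space-padded columns the field number may not line up with awk's $4 unless repeated spaces are squeezed first (for example with tr -s ' ').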
alias wtch="watch -n 0 -t 'du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000| head -2 | cut -d\ -f4'"
Explaining cut:
-d option defines a delimiter
-d\ means that my delimiter is space
-f selects a column
-f4 gives you the fourth column
My Linux watch command is quite old and doesn't support the --color option. How can I get the same kind of output? In my script the loop prints each result after the previous one (of course), but I need each result to replace the previous one. Are there any tricks for terminal output?
#!/bin/bash
while true
do
/usr/sbin/asterisk -rx "show queue My_Compain" \
| grep Agent \
| grep -v \(Unavailable\) \
| sort -t"(" -k 2 \
| GREP_COLOR='01;31' egrep -i --color=always '^.*[0-9] \(Not in use.*$|$' \
| GREP_COLOR='01;36' egrep -i --color=always '^.*\(Busy*$|$'
sleep 2
done
You can use clear to clear the screen before dumping your output to give the appearance of in-place updates.
To reduce flicker, you can use the age-old technique of double buffering:
#!/bin/bash
while true
do
buffer=$(
clear
/usr/sbin/asterisk -rx "show queue My_Compain" \
| grep Agent \
| grep -v \(Unavailable\) \
| sort -t"(" -k 2 \
| GREP_COLOR='01;31' egrep -i --color=always '^.*[0-9] \(Not in use.*$|$' \
| GREP_COLOR='01;36' egrep -i --color=always '^.*\(Busy*$|$'
)
echo "$buffer"
sleep 2
done
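The same idea can be wrapped in a small reusable function. This is a sketch only: render_frame is a hypothetical name, and the escape sequence assumes an ANSI/VT100-compatible terminal (ESC[H moves the cursor home, ESC[2J clears the screen).

```shell
#!/usr/bin/env bash
# Run the given command, capture all of its output first, then clear the
# screen and draw the captured frame in a single write.
render_frame() {
  local buffer
  buffer=$("$@" 2>&1)
  printf '\033[H\033[2J%s\n' "$buffer"
}
```

A loop such as while true; do render_frame date; sleep 2; done would then redraw in place. Because the slow command finishes before the screen is cleared, the screen is blank only for the instant between the clear and the redraw.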