Is there a way to synchronously call asynchronous functions from a bash script? - bash

I have some functions in a bash script that send HTTP POST requests. After I call them I store the results, and I check the disk usage.
How can I make sure that the du -sh I use runs only after the HTTP request is done?
I send a lot of HTTP requests (1M), so checking the header code every time is a bit inefficient.
Does anyone here have an idea?
If I don't find anything elegant, is there a difference between:
#! /bin/bash
http_req_1 1000000
sleep X
Size=$(du -sh ./PATH)
http_req_2 1000001
...
Or:
#!/bin/bash
function http_req {
    local loop_num=$1
    for i in $(seq 1 "$loop_num")
    do
        curl ...
    done
    sleep X
    SIZE=$(du -sh)
}
http_req 1000000
http_req 1000001
....
?
I use CentOS 7.
Thank you!
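For reference, one way to guarantee that ordering is bash's built-in wait: a plain foreground curl loop already blocks until each request completes, and if the requests are backgrounded, wait blocks until every child has exited, so du -sh runs only afterwards. A minimal sketch (the endpoint URL and the request count are placeholders):
#!/bin/bash
# Fire the requests in the background, then block until all of them finish.
# (For 1M requests you would cap the number of concurrent jobs.)
for i in $(seq 1 100); do
    curl -s -o /dev/null -X POST "http://example.com/endpoint?param=$i" &
done
wait # returns only after every background curl has exited
Size=$(du -sh ./PATH) # guaranteed to run after the requests are done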

Related

bash how to wait for completion of forked processes that run in background

I wonder if I could achieve something like the following logic:
Given a number of jobs to be done, fold_num, and a limit on the number of worker processes, say work_num, I hope to run work_num processes in parallel until all fold_num jobs are done. Finally, there is some further processing on the results of all these jobs. We can assume fold_num is always a multiple of work_num.
I haven't gotten the following snippet to work so far, even with tips from 'How to wait in bash for several subprocesses to finish and return exit code !=0 when any subprocess ends with code !=0?':
#!/bin/bash
worker_num=5
fold_num=10
pids=""
result=0
for fold in $(seq 0 $(( $fold_num-1 ))); do
    pids_idx=$(( $fold % ${worker_num} ))
    echo "pids_idx=${pids_idx}, pids[${pids_idx}]=${pids[${pids_idx}]}"
    wait ${pids[$pids_idx]} || let "result=1"
    if [ "$result" == "1" ]; then
        echo "some job is abnormal, aborting"
        exit
    fi
    cmd="echo fold$fold" # use echo as an example, real command can be time-consuming to run
    $cmd &
    pids[${pids_idx}]="$!"
    echo "pids=${pids[*]}"
done
# when the for-loop completes, do something else...
The output looks like:
pids_idx=0, pids[0]=
pids=5846
pids_idx=1, pids[1]=
fold0
pids=5846 5847
fold1
pids_idx=2, pids[2]=
pids=5846 5847 5848
fold2
pids_idx=3, pids[3]=
pids=5846 5847 5848 5849
fold3
pids_idx=4, pids[4]=
pids=5846 5847 5848 5849 5850
pids_idx=0, pids[0]=5846
fold4
./test_wait.sh: line 12: wait: pid 5846 is not a child of this shell
some job is abnormal, aborting
Questions:
1. It seems the pids array records the correct process IDs, but they fail to be wait-ed for. Any ideas how to fix this?
2. Do we need to call wait after the for-loop? If so, what should be done there?
Alright, I guess I got a working solution with tips from folks on 'parallel'.
export worker_names=("foo" "bar")
export worker_num=${#worker_names[@]}
export fold_num=10 # number of jobs, as in the snippet above
function some_computation {
    fold=$1
    cmd="..." # involves worker_names and fold
    echo $cmd; $cmd
}
export -f some_computation # important, to make this function visible to subprocesses
for fold in $(seq 0 $(( $fold_num-1 ))); do
    sem -j $worker_num some_computation $fold
done
sem --wait # wait for all jobs to complete
# do something below
A couple of things here:
I haven't gotten parallel working because of the post-computation processing I need to do after those parallel jobs; the parallel version I tried failed to wait for job completion. So I used GNU sem, which stands for semaphore.
Exporting the variables is crucial for the computation function to be able to access them in this situation; otherwise those global variables are invisible to it.
Exporting the computation function is also necessary, for the same reason. Notice the -f option.
sem --wait perfectly fulfills the need to wait for all the parallel jobs.
HTH.
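For completeness, a pure-bash variant of the original loop can also work, provided each recorded PID is waited for exactly once and a final wait catches the last round of jobs. A minimal sketch (echo again stands in for the real, time-consuming command):
#!/bin/bash
worker_num=5
fold_num=10
for fold in $(seq 0 $(( fold_num - 1 ))); do
    pids_idx=$(( fold % worker_num ))
    # before reusing a slot, reap the job that previously occupied it
    if [ -n "${pids[$pids_idx]}" ]; then
        wait "${pids[$pids_idx]}" || { echo "some job is abnormal, aborting"; exit 1; }
    fi
    echo "fold$fold" &
    pids[$pids_idx]=$!
done
wait # reap the jobs still running from the last round
# do something below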

How to verify AB responses?

Is there a way to make sure that AB gets proper responses from server? For example:
To force it to output the response of a single request to STDOUT OR
To ask it to check that some text fragment is included into the response body
I want to make sure that authentication worked properly and that I am measuring the response time of the target page, not the login form.
Currently I just replace ab -n 100 -c 1 -C "$MY_COOKIE" $MY_REQUEST with curl -b "$MY_COOKIE" $MY_REQUEST | lynx -stdin.
If it's not possible, is there an alternative more comprehensive tool that can do that?
You can use the -v option as listed in the man doc:
-v verbosity
Set verbosity level - 4 and above prints information on headers, 3 and above prints response codes (404, 200, etc.), 2 and above prints warnings and info.
https://httpd.apache.org/docs/2.4/programs/ab.html
So it would be:
ab -n 100 -c 1 -C "$MY_COOKIE" -v 4 $MY_REQUEST
This will spit out the response headers and HTML content. The 3 value will be enough to check for a redirect header.
I didn't try piping it to Lynx but grep worked fine.
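For example, a quick check of an authenticated response (the 'Welcome back' fragment is a hypothetical marker that only appears after a successful login):
# Run one verbose request and check the dumped body for the expected fragment.
ab -n 1 -c 1 -C "$MY_COOKIE" -v 4 "$MY_REQUEST" | grep -q "Welcome back" \
    && echo "auth OK" || echo "auth FAILED"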
Apache Benchmark is good for a cursory glance at your system but is not very sophisticated. I am currently attempting to tune a web service and am finding that AB does not measure complete response time when considering the transfer of the body. Also, as you mention, you cannot verify what is returned.
My current recommendation is Apache JMeter. http://jmeter.apache.org/
I am having much better success with it. You may find the Response Assertion useful for your situation. http://jmeter.apache.org/usermanual/component_reference.html#Response_Assertion

pexpect responses include echoes of requests with <cr>

I have used pexpect and sendline before, but this time I am running a longer command with pipes and wildcards, see below:
commandToRun='/bin/bash -c "/var/scripts/testscripts//extract -o | tail -3"'
returnedString = sendLine(commandToRun)
my class which has the sendLine function looks pretty much like this:
self.connection = pexpect.spawn('%s %s' % (protocol, host))
self.connection.setecho(False)
self.connection.setwinsize(300, 300)
But when I ran the code, I saw that the returnedString not only includes the response, it also includes the request.
So if I print returnedString, it look like this:
bin/bash -c "/var/scripts/testscripts//extract -o | tail -3"<cr>
100<cr>
102<cr>
103<cr>
Why does the response include the request in the same buffer?
I have already set setecho(False) and it does not help!
EDIT: (correct fix) I have to manually remove all the <cr> from the response and remove the request as well. So setecho(False) still does nothing!
I found a solution to this myself (turn off echo in the shell that runs the command):
commandToRun = 'bash -c "less /readfile | tail -4"'
yourConnection.sendLine("stty -echo")
commandResult = yourConnection.sendLine(commandToRun)
self.sendLine("stty echo")
So basically, run you command in a shell using 'bash -c' and then turn of echo in the bash.

Is echo atomic when writing single lines

I am currently trying to get a script to correctly write the output from other started commands into a log file. The script writes its own messages to the log file using echo, and there is a function to which I can pipe the lines from the other program.
The main problem is that the program which produces the output is started in the background, so my function that does the read may write concurrently to the logfile. Could this be a problem? echo only ever writes a single line, so it should not be too hard to ensure atomicity. However, I have looked on Google and found no way to make sure it actually is atomic.
Here is the current script:
LOG_FILE=/path/to/logfile
write_log() {
    echo "$(date +%Y%m%d%H%M%S);$1" >> ${LOG_FILE}
}

write_output() {
    while read data; do
        write_log "Message from SUB process: [ $data ]"
    done
}
write_log "Script started"
# do some stuff
call_complicated_program 2>&1 | write_output &
SUB_PID=$!
#do some more stuff
write_log "Script exiting"
wait $SUB_PID
As you can see, the script might write both on its own as well as via the redirected output. Could this cause havoc in the file?
echo is just a simple wrapper around write (this is a simplification; see the edit below for the gory details), so to determine if echo is atomic, it's useful to look up write. From the Single UNIX Specification:
Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of IEEE Std 1003.1-2001 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.
You can check PIPE_BUF on your system with a simple C program. If you're just printing a single line of output that is not ridiculously long, it should be atomic.
Here is a simple program to check the value of PIPE_BUF:
#include <limits.h>
#include <stdio.h>

int main(void) {
    printf("%d\n", PIPE_BUF);
    return 0;
}
On Mac OS X, that gives me 512 (the minimum allowed value for PIPE_BUF). On Linux, I get 4096. So if your lines are fairly long, make sure you check it on the system in question.
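If you'd rather not compile anything, getconf can query the same limit for a given path:
# PIPE_BUF is a pathconf variable, so getconf needs a path argument
getconf PIPE_BUF /path/to/logfile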
edit to add: I decided to check the implementation of echo in Bash, to confirm that it will print atomically. It turns out, echo uses putchar or printf depending on whether you use the -e option. These are buffered stdio operations, which means that they fill up a buffer, and actually write it out only when a newline is reached (in line-buffered mode), the buffer is filled (in block-buffered mode), or you explicitly flush the output with fflush. By default, a stream will be in line-buffered mode if it is an interactive terminal, and block-buffered mode if it is any other file. Bash never sets the buffering type, so for your log file, it should default to block-buffered mode. At the end of the echo builtin, Bash calls fflush to flush the output stream. Thus, the output will always be flushed at the end of echo, but may be flushed earlier if it doesn't fit into the buffer.
The size of the buffer used may be BUFSIZ, though it may be different; BUFSIZ is the default size if you set the buffer explicitly using setbuf, but there's no portable way to determine the actual size of your buffer. There are also no portable guidelines for what BUFSIZ is, but when I tested it on Mac OS X and Linux, it was twice the size of PIPE_BUF.
What does this all mean? Since the output of echo is all buffered, it won't actually call the write until the buffer is filled or fflush is called. At that point, the output should be written, and the atomicity guarantee I mentioned above should apply. If the stdout buffer size is larger than PIPE_BUF, then PIPE_BUF will be the smallest atomic unit that can be written out. If PIPE_BUF is larger than the stdout buffer size, then the stream will write the buffer out when the buffer fills up.
So, echo is only guaranteed to atomically write sequences shorter than the smaller of PIPE_BUF and the size of the stdout buffer, which is most likely BUFSIZ. On most systems, BUFSIZ is larger than PIPE_BUF.
tl;dr: echo will atomically output lines, as long as those lines are short enough. On modern systems, you're probably safe up to 512 bytes, but it's not possible to determine the limit portably.
There is no involuntary file locking, but the >> operator is safe (it opens the file in append mode, so each write goes to the current end of the file), while the > operator is unsafe. So your practice is safe to do.
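If your lines can exceed the PIPE_BUF limit, a plainly different technique is to serialize the writers explicitly with flock(1) from util-linux; a minimal sketch of a locked write_log, assuming flock is available:
LOG_FILE=/path/to/logfile

write_log() {
    {
        flock -x 9 # take an exclusive lock on the log file via fd 9
        echo "$(date +%Y%m%d%H%M%S);$1" >&9
    } 9>>"${LOG_FILE}"
}

Even lines longer than PIPE_BUF cannot interleave this way, because every writer blocks until the lock is free.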
I tried the approach from user:pizza and could not get it to work like the answer from user:Brian Campbell. Let me know if I am doing something wrong and I'll update the answer. (And yes, this is an answer, because I'm actually giving a complete working demo.)
basic concurrency
This just illustrates the problem
$ for n in {1..5}; do (curl -svo /dev/null example.com 2>&1 &) done | grep GET
> GET / HTTP/1.1
>> GET / HTTP/1.1
GET / HTTP/1.1
>>> GET / HTTP/1.1
>>GET / HTTP/1.1
using echo on each line of output
This solves the problem using Brian Campbell's method. (Note that the length of the line for which this works is limited.)
$ for n in {1..5}; do (curl -svo /dev/null example.com 2>&1 | while read; do echo "${REPLY}"; done &) done | grep GET
> GET / HTTP/1.1
> GET / HTTP/1.1
> GET / HTTP/1.1
> GET / HTTP/1.1
> GET / HTTP/1.1
redirecting the for loop to append to stdout
Instinct should tell you that this is not going to work because it redirects after all the output of the forked curls has been merged.
$ for n in {1..5}; do (curl -svo /dev/null example.com 2>&1 &) done >> /dev/stdout | grep GET
> GET / HTTP/1.1
> GET / HTTP/1.1
>> >GET / HTTP/1.1
> GET / HTTP/1.1
GET / HTTP/1.1
redirecting each curl to append to stdout
I suspect this failure is due to the fact that the entire content of each curl is being redirected and the size is greater than what the kernel is willing to block for. I have not taken the time to confirm that, but what Brian Campbell shared seems to support it.
$ for n in {1..5}; do (curl -svo /dev/null example.com >>/dev/stdout 2>&1 &) done | grep GET
>> GET / HTTP/1.1
GET / HTTP/1.1
> GET / HTTP/1.1
GET / HTTP/1.1
> GET / HTTP/1.1

Changing POST data used by Apache Bench per iteration

I'm using ab to do some load testing, and it's important that the supplied querystring (or POST) parameters change between requests.
I.e. I need to make requests to URLs like:
http://127.0.0.1:9080/meth?param=0
http://127.0.0.1:9080/meth?param=1
http://127.0.0.1:9080/meth?param=2
...
to properly exercise the application.
ab seems to only read the supplied POST data file once, at startup, so changing its content during the test run is not an option.
Any suggestions?
You're going to need to use a more full-featured benchmarking tool like jMeter for this.
Add my recommendation for jMeter...it works very well!
You could also write a script that generates a second script containing something like:
ab -n 1 -c 1 'http://yoursever.com/method?param=0' &
ab -n 1 -c 1 'http://yoursever.com/method?param=1' &
ab -n 1 -c 1 'http://yoursever.com/method?param=2' &
ab -n 1 -c 1 'http://yoursever.com/method?param=3' &
ab -n 1 -c 1 'http://yoursever.com/method?param=4' &
But that's only really useful if you're trying to simulate load and observe your server. The actual benchmarks will have to be collated if you want to check ab performance. At that point I'd just use jMeter. For my use, I just need to simulate load and the ab processes are light enough that running 100 like this is no problem.
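A minimal sketch of such a generator (the URL and the count of 100 are placeholders):
#!/bin/bash
# Launch one single-shot ab per parameter value, each in the background.
for i in $(seq 0 99); do
    ab -n 1 -c 1 "http://yourserver.com/method?param=$i" > "ab_param_$i.log" 2>&1 &
done
wait # let every background ab finish before collating the per-request logs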
Here is a patched version of ab (or the patch itself):
http://www.andboson.com/?p=1372
This version includes the patch from http://chrismiles.info/dev/testing/ab
and can also read multiple POST bodies from a file, line by line.
Update: a sample request:
./ab -v1 -n2 -c1 -T'application/json' -ppostfile http://api.webhookinbox.com/i/HX6mC1WS/in/
postfile content:
{"data1":1, "data2":"4"}
{"data0":0, "x":"y"}
Update 2: there is also an alternative:
https://github.com/andboson/ab-go
