Why does GNU parallel become less and less effective? - bash

I have a file containing 1,000,000 domain names and I'm currently launching the script testssl.sh (http://testssl.sh) on each domain in the list (i.e. each line of the file). I'm using GNU parallel to improve performance. Here is how I launch testssl.sh with GNU parallel:
cat listDomainNames.txt | parallel --no-notice -j0 --workdir $PWD ./testMX.sh
where testMX.sh launches testssl.sh:
./testssl.sh --starttls smtp --vulnerable --server-preference -mx --append --csvfile result.csv $1
At the beginning, my script tests domain names very quickly (5,000 in a single hour), but after several hours it becomes really slow (around 1 domain per minute). Any idea what is happening? Thanks in advance!

More and more processes will be hanging around, waiting for a timeout. Because -j0 keeps starting as many jobs as possible, jobs stuck on slow or unresponsive mail servers accumulate and drag the overall rate down.
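A hedged sketch of one way to contain this: bound the job count instead of using -j0, kill jobs that run too long with --timeout, and keep a job log so the slow domains can be identified. The values -j 100 and --timeout 300 below are arbitrary placeholders to tune, not part of the original answer.

# Sketch: bound the number of simultaneous jobs, kill jobs that hang past
# 300 seconds, and keep a job log so slow domains can be identified later.
cat listDomainNames.txt |
  parallel --no-notice -j 100 --timeout 300 --joblog testssl.log \
           --workdir "$PWD" ./testMX.sh

# Column 4 of the job log is the runtime; the slowest domains float to the top:
sort -nrk4 testssl.log | head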

Related

parallel computing in multiple cores for data which is independently run with the program

I have a simulation program in Fortran which takes its input from a .dat file. This file has 100,000 lines, which take a really long time to run. The program takes the first line, runs all the simulations, writes the result to a .out file, and moves on to the next line. I have a computer with 16 CPUs, so how can I split my data into 16 parts and run each part separately on its own CPU? I am running on a machine with Ubuntu. Each line is totally independent of the others.
For example, my data is HeadData10000.dat, and I have a file simulation.ini containing the name of the input data (in this case HeadData10000.dat) and the name of the output data. So the file simulation.ini will look like this:
HeadData10000.dat
outputdata.out
Right now I have two computers, so I split my HeadData10000.dat into two files, make a simulation.ini for each input file, and run it like this on each computer: ./simulation.exe < ./simulation.ini.
Assuming your list of 100,000 jobs is called "jobs.txt" and looks like this:
JobA
JobB
JobC
JobD
You could run this:
parallel 'printf "{}\n{.}.out" | ./simulation.exe' < jobs.txt
If you want to do a dry run to see what that would do without doing anything:
parallel --dry-run 'printf "{}\n{.}.out" | ./simulation.exe' < jobs.txt
Sample Output
printf "JobA\nJobA.out" | ./simulation.exe
printf "JobB\nJobB.out" | ./simulation.exe
printf "JobC\nJobC.out" | ./simulation.exe
printf "JobD\nJobD.out" | ./simulation.exe
If you have multiple servers available, look at using the -S parameter to GNU Parallel to spread the jobs across the machines. Also, look at the --eta and --bar parameters for getting progress reports.
I used printf "line1 \n line2" to generate the two lines of input in order to avoid having to create, and later delete, 100,000 files.
By default, GNU Parallel will keep 1 job per CPU core running, so there will always be 16 jobs running on your 16-core machine, but you can change that to, say, 8 if you want to with parallel -j 8. You can also specify the number of jobs to run on your second (and subsequent) machines.
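As a hedged illustration of the -S and --eta options mentioned above: server1 and server2 are hypothetical hosts that would need passwordless ssh, GNU Parallel installed, and simulation.exe plus its data already in place (possibly combined with --workdir).

# Sketch: run up to 16 jobs locally plus 16 on each of two remote hosts,
# with a running ETA display.
parallel -S 16/:,16/server1,16/server2 --eta \
    'printf "{}\n{.}.out" | ./simulation.exe' < jobs.txt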

GNU Parallel: Does parallel reload the program for every job?

Suppose I have a program that loads significant content before running... but this is a one-time slowdown.
Next, I write:
cat ... | parallel -j 8 --spreadstdin --block $sz ... ./mycode
Will this induce the load overhead every single job?
If it does induce the overhead, is there a way to avoid it?
As @Barmar says, ./mycode is started for each block in your example.
But since you do not use -k in your example, you may be able to use --round-robin.
... | parallel -j 8 --spreadstdin --round-robin --block $sz ... ./mycode
This will start 8 ./mycodes (but not one per block) and give blocks to any process that is ready to read.
This example shows that more blocks are given to processes 11 and 10 than to processes 4 and 5, because 4 and 5 read more slowly:
seq 1000000 |
parallel -j8 --tag --roundrobin --pipe --block 1k 'pv -qL {}0000 | wc' ::: 11 4 5 6 9 8 7 10
parallel doesn't know anything about the internal workings of the program you're running with it. Each instance runs independently; there's no way that one invocation's initialization can be copied over to the others.
If you want the application to initialize once and then run multiple instances in parallel, you need to design that into the application itself. It should load the data, then use fork() to create multiple processes that use this data.
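In shell terms, the same pattern can be sketched with subshells, which are created with fork() and therefore inherit whatever the parent has already loaded. Here load_data, process_chunk, and the blocks.* input files are hypothetical placeholders, not names from the question.

#!/bin/bash
# Sketch: pay the expensive load once in the parent, then fork workers.
data=$(load_data)                      # hypothetical one-time expensive load

for worker in {1..8}; do
  (
    # Each subshell is a fork of the parent and inherits $data unchanged.
    while read -r block; do
      process_chunk "$block" "$data"   # hypothetical per-block work
    done < "blocks.$worker"            # hypothetical pre-split input file
  ) &
done
wait                                   # wait for all 8 workers to finish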

Is there a way to flush stdout on process termination for parallel processes

I'm running several independent programs on a single machine in parallel.
The processes (say 100) are all relatively short (<5 minutes) and their output is limited to a few hundred lines (~kilobytes).
Usually the output in a terminal then becomes mangled because the processes write directly to the same buffer. I would like these outputs to be un-mangled so that it's easier to debug certain processes. I could write these outputs to temporary files, but I would like to limit disk I/O and would prefer another method if possible; it would also require cleaning up and probably wouldn't really improve code readability.
Is there any shell-native method that keeps buffers separated per PID and then flushes them to stdout/stderr when the process terminates? Do you see any other way to do this?
Update
I ended up using the tail -n 1000000 trick from @Gem's comment. Since the commands I'm using are long (covering multiple lines) and I was already using subshells ( ... ) &, it was quite a minimal change from ( ... ) & to ( ... ) 2>&1 | tail -n 1000000 &.
You can do that with GNU Parallel. Use -k to keep the output in order and ::: to separate the arguments you want passed to your program.
Here we run five instances of echo in parallel:
parallel -k echo {} ::: {0..4}
0
1
2
3
4
Now add in --tag to tag your output lines with the filenames or parameters you are using:
parallel --tag -k 'echo "Line 1, param {}"; echo "Line 2, param {}"' ::: {1..4}
1 Line 1, param 1
1 Line 2, param 1
2 Line 1, param 2
2 Line 2, param 2
3 Line 1, param 3
3 Line 2, param 3
4 Line 1, param 4
4 Line 2, param 4
You should notice that each line is tagged on the left side with the parameters and that the two lines from each job are kept together.
You can now specify how your output is organised.
Use --group to group output by job
Use --line-buffer to buffer a line at a time
Use --ungroup if you want the output all mixed up, but available as soon as it is produced
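Applied to the question's setup, a minimal sketch: cmds.txt is a hypothetical file listing the 100 commands, one per line (with no command given, GNU Parallel runs each input line as a command).

# Hold each job's output back and print it as one un-mangled block:
parallel --group < cmds.txt

# Or print whole lines as soon as they are complete, never mixing lines:
parallel --line-buffer < cmds.txt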
Sounds like you just want syslog, or rather logger, its command-line interface. Example:
echo "Something happened!" | logger -i -p local0.notice
If you insist on getting the output on stderr too, use --stderr. rsyslog will handle buffering, atomic writes, etc., and is presumably pretty good at optimizing disk I/O. However, you could also easily configure rsyslog to route the log facility (i.e. local0 or whatever you choose to use) wherever you want, such as to a tmpfs, a dedicated disk, or even over TCP. See /etc/rsyslog.conf.
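A minimal sketch of how that could look for the 100 parallel jobs from the question, where run_job stands in for whatever command is actually being launched (a hypothetical name):

# Each job's lines go to syslog under their own tag, so a single job's
# output can later be extracted with e.g. grep "job42" /var/log/syslog.
for i in $(seq 1 100); do
  run_job "$i" 2>&1 | logger -t "job$i" -p local0.notice &   # run_job is hypothetical
done
wait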

fastest hashing in a unix environment?

I need to examine the output of a certain script 1000s of times on a unix platform and check if any of it has changed from before.
I've been doing this:
(script_stuff) | md5sum
and storing this value. I don't actually need "md5", just a simple hash function whose output I can compare against a stored value to see if it's changed. It's okay if there is an occasional false positive.
Is there anything better than md5sum that works faster and generates a fairly usable hash value? The script itself generates a few lines of text - maybe 10-20 on average, up to a maximum of 100 or so.
I had a look at fast md5sum on millions of strings in bash/ubuntu - that's wonderful, but I can't compile a new program. Need a system utility... :(
Additional "background" details:
I've been asked to monitor the DNS records of a set of 1000 or so domains and immediately call certain other scripts if there has been any change. I intend to run a dig xyz +short command, hash its output, store that, and then check it against the previously stored value. Any change will trigger the other script; otherwise it just goes on. Right now we're planning on using cron for a set of these 1000, but can think completely differently for "seriously heavy" usage - ~20,000 or so.
I have no idea what the use of such a system would be, I'm just doing this as a job for someone else...
The cksum utility calculates a non-cryptographic CRC checksum.
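A minimal sketch of how cksum could slot into the workflow described in the question; example.com, the .cksum file name, and on_change.sh are illustrative placeholders.

# Recompute the checksum of the dig output and compare with the stored one.
new=$(dig example.com +short | cksum)
old=$(cat example.com.cksum 2>/dev/null)
if [[ "$new" != "$old" ]]; then
  ./on_change.sh example.com               # hypothetical follow-up script
  printf '%s\n' "$new" > example.com.cksum
fi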
How big is the output you're checking? A hundred lines max. I'd just save the entire original file, then use cmp to see if it's changed. Given that a hash calculation will have to read every byte anyway, the only way you'll get an advantage from a checksum-type calculation is if the cost of doing it is less than the cost of reading two files of that size.
And cmp won't give you any false positives or negatives :-)
pax> echo hello >qq1.txt
pax> echo goodbye >qq2.txt
pax> cp qq1.txt qq3.txt
pax> cmp qq1.txt qq2.txt >/dev/null
pax> echo $?
1
pax> cmp qq1.txt qq3.txt >/dev/null
pax> echo $?
0
Based on your question update:
I've been asked to monitor the DNS records of a set of 1000 or so domains and immediately call certain other scripts if there has been any change. I intend to run a dig xyz +short command, hash its output, store that, and then check it against the previously stored value. Any change will trigger the other script; otherwise it just goes on. Right now we're planning on using cron for a set of these 1000, but can think completely differently for "seriously heavy" usage - ~20,000 or so.
I'm not sure you need to worry too much about the file I/O. The following script executed dig microsoft.com +short 5000 times, first with file I/O and then with output to /dev/null (switched by changing the comments).
#!/bin/bash
rm -rf qqtemp
mkdir qqtemp
((i = 0))
while [[ $i -ne 5000 ]] ; do
    #dig microsoft.com +short >qqtemp/microsoft.com.$i
    dig microsoft.com +short >/dev/null
    ((i = i + 1))
done
The elapsed times at 5 runs each are:
 File I/O | /dev/null
----------+-----------
   3:09   |   1:52
   2:54   |   2:33
   2:43   |   3:04
   2:49   |   2:38
   2:33   |   3:08
After removing the outliers and averaging, the results are 2:49 for the file I/O and 2:45 for the /dev/null. The time difference is four seconds for 5000 iterations, only 1/1250th of a second per item.
However, since one pass over the 5000 takes up to three minutes, that is the maximum time it will take to detect a problem (a minute and a half on average). If that's not acceptable, you need to move away from bash to another tool.
Given that a single dig only takes about 0.012 seconds, you should theoretically be able to do 5000 in sixty seconds, assuming your checking tool takes no time at all. You may be better off doing something like this in Perl and using an associative array to store the output from dig.
Perl's semi-compiled nature means that it will probably run substantially faster than a bash script, and Perl's fancy stuff will make the job a lot easier. However, you're unlikely to get that 60-second time much lower, just because that's how long it takes to run the dig commands.

Changing POST data used by Apache Bench per iteration

I'm using ab to do some load testing, and it's important that the supplied querystring (or POST) parameters change between requests.
I.e. I need to make requests to URLs like:
http://127.0.0.1:9080/meth?param=0
http://127.0.0.1:9080/meth?param=1
http://127.0.0.1:9080/meth?param=2
...
to properly exercise the application.
ab seems to only read the supplied POST data file once, at startup, so changing its content during the test run is not an option.
Any suggestions?
You're going to need to use a more full-featured benchmarking tool like jMeter for this.
Add my recommendation for jMeter...it works very well!
You could also write a script that generates a second script containing something like:
ab -n 1 -c 1 'http://yourserver.com/method?param=0' &
ab -n 1 -c 1 'http://yourserver.com/method?param=1' &
ab -n 1 -c 1 'http://yourserver.com/method?param=2' &
ab -n 1 -c 1 'http://yourserver.com/method?param=3' &
ab -n 1 -c 1 'http://yourserver.com/method?param=4' &
But that's only really useful if you're trying to simulate load and observe your server. The actual benchmarks will have to be collated if you want to check ab performance; at that point I'd just use jMeter. For my use case, I just need to simulate load, and the ab processes are light enough that running 100 like this is no problem.
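A minimal sketch of such a generator loop, reusing the hypothetical server from the lines above and firing 100 single-shot requests in the background:

# One single-request ab per parameter value; `wait` blocks until all finish.
for i in $(seq 0 99); do
  ab -n 1 -c 1 "http://yourserver.com/method?param=$i" &
done
wait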
Here is a patched version of ab, or the patch itself:
http://www.andboson.com/?p=1372
This version includes that patch (http://chrismiles.info/dev/testing/ab)
and can also read multiple POST data entries, line by line.
Update:
Sample request:
./ab -v1 -n2 -c1 -T'application/json' -ppostfile http://api.webhookinbox.com/i/HX6mC1WS/in/
postfile content:
{"data1":1, "data2":"4"}
{"data0":0, "x":"y"}
Update 2:
There is also an alternative:
https://github.com/andboson/ab-go
