Is there a way to flush stdout on process termination for parallel processes

I'm running several independent programs on a single machine in parallel.
The processes (say 100) are all relatively short (<5 minutes) and their output is limited to a few hundred lines (~kilobytes).
Usually the output in the terminal then becomes mangled because the processes all write directly to the same output stream. I would like these outputs to be un-mangled so that it's easier to debug certain processes. I could write these outputs to temporary files, but I would like to limit disk I/O and would prefer another method if possible. Temporary files would also require cleaning up and probably wouldn't improve code readability.
Is there any shell-native method that keeps a separate buffer per PID and flushes it to stdout/stderr when the process terminates? Do you see any other way to do this?
Update
I ended up using the tail -n 1000000 trick from the comment by #Gem. Since my commands are long (covering multiple lines) and I was already running them in subshells with ( ... ) &, it was a minimal change from ( ... ) & to ( ... ) 2>&1 | tail -n 1000000 &.
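A minimal, self-contained sketch of the tail trick, assuming bash and GNU coreutils; job is a hypothetical stand-in for the real multi-line commands. tail -n 1000000 cannot print anything until its input reaches end of file, which happens when the subshell exits, so each job's stdout and stderr come out as one block (as long as a job produces fewer than 1,000,000 lines).

```bash
#!/usr/bin/env bash
# Buffer each parallel job's output until the job exits, using tail.

# job: hypothetical stand-in for a long, multi-line command.
# It prints with pauses so that, without buffering, lines from
# concurrent jobs would interleave on the terminal.
job() {
  local name=$1 i
  for i in 1 2 3; do
    echo "$name line $i"
    sleep 0.1
  done
  echo "$name finished" >&2   # stderr is captured too, via 2>&1
}

run_all() {
  local name
  for name in A B C; do
    # tail only writes once its input hits EOF, i.e. when the subshell exits
    ( job "$name" ) 2>&1 | tail -n 1000000 &
  done
  wait   # wait for all background pipelines
}

run_all
```

Each job's four lines appear together; only the order of the jobs relative to each other depends on which finishes first.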

You can do that with GNU Parallel. Use -k to keep the output in order and ::: to separate the arguments you want passed to your program.
Here we run 5 instances of echo in parallel:
parallel -k echo {} ::: {0..4}
0
1
2
3
4
Now add in --tag to tag your output lines with the filenames or parameters you are using:
parallel --tag -k 'echo "Line 1, param {}"; echo "Line 2, param {}"' ::: {1..4}
1 Line 1, param 1
1 Line 2, param 1
2 Line 1, param 2
2 Line 2, param 2
3 Line 1, param 3
3 Line 2, param 3
4 Line 1, param 4
4 Line 2, param 4
You should notice that each line is tagged on the left side with the parameters and that the two lines from each job are kept together.
You can now specify how your output is organised.
Use --group to group output by job (the default)
Use --line-buffer to buffer a line at a time
Use --ungroup if you want output all mixed up, but as soon as available

Sounds like you just want syslog, or rather logger, its command-line interface. Example:
echo "Something happened!" | logger -i -p local0.notice
If you insist on getting output to stderr too, use --stderr. rsyslog will handle buffering, atomic writes, etc., and is presumably pretty good at optimizing disk I/O. However, you could also easily configure rsyslog to route the log facility (e.g. local0, or whatever you choose to use) wherever you want, such as to a tmpfs or a dedicated disk, or even over TCP. See /etc/rsyslog.conf.

Related

parallel computing on multiple cores for data which is independently run with the program

I have a simulation program in Fortran which takes its input from a .dat file. This file has 100,000 lines, which takes a really long time to run. The program takes the first line, runs all the simulations, writes the result to a .out file and moves on to the next line. I have a computer with 16 CPUs, so how can I split my data into 16 parts and run each part separately on its own CPU? I am running on an Ubuntu machine. Each line is totally independent of the others.
For example, my data is HeadData10000.dat, and I have a file simulation.ini containing the name of the input data (in this case HeadData10000.dat) and the name of the output data. So simulation.ini looks like this:
HeadData10000.dat
outputdata.out
Right now I have two computers, so I split HeadData10000.dat into two files, write a simulation.ini for each input file, and run it like this on each computer: ./simulation.exe < ./simulation.ini
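A sketch of the manual splitting approach on a single 16-core machine, assuming GNU coreutils split (its -n l/N option makes N chunks without breaking lines) and that simulation.exe reads the input and output file names from stdin exactly as in the simulation.ini example. The chunk_* file names and the make_chunks/run_chunks helpers are hypothetical.

```bash
#!/usr/bin/env bash
# Split INPUT into N line-aligned chunks and write one .ini file per chunk.
make_chunks() {
  local input=$1 n=$2 chunk
  # -n l/N: N chunks, never splitting a line; -d: numeric suffixes (00, 01, ...)
  split -n "l/$n" -d "$input" chunk_
  for chunk in chunk_[0-9][0-9]; do
    # first line: input file, second line: output file (as in simulation.ini)
    printf '%s\n%s\n' "$chunk" "$chunk.out" > "$chunk.ini"
  done
}

# Start one simulation per chunk in the background, then wait for all of them.
run_chunks() {
  local ini
  for ini in chunk_[0-9][0-9].ini; do
    ./simulation.exe < "$ini" &
  done
  wait
}

# Usage (on the 16-core machine):
#   make_chunks HeadData10000.dat 16 && run_chunks
```

Each chunk produces its own chunk_NN.out, so the results have to be concatenated afterwards (for example with cat chunk_[0-9][0-9].out).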

Assuming your list of 100,000 jobs is called "jobs.txt" and looks like this:
JobA
JobB
JobC
JobD
You could run this:
parallel 'printf "{}\n{.}.out" | ./simulation.exe' < jobs.txt
If you want to do a dry run to see what that would do without doing anything:
parallel --dry-run 'printf "{}\n{.}.out" | ./simulation.exe' < jobs.txt
Sample Output
printf "JobA\nJobA.out" | ./simulation.exe
printf "JobB\nJobB.out" | ./simulation.exe
printf "JobC\nJobC.out" | ./simulation.exe
printf "JobD\nJobD.out" | ./simulation.exe
If you have multiple servers available, look at using the -S parameter to GNU Parallel to spread the jobs across the machines. Also, look at the --eta and --bar parameters for getting progress reports.
I used printf "line1\nline2" to generate two lines of input in order to avoid having to create, and later delete, 100,000 files.
By default, GNU Parallel will keep 1 job per CPU core running, so there will always be 16 jobs running on your 16-core machine, but you can change that to, say, 8 if you want to with parallel -j 8. You can also specify the number of jobs to run on your second (and subsequent) machines.
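To check what a single simulation.exe receives on stdin, here is a small bash sketch that mimics the printf command for one job. The helper name job_input is hypothetical; ${job%.*} approximates GNU Parallel's {.} (input without its extension) and is only accurate for plain file names without a dot in a directory part. Unlike the printf above, it also ends the second line with a newline.

```bash
#!/usr/bin/env bash
# Print the two lines the parallel command pipes into ./simulation.exe
# for one job: the input file name, then the output file name.
job_input() {
  local job=$1
  # ${job%.*} strips the last ".ext", roughly like GNU Parallel's {.}
  printf '%s\n%s.out\n' "$job" "${job%.*}"
}

# Usage: run one job by hand
#   job_input HeadData10000.dat | ./simulation.exe
```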

Gnu Parallel: Does parallel reload program for every job?

Suppose I have a program that loads significant content before running... but this is a one-time slowdown.
Next, I write:
cat ... | parallel -j 8 --spreadstdin --block $sz ... ./mycode
Will this induce the load overhead every single job?
If it does induce the overhead, is there a way to avoid it?

As #Barmar says, ./mycode is started for each block in your example.
But since you do not use -k in your example you may be able to use --round-robin.
... | parallel -j 8 --spreadstdin --round-robin --block $sz ... ./mycode
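This will start 8 instances of ./mycode (rather than one per block) and give each block to whichever process is ready to read.
This example shows that more blocks are given to the jobs with arguments 11 and 10 than to those with 4 and 5, because pv -L limits each job's read rate to {}0000 bytes per second, so 4 and 5 read more slowly: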
seq 1000000 |
parallel -j8 --tag --roundrobin --pipe --block 1k 'pv -qL {}0000 | wc' ::: 11 4 5 6 9 8 7 10

parallel doesn't know anything about the internal workings of the program you run with it. Each instance runs independently; there's no way for one invocation's initialization to be copied over to the others.
If you want the application to initialize once and then run multiple instances in parallel, you need to design that into the application itself. It should load the data, then use fork() to create multiple processes that use this data.
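The load-once-then-fork pattern can be illustrated in bash, where every ( ... ) & subshell is a forked copy of the parent shell and therefore inherits data the parent has already loaded. This is only an analogy for the fork()-based design described above; load_data, worker and run_workers are hypothetical names, and in a compiled program you would call fork() after loading instead.

```bash
#!/usr/bin/env bash
# Load expensive state once, then fork workers that share it.

LOADS=0
load_data() {
  # Hypothetical expensive initialization (e.g. reading a large model).
  LOADS=$((LOADS + 1))
  TABLE="loaded-once"
}

worker() {
  # Runs in a forked subshell: TABLE is already present, no reload needed.
  echo "worker $1 uses $TABLE"
}

run_workers() {
  local i
  for i in $(seq 1 "$1"); do
    ( worker "$i" ) &   # fork a child that inherits TABLE
  done
  wait
}

load_data          # pay the start-up cost once
run_workers 8
echo "load_data ran $LOADS time(s)"
```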

optimize parallelisation in SLURM cluster: the case of genome alignment

I would like to understand the best way to use bwa in parallel on a SLURM cluster. Obviously, this will depend on the computational limits that I have as a user.
The bwa software has an argument "-t" specifying the number of threads. Let's imagine that I use bwa mem -t 3 ref.fa sampleA.fq.gz; this means that bwa splits the job into three tasks/threads. In other words, it will align three reads at a time in parallel (I guess).
Now, if I want to run this command on several samples on a SLURM cluster, should I specify the number of tasks as for bwa mem, and specify the number of CPUs per task (for instance 2)? That would be:
sbatch -c 2 -n 3 bwa.sh
where bwa.sh contains:
cat data.info | while read indv; do
bwa mem -t 3 ref.fa sample${indv}.fq.gz
done
Do you have any suggestion? Or can you improve/correct my reasoning?

With -c 2 you are asking to have 2 CPUs per task.
With -n 3 you are asking to have 3 tasks.
That configuration prepares a set of resources comprising 6 CPUs on up to 3 different nodes. But your script only uses 3 CPUs (-t 3), so you are wasting resources, and probably also using resources that do not belong to you (because the task will use 3 CPUs while you only asked for 2 CPUs per task).
For that specific script, -c 3 is the proper parameter (-n defaults to one task):
sbatch -c 3 bwa.sh
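A sketch of how bwa.sh could keep bwa's thread count in line with the CPUs requested, assuming SLURM exports SLURM_CPUS_PER_TASK when --cpus-per-task (-c) is set. The per-sample .sam output names and the align_all helper are assumptions: bwa mem writes SAM to stdout, so without a redirection every sample would end up in the same job output file.

```bash
#!/usr/bin/env bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=3
# Align every sample listed in data.info, using as many bwa threads
# as CPUs were allocated to this task.

# SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task/-c is given;
# fall back to 1 thread when run outside SLURM.
threads=${SLURM_CPUS_PER_TASK:-1}

align_all() {
  local info=$1 indv
  while IFS= read -r indv; do
    # bwa mem writes SAM to stdout, so give each sample its own file
    bwa mem -t "$threads" ref.fa "sample${indv}.fq.gz" > "sample${indv}.sam"
  done < "$info"
}

if [[ -r data.info ]]; then
  align_all data.info
fi
```

With the #SBATCH lines inside the script, sbatch bwa.sh is enough; a -c given on the sbatch command line overrides the value in the script.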

Why does GNU parallel become less and less effective?

I have a file containing 1,000,000 domain names, and I'm currently launching the script testssl.sh (http://testssl.sh) on each domain in the list (i.e., each line of the file). I'm using GNU parallel to improve performance. Here is how I launch testssl.sh with GNU parallel:
cat listDomainNames.txt | parallel --no-notice -j0 --workdir $PWD ./testMX.sh
where testMX.sh launches testssl.sh:
./testssl.sh --starttls smtp --vulnerable --server-preference -mx --append --csvfile result.csv $1
At the beginning, my script tests domain names very quickly (5,000 in a single hour), but after several hours it becomes really slow (about 1 domain per minute). Any idea what is happening? Thanks in advance!

Over time, more and more of the running processes end up hanging while they wait for a timeout, so fewer and fewer job slots are doing useful work.
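One way to keep hung jobs from piling up is to put an upper bound on each job's run time. A minimal sketch using GNU coreutils timeout inside testMX.sh; the 300-second default and the LIMIT variable are arbitrary assumptions to tune for your network, and run_limited is a hypothetical helper. GNU Parallel also has its own --timeout option that can enforce a limit at the parallel level instead.

```bash
#!/usr/bin/env bash
# testMX.sh: run testssl.sh for one domain, but give up after a time limit
# so that a hung connection does not occupy a job slot forever.

limit=${LIMIT:-300}   # seconds; hypothetical default, tune as needed

run_limited() {
  # timeout sends SIGTERM after $limit seconds and then exits with status 124
  timeout "$limit" "$@"
  local status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$status"
}

if [ "$#" -gt 0 ]; then
  run_limited ./testssl.sh --starttls smtp --vulnerable --server-preference \
    -mx --append --csvfile result.csv "$1"
fi
```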

How to suppress the general information for top command

I wish to suppress the general information of the top command using a top parameter.
By general information I mean the following:
top - 09:35:05 up 3:26, 2 users, load average: 0.29, 0.22, 0.21
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.3%us, 0.7%sy, 0.0%ni, 96.3%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3840932k total, 2687880k used, 1153052k free, 88380k buffers
Swap: 3998716k total, 0k used, 3998716k free, 987076k cached
What I do not wish to do is:
top -u user | grep process_name
or
top -bp $(pgrep process_name) | do_something
How can I achieve this?
Note: I am on Ubuntu 12.04 and top version is 3.2.8.

Came across this question today. I have a potential solution: create a top configuration file from inside top's interactive mode while the summary area is disabled. Since this file is also read when top starts in batch mode, the summary area will be disabled in batch mode too.
Follow these steps to set it up:
Launch top in interactive mode.
Once inside interactive mode, disable the summary area by successively pressing 'l', 'm' and 't'.
Press 'W' (upper case) to write your top configuration file (normally, ~/.toprc)
Exit interactive mode.
Now when you run top in batch mode the summary area will not appear (!)
Taking it one step further...
If you only want this for certain situations and still want the summary area most of the time, you could use an alternate top configuration file. However, AFAIK, the way to get top to use an alternate config file is a bit funky. There are a couple of ways to do this. The approach I use is as follows:
Create a soft-link to the top executable. This does not have to be done as root, as long as you have write access to the link's location...
ln -s /usr/bin/top /home/myusername/bin/omgwtf
Launch top by typing the name of the link ('omgwtf') rather than 'top'. You will be in normal top interactive mode, but when you save the configuration file it will write to ~/.omgwtfrc, leaving ~/.toprc alone.
Disable the summary area and write the configuration file same as before (press 'l', 'm', 't' and 'W')
In the future, when you're ready to run top without summary info in batch mode, you'll have to invoke top via the link name you created. For example,
% omgwtf -usyslog -bn1
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
576 syslog 20 0 264496 8144 1352 S 0.0 0.1 0:03.66 rsyslogd
%

If you're running top in batch mode (-b -n1), just delete the header lines with sed:
top -b -n1 | sed 1,7d
That removes the first 7 header lines that top outputs and leaves only the process list.
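Because the number of summary lines differs between top versions (and changes when the l, t and m toggles are used), a slightly more robust variant is to print from the column-header line onward instead of deleting a fixed number of lines. A minimal sketch, assuming the header line starts with PID (possibly indented), as in procps top's batch output; strip_top_summary is a hypothetical name.

```bash
#!/usr/bin/env bash
# Drop top's summary area: print from the "PID ..." header line to the end.
strip_top_summary() {
  sed -n '/^ *PID /,$p'
}

# Usage:
#   top -b -n1 | strip_top_summary
```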

It's known as the "Summary Area", and I don't think there is a way to disable it when top starts.
But while top is running, you can disable it by pressing l, t and m.
From man top:
Summary-Area-defaults
'l' - Load Avg/Uptime On (thus program name)
't' - Task/Cpu states On (1+1 lines, see '1')
'm' - Mem/Swap usage On (2 lines worth)
'1' - Single Cpu On (thus 1 line if smp)

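This dumps the output once without the summary lines, and it can be redirected to any file if needed. The parentheses in Cpu(s) must be escaped, because in an extended regex they would form a group and never match the literal text:
top -b -n1 | grep -Ev '^top -|Tasks:|Cpu\(s\):|Swap:|Mem:'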

To monitor a particular process, the following command works for me:
top -sbn1 -p $(pidof <process_name>) | grep $(pidof <process_name>)
And to get information for all processes, you can use the following:
top -sbn1|sed -n '/PID/,/^$/p'

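egrep may be good enough in this case, but I would add that perl -lane can do this kind of thing very quickly:
top -b -n 1 | perl -lane '/PID/ and $x=1; $x and print' | head -n10
The flag $x is set when the PID header line appears, so that line and every line after it are printed; head -n10 then keeps the header plus the first 9 processes. This way you may no longer need to remember the precise arguments for grep, sed, awk, etc., and perl is typically fast enough to compete with those tools.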

On a Mac you cannot use -b, which many of the other answers use.
In that case the command would be top -n1 -l1 | sed 1,10d
This logs a single sample (-l1) instead of running interactively, limits the process list to the first process (-n1), and deletes the first 10 lines, which hold the summary information, leaving only the column header and that first process line.
