GNU Parallel: running a program that takes several options

I have to admit that I was reading the GNU Parallel documentation and couldn't find what I was looking for.
I need to run a program that takes several options. The code is math intensive and takes up to 5 days on a single core of a 3 GHz machine.
I've used gfortran with -fopenmp before, but now I'm running this C code, so GNU Parallel seems adequate. Now to the issue: I need to execute wcmap with the following options, under nice and nohup:
nohup nice -n 19 ./wcmap --slon_min 74.5 --slon_max 74.5 --ll_0_min 325 --ll_0_max 340 --bet_min 0.0 --bet_max 15 --vg 38.9 --ll_0_step 0.5 --bet_step 0.5 --path PARALLEL/ MORHIST-Exit.dat
I've tried GNU Parallel with no success:
parallel --gnu nice -n 19 ./wcmap --slon_min 74.5 --slon_max 74.5 --ll_0_min 325 --ll_0_max 340 --bet_min 0.0 --bet_max 15 --vg 38.9 --ll_0_step 0.5 --bet_step 0.5 --path PARALLEL/ MORHIST-Exit.dat :::
I need to leave this running on several nodes of a remote server for some days, or even on my office computer (4 cores); that's why I'm using nohup from a remote session.
Any suggestions are appreciated!
Thank you in advance!
Sebastian

GNU Parallel cannot magically parallelize the internals of your wcmap program. What it can do is to run wcmap with different parameters in parallel. So let us assume you want to run:
./wcmap --slon_min 74.5 --slon_max 74.5 MORHIST-Exit.dat
./wcmap --slon_min 75 --slon_max 75 MORHIST-Exit.dat
./wcmap --slon_min 75.5 --slon_max 75.5 MORHIST-Exit.dat
./wcmap --slon_min 76 --slon_max 76 MORHIST-Exit.dat
Then you can do that with GNU Parallel:
parallel ./wcmap --slon_min {} --slon_max {} MORHIST-Exit.dat ::: 74.5 75 75.5 76
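To keep the low priority and survive logout from the remote session (as in the question), nohup and nice can simply be prepended to parallel itself; a minimal sketch sweeping --slon_min/--slon_max over assumed example values, with the question's remaining options omitted for brevity:
nohup nice -n 19 parallel ./wcmap --slon_min {} --slon_max {} --path PARALLEL/ MORHIST-Exit.dat ::: 74.5 75 75.5 76 &
The jobs that parallel starts inherit the niceness; alternatively, GNU Parallel's own --nice option renices the jobs, which also works when they run on remote servers.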

Running millions of jobs from a list in PBS with GNU Parallel

I have a huge job list (a few million entries) and I want to run a Java-written tool on each entry to perform a feature comparison. The tool completes one calculation in:
real 0m0.179s
user 0m0.005s
sys 0m0.000s
Running on 5 nodes (each with 72 CPUs) under the PBS Torque scheduler with GNU Parallel, the tool runs fine and produces results, but as I set 72 jobs per node it should run 72 x 5 jobs at a time, yet I can see only 25-35 jobs running!
Checking the CPU utilization on each node also shows low utilization.
I want to run 72 x 5 jobs or more at a time and produce the results by utilizing all the available resources (72 x 5 CPUs).
As mentioned, I have ~200 million jobs to run, and I want to complete them faster (in 1-2 hours) by using/increasing the number of nodes/CPUs.
Current code, input and job state:
example.lst (it has ~300 million lines)
ZNF512-xxxx_2_N-THRA-xxtx_2_N
ZNF512-xxxx_2_N-THRA-xxtx_3_N
ZNF512-xxxx_2_N-THRA-xxtx_4_N
.......
cat job_script.sh
#!/bin/bash
#PBS -l nodes=5:ppn=72
#PBS -N job01
#PBS -j oe
#work dir
export WDIR=/shared/data/work_dir
cd $WDIR;
# use available 72 cpu in each node
export JOBS_PER_NODE=72
#gnu parallel command
parallelrun="parallel -j $JOBS_PER_NODE --slf $PBS_NODEFILE --wd $WDIR --joblog process.log --resume"
$parallelrun -a example.lst sh run_script.sh {}
cat run_script.sh
#!/bin/bash
# parallel command options
i=$1
data=/shared/TF_data
# create tmp dir and work in
TMP_DIR=/shared/data/work_dir/$i
mkdir -p $TMP_DIR
cd $TMP_DIR/
# get file name
mk=$(echo "$i" | cut -d- -f1-2)
nk=$(echo "$i" | cut -d- -f3-6)
#run a tool to compare the features of pair files
/shared/software/tool_v2.1/tool -s1 $data/inf_tf/$mk -s1cf $data/features/$mk-cf -s1ss $data/features/$mk-ss -s2 $data/inf_tf/$nk.pdb -s2cf $data/features/$nk-cf.pdb -s2ss $data/features/$nk-ss.pdb > $data/$i.out
# move output files
mv matrix.txt $data/glosa_tf/matrix/$mk"_"$nk.txt
mv ali_struct.pdb $data/glosa_tf/aligned/$nk"_"$mk.pdb
# move back and remove tmp dir
cd $TMP_DIR/../
rm -rf $TMP_DIR
exit 0
PBS submission
qsub job_script.sh
Logging in to one of the nodes: ssh ip-172-31-9-208
top - 09:28:03 up 15 min, 1 user, load average: 14.77, 13.44, 8.08
Tasks: 928 total, 1 running, 434 sleeping, 0 stopped, 166 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 98.4%id, 1.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 193694612k total, 1811200k used, 191883412k free, 94680k buffers
Swap: 0k total, 0k used, 0k free, 707960k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15348 ec2-user 20 0 16028 2820 1820 R 0.3 0.0 0:00.10 top
15621 ec2-user 20 0 169m 7584 6684 S 0.3 0.0 0:00.01 ssh
15625 ec2-user 20 0 171m 7472 6552 S 0.3 0.0 0:00.01 ssh
15626 ec2-user 20 0 126m 3924 3492 S 0.3 0.0 0:00.01 perl
.....
top on all of the nodes shows a similar state, and only ~26 jobs run at a time!
I have an aws-parallelcluster with 5 nodes (each with 72 CPUs), the Torque scheduler, and GNU Parallel 2018 (March 2018).
Update
Introducing the new function that takes input on stdin and runs the script in parallel works great and utilizes all the CPUs on the local machine.
However, when it runs over remote machines it produces:
parallel: Error: test.lst is neither a file nor a block device
MCVE:
A simple script that just echoes the list gives the same error when run on remote machines, but works great on the local machine:
cat test.lst # contains list
DNMT3L-5yx2B_1_N-DNMT3L-5yx2B_2_N
DNMT3L-5yx2B_1_N-DNMT3L-6brrC_3_N
DNMT3L-5yx2B_1_N-DNMT3L-6f57B_2_N
DNMT3L-5yx2B_1_N-DNMT3L-6f57C_2_N
DNMT3L-5yx2B_1_N-DUX4-6e8cA_4_N
DNMT3L-5yx2B_1_N-E2F8-4yo2A_3_P
DNMT3L-5yx2B_1_N-E2F8-4yo2A_6_N
DNMT3L-5yx2B_1_N-EBF3-3n50A_2_N
DNMT3L-5yx2B_1_N-ELK4-1k6oA_3_N
DNMT3L-5yx2B_1_N-EPAS1-1p97A_1_N
cat test_job.sh # GNU parallel submission script
#!/bin/bash
#PBS -l nodes=1:ppn=72
#PBS -N test
#PBS -k oe
# introduce new function and Run from ~/
dowork() {
  parallel sh test_work.sh {}
}
export -f dowork
parallel -a test.lst --env dowork --pipepart --slf $PBS_NODEFILE --block -10 dowork
cat test_work.sh # run/work script
#!/bin/bash
i=$1
data=$(pwd)
#create temporary folder in current dir
TMP_DIR=$data/$i
mkdir -p $TMP_DIR
cd $TMP_DIR/
# split list
mk=$(echo "$i" | cut -d- -f1-2)
nk=$(echo "$i" | cut -d- -f3-6)
# echo list and save in echo_test.out
echo $mk, $nk >> $data/echo_test.out
cd $TMP_DIR/../
rm -rf $TMP_DIR
From your timing:
real 0m0.179s
user 0m0.005s
sys 0m0.000s
it seems the tool uses very little CPU power. When GNU Parallel runs local jobs it has an overhead of 10 ms CPU time per job. Your jobs take 179 ms wall-clock time and 5 ms CPU time, so GNU Parallel's overhead makes up quite a bit of the time spent.
The overhead is much worse when running jobs remotely. Here we are talking 10 ms + running an ssh command. This can easily be in the order of 100 ms.
So how can we minimize the number of ssh commands, and how can we spread the overhead over multiple cores?
First let us make a function that can take input on stdin and run the script - one job per CPU thread in parallel:
dowork() {
  [...set variables here; that becomes particularly important when we run remotely...]
  parallel sh run_script.sh {}
}
export -f dowork
Test that this actually works by running:
head -n 1000 example.lst | dowork
Then let us look at running jobs locally. This can be done similarly to what is described here: https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Running-more-than-250-jobs-workaround
parallel -a example.lst --pipepart --block -10 dowork
This will split example.lst into 10 blocks per CPU thread, so on a machine with 72 CPU threads this makes 720 blocks. It will then start 72 doworks, and when one finishes it will get another of the 720 blocks. The reason I choose 10 blocks instead of 1 is that if one of the jobs gets stuck for a while, you are unlikely to notice it.
This should make sure 100% of the CPUs on the local machine are busy.
If that works, we need to distribute this work to remote machines:
parallel -j1 -a example.lst --env dowork --pipepart --slf $PBS_NODEFILE --block -10 dowork
This should in total start 10 ssh sessions per CPU thread (i.e. 5*72*10), namely one for each block, with one running per server listed in $PBS_NODEFILE in parallel.
Unfortunately this means that --joblog and --resume will not work. There is currently no way to make that work, but if it is valuable to you, contact me via parallel@gnu.org.
I am not sure what tool does. But if the copying takes most of the time, and if tool only reads the files, then you may be able to simply symlink the files into $TMP_DIR instead of copying them.
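A sketch of that idea inside run_script.sh (an assumption: the posted script reads straight from $data, so this only helps if your real workflow copies inputs into $TMP_DIR first, and only if tool accepts symlinked inputs):
# instead of: cp $data/inf_tf/$mk $TMP_DIR/
ln -s "$data/inf_tf/$mk" "$TMP_DIR/$mk"   # symlink: no data copied, tool reads in place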
A good indication of whether you can do it faster is to look at top on the 5 machines in the cluster. If they are all using all cores at >90%, then you cannot expect to get it faster.

How to create specific, automated bandwidth/network traffic for an experiment?

For my job I need to run an experiment.
The part where I am stuck is how to generate a specific amount of bandwidth/network traffic, from either Linux or Windows (I have both a Kali and a Win10 VM), to a target Win10 PC that monitors its resources to see the reaction to the traffic.
Preferably three different amounts of traffic with a certain time between them, like a scripted sleep timer or something.
So like the following:
Sleep 180 seconds
Send 10mbps for 120 seconds
Sleep 180 seconds
Send 25mbps for 120 seconds
Sleep 180 seconds
Send 50mbps for 120 seconds
Exit
Now this shouldn't be as hard as I'm making it out to be, right?
Any pointers to software or scripts I could use for this?
Thanks in advance
You can use iperf3, a tool for stress testing and measuring network performance, along with basic bash scripting to accomplish this. iperf3 is cross-platform, so it is supported on both Kali Linux and Win 10.
On Kali Linux, I used the system package manager to install it:
apt install iperf3
On Windows, I downloaded the 64 bit binary from the iperf3 download site.
Then used the following command to start an iperf3 server:
iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
This will start the server listening on a particular port as shown above. Then on my Kali Linux instance, I use iperf3 again as a client to connect to the server:
iperf3 -c <address-of-server> -b 25M -t 7
[ 5] local XXX.XXX.XXX.XXX port 42650 connected to XXX.XXX.XXX.XXX port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.98 MBytes 25.0 Mbits/sec 0 133 KBytes
[ 5] 1.00-2.00 sec 3.00 MBytes 25.2 Mbits/sec 0 133 KBytes
[ 5] 2.00-3.00 sec 3.00 MBytes 25.2 Mbits/sec 0 133 KBytes
[ 5] 3.00-4.00 sec 3.00 MBytes 25.2 Mbits/sec 0 133 KBytes
[ 5] 4.00-5.00 sec 3.00 MBytes 25.2 Mbits/sec 0 133 KBytes
[ 5] 5.00-6.00 sec 3.00 MBytes 25.2 Mbits/sec 2 133 KBytes
[ 5] 6.00-7.00 sec 3.00 MBytes 25.2 Mbits/sec 1 133 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-7.00 sec 21.0 MBytes 25.1 Mbits/sec 3 sender
[ 5] 0.00-7.00 sec 21.0 MBytes 25.1 Mbits/sec receiver
The -b option allows me to specify the bit rate, in the example above 25 Mbits/sec and the -t option allows specifying how long it should run, in this case 7 seconds.
You can easily use this tool in a script to achieve the behavior you want, for example:
#!/bin/bash
echo "Sleeping for 180 seconds"
sleep 180
echo "Stress testing at 10Mb/s for 120 seconds"
iperf3 -c mywinserver.local -b 10M -t 120
echo "Sleeping for 180 seconds"
sleep 180
echo "Stress testing at 25Mb/s for 120 seconds"
iperf3 -c mywinserver.local -b 25M -t 120
echo "Sleeping for 180 seconds"
sleep 180
echo "Stress testing at 50Mb/s for 120 seconds"
iperf3 -c mywinserver.local -b 50M -t 120
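The repetition can also be factored into a loop; a sketch assuming the same hypothetical hostname mywinserver.local:
#!/bin/bash
# ramp through three bandwidth levels, sleeping before each run
for rate in 10M 25M 50M; do
    echo "Sleeping for 180 seconds"
    sleep 180
    echo "Stress testing at $rate for 120 seconds"
    iperf3 -c mywinserver.local -b "$rate" -t 120
done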

How to know the number of active threads in Puma

I am trying to see the number of active puma threads on my server.
I cannot see it through ps:
$ ps aux | grep puma
healthd 2623 0.0 1.8 683168 37700 ? Ssl May02 5:38 puma 2.11.1 (tcp://127.0.0.1:22221) [healthd]
root 8029 0.0 0.1 110460 2184 pts/0 S+ 06:34 0:00 grep --color=auto puma
root 18084 0.0 0.1 56836 2664 ? Ss May05 0:00 su -s /bin/bash -c puma -C /opt/elasticbeanstalk/support/conf/pumaconf.rb webapp
webapp 18113 0.0 0.8 83280 17324 ? Ssl May05 0:04 puma 2.16.0 (unix:///var/run/puma/my_app.sock) [/]
webapp 18116 3.5 6.2 784992 128924 ? Sl May05 182:35 puma: cluster worker 0: 18113 [/]
As in the configuration I have:
threads 8, 32
I was expecting to see at least 8 puma threads?
To quickly answer the question: the number of threads used by a process running on a given PID can be obtained with:
% ps -h -o nlwp <pid>
This will just return the total number of threads used by the process. The option -h removes the headers and the option -o nlwp formats the output of ps so that it only outputs the Number of Light Weight Processes (NLWP), i.e. threads. For example, when only a single puma process is running and its PID is obtained with pgrep, you get:
% ps -h -o nlwp $(pgrep puma)
4
What is the difference between a process, a thread and a light-weight process?
This question has been answered already in various places [see here, here and the excellent geekstuff article]. The quick, short and ugly version is:
a process is essentially any running instance of a program.
a thread is a flow of execution of the process. A process containing multiple execution flows is known as a multi-threaded process and shares its resources (memory, open files, IO, ...) amongst its threads. The Linux kernel has no knowledge of what threads are and only knows processes. In the past, multi-threading was handled on the user level rather than the kernel level, which made it hard for the kernel to do proper process management.
Enter lightweight processes (LWP). This is essentially the answer to the issue with threads: each thread is considered to be an LWP on the kernel level. The main difference between a process and an LWP is that the LWP shares resources. In other words, a Light Weight Process is kernel-speak for what users call a thread.
Can ps show information about threads or LWPs?
The ps (process status) command provides information about the currently running processes, including their corresponding LWPs or threads. To do this, it makes use of the /proc directory, a virtual filesystem regarded as the control and information centre of the kernel. [See here and here.]
By default ps will not give you any information about the LWPs; however, adding the options -L and -m to the command generally does the trick.
man ps :: THREAD DISPLAY
H Show threads as if they were processes.
-L Show threads, possibly with LWP and NLWP columns.
m Show threads after processes.
-m Show threads after processes.
-T Show threads, possibly with SPID column.
For a single process puma, with its PID given by pgrep puma:
% ps -fL $(pgrep puma)
UID PID PPID LWP C NLWP STIME TTY STAT TIME CMD
kvantour 2160 2876 2160 0 4 15:22 pts/39 Sl+ 0:00 ./puma
kvantour 2160 2876 2161 99 4 15:22 pts/39 Rl+ 0:14 ./puma
kvantour 2160 2876 2162 99 4 15:22 pts/39 Rl+ 0:14 ./puma
kvantour 2160 2876 2163 99 4 15:22 pts/39 Rl+ 0:14 ./puma
However, adding the -m option clearly gives a nicer overview. This is especially handy when multiple processes are running with the same name.
% ps -fmL $(pgrep puma)
UID PID PPID LWP C NLWP STIME TTY STAT TIME CMD
kvantour 2160 2876 - 0 4 15:22 pts/39 - 0:44 ./puma
kvantour - - 2160 0 - 15:22 - Sl+ 0:00 -
kvantour - - 2161 99 - 15:22 - Rl+ 0:14 -
kvantour - - 2162 99 - 15:22 - Rl+ 0:14 -
kvantour - - 2163 99 - 15:22 - Rl+ 0:14 -
In this example, you see that the process puma with PID 2160 runs with 4 threads (NLWP) having the IDs 2160-2163. Under STAT you see two different values, Sl+ and Rl+. Here the l is an indicator for multi-threaded; S and R stand for interruptible sleep (waiting for an event to complete) and running, respectively. So we see that 3 of the 4 threads are running at 99% CPU and one thread is sleeping.
You also see the total accumulated CPU time (44s), while a single thread has only run for 14s.
Another way to obtain this information is by directly using the format specifiers with -o or -O.
man ps :: STANDARD FORMAT SPECIFIERS
lwp     lightweight process (thread) ID of the dispatchable entity (alias spid, tid). See tid for additional information.
nlwp    number of lwps (threads) in the process. (alias thcount).
So you can use any of lwp, spid or tid, and nlwp or thcount.
If you only want to get the number of threads of a process called puma, you can use:
% ps -o nlwp $(pgrep puma)
NLWP
4
or if you don't like the header
% ps -h -o nlwp $(pgrep puma)
4
You can get a bit more information with :
% ps -O nlwp $(pgrep puma)
PID NLWP S TTY TIME COMMAND
19304 4 T pts/39 00:00:00 ./puma
Finally, you can combine the flags with ps aux to list the threads.
% ps aux -L
USER PID LWP %CPU NLWP %MEM VSZ RSS TTY STAT START TIME COMMAND
...
kvantour 1618 1618 0.0 4 0.0 33260 1436 pts/39 Sl+ 15:17 0:00 ./puma
kvantour 1618 1619 99.8 4 0.0 33260 1436 pts/39 Rl+ 15:17 0:14 ./puma
kvantour 1618 1620 99.8 4 0.0 33260 1436 pts/39 Rl+ 15:17 0:14 ./puma
kvantour 1618 1621 99.8 4 0.0 33260 1436 pts/39 Rl+ 15:17 0:14 ./puma
...
Can top show information about threads or LWPs?
top has the option to show threads by hitting H in the interactive mode or by launching top with top -H. The problem is that it lists the threads as processes (similar to ps -fH).
% top
top - 09:42:10 up 17 days, 3 min, 1 user, load average: 3.35, 3.33, 2.75
Tasks: 353 total, 3 running, 347 sleeping, 3 stopped, 0 zombie
%Cpu(s): 75.5 us, 0.6 sy, 0.5 ni, 22.6 id, 0.0 wa, 0.0 hi, 0.8 si, 0.0 st
KiB Mem : 16310772 total, 8082152 free, 3662436 used, 4566184 buff/cache
KiB Swap: 4194300 total, 4194300 free, 0 used. 11363832 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
868 kvantour 20 0 33268 1436 1308 S 299.7 0.0 46:16.22 puma
1163 root 20 0 920488 282524 258436 S 2.0 1.7 124:48.32 Xorg
...
Here you see that puma runs at about 300% CPU for an accumulated time of 46:16.22. There is, however, no indicator that this is a threaded process; the only hint is the CPU usage, and even that could drop below 100% if 3 of the threads were sleeping. Furthermore, the status flag states S, which indicates that the first thread is asleep. Hitting H then gives you:
% top -H
top - 09:48:30 up 17 days, 10 min, 1 user, load average: 3.18, 3.44, 3.02
Threads: 918 total, 5 running, 910 sleeping, 3 stopped, 0 zombie
%Cpu(s): 75.6 us, 0.2 sy, 0.1 ni, 23.9 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 16310772 total, 8062296 free, 3696164 used, 4552312 buff/cache
KiB Swap: 4194300 total, 4194300 free, 0 used. 11345440 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
870 kvantour 20 0 33268 1436 1308 R 99.9 0.0 21:45.35 puma
869 kvantour 20 0 33268 1436 1308 R 99.7 0.0 21:45.43 puma
872 kvantour 20 0 33268 1436 1308 R 99.7 0.0 21:45.31 puma
1163 root 20 0 920552 282288 258200 R 2.0 1.7 124:52.05 Xorg
...
Now we see only 3 threads. As one of the threads is sleeping, it is way down at the bottom, since top sorts by CPU usage.
In order to see all threads, it is best to ask top to display a specific pid (for a single process):
% top -H -p $(pgrep puma)
top - 09:52:48 up 17 days, 14 min, 1 user, load average: 3.31, 3.38, 3.10
Threads: 4 total, 3 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 75.5 us, 0.1 sy, 0.2 ni, 23.6 id, 0.0 wa, 0.0 hi, 0.7 si, 0.0 st
KiB Mem : 16310772 total, 8041048 free, 3706460 used, 4563264 buff/cache
KiB Swap: 4194300 total, 4194300 free, 0 used. 11325008 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
869 kvantour 20 0 33268 1436 1308 R 99.9 0.0 26:03.37 puma
870 kvantour 20 0 33268 1436 1308 R 99.9 0.0 26:03.30 puma
872 kvantour 20 0 33268 1436 1308 R 99.9 0.0 26:03.22 puma
868 kvantour 20 0 33268 1436 1308 S 0.0 0.0 0:00.00 puma
When you have multiple processes running, you might be interested in hitting f and toggling PGRP on. This shows the process group of each thread: PGRP in top corresponds to PID in ps, while PID in top corresponds to LWP in ps.
How do I get the thread count without using ps or top?
The file /proc/$PID/status contains a line stating how many threads the process with PID $PID is using.
% grep Threads /proc/19304/status
Threads: 4
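If you want this for every running puma process at once, the status files can be scanned in a loop; a small sketch (the process name puma is the one assumed throughout this answer):
# print "<pid>: <thread count>" for each puma process
for pid in $(pgrep puma); do
  awk -v p="$pid" '/^Threads:/{print p": "$2}' "/proc/$pid/status"
done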
General comments
It is possible that you cannot find the processes of another user, and therefore cannot get the number of threads those processes are using. This could be due to the mount options of /proc/ (hidepid=2).
Used example program:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[]) {
    char c = 0;
    #pragma omp parallel shared(c)
    {
        int i = 0;
        if (omp_get_thread_num() == 0) {
            printf("Read character from input : ");
            c = getchar();
        } else {
            /* busy-wait until thread 0 reads a character */
            while (c == 0) i++;
            printf("Total sum is on thread %d : %d\n", omp_get_thread_num(), i);
        }
    }
    return 0;
}
compiled with gcc -fopenmp -o puma puma.c
If you are just looking for the number of threads spawned by a process, you can count the task directories created under /proc/[pid-of-process]/task, because each thread creates a directory under this path. So counting the directories is sufficient.
In fact, the ps utility itself reads its information from this path (e.g. the file /proc/[PID]/cmdline) and presents it in a more readable way.
From Linux Filesystem Hierarchy
/proc is very special in that it is also a virtual file-system. It's sometimes referred to as a process information pseudo-file system. It doesn't contain 'real' files but run-time system information (e.g. system memory, devices mounted, hardware configuration, etc). For this reason it can be regarded as a control and information center for the kernel. In fact, quite a lot of system utilities are simply calls to files in this directory.
To get the PID of the puma process, use ps or any utility of your choice:
ps aux | awk '/[p]uma/{print $1}'
or, more directly, use pidof(8), which gets you the PID given the process name as input:
pidof -s puma
Now that you have the PID, count the task/ directories your process has created using find (note -mindepth 1, so the task directory itself is not counted):
find /proc/<PID>/task -mindepth 1 -maxdepth 1 -type d -print | wc -l
ps aux | grep puma will only give you the list of puma processes. You need to find out how many threads are being run by a particular process; maybe this will help you:
ps -T -p 2623
You need to provide the process ID for which you want the number of threads. Make sure you provide an accurate process ID.
Using ps and wc to count puma threads:
ps --no-headers -T -C puma | wc -l
The string "puma" can be replaced as desired. Example, count bash threads:
ps --no-headers -T -C bash | wc -l
On my system that outputs:
9
The code in the question, ps aux | grep puma, has a few grep-related problems:
It returns grep --color=auto puma, which isn't a puma thread at all.
Similarly, any util or command containing the string "puma", e.g. a util called notpuma, would be matched by grep.
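A common workaround for both problems is a character class in the pattern: [p]uma still matches puma processes, but the grep command line itself (which contains the literal [p]uma) no longer matches:
ps aux | grep '[p]uma'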
I found "htop" to be an excellent solution. Just toggle "tree view" and you can view each puma-worker and the threads under that worker.
Number of puma threads for each worker:
ps aux | awk '/[p]uma/{print $2}' | xargs ps -h -o nlwp
Sample output:
7
59
59
61
59
60
59
59
59

"Max jobs to run" does not equal the number of jobs specified when using GNU Parallel on remote server?

I am trying to run many small serial jobs with GNU Parallel on a PBS cluster; each compute node has 16 cores. As I intended to use multiple compute nodes, I passed the option -S $SERVERNAME to GNU Parallel. What confuses me is that the number of jobs started on the node via -S $SERVERNAME does not equal the number of jobs I specified once I try to spawn more than 9 jobs. Below are my observations:
[fchen14@shelob001 ~]$ parallel --version
GNU parallel 20160922
Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: http://www.gnu.org/software/parallel
When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --citation'.
[fchen14@shelob001 ~]$ hostname # this shows my hostname
shelob001
When using GNU Parallel on the local host without -S $SERVERNAME, there is no problem: I intended to spawn 10 jobs, and GNU Parallel started 10 jobs:
[fchen14@shelob001 ~]$ parallel --progress echo ::: `seq 1 10`
Computers / CPU cores / Max jobs to run
1:local / 16 / 10 # 10 jobs spawned, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:10/0/100%/0.0s 1
local:9/1/100%/0.0s 2
local:8/2/100%/0.0s 3
local:7/3/100%/0.0s 4
local:6/4/100%/0.0s 5
local:5/5/100%/0.0s 6
local:4/6/100%/0.0s 7
local:3/7/100%/0.0s 8
local:2/8/100%/0.0s 9
local:1/9/100%/0.0s 10
local:0/10/100%/0.0s
When I use GNU Parallel to spawn fewer than 10 jobs using -S $SERVERNAME, still no problem:
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 1`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 1 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:1/0/100%/0.0s 1
shelob001:0/1/100%/1.0s
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 8`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 8 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:8/0/100%/0.0s 1
shelob001:7/1/100%/1.0s 7
shelob001:6/2/100%/0.5s 3
shelob001:5/3/100%/0.3s 8
shelob001:4/4/100%/0.2s 5
shelob001:3/5/100%/0.2s 2
shelob001:2/6/100%/0.2s 6
shelob001:1/7/100%/0.1s 4
shelob001:0/8/100%/0.1s
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 9`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 9 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:9/0/100%/0.0s 1
shelob001:8/1/100%/1.0s 5
shelob001:7/2/100%/0.5s 8
shelob001:6/3/100%/0.3s 2
shelob001:5/4/100%/0.2s 6
shelob001:4/5/100%/0.2s 9
shelob001:3/6/100%/0.2s 3
shelob001:2/7/100%/0.1s 4
shelob001:1/8/100%/0.1s 7
shelob001:0/9/100%/0.1s
Here is what confuses me: when I try to use a job number >= 10, the number of jobs spawned is always one less than wanted. Here I want to spawn 10, but only 9 jobs started:
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 10` # I want to start 10 jobs
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 9 #why here "Max jobs to run" is 9?
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:9/0/100%/0.0s 2
shelob001:9/1/100%/3.0s 1
shelob001:8/2/100%/1.5s 7
shelob001:7/3/100%/1.0s 4
shelob001:6/4/100%/0.8s 9
shelob001:5/5/100%/0.6s 8
shelob001:4/6/100%/0.5s 3
shelob001:3/7/100%/0.4s 5
shelob001:2/8/100%/0.4s 6
shelob001:1/9/100%/0.4s 10
shelob001:0/10/100%/0.4s
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 11`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 10 # it seems the jobs started is one less than I specified
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:10/0/100%/0.0s 1
shelob001:10/1/100%/3.0s 2
shelob001:9/2/100%/1.5s 8
shelob001:8/3/100%/1.0s 3
shelob001:7/4/100%/0.8s 4
shelob001:6/5/100%/0.6s 5
shelob001:5/6/100%/0.5s 7
shelob001:4/7/100%/0.4s 10
shelob001:3/8/100%/0.4s 9
shelob001:2/9/100%/0.3s 6
shelob001:1/10/100%/0.4s 11
shelob001:0/11/100%/0.4s
[fchen14@shelob001 ~]$
I checked the status of the compute node using top, and it does show that only 9 CPUs are used when I use seq 1 10. Hopefully I have made my problem clear; could anyone point out the possible cause of this behavior? Any suggestion is welcome.
Thank you very much!
Looks like you found a bug. Workaround: -j+1
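A quick illustration of the workaround against the question's host (-j+1 sets the job limit to the number of CPU threads plus one, compensating for the off-by-one):
parallel -j+1 -S shelob001 --progress echo ::: `seq 1 10`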

GNU Parallel timeout for process

I want to use GNU Parallel for this command:
seq -w 30 | parallel -k -j6 java -javaagent:build/libs/pddl4j-3.1.0.jar -server -Xms8048m -Xmx8048m fr.uga.pddl4j.planners.hsp.HSP -o pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/domain.pddl -f pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/p{}.pddl -i 8 '>>' AstarMovie.txt
I have a timeout of 600 seconds inside the Java program, but it is not honored: processes can run for 2, 3, 4 or more hours and never stop.
I tried this command, based on the GNU Parallel tutorial online, but it doesn't work either:
seq -w 30 | parallel -k --timeout 600000 -j6 java -javaagent:build/libs/pddl4j-3.1.0.jar -server -Xms2048m -Xmx2048m fr.uga.pddl4j.planners.hsp.HSP -o pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/domain.pddl -f pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/p{}.pddl -i 8 '>>' AstarMovie.txt
I saw in the tutorial that GNU Parallel uses milliseconds, so 600000 would be the 10 minutes I need, but after 12 minutes the process was still running. I need 6 processes to run at once for a maximum of 10 minutes each.
Any help would be great. Thanks.
EDIT:
Why do people feel the need to edit posts for small changes like '600seconds' to '600 seconds'? Stop doing it for karma..
The timeout for GNU Parallel is given in seconds, not milliseconds. You can test it with this snippet, which waits for 15 seconds but with a timeout that cuts it off after 10 seconds:
time parallel --timeout 10 sleep {} ::: 15
real 0m10.961s
user 0m0.071s
sys 0m0.038s
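Applied to the question's command, the 10-minute limit is therefore written as 600; a sketch reusing the question's jar and benchmark paths:
seq -w 30 | parallel -k --timeout 600 -j6 java -javaagent:build/libs/pddl4j-3.1.0.jar -server -Xms2048m -Xmx2048m fr.uga.pddl4j.planners.hsp.HSP -o pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/domain.pddl -f pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/p{}.pddl -i 8 '>>' AstarMovie.txt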
