How to use ulimit right - bash

I wanted to limit luakit to a maximum of 150 MB of virtual memory. Here is my shell script:
#!/bin/bash
# limit virtual memory to 150 MB (ulimit -v takes a value in KB, so 153600)
ulimit -H -v 153600
while true
do
startx /usr/bin/luakit -U -- -s 0 dpms
done
But when memory usage goes above 150 MB (the VIRT column in htop), nothing happens.
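For reference, a quick check (a diagnostic sketch only, not a fix): limits set with ulimit apply to the shell and are inherited by its child processes, so the script can confirm the hard limit is in place before launching luakit:
#!/bin/bash
ulimit -H -v 153600
# both of these should print 153600 if the hard limit took effect;
# the second one shows that a child process inherits it
ulimit -H -v
bash -c 'ulimit -H -v'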

Related

/tmp size fluctuating on Heroku

When checking the size of the /tmp folder (df -h /tmp), I noticed that it's quite large (~70 GB) and fluctuates by a few GB every minute. My application doesn't save anything to /tmp, so I'm wondering what is going on. What is the "mask" file that appears under /tmp? Also, I was under the impression that /tmp is ephemeral and its contents are cleared when a dyno restarts, so why is so much storage in use?
Below is the output from Heroku bash; I ran df -h /tmp a few seconds apart.
~ $ df -h /tmp
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/evg0-evol0 376G 75G 282G 21% /tmp
~ $ df -h /tmp
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/evg0-evol0 376G 77G 280G 22% /tmp
~ $ cd /tmp
/tmp $ dir
mask
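A small diagnostic sketch (assuming standard coreutils are available in the dyno): df reports usage for the whole filesystem that /tmp is mounted on, not just the /tmp directory itself, so listing the directory separately shows what is actually stored there:
ls -la /tmp                  # list everything in /tmp, including hidden files
du -sh /tmp/* 2>/dev/null    # per-entry sizes inside /tmp itself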

Run millions of list entries in PBS with the parallel tool

I have a huge job (a list with a few million entries) and want to run a Java-based tool to perform feature comparisons. The tool completes one calculation in:
real 0m0.179s
user 0m0.005s
sys 0m0.000s
Running on 5 nodes (each with 72 CPUs) under the PBS Torque scheduler with GNU Parallel, the tool runs fine and produces results, but since I set 72 jobs per node, it should run 72 x 5 jobs at a time; instead I see only 25-35 jobs running!
Checking CPU utilization on each node also shows low utilization.
I want to run 72 x 5 jobs (or more) at a time and produce the results using all the available resources (72 x 5 CPUs).
As mentioned, I have ~200 million jobs to run, and I want to complete them faster (1-2 hours) by using/increasing the number of nodes/CPUs.
Current code, input and job state:
example.lst (it has ~300 million lines)
ZNF512-xxxx_2_N-THRA-xxtx_2_N
ZNF512-xxxx_2_N-THRA-xxtx_3_N
ZNF512-xxxx_2_N-THRA-xxtx_4_N
.......
cat job_script.sh
#!/bin/bash
#PBS -l nodes=5:ppn=72
#PBS -N job01
#PBS -j oe
#work dir
export WDIR=/shared/data/work_dir
cd $WDIR;
# use available 72 cpu in each node
export JOBS_PER_NODE=72
#gnu parallel command
parallelrun="parallel -j $JOBS_PER_NODE --slf $PBS_NODEFILE --wd $WDIR --joblog process.log --resume"
$parallelrun -a example.lst sh run_script.sh {}
cat run_script.sh
#!/bin/bash
# parallel command options
i=$1
data=/shared/TF_data
# create tmp dir and work in
TMP_DIR=/shared/data/work_dir/$i
mkdir -p $TMP_DIR
cd $TMP_DIR/
# get file name
mk=$(echo "$i" | cut -d- -f1-2)
nk=$(echo "$i" | cut -d- -f3-6)
#run a tool to compare the features of pair files
/shared/software/tool_v2.1/tool -s1 $data/inf_tf/$mk -s1cf $data/features/$mk-cf -s1ss $data/features/$mk-ss -s2 $data/inf_tf/$nk.pdb -s2cf $data/features/$nk-cf.pdb -s2ss $data/features/$nk-ss.pdb > $data/$i.out
# move output files
mv matrix.txt $data/glosa_tf/matrix/$mk"_"$nk.txt
mv ali_struct.pdb $data/glosa_tf/aligned/$nk"_"$mk.pdb
# move back and remove tmp dir
cd $TMP_DIR/../
rm -rf $TMP_DIR
exit 0
PBS submission
qsub job_script.sh
Log in to one of the nodes: ssh ip-172-31-9-208
top - 09:28:03 up 15 min, 1 user, load average: 14.77, 13.44, 8.08
Tasks: 928 total, 1 running, 434 sleeping, 0 stopped, 166 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 98.4%id, 1.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 193694612k total, 1811200k used, 191883412k free, 94680k buffers
Swap: 0k total, 0k used, 0k free, 707960k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15348 ec2-user 20 0 16028 2820 1820 R 0.3 0.0 0:00.10 top
15621 ec2-user 20 0 169m 7584 6684 S 0.3 0.0 0:00.01 ssh
15625 ec2-user 20 0 171m 7472 6552 S 0.3 0.0 0:00.01 ssh
15626 ec2-user 20 0 126m 3924 3492 S 0.3 0.0 0:00.01 perl
.....
top on all of the nodes shows a similar state, and results are produced by only ~26 jobs running at a time!
I have an aws-parallelcluster setup with 5 nodes (each with 72 CPUs), the Torque scheduler, and GNU Parallel 2018 (Mar 2018).
Update
Introducing the new function that takes input on stdin and running the script in parallel works great and utilizes all the CPUs on the local machine.
However, when it runs on remote machines it produces:
parallel: Error: test.lst is neither a file nor a block device
MCVE:
A simple script that just echoes the list gives the same error when run on remote machines, but works great on the local machine:
cat test.lst # contains list
DNMT3L-5yx2B_1_N-DNMT3L-5yx2B_2_N
DNMT3L-5yx2B_1_N-DNMT3L-6brrC_3_N
DNMT3L-5yx2B_1_N-DNMT3L-6f57B_2_N
DNMT3L-5yx2B_1_N-DNMT3L-6f57C_2_N
DNMT3L-5yx2B_1_N-DUX4-6e8cA_4_N
DNMT3L-5yx2B_1_N-E2F8-4yo2A_3_P
DNMT3L-5yx2B_1_N-E2F8-4yo2A_6_N
DNMT3L-5yx2B_1_N-EBF3-3n50A_2_N
DNMT3L-5yx2B_1_N-ELK4-1k6oA_3_N
DNMT3L-5yx2B_1_N-EPAS1-1p97A_1_N
cat test_job.sh # GNU parallel submission script
#!/bin/bash
#PBS -l nodes=1:ppn=72
#PBS -N test
#PBS -k oe
# introduce new function and Run from ~/
dowork() {
parallel sh test_work.sh {}
}
export -f dowork
parallel -a test.lst --env dowork --pipepart --slf $PBS_NODEFILE --block -10 dowork
cat test_work.sh # run/work script
#!/bin/bash
i=$1
data=$(pwd)
# create a temporary folder in the current dir
TMP_DIR=$data/$i
mkdir -p $TMP_DIR
cd $TMP_DIR/
# split list
mk=$(echo "$i" | cut -d- -f1-2)
nk=$(echo "$i" | cut -d- -f3-6)
# echo list and save in echo_test.out
echo $mk, $nk >> $data/echo_test.out
cd $TMP_DIR/../
rm -rf $TMP_DIR
From your timing:
real 0m0.179s
user 0m0.005s
sys 0m0.000s
it seems the tool uses very little CPU power. When GNU Parallel runs local jobs it has an overhead of 10 ms of CPU time per job. Your jobs take 179 ms of wall-clock time and 5 ms of CPU time, so GNU Parallel's overhead accounts for quite a bit of the time spent.
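A rough way to see this overhead for yourself (a sketch only; the numbers are machine-dependent) is to time a batch of no-op jobs, where essentially all of the reported CPU time is GNU Parallel's own bookkeeping:
# 1000 trivial jobs; the user+sys time is almost entirely per-job overhead
time parallel true ::: {1..1000}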
The overhead is much worse when running jobs remotely. Here we are talking 10 ms + running an ssh command, which can easily be on the order of 100 ms.
So how can we minimize the number of ssh commands, and how can we spread the overhead over multiple cores?
First let us make a function that can take input on stdin and run the script - one job per CPU thread in parallel:
dowork() {
[...set variables here; that becomes particularly important when we run remotely...]
parallel sh run_script.sh {}
}
export -f dowork
Test that this actually works by running:
head -n 1000 example.lst | dowork
Then let us look at running jobs locally. This can be done similarly to what is described here: https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Running-more-than-250-jobs-workaround
parallel -a example.lst --pipepart --block -10 dowork
This will split example.lst into 10 blocks per CPU thread, so on a machine with 72 CPU threads this makes 720 blocks. It will then start 72 doworks, and when one is done it will get another of the 720 blocks. The reason I choose 10 instead of 1 is that if one of the jobs "gets stuck" for a while, you are unlikely to notice it.
This should make sure 100% of the CPUs on the local machine are busy.
If that works, we need to distribute this work to remote machines:
parallel -j1 -a example.lst --env dowork --pipepart --slf $PBS_NODEFILE --block -10 dowork
This should start 10 ssh connections per CPU thread in total (i.e. 5*72*10 = 3600), namely one per block, with one running per server listed in $PBS_NODEFILE in parallel.
Unfortunately this means that --joblog and --resume will not work. There is currently no way to make that work, but if it is valuable to you, contact me via parallel@gnu.org.
I am not sure what tool does, but if copying takes most of the time and tool only reads the files, then you might just be able to symlink the files into $TMP_DIR instead of copying them.
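For illustration only (run_script.sh as shown above has no explicit copy step, so this is a hypothetical variant): if you do copy input files into $TMP_DIR, symlinking them avoids the I/O, provided the tool only reads them:
# instead of: cp "$data/inf_tf/$mk" "$TMP_DIR/"
ln -s "$data/inf_tf/$mk" "$TMP_DIR/"
ln -s "$data/features/$mk-cf" "$TMP_DIR/"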
A good indication of whether you can go faster is to look at top on the 5 machines in the cluster. If they are all using all cores at >90%, then you cannot expect to get it faster.

Adjust ulimit [open file handle limit] on Mac OS X Sierra to run GATK tools

I'm trying to run the VariantsToBinaryPed tool from GATK3, but it seems that my system's 'open file handle limit' is too small for it to successfully run.
I've tried increasing the limit using ulimit, as shown below, but the command still fails.
The GATK command:
java -jar GenomeAnalysisTK.jar \
    -T VariantsToBinaryPed \
    -R Homo_sapiens_assembly38.fasta \
    -V ~/vcf/snp.indel.recal.splitMA_norm.vcf.bgz \
    -m ~/03_IdentityCheck/KING/targeted_seq_ped_clean.fam \
    -bed output.bed \
    -bim output.bim \
    -fam output.fam \
    --minGenotypeQuality 0
Returns this error:
ERROR MESSAGE: An error occurred because there were too many files
open concurrently; your system's open file handle limit is probably too small.
See the unix ulimit command to adjust this limit or
ask your system administrator for help.
Following the advice given here, I ran:
echo kern.maxfiles=65536 | sudo tee -a /etc/sysctl.conf
echo kern.maxfilesperproc=65536 | sudo tee -a /etc/sysctl.conf
sudo sysctl -w kern.maxfiles=65536
sudo sysctl -w kern.maxfilesperproc=65536
sudo ulimit -n 65536 65536
and added this line to my .bash_profile and sourced it:
ulimit -n 65536 65536
So that now, when I run ulimit -n, I get:
65536
However, I still get the same error from GATK:
ERROR MESSAGE: An error occurred because there were too many files
open concurrently; your system's open file handle limit is probably too small.
See the unix ulimit command to adjust this limit or
ask your system administrator for help.
Is there anything else I can do to avoid this error?
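One diagnostic worth running (a sketch, not a confirmed fix): check the soft and hard limits of the shell that actually launches java, and the launchd cap that bounds them on macOS:
ulimit -Sn                 # soft open-files limit of this shell
ulimit -Hn                 # hard open-files limit of this shell
launchctl limit maxfiles   # launchd limit that caps what ulimit can set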

Can't fork process: Resource temporarily unavailable issue

I've got a problem using the terminal on macOS 10.12.3 on a Mac mini.
When I try to run any command I get the following message:
can't fork process: Resource temporarily unavailable.
I have had this problem before. Last time I was able to fix it by increasing the maximum number of processes, after which my system looked like this:
sysctl -a | grep maxproc
kern.maxproc: 2048
kern.maxprocperuid: 2048
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 2048
virtual memory (kbytes, -v) unlimited
I thought the problem was solved, but now I have hit the issue once again.
This time I was able to solve it with a reboot, but it's strongly undesirable to have to reboot my Mac every time.
Do you have any advice on how to fix this problem once and for all?
No idea why your question was downvoted…
sudo sysctl kern.tty.ptmx_max=255 (or 511, or whatever) should fix it.
My default (in El Capitan) was 127. (As a tmux user, I need more than that.)
To learn more:
sysctl -a | grep max
ulimit -a
launchctl limit
cat /private/etc/launchd.conf
cat /private/etc/sysctl.conf
man 8 sysctl
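A sketch of making the ptmx_max change persist across reboots (assuming your macOS version still reads /etc/sysctl.conf at boot, which newer releases may not):
# apply immediately
sudo sysctl kern.tty.ptmx_max=511
# persist across reboots via the file referenced above
echo 'kern.tty.ptmx_max=511' | sudo tee -a /private/etc/sysctl.conf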
Rebooting solves the issue. In my case, I opened Activity Monitor and found way too many processes for "Outlook WebContent". I quit MS Outlook and voila, the terminal reopened and that error disappeared.
A short-term solution (besides rebooting) is to quit unused apps or close unused documents or projects.
I had a similar experience and it turned out I had a cronjob spawning every minute and not finishing. If you can free up enough resources to run the below command, you can see what is repeatedly spawning.
ps -e | awk '{print $4" "$5" "$6}' | sort | uniq -c | sort -n
For me the last few lines were this:
15 /Applications/Google Chrome.app/Contents/Frameworks/Google Chrome
36 (find)
1184 (bash)
1220 (cron)
1221 /usr/sbin/cron
The fix was easy: crontab -e and remove the offending line. Then a reboot cleared all the zombie crons.

How to disable core dumps in Linux

I'm trying to disable core dumps for my application, so I set ulimit -c 0.
But whenever I attach to the process with gdb using gdb --pid=<pid> and then run gcore, I still get a core dump for that application. I'm using bash:
-bash-3.2$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 65600
max locked memory (kbytes, -l) 50000000
max memory size (kbytes, -m) unlimited
open files (-n) 131072
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 131072
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
-bash-3.2$ ps -ef | grep top
oracle 8951 8879 0 Mar05 ? 00:01:44 /u01/R122_EBS/fs1/FMW_Home/jrockit32 jre/bin/java -classpath /u01/R122_EBS/fs1/FMW_Home/webtier/opmn/lib/wlfullclient.jar:/u01/R122_EBS/fs1/FMW_Home/Oracle_EBS-app1/shared-libs/ebs-appsborg/WEB-INF/lib/ebsAppsborgManifest.jar:/u01/R122_EBS/fs1/EBSapps/comn/java/classes -mx256m oracle.apps.ad.tools.configuration.RegisterWLSListeners -appsuser APPS -appshost rws3510293 -appsjdbcconnectdesc jdbc:oracle:thin:#(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=YES)(FAILOVER=YES)(ADDRESS=(PROTOCOL=tcp)(HOST=rws3510293.us.oracle.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=rahulshr))) -adtop /u01/R122_EBS/fs1/EBSapps/appl/ad/12.0.0 -wlshost rws3510293 -wlsuser weblogic -wlsport 7001 -dbsid rahulshr -dbhost rws3510293 -dbdomain us.oracle.com -dbport 1521 -outdir /u01/R122_EBS/fs1/inst/apps/rahulshr_rws3510293/appltmp/Tue_Mar_5_00_42_52_2013 -log /u01/R122_EBS/fs1/inst/apps/rahulshr_rws3510293/logs/appl/rgf/Tue_Mar_5_00_42_52_2013/adRegisterWLSListeners.log -promptmsg hide -contextfile /u01/R122_EBS/fs1/inst/apps/rahulshr_rws3510293/appl/admin/rahulshr_rws3510293.xml
oracle 23694 22895 0 Mar05 pts/0 00:00:00 top
oracle 26235 22895 0 01:51 pts/0 00:00:00 grep top
-bash-3.2$ gcore
usage: gcore [-o filename] pid
-bash-3.2$ gcore 23694
0x000000355cacbfe8 in tcsetattr () from /lib64/libc.so.6
Saved corefile core.23694
[2]+ Stopped top
-bash-3.2$ ls -l
total 2384
-rw-r--r-- 1 oracle dba 2425288 Mar 6 01:52 core.23694
drwxr----- 3 oracle dba 4096 Mar 5 03:32 oradiag_oracle
-rwxr-xr-x 1 oracle dba 20 Mar 5 04:06 test.sh
-bash-3.2$
The gcore command in gdb does not use the Linux kernel's core-dumping code. It walks the process memory itself and writes out a binary file in the same format as a process core file. This is apparent because the process is still active after issuing gcore, whereas if Linux had dumped the core file, the process would have been terminated.
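A minimal sketch of the distinction (PID and path are placeholders): the kernel honors the core-size limit when a process crashes, while gcore writes the file itself, so the limit does not stop it:
ulimit -c 0                       # kernel-generated core dumps are now suppressed
# a crashing process will leave no core file, but:
gcore -o /tmp/manual_core <pid>   # gcore still writes /tmp/manual_core.<pid>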
