Ruby infinite loop causes 100% CPU load

I implemented some code which runs in a loop:
loop do
..
end
In that loop I handle keypresses with the Curses library. If I press N and enter something, I start a new Thread which counts time (again with loop do .. end).
The question is: why does loop (or while true) cause 100% CPU load on one of the CPU cores? Is the problem actually in loop?
Is there a way to write an infinite loop with lower CPU consumption in Ruby?
The full sources are available here.
UPD - Strace
$ strace -c -p 5480
Process 5480 attached - interrupt to quit
^CProcess 5480 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 51.52    0.002188           0    142842           ioctl
 24.21    0.001028           0     71421           select
 14.22    0.000604           0     47614           gettimeofday
 10.05    0.000427           0     47614           rt_sigaction
  0.00    0.000000           0        25           write
  0.00    0.000000           0        16           futex
------ ----------- ----------- --------- --------- ----------------
100.00    0.004247                309532           total

After some thinking and suggestions from user2246674 I managed to resolve the issue. It was not in the threads; it was the main loop.
I had code like this inside the main loop:
c = Curses.getch
unless c.nil?
  # input handling
end
Adding sleep 1 in an else branch resolved the problem: when there is no input from Curses, the loop now does nothing for a second before checking again, which stops it from busy-polling STDIN and generating the high CPU load.
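To double-check the fix, the same strace run from the question can be repeated against the patched process; with the sleep in place the ioctl/select counts should stay small instead of climbing into the hundreds of thousands. A minimal sketch (5480 is the PID from the trace above, substitute your own):
# attach to the running Ruby process, let it sit for a few seconds,
# then press Ctrl-C to get the per-syscall summary
strace -c -p 5480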

Related

samtools calmd is pretty slow

I am using "samtools calmd" to add MD tag back to BAM file. The size of original BAM is around 50Gb (whole genome sequence by using pacbio HIFI reads). The issue that I encountered is that the speed of "calmd" is incredibly slow! The jobs have already run 12 hours, and only 600MB BAM with MD tag are generated. In this way, 50GB BAM will take 30days to be finished!
Here is the code I used to add MD tag (very normal):
rule addMDTag:
    input:
        rules.pbmm2_alignment.output
    output:
        strBAMDir + "/pbmm2/v37/{wcReadsType}/Tmp/rawReads{readsIndex}.MD.bam"
    params:
        ref = strRef
    threads:
        16
    log:
        strBAMDir + "/pbmm2/v37/{wcReadsType}/Log/rawReads{readsIndex}.MD.log"
    benchmark:
        strBAMDir + "/pbmm2/v37/{wcReadsType}/Benchmark/rawReads{readsIndex}.MD.benchmark.txt"
    shell:
        "samtools calmd -# {threads} {input} {params.ref} -bAr > {output}"
The version of samtools I used is v1.10.
BTW, I gave calmd 16 threads; however, it looks like samtools is still using only one core:
top - 11:44:53 up 47 days, 20:35, 1 user, load average: 2.00, 2.01, 2.00
Tasks: 1723 total, 3 running, 1720 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.8%us, 0.3%sy, 0.0%ni, 96.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 529329180k total, 232414724k used, 296914456k free, 84016k buffers
Swap: 12582908k total, 74884k used, 12508024k free, 227912476k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
93137 lix33 20 0 954m 151m 2180 R 100.2 0.0 659:04.13 samtools
Does anyone know how to make calmd much faster? Or is there any other tool that can do the same job more efficiently?
Thanks so much
After working with the samtools maintainers, this issue has been solved.
calmd will be extremely slow if the BAM is not position-sorted, so always make sure the BAM has been coordinate-sorted before running calmd.
See the details from the maintainers below:
Are your files name-sorted, and does your reference have more than one entry?
If so, calmd will be switching between references all the time,
which means it may be doing a lot of reference loading and not much MD calculation.
You may find it goes a lot faster if you position-sort the input and then run it through calmd.
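In practice that means coordinate-sorting before calmd. A minimal sketch outside of Snakemake, using the same flags as the rule above (file names are placeholders; -@ 16 sets the sorting threads, matching the 16 threads from the question):
# position-sort the pbmm2 output first
samtools sort -@ 16 -o rawReads.sorted.bam rawReads.bam
# then compute the MD/NM tags against the reference
samtools calmd -bAr rawReads.sorted.bam ref.fa > rawReads.MD.bam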

strace'ing/profiling a bash script

I'm currently trying to benchmark a bash script in four different versions. Each one does a giant rsync job and usually takes a very long time to finish. There are many steps in the script that involve setting up and tearing down the environment that rsync copies to.
However, when I ran strace on the scripts, I got surprisingly short results, which leads me to believe that strace is either not counting the time spent waiting on a command like rsync (which might be spawned in a subshell and therefore not recorded by strace at all), or that the script wakes up intermittently and sleeps for stretches that strace does not count. Here's a snippet:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.98   12.972555      120116       108        52 wait4
  0.01    0.000751          13        56           clone
  0.00    0.000380           1       553           rt_sigprocmask
  0.00    0.000303           2       197        85 stat
  0.00    0.000274           2       134           read
  0.00    0.000223          19        12           open
  0.00    0.000190          48         4           getdents
  0.00    0.000110           1        82         8 close
  0.00    0.000110           1       153           rt_sigaction
  0.00    0.000084           1        61           getegid
  0.00    0.000074           4        19           write
So what tools can I use that are similar to strace, or is there some kind of recursive flag in strace that I'm missing that would correctly show where my bash script is spending its time waiting?
I would like something along the lines of:
% time command
------ --------
... rsync
... ls
Any suggestions would be appreciated. Thank you!
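For what it's worth, strace does have a flag along the lines the post is guessing at: -f follows child processes, so an rsync spawned by the script is included in the -c summary instead of showing up only as time spent in wait4. A rough sketch (the script name is a placeholder, and the output is still per-syscall rather than per-command):
# trace the script and every process it forks, summarising syscall time
strace -f -c ./backup_script.sh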

Disable timer interrupt on Linux kernel

I want to disable the timer interrupt on some of the cores (1-2) of my machine, an x86 box running CentOS 7 with the RT patch. Both cores are isolated with nohz_full (you can see the cmdline below), but the timer interrupt continues to interrupt the real-time processes running on core 1 and core 2.
1. uname -r
3.10.0-693.11.1.rt56.632.el7.x86_64
2. cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-693.11.1.rt56.632.el7.x86_64 \
root=/dev/mapper/centos-root ro crashkernel=auto \
rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet \
default_hugepagesz=2M hugepagesz=2M hugepages=1024 \
intel_iommu=on isolcpus=1-2 irqaffinity=0 intel_idle.max_cstate=0 \
processor.max_cstate=0 idle=mwait tsc=perfect rcu_nocbs=1-2 rcu_nocb_poll \
nohz_full=1-2 nmi_watchdog=0
3. cat /proc/interrupts
           CPU0        CPU1        CPU2
  0:         29           0           0   IO-APIC-edge  timer
.....
......
NMI:          0           0           0   Non-maskable interrupts
LOC:  835205157   308723100   308384525   Local timer interrupts
SPU:          0           0           0   Spurious interrupts
PMI:          0           0           0   Performance monitoring interrupts
IWI:          0           0           0   IRQ work interrupts
RTR:          0           0           0   APIC ICR read retries
RES:  347330843   309191325   308417790   Rescheduling interrupts
CAL:          0         935         935   Function call interrupts
TLB:        320          22          58   TLB shootdowns
TRM:          0           0           0   Thermal event interrupts
THR:          0           0           0   Threshold APIC interrupts
DFR:          0           0           0   Deferred Error APIC interrupts
MCE:          0           0           0   Machine check exceptions
MCP:          2           2           2   Machine check polls
CPUs/Clocksource:
4. lscpu | grep CPU.s
CPU(s): 3
On-line CPU(s) list: 0-2
NUMA node0 CPU(s): 0-2
5. cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
Thanks a lot for any help.
Moses
Even with nohz_full= you get some ticks on the isolated CPUs:
Some process-handling operations still require the occasional scheduling-clock tick. These operations include calculating CPU load, maintaining sched average, computing CFS entity vruntime, computing avenrun, and carrying out load balancing. They are currently accommodated by scheduling-clock tick every second or so. On-going work will eliminate the need even for these infrequent scheduling-clock ticks.
(Documentation/timers/NO_HZ.txt, cf. (Nearly) full tickless operation in 3.10 LWN, 2013)
Thus, you have to check the rate of the local timer, e.g.:
$ perf stat -a -A -e irq_vectors:local_timer_entry sleep 120
(while your isolated threads/processes are running)
Also, nohz_full= is only effective if there is just one runnable task on each isolated core. You can check that with e.g. ps -L -e -o pid,tid,user,state,psr,cmd and cat /proc/sched_debug.
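For example, to see exactly which threads are currently placed on the isolated cores, you can filter the ps output on the psr column (the awk filter below is just one way to do it; the field number matches the -o list used here):
# list every thread whose last-used processor is core 1 or core 2
ps -L -e -o pid,tid,state,psr,comm | awk '$4 == 1 || $4 == 2'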
Perhaps you need to move some (kernel) tasks to your housekeeping core, e.g.:
# tuna -U -t '*' -c 0-4 -m
You can get more insights into what timers are still active by looking at /proc/timer_list.
Another method to investigate causes for possible interruption is to use the functional tracer (ftrace). See also Reducing OS jitter due to per-cpu kthreads for some examples.
I see nmi_watchdog=0 in your kernel parameters, but you don't disable the soft watchdog. Perhaps this is another timer tick source that would show up with ftrace.
You can disable all watchdogs with nowatchdog.
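If you'd rather test that without rebooting, the lockup watchdogs can also be toggled at runtime through sysctl; my assumption is that your 3.10 RHEL kernel exposes kernel.watchdog, so check that the entry exists first:
# inspect and then disable the soft/NMI lockup detectors at runtime
sysctl kernel.watchdog
sysctl -w kernel.watchdog=0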
Btw, some of your kernel parameters seem to be off:
tsc=perfect - do you mean tsc=reliable? The 'perfect' value isn't documented in the kernel docs
idle=mwait - do you mean idle=poll? Again, I can't find the 'mwait' value in the kernel docs
intel_iommu=on - what's the purpose of this?

How to obtain the virtual private memory of a process from the command line under OSX?

I would like to obtain the virtual private memory consumed by a process under OSX from the command line. This is the value that Activity Monitor reports in the "Virtual Mem" column. ps -o vsz reports the total address space available to the process and is therefore not useful.
You can obtain the virtual private memory use of a single process by running
top -l 1 -s 0 -i 1 -stats vprvt -pid PID
where PID is the process ID of the process you are interested in. This results in about a dozen lines of output ending with
VPRVT
55M+
So by parsing the last line of output, one can at least obtain the memory footprint in MB. I tested this on OSX 10.6.8.
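If all you need is the raw figure in a shell script rather than in Ruby, piping the same command through tail is enough (PID is a placeholder, as above):
# print only the VPRVT value (e.g. "55M+") for the given process
top -l 1 -s 0 -i 1 -stats vprvt -pid PID | tail -1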
update
I realized (after I got downvoted) that user1389686 gave an answer in the comment section of the OP that was better than my paltry first attempt. What follows is based on user1389686's own answer. I cannot take credit for it -- I've just cleaned it up a bit.
original, edited with -stats vprvt
As Mahmoud Al-Qudsi mentioned, top does what you want. If PID 8631 is the process you want to examine:
$ top -l 1 -s 0 -stats vprvt -pid 8631
Processes: 84 total, 2 running, 82 sleeping, 378 threads
2012/07/14 02:42:05
Load Avg: 0.34, 0.15, 0.04
CPU usage: 15.38% user, 30.76% sys, 53.84% idle
SharedLibs: 4668K resident, 4220K data, 0B linkedit.
MemRegions: 15160 total, 961M resident, 25M private, 520M shared.
PhysMem: 917M wired, 1207M active, 276M inactive, 2400M used, 5790M free.
VM: 171G vsize, 1039M framework vsize, 1523860(0) pageins, 811163(0) pageouts.
Networks: packets: 431147/140M in, 261381/59M out.
Disks: 487900/8547M read, 2784975/40G written.
VPRVT
8631
Here's how I get at this value using a bit of Ruby code:
# Return the virtual private memory of the current process, in bytes
def virtual_private_memory
  s = `top -l 1 -s 0 -stats vprvt -pid #{Process.pid}`.split($/).last
  return nil unless s =~ /\A(\d*)([KMG])/
  $1.to_i * case $2
            when "K"
              1000
            when "M"
              1000000
            when "G"
              1000000000
            else
              raise ArgumentError.new("unrecognized multiplier in #{s}")
            end
end
Updated answer that works under Yosemite, from user1389686:
top -l 1 -s 0 -stats mem -pid PID

Linpack sometimes starting, sometimes not, but nothing changed

I installed Linpack on a 2-node cluster with Xeon processors. Sometimes when I start Linpack with this command:
mpiexec -np 28 -print-rank-map -f /root/machines.HOSTS ./xhpl_intel64
Linpack starts and prints its output; sometimes I only see the MPI rank mappings printed and then nothing follows. To me this looks like random behaviour, because I don't change anything between the calls and, as already mentioned, Linpack sometimes starts and sometimes doesn't.
In top I can see that the xhpl_intel64 processes have been created and are heavily using the CPU, but when I watch the traffic between the nodes, iftop tells me that nothing is being sent.
I am using MPICH2 as MPI implementation. This is my HPL.dat:
# cat HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
10000 Ns
1 # of NBs
250 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
2 Ps
14 Qs
16.0 threshold
1 # of panel fact
2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
edit2:
I just let the program run for a while, and after 30 minutes it tells me:
# mpiexec -np 32 -print-rank-map -f /root/machines.HOSTS ./xhpl_intel64
(node-0:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
(node-1:16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31)
Assertion failed in file ../../socksm.c at line 2577: (it_plfd->revents & 0x008) == 0
internal ABORT - process 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
Is this an MPI problem?
Do you know what type of problem this could be?
I figured out what the problem was: MPICH2 uses different random ports each time it starts, and if these are blocked your application won't start up correctly.
The solution for MPICH2 is to set the environment variable MPICH_PORT_RANGE to START:END, like this:
export MPICH_PORT_RANGE=50000:51000
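If it is a firewall doing the blocking, the same range also has to be reachable on every node. For example with iptables (assuming iptables is what the nodes run, and that 50000:51000 matches the range you exported):
# allow the MPICH2 port range between the cluster nodes
iptables -A INPUT -p tcp --dport 50000:51000 -j ACCEPT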
Best,
heinrich
