I have read that the STATUS INACTIVE is harmless and does not block any further queries, but at the same time the STATE says WAITING, and my program actually keeps waiting for a query to execute. I do not have the privilege to kill any queries.
So how can these queries get terminated? And how do I check the maximum timeout for these queries? Also, can anyone tell me how much time 8147476214 is? Is it milliseconds? I get an insane number of hours if I divide by 60 three times.
WAIT_TIME_MICRO TIME_REMAINING_MICRO SID SERIAL# STATUS STATE
8147476214 -1 363 17919 INACTIVE WAITING
8147060408 -1 600 30087 INACTIVE WAITING
1288463255 -1 1522 31615 INACTIVE WAITING
1288463255 -1 1433 19943 INACTIVE WAITING
1288463239 -1 204 3751 INACTIVE WAITING
1288463217 -1 203 17423 INACTIVE WAITING
1288463217 -1 300 25717 INACTIVE WAITING
1288463204 -1 703 12277 INACTIVE WAITING
1288463193 -1 361 10209 INACTIVE WAITING
1288463144 -1 457 35447 INACTIVE WAITING
1288082277 -1 102 49705 INACTIVE WAITING
1288082274 -1 216 14829 INACTIVE WAITING
1288082274 -1 58 62561 INACTIVE WAITING
1288082273 -1 308 11485 INACTIVE WAITING
1288082273 -1 497 45153 INACTIVE WAITING
1288082258 -1 1289 15497 INACTIVE WAITING
1288082243 -1 1526 48083 INACTIVE WAITING
1288082224 -1 992 34411 INACTIVE WAITING
300678 -1 451 11433 ACTIVE WAITING
261893 -1 304 30921 ACTIVE WAITING
241839 -1 50 32641 ACTIVE WAITING
228894 -1 11 42861 ACTIVE WAITING
79480 -1 996 19657 ACTIVE WAITING
40677 -1 896 28523 ACTIVE WAITING
25609 -1 845 55041 ACTIVE WAITING
12517 -1 115 25819 ACTIVE WAITING
2 54 47749 ACTIVE WAITED SHORT TIME
2 1283 50745 ACTIVE WAITED SHORT TIME
0 -1 548 51351 ACTIVE WAITING
0 0 405 44925 ACTIVE WAITING
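As the _MICRO suffix suggests, WAIT_TIME_MICRO is reported in microseconds, not milliseconds, so the right conversion is to divide by 1,000,000 once rather than by 60 three times. A quick sanity check in Go (a sketch, assuming the microsecond interpretation):

package main

import (
	"fmt"
	"time"
)

func main() {
	// WAIT_TIME_MICRO is in microseconds (note the _MICRO suffix), so
	// 8147476214 comes out to about 2 hours 15 minutes, not "insane hours".
	wait := time.Duration(8147476214) * time.Microsecond
	fmt.Println(wait) // prints 2h15m47.476214s
}

So the longest-waiting INACTIVE sessions in the listing above have been idle for roughly two and a quarter hours.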
I am trying to configure an NTP server on CentOS 7; instead of IP addresses, it shows .INIT. when executing the ntpq -p command.
My NTP server configuration file:
/etc/ntp.conf
driftfile /var/lib/ntp/drift
restrict default nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict ::1
# Hosts on local network are less restricted.
# Allow 10.0.3.0/24 network clients to synchronize time with this server.
restrict 10.0.3.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 2.in.pool.ntp.org
server 1.asia.pool.ntp.org
server 3.asia.pool.ntp.org
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
disable monitor
I get the following output when executing the ntpq -p command:
[centos@ip-10-0-3-53 etc]$ ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
ntp.your.org .INIT. 16 u - 128 0 0.000 0.000 0.000
ns3.weiszhostin .INIT. 16 u - 128 0 0.000 0.000 0.000
resolver2.skyfi .INIT. 16 u - 128 0 0.000 0.000 0.000
ntp2.wiktel.com .INIT. 16 u - 128 0 0.000 0.000 0.000
[spryiq@ip-10-0-3-53 etc]$
Port 123 is listening correctly:
[centos@ip-10-0-3-53 etc]$ netstat -plnu | grep 123
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
udp 0 0 10.0.3.53:123 0.0.0.0:* -
udp 0 0 127.0.0.1:123 0.0.0.0:* -
udp 0 0 0.0.0.0:123 0.0.0.0:* -
udp6 0 0 fe80::10c9:ecff:fe9:123 :::* -
udp6 0 0 ::1:123 :::* -
udp6 0 0 :::123 :::* -
[centos@ip-10-0-3-53 etc]$ ntpstat
timeout
[centos@ip-10-0-3-53 etc]$ ntpq
ntpq> as
ind assid status conf reach auth condition last_event cnt
===========================================================
1 1255 8011 yes no none reject mobilize 1
2 1256 8011 yes no none reject mobilize 1
3 1257 8011 yes no none reject mobilize 1
4 1258 8011 yes no none reject mobilize 1
I have configured iptables correctly. Instead of .INIT. it should display IP addresses. How can I troubleshoot this problem?
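Worth noting: reach is 0 and the association status is reject/mobilize for every peer, which means no responses are coming back from the upstream servers at all; on EC2 that is typically outbound UDP 123 being blocked (e.g. by a security group) rather than a local configuration problem. A minimal standalone probe in Go can confirm whether any reply arrives (a sketch; the pool hostname is taken from the config above):

package main

import (
	"encoding/binary"
	"fmt"
	"net"
	"os"
	"time"
)

func main() {
	// Send a single SNTP mode-3 (client) request and wait for a reply.
	// A timeout here matches the reach=0 / .INIT. symptom: requests go
	// out, but no responses make it back.
	conn, err := net.Dial("udp", "2.in.pool.ntp.org:123")
	if err != nil {
		fmt.Fprintln(os.Stderr, "dial:", err)
		os.Exit(1)
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(5 * time.Second))

	req := make([]byte, 48)
	req[0] = 0x1B // LI=0, VN=3, Mode=3 (client request)
	if _, err := conn.Write(req); err != nil {
		fmt.Fprintln(os.Stderr, "write:", err)
		os.Exit(1)
	}

	resp := make([]byte, 48)
	if _, err := conn.Read(resp); err != nil {
		fmt.Fprintln(os.Stderr, "no reply; outbound UDP 123 is likely blocked:", err)
		os.Exit(1)
	}

	// The transmit timestamp starts at byte 40: seconds since 1900-01-01.
	secs := binary.BigEndian.Uint32(resp[40:44])
	const ntpEpochOffset = 2208988800 // seconds between 1900 and 1970
	fmt.Println("server time:", time.Unix(int64(secs)-ntpEpochOffset, 0).UTC())
}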
Problems:
More and more data nodes go into bad health in Cloudera Manager.
Clue1:
There is no task or job running, just an idle data node here.
top
-bash-4.1$ top
top - 18:27:22 up 4:59, 3 users, load average: 4.55, 3.52, 3.18
Tasks: 139 total, 1 running, 137 sleeping, 1 stopped, 0 zombie
Cpu(s): 14.8%us, 85.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7932720k total, 1243372k used, 6689348k free, 52244k buffers
Swap: 6160376k total, 0k used, 6160376k free, 267228k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13766 root 20 0 2664m 21m 7048 S 85.4 0.3 190:34.75 java
17688 root 20 0 2664m 19m 7048 S 75.5 0.3 1:05.97 java
12765 root 20 0 2859m 21m 7140 S 36.9 0.3 133:25.46 java
2909 mapred 20 0 1894m 113m 14m S 1.0 1.5 2:55.26 java
1850 root 20 0 1469m 62m 4436 S 0.7 0.8 2:54.53 python
1332 root 20 0 50000 3000 2424 S 0.3 0.0 0:12.04 vmtoolsd
2683 hbase 20 0 1927m 152m 18m S 0.3 2.0 0:36.64 java
Clue2:
-bash-4.1$ ps -ef|grep 13766
root 13766 1850 99 16:01 ? 03:12:54 java -classpath /usr/share/cmf/lib/agent-4.6.3.jar com.cloudera.cmon.agent.DnsTest
Clue3:
In cloudera-scm-agent.log:
[30/Aug/2013 16:01:58 +0000] 1850 Monitor-HostMonitor throttling_logger ERROR Timeout with args ['java', '-classpath', '/usr/share/cmf/lib/agent-4.6.3.jar', 'com.cloudera.cmon.agent.DnsTest']
None
[30/Aug/2013 16:01:58 +0000] 1850 Monitor-HostMonitor throttling_logger ERROR Failed to collect java-based DNS names
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 53, in collect
result, stdout, stderr = self._subprocess_with_timeout(args, self._poll_timeout)
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 42, in _subprocess_with_timeout
return SubprocessTimeout().subprocess_with_timeout(args, timeout)
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/subprocess_timeout.py", line 70, in subprocess_with_timeout
raise Exception("timeout with args %s" % args)
Exception: timeout with args ['java', '-classpath', '/usr/share/cmf/lib/agent-4.6.3.jar', 'com.cloudera.cmon.agent.DnsTest']
"cloudera-scm-agent.log" line 30357 of 30357 --100%-- col 1
Background:
If I restart all nodes, then everything is OK, but after half an hour or more, bad health comes back one node at a time.
Version: Cloudera Standard 4.6.3 (#192 built by jenkins on 20130812-1221 git: fa61cf8559fbefeb5af7f223fd02164d1a0adfdb)
I added all nodes to /etc/hosts.
The installed CDH is 4.3.1.
In fact, these nodes are VMs with fixed IP addresses.
Any suggestions?
BTW, where can I download the source code of com.cloudera.cmon.agent.DnsTest?
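I can't say where the DnsTest source is published, but judging from the log it performs a Java-based DNS collection with a timeout, so the hang is most likely in name resolution on the node itself (an assumption). A rough standalone check of the same idea in Go, timing forward and reverse lookups of the local hostname:

package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

func main() {
	// Time forward and reverse DNS lookups of the local hostname.
	// If either hangs for seconds, the monitoring agent's DnsTest
	// would time out the same way.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	host, err := os.Hostname()
	if err != nil {
		fmt.Fprintln(os.Stderr, "hostname:", err)
		os.Exit(1)
	}

	start := time.Now()
	addrs, err := net.DefaultResolver.LookupHost(ctx, host)
	fmt.Printf("forward lookup of %q took %v: %v (err: %v)\n", host, time.Since(start), addrs, err)

	for _, addr := range addrs {
		start = time.Now()
		names, err := net.DefaultResolver.LookupAddr(ctx, addr)
		fmt.Printf("reverse lookup of %s took %v: %v (err: %v)\n", addr, time.Since(start), names, err)
	}
}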
I'm writing a data mover in Go: taking data located in one data center and moving it to another data center. I figured Go would be perfect for this, given goroutines.
I notice that if I have one program running 1800 goroutines, the amount of data being transmitted is really low.
Here's the dstat printout, averaged over 30 seconds:
---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
1m 5m 15m |usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0.70 3.58 4.42| 10 1 89 0 0 0| 0 156k|7306k 6667k| 0 0 | 11k 6287
0.61 3.28 4.29| 12 2 85 0 0 1| 0 6963B|8822k 8523k| 0 0 | 14k 7531
0.65 3.03 4.18| 12 2 86 0 0 1| 0 1775B|8660k 8514k| 0 0 | 13k 7464
0.67 2.81 4.07| 12 2 86 0 0 1| 0 1638B|8908k 8735k| 0 0 | 13k 7435
0.67 2.60 3.96| 12 2 86 0 0 1| 0 819B|8752k 8385k| 0 0 | 13k 7445
0.47 2.37 3.84| 11 2 86 0 0 1| 0 2185B|8740k 8491k| 0 0 | 13k 7548
0.61 2.22 3.74| 10 2 88 0 0 0| 0 1229B|7122k 6765k| 0 0 | 11k 6228
0.52 2.04 3.63| 3 1 97 0 0 0| 0 546B|1999k 1365k| 0 0 |3117 2033
If I run 9 instances of the program with 200 goroutines each, I see much better performance:
---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
1m 5m 15m |usr sys idl wai hiq siq| read writ| recv send| in out | int csw
8.34 9.56 8.78| 53 8 36 0 0 3| 0 410B| 38M 32M| 0 0 | 41k 26k
8.01 9.37 8.74| 74 10 12 0 0 4| 0 137B| 51M 51M| 0 0 | 59k 39k
8.36 9.31 8.74| 75 9 12 0 0 4| 0 1092B| 51M 51M| 0 0 | 59k 39k
6.93 8.89 8.62| 74 10 12 0 0 4| 0 5188B| 50M 49M| 0 0 | 59k 38k
7.09 8.73 8.58| 75 9 12 0 0 4| 0 410B| 51M 50M| 0 0 | 60k 39k
7.40 8.62 8.54| 75 9 12 0 0 4| 0 137B| 52M 49M| 0 0 | 61k 40k
7.96 8.63 8.55| 75 9 12 0 0 4| 0 956B| 51M 51M| 0 0 | 59k 39k
7.46 8.44 8.49| 75 9 12 0 0 4| 0 273B| 51M 50M| 0 0 | 58k 38k
8.08 8.51 8.51| 75 9 12 0 0 4| 0 410B| 51M 51M| 0 0 | 59k 39k
Load average is a little high, but I'll worry about that later. The network traffic, though, is almost reaching the network's full potential.
I'm on Ubuntu 12.04,
8 Gigs Ram,
2.3 GHz processors (says EC2 :P)
Also, I've increased my file descriptors from 1024 to 10240
I thought Go was designed for this kind of thing, or am I expecting too much of Go for this application?
Is there something trivial that I'm missing? Do I need to configure my system to maximize Go's potential?
EDIT
I guess my question wasn't clear enough. Sorry. I'm not asking for magic from Go; I know computers have limitations on what they can handle.
So I'll rephrase: why is 1 instance with 1800 goroutines not equal to 9 instances with 200 goroutines each? It's the same total number of goroutines, yet significantly less performance for 1 instance compared to 9 instances.
Please note that goroutines are limited to your local machine and that channels are not natively network-enabled, i.e. your particular case is probably not playing to Go's strengths.
Also: what did you expect from throwing (supposedly) every transfer into its own goroutine? I/O operations tend to have their bottleneck where the bits hit the metal, i.e. the physical transfer of the data to the medium. Think of it like this: no matter how many threads (or goroutines, in this case) try to write to the network card, you still only have one network card. Most likely, hitting it with too many concurrent write calls will only slow things down, since the overhead involved increases.
If you think this is not the problem, or you want to audit your code for optimized performance, Go has neat built-in features to do so: Profiling Go Programs (on the official Go blog).
But the actual bottleneck might still well be outside your Go program and/or in the way it interacts with the OS.
Addressing your actual problem without code is pointless guessing. Post some, and everyone will try their best to help you.
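For illustration, if over-concurrency against the single network card does turn out to be the problem, the usual remedy is to bound the number of simultaneous transfers with a fixed worker pool instead of one goroutine per item. A minimal sketch (transfer is a hypothetical stand-in for the real copy operation):

package main

import (
	"fmt"
	"sync"
)

// transfer is a hypothetical stand-in for the real copy operation.
func transfer(id int) {
	fmt.Println("moving chunk", id)
}

func main() {
	const workers = 200 // bound concurrency instead of 1800 goroutines at once
	jobs := make(chan int)
	var wg sync.WaitGroup

	// Fixed pool: only `workers` transfers are ever in flight.
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				transfer(id)
			}
		}()
	}

	// Feed all 1800 items through the bounded pool.
	for id := 0; id < 1800; id++ {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
}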
You will probably have to post your source code to get any real input, but just to be sure: have you increased the number of CPUs to use?
package main

import "runtime"

func main() {
	// Let the scheduler use every available CPU core.
	runtime.GOMAXPROCS(runtime.NumCPU())
}
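(At the time of this question, GOMAXPROCS defaulted to 1, so without this call all 1800 goroutines were multiplexed onto a single OS thread; since Go 1.5 the default is the number of CPUs.)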
When reading /proc/stat, I get these return values:
cpu 20582190 643 1606363 658948861 509691 24 112555 0 0 0
cpu0 3408982 106 264219 81480207 19354 0 35 0 0 0
cpu1 3395441 116 265930 81509149 11129 0 30 0 0 0
cpu2 3411003 197 214515 81133228 418090 0 1911 0 0 0
cpu3 3478358 168 257604 81417703 30421 0 29 0 0 0
cpu4 1840706 20 155376 83328751 1564 0 7 0 0 0
cpu5 1416488 15 171101 83410586 1645 13 108729 0 0 0
cpu6 1773002 7 133686 83346305 25666 10 1803 0 0 0
cpu7 1858207 10 143928 83322929 1819 0 8 0 0 0
Some sources state to read only the first four values to calculate CPU usage, while some sources say to read all the values.
Do I read only the first four values to calculate CPU utilization, i.e. user, nice, system, and idle? Or do I need all the values? Or not all, but more than four? Would I need iowait, irq, or softirq?
cpu 20582190 643 1606363 658948861
Versus the entire line.
cpu 20582190 643 1606363 658948861 509691 24 112555 0 0 0
Edit: Some sources also state that iowait is counted as part of idle.
When calculating a specific process' CPU usage, does the method differ?
The man page states that it varies with architecture, and also gives a couple of examples describing how they are different:
In Linux 2.6 this line includes three additional columns: ...
Since Linux 2.6.11, there is an eighth column, ...
Since Linux 2.6.24, there is a ninth column, ...
When some sources say to use only the first four values, they are probably not taking these newer columns into account.
Regarding whether the calculation differs across CPUs: You will find lines related to "cpu", "cpu0", "cpu1", ... in /proc/stat. The "cpu" fields are all aggregates (not averages) of corresponding fields for the individual CPUs. You can check that for yourself with a simple awk one-liner.
cpu 84282 747 20805 1615949 44349 0 308 0 0 0
cpu0 26754 343 9611 375347 27092 0 301 0 0 0
cpu1 12707 56 2581 422198 5036 0 1 0 0 0
cpu2 33356 173 6160 394561 7508 0 4 0 0 0
cpu3 11464 174 2452 423841 4712 0 1 0 0 0
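As a worked example, here is a small Go program that samples the aggregate cpu line twice and computes utilization as 1 - Δidle/Δtotal, summing every column so the later-kernel fields (steal, guest, ...) are included in the total; treating iowait as idle is one of the judgment calls discussed above (a sketch, not the one true formula):

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// readCPU returns every numeric field of the aggregate "cpu" line
// in /proc/stat (user, nice, system, idle, iowait, irq, softirq, ...).
func readCPU() []uint64 {
	data, err := os.ReadFile("/proc/stat")
	if err != nil {
		panic(err)
	}
	for _, line := range strings.Split(string(data), "\n") {
		fields := strings.Fields(line)
		if len(fields) > 0 && fields[0] == "cpu" {
			vals := make([]uint64, len(fields)-1)
			for i, f := range fields[1:] {
				vals[i], _ = strconv.ParseUint(f, 10, 64)
			}
			return vals
		}
	}
	panic("no aggregate cpu line found")
}

func main() {
	// Sample twice; utilization = 1 - delta(idle+iowait)/delta(total).
	// Assumes a 2.6+ kernel with at least the iowait column present.
	a := readCPU()
	time.Sleep(time.Second)
	b := readCPU()

	var totalA, totalB uint64
	for _, v := range a {
		totalA += v
	}
	for _, v := range b {
		totalB += v
	}
	idleA := a[3] + a[4] // idle + iowait
	idleB := b[3] + b[4]

	busy := 1 - float64(idleB-idleA)/float64(totalB-totalA)
	fmt.Printf("CPU utilization over 1s: %.1f%%\n", busy*100)
}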
Percentage of memory used by a process.
Normally prstat -J will give the memory of the process image, RSS (resident set size), etc.
How do I get a list of processes with the percentage of memory used by each process?
I am working on Solaris Unix.
Additionally, what are the regular commands that you use for monitoring processes and process performance? They might be very useful to all!
The top command will give you several memory-consumption numbers. htop is much nicer and will give you percentages, but it isn't installed by default on most systems.
Run top and then press Shift+O; this will bring you to the sort options. Press n (this may be different on your machine) for memory and then hit Enter.
Example of a memory sort:
top - 08:17:29 up 3 days, 8:54, 6 users, load average: 13.98, 14.01, 11.60
Tasks: 654 total, 2 running, 652 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.7%us, 1.5%sy, 0.0%ni, 59.5%id, 23.5%wa, 0.1%hi, 0.8%si, 0.0%st
Mem: 65851896k total, 49049196k used, 16802700k free, 1074664k buffers
Swap: 50331640k total, 0k used, 50331640k free, 32776940k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21635 oracle 15 0 6750m 636m 51m S 1.6 1.0 62:34.53 oracle
21623 oracle 15 0 6686m 572m 53m S 1.1 0.9 61:16.95 oracle
21633 oracle 16 0 6566m 445m 235m S 3.7 0.7 30:22.60 oracle
21615 oracle 16 0 6550m 428m 220m S 3.7 0.7 29:36.74 oracle
16349 oracle RT 0 431m 284m 41m S 0.5 0.4 2:41.08 ocssd.bin
17891 root RT 0 139m 118m 40m S 0.5 0.2 41:08.19 osysmond
18154 root RT 0 182m 98m 43m S 0.0 0.2 10:02.40 ologgerd
12211 root 15 0 1432m 84m 14m S 0.0 0.1 17:57.80 java
Another method on Solaris is to do the following:
prstat -s size 1 1
Example prstat output
www004:/# prstat -s size 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
420 nobody 139M 60M sleep 29 10 1:46:56 0.1% webservd/76
603 nobody 135M 59M sleep 29 10 5:33:18 0.1% webservd/96
339 root 134M 70M sleep 59 0 0:35:38 0.0% java/24
435 iplanet 132M 55M sleep 29 10 1:10:39 0.1% webservd/76
573 nobody 131M 53M sleep 29 10 0:24:32 0.0% webservd/76
588 nobody 130M 53M sleep 29 10 2:40:55 0.1% webservd/86
454 nobody 128M 51M sleep 29 10 0:09:01 0.0% webservd/76
489 iplanet 126M 49M sleep 29 10 0:00:13 0.0% webservd/74
405 root 119M 45M sleep 29 10 0:00:13 0.0% webservd/31
717 root 54M 46M sleep 59 0 2:31:27 0.2% agent/7
Keep in mind this is sorted by SIZE, not RSS; if you need it sorted by RSS, use the rss key:
www004:/# prstat -s rss 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
339 root 134M 70M sleep 59 0 0:35:39 0.1% java/24
420 nobody 139M 60M sleep 29 10 1:46:57 0.4% webservd/76
603 nobody 135M 59M sleep 29 10 5:33:19 0.5% webservd/96
435 iplanet 132M 55M sleep 29 10 1:10:39 0.0% webservd/76
573 nobody 131M 53M sleep 29 10 0:24:32 0.0% webservd/76
588 nobody 130M 53M sleep 29 10 2:40:55 0.0% webservd/86
454 nobody 128M 51M sleep 29 10 0:09:01 0.0% webservd/76
489 iplanet 126M 49M sleep 29 10 0:00:13 0.0% webservd/74
I'm not sure if ps is standardized, but at least on Linux, ps -o %mem gives the percentage of memory used (you would obviously want to add some other columns as well).
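For example, on Linux, listing the top memory consumers (a GNU ps invocation; the exact flags may differ on Solaris):

ps -eo pid,%mem,rss,comm --sort=-%mem | head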