Understanding ruby-prof output - ruby

I ran ruby-profiler on one of my programs. I'm trying to figure out what each fields mean. I'm guessing everything is CPU time (and not wall clock time), which is fantastic. I want to understand what the "---" stands for. Is there some sort of stack information in there. What does calls a/b mean?
Thread ID: 81980260
Total Time: 0.28
%total %self total self wait child calls Name
--------------------------------------------------------------------------------
0.28 0.00 0.00 0.28 5/6 FrameParser#receive_data
100.00% 0.00% 0.28 0.00 0.00 0.28 6 FrameParser#read_frames
0.28 0.00 0.00 0.28 4/4 ChatServerClient#receive_frame
0.00 0.00 0.00 0.00 5/47 Fixnum#+
0.00 0.00 0.00 0.00 1/2 DebugServer#receive_frame
0.00 0.00 0.00 0.00 10/29 String#[]
0.00 0.00 0.00 0.00 10/21 <Class::Range>#allocate
0.00 0.00 0.00 0.00 10/71 String#index
--------------------------------------------------------------------------------
100.00% 0.00% 0.28 0.00 0.00 0.28 5 FrameParser#receive_data
0.28 0.00 0.00 0.28 5/6 FrameParser#read_frames
0.00 0.00 0.00 0.00 5/16 ActiveSupport::CoreExtensions::String::OutputSafety#add_with_safety
--------------------------------------------------------------------------------
0.28 0.00 0.00 0.28 4/4 FrameParser#read_frames
100.00% 0.00% 0.28 0.00 0.00 0.28 4 ChatServerClient#receive_frame
0.28 0.00 0.00 0.28 4/6 <Class::Lal>#safe_call
--------------------------------------------------------------------------------
0.00 0.00 0.00 0.00 1/6 <Class::Lal>#safe_call
0.00 0.00 0.00 0.00 1/6 DebugServer#receive_frame
0.28 0.00 0.00 0.28 4/6 ChatServerClient#receive_frame
100.00% 0.00% 0.28 0.00 0.00 0.28 6 <Class::Lal>#safe_call
0.21 0.00 0.00 0.21 2/4 ChatUserFunction#register
0.06 0.00 0.00 0.06 2/2 ChatUserFunction#packet
0.01 0.00 0.00 0.01 4/130 Class#new
0.00 0.00 0.00 0.00 1/1 DebugServer#profile_stop
0.00 0.00 0.00 0.00 1/33 String#==
0.00 0.00 0.00 0.00 1/6 <Class::Lal>#safe_call
0.00 0.00 0.00 0.00 5/5 JSON#parse
0.00 0.00 0.00 0.00 5/8 <Class::Log>#log
0.00 0.00 0.00 0.00 5/5 String#strip!
--------------------------------------------------------------------------------

Each section of the ruby-prof output is broken up into the examination of a particular function. for instance, look at the first section of your output. The read_frames method on FrameParser is the focus and it is basically saying the following:
100% of the execution time that was profiled was spent inside of FrameParser#read_frames
FrameParser#read_frames was called 6 times.
5 out of the 6 calls to read_frames came from FrameParser#receive_data and this accounted 100% of the execution time (this is the line above the read_frames line).
The lines below the read_frames (but within that first section) method are all of the methods that FrameParser#read_frames calls (you should be aware of that since this seems like it's your code), how many of that methods total calls read_frames is responsible for (the a/b calls column), and how much time those calls took. They are ordered by which of them took up the most execution time. In your case, that is receive_frame method on the ChatServer class.
You can then look down at the section focusing on receive_frames (2 down and centered with the '100%' line on receive_frame) and see how it's performance is broken down. each section is set up the same way and usually the subsequent function call which took the most time is the focus of the next section down. ruby-prof will continue doing this through the full call stack. You can go as deep as you want until you find the bottleneck you'd like to resolve.

Related

get MAx value of a column withing a fixed timestamp

I have a file with timestamp and data in 12 columns. This data is dumped every second and I need to pick the MAX value of 6th column within every Minute. I am not even sure from were to start .I thought of doing as follow ,but do not know how to get one out of minute group. Also what if data is more then of 24 hours. so cannot use this approach. I think somehow I need to create a group of 60 rows and then sort data out of it, but not sure how to do that.
cat file |sort -k6 -r |awk '!a[$1]++' |sort -k1
For example :Input data
16:06:00 0 1.01 0.00 4.04 1.00 0.00 0.00 0.00 0.00 0.00 94.95
16:06:01 0 0.00 0.00 2.00 2.00 0.00 0.00 0.00 0.00 0.00 98.00
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:06:03 0 4.08 1.02 2.04 2.00 0.00 0.00 0.00 0.00 0.00 92.86
...
...
16:06:59 0 4.08 1.02 2.04 3.00 0.00 0.00 0.00 0.00 0.00 92.86
16:07:00 0 1.01 0.00 4.04 4.00 0.00 0.00 0.00 0.00 0.00 94.95
16:07:01 0 0.00 0.00 2.00 5.00 0.00 0.00 0.00 0.00 0.00 98.00
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:03 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
...
...
16:07:59 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
...
...
Expected output:
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
awk to the rescue!
$ awk ' {split($1,a,":"); k=a[1]a[2]}
max[k]<$6 {max[k]=$6; maxR[k]=$0}
END {for(r in maxR) print maxR[r]}' file
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
note that max is not initialized (implicitly initialized to zero), if values are all negative this is not going to work. Workaround is simple but perhaps not needed in this context.
This alternative assumes time sorted records and prints the max in one minute intervals, so different dates will not be merged.
$ awk '{split($1,a,":"); k=a[1]a[2]}
max<$6 {max=$6; maxR=$0}
p!=k {if(p) print maxR; p=k}
END {print maxR}' file
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
Using Perl
$ cat monk.log
16:06:00 0 1.01 0.00 4.04 1.00 0.00 0.00 0.00 0.00 0.00 94.95
16:06:01 0 0.00 0.00 2.00 2.00 0.00 0.00 0.00 0.00 0.00 98.00
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:06:03 0 4.08 1.02 2.04 2.00 0.00 0.00 0.00 0.00 0.00 92.86
16:06:59 0 4.08 1.02 2.04 3.00 0.00 0.00 0.00 0.00 0.00 92.86
16:07:00 0 1.01 0.00 4.04 4.00 0.00 0.00 0.00 0.00 0.00 94.95
16:07:01 0 0.00 0.00 2.00 5.00 0.00 0.00 0.00 0.00 0.00 98.00
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:03 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
16:07:59 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
$ perl -F'/\s+/' -lane ' $F[0]=~/(.*):/ and $x=$1 ; if( $F[5]>$kv{$x} ) { $kv{$x}=$F[5]; $kv2{$x}=$_ } END { print "$kv2{$_}" for(keys %kv) } ' monk.log
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
or
$ perl -F'/\s+/' -lane ' $F[0]=~/(.*):/ ; if( $F[5]>$kv{$1} ) { $kv{$1}=$F[5]; $kv2{$1}=$_ } END { print "$kv2{$_}" for(keys %kv) } ' monk.log
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
awk + sort
$ cat monk.log
16:06:00 0 1.01 0.00 4.04 1.00 0.00 0.00 0.00 0.00 0.00 94.95
16:06:01 0 0.00 0.00 2.00 2.00 0.00 0.00 0.00 0.00 0.00 98.00
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:06:03 0 4.08 1.02 2.04 2.00 0.00 0.00 0.00 0.00 0.00 92.86
16:06:59 0 4.08 1.02 2.04 3.00 0.00 0.00 0.00 0.00 0.00 92.86
16:07:00 0 1.01 0.00 4.04 4.00 0.00 0.00 0.00 0.00 0.00 94.95
16:07:01 0 0.00 0.00 2.00 5.00 0.00 0.00 0.00 0.00 0.00 98.00
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:03 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
16:07:59 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
$ awk ' { split($1,t,":"); $(NF+1)=t[1]t[2] }1 ' monk.log | sort -k12 -n -k6 | awk ' !a[$NF] { a[$NF]++ ; NF--; print} '
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
or
$ awk ' split($1,t,":") && $(NF+1)=t[1]t[2] ' monk.log | sort -k12 -n -k6 | awk ' !a[$NF] { a[$NF]++ ; NF--; print} '
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91

Vowpal Wabbit varinfo and ngrams : non-existent combinations

I'm trying to use vw to find words or phrases that predict if someone will open an email. The target is 1 if they opened the email and 0 otherwise. My data looks like this:
1 |A this is a test
0 |A this test is only temporary
1 |A i bought a new polo shirt
1 |A that was a great online sale
I put it into a file called 'test1.txt' and run the following code to do ngrams of 2 and also output variable information:
C:\~\vw>perl vw-varinfo.pl -V --ngram 2 test1.txt >> out.txt
When I look at the output there are bigrams that I don't see in the original data. Is this a bug or am I misunderstanding something.
Output:
FeatureName HashVal MinVal MaxVal Weight RelScore
A^a 239656 0.00 1.00 +0.1664 100.00%
A^is 7514 0.00 1.00 +0.0772 46.38%
A^test 12331 0.00 1.00 +0.0772 46.38%
A^this 169573 0.00 1.00 +0.0772 46.38%
A^bought 245782 0.00 1.00 +0.0650 39.06%
A^i 245469 0.00 1.00 +0.0650 39.06%
A^new 51974 0.00 1.00 +0.0650 39.06%
A^polo 48680 0.00 1.00 +0.0650 39.06%
A^shirt 73882 0.00 1.00 +0.0650 39.06%
A^great 220692 0.00 1.00 +0.0610 36.64%
A^online 147727 0.00 1.00 +0.0610 36.64%
A^sale 242707 0.00 1.00 +0.0610 36.64%
A^that 206586 0.00 1.00 +0.0610 36.64%
A^was 223274 0.00 1.00 +0.0610 36.64%
A^a^bought 216990 0.00 0.00 +0.0000 0.00%
A^bought^great 7122 0.00 0.00 +0.0000 0.00%
A^great^i 190625 0.00 0.00 +0.0000 0.00%
A^i^is 76227 0.00 0.00 +0.0000 0.00%
A^is^new 140536 0.00 0.00 +0.0000 0.00%
A^new^online 69117 0.00 0.00 +0.0000 0.00%
A^online^only 173498 0.00 0.00 +0.0000 0.00%
A^only^polo 51059 0.00 0.00 +0.0000 0.00%
A^polo^sale 131483 0.00 0.00 +0.0000 0.00%
A^sale^shirt 191329 0.00 0.00 +0.0000 0.00%
A^shirt^temporary 81555 0.00 0.00 +0.0000 0.00%
A^temporary^test 90632 0.00 0.00 +0.0000 0.00%
A^test^that 13689 0.00 0.00 +0.0000 0.00%
A^that^this 127863 0.00 0.00 +0.0000 0.00%
A^this^was 22011 0.00 0.00 +0.0000 0.00%
Constant 116060 0.00 0.00 +0.1465 0.00%
A^only 62951 0.00 1.00 -0.0490 -29.47%
A^temporary 44641 0.00 1.00 -0.0490 -29.47%
For instance, ^bought^great never actually occurs in any of the original input rows. Am I doing something wrong?
It is a bug in vw-varinfo.
This can be verified by running vw alone with --invert_hash:
$ vw --ngram 2 test1.txt --invert_hash train.ih
$ grep '^bought^great' train.ih
# no output
The quick partial work-around is to treat all features with a weight of 0.0 as highly suspect, and probably bogus. Unfortunately, there are some features that are missing too because vw-varinfo knows nothing about --ngram.
I really need to rewrite vw-varinfo. vw changed a lot since vw-varinfo was written, plus vw-varinfo was written sub-optimally repeating a lot of the cross-feature logic that's already in vw itself. The new implementation which I have in mind should be significanly more efficient and less vulnerable to these kinds of bugs.
This project was put on hold due to more urgent stuff. Hope to find some time to correct this this year.
Unrelated tip: since you're doing binary classification, you should use labels in {-1, 1} rather than in {0,1} and use --loss_function logistic for best results.

Hadoop cluster stops after mapping

I have a hadoop cluster consisting of three machines. I put on hadoop 20 G file, I start hadoop and it stops after mapping.
"13/08/22 08:09:34 INFO mapred.JobClient: map 100% reduce 11%"
After mapping all cpu don't work. I can wait one day, but it can't start again.
What can I do?
This is last 10 lines of my log file, when map is 100% and reduce is 11%:
2013-08-22 14:15:32,503 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-08-22 14:15:32,542 INFO org.apache.hadoop.mapred.MapTask: Finished spill 67
2013-08-22 14:15:32,552 INFO org.apache.hadoop.mapred.Merger: Merging 68 sorted segments
2013-08-22 14:15:32,558 INFO org.apache.hadoop.mapred.Merger: Merging 5 intermediate segments out of a total of 68
2013-08-22 14:15:32,622 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 64 segments left of total size: 1600710 bytes
2013-08-22 14:15:32,708 INFO org.apache.hadoop.mapred.Task: Task:attempt_201308221308_0002_m_000302_0 is done. And is in the process of commiting
2013-08-22 14:15:32,717 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201308221308_0002_m_000302_0' done.
2013-08-22 14:15:32,759 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-08-22 14:15:32,774 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-08-22 14:15:32,774 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName llobocki for UID 1000 from the native implementation
My Child of master hadoop thread dump, when map is 100% and reduce is 11%:
2013-08-23 11:37:26
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode):
"Attach Listener" daemon prio=10 tid=0x0000000000f85800 nid=0x3873 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Thread for polling Map Completion Events" daemon prio=10 tid=0x00007fc32860c800 nid=0x1d7a waiting on condition [0x00007fc31c183000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2882)
"Thread for merging in memory files" daemon prio=10 tid=0x00007fc32860a800 nid=0x1d78 in Object.wait() [0x00007fc31c284000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd6dd7c8> (a java.lang.Object)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$ShuffleRamManager.waitForDataToMerge(ReduceTask.java:1197)
- locked <0x00000005bd6dd7c8> (a java.lang.Object)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2760)
"Thread for merging on-disk files" daemon prio=10 tid=0x00007fc328608000 nid=0x1d77 in Object.wait() [0x00007fc31c385000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd713988> (a java.util.TreeSet)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2654)
- locked <0x00000005bd713988> (a java.util.TreeSet)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.4" prio=10 tid=0x00007fc328606800 nid=0x1d76 in Object.wait() [0x00007fc31c486000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.3" prio=10 tid=0x00007fc328602000 nid=0x1d75 in Object.wait() [0x00007fc31c587000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.2" prio=10 tid=0x00007fc328600000 nid=0x1d73 in Object.wait() [0x00007fc31c688000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.1" prio=10 tid=0x00007fc3285ff000 nid=0x1d72 in Object.wait() [0x00007fc31c789000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.0" prio=10 tid=0x00007fc3285f8800 nid=0x1d70 in Object.wait() [0x00007fc31c88a000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"communication thread" daemon prio=10 tid=0x00007fc3285d2000 nid=0x1d53 in Object.wait() [0x00007fc31c9b3000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762e90> (a java.lang.Object)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:658)
- locked <0x00000005bd762e90> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:724)
"Timer for 'ReduceTask' metrics system" daemon prio=10 tid=0x00007fc3285b1000 nid=0x1d49 in Object.wait() [0x00007fc31cbb5000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd919a30> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Timer.java:552)
- locked <0x00000005bd919a30> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)
"Thread for syncLogs" daemon prio=10 tid=0x00007fc328494000 nid=0x1d3e waiting on condition [0x00007fc31cebd000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.mapred.Child$3.run(Child.java:139)
"IPC Client (47) connection to /127.0.0.1:35127 from job_201308230927_0001" daemon prio=10 tid=0x00007fc328492800 nid=0x1d3d in Object.wait() [0x00007fc31cfbe000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd721b60> (a org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:747)
- locked <0x00000005bd721b60> (a org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:789)
"Service Thread" daemon prio=10 tid=0x00007fc3280f4000 nid=0x1cf7 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x00007fc3280f1800 nid=0x1cf5 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x00007fc3280ee800 nid=0x1cf4 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00007fc3280ec800 nid=0x1cf3 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00007fc32809e000 nid=0x1ce5 in Object.wait() [0x00007fc2c1b7f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd6fb1f8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000005bd6fb1f8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
"Reference Handler" daemon prio=10 tid=0x00007fc32809c000 nid=0x1ce4 in Object.wait() [0x00007fc2c1c80000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd6fade8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000005bd6fade8> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x00007fc32800b000 nid=0x1cc8 waiting on condition [0x00007fc32dc3a000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:2191)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:386)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
"VM Thread" prio=10 tid=0x00007fc328094800 nid=0x1cdf runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fc328018800 nid=0x1ccc runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fc32801a800 nid=0x1cce runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fc32801c800 nid=0x1cd7 runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fc32801e000 nid=0x1cd8 runnable
"VM Periodic Task Thread" prio=10 tid=0x00007fc3280fe800 nid=0x1cf8 waiting on condition
JNI global references: 224
During mapping the net traffic is ~20 MiB on master, but when reduce starts, net traffic goes down to 3 KiB.
iostat
of my machines.
Master during map:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 7.00 0.00 0.02 6.29 0.48 68.43 68.29 47.80
sda 0.00 0.00 43.00 7.00 5.38 0.02 221.04 0.22 4.42 2.78 13.90
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 43.00 3.00 5.38 0.01 239.83 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 14.00 0.00 53.00 0.00 1.34 51.66 1.58 29.77 5.38 28.50
sda 3.00 14.00 34.00 53.00 4.62 1.34 140.34 1.27 14.55 3.84 33.40
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 37.00 62.00 4.62 1.32 122.99 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Slave during map:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 2.00 0.00 12.00 4.00 1.75 0.01 225.25 0.76 47.50 25.19 40.30
sdb 0.00 0.00 0.00 6.00 0.00 0.02 6.00 0.09 20.00 14.67 8.80
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 14.00 2.00 1.75 0.01 225.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 28.00 4.00 3.50 0.01 224.81 0.39 12.28 7.16 22.90
sdb 0.00 0.00 5.00 3.00 0.42 0.01 110.25 0.25 31.50 22.12 17.70
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 33.00 0.00 3.92 0.00 243.39 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Master stopped:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 1.00 1.00 0.00 228.44 0.03 3.44 3.00 2.70
sda 0.00 0.00 0.00 1.00 0.00 0.00 8.00 0.01 13.00 13.00 1.30
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 0.62 0.50 0.40
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.02 2.38 2.38 1.90
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 0.75 0.50 0.40
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Slave stopped:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 1.38 1.12 0.90
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 7.00 0.00 0.88 0.00 256.00 0.01 0.71 0.57 0.40
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 7.00 0.00 0.88 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 0.75 0.62 0.50
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
I've solved my problem. I had an incorrect value in /etc/hosts.
Earlier:
ip alias
Now:
ip domain alias

Uneven CPU load distribution

I have a system with uneven CPU load in a odd pattern. It's serving up apache, elastic search, redis, and email.
Here's the mpstat output. Notice how %usr for the last 12 cores is well below the top 12.
# mpstat -P ALL
Linux 3.5.0-17-generic (<server1>) 02/16/2013 _x86_64_ (24 CPU)
10:21:46 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10:21:46 PM all 17.15 0.00 2.20 0.33 0.00 0.09 0.00 0.00 80.23
10:21:46 PM 0 27.34 0.00 4.08 0.56 0.00 0.53 0.00 0.00 67.48
10:21:46 PM 1 24.51 0.00 3.25 0.53 0.00 0.34 0.00 0.00 71.38
10:21:46 PM 2 26.69 0.00 4.20 0.50 0.00 0.24 0.00 0.00 68.36
10:21:46 PM 3 24.38 0.00 3.04 0.70 0.00 0.23 0.00 0.00 71.65
10:21:46 PM 4 24.50 0.00 4.04 0.57 0.00 0.15 0.00 0.00 70.74
10:21:46 PM 5 21.75 0.00 2.80 0.74 0.00 0.15 0.00 0.00 74.55
10:21:46 PM 6 28.30 0.00 3.75 0.84 0.00 0.04 0.00 0.00 67.07
10:21:46 PM 7 30.20 0.00 3.94 0.16 0.00 0.03 0.00 0.00 65.67
10:21:46 PM 8 30.55 0.00 4.09 0.12 0.00 0.03 0.00 0.00 65.21
10:21:46 PM 9 32.66 0.00 3.40 0.09 0.00 0.03 0.00 0.00 63.81
10:21:46 PM 10 32.20 0.00 3.57 0.08 0.00 0.03 0.00 0.00 64.12
10:21:46 PM 11 32.08 0.00 3.92 0.08 0.00 0.03 0.00 0.00 63.88
10:21:46 PM 12 4.53 0.00 0.41 0.34 0.00 0.04 0.00 0.00 94.68
10:21:46 PM 13 9.14 0.00 1.42 0.32 0.00 0.04 0.00 0.00 89.08
10:21:46 PM 14 5.92 0.00 0.70 0.35 0.00 0.06 0.00 0.00 92.97
10:21:46 PM 15 6.14 0.00 0.66 0.35 0.00 0.04 0.00 0.00 92.81
10:21:46 PM 16 7.39 0.00 0.65 0.34 0.00 0.04 0.00 0.00 91.57
10:21:46 PM 17 6.60 0.00 0.83 0.39 0.00 0.05 0.00 0.00 92.13
10:21:46 PM 18 5.49 0.00 0.54 0.30 0.00 0.01 0.00 0.00 93.65
10:21:46 PM 19 6.78 0.00 0.88 0.21 0.00 0.01 0.00 0.00 92.12
10:21:46 PM 20 6.17 0.00 0.58 0.11 0.00 0.01 0.00 0.00 93.13
10:21:46 PM 21 5.78 0.00 0.82 0.10 0.00 0.01 0.00 0.00 93.29
10:21:46 PM 22 6.29 0.00 0.60 0.10 0.00 0.01 0.00 0.00 93.00
10:21:46 PM 23 6.18 0.00 0.61 0.10 0.00 0.01 0.00 0.00 93.10
I have another system, a database server running MySQL, which shows an even distribution.
# mpstat -P ALL
Linux 3.5.0-17-generic (<server2>) 02/16/2013 _x86_64_ (32 CPU)
10:27:57 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10:27:57 PM all 0.77 0.00 0.07 0.68 0.00 0.00 0.00 0.00 98.47
10:27:57 PM 0 2.31 0.00 0.19 1.86 0.00 0.01 0.00 0.00 95.63
10:27:57 PM 1 1.73 0.00 0.17 1.87 0.00 0.01 0.00 0.00 96.21
10:27:57 PM 2 2.62 0.00 0.25 2.51 0.00 0.01 0.00 0.00 94.62
10:27:57 PM 3 1.60 0.00 0.17 1.99 0.00 0.01 0.00 0.00 96.23
10:27:57 PM 4 1.86 0.00 0.16 1.84 0.00 0.01 0.00 0.00 96.13
10:27:57 PM 5 2.30 0.00 0.25 2.45 0.00 0.01 0.00 0.00 94.99
10:27:57 PM 6 2.05 0.00 0.20 1.89 0.00 0.01 0.00 0.00 95.86
10:27:57 PM 7 2.13 0.00 0.20 2.31 0.00 0.01 0.00 0.00 95.36
10:27:57 PM 8 0.82 0.00 0.11 4.05 0.00 0.03 0.00 0.00 94.99
10:27:57 PM 9 0.70 0.00 0.18 0.06 0.00 0.00 0.00 0.00 99.06
10:27:57 PM 10 0.18 0.00 0.04 0.01 0.00 0.00 0.00 0.00 99.77
10:27:57 PM 11 0.20 0.00 0.01 0.01 0.00 0.00 0.00 0.00 99.78
10:27:57 PM 12 0.13 0.00 0.01 0.01 0.00 0.00 0.00 0.00 99.86
10:27:57 PM 13 0.04 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.95
10:27:57 PM 14 0.03 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.97
10:27:57 PM 15 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.97
10:27:57 PM 16 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.94
10:27:57 PM 17 0.41 0.00 0.10 0.04 0.00 0.00 0.00 0.00 99.45
10:27:57 PM 18 2.78 0.00 0.06 0.14 0.00 0.00 0.00 0.00 97.01
10:27:57 PM 19 1.19 0.00 0.08 0.19 0.00 0.00 0.00 0.00 98.53
10:27:57 PM 20 0.48 0.00 0.04 0.30 0.00 0.00 0.00 0.00 99.17
10:27:57 PM 21 0.70 0.00 0.03 0.16 0.00 0.00 0.00 0.00 99.11
10:27:57 PM 22 0.08 0.00 0.01 0.02 0.00 0.00 0.00 0.00 99.90
10:27:57 PM 23 0.30 0.00 0.02 0.06 0.00 0.00 0.00 0.00 99.62
10:27:57 PM 24 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
10:27:57 PM 25 0.04 0.00 0.03 0.00 0.00 0.00 0.00 0.00 99.94
10:27:57 PM 26 0.06 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.93
10:27:57 PM 27 0.01 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.98
10:27:57 PM 28 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.99
10:27:57 PM 29 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
10:27:57 PM 30 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
10:27:57 PM 31 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.99
Both are dedicated systems running Ubuntu 12.10 (not virtual).
I've thought and read up about setting nice, taskset, or trying to tweak the scheduler but I don't want to make any rash decisions. Also, this system isn't performing "bad" per-se, I just want to ensure all cores are being utilized properly.
Let me know if I can provide additional information. Any suggestions to even the CPU load on "server1" are greatly appreciated.
This is not a problem until some cores hit 100% and others don't (i.e. in the statistics you've shown us, there's nothing that would suggest that the uneven distribution is negatively affecting the performance). In your case, you probably have quite a few processes that distribute evenly, resulting in a base load of 6-10% on each core, and then ~12 more threads that require 10-20% of a core each. You can't split a single process/thread between cores.

is it possible to create a graph by shell script

i want to create a graph file using shell script. For example, i want to make graph of sar output of my system.
sar 1 10
05:36:32 AM CPU %user %nice %system %iowait %steal %idle
05:36:33 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:34 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:35 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:36 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:37 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:38 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:39 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:40 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:41 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:42 AM all 0.00 0.00 0.00 0.00 0.00 100.00
Average: all 0.00 0.00 0.00 0.00 0.00 100.00
As a visualizer you can use Gnuplot.

Resources