what cause the minor page faults?RSS and VSZ constant - memory-management

The pidstat -r result:
10:53:20 PID minflt/s majflt/s VSZ RSS %MEM
10:53:21 15800 48.00 0.00 4859808 4164084 12.74
10:53:22 15800 100.00 0.00 4859808 4164084 12.74
10:53:23 15800 54.00 0.00 4859808 4164084 12.74
10:53:24 15800 35.00 0.00 4859808 4164084 12.74
10:53:25 15800 41.00 0.00 4859808 4164084 12.74
10:53:26 15800 41.00 0.00 4859808 4164084 12.74
10:53:27 15800 29.00 0.00 4859808 4164084 12.74
10:53:28 15800 19.00 0.00 4859808 4164084 12.74
10:53:29 15800 94.00 0.00 4859808 4164084 12.74
10:53:30 15800 28.00 0.00 4859808 4164084 12.74
10:53:31 15800 65.00 0.00 4859808 4164084 12.74
the program use big share memory, about 4G, no other program share the memory;
use pidstat -r to see the statistics, we can see the VSZ and RSS not increase.
and after the program runs for a week, also can see the minor page faults.
so I do not know why?

Related

Mariadb 10 Throughput bottleneck

I have a machine Intel Atom C2758 (8Core) with an WD WD500LPLX (7200 RPM 500GB) 2.5" HDD. OS is Ubuntu 15.10 (kernel 4.2) with MariaDB 10 (default config)
I am insert about 20-30 rows per second (row size 355 bytes). It looks like the bottleneck is with the HDD IO but the wMB/s is low and utilization is high.
What else could i do to make it take more transactions.
Thank you.
With:
iostat -m -x 1
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.50 12.20 0.00 86.79
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 22.00 0.00 61.00 0.00 0.39 12.98 1.00 15.74 0.00 15.74 16.13 98.40
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.00 12.31 0.00 87.19
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 24.00 0.00 68.00 0.00 0.45 13.53 0.99 14.71 0.00 14.71 14.24 96.80
avg-cpu: %user %nice %system %iowait %steal %idle
0.75 0.00 0.50 12.31 0.00 86.43
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 27.00 0.00 75.00 0.00 0.48 13.12 1.02 13.71 0.00 13.71 12.91 96.80
INSERT is too slow?
innodb_flush_log_at_trx_commit = 2 is faster, but slightly less secure.
"Batch" the inserts: either INSERT ... VALUES (...), (...) ... or use LOAD DATA.

Parse iostat output

I have a need to grep only certain lines in a log file generated with iostat. iostat command is iostat -x 1 -m > disk.log and it saves a file like this:
Linux 2.6.32-358.18.1.el6.x86_64 (parekosam) 11/26/2013 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.04 0.01 0.14 0.28 0.00 99.53
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 72.44 6.67 4.15 0.34 0.33 0.03 162.23 0.02 3.92 1.77 0.80
dm-0 0.00 0.00 1.30 6.96 0.03 0.03 15.11 0.65 78.37 0.69 0.57
dm-1 0.00 0.00 0.07 0.00 0.00 0.00 7.99 0.00 2.57 0.67 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 1.01 0.00 98.99
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 5.00 0.00 3.00 0.00 0.03 18.67 0.03 10.67 10.67 3.20
dm-0 0.00 0.00 0.00 7.00 0.00 0.03 8.00 0.04 5.29 4.57 3.20
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
I'd like to only show rMB/s and wMB/s columns so that I can calculate average speeds. I've tried some things with sed and awk but with little success. Ideal output should look like this:
12.27 10.23
11.27 10.22
15.26 20.23
12.24 10.25
12.26 50.23
12.23 10.26
13.23 23.23
12.22 10.23
12.23 10.23
22.23 14.27
13.21 10.23
12.23 10.23
14.22 10.23
12.23 10.21
Please notice this is for 'sda' only.
iostat -x 1 -m | awk '/sda/ { print $6, $7}'
Does this do what you want?
/^$/ {a=""}
a {print $6,$7}
/^Device/ {a=1}

Hadoop cluster stops after mapping

I have a hadoop cluster consisting of three machines. I put on hadoop 20 G file, I start hadoop and it stops after mapping.
"13/08/22 08:09:34 INFO mapred.JobClient: map 100% reduce 11%"
After mapping all cpu don't work. I can wait one day, but it can't start again.
What can I do?
This is last 10 lines of my log file, when map is 100% and reduce is 11%:
2013-08-22 14:15:32,503 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-08-22 14:15:32,542 INFO org.apache.hadoop.mapred.MapTask: Finished spill 67
2013-08-22 14:15:32,552 INFO org.apache.hadoop.mapred.Merger: Merging 68 sorted segments
2013-08-22 14:15:32,558 INFO org.apache.hadoop.mapred.Merger: Merging 5 intermediate segments out of a total of 68
2013-08-22 14:15:32,622 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 64 segments left of total size: 1600710 bytes
2013-08-22 14:15:32,708 INFO org.apache.hadoop.mapred.Task: Task:attempt_201308221308_0002_m_000302_0 is done. And is in the process of commiting
2013-08-22 14:15:32,717 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201308221308_0002_m_000302_0' done.
2013-08-22 14:15:32,759 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-08-22 14:15:32,774 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-08-22 14:15:32,774 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName llobocki for UID 1000 from the native implementation
My Child of master hadoop thread dump, when map is 100% and reduce is 11%:
2013-08-23 11:37:26
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode):
"Attach Listener" daemon prio=10 tid=0x0000000000f85800 nid=0x3873 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Thread for polling Map Completion Events" daemon prio=10 tid=0x00007fc32860c800 nid=0x1d7a waiting on condition [0x00007fc31c183000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2882)
"Thread for merging in memory files" daemon prio=10 tid=0x00007fc32860a800 nid=0x1d78 in Object.wait() [0x00007fc31c284000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd6dd7c8> (a java.lang.Object)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$ShuffleRamManager.waitForDataToMerge(ReduceTask.java:1197)
- locked <0x00000005bd6dd7c8> (a java.lang.Object)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2760)
"Thread for merging on-disk files" daemon prio=10 tid=0x00007fc328608000 nid=0x1d77 in Object.wait() [0x00007fc31c385000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd713988> (a java.util.TreeSet)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2654)
- locked <0x00000005bd713988> (a java.util.TreeSet)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.4" prio=10 tid=0x00007fc328606800 nid=0x1d76 in Object.wait() [0x00007fc31c486000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.3" prio=10 tid=0x00007fc328602000 nid=0x1d75 in Object.wait() [0x00007fc31c587000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.2" prio=10 tid=0x00007fc328600000 nid=0x1d73 in Object.wait() [0x00007fc31c688000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.1" prio=10 tid=0x00007fc3285ff000 nid=0x1d72 in Object.wait() [0x00007fc31c789000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.0" prio=10 tid=0x00007fc3285f8800 nid=0x1d70 in Object.wait() [0x00007fc31c88a000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"communication thread" daemon prio=10 tid=0x00007fc3285d2000 nid=0x1d53 in Object.wait() [0x00007fc31c9b3000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762e90> (a java.lang.Object)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:658)
- locked <0x00000005bd762e90> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:724)
"Timer for 'ReduceTask' metrics system" daemon prio=10 tid=0x00007fc3285b1000 nid=0x1d49 in Object.wait() [0x00007fc31cbb5000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd919a30> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Timer.java:552)
- locked <0x00000005bd919a30> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)
"Thread for syncLogs" daemon prio=10 tid=0x00007fc328494000 nid=0x1d3e waiting on condition [0x00007fc31cebd000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.mapred.Child$3.run(Child.java:139)
"IPC Client (47) connection to /127.0.0.1:35127 from job_201308230927_0001" daemon prio=10 tid=0x00007fc328492800 nid=0x1d3d in Object.wait() [0x00007fc31cfbe000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd721b60> (a org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:747)
- locked <0x00000005bd721b60> (a org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:789)
"Service Thread" daemon prio=10 tid=0x00007fc3280f4000 nid=0x1cf7 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x00007fc3280f1800 nid=0x1cf5 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x00007fc3280ee800 nid=0x1cf4 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00007fc3280ec800 nid=0x1cf3 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00007fc32809e000 nid=0x1ce5 in Object.wait() [0x00007fc2c1b7f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd6fb1f8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000005bd6fb1f8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
"Reference Handler" daemon prio=10 tid=0x00007fc32809c000 nid=0x1ce4 in Object.wait() [0x00007fc2c1c80000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd6fade8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000005bd6fade8> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x00007fc32800b000 nid=0x1cc8 waiting on condition [0x00007fc32dc3a000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:2191)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:386)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
"VM Thread" prio=10 tid=0x00007fc328094800 nid=0x1cdf runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fc328018800 nid=0x1ccc runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fc32801a800 nid=0x1cce runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fc32801c800 nid=0x1cd7 runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fc32801e000 nid=0x1cd8 runnable
"VM Periodic Task Thread" prio=10 tid=0x00007fc3280fe800 nid=0x1cf8 waiting on condition
JNI global references: 224
During mapping the net traffic is ~20 MiB on master, but when reduce starts, net traffic goes down to 3 KiB.
iostat
of my machines.
Master during map:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 7.00 0.00 0.02 6.29 0.48 68.43 68.29 47.80
sda 0.00 0.00 43.00 7.00 5.38 0.02 221.04 0.22 4.42 2.78 13.90
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 43.00 3.00 5.38 0.01 239.83 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 14.00 0.00 53.00 0.00 1.34 51.66 1.58 29.77 5.38 28.50
sda 3.00 14.00 34.00 53.00 4.62 1.34 140.34 1.27 14.55 3.84 33.40
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 37.00 62.00 4.62 1.32 122.99 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Slave during map:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 2.00 0.00 12.00 4.00 1.75 0.01 225.25 0.76 47.50 25.19 40.30
sdb 0.00 0.00 0.00 6.00 0.00 0.02 6.00 0.09 20.00 14.67 8.80
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 14.00 2.00 1.75 0.01 225.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 28.00 4.00 3.50 0.01 224.81 0.39 12.28 7.16 22.90
sdb 0.00 0.00 5.00 3.00 0.42 0.01 110.25 0.25 31.50 22.12 17.70
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 33.00 0.00 3.92 0.00 243.39 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Master stopped:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 1.00 1.00 0.00 228.44 0.03 3.44 3.00 2.70
sda 0.00 0.00 0.00 1.00 0.00 0.00 8.00 0.01 13.00 13.00 1.30
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 0.62 0.50 0.40
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.02 2.38 2.38 1.90
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 0.75 0.50 0.40
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Slave stopped:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 1.38 1.12 0.90
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 7.00 0.00 0.88 0.00 256.00 0.01 0.71 0.57 0.40
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 7.00 0.00 0.88 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 0.75 0.62 0.50
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
I've solved my problem. I had an incorrect value in /etc/hosts.
Earlier:
ip alias
Now:
ip domain alias

Uneven CPU load distribution

I have a system with uneven CPU load in a odd pattern. It's serving up apache, elastic search, redis, and email.
Here's the mpstat output. Notice how %usr for the last 12 cores is well below the top 12.
# mpstat -P ALL
Linux 3.5.0-17-generic (<server1>) 02/16/2013 _x86_64_ (24 CPU)
10:21:46 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10:21:46 PM all 17.15 0.00 2.20 0.33 0.00 0.09 0.00 0.00 80.23
10:21:46 PM 0 27.34 0.00 4.08 0.56 0.00 0.53 0.00 0.00 67.48
10:21:46 PM 1 24.51 0.00 3.25 0.53 0.00 0.34 0.00 0.00 71.38
10:21:46 PM 2 26.69 0.00 4.20 0.50 0.00 0.24 0.00 0.00 68.36
10:21:46 PM 3 24.38 0.00 3.04 0.70 0.00 0.23 0.00 0.00 71.65
10:21:46 PM 4 24.50 0.00 4.04 0.57 0.00 0.15 0.00 0.00 70.74
10:21:46 PM 5 21.75 0.00 2.80 0.74 0.00 0.15 0.00 0.00 74.55
10:21:46 PM 6 28.30 0.00 3.75 0.84 0.00 0.04 0.00 0.00 67.07
10:21:46 PM 7 30.20 0.00 3.94 0.16 0.00 0.03 0.00 0.00 65.67
10:21:46 PM 8 30.55 0.00 4.09 0.12 0.00 0.03 0.00 0.00 65.21
10:21:46 PM 9 32.66 0.00 3.40 0.09 0.00 0.03 0.00 0.00 63.81
10:21:46 PM 10 32.20 0.00 3.57 0.08 0.00 0.03 0.00 0.00 64.12
10:21:46 PM 11 32.08 0.00 3.92 0.08 0.00 0.03 0.00 0.00 63.88
10:21:46 PM 12 4.53 0.00 0.41 0.34 0.00 0.04 0.00 0.00 94.68
10:21:46 PM 13 9.14 0.00 1.42 0.32 0.00 0.04 0.00 0.00 89.08
10:21:46 PM 14 5.92 0.00 0.70 0.35 0.00 0.06 0.00 0.00 92.97
10:21:46 PM 15 6.14 0.00 0.66 0.35 0.00 0.04 0.00 0.00 92.81
10:21:46 PM 16 7.39 0.00 0.65 0.34 0.00 0.04 0.00 0.00 91.57
10:21:46 PM 17 6.60 0.00 0.83 0.39 0.00 0.05 0.00 0.00 92.13
10:21:46 PM 18 5.49 0.00 0.54 0.30 0.00 0.01 0.00 0.00 93.65
10:21:46 PM 19 6.78 0.00 0.88 0.21 0.00 0.01 0.00 0.00 92.12
10:21:46 PM 20 6.17 0.00 0.58 0.11 0.00 0.01 0.00 0.00 93.13
10:21:46 PM 21 5.78 0.00 0.82 0.10 0.00 0.01 0.00 0.00 93.29
10:21:46 PM 22 6.29 0.00 0.60 0.10 0.00 0.01 0.00 0.00 93.00
10:21:46 PM 23 6.18 0.00 0.61 0.10 0.00 0.01 0.00 0.00 93.10
I have another system, a database server running MySQL, which shows an even distribution.
# mpstat -P ALL
Linux 3.5.0-17-generic (<server2>) 02/16/2013 _x86_64_ (32 CPU)
10:27:57 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10:27:57 PM all 0.77 0.00 0.07 0.68 0.00 0.00 0.00 0.00 98.47
10:27:57 PM 0 2.31 0.00 0.19 1.86 0.00 0.01 0.00 0.00 95.63
10:27:57 PM 1 1.73 0.00 0.17 1.87 0.00 0.01 0.00 0.00 96.21
10:27:57 PM 2 2.62 0.00 0.25 2.51 0.00 0.01 0.00 0.00 94.62
10:27:57 PM 3 1.60 0.00 0.17 1.99 0.00 0.01 0.00 0.00 96.23
10:27:57 PM 4 1.86 0.00 0.16 1.84 0.00 0.01 0.00 0.00 96.13
10:27:57 PM 5 2.30 0.00 0.25 2.45 0.00 0.01 0.00 0.00 94.99
10:27:57 PM 6 2.05 0.00 0.20 1.89 0.00 0.01 0.00 0.00 95.86
10:27:57 PM 7 2.13 0.00 0.20 2.31 0.00 0.01 0.00 0.00 95.36
10:27:57 PM 8 0.82 0.00 0.11 4.05 0.00 0.03 0.00 0.00 94.99
10:27:57 PM 9 0.70 0.00 0.18 0.06 0.00 0.00 0.00 0.00 99.06
10:27:57 PM 10 0.18 0.00 0.04 0.01 0.00 0.00 0.00 0.00 99.77
10:27:57 PM 11 0.20 0.00 0.01 0.01 0.00 0.00 0.00 0.00 99.78
10:27:57 PM 12 0.13 0.00 0.01 0.01 0.00 0.00 0.00 0.00 99.86
10:27:57 PM 13 0.04 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.95
10:27:57 PM 14 0.03 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.97
10:27:57 PM 15 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.97
10:27:57 PM 16 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.94
10:27:57 PM 17 0.41 0.00 0.10 0.04 0.00 0.00 0.00 0.00 99.45
10:27:57 PM 18 2.78 0.00 0.06 0.14 0.00 0.00 0.00 0.00 97.01
10:27:57 PM 19 1.19 0.00 0.08 0.19 0.00 0.00 0.00 0.00 98.53
10:27:57 PM 20 0.48 0.00 0.04 0.30 0.00 0.00 0.00 0.00 99.17
10:27:57 PM 21 0.70 0.00 0.03 0.16 0.00 0.00 0.00 0.00 99.11
10:27:57 PM 22 0.08 0.00 0.01 0.02 0.00 0.00 0.00 0.00 99.90
10:27:57 PM 23 0.30 0.00 0.02 0.06 0.00 0.00 0.00 0.00 99.62
10:27:57 PM 24 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
10:27:57 PM 25 0.04 0.00 0.03 0.00 0.00 0.00 0.00 0.00 99.94
10:27:57 PM 26 0.06 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.93
10:27:57 PM 27 0.01 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.98
10:27:57 PM 28 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.99
10:27:57 PM 29 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
10:27:57 PM 30 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
10:27:57 PM 31 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.99
Both are dedicated systems running Ubuntu 12.10 (not virtual).
I've thought and read up about setting nice, taskset, or trying to tweak the scheduler but I don't want to make any rash decisions. Also, this system isn't performing "bad" per-se, I just want to ensure all cores are being utilized properly.
Let me know if I can provide additional information. Any suggestions to even the CPU load on "server1" are greatly appreciated.
This is not a problem until some cores hit 100% and others don't (i.e. in the statistics you've shown us, there's nothing that would suggest that the uneven distribution is negatively affecting the performance). In your case, you probably have quite a few processes that distribute evenly, resulting in a base load of 6-10% on each core, and then ~12 more threads that require 10-20% of a core each. You can't split a single process/thread between cores.

is it possible to create a graph by shell script

i want to create a graph file using shell script. For example, i want to make graph of sar output of my system.
sar 1 10
05:36:32 AM CPU %user %nice %system %iowait %steal %idle
05:36:33 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:34 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:35 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:36 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:37 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:38 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:39 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:40 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:41 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:42 AM all 0.00 0.00 0.00 0.00 0.00 100.00
Average: all 0.00 0.00 0.00 0.00 0.00 100.00
As a visualizer you can use Gnuplot.

Resources