Parse iostat output - bash

I need to extract only certain lines from a log file generated with iostat. The command is iostat -x 1 -m > disk.log, and it saves a file like this:
Linux 2.6.32-358.18.1.el6.x86_64 (parekosam) 11/26/2013 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.04 0.01 0.14 0.28 0.00 99.53
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 72.44 6.67 4.15 0.34 0.33 0.03 162.23 0.02 3.92 1.77 0.80
dm-0 0.00 0.00 1.30 6.96 0.03 0.03 15.11 0.65 78.37 0.69 0.57
dm-1 0.00 0.00 0.07 0.00 0.00 0.00 7.99 0.00 2.57 0.67 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 1.01 0.00 98.99
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 5.00 0.00 3.00 0.00 0.03 18.67 0.03 10.67 10.67 3.20
dm-0 0.00 0.00 0.00 7.00 0.00 0.03 8.00 0.04 5.29 4.57 3.20
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
I'd like to show only the rMB/s and wMB/s columns so that I can calculate average speeds. I've tried a few things with sed and awk, but with little success. The ideal output should look like this:
12.27 10.23
11.27 10.22
15.26 20.23
12.24 10.25
12.26 50.23
12.23 10.26
13.23 23.23
12.22 10.23
12.23 10.23
22.23 14.27
13.21 10.23
12.23 10.23
14.22 10.23
12.23 10.21

Please note this is for 'sda' only.
iostat -x 1 -m | awk '/sda/ { print $6, $7}'
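If you also want the averages the question asks for, the extracted columns can be piped through a second awk pass. A minimal sketch against disk.log (note that the first sda line holds iostat's since-boot averages, which you may want to skip, as done here):
awk '/^sda/ {print $6, $7}' disk.log |
awk 'NR > 1 {r += $1; w += $2} END {if (NR > 1) printf "avg rMB/s: %.2f avg wMB/s: %.2f\n", r/(NR-1), w/(NR-1)}'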

Does this do what you want?
/^$/ {a=""}
a {print $6,$7}
/^Device/ {a=1}
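For reference: this sets a flag when a Device: header is seen, prints fields 6 and 7 while the flag is set, and clears the flag on the blank line that ends each block. To restrict it to sda, the middle rule could be a && /^sda/ {print $6, $7}. Saved as, say, cols.awk (a name chosen here for illustration), it runs as:
awk -f cols.awk disk.log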


Mariadb 10 Throughput bottleneck

I have a machine with an Intel Atom C2758 (8 cores) and a WD WD500LPLX (7200 RPM, 500 GB) 2.5" HDD. The OS is Ubuntu 15.10 (kernel 4.2) with MariaDB 10 (default config).
I am inserting about 20-30 rows per second (row size 355 bytes). It looks like the bottleneck is the HDD I/O, but wMB/s is low while utilization is high.
What else could I do to make it handle more transactions?
Thank you.
With:
iostat -m -x 1
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.50 12.20 0.00 86.79
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 22.00 0.00 61.00 0.00 0.39 12.98 1.00 15.74 0.00 15.74 16.13 98.40
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.00 12.31 0.00 87.19
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 24.00 0.00 68.00 0.00 0.45 13.53 0.99 14.71 0.00 14.71 14.24 96.80
avg-cpu: %user %nice %system %iowait %steal %idle
0.75 0.00 0.50 12.31 0.00 86.43
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 27.00 0.00 75.00 0.00 0.48 13.12 1.02 13.71 0.00 13.71 12.91 96.80
INSERT is too slow?
innodb_flush_log_at_trx_commit = 2 is faster, but slightly less secure.
"Batch" the inserts: either INSERT ... VALUES (...), (...) ... or use LOAD DATA.

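To illustrate the batching suggestion: a single multi-row INSERT replaces many per-row commits with one. A minimal sketch from the shell; the database, table, and column names are hypothetical:
mysql -e "INSERT INTO samples (ts, payload) VALUES (NOW(), 'row 1'), (NOW(), 'row 2'), (NOW(), 'row 3');" mydb
With innodb_flush_log_at_trx_commit = 1 (the default), each per-row commit costs a log flush to the 7200 RPM disk, which would line up with the ~60-75 w/s and ~97% util shown above.
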
Unix Shell: Summing up values, one per line but skipping every nth row

I am trying to design a Unix shell script (preferably generic sh) that will take a file whose contents are numbers, one per line. These numbers are the CPU idle time from mpstat obtained by:
cat ${PARSE_FILE} | awk '{print $13}' | grep "^[!0-9]" > temp.txt
So the file is a list of numbers, like:
46.19
93.41
73.60
99.40
95.80
96.00
77.10
99.20
52.76
81.18
69.38
89.80
97.00
97.40
76.18
97.10
What these values really are: line 1 is for Core 1, line 2 for Core 2, and so on for X cores (8 in my case), so every 9th line is for Core 1 again, and so forth.
The original file looks something like this:
10/28/2013 Linux 2.6.32-358.el6.x86_64 (host) 10/28/2013 _x86_64_
(32 CPU)
10/28/2013
10/28/2013 02:25:05 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10/28/2013 02:25:15 PM 0 51.20 0.00 2.61 0.00 0.00 0.00 0.00 0.00 46.19
10/28/2013 02:25:15 PM 1 6.09 0.00 0.50 0.00 0.00 0.00 0.00 0.00 93.41
10/28/2013 02:25:15 PM 2 25.20 0.00 1.20 0.00 0.00 0.00 0.00 0.00 73.60
10/28/2013 02:25:15 PM 3 0.40 0.00 0.20 0.00 0.00 0.00 0.00 0.00 99.40
10/28/2013 02:25:15 PM 4 3.80 0.00 0.40 0.00 0.00 0.00 0.00 0.00 95.80
10/28/2013 02:25:15 PM 5 3.70 0.00 0.30 0.00 0.00 0.00 0.00 0.00 96.00
10/28/2013 02:25:15 PM 6 21.70 0.00 1.20 0.00 0.00 0.00 0.00 0.00 77.10
10/28/2013 02:25:15 PM 7 0.70 0.00 0.10 0.00 0.00 0.00 0.00 0.00 99.20
10/28/2013 02:25:25 PM 0 45.03 0.00 1.61 0.00 0.00 0.60 0.00 0.00 52.76
10/28/2013 02:25:25 PM 1 17.82 0.00 1.00 0.00 0.00 0.00 0.00 0.00 81.18
10/28/2013 02:25:25 PM 2 29.62 0.00 1.00 0.00 0.00 0.00 0.00 0.00 69.38
10/28/2013 02:25:25 PM 3 9.70 0.00 0.40 0.00 0.00 0.10 0.00 0.00 89.80
10/28/2013 02:25:25 PM 4 2.40 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.00
10/28/2013 02:25:25 PM 5 2.00 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.40
10/28/2013 02:25:25 PM 6 22.92 0.00 0.90 0.00 0.00 0.00 0.00 0.00 76.18
10/28/2013 02:25:25 PM 7 2.40 0.00 0.50 0.00 0.00 0.00 0.00 0.00 97.10
I'm trying to design a script that takes the number of cores and this file as variables and gives me the average for each core, but I'm not sure how to do this. Here is what I have:
cat ${PARSE_FILE} | awk '{print $13}' | grep "^[!0-9]" > temp.txt
NUMBER_OF_CORES=8
NUMBER_OF_LINES=`awk ' END { print NR } ' temp.txt`
NUMBER_OF_VALUES=`echo "scale=0;${NUMBER_OF_LINES}/${NUMBER_OF_CORES}" | bc`
for i in `seq 1 ${NUMBER_OF_CORES}`
do
awk 'NR % $i == 0' temp.txt
echo Core: ${i} Average: xx
done
So I have the number of values each core has (lines divided by cores), which tells me the stride of lines I need to pick, but I'm not sure how to do this cleanly. I basically need to loop over the file once per core, taking every NUMBER_OF_CORES-th line, summing those values, and dividing by NUMBER_OF_VALUES.
Will this do?
awk '/CPU/&&/idle/{f=1;next}f{a[$4]+=$13;b[$4]++}END{for(i in a){print i,a[i]/b[i]}}' your_file
Actually the number of cores is not needed here; it will calculate the average idle time for all the cores present in the file.
Tested:
> cat temp
10/28/2013 Linux 2.6.32-358.el6.x86_64 (host) 10/28/2013 _x86_64_
(32 CPU)
10/28/2013
10/28/2013 02:25:05 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10/28/2013 02:25:15 PM 0 51.20 0.00 2.61 0.00 0.00 0.00 0.00 0.00 46.19
10/28/2013 02:25:15 PM 1 6.09 0.00 0.50 0.00 0.00 0.00 0.00 0.00 93.41
10/28/2013 02:25:15 PM 2 25.20 0.00 1.20 0.00 0.00 0.00 0.00 0.00 73.60
10/28/2013 02:25:15 PM 3 0.40 0.00 0.20 0.00 0.00 0.00 0.00 0.00 99.40
10/28/2013 02:25:15 PM 4 3.80 0.00 0.40 0.00 0.00 0.00 0.00 0.00 95.80
10/28/2013 02:25:15 PM 5 3.70 0.00 0.30 0.00 0.00 0.00 0.00 0.00 96.00
10/28/2013 02:25:15 PM 6 21.70 0.00 1.20 0.00 0.00 0.00 0.00 0.00 77.10
10/28/2013 02:25:15 PM 7 0.70 0.00 0.10 0.00 0.00 0.00 0.00 0.00 99.20
10/28/2013 02:25:25 PM 0 45.03 0.00 1.61 0.00 0.00 0.60 0.00 0.00 52.76
10/28/2013 02:25:25 PM 1 17.82 0.00 1.00 0.00 0.00 0.00 0.00 0.00 81.18
10/28/2013 02:25:25 PM 2 29.62 0.00 1.00 0.00 0.00 0.00 0.00 0.00 69.38
10/28/2013 02:25:25 PM 3 9.70 0.00 0.40 0.00 0.00 0.10 0.00 0.00 89.80
10/28/2013 02:25:25 PM 4 2.40 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.00
10/28/2013 02:25:25 PM 5 2.00 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.40
10/28/2013 02:25:25 PM 6 22.92 0.00 0.90 0.00 0.00 0.00 0.00 0.00 76.18
10/28/2013 02:25:25 PM 7 2.40 0.00 0.50 0.00 0.00 0.00 0.00 0.00 97.10
> nawk '/CPU/&&/idle/{f=1;next}f{a[$4]+=$13;b[$4]++}END{for(i in a){print i,a[i]/b[i]}}' temp
2 71.49
3 94.6
4 96.4
5 96.7
6 76.64
7 98.15
0 49.475
1 87.295
>
The script below, countCores.sh, is based on the data you gave in temp.txt. This may not be what you want, but it will give you some ideas. I wasn't sure what overall total average you wanted, so I chose to show the average of the values in column one for all 8 cores. I also used cat -n to represent the core number. Hope this helps. VonBell
#!/bin/bash
# Execute as: countCores.sh temp.txt 8
DataFile="$1"
NumCores="$2"
AllCoreTotals=0
# Total number of lines in the data file
NumLines="`cat -n $DataFile|cut -f1|tail -1|tr -d " "`"
# Values per core (lines / cores); also the column count for pr below
PrtCols="`echo $NumLines / $NumCores|bc`"
clear;echo;echo
echo "============================================================="
# Lay the file out in PrtCols columns so each output row holds one core's
# samples, turn the separators into '+', and let bc sum each row
pr -t${PrtCols} $DataFile|tr -d "\t"|tr -s " " "+"|bc |\
while read CoreTotal
do
CoreAverage=`echo $CoreTotal / $PrtCols|bc`
echo "$CoreTotal Core Average $CoreAverage"
AllCoreTotals="`echo $CoreTotal + $AllCoreTotals|bc`"
# The while loop runs in a subshell, so hand the running total back via a file
echo "$AllCoreTotals" > AllCoreTot.tmp
done|cat -n
AllCoreAverage=`cat AllCoreTot.tmp`
AllCoreAverage="`echo $AllCoreAverage / $NumCores|bc`"
echo "============================================================="
echo "(Col One) Total Core Average: $AllCoreAverage "
# Clean up the (generated) input file and the temp file
rm $DataFile
rm AllCoreTot.tmp
Why not do it for all cores at the same time:
awk -f prog.awk ${PARSE_FILE}
Then put this in prog.awk:
{ if ((NF == 13) && ($4 != "CPU"))
  { SUM[$4] += $13;
    CNT[$4]++;
  }
}
END { for (loop in SUM)
  { printf("CPU: %d Total: %d Count: %d Average: %d\n",
           loop, SUM[loop], CNT[loop], SUM[loop]/CNT[loop]);
  }
}
If you want to do it on one line:
awk '{if ((NF == 13) && ($4 != "CPU")){SUM[$4] += $13;CNT[$4]++;}} END {for(loop in SUM){printf("CPU: %d Total: %d Count: %d Average: %d\n", loop, SUM[loop], CNT[loop], SUM[loop]/CNT[loop]);}}' ${PARSE_FILE}
After some more study, this snippet seems to do the trick:
#Parse logs to get CPU averages for cores
PARSE_FILE=`ls ~/logs/*mpstat*`
echo "Parsing ${PARSE_FILE}..."
cat ${PARSE_FILE} | awk '{print $13}' | grep "^[!0-9]" > temp.txt
NUMBER_OF_CORES=8
NUMBER_OF_LINES=`awk ' END { print NR } ' temp.txt`
NUMBER_OF_VALUES=`echo "scale=0;${NUMBER_OF_LINES}/${NUMBER_OF_CORES}" | bc`
TOTAL=0
for i in `seq 1 ${NUMBER_OF_CORES}`
do
# GNU sed's 'first~step' address: print line i, then every NUMBER_OF_CORES-th line after it
sed -n $i'~'$NUMBER_OF_CORES'p' temp.txt > temp2.txt
SUM=`awk '{s+=$0} END {print s}' temp2.txt`
AVERAGE=`echo "scale=0;${SUM}/${NUMBER_OF_VALUES}" | bc`
echo Core: ${i} Average: `expr 100 - ${AVERAGE}`
TOTAL=$((TOTAL+${AVERAGE}))
done
TOTAL_AVERAGE=`echo "scale=0;${TOTAL}/${NUMBER_OF_CORES}" | bc`
echo "Total Average: `expr 100 - ${TOTAL_AVERAGE}`"
rm temp*.txt
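For comparison, the same per-core figures can be computed in a single awk pass over the original mpstat file, with no temp files. A minimal sketch, assuming the 13-field layout shown above (core number in $4, %idle in $13); it prints 1-based core numbers to match the loop above:
awk '$4 ~ /^[0-9]+$/ && NF == 13 { sum[$4] += $13; cnt[$4]++ }
END { for (c in sum) printf "Core: %d Average: %d\n", c + 1, 100 - sum[c]/cnt[c] }' ${PARSE_FILE}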

Hadoop cluster stops after mapping

I have a Hadoop cluster consisting of three machines. I put a 20 GB file on Hadoop, start the job, and it stops after mapping.
"13/08/22 08:09:34 INFO mapred.JobClient: map 100% reduce 11%"
After the mapping finishes, no CPU does any work. I can wait a whole day, but it never starts again.
What can I do?
These are the last 10 lines of my log file, when map is 100% and reduce is 11%:
2013-08-22 14:15:32,503 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-08-22 14:15:32,542 INFO org.apache.hadoop.mapred.MapTask: Finished spill 67
2013-08-22 14:15:32,552 INFO org.apache.hadoop.mapred.Merger: Merging 68 sorted segments
2013-08-22 14:15:32,558 INFO org.apache.hadoop.mapred.Merger: Merging 5 intermediate segments out of a total of 68
2013-08-22 14:15:32,622 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 64 segments left of total size: 1600710 bytes
2013-08-22 14:15:32,708 INFO org.apache.hadoop.mapred.Task: Task:attempt_201308221308_0002_m_000302_0 is done. And is in the process of commiting
2013-08-22 14:15:32,717 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201308221308_0002_m_000302_0' done.
2013-08-22 14:15:32,759 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-08-22 14:15:32,774 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-08-22 14:15:32,774 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName llobocki for UID 1000 from the native implementation
Thread dump of the Child process on my Hadoop master, when map is 100% and reduce is 11%:
2013-08-23 11:37:26
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode):
"Attach Listener" daemon prio=10 tid=0x0000000000f85800 nid=0x3873 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Thread for polling Map Completion Events" daemon prio=10 tid=0x00007fc32860c800 nid=0x1d7a waiting on condition [0x00007fc31c183000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2882)
"Thread for merging in memory files" daemon prio=10 tid=0x00007fc32860a800 nid=0x1d78 in Object.wait() [0x00007fc31c284000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd6dd7c8> (a java.lang.Object)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$ShuffleRamManager.waitForDataToMerge(ReduceTask.java:1197)
- locked <0x00000005bd6dd7c8> (a java.lang.Object)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2760)
"Thread for merging on-disk files" daemon prio=10 tid=0x00007fc328608000 nid=0x1d77 in Object.wait() [0x00007fc31c385000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd713988> (a java.util.TreeSet)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2654)
- locked <0x00000005bd713988> (a java.util.TreeSet)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.4" prio=10 tid=0x00007fc328606800 nid=0x1d76 in Object.wait() [0x00007fc31c486000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.3" prio=10 tid=0x00007fc328602000 nid=0x1d75 in Object.wait() [0x00007fc31c587000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.2" prio=10 tid=0x00007fc328600000 nid=0x1d73 in Object.wait() [0x00007fc31c688000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.1" prio=10 tid=0x00007fc3285ff000 nid=0x1d72 in Object.wait() [0x00007fc31c789000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"MapOutputCopier attempt_201308230927_0001_r_000000_0.0" prio=10 tid=0x00007fc3285f8800 nid=0x1d70 in Object.wait() [0x00007fc31c88a000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762eb0> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1324)
- locked <0x00000005bd762eb0> (a java.util.ArrayList)
"communication thread" daemon prio=10 tid=0x00007fc3285d2000 nid=0x1d53 in Object.wait() [0x00007fc31c9b3000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd762e90> (a java.lang.Object)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:658)
- locked <0x00000005bd762e90> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:724)
"Timer for 'ReduceTask' metrics system" daemon prio=10 tid=0x00007fc3285b1000 nid=0x1d49 in Object.wait() [0x00007fc31cbb5000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd919a30> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Timer.java:552)
- locked <0x00000005bd919a30> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)
"Thread for syncLogs" daemon prio=10 tid=0x00007fc328494000 nid=0x1d3e waiting on condition [0x00007fc31cebd000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.mapred.Child$3.run(Child.java:139)
"IPC Client (47) connection to /127.0.0.1:35127 from job_201308230927_0001" daemon prio=10 tid=0x00007fc328492800 nid=0x1d3d in Object.wait() [0x00007fc31cfbe000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd721b60> (a org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:747)
- locked <0x00000005bd721b60> (a org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:789)
"Service Thread" daemon prio=10 tid=0x00007fc3280f4000 nid=0x1cf7 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x00007fc3280f1800 nid=0x1cf5 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x00007fc3280ee800 nid=0x1cf4 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00007fc3280ec800 nid=0x1cf3 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00007fc32809e000 nid=0x1ce5 in Object.wait() [0x00007fc2c1b7f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd6fb1f8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000005bd6fb1f8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
"Reference Handler" daemon prio=10 tid=0x00007fc32809c000 nid=0x1ce4 in Object.wait() [0x00007fc2c1c80000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005bd6fade8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000005bd6fade8> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x00007fc32800b000 nid=0x1cc8 waiting on condition [0x00007fc32dc3a000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:2191)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:386)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
"VM Thread" prio=10 tid=0x00007fc328094800 nid=0x1cdf runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fc328018800 nid=0x1ccc runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fc32801a800 nid=0x1cce runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fc32801c800 nid=0x1cd7 runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fc32801e000 nid=0x1cd8 runnable
"VM Periodic Task Thread" prio=10 tid=0x00007fc3280fe800 nid=0x1cf8 waiting on condition
JNI global references: 224
During mapping the network traffic on the master is ~20 MiB, but when the reduce starts, it goes down to 3 KiB.
Here is the iostat output from my machines.
Master during map:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 7.00 0.00 0.02 6.29 0.48 68.43 68.29 47.80
sda 0.00 0.00 43.00 7.00 5.38 0.02 221.04 0.22 4.42 2.78 13.90
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 43.00 3.00 5.38 0.01 239.83 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 14.00 0.00 53.00 0.00 1.34 51.66 1.58 29.77 5.38 28.50
sda 3.00 14.00 34.00 53.00 4.62 1.34 140.34 1.27 14.55 3.84 33.40
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 37.00 62.00 4.62 1.32 122.99 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Slave during map:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 2.00 0.00 12.00 4.00 1.75 0.01 225.25 0.76 47.50 25.19 40.30
sdb 0.00 0.00 0.00 6.00 0.00 0.02 6.00 0.09 20.00 14.67 8.80
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 14.00 2.00 1.75 0.01 225.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 28.00 4.00 3.50 0.01 224.81 0.39 12.28 7.16 22.90
sdb 0.00 0.00 5.00 3.00 0.42 0.01 110.25 0.25 31.50 22.12 17.70
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 33.00 0.00 3.92 0.00 243.39 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Master stopped:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 1.00 1.00 0.00 228.44 0.03 3.44 3.00 2.70
sda 0.00 0.00 0.00 1.00 0.00 0.00 8.00 0.01 13.00 13.00 1.30
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 0.62 0.50 0.40
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.02 2.38 2.38 1.90
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 0.75 0.50 0.40
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Slave stopped:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 1.38 1.12 0.90
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 7.00 0.00 0.88 0.00 256.00 0.01 0.71 0.57 0.40
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 7.00 0.00 0.88 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.01 0.75 0.62 0.50
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 8.00 0.00 1.00 0.00 256.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
I've solved my problem. I had an incorrect entry in /etc/hosts.
Earlier the line was:
ip alias
Now it is:
ip domain alias
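For reference, a corrected line of that shape might look like this (the address and names here are made up):
192.168.1.10 master.example.com master
i.e. the IP address, then the fully qualified domain name, then the short alias. A hosts entry that resolves a node's name to the wrong address can leave reducers stuck fetching map output, which would match the hang described above.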

Uneven CPU load distribution

I have a system with uneven CPU load in an odd pattern. It's serving Apache, Elasticsearch, Redis, and email.
Here's the mpstat output. Notice how %usr for the last 12 cores is well below the top 12.
# mpstat -P ALL
Linux 3.5.0-17-generic (<server1>) 02/16/2013 _x86_64_ (24 CPU)
10:21:46 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10:21:46 PM all 17.15 0.00 2.20 0.33 0.00 0.09 0.00 0.00 80.23
10:21:46 PM 0 27.34 0.00 4.08 0.56 0.00 0.53 0.00 0.00 67.48
10:21:46 PM 1 24.51 0.00 3.25 0.53 0.00 0.34 0.00 0.00 71.38
10:21:46 PM 2 26.69 0.00 4.20 0.50 0.00 0.24 0.00 0.00 68.36
10:21:46 PM 3 24.38 0.00 3.04 0.70 0.00 0.23 0.00 0.00 71.65
10:21:46 PM 4 24.50 0.00 4.04 0.57 0.00 0.15 0.00 0.00 70.74
10:21:46 PM 5 21.75 0.00 2.80 0.74 0.00 0.15 0.00 0.00 74.55
10:21:46 PM 6 28.30 0.00 3.75 0.84 0.00 0.04 0.00 0.00 67.07
10:21:46 PM 7 30.20 0.00 3.94 0.16 0.00 0.03 0.00 0.00 65.67
10:21:46 PM 8 30.55 0.00 4.09 0.12 0.00 0.03 0.00 0.00 65.21
10:21:46 PM 9 32.66 0.00 3.40 0.09 0.00 0.03 0.00 0.00 63.81
10:21:46 PM 10 32.20 0.00 3.57 0.08 0.00 0.03 0.00 0.00 64.12
10:21:46 PM 11 32.08 0.00 3.92 0.08 0.00 0.03 0.00 0.00 63.88
10:21:46 PM 12 4.53 0.00 0.41 0.34 0.00 0.04 0.00 0.00 94.68
10:21:46 PM 13 9.14 0.00 1.42 0.32 0.00 0.04 0.00 0.00 89.08
10:21:46 PM 14 5.92 0.00 0.70 0.35 0.00 0.06 0.00 0.00 92.97
10:21:46 PM 15 6.14 0.00 0.66 0.35 0.00 0.04 0.00 0.00 92.81
10:21:46 PM 16 7.39 0.00 0.65 0.34 0.00 0.04 0.00 0.00 91.57
10:21:46 PM 17 6.60 0.00 0.83 0.39 0.00 0.05 0.00 0.00 92.13
10:21:46 PM 18 5.49 0.00 0.54 0.30 0.00 0.01 0.00 0.00 93.65
10:21:46 PM 19 6.78 0.00 0.88 0.21 0.00 0.01 0.00 0.00 92.12
10:21:46 PM 20 6.17 0.00 0.58 0.11 0.00 0.01 0.00 0.00 93.13
10:21:46 PM 21 5.78 0.00 0.82 0.10 0.00 0.01 0.00 0.00 93.29
10:21:46 PM 22 6.29 0.00 0.60 0.10 0.00 0.01 0.00 0.00 93.00
10:21:46 PM 23 6.18 0.00 0.61 0.10 0.00 0.01 0.00 0.00 93.10
I have another system, a database server running MySQL, which shows an even distribution.
# mpstat -P ALL
Linux 3.5.0-17-generic (<server2>) 02/16/2013 _x86_64_ (32 CPU)
10:27:57 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10:27:57 PM all 0.77 0.00 0.07 0.68 0.00 0.00 0.00 0.00 98.47
10:27:57 PM 0 2.31 0.00 0.19 1.86 0.00 0.01 0.00 0.00 95.63
10:27:57 PM 1 1.73 0.00 0.17 1.87 0.00 0.01 0.00 0.00 96.21
10:27:57 PM 2 2.62 0.00 0.25 2.51 0.00 0.01 0.00 0.00 94.62
10:27:57 PM 3 1.60 0.00 0.17 1.99 0.00 0.01 0.00 0.00 96.23
10:27:57 PM 4 1.86 0.00 0.16 1.84 0.00 0.01 0.00 0.00 96.13
10:27:57 PM 5 2.30 0.00 0.25 2.45 0.00 0.01 0.00 0.00 94.99
10:27:57 PM 6 2.05 0.00 0.20 1.89 0.00 0.01 0.00 0.00 95.86
10:27:57 PM 7 2.13 0.00 0.20 2.31 0.00 0.01 0.00 0.00 95.36
10:27:57 PM 8 0.82 0.00 0.11 4.05 0.00 0.03 0.00 0.00 94.99
10:27:57 PM 9 0.70 0.00 0.18 0.06 0.00 0.00 0.00 0.00 99.06
10:27:57 PM 10 0.18 0.00 0.04 0.01 0.00 0.00 0.00 0.00 99.77
10:27:57 PM 11 0.20 0.00 0.01 0.01 0.00 0.00 0.00 0.00 99.78
10:27:57 PM 12 0.13 0.00 0.01 0.01 0.00 0.00 0.00 0.00 99.86
10:27:57 PM 13 0.04 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.95
10:27:57 PM 14 0.03 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.97
10:27:57 PM 15 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.97
10:27:57 PM 16 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.94
10:27:57 PM 17 0.41 0.00 0.10 0.04 0.00 0.00 0.00 0.00 99.45
10:27:57 PM 18 2.78 0.00 0.06 0.14 0.00 0.00 0.00 0.00 97.01
10:27:57 PM 19 1.19 0.00 0.08 0.19 0.00 0.00 0.00 0.00 98.53
10:27:57 PM 20 0.48 0.00 0.04 0.30 0.00 0.00 0.00 0.00 99.17
10:27:57 PM 21 0.70 0.00 0.03 0.16 0.00 0.00 0.00 0.00 99.11
10:27:57 PM 22 0.08 0.00 0.01 0.02 0.00 0.00 0.00 0.00 99.90
10:27:57 PM 23 0.30 0.00 0.02 0.06 0.00 0.00 0.00 0.00 99.62
10:27:57 PM 24 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
10:27:57 PM 25 0.04 0.00 0.03 0.00 0.00 0.00 0.00 0.00 99.94
10:27:57 PM 26 0.06 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.93
10:27:57 PM 27 0.01 0.00 0.01 0.00 0.00 0.00 0.00 0.00 99.98
10:27:57 PM 28 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.99
10:27:57 PM 29 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
10:27:57 PM 30 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
10:27:57 PM 31 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.99
Both are dedicated systems running Ubuntu 12.10 (not virtual).
I've thought about and read up on setting nice, using taskset, and tweaking the scheduler, but I don't want to make any rash decisions. Also, this system isn't performing badly per se; I just want to ensure all cores are being utilized properly.
Let me know if I can provide additional information. Any suggestions to even the CPU load on "server1" are greatly appreciated.
This is not a problem until some cores hit 100% and others don't; in the statistics you've shown us, there's nothing to suggest that the uneven distribution is negatively affecting performance. In your case, you probably have quite a few processes that distribute evenly, resulting in a base load of 6-10% on each core, plus roughly 12 more threads that each require 10-20% of a core. A single process or thread can't be split between cores.
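If you do want to experiment with pinning, taskset is the least invasive place to start. A minimal sketch; the PID and the ./worker command are hypothetical:
taskset -pc 0-11 1234      # restrict the existing process 1234 to cores 0-11
taskset -c 12-23 ./worker  # launch ./worker pinned to cores 12-23
As noted above, though, manual pinning is rarely worth it until individual cores actually saturate.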

Is it possible to create a graph by shell script

I want to create a graph file using a shell script. For example, I want to graph the sar output of my system.
sar 1 10
05:36:32 AM CPU %user %nice %system %iowait %steal %idle
05:36:33 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:34 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:35 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:36 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:37 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:38 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:39 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:40 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:41 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:42 AM all 0.00 0.00 0.00 0.00 0.00 100.00
Average: all 0.00 0.00 0.00 0.00 0.00 100.00
As a visualizer you can use Gnuplot.
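For instance, you can pull the %idle column out with awk and hand it to gnuplot from the same script. A minimal sketch, assuming gnuplot is installed (the file names are arbitrary):
# keep sample number and %idle, skipping header and Average lines
sar 1 10 | awk '$NF ~ /^[0-9.]+$/ && !/Average/ {print ++n, $NF}' > idle.dat
# render a PNG line chart
gnuplot -e "set terminal png; set output 'idle.png'; set xlabel 'sample'; set ylabel '%idle'; plot 'idle.dat' using 1:2 with lines title 'CPU idle'"
Newer sysstat versions also ship sadf, whose -g option emits SVG graphs directly from recorded sar data.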
