I need help diagnosing why a particular job in the JobTracker is long-running, and what workarounds might improve it.
Here is an excerpt of the job in question (please pardon the formatting):
Hadoop job_201901281553_38848
User: mapred
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Running
Started at: Fri Feb 01 12:39:05 CST 2019
Running for: 3hrs, 23mins, 58sec
Job Cleanup: Pending
Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts
map | 100.00% | 1177 | 0 | 0 | 1177 | 0 | 0 / 0
reduce | 95.20% | 12 | 0 | 2 | 10 | 0 | 0 / 0
Counter Map Reduce Total
File System Counters FILE: Number of bytes read 1,144,088,621 1,642,723,691 2,786,812,312
FILE: Number of bytes written 3,156,884,366 1,669,567,665 4,826,452,031
FILE: Number of read operations 0 0 0
FILE: Number of large read operations 0 0 0
FILE: Number of write operations 0 0 0
HDFS: Number of bytes read 11,418,749,621 0 11,418,749,621
HDFS: Number of bytes written 0 8,259,932,078 8,259,932,078
HDFS: Number of read operations 2,365 5 2,370
HDFS: Number of large read operations 0 0 0
HDFS: Number of write operations 0 12 12
Job Counters Launched map tasks 0 0 1,177
Launched reduce tasks 0 0 12
Data-local map tasks 0 0 1,020
Rack-local map tasks 0 0 157
Total time spent by all maps in occupied slots (ms) 0 0 4,379,522
Total time spent by all reduces in occupied slots (ms) 0 0 81,115,664
Map-Reduce Framework Map input records 77,266,616 0 77,266,616
Map output records 77,266,616 0 77,266,616
Map output bytes 11,442,228,060 0 11,442,228,060
Input split bytes 177,727 0 177,727
Combine input records 0 0 0
Combine output records 0 0 0
Reduce input groups 0 37,799,412 37,799,412
Reduce shuffle bytes 0 1,853,727,946 1,853,727,946
Reduce input records 0 76,428,913 76,428,913
Reduce output records 0 48,958,874 48,958,874
Spilled Records 112,586,947 62,608,254 175,195,201
CPU time spent (ms) 2,461,980 14,831,230 17,293,210
Physical memory (bytes) snapshot 366,933,626,880 9,982,947,328 376,916,574,208
Virtual memory (bytes) snapshot 2,219,448,848,384 23,215,755,264 2,242,664,603,648
Total committed heap usage (bytes) 1,211,341,733,888 8,609,333,248 1,219,951,067,136
AcsReducer ColumnDeletesOnTable- 0 3,284,862 3,284,862
ColumnDeletesOnTable- 0 3,285,695 3,285,695
ColumnDeletesOnTable- 0 3,284,862 3,284,862
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 517,641 517,641
ColumnDeletesOnTable- 0 23,786 23,786
ColumnDeletesOnTable- 0 594,872 594,872
ColumnDeletesOnTable- 0 597,739 597,739
ColumnDeletesOnTable- 0 595,665 595,665
ColumnDeletesOnTable- 0 36,101,345 36,101,345
ColumnDeletesOnTable- 0 11,791 11,791
ColumnDeletesOnTable- 0 11,898 11,898
ColumnDeletesOnTable-0 176 176
RowDeletesOnTable- 0 224,044 224,044
RowDeletesOnTable- 0 224,045 224,045
RowDeletesOnTable- 0 224,044 224,044
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 459,890 459,890
RowDeletesOnTable- 0 23,786 23,786
RowDeletesOnTable- 0 105,910 105,910
RowDeletesOnTable- 0 107,829 107,829
RowDeletesOnTable- 0 105,909 105,909
RowDeletesOnTable- 0 36,101,345 36,101,345
RowDeletesOnTable- 0 11,353 11,353
RowDeletesOnTable- 0 11,459 11,459
RowDeletesOnTable- 0 168 168
WholeRowDeletesOnTable- 0 129,930 129,930
deleteRowsCount 0 37,799,410 37,799,410
deleteRowsMicros 0 104,579,855,042 104,579,855,042
emitCount 0 48,958,874 48,958,874
emitMicros 0 201,996,180 201,996,180
rollupValuesCount 0 37,799,412 37,799,412
rollupValuesMicros 0 234,085,342 234,085,342
As you can see, it's been running for almost 3.5 hours now. There were 1177 map tasks and they completed some time ago. The reduce phase is incomplete at 95%.
So I drill into the 'reduce' link and it takes me to the task list. If I drill into the first incomplete task, here it is:
Job job_201901281553_38848
All Task Attempts
Task Attempts Machine Status Progress Start Time Shuffle Finished Sort Finished Finish Time Errors Task Logs Counters Actions
attempt_201901281553_38848_r_000000_0   RUNNING   70.81%   Start Time: 2/1/2019 12:39   Shuffle Finished: 1-Feb-2019 12:39:59 (18sec)   Sort Finished: 1-Feb-2019 12:40:01 (2sec)   Counters: 60
From there I can see which machine/datanode is running the task, so I SSH into it and look at the TaskTracker log, filtering on just the task in question.
On the datanode the log is /var/log/hadoop-0.20-mapreduce/hadoop-mapred-tasktracker-.log.
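I filtered it with something along these lines (the exact log file name is truncated above, so the glob is my stand-in):
grep attempt_201901281553_38848_r_000000_0 /var/log/hadoop-0.20-mapreduce/hadoop-mapred-tasktracker-*.log
Here is what that shows: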
2019-02-01 12:39:40,836 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201901281553_38848_r_000000_0 task's state:UNASSIGNED
2019-02-01 12:39:40,838 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201901281553_38848_r_000000_0 which needs 1 slots
2019-02-01 12:39:40,838 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 21 and trying to launch attempt_201901281553_38848_r_000000_0 which needs 1 slots
2019-02-01 12:39:40,925 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /disk12/mapreduce/tmp-map-data/ttprivate/taskTracker/mapred/jobcache/job_201901281553_38848/attempt_201901281553_38848_r_000000_0/taskjvm.sh
2019-02-01 12:39:41,904 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201901281553_38848_r_-819481850 given task: attempt_201901281553_38848_r_000000_0
2019-02-01 12:39:49,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.09402435% reduce > copy (332 of 1177 at 23.66 MB/s) >
2019-02-01 12:39:56,250 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.25233644% reduce > copy (891 of 1177 at 12.31 MB/s) >
2019-02-01 12:39:59,206 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.25233644% reduce > copy (891 of 1177 at 12.31 MB/s) >
2019-02-01 12:39:59,350 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.33333334% reduce > sort
2019-02-01 12:40:01,599 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.33333334% reduce > sort
2019-02-01 12:40:02,469 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6667039% reduce > reduce
2019-02-01 12:40:05,565 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6667039% reduce > reduce
2019-02-01 12:40:11,666 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6668788% reduce > reduce
2019-02-01 12:40:14,755 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.66691136% reduce > reduce
2019-02-01 12:40:17,838 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6670001% reduce > reduce
2019-02-01 12:40:20,930 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6671631% reduce > reduce
2019-02-01 12:40:24,016 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6672566% reduce > reduce
...and these lines repeat in this manner for hours.
So it appears the shuffle/sort phase went very quickly, but after that it's just the reduce phase crawling: the percentage slowly increases, but it takes hours before the task completes.
1) That looks like the bottleneck. Am I correct that the cause of my long-running job is that this task (and many tasks like it) is taking a very long time in the reduce phase?
2) If so, what are my options for speeding it up?
Load appears to be reasonably low on the datanode assigned to that task, as is its iowait:
top - 15:20:03 up 124 days, 1:04, 1 user, load average: 3.85, 5.64, 5.96
Tasks: 1095 total, 2 running, 1092 sleeping, 0 stopped, 1 zombie
Cpu(s): 3.8%us, 1.5%sy, 0.9%ni, 93.6%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 503.498G total, 495.180G used, 8517.543M free, 5397.789M buffers
Swap: 2046.996M total, 0.000k used, 2046.996M free, 432.468G cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
82236 hbase 20 0 16.9g 16g 17m S 136.9 3.3 26049:16 java
30143 root 39 19 743m 621m 13m R 82.3 0.1 1782:06 clamscan
62024 mapred 20 0 2240m 1.0g 24m S 75.1 0.2 1:21.28 java
36367 mapred 20 0 1913m 848m 24m S 11.2 0.2 22:56.98 java
36567 mapred 20 0 1898m 825m 24m S 9.5 0.2 22:23.32 java
36333 mapred 20 0 1879m 880m 24m S 8.2 0.2 22:44.28 java
36374 mapred 20 0 1890m 831m 24m S 6.9 0.2 23:15.65 java
and a snippet of iostat -xm 4:
avg-cpu: %user %nice %system %iowait %steal %idle
2.15 0.92 0.30 0.17 0.00 96.46
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 350.25 0.00 30.00 0.00 1.49 101.67 0.02 0.71 0.00 0.71 0.04 0.12
sdb 0.00 2.75 0.00 6.00 0.00 0.03 11.67 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 9.75 0.00 1.25 0.00 0.04 70.40 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 6.50 0.00 0.75 0.00 0.03 77.33 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 5.75 0.00 0.50 0.00 0.02 100.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 8.00 0.00 0.75 0.00 0.03 93.33 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 6.25 0.00 0.50 0.00 0.03 108.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 3.75 93.25 0.50 9.03 0.02 197.57 0.32 3.18 3.20 0.00 1.95 18.30
sdj 0.00 3.50 0.00 0.50 0.00 0.02 64.00 0.00 0.00 0.00 0.00 0.00 0.00
sdk 0.00 7.00 0.00 0.75 0.00 0.03 82.67 0.00 0.33 0.00 0.33 0.33 0.03
sdl 0.00 6.75 0.00 0.75 0.00 0.03 80.00 0.00 0.00 0.00 0.00 0.00 0.00
sdm 0.00 7.75 0.00 5.75 0.00 0.05 18.78 0.00 0.04 0.00 0.04 0.04 0.03
#<machine>:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 40G 5.9G 32G 16% /
tmpfs 252G 0 252G 0% /dev/shm
/dev/sda1 488M 113M 350M 25% /boot
/dev/sda8 57G 460M 54G 1% /tmp
/dev/sda7 9.8G 1.1G 8.2G 12% /var
/dev/sda5 40G 17G 21G 45% /var/log
/dev/sda6 30G 4.4G 24G 16% /var/log/audit.d
/dev/sdb1 7.2T 3.3T 3.6T 48% /disk1
/dev/sdc1 7.2T 3.3T 3.6T 49% /disk2
/dev/sdd1 7.2T 3.3T 3.6T 48% /disk3
/dev/sde1 7.2T 3.3T 3.6T 48% /disk4
/dev/sdf1 7.2T 3.3T 3.6T 48% /disk5
/dev/sdi1 7.2T 3.3T 3.6T 48% /disk6
/dev/sdg1 7.2T 3.3T 3.6T 48% /disk7
/dev/sdh1 7.2T 3.3T 3.6T 48% /disk8
/dev/sdj1 7.2T 3.3T 3.6T 48% /disk9
/dev/sdk1 7.2T 3.3T 3.6T 48% /disk10
/dev/sdm1 7.2T 3.3T 3.6T 48% /disk11
/dev/sdl1 7.2T 3.3T 3.6T 48% /disk12
This is Hadoop 2.0.0-cdh4.3.0. It's highly available, with 3 ZooKeeper nodes, 2 NameNodes, and 35 DataNodes. YARN is not installed. We use HBase and Oozie, and jobs mainly come in via Hive and Hue.
Each datanode has 2 physical CPUs, each with 22 cores; hyperthreading is enabled.
If you need more information, please let me know. My guess is that I may need more reducers, that there are mapred-site.xml settings that need tuning, that the input data from the map phase is too large, or that the Hive query needs to be written better. I'm a fairly new Hadoop administrator, so any detailed advice is appreciated.
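To make the first guess above concrete, this is the kind of change I have in mind. It is only a sketch: mapred.reduce.tasks and hive.exec.reducers.bytes.per.reducer are standard MRv1/Hive settings I have read about, 48 is an arbitrary example value, and I have not verified that these are the right knobs for our CDH 4.3 setup.
# Sketch only: re-submit the same Hive query asking for more reducers
hive -e "
SET mapred.reduce.tasks=48;
-- or instead: SET hive.exec.reducers.bytes.per.reducer=268435456; -- i.e. let Hive derive more reducers at 256 MB each
-- ... the original query would go here ...
"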
Thanks!
I am quite new to using bash for data extraction, and I am not sure what search terms to use for my problem. I would like to extract data for some variables from a very large log file.
Sample of logfile
temp[min,max]=[ 24.0000000000000 .. 834.230000000000 ]
CHANGE working on TEMS
RMS(TEMS)= 6.425061887244621E-002 DIFMAX: 0.896672707535103
765 1 171
CHANGE working on PHI
RMS(PHI )= 1.92403467949391 DIFMAX: 62.3113693145351
765 1 170
CHANGE working on TEMP
RMS(TEMP)= 6.425061887244621E-002 DIFMAX: 0.896672707535103
765 1 171
PMONI working
TIMSTP working
COPEQE working : INFO
DELT = 630720000.000000 sec
Courant-Number in x,y,z:
Max. : 5.05 , 0.00 , 6.93
Min. : 0.00 , 0.00 , 0.00
Avg. : 0.568E-02, 0.00 , 0.383
PROBLEM: Courant-Number(s) greater than 1 : 11.9802093558263
max. TEMP-Peclet in X: 653 1
170
max. TEMP-Peclet in Y: 653 1
170
Temperature-Peclet-Number in x,y,z:
Max. : 0.357 , 0.00 , 0.313E-01
Min. : 0.00 , 0.00 , 0.00
Avg. : 0.307E-03, 0.00 , 0.435E-03
Temperature-Neumann-Number in x,y,z:
Max.: 64.9 , 64.9 , 64.9
Min.: 0.619E-02, 0.619E-02, 0.619E-02
Avg.: 35.5 , 35.5 , 35.5
PROBLEM: Temp-Neumann-Number greater than 0.5 : 194.710793368065
(Dominating: Courant-Number)
DRUCK working
KOPPX working
#########################################################################
STRESS PERIOD: 1 1
1 of 100 <<<<<
Time Step: 50 ( 1.0% of 0.315E+13 sec )(0.631E+09 sec )
#########################################################################
### Continues on ###
I managed to extract the lines relating to the variables I am looking for using bash:
grep -A 3 'Courant-Number in x,y,z' logfile.log > courant.txt
grep -A 2 'Max.' courant.txt
to get this...
Max. : 0.146E+04, 0.00 , 0.169E+04
Min. : 0.00 , 0.00 , 0.00
Avg. : 1.15 , 0.00 , 0.986
--
Max. : 0.184E+04, 0.00 , 0.175E+04
Min. : 0.00 , 0.00 , 0.00
Avg. : 1.13 , 0.00 , 1.05
--
Max. : 0.163E+04, 0.00 , 0.172E+04
Min. : 0.00 , 0.00 , 0.00
Avg. : 1.13 , 0.00 , 1.17
I would like to convert this data to a CSV file with the following columns, thus making a total of 9 columns.
Max_x | Max_y | Max_z | Min_x | Min_y | Min_z | Avg_x | Avg_y | Avg_z
I would like to continue to use bash to get this data. Any inputs will be most appreciated.
Thanks!
You've got a good start. I had a much worse solution a bit earlier, but then I learned about paste -d.
grep -A 3 'Courant-Number in x,y,z' logfile.log |
grep -A 2 'Max.' |
grep -v -- '--' |
sed 's/^.*://' |
paste -d "," - - - |
sed 's/ *//g'
1. find the Courant-Number line plus the 3 lines after it
2. find each Max. line plus the 2 following lines
3. get rid of the '--' group separators that grep -A inserts
4. strip the 'Max. :' / 'Min. :' / 'Avg. :' labels (everything up to the colon)
5. join every three lines with commas
6. get rid of the remaining whitespace
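And if you want the nine column names from your question as a CSV header row, you can wrap the same pipeline like this (a sketch; courant.csv is just a file name I picked):
echo 'Max_x,Max_y,Max_z,Min_x,Min_y,Min_z,Avg_x,Avg_y,Avg_z' > courant.csv
grep -A 3 'Courant-Number in x,y,z' logfile.log |
grep -A 2 'Max.' |
grep -v -- '--' |
sed 's/^.*://' |
paste -d "," - - - |
sed 's/ *//g' >> courant.csv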
I am trying to design a Unix shell script (preferably generic sh) that will take a file whose contents are numbers, one per line. These numbers are the CPU idle time from mpstat obtained by:
cat ${PARSE_FILE} | awk '{print $13}' | grep "^[!0-9]" > temp.txt
So the file is a list of numbers, like:
46.19
93.41
73.60
99.40
95.80
96.00
77.10
99.20
52.76
81.18
69.38
89.80
97.00
97.40
76.18
97.10
What these values really are: line 1 is for Core 1, line 2 for Core 2, and so on for X cores (in my case 8), so line 9 is for Core 1 again, line 10 for Core 2, etc.
The original file looks something like this:
10/28/2013 Linux 2.6.32-358.el6.x86_64 (host) 10/28/2013 _x86_64_
(32 CPU)
10/28/2013
10/28/2013 02:25:05 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10/28/2013 02:25:15 PM 0 51.20 0.00 2.61 0.00 0.00 0.00 0.00 0.00 46.19
10/28/2013 02:25:15 PM 1 6.09 0.00 0.50 0.00 0.00 0.00 0.00 0.00 93.41
10/28/2013 02:25:15 PM 2 25.20 0.00 1.20 0.00 0.00 0.00 0.00 0.00 73.60
10/28/2013 02:25:15 PM 3 0.40 0.00 0.20 0.00 0.00 0.00 0.00 0.00 99.40
10/28/2013 02:25:15 PM 4 3.80 0.00 0.40 0.00 0.00 0.00 0.00 0.00 95.80
10/28/2013 02:25:15 PM 5 3.70 0.00 0.30 0.00 0.00 0.00 0.00 0.00 96.00
10/28/2013 02:25:15 PM 6 21.70 0.00 1.20 0.00 0.00 0.00 0.00 0.00 77.10
10/28/2013 02:25:15 PM 7 0.70 0.00 0.10 0.00 0.00 0.00 0.00 0.00 99.20
10/28/2013 02:25:25 PM 0 45.03 0.00 1.61 0.00 0.00 0.60 0.00 0.00 52.76
10/28/2013 02:25:25 PM 1 17.82 0.00 1.00 0.00 0.00 0.00 0.00 0.00 81.18
10/28/2013 02:25:25 PM 2 29.62 0.00 1.00 0.00 0.00 0.00 0.00 0.00 69.38
10/28/2013 02:25:25 PM 3 9.70 0.00 0.40 0.00 0.00 0.10 0.00 0.00 89.80
10/28/2013 02:25:25 PM 4 2.40 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.00
10/28/2013 02:25:25 PM 5 2.00 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.40
10/28/2013 02:25:25 PM 6 22.92 0.00 0.90 0.00 0.00 0.00 0.00 0.00 76.18
10/28/2013 02:25:25 PM 7 2.40 0.00 0.50 0.00 0.00 0.00 0.00 0.00 97.10
I'm trying to design a script that will take the number of cores and this file as variables and give me the average for each core, but I'm not sure how to do this. Here is what I have:
cat ${PARSE_FILE} | awk '{print $13}' | grep "^[!0-9]" > temp.txt
NUMBER_OF_CORES=8
NUMBER_OF_LINES=`awk ' END { print NR } ' temp.txt`
NUMBER_OF_VALUES=`echo "scale=0;${NUMBER_OF_LINES}/${NUMBER_OF_CORES}" | bc`
for i in `seq 1 ${NUMBER_OF_CORES}`
do
awk 'NR % $i == 0' temp.txt
echo Core: ${i} Average: xx
done
So I know the number of values each core has (lines divided by cores), which means I need every Nth line, but I'm not sure how to do this cleanly. Basically I need to loop over the file NUMBER_OF_CORES times, each pass taking every NUMBER_OF_CORES-th line, summing those values, and dividing the sum by NUMBER_OF_VALUES.
Will this do?
awk '/CPU/&&/idle/{f=1;next}f{a[$4]+=$13;b[$4]++}END{for(i in a){print i,a[i]/b[i]}}' your_file
Actually, the number of cores is not needed here; this calculates the average idle time for every core that appears in the file.
Tested:
> cat temp
10/28/2013 Linux 2.6.32-358.el6.x86_64 (host) 10/28/2013 _x86_64_
(32 CPU)
10/28/2013
10/28/2013 02:25:05 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10/28/2013 02:25:15 PM 0 51.20 0.00 2.61 0.00 0.00 0.00 0.00 0.00 46.19
10/28/2013 02:25:15 PM 1 6.09 0.00 0.50 0.00 0.00 0.00 0.00 0.00 93.41
10/28/2013 02:25:15 PM 2 25.20 0.00 1.20 0.00 0.00 0.00 0.00 0.00 73.60
10/28/2013 02:25:15 PM 3 0.40 0.00 0.20 0.00 0.00 0.00 0.00 0.00 99.40
10/28/2013 02:25:15 PM 4 3.80 0.00 0.40 0.00 0.00 0.00 0.00 0.00 95.80
10/28/2013 02:25:15 PM 5 3.70 0.00 0.30 0.00 0.00 0.00 0.00 0.00 96.00
10/28/2013 02:25:15 PM 6 21.70 0.00 1.20 0.00 0.00 0.00 0.00 0.00 77.10
10/28/2013 02:25:15 PM 7 0.70 0.00 0.10 0.00 0.00 0.00 0.00 0.00 99.20
10/28/2013 02:25:25 PM 0 45.03 0.00 1.61 0.00 0.00 0.60 0.00 0.00 52.76
10/28/2013 02:25:25 PM 1 17.82 0.00 1.00 0.00 0.00 0.00 0.00 0.00 81.18
10/28/2013 02:25:25 PM 2 29.62 0.00 1.00 0.00 0.00 0.00 0.00 0.00 69.38
10/28/2013 02:25:25 PM 3 9.70 0.00 0.40 0.00 0.00 0.10 0.00 0.00 89.80
10/28/2013 02:25:25 PM 4 2.40 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.00
10/28/2013 02:25:25 PM 5 2.00 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.40
10/28/2013 02:25:25 PM 6 22.92 0.00 0.90 0.00 0.00 0.00 0.00 0.00 76.18
10/28/2013 02:25:25 PM 7 2.40 0.00 0.50 0.00 0.00 0.00 0.00 0.00 97.10
> nawk '/CPU/&&/idle/{f=1;next}f{a[$4]+=$13;b[$4]++}END{for(i in a){print i,a[i]/b[i]}}' temp
2 71.49
3 94.6
4 96.4
5 96.7
6 76.64
7 98.15
0 49.475
1 87.295
>
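Note that for (i in a) doesn't iterate in any particular order (you can see the cores come out as 2,3,...,0,1 above). If you want the output listed in core order, just pipe it through sort:
nawk '/CPU/&&/idle/{f=1;next}f{a[$4]+=$13;b[$4]++}END{for(i in a){print i,a[i]/b[i]}}' temp | sort -n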
The script below, countCores.sh, is based on the data you gave in temp.txt. This may not be what you want, but it will give you some ideas. I wasn't sure what overall total average you wanted, so I just chose to show the average of the values in column one for all 8 cores. I also used cat -n to represent the core number.
Hope this helps. VonBell
#!/bin/bash
#Execute As: countCores.sh temp.txt 8
AllCoreTotals=0
DataFile="$1"
NumCores="$2"
AllCoreTotals=0
NumLines="`cat -n $DataFile|cut -f1|tail -1|tr -d " "`"
PrtCols="`echo $NumLines / $NumCores|bc`"
clear;echo;echo
echo "============================================================="
pr -t${PrtCols} $DataFile|tr -d "\t"|tr -s " " "+"|bc |\
while read CoreTotal
do
CoreAverage=`echo $CoreTotal / $PrtCols|bc`
echo "$CoreTotal Core Average $CoreAverage"
AllCoreTotals="`echo $CoreTotal + $AllCoreTotals|bc`"
echo "$AllCoreTotals" > AllCoreTot.tmp
done|cat -n
AllCoreAverage=`cat AllCoreTot.tmp`
AllCoreAverage="`echo $AllCoreAverage / $NumCores|bc`"
echo "============================================================="
echo "(Col One) Total Core Average: $AllCoreAverage "
rm $DataFile
rm AllCoreTot.tmp
Why not do it for all cores at the same time:
awk -f prog.awk ${PARSE_FILE}
Then in prog.awk, put:
# sum and count the %idle column (field 13) per core (field 4),
# skipping the header row where field 4 is the literal "CPU"
{
    if ((NF == 13) && ($4 != "CPU")) {
        SUM[$4] += $13;
        CNT[$4]++;
    }
}
END {
    for (loop in SUM) {
        printf("CPU: %d Total: %d Count: %d Average: %d\n",
               loop, SUM[loop], CNT[loop], SUM[loop]/CNT[loop]);
    }
}
If you want to do it on one line:
awk '{if ((NF == 13) && ($4 != "CPU")){SUM[$4] += $13;CNT[$4]++;}} END {for(loop in SUM){printf("CPU: %d Total: %d Count: %d Average: %d\n", loop, SUM[loop], CNT[loop], SUM[loop]/CNT[loop]);}}' ${PARSE_FILE}
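Note that the %d formats truncate the totals and averages to whole numbers. If you would rather keep a couple of decimal places, the same one-liner with %.2f for those two fields works (just my preference, not something the question requires):
awk '{if ((NF == 13) && ($4 != "CPU")){SUM[$4] += $13;CNT[$4]++;}} END {for(loop in SUM){printf("CPU: %d Total: %.2f Count: %d Average: %.2f\n", loop, SUM[loop], CNT[loop], SUM[loop]/CNT[loop]);}}' ${PARSE_FILE}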
After some more study, this snippet seems to do the trick:
#Parse logs to get CPU averages for cores
PARSE_FILE=`ls ~/logs/*mpstat*`
echo "Parsing ${PARSE_FILE}..."
cat ${PARSE_FILE} | awk '{print $13}' | grep "^[!0-9]" > temp.txt
NUMBER_OF_CORES=8
NUMBER_OF_LINES=`awk ' END { print NR } ' temp.txt`
NUMBER_OF_VALUES=`echo "scale=0;${NUMBER_OF_LINES}/${NUMBER_OF_CORES}" | bc`
TOTAL=0
for i in `seq 1 ${NUMBER_OF_CORES}`
do
# GNU sed: the address 'i~N' selects line i and then every Nth line after it
sed -n $i'~'$NUMBER_OF_CORES'p' temp.txt > temp2.txt
SUM=`awk '{s+=$0} END {print s}' temp2.txt`
AVERAGE=`echo "scale=0;${SUM}/${NUMBER_OF_VALUES}" | bc`
echo Core: ${i} Average: `expr 100 - ${AVERAGE}`
TOTAL=$((TOTAL+${AVERAGE}))
done
TOTAL_AVERAGE=`echo "scale=0;${TOTAL}/${NUMBER_OF_CORES}" | bc`
echo "Total Average: `expr 100 - ${TOTAL_AVERAGE}`"
rm temp*.txt