text formatting to specific width - bash

I wrote a script to show the down- and up-speed of my notebook with polybar. The problem I run into is keeping the output of echo aligned.
At the moment my output looks like this (the bash script loops in a terminal) ...
WLAN0: ⬇️ 14 MiB/s ⬆️ 16 KiB/s
WLAN0: ⬇️ 60 B/s ⬆️ 0 B/s
WLAN0: ⬇️ 120 B/s ⬆️ 120 B/s
But I want it lined up, like this ...
WLAN0: ⬇️ 14 MiB/s   ⬆️ 16 KiB/s
WLAN0: ⬇️ 60 B/s     ⬆️ 0 B/s
WLAN0: ⬇️ 120 B/s    ⬆️ 120 B/s
The essence of my code is the following simplified line ...
echo "yada: ⬇️ $string1 ⬆️ $string2"
Each variable contains a number plus a unit (up to 10 characters in total), depending on the transfer speed.
So there should be a fixed-width field of at least 12 characters between the two emoji.
But I have no clue how, and I am hoping you can explain how to format variables to a certain width, with printf I assume.

Align left with printf:
string1="14 MiB/s"; string2="16 KiB/s"
printf "yada: ⬇️ %-12s ⬆️ %-12s\n" "$string1" "$string2"
Output:
yada: ⬇️ 14 MiB/s     ⬆️ 16 KiB/s

Here is how you can align the columns:
#!/usr/bin/env bash
dnIcon=$'\342\254\207\357\270\217'   ### ⬇️ (U+2B07 U+FE0F) as UTF-8 bytes
upIcon=$'\342\254\206\357\270\217'   ### ⬆️ (U+2B06 U+FE0F) as UTF-8 bytes
nMiBs=14
sMiBs="$nMiBs MiB/s"
nKiBs=16
sKiBs="$nKiBs KiB/s"
printf 'WLAN0: %s %-10s %s %-10s\n' "$dnIcon" "$sMiBs" "$upIcon" "$sKiBs"
Sample output:
WLAN0: ⬇️ 14 MiB/s   ⬆️ 16 KiB/s
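
If you would rather have the numbers right-aligned, drop the - flag; a minimal variation on the script above:

printf 'WLAN0: %s %10s %s %10s\n' "$dnIcon" "$sMiBs" "$upIcon" "$sKiBs"

One caveat: depending on the version and locale, bash's printf pads by bytes or characters, never by display columns, which is why the multi-byte emoji are passed as their own unpadded %s arguments rather than being embedded in a padded field.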

Related

Aggregate Top CPU-Using Processes

When I run top -n 1 -d 2 | head -n 12; it returns processor usage for some processes sorted by %CPU descending, as desired, but I'm not convinced that the results are aggregated as they should be. I want to put these results in a file, maybe like
while true; do
top -n 1 -d 2 | head -n 12;
done > top_cpu_users;
When I run top -d 2; interactively, I first see some results, then two seconds later I see the results updated and they appear to be aggregated over the last two seconds. The first results do not appear to be aggregated in the same way.
How do I get top cpu users every two seconds aggregated over the previous two seconds?
top will always capture a first full scan of process info for use as a baseline. It uses that to initialize the utility's database of values used for later comparative reporting. That is the basis of the first report presented to the screen.
The follow-on reports are the true measures for the specified evaluation intervals.
Your code snippet will therefore never provide what you are really looking for.
You need to skip the results from the first scan and only use the follow-on reports. The only way to do that is to generate them from a single command by specifying the count of scans desired, then parse the resulting combined report.
To that end, here is a proposed solution:
#!/bin/bash
output="top_cpu_users"
rm -f ${output} ${output}.tmp
snapshots=5
interval=2
process_count=6 ### Number of heavy hitter processes being monitored
top_head=7 ### Number of header lines in top report
lines=$(( ${process_count} + ${top_head} )) ### total lines saved from each report run
echo -e "\n Collecting process snapshots every ${interval} seconds ..."
top -b -n $(( ${snapshots} + 1 )) -d ${interval} > ${output}.tmp
echo -e "\n Parsing snapshots ..."
awk -v max="${lines}" 'BEGIN{
    doprint=0 ;
    first=1 ;
}
{
    if( $1 == "top" ){              ### a new report starts here
        if( first == 1 ){
            first=0 ;               ### skip the baseline first scan
        }else{
            print NR | "cat >&2" ;  ### log the report start line number to stderr
            print "" ;
            doprint=1 ;
            entry=0 ;
        } ;
    } ;
    if( doprint == 1 ){
        entry++ ;
        print $0 ;
        if( entry == max ){         ### stop after header + process lines
            doprint=0 ;
        } ;
    } ;
}' ${output}.tmp >${output}
more ${output}
The session output for that will look like this:
Collecting process snapshots every 2 seconds ...
Parsing snapshots ...
266
531
796
1061
1326
top - 20:14:02 up 8:37, 1 user, load average: 0.15, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 1.0 sy, 0.0 ni, 98.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.6 free, 2753.7 used, 767.6 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31773 root 20 0 0 0 0 I 1.5 0.0 0:09.08 kworker/0:3-events
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.04 top
1503 mysql 20 0 2387360 20664 2988 S 0.5 0.5 3:10.11 mysqld
2250 ericthe+ 20 0 1949412 130004 20272 S 0.5 3.5 0:46.16 caja
3104 ericthe+ 20 0 4837044 461944 127416 S 0.5 12.3 81:26.50 firefox
29998 ericthe+ 20 0 2636764 165632 54700 S 0.5 4.4 0:36.97 Isolated Web Co
top - 20:14:04 up 8:37, 1 user, load average: 0.14, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 0.7 sy, 0.0 ni, 97.4 id, 0.4 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.7 used, 767.6 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3104 ericthe+ 20 0 4837044 462208 127416 S 3.0 12.3 81:26.56 firefox
1503 mysql 20 0 2387360 20664 2988 S 1.0 0.5 3:10.13 mysqld
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.06 top
1489 root 20 0 546692 61584 48956 S 0.5 1.6 17:23.78 Xorg
2233 ericthe+ 20 0 303744 11036 7500 S 0.5 0.3 4:46.84 compton
7239 ericthe+ 20 0 2617520 127452 44768 S 0.5 3.4 1:41.13 Isolated Web Co
top - 20:14:06 up 8:37, 1 user, load average: 0.14, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.4 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.7 used, 767.6 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1489 root 20 0 546700 61584 48956 S 1.5 1.6 17:23.81 Xorg
3104 ericthe+ 20 0 4837044 462208 127416 S 1.5 12.3 81:26.59 firefox
1503 mysql 20 0 2387360 20664 2988 S 0.5 0.5 3:10.14 mysqld
2233 ericthe+ 20 0 303744 11036 7500 S 0.5 0.3 4:46.85 compton
2478 ericthe+ 20 0 346156 10368 8792 S 0.5 0.3 0:22.97 mate-cpufreq-ap
2481 ericthe+ 20 0 346540 11148 9168 S 0.5 0.3 0:41.73 mate-sensors-ap
top - 20:14:08 up 8:37, 1 user, load average: 0.14, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.5 sy, 0.0 ni, 98.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.6 used, 767.7 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.08 top
3104 ericthe+ 20 0 4837044 462208 127416 S 0.5 12.3 81:26.60 firefox
18370 ericthe+ 20 0 2682392 97268 45144 S 0.5 2.6 0:55.36 Isolated Web Co
19436 ericthe+ 20 0 2618496 123608 52540 S 0.5 3.3 1:55.08 Isolated Web Co
26630 ericthe+ 20 0 2690464 179020 56060 S 0.5 4.8 1:45.57 Isolated Web Co
29998 ericthe+ 20 0 2636764 165632 54700 S 0.5 4.4 0:36.98 Isolated Web Co
top - 20:14:10 up 8:37, 1 user, load average: 0.13, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.5 us, 0.9 sy, 0.0 ni, 96.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.6 used, 767.7 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3104 ericthe+ 20 0 4837076 463000 127416 S 7.5 12.3 81:26.75 firefox
1489 root 20 0 546716 61584 48956 S 1.5 1.6 17:23.84 Xorg
1503 mysql 20 0 2387360 20664 2988 S 1.0 0.5 3:10.16 mysqld
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.10 top
2233 ericthe+ 20 0 303744 11036 7500 S 0.5 0.3 4:46.86 compton
2481 ericthe+ 20 0 346540 11148 9168 S 0.5 0.3 0:41.74 mate-sensors-ap
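
If you do not need the per-report line numbers on stderr, the same skip-the-baseline idea fits in a single pipeline; a minimal sketch, assuming procps top in batch mode, where every report starts with a line whose first field is top:

### 1 baseline scan + 5 real reports; discard everything before the second report
top -b -d 2 -n 6 | awk '$1 == "top" {n++} n > 1' > top_cpu_users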

Cut text into two strings using a string delimiter - bash

I want to cut my text into two parts using a string as the delimiter.
CH 7 ][ Elapsed: 0 s ][ 2021-11-27 12:55
BSSID PWR Beacons #Data, #/s CH MB ENC CIPHER AUTH ESSID
EE:EE:EE:EE:EE:EE -82 3 0 0 6 130 WPA2 CCMP PSK Tenda
FF:FF:FF:FF:FF:FF -90 4 0 0 1 130 WPA2 CCMP PSK Wifi
BSSID STATION PWR Rate Lost Frames Notes Probes
EE:EE:EE:EE:EE:EE AA:AA:AA:AA:AA:AA -63 0 - 1e 0 3
EE:EE:EE:EE:EE:EE BB:BB:BB:BB:BB:BB -74 0 - 1 0 1
I want to cut my text at this delimiter line: BSSID STATION PWR Rate Lost Frames Notes Probes. I tried awk -F 'BSSID' '{print $1}' file, but it cuts at every occurrence; I want to cut only at the last occurrence.
Desired output:
CH 7 ][ Elapsed: 0 s ][ 2021-11-27 12:55
BSSID PWR Beacons #Data, #/s CH MB ENC CIPHER AUTH ESSID
EE:EE:EE:EE:EE:EE -82 3 0 0 6 130 WPA2 CCMP PSK Tenda
FF:FF:FF:FF:FF:FF -90 4 0 0 1 130 WPA2 CCMP PSK Wifi
awk '/BSSID STATION PWR Rate Lost Frames Notes Probes/{exit} 1' file
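
The trailing 1 prints every line, and exit fires on the delimiter line before it can be printed. If sed is more familiar, GNU sed's Q command (a GNU extension) does the same thing:

sed '/BSSID STATION PWR Rate Lost Frames Notes Probes/Q' file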

Hadoop: Diagnose long-running job

I need help with diagnosing why a particular Job in Jobtracker is long-running and workarounds for improving it.
Here is an excerpt of the job in question (please pardon the formatting):
Hadoop job_201901281553_38848
User: mapred
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Running
Started at: Fri Feb 01 12:39:05 CST 2019
Running for: 3hrs, 23mins, 58sec
Job Cleanup: Pending
Kind % Complete Num Tasks Pending Running Complete Killed Failed/Killed Task Attempts
map 100.00% 1177 0 0 1177 0 0 / 0
reduce 95.20% 12 0 2 10 0 0 / 0
Counter Map Reduce Total
File System Counters FILE: Number of bytes read 1,144,088,621 1,642,723,691 2,786,812,312
FILE: Number of bytes written 3,156,884,366 1,669,567,665 4,826,452,031
FILE: Number of read operations 0 0 0
FILE: Number of large read operations 0 0 0
FILE: Number of write operations 0 0 0
HDFS: Number of bytes read 11,418,749,621 0 11,418,749,621
HDFS: Number of bytes written 0 8,259,932,078 8,259,932,078
HDFS: Number of read operations 2,365 5 2,370
HDFS: Number of large read operations 0 0 0
HDFS: Number of write operations 0 12 12
Job Counters Launched map tasks 0 0 1,177
Launched reduce tasks 0 0 12
Data-local map tasks 0 0 1,020
Rack-local map tasks 0 0 157
Total time spent by all maps in occupied slots (ms) 0 0 4,379,522
Total time spent by all reduces in occupied slots (ms) 0 0 81,115,664
Map-Reduce Framework Map input records 77,266,616 0 77,266,616
Map output records 77,266,616 0 77,266,616
Map output bytes 11,442,228,060 0 11,442,228,060
Input split bytes 177,727 0 177,727
Combine input records 0 0 0
Combine output records 0 0 0
Reduce input groups 0 37,799,412 37,799,412
Reduce shuffle bytes 0 1,853,727,946 1,853,727,946
Reduce input records 0 76,428,913 76,428,913
Reduce output records 0 48,958,874 48,958,874
Spilled Records 112,586,947 62,608,254 175,195,201
CPU time spent (ms) 2,461,980 14,831,230 17,293,210
Physical memory (bytes) snapshot 366,933,626,880 9,982,947,328 376,916,574,208
Virtual memory (bytes) snapshot 2,219,448,848,384 23,215,755,264 2,242,664,603,648
Total committed heap usage (bytes) 1,211,341,733,888 8,609,333,248 1,219,951,067,136
AcsReducer ColumnDeletesOnTable- 0 3,284,862 3,284,862
ColumnDeletesOnTable- 0 3,285,695 3,285,695
ColumnDeletesOnTable- 0 3,284,862 3,284,862
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 517,641 517,641
ColumnDeletesOnTable- 0 23,786 23,786
ColumnDeletesOnTable- 0 594,872 594,872
ColumnDeletesOnTable- 0 597,739 597,739
ColumnDeletesOnTable- 0 595,665 595,665
ColumnDeletesOnTable- 0 36,101,345 36,101,345
ColumnDeletesOnTable- 0 11,791 11,791
ColumnDeletesOnTable- 0 11,898 11,898
ColumnDeletesOnTable- 0 176 176
RowDeletesOnTable- 0 224,044 224,044
RowDeletesOnTable- 0 224,045 224,045
RowDeletesOnTable- 0 224,044 224,044
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 459,890 459,890
RowDeletesOnTable- 0 23,786 23,786
RowDeletesOnTable- 0 105,910 105,910
RowDeletesOnTable- 0 107,829 107,829
RowDeletesOnTable- 0 105,909 105,909
RowDeletesOnTable- 0 36,101,345 36,101,345
RowDeletesOnTable- 0 11,353 11,353
RowDeletesOnTable- 0 11,459 11,459
RowDeletesOnTable- 0 168 168
WholeRowDeletesOnTable- 0 129,930 129,930
deleteRowsCount 0 37,799,410 37,799,410
deleteRowsMicros 0 104,579,855,042 104,579,855,042
emitCount 0 48,958,874 48,958,874
emitMicros 0 201,996,180 201,996,180
rollupValuesCount 0 37,799,412 37,799,412
rollupValuesMicros 0 234,085,342 234,085,342
As you can see, it's been running for almost 3.5 hours now. There were 1177 map tasks and they completed some time ago. The reduce phase is incomplete at 95%.
So I drill into the 'reduce' link and it takes me to the task list. If I drill into the first incomplete task, here it is:
Job job_201901281553_38848
All Task Attempts
Task Attempts Machine Status Progress Start Time Shuffle Finished Sort Finished Finish Time Errors
attempt_201901281553_38848_r_000000_0 RUNNING 70.81% 2/1/2019 12:39 1-Feb-2019 12:39:59 (18sec) 1-Feb-2019 12:40:01 (2sec)
From there I can see the machine/datanode running the task, so I ssh into it and look at the log (filtering on just the task in question).
From the datanode: /var/log/hadoop-0.20-mapreduce/hadoop-mapred-tasktracker-.log
2019-02-01 12:39:40,836 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201901281553_38848_r_000000_0 task's state:UNASSIGNED
2019-02-01 12:39:40,838 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201901281553_38848_r_000000_0 which needs 1 slots
2019-02-01 12:39:40,838 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 21 and trying to launch attempt_201901281553_38848_r_000000_0 which needs 1 slots
2019-02-01 12:39:40,925 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /disk12/mapreduce/tmp-map-data/ttprivate/taskTracker/mapred/jobcache/job_201901281553_38848/attempt_201901281553_38848_r_000000_0/taskjvm.sh
2019-02-01 12:39:41,904 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201901281553_38848_r_-819481850 given task: attempt_201901281553_38848_r_000000_0
2019-02-01 12:39:49,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.09402435% reduce > copy (332 of 1177 at 23.66 MB/s) >
2019-02-01 12:39:56,250 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.25233644% reduce > copy (891 of 1177 at 12.31 MB/s) >
2019-02-01 12:39:59,206 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.25233644% reduce > copy (891 of 1177 at 12.31 MB/s) >
2019-02-01 12:39:59,350 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.33333334% reduce > sort
2019-02-01 12:40:01,599 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.33333334% reduce > sort
2019-02-01 12:40:02,469 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6667039% reduce > reduce
2019-02-01 12:40:05,565 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6667039% reduce > reduce
2019-02-01 12:40:11,666 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6668788% reduce > reduce
2019-02-01 12:40:14,755 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.66691136% reduce > reduce
2019-02-01 12:40:17,838 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6670001% reduce > reduce
2019-02-01 12:40:20,930 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6671631% reduce > reduce
2019-02-01 12:40:24,016 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6672566% reduce > reduce
.. and these lines repeat in this manner for hours ..
So it appears the shuffle/sort phases went very quickly, but after that it's just the reduce phase crawling; the percentage increases slowly, and it takes hours before the task completes.
1) So that looks like the bottleneck here. Am I correct in identifying the cause of my long-running job as this task (and many tasks like it) taking a very long time in the reduce phase?
2) If so, what are my options for speeding it up?
Load appears to be reasonably low on the datanode assigned that task, as well as its iowait:
top - 15:20:03 up 124 days, 1:04, 1 user, load average: 3.85, 5.64, 5.96
Tasks: 1095 total, 2 running, 1092 sleeping, 0 stopped, 1 zombie
Cpu(s): 3.8%us, 1.5%sy, 0.9%ni, 93.6%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 503.498G total, 495.180G used, 8517.543M free, 5397.789M buffers
Swap: 2046.996M total, 0.000k used, 2046.996M free, 432.468G cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
82236 hbase 20 0 16.9g 16g 17m S 136.9 3.3 26049:16 java
30143 root 39 19 743m 621m 13m R 82.3 0.1 1782:06 clamscan
62024 mapred 20 0 2240m 1.0g 24m S 75.1 0.2 1:21.28 java
36367 mapred 20 0 1913m 848m 24m S 11.2 0.2 22:56.98 java
36567 mapred 20 0 1898m 825m 24m S 9.5 0.2 22:23.32 java
36333 mapred 20 0 1879m 880m 24m S 8.2 0.2 22:44.28 java
36374 mapred 20 0 1890m 831m 24m S 6.9 0.2 23:15.65 java
and a snippet of iostat -xm 4:
avg-cpu: %user %nice %system %iowait %steal %idle
2.15 0.92 0.30 0.17 0.00 96.46
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 350.25 0.00 30.00 0.00 1.49 101.67 0.02 0.71 0.00 0.71 0.04 0.12
sdb 0.00 2.75 0.00 6.00 0.00 0.03 11.67 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 9.75 0.00 1.25 0.00 0.04 70.40 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 6.50 0.00 0.75 0.00 0.03 77.33 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 5.75 0.00 0.50 0.00 0.02 100.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 8.00 0.00 0.75 0.00 0.03 93.33 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 6.25 0.00 0.50 0.00 0.03 108.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 3.75 93.25 0.50 9.03 0.02 197.57 0.32 3.18 3.20 0.00 1.95 18.30
sdj 0.00 3.50 0.00 0.50 0.00 0.02 64.00 0.00 0.00 0.00 0.00 0.00 0.00
sdk 0.00 7.00 0.00 0.75 0.00 0.03 82.67 0.00 0.33 0.00 0.33 0.33 0.03
sdl 0.00 6.75 0.00 0.75 0.00 0.03 80.00 0.00 0.00 0.00 0.00 0.00 0.00
sdm 0.00 7.75 0.00 5.75 0.00 0.05 18.78 0.00 0.04 0.00 0.04 0.04 0.03
#<machine>:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 40G 5.9G 32G 16% /
tmpfs 252G 0 252G 0% /dev/shm
/dev/sda1 488M 113M 350M 25% /boot
/dev/sda8 57G 460M 54G 1% /tmp
/dev/sda7 9.8G 1.1G 8.2G 12% /var
/dev/sda5 40G 17G 21G 45% /var/log
/dev/sda6 30G 4.4G 24G 16% /var/log/audit.d
/dev/sdb1 7.2T 3.3T 3.6T 48% /disk1
/dev/sdc1 7.2T 3.3T 3.6T 49% /disk2
/dev/sdd1 7.2T 3.3T 3.6T 48% /disk3
/dev/sde1 7.2T 3.3T 3.6T 48% /disk4
/dev/sdf1 7.2T 3.3T 3.6T 48% /disk5
/dev/sdi1 7.2T 3.3T 3.6T 48% /disk6
/dev/sdg1 7.2T 3.3T 3.6T 48% /disk7
/dev/sdh1 7.2T 3.3T 3.6T 48% /disk8
/dev/sdj1 7.2T 3.3T 3.6T 48% /disk9
/dev/sdk1 7.2T 3.3T 3.6T 48% /disk10
/dev/sdm1 7.2T 3.3T 3.6T 48% /disk11
/dev/sdl1 7.2T 3.3T 3.6T 48% /disk12
This is Hadoop version 2.0.0-cdh4.3.0. It's highly available, with 3 ZooKeeper nodes, 2 namenodes, and 35 datanodes. YARN is not installed. We use HBase and Oozie. Jobs mainly come in via Hive and HUE.
Each datanode has 2 physical CPUs, each with 22 cores. Hyperthreading is enabled.
If you need more information, please let me know. My guess is that I need more reducers, that there are mapred-site.xml settings that need tuning, that the input data from the map phase is too large, or that the Hive query needs to be written better. I'm a fairly new Hadoop administrator; any detailed advice is welcome.
Thanks!
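
On the reducer hunch: 12 reducers for roughly 77 million reduce input records is very few, and in MRv1 the per-job reducer count comes from the mapred.reduce.tasks property, which Hive honors. A hedged sketch of how one might test that theory (query.hql is a hypothetical stand-in for the actual Hive query, and 48 is an arbitrary trial value; the right count depends on cluster capacity):

### hypothetical test: rerun the Hive job with more reducers (MRv1 property)
hive --hiveconf mapred.reduce.tasks=48 -f query.hql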

Maximizing goroutine performance

I'm writing a data mover in Go, taking data located in one data center and moving it to another data center. I figured Go would be perfect for this, given goroutines.
I notice that if I have one program running 1800 goroutines, the amount of data being transmitted is really low.
Here's the dstat printout, averaged over 30 seconds:
---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
1m 5m 15m |usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0.70 3.58 4.42| 10 1 89 0 0 0| 0 156k|7306k 6667k| 0 0 | 11k 6287
0.61 3.28 4.29| 12 2 85 0 0 1| 0 6963B|8822k 8523k| 0 0 | 14k 7531
0.65 3.03 4.18| 12 2 86 0 0 1| 0 1775B|8660k 8514k| 0 0 | 13k 7464
0.67 2.81 4.07| 12 2 86 0 0 1| 0 1638B|8908k 8735k| 0 0 | 13k 7435
0.67 2.60 3.96| 12 2 86 0 0 1| 0 819B|8752k 8385k| 0 0 | 13k 7445
0.47 2.37 3.84| 11 2 86 0 0 1| 0 2185B|8740k 8491k| 0 0 | 13k 7548
0.61 2.22 3.74| 10 2 88 0 0 0| 0 1229B|7122k 6765k| 0 0 | 11k 6228
0.52 2.04 3.63| 3 1 97 0 0 0| 0 546B|1999k 1365k| 0 0 |3117 2033
If I run 9 instances of the program with 200 goroutines each, I see much better performance:
---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
1m 5m 15m |usr sys idl wai hiq siq| read writ| recv send| in out | int csw
8.34 9.56 8.78| 53 8 36 0 0 3| 0 410B| 38M 32M| 0 0 | 41k 26k
8.01 9.37 8.74| 74 10 12 0 0 4| 0 137B| 51M 51M| 0 0 | 59k 39k
8.36 9.31 8.74| 75 9 12 0 0 4| 0 1092B| 51M 51M| 0 0 | 59k 39k
6.93 8.89 8.62| 74 10 12 0 0 4| 0 5188B| 50M 49M| 0 0 | 59k 38k
7.09 8.73 8.58| 75 9 12 0 0 4| 0 410B| 51M 50M| 0 0 | 60k 39k
7.40 8.62 8.54| 75 9 12 0 0 4| 0 137B| 52M 49M| 0 0 | 61k 40k
7.96 8.63 8.55| 75 9 12 0 0 4| 0 956B| 51M 51M| 0 0 | 59k 39k
7.46 8.44 8.49| 75 9 12 0 0 4| 0 273B| 51M 50M| 0 0 | 58k 38k
8.08 8.51 8.51| 75 9 12 0 0 4| 0 410B| 51M 51M| 0 0 | 59k 39k
The load average is a little high, but I'll worry about that later. The network traffic, though, is almost hitting the network's potential.
I'm on Ubuntu 12.04,
8 Gigs Ram,
2.3 GHz processors (says EC2 :P)
Also, I've increased my file descriptors from 1024 to 10240
I thought go was designed for this kind of thing or am I expecting too much of go for this application?
Is there something trivial that I'm missing? Do I need to configure my system to maximizes go's potential?
EDIT
I guess my question wasn't clear enough. Sorry. I'm not asking for magic from Go; I know computers have limitations to what they can handle.
So I'll rephrase. Why is 1 instance with 1800 goroutines != 9 instances with 200 goroutines each? The same total number of goroutines gives significantly less performance for 1 instance than for 9 instances.
Please note that goroutines are limited to your local machine and that channels are not natively network-enabled, i.e. your particular case is probably not playing to Go's strengths.
Also: what did you expect from throwing (supposedly) every transfer into a goroutine? I/O operations tend to have their bottleneck where the bits hit the metal, i.e. the physical transfer of the data to the medium. Think of it like this: no matter how many threads (or goroutines, in this case) try to write to the network card, you still only have one network card. Most likely, hitting it with too many concurrent write calls will only slow things down, since the overhead involved increases.
If you think this is not the problem, or you want to audit your code for optimized performance, Go has neat built-in features to do so: see "Profiling Go Programs" (official Go blog).
Still, the actual bottleneck might well be outside your Go program and/or in the way it interacts with the OS.
Addressing your actual problem without code is pointless guessing. Post some and everyone will try their best to help you.
You will probably have to post your source code to get any real input, but just to be sure: have you increased the number of CPUs to use?
package main

import "runtime"

func main() {
	// Before Go 1.5 the default was a single OS thread; use all available cores.
	runtime.GOMAXPROCS(runtime.NumCPU())
}

percentage of memory used by a process

Normally prstat -J will give the memory of the process image, RSS (resident set size), etc.
How do I list processes with the percentage of memory used by each process?
I am working on Solaris.
Additionally, what are the regular commands you use for monitoring processes and process performance? That might be very useful to all!
The top command will give you several memory-consumption numbers. htop is much nicer, and will give you percentages, but it isn't installed by default on most systems.
Run top and then press Shift+O; this will bring you to the sort options. Press n (this may be different on your machine) for memory, and then hit Enter.
Example of a memory sort:
top - 08:17:29 up 3 days, 8:54, 6 users, load average: 13.98, 14.01, 11.60
Tasks: 654 total, 2 running, 652 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.7%us, 1.5%sy, 0.0%ni, 59.5%id, 23.5%wa, 0.1%hi, 0.8%si, 0.0%st
Mem: 65851896k total, 49049196k used, 16802700k free, 1074664k buffers
Swap: 50331640k total, 0k used, 50331640k free, 32776940k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21635 oracle 15 0 6750m 636m 51m S 1.6 1.0 62:34.53 oracle
21623 oracle 15 0 6686m 572m 53m S 1.1 0.9 61:16.95 oracle
21633 oracle 16 0 6566m 445m 235m S 3.7 0.7 30:22.60 oracle
21615 oracle 16 0 6550m 428m 220m S 3.7 0.7 29:36.74 oracle
16349 oracle RT 0 431m 284m 41m S 0.5 0.4 2:41.08 ocssd.bin
17891 root RT 0 139m 118m 40m S 0.5 0.2 41:08.19 osysmond
18154 root RT 0 182m 98m 43m S 0.0 0.2 10:02.40 ologgerd
12211 root 15 0 1432m 84m 14m S 0.0 0.1 17:57.80 java
Another method, on Solaris, is the following:
prstat -s size 1 1
Example prstat output
www004:/# prstat -s size 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
420 nobody 139M 60M sleep 29 10 1:46:56 0.1% webservd/76
603 nobody 135M 59M sleep 29 10 5:33:18 0.1% webservd/96
339 root 134M 70M sleep 59 0 0:35:38 0.0% java/24
435 iplanet 132M 55M sleep 29 10 1:10:39 0.1% webservd/76
573 nobody 131M 53M sleep 29 10 0:24:32 0.0% webservd/76
588 nobody 130M 53M sleep 29 10 2:40:55 0.1% webservd/86
454 nobody 128M 51M sleep 29 10 0:09:01 0.0% webservd/76
489 iplanet 126M 49M sleep 29 10 0:00:13 0.0% webservd/74
405 root 119M 45M sleep 29 10 0:00:13 0.0% webservd/31
717 root 54M 46M sleep 59 0 2:31:27 0.2% agent/7
Keep in mind this is sorted by SIZE, not RSS; if you need it sorted by RSS, use the rss key:
www004:/# prstat -s rss 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
339 root 134M 70M sleep 59 0 0:35:39 0.1% java/24
420 nobody 139M 60M sleep 29 10 1:46:57 0.4% webservd/76
603 nobody 135M 59M sleep 29 10 5:33:19 0.5% webservd/96
435 iplanet 132M 55M sleep 29 10 1:10:39 0.0% webservd/76
573 nobody 131M 53M sleep 29 10 0:24:32 0.0% webservd/76
588 nobody 130M 53M sleep 29 10 2:40:55 0.0% webservd/86
454 nobody 128M 51M sleep 29 10 0:09:01 0.0% webservd/76
489 iplanet 126M 49M sleep 29 10 0:00:13 0.0% webservd/74
I'm not sure if this option to ps is standardized, but at least on Linux, ps -o %mem gives the percentage of memory used (you would obviously want to add some other columns as well).
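
A minimal sketch of that on Linux, assuming procps ps (the --sort option is a GNU extension, not POSIX):

### ten biggest memory consumers, by share of physical memory
ps -eo pid,user,%mem,rss,comm --sort=-%mem | head -n 11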
