Collecting CPU time log using Apache Flume - hadoop

I am new for hadoop and learning apache Flume. I installed CDH 4.7 on Virtualbox. The below command will output the top cputime. How can I transfer this log data output of the below command to my HDFS using Apache flume?. How to create the flume configuration file?
user#computer-Lenovo-IdeaPad-S510p:$ dstat -ta --top-cputime
----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- --highest-total--
time |usr sys idl wai hiq siq| read writ| recv send| in out | int csw | cputime process
27-02 13:14:32| 6 5 87 1 0 0| 216k 235k| 0 0 | 0 11B| 11k 2934 |X 29
27-02 13:14:33| 1 7 93 0 0 0| 64k 176k| 0 0 | 0 0 | 38k 3194 |X 8650
27-02 13:14:34| 2 11 87 0 0 0| 512B 188k| 0 0 | 0 0 | 24k 2612 | --enable-cra 11
27-02 13:14:35| 2 13 85 0 0 0| 45k 56k| 0 0 | 0 0 | 22k 2432 |X 11
27-02 13:14:36| 2 13 85 0 0 0|2093k 0 | 0 0 | 0 0 | 25k 3962 |VirtualBox 12
27-02 13:14:37| 1 4 95 1 0 0| 0 20k| 0 0 | 0 0 | 27k 3126 |VirtualBox 8942
27-02 13:14:38| 2 7 92 0 0 0| 0 8192B| 0 0 | 0 0 | 21k 3019 |VirtualBox 9082
27-02 13:14:39| 3 9 88 0 0 0| 512B 168k| 0 0 | 0 0 | 30k 2508 | --enable-cra 16
27-02 13:14:40| 2 13 86 0 0 0| 0 0 | 0 0 | 0 0 | 21k 2433 |VirtualBox 8041
27-02 13:14:41| 1 10 88 0 0 0| 0 0 | 0 0 | 0 0 | 19k 3191 |VirtualBox 10
27-02 13:14:42| 2 7 91 0 0 0| 32k 0 | 0 0 | 0 0 | 23k 2799 |X 8713
27-02 13:14:43| 2 7 90 1 0 0| 0 192k| 0 0 | 0 0 | 39k 2696 |X 10
27-02 13:14:44| 2 11 87 0 0 0| 0 140k| 0 0 | 0 0 | 35k 2434 |VirtualBox 8961
27-02 13:14:45| 2 11 87 0 0 0| 0 0 | 0 0 | 0 0 | 19k 2157 |VirtualBox 8126
27-02 13:14:46| 2 15 83 0 0 0| 182k 0 | 0 0 | 0 0 | 20k 3262 |VirtualBox 13^C

You can use flume exec source, to collection log and use hdfs sink to store log.
config can like this:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = dstat -ta --top-cputime
a1.sources.r1.channels = c1
http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
http://flume.apache.org/FlumeUserGuide.html#exec-source

Related

Programmable Logic Array (PLA) Design

I need to design Programmable Logic Array (PLA) circuit with given functions. How can I design circuit with POS form functions. Can someone explain that? enter image description here
Below is the design process completed for F, the other two outputs, G and H, can done in a similar fashon.
F Truth Table
A B C D # F G H
0 0 0 0 0 1
0 0 0 1 1 0
0 0 1 0 2 0
0 0 1 1 3 1
0 1 0 0 4 1
0 1 0 1 5 1
0 1 1 0 6 0
0 1 1 1 7 0
1 0 0 0 8 0
1 0 0 1 9 0
1 0 1 0 10 1
1 0 1 1 11 1
1 1 0 0 12 0
1 1 0 1 13 0
1 1 1 0 14 1
1 1 1 1 15 1
F Karnaugh Map:
AB\CD 00 01 11 10
\
----------
00 0 | 1 x 1 | 0
----------
--------
01 | 1 y 1 | 0 0
--------
----------
11 0 0 | 1 1 |
| z |
| |
10 0 0 | 1 1 |
----------
F Sum Of Product Boolean:
(A'B'D)+(A'BC')+(AC)
x y z ==>(for ref):
x=(A'B'D)
y=(A'BC')
z=(AC)
F Logic Circuit:
------- ------
A---NOT---| | ------- | |
| AND |---| | | |
B---NOT---| | | AND | x | |
------- | |--------------| |
D---------------------------| | | |
------- | |
| | F
------- | OR |---
A---NOT---| | ------- ------ | |
| AND |---| | y | | | |
C---NOT---| | | AND |---| | | |
------- | | | | y+z | |
B---------------------------| | | |-----| |
------- | OR | | |
------- | | | |
A---NOT---| | z | | | |
| AND |---------------------| | ------
C---NOT---| | | |
------- | |
------

AWK Formatting Using First Row as a Header and Iterating by column

I'm struggling trying to format a collectd ploted file si I can later import it to an influx db instance.
This is how the file looks like:
#Date Time [CPU]User% [CPU]Nice% [CPU]Sys% [CPU]Wait% [CPU]Irq% [CPU]Soft% [CPU]Steal% [CPU]Idle% [CPU]Totl% [CPU]Intrpt/sec [CPU]Ctx/sec [CPU]Proc/sec [CPU]ProcQue [CPU]ProcRun [CPU]L-Avg1 [CPU]L-Avg5 [CPU]L-Avg15 [CPU]RunTot [CPU]BlkTot [MEM]Tot [MEM]Used [MEM]Free [MEM]Shared [MEM]Buf [MEM]Cached [MEM]Slab [MEM]Map [MEM]Anon [MEM]Commit [MEM]Locked [MEM]SwapTot [MEM]SwapUsed [MEM]SwapFree [MEM]SwapIn [MEM]SwapOut [MEM]Dirty [MEM]Clean [MEM]Laundry [MEM]Inactive [MEM]PageIn [MEM]PageOut [MEM]PageFaults [MEM]PageMajFaults [MEM]HugeTotal [MEM]HugeFree [MEM]HugeRsvd [MEM]SUnreclaim [SOCK]Used [SOCK]Tcp [SOCK]Orph [SOCK]Tw [SOCK]Alloc [SOCK]Mem [SOCK]Udp [SOCK]Raw [SOCK]Frag [SOCK]FragMem [NET]RxPktTot [NET]TxPktTot [NET]RxKBTot [NET]TxKBTot [NET]RxCmpTot [NET]RxMltTot [NET]TxCmpTot [NET]RxErrsTot [NET]TxErrsTot [DSK]ReadTot [DSK]WriteTot [DSK]OpsTot [DSK]ReadKBTot [DSK]WriteKBTot [DSK]KbTot [DSK]ReadMrgTot [DSK]WriteMrgTot [DSK]MrgTot [INODE]NumDentry [INODE]openFiles [INODE]MaxFile% [INODE]used [NFS]ReadsS [NFS]WritesS [NFS]MetaS [NFS]CommitS [NFS]Udp [NFS]Tcp [NFS]TcpConn [NFS]BadAuth [NFS]BadClient [NFS]ReadsC [NFS]WritesC [NFS]MetaC [NFS]CommitC [NFS]Retrans [NFS]AuthRef [TCP]IpErr [TCP]TcpErr [TCP]UdpErr [TCP]IcmpErr [TCP]Loss [TCP]FTrans [BUD]1Page [BUD]2Pages [BUD]4Pages [BUD]8Pages [BUD]16Pages [BUD]32Pages [BUD]64Pages [BUD]128Pages [BUD]256Pages [BUD]512Pages [BUD]1024Pages
20190228 00:01:00 12 0 3 0 0 1 0 84 16 26957 20219 14 2991 3 0.05 0.18 0.13 1 0 198339428 197144012 1195416 0 817844 34053472 1960600 76668 158641184 201414800 0 17825788 0 17825788 0 0 224 0 0 19111168 3 110 4088 0 0 0 0 94716 2885 44 0 5 1982 1808 0 0 0 0 9739 9767 30385 17320 0 0 0 0 0 0 12 13 3 110 113 0 16 16 635592 7488 0 476716 0 0 0 0 0 0 0 0 0 0 0 8 0 0 22 0 1 0 0 0 0 48963 10707 10980 1226 496 282 142 43 19 6 132
20190228 00:02:00 11 0 3 0 0 1 0 85 15 26062 18226 5 2988 3 0.02 0.14 0.12 2 0 198339428 197138128 1201300 0 817856 34054692 1960244 75468 158636064 201398036 0 17825788 0 17825788 0 0 220 0 0 19111524 0 81 960 0 0 0 0 94420 2867 42 0 7 1973 1842 0 0 0 0 9391 9405 28934 16605 0 0 0 0 0 0 9 9 0 81 81 0 11 11 635446 7232 0 476576 0 0 0 0 0 0 0 0 0 0 0 3 0 0 8 0 1 0 0 0 0 49798 10849 10995 1241 499 282 142 43 19 6 132
20190228 00:03:00 11 0 3 0 0 1 0 85 15 25750 17963 4 2980 0 0.00 0.11 0.10 2 0 198339428 197137468 1201960 0 817856 34056400 1960312 75468 158633880 201397832 0 17825788 0 17825788 0 0 320 0 0 19111712 0 75 668 0 0 0 0 94488 2869 42 0 5 1975 1916 0 0 0 0 9230 9242 28411 16243 0 0 0 0 0 0 9 9 0 75 75 0 10 10 635434 7232 0 476564 0 0 0 0 0 0 0 0 0 0 0 2 0 0 6 0 1 0 0 0 0 50029 10817 10998 1243 501 282 142 43 19 6 132
20190228 00:04:00 11 0 3 0 0 1 0 84 16 25755 17871 10 2981 5 0.08 0.11 0.10 3 0 198339428 197140864 1198564 0 817856 34058072 1960320 75468 158634508 201398088 0 17825788 0 17825788 0 0 232 0 0 19111980 0 79 2740 0 0 0 0 94488 2867 4 0 2 1973 1899 0 0 0 0 9191 9197 28247 16183 0 0 0 0 0 0 9 9 0 79 79 0 10 10 635433 7264 0 476563 0 0 0 0 0 0 0 0 0 0 0 5 0 0 12 0 1 0 0 0 0 49243 10842 10985 1245 501 282 142 43 19 6 132
20190228 00:05:00 12 0 4 0 0 1 0 83 17 26243 18319 76 2985 3 0.06 0.10 0.09 2 0 198339428 197148040 1191388 0 817856 34059808 1961420 75492 158637636 201405208 0 17825788 0 17825788 0 0 252 0 0 19112012 0 85 18686 0 0 0 0 95556 2884 43 0 6 1984 1945 0 0 0 0 9176 9173 28153 16029 0 0 0 0 0 0 10 10 0 85 85 0 12 12 635473 7328 0 476603 0 0 0 0 0 0 0 0 0 0 0 3 0 0 7 0 1 0 0 0 0 47625 10801 10979 1253 505 282 142 43 19 6 132
What I'm trying to do, is to get it in a format that looks like this:
cpu_value,host=mxspacr1,instance=5,type=cpu,type_instance=softirq value=180599 1551128614916131663
cpu_value,host=mxspacr1,instance=2,type=cpu,type_instance=interrupt value=752 1551128614916112943
cpu_value,host=mxspacr1,instance=4,type=cpu,type_instance=softirq value=205697 1551128614916128446
cpu_value,host=mxspacr1,instance=7,type=cpu,type_instance=nice value=19250943 1551128614916111618
cpu_value,host=mxspacr1,instance=2,type=cpu,type_instance=softirq value=160513 1551128614916127690
cpu_value,host=mxspacr1,instance=1,type=cpu,type_instance=softirq value=178677 1551128614916127265
cpu_value,host=mxspacr1,instance=0,type=cpu,type_instance=softirq value=212274 1551128614916126586
cpu_value,host=mxspacr1,instance=6,type=cpu,type_instance=interrupt value=673 1551128614916116661
cpu_value,host=mxspacr1,instance=4,type=cpu,type_instance=interrupt value=701 1551128614916115893
cpu_value,host=mxspacr1,instance=3,type=cpu,type_instance=interrupt value=723 1551128614916115492
cpu_value,host=mxspacr1,instance=1,type=cpu,type_instance=interrupt value=756 1551128614916112550
cpu_value,host=mxspacr1,instance=6,type=cpu,type_instance=nice value=21661921 1551128614916111032
cpu_value,host=mxspacr1,instance=3,type=cpu,type_instance=nice value=18494760 1551128614916098304
cpu_value,host=mxspacr1,instance=0,type=cpu,type_instance=interrupt value=552 1551
What I have managed to do so far is just to convert the date string into EPOCH format.
I was thinking somehow to use the first value "[CPU]" as the measurement, and the "User%" as the type, the host I can take it from the system where the script will run.
I would really appreciate your help, because I really basic knowledge of text editing.
Thanks.
EDIT: this is what would expect to get with the information of the second line using as a header the first row:
cpu_value,host=mxspacr1,type=cpu,type_instance=user% value=0 1551128614916131663
EDIT: This is what I have so far, and I'm stuck here.
awk -v HOSTNAME="$HOSTNAME" 'BEGIN { FS="[][]"; getline; NR==1; f1=$2; f2=$3 } { RS=" "; printf f1"_measurement,host="HOSTNAME",type="f2"value="$3" ", system("date +%s -d \""$1" "$2"\"") }' mxmcaim01-20190228.tab
And this is what I get, but this is only for 1 column, now I don't know how to process the remaining columns such as Nice, Sys, Wait and so on.
CPU_measurement,host=mxmcamon05,type=User% value= 1552014000
CPU_measurement,host=mxmcamon05,type=User% value= 1551960000
CPU_measurement,host=mxmcamon05,type=User% value= 1551343500
CPU_measurement,host=mxmcamon05,type=User% value= 1551997620
CPU_measurement,host=mxmcamon05,type=User% value= 1551985200
CPU_measurement,host=mxmcamon05,type=User% value= 1551938400
CPU_measurement,host=mxmcamon05,type=User% value= 1551949200
CPU_measurement,host=mxmcamon05,type=User% value= 1551938400
CPU_measurement,host=mxmcamon05,type=User% value= 1551938400
CPU_measurement,host=mxmcamon05,type=User% value= 1551945600
CPU_measurement,host=mxmcamon05,type=User% value= 1551938400
Please help.
EDIT. First of all, Thanks for your help.
Taking Advantage from you knowledge in text editing, I was expecting to use this for 3 separate files, but unfortunately and I don't know why the format is different, like this:
#Date Time SlabName ObjInUse ObjInUseB ObjAll ObjAllB SlabInUse SlabInUseB SlabAll SlabAllB SlabChg SlabPct
20190228 00:01:00 nfsd_drc 0 0 0 0 0 0 0 0 0 0
20190228 00:01:00 nfsd4_delegations 0 0 0 0 0 0 0 0 0 0
20190228 00:01:00 nfsd4_stateids 0 0 0 0 0 0 0 0 0 0
20190228 00:01:00 nfsd4_files 0 0 0 0 0 0 0 0 0 0
20190228 00:01:00 nfsd4_stateowners 0 0 0 0 0 0 0 0 0 0
20190228 00:01:00 nfs_direct_cache 0 0 0 0 0 0 0 0 0 0
So I don't how to handle the arrays in a way that I can use nfsd_drc as the type and then Iterate through ObjInUse ObjInUseB ObjAll ObjAllB SlabInUse SlabInUseB SlabAll SlabAllB SlabChg SlabPct and use them like the type_instance and finally the value in this case for ObjInUse will be 0, ObjInUseB = 0, ObjAll = 0, an so one, making something like this:
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=ObjectInUse value=0 1551128614916131663
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=ObjInuseB value=0 1551128614916131663
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=ObjAll value=0 1551128614916112943
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=ObjAllB value=0 1551128614916128446
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabInUse value=0 1551128614916111618
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabInUseB value=0 1551128614916127690
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabAll value=0 1551128614916127265
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabAllB value=0 1551128614916126586
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabChg value=0 1551128614916116661
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabPct value=0 1551128614916115893
slab_value is a hard-coded value.
Thanks.
It is not clear where do instance and type_instance=interrupt come from in your final desired format. Otherwise awk code below should work.
Note: it doesn't strip % from tag values and prints timestamp at end of line in seconds (append extra zeros if you want nanoseconds).
gawk -v HOSTNAME="$HOSTNAME" 'NR==1 {split($0,h,/[ \t\[\]]+/,s); for(i=0;i<length(h);i++){ h[i]=tolower(h[i]); };}; NR>1 { for(j=2;j<NF;j++) {k=2*j; printf("%s_value,host=%s,type=%s,type_instance=%s value=%s %s\n", h[k], HOSTNAME, h[k], h[k+1],$(j+1), mktime(substr($1,1,4)" "substr($1,5,2)" "substr($1,7,2)" "substr($2,1,2)" "substr($2,4,2)" "substr($2,7,2)));}}' mxmcaim01-20190228.tab

Extract column from file with shell [duplicate]

This question already has answers here:
bash: shortest way to get n-th column of output
(8 answers)
Closed 4 years ago.
I would like to extract column number 8 from the following table using shell (ash):
0xd024 2 0 32 20 3 0 1 0 2 1384 1692 -61 27694088
0xd028 0 1 5 11 1 0 46 0 0 301 187 -74 27689154
0xd02c 0 0 35 14 1 0 21 0 0 257 250 -80 27689410
0xd030 1 1 15 13 1 0 38 0 0 176 106 -91 27689666
0xd034 1 1 50 20 1 0 8 0 0 790 283 -71 27689980
0xd038 0 0 0 3 4 0 89 0 0 1633 390 -90 27690291
0xd03c 0 0 8 3 3 0 82 0 0 1837 184 -95 27690603
0xd040 0 0 4 5 1 0 90 0 0 0 148 -97 27690915
0xd064 0 0 36 9 1 0 29 0 0 321 111 -74 27691227
0xd068 0 0 5 14 14 0 40 0 0 8066 2270 -85 27691539
0xd06c 1 1 39 19 1 0 15 0 0 1342 261 -74 27691850
0xd070 0 0 12 11 1 0 53 0 0 203 174 -73 27692162
0xd074 0 0 18 2 1 0 75 0 0 301 277 -94 27692474
How can I do that?
the following command "awk '{print $8}' file" works fine

swift2 How to parse variable spaced text

I have the following data (a portion shown below) and I'm trying to determine the best way to parse this into an array where each line would be parsed separately. The problem I'm running into is that each "column" is separated by a various number of spaces.
I have tried using .componentsSeparatedBySpaces(" ") but that doesn't give me a consistent number items in the array. I thought of using whiteSpace but some team names have 2 words in them and some have 3.
A sample of the text follows:
1 New England Patriots = 28.69 5 0 0 20.34( 13) 1 0 0 | 3 0 0 | 28.68 1 | 28.95 1 | 28.66 2
2 Green Bay Packers = 27.97 6 0 0 17.80( 28) 1 0 0 | 1 0 0 | 27.47 2 | 28.73 2 | 29.01 1
3 Denver Broncos = 26.02 6 0 0 19.02( 23) 0 0 0 | 2 0 0 | 25.21 5 | 27.25 3 | 27.98 3
4 Cincinnati Bengals = 25.96 6 0 0 19.91( 18) 1 0 0 | 3 0 0 | 25.71 4 | 26.38 4 | 26.36 4
5 Arizona Cardinals = 25.01 4 2 0 18.05( 27) 0 1 0 | 0 1 0 | 26.47 3 | 24.17 6 | 23.37 7
6 Pittsburgh Steelers = 24.87 4 2 0 21.17( 10) 1 1 0 | 1 2 0 | 25.17 6 | 24.53 5 | 24.39 5
7 Seattle Seahawks = 24.04 2 4 0 20.92( 12) 0 2 0 | 0 3 0 | 24.47 7 | 23.29 7 | 23.37 6
8 Philadelphia Eagles = 23.87 3 3 0 20.02( 17) 1 1 0 | 2 2 0 | 24.28 8 | 23.01 8 | 23.23 8
9 New York Jets = 22.95 4 1 0 18.41( 25) 0 1 0 | 0 1 0 | 23.83 9 | 22.77 10 | 21.69 11
10 Atlanta Falcons = 22.18 5 1 0 19.31( 21) 1 0 0 | 3 0 0 | 22.36 10 | 22.33 11 | 21.86 10

Text processing using bash

I have a vmstat dump file that has the header and values in this format
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
12 0 5924396 20810624 548548 935160 0 0 0 5 0 0 60 5 34 0 0
12 0 5924396 20768588 548548 935160 0 0 0 0 1045 1296 99 0 0 0 0
12 0 5924396 20768968 548548 935452 0 0 0 32 1025 1288 100 0 0 0 0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
4 0 5924396 20768724 548552 935408 0 0 0 92 1093 1377 33 0 67 0 0
I'm writing a script in bash that extracts just the lines containing lines with numbers i.e. values and remove all lines containing the alphabet. How do I do that
If you need a lines with numbers and tabs\spaces only, grep -P "^[0-9\ \t]*$" should helps you.
$> cat ./text | grep -P "^[0-9\ \t]*$"
12 0 5924396 20810624 548548 935160 0 0 0 5 0 0 60 5 34 0 0
12 0 5924396 20768588 548548 935160 0 0 0 0 1045 1296 99 0 0 0 0
12 0 5924396 20768968 548548 935452 0 0 0 32 1025 1288 100 0 0 0 0
4 0 5924396 20768724 548552 935408 0 0 0 92 1093 1377 33 0 67 0 0
cat filename | grep "[0-9]"

Resources