How to parse variable spaced text - swift2

I have the following data (a portion is shown below) and I'm trying to determine the best way to parse it into an array, with each line parsed separately. The problem I'm running into is that the "columns" are separated by a varying number of spaces.
I have tried using .componentsSeparatedByString(" ") but that doesn't give me a consistent number of items in the array. I thought of splitting on whitespace, but some team names have 2 words in them and some have 3.
A sample of the text follows:
1 New England Patriots = 28.69 5 0 0 20.34( 13) 1 0 0 | 3 0 0 | 28.68 1 | 28.95 1 | 28.66 2
2 Green Bay Packers = 27.97 6 0 0 17.80( 28) 1 0 0 | 1 0 0 | 27.47 2 | 28.73 2 | 29.01 1
3 Denver Broncos = 26.02 6 0 0 19.02( 23) 0 0 0 | 2 0 0 | 25.21 5 | 27.25 3 | 27.98 3
4 Cincinnati Bengals = 25.96 6 0 0 19.91( 18) 1 0 0 | 3 0 0 | 25.71 4 | 26.38 4 | 26.36 4
5 Arizona Cardinals = 25.01 4 2 0 18.05( 27) 0 1 0 | 0 1 0 | 26.47 3 | 24.17 6 | 23.37 7
6 Pittsburgh Steelers = 24.87 4 2 0 21.17( 10) 1 1 0 | 1 2 0 | 25.17 6 | 24.53 5 | 24.39 5
7 Seattle Seahawks = 24.04 2 4 0 20.92( 12) 0 2 0 | 0 3 0 | 24.47 7 | 23.29 7 | 23.37 6
8 Philadelphia Eagles = 23.87 3 3 0 20.02( 17) 1 1 0 | 2 2 0 | 24.28 8 | 23.01 8 | 23.23 8
9 New York Jets = 22.95 4 1 0 18.41( 25) 0 1 0 | 0 1 0 | 23.83 9 | 22.77 10 | 21.69 11
10 Atlanta Falcons = 22.18 5 1 0 19.31( 21) 1 0 0 | 3 0 0 | 22.36 10 | 22.33 11 | 21.86 10
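One possible approach, sketched in Python rather than Swift because the idea is language-neutral: every line contains exactly one "=", so the text to its left is always the rank followed by the team name, however many words the name has. The rest can be split on runs of whitespace once the "(", ")" and "|" decorations are replaced by spaces. The function name parse_line and the returned tuple layout are assumptions made for illustration, not part of the original question.

```python
import re

def parse_line(line):
    """Split one ranking line into (rank, team name, list of numeric fields)."""
    # Everything left of the single "=" is "<rank> <team name>".
    left, right = line.split("=", 1)
    rank, name = left.split(None, 1)
    # Treat "(", ")" and "|" as separators, then split on any run of whitespace.
    fields = re.sub(r"[()|]", " ", right).split()
    # Collapse any repeated spaces inside the team name.
    return int(rank), " ".join(name.split()), fields

sample = "9 New York Jets = 22.95 4 1 0 18.41( 25) 0 1 0 | 0 1 0 | 23.83 9 | 22.77 10 | 21.69 11"
rank, name, fields = parse_line(sample)
print(rank, name, len(fields))
```

With this split, every sample line yields the same number of fields after the "=", regardless of how many words the team name has. The same two-step idea (split on "=", then on whitespace) should carry over to Swift string APIs.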

Related

Generation of a counter variable for episodes in panel data in stata [duplicate]

I am trying to generate a counter variable that describes the duration of a temporal episode in panel data.
I am using long format data that looks something like this:
clear
input byte id int time byte var1 int aim1
1 1 0 .
1 2 0 .
1 3 1 1
1 4 1 2
1 5 0 .
1 6 0 .
1 7 0 .
2 1 0 .
2 2 1 1
2 3 1 2
2 4 1 3
2 5 0 .
2 6 1 1
2 7 1 2
end
I want to generate a variable like aim1 that starts with a value of 1 when var1==1, and counts up one unit with each subsequent observation per ID where var1 is still equal to 1. For each observation where var1!=1, aim1 should contain missing values.
I already tried using rangestat (count) to solve the problem, however the created variable does not restart the count with each episode:
ssc install rangestat
gen var2=1 if var1==1
rangestat (count) aim2=var2, interval(time -7 0) by (id)
Here are two ways to do it: (1) from first principles (but see this paper for more), and (2) using tsspell from SSC.
clear
input byte id int time byte var1 int aim1
1 1 0 .
1 2 0 .
1 3 1 1
1 4 1 2
1 5 0 .
1 6 0 .
1 7 0 .
2 1 0 .
2 2 1 1
2 3 1 2
2 4 1 3
2 5 0 .
2 6 1 1
2 7 1 2
end
bysort id (time) : gen wanted = 1 if var1 == 1 & var1[_n-1] != 1
by id: replace wanted = wanted[_n-1] + 1 if var1 == 1 & missing(wanted)
tsset id time
ssc inst tsspell
tsspell, cond(var1 == 1)
list, sepby(id _spell)
     +---------------------------------------------------------+
     | id   time   var1   aim1   wanted   _seq   _spell   _end |
     |---------------------------------------------------------|
  1. |  1      1      0      .        .      0        0      0 |
  2. |  1      2      0      .        .      0        0      0 |
     |---------------------------------------------------------|
  3. |  1      3      1      1        1      1        1      0 |
  4. |  1      4      1      2        2      2        1      1 |
     |---------------------------------------------------------|
  5. |  1      5      0      .        .      0        0      0 |
  6. |  1      6      0      .        .      0        0      0 |
  7. |  1      7      0      .        .      0        0      0 |
     |---------------------------------------------------------|
  8. |  2      1      0      .        .      0        0      0 |
     |---------------------------------------------------------|
  9. |  2      2      1      1        1      1        1      0 |
 10. |  2      3      1      2        2      2        1      0 |
 11. |  2      4      1      3        3      3        1      1 |
     |---------------------------------------------------------|
 12. |  2      5      0      .        .      0        0      0 |
     |---------------------------------------------------------|
 13. |  2      6      1      1        1      1        2      0 |
 14. |  2      7      1      2        2      2        2      1 |
     +---------------------------------------------------------+
The approach of tsspell is very close to what you ask for, except that (a) its counter (by default _seq) is 0 when out of spell, but replace _seq = . if _seq == 0 gets what you ask for, and (b) its auxiliary variables (by default _spell and _end) are useful in many problems. You must install tsspell with ssc install tsspell before you can use it.
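For readers who want to see the first-principles logic outside Stata, here is a minimal Python sketch of the same idea: within each id, ordered by time, the counter goes up by one while var1 is 1 and resets as soon as it is not. The function name spell_counter and the tuple layout of the rows are assumptions made for this illustration only.

```python
def spell_counter(rows):
    """rows: iterable of (id, time, var1). Returns one counter per row,
    in (id, time) order, with None standing in for Stata's missing value."""
    counters = []
    prev_id = None
    run = 0
    for id_, _time, var1 in sorted(rows):
        if id_ != prev_id:
            run = 0            # a new panel unit always starts a fresh count
        run = run + 1 if var1 == 1 else 0
        counters.append(run if run else None)
        prev_id = id_
    return counters

rows = [(1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 1), (1, 5, 0), (1, 6, 0), (1, 7, 0),
        (2, 1, 0), (2, 2, 1), (2, 3, 1), (2, 4, 1), (2, 5, 0), (2, 6, 1), (2, 7, 1)]
print(spell_counter(rows))
```

On the example data this reproduces the aim1 column, including the restart at the second episode of id 2.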

how to record properties of other variables in stata

I have to generate variables entry_1, entry_2 and entry_3 which will adopt the value 1 if id_i for that particular month had entry=1.
Example.
id month entry entry_1 entry_2 entry_3
1 1 1 1 0 0
1 2 0 0 0 0
1 3 0 0 1 1
1 4 0 0 0 0
2 1 0 1 0 0
2 2 0 0 0 0
2 3 1 0 1 1
2 4 0 0 0 0
3 1 0 1 0 0
3 2 0 0 0 0
3 3 1 0 1 1
3 4 0 0 0 0
Would anyone be so kind as to propose an idea of how to implement a loop in order to do this?
I am thinking of something like this:
forvalues i=1(1)3 {
gen entry`i'=0
replace entry`i'=1 if on that particular month id=`i' had entry=1
}
You could do something like this (although your data don't quite look right for the question you're asking):
forvalues i = 1/3 {
gen entry_`i' = id == `i' & entry == 1
}
This generates a dummy variable entry_i for each i in the forvalues loop where entry_i = 1 if id is i and entry is 1, and 0 otherwise.
The code can be simplified down to at most one loop.
clear
input id month entry entry_1 entry_2 entry_3
1 1 1 1 0 0
1 2 0 0 0 0
1 3 0 0 1 1
1 4 0 0 0 0
2 1 0 1 0 0
2 2 0 0 0 0
2 3 1 0 1 1
2 4 0 0 0 0
3 1 0 1 0 0
3 2 0 0 0 0
3 3 1 0 1 1
3 4 0 0 0 0
end
forval j = 1/4 {
egen entry`j' = total(entry & id == `j'), by(month)
}
list id month entry entry? , sepby(id)
     +--------------------------------------------------------+
     | id   month   entry   entry1   entry2   entry3   entry4 |
     |--------------------------------------------------------|
  1. |  1       1       1        1        0        0        0 |
  2. |  1       2       0        0        0        0        0 |
  3. |  1       3       0        0        1        1        0 |
  4. |  1       4       0        0        0        0        0 |
     |--------------------------------------------------------|
  5. |  2       1       0        1        0        0        0 |
  6. |  2       2       0        0        0        0        0 |
  7. |  2       3       1        0        1        1        0 |
  8. |  2       4       0        0        0        0        0 |
     |--------------------------------------------------------|
  9. |  3       1       0        1        0        0        0 |
 10. |  3       2       0        0        0        0        0 |
 11. |  3       3       1        0        1        1        0 |
 12. |  3       4       0        0        0        0        0 |
     +--------------------------------------------------------+
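The logic behind the egen call, "flag every row of a month in which id j had entry equal to 1", can also be sketched outside Stata. The following Python version is only an illustration under that reading of the question; the function name entry_flags and the tuple layout are assumptions.

```python
def entry_flags(rows, ids):
    """rows: list of (id, month, entry). For each row, return a tuple with one
    0/1 flag per id in `ids`: 1 if that id had entry == 1 in the row's month."""
    # Collect the (id, month) pairs in which an entry occurred.
    hits = {(id_, month) for id_, month, entry in rows if entry == 1}
    return [tuple(int((j, month) in hits) for j in ids) for _id, month, _e in rows]

rows = [(1, 1, 1), (1, 2, 0), (1, 3, 0), (1, 4, 0),
        (2, 1, 0), (2, 2, 0), (2, 3, 1), (2, 4, 0),
        (3, 1, 0), (3, 2, 0), (3, 3, 1), (3, 4, 0)]
for row, flags in zip(rows, entry_flags(rows, ids=[1, 2, 3])):
    print(row, flags)
```

For the example data the flags match the entry_1 to entry_3 columns of the question.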

Collecting CPU time log using Apache Flume

I am new to Hadoop and am learning Apache Flume. I installed CDH 4.7 on VirtualBox. The command below outputs the top CPU time. How can I transfer the log output of this command to my HDFS using Apache Flume? How do I create the Flume configuration file?
user#computer-Lenovo-IdeaPad-S510p:$ dstat -ta --top-cputime
----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- --highest-total--
time |usr sys idl wai hiq siq| read writ| recv send| in out | int csw | cputime process
27-02 13:14:32| 6 5 87 1 0 0| 216k 235k| 0 0 | 0 11B| 11k 2934 |X 29
27-02 13:14:33| 1 7 93 0 0 0| 64k 176k| 0 0 | 0 0 | 38k 3194 |X 8650
27-02 13:14:34| 2 11 87 0 0 0| 512B 188k| 0 0 | 0 0 | 24k 2612 | --enable-cra 11
27-02 13:14:35| 2 13 85 0 0 0| 45k 56k| 0 0 | 0 0 | 22k 2432 |X 11
27-02 13:14:36| 2 13 85 0 0 0|2093k 0 | 0 0 | 0 0 | 25k 3962 |VirtualBox 12
27-02 13:14:37| 1 4 95 1 0 0| 0 20k| 0 0 | 0 0 | 27k 3126 |VirtualBox 8942
27-02 13:14:38| 2 7 92 0 0 0| 0 8192B| 0 0 | 0 0 | 21k 3019 |VirtualBox 9082
27-02 13:14:39| 3 9 88 0 0 0| 512B 168k| 0 0 | 0 0 | 30k 2508 | --enable-cra 16
27-02 13:14:40| 2 13 86 0 0 0| 0 0 | 0 0 | 0 0 | 21k 2433 |VirtualBox 8041
27-02 13:14:41| 1 10 88 0 0 0| 0 0 | 0 0 | 0 0 | 19k 3191 |VirtualBox 10
27-02 13:14:42| 2 7 91 0 0 0| 32k 0 | 0 0 | 0 0 | 23k 2799 |X 8713
27-02 13:14:43| 2 7 90 1 0 0| 0 192k| 0 0 | 0 0 | 39k 2696 |X 10
27-02 13:14:44| 2 11 87 0 0 0| 0 140k| 0 0 | 0 0 | 35k 2434 |VirtualBox 8961
27-02 13:14:45| 2 11 87 0 0 0| 0 0 | 0 0 | 0 0 | 19k 2157 |VirtualBox 8126
27-02 13:14:46| 2 15 83 0 0 0| 182k 0 | 0 0 | 0 0 | 20k 3262 |VirtualBox 13^C
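You can use Flume's exec source to collect the log and the HDFS sink to store it.
A config could look like this (the memory channel, the sink name k1 and the HDFS path /flume/dstat are placeholders to adapt to your setup):
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = exec
a1.sources.r1.command = dstat -ta --top-cputime
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/dstat
a1.sinks.k1.channel = c1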
http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
http://flume.apache.org/FlumeUserGuide.html#exec-source

How do I read this network "frame" diagram?

Many times, such as on the website describing the WebSocket diagram here I see "frame" diagrams (at least that is what I think they are called) like the following:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
Can someone explain to me how to read such a diagram? My interpretation is that the 0 1 2 3 on the top are the bytes that arrive in a packet, and the repeating 0-9 are the individual bits. However, this doesn't make sense, as there are only 8 bits in a byte.
Furthermore:
What are fin, rsv, opcode and mask?
What exactly is the Payload Data?
Is this entire frame one packet, or are there multiple frames in a packet?
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
means
|    1st byte     |    2nd byte     |    3rd byte     |    4th byte     |
+-----------------+-----------------+-----------------+-----------------+
| 0 0 0 0 0 0 0 0 | 0 0 1 1 1 1 1 1 | 1 1 1 1 2 2 2 2 | 2 2 2 2 2 2 3 3 |
| 0 1 2 3 4 5 6 7 | 8 9 0 1 2 3 4 5 | 6 7 8 9 0 1 2 3 | 4 5 6 7 8 9 0 1 |
That is, the two header rows give the tens and units digits of each bit position (0 to 31), so the table is 32 bits (= 4 bytes) wide.
Descriptions about fin, rsv, opcode and mask are written right after the table you excerpted from RFC 6455.
Payload Data is a byte array. It is application-specific data.
The table represents the structure of one frame. A message consists of either one frame or multiple frames.
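To make the layout concrete, here is a minimal Python sketch that reads the fields of one frame in the order the diagram gives them: the first byte holds FIN, RSV1-3 and the opcode, the second byte holds MASK and the 7-bit payload length, then come the optional extended length and masking key. The function name parse_frame and the returned dictionary keys are assumptions for illustration, and the sketch assumes the whole frame is already in memory; it is not a complete WebSocket implementation.

```python
import struct

def parse_frame(data: bytes) -> dict:
    """Decode a single WebSocket frame laid out as in the diagram above."""
    b0, b1 = data[0], data[1]
    fin = b0 >> 7              # bit 0
    rsv = (b0 >> 4) & 0x07     # bits 1-3 (RSV1, RSV2, RSV3)
    opcode = b0 & 0x0F         # bits 4-7
    masked = b1 >> 7           # bit 8
    length = b1 & 0x7F         # bits 9-15
    offset = 2
    if length == 126:          # next 16 bits carry the real length
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:        # next 64 bits carry the real length
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    key = b""
    if masked:                 # 4-byte masking key precedes the payload
        key = data[offset:offset + 4]
        offset += 4
    payload = data[offset:offset + length]
    if masked:                 # unmask by XOR with the key, repeated every 4 bytes
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return {"fin": fin, "rsv": rsv, "opcode": opcode,
            "masked": masked, "length": length, "payload": payload}

# A masked text frame carrying "Hello".
frame = bytes([0x81, 0x85, 0x37, 0xFA, 0x21, 0x3D, 0x7F, 0x9F, 0x4D, 0x51, 0x58])
print(parse_frame(frame))
```

The bit shifts correspond directly to the column positions in the diagram, which is why the ruler across the top counts bits rather than bytes.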

Filling in gaps with awk or anything

I have a list such as the one below, where the first column is the position and the other columns aren't important for this question.
1 1 2 3 4 5
2 1 2 3 4 5
5 1 2 3 4 5
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
I want to fill in the gaps such that the list is continuous and it reads
1 1 2 3 4 5
2 1 2 3 4 5
3 0 0 0 0 0
4 0 0 0 0 0
5 1 2 3 4 5
6 0 0 0 0 0
7 0 0 0 0 0
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
I am familiar with awk and shell scripts, but whatever way it can be done is fine with me.
Thanks for any help..
This one-liner may work for you:
awk '$1>++p{for(;p<$1;p++)print p" 0 0 0 0 0"}1' file
With your example:
kent$ echo '1 1 2 3 4 5
2 1 2 3 4 5
5 1 2 3 4 5
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5'|awk '$1>++p{for(;p<$1;p++)print p" 0 0 0 0 0"}1'
1 1 2 3 4 5
2 1 2 3 4 5
3 0 0 0 0 0
4 0 0 0 0 0
5 1 2 3 4 5
6 0 0 0 0 0
7 0 0 0 0 0
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
You can use the following awk one-liner:
awk '{b=a;a=$1;while(a>(b++)+1){print b,"0 0 0 0 0"}}1' input.file
Tested with here-doc input:
awk '{b=a;a=$1;while(a>(b++)+1){print b,"0 0 0 0 0"}}1' <<EOF
1 1 2 3 4 5
2 1 2 3 4 5
5 1 2 3 4 5
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
EOF
the output is as follows:
1 1 2 3 4 5
2 1 2 3 4 5
3 0 0 0 0 0
4 0 0 0 0 0
5 1 2 3 4 5
6 0 0 0 0 0
7 0 0 0 0 0
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
Explanation:
On every input line, b is set to the previous value of a (the first column of the preceding line), and a is then set to the first column of the current line. Because of this ordering, b can drive a while loop that runs as long as b < a-1 and inserts the missing lines, filled with zeros. The 1 at the end of the script finally prints the input line itself.
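The same gap-filling logic can be sketched in Python for readers less used to awk's terse style. This is only an illustration: the function name fill_gaps, the start parameter and the filler default are assumptions, and like the awk one-liners it expects the positions to be sorted in increasing order.

```python
def fill_gaps(lines, start=1, filler="0 0 0 0 0"):
    """Insert '<pos> 0 0 0 0 0' lines for every position missing from `lines`."""
    out = []
    expected = start                   # next position we expect to see
    for line in lines:
        pos = int(line.split()[0])     # first column is the position
        while expected < pos:          # emit one filler line per missing position
            out.append(f"{expected} {filler}")
            expected += 1
        out.append(line)
        expected = pos + 1
    return out

data = ["1 1 2 3 4 5", "2 1 2 3 4 5", "5 1 2 3 4 5", "8 1 2 3 4 5",
        "9 1 2 3 4 5", "10 1 2 3 4 5", "11 1 2 3 4 5"]
print("\n".join(fill_gaps(data)))
```

Tracking the next expected position, rather than the previous one, keeps the loop condition a plain comparison.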
This is only for fun:
join -a2 FILE <(seq -f "%g 0 0 0 0 0" $(tail -1 FILE | cut -d' ' -f1)) | cut -d' ' -f -6
produces:
1 1 2 3 4 5
2 1 2 3 4 5
3 0 0 0 0 0
4 0 0 0 0 0
5 1 2 3 4 5
6 0 0 0 0 0
7 0 0 0 0 0
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
Here is another way:
awk '{x=$1-b;while(x-->1){print ++b,"0 0 0 0 0"};b=$1}1' file
Test:
$ cat file
1 1 2 3 4 5
2 1 2 3 4 5
5 1 2 3 4 5
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
$ awk '{x=$1-b;while(x-->1){print ++b,"0 0 0 0 0"};b=$1}1' file
1 1 2 3 4 5
2 1 2 3 4 5
3 0 0 0 0 0
4 0 0 0 0 0
5 1 2 3 4 5
6 0 0 0 0 0
7 0 0 0 0 0
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5

Resources