How to read all data into one line? - ruby

For example:
12 711
112 011 111 61 070 401 2216 515
4 14 516 3
should be read as
127111120111116107040122165154145163?
I have been reading about STDIN, but I don't know how to do this.

"12 711 112 011 111 61 070 401 2216 515 4 14 516 3".delete("\s")
or
"12 711 112 011 111 61 070 401 2216 515 4 14 516 3".gsub(/\s/,'')

Read from standard input, then delete all newlines and whitespace.
STDIN.read.delete "\n\s"
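For the sample input above, a minimal sketch of a complete program (join.rb and numbers.txt are hypothetical names):

# join.rb - slurp everything from STDIN, then strip newlines and spaces
puts STDIN.read.delete("\n\s")

Running ruby join.rb < numbers.txt prints the single concatenated line shown in the question.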

Related

When CloudWatch Logs data is sent into a Kinesis data stream, what is its encoding format?

I'm trying to write a Go program to download data from an AWS Kinesis data stream. I read that Kinesis encodes the data with base64, so I first need to decode it with base64. However, I can't figure out what encoding was used on the data as it was passed from CloudWatch Logs to the Kinesis data stream.
I've tried different decoding methods, but none of them works. The unprocessed byte array downloaded from the Kinesis data stream is the following:
[31 139 8 0 0 0 0 0 0 0 53 206 65 11 130 64 16 134 225 191 178 204 89 130 178 34 246 22 97 30 178 130 12 58 68 196 166 147 14 233 174 236 140 69 68 255 61 204 58 190 204 7 243 188 160 70 102 83 224 254 217 32 104 88 108 55 251 221 54 57 175 163 52 157 199 17 4 224 30 22 125 119 169 92 155 63 140 100 101 226 10 134 0 42 87 196 222 181 13 104 232 43 21 143 166 238 147 219 11 103 158 26 33 103 151 84 9 122 6 125 60 125 119 209 29 173 116 249 2 202 251 185 80 141 44 166 110 64 15 167 227 201 48 28 79 166 225 108 20 6 127 94 7 56 36 234 199 83 63 158 86 139 18 179 27 217 66 149 104 42 41 149 187 170 28 89 200 154 238 179 90 145 69 38 86 252 165 13 224 125 122 127 0 234 141 66 79 242 0 0 0]
Can someone give me some tips on how to process this data?
You can use a subscription filter with Kinesis, Lambda, or Kinesis Data Firehose. Logs that are sent to a receiving service through a subscription filter are base64 encoded and compressed with the gzip format.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html
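Note that the byte array above starts with 31 139, i.e. 0x1f 0x8b, the gzip magic number, so the base64 layer has already been removed by the time you hold a []byte; only gzip decompression is left. A minimal Go sketch under that assumption, reading one raw record from standard input for illustration:

package main

import (
    "bytes"
    "compress/gzip"
    "fmt"
    "io"
    "log"
    "os"
)

func main() {
    // Raw record bytes, e.g. the Data field of a Kinesis record dumped
    // to a file; the AWS SDK has already decoded the base64 layer.
    raw, err := io.ReadAll(os.Stdin)
    if err != nil {
        log.Fatal(err)
    }
    // Gunzip to recover the CloudWatch Logs JSON document.
    zr, err := gzip.NewReader(bytes.NewReader(raw))
    if err != nil {
        log.Fatal(err)
    }
    defer zr.Close()
    payload, err := io.ReadAll(zr)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(payload))
}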

Create a file according to sorted content

I have a list of more than 100000 records.
If, for example, the values from 21 to 84 are continuous, they should be written as 21-84; if they are not continuous, as in the case of 84 87, they need to be written as 84,87, separated by a comma.
At the beginning of each line will be the value 11111.
The values from the list will be in the column range of 21 to 80, with a , at the end.
The length of each row must be at most 80 characters.
Here is the input file:
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
87
85
86
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
108
111
109
112
110
113
115
114
117
116
118
124
125
120
122
123
126
132
127
133
128
130
131
135
136
137
138
139
140
141
142
143
144
145
146
148
147
149
150
151
152
153
154
155
156
158
157
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
184
183
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
214
Here is the desired output file:
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
111111 116,118,124-125,120,122-123,126,132,127,133,128,130-131,
111111 135-146,148,147,149-156,158,157,159-182,184,183,185-212,214,
Thanks in advance.
Presented without explanation: check the man pages for the commands used and come back with questions:
awk '
function printrange() { print start (start == last ? "" : "-" last) }
NR == 1 {start=last=$1; next}
$1 == last+1 {last=$1; next}
{printrange(); start=last=$1}
END {printrange()}
' file | paste -sd" " | fold -sw 60 | tr ' ' ',' | sed 's/^/111111 /'
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
111111 116,118,124-125,120,122-123,126,132,127,133,128,130-131,
111111 135-146,148,147,149-156,158,157,159-182,184,183,185-212,214
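For reference, here is the same pipeline with each stage annotated; the behaviour is unchanged:

awk '
# print the range collected so far, either "start" or "start-last"
function printrange() { print start (start == last ? "" : "-" last) }
NR == 1      { start = last = $1; next }          # the first value opens a range
$1 == last+1 { last = $1; next }                  # consecutive value: extend the range
             { printrange(); start = last = $1 }  # gap: close the range, open a new one
END          { printrange() }                     # flush the final range
' file |
paste -sd" " |       # join all ranges onto one space-separated line
fold -sw 60 |        # wrap at 60 columns, breaking at spaces
tr ' ' ',' |         # turn the space separators into commas
sed 's/^/111111 /'   # prefix every line with 111111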

Keep lines based on ratio between lines

I have a sort -g -k9 command on a file that gives me this on the bash standard output:
55.19 645 156 15 9 520 58 702 0.0 661
55.50 636 159 16 9 520 58 693 0.0 654
55.19 645 156 15 9 520 58 702 0.0 658
56.52 644 147 16 9 520 59 701 0.0 669
55.97 645 151 15 9 520 65 709 0.0 672
55.97 645 151 15 9 520 65 709 4e-124 674
28.32 671 301 32 1 507 48 702 3e-49 183
28.32 671 301 32 1 507 47 701 3e-49 183
31.40 516 247 24 86 507 196 698 1e-46 176
31.41 519 243 25 86 507 196 698 5e-46 175
27.72 588 290 26 19 481 98 675 2e-39 154
30.56 337 170 17 101 413 302 598 5e-20 96.3
30.56 337 170 17 101 413 302 598 8e-20 95.5
I would like to cut my data based on the 9th column. The idea would be to compare the value of the 9th column on line i with the value of the 9th column on line i+1 by dividing the former by the latter; if the ratio is 0, 0/0, or > 1e-50, lines i and i+1 are kept. As soon as one of these conditions is no longer fulfilled, stop reading. The desired output would be:
55.19 645 156 15 9 520 58 702 0.0 661
55.50 636 159 16 9 520 58 693 0.0 654
55.19 645 156 15 9 520 58 702 0.0 658
56.52 644 147 16 9 520 59 701 0.0 669
55.97 645 151 15 9 520 65 709 0.0 672
55.97 645 151 15 9 520 65 709 4e-124 674
I can obtain this output with head -n 6, but that is obviously not based on the condition on the values in the 9th column. Please note that the values are in 'scientific' format.
I know how to do this in Python (write the standard output to a file, calculate the ratios, etc.), but for convenience I'd prefer a shell-based solution (with awk or sort, for instance), although I don't know if that's possible. Thanks for your help!
Just exit the script when the condition is no longer met; otherwise, print the previous line and store the 9th field to compare on the next iteration (the NR > 1 guard avoids printing an empty line before the first record):
$ awk '($9 && prev/$9 > 1e-50) {exit} NR > 1 {print stored} {prev = $9; stored = $0}' file
55.19 645 156 15 9 520 58 702 0.0 661
55.50 636 159 16 9 520 58 693 0.0 654
55.19 645 156 15 9 520 58 702 0.0 658
56.52 644 147 16 9 520 59 701 0.0 669
55.97 645 151 15 9 520 65 709 0.0 672
55.97 645 151 15 9 520 65 709 4e-124 674

awk find the closest match of a list in a matrix

I am looking for common elements in two files, or more precisely, for the row of a matrix that has the most elements in common with a given row. What I have understood so far is how to compare fields: I get the lines which hold the same value in the same field number.
But how can I extend the search to the other field numbers?
awk 'NR==FNR{a[$1];next}$1 in a{print $1" "FNR}' file1 file2
104 3
Expected output:
104 3 111 4 117 2 134 2 148 - 156 4 166 4 176 3 186 - 198 1 221 6 236 -
best match row 4 with 3 elements common.
file 1
104 111 117 134 148 156 166 176 186 198 221 236
file 2
102 108 116 124 132 141 151 162 173 185 198 211
103 109 117 125 134 143 153 163 175 187 200 213
104 110 118 126 135 144 154 165 176 188 201 215
105 111 119 127 136 145 156 166 178 190 203 217
106 112 120 128 137 147 157 168 179 192 205 219
107 113 121 130 139 148 158 169 181 193 207 221
108 114 122 131 140 150 160 171 183 195 208 200
This solution assumes 1) that file1 contains unique values, as shown in the provided example, and 2) that there is only one top-ranked line in file2.
awk -v string="$(tr ' ' ',' < file1)" '
{ split(string, array, ","); cnt = 0               # the values from file1
  for (i in array)                                 # count matches anywhere in the row
      for (j = 1; j <= NF; j++) if (array[i] == $j) cnt++
  if (cnt > cntmax) { cntmax = cnt; NRmax = NR } } # remember the best-scoring row
END { print NRmax }' file2
4

R compare all list elements for duplicates

I am looking at all possible paths through a graph. I have written a DFS algorithm that finds all these paths. I want to make sure that my algorithm works correctly and that no two paths are identical. My algorithm returns a list that looks as follows:
....
[[2770]]
[1] 1 2 3 52 53 54 55 56 57 58 59 60 12 11 10 9 8 78 79 80 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129
[38] 130 131 132 133 134 137 138 139 140 141 142 143 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166
[[2771]]
[1] 1 2 3 52 53 54 55 56 57 58 59 60 12 11 10 9 8 78 79 80 113 114 115 143 144 145 146 147 148 149 150 151 152 153 154 155 156
[38] 157 158 159 160 161 162 163 164 165 166
[[2772]]
[1] 1 2 3 52 53 54 55 56 57 58 59 60 12 11 10 9 8 78 79 80 113 114 115 143 150 151 152 153 154 155 156 157 158 159 160 161 162
[38] 163 164 165 166
As you can see, the list is 2772 elements long. This means there are 2772 paths through this graph. How can I easily compare all the list elements to make sure there are no duplicates? Just to be clear, the same set of numbers in a different ordering represents a different path and is not a duplicate!
Thank you for your help!
Maybe something like:
test <- list(1:2, 3:4, 5:7, 1:10, 3:4, 4:3)  # 4:3 is reversed, so it is not a duplicate of 3:4
dups <- duplicated(test)                     # TRUE where an element repeats an earlier one
idups <- seq_along(test)[dups]               # indices of the duplicated elements
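Here duplicated() compares whole list elements, so only element 5 (the second 3:4) is flagged, while the reversed 4:3 is kept; that matches the requirement that ordering matters. Applied to the DFS output, with paths standing for the hypothetical 2772-element result list:

any(duplicated(paths))          # TRUE if at least two paths are identical
paths_unique <- unique(paths)   # the paths with duplicates removed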
