I'm trying to write a Go program, to download data from aws kinesis data stream. I read that kinesis data stream encode the data with base64, so I need first decode with base64. However, I can't figure out what encoding was used on the data as it is passed, from cloudwatch logs to kinesis data stream.
I'm trying the different decoding method but none works. My unprocessed byte array downloaded from kinesis data stream is as the following:
[31 139 8 0 0 0 0 0 0 0 53 206 65 11 130 64 16 134 225 191 178 204 89 130 178 34 246 22 97 30 178 130 12 58 68 196 166 147 14 233 174 236 140 69 68 255 61 204 58 190 204 7 243 188 160 70 102 83 224 254 217 32 104 88 108 55 251 221 54 57 175 163 52 157 199 17 4 224 30 22 125 119 169 92 155 63 140 100 101 226 10 134 0 42 87 196 222 181 13 104 232 43 21 143 166 238 147 219 11 103 158 26 33 103 151 84 9 122 6 125 60 125 119 209 29 173 116 249 2 202 251 185 80 141 44 166 110 64 15 167 227 201 48 28 79 166 225 108 20 6 127 94 7 56 36 234 199 83 63 158 86 139 18 179 27 217 66 149 104 42 41 149 187 170 28 89 200 154 238 179 90 145 69 38 86 252 165 13 224 125 122 127 0 234 141 66 79 242 0 0 0]
Can someone give me some tips how to process this piece of data?
You can use a subscription filter with Kinesis, Lambda, or Kinesis Data Firehose. Logs that are sent to a receiving service through a subscription filter are base64 encoded and compressed with the gzip format.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html
I have a list of more than 100000 records.
per example the values from 21 to 84 are continuous, then it will be 21-84 but if it is not continuous as the case 84 87, then it need to be 84,87 separated by ,
at beginning of each line will be the value 11111.
The values from the list will be in the column range of 21 to 80 with, at last.
The length of each row need to be maximum 80.
here is the input file.
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
87
85
86
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
108
111
109
112
110
113
115
114
117
116
118
124
125
120
122
123
126
132
127
133
128
130
131
135
136
137
138
139
140
141
142
143
144
145
146
148
147
149
150
151
152
153
154
155
156
158
157
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
184
183
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
214
here in the output file desired.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
111111 116,118,124-125,120,122-123,126,132,127,133,128,130-131,
111111 135-146,148,147,149-156,158,157,159-182,184,183,185-212,214,
thanks in advance
Presented without explanation: check the man pages for the commands used and come back with questions:
awk '
function printrange() { print start (start == last ? "" : "-" last) }
NR == 1 {start=last=$1; next}
$1 == last+1 {last=$1; next}
{printrange(); start=last=$1}
END {printrange()}
' file | paste -sd" " | fold -sw 60 | tr ' ' ',' | sed 's/^/111111 /'
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
111111 116,118,124-125,120,122-123,126,132,127,133,128,130-131,
111111 135-146,148,147,149-156,158,157,159-182,184,183,185-212,214
I'm trying to use a variable range with ruby, but my code does not work;
ruby -e ' input2=145..170 ; input3= input2.to_s.gsub(/(.*?)\.\.(.*?)/) { 5.upto($2.to_i) { |i| print i, " " } }; print input3' > zzmf
But I obtained 5170
This part fails:
5.upto($2.to_i) { |i| print i, " " }
I expected:
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 5170
I don't think gsub is what you need, try the match example below. [2] gets the second match from the regex /(\d+)..(\d+)/ applied to "147..170"
5.upto("147..170".match(/(\d+)\.\.(\d+)/)[2].to_i) { |i| print i, " "}
gsub is intended for string substitution.
https://ruby-doc.org/core-2.1.4/String.html#method-i-gsub
I see my code and I confuse in regular expression
I use this .*?
and the correct is this .*
(.*)/
ruby -e ' input2=145..170 ; input3= input2.to_s.gsub(/(.*?)\.\.(.*)/) { 5.upto($2.to_i) { |i| print i, " " } }; print input3' > zzmf
thanks for your responses
I look for common elements in two files or which row of a matrix has the most elements from a given row. what I understood until now is how to compare fields. I receive the lines which hold the same value in the same fieldnumber.
But how can I open the search to the other field numbers?
awk 'NR==FNR{a[$1];next}$1 in a{print $1" "FNR}' file1 file2
104 3
Expected output:
104 3 111 4 117 2 134 2 148 - 156 4 166 4 176 3 186 - 198 1 221 6 236 -
best match row 4 with 3 elements common.
file 1
104 111 117 134 148 156 166 176 186 198 221 236
file 2
102 108 116 124 132 141 151 162 173 185 198 211
103 109 117 125 134 143 153 163 175 187 200 213
104 110 118 126 135 144 154 165 176 188 201 215
105 111 119 127 136 145 156 166 178 190 203 217
106 112 120 128 137 147 157 168 179 192 205 219
107 113 121 130 139 148 158 169 181 193 207 221
108 114 122 131 140 150 160 171 183 195 208 200
This solution assumes 1) that file1 contains unique values as shown in the provided example and 2) there is only one top ranked line in file2.
awk -v string=$(cat file1 | tr " " ",") \
'{split(string,array,","); cnt=0;
for(i in array) {for(j=1;j<=NF;j++) if(array[i]==$j) cnt++};
if(cnt>cntmax) {cntmax=cnt; NRmax=NR}} END{print NRmax}' file2
4
I am looking at all possible paths through a graph. I have written a DFS algorithm that finds all these paths. I want to make sure that my algorithm works correctly and that no two paths are identical. My algorithm returns a list that looks as follows:
....
[[2770]]
[1] 1 2 3 52 53 54 55 56 57 58 59 60 12 11 10 9 8 78 79 80 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129
[38] 130 131 132 133 134 137 138 139 140 141 142 143 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166
[[2771]]
[1] 1 2 3 52 53 54 55 56 57 58 59 60 12 11 10 9 8 78 79 80 113 114 115 143 144 145 146 147 148 149 150 151 152 153 154 155 156
[38] 157 158 159 160 161 162 163 164 165 166
[[2772]]
[1] 1 2 3 52 53 54 55 56 57 58 59 60 12 11 10 9 8 78 79 80 113 114 115 143 150 151 152 153 154 155 156 157 158 159 160 161 162
[38] 163 164 165 166
As you can see, the list is 2772 elements long. This means there are 2,772 paths through this graph. How can I easily compare all the list elements to make sure there are no duplicates. Just to be clear, the same set of numbers but in a different ordering represents a different path and is not a duplicate!
Thank you for your help!
maybe something like
test<-list(1:2,3:4,5:7,1:10,3:4,4:3)
dups<-duplicated(test)
idups<-seq_along(test)[dups]