R compare all list elements for duplicates

R compare all list elements for duplicates - algorithm

I am looking at all possible paths through a graph. I have written a DFS algorithm that finds all these paths. I want to make sure that my algorithm works correctly and that no two paths are identical. My algorithm returns a list that looks as follows:
....
[[2770]]
[1] 1 2 3 52 53 54 55 56 57 58 59 60 12 11 10 9 8 78 79 80 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129
[38] 130 131 132 133 134 137 138 139 140 141 142 143 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166
[[2771]]
[1] 1 2 3 52 53 54 55 56 57 58 59 60 12 11 10 9 8 78 79 80 113 114 115 143 144 145 146 147 148 149 150 151 152 153 154 155 156
[38] 157 158 159 160 161 162 163 164 165 166
[[2772]]
[1] 1 2 3 52 53 54 55 56 57 58 59 60 12 11 10 9 8 78 79 80 113 114 115 143 150 151 152 153 154 155 156 157 158 159 160 161 162
[38] 163 164 165 166
As you can see, the list is 2772 elements long. This means there are 2,772 paths through this graph. How can I easily compare all the list elements to make sure there are no duplicates. Just to be clear, the same set of numbers but in a different ordering represents a different path and is not a duplicate!
Thank you for your help!

maybe something like
test<-list(1:2,3:4,5:7,1:10,3:4,4:3)
dups<-duplicated(test)
idups<-seq_along(test)[dups]

Related

When Cloudwatch Logs data is sent into kinesis data stream, what is its encoding format

I'm trying to write a Go program, to download data from aws kinesis data stream. I read that kinesis data stream encode the data with base64, so I need first decode with base64. However, I can't figure out what encoding was used on the data as it is passed, from cloudwatch logs to kinesis data stream.
I'm trying the different decoding method but none works. My unprocessed byte array downloaded from kinesis data stream is as the following:
[31 139 8 0 0 0 0 0 0 0 53 206 65 11 130 64 16 134 225 191 178 204 89 130 178 34 246 22 97 30 178 130 12 58 68 196 166 147 14 233 174 236 140 69 68 255 61 204 58 190 204 7 243 188 160 70 102 83 224 254 217 32 104 88 108 55 251 221 54 57 175 163 52 157 199 17 4 224 30 22 125 119 169 92 155 63 140 100 101 226 10 134 0 42 87 196 222 181 13 104 232 43 21 143 166 238 147 219 11 103 158 26 33 103 151 84 9 122 6 125 60 125 119 209 29 173 116 249 2 202 251 185 80 141 44 166 110 64 15 167 227 201 48 28 79 166 225 108 20 6 127 94 7 56 36 234 199 83 63 158 86 139 18 179 27 217 66 149 104 42 41 149 187 170 28 89 200 154 238 179 90 145 69 38 86 252 165 13 224 125 122 127 0 234 141 66 79 242 0 0 0]
Can someone give me some tips how to process this piece of data?

You can use a subscription filter with Kinesis, Lambda, or Kinesis Data Firehose. Logs that are sent to a receiving service through a subscription filter are base64 encoded and compressed with the gzip format.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html

Create a file according to sort contend

I have a list of more than 100000 records.
per example the values from 21 to 84 are continuous, then it will be 21-84 but if it is not continuous as the case 84 87, then it need to be 84,87 separated by ,
at beginning of each line will be the value 11111.
The values from the list will be in the column range of 21 to 80 with, at last.
The length of each row need to be maximum 80.
here is the input file.
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
87
85
86
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
108
111
109
112
110
113
115
114
117
116
118
124
125
120
122
123
126
132
127
133
128
130
131
135
136
137
138
139
140
141
142
143
144
145
146
148
147
149
150
151
152
153
154
155
156
158
157
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
184
183
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
214
here in the output file desired.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
111111 116,118,124-125,120,122-123,126,132,127,133,128,130-131,
111111 135-146,148,147,149-156,158,157,159-182,184,183,185-212,214,
thanks in advance

Presented without explanation: check the man pages for the commands used and come back with questions:
awk '
function printrange() { print start (start == last ? "" : "-" last) }
NR == 1 {start=last=$1; next}
$1 == last+1 {last=$1; next}
{printrange(); start=last=$1}
END {printrange()}
' file | paste -sd" " | fold -sw 60 | tr ' ' ',' | sed 's/^/111111 /'
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
111111 116,118,124-125,120,122-123,126,132,127,133,128,130-131,
111111 135-146,148,147,149-156,158,157,159-182,184,183,185-212,214

ruby use upto with range variable?

I'm trying to use a variable range with ruby, but my code does not work;
ruby -e ' input2=145..170 ; input3= input2.to_s.gsub(/(.*?)\.\.(.*?)/) { 5.upto($2.to_i) { |i| print i, " " } }; print input3' > zzmf
But I obtained 5170
This part fails:
5.upto($2.to_i) { |i| print i, " " }
I expected:
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 5170

I don't think gsub is what you need, try the match example below. [2] gets the second match from the regex /(\d+)..(\d+)/ applied to "147..170"
5.upto("147..170".match(/(\d+)\.\.(\d+)/)[2].to_i) { |i| print i, " "}
gsub is intended for string substitution.
https://ruby-doc.org/core-2.1.4/String.html#method-i-gsub

I see my code and I confuse in regular expression
I use this .*?
and the correct is this .*
(.*)/
ruby -e ' input2=145..170 ; input3= input2.to_s.gsub(/(.*?)\.\.(.*)/) { 5.upto($2.to_i) { |i| print i, " " } }; print input3' > zzmf
thanks for your responses

awk find the closest match of a list in a matrix

I look for common elements in two files or which row of a matrix has the most elements from a given row. what I understood until now is how to compare fields. I receive the lines which hold the same value in the same fieldnumber.
But how can I open the search to the other field numbers?
awk 'NR==FNR{a[$1];next}$1 in a{print $1" "FNR}' file1 file2
104 3
Expected output:
104 3 111 4 117 2 134 2 148 - 156 4 166 4 176 3 186 - 198 1 221 6 236 -
best match row 4 with 3 elements common.
file 1
104 111 117 134 148 156 166 176 186 198 221 236
file 2
102 108 116 124 132 141 151 162 173 185 198 211
103 109 117 125 134 143 153 163 175 187 200 213
104 110 118 126 135 144 154 165 176 188 201 215
105 111 119 127 136 145 156 166 178 190 203 217
106 112 120 128 137 147 157 168 179 192 205 219
107 113 121 130 139 148 158 169 181 193 207 221
108 114 122 131 140 150 160 171 183 195 208 200

This solution assumes 1) that file1 contains unique values as shown in the provided example and 2) there is only one top ranked line in file2.
awk -v string=$(cat file1 | tr " " ",") \
'{split(string,array,","); cnt=0;
for(i in array) {for(j=1;j<=NF;j++) if(array[i]==$j) cnt++};
if(cnt>cntmax) {cntmax=cnt; NRmax=NR}} END{print NRmax}' file2
4

Parsing the wbxml get from Exchange2013 server

Below is a full section of bytes stream I got from Exchange2013 server after sending activesync command 'itemoperations' from iPhone.
31 139 8 0 0 0 0 0 4 0 237 189 7 96 28 73 150 37 38 47 109 202 123 127 74 245 74 215 224 116 161 8 128 96 19 36 216 144 64 16 236 193 136 205 230 146 236 29 105 71 35 41 171 42 129 202 101 86 101 93 102 22 64 204 237 157 188 247 222 123 239 189 247 222 123 239 189 247 186 59 157 78 39 247 223 255 63 92 102 100 1 108 246 206 74 218 201 158 33 128 170 200 31 63 126 124 31 63 34 126 237 95 243 167 127 141 95 227 183 58 253 226 215 222 253 53 126 205 23 207 248 199 175 241 155 255 196 175 125 255 119 191 151 61 220 167 127 240 247 111 245 123 253 26 191 249 119 127 237 54 127 215 222 93 149 89 177 196 71 207 127 237 215 95 61 124 253 229 79 31 191 251 226 233 233 15 126 234 167 143 175 190 248 233 183 87 95 60 61 198 255 119 190 124 243 19 59 47 62 255 226 7 191 207 155 159 250 233 23 139 223 231 222 139 167 211 221 23 63 248 226 250 167 126 250 247 217 161 191 247 126 234 233 239 115 255 197 226 171 189 159 250 233 239 44 94 252 244 23 244 115 122 253 226 167 207 246 190 120 67 127 63 125 187 67 237 174 95 44 78 247 94 188 249 125 126 240 226 7 63 177 75 144 232 253 179 123 47 126 154 127 18 220 47 222 125 177 248 234 7 218 159 254 255 212 251 221 252 127 26 126 255 6 248 30 95 253 212 27 243 25 193 249 193 219 31 124 241 211 223 41 191 248 193 233 15 8 151 125 134 251 70 254 254 242 233 23 215 95 60 125 50 255 125 246 126 159 61 250 219 194 122 241 148 218 253 128 218 253 244 23 192 237 7 47 8 159 23 63 125 122 77 227 218 37 24 10 231 39 8 206 23 87 244 61 189 251 157 183 47 126 250 43 140 135 218 31 83 59 124 126 76 99 195 223 103 123 47 208 15 141 235 139 31 224 221 51 130 71 125 124 206 116 217 121 241 131 175 238 125 241 131 87 229 139 207 127 31 162 171 224 255 133 197 95 255 14 199 255 238 133 163 129 155 31 249 158 254 254 170 67 163 83 31 30 254 255 217 175 241 107 226 249 127 0 225 220 243 27 28 2 0 0
I don't know why it is not normal and expect it sholud start like this:
3 1 106 0 0 20 69 77 3 49 0 1 78 70 77 3 49 0 1 0 17 81 3 82 103 65 65 65 65 65 117 50 72 120 85 112 65 ..........
I think the response must have been encrypted, but what's the encryption algorithm?
Will be very appreciated for any ideas.

I have got the answer, the stream 31 139 8 0 0 0 0 0 4 0 237 189 7 96 28 73 150 37 ...... has been compressed by Gzip

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

R compare all list elements for duplicates - algorithm

maybe something like test<-list(1:2,3:4,5:7,1:10,3:4,4:3) dups<-duplicated(test) idups<-seq_along(test)[dups]

Related

When Cloudwatch Logs data is sent into kinesis data stream, what is its encoding format

Create a file according to sort contend

ruby use upto with range variable?

awk find the closest match of a list in a matrix

Parsing the wbxml get from Exchange2013 server

Categories

Resources