increasing range parsing challenge with awk - bash
I wrote this in response to Reddit's daily programmer challenge, and I would like to get some of your feedback on it to improve the code (it seems to work). The challenge is as follows:
We are given a list of numbers in a "short-hand" range notation where only the significant part of the next number is written because we know the numbers are always increasing (ex. "1,3,7,2,4,1" represents [1, 3, 7, 12, 14, 21]). Some people use different separators for their ranges (ex. "1-3,1-2", "1:3,1:2", "1..3,1..2" represent the same numbers [1, 2, 3, 11, 12]) and they sometimes specify a third digit for the range step (ex. "1:5:2" represents [1, 3, 5]).
NOTE: For this challenge range limits are always inclusive.
Our job is to return a list of the complete numbers.
The possible separators are: ["-", ":", ".."]
Sample input:
104..02
545,64:11
Sample output:
104 105 106...200 201 202 # truncated for simplicity
545 564 565 566...609 610 611 # truncated for simplicity
My solution:
BEGIN { FS = "," }
function next_value(current_value, previous_value) {
regexp = current_value "$"
while(current_value <= previous_value || !(current_value ~ regexp)) {
current_value += 10
}
return current_value;
}
{
j = 0
delete number_list
for(i = 1; i <= NF; i++) {
# handle fields with ranges
if($i ~ /-|:|\.\./) {
split($i, range, /-|:|\.\./)
if(range[1] > range[2]) {
if(j != 0) {
range[1] = next_value(range[1], number_list[j-1])
range[2] = next_value(range[2], range[1])
}
else
range[2] = next_value(range[2], range[1]);
}
if(range[3] == "")
number_to_iterate_by = 1;
else
number_to_iterate_by = range[3];
range_iterator = range[1]
while(range_iterator <= range[2]) {
number_list[j] = range_iterator
range_iterator += number_to_iterate_by
j++
}
}
else {
number_list[j] = $i
j++
}
}
# apply increasing range logic and print
for(i = 0; i < j; i++ ) {
if(i == 0) {
if(NR != 1) printf "\n"
current_value = number_list[i]
}
else {
previous_value = current_value
current_value = next_value(number_list[i], previous_value)
}
printf "%s ", current_value
}
}
END { printf "\n" }
This is BASH (Not AWK).
I believe it is a valid answer because the original challenge doesn't specify a language.
#!/bin/bash
mkord(){ local v=$1 dig base
max=$2
(( dig=10**${#v} , base=max/dig*dig , v+=base ))
while (( v < max )); do (( v+=dig )); done
max=$v
}
while read line; do
line="${line//[,\"]/ }" line="${line//[:-]/..}"
IFS=' ' read -a arr <<<"$line"
max=0 a='' res=''
for val in "${arr[#]//../ }"; do
IFS=" " read v1 v2 v3 <<<"$val"
(( a==0 )) && max=$v1
[[ $v1 ]] && mkord "$v1" "$max" && v1=$max
[[ $v2 ]] && mkord "$v2" "$max" && v2=$max
res=$res${a:+,}${v2:+\{}$v1${v2:+\.\.}$v2${v3:+\.\.}$v3${v2:+\}}
a=1
done
(( ${#arr[#]} > 1 )) && res={$res}
eval set -- $res
echo "\"$*\""
done <"infile"
If the source of the tests is:
$ cat infile
"1,3,7,2,4,1"
"1-3,1-2"
"1:5:2"
"104-2"
"104..02"
"545,64:11"
The result will be:
"1 3 7 12 14 21"
"1 2 3 11 12"
"1 3 5"
"104 105 106 107 108 109 110 111 112"
"104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202"
"545 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611"
This gets the list done in 7 milliseconds.
My solution using gawk, RT (It contains the input text that matched the text denoted by RS) and next_n function uses modulo operation for to find the next number based on the last
cat range.awk
BEGIN{
RS="\\.\\.|,|:|-"
start = ""
end = 0
temp = ""
}
function next_n(n, last){
mod = last % (10**length(n))
if(mod < n) return last - mod + n
return last + ((10**length(n))-mod) + n
}
{
if(RT==":" || RT==".." || RT=="-"){
if(start=="") start = next_n($1,end)
else temp = $1
}else{
if(start != ""){
if(temp==""){
end = next_n($1,start)
step = 1
}else {
end = next_n(temp,start)
step = $1
}
for(i=start; i<=end; i+=step) printf "%s ", i
start = ""
temp = ""
}else{
end = next_n($1,end)
printf "%s ", end
}
}
}
END{
print ""
}
TEST 1
echo "104..02" | awk -f range.awk
OUTPUT 1
104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202
TEST 2
echo "545,64:11" | awk -f range.awk
OUTPUT 2
545 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611
TEST 3
echo "2..5,7,2-1,2:1,0-3,2-7,8..0,4,4,2..1" | awk -f range.awk
OUTPUT 3
2 3 4 5 7 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 40 41 42 43 52 53 54 55 56 57 58 59 60 64 74 82 83 84 85 86 87 88 89 90 91
TEST 4 with step
echo "1:5:2,99,88..7..3" | awk -f range.awk"
OUTPUT 4
1 3 5 99 188 191 194 197
Related
Pick numbers from an array and see if they are even and lower than 380
I'm programing a bash to pick all numbers from a list and print all even and lower than 380. This is what i have so far, but i get error on if... Any help is welcome. n=(951 402 984 651 360 69 408 319 601 485 980 507 725 547 544 615 141 501 263 617 865 575 219 390 237 412 566 826 248 866 950 626 949 687 217 815 67 104 58 512 24 892 894 767 553) for (( i=0; i<${#n[#]}; i++ )) do b=${n[$i]} rem=$(( $b % 2 )) if (( $rem == 0 && b < 380 )) then echo "$b Par" else continue fi done
Create a file according to sort contend
I have a list of more than 100000 records. per example the values from 21 to 84 are continuous, then it will be 21-84 but if it is not continuous as the case 84 87, then it need to be 84,87 separated by , at beginning of each line will be the value 11111. The values from the list will be in the column range of 21 to 80 with, at last. The length of each row need to be maximum 80. here is the input file. 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 87 85 86 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 108 111 109 112 110 113 115 114 117 116 118 124 125 120 122 123 126 132 127 133 128 130 131 135 136 137 138 139 140 141 142 143 144 145 146 148 147 149 150 151 152 153 154 155 156 158 157 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 184 183 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 214 here in the output file desired. 111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 111111 116,118,124-125,120,122-123,126,132,127,133,128,130-131, 111111 135-146,148,147,149-156,158,157,159-182,184,183,185-212,214, thanks in advance
Presented without explanation: check the man pages for the commands used and come back with questions: awk ' function printrange() { print start (start == last ? "" : "-" last) } NR == 1 {start=last=$1; next} $1 == last+1 {last=$1; next} {printrange(); start=last=$1} END {printrange()} ' file | paste -sd" " | fold -sw 60 | tr ' ' ',' | sed 's/^/111111 /' 111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 111111 116,118,124-125,120,122-123,126,132,127,133,128,130-131, 111111 135-146,148,147,149-156,158,157,159-182,184,183,185-212,214
ruby use upto with range variable?
I'm trying to use a variable range with ruby, but my code does not work; ruby -e ' input2=145..170 ; input3= input2.to_s.gsub(/(.*?)\.\.(.*?)/) { 5.upto($2.to_i) { |i| print i, " " } }; print input3' > zzmf But I obtained 5170 This part fails: 5.upto($2.to_i) { |i| print i, " " } I expected: 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 5170
I don't think gsub is what you need, try the match example below. [2] gets the second match from the regex /(\d+)..(\d+)/ applied to "147..170" 5.upto("147..170".match(/(\d+)\.\.(\d+)/)[2].to_i) { |i| print i, " "} gsub is intended for string substitution. https://ruby-doc.org/core-2.1.4/String.html#method-i-gsub
I see my code and I confuse in regular expression I use this .*? and the correct is this .* (.*)/ ruby -e ' input2=145..170 ; input3= input2.to_s.gsub(/(.*?)\.\.(.*)/) { 5.upto($2.to_i) { |i| print i, " " } }; print input3' > zzmf thanks for your responses
awk find the closest match of a list in a matrix
I look for common elements in two files or which row of a matrix has the most elements from a given row. what I understood until now is how to compare fields. I receive the lines which hold the same value in the same fieldnumber. But how can I open the search to the other field numbers? awk 'NR==FNR{a[$1];next}$1 in a{print $1" "FNR}' file1 file2 104 3 Expected output: 104 3 111 4 117 2 134 2 148 - 156 4 166 4 176 3 186 - 198 1 221 6 236 - best match row 4 with 3 elements common. file 1 104 111 117 134 148 156 166 176 186 198 221 236 file 2 102 108 116 124 132 141 151 162 173 185 198 211 103 109 117 125 134 143 153 163 175 187 200 213 104 110 118 126 135 144 154 165 176 188 201 215 105 111 119 127 136 145 156 166 178 190 203 217 106 112 120 128 137 147 157 168 179 192 205 219 107 113 121 130 139 148 158 169 181 193 207 221 108 114 122 131 140 150 160 171 183 195 208 200
This solution assumes 1) that file1 contains unique values as shown in the provided example and 2) there is only one top ranked line in file2. awk -v string=$(cat file1 | tr " " ",") \ '{split(string,array,","); cnt=0; for(i in array) {for(j=1;j<=NF;j++) if(array[i]==$j) cnt++}; if(cnt>cntmax) {cntmax=cnt; NRmax=NR}} END{print NRmax}' file2 4
Parsing the wbxml get from Exchange2013 server
Below is a full section of bytes stream I got from Exchange2013 server after sending activesync command 'itemoperations' from iPhone. 31 139 8 0 0 0 0 0 4 0 237 189 7 96 28 73 150 37 38 47 109 202 123 127 74 245 74 215 224 116 161 8 128 96 19 36 216 144 64 16 236 193 136 205 230 146 236 29 105 71 35 41 171 42 129 202 101 86 101 93 102 22 64 204 237 157 188 247 222 123 239 189 247 222 123 239 189 247 186 59 157 78 39 247 223 255 63 92 102 100 1 108 246 206 74 218 201 158 33 128 170 200 31 63 126 124 31 63 34 126 237 95 243 167 127 141 95 227 183 58 253 226 215 222 253 53 126 205 23 207 248 199 175 241 155 255 196 175 125 255 119 191 151 61 220 167 127 240 247 111 245 123 253 26 191 249 119 127 237 54 127 215 222 93 149 89 177 196 71 207 127 237 215 95 61 124 253 229 79 31 191 251 226 233 233 15 126 234 167 143 175 190 248 233 183 87 95 60 61 198 255 119 190 124 243 19 59 47 62 255 226 7 191 207 155 159 250 233 23 139 223 231 222 139 167 211 221 23 63 248 226 250 167 126 250 247 217 161 191 247 126 234 233 239 115 255 197 226 171 189 159 250 233 239 44 94 252 244 23 244 115 122 253 226 167 207 246 190 120 67 127 63 125 187 67 237 174 95 44 78 247 94 188 249 125 126 240 226 7 63 177 75 144 232 253 179 123 47 126 154 127 18 220 47 222 125 177 248 234 7 218 159 254 255 212 251 221 252 127 26 126 255 6 248 30 95 253 212 27 243 25 193 249 193 219 31 124 241 211 223 41 191 248 193 233 15 8 151 125 134 251 70 254 254 242 233 23 215 95 60 125 50 255 125 246 126 159 61 250 219 194 122 241 148 218 253 128 218 253 244 23 192 237 7 47 8 159 23 63 125 122 77 227 218 37 24 10 231 39 8 206 23 87 244 61 189 251 157 183 47 126 250 43 140 135 218 31 83 59 124 126 76 99 195 223 103 123 47 208 15 141 235 139 31 224 221 51 130 71 125 124 206 116 217 121 241 131 175 238 125 241 131 87 229 139 207 127 31 162 171 224 255 133 197 95 255 14 199 255 238 133 163 129 155 31 249 158 254 254 170 67 163 83 31 30 254 255 217 175 241 107 226 249 127 0 225 220 243 27 28 2 0 0 I don't know why it is not normal and expect it sholud start like this: 3 1 106 0 0 20 69 77 3 49 0 1 78 70 77 3 49 0 1 0 17 81 3 82 103 65 65 65 65 65 117 50 72 120 85 112 65 .......... I think the response must have been encrypted, but what's the encryption algorithm? Will be very appreciated for any ideas.
I have got the answer, the stream 31 139 8 0 0 0 0 0 4 0 237 189 7 96 28 73 150 37 ...... has been compressed by Gzip