Filtering Outliers in a Signal - filter

Input Data
I have an array of values of a signal over time, but it contains many outliers and 0 values.
[0, 0, 0, 0, 0, 0, 0, 1085, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 303, 301, 303, 301, 301, 301, 301, 301, 0, 0, 300, 300, 300, 298, 298, 298, 296, 300, 300, 301, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 256, 255, 0, 0, 0, 0, 236, 238, 0, 234, 0, 0, 0, 228, 0, 0, 1078, 0, 0, 0, 1078, 1076, 1078, 1076, 1076, 1076, 1074, 1074, 1072, 1072, 1074, 1074, 1074, 1074, 1076, 1076, 1074, 1074, 1074, 1072, 1074, 1074, 1074, 1074, 1074, 1074, 1074, 1074, 1074, 1074, 0, 0, 0, 0, 1886, 1880, 1880, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1074, 1074, 1074, 1076, 1076, 1076, 1074, 1074, 1074, 1072, 1072, 1074, 1076, 1076, 1076, 1076, 1074, 1074, 1074, 1078, 1074, 1066, 1061, 1061, 1057, 1053, 1059, 1046, 0, 1042, 1035, 1033, 1029, 1027, 1021, 1018, 1014, 1008, 1005, 1003, 993, 990, 673, 673, 673, 673, 673, 956, 950, 946, 935, 931, 924, 920, 911, 903, 896, 890, 885, 879, 871, 864, 853, 847, 841, 832, 823, 963, 808, 802, 787, 967, 969, 759, 748, 735, 725, 710, 697, 682, 667, 652, 635, 622, 603, 583, 560, 540, 517, 493, 468, 438, 412, 380, 0, 450, 500, 549, 592, 628, 661, 690, 720, 742, 1096, 150, 800, 819, 832, 845, 862, 1181, 885, 892, 901, 909, 920, 930, 935, 941, 1190, 1188, 1186, 967, 975, 978, 1173, 1173, 991, 997, 1003, 1005, 1006, 1008, 1012, 1014, 1020, 1021, 1021, 1023, 1029, 1031, 1031, 1033, 1035, 1038, 1040, 1044, 1044, 1042, 1048, 1048, 1048, 1053, 1053, 1053, 1053, 1053, 0, 0, 0, 0, 0, 0, 0, 0, 1080, 0, 0, 0, 1074, 1074, 1076, 1074, 1074, 1070, 1076, 1078, 1078, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1083, 0, 0, 0, 1072, 1070, 1066, 1065, 1065, 1063, 1061, 1053, 0, 0, 1040, 1273, 1035, 1275, 1031, 1027, 1020, 1018, 1012, 1008, 999, 995, 991, 986, 982, 978, 971, 965, 961, 958, 1121, 1117, 935, 423, 446, 915, 903, 900, 894, 892, 873, 866, 858, 853, 845, 1447, 826, 811, 804, 729, 20, 18, 757, 828, 738, 806, 712, 701, 24, 678, 945, 654, 643, 633, 622, 609, 1081, 1091, 1108, 1125, 536, 1155, 1183, 1215, 1224, 1239, 1254, 420, 399, 378, 0, 0, 0, 515, 553, 585, 616, 645, 669, 
693, 710, 0, 0, 774, 789, 804, 815, 826, 845, 853, 866, 873, 1065, 894, 901, 905, 916, 924, 933, 939, 950, 952, 960, 960, 965, 973, 978, 982, 986, 991, 999, 1005, 1008, 1010, 1012, 1014, 1020, 1025, 1027, 1031, 836, 836, 836, 1042, 1040, 796, 1046, 1040, 753, 1040, 0, 0, 0, 1061, 0, 0, 1897, 1886, 1873, 1863, 1850, 0, 1828, 0, 1811, 1074, 1788, 1076, 0, 0, 1749, 1081, 1085, 1087, 1083, 1081, 1083, 1083, 0, 0, 1070, 1078, 1070, 1070, 1072, 1072, 1070, 1070, 1070, 1070, 0, 0, 1078, 1078, 1076, 1076, 1074, 1076, 0, 0, 1089, 1087, 1087, 1083, 1083, 1085, 1083, 1068, 1059, 0, 1053, 1055, 1053, 1048, 1837, 1320, 1044, 1316, 1312, 1036, 1051, 1027, 1025, 1044, 1291, 1282, 1042, 1018, 1036, 1035, 1005, 1228, 1029, 1027, 1190, 999, 1023, 1020, 1020, 1016, 1012, 1012, 1010, 1012, 1005, 997, 995, 993, 990, 986, 984, 980, 971, 1121, 969, 963, 960, 954, 950, 945, 1100, 939, 933, 926, 1121, 931, 946, 958, 969, 978, 986, 997, 1005, 1012, 1016, 1025, 1029, 1036, 1042, 1046, 1053, 1055, 1063, 1066, 1068, 1072, 1076, 1076, 1080, 1081, 1087, 1091, 1091, 1096, 1096, 1096, 1100, 1104, 1618, 1108, 1597, 1591, 1582, 1115, 1434, 1441, 1451, 1550, 1543, 1477, 1125, 1524, 1126, 1128, 1128, 1128, 1128, 1132, 1132, 1132, 1134, 1134, 1136, 1136, 1138, 1138, 1140, 1141, 1143, 1141, 1143, 1143, 1145, 1143, 1145, 1145, 1147, 1147, 1147, 1149, 1149, 1151, 1151, 1151, 1151, 1151, 1151, 1153, 1151, 1151, 1155, 1155, 1156, 1155, 1156, 1156, 1158, 1158, 1158, 1156, 1158, 1158, 1158, 1158, 1158, 1156, 1156, 1158, 1158, 1156, 1158, 1158, 1156, 1363, 1361, 1361, 1359, 1158, 1156, 1156, 1158, 1158, 1158, 1158, 1158, 1156, 1158, 1156, 1158, 1158, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1160, 1160, 1158, 1158, 1158, 1158, 1158, 1158, 1156, 1156, 1158, 1158, 1158, 1158, 1156, 1156, 1155, 1156, 1151, 1151, 1149, 1149, 1145, 1143, 1143, 1140, 1128, 1128, 1125, 1119, 0, 0, 1111, 1113, 1111, 1108, 1104, 1102, 1102, 1100, 1091, 1087, 1085, 1081, 1078, 1074, 1070, 1063, 1059, 1057, 1046, 1042, 1036, 1031, 1025, 1018, 1010, 
1005, 997, 995, 984, 978, 969, 958, 948, 945, 931, 924, 915, 905, 894, 881, 871, 860, 851, 841, 830, 825, 0, 0, 885, 885, 888, 1089, 890, 886, 885, 712, 701, 684, 0, 0, 652, 673, 699, 913, 965, 1008, 1021, 1006, 813, 828, 840, 1057, 860, 1065, 1074, 886, 894, 898, 911, 915, 920, 1087, 931, 1087, 939, 948, 948, 956, 963, 965, 969, 971, 973, 976, 980, 982, 984, 986, 991, 993, 993, 997, 997, 1001, 1005, 1006, 1006, 1029, 1010, 1014, 1012, 1014, 1014, 1016, 1018, 1016, 1018, 1018, 1065, 1065, 1063, 1063, 1063, 1059, 1059, 1059, 1059, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1053, 0, 1046, 1046, 1050, 1048, 1048, 1057, 1061, 1065, 1070, 1072, 1074, 1078, 1081, 1083, 1087, 1091, 1093, 1095, 1102, 1104, 1111, 1111, 1113, 1119, 1125, 1128, 1130, 1134, 1141, 1147, 1153, 1156, 1276, 1170, 1175, 1183, 1188, 1192, 1201, 1303, 1303, 1220, 1231, 1239, 1248, 1256, 1303, 1275, 0, 0, 0, 0, 1391, 0, 1413, 1408, 0, 0, 1520, 1501, 0, 1398, 1576, 1421, 1438, 1447, 1460, 1471, 0, 0, 0, 0, 1552, 1565, 1588, 1610, 1627, 1650, 0, 1651, 0, 1888, 1554, 0, 1458, 1438, 1413, 1404, 1385, 1372, 1363, 1348, 1335, 1323, 1314, 1305, 1297, 1291, 1280, 1275, 1271, 1258, 1256, 1250, 1243, 1239, 1233, 1230, 1222, 1218, 1216, 1215, 1211, 1205, 1201, 1200, 1196, 1192, 1185, 1183, 1181, 1179, 1175, 1175, 1170, 1170, 1170, 1166, 1158, 1158, 1155, 1155, 1151, 1151, 1149, 1147, 1147, 1143, 1138, 1138, 1138, 1134, 1136, 1136, 1130, 1130, 0, 1128, 1128, 1128, 1128, 1128, 1125, 1125, 1125, 1123, 1121, 1121, 1117, 1119, 1117, 1117, 1117, 1113, 1115, 1115, 1115, 1113, 1093, 1091, 1087, 1085, 1083, 1080, 1087, 1059, 1053, 1053, 1042, 1038, 1029, 1025, 1021, 1014, 1008, 1001, 993, 990, 976, 973, 965, 958, 941, 939, 931, 922, 913, 909, 896, 881, 873, 866, 855, 843, 834, 826, 0, 796, 778, 770, 751, 738, 729, 718, 706, 697, 686, 675, 661, 646, 635, 620, 607, 592, 579, 562, 547, 532, 513, 495, 478, 457, 442, 420, 401, 378, 354, 333, 307, 285, 258, 232, 202, 174, 144, 112, 80, 45, 0, 0, 0, 0, 0, 0, 
0, 0, 1083, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 607, 611, 609, 616, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Goal
The goal is to obtain the main underlying signal, as drawn here:
As a first improvement to extract the main signal, I replaced each 0 with the trend of data by calculating the slope between the intermediate non-zero points, and interpolating the values:
But, how can I remove outliers?
I have tried the following:
Same approach as replacing each 0 value, but also replacing a point whenever its difference from the last point is > 100. This approach does not work when there are multiple consecutive outliers, as can be seen in the interval [450-480].
Moving average filter: it does not get rid of outliers, only reduces them. The points that are not outliers but part of the main signal also get affected too much, even after testing different window sizes.
For each data point, checking when it deviates too much from the standard deviation and variance of the entire signal. However, as the current trend of a certain interval is quite different over time, this approach does not work.
I would preferably not hard code any values used for eliminating outliers, as other signals may not be within the same scale as this one. I used Python, but a solution in another language is also welcome. Thanks for helping me!

I would recommend trying a Fourier transform: look at the signal in the frequency domain, see whether there is a frequency you can remove, then transform back and check whether the signal holds up. This works if the outliers are at a different frequency than the signal (look up noise reduction).
I thought at first you could just discard the outliers by checking the difference from the "real" signal, but it's hard to tell what the real signal is, especially in your example (the signal itself has some big jumps). If you have some kind of model for the signal (for example, if it's a sinusoid), perhaps you can weigh whether a point is likely to be part of the signal or not.
If you discard the first part of the signal, though, capturing the rest might be possible by forcing the signal to be continuous, that is, not allowing it to have any jumps. (What is a jump? You'll have to compare each step with the slope of the preceding points and reject it if it exceeds that slope by more than some X.) It would not be perfect, though: for example, I assume it would go all the way down at the 200 mark. This could maybe be compensated for by removing the zeros first.
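As one possible way to avoid a hard-coded jump threshold, here is a minimal sketch of a rolling median filter with a MAD-based cutoff (often called a Hampel filter). This is a swapped-in technique, not the Fourier or continuity idea described above, and the names hampel, half_window and n_sigmas are made up for illustration. It assumes the zeros have already been interpolated away, as in the question, and that outliers are a minority inside each window.

```python
from statistics import median

def hampel(values, half_window=5, n_sigmas=3.0):
    """Replace points that deviate strongly from their local median.

    half_window and n_sigmas are illustrative defaults, not tuned values.
    """
    cleaned = list(values)
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        # Median absolute deviation; 1.4826 rescales it to be comparable
        # to a standard deviation for normally distributed data.
        mad = 1.4826 * median([abs(v - med) for v in window])
        if abs(values[i] - med) > n_sigmas * mad:
            cleaned[i] = med  # treat as an outlier, use the local median
    return cleaned

signal = [300, 301, 299, 300, 1085, 300, 302, 301, 300, 299, 301]
print(hampel(signal, half_window=3))
```

Because the cutoff is derived from the local spread of the data, it scales with the signal. Long runs of consecutive outliers (longer than half the window) will still defeat it, so the window size remains a judgment call.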

Related

Fitting logistic curve

I want to use FindFit to the logistic population model which I define as
model2 = L/(1 + (L/P0 - 1) e^(-kt))
on the data
data = {19, 39, 46, 73, 92, 109, 137, 160, 177, 202, 230, 257, 299, 342,
384, 419, 464, 511, 553, 597, 646, 684, 734, 779, 814, 851, 895, 929,
962, 988, 1011, 1040, 1069, 1110, 1141, 1165, 1195, 1212, 1226, 1247,
1269, 1288, 1303, 1318, 1332, 1341, 1354, 1367}
but I get this Error. I am using FindFit as follows
fit = FindFit[data, model2, {P0, L, k}, t]
The data is supposed to represent population size at different days, so 19 corresponds to population at day 1, 39 is population at day 2, etc.
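Use capital E, as in E^(-k t). Note also the space between k and t: without it, kt is read as a single symbol rather than the product of k and t.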
See https://reference.wolfram.com/language/ref/E.html

how to write a function to find prime numbers in a range in RStudio

I want to write a function to ask a user for a range and then return the prime numbers in that range.
But I don't know how to define the inputs.
prime<-function(x,y){
  u<-range(x:y)
  for (u in range){
    for(j in 2:u-1){
      if (u%%j==0){
        print("u is prime")
      }
    }
  }
}
Help me to edit this code. Thank you.
For small numbers (less than 1 million), implement trial division by primes less than 1000:
primes_less_than_1000 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613,
617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]
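def is_prime(n):
    # Only valid for n below 1 million: any composite in that range
    # has a prime factor below 1000.
    if n < 2:
        return 0
    if n in primes_less_than_1000:
        return 1
    # Trial division; this also rejects composites below 1000.
    for p in primes_less_than_1000:
        if n % p == 0:
            return 0
    return 1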
Now that we have a prime-determining function, we can test any range below 1 million:
def primes_in_range(lower,upper):
    primes = []
    for n in range(lower,upper):
        if is_prime(n)==1:
            primes.append(n)
    return primes
For example:
print(primes_in_range(1000,1100))
>>> [1009, 1013, 1019, 1021, 1031, 1033, 1039, 1049, 1051, 1061, 1063, 1069, 1087, 1091, 1093, 1097]
If you want primes above 1 million, the list of primes below 1000 is no longer enough and plain trial division becomes slow, so you should look at https://primes.utm.edu/ for other prime or probable prime algorithms.
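For ranges that go somewhat beyond 1 million, a minimal sketch that divides by every odd number up to the square root of n (instead of a fixed table) could look like this. The name is_prime_sqrt is made up for illustration, and this is still plain trial division, so it is only reasonable for moderately sized numbers.

```python
from math import isqrt

def is_prime_sqrt(n):
    """Trial division by 2 and by odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

def primes_in_range_sqrt(lower, upper):
    # Like range(), the upper bound is excluded.
    return [n for n in range(lower, upper) if is_prime_sqrt(n)]

print(primes_in_range_sqrt(1000, 1100))
```

A number such as 1009 * 1013 has no prime factor below 1000, which is why the fixed table stops being sufficient above 1 million while this version still rejects it.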

Python. Homework: Not working due to efficiency with large inputs

Homework: return the maximal sum of k consecutive elements in a list. I have tried the following 3, which work for 6 of the 7 tests by which the solution is verified. The 7th test is a very long input with a very large k value. I cannot put the input list in because the shown list is truncated due to its length. Here are the 3 methods I tried. Reiterating, each timed out, while the last one also gave me a SyntaxError.
Method 1: [verbose]
def arrayMaxConsecutiveSum(inputArray, k):
    sum_array = []
    for i in range(len(inputArray)-(k+1)):
        sum_array.append(sum(inputArray[i:i+k]))
    return max(sum_array)
Method 2: [one line = efficiency??]
def arrayMaxConsecutiveSum(inputArray, k):
    return max([sum(inputArray[i:i+k]) for i in range(len(inputArray)-(k+1))])
Method 3: Lambda call
def arrayMaxConsecutiveSum(inputArray, k):
    f = lambda data, n: [data[i:i+n] for i in range(len(data) - n + 1)]
    sum_array = [sum(val) for val in f(inputArray,k)]
    return max(sum_array)
Some examples of inputs and (correct) outputs:
IN:[2, 3, 5, 1, 6]
k: 2 OUT: 8
IN:[2, 4, 10, 1]
k: 2 OUT: 14
IN: [1, 3, 4, 2, 4, 2, 4]
k: 4 OUT: 13
Again, I would like to mention that I passed the other tests (6 was very long with a large k value as well[k was an order of magnitude smaller than 7's, however]) and just need to identify a method or a revision that would be more efficient/make these more efficient. Lastly, I would like to add that I attempted both 6 and 7 with the (truncated) inputs on IDLE3 and each produced a ValueError:
Traceback (most recent call last):
File "/Users/ryanflynn/arrmaxconsecsum.py", line 15, in <module>
962, 244, 390, 854, 406, 457, 160, 612, 693, 896, 800, 670, 776, 65, 81, 336, 305, 262, 877, 217, 50, 835, 307, 865, 774, 163, 556, 186, 734, 404, 610, 621, 538, 370, 153, 105, 816, 172, 149, 404, 634, 105, 74, 303, 304, 145, 592, 472, 778, 301, 480, 693, 954, 628, 355, 400, 327, 916, 458, 599, 157, 424, 957, 340, 51, 60, 688, 325, 456, 148, 189, 365, 358, 618, 462, 125, 863, 530, 942, 978, 898, 858, 671, 527, 877, 614, 826, 163, 380, 442, 68, 825, 978, 965, 562, 724, 553, 18, 554, 516, 694, 802, 650, 434, 520, 685, 581, 445, 441, 711, 757, 167, 594, 686, 993, 543, 694, 950, 812, 765, 483, 474, 961, 566, 224, 879, 403, 649, 27, 205, 841, 35, 35, 816, 723, 276, 984, 869, 502, 248, 695, 273, 689, 885, 157, 246, 684, 642, 172, 313, 683, 968, 29, 52, 915, 800, 608, 974, 266, 5, 252, 6, 15, 725, 788, 137, 200, 107, 173, 245, 753, 594, 47, 795, 477, 37, 904, 4, 781, 804, 352, 460, 244, 119, 410, 333, 187, 231, 48, 560, 771, 921, 595, 794, 925, 35, 312, 561, 173, 233, 669, 300, 73, 977, 977, 591, 322, 187, 199, 817, 386, 806, 625, 500, 1, 294, 40, 271, 306, 724, 713, 600, 126, 263, 591, 855, 976, 515, 850, 219, 118, 921, 522, 587, 498, 420, 724, 716],6886)
File "/Users/ryanflynn/arrmaxconsecsum.py", line 6, in arrayMaxConsecutiveSum
return max(sum_array)
ValueError: max() arg is an empty sequence
(Note: this used method 3) I checked with print statements both the value for f(inputArray,k) and sum_array: [] Any help would be appreciated :)
Try:
def arrayMaxConsecutiveSum(inputArray, k):
    S = sum(inputArray[:k])
    M = S
    for i in range(len(inputArray) - k):
        S += ( inputArray[i+k] - inputArray[i])
        if M < S:
            M = S
    return M
S stands for sum and M stands for max.
This solution has a complexity of O(n), whereas yours has O(n*k).
You are summing k numbers n-k times, whereas I am summing 3 numbers n times.
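As a sketch of the same sliding-window idea with one extra guard: the ValueError in the question presumably comes from k being larger than the (truncated) list, which leaves no window to sum. The function name max_consecutive_sum and the choice to return None in that case are assumptions for illustration, not part of the original task.

```python
def max_consecutive_sum(values, k):
    """Maximal sum of k consecutive elements, or None if no window fits."""
    if k <= 0 or k > len(values):
        return None  # assumption: report "no window" instead of raising
    window = sum(values[:k])  # sum of the first window
    best = window
    for i in range(len(values) - k):
        # Slide the window one step: add the entering element,
        # subtract the leaving one.
        window += values[i + k] - values[i]
        best = max(best, window)
    return best

print(max_consecutive_sum([2, 3, 5, 1, 6], 2))
print(max_consecutive_sum([1, 3, 4, 2, 4, 2, 4], 4))
```

Each step does a constant amount of work, so the whole call is linear in the length of the list regardless of k.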

How to generate sequence of numbers matching a pattern?

I need to generate a sequence of all numbers, matching a pattern in a certain number range. For example:
range start: 1
range end: 100
pattern: *9* (i.e. in regexp [0-9]*9[0-9]*)
expected result: [9, 19, 29, 39, 49, 59, 69, 79, 89, 90, ..., 99]
Of course I could use a brute-force approach where I loop through all numbers of the range and test each number against the pattern. Here an example implementation done in Python:
def brute_force(start, end, limit, pattern):
    if start < pattern:
        start = pattern
    if pattern > end:
        return []
    pattern_str = str(pattern)
    generator = (n for n in range(start, end) if pattern_str in str(n))
    return list(next(generator) for _ in range(limit))
Unfortunately, the range is quite large (10^7 numbers) and I have to do that frequently in my program with changing patterns. Therefore, I need a more efficient approach.
It is important to note that I need to have a sorted list and that I often only need the X first matching numbers (see limit parameter in example above).
I guess there is some kind of standard algorithm for this kind of problem, but I don't know what to search for. Any ideas?
Here's a general solution that generates numbers up to a fixed size matching *(pat)* where pat is any sequence of digits.
It generates the numbers in increasing order, without duplicates. Because it's a generator, one can easily use it to generate the first few results without generating the entire list. (See the last example below).
def numbers(pat, k, matched=1):
    if (matched >> len(pat)) & 1:
        for x in xrange(10 ** k):
            yield x
        return
    if not ((matched << k) >> len(pat)):
        return
    for d in xrange(10):
        nm = 1
        for i, pd in enumerate(pat):
            if pd == str(d) and (matched >> i) & 1:
                nm |= 1 << (i+1)
        for n in numbers(pat, k - 1, nm):
            yield 10 ** (k - 1) * d + n
It works by recording how much of pat has been matched so far. To avoid backtracking when pat has been partially matched and the next digit does not match, it keeps a bitmap of all possible partial matches so far, in much the same way that a (non-backtracking) regexp matcher does.
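For readers on Python 3, here is a sketch of the same bitmap idea ported from the Python 2 code above (range instead of xrange, yield from for the fully matched case), with comments on what each bit means. It assumes the pattern does not start with '0', since numbers are treated as zero-padded to k digits and a leading '0' in the pattern would match the padding. The helper numbers_slow is only there for cross-checking.

```python
from itertools import islice

def numbers(pat, k, matched=1):
    """Yield, in increasing order, every n < 10**k whose digits contain pat.

    Bit i of `matched` is set when the last i digits chosen so far
    equal the first i characters of pat (bit 0 is always set).
    """
    if (matched >> len(pat)) & 1:
        # The whole pattern has been seen: any remaining digits will do.
        yield from range(10 ** k)
        return
    if not ((matched << k) >> len(pat)):
        # Too few digits left to finish even the longest partial match.
        return
    for d in range(10):
        nm = 1
        for i, pd in enumerate(pat):
            # Extend every partial match of length i that digit d continues.
            if pd == str(d) and (matched >> i) & 1:
                nm |= 1 << (i + 1)
        for n in numbers(pat, k - 1, nm):
            yield 10 ** (k - 1) * d + n

def numbers_slow(pat, k):
    # Brute-force reference, used only to check the generator.
    return [n for n in range(10 ** k) if pat in str(n)]

print(list(islice(numbers('100000', 20), 5)))
```

Because the generator is lazy, islice stops it after the requested number of results, so the size of the range does not matter when only the first few matches are needed.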
Here's some test code, that checks it against the slow but obviously correct implementation.
def numbers_slow(pat, k):
    for i in xrange(10 ** k):
        if pat in str(i):
            yield i
test_cases = [
    ('9', 3),
    ('9', 4),
    ('123', 4),
    ('22', 4),
]
for pat, k in test_cases:
    got = list(numbers(pat, k))
    want = list(numbers_slow(pat, k))
    if got != want:
        print 'numbers(%s, %d) = %s, want %s' % (pat, k, got, want)
And here's the example given in the question:
print list(numbers('9', 3))
[9, 19, 29, 39, 49, 59, 69, 79, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 109,
119, 129, 139, 149, 159, 169, 179, 189, 190, 191, 192, 193, 194, 195, 196, 197,
198, 199, 209, 219, 229, 239, 249, 259, 269, 279, 289, 290, 291, 292, 293, 294,
295, 296, 297, 298, 299, 309, 319, 329, 339, 349, 359, 369, 379, 389, 390, 391,
392, 393, 394, 395, 396, 397, 398, 399, 409, 419, 429, 439, 449, 459, 469, 479,
489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 509, 519, 529, 539, 549,
559, 569, 579, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 609, 619,
629, 639, 649, 659, 669, 679, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698,
699, 709, 719, 729, 739, 749, 759, 769, 779, 789, 790, 791, 792, 793, 794, 795,
796, 797, 798, 799, 809, 819, 829, 839, 849, 859, 869, 879, 889, 890, 891, 892,
893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908,
909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924,
925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940,
941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956,
957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972,
973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988,
989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999]
Here's a less easy case:
print list(numbers('11', 3))
[11, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 211, 311, 411, 511, 611,
711, 811, 911]
And a case that shows the method is efficient even for large numbers of digits:
print list(numbers('121212121', 10))
[121212121, 1121212121, 1212121210, 1212121211, 1212121212, 1212121213,
1212121214, 1212121215, 1212121216, 1212121217, 1212121218, 1212121219,
2121212121, 3121212121, 4121212121, 5121212121, 6121212121, 7121212121,
8121212121, 9121212121]
And to demonstrate the ability to limit the generated results, here's the first 21 numbers up to 20 digits long that contain '100000':
import itertools
print list(itertools.islice(numbers('100000', 20), 21))
[100000, 1000000, 1000001, 1000002, 1000003, 1000004, 1000005, 1000006, 1000007,
1000008, 1000009, 1100000, 2100000, 3100000, 4100000, 5100000, 6100000, 7100000,
8100000, 9100000, 10000000]
Since you only need the first x numbers, let's look at it from the point of view of the number of digits.
One digit is easy.
For two digits, get all options for the additional digits needed (8 possibilities). Then, working from the smallest to the largest, get the combination with the smallest value. Then iterate again with the combination with the bigger value.
The case for three digits or more is the same as the case for two digits.
Example - three digits:
counter = 0;
options = generate_all_combinations_for_k_digits_in_order(2) //for the 3 digit case.
//options = [10, 11, 12, ..., 20, 21, ..., 91, 92, ..., 99]
for (int i = 0; i < number_of_digits - 1; i++)
{
    for (int j = 0; j < options.length; j++)
    {
        print_9_in_position_i_of_number_j(i,j);
        counter++;
        if (counter == x)
            break; // End the loop somehow...
    }
}
Here's the pattern:
1009
^-- 0-8
1090
^-- 0-8
1109
^-- 0-8
1190
^-- 0-8
...
1209
^-- 2-8 and the rightmost two digit pattern repeats till 1900
1900
01-99 this is different
--------------------------------------------------------------
^-- 2-8 Now the whole previous structure repeats for each thousand until 9000
9000
^-- 001-999
We can formulate a recursion for n < 10^m and n has m digits:
f(n = 10^m - 1)
=> 9*10^(m-1) + 1..10^(m-2)-1 // that's like the 9000 + (001..999)
=> 8*10^(m-1) + f(n-1) // now for each leftmost digit down to 1,
=> ... // we append all the results from f(n-1)
=> 1*10^(m-1) + f(n-1)
Ruby code:
def f m
  if m == 1
    return [9]
  end
  result = []
  ((10**m).to_i - 1).downto(9 * 10**(m-1).to_i).each do |i|
    result << i
  end
  temp = 10**(m-1)
  8.downto(0).each do |i|
    f(m-1).each do |j|
      result << (i * temp + j)
    end
  end
  result
end
Results:
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
=> :f
> f 3
=> [999, 998, 997, 996, 995, 994, 993, 992, 991, 990, 989, 988, 987, 986, 985, 984, 983
, 982, 981, 980, 979, 978, 977, 976, 975, 974, 973, 972, 971, 970, 969, 968, 967, 966
, 965, 964, 963, 962, 961, 960, 959, 958, 957, 956, 955, 954, 953, 952, 951, 950, 949
, 948, 947, 946, 945, 944, 943, 942, 941, 940, 939, 938, 937, 936, 935, 934, 933, 932
, 931, 930, 929, 928, 927, 926, 925, 924, 923, 922, 921, 920, 919, 918, 917, 916, 915
, 914, 913, 912, 911, 910, 909, 908, 907, 906, 905, 904, 903, 902, 901, 900, 899, 898
, 897, 896, 895, 894, 893, 892, 891, 890, 889, 879, 869, 859, 849, 839, 829, 819, 809
, 799, 798, 797, 796, 795, 794, 793, 792, 791, 790, 789, 779, 769, 759, 749, 739, 729
, 719, 709, 699, 698, 697, 696, 695, 694, 693, 692, 691, 690, 689, 679, 669, 659, 649
, 639, 629, 619, 609, 599, 598, 597, 596, 595, 594, 593, 592, 591, 590, 589, 579, 569
, 559, 549, 539, 529, 519, 509, 499, 498, 497, 496, 495, 494, 493, 492, 491, 490, 489
, 479, 469, 459, 449, 439, 429, 419, 409, 399, 398, 397, 396, 395, 394, 393, 392, 391
, 390, 389, 379, 369, 359, 349, 339, 329, 319, 309, 299, 298, 297, 296, 295, 294, 293
, 292, 291, 290, 289, 279, 269, 259, 249, 239, 229, 219, 209, 199, 198, 197, 196, 195
, 194, 193, 192, 191, 190, 189, 179, 169, 159, 149, 139, 129, 119, 109, 99, 98, 97
, 96, 95, 94, 93, 92, 91, 90, 89, 79, 69, 59, 49, 39, 29, 19, 9]
> f(4).length
=> 3439
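The length reported for f(4) can be cross-checked independently: of the 10^m digit strings of length m, 9^m avoid the digit 9 entirely, so 10^m - 9^m numbers below 10^m contain at least one 9. A small brute-force sketch of that check (the function name count_with_nine is made up for illustration):

```python
def count_with_nine(m):
    """Count the numbers below 10**m whose decimal form contains a 9."""
    return sum('9' in str(n) for n in range(10 ** m))

for m in range(1, 5):
    # Compare the brute-force count with the closed form 10^m - 9^m.
    print(m, count_with_nine(m), 10 ** m - 9 ** m)
```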
My attempt at a "next lexicographic" algorithm for a fixed length string.
JavaScript code:
function f(str,pat){
  if (str[0] == 0
      || isNaN(Number(str))
      || isNaN(Number(pat))
      || str.length == pat.length
      || str == String(Math.pow(10,str.length - pat.length) - 1 + pat) ){
    return str;
  }
  var AP = "",
      i = 0;
  while (!AP.match(pat) && i < str.length){
    AP += str[i++];
  }
  // if there are digits to the right of the pattern
  if (AP.length < str.length){
    // increment the string if the pattern won't break
    if (String(Number(str) + 1).match(pat)){
      return Number(str) + 1;
    // otherwise, find first number that may be incremented
    } else {
      var i = str.length - pat.length - 1
            + (pat.match(/^9+/) ? pat.match(/9+/)[0].length : 0)
      while (str[i] == 9 && i-- >= 0){
      }
      // increment at i, move pattern to the right, set zeros in between
      return (Number(str.substr(0,i + 1)) + 1)
             * Math.pow(10,str.length - pat.length)
             + Number(pat)
    }
  // if the pattern is all the way on the right
  } else {
    // find rightmost placement for the pattern along an adjacent
    // match where the string could be incremented
    i = 1;
    for (;i<pat.length; i++){
      var j = str.length - pat.length - i,
          patShifted = Number(pat) * Math.pow(10,i);
      if (str.substr(j,i) == pat.substr(0,i)
          && patShifted > Number(str.substr(j))){
        return Number(str.substr(0,j) + String(patShifted));
      }
    }
    // if no left-shifted placement was found
    return (Number(str.substr(0,str.length - pat.length)) + 1)
           * Math.pow(10,pat.length) + Number(pat);
  }
}

Union of two arrays equals the first array? [closed]

I am doing a problem from projecteuler.com. The question is this:
If we list all the natural numbers below 10 that are multiples of 3 or
5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the
sum of all the multiples of 3 or 5 below 1000.
I thought of creating arrays of multiples for each 3 and 5 up to 1000 and taking the union of them, which doesn't leave me with duplicates (so I don't need to call array.uniq). What I've written is this:
def get_range(range, step)
  ret = []
  range.step(step) { |i| ret << i }
  return ret
end
p get_range(0..1000, 3) | get_range(0..1000, 5)
This comes out with this result:
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99, 102, 105, 108, 111, 114, 117, 120, 123, 126, 129, 132, 135, 138, 141, 144, 147, 150, 153, 156, 159, 162, 165, 168, 171, 174, 177, 180, 183, 186, 189, 192, 195, 198, 201, 204, 207, 210, 213, 216, 219, 222, 225, 228, 231, 234, 237, 240, 243, 246, 249, 252, 255, 258, 261, 264, 267, 270, 273, 276, 279, 282, 285, 288, 291, 294, 297, 300, 303, 306, 309, 312, 315, 318, 321, 324, 327, 330, 333, 336, 339, 342, 345, 348, 351, 354, 357, 360, 363, 366, 369, 372, 375, 378, 381, 384, 387, 390, 393, 396, 399, 402, 405, 408, 411, 414, 417, 420, 423, 426, 429, 432, 435, 438, 441, 444, 447, 450, 453, 456, 459, 462, 465, 468, 471, 474, 477, 480, 483, 486, 489, 492, 495, 498, 501, 504, 507, 510, 513, 516, 519, 522, 525, 528, 531, 534, 537, 540, 543, 546, 549, 552, 555, 558, 561, 564, 567, 570, 573, 576, 579, 582, 585, 588, 591, 594, 597, 600, 603, 606, 609, 612, 615, 618, 621, 624, 627, 630, 633, 636, 639, 642, 645, 648, 651, 654, 657, 660, 663, 666, 669, 672, 675, 678, 681, 684, 687, 690, 693, 696, 699, 702, 705, 708, 711, 714, 717, 720, 723, 726, 729, 732, 735, 738, 741, 744, 747, 750, 753, 756, 759, 762, 765, 768, 771, 774, 777, 780, 783, 786, 789, 792, 795, 798, 801, 804, 807, 810, 813, 816, 819, 822, 825, 828, 831, 834, 837, 840, 843, 846, 849, 852, 855, 858, 861, 864, 867, 870, 873, 876, 879, 882, 885, 888, 891, 894, 897, 900, 903, 906, 909, 912, 915, 918, 921, 924, 927, 930, 933, 936, 939, 942, 945, 948, 951, 954, 957, 960, 963, 966, 969, 972, 975, 978, 981, 984, 987, 990, 993, 996, 999, 5, 10, 20, 25, 35, 40, 50, 55, 65, 70, 80, 85, 95, 100, 110, 115, 125, 130, 140, 145, 155, 160, 170, 175, 185, 190, 200, 205, 215, 220, 230, 235, 245, 250, 260, 265, 275, 280, 290, 295, 305, 310, 320, 325, 335, 340, 350, 355, 365, 370, 380, 385, 395, 400, 410, 415, 425, 430, 440, 445, 455, 460, 470, 475, 485, 490, 500, 505, 515, 520, 530, 535, 545, 550, 560, 565, 
575, 580, 590, 595, 605, 610, 620, 625, 635, 640, 650, 655, 665, 670, 680, 685, 695, 700, 710, 715, 725, 730, 740, 745, 755, 760, 770, 775, 785, 790, 800, 805, 815, 820, 830, 835, 845, 850, 860, 865, 875, 880, 890, 895, 905, 910, 920, 925, 935, 940, 950, 955, 965, 970, 980, 985, 995]
which is the first array. If I swap the order of the ranges, then I get the array with the multiples of 5. I tried something like this on IRB:
[1,3,5] | [3, 5, 7]
# => [1, 3, 5, 7]
Am I missing something, am I just going insane, or have I encountered a bug in Ruby?
Your array is correct. It contains both multiples of 3 and 5. Look at the end. It's just not sorted.
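To make the ordering visible, here is a small Python sketch that mimics what Ruby's | does on arrays: it keeps the elements of the first array in order, then appends the elements of the second array that were not already present. It uses ranges that stop below 1000, as the Project Euler problem states, and the variable names are made up for illustration.

```python
threes = list(range(0, 1000, 3))
fives = list(range(0, 1000, 5))

# Order-preserving union: dict keys keep first-seen order and drop duplicates.
union = list(dict.fromkeys(threes + fives))

print(union[:4])          # starts like the multiples of 3
print(union[-4:])         # the extra multiples of 5 sit at the end
print(sorted(union)[:6])  # sorting shows both kinds interleaved
print(sum(union))         # the sum does not depend on the order
```

The beginning of the result looks exactly like the list of multiples of 3, which is why the output in the question appears to be "the first array" until you scroll to the end.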
