Algorithm question
Given the following array: list = [1, 3, 6, 8, 12, 18, 25, 28, 30, 40, 45, 50, 60, 68, 78, 88, 98, 128, 158, 198, 248, 298, 348, 418, 488, 548, 588, 618, 648, 698, 798, 818, 848, 898, 998, 1048, 1098, 1148, 1198, 1248, 1298, 1398, 1448, 1498, 1598, 1648, 1998, 2298, 2598, 2998, 3298, 3998, 4498, 4998, 5898, 6498]. The target value is a number. You need to select n numbers from list whose sum equals target, where n is in the range [1, 10] and items in the list may be selected repeatedly. For example:
Example 1: assuming target = 10,
✅ Possible outcomes are as follows:
Result 1: 10*1 = 10, => [1,1,1,1,1,1,1,1,1,1]
Result 2: 1*8 + 2*1 = 10, => [8,1,1]
Result 3: 1*6 + 1*3 + 1*1 = 10, => [6,3,1]
Result 4: 3*3 + 1*1 = 10, => [3,3,3,1]
...
Example 2: assuming target = 20,
✅ Possible outcomes are as follows:
Result 1: 1*18 + 2*1 = 20, => [18,1,1]
Result 2: 1*12 + 1*8 = 20, => [12,8]
...
❌ Bad Result:
Twenty 1s: against the rules, since at most 10 numbers may be picked
[1,2,5]: the sum does not equal the target
Note
Pick at most 10 numbers and at least 1; repeated picks are allowed.
If no suitable result is found, return undefined.
Solution
Solution 1 - Recursive computation - 10/25/2022
Recursion can always solve this kind of problem; there may be better approaches, and I will keep updating and trying.
TypeScript Playground - Recursive computation
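The playground code itself is not reproduced here. As a rough illustration only, a minimal recursive sketch along those lines (my own, not the playground's code) could look like the following: a depth-first search that returns the first combination it finds, matching the "return undefined when nothing fits" rule, with no attempt at being fast.

function findCombination(
  list: number[],
  target: number,
  maxPicks = 10
): number[] | undefined {
  // Depth-first search: try every value for every remaining slot.
  const search = (remaining: number, picksLeft: number): number[] | undefined => {
    if (remaining === 0) return []; // exact sum reached
    if (picksLeft === 0) return undefined; // out of picks
    for (const v of list) {
      if (v <= remaining) {
        const rest = search(remaining - v, picksLeft - 1);
        if (rest !== undefined) return [v, ...rest];
      }
    }
    return undefined;
  };
  const result = search(target, maxPicks);
  // At least one number must be picked, so an empty result counts as "no result".
  return result && result.length > 0 ? result : undefined;
}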
Question
I want to find an optimal solution; speed comes first.
const list = [
  1, 3, 6, 8, 12, 18, 25, 28, 30, 40, 45, 50, 60, 68, 78, 88, 98, 128, 158,
  198, 248, 298, 348, 418, 488, 548, 588, 618, 648, 698, 798, 818, 848, 898,
  998, 1048, 1098, 1148, 1198, 1248, 1298, 1398, 1448, 1498, 1598, 1648, 1998,
  2298, 2598, 2998, 3298, 3998, 4498, 4998, 5898, 6498,
];
function getCombinations(
  list: number[],
  target: number
): Array<number> | undefined {
  // TODO...
}
This is solvable with dynamic programming. I'll just outline the solution. I have other posts where I've given working code for problems with similar ideas, for example "Finding all possible combinations of numbers to reach a given sum".
First imagine we have a 2-D data structure with the following information in each Node:
{
  is_root: true|false,        // True only at the root of the data structure
  current_sum: ...,           // At this node, here is the current sum
  solution_count: ...,        // How many solutions below here
  current_value: ...,         // A value to find more solutions under
  current_value_count: ...,   // How many times this value has been used
  // First dimension, go back by current_value
  prev_sum_solution: ptr to Node,
  // Second dimension, go back to current sum, previous value
  prev_value_solution: ptr to Node,
}
These nodes will hold ONLY information for solutions. You can report how many there are, or produce the solutions recursively. If node.prev_sum_solution is not null, then node.prev_sum_solution->solution_count gives how many solutions will have current_value next. And if node.prev_value_solution is not null, then node.prev_value_solution->solution_count gives how many solutions will have current_sum with a previous value next. This allows you to find any particular solution as well.
Now how do we generate this solution? Well first we initialize an array of target+1 pointers to nodes:
solution = [null, null, ..., null]
And then we replace solution[0] with our root node:
{
  is_root: true,
  current_sum: 0,
  solution_count: 1,
  current_value: 0,
  current_value_count: 0,
  prev_sum_solution: null,
  prev_value_solution: null,
}
(Warning. I'll use a mix of notations for this pseudo-code. And specifically . accesses an object's properties, while -> accesses properties from a pointer to an object. You can make the ideas work in any language whether or not it has pointers though.)
And then we go as follows:
for value in list:
    // Add solutions that use this value; at pass i we add its (i+1)-th copy.
    for i in range(n): // ie i is in {0, 1, 2, ..., n-1}
        // We iterate j downward to avoid reusing solutions added in this pass.
        for (j = target; (i+1) * value <= j; j--) {
            this_solution = solution[j]
            prev_solution = solution[j - value]
            if prev_solution == null or
                (0 < i and (prev_solution.current_value != value or
                            prev_solution.current_value_count != i)) or
                (i == 0 and prev_solution.current_value == value):
                continue // can't add value here.
            else:
                if (this_solution == null) {
                    solution[j] = pointer to Node{
                        is_root: false,
                        current_sum: j,
                        solution_count: prev_solution->solution_count,
                        current_value: value,
                        current_value_count: i + 1,
                        prev_sum_solution: prev_solution,
                        prev_value_solution: null,
                    }
                }
                else {
                    solution[j] = pointer to Node{
                        is_root: false,
                        current_sum: j,
                        solution_count: prev_solution->solution_count + this_solution->solution_count,
                        current_value: value,
                        current_value_count: i + 1,
                        prev_sum_solution: prev_solution,
                        prev_value_solution: this_solution,
                    }
                }
        }
And when we are done, if we have made no mistakes, solution[target] will hold a pointer to a Node that has all the information about all of the possible solutions that exist.
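For concreteness, here is a much simpler sketch than the node structure above (my own TypeScript, with a hypothetical getCombinationsDP name): a bounded DP over (sum, pick count) states that returns just one valid combination rather than describing all of them, which is all the original getCombinations signature asks for.

function getCombinationsDP(
  list: number[],
  target: number,
  maxPicks = 10
): number[] | undefined {
  // lastValue[sum][count] remembers one value that reaches `sum` using exactly
  // `count` picks (undefined if that state is unreachable).
  const lastValue: (number | undefined)[][] = Array.from(
    { length: target + 1 },
    () => new Array<number | undefined>(maxPicks + 1).fill(undefined)
  );
  lastValue[0][0] = 0; // zero picks reach sum 0

  for (let count = 1; count <= maxPicks; count++) {
    for (let sum = 1; sum <= target; sum++) {
      for (const v of list) {
        if (v <= sum && lastValue[sum - v][count - 1] !== undefined) {
          lastValue[sum][count] = v; // remember one way to reach (sum, count)
          break;
        }
      }
    }
  }

  // Reconstruct a combination from the smallest pick count that works.
  for (let count = 1; count <= maxPicks; count++) {
    if (lastValue[target][count] === undefined) continue;
    const result: number[] = [];
    for (let sum = target, c = count; c > 0; c--) {
      const v = lastValue[sum][c] as number;
      result.push(v);
      sum -= v;
    }
    return result;
  }
  return undefined; // target is not reachable with 1..maxPicks picks
}

// e.g. getCombinationsDP(list, 10) -> [1, 1, 8] (1 + 1 + 8 = 10)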
Related
[ 42, 45, 47, x, x] -> stop1 to stop2
[ 45, 47, 42, 88, x] -> stop2 to stop3
[ 21, 77, 42, x, x] -> stop3 to stop4
[ 22, 47, 42, 88, x] -> stop4 to stop5
[ 23, 47, 42, x, x] -> stop5 to stop6
[ 24, 47, 42, 8, 91] -> stop6 to stop7
[ 25, 13, 42, 3, 84] -> stop7 to stop8
[ 26, 10, 11, 4, 54] -> stop8 to stop9
[ 27, 9, 8, 88, 71] -> stop9 to stop10
x is there just for formatting. The first row means that there are only three buses from stop1 to stop2 (42, 45, 47).
I have this matrix-like structure where each row represents the buses going from one stop to another. I need to minimize the number of bus changes a person has to make to go from stop1 to stop10.
For example, one possible output is 42, 42, 42, 42, 42, 42, 42, 26, 27; another is 42, 42, 42, 42, 42, 42, 42, 10, 9. If the number of changes is more than three, I can discard the result.
What's the most efficient way to achieve this, given that brute-forcing through it is pretty inefficient right now?
You can solve this problem by modeling it as a graph search.
Imagine you're a person and you're trying to get from point A to point B. The information most relevant to you is
where you currently are, and
which bus line, if any, you are currently on.
You can therefore model a person's state as a pair of a location (a bus stop) and a bus line (which might be "not on a line" when they start or finish). So create a graph with one node for each combination of a location and a bus line.
The edges in this graph will correspond to changes in state. You can change state either by
staying on your current bus line and going somewhere, or
switching bus lines.
If you're currently on a bus line, you can stay on that line to move from one location to the next if the line goes from the first location to the second. So create edges ((location1, line), (location2, line)) if bus line line goes from location1 to location2. This doesn't involve a transfer, so give this edge a cost of 0.
Alternatively, you can always get off of a bus or go from being off a bus to being on a bus. So add an edge ((location, line), (location, free)) for each line and each location (you always have the option to get off of a bus line) and give it cost 0, since this doesn't involve changing lines. Similarly, add edges ((location, free), (location, line)) for each bus line line available at the given location. Give it cost 1 to indicate that this requires you to get on a bus.
Now, imagine you find a path from (point A, free) to (point B, free) in this graph. This corresponds to getting on and off of a series of buses that start you at point A and end at point B, and the cost will be the number of different buses that you ended up getting on. If you run a shortest paths algorithm in this graph (say, Dijkstra's algorithm), you'll find the path from the start to end point that minimizes the number of bus transfers!
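A sketch of that state graph in TypeScript (my own; the function name and state encoding are assumptions, not part of the answer). It takes the question's row-per-leg matrix, where stops[i] lists the buses running from stop i to stop i+1, and runs a plain Dijkstra in which boarding a bus costs 1 and everything else costs 0.

function minBusBoardings(stops: number[][]): number {
  // A state is a (stop, line) pair; line === "free" means "not on any bus".
  const key = (stop: number, line: number | "free") => `${stop}|${line}`;
  const dist = new Map<string, number>([[key(0, "free"), 0]]);
  const done = new Set<string>();

  const parse = (k: string): [number, number | "free"] => {
    const [s, l] = k.split("|");
    return [Number(s), l === "free" ? "free" : Number(l)];
  };

  while (true) {
    // Extract the unvisited state with the smallest tentative distance.
    let best: string | null = null;
    for (const [k, d] of dist)
      if (!done.has(k) && (best === null || d < dist.get(best)!)) best = k;
    if (best === null) return Infinity; // the final stop is unreachable
    done.add(best);

    const d = dist.get(best)!;
    const [stop, line] = parse(best);
    if (stop === stops.length && line === "free") return d; // at the last stop, off the bus

    const relax = (k: string, nd: number) => {
      if (dist.get(k) === undefined || dist.get(k)! > nd) dist.set(k, nd);
    };

    if (line === "free") {
      // Board any bus that serves the next leg: costs one boarding.
      if (stop < stops.length)
        for (const l of stops[stop]) relax(key(stop, l), d + 1);
    } else {
      // Get off here for free, or ride the current line one more leg for free.
      relax(key(stop, "free"), d);
      if (stop < stops.length && stops[stop].includes(line))
        relax(key(stop + 1, line), d);
    }
  }
}

// With the sample matrix from the question this returns 3 boardings
// (bus 42 for the first seven legs, then one more bus for each of the last
// two legs), i.e. two bus changes.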
You could go through the array once, and keep a set of buses that are common to the visited stops. As soon as no such bus can be found, take the previous set, choose one bus from it, and fill the result with that bus for that many stops.
Then put all buses at the current stop in the set, and repeat the operation for the subsequent stops, ...etc.
Here is the algorithm coded in ES6 JavaScript. It uses a Set to allow constant-time access to the items (buses) it stores.
// Helper function: given a reduced set of buses, and a count,
// add one of those buses as the bus to take during that many stops
function addToResult(common, count, result) {
  let bus = common.values().next().value; // pick any available bus
  while (count > 0) {
    result.push(bus);
    count--;
  }
}

// Main algorithm
function getBusRide(stops) {
  if (stops.length === 0) return [];
  let result = [],
      count = 0,
      common;
  for (let buses of stops) {
    if (count == 0) { // First iteration only
      common = new Set(buses); // all buses are candidate
      count = 1;
    } else {
      let keep = new Set();
      for (let bus of buses) {
        // Only keep buses as candidate when they
        // are still served here
        if (common.has(bus)) keep.add(bus);
      }
      if (keep.size == 0) { // Need to change bus
        addToResult(common, count, result);
        count = 0;
        keep = new Set(buses); // all buses are candidate
      }
      common = keep;
      count++;
    }
  }
  addToResult(common, count, result);
  return result;
}
// Sample input
const stops = [
[ 42, 45, 47],
[ 45, 47, 42, 88],
[ 21, 77, 42],
[ 22, 47, 42, 88],
[ 23, 47, 42],
[ 24, 47, 42, 8, 91],
[ 25, 13, 42, 3, 84],
[ 26, 10, 11, 4, 54],
[ 27, 9, 8, 88, 71]
];
// Apply the algorithm
console.log(getBusRide(stops));
This algorithm runs in O(n) where n is the total number of values in the input, so in the example n = 37.
We have an increasing sequence in which each element consists of even digits only (0, 2, 4, 6, 8). How can we find the nth number in this sequence?
Is it possible to find the nth number in this sequence in O(1) time?
Sequence: 0, 2, 4, 6, 8, 20, 22, 24, 26, 28, 40, 42, 44, 46, 48, 60, 62, 64, 66, 68, 80, 82, 84, 86, 88, 200, 202 and so on.
The nth number in this sequence is n in base 5, with the digits doubled.
def base5(n):
    if n == 0: return
    for x in base5(n // 5): yield x
    yield n % 5

def seq(n):
    return int(''.join(str(2 * x) for x in base5(n)) or '0')

for i in range(100):
    print(i, seq(i))
This runs in O(log n) time. I don't think it's possible to do it in O(1) time.
It can be simplified a bit by combining the doubling of the digits with the generation of the base 5 digits of n:
def seq(n):
    return 10 * seq(n // 5) + (n % 5) * 2 if n else 0
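The same recurrence ported to TypeScript as a quick cross-check (my own translation, with an assumed function name):

// Write n in base 5 and double every digit.
function evenDigitSeq(n: number): number {
  return n === 0 ? 0 : 10 * evenDigitSeq(Math.floor(n / 5)) + (n % 5) * 2;
}
// evenDigitSeq(0) .. evenDigitSeq(6) -> 0, 2, 4, 6, 8, 20, 22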
int a[10001]; // numbers whose digits are all even

int Code()
{
    int k = 0;
    for (int i = 0; i <= 10000; i++)
    {
        int count = 0;
        int n = i;
        while (n != 0)
        {
            int c = n % 10;
            n = n / 10;
            if (c % 2 != 0)
            {
                count = 1; // i has an odd digit
            }
        }
        if (count == 0)
        {
            a[k] = i;
            k++;
        }
    }
    return k; // number of values stored in a[]
}
I don't even know how to explain this... I've been looking for algos but no luck.
I need a function that would return an array of incrementally bigger numbers (not sure what kind of curve) from two numbers that I'd pass as parameters.
Ex.:
$length = 20;
get_numbers(1, 1000, $length);
> 1, 2, 3, 5, 10, 20, 30, 50, 100, 200, 500... // let's say that these are 20 numbers going up to 1000
Any idea how I could do this..? I guess I'm not smart enough to figure it out.
How about an exponential curve? Sample Python implementation:
begin = 1
end = 1000
diff = end - begin
length = 10
X = diff**(1.0/(length-1))
seq = []
for i in range(length):
    seq.append(int(begin + X**i))
print(seq)
(note: ** is the Python operator for exponentiation. Other languages may or may not use ^ instead)
Result:
[2, 3, 5, 10, 22, 47, 100, 216, 464, 999]
There is a straight road with 'n' number of milestones. You are given
an array with the distance between all the pairs of milestones in
some random order. Find the position of milestones.
Example:
Consider a road with 4 milestones (a,b,c,d) :
a ---3Km--- b ---5Km--- c ---2Km--- d
Distance between a and b is 3
Distance between a and c is 8
Distance between a and d is 10
Distance between b and c is 5
Distance between b and d is 7
Distance between c and d is 2
All the above values are given in a random order say 7, 10, 5, 2, 8, 3.
The output must be 3, 5, 2 or 2, 5, 3.
Assuming the length of the given array is n, my idea is:
Calculate the number of milestones by solving a quadratic equation; say it's x.
There are P(n, x-1) possibilities.
Validate every possible permutation.
Is there any better solution for this problem?
I can't find an algorithm for this that has good worst-case behaviour. However, the following heuristic may be useful for a practical solution:
Say the first landmark is at position zero. You can find the last landmark: it is at the maximum distance in the array. Then all other landmark positions need to appear in the input array. Their distances to the last landmark must also appear.
Let's build a graph on these possible landmark positions.
If a and b are two possible landmark positions, then either |a-b| appears in the input array or at least one of a and b isn't a landmark position. Draw an edge between a and b if |a-b| appears in the input array.
Iteratively filter out landmark positions whose degree is too small.
You wind up with something that's almost a clique-finding problem. Find an appropriately large clique; it corresponds to a positioning of the landmarks. Check that this positioning actually gives rise to the right distances.
At worst here, you've narrowed down the possible landmark positions to a more manageable set.
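A rough TypeScript sketch of the candidate-and-filter part (my own; candidatePositions is a hypothetical helper, and the final clique search is left out). One reasonable degree threshold is k - 1, since a true landmark must be compatible with the k - 1 other landmarks.

function candidatePositions(dists: number[]): number[] {
  const distSet = new Set(dists);
  const last = Math.max(...dists); // position of the last landmark
  // k landmarks produce k*(k-1)/2 pairwise distances.
  const k = Math.round((1 + Math.sqrt(1 + 8 * dists.length)) / 2);

  // A position p is a candidate if both p and last - p appear as distances.
  let candidates = [
    0,
    last,
    ...dists.filter((d) => d < last && distSet.has(last - d)),
  ];
  candidates = [...new Set(candidates)].sort((a, b) => a - b);

  // Two candidates are compatible (share an edge) when their difference is an
  // input distance; iteratively drop candidates compatible with too few others.
  let changed = true;
  while (changed) {
    changed = false;
    const degree = (p: number) =>
      candidates.filter((q) => q !== p && distSet.has(Math.abs(p - q))).length;
    const kept = candidates.filter((p) => degree(p) >= k - 1);
    if (kept.length < candidates.length) {
      candidates = kept;
      changed = true;
    }
  }
  return candidates; // a k-clique among these corresponds to a full placement
}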
Ok, I will give my idea, which could reduce the number of permutations.
Finding n is simple; you could even run a reverse factorial: https://math.stackexchange.com/questions/171882/is-there-a-way-to-reverse-factorials
Assumption:
Currently I have no idea how to find the numbers, but assume you have found them somehow. After finding n and the elements, we could apply this to partially reduce the computation.
Consider a problem like,
|<--3-->|<--6-->|<--1-->|<--7-->|
A B C D E
Now, as you said, the distances they will give (in random order) are 3, 9, 10, 17, 6, 7, 14, 1, 8, 7.
But you could take any combination (mostly it will be wrong),
say 6-3-1-7 (this is our chosen combination).
Now,
6+3 -> 9  There, so Yes   // checking in the list whether the two numbers could possibly be adjacent
3+1 -> 4  NOT THERE, so they cannot be adjacent
1+7 -> 8  There, so Yes
6+7 -> 13 NOT THERE, so they cannot be adjacent
Heart concept:
For two segments to be adjacent, their sum must be in the list (it is the distance between the milestones two apart). If the sum is not in the list, then the segments are not adjacent.
Optimization:
So 3 and 1 will not come next to each other, and neither will 6 and 7.
Hence, while doing the permutations, we can eliminate the
*31*, *13*, *76* and *67* combinations, where * is zero or more digits either preceding or succeeding.
i.e. instead of trying 4! = 24 permutations, we only check 3617, 1637, 3716 and 1736, i.e. only 4 times, so about 83% of the computation is saved.
Worst case:
Say in your case the segments are 5, 2, 3.
Now we have to perform these checks:
5+2 -> 7 There
2+3 -> 5 There
5+3 -> 8 There
Oops, your example is a worst case; we cannot optimize the solution in this type of case.
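A tiny TypeScript illustration of the "heart concept" above (canBeAdjacent is a made-up helper name):

// Two segment lengths can only sit next to each other if their sum also
// appears among the pairwise distances (it is the distance between the
// milestones two apart).
function canBeAdjacent(a: number, b: number, dists: number[]): boolean {
  return dists.includes(a + b);
}

const exampleDists = [3, 9, 10, 17, 6, 7, 14, 1, 8, 7];
console.log(canBeAdjacent(6, 3, exampleDists)); // true  (9 is in the list)
console.log(canBeAdjacent(3, 1, exampleDists)); // false (4 is not)
console.log(canBeAdjacent(1, 7, exampleDists)); // true  (8 is in the list)
console.log(canBeAdjacent(6, 7, exampleDists)); // false (13 is not)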
Place the milestones one by one
EDIT: See the new implementation below (with timings).
The key idea is the following:
Build a list of milestones one by one, starting with one milestone at 0 and a milestone at max(distances). Let's call them endpoints.
The largest distance that's not accounted for has to be from one of the endpoints, which leaves at most two positions for the corresponding milestone.
The following Python program simply checks if the milestone can be placed from the left endpoint, and if not, tries to place the milestone from the right endpoint (always using the largest distance that's not accounted for by the already placed milestones). This has to be done with back-tracking, as placements may turn out wrong later.
Note that there is another (mirrored) solution that is not output. (I don't think there can be more than 2 solutions (symmetric), but I haven't proven it.)
I consider the positions of the milestones as the solution and use a helper function steps for the output desired by the OP.
from collections import Counter

def milestones_from_dists(dists, milestones=None):
    if not dists:  # all dists are accounted for: we have a solution!
        return milestones
    if milestones is None:
        milestones = [0]
    max_dist = max(dists)
    solution_from_left = try_milestone(dists, milestones, min(milestones) + max_dist)
    if solution_from_left is not None:
        return solution_from_left
    return try_milestone(dists, milestones, max(milestones) - max_dist)

def try_milestone(dists, milestones, new_milestone):
    unused_dists = Counter(dists)
    for milestone in milestones:
        dist = abs(milestone - new_milestone)
        if unused_dists[dist]:
            unused_dists[dist] -= 1
            if unused_dists[dist] == 0:
                del unused_dists[dist]
        else:
            return None  # no solution
    return milestones_from_dists(unused_dists, milestones + [new_milestone])

def steps(milestones):
    milestones = sorted(milestones)
    return [milestones[i] - milestones[i - 1] for i in range(1, len(milestones))]
Example usage:
>>> print(steps(milestones_from_dists([7, 10, 5, 2, 8, 3])))
[3, 5, 2]
>>> import random
>>> milestones = random.sample(range(1000), 100)
>>> dists = [abs(x - y) for x in milestones for y in milestones if x < y]
>>> solution = sorted(milestones_from_dists(dists))
>>> solution == sorted(milestones)
True
>>> print(solution)
[0, 10, 16, 23, 33, 63, 72, 89, 97, 108, 131, 146, 152, 153, 156, 159, 171, 188, 210, 211, 212, 215, 219, 234, 248, 249, 273, 320, 325, 329, 339, 357, 363, 387, 394, 396, 402, 408, 412, 418, 426, 463, 469, 472, 473, 485, 506, 515, 517, 533, 536, 549, 586, 613, 614, 615, 622, 625, 630, 634, 640, 649, 651, 653, 671, 674, 697, 698, 711, 715, 720, 730, 731, 733, 747, 758, 770, 772, 773, 776, 777, 778, 783, 784, 789, 809, 828, 832, 833, 855, 861, 873, 891, 894, 918, 952, 953, 968, 977, 979]
>>> print(steps(solution))
[10, 6, 7, 10, 30, 9, 17, 8, 11, 23, 15, 6, 1, 3, 3, 12, 17, 22, 1, 1, 3, 4, 15, 14, 1, 24, 47, 5, 4, 10, 18, 6, 24, 7, 2, 6, 6, 4, 6, 8, 37, 6, 3, 1, 12, 21, 9, 2, 16, 3, 13, 37, 27, 1, 1, 7, 3, 5, 4, 6, 9, 2, 2, 18, 3, 23, 1, 13, 4, 5, 10, 1, 2, 14, 11, 12, 2, 1, 3, 1, 1, 5, 1, 5, 20, 19, 4, 1, 22, 6, 12, 18, 3, 24, 34, 1, 15, 9, 2]
New implementation incorporating suggestions from the comments
from collections import Counter

def milestones_from_dists(dists):
    dists = Counter(dists)
    right_end = max(dists)
    milestones = [0, right_end]
    del dists[right_end]
    sorted_dists = sorted(dists)
    add_milestones_from_dists(dists, milestones, sorted_dists, right_end)
    return milestones

def add_milestones_from_dists(dists, milestones, sorted_dists, right_end):
    if not dists:
        return True  # success!
    # find max dist that's not fully used yet
    deleted_dists = []
    while not dists[sorted_dists[-1]]:
        deleted_dists.append(sorted_dists[-1])
        del sorted_dists[-1]
    max_dist = sorted_dists[-1]
    # for both possible positions, check if this fits the already placed milestones
    for new_milestone in [max_dist, right_end - max_dist]:
        used_dists = Counter()  # for backing up
        for milestone in milestones:
            dist = abs(milestone - new_milestone)
            if dists[dist]:  # this distance is still available
                dists[dist] -= 1
                if dists[dist] == 0:
                    del dists[dist]
                used_dists[dist] += 1
            else:  # no solution
                dists.update(used_dists)  # back up
                sorted_dists.extend(reversed(deleted_dists))
                break
        else:  # unbroken
            milestones.append(new_milestone)
            success = add_milestones_from_dists(dists, milestones, sorted_dists, right_end)
            if success:
                return True
            dists.update(used_dists)  # back up
            sorted_dists.extend(reversed(deleted_dists))
            del milestones[-1]
    return False

def steps(milestones):
    milestones = sorted(milestones)
    return [milestones[i] - milestones[i - 1] for i in range(1, len(milestones))]
Timings for random milestones in the range from 0 to 100000:
n = 10: 0.00s
n = 100: 0.05s
n = 1000: 3.20s
n = 10000: still takes too long.
The largest distance in the given set of distances is the distance between the first and the last milestone, i.e. 10 in your example. You can find this in O(n) steps.
For every other milestone (every one except the first or the last), you can find their distances from the first and the last milestone by looking for a pair of distances that sums up to the maximum distance, i.e. in your example 7+3 = 10, 8+2 = 10. You can find these pairs trivially in O(n^2).
Now if you think the road is from east to west, what remains is that for all the interior milestones (all but the first or the last), you need to know which one of the two distances (e.g. 7 and 3, or 8 and 2) is towards east (the other is then towards west).
You can trivially enumerate all the possibilities in time O(2^(n-2)), and for every possible orientation check that you get the same set of distances as in the problem. This is faster than enumerating through all permutations of the smallest distances in the set.
For example, if you assume 7 and 8 are towards west, then the distance between the two internal milestones is 1 mile, which is not in the problem set. So it must be 7 towards west, 8 towards east, leading to the solution (or its mirror)
WEST | -- 2 -- | -- 5 -- | -- 3 -- | EAST
For a larger set of milestones, you would just start guessing the orientation of the two distances to the endpoints, and whenever you produce two milestones whose distance from each other is not in the problem set, you backtrack.
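The final verification step ("check that you get the same set of distances") can be written as a small TypeScript helper (my own sketch; distancesMatch is an assumed name):

function distancesMatch(positions: number[], dists: number[]): boolean {
  // Produce every pairwise distance of the guessed positions ...
  const produced: number[] = [];
  for (let i = 0; i < positions.length; i++)
    for (let j = i + 1; j < positions.length; j++)
      produced.push(Math.abs(positions[i] - positions[j]));
  // ... and compare it, as a multiset, against the problem input.
  const sortNums = (xs: number[]) => [...xs].sort((a, b) => a - b);
  const a = sortNums(produced);
  const b = sortNums(dists);
  return a.length === b.length && a.every((d, idx) => d === b[idx]);
}

// The accepted arrangement from the example:
console.log(distancesMatch([0, 3, 8, 10], [7, 10, 5, 2, 8, 3])); // true
// The rejected guess (7 and 8 both "towards west", interior milestones at 2 and 3):
console.log(distancesMatch([0, 2, 3, 10], [7, 10, 5, 2, 8, 3])); // false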
What is the best way to find the period in a repeating list?
For example:
a = {4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2}
has the repeating block {4, 5, 1, 2, 3}, with the remainder {4, 5, 1, 2} matching but incomplete.
The algorithm should be fast enough to handle longer cases, like so:
b = RandomInteger[10000, {100}];
a = Join[b, b, b, b, Take[b, 27]]
The algorithm should return $Failed if there is no repeating pattern like above.
Please see the comments interspersed with the code on how it works.
(* True if a has period p *)
testPeriod[p_, a_] := Drop[a, p] === Drop[a, -p]
(* are all the list elements the same? *)
homogeneousQ[list_List] := Length@Tally[list] === 1
homogeneousQ[{}] := Throw[$Failed] (* yes, it's ugly to put this here ... *)
(* auxiliary for findPeriodOfFirstElement[] *)
reduce[a_] := Differences@Flatten@Position[a, First[a], {1}]
(* the first element occurs every ?th position ? *)
findPeriodOfFirstElement[a_] := Module[{nl},
nl = NestWhileList[reduce, reduce[a], ! homogeneousQ[#] &];
Fold[Total@Take[#2, #1] &, 1, Reverse[nl]]
]
(* the period must be a multiple of the period of the first element *)
period[a_] := Catch@With[{fp = findPeriodOfFirstElement[a]},
Do[
If[testPeriod[p, a], Return[p]],
{p, fp, Quotient[Length[a], 2], fp}
]
]
Please ask if findPeriodOfFirstElement[] is not clear. I did this independently (for fun!), but now I see that the principle is the same as in Verbeia's solution, except the problem pointed out by Brett is fixed.
I was testing with
b = RandomInteger[100, {1000}];
a = Flatten[{ConstantArray[b, 1000], Take[b, 27]}];
(* Note the low integer values: there will be lots of repeating elements within the same period *)
EDIT: According to Leonid's comment below, another 2-3x speedup (~2.4x on my machine) is possible by using a custom position function, compiled specifically for lists of integers:
(* Leonid's reduce[] *)
myPosition = Compile[
{{lst, _Integer, 1}, {val, _Integer}},
Module[{pos = Table[0, {Length[lst]}], i = 1, ctr = 0},
For[i = 1, i <= Length[lst], i++,
If[lst[[i]] == val, pos[[++ctr]] = i]
];
Take[pos, ctr]
],
CompilationTarget -> "C", RuntimeOptions -> "Speed"
]
reduce[a_] := Differences@myPosition[a, First[a]]
Compiling testPeriod gives a further ~20% speedup in a quick test, but I believe this will depend on the input data:
Clear[testPeriod]
testPeriod =
Compile[{{p, _Integer}, {a, _Integer, 1}},
Drop[a, p] === Drop[a, -p]]
The above methods are better if you have no noise. If your signal is only approximate, then Fourier transform methods might be useful. I'll illustrate with a "parametrized" setup wherein the length and number of repetitions of the base signal, the length of the trailing part, and a bound on the noise perturbation are all variables one can play with.
noise = 20;
extra = 40;
baselen = 103;
base = RandomInteger[10000, {baselen}];
repeat = 5;
signal = Flatten[Join[ConstantArray[base, repeat], Take[base, extra]]];
noisysignal = signal + RandomInteger[{-noise, noise}, Length[signal]];
We compute the absolute value of the FFT. We adjoin zeros to both ends. The object will be to threshold by comparing to neighbors.
sigfft = Join[{0.}, Abs[Fourier[noisysignal]], {0}];
Now we create two 0-1 vectors. In one we threshold by making a 1 for each element in the fft that is greater than twice the geometric mean of its two neighbors. In the other we use the average (arithmetic mean) but we lower the size bound to 3/4. This was based on some experimentation. We count the number of 1s in each case. Ideally we'd get 100 for each, as that would be the number of nonzeros in a "perfect" case of no noise and no tail part.
In[419]:=
thresh1 =
Table[If[sigfft[[j]]^2 > 2*sigfft[[j - 1]]*sigfft[[j + 1]], 1,
0], {j, 2, Length[sigfft] - 1}];
count1 = Count[thresh1, 1]
thresh2 =
Table[If[sigfft[[j]] > 3/4*(sigfft[[j - 1]] + sigfft[[j + 1]]), 1,
0], {j, 2, Length[sigfft] - 1}];
count2 = Count[thresh2, 1]
Out[420]= 114
Out[422]= 100
Now we get our best guess as to the value of "repeats", by taking the floor of the total length over the average of our counts.
approxrepeats = Floor[2*Length[signal]/(count1 + count2)]
Out[423]= 5
So we have found that the basic signal is repeated 5 times. That can give a start toward refining to estimate the correct length (baselen, above). To that end we might try removing elements at the end and seeing when we get ffts closer to actually having runs of four 0s between nonzero values.
Something else that might work for estimating number of repeats is finding the modal number of zeros in run length encoding of the thresholded ffts. While I have not actually tried that, it looks like it might be robust to bad choices in the details of how one does the thresholding (mine were just experiments that seem to work).
Daniel Lichtblau
The following assumes that the cycle starts on the first element and gives the period length and the cycle.
findCyclingList[a_?VectorQ] :=
Module[{repeats1, repeats2, cl, cLs, vec},
repeats1 = Flatten@Differences[Position[a, First[a]]];
repeats2 = Flatten[Position[repeats1, First[repeats1]]];
If[Equal @@ Differences[repeats2] && Length[repeats2] > 2 (*
is potentially cyclic - first element appears cyclically *),
cl = Plus @@@ Partition[repeats1, First[Differences[repeats2]]];
cLs = Partition[a, First[cl]];
If[SameQ @@ cLs (* candidate cycles all actually the same *),
vec = First[cLs];
{Length[vec], vec}, $Failed], $Failed] ]
Testing
b = RandomInteger[50, {100}];
a = Join[b, b, b, b, Take[b, 27]];
findCyclingList[a]
{100, {47, 15, 42, 10, 14, 29, 12, 29, 11, 37, 6, 19, 14, 50, 4, 38,
23, 3, 41, 39, 41, 17, 32, 8, 18, 37, 5, 45, 38, 8, 39, 9, 26, 33,
40, 50, 0, 45, 1, 48, 32, 37, 15, 37, 49, 16, 27, 36, 11, 16, 4, 28,
31, 46, 30, 24, 30, 3, 32, 31, 31, 0, 32, 35, 47, 44, 7, 21, 1, 22,
43, 13, 44, 35, 29, 38, 31, 31, 17, 37, 49, 22, 15, 28, 21, 8, 31,
42, 26, 33, 1, 47, 26, 1, 37, 22, 40, 27, 27, 16}}
b1 = RandomInteger[10000, {100}];
a1 = Join[b1, b1, b1, b1, Take[b1, 23]];
findCyclingList[a1]
{100, {1281, 5325, 8435, 7505, 1355, 857, 2597, 8807, 1095, 4203,
3718, 3501, 7054, 4620, 6359, 1624, 6115, 8567, 4030, 5029, 6515,
5921, 4875, 2677, 6776, 2468, 7983, 4750, 7609, 9471, 1328, 7830,
2241, 4859, 9289, 6294, 7259, 4693, 7188, 2038, 3994, 1907, 2389,
6622, 4758, 3171, 1746, 2254, 556, 3010, 1814, 4782, 3849, 6695,
4316, 1548, 3824, 5094, 8161, 8423, 8765, 1134, 7442, 8218, 5429,
7255, 4131, 9474, 6016, 2438, 403, 6783, 4217, 7452, 2418, 9744,
6405, 8757, 9666, 4035, 7833, 2657, 7432, 3066, 9081, 9523, 3284,
3661, 1947, 3619, 2550, 4950, 1537, 2772, 5432, 6517, 6142, 9774,
1289, 6352}}
This case should fail because it isn't cyclical.
findCyclingList[Join[b, Take[b, 11], b]]
$Failed
I tried something with Repeated, e.g. a /. Repeated[t__, {2, 100}] -> {t}, but it just doesn't work for me.
Does this work for you?
period[a_] :=
Quiet[Check[
First[Cases[
Table[
{k, Equal @@ Partition[a, k]},
{k, Floor[Length[a]/2]}],
{k_, True} :> k
]],
$Failed]]
Strictly speaking, this will fail for things like
a = {1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5}
although this can be fixed by using something like:
(Equal @@ Partition[a, k]) && (Equal @@ Partition[Reverse[a], k])
(probably computing Reverse[a] just once ahead of time.)
I propose this. It borrows from both Verbeia and Brett's answers.
Do[
If[MatchQ @@ Equal @@ Partition[#, i, i, 1, _], Return@i],
{i, #[[ 2 ;; Floor[Length@#/2] ]] ~Position~ First@#}
] /. Null -> $Failed &
It is not quite as efficient as Verbeia's function on long periods, but it is faster on short ones, and it is simpler as well.
I don't know how to solve it in Mathematica, but the following algorithm (written in Python) should work. It's O(n) so speed should be no concern.
def period(array):
    if len(array) == 0:
        return False
    else:
        s = array[0]
        match = False
        end = 0
        i = 0
        for k in range(1, len(array)):
            c = array[k]
            if not match:
                if c == s:
                    i = 1
                    match = True
                    end = k
            else:
                if not c == array[i]:
                    match = False
                i += 1
        if match:
            return array[:end]
        else:
            return False
# False
print(period([4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2,1]))
# [4, 5, 1, 2, 3]
print(period([4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2]))
# False
print(period([4]))
# [4, 2]
print(period([4,2,4]))
# False
print(period([4,2,1]))
# False
print(period([]))
Ok, just to show my own work here:
ModifiedTortoiseHare[a_List] := Module[{counter, tortoise, hare},
Quiet[
Check[
counter = 1;
tortoise = a[[counter]];
hare = a[[2 counter]];
While[(tortoise != hare) || (a[[counter ;; 2 counter - 1]] != a[[2 counter ;; 3 counter - 1]]),
counter++;
tortoise = a[[counter]];
hare = a[[2 counter]];
];
counter,
$Failed]]]
I'm not sure this is 100% correct, especially with cases like {pattern, pattern, different, pattern, pattern}, and it gets slower and slower when there are a lot of repeating elements, like so:
{ 1,2,1,1, 1,2,1,1, 1,2,1,1, ...}
because it is making too many expensive comparisons.
#include <iostream>
#include <vector>
using namespace std;

int period(vector<int> v)
{
    int p = 0; // period 0
    for (int i = p + 1; i < v.size(); i++)
    {
        if (v[i] == v[0])
        {
            p = i; // new potential period
            bool periodical = true;
            for (int i = 0; i < v.size() - p; i++)
            {
                if (v[i] != v[i + p])
                {
                    periodical = false;
                    break;
                }
            }
            if (periodical) return p;
            i = p; // try to find new period
        }
    }
    return 0; // no period
}

int main()
{
    vector<int> v3{1, 2, 3, 1, 2, 3, 1, 2, 3};
    cout << "Period is :\t" << period(v3) << endl;
    vector<int> v0{1, 2, 3, 1, 2, 3, 1, 9, 6};
    cout << "Period is :\t" << period(v0) << endl;
    vector<int> v1{1, 2, 1, 1, 7, 1, 2, 1, 1, 7, 1, 2, 1, 1};
    cout << "Period is :\t" << period(v1) << endl;
    return 0;
}
This sounds like it might relate to sequence alignment. These algorithms are well studied, and might already be implemented in Mathematica.