How to solve this question asked in premium coding contest? - algorithm

Could someone guide me on how to solve this programming question? It looks like a DP problem, but I couldn't find an answer for it.
Question:
There are ‘n’ ads. Each ad has an effectiveness value associated with
it which is given in an array of size ‘n’ in the format [v1, v2, …,
vn], where ‘v1’ is the effectiveness value of the first ad, ‘v2’ is
the effectiveness value of the second ad, and so on. The show in which
these ads will be shown is ‘m’ length long (starting from 0 till m),
and the time when the ads can be shown is given in the format [(a1,
b1), (a2, b2), …, (an, bn)], where ith tuple in the array denotes the
timing of the ith ad in the format (start_time, end_time). Note that
any ‘ai’ and ‘bi’ cannot be smaller than 0 and cannot be larger than
‘m’. When you choose to show an ad, you cannot show another ad within
4 minutes of its end. So if you select to show the ad having timings
as (2, 4), then you cannot show another ad before 9, hence next ad
cannot be (8, 10), but it can be (9, 12). You have to select the ads
to show to the audience such that you maximize the sum of the
effectiveness values of the ads, given the above constraints. For
example, if ‘m’ is 20, and the timings of the ads are [(2, 3), (6, 9),
(10, 12), (12, 13), (14, 17)] and the effectiveness values are [3, 9,
10, 6, 7], then you can show ad 2 and ad 5 (one-based-indexing) and
have an effectiveness value of 16, which is the maximum you can get
given the constraints.

Your problem can be reduced to the following. Consider 1D segments described as (start_i, end_i, value_i).
Let S = the array of available 1D segments.
We want the non-intersecting segments that sum to the maximum possible value and fit in the interval [0, m] (the show length).
Let DP[x] = the best achievable value using only segments that end at or before x.
For each element (s_i, e_i, v_i) we can either select it or not:
DP[e_i] = DP[e_i - 1]
DP[e_i] = max( DP[e_i], DP[s_i - 1] + v_i )
Because of the 4-minute gap in the question, the previous ad must actually end at or before s_i - 5, so the lookup becomes DP[s_i - 5] (taken as 0 when the index is negative):
dp[0] = 0;
// sort the events by increasing e_i, because
// we want to process first the events that finish earlier
for( int end = 1; end <= m; ++end )
{
    dp[end] = dp[end - 1];
    for( each element (s_i, e_i, v_i) with (e_i == end) )
    {
        prev = (s_i - 5 >= 0) ? dp[s_i - 5] : 0;
        dp[end] = max( dp[end], prev + v_i );
    }
}
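The recurrence above can be sketched in Python as follows. This is a minimal sketch, not the answerer's code: the function name max_effectiveness and the gap parameter are assumptions made for illustration, and the gap is read from the question's example, where an ad ending at 4 allows the next ad to start at 9 at the earliest.

```python
def max_effectiveness(m, timings, values, gap=4):
    """Best total value of ads, where the next ad must start at least
    gap + 1 minutes after the previous ad's end (assumed reading of the rule)."""
    # Group ads by their end time so each end is processed once, in order.
    by_end = {}
    for (start, end), value in zip(timings, values):
        by_end.setdefault(end, []).append((start, value))

    # dp[x] = best value using only ads that end at or before x.
    dp = [0] * (m + 1)
    for end in range(m + 1):
        if end > 0:
            dp[end] = dp[end - 1]          # option: no ad ends exactly here
        for start, value in by_end.get(end, []):
            prev = start - gap - 1         # latest allowed end of the previous ad
            best_before = dp[prev] if prev >= 0 else 0
            dp[end] = max(dp[end], best_before + value)
    return dp[m]


timings = [(2, 3), (6, 9), (10, 12), (12, 13), (14, 17)]
values = [3, 9, 10, 6, 7]
print(max_effectiveness(20, timings, values))  # 16, as in the question
```

The table has m + 1 entries and every ad is examined once, so the running time is O(n + m).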

Related

What is the key difference between Combination Sum IV and No. of ways to make coin change problem?

Combination Sum
Given an array of distinct integers nums and a target integer target, return the number of possible combinations that add up to the target.
Input: nums = [1,2,3], target = 4
Output: 7
Explanation:
The possible combination ways are:
(1, 1, 1, 1)
(1, 1, 2)
(1, 2, 1)
(1, 3)
(2, 1, 1)
(2, 2)
(3, 1)
Note that different sequences are counted as different combinations.
Coin Change
For the given infinite supply of coins of each of denominations, D = {D0, D1, D2, D3, ...... Dn-1}. You need to figure out the total number of ways W, in which you can make the change for Value V using coins of denominations D.
For the same Input as above question:
Number of ways are - 4 total i.e. (1,1,1,1), (1,1, 2), (1, 3) and (2, 2).
I know how to solve Coin Change using the concept of UNBOUNDED KNAPSACK. But how is Combination Sum IV different? The two seem so similar.
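The two counts quoted above (7 and 4) can be reproduced with the same one-dimensional table and only the loop order swapped. This is a minimal sketch; the function names count_ordered and count_unordered are made up for illustration and do not come from either problem statement.

```python
def count_ordered(nums, target):
    """Combination Sum IV: different orderings count separately."""
    ways = [1] + [0] * target
    for total in range(1, target + 1):   # outer loop over totals
        for x in nums:                   # any number may come last
            if x <= total:
                ways[total] += ways[total - x]
    return ways[target]


def count_unordered(coins, value):
    """Coin Change (number of ways): orderings are not distinguished."""
    ways = [1] + [0] * value
    for c in coins:                      # outer loop over coins fixes their order
        for total in range(c, value + 1):
            ways[total] += ways[total - c]
    return ways[value]


print(count_ordered([1, 2, 3], 4))    # 7
print(count_unordered([1, 2, 3], 4))  # 4
```

With totals in the outer loop, every number can be the last element of a sequence, so (1, 1, 2) and (1, 2, 1) are both counted. With coins in the outer loop, each multiset is built in one fixed coin order and is counted once.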

Find the maximum possible length of segments (a scan line algorithm)

The segments (their beginnings) are given. Let them be: 1, 4, 6, 9, 10. All segments have the same length C. We need to find such a maximum length (C) that there is no such common point that belongs to N segments. For example, if N = 2, and the length of the segments above is 4 (C = 4), then if we sort the coordinates of the beginning of the segments and their ends, we get: [1, 5], [4, 8], [6, 10], [9, 13], [10, 14] (it is not difficult to notice that at point 10 ([6, 10], [9, 13], [10, 14]) we have 3 segments intersecting at once, which contradicts the condition)
I know the algorithm for fixed ends of segments, but I do not know how to choose such ends so that the common point contains no more than N without O(n^2) iterations. Please help me
Here's an O(n) time complexity algorithm, provided the beginnings of the segments are already sorted. The space complexity is O(1).
Let's define what a shared point means in terms of the absolute distances between the segment beginnings.
segment beginnings = [1,_,_,4,_,_,_,8] # empty cells added for illustration
N = 2
if we take C = 5 we get [1,_,_,4,_,_,_,8]
1,2,3,4,5
It's evident that there is shared point. If we had C = 4,
this wouldn't be the case since the end points of the segments don't count.
So, we can state that for 2 segments to have at least one shared point, the length of C must be at least abs(s_seg2 - s_seg1) + 2. This is the distance of the segment starts plus 2. This also means that C = abs(s_seg2 - s_seg1) + 1 avoids any common point between seg1 and seg2
Now, the above implies that to avoid one shared point between any 2 segments (N = 2), C must be min(abs(s_segi - s_segj))+1 for 0 <= i < len(segment_starts)-1, j = i+1.
How do we generalize this idea to one shared point between 3 segments (N = 3) or any number of segments with 2 <= N <= len(segment_starts)?
For 3 adjacent segments, s_seg1, s_seg2, s_seg3 to produce one shared point between the 3, the length of C must be abs(s_seg3 - s_seg1)+2, so +1 would be the maximum possible length of C.
The above motivates the following algorithm:
Create a sliding window or 2 pointers with left = 0 and right = N-1. You could also say a sliding window of length N.
Keep track of the minimum value of segment_starts[right] - segment_starts[left].
Run a loop while right < len(segment_starts) and increment left and right after each iteration.
At the end of the loop return the found min value + 1.
In pseudo code:
# assuming segment_starts is sorted non-decreasing
if N > len(segment_starts): return Infinity
if N < 2: raise Invalid Input Error
left, right = 0, N-1
curr_best = segment_starts[right] - segment_starts[left]
while right < len(segment_starts):
    curr_best = min(curr_best, segment_starts[right] - segment_starts[left])
    left, right = left + 1, right + 1
return curr_best+1
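The pseudocode above can be sketched as runnable Python. The function name max_segment_length is an assumption, and the sketch keeps the answer's convention that end points do not count as shared points.

```python
def max_segment_length(segment_starts, n_shared):
    """Largest C such that no point is shared by n_shared segments,
    following the answer's convention (window span + 1).
    segment_starts must be sorted in non-decreasing order."""
    if n_shared < 2:
        raise ValueError("N must be at least 2")
    if n_shared > len(segment_starts):
        return float("inf")              # fewer than N segments: no limit

    best = None
    # Slide a window covering n_shared consecutive starts.
    for left in range(len(segment_starts) - n_shared + 1):
        right = left + n_shared - 1
        span = segment_starts[right] - segment_starts[left]
        if best is None or span < best:
            best = span
    return best + 1


print(max_segment_length([1, 4, 8], 2))         # 4
print(max_segment_length([1, 4, 6, 9, 10], 3))  # 5
```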

Find optimal points to cut a set of intervals

Given a set of intervals on the real line and some parameter d > 0. Find a sequence of points with gaps between neighbors less or equal to d, such that the number of intervals that contain any of the points is minimized.
To prevent trivial solutions we ask that the first point from the sequence is before the first interval, and the last point is after the last interval. The intervals can be thought of right-open.
Does this problem have a name? Maybe even an algorithm and a complexity bound?
Some background:
This is motivated by a question from topological data analysis, but it seems so general, that it could be interesting for other topics, e.g. task scheduling (given a factory that has to shut down at least once a year and wants to minimize the number of tasks inflicted by the maintenance...)
We were thinking of integer programming and minimum cuts, but the d-parameter does not quite fit. We also implemented approximate greedy solutions in n^2 and n*logn time, but they can run into very bad local optima.
Show me a picture
I draw intervals by lines. The following diagram shows 7 intervals. d is such that you have to cut at least every fourth character. At the bottom of the diagram you see two solutions (marked with x and y) to the diagram. x cuts through the four intervals in the top, whereas y cuts through the three intervals at the bottom. y is optimal.
——— ———
——— ———
———
———
———
x x x x
y y y
Show me some code:
How should we define fun in the following snippet?
intervals = [(0, 1), (0.5, 1.5), (0.5, 1.5)]
d = 1.1
fun(intervals, d)
>>> [-0.55, 0.45, 1.55] # Or something close to it
In this small example the optimal solution will cut the first interval, but not the second and third. Obviously, the algorithm should work with more complicated examples as well.
A tougher test can be the following: Given a uniform distribution of interval start times on [0, 100] and lengths uniform on [0, d], one can compute the expected number of cuts by a regular grid [0, d, 2d, 3d,..] to be slightly below 0.5*n. And the optimal solution should be better:
import numpy as np

n = 10000
delta = 1
starts = np.random.uniform(low=0., high=99, size=n)
lengths = np.random.uniform(low=0., high=1, size=n)
rand_intervals = np.array([starts, starts + lengths]).T
regular_grid = np.arange(0, 101, 1)
optimal_grid = fun(rand_intervals, delta)

# This computes the number of intervals being cut by one of the points
def cuts(intervals, grid):
    bins = np.digitize(intervals, grid)
    return sum(bins[:,0] != bins[:,1])

cuts(rand_intervals, regular_grid)
>>> 4987 # Expected to be slightly below 0.5*n
assert cuts(rand_intervals, optimal_grid) <= cuts(rand_intervals, regular_grid)
You can solve this optimally through dynamic programming by maintaining an array S[k] where S[k] is the best solution (covers the largest amount of space) while having k intervals with a point in it. Then you can repeatedly remove your lowest S[k], extend it in all possible ways (limiting yourself to the relevant endpoints of intervals plus the last point in S[k] + delta), and updating S with those new possible solutions.
When the lowest possible S[k] in your table covers the entire range, you are done.
A Python 3 solution using intervaltree from pip:
from intervaltree import Interval, IntervalTree

def optimal_points(intervals, d, epsilon=1e-9):
    intervals = [Interval(lr[0], lr[1]) for lr in intervals]
    tree = IntervalTree(intervals)
    start = min(iv.begin for iv in intervals)
    stop = max(iv.end for iv in intervals)

    # The best partial solution with k intervals containing a point.
    # We also store the intervals that these points are contained in as a set.
    sols = {0: ([start], set())}

    while True:
        lowest_k = min(sols.keys())
        s, contained = sols.pop(lowest_k)
        # print(lowest_k, s[-1])  # For tracking progress in slow instances.
        if s[-1] >= stop:
            return s

        relevant_intervals = tree[s[-1]:s[-1] + d]
        relevant_points = [iv.begin - epsilon for iv in relevant_intervals]
        relevant_points += [iv.end + epsilon for iv in relevant_intervals]
        extensions = {s[-1] + d} | {p for p in relevant_points if s[-1] < p < s[-1] + d}

        for ext in sorted(extensions, reverse=True):
            new_s = s + [ext]
            new_contained = set(tree[ext]) | contained
            new_k = len(new_contained)
            if new_k not in sols or new_s[-1] > sols[new_k][0][-1]:
                sols[new_k] = (new_s, new_contained)
If the range and precision could be feasible for iterating over, we could first merge and count the intervals. For example,
[(0, 1), (0.5, 1.5), (0.5, 1.5)] ->
[(0, 0.5, 1), (0.5, 1, 3), (1, 1.5, 2)]
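The merge-and-count step in this example can be sketched with a simple sweep over the interval endpoints. This is an illustrative sketch only: the function name merge_and_count is hypothetical, and intervals are treated as right-open, as stated in the question.

```python
def merge_and_count(intervals):
    """Split the line at every endpoint and report, for each piece,
    how many intervals cover it: a list of (left, right, count)."""
    # +1 when an interval opens at a coordinate, -1 when it closes there.
    events = {}
    for lo, hi in intervals:
        events[lo] = events.get(lo, 0) + 1
        events[hi] = events.get(hi, 0) - 1

    points = sorted(events)
    merged, active = [], 0
    for left, right in zip(points, points[1:]):
        active += events[left]           # intervals covering [left, right)
        if active > 0:
            merged.append((left, right, active))
    return merged


print(merge_and_count([(0, 1), (0.5, 1.5), (0.5, 1.5)]))
# [(0, 0.5, 1), (0.5, 1, 3), (1, 1.5, 2)]
```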
Now let f(n, k) represent the optimal solution with k points up to n on the number line. Then:
f(n, k) = min(
num_intervals(n) + f(n - i, k - 1)
)
num_intervals(n) is known in O(1)
from a pointer in the merged interval list.
n-i is not every precision point up to n. Rather, it's
every point not more than d back that marks a change
from one merged interval to the next as we move it
back from our current pointer in the merged-interval
list.
One issue to note is that we need to store the distance between the rightmost and previous point for any optimal f(n, k). This is to avoid joining f(n - i, k - 1) where the second to rightmost point would be less than d away from our current n, making the new middle point, n - i, superfluous and invalidating this solution. (I'm not sure I've thought this issue through enough. Perhaps someone could point out something that's amiss.)
How would we know k is high enough? Given that the optimal solution may be lower than the current k, we assume that the recurrence would prevent us from finding an instance based on the idea in the above paragraph:
0.......8
——— ———
——— ———
———
———
———
x x x x
y y y
d = 4
merged list:
[(1, 3, 2), (3, 4, 5), (4, 5, 3), (5, 6, 5), (6, 8, 2)]
f(4, 2) = (3, 0) // (intersections, previous point)
f(8, 3) = (3, 4)
There are no valid solutions for f(8, 4) since the
break point we may consider between interval change
in the merged list is before the second-to-last
point in f(8, 3).

Finding cheapest combination of items with conditions on the selection

Let's say that I have 3 sellers of a particular item. Each seller has a different number of these items in storage. They also have different prices for the item.
Name          Price   Units in storage
Supplier #1   17$     1 Unit
Supplier #2   18$     3 Units
Supplier #3   23$     5 Units
If I do not order enough items from the same supplier, I have to pay some extra costs per unit. Let's say, for example, that if I do not order at least 4 units, I do have to pay extra 5$ for each unit ordered.
Some examples:
If I wanted to buy 4 units, the best price would come from getting them from Supplier #1 and Supplier #2, rather than getting it all from Supplier #3
(17+5)*1 + (18+5)*3 = 91 <--- Cheaper
23 *4 = 92
But if I were to buy 5 units, getting them all from Supplier 3 gives me a better price, than getting first the cheaper ones and the rest from more expensive suppliers
(17+5)*1 + (18+5)*3 + (23+5)*1 = 119
23 *5 = 115$ <--- Cheaper
The question
Keeping all this in mind... If I knew beforehand how many items I want to order, what would be an algorithm to find the best combination I can choose?
As noted in comments, you can use a graph search algorithm for this, like Dijkstra's algorithm. It might also be possible to use A*, but in order to do so, you need a good heuristic function. Using the minimum price might work, but for now, let's stick with Dijkstra's.
One node in the graph is represented as a tuple of (cost, num, counts), where cost is the cost, obviously, num the total number of items purchased, and counts a breakdown of the number of items per seller. With cost being the first element in the tuple, the item with the lowest cost will always be at the front of the heap. We can handle the "extra fee" by adding the fee if the current count for that seller is lower than the minimum, and subtracting it again once we reach that minimum.
Here's a simple implementation in Python.
import heapq

def find_best(goal, num_cheap, pay_extra, price, items):
    # state is tuple (cost, num, counts)
    heap = [(0, 0, tuple((seller, 0) for seller in price))]
    visited = set()
    while heap:
        cost, num, counts = heapq.heappop(heap)
        if (cost, num, counts) in visited:
            continue  # already seen this combination
        visited.add((cost, num, counts))
        if num == goal:  # found one!
            yield (cost, num, counts)
        for seller, count in counts:
            if count < items[seller]:
                new_cost = cost + price[seller]  # increase cost
                if count + 1 < num_cheap: new_cost += pay_extra  # pay extra :(
                if count + 1 == num_cheap: new_cost -= (num_cheap - 1) * pay_extra  # discount! :)
                new_counts = tuple((s, c + 1 if s == seller else c) for s, c in counts)
                heapq.heappush(heap, (new_cost, num+1, new_counts))  # push to heap
The above is a generator function, i.e. you can either use next(find_best(...)) to find just the best combination, or iterate over all the combinations:
price = {1: 17, 2: 18, 3: 23}
items = {1: 1, 2: 3, 3: 5}
for best in find_best(5, 4, 5, price, items):
    print(best)
And as we can see, there's an even cheaper solution for buying five items:
(114, 5, ((1, 1), (2, 0), (3, 4)))
(115, 5, ((1, 0), (2, 0), (3, 5)))
(115, 5, ((1, 0), (2, 1), (3, 4)))
(119, 5, ((1, 1), (2, 3), (3, 1)))
(124, 5, ((1, 1), (2, 2), (3, 2)))
(125, 5, ((1, 0), (2, 3), (3, 2)))
(129, 5, ((1, 1), (2, 1), (3, 3)))
(130, 5, ((1, 0), (2, 2), (3, 3)))
Update 1: While the above works fine for the example, there can be cases where it fails, since subtracting the extra cost once we reach the minimum number means that we could have edges with negative cost, which can be a problem in Dijkstra's. Alternatively, we can add all four elements at once in a single "action". For this, replace the inner part of the algorithm with this:
if count < items[seller]:
    def buy(n, extra):  # inner function to avoid code duplication
        new_cost = cost + (price[seller] + extra) * n
        new_counts = tuple((s, c + n if s == seller else c) for s, c in counts)
        heapq.heappush(heap, (new_cost, num + n, new_counts))
    if count == 0 and items[seller] >= num_cheap:
        buy(num_cheap, 0)      # buy num_cheap in bulk
    if count < num_cheap - 1:  # do not buy single item \
        buy(1, pay_extra)      # when just 1 lower than num_cheap!
    if count >= num_cheap:
        buy(1, 0)              # buy with no extra cost
Update 2: Also, since the order in which the items are added to the "path" does not matter, we can restrict the sellers to those that are not before the current seller. We can change the for seller, count in counts: loop to this:
used_sellers = [i for i, (_, c) in enumerate(counts) if c > 0]
min_sellers = used_sellers[0] if used_sellers else 0
for i in range(min_sellers, len(counts)):
    seller, count = counts[i]
With those two improvements, the states in the explored graph for next(find_best(5, 4, 5, price, items)) look like this:
[Figure: graph of the explored states; image not included]
Note that there are many states "below" the goal state, with costs much worse. This is because those are all the states that have been added to the queue, and for each of those states, the predecessor state was still better than our best state, thus they were expanded and added to, but never actually popped from, the queue. Many of those could probably be trimmed away by using A* with a heuristic function like items_left * min_price.
This is a Bounded Knapsack problem, where you want to minimize the cost subject to the constraints on price and quantity.
Read about 0-1 KnapSack problem here. Where you have only 1 quantity for given supplier.
Read how to extend the 0-1 KnapSack problem for given quantity ( called Bounded Knapsack ) here
A more detailed discussion of Bounded KnapSack is here
Together these should be sufficient to come up with an algorithm, which requires a bit of tweaking (e.g. adding 5$ per unit when the quantity is below some given minimum).
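The knapsack idea can be sketched as a small DP over suppliers, where for each supplier we try every number of units from 0 up to its stock. This is a sketch under assumptions: the function name cheapest_order and its parameters are invented for illustration, and the surcharge is applied per unit whenever fewer than min_units are bought from one supplier, as described in the question.

```python
def cheapest_order(goal, min_units, extra, suppliers):
    """suppliers: list of (price, stock). Returns the minimum total cost
    of buying exactly `goal` units, or None if the stock is insufficient."""
    INF = float("inf")
    # best[t] = cheapest way to buy t units from the suppliers seen so far.
    best = [0] + [INF] * goal
    for price, stock in suppliers:
        new_best = best[:]                   # taking 0 units from this supplier
        for total in range(1, goal + 1):
            for k in range(1, min(stock, total) + 1):
                # Per-unit surcharge applies when the order is below min_units.
                unit = price + (extra if k < min_units else 0)
                cand = best[total - k] + k * unit
                if cand < new_best[total]:
                    new_best[total] = cand
        best = new_best
    return None if best[goal] == INF else best[goal]


suppliers = [(17, 1), (18, 3), (23, 5)]
print(cheapest_order(4, 4, 5, suppliers))  # 91
print(cheapest_order(5, 4, 5, suppliers))  # 114
```

The result for 5 units agrees with the cheapest line of the Dijkstra output above (1 unit from supplier 1 plus 4 units from supplier 3). The running time is O(suppliers * goal * max stock).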

Find a period of eventually periodic sequence

Short explanation.
I have a sequence of numbers [0, 1, 4, 0, 0, 1, 1, 2, 3, 7, 0, 0, 1, 1, 2, 3, 7, 0, 0, 1, 1, 2, 3, 7, 0, 0, 1, 1, 2, 3, 7]. As you see, from the 3-rd value the sequence is periodic with a period [0, 0, 1, 1, 2, 3, 7].
I am trying to automatically extract this period from this sequence. The problem is that neither I know the length of the period, nor do I know from which position the sequence becomes periodic.
Full explanation (might require some math)
I am learning combinatorial game theory and a cornerstone of this theory requires one to calculate Grundy values of a game graph. This produces infinite sequence, which in many cases becomes eventually periodic.
I found a way to efficiently calculate grundy values (it returns me a sequence). I would like to automatically extract offset and period of this sequence. I am aware that seeing a part of the sequence [1, 2, 3, 1, 2, 3] you can't be sure that [1, 2, 3] is a period (who knows may be the next number is 4, which breaks the assumption), but I am not interested in such intricacies (I assume that the sequence is enough to find the real period). Also the problem is the sequence can stop in the middle of the period: [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, ...] (the period is still 1, 2, 3).
I also need to find the smallest offset and period. For example for original sequence, the offset can be [0, 1, 4, 0, 0] and the period [1, 1, 2, 3, 7, 0, 0], but the smallest is [0, 1, 4] and [0, 0, 1, 1, 2, 3, 7].
My inefficient approach is to try every possible offset and every possible period, construct the sequence from them, and check whether it is the same as the original. I have not done any formal analysis, but it looks like it is at least quadratic in terms of time complexity.
Here is my quick python code (have not tested it properly):
def getPeriod(arr):
    min_offset, min_period, n = len(arr), len(arr), len(arr)
    best_offset, best_period = [], []
    # range and // behave the same in Python 2 and Python 3
    for offset in range(n):
        start = arr[:offset]
        for period_len in range(1, (n - offset) // 2):
            period = arr[offset: offset+period_len]
            attempt = (start + period * (n // period_len + 1))[:n]

            if attempt == arr:
                if period_len < min_period:
                    best_offset, best_period = start[::], period[::]
                    min_offset, min_period = len(start), period_len
                elif period_len == min_period and len(start) < min_offset:
                    best_offset, best_period = start[::], period[::]
                    min_offset, min_period = len(start), period_len

    return best_offset, best_period
Which returns me what I want for my original sequence:
offset [0, 1, 4]
period [0, 0, 1, 1, 2, 3, 7]
Is there anything more efficient?
Remark: If there is a period P1 with length L, then there is also a period P2, with the same length, L, such that the input sequence ends exactly with P2 (i.e. we do not have a partial period involved at the end).
Indeed, a different period of the same length can always be obtained by changing the offset. The new period will be a rotation of the initial period.
For example the following sequence has a period of length 4 and offset 3:
0 0 0 (1 2 3 4) (1 2 3 4) (1 2 3 4) (1 2 3 4) (1 2 3 4) (1 2
but it also has a period with the same length 4 and offset 5, without a partial period at the end:
0 0 0 1 2 (3 4 1 2) (3 4 1 2) (3 4 1 2) (3 4 1 2) (3 4 1 2)
The implication is that we can find the minimum length of a period by processing the sequence in reverse order, and searching the minimum period using zero offset from the end. One possible approach is to simply use your current algorithm on the reversed list, without the need of the loop over offsets.
Now that we know the length of the desired period, we can also find its minimum offset. One possible approach is to try all various offsets (with the advantage of not needing the loop over lengths, since the length is known), however, further optimizations are possible if necessary, e.g. by advancing as much as possible when processing the list from the end, allowing the final repetition of the period (i.e. the one closest to the start of the un-reversed sequence) to be partial.
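The idea of working backwards from the end can be sketched as follows. This is only one possible reading of the approach, not the answerer's code: the function name find_offset_and_period is hypothetical, and the sketch assumes a candidate period is accepted only if the periodic part is at least two periods long.

```python
def find_offset_and_period(seq):
    """Return (offset_part, period) with the shortest period and, for that
    period, the shortest offset; None if no period repeats at least twice."""
    n = len(seq)
    for length in range(1, n // 2 + 1):      # try the shortest periods first
        # Walk backwards from the end while seq[i] equals the element
        # one period later; a partial period at the end is allowed.
        i = n - length - 1
        while i >= 0 and seq[i] == seq[i + length]:
            i -= 1
        offset = i + 1                       # first index of the periodic part
        if n - offset >= 2 * length:         # assumption: need two repetitions
            return seq[:offset], seq[offset:offset + length]
    return None


seq = [0, 1, 4] + [0, 0, 1, 1, 2, 3, 7] * 4
print(find_offset_and_period(seq))
# ([0, 1, 4], [0, 0, 1, 1, 2, 3, 7])
```

Each candidate length needs a single backward scan, so there is no loop over offsets. The worst case is still quadratic, but the scan usually stops after a few comparisons when the length is wrong.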
I would start with constructing histogram of the values in the sequence
So you just make a list of all numbers used in sequence (or significant part of it) and count their occurrence. This is O(n) where n is sequence size.
sort the histogram ascending
This is O(m.log(m)) where m is the number of distinct values. You can also ignore improbable numbers (count<threshold), which are most likely in the offset or are just irregularities, further lowering m. For periodic sequences m <<< n, so you can use it as a first marker of whether the sequence is periodic or not.
find out the period
In the histogram the counts should be around multiples of the n/period. So approximate/find GCD of the histogram counts. The problem is that you need to take into account there are irregularities present in the counts and also in the n (offset part) so you need to compute GCD approximately. for example:
sequence = { 1,1,2,3,3,1,2,3,3,1,2,3,3 }
has ordered histogram:
item,count
2 3
1 4
3 6
the GCD(6,4)=2 and GCD(6,3)=3 you should check at least +/-1 around the GCD results so the possible periods are around:
T = ~n/2 = 13/2 = 6
T = ~n/3 = 13/3 = 4
So check T={3,4,5,6,7} just to be sure. Use always GCD between the highest counts vs. lowest counts. If the sequence has many distinct numbers you can also do a histogram of counts checking only the most common values.
To check period validity just take any item near end or middle of the sequence (just use probable periodic area). Then look for it in close area near probable period before (or after) its occurrence. If found few times you got the right period (or its multiple)
Get the exact period
Just check the found period fractions (T/2, T/3, ...) or do a histogram on the found period and the smallest count tells you how many real periods you got encapsulated so divide by it.
find offset
When you know the period this is easy. Just scan from the start: take the first item and see if it occurs again one period later. If not, remember the position. Stop at the end or in the middle of the sequence ... or after some threshold of consecutive successes. This is up to O(n). The last remembered position is the last item in the offset.
[edit1] Was curious so I try to code it in C++
I simplified/skip few things (assuming at least half of the array is periodic) to test if I did not make some silly mistake in my algorithm and here the result (Works as expected):
const int p=10;         // min periods for testing
const int n=500;        // generated sequence size
int seq[n];             // generated sequence
int offset,period;      // generated properties
int i,j,k,e,t0,T;
int hval[n],hcnt[n],hs; // histogram

// generate periodic sequence
Randomize();
offset=Random(n/5);
period=5+Random(n/5);
for (i=0;i<offset+period;i++) seq[i]=Random(n);
for (i=offset,j=i+period;j<n;i++,j++) seq[j]=seq[i];
if ((offset)&&(seq[offset-1]==seq[offset-1+period])) seq[offset-1]++;

// compute histogram O(n) on last half of it
for (hs=0,i=n>>1;i<n;i++)
    {
    for (e=seq[i],j=0;j<hs;j++)
        if (hval[j]==e) { hcnt[j]++; j=-1; break; }
    if (j>=0) { hval[hs]=e; hcnt[hs]=1; hs++; }
    }

// bubble sort histogram asc O(m^2)
for (e=1,j=hs;e;j--)
    for (e=0,i=1;i<j;i++)
        if (hcnt[i-1]>hcnt[i])
            { e=hval[i-1]; hval[i-1]=hval[i]; hval[i]=e;
              e=hcnt[i-1]; hcnt[i-1]=hcnt[i]; hcnt[i]=e; e=1; }

// test possible periods
for (j=0;j<hs;j++)
    if ((!j)||(hcnt[j]!=hcnt[j-1]))    // distinct counts only
        if (hcnt[j]>1)                 // more than 1 occurrence
            for (T=(n>>1)/(hcnt[j]+1);T<=(n>>1)/(hcnt[j]-1);T++)
                {
                for (i=n-1,e=seq[i],i-=T,k=0;(i>=(n>>1))&&(k<p)&&(e==seq[i]);i-=T,k++);
                if ((k>=p)||(i<n>>1)) { j=hs; break; }
                }

// compute histogram O(T) on last multiple of period
for (hs=0,i=n-T;i<n;i++)
    {
    for (e=seq[i],j=0;j<hs;j++)
        if (hval[j]==e) { hcnt[j]++; j=-1; break; }
    if (j>=0) { hval[hs]=e; hcnt[hs]=1; hs++; }
    }

// least count is the period multiple O(m)
for (e=hcnt[0],i=0;i<hs;i++) if (e>hcnt[i]) e=hcnt[i];
if (e) T/=e;

// check/handle error
if (T!=period)
    {
    return;
    }

// search offset size O(n)
for (t0=-1,i=0;i<n-T;i++)
    if (seq[i]!=seq[i+T]) t0=i;
t0++;

// check/handle error
if (t0!=offset)
    {
    return;
    }
The code is still not optimized. For n=10000 it takes around 5ms on my setup. The result is in t0 (offset) and T (period). You may need to play with the threshold constants a bit.
I had to do something similar once. I used brute force and some common sense, the solution is not very elegant but it works. The solution always works, but you have to set the right parameters (k,j, con) in the function.
The sequence is saved as a list in the variable seq.
k is the size of the sequence array, if you think your sequence will take long to become periodic then set this k to a big number.
The variable found will tell us if the array passed the periodic test with period j
j is the period.
If you expect a huge period then you must set j to a big number.
We test the periodicity by checking the last j+30 numbers of the sequence.
The bigger the period (j) the more we must check.
As soon as one of the test is passed we exit the function and we return the smaller period.
As you may notice the accuracy depends on the variables j and k but if you set them to very big numbers it will always be correct.
def some_sequence(s0, a, b, m):
    try:
        seq=[s0]
        snext=s0
        findseq=True
        k=0
        while findseq:
            snext= (a*snext+b)%m
            seq.append(snext)
            #UNTIL THIS PART IS JUST TO CREATE THE SEQUENCE (seq) SO IS NOT IMPORTANT
            k=k+1
            if k>20000:
                # I IS OUR LIST INDEX
                for i in range(1,len(seq)):
                    for j in range(1,1000):
                        found =True
                        for con in range(j+30):
                            #THE TRICK IS TO START FROM BEHIND
                            if not (seq[-i-con]==seq[-i-j-con]):
                                found = False
                        if found:
                            minT=j
                            findseq=False
                            return minT
    except:
        return None
simplified version
def get_min_period(sequence,max_period,test_numb):
    seq=sequence
    if max_period+test_numb > len(sequence):
        print("max_period+test_numb cannot be bigger than the seq length")
        return 1
    for i in range(1,len(seq)):
        for j in range(1,max_period):
            found =True
            for con in range(j+test_numb):
                if not (seq[-i-con]==seq[-i-j-con]):
                    found = False
            if found:
                minT=j
                return minT
Here max_period is the maximum period you want to look for, and test_numb is how many numbers of the sequence you want to test. The bigger the better, but you have to keep max_period+test_numb < len(sequence).

Resources