I'm looking for an algorithm to reduce a list (playlist) of ordered but not unique items.
I searched set theory but haven't found anything suitable yet.
Examples
[a, b, b, c] -> [a, b, b, c] Cannot be reduced.
[a, b, b, a, b, b] -> [a, b, b].
[b, b, b, b, b] -> [b].
[b, b, b, b, a] -> [b, b, b, b, a] Cannot be reduced.
I'm thinking of generating all existing sublists and counting each one's occurrences.
If there is a sublist whose count times its length equals the length of the original list, take the shortest sublist matching this criterion.
This seems a bit brute force; there must be a simpler/faster solution available.
For starters, you don't need to check all sublists -- just those with lengths that are factors of the length of the full list.
If your main concern is coding simplicity rather than raw speed, just let a regex engine solve the problem:
/^(.+?)\1+$/
Which is a variant on Abigail's awesome Perl regex to find prime numbers.
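In Python, that regex approach might look like this (a sketch; `reduce_playlist` is my own name, and it assumes each distinct item can be mapped to a single character):

```python
import re

def reduce_playlist(items):
    # Map each distinct item to one character so the regex can run on a
    # string representation of the playlist.
    alphabet = {}
    s = "".join(chr(alphabet.setdefault(x, len(alphabet))) for x in items)
    # Lazy group + backreference: the shortest prefix whose repetition
    # rebuilds the whole string.
    m = re.match(r"(.+?)\1+$", s)
    return items[:len(m.group(1))] if m else items

print(reduce_playlist(["a", "b", "b", "a", "b", "b"]))  # → ['a', 'b', 'b']
print(reduce_playlist(["b", "b", "b", "b", "a"]))       # unchanged
```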
For each n <= N (where N is the length of the list), check whether n is a factor of N. If it is, check whether repeating the sublist of the first n elements generates the original list. If it does, you've found a potential answer (the final answer is the shortest such sublist). This should get you below O(N^2), but it is still the same order as brute force in the worst case.
You can do some pruning by noting that if, for example, a sublist of length 2 generates the first 4 characters but not the full list, then a sublist of length 4 will also fail. Keeping a list of all such excluded sublist lengths cuts down on some of the computation.
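A Python sketch of the divisor-only check (without the pruning step; `shortest_period` is my own name):

```python
def shortest_period(lst):
    # Try each candidate length p that divides len(lst); lst is periodic
    # with period p iff every element equals the one p positions earlier.
    n = len(lst)
    for p in range(1, n + 1):
        if n % p != 0:
            continue  # only divisors of n can work
        if all(lst[i] == lst[i % p] for i in range(n)):
            return lst[:p]
    return lst  # p = n always matches, so this is never reached

print(shortest_period(["a", "b", "b", "a", "b", "b"]))  # → ['a', 'b', 'b']
```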
Encode every set element with a prime number.
Ex:
a -> 2
b -> 3
c -> 5
etc.
Now, you need two more lists to maintain.
First list is for primes, second is for their exponents.
The idea is: when you stumble upon an element, record its prime number and how many times it appears in succession.
For [a, b, b, c], you get this:
[2, 3, 3, 5]
Which can be recorded as:
[2, 3^2, 5]
or, more precisely:
[2^1, 3^2, 5^1]
and you maintain two lists:
[2,3,5] // primes in succession - list [p]
[1,2,1] // exponents - list [e]
Now, iterate through these two lists from the ends toward the middle, pairing the first element p^e with the last, the second with the next-to-last, and so on, and check that every pair has the same product. If all of them match, your list can be reduced.
In this example, you check whether 2^1 * 5^1 == 3^2 * 3^2 (the middle element is paired with itself); since it isn't, the list cannot be reduced.
Let's try [a, b, b, a, b, b]:
This is encoded as
[2^1, 3^2, 2^1, 3^2]
or,
[2, 3, 2, 3] // primes
[1, 2, 1, 2] // exponents
Now, we check whether 2^1 * 3^2 == 3^2 * 2^1 (the first prime raised to the first exponent, multiplied by the last prime raised to the last exponent, then likewise the second against the next-to-last).
Since this holds, it is reducible.
Let's try [b, b, b, b, b]:
This can be encoded as
[3^5]
or,
[3] // primes
[5] // exponents
This is a special case: if you've got one-element lists, your original list is reducible.
Let's try [b, b, b, b, a]:
This can be encoded as
[3^4, 2^1]
or,
[3, 2] // primes
[4, 1] // exponents
We check if 3^4 == 2^1, and since it is not, your list is not reducible.
Let's try [a, b, a, b, a, b]:
This can be encoded as
[2^1, 3^1, 2^1, 3^1, 2^1, 3^1]
or,
[2, 3, 2, 3, 2, 3]
[1, 1, 1, 1, 1, 1]
Trying the above procedure works, because 2^1 * 3^1 == 3^1 * 2^1 == 2^1 * 3^1
So, the algorithm would be something like this:
Encode all numbers to primes.
Iterating through your list, make two lists and populate them as described
Now that you have your two lists, p and e, both of length n, do this:
var start = p[0]^e[0] * p[n-1]^e[n-1]
var reducible = true
for (int i = 0; i < n/2; ++i):
    if (p[i]^e[i] * p[n-1-i]^e[n-1-i] != start):
        reducible = false
        break
Note: I didn't really code this algorithm and try it out for various inputs. It's just an idea.
Also, if a list is reducible, then from its length and the length of the two lists it shouldn't be too hard to see how to reduce the original list to its basic form.
Second note: if anyone sees a mistake above, please correct me. It's possible that none of this really works, since it's late and my concentration isn't optimal.
Here's some simple code that should run in close to linear time (at worst O(n lg lg n) I think, relying on some higher math).
f(x) {
    i = 1;
    while (i <= size(x) / 2) {
        if (size(x) % i != 0) { i++; continue; }
        b = true;
        for (j = 0; j + i < size(x); j++) {
            if (x[j + i] != x[j]) {
                b = false;
                break;
            }
        }
        if (b) return i; // i is the length of the shortest repeating unit
        i = max(i + 1, j / i * i / 2); // skip some values of i if j is large enough
    }
    return -1; // the list cannot be reduced
}
Essentially, the above performs the naive algorithm, but skips some periodicities which are known to be impossible due to earlier "near-misses". For example, if you try a period of 5 and see "aaaabaaaabaaaabaaaabab", you can safely skip 6, 7,..., 10 since we saw 4 cycles of 5 repeat and then a failure.
Ultimately, you end up doing a linear amount of work plus an amount of work that is linear in sigma(n), the sum of the divisors of n, which is bounded by O(n lg lg n).
Note that proving the correctness of this skipping is pretty subtle, and I may have made a mistake in the details -- comments welcome.
Related
For some fixed integer N, an array A[1..N] is an arithmetic-free permutation if
A is a permutation of { 1, ... , N }; and
for every 1 ≤ i < j < k ≤ N, the elements A[i], A[j], A[k] (in order) do NOT form an arithmetic progression; that is, A[j] - A[i] ≠ A[k] - A[j].
Give an algorithm which, given N, returns an arithmetic-free permutation of size N in O(N log N) time. It is guaranteed that arithmetic-free permutations exist for all positive integers N.
Construct the bit-reversal permutation for the next highest power of two and drop the numbers that don't belong. There are several ways to do this in O(n log n) time. I'm not going to write a formal proof in case this is homework, but the general idea is to look at the lowest-order bit where A[i], A[j], and A[k] are not all the same and observe that the two that agree are adjacent.
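A sketch of that construction in Python (no formal proof implied; `arithmetic_free` is my own name):

```python
def arithmetic_free(n):
    # Bit-reverse the indices of the next power of two, then keep only
    # the values that land inside 1..n.
    bits = max(1, (n - 1).bit_length())
    m = 1 << bits

    def rev(x):
        # reverse the low `bits` bits of x
        r = 0
        for _ in range(bits):
            r = (r << 1) | (x & 1)
            x >>= 1
        return r

    return [rev(i) + 1 for i in range(m) if rev(i) + 1 <= n]

print(arithmetic_free(6))  # → [1, 5, 3, 2, 6, 4]
```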
There is a good answer at https://leetcode.com/articles/beautiful-array/
Say a < b < c and b - a = c - b. Then
2 * b = a + c
Since 2 * b is always even, to break any progression, a and c must have different parity. But grouping odds on one side and evens on the other is not enough, since with more than 4 numbers we could still get an arithmetic progression within one of the groups.
This is where the recursive idea in the article comes in. One way I understand it: if we have a solution for array size N, then because an arithmetic progression depends only on the differences between the numbers, we can map a given solution through an affine function and keep the same property:
if [a, b, c, d] works,
then [2*a, 2*b, 2*c, 2*d] works too
and so does [2*a - 1, 2*b - 1, 2*c - 1, 2*d - 1]
Therefore all we need to do is map a smaller solution once to the even numbers and once to the odds, and group them separately. (Separating the groups reduces the problem to breaking the arithmetic progressions within each group since, as we've shown, no progression (a, b, c) can have a and c of different parities.)
N1 -> [1]
N2 -> even map N1 + odd map N1
[2*1] + [2*1 - 1]
[2, 1]
N3 -> even map N1 + odd map N2
[2*1] + [2*2 - 1, 2*1 - 1]
[2, 3, 1]
...
N6 -> even map N3 + odd map N3
[2*2, 2*3, 2*1] + [2*2 - 1, 2*3 - 1, 2*1 - 1]
[4, 6, 2, 3, 5, 1]
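The even/odd mapping above can be sketched in Python like this (a sketch of the recursion, not the article's exact code; `beautiful` is my own name):

```python
def beautiful(n):
    # Map one smaller solution onto the evens and another onto the odds;
    # the affine maps x -> 2x and x -> 2x - 1 preserve AP-freeness.
    if n == 1:
        return [1]
    evens = [2 * x for x in beautiful(n // 2)]          # the n//2 even values
    odds = [2 * x - 1 for x in beautiful(n - n // 2)]   # the remaining odds
    return evens + odds

print(beautiful(6))  # → [4, 6, 2, 3, 5, 1], matching the worked example
```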
My data structure is a path represented by a list of cities. If, for example, the cities are
A, B, C, D
A possible configuration could be: A, B, D, C or D, C, A, B.
I need to compare two paths in order to find the differences between them, in such a way that the output of this procedure returns the set of swap operations necessary to transform the second path into the first one.
For example, given the following paths:
X = {A, B, D, C}
Y = {D, C, A, B}
indexes = {0, 1, 2, 3}
A possible way to transform the path Y into X would be the set of the following swaps: {0-2, 1-3}.
{D, C, A, B} --> [0-2] --> {A, C, D, B} --> [1-3] --> {A, B, D, C}
Is there any known (and fast) algorithm that computes this set?
Your problem looks like the problem of counting the minimal number of swaps needed to transform one permutation into another.
In fact it's a well-known problem. The key idea is to create a new permutation P such that P[i] is the index of city X[i] in Y. Then you just calculate the total number of cycles C in P. The answer is len(X) - C, where len(X) is the size of X.
In your case P looks like 3, 4, 1, 2 (using 1-based indices). It has two cycles: (3, 1) and (4, 2). So the answer is 4 - 2 = 2.
Total complexity is linear.
For more details see this answer; it explains the algorithm in more detail.
EDIT
Okay, but how do we get the swaps themselves, and not just their number? Note that in this solution we reorder each cycle independently, doing N - 1 swaps for a cycle of length N. So, given a cycle v(0), v(1), ..., v(N - 1), v(N), you just swap (v(N), v(N - 1)), (v(N - 1), v(N - 2)), ..., (v(1), v(0)). That is, you swap the cycle elements in reverse order.
Also, if you have C cycles with lengths L(1), L(2), ..., L(C) the number of swaps is L(1) - 1 + L(2) - 1 + ... + L(C) - 1 = L(1) + L(2) + ... + L(C) - C = LEN - C where LEN is the length of permutation.
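A Python sketch that produces the actual swap list (my own function name; it fixes each position greedily, which performs exactly LEN - C swaps):

```python
def swaps_to_transform(x, y):
    # Returns index swaps that, applied to y in order, yield x.
    # Assumes x and y contain the same distinct elements.
    pos = {city: i for i, city in enumerate(y)}  # where each city sits in y
    y = list(y)
    swaps = []
    for i, city in enumerate(x):
        j = pos[city]
        if i != j:
            swaps.append((i, j))
            # swap y[i] and y[j], keeping the position map in sync
            pos[y[i]], pos[y[j]] = j, i
            y[i], y[j] = y[j], y[i]
    return swaps

print(swaps_to_transform(list("ABDC"), list("DCAB")))  # → [(0, 2), (1, 3)]
```

This reproduces the {0-2, 1-3} swap set from the question's example.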
Engines numbered 1, 2, ..., n are on the line at the left, and it is desired to rearrange (permute) the engines as they leave on the right-hand track. An engine on the spur track can be left there or sent on its way down the right track, but it can never be sent back to the incoming track. For example, if n = 3 and we have the engines numbered 1, 2, 3 on the left track, then 3 first goes to the spur track. We could then send 2 to the spur, then on its way to the right, then send 3 on its way, then 1, obtaining the new order 1, 3, 2.
We have to find all the possible permutations for specific n.
For n=1, answer = 1;
For n=2 answer = 2;
For n=3 answer = 5;
I didn't find any general pattern.
Using Stack Implementation would be very helpful.
But any solution is welcomed.
p.s. This is not a homework question, as I am a self taught person.
Here's my attempt at a recursive solution (see comments in Java code):
private static int result = 0;
private static int n = 3;

public static void g(int right, int spur) {
    if (right == n)             // all trains are on the right track
        result++;
    else if (right + spur < n)  // some trains are still on the left track
        g(right, spur + 1);     // a train moved from the left track to the spur
    if (spur > 0)               // at least one train is on the spur
        g(right + 1, spur - 1); // a train moved from the spur to the right track
                                // (also counts trains moving directly from the left to right track)
}

public static void main(String[] args) {
    g(0, 0);
    System.out.println(result); // 5
}
The recursive solution above actually counts each possibility. For a combinatorial solution we consider all combinations of n movements on and out of the spur, where adjacent such movements are equivalent to movements directly from the left to right track. There are 2n choose n such combinations. Now let's count the invalid ones:
Consider all combinations of (n - 1) ins and (n + 1) outs of the spur. All these include a point, p, where a train is counted as leaving the spur when no trains are on it. Let's say that p has k ins and (k + 1) outs preceding it - then the number of remaining ins is (n - 1 - k); and remaining outs, (n + 1) - (k + 1) = (n - k).
Now reverse the ins and outs for each one of these combinations starting after p so that in becomes out and out in. Each one of the reversed sections has necessarily (n - k) ins and (n - 1 - k) outs. But now if we total the number of ins and outs before and after p we get k + (n - k) = n ins, and (k + 1) + (n - 1 - k) = n outs. We have just counted the number of combinations of n ins and n outs that are invalid. If we assume that one such combination may not have been counted, theoretically reverse that combination after its p and you will find a combination of (n - 1) ins and (n + 1) outs that was not counted. But by definition we counted them all already so our assumed extra combination could not exist.
Total valid combinations then are 2n choose n - 2n choose (n + 1), the Catalan numbers.
(Adapted from an explanation by Tom Davis here: http://mathcircle.berkeley.edu/BMC6/pdf0607/catalan.pdf)
First, note that you may ignore the possibility of moving the train from incoming directly to outgoing: such a move can be done by moving a train to the spur and then out again.
Denote a train move from incoming to spur as ( and a train move from spur to outgoing as ), and you get a bijection between permutations of the trains and strings of n pairs of correctly balanced parenthesis. That statement needs proving, but the only hard part of the proof is proving that no two strings of balanced parentheses correspond to the same permutation. The number of such strings is the n'th Catalan number, or choose(2n, n)/(n+1), where choose(n, k) is the number of ways of choosing k items from n.
Here's code to compute the solution:
def perms(n):
    # computes choose(2n, n) / (n + 1), the n-th Catalan number
    r = 1
    for i in range(1, n + 1):
        r *= (n + i)
        r //= i
    return r // (n + 1)
You can generate all the permutations with this code, which also exposes the Catalan nature of the solution.
def perms(n, i=0):
    if n == 0:
        yield []
    for k in range(n):
        for s in perms(k, i + 1):
            for t in perms(n - k - 1, i + k + 1):
                yield s + [i] + t

print(list(perms(4)))
Output:
[[0, 1, 2, 3], [0, 1, 3, 2], [0, 2, 1, 3], [0, 2, 3, 1],
[0, 3, 2, 1], [1, 0, 2, 3], [1, 0, 3, 2], [1, 2, 0, 3],
[2, 1, 0, 3], [1, 2, 3, 0], [1, 3, 2, 0], [2, 1, 3, 0],
[2, 3, 1, 0], [3, 2, 1, 0]]
The status of the system can be described by giving the 3 (ordered!) lists of engines, in the left, spur and right tracks. Given the status, it is possible to calculate all the possible moves. This creates a tree of possibilities: the root of the tree is the initial status, and every move corresponds to a branch which leads to a new status. The final status at the end of a branch (a leaf) is your final position in the right track.
So you have to build and explore the whole tree, and at the end count all the leaves. The tree is a common data structure.
Just to clarify, the tree in this case wouldn't replace the stacks. The stacks are used to store your data (the position of the engines); the tree is used to track the progress of the algorithm. Every time you have a status (a node of the tree) you have to analyse your data (=the content of the stacks) and find the possible moves. Every move is a branch in the tree of the algorithm, and it leads to a new status of the stacks (because the engine has moved). So basically you will have one "configuration" of the 3 stacks for each node of the tree.
I'm given the size N of the multiset and its sum S. The elements of the set are supposed to be contiguous; for example, a multiset K having N = 6 elements could be {1, 1, 2, 2, 2, 3}, so S = 11 (the multiset always draws from the first N natural numbers, with repetitions).
How can I find the total number of changes to make so that there are no repetitions and the set becomes contiguous?
For the above example the multiset K needs 3 changes. Hence, finally the set K will become {1,2,3,4,5,6}.
What I did: I computed the expected sum (i.e., n*(n+1)/2) and subtracted the given sum; call that T.
Then I took T = ceil(T/n), making the answer 2*T; this works for most cases.
But I guess I'm missing some cases. Is there an algorithm to determine how many elements need to change?
I'm given only the size and sum of the multiset.
As you already noticed, for a given N, the sum should be S' = N * (N + 1) / 2. You are given some value S.
Clearly, if S' = S the answer is 0.
If S'- S <= N - 1, then the multiset that requires least changes is
{1, 2, ..., N-1, X}
where X = N - (S' - S), which is in the range [1, N-1]. In other words, X makes up for the difference in sum between the required and the actual multiset. Your answer would be 1.
If the difference is larger than N-1, then also N-1 cannot be in the multiset. If S'- S <= (N - 1) + (N - 2), a multiset that requires least changes is
{1, 2, ..., N-2, 1, X}
where X = 2 * (N - 1) - (S' - S), which is in the range [1, N-2]. Your answer would be 2.
Generalizing, you would get a table like:
S' - S | answer
-----------------------
[ 0, 0] | 0
[ 1, N-1] | 1
[ N, 2N-3] | 2
[2N-2, 3N-6] | 3
and so on. You could find a formula to get the answer in terms of N and S, but it seems much easier to use a simple loop. I'll leave the implementation to you.
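For reference, the loop might look like this in Python (a sketch; correctness rests on the table above):

```python
def min_changes(n, s):
    # diff = how much the given sum falls short of 1 + 2 + ... + N.
    # Per the table, the k-th change can recover up to N - k more of
    # that shortfall.
    diff = n * (n + 1) // 2 - s
    changes = 0
    while diff > 0:
        changes += 1
        diff -= n - changes
    return changes

print(min_changes(6, 11))  # → 3, as in the question's example
```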
Ranking an element x in an array/list means finding out how many elements in the array/list are strictly smaller than x.
So ranking a list just means getting the ranks of all elements in the list.
For example, rank [51, 38, 29, 51, 63, 38] = [3, 1, 0, 3, 5, 1], i.e., there are 3 elements smaller than 51, etc.
Ranking a list can be done in O(N log N): basically, we sort the list while remembering the original index of each element, and then see, for each element, how many come before it.
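As a quick Python sketch, the rank of x is just x's leftmost position in a sorted copy:

```python
from bisect import bisect_left

def rank_list(lst):
    # bisect_left returns the number of elements strictly smaller than x
    # (duplicates of x itself are not counted)
    srt = sorted(lst)
    return [bisect_left(srt, x) for x in lst]

print(rank_list([51, 38, 29, 51, 63, 38]))  # → [3, 1, 0, 3, 5, 1]
```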
The question here is How to rank the suffixes of a list, in O(NlogN)?
Ranking the suffixes of a list means:
for list [3; 1; 2], rank [[3;1;2]; [1;2]; [2]]
note that elements may not be distinct.
edit
We don't need to print out all the elements of every suffix. You can imagine that we just need to print out a list/array where each element is the rank of a suffix.
For example, rank suffix_of_[3;1;2] = rank [[3;1;2]; [1;2]; [2]] = [2;0;1] and you just print out [2;0;1].
edit 2
Let me explain more clearly what all the suffixes are and what sorting/ranking all suffixes means.
Suppose we have an array/list [e1;e2;e3;e4;e5].
Then all suffixes of [e1;e2;e3;e4;e5] are:
[e1;e2;e3;e4;e5]
[e2;e3;e4;e5]
[e3;e4;e5]
[e4;e5]
[e5]
for example, all suffixes of [4;2;3;1;0] are
[4;2;3;1;0]
[2;3;1;0]
[3;1;0]
[1;0]
[0]
Sorting the above 5 suffixes means lexicographic sorting. Sorting all the suffixes above, you get
[0]
[1;0]
[2;3;1;0]
[3;1;0]
[4;2;3;1;0]
by the way, if you can't imagine how 5 lists/arrays can be sorted among themselves, just think of sorting strings in lexicographic order.
"0" < "10" < "2310" < "310" < "42310"
It might seem that sorting all suffixes is really just sorting all elements of the original array.
However, please be careful that all elements may not be distinct, for example
for [4;2;2;1;0], all suffixes are:
[4;2;2;1;0]
[2;2;1;0]
[2;1;0]
[1;0]
[0]
then the order is
[0]
[1;0]
[2;1;0]
[2;2;1;0]
[4;2;2;1;0]
As MBo noted correctly, your problem is that of constructing the suffix array of your input list. The fast and complicated algorithms to do this are actually linear time, but since you only aim for O(n log n), I will try to propose a simpler version that is much easier to implement.
Basic idea and an initial O(n log² n) implementation
Let's take the sequence [4, 2, 2, 1] as an example. Its suffixes are
0: 4 2 2 1
1: 2 2 1
2: 2 1
3: 1
I numbered the suffixes with their starting index in the original sequence. Ultimately we want to sort this set of suffixes lexicographically, and fast. We know we can represent each suffix using its starting index in constant space and we can sort in O(n log n) comparisons using merge sort, heap sort or a similar algorithm. So the question remains, how can we compare two suffixes fast?
Let's say we want to compare the suffixes [2, 2, 1] and [2, 1]. We can pad these with negative-infinity values without changing the result of the comparison: [2, 2, 1, -∞] and [2, 1, -∞, -∞].
Now the key idea here is the following divide-and-conquer observation: Instead of comparing the sequences character by character until we find a position where the two differ, we can instead split both lists in half and compare the halves lexicographically:
[a, b, c, d] < [e, f, g, h]
<=> ([a, b], [c, d]) < ([e, f], [g, h])
<=> [a, b] < [e, f] or ([a, b] = [e, f] and [c, d] < [g, h])
Essentially we have decomposed the problem of comparing the sequences into two problems of comparing smaller sequences. This leads to the following algorithm:
Step 1: Sort the substrings (contiguous subsequences) of length 1. In our example, the substrings of length 1 are [4], [2], [2], [1]. Every substring can be represented by its starting position in the original list. We sort them with a simple comparison sort and get [1], [2], [2], [4]. We store the result by assigning to every position its rank in the sorted list:
position  substring  rank
0         [4]        2
1         [2]        1
2         [2]        1
3         [1]        0
It is important that we assign the same rank to equal substrings!
Step 2: Now we want to sort the substrings of length 2. There are really only 3 such substrings, but we assign one to every position by padding with negative infinity if necessary. The trick here is that we can use our divide-and-conquer idea from above and the ranks assigned in step 1 to do a fast comparison (this isn't really necessary yet but will become important later).
position  substring  halves       ranks from step 1  final rank
0         [4, 2]     ([4], [2])   (2, 1)             3
1         [2, 2]     ([2], [2])   (1, 1)             2
2         [2, 1]     ([2], [1])   (1, 0)             1
3         [1, -∞]    ([1], [-∞])  (0, -∞)            0
Step 3: You guessed it, now we sort substrings of length 4 (!). These are exactly the suffixes of the list! We can use the divide-and-conquer trick and the results from step 2 this time:
position  substring         halves               ranks from step 2  final rank
0         [4, 2, 2, 1]      ([4, 2], [2, 1])     (3, 1)             3
1         [2, 2, 1, -∞]     ([2, 2], [1, -∞])    (2, 0)             2
2         [2, 1, -∞, -∞]    ([2, 1], [-∞, -∞])   (1, -∞)            1
3         [1, -∞, -∞, -∞]   ([1, -∞], [-∞, -∞])  (0, -∞)            0
We're done! If our initial sequence had had size 2^k, we would have needed k steps. Put the other way round, we need ceil(log_2 n) doubling steps to process a sequence of size n. If its length is not a power of two, we just pad with negative infinity.
For an actual implementation we just need to remember the sequence "final rank" for every step of the algorithm.
An implementation in C++ could look like this (compile with -std=c++11):
#include <algorithm>
#include <iostream>
#include <tuple>
using namespace std;

int seq[] = {8, 3, 2, 4, 2, 2, 1};
const int n = 7;
const int log2n = 3;        // log2n = ceil(log_2(n))
int Rank[log2n + 2][n];     // Rank[i] will save the final ranks of step i
tuple<int, int, int> L[n];  // L is a list of tuples. in step i, this will hold
                            // pairs of ranks from step i - 1 along with the
                            // substring index
const int neginf = -1;      // should be smaller than all the numbers in seq

int main() {
    for (int i = 0; i < n; ++i)
        Rank[1][i] = seq[i];  // step 1 is actually simple if you think about it
    // the last step must sort substrings of length >= n, hence log2n + 1 steps
    for (int step = 2; step <= log2n + 1; ++step) {
        int length = 1 << (step - 1);  // length is 2^(step - 1)
        for (int i = 0; i < n; ++i)
            L[i] = make_tuple(
                Rank[step - 1][i],
                (i + length / 2 < n) ? Rank[step - 1][i + length / 2] : neginf,
                i);  // we need to know where the tuple came from later
        sort(L, L + n);  // lexicographical sort
        for (int i = 0; i < n; ++i) {
            // we save the rank of the index, but we need to be careful to
            // assign equal ranks to equal pairs
            Rank[step][get<2>(L[i])] =
                (i > 0 && get<0>(L[i]) == get<0>(L[i - 1]) &&
                 get<1>(L[i]) == get<1>(L[i - 1]))
                    ? Rank[step][get<2>(L[i - 1])]
                    : i;
        }
    }
    // the suffix array is in L after the last step
    for (int i = 0; i < n; ++i) {
        int start = get<2>(L[i]);
        cout << start << ":";
        for (int j = start; j < n; ++j)
            cout << " " << seq[j];
        cout << endl;
    }
}
Output:
6: 1
5: 2 1
4: 2 2 1
2: 2 4 2 2 1
1: 3 2 4 2 2 1
3: 4 2 2 1
0: 8 3 2 4 2 2 1
The complexity is O(log n * (n + sort)), which is O(n log² n) in this implementation because we use a comparison sort of complexity O(n log n).
A simple O(n log n) algorithm
If we manage to do the sorting parts in O(n) per step, we get an O(n log n) bound. So basically we have to sort a sequence of pairs (x, y), where 0 <= x, y < n. We know that we can sort a sequence of integers in this range in O(n) time using counting sort. We can interpret our pairs (x, y) as numbers z = n * x + y in base n. We can now see how to use LSD radix sort to sort the pairs.
In practice, this means we sort the pairs by increasing y using counting sort, and then use counting sort again to sort by increasing x. Since counting sort is stable, this gives us the lexicographical order of our pairs in 2 * O(n) = O(n). The final complexity is thus O(n log n).
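A Python sketch of that pair sort (assuming both components already lie in range(n); the -∞ pad values would first need to be shifted into that range):

```python
def sort_pairs(pairs, n):
    # Stable counting sort by the second component, then by the first;
    # stability makes the two passes compose into lexicographic order.
    def counting_sort(items, key):
        buckets = [[] for _ in range(n)]
        for item in items:
            buckets[key(item)].append(item)
        return [item for bucket in buckets for item in bucket]

    by_y = counting_sort(pairs, key=lambda p: p[1])
    return counting_sort(by_y, key=lambda p: p[0])

print(sort_pairs([(2, 1), (1, 1), (1, 0), (0, 3)], 4))
# → [(0, 3), (1, 0), (1, 1), (2, 1)]
```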
In case you are interested, you can find an O(n log² n) implementation of the approach at my Github repo. The implementation has 27 lines of code. Neat, ain't it?
This is exactly the suffix array construction problem, and the Wikipedia page contains links to the linear-complexity algorithms (possibly depending on the alphabet).