I'm trying to find an optimal algorithm (in terms of complexity) to get the maximum number of courses I can participate in. I don't care about the total length of the courses, nor about the courses themselves; it's all about being present at as many courses as possible.
Given are two arrays S and E. S[i] contains the starting time of the i-th course, E[i] the corresponding ending time. The arrays are already sorted by E. Since I'm not The Flash, I can't join a course whose starting time equals the ending time of the one before it.
Here's an example:
S = [
"2014-11-06 01:00:00",
"2014-11-06 03:00:00",
"2014-11-06 07:00:00",
"2014-11-06 09:00:00",
"2014-11-06 09:00:00"
]
E = [
"2014-11-06 05:00:00",
"2014-11-06 06:00:00",
"2014-11-06 08:00:00",
"2014-11-06 09:00:00",
"2014-11-06 10:00:00"
]
For those values the correct answer would be 3, since I can participate in courses 1, 3 and 5 (other combinations are possible too).
Thanks in advance
The Core Idea
The idea is to use recursion to check all of the courses. The basic solution is explained in Timothy Ha's answer. For the sake of clarity, I recall it:
Sort courses by S
Begin at S1:
Find k = min{j : E1 < Sj}. If it exists, find the best fit for Sk, ..., Sn
If k > 2 (i.e. k > current course + 1), also find the best fit for S2, ..., Sn
Return the best fit between Step 3 and Step 4.
Time optimization with dynamic programming
But you can optimize it by storing the intermediate values. To do so, instead of starting the recursion from the beginning, we will start from the end!
Let b be an array of n integers initialized to 0. Let bn = 1. Finally, let i = n-1
While i > 0, perform the following:
Consider that we take course i. We must find k = min{j : Ei < Sj}. If such a k exists, then necessarily i+1 ≤ k.
If k exists, then bi = max(1 + bk, bi+1).
Otherwise, bi = bi+1 (We don't have to explicitly state "max(1, bi+1)", since 1 ≤ bi+1 )
Decrement i and proceed to Step 1.
When the loop is over, the solution is b1.
Here is the solution in a C fashion:
// I suppose S[] is sorted in increasing order
// E[] contains the end times associated with S:
// (the course that starts at S[i] ends at E[i])
int findBestNumber(ADateType S[], ADateType E[], int n) {
    int i = n-1, k, res;
    int *sol = NULL;
    if (!(sol = malloc(n * sizeof *sol)))
        return -1;
    memset(sol, 0, n * sizeof *sol);
    sol[n-1] = 1;
    while (i-- > 0) {
        k = findMinIndex(E[i], S, n);
        if (k >= 0) // k exists
            sol[i] = max(1 + sol[k], sol[i+1]);
        else        // k does not exist (findMinIndex() returned a negative value)
            sol[i] = sol[i+1];
    }
    res = sol[0];
    free(sol);
    return res;
}
findMinIndex(after, S, n) searches for the minimum index k such that after ≤ Sk. Since S is sorted when this function is called, this is a simple binary search. In the above, I assumed findMinIndex() returns a negative value when no such index exists.
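For reference, here is a minimal sketch of what findMinIndex() could look like. The dateLess() comparison helper is an assumption of mine (the original answer never shows how ADateType values are compared):
// Hedged sketch: smallest index k with after <= S[k], or -1 if none.
// dateLess(a, b) is a hypothetical helper returning nonzero when a < b.
int findMinIndex(ADateType after, ADateType S[], int n) {
    int lo = 0, hi = n - 1, k = -1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (dateLess(S[mid], after)) { // S[mid] < after: answer is to the right
            lo = mid + 1;
        } else {                       // after <= S[mid]: candidate, go left
            k = mid;
            hi = mid - 1;
        }
    }
    return k;
}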
Space and Time Complexity for the DP version
In the Dynamic Programming version, we just compute step by step the intermediate results and we do so only once. Thus, this is O(n). The Space complexity is also O(n) since we need to store n intermediate values.
But remember we had to sort the courses by starting time! This operation, using an appropriate sorting algorithm (e.g. merge sort), is O(n log n). The final time complexity is thus O(n log n).
Bonus : A working C implementation
On Ideone
Note: after reading the question again, I noticed I can't select courses that start at the same time as the ending time of the previously taken course ... To exclude these, just turn the >=s in findMinIndex() into >s.
Since you commented that each course needs to be attended in full, I think the algorithm can be like this:
sort courses by S
If you join S1, find the list of courses with S > E1; if that list starts with Sk, then recurse on (Sk, ..., Sn); the total result will be (result from recursion) + 1.
If you skip S1, recurse on (S2, S3, ..., Sn) and take the value that comes from it.
Choose maximum of values coming from steps 2 and 3.
UPD (from comments): we should check more than just the first step, or the recursion should store the results for the [k..N] suffixes so as not to calculate them again. I can imagine a case like this: S1, S2, E1, E2, S3, E3, ..., SN, EN. The part [3..N] could be calculated twice in the recursion (with or without S1) if we don't memoize; see the sketch below.
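As an illustration of that memoization (my own sketch, not code from the original answer; firstAfter() stands for the binary search discussed in the other answer and is a hypothetical helper here):
// Hedged sketch of the memoized recursion. memo[i] caches the best
// count for the suffix of courses starting at index i; -1 = not computed.
// firstAfter(i) is assumed to return the smallest k with E[i] < S[k],
// or n if there is none (a binary search, since courses are sorted by S).
int best(int i, int n, int memo[]) {
    if (i >= n) return 0;
    if (memo[i] >= 0) return memo[i];
    int take = 1 + best(firstAfter(i), n, memo); // join course i
    int skip = best(i + 1, n, memo);             // skip course i
    memo[i] = take > skip ? take : skip;
    return memo[i];
}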
I want to ask the user for a two-number range in order to generate a third number. (This is done to set a custom selling price following a market trend.)
For example, between 1 and 2, the generated number is 3; between 2 and 3, it will be 4.5 (this number is fixed by the user and we must store it).
Then, when I am processing some data and only have the market price, I will have to know in which range it is located and find the corresponding generated number.
I thought about storing three values: startingValue, endingValue and userValue. But this means I have to retrieve all the data and scan each range every time, which leads to O(n) space and time complexity.
I somehow feel this could be done in constant time.
Would you have any idea ? Or am I doing it wrong ?
I do not think that you can get your answer in constant time (I think it's not too hard to prove that it is not feasible in constant time under classic complexity assumptions).
Nevertheless, you can find your answer in O(log n) where n is the number of ranges you have to store.
All you need to do is to store your ranges in a sorted array as follows:
imagine you have two ranges (a, b) -> v1 and (b, c) -> v2 (your answer to the comment implies that the ranges are adjacent).
Then you can use an array of tuples [(a, v1), (b, v2)] and use a binary search to find the corresponding value.
I could do it in O(log n) comparisons thanks to m.raynal's answer; the fact that the ranges are adjacent makes it possible.
It uses recursion (note: slicing copies the array, so for strictly O(log n) time and space you would pass lo/hi indices instead of sub-arrays).
function findTheRightPriceRange(arrayOfPriceRanges, priceInput, counter = 0) {
  // Each range is a [startingValue, endingValue, userValue] triple
  if (!Array.isArray(arrayOfPriceRanges) || arrayOfPriceRanges.length === 0) {
    return -2;
  }
  counter++;
  const middleArrayIndex = Math.floor(arrayOfPriceRanges.length / 2);
  const currentPointer = arrayOfPriceRanges[middleArrayIndex];
  if (priceInput >= currentPointer[0] && priceInput <= currentPointer[1]) {
    return currentPointer[2];
  }
  if (priceInput < currentPointer[0]) {
    // Recurse on the left half; slice() copies instead of mutating the input
    return findTheRightPriceRange(
      arrayOfPriceRanges.slice(0, middleArrayIndex),
      priceInput,
      counter
    );
  }
  if (priceInput > currentPointer[1]) {
    // Recurse on the right half, excluding the middle range
    return findTheRightPriceRange(
      arrayOfPriceRanges.slice(middleArrayIndex + 1),
      priceInput,
      counter
    );
  }
  return -1;
}
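Assuming the ranges are stored as [startingValue, endingValue, userValue] triples sorted by startingValue (my assumption about the storage format), a call would look like this:
// Ranges from the question: (1, 2) -> 3 and (2, 3) -> 4.5
const ranges = [[1, 2, 3], [2, 3, 4.5]];
console.log(findTheRightPriceRange(ranges, 2.5)); // 4.5
console.log(findTheRightPriceRange(ranges, 1.5)); // 3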
I am fairly new to dynamic programming and don't yet understand most of the types of problems it can solve. Hence I am facing problems in understanding the solution to the Jewelry TopCoder problem.
Can someone at least give me some hints as to what the code is doing ?
Most importantly is this problem a variant of the subset-sum problem ? Because that's what I am studying to make sense of this problem.
What are these two functions actually counting? Why are we using two DP tables?
void cnk() {
    nk[0][0] = 1;
    FOR(k,1,MAXN) {
        nk[0][k] = 0;
    }
    FOR(n,1,MAXN) {
        nk[n][0] = 1;
        FOR(k,1,MAXN)
            nk[n][k] = nk[n-1][k-1] + nk[n-1][k];
    }
}
void calc(LL T[MAXN+1][MAX+1]) {
    T[0][0] = 1;
    FOR(x,1,MAX) T[0][x] = 0;
    FOR(ile,1,n) {
        int a = v[ile-1];
        FOR(x,0,MAX) {
            T[ile][x] = T[ile-1][x];
            if (x >= a) T[ile][x] += T[ile-1][x-a];
        }
    }
}
How is the original solution constructed by using the following logic ?
    FOR(u,1,c) {
        int uu = u * v[done];
        FOR(x,uu,MAX)
            res += B[done][x-uu] * F[n-done-u][x] * nk[c][u];
    }
    done = p;
}
Any help would be greatly appreciated.
Let's consider the following task first:
"Given a vector V of N positive integers less than K, find the number of subsets whose sum equals S".
This can be solved in polynomial time with dynamic programming using some extra-memory.
The dynamic programming approach goes like this:
instead of solving the problem for N and S, we will solve all the problems of the following form:
"Find the number of ways to write sum s (with s ≤ S) using only the first n ≤ N of the numbers".
This is a common characteristic of the dynamic programming solutions: instead of only solving the original problem, you solve an entire family of related problems. The key idea is that solutions for more difficult problem settings (i.e. higher n and s) can efficiently be built up from the solutions of the easier settings.
Solving the problem for n = 0 is trivial (sum s = 0 can be expressed in one way -- using the empty set -- while all other sums can't be expressed at all).
Now consider that we have solved the problem for all values up to a certain n and that we have these solutions in a matrix A (i.e. A[n][s] is the number of ways to write sum s using the first n elements).
Then, we can find the solutions for n+1, using the following formula:
A[n+1][s] = A[n][s - V[n+1]] + A[n][s].
Indeed, when we write the sum s using the first n+1 numbers, we can either include V[n+1] (the (n+1)-th term) or not.
This is what the calc function computes. (the cnk function uses Pascal's rule to compute binomial coefficients)
Note: in general, if in the end we are only interested in answering the initial problem (i.e. for N and S), then the array A can be one-dimensional (of length S + 1) -- this is because when constructing the solutions for n + 1 we only need the solutions for n, and not those for smaller values.
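As a minimal sketch of that one-dimensional variant (my own illustration, assuming the N positive values are in v[] and the counts fit in a long long):
#include <stdlib.h>

// Hedged sketch: number of subsets of v[0..N-1] summing exactly to S,
// using a single 1-D array. Sums are iterated downwards so that each
// element is counted at most once per subset.
long long countSubsets(const int v[], int N, int S) {
    long long *A = calloc(S + 1, sizeof *A);
    long long res;
    int n, s;
    if (!A) return -1;
    A[0] = 1;                        // the empty set
    for (n = 0; n < N; n++)
        for (s = S; s >= v[n]; s--)
            A[s] += A[s - v[n]];     // either include v[n] or not
    res = A[S];
    free(A);
    return res;
}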
This problem (the one initially stated in this answer) is indeed related to the subset sum problem (finding a subset of elements with sum zero).
A similar type of dynamic programming approach can be applied if we have a reasonable limit on the absolute values of the integers used (we need to allocate an auxiliary array to represent all possible reachable sums).
In the zero-sum problem we are not actually interested in the count, thus the A array can be an array of booleans (indicating whether a sum is reachable or not).
In addition, another auxiliary array, B, can be used to allow reconstructing the solution if one exists.
The recurrence would now look like this:
if (!A[s] && A[s - V[n+1]]) {
    A[s] = true;
    // the index of the last value used to reach sum _s_,
    // allows going backwards to reproduce the entire solution
    B[s] = n + 1;
}
Note: the actual implementation requires some additional care for handling the negative sums, which can not directly represent indices in the array (the indices can be shifted by taking into account the minimum reachable sum, or, if working in C/C++, a trick like the one described in this answer can be applied: https://stackoverflow.com/a/3473686/6184684).
I'll detail how the above ideas apply in the TopCoder problem and its solution linked in the question.
The B and F matrices.
First, note the meaning of the B and F matrices in the solution:
B[i][s] represents the number of ways to reach sum s using only the smallest i items
F[i][s] represents the number of ways to reach sum s using only the largest i items
Indeed, both matrices are computed using the calc function, after sorting the array of jewelry values in ascending order (for B) and descending order (for F).
Solution for the case with no duplicates.
Consider first the case with no duplicate jewelry values, using this example: [5, 6, 7, 11, 15].
For the remainder of the answer I will assume that the array was sorted in ascending order (thus "first i items" will refer to the smallest i ones).
Each item given to Bob has value less (or equal) to each item given to Frank, thus in every good solution there will be a separation point such that Bob receives only items before that separation point, and Frank receives only items after that point.
To count all solutions we would need to sum over all possible separation points.
When, for example, the separation point is between the 3rd and 4th item, Bob would pick items only from the [5, 6, 7] sub-array (the smallest 3 items), and Frank would pick items from the remaining [11, 15] sub-array (the largest 2 items). In this case there is a single sum (s = 11) that can be obtained by both of them. Each time a sum can be obtained by both, we need to multiply the number of ways that each of them can reach the respective sum (e.g. if Bob could reach a sum s in 4 ways and Frank could reach the same sum s in 5 ways, then we would get 20 = 4 * 5 valid solutions with that sum, because each combination is a valid solution).
Thus we would get the following code by considering all separation points and all possible sums:
res = 0;
for (int i = 0; i < n; i++) {
    for (int s = 0; s <= maxS; s++) {
        res += B[i][s] * F[n-i][s];
    }
}
However, there is a subtle issue here. This would often count the same combination multiple times (for various separation points). In the example provided above, the same solution with sum 11 would be counted both for the separation [5, 6] - [7, 11, 15], as well as for the separation [5, 6, 7] - [11, 15].
To alleviate this problem we can partition the solutions by "the largest value of an item picked by Bob" (or, equivalently, by always forcing Bob to include in his selection the largest valued item from the first sub-array under the current separation).
In order to count the number of ways to reach sum s when Bob's largest valued item is the ith one (sorted in ascending order), we can use B[i][s - v[i]]. This holds because using the v[i] valued item implies requiring the sum s - v[i] to be expressed using subsets from the first i items (indices 0, 1, ... i - 1).
This would be implemented as follows:
res = 0;
for (int i = 0; i < n; i++) {
    for (int s = v[i]; s <= maxS; s++) {
        res += B[i][s - v[i]] * F[n - 1 - i][s];
    }
}
This is getting closer to the solution on TopCoder (in that solution, done corresponds to the i above, and uu = v[i]).
Extension for the case when duplicates are allowed.
When duplicate values can appear in the array, it's no longer easy to directly count the number of solutions when Bob's most valuable item is v[i]. We need to also consider the number of such items picked by Bob.
If there are c items that have the same value as v[i], i.e. v[i] = v[i+1] = ... = v[i + c - 1], and Bob picks u such items, then the number of ways for him to reach a certain sum s is equal to:
comb(c, u) * B[i][s - u * v[i]] (1)
Indeed, this holds because the u items can be picked from the total of c which have the same value in comb(c, u) ways. For each such choice of the u items, the remaining sum is s - u * v[i], and this should be expressed using a subset from the first i items (indices 0, 1, ... i - 1), thus it can be done in B[i][s - u * v[i]] ways.
For Frank, if Bob used u of the v[i] items, the number of ways to express sum s will be equal to:
F[n - i - u][s] (2)
Indeed, since Bob uses the smallest i + u values, Frank can use any of the largest n - i - u values to reach the sum s.
By combining relations (1) and (2) from above, we obtain that the number of solutions where both Frank and Bob have sum s, when Bob's most valued item is v[i] and he picks u such items is equal to:
comb(c, u) * B[i][s - u * v[i]] * F[n - i - u][s].
This is precisely what the given solution implements.
Indeed, the variable done corresponds to variable i above, variable x corresponds to sums s, the index p is used to determine the c items with same value as v[done], and the loop over u is used in order to consider all possible numbers of such items picked by Bob.
Here's some Java code for this that references the original solution. It also incorporates qwertyman's fantastic explanations (to the extent feasible). I've added some of my comments along the way.
import java.util.*;

public class Jewelry {

    int MAX_SUM = 30005;
    int MAX_N = 30;
    long[][] C;

    // Generate all possible sums
    // ret[i][sum] = number of ways to compute sum using the first i numbers from val[]
    public long[][] genDP(int[] val) {
        int i, sum, n = val.length;
        long[][] ret = new long[MAX_N+1][MAX_SUM];
        ret[0][0] = 1;
        for (i = 0; i + 1 <= n; i++) {
            for (sum = 0; sum < MAX_SUM; sum++) {
                // Carry over the count from i to i+1 for each sum
                // The problem definition allows excluding numbers from sums,
                // so here we are essentially excluding val[i]
                ret[i+1][sum] = ret[i][sum];
                // DP: add the number of ways to generate sum-val[i] using
                // i numbers (i.e. the ways that do include val[i])
                if (sum >= val[i])
                    ret[i+1][sum] += ret[i][sum-val[i]];
            }
        }
        return ret;
    }

    // C(n, r) - all possible combinations of choosing r numbers from n numbers
    // Built upfront with dynamic programming, using Pascal's rule
    public void nCr() {
        C = new long[MAX_N+1][MAX_N+1];
        int n, r;
        C[0][0] = 1;
        for (n = 1; n <= MAX_N; n++) {
            C[n][0] = 1;
            for (r = 1; r <= MAX_N; r++)
                C[n][r] = C[n-1][r-1] + C[n-1][r];
        }
    }

    /*
      General Concept:
      - Sort array
      - Incrementally divide the array into two partitions
        + Accomplished by using two different arrays - L for left, R for right
      - Take all possible sums on the left side and match with all possible sums
        on the right side (multiply these counts to get totals for each sum)
      - Adjust for common sums so as to not overcount
      - Adjust for duplicate numbers
    */
    public long howMany(int[] values) {
        int i, j, sum, n = values.length;

        // Pre-compute C(n,r) and store it in C[][]
        nCr();

        /*
          Incrementally split the array and calculate sums on either side.
          For e.g. if val={2, 3, 4, 5, 9}, we would partition this as
          {2 | 3, 4, 5, 9} then {2, 3 | 4, 5, 9}, etc.
          First, sort it ascendingly and generate its sum matrix L.
          Then, sort it descendingly and generate another sum matrix R.
          In later calculations, manipulate indexes to simulate the partitions,
          so at any point L[i] would correspond to R[n-i-1], e.g. L[1] = R[5-1-1] = R[3].
        */

        // Sort ascendingly
        Arrays.sort(values);

        // Generate all sums for the "Left" partition using the sorted array
        long[][] L = genDP(values);

        // Sort descendingly by reversing the existing array.
        // Java doesn't provide a descending Arrays.sort for primitive int arrays;
        // use a Comparator on boxed values or reverse manually. This reverses manually.
        for (i = 0; i < n/2; i++) {
            int tmp = values[i];
            values[i] = values[n-i-1];
            values[n-i-1] = tmp;
        }

        // Generate all sums for the "Right" partition using the re-sorted array
        long[][] R = genDP(values);

        // Re-sort in ascending order as we will be using values[] as reference later
        Arrays.sort(values);

        long tot = 0;
        for (i = 0; i < n; i++) {
            int dup = 0;
            // How many duplicates of values[i] do we have?
            for (j = 0; j < n; j++)
                if (values[j] == values[i])
                    dup++;
            /*
              Calculate the total by iterating through each sum and multiplying
              the counts on both partitions for that sum.
              However, some sums may get counted twice.
              For instance, if val={2, 3, 4, 5, 9}, you'd get:
              {2, 3 | 4, 5, 9} and {2, 3, 4 | 5, 9} (on two different iterations).
              In this case, the subset {2, 3 | 5} is counted twice.
              To account for this, exclude the current largest number, val[i],
              from L's sum and exclude it from R's i index.
              There is another issue with duplicate numbers.
              E.g. if values={2, 3, 3, 3, 4}, how do you know which 3 went to L?
              To solve this, group the equal numbers.
              Applying this to {2, 3, 3, 3, 4}:
              - Exclude 3, 6 (3+3) and 9 (3+3+3) from L's sum calculation
              - Exclude 1, 2 and 3 from R's index count
              We're essentially saying that we will exclude the sum contribution
              of these elements to L and ignore their count contribution to R.
            */
            for (j = 1; j <= dup; j++) {
                int dup_sum = j*values[i];
                for (sum = dup_sum; sum < MAX_SUM; sum++) {
                    // (ways to pick j numbers from dup) * (ways to get sum-dup_sum
                    // from i numbers) * (ways to get sum from n-i-j numbers)
                    if (n-i-j >= 0)
                        tot += C[dup][j] * L[i][sum-dup_sum] * R[n-i-j][sum];
                }
            }
            // Skip past the duplicates of values[i] that we've now accounted for
            i += dup-1;
        }
        return tot;
    }
}
Given a stack of integers, players take turns removing 1, 2, or 3 numbers from the top of the stack. Assuming that the opponent plays optimally and you select first, I came up with the following recursion:
int score(int n) {
    if (n <= 0) return 0;
    if (n <= 3) {
        return sum(v[0..n-1]);
    }
    // maximize over picking 1, 2, or 3 + value after opponent picks optimally
    return max(v[n-1] + min(score(n-2), score(n-3), score(n-4)),
               v[n-1] + v[n-2] + min(score(n-3), score(n-4), score(n-5)),
               v[n-1] + v[n-2] + v[n-3] + min(score(n-4), score(n-5), score(n-6)));
}
Basically, at each level comparing the outcomes of selecting 1, 2, or 3 and then your opponent selecting either 1, 2, or 3.
I was wondering how I could convert this to a DP solution as it is clearly exponential. I was struggling with the fact that there seem to be 3 dimensions to it: num of your pick, num of opponent's pick, and sub problem size, i.e., it seems the best solution for table[p][o][n] would need to be maintained, where p is the number of values you choose, o is the number your opponent chooses and n is the size of the sub problem.
Do I actually need the 3 dimensions? I have seen this similar problem: http://www.geeksforgeeks.org/dynamic-programming-set-31-optimal-strategy-for-a-game/ , but couldn't seem to adapt it.
Here is a way the problem can be converted into DP:
score[i] = max{ sum[i] - score[i+1], sum[i] - score[i+2], sum[i] - score[i+3] }
Here score[i] means the maximum score the player to move can collect from the game [i..n], where v[i] is the top of the stack. sum[i] is the sum of all elements on the stack from i onwards; it can be evaluated using a separate DP in O(N). The above DP can be solved using a table in O(N).
Edit:
Following is a DP solution in Java:
public class game {
    static boolean play_game(int[] stack) {
        if (stack.length <= 3)
            return true;
        int[] score = new int[stack.length];
        int n = stack.length;
        score[n-1] = stack[n-1];
        score[n-2] = score[n-1] + stack[n-2];
        score[n-3] = score[n-2] + stack[n-3];
        int sum = score[n-3];
        for (int i = n-4; i >= 0; i--) {
            sum = stack[i] + sum;
            int min = Math.min(Math.min(score[i+1], score[i+2]), score[i+3]);
            score[i] = sum - min;
        }
        return sum - score[0] < score[0];
    }

    public static void main(String[] args) {
        int[] stack = {12, 1, 7, 99, 3};
        System.out.println("I win => " + play_game(stack));
    }
}
EDIT:
To derive a DP solution you need to visualize the problem's solution in terms of smaller instances of itself. In this case, as both players play optimally, after the first player's choice the second player also obtains an optimal score for the remaining stack, which is a subproblem of the original one. The only difficulty is expressing this as a recurrence: to solve a problem with DP you must first define a recurrence relation in terms of subproblems that precede the current problem in the order of computation. Now, whatever the second player wins, the first player loses, so effectively the first player gains the total sum minus the second player's score. As the second player also plays optimally, we can express the solution in terms of that recursion.
For finding the position of a fraction in the Farey sequence, I tried to implement the algorithm given here http://www.math.harvard.edu/~corina/publications/farey.pdf under "initial algorithm", but I can't understand where I'm going wrong; I am not getting the correct answers. Could someone please point out my mistake?
E.g. for order n = 7 and the fractions 1/7 and 1/6 I get the same answer.
Here's what I've tried for a given order (n) and a fraction a/b:
sum = 0;
int A[100000];
A[1] = a;
for (i = 2; i <= n; i++)
    A[i] = i*a - a;
for (i = 2; i <= n; i++) {
    for (j = i+i; j <= n; j += i)
        A[j] -= A[i];
}
for (i = 1; i <= n; i++)
    sum += A[i];
ans = sum/b;
Thanks.
Your algorithm doesn't use any particular properties of a and b. In the first part, every relevant entry of the array A is a multiple of a, but the factor is independent of a, b and n. If you set up the array ignoring the factor a, i.e. starting with A[1] = 1 and A[i] = i-1 for 2 <= i <= n, then after the nested loops the array contains the totients, i.e. A[i] = phi(i), no matter what a, b and n are. The sum of the totients from 1 to n is the number of elements of the Farey sequence of order n (plus or minus 1, depending on whether 0/1 and 1/1 are included in the definition you use). So your answer is always the approximation (a * number of terms)/b, which is close but not exact.
I've not yet looked at how yours relates to the algorithm in the paper, check back for updates later.
Addendum: I finally had time to look at the paper. Your initialisation is not what they give. In their algorithm, A[q] is initialised to floor(x*q); for a rational x = a/b, the correct initialisation is
for (i = 1; i <= n; ++i) {
    A[i] = (a*i)/b;
}
In the remainder of your code, only ans = sum/b; has to be changed to ans = sum;.
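Putting the two fixes together, a complete routine could look like this (a minimal sketch of my reading of the corrected algorithm; the function name and types are my own):
#include <stdlib.h>

/* Hedged sketch: position of a/b (in lowest terms, 0 < a/b <= 1) in the
   Farey sequence of order n, using the corrected initialisation
   A[q] = floor(a*q/b) and the same sieve as in the question. */
long fareyRank(long a, long b, long n) {
    long *A = malloc((n + 1) * sizeof *A);
    long i, j, sum = 0;
    if (!A) return -1;
    for (i = 1; i <= n; i++)
        A[i] = (a * i) / b;          /* floor(x*q) for x = a/b */
    for (i = 2; i <= n; i++)         /* sieve out non-reduced fractions */
        for (j = i + i; j <= n; j += i)
            A[j] -= A[i];
    for (i = 1; i <= n; i++)         /* A[i]: reduced fractions with denominator i that are <= a/b */
        sum += A[i];
    free(A);
    return sum;                      /* ans = sum (not sum/b) */
}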
A non-algorithmic way of finding the position t of a fraction in the Farey sequence of order n>1 is shown in Remark 7.10(ii)(a) of the paper, under m:=n-1, where mu-bar stands for the number-theoretic Mobius function on positive integers taking values from the set {-1,0,1}.
Here's my Java solution that works. Add head (0/1) and tail (1/1) nodes to a singly linked list,
then start by passing headNode and tailNode, with the required orderLevel set.
public void generateSequence(Node leftNode, Node rightNode) {
    Fraction left = (Fraction) leftNode.getData();
    Fraction right = (Fraction) rightNode.getData();
    FractionNode midNode = null;
    int midNum = left.getNum() + right.getNum();
    int midDenom = left.getDenom() + right.getDenom();
    if (midDenom <= getMaxLevel()) {
        Fraction middle = new Fraction(midNum, midDenom);
        midNode = new FractionNode(middle);
    }
    if (midNode != null) {
        leftNode.setNext(midNode);
        midNode.setNext(rightNode);
        generateSequence(leftNode, midNode);
        count++;
    } else if (rightNode.next() != null) {
        generateSequence(rightNode, rightNode.next());
    }
}
I am looking for an algorithm to merge two sorted lists,
but they lack a comparison operator between elements of one list and elements of the other.
The resulting merged list may not be unique, but any result which satisfies the relative sort order of each list will do.
More precisely:
Given:
Lists A = {a_1, ..., a_m}, and B = {b_1, ..., b_n}. (They may be considered sets, as well).
A precedence operator < defined among elements of each list such that
a_i < a_{i+1} and b_j < b_{j+1} for 1 <= i < m and 1 <= j < n.
The precedence operator is undefined between elements of A and B:
a_i < b_j is not defined for any valid i and j.
An equality operator = defined among all elements of either A or B
(it is defined between an element from A and an element from B).
No two elements from list A are equal, and the same holds for list B.
Produce:
A list C = {c_1, ..., c_r} such that:
C = union(A, B); the elements of C are the union of elements from A and B.
If c_p = a_i, c_q = a_j, and a_i < a_j, then c_p < c_q. (The order of the elements
of the sublists of C corresponding to sets A and B is preserved.)
There exist no i and j such that c_i = c_j
(all elements duplicated between A and B are removed).
I hope this question makes sense and that I'm not asking something either terribly obvious,
or something for which there is no solution.
Context:
A constructible number can be represented exactly in finitely many quadratic extensions to the field of rational numbers (using a binary tree of height equal to the number of field extensions).
A representation of a constructible number must therefore "know" the field it is represented in.
Lists A and B represent successive quadratic extensions of the rational numbers.
Elements of A and B themselves are constructible numbers, which are defined in the context
of previous smaller fields (hence the precedence operator). When adding/multiplying constructible numbers,
the quadratically extended fields must first be merged so that the binary arithmetic
operations can be performed; the resulting list C is the quadratically extended field which
can represent numbers representable by both fields A and B.
(If anyone has a better idea of how to programmatically work with constructible numbers, let me know. A question concerning constructible numbers has arisen before, and also here are some interesting responses about their representation.)
Before anyone asks, no, this question does not belong on mathoverflow; they hate algorithm (and generally non-graduate-level math) questions.
In practice, lists A and B are linked lists (stored in reverse order, actually).
I will also need to keep track of which elements of C corresponded to which in A and B, but that is a minor detail.
The algorithm I seek is not the merge operation in mergesort,
because the precedence operator is not defined between elements of the two lists being merged.
Everything will eventually be implemented in C++ (I just want the operator overloading).
This is not homework, and will eventually be open sourced, FWIW.
I don't think you can do it better than O(N*M), although I'd be happy to be wrong.
That being the case, I'd do this:
Take the first (remaining) element of A.
Look for it in (what's left of) B.
If you don't find it in B, move it to the output
If you do find it in B, move everything from B up to and including the match to the output, and drop the copy from A.
Repeat the above until A is empty
Move anything left in B to the output
If you want to detect incompatible orderings of A and B, then remove "(what's left of)" from step 2. Search the whole of B, and raise an error if you find it "too early".
The problem is that given a general element of A, there is no way to look for it in B in better than linear time (in the size of B), because all we have is an equality test. But clearly we need to find the matches somehow and (this is where I wave my hands a bit, I can't immediately prove it) therefore we have to check each element of A for containment in B. We can avoid a bunch of comparisons because the orders of the two sets are consistent (at least, I assume they are, and if not there's no solution).
So, in the worst case the intersection of the lists is empty, and no elements of A are order-comparable with any elements of B. This requires N*M equality tests to establish, hence the worst-case bound.
For your example problem A = (1, 2, c, 4, 5, f), B = (a, b, c, d, e, f), this gives the result (1,2,a,b,c,4,5,d,e,f), which seems good to me. It performs 24 equality tests in the process (unless I can't count): 6 + 6 + 3 + 3 + 3 + 3. Merging with A and B the other way around would yield (a,b,1,2,c,d,e,4,5,f), in this case with the same number of comparisons, since the matching elements just so happen to be at equal indices in the two lists.
As can be seen from the example, the operation can't be repeated. merge(A,B) results in a list with an order inconsistent with that of merge(B,A). Hence merge((merge(A,B),merge(B,A)) is undefined. In general, the output of a merge is arbitrary, and if you go around using arbitrary orders as the basis of new complete orders, you will generate mutually incompatible orders.
This sounds like it would use a degenerate form of topological sorting.
EDIT 2:
Now with a combined routine:
import itertools

list1 = [1, 2, 'c', 4, 5, 'f', 7]
list2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

ibase = 0
result = []
for n1, i1 in enumerate(list1):
    for n2, i2 in enumerate(itertools.islice(list2, ibase, None, 1)):
        if i1 == i2:
            result.extend(itertools.islice(list2, ibase, ibase + n2))
            result.append(i2)
            ibase = n2 + ibase + 1
            break
    else:
        result.append(i1)
result.extend(itertools.islice(list2, ibase, None, 1))
print(result)
Would concatenating the two lists be sufficient? It does preserve the relative sortedness of elements from a and elements from b.
Then it's just a matter of removing duplicates.
EDIT: Alright, after the comment discussion (and given the additional condition that a_i = b_i & a_j = b_j & a_i < a_j => b_i < b_j), here's a reasonable solution:
Identify the entries common to both lists. This is O(n^2) for the naive algorithm - you might be able to improve it.
(optional) verify that the common entries are in the same order in both lists.
Construct the result list: All elements of a that are before the first shared element, followed by all elements of b before the first shared element, followed by the first shared element, and so on.
Given the problem as you have expressed it, I have a feeling that the problem may have no solution. Suppose that you have two pairs of elements {a_1, b_1} and {a_2, b_2} where a_1 < a_2 in the ordering of A, and b_1 > b_2 in the ordering of B. Now suppose that a_1 = b_1 and a_2 = b_2 according to the equality operator for A and B. In this scenario, I don't think you can create a combined list that satisfies the sublist ordering requirement.
Anyway, there's an algorithm that should do the trick. (Coded in Java-ish ...)
List<A> alist = ...
List<B> blist = ...

List<Object> mergedList = new SomeList<Object>(alist);
int mergePos = 0;
for (B b : blist) {
    boolean found = false;
    for (int i = mergePos; i < mergedList.size(); i++) {
        if (equals(mergedList.get(i), b)) {
            found = true;
            mergePos = i + 1; // later elements of B must come after this match
            break;
        }
    }
    if (!found) {
        mergedList.insertBefore(b, mergePos);
        mergePos++;
    }
}
This algorithm is O(N**2) in the worst case, and O(N) in the best case. (I'm skating over some Java implementation details ... like combining list iteration and insertion without a major complexity penalty ... but I think it can be done in this case.)
The algorithm neglects the pathology I mentioned in the first paragraph and other pathologies; e.g. that an element of B might be "equal to" multiple elements of A, or vice versa. To deal with these, the algorithm needs to check each b against all elements of the mergedList that are not instances of B. That makes the algorithm O(N**2) in the best case.
If the elements are hashable, this can be done in O(N) time where N is the total number of elements in A and B.
def merge(A, B):
    # Walk A and build a hash table mapping its values to indices.
    amap = {}
    for i, a in enumerate(A):
        amap[a] = i

    # Now walk B building C.
    C = []
    ai = 0
    bi = 0
    for i, b in enumerate(B):
        if b in amap:
            # b is in both lists.
            new_ai = amap[b]
            assert new_ai >= ai  # check for consistent input
            C += A[ai:new_ai]    # add non-shared elements from A
            C += B[bi:i]         # add non-shared elements from B
            C.append(b)          # add the shared element b
            ai = new_ai + 1
            bi = i + 1
    C += A[ai:]  # add remaining non-shared elements from A
    C += B[bi:]  # from B
    return C

A = [1, 2, 'c', 4, 5, 'f', 7]
B = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(merge(A, B))
(This is just an implementation of Anon's algorithm. Note that you can check for inconsistent input lists without hurting performance and that random access into the lists is not necessary.)