How can I efficiently determine if two lists contain elements ordered in the same way? - algorithm

I have two ordered lists of the same element type, each list having at most one element of each value (say ints and unique numbers), but otherwise with no restrictions (one may be a subset of the other, they may be completely disjoint, or share some elements but not others).
How do I efficiently determine whether A orders any two items differently from B? For example, if A has the items 1, 2, 10 and B the items 2, 10, 1, the property would not hold, as A lists 1 before 10 but B lists it after 10. However, 1, 2, 10 vs 2, 10, 5 would be perfectly valid, as A never mentions 5 at all. I cannot rely on any given sorting rule shared by both lists.

You can get O(n) as follows. First, find the intersection of the two sets using hashing. Second, test whether A and B are identical if you only consider elements from the intersection.
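A minimal Python sketch of this idea, assuming both lists hold hashable values with no repeats. The function name `same_relative_order` is made up for illustration and is not part of the original answer.

```python
def same_relative_order(a, b):
    """Return True if the elements common to a and b appear in the same order in both."""
    in_a = set(a)
    in_b = set(b)
    # Keep only the shared elements, preserving each list's own order.
    common_a = [x for x in a if x in in_b]
    common_b = [x for x in b if x in in_a]
    return common_a == common_b


print(same_relative_order([1, 2, 10], [2, 10, 1]))  # False: 1 and 10 are swapped
print(same_relative_order([1, 2, 10], [2, 10, 5]))  # True: only 2 and 10 are shared
```

Building the two sets and filtering each list is a single pass per list, so the expected running time is linear in the combined length.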

My approach would be to first make sorted copies of A and B which also record the positions of elements in the original lists:
for i in 1 .. length(A):
    Apos[i] = (A[i], i)
sortedApos = sort(Apos[] by first element of each pair)
for i in 1 .. length(B):
    Bpos[i] = (B[i], i)
sortedBpos = sort(Bpos[] by first element of each pair)
Now find those elements in common using a standard list merge that records the positions in both A and B of the shared elements:
i = 1
j = 1
shared = []
while i <= length(A) && j <= length(B)
    if sortedApos[i][1] < sortedBpos[j][1]
        ++i
    else if sortedApos[i][1] > sortedBpos[j][1]
        ++j
    else    // They're equal
        append(shared, (sortedApos[i][2], sortedBpos[j][2]))
        ++i
        ++j
Finally, sort shared by its first element (position in A) and check that all its second elements (positions in B) are increasing. This will be the case iff the elements common to A and B appear in the same order:
sortedShared = sort(shared[] by first element of each pair)
for i = 2 .. length(sortedShared)
    if sortedShared[i][2] < sortedShared[i-1][2]
        return DIFFERENT
return SAME
Time complexity: 2*(O(n) + O(nlog n)) + O(n) + O(nlog n) + O(n) = O(nlog n).

General approach: store all the values and their positions in B as keys and values in a HashMap. Iterate over the values in A and look them up in B's HashMap to get their position in B (or null). If this position is before the largest position value you've seen previously, then you know that something in B is in a different order than A. Runs in O(n) time.
Rough, totally untested code:
boolean valuesInSameOrder(int[] A, int[] B)
{
    Map<Integer, Integer> bMap = new HashMap<Integer, Integer>();
    for (int i = 0; i < B.length; i++)
    {
        bMap.put(B[i], i);
    }
    int maxPosInB = 0;
    for (int i = 0; i < A.length; i++)
    {
        if(bMap.containsKey(A[i]))
        {
            int currPosInB = bMap.get(A[i]);
            if (currPosInB < maxPosInB)
            {
                // B has something in a different order than A
                return false;
            }
            else
            {
                maxPosInB = currPosInB;
            }
        }
    }
    // All of B's values are in the same order as A
    return true;
}

Related

How to get original array from random shuffle of an array

I was asked the question below in an interview today. I gave an O(n lg n) solution but was asked for an O(n) solution, which I could not come up with. Can you help?
An input array is given like [1,2,4] then every element of it is doubled and
appended into the array. So the array now looks like [1,2,4,2,4,8]. Now
this array is randomly shuffled. One possible random arrangement is
[4,8,2,1,2,4]. Now we are given this random shuffled array and we want to
get original array [1,2,4] in O(n) time.
The original array can be returned in any order. How can I do it?
Here's an O(N) Java solution that could be improved by first making sure that the array is of the proper form. For example it shouldn't accept [0] as an input:
import java.util.*;
class Solution {
public static int[] findOriginalArray(int[] changed) {
if (changed.length % 2 != 0)
return new int[] {};
// set Map size to optimal value to avoid rehashes
Map<Integer,Integer> count = new HashMap<>(changed.length*100/75);
int[] original = new int[changed.length/2];
int pos = 0;
// count frequency for each number
for (int n : changed) {
count.put(n, count.getOrDefault(n,0)+1);
}
// now decide which go into the answer
for (int n : changed) {
int smallest = n;
for (int m=n; m > 0 && count.getOrDefault(m,0) > 0; m = m/2) {
//System.out.println(m);
smallest = m;
if (m % 2 != 0) break;
}
// trickle up from smallest to largest while count > 0
for (int m=smallest, mm = 2*m; count.getOrDefault(mm,0) > 0; m = mm, mm=2*mm){
int ct = count.getOrDefault(mm,0);
while (count.get(m) > 0 && ct > 0) {
//System.out.println("adding "+m);
original[pos++] = m;
count.put(mm, ct -1);
count.put(m, count.get(m) - 1);
ct = count.getOrDefault(mm,0);
}
}
}
// check for incorrect format
if (count.values().stream().anyMatch(x -> x > 0)) {
return new int[] {};
}
return original;
}
public static void main(String[] args) {
int[] changed = {1,2,4,2,4,8};
System.out.println(Arrays.toString(changed));
System.out.println(Arrays.toString(findOriginalArray(changed)));
}
}
But I've tried to keep it simple.
The output is NOT guaranteed to be sorted. If you want it sorted it's going to cost O(NlogN) inevitably unless you use a Radix sort or something similar (which would make it O(NlogE) where E is the max value of the numbers you're sorting and logE the number of bits needed).
Runtime
This may not look like it is O(N), but you can see that it is, because for every chain it will only find the lowest number in the chain ONCE, then trickle up the chain ONCE. Said another way, an iteration that does O(X) work processes X elements, which leaves N-X elements for the remaining iterations. Therefore, even though there are for's inside for's, it is still O(N).
An example execution can be seen with [64,32,16,8,4,2].
If this were not O(N), then printing each value that it traverses to find the smallest would show the values appearing over and over again (for example N*(N+1)/2 times).
But instead you see them only once:
finding smallest 64
finding smallest 32
finding smallest 16
finding smallest 8
finding smallest 4
finding smallest 2
adding 2
adding 8
adding 32
If you're familiar with the Heapify algorithm you'll recognize the approach here.
from typing import List

def findOriginalArray(self, changed: List[int]) -> List[int]:
    size = len(changed)
    ans = []
    left_elements = size//2
    #IF SIZE IS ODD THEN RETURN [] NO SOLN. IS POSSIBLE
    if(size%2 !=0):
        return ans
    #FREQUENCY DICTIONARY given array [0,0,2,1] my map will be: {0:2,2:1,1:1}
    d = {}
    for i in changed:
        if(i in d):
            d[i]+=1
        else:
            d[i] = 1
    # CHECK THE EDGE CASE OF 0
    if(0 in d):
        count = d[0]
        half = count//2
        if((count % 2 != 0) or (half > left_elements)):
            return ans
        left_elements -= half
        ans = [0 for i in range(half)]
    #CHECK REST OF THE CASES : considering the values will be 10^5
    for i in range(1,50001):
        if(i in d):
            if(d[i] > 0):
                count = d[i]
                if(count > left_elements):
                    ans = []
                    break
                left_elements -= d[i]
                for j in range(count):
                    ans.append(i)
                if(2*i in d):
                    if(d[2*i] < count):
                        ans = []
                        break
                    else:
                        d[2*i] -= count
                else:
                    ans = []
                    break
    # IF ANY ELEMENT IS STILL UNPAIRED (E.G. AN ODD VALUE ABOVE 50000) NO SOLN. IS POSSIBLE
    if(left_elements != 0):
        return []
    return ans
I have a simple idea which might not be the best, but I could not think of a case where it would not work. Having the array A with the doubled elements and randomly shuffled, keep a helper map. Process each element of the array and, each time you find a new element, add it to the map with the value 0. When an element is processed, increment map[i] and decrement map[2*i]. Next you iterate over the map and print the elements that have a value greater than zero.
A simple example, say that the vector is:
[1, 2, 3]
And the doubled/shuffled version is:
A = [3, 2, 1, 4, 2, 6]
When processing 3, first add the keys 3 and 6 to the map with value zero. Increment map[3] and decrement map[6]. This way, map[3] = 1 and map[6] = -1. Then for the next element map[2] = 1 and map[4] = -1 and so forth. The final state of the map in this example would be map[1] = 1, map[2] = 1, map[3] = 1, map[4] = -1, map[6] = 0, map[8] = -1, map[12] = -1.
Then you just process the keys of the map and, for each key with a value greater than zero, add it to the output. There are certainly more efficient solutions, but this one is O(n).
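Here is a small Python sketch of the bookkeeping described above, so it can be tried on other inputs. The function name `positive_balance_keys` is hypothetical, and emitting a key once per unit of positive balance is an assumption, since the description only says to output keys with a value greater than zero.

```python
from collections import defaultdict


def positive_balance_keys(shuffled):
    """Apply the increment/decrement rule and return the keys whose balance ends above zero."""
    balance = defaultdict(int)
    for x in shuffled:
        balance[x] += 1      # x itself was seen
        balance[2 * x] -= 1  # its double is expected once elsewhere in the array
    result = []
    for key, value in balance.items():
        # Assumption: a key with balance b > 0 is emitted b times.
        result.extend([key] * max(value, 0))
    return result


print(sorted(positive_balance_keys([3, 2, 1, 4, 2, 6])))  # [1, 2, 3]
print(sorted(positive_balance_keys([1, 2, 4, 2, 4, 8])))  # [1, 2]
```

The second call is worth noting: on the question's own example the rule returns only 1 and 2. The final balance of a key equals its count minus the count of its half, and here 2 occurs as often as 4, so 4 is cancelled out. The idea therefore seems to need an extra step when the original array contains both a value and four times that value.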
In C++, you can try this.
The time complexity is O(N + K log K), where N is the length of the input and K is the number of unique elements in the input.
#include <algorithm>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<int> findOriginalArray(vector<int>& input) {
        if (input.size() % 2) return {};
        unordered_map<int, int> m;
        for (int n : input) m[n]++;   // count occurrences of each value
        vector<int> nums;
        for (auto [n, cnt] : m) nums.push_back(n);
        sort(begin(nums), end(nums)); // process unique values in ascending order
        vector<int> out;
        for (int n : nums) {
            if (m[2 * n] < m[n]) return {};
            // every remaining n must be an original; consume one 2*n per copy
            for (int i = 0; i < m[n]; ++i, --m[2 * n]) out.push_back(n);
        }
        return out;
    }
};
The space complexity required in the question is not clear to me, so this is my off-the-top-of-my-head attempt, assuming only O(n) time complexity is required.
If the length of the input array is not even, then the input is invalid.
Create a map and add the elements of the input array to it.
Divide each element in the input array by 2 and check if that value exists in the map. If it exists, add it to the array (slice) orig.
There is a chance we have added duplicate values to this original array, so clean it up.
Here is some sample Go code:
https://go.dev/play/p/w4mm-rloHyi
I am sure we can optimize this code in a lot of ways for space complexity, but its time complexity is O(n).

Algorithm for all combinations to divide set into equally sized subsets [duplicate]

Let's say I have a set of elements S = { 1, 2, 3, 4, 5, 6, 7, 8, 9 }
I would like to create combinations of 3 and group them in a way such that no number appears in more than one combination.
Here is an example:
{ {3, 7, 9}, {1, 2, 4}, {5, 6, 8} }
The order of the numbers in the groups does not matter, nor does the order of the groups in the entire example.
In short, I want every possible group combination from every possible combination in the original set, excluding the ones that have a number appearing in multiple groups.
My question: is this actually feasible in terms of run time and memory? My sample sizes could be somewhere around 30-50 numbers.
If so, what is the best way to create this algorithm? Would it be best to create all possible combinations, and choose the groups only if the number hasn't already appeared?
I'm writing this in Qt 5.6, which is a C++ based framework.
You can do this recursively, and avoid duplicates, if you keep the first element fixed in each recursion, and only make groups of 3 with the values in order, eg:
{1,2,3,4,5,6,7,8,9}
Put the lowest element in the first spot (a), and keep it there:
{a,b,c} = {1, *, *}
For the second spot (b), iterate over every value from the second-lowest to the second-highest:
{a,b,c} = {1, 2~8, *}
For the third spot (c), iterate over every value higher than the second value:
{1, 2~8, b+1~9}
Then recurse with the rest of the values.
{1,2,3} {4,5,6} {7,8,9}
{1,2,3} {4,5,7} {6,8,9}
{1,2,3} {4,5,8} {6,7,9}
{1,2,3} {4,5,9} {6,7,8}
{1,2,3} {4,6,7} {5,8,9}
{1,2,3} {4,6,8} {5,7,9}
{1,2,3} {4,6,9} {5,7,8}
{1,2,3} {4,7,8} {5,6,9}
{1,2,3} {4,7,9} {5,6,8}
{1,2,3} {4,8,9} {5,6,7}
{1,2,4} {3,5,6} {7,8,9}
...
{1,8,9} {2,6,7} {3,4,5}
When I say "in order", that doesn't have to be any specific order (numerical, alphabetical...); it can just be the original order of the input. You can avoid having to re-sort the input of each recursion if you make sure to pass the rest of the values on to the next recursion in the order you received them.
A run-through of the recursion:
Let's say you get the input {1,2,3,4,5,6,7,8,9}. As the first element in the group, you take the first element from the input, and for the other two elements, you iterate over the other values:
{1,2,3}
{1,2,4}
{1,2,5}
{1,2,6}
{1,2,7}
{1,2,8}
{1,2,9}
{1,3,4}
{1,3,5}
{1,3,6}
...
{1,8,9}
making sure the third element always comes after the second element, to avoid duplicates like:
{1,3,5} ⇆ {1,5,3}
Now, let's say that at a certain point, you've selected this as the first group:
{1,3,7}
You then pass the rest of the values onto the next recursion:
{2,4,5,6,8,9}
In this recursion, you apply the same rules as for the first group: take the first element as the first element in the group and keep it there, and iterate over the other values for the second and third element:
{2,4,5}
{2,4,6}
{2,4,8}
{2,4,9}
{2,5,6}
{2,5,8}
{2,5,9}
{2,6,8}
...
{2,8,9}
Now, let's say that at a certain point, you've selected this as the second group:
{2,5,6}
You then pass the rest of the values onto the next recursion:
{4,8,9}
And since this is the last group, there is only one possibility, and so this particular recursion would end in the combination:
{1,3,7} {2,5,6} {4,8,9}
As you see, you don't have to sort the values at any point, as long as you pass them on to the next recursion in the order you received them. So if you receive e.g.:
{q,w,e,r,t,y,u,i,o}
and you select from this the group:
{q,r,u}
then you should pass on:
{w,e,t,y,i,o}
Here's a JavaScript snippet which demonstrates the method; it returns a 3D array with combinations of groups of elements.
(The filter function creates a copy of the input array, with elements 0, i and j removed.)
function clone2D(array) {
    var clone = [];
    for (var i = 0; i < array.length; i++) clone.push(array[i].slice());
    return clone;
}
function groupThree(input) {
    var result = [], combination = [];
    group(input, 0);
    return result;

    function group(input, step) {
        combination[step] = [input[0]];
        for (var i = 1; i < input.length - 1; i++) {
            combination[step][1] = input[i];
            for (var j = i + 1; j < input.length; j++) {
                combination[step][2] = input[j];
                if (input.length > 3) {
                    var rest = input.filter(function(elem, index) {
                        return index && index != i && index != j;
                    });
                    group(rest, step + 1);
                }
                else result.push(clone2D(combination));
            }
        }
    }
}
var result = groupThree([1,2,3,4,5,6,7,8,9]);
for (var r in result) document.write(JSON.stringify(result[r]) + "<br>");
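The same recursion can be sketched in Python as a generator, which avoids holding every grouping in memory at once. This is only an illustration: the name `group_three` is made up here, and the sketch assumes the number of elements is a multiple of 3.

```python
def group_three(items):
    """Yield every way to split items into unordered groups of three."""
    if not items:
        yield []
        return
    first = items[0]  # the first remaining element always leads the next group
    for i in range(1, len(items) - 1):
        for j in range(i + 1, len(items)):
            group = [first, items[i], items[j]]
            # Pass the untouched elements on in the order they were received.
            rest = [x for k, x in enumerate(items) if k not in (0, i, j)]
            for tail in group_three(rest):
                yield [group] + tail


groupings = list(group_three([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(len(groupings))  # 280
print(groupings[0])    # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

On feasibility: the number of groupings of n elements is n! / ((3!)^(n/3) * (n/3)!), which is 280 for n = 9 but already on the order of 10^18 for n = 30, so enumerating everything for 30 to 50 elements does not look practical with any algorithm.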
For n things taken 3 at a time, you could use 3 nested loops:
for(k = 0; k < n-2; k++){
    for(j = k+1; j < n-1; j++){
        for(i = j+1; i < n ; i++){
            ... S[k] ... S[j] ... S[i]
        }
    }
}
For a generic solution of n things taken k at a time, you could use an array of k counters.
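One possible reading of the "array of k counters" idea, sketched in Python. The function name `combinations_by_counters` and the exact stepping rule are assumptions made for illustration; the answer itself does not spell them out.

```python
def combinations_by_counters(items, k):
    """Yield all k-element combinations of items using an array of k index counters."""
    n = len(items)
    if k > n:
        return
    counters = list(range(k))  # start at indices 0, 1, ..., k-1
    while True:
        yield [items[c] for c in counters]
        # Find the rightmost counter that can still move to the right.
        pos = k - 1
        while pos >= 0 and counters[pos] == n - k + pos:
            pos -= 1
        if pos < 0:
            return
        counters[pos] += 1
        # Reset every counter after it to the smallest valid values.
        for q in range(pos + 1, k):
            counters[q] = counters[q - 1] + 1


print(list(combinations_by_counters([1, 2, 3, 4], 2)))
# [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
```

Each counter plays the role of one loop variable in the nested-loop version, so with k = 3 this visits the same index triples in the same order.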
I think you can solve it by treating it as a coin change problem with dynamic programming: assume you are looking for change of 3 and that every index in the array is a coin of value 1, then output the coins (the values in your array) that have been found.
Link: https://www.youtube.com/watch?v=18NVyOI_690

How to write an iterative algorithm to generate all subsets of a set?

I wrote a recursive backtracking algorithm for finding all subsets of a given set.
void backtracke(int* a, int k, int n)
{
    if (k == n)
    {
        for(int i = 1; i <=k; ++i)
        {
            if (a[i] == true)
            {
                std::cout << i << " ";
            }
        }
        std::cout << std::endl;
        return;
    }
    bool c[2];
    c[0] = false;
    c[1] = true;
    ++k;
    for(int i = 0; i < 2; ++i)
    {
        a[k] = c[i];
        backtracke(a, k, n);
        a[k] = INT_MAX;
    }
}
Now we have to write the same algorithm in an iterative form. How can we do it?
You can use the binary counter approach. Any unique binary string of length n represents a unique subset of a set of n elements. If you start with 0 and end with 2^n-1, you cover all possible subsets. The counter can be easily implemented in an iterative manner.
The code in Java:
public static void printAllSubsets(int[] arr) {
    byte[] counter = new byte[arr.length];
    while (true) {
        // Print combination
        for (int i = 0; i < counter.length; i++) {
            if (counter[i] != 0)
                System.out.print(arr[i] + " ");
        }
        System.out.println();
        // Increment counter
        int i = 0;
        while (i < counter.length && counter[i] == 1)
            counter[i++] = 0;
        if (i == counter.length)
            break;
        counter[i] = 1;
    }
}
Note that in Java one can use BitSet, which makes the code much shorter, but I used a byte array to illustrate the process better.
There are a few ways to write an iterative algorithm for this problem. The most commonly suggested would be to:
Count (i.e. a simple for-loop) from 0 to 2^numberOfElements - 1
If we look at the variable used above for counting in binary, the digit at each position can be thought of as a flag indicating whether or not the element at the corresponding index in the set should be included in this subset. Simply loop over each bit (by taking the remainder by 2, then dividing by 2), including the corresponding elements in our output.
Example:
Input: {1,2,3,4,5}.
We'd start counting at 0, which is 00000 in binary, which means no flags are set, so no elements are included (this would obviously be skipped if you don't want the empty subset) - output {}.
Then 1 = 00001, indicating that only the last element would be included - output {5}.
Then 2 = 00010, indicating that only the second last element would be included - output {4}.
Then 3 = 00011, indicating that the last two elements would be included - output {4,5}.
And so on, all the way up to 31 = 11111, indicating that all the elements would be included - output {1,2,3,4,5}.
* Actually code-wise, it would be simpler to turn this on its head - output {1} for 00001, considering that the first remainder by 2 will then correspond to the flag of the 0th element, the second remainder, the 1st element, etc., but the above is simpler for illustrative purposes.
More generally, any recursive algorithm could be changed to an iterative one as follows:
Create a loop consisting of parts (think switch-statement), with each part consisting of the code between any two recursive calls in your function
Create a stack where each element contains each necessary local variable in the function, and an indication of which part we're busy with
The loop would pop elements from the stack, executing the appropriate section of code
Each recursive call would be replaced by first adding its own state to the stack, and then the called state
Replace return with appropriate break statements
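As a rough illustration of the stack-based conversion, here is a Python sketch for the subset problem. It is a simplified instance rather than the full recipe: nothing happens after the two recursive calls in this problem, so each stack entry only needs the local state and no "which part" marker. The name `subsets_with_stack` is made up for this example.

```python
def subsets_with_stack(items):
    """Generate all subsets using an explicit stack instead of recursive calls."""
    result = []
    # Each stack entry is (index of the next element to decide, elements chosen so far).
    stack = [(0, [])]
    while stack:
        k, chosen = stack.pop()
        if k == len(items):
            # Base case of the recursion: every element has been decided.
            result.append(chosen)
            continue
        # Push the two branches that the recursive version would call in turn.
        stack.append((k + 1, chosen + [items[k]]))  # include items[k]
        stack.append((k + 1, chosen))               # exclude items[k]
    return result


print(subsets_with_stack([1, 2, 3]))
# [[], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]
```

The order of the output depends on which branch is pushed last, since the stack pops the most recently pushed state first.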
A little Python implementation of George's algorithm. Perhaps it will help someone.
def subsets(S):
    l = len(S)
    for x in range(2**l):
        # Bit i of x says whether element i of S belongs to this subset
        yield {s for i, s in enumerate(S) if (x >> i) & 1}
Basically what you want is P(S) = S_0 U S_1 U ... U S_n where S_i is the set of all subsets obtained by taking i elements from S. In other words, if S = {a, b, c} then S_0 = {{}}, S_1 = {{a},{b},{c}}, S_2 = {{a, b}, {a, c}, {b, c}} and S_3 = {{a, b, c}}.
The algorithm we have so far is
set P(set S) {
    PS = {}
    for i in [0..|S|]
        PS = PS U Combination(S, i)
    return PS
}
We know that |S_i| = nCi where |S| = n. So basically we know that we will be looping nCi times. You may use this information to optimize the algorithm later on. To generate combinations of size i the algorithm that I present is as follows:
Suppose S = {a, b, c}; then you can map 0 to a, 1 to b and 2 to c. The sequences over these indices are (if i=2) 0-0, 0-1, 0-2, 1-0, 1-1, 1-2, 2-0, 2-1, 2-2. To check if a sequence is a combination, you check that the numbers are all unique and that no permutation of its digits appears elsewhere. This filters the above sequences to just 0-1, 0-2 and 1-2, which are then mapped back to {a,b}, {a,c}, {b,c}. To generate the long list of sequences above, you can follow this algorithm:
set Combination(set S, integer l) {
    CS = {}
    for x in [0..|S|^l - 1] {
        n = {}
        for i in [0..l-1] {
            n = n U {floor(x / |S|^i) mod |S|} // get the i-th digit of x in base |S|
        }
        CS = CS U {S[n]}
    }
    return filter(CS) // filtering described above
}
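For comparison, the outer loop P(S) = S_0 U S_1 U ... U S_n can be sketched in Python as below. Note that this swaps in the standard library's `itertools.combinations` for Combination(S, i) instead of the digit-sequence-and-filter construction described above; the name `power_set` is made up for the example.

```python
from itertools import combinations


def power_set(items):
    """Build P(S) as the union of S_0, S_1, ..., S_n, where S_i holds the i-element subsets."""
    result = []
    for i in range(len(items) + 1):
        # itertools.combinations yields each i-element subset exactly once.
        result.extend(set(c) for c in combinations(items, i))
    return result


subsets_of_abc = power_set(["a", "b", "c"])
print(len(subsets_of_abc))  # 8
print(subsets_of_abc[:4])   # the empty set followed by the three singletons
```

Because each S_i has nCi members, the total is the sum of nCi over i, which is 2^n, in agreement with the binary counter approaches.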

Algorithm to find the smallest snippet from searching a document?

I've been going through Skiena's excellent "The Algorithm Design Manual" and got hung up on one of the exercises.
The question is:
"Given a search string of three words, find the smallest snippet of the document that contains all three of the search words—i.e. , the snippet with smallest number of words in it. You are given the index positions where these words in occur search strings, such as word1: (1, 4, 5), word2: (4, 9, 10), and word3: (5, 6, 15). Each of the lists are in sorted order, as above."
Anything I come up with is O(n^2)... This question is in the "Sorting and Searching" chapter, so I assume there is a simple and clever way to do it. I'm trying something with graphs right now, but that seems like overkill.
Ideas?
Thanks
Unless I've overlooked something, here's a simple, O(n) algorithm:
We'll represent the snippet by (x, y) where x and y are where the snippet begins and ends respectively.
A snippet is feasible if it contains all 3 search words.
We will start with the infeasible snippet (0,0).
Repeat the following until y reaches end-of-string:
If the current snippet (x, y) is feasible, proceed to the snippet (x+1, y)
Else (the current snippet is infeasible) proceed to the snippet (x, y+1)
Choose the shortest snippet among all feasible snippets we went through.
Running time: in each iteration either x or y is increased by 1. Clearly x can't exceed y and y can't exceed the string length, so the total number of iterations is O(n). Also, feasibility can be checked in O(1) in this case, since we can track how many occurrences of each word are within the current snippet, and we can maintain these counts in O(1) with each increase of x or y by 1.
Correctness: for each x, we calculate the minimal feasible snippet (x, ?), so we must pass over the minimal snippet overall. Also, if y is the smallest y such that (x, y) is feasible, then any feasible snippet (x+1, y') has y' >= y. (This is why this algorithm is linear and the others aren't.)
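A minimal Python sketch of this two-pointer scheme. As an assumption to fit the input format in the question, it walks over the merged list of word occurrences rather than over every position in the document; the name `smallest_snippet` is hypothetical.

```python
import heapq


def smallest_snippet(*position_lists):
    """Return (start, end) of the shortest range containing a position from every list."""
    # Merge the sorted lists into one sorted stream of (position, word_id) pairs.
    tagged = [[(p, w) for p in positions] for w, positions in enumerate(position_lists)]
    merged = list(heapq.merge(*tagged))
    counts = [0] * len(position_lists)  # occurrences of each word inside the window
    missing = len(position_lists)       # words with no occurrence inside the window
    best = None
    x = 0
    for pos_y, word_y in merged:
        # Extend the right end of the window by one occurrence.
        if counts[word_y] == 0:
            missing -= 1
        counts[word_y] += 1
        # While the window is feasible, record it and advance the left end.
        while missing == 0:
            pos_x, word_x = merged[x]
            if best is None or pos_y - pos_x < best[1] - best[0]:
                best = (pos_x, pos_y)
            counts[word_x] -= 1
            if counts[word_x] == 0:
                missing += 1
            x += 1
    return best


print(smallest_snippet([1, 4, 5], [4, 9, 10], [5, 6, 15]))  # (4, 5)
print(smallest_snippet([1, 4, 5], [3, 9, 10], [2, 6, 15]))  # (1, 3)
```

Both ends of the window only ever move forward, so the loop body runs a number of times linear in the total number of occurrences. The function returns None when some word never occurs.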
I already posted a rather straightforward algorithm that solves exactly that problem in this answer
Google search results: How to find the minimum window that contains all the search keywords?
However, in that question we assumed that the input is represented by a text stream and the words are stored in an easily searchable set.
In your case the input is represented slightly differently: as a bunch of vectors with sorted positions for each word. This representation is easily transformable to what is needed for the above algorithm by simply merging all these vectors into a single vector of (position, word) pairs ordered by position. It can be done literally, or it can be done "virtually", by placing the original vectors into the priority queue (ordered in accordance with their first elements). Popping an element from the queue in this case means popping the first element from the first vector in the queue and possibly sinking the first vector into the queue in accordance with its new first element.
Of course, since your statement of the problem explicitly fixes the number of words as three, you can simply check the first elements of all three arrays and pop the smallest one at each iteration. That gives you an O(N) algorithm, where N is the total length of all arrays.
Also, your statement of the problem seems to suggest that target words can overlap in the text, which is rather strange (given that you use the term "word"). Is it intentional? In any case, it doesn't present any problem for the above linked algorithm.
From the question, it seems that you're given the index locations for each of your n “search words” (word1, word2, word3, ..., word n) in the document. Using a sorting algorithm, the n independent arrays associated with search words can readily be represented as a single array of all the index locations in ascending numerical order and a word label associated with each index in the array (the index array).
The Basic Algorithm:
(Designed to work whether or not the poster of this question intended to allow two different search words to coexist at the same index number.)
First, we define a simple function for measuring the length of a snippet that contains all n labels given a starting point in the index array. (It is obvious from the definition of our array that any starting point on the array will necessarily be the indexed location of one of the n search labels.) The function simply keeps track of the unique search labels seen as the function iterates through the elements in the array until all n labels have been observed. The length of the snippet is defined as the difference between the index of the last unique label found and the index of the starting point in the index array (the first unique label found). If all n labels aren't observed before the end of the array the function returns a null value.
Now, the snippet length function can be run for each element in your array to associate a snippet size containing all n search words starting from each element in the array. The smallest non-Null value returned by the snippet length function over the whole index array is the snippet in your document that you're looking for.
Necessary Optimizations:
Keep track of the value of the current shortest snippet length so that the value will be known immediately after iterating once through the index array.
When iterating through your array terminate the snippet length function if the current snippet under inspection ever surpasses the length of the shortest snippet length previously seen.
When the snippet length function returns null for not locating all n search words in the remaining index array elements, associate a null snippet length to all successive elements in the index array.
If the snippet length function is applied to a word label and the label immediately following it is identical to the starting label, assign a null value to the starting label and move on to the next label.
Computational Complexity:
Obviously the sorting part of the algorithm can be arranged in O(n log n).
Here's how I would work out the time complexity of the second part of the algorithm (any critiques and corrections would be greatly appreciated).
In the best case scenario, the algorithm only applies the snippet length function to the first element in the index array and finds that no snippet containing all the search words exists. This scenario would be computed in just n calculations, where n is the size of the index array. Slightly worse than that is the case where the smallest snippet turns out to be equal to the size of the whole array. In this case the computational complexity will be a little less than 2n (once through the array to find the smallest snippet length, a second time to demonstrate that no other snippets exist). The shorter the average computed snippet length, the more times the snippet length function will need to be applied over the index array. We can assume that our worst case scenario will be the case where the snippet length function needs to be applied to every element in the index array. To develop a case where the function will be applied to every element in the index array, we need to design an index array where the average snippet length over the whole index array is negligible in comparison to the size of the index array as a whole. Using this case we can write out our computational complexity as O(C n), where C is some constant that is significantly smaller than n. This gives a final computational complexity of:
O(n log n + C n)
Where:
C << n
Edit:
AndreyT correctly points out that instead of sorting the word indices in n log n time, one might just as well merge them (since the sub-arrays are already sorted) in n log m time, where m is the number of search word arrays to be merged. This will obviously speed up the algorithm in cases where m < n.
O(n log k) solution, where n is the total number of indices and k is the number of words. The idea is to use a heap to identify the smallest index at each iteration, while also keeping track of the maximum index in the heap. I also put the coordinates of each value in the heap, in order to be able to retrieve the next value in constant time.
#include <algorithm>
#include <cassert>
#include <limits>
#include <queue>
#include <vector>
using namespace std;
int snippet(const vector< vector<int> >& index) {
// (-index[i][j], (i, j))
priority_queue< pair< int, pair<size_t, size_t> > > queue;
int nmax = numeric_limits<int>::min();
for (size_t i = 0; i < index.size(); ++i) {
if (!index[i].empty()) {
int cur = index[i][0];
nmax = max(nmax, cur);
queue.push(make_pair(-cur, make_pair(i, 0)));
}
}
int result = numeric_limits<int>::max();
while (queue.size() == index.size()) {
int nmin = -queue.top().first;
size_t i = queue.top().second.first;
size_t j = queue.top().second.second;
queue.pop();
result = min(result, nmax - nmin + 1);
j++;
if (j < index[i].size()) {
int next = index[i][j];
nmax = max(nmax, next);
queue.push(make_pair(-next, make_pair(i, j)));
}
}
return result;
}
int main() {
int data[][3] = {{1, 4, 5}, {4, 9, 10}, {5, 6, 15}};
vector<vector<int> > index;
for (int i = 0; i < 3; i++) {
index.push_back(vector<int>(data[i], data[i] + 3));
}
assert(snippet(index) == 2);
}
Sample implementation in java (tested only with the implementation in the example, there might be bugs). The implementation is based on the replies above.
import java.util.Arrays;
public class SmallestSnippet {
WordIndex[] words; //merged array of word occurrences
public enum Word {W1, W2, W3};
public SmallestSnippet(Integer[] word1, Integer[] word2, Integer[] word3) {
this.words = new WordIndex[word1.length + word2.length + word3.length];
merge(word1, word2, word3);
System.out.println(Arrays.toString(words));
}
private void merge(Integer[] word1, Integer[] word2, Integer[] word3) {
int i1 = 0;
int i2 = 0;
int i3 = 0;
int wordIdx = 0;
while(i1 < word1.length || i2 < word2.length || i3 < word3.length) {
WordIndex wordIndex = null;
Word word = getMin(word1, i1, word2, i2, word3, i3);
if (word == Word.W1) {
wordIndex = new WordIndex(word, word1[i1++]);
}
else if (word == Word.W2) {
wordIndex = new WordIndex(word, word2[i2++]);
}
else {
wordIndex = new WordIndex(word, word3[i3++]);
}
words[wordIdx++] = wordIndex;
}
}
//determine which word has the smallest index
private Word getMin(Integer[] word1, int i1, Integer[] word2, int i2, Integer[] word3,
int i3) {
Word toReturn = Word.W1;
if (i1 == word1.length || (i2 < word2.length && word2[i2] < word1[i1])) {
toReturn = Word.W2;
}
if (toReturn == Word.W1 && i3 < word3.length && word3[i3] < word1[i1])
{
toReturn = Word.W3;
}
else if (toReturn == Word.W2){
if (i2 == word2.length || (i3 < word3.length && word3[i3] < word2[i2])) {
toReturn = Word.W3;
}
}
return toReturn;
}
private Snippet calculate() {
int start = 0;
int end = 0;
int max = words.length;
Snippet minimum = new Snippet(words[0].getIndex(), words[max-1].getIndex());
while (start < max)
{
end = start;
boolean foundAll = false;
boolean found[] = new boolean[Word.values().length];
while (end < max && !foundAll) {
found[words[end].getWord().ordinal()] = true;
boolean complete = true;
for (int i=0 ; i < found.length && complete; i++) {
complete = found[i];
}
if (complete)
{
foundAll = true;
}
else {
if (words[end].getIndex()-words[start].getIndex() == minimum.getLength())
{
// we won't find a minimum no need to search further
break;
}
end++;
}
}
if (foundAll && words[end].getIndex()-words[start].getIndex() < minimum.getLength()) {
minimum.setEnd(words[end].getIndex());
minimum.setStart(words[start].getIndex());
}
start++;
}
return minimum;
}
/**
* @param args
*/
public static void main(String[] args) {
Integer[] word1 = {1,4,5};
Integer[] word2 = {3,9,10};
Integer[] word3 = {2,6,15};
SmallestSnippet smallestSnippet = new SmallestSnippet(word1, word2, word3);
Snippet snippet = smallestSnippet.calculate();
System.out.println(snippet);
}
}
Helper classes:
public class Snippet {
private int start;
private int end;
//getters, setters etc
public int getLength()
{
return Math.abs(end - start);
}
}
public class WordIndex
{
private SmallestSnippet.Word word;
private int index;
public WordIndex(SmallestSnippet.Word word, int index) {
this.word = word;
this.index = index;
}
}
The other answers are alright, but like me, if you're having trouble understanding the question in the first place, those aren't really helpful. Let's rephrase the question:
Given three sets of integers (call them A, B, and C), find the minimum contiguous range that contains one element from each set.
There is some confusion about what the three sets are. The 2nd edition of the book states them as {1, 4, 5}, {4, 9, 10}, and {5, 6, 15}. However, another version, stated in a comment above, is {1, 4, 5}, {3, 9, 10}, and {2, 6, 15}. In version 1, positions 4 and 5 each appear in two sets, which is only possible if one word is a suffix/prefix of another, so let's go with the second one.
Since a picture is worth a thousand words, let's plot the points:
Simply inspecting the above visually, we can see that there are two answers to this question: [1,3] and [2,4], both of size 3 (three points in each range).
Now, the algorithm. The idea is to start with the smallest valid range, and incrementally try to shrink it by moving the left boundary inwards. We will use zero-based indexing.
MIN-RANGE(A, B, C)
i = j = k = 0
minSize = +∞
while i, j, k is a valid index of the respective arrays, do
ans = (A[i], B[j], C[k])
size = max(ans) - min(ans) + 1
minSize = min(size, minSize)
x = argmin(ans)
increment by 1 the index (i, j, or k) selected by x
done
return minSize
where argmin is the index of the smallest element in ans.
+---+---+---+---+--------------------+---------+
| n | i | j | k | (A[i], B[j], C[k]) | size    |
+---+---+---+---+--------------------+---------+
| 1 | 0 | 0 | 0 | (1, 3, 2) | 3 |
+---+---+---+---+--------------------+---------+
| 2 | 1 | 0 | 0 | (4, 3, 2) | 3 |
+---+---+---+---+--------------------+---------+
| 3 | 1 | 0 | 1 | (4, 3, 6) | 4 |
+---+---+---+---+--------------------+---------+
| 4 | 1 | 1 | 1 | (4, 9, 6) | 6 |
+---+---+---+---+--------------------+---------+
| 5 | 2 | 1 | 1 | (5, 9, 6) | 5 |
+---+---+---+---+--------------------+---------+
| 6 | 3 | 1 | 1 | | |
+---+---+---+---+--------------------+---------+
n = iteration
At each step, exactly one of the three indices is incremented, so the algorithm is guaranteed to terminate. The loop stops as soon as any index runs past the end of its array, so it performs at most |A| + |B| + |C| - 2 iterations (7 in this case), i.e. it runs in O(n) time for n elements in total. For the given example, it terminates after 5 iterations.
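The MIN-RANGE pseudocode above can be sketched in Python as follows. This is a minimal sketch, assuming the three lists are sorted in ascending order; the names `min_range`, `best_size` and `best_range` are made up for illustration, and returning the range itself alongside its size is an addition to the pseudocode.

```python
def min_range(a, b, c):
    """Return (size, (lo, hi)) of the smallest range holding one element of each sorted list."""
    lists = (a, b, c)
    idx = [0, 0, 0]  # plays the role of i, j, k in the pseudocode
    best_size, best_range = float("inf"), None
    while all(idx[n] < len(lists[n]) for n in range(3)):
        ans = [lists[n][idx[n]] for n in range(3)]
        lo, hi = min(ans), max(ans)
        size = hi - lo + 1
        if size < best_size:
            best_size, best_range = size, (lo, hi)
        # Advance the index of the list that holds the smallest value.
        idx[ans.index(lo)] += 1
    return best_size, best_range


print(min_range([1, 4, 5], [3, 9, 10], [2, 6, 15]))  # (3, (1, 3))
```

Because the best range is only replaced on a strict improvement, the sketch reports [1,3] and not the equally short [2,4].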
O(n)
Pair find(int[][] indices) {
    // base case: once any list is empty, no range can cover every word
    if (some indices[i].length == 0)
        return (0, max int);
    pair.lBound = max int;
    pair.rBound = 0;
    index = 0;
    for i from 0 to indices.length - 1 {
        if (pair.lBound > indices[i][0]) {
            pair.lBound = indices[i][0];
            index = i;
        }
        pair.rBound = max(pair.rBound, indices[i][0]);
    }
    remove indices[index][0];
    return min(pair, find(indices));  // min picks the shorter range
}
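The pseudocode above accepts any number of position lists. An iterative version of the same idea might look like the sketch below. Note that it swaps in a min-heap (Python's `heapq`) to find the smallest head element, which is not part of the original answer; with N positions in k lists this costs O(N log k). The name `smallest_range` and the `None` return for impossible inputs are assumptions made for illustration.

```python
import heapq


def smallest_range(position_lists):
    """Smallest (lo, hi) containing at least one position from every sorted list."""
    if not position_lists or any(len(p) == 0 for p in position_lists):
        return None  # some list is empty, so no range can cover every list
    # Heap entries: (value, list number, index within that list).
    heap = [(p[0], n, 0) for n, p in enumerate(position_lists)]
    heapq.heapify(heap)
    hi = max(p[0] for p in position_lists)
    best = (heap[0][0], hi)
    while True:
        lo, n, i = heapq.heappop(heap)
        if hi - lo < best[1] - best[0]:
            best = (lo, hi)
        if i + 1 == len(position_lists[n]):
            return best  # the list holding the smallest head is exhausted
        nxt = position_lists[n][i + 1]
        hi = max(hi, nxt)
        heapq.heappush(heap, (nxt, n, i + 1))


print(smallest_range([[1, 4, 5], [3, 9, 10], [2, 6, 15]]))  # (1, 3)
```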

Algorithm to get a new list without duplicates by adding any 2 elements of a big array

I can only think of this naive algorithm. Is there a better way? C/C++, Ruby, or Haskell is fine.
arry = [1,5,.....4569895] // 1000000 elements, sorted, no duplicates
newArray = Hash.new
for (i = 0 ; i < arry.length ;i++ )
{
for (j = 0 ; j < arry.length ;j ++ )
{
elem = arry[i] + arry[j]
if (! newArray.key?(elem))
{
newArray [elem] = arry[i] + arry[j]
}
}
}
EDIT: Sorry, the array holds discrete values, not the full range [1..1000000].
It would be more efficient to separate the algorithm into two distinct steps. (Warning: pseudocode ahead)
First, create n lists, one per element: list i holds the sums of the ith element with itself and with every later element. This can be done in parallel for each list. Because the input is sorted, each resulting list is sorted too.
newArray = array(array.length);
for (i = 0; i < array.length; i++) {
    newArray[i] = array(array.length - i);
    for (j = 0; j < array.length - i; j++) {
        newArray[i][j] = array[i] + array[j + i];
    }
}
Second, merge the resulting sorted lists as in merge sort, dropping duplicates as you go. You can do this in parallel: merge the lists pairwise, then merge the results pairwise again, and so on until only one list remains.
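The two steps could be sketched in Python roughly as below. This is a sequential sketch only: it uses the standard library's `heapq.merge` (a k-way merge) in place of the parallel pairwise merging described above, and the function names are invented for illustration. It assumes the input is sorted and includes the sum of each element with itself, as the question's loops do.

```python
import heapq


def sums_row(values, i):
    """Sorted sums of values[i] with itself and with every later element."""
    return [values[i] + values[j] for j in range(i, len(values))]


def pairwise_sums_sorted(values):
    """Sorted list of distinct sums values[i] + values[j], i <= j; values must be sorted."""
    rows = [sums_row(values, i) for i in range(len(values))]
    result = []
    for s in heapq.merge(*rows):  # merge the already-sorted rows
        if not result or result[-1] != s:  # equal sums arrive next to each other
            result.append(s)
    return result


print(pairwise_sums_sorted([1, 5, 6]))  # [2, 6, 7, 10, 11, 12]
```

The rows still hold n(n+1)/2 sums in total, so for a million elements this sketch is as memory-hungry as the hash-based version.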
If the condition says that you should be able to add any item in the range, then the only way I can think of is to check whether the sum is already in the result list, since for any number x there are x different additions that lead to x (or x/2 if you count 1 + 2 and 2 + 1 as the same addition).
There is one obvious optimization: make the second loop start at index i, so that you avoid computing both x+y and y+x.
Then if you don't want to use a set, you could use the fact that the items are sorted, so you could build N lists, and merge them while removing the duplicates.
I'm afraid the best worst-case time complexity is O(n^2). For the input {2^0, 2^1, 2^2, ...}, adding these numbers never produces a duplicate. Assuming hash insertions are O(1), you already have the best algorithm...
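A quick way to see the quadratic lower bound is to count the distinct sums for an input where no two pairs collide. The sketch below uses a hash set with the inner loop starting at the current element, and takes powers of two as one such input (chosen for illustration); the name `distinct_sums` is hypothetical.

```python
def distinct_sums(values):
    """Set of all sums values[i] + values[j] with i <= j."""
    sums = set()
    for i, a in enumerate(values):
        for b in values[i:]:  # start at i so x+y and y+x are computed once
            sums.add(a + b)
    return sums


powers = [2 ** k for k in range(10)]
n = len(powers)
print(len(distinct_sums(powers)), n * (n + 1) // 2)  # 55 55
```

Every one of the n(n+1)/2 pairs yields a different sum here, so the output alone has quadratic size and no algorithm can do better in the worst case.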

Resources