How to write an iterative algorithm to generate all subsets of a set?

I wrote a recursive backtracking algorithm for finding all subsets of a given set:
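#include <climits>
#include <iostream>

// Prints every subset of {1..n}. a[1..n] holds the include/exclude flags,
// so the caller must pass an array of at least n + 1 ints and start with k = 0.
void backtracke(int* a, int k, int n)
{
    if (k == n)
    {
        for (int i = 1; i <= k; ++i)
        {
            if (a[i] == true)
            {
                std::cout << i << " ";
            }
        }
        std::cout << std::endl;
        return;
    }
    bool c[2];
    c[0] = false;
    c[1] = true;
    ++k;
    for (int i = 0; i < 2; ++i)
    {
        a[k] = c[i];           // exclude (false), then include (true), element k
        backtracke(a, k, n);
        a[k] = INT_MAX;        // reset the slot after returning
    }
}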
Now I have to write the same algorithm in iterative form. How can I do it?

You can use the binary counter approach. Any unique binary string of length n represents a unique subset of a set of n elements. If you start with 0 and end with 2^n-1, you cover all possible subsets. The counter can be easily implemented in an iterative manner.
The code in Java:
public static void printAllSubsets(int[] arr) {
    byte[] counter = new byte[arr.length];
    while (true) {
        // Print combination
        for (int i = 0; i < counter.length; i++) {
            if (counter[i] != 0)
                System.out.print(arr[i] + " ");
        }
        System.out.println();
        // Increment counter
        int i = 0;
        while (i < counter.length && counter[i] == 1)
            counter[i++] = 0;
        if (i == counter.length)
            break;
        counter[i] = 1;
    }
}
Note that in Java one can use BitSet, which makes the code considerably shorter, but I used a byte array to illustrate the process better.

There are a few ways to write an iterative algorithm for this problem. The most commonly suggested would be to:
Count (i.e. a simple for-loop) from 0 to 2^numberOfElements - 1.
If we look at the variable used above for counting in binary, the digit at each position can be thought of as a flag indicating whether or not the element at the corresponding index in the set should be included in this subset. Simply loop over each bit (by taking the remainder by 2, then dividing by 2), including the corresponding elements in our output.
Example:
Input: {1,2,3,4,5}.
We'd start counting at 0, which is 00000 in binary, which means no flags are set, so no elements are included (this would obviously be skipped if you don't want the empty subset) - output {}.
Then 1 = 00001, indicating that only the last element would be included - output {5}.
Then 2 = 00010, indicating that only the second last element would be included - output {4}.
Then 3 = 00011, indicating that the last two elements would be included - output {4,5}.
And so on, all the way up to 31 = 11111, indicating that all the elements would be included - output {1,2,3,4,5}.
* Actually code-wise, it would be simpler to turn this on its head - output {1} for 00001, considering that the first remainder by 2 will then correspond to the flag of the 0th element, the second remainder, the 1st element, etc., but the above is simpler for illustrative purposes.
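A minimal Python sketch of the counting approach, using the "turned on its head" convention from the footnote (the first remainder by 2 is the flag for the element at index 0). The function name subsets_by_counting is only an illustrative choice.

```python
def subsets_by_counting(elements):
    """Yield every subset of `elements`, one per counter value 0 .. 2^n - 1."""
    n = len(elements)
    for counter in range(2 ** n):
        subset = []
        value = counter
        index = 0
        # Peel off one bit at a time: the remainder by 2 is the flag for
        # elements[index], and dividing by 2 moves on to the next bit.
        while value > 0:
            if value % 2 == 1:
                subset.append(elements[index])
            value //= 2
            index += 1
        yield subset
```

For the input {1,2,3,4,5} this yields 32 lists, starting with [], [1], [2], [1, 2].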
More generally, any recursive algorithm could be changed to an iterative one as follows:
Create a loop consisting of parts (think switch-statement), with each part consisting of the code between any two recursive calls in your function
Create a stack where each element contains each necessary local variable in the function, and an indication of which part we're busy with
The loop would pop elements from the stack, executing the appropriate section of code
Each recursive call would be replaced by first adding its own state to the stack, and then the called state
Replace return with appropriate break statements
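A sketch of that recipe in Python, applied to the include/exclude backtracking from the question (it collects the subsets instead of printing them). Because no code runs after the two recursive calls, each stack entry only needs the call's local state (k and the flags chosen so far), so the "which part" marker can be dropped. The name subsets_with_stack is illustrative.

```python
def subsets_with_stack(n):
    """Iterative include/exclude backtracking over the elements 1..n."""
    results = []
    # Each stack entry stands in for one recursive call: (k, flags chosen so far).
    stack = [(0, [])]
    while stack:
        k, flags = stack.pop()
        if k == n:
            # Base case of the recursion: record the elements whose flag is set.
            results.append([i + 1 for i in range(n) if flags[i]])
            continue
        # Replace the two recursive calls by pushing their states. "Include" is
        # pushed first so that "exclude" is popped first, as in the recursion.
        stack.append((k + 1, flags + [True]))
        stack.append((k + 1, flags + [False]))
    return results
```

With n = 2 this produces [[], [2], [1], [1, 2]], the same order as the recursive version.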

A little Python implementation of George's algorithm. Perhaps it will help someone.
def subsets(S):
    l = len(S)
    for x in range(2**l):
        # Bit i of x decides whether S[i] is in this subset.
        yield {s for i, s in enumerate(S) if (x >> i) & 1}

Basically what you want is P(S) = S_0 U S_1 U ... U S_n, where S_i is the set of all subsets formed by taking i elements from S. In other words, if S = {a, b, c}, then S_0 = {{}}, S_1 = {{a},{b},{c}}, S_2 = {{a, b}, {a, c}, {b, c}} and S_3 = {{a, b, c}}.
The algorithm we have so far is
set P(set S) {
    PS = {}
    for i in [0..|S|]
        PS = PS U Combination(S, i)
    return PS
}
We know that |S_i| = nCi where |S| = n. So basically we know that we will be looping nCi times. You may use this information to optimize the algorithm later on. To generate combinations of size i the algorithm that I present is as follows:
Suppose S = {a, b, c}; then you can map 0 to a, 1 to b and 2 to c. The sequences of these digits (if i = 2) are 0-0, 0-1, 0-2, 1-0, 1-1, 1-2, 2-0, 2-1, 2-2. To check whether a sequence is a combination, you check that its digits are all unique and that no permutation of its digits appears elsewhere in the kept list. This filters the sequences above down to just 0-1, 0-2 and 1-2, which are then mapped back to {a,b}, {a,c}, {b,c}. To generate the long sequence above, you can follow this algorithm:
set Combination(set S, integer l) {
    CS = []
    for x in [0..|S|^l - 1] {                     // every sequence of l digits
        n = []
        for i in [0..l-1] {
            n.append(floor(x / |S|^i) mod |S|)    // get the i-th digit in x base |S|
        }
        CS.append(n)
    }
    return { S[n] for n in filter(CS) }           // filtering described above, then map digits back to elements
}
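A small Python sketch of Combination and P(S) as described, assuming S is given as an indexable sequence. Keeping only the digit sequences that are strictly increasing is one way to implement the filter: it rejects repeated digits and keeps exactly one ordering of each set of digits. Like the original, it generates all |S|^l sequences and then filters, so it illustrates the idea rather than being efficient. The function names are illustrative.

```python
def combination(S, l):
    """All l-element subsets of the sequence S, found by counting in base len(S)."""
    n = len(S)
    result = []
    for x in range(n ** l):
        # The l digits of x in base n, least significant first.
        digits = [(x // n ** i) % n for i in range(l)]
        # Filter: keep only strictly increasing digit sequences. This drops
        # sequences with repeated digits and all but one ordering of each set.
        if all(digits[i] < digits[i + 1] for i in range(l - 1)):
            result.append(frozenset(S[d] for d in digits))
    return result


def power_set(S):
    """P(S) as the union of Combination(S, i) for i = 0..len(S)."""
    ps = []
    for i in range(len(S) + 1):
        ps.extend(combination(S, i))
    return ps
```

For S = ["a", "b", "c"], power_set returns the 8 subsets, and combination(S, 2) keeps only the digit sequences 0-1, 0-2 and 1-2.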

Related

How to get original array from random shuffle of an array

I was asked the question below in an interview today. I gave an O(n log n) solution, but I was asked for an O(n) solution, and I could not come up with one. Can you help?
An input array is given, like [1,2,4]; then every element of it is doubled and
appended to the array. So the array now looks like [1,2,4,2,4,8]. Then
this array is randomly shuffled. One possible random arrangement is
[4,8,2,1,2,4]. Now we are given this randomly shuffled array and we want to
get the original array [1,2,4] in O(n) time.
The original array can be returned in any order. How can I do it?
Here's an O(N) Java solution that could be improved by first making sure that the array is of the proper form. For example it shouldn't accept [0] as an input:
import java.util.*;

class Solution {
    public static int[] findOriginalArray(int[] changed) {
        if (changed.length % 2 != 0)
            return new int[] {};
        // set Map size to optimal value to avoid rehashes
        Map<Integer,Integer> count = new HashMap<>(changed.length*100/75);
        int[] original = new int[changed.length/2];
        int pos = 0;
        // count frequency for each number
        for (int n : changed) {
            count.put(n, count.getOrDefault(n,0)+1);
        }
        // now decide which go into the answer
        for (int n : changed) {
            int smallest = n;
            for (int m=n; m > 0 && count.getOrDefault(m,0) > 0; m = m/2) {
                //System.out.println(m);
                smallest = m;
                if (m % 2 != 0) break;
            }
            // trickle up from smallest to largest while count > 0
            for (int m=smallest, mm = 2*m; count.getOrDefault(mm,0) > 0; m = mm, mm=2*mm){
                int ct = count.getOrDefault(mm,0);
                while (count.get(m) > 0 && ct > 0) {
                    //System.out.println("adding "+m);
                    original[pos++] = m;
                    count.put(mm, ct -1);
                    count.put(m, count.get(m) - 1);
                    ct = count.getOrDefault(mm,0);
                }
            }
        }
        // check for incorrect format
        if (count.values().stream().anyMatch(x -> x > 0)) {
            return new int[] {};
        }
        return original;
    }

    public static void main(String[] args) {
        int[] changed = {1,2,4,2,4,8};
        System.out.println(Arrays.toString(changed));
        System.out.println(Arrays.toString(findOriginalArray(changed)));
    }
}
But I've tried to keep it simple.
The output is NOT guaranteed to be sorted. If you want it sorted it's going to cost O(NlogN) inevitably unless you use a Radix sort or something similar (which would make it O(NlogE) where E is the max value of the numbers you're sorting and logE the number of bits needed).
Runtime
This may not look like O(N), but it is, because each time through the loop it finds the lowest number in the chain only ONCE, then trickles up the chain only ONCE. Put another way, whenever an iteration does O(X) work, it processes X elements, and O(N-X) elements remain. Therefore, even though there are for loops nested inside for loops, it is still O(N).
An example execution can be seen with [64,32,16,8,4,2].
If this were not O(N), then printing each value traversed while finding the smallest would show the values appearing over and over again (for example N*(N+1)/2 times).
But instead you see them only once:
finding smallest 64
finding smallest 32
finding smallest 16
finding smallest 8
finding smallest 4
finding smallest 2
adding 2
adding 8
adding 32
If you're familiar with the Heapify algorithm you'll recognize the approach here.
from typing import List

class Solution:
    def findOriginalArray(self, changed: List[int]) -> List[int]:
        size = len(changed)
        ans = []
        left_elements = size//2
        #IF SIZE IS ODD THEN RETURN [] NO SOLN. IS POSSIBLE
        if(size%2 !=0):
            return ans
        #FREQUENCY DICTIONARY given array [0,0,2,1] my map will be: {0:2,2:1,1:1}
        d = {}
        for i in changed:
            if(i in d):
                d[i]+=1
            else:
                d[i] = 1
        # CHECK THE EDGE CASE OF 0 (0 doubled is 0, so zeros pair up with each other)
        if(0 in d):
            count = d[0]
            half = count//2
            if((count % 2 != 0) or (half > left_elements)):
                return ans
            left_elements -= half
            ans = [0 for i in range(half)]
        #CHECK REST OF THE CASES : considering the values will be 10^5
        for i in range(1,50001):
            if(i in d):
                if(d[i] > 0):
                    count = d[i]
                    if(count > left_elements):
                        ans = []
                        break
                    left_elements -= d[i]
                    for j in range(count):
                        ans.append(i)
                    if(2*i in d):
                        if(d[2*i] < count):
                            ans = []
                            break
                        else:
                            d[2*i] -= count
                    else:
                        ans = []
                        break
        # Values above 50000 are never visited as originals; if any of them were
        # not consumed as doubles, left_elements is still > 0 and there is no answer.
        return ans if left_elements == 0 else []
I have a simple idea which might not be the best, but I could not think of a case where it would not work. Having the array A with the doubled elements and randomly shuffled, keep a helper map. Process each element of the array and, each time you find a new element, add it to the map with the value 0. When an element is processed, increment map[i] and decrement map[2*i]. Next you iterate over the map and print the elements that have a value greater than zero.
A simple example, say that the vector is:
[1, 2, 3]
And the doubled/shuffled version is:
A = [3, 2, 1, 4, 2, 6]
When processing 3, first add the keys 3 and 6 to the map with value zero. Increment map[3] and decrement map[6]. This way, map[3] = 1 and map[6] = -1. Then for the next element map[2] = 1 and map[4] = -1 and so forth. The final state of the map in this example would be map[1] = 1, map[2] = 1, map[3] = 1, map[4] = -1, map[6] = 0, map[8] = -1, map[12] = -1.
Then you just process the keys of the map and, for each key with a value greater than zero, add it to the output. There are certainly more efficient solutions, but this one is O(n).
In C++, you can try this.
The time complexity is O(N + K log K), where N is the length of the input and K is the number of unique elements in the input.
#include <algorithm>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<int> findOriginalArray(vector<int>& input) {
        if (input.size() % 2) return {};
        unordered_map<int, int> m;
        for (int n : input) m[n]++;                 // frequency of each value
        vector<int> nums;
        for (auto [n, cnt] : m) nums.push_back(n);
        sort(begin(nums), end(nums));               // process values from smallest up
        vector<int> out;
        for (int n : nums) {
            if (m[2 * n] < m[n]) return {};         // not enough doubles left for n
            for (int i = 0; i < m[n]; ++i, --m[2 * n]) out.push_back(n);
        }
        return out;
    }
};
I'm not sure what space complexity the question requires, so this is my off-the-top-of-my-head attempt, assuming it only needs O(n) time complexity.
If the length of the input array is odd, the input is invalid.
Create a map and add the elements of the input array to it.
Divide each element in the input array by 2 and check whether that value exists in the map. If it exists, add it to the array (slice) orig.
We may have added duplicate values to orig, so remove them.
Here is some sample Go code:
https://go.dev/play/p/w4mm-rloHyi
I am sure this code can be optimized in many ways for space complexity, but it's O(n) time complexity.

How can I efficiently determine if two lists contain elements ordered in the same way?

I have two ordered lists of the same element type, each list having at most one element of each value (say ints and unique numbers), but otherwise with no restrictions (one may be a subset of the other, they may be completely disjoint, or share some elements but not others).
How do I efficiently determine if A is ordering any two items in a different way than B is? For example, if A has the items 1, 2, 10 and B the items 2, 10, 1, the property would not hold as A lists 1 before 10 but B lists it after 10. 1, 2, 10 vs 2, 10, 5 would be perfectly valid however as A never mentions 5 at all, I cannot rely on any given sorting rule shared by both lists.
You can get O(n) as follows. First, find the intersection of the two sets using hashing. Second, test whether A and B are identical if you only consider elements from the intersection.
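A minimal Python sketch of that hash-based idea (the function name is illustrative): build the intersection with Python sets, then compare the two lists restricted to the shared elements.

```python
def same_relative_order(a, b):
    """True if the elements shared by a and b appear in the same order in both."""
    common = set(a) & set(b)   # intersection via hashing, O(n) on average
    # Restrict each list to the shared elements, keeping its own order.
    a_shared = [x for x in a if x in common]
    b_shared = [x for x in b if x in common]
    return a_shared == b_shared
```

For the examples in the question, [1, 2, 10] vs [2, 10, 1] gives False and [1, 2, 10] vs [2, 10, 5] gives True.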
My approach would be to first make sorted copies of A and B which also record the positions of elements in the original lists:
for i in 1 .. length(A):
    Apos[i] = (A[i], i)
sortedApos = sort(Apos[] by first element of each pair)
for i in 1 .. length(B):
    Bpos[i] = (B[i], i)
sortedBpos = sort(Bpos[] by first element of each pair)
Now find those elements in common using a standard list merge that records the positions in both A and B of the shared elements:
i = 1
j = 1
shared = []
while i <= length(A) && j <= length(B)
    if sortedApos[i][1] < sortedBpos[j][1]
        ++i
    else if sortedApos[i][1] > sortedBpos[j][1]
        ++j
    else // They're equal
        append(shared, (sortedApos[i][2], sortedBpos[j][2]))
        ++i
        ++j
Finally, sort shared by its first element (position in A) and check that all its second elements (positions in B) are increasing. This will be the case iff the elements common to A and B appear in the same order:
sortedShared = sort(shared[] by first element of each pair)
for i = 2 .. length(sortedShared)
    if sortedShared[i][2] < sortedShared[i-1][2]
        return DIFFERENT
return SAME
Time complexity: 2*(O(n) + O(nlog n)) + O(n) + O(nlog n) + O(n) = O(nlog n).
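The pseudocode above translates to Python roughly as follows (a sketch with 0-based indices; the function name is illustrative, and the values are assumed to be mutually comparable so they can be sorted):

```python
def same_order_by_sorting(a, b):
    """Sort-and-merge check, following the pseudocode with 0-based indices."""
    sorted_a = sorted((value, pos) for pos, value in enumerate(a))
    sorted_b = sorted((value, pos) for pos, value in enumerate(b))
    # Merge step: collect (position in A, position in B) for every shared value.
    shared = []
    i = j = 0
    while i < len(sorted_a) and j < len(sorted_b):
        if sorted_a[i][0] < sorted_b[j][0]:
            i += 1
        elif sorted_a[i][0] > sorted_b[j][0]:
            j += 1
        else:
            shared.append((sorted_a[i][1], sorted_b[j][1]))
            i += 1
            j += 1
    # Order by position in A; the positions in B must then be increasing.
    shared.sort()
    return all(shared[k - 1][1] < shared[k][1] for k in range(1, len(shared)))
```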
General approach: store all the values and their positions in B as keys and values in a HashMap. Iterate over the values in A and look them up in B's HashMap to get their position in B (or null). If this position is before the largest position value you've seen previously, then you know that something in B is in a different order than A. Runs in O(n) time.
Rough, totally untested code:
boolean valuesInSameOrder(int[] A, int[] B)
{
    Map<Integer, Integer> bMap = new HashMap<Integer, Integer>();
    for (int i = 0; i < B.length; i++)
    {
        bMap.put(B[i], i);
    }
    int maxPosInB = 0;
    for (int i = 0; i < A.length; i++)
    {
        if(bMap.containsKey(A[i]))
        {
            int currPosInB = bMap.get(A[i]);
            if (currPosInB < maxPosInB)
            {
                // B has something in a different order than A
                return false;
            }
            else
            {
                maxPosInB = currPosInB;
            }
        }
    }
    // All of B's values are in the same order as A
    return true;
}

Algorithm to find the smallest snippet from searching a document?

I've been going through Skiena's excellent "The Algorithm Design Manual" and got hung up on one of the exercises.
The question is:
"Given a search string of three words, find the smallest snippet of the document that contains all three of the search words, i.e., the snippet with smallest number of words in it. You are given the index positions where these words occur in search strings, such as word1: (1, 4, 5), word2: (4, 9, 10), and word3: (5, 6, 15). Each of the lists are in sorted order, as above."
Anything I come up with is O(n^2)... This question is in the "Sorting and Searching" chapter, so I assume there is a simple and clever way to do it. I'm trying something with graphs right now, but that seems like overkill.
Ideas?
Thanks
Unless I've overlooked something, here's a simple, O(n) algorithm:
We'll represent the snippet by (x, y) where x and y are where the snippet begins and ends respectively.
A snippet is feasible if it contains all 3 search words.
We will start with the infeasible snippet (0,0).
Repeat the following until y reaches end-of-string:
If the current snippet (x, y) is feasible, proceed to the snippet (x+1, y)
Else (the current snippet is infeasible) proceed to the snippet (x, y+1)
Choose the shortest snippet among all feasible snippets we went through.
Running time - in each iteration either x or y is increased by 1. Clearly x can't exceed y and y can't exceed the string length, so the total number of iterations is O(n). Also, feasibility can be checked in O(1) in this case, since we can track how many occurrences of each word are within the current snippet. We can maintain this count in O(1) with each increase of x or y by 1.
Correctness - For each x, we calculate the minimal feasible snippet (x, ?). Thus we must go over the minimal snippet. Also, if y is the smallest y such that (x, y) is feasible, then if (x+1, y') is a feasible snippet, y' >= y (this bit is why this algorithm is linear and the others aren't).
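A Python sketch of this two-pointer scan, assuming the positions have already been merged into a list of (position, word) pairs sorted by position (the word labels are any hashable values). It is arranged with y in the outer loop: extend y while infeasible, advance x while feasible, which visits the same snippets. The window length is measured as end - start, and the name smallest_snippet is illustrative.

```python
from collections import Counter


def smallest_snippet(events, k):
    """Shortest window over (position, word) pairs sorted by position.

    Returns (start, end) positions of a shortest window that contains all k
    distinct words, or None if there is no such window.
    """
    counts = Counter()   # occurrences of each word inside events[x..y]
    distinct = 0         # number of different words in the current window
    best = None
    x = 0
    for y in range(len(events)):
        # The window is infeasible here: extend it to the right (y + 1).
        word = events[y][1]
        counts[word] += 1
        if counts[word] == 1:
            distinct += 1
        # Feasible: record it, then shrink from the left (x + 1).
        while distinct == k:
            start, end = events[x][0], events[y][0]
            if best is None or end - start < best[1] - best[0]:
                best = (start, end)
            left = events[x][1]
            counts[left] -= 1
            if counts[left] == 0:
                distinct -= 1
            x += 1
    return best
```

For the question's lists (1, 4, 5), (4, 9, 10), (5, 6, 15), tagged with word ids 0, 1, 2 and merged, this returns (4, 5).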
I already posted a rather straightforward algorithm that solves exactly that problem in this answer
Google search results: How to find the minimum window that contains all the search keywords?
However, in that question we assumed that the input is represented by a text stream and the words are stored in an easily searchable set.
In your case the input is represented slightly differently: as a bunch of vectors with sorted positions for each word. This representation is easily transformable to what is needed for the above algorithm by simply merging all these vectors into a single vector of (position, word) pairs ordered by position. It can be done literally, or it can be done "virtually", by placing the original vectors into the priority queue (ordered in accordance with their first elements). Popping an element from the queue in this case means popping the first element from the first vector in the queue and possibly sinking the first vector into the queue in accordance with its new first element.
Of course, since your statement of the problem explicitly fixes the number of words at three, you can simply check the first elements of all three arrays and pop the smallest one at each iteration. That gives you an O(N) algorithm, where N is the total length of all arrays.
Also, your statement of the problem seems to suggest that target words can overlap in the text, which is rather strange (given that you use the term "word"). Is it intentional? In any case, it doesn't present any problem for the above linked algorithm.
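In Python, the standard library's heapq.merge performs this kind of lazy, heap-based merge of already-sorted inputs. A sketch of the merging step, where each word is labeled simply by the index of its position list (an assumption for illustration; the function name is also illustrative):

```python
import heapq


def merge_positions(*position_lists):
    """Merge sorted per-word position lists into one list of (position, word_id)."""
    # Tag each position with its word's index; every tagged list stays sorted
    # by position, so heapq.merge can combine them with a small heap.
    tagged = [[(pos, word_id) for pos in positions]
              for word_id, positions in enumerate(position_lists)]
    return list(heapq.merge(*tagged))
```

For word1: (1, 4, 5), word2: (4, 9, 10), word3: (5, 6, 15) this gives [(1, 0), (4, 0), (4, 1), (5, 0), (5, 2), (6, 2), (9, 1), (10, 1), (15, 2)], ready for a single left-to-right scan.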
From the question, it seems that you're given the index locations for each of your n “search words” (word1, word2, word3, ..., word n) in the document. Using a sorting algorithm, the n independent arrays associated with search words can readily be represented as a single array of all the index locations in ascending numerical order and a word label associated with each index in the array (the index array).
The Basic Algorithm:
(Designed to work whether or not the poster of this question intended to allow two different search words to coexist at the same index number.)
First, we define a simple function for measuring the length of a snippet that contains all n labels given a starting point in the index array. (It is obvious from the definition of our array that any starting point on the array will necessarily be the indexed location of one of the n search labels.) The function simply keeps track of the unique search labels seen as the function iterates through the elements in the array until all n labels have been observed. The length of the snippet is defined as the difference between the index of the last unique label found and the index of the starting point in the index array (the first unique label found). If all n labels aren't observed before the end of the array the function returns a null value.
Now, the snippet length function can be run for each element in your array to associate a snippet size containing all n search words starting from each element in the array. The smallest non-Null value returned by the snippet length function over the whole index array is the snippet in your document that you're looking for.
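A minimal Python sketch of the basic algorithm, without the optimizations listed next. It assumes the index array is a list of (index, label) pairs sorted by index; the function names are illustrative, and the snippet length is the difference of indices as defined above.

```python
def snippet_length_from(events, start, k):
    """Scan from events[start] until all k labels are seen; return the span or None."""
    seen = set()
    for end in range(start, len(events)):
        seen.add(events[end][1])
        if len(seen) == k:
            # Index of the last new label found minus the starting index.
            return events[end][0] - events[start][0]
    return None  # not all labels occur at or after `start`


def shortest_snippet_length(events, k):
    """Apply the snippet length function at every start; keep the smallest."""
    lengths = (snippet_length_from(events, s, k) for s in range(len(events)))
    found = [length for length in lengths if length is not None]
    return min(found) if found else None
```

Without the early-termination rules this is quadratic in the worst case, which is what the optimizations below are meant to cut down.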
Necessary Optimizations:
Keep track of the current shortest snippet length, so that its value will be known immediately after iterating once through the index array.
When iterating through your array terminate the snippet length function if the current snippet under inspection ever surpasses the length of the shortest snippet length previously seen.
When the snippet length function returns null for not locating all n search words in the remaining index array elements, associate a null snippet length to all successive elements in the index array.
If the snippet length function is applied to a word label and the label immediately following it is identical to the starting label, assign a null value to the starting label and move on to the next label.
Computational Complexity:
Obviously the sorting part of the algorithm can be arranged in O(n log n).
Here's how I would work out the time complexity of the second part of the algorithm (any critiques and corrections would be greatly appreciated).
In the best-case scenario, the algorithm only applies the snippet length function to the first element in the index array and finds that no snippet containing all the search words exists. This scenario would be computed in just n calculations, where n is the size of the index array. Slightly worse than that is if the smallest snippet turns out to be equal to the size of the whole array. In this case the computational complexity will be a little less than 2n (once through the array to find the smallest snippet length, a second time to demonstrate that no other snippets exist). The shorter the average computed snippet length, the more times the snippet length function will need to be applied over the index array. We can assume that our worst-case scenario will be the one where the snippet length function needs to be applied to every element in the index array. To develop a case where the function will be applied to every element in the index array, we need to design an index array where the average snippet length over the whole index array is negligible in comparison to the size of the index array as a whole. Using this case we can write out our computational complexity as O(C n), where C is some constant that is significantly smaller than n. This gives a final computational complexity of:
O(n log n + C n)
Where:
C << n
Edit:
AndreyT correctly points out that instead of sorting the word indices in n log n time, one might just as well merge them (since the sub-arrays are already sorted) in n log m time, where m is the number of search word arrays to be merged. This will obviously speed up the algorithm in cases where m < n.
O(n log k) solution, where n is the total number of indices and k is the number of words. The idea is to use a heap to identify the smallest index at each iteration, while also keeping track of the maximum index in the heap. I also put the coordinates of each value in the heap, in order to be able to retrieve the next value in constant time.
#include <algorithm>
#include <cassert>
#include <limits>
#include <queue>
#include <vector>

using namespace std;

int snippet(const vector< vector<int> >& index) {
    // (-index[i][j], (i, j))
    priority_queue< pair< int, pair<size_t, size_t> > > queue;
    int nmax = numeric_limits<int>::min();
    for (size_t i = 0; i < index.size(); ++i) {
        if (!index[i].empty()) {
            int cur = index[i][0];
            nmax = max(nmax, cur);
            queue.push(make_pair(-cur, make_pair(i, 0)));
        }
    }
    int result = numeric_limits<int>::max();
    while (queue.size() == index.size()) {
        int nmin = -queue.top().first;
        size_t i = queue.top().second.first;
        size_t j = queue.top().second.second;
        queue.pop();
        result = min(result, nmax - nmin + 1);
        j++;
        if (j < index[i].size()) {
            int next = index[i][j];
            nmax = max(nmax, next);
            queue.push(make_pair(-next, make_pair(i, j)));
        }
    }
    return result;
}

int main() {
    int data[][3] = {{1, 4, 5}, {4, 9, 10}, {5, 6, 15}};
    vector<vector<int> > index;
    for (int i = 0; i < 3; i++) {
        index.push_back(vector<int>(data[i], data[i] + 3));
    }
    assert(snippet(index) == 2);
}
Sample implementation in Java (tested only with the example in main, so there might be bugs). The implementation is based on the replies above.
import java.util.Arrays;

public class SmallestSnippet {
    WordIndex[] words; //merged array of word occurrences
    public enum Word {W1, W2, W3};

    public SmallestSnippet(Integer[] word1, Integer[] word2, Integer[] word3) {
        this.words = new WordIndex[word1.length + word2.length + word3.length];
        merge(word1, word2, word3);
        System.out.println(Arrays.toString(words));
    }

    private void merge(Integer[] word1, Integer[] word2, Integer[] word3) {
        int i1 = 0;
        int i2 = 0;
        int i3 = 0;
        int wordIdx = 0;
        while(i1 < word1.length || i2 < word2.length || i3 < word3.length) {
            WordIndex wordIndex = null;
            Word word = getMin(word1, i1, word2, i2, word3, i3);
            if (word == Word.W1) {
                wordIndex = new WordIndex(word, word1[i1++]);
            }
            else if (word == Word.W2) {
                wordIndex = new WordIndex(word, word2[i2++]);
            }
            else {
                wordIndex = new WordIndex(word, word3[i3++]);
            }
            words[wordIdx++] = wordIndex;
        }
    }

    //determine which word has the smallest index
    private Word getMin(Integer[] word1, int i1, Integer[] word2, int i2, Integer[] word3,
            int i3) {
        Word toReturn = Word.W1;
        if (i1 == word1.length || (i2 < word2.length && word2[i2] < word1[i1])) {
            toReturn = Word.W2;
        }
        if (toReturn == Word.W1 && i3 < word3.length && word3[i3] < word1[i1])
        {
            toReturn = Word.W3;
        }
        else if (toReturn == Word.W2){
            if (i2 == word2.length || (i3 < word3.length && word3[i3] < word2[i2])) {
                toReturn = Word.W3;
            }
        }
        return toReturn;
    }

    private Snippet calculate() {
        int start = 0;
        int end = 0;
        int max = words.length;
        Snippet minimum = new Snippet(words[0].getIndex(), words[max-1].getIndex());
        while (start < max)
        {
            end = start;
            boolean foundAll = false;
            boolean found[] = new boolean[Word.values().length];
            while (end < max && !foundAll) {
                found[words[end].getWord().ordinal()] = true;
                boolean complete = true;
                for (int i=0 ; i < found.length && complete; i++) {
                    complete = found[i];
                }
                if (complete)
                {
                    foundAll = true;
                }
                else {
                    if (words[end].getIndex()-words[start].getIndex() == minimum.getLength())
                    {
                        // we won't find a minimum no need to search further
                        break;
                    }
                    end++;
                }
            }
            if (foundAll && words[end].getIndex()-words[start].getIndex() < minimum.getLength()) {
                minimum.setEnd(words[end].getIndex());
                minimum.setStart(words[start].getIndex());
            }
            start++;
        }
        return minimum;
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        Integer[] word1 = {1,4,5};
        Integer[] word2 = {3,9,10};
        Integer[] word3 = {2,6,15};
        SmallestSnippet smallestSnippet = new SmallestSnippet(word1, word2, word3);
        Snippet snippet = smallestSnippet.calculate();
        System.out.println(snippet);
    }
}
Helper classes:
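public class Snippet {
    private int start;
    private int end;

    public Snippet(int start, int end) {
        this.start = start;
        this.end = end;
    }

    public void setStart(int start) { this.start = start; }
    public void setEnd(int end) { this.end = end; }

    // distance between the first and last word positions of the snippet
    public int getLength()
    {
        return Math.abs(end - start);
    }

    @Override
    public String toString() { return "Snippet[" + start + ", " + end + "]"; }
}
public class WordIndex
{
    private SmallestSnippet.Word word;
    private int index;
    public WordIndex(SmallestSnippet.Word word, int index) {
        this.word = word;
        this.index = index;
    }
    public SmallestSnippet.Word getWord() { return word; }
    public int getIndex() { return index; }

    @Override
    public String toString() { return word + "@" + index; }
}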
The other answers are alright, but like me, if you're having trouble understanding the question in the first place, those aren't really helpful. Let's rephrase the question:
Given three sets of integers (call them A, B, and C), find the minimum contiguous range that contains one element from each set.
There is some confusion about what the three sets are. The 2nd edition of the book states them as {1, 4, 5}, {4, 9, 10}, and {5, 6, 15}. However, another version that has been stated in a comment above is {1, 4, 5}, {3, 9, 10}, and {2, 6, 15}. If one word is not a suffix/prefix of another, version 1 isn't possible, so let's go with the second one.
Since a picture is worth a thousand words, let's plot the points:
Simply inspecting the above visually, we can see that there are two answers to this question: [1,3] and [2,4], both of size 3 (three points in each range).
Now, the algorithm. The idea is to start with the smallest valid range, and incrementally try to shrink it by moving the left boundary inwards. We will use zero-based indexing.
MIN-RANGE(A, B, C)
    i = j = k = 0
    minSize = +∞
    while i, j, k is a valid index of the respective arrays, do
        ans = (A[i], B[j], C[k])
        size = max(ans) - min(ans) + 1
        minSize = min(size, minSize)
        x = argmin(ans)
        increment the index of array x (i, j or k) by 1
    done
    return minSize
where argmin is the index of the smallest element in ans.
+---+---+---+---+--------------------+---------+
| n | i | j | k | (A[i], B[j], C[k]) | size    |
+---+---+---+---+--------------------+---------+
| 1 | 0 | 0 | 0 | (1, 3, 2) | 3 |
+---+---+---+---+--------------------+---------+
| 2 | 1 | 0 | 0 | (4, 3, 2) | 3 |
+---+---+---+---+--------------------+---------+
| 3 | 1 | 0 | 1 | (4, 3, 6) | 4 |
+---+---+---+---+--------------------+---------+
| 4 | 1 | 1 | 1 | (4, 9, 6) | 6 |
+---+---+---+---+--------------------+---------+
| 5 | 2 | 1 | 1 | (5, 9, 6) | 5 |
+---+---+---+---+--------------------+---------+
| 6 | 3 | 1 | 1 | | |
+---+---+---+---+--------------------+---------+
n = iteration
At each step, one of the three indices is incremented, so the algorithm is guaranteed to terminate. The loop stops as soon as any index runs past the end of its array, so it runs at most |A| + |B| + |C| - 2 times (7 in this case), i.e. in time linear in the total number of positions. For the given example, it terminates after 5 iterations.
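A direct Python transcription of MIN-RANGE, as a sketch (the function name is illustrative). Ties in argmin go to the first array holding the minimum, which is what list.index does.

```python
def min_range(a, b, c):
    """MIN-RANGE: advance the pointer at the smallest value until one list runs out."""
    lists = (a, b, c)
    idx = [0, 0, 0]            # i, j, k
    min_size = float("inf")
    while all(idx[t] < len(lists[t]) for t in range(3)):
        ans = [lists[t][idx[t]] for t in range(3)]
        size = max(ans) - min(ans) + 1
        min_size = min(size, min_size)
        x = ans.index(min(ans))    # argmin: first array holding the minimum
        idx[x] += 1
    return min_size
```

On the sets {1, 4, 5}, {3, 9, 10}, {2, 6, 15} this follows the trace in the table and returns 3; on the book's version {1, 4, 5}, {4, 9, 10}, {5, 6, 15} it returns 2.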
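O(n)
Pair find(int[][] indices) {
    // Base case: once any word has no positions left, no further window exists.
    if some indices[i] is empty
        return EMPTY_PAIR   // hypothetical sentinel, treated as longer than any real window
    pair.lBound = MAX_INT
    pair.rBound = 0
    index = 0
    for i from 0 to indices.length - 1 {
        if (pair.lBound > indices[i][0]) {
            pair.lBound = indices[i][0]
            index = i
        }
        pair.rBound = max(pair.rBound, indices[i][0])
    }
    remove indices[index][0]
    return min(pair, find(indices))   // the shorter of the two windows
}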

determine if intersection of a set with conjunction of two other sets is empty

For any three given sets A, B and C: is there a way to determine (programmatically) whether there is an element of A that is part of the conjunction (edit: intersection) of B and C?
example:
A: all numbers greater than 3
B: all numbers lesser than 7
C: all numbers that equal 5
In this case there is an element in set A, being the number 5, that fits. I'm implementing this as specifications, so this numerical range is just an example. A, B, C could be anything.
EDIT:
Thanks Niki!
It will be helpful if B.Count <= C.Count <= A.Count.
D = GetCommonElements(B,C);
if( D.Count>0 && GetCommonElements(D,A).Count >0)
{
    // what you want IS NOT EMPTY
}
else
{
    // what you want IS EMPTY
}

SET GetCommonElements(X,Y)
{
    common = {}
    for x in X:
        if Y.Contains(x):
            common.Add(x);
    return common;
}
Look at Efficient Set Intersection Algorithm.
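A minimal Python sketch of the same idea folded into one pass, assuming the sets are finite Python sets: scan the smallest set and stop at the first element that is also in the other two. The function name is illustrative.

```python
def has_common_element(a, b, c):
    """True if some element belongs to all three sets (A ∩ B ∩ C is non-empty)."""
    # Scan the smallest set and probe the other two (O(1) average per lookup);
    # any() stops at the first element found in all three.
    smallest, *others = sorted((a, b, c), key=len)
    return any(all(x in s for s in others) for x in smallest)
```

With the question's example restricted to a finite range, A = {4..10}, B = {0..6}, C = {5}, this returns True.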
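Checking A against B and against C separately is not enough: the distributive law gives A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), which is about the union of B and C, not their intersection. For example, with A = {1, 2}, B = {1} and C = {2}, both A ∩ B and A ∩ C are non-empty, yet A ∩ B ∩ C is empty. The early-exit idea still works as a single pass that stops at the first element found in all three sets:
if(HasCommonElement(A,B,C))
{
    // what you want IS NOT EMPTY
}
else
{
    // what you want IS EMPTY
}
bool HasCommonElement(X,Y,Z)
{
    for x in X:
        if Y.Contains(x) and Z.Contains(x):
            return true   // at least one common element found: stop immediately
    return false
}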
If I'm understanding your question correctly, you want to programmatically compute the intersection of 3 sets, right? You want to see if there is an element in A that exists in the intersection of B and C, or in other words, you want to know if the intersection of A, B and C is non-empty.
Many languages have set containers and intersection algorithms so you should just be able to use those. Your example in OCaml:
module Int = struct
type t = int
let compare i j = if i<j then -1 else if i=j then 0 else 1
end;;
module IntSet = Set.Make(Int);;
let a = List.fold_left (fun a b -> IntSet.add b a) IntSet.empty [4;5;6;7;8;9;10];;
let b = List.fold_left (fun a b -> IntSet.add b a) IntSet.empty [0;1;2;3;4;5;6];;
let c = IntSet.add 5 IntSet.empty;;
let aIbIc = IntSet.inter (IntSet.inter b c) a;;
IntSet.is_empty aIbIc;;
This outputs false, as the intersection of a, b and c is non-empty (it contains 5). This of course relies on the elements of the set being comparable (in the example, the function compare in the Int module defines this property).
Alternatively in C++:
#include <iostream>
#include <set>
#include <algorithm>
#include <iterator>

int main()
{
    std::set<int> A, B, C;
    for(int i=10; i>3; --i)
        A.insert(i);
    for(int i=0; i<7; ++i)
        B.insert(i);
    C.insert(5);
    std::set<int> ABC, BC;
    std::set_intersection(B.begin(), B.end(), C.begin(), C.end(), std::inserter(BC, BC.begin()));
    std::set_intersection(BC.begin(), BC.end(), A.begin(), A.end(), std::inserter(ABC, ABC.begin()));
    for(std::set<int>::iterator i = ABC.begin(); i!=ABC.end(); ++i)
    {
        std::cout << *i << " ";
    }
    std::cout << std::endl;
    return 0;
}
The question needs further clarification.
First, do you want to work with symbolic sets given by a range?
And secondly, is it a one time question or is it going to be repeated in some form (if yes, what are the stable parts of the question?)?
If you want to work with ranges, then you could represent these with binary trees and define union and intersection operations on these structures. Building the tree would require O(n log n) and finding the result would require O(log n). This would not pay off with only three sets, but it would be flexible enough to efficiently support any combination of ranges (if that is what you meant by 'it can be anything').
On the other hand, if 'anything' means any set of elements, then the only option is to enumerate the elements. In this case, building B+ trees on sets B and C will also require O(n log n) time, but here n is the number of elements, whereas in the first case n is the number of ranges. The latter might be several orders of magnitude bigger, and of course it can represent only a finite number of elements.

Algorithm to find two repeated numbers in an array, without sorting

There is an array of size n (numbers are between 0 and n - 3) and only 2 numbers are repeated. Elements are placed randomly in the array.
E.g. in {2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, and repeated numbers are 3 and 5.
What is the best way to find the repeated numbers?
P.S. [You should not use sorting]
There is an O(n) solution if you know the possible domain of the input. For example, if your input array contains numbers between 0 and 100, consider the following code.
bool flags[101] = {false}; // one flag per possible value 0..100, all cleared
for(int i = 0; i < input_size; i++)
    if(flags[input_array[i]])
        return input_array[i]; // seen before: a repeated number
    else
        flags[input_array[i]] = true;
Of course there is the additional memory, but this is fast. As written it returns the first repeated number; keep scanning instead of returning to collect both.
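The same flag array sketched in Python, assuming (as in the question) that all values lie in [0, n-3], so n-2 flags suffice. It keeps scanning instead of returning, so both repeated numbers are collected. `find_repeats_with_flags` is an illustrative name.

```python
def find_repeats_with_flags(arr):
    n = len(arr)
    seen = [False] * (n - 2)      # one flag per possible value 0..n-3
    repeats = []
    for v in arr:
        if seen[v]:
            repeats.append(v)     # second occurrence of v
        else:
            seen[v] = True
    return repeats


print(find_repeats_with_flags([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # [3, 5]
```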
OK, seems I just can't give it a rest :)
Simplest solution
#include <cassert>
#include <cstdlib>
int A[N] = {...};
long signed_1(long n) { return n%2<1 ? +n : -n; } // 0,-1,+2,-3,+4,-5,+6,-7,...
long signed_2(long n) { return n%4<2 ? +n : -n; } // 0,+1,-2,-3,+4,+5,-6,-7,...
long S1 = 0; // or int64_t, or long long, or some user-defined class
long S2 = 0; // so that it has enough bits to contain the sum without overflow
for (int i=0; i<N-2; ++i) // add each element, subtract the expected value i in 0..N-3
{
    S1 += signed_1(A[i]) - signed_1(i);
    S2 += signed_2(A[i]) - signed_2(i);
}
for (int i=N-2; i<N; ++i) // the last two elements have no expected counterpart
{
    S1 += signed_1(A[i]);
    S2 += signed_2(A[i]);
}
S1 = std::abs(S1);
S2 = std::abs(S2);
assert(S1 != S2); // this algorithm fails in this case
long p = (S1+S2)/2;
long q = std::abs(S1-S2)/2;
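All other terms cancel, so S1 = ±p ± q and S2 = ±p ± q, where the signs come from bit 0 (S1) and bit 1 (S2) of p and q. When p and q agree in exactly one of those two bits, one sum contains p and q with the same sign (|sum| = p + q) and the other with opposite signs (|sum| = |p - q|), which gives p and q as above.
S1 and S2 must have enough bits to accommodate the sums; the algorithm does not tolerate overflow because of abs().
If abs(S1) == abs(S2), the algorithm fails. This happens when p and q agree in both low bits (the common value is then p + q), differ in both (it is then abs(p - q)), or one of them is 0.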
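A Python sketch of the signed-sum idea, assuming values in [0, n-3]. Python integers don't overflow, so the abs() caveat disappears; the ambiguous case raises an error instead of asserting. The helper names mirror the C++ snippet; `two_repeated_signed` is illustrative.

```python
def signed_1(n):
    return n if n % 2 < 1 else -n   # sign follows bit 0: 0,-1,+2,-3,...


def signed_2(n):
    return n if n % 4 < 2 else -n   # sign follows bit 1: 0,+1,-2,-3,+4,...


def two_repeated_signed(a):
    n = len(a)
    s1 = s2 = 0
    for i, v in enumerate(a):
        s1 += signed_1(v)
        s2 += signed_2(v)
        if i < n - 2:               # subtract the expected values 0..n-3
            s1 -= signed_1(i)
            s2 -= signed_2(i)
    s1, s2 = abs(s1), abs(s2)
    if s1 == s2:
        raise ValueError("signed sums cannot separate p and q")
    return abs(s1 - s2) // 2, (s1 + s2) // 2   # (smaller, larger)


print(two_repeated_signed([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
```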
Previous solution
I doubt somebody will ever encounter such a problem in the field ;) and I guess I know the teacher's expectation:
Let's take the array {0,1,2,...,n-2,n-1}.
The given one can be produced by replacing the last two elements, n-2 and n-1, with the unknown p and q (in either order),
so the sum of the elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
and the sum of squares (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2.
Simple math remains, where S1 and S2 are the array's sum and sum of squares minus the known terms:
(1) p+q = S1
(2) p^2+q^2 = S2
Surely you won't solve it the way math classes teach you to solve quadratic equations.
First, calculate everything modulo 2^32, that is, allow for overflow.
Then check the pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to the modulo and squaring).
And finally, check whether the candidates found really are present in the array twice.
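A Python sketch of that modular check, assuming values in [0, n-3] as in the question. Python integers never overflow, so the 32-bit wrap-around is emulated with `% MOD`. Instead of walking every pair {k, S1-k}, it tries each possible value p in 0..n-3, which covers all pairs that can be valid. The name `two_repeated_mod` is illustrative, and the final confirmation uses `list.count`, which costs O(n) per candidate.

```python
MOD = 1 << 32   # emulate 32-bit unsigned wrap-around


def two_repeated_mod(a):
    n = len(a)
    s1 = s2 = 0
    for v in a:
        s1 = (s1 + v) % MOD
        s2 = (s2 + v * v) % MOD
    for v in range(n - 2):            # remove the expected values 0..n-3
        s1 = (s1 - v) % MOD
        s2 = (s2 - v * v) % MOD
    # now s1 == p + q and s2 == p^2 + q^2 (mod 2^32)
    candidates = []
    for p in range(n - 2):            # p ranges over the possible values
        q = (s1 - p) % MOD
        if p <= q < n - 2 and (p * p + q * q) % MOD == s2:
            candidates.append((p, q))
    # modular arithmetic can admit false candidates: confirm by counting
    for p, q in candidates:
        if a.count(p) == 2 and a.count(q) == 2:
            return p, q
    return None


print(two_repeated_mod([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
```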
You know that your array contains every number from 0 to n-3 plus the two repeated ones (p & q). For simplicity, let's ignore the 0 case for now (a 0 in the array would make the product zero).
You can calculate the sum and the product over the array, resulting in:
1 + 2 + ... + (n-3) + p + q = p + q + (n-3)(n-2)/2
So if you subtract (n-3)(n-2)/2 from the sum of the whole array, you get
sum(Array) - (n-3)(n-2)/2 = x = p + q
Now do the same for the product:
1 * 2 * ... * (n-3) * p * q = (n-3)! * p * q
prod(Array) / (n-3)! = y = p * q
You now have these terms:
x = p + q
y = p * q
=> p and q are the two roots of t^2 - x*t + y = 0
=> p = (x - sqrt(x^2 - 4y)) / 2, q = (x + sqrt(x^2 - 4y)) / 2
Keep in mind that the product grows like (n-3)!, so it overflows fixed-width integers very quickly.
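A Python sketch of the sum/product idea. Following the answer's "ignore the 0 case", it assumes the values are 1..m (no zero), each present once, plus the two repeats, so the array has m + 2 elements; Python's big integers absorb the factorial-sized product. The function name is illustrative.

```python
import math


def two_repeated_sum_product(a):
    m = len(a) - 2                    # assumption: values are 1..m, no zero
    x = sum(a) - m * (m + 1) // 2     # x = p + q
    prod = 1
    for v in a:
        prod *= v                     # grows like m!, needs big integers
    y = prod // math.factorial(m)     # y = p * q
    # p and q are the roots of t^2 - x*t + y = 0
    d = math.isqrt(x * x - 4 * y)     # d = q - p
    return (x - d) // 2, (x + d) // 2


print(two_repeated_sum_product([3, 1, 2, 5, 4, 6, 3, 5]))  # (3, 5)
```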
Insert each element into a set/hashtable, first checking whether it is already in it.
You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + the two repeated numbers.
Edit: As others have noted, combined with the sum of squares you can use this; I was just a little slow in figuring it out.
Check this old but good paper on the topic:
Finding Repeated Elements (PDF)
Some answers to the question "Algorithm to determine if array contains n…n+m?" contain, as a subproblem, solutions which you can adapt for your purpose.
For example, here's a relevant part from my answer:
#include <cassert>
#include <climits>
#include <cstdlib>
bool has_duplicates(int* a, int m, int n)
{
/** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')
Whether a[] array has duplicates.
precondition: all values are in [n, n+m) range.
feature: It marks visited items using a sign bit.
*/
assert(n > INT_MIN); // n - 1 below would overflow (undefined behavior) for n == INT_MIN
for (int *p = a; p != &a[m]; ++p) {
*p -= (n - 1); // [n, n+m) -> [1, m+1)
assert(*p > 0);
}
// determine: are there duplicates
bool has_dups = false;
for (int i = 0; i < m; ++i) {
const int j = abs(a[i]) - 1;
assert(j >= 0);
assert(j < m);
if (a[j] > 0)
a[j] *= -1; // mark
else { // already seen
has_dups = true;
break;
}
}
// restore the array
for (int *p = a; p != &a[m]; ++p) {
if (*p < 0)
*p *= -1; // unmark
// [1, m+1) -> [n, n+m)
*p += (n - 1);
}
return has_dups;
}
The function leaves the array unchanged (the array must be writable, but its values are restored on exit).
It works for array sizes up to INT_MAX (2147483647 where int is 32 bits, which includes common 64-bit platforms).
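A Python sketch adapting that sign-bit marking to this question, assuming values in [0, n-3]. Values are shifted by +1 so that 0 can carry a sign; instead of stopping at the first duplicate, it collects every value whose slot is already marked, then restores the array. `find_repeats_by_marking` is an illustrative name.

```python
def find_repeats_by_marking(a):
    """Collect values seen twice, marking visits with a sign; a is restored."""
    for i in range(len(a)):
        a[i] += 1                     # [0, n-3] -> [1, n-2] so 0 can be negated
    repeats = []
    for i in range(len(a)):
        j = abs(a[i]) - 1             # original value, also used as an index
        if a[j] > 0:
            a[j] = -a[j]              # first visit: mark slot j
        else:
            repeats.append(j)         # slot already marked: j is repeated
    for i in range(len(a)):
        a[i] = abs(a[i]) - 1          # unmark and undo the shift
    return repeats


data = [2, 3, 6, 1, 5, 4, 0, 3, 5]
print(find_repeats_by_marking(data))  # [3, 5]
print(data)                           # [2, 3, 6, 1, 5, 4, 0, 3, 5]
```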
Suppose the array is
a[0], a[1], a[2] ..... a[n-1]
sumA = a[0] + a[1] + .... + a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + .... + a[n-1]*a[n-1]
sumFirstN = (N*(N+1))/2 where N = n-3, so
sumFirstN = (n-3)(n-2)/2
similarly
sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6
Suppose the repeated elements are X and Y, so
X + Y = sumA - sumFirstN;
X*X + Y*Y = sumASquare - sumFirstNSquare;
Then X*Y = ((X + Y)^2 - (X*X + Y*Y))/2, so X and Y are the roots of the quadratic t^2 - (X + Y)*t + X*Y = 0, which we can solve to get X and Y.
Time Complexity = O(n)
space complexity = O(1)
I know the question is very old but I suddenly hit it and I think I have an interesting answer to it.
We know this is a brainteaser, and a trivial solution (e.g. HashMap, sort, etc.), no matter how good, would be boring.
As the numbers are integers, they have a constant bit size (e.g. 32). Let us assume we are working with 4-bit integers right now. We look for A and B, which are the duplicate numbers.
We need 4 buckets, one for each bit. Each bucket holds the sum of the numbers whose specific bit is 1. For example, bucket 1 gets 2, 3, 6, 7, ...:
Bucket 0 : Sum ( x where: x & 2^0 != 0 )
...
Bucket i : Sum ( x where: x & 2^i != 0 )
We know what the sum of each bucket would be if there were no duplicates. I consider this as prior knowledge.
Once the buckets above are generated, some of them will hold more than expected. Setting the bits of those buckets reconstructs A OR B.
We can calculate A XOR B as follows:
A XOR B = Array[0] XOR Array[1] XOR ... XOR Array[n-1] XOR 0 XOR 1 XOR ... XOR (n-3)
(every value that appears once in the array and once in 0..n-3 cancels out).
Now going back to the buckets, we know exactly which buckets contain both of our numbers and which contain only one (the bits set in A XOR B).
For a bucket that contains only one of them, we can extract that number: num = sum - expected sum of the bucket. We only need one of the duplicates, because the other is num XOR (A XOR B), so if A XOR B has at least one set bit, we've got the answer.
But what if A XOR B is zero?
Well, this case is only possible if both duplicate numbers are the same number, and then that number is A OR B.
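A Python sketch of the bucket idea for the question's setting (values 0..n-3). Python integers are unbounded, so instead of a fixed 32 buckets it uses as many as the bit length of n-3 requires. `two_repeated_buckets` is an illustrative name.

```python
def two_repeated_buckets(arr):
    n = len(arr)
    bits = max(n - 3, 1).bit_length()  # enough bits for values 0..n-3
    expected = [0] * bits              # bucket sums if nothing were repeated
    actual = [0] * bits                # bucket sums for the real array
    a_xor_b = 0
    for v in range(n - 2):
        a_xor_b ^= v
        for bit in range(bits):
            if v >> bit & 1:
                expected[bit] += v
    for v in arr:
        a_xor_b ^= v                   # paired values cancel, leaving A ^ B
        for bit in range(bits):
            if v >> bit & 1:
                actual[bit] += v
    if a_xor_b == 0:                   # A == B: rebuild A OR B from buckets
        a = 0
        for bit in range(bits):
            if actual[bit] > expected[bit]:
                a |= 1 << bit
        return a, a
    bit = (a_xor_b & -a_xor_b).bit_length() - 1  # set in only one of A, B
    a = actual[bit] - expected[bit]              # that number is the surplus
    b = a ^ a_xor_b
    return min(a, b), max(a, b)


print(two_repeated_buckets([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
```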
Sorting the array would seem to be the best solution (if sorting were allowed): after a simple sort, duplicates sit next to each other, so the search is trivial, and an in-place sort needs no extra space.
Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment the matching bucket as you go through the array, something like this:
int count[10] = {0}; // one counter per possible value 0..9, all starting at zero
for (int i = 0; i < arraylen; i++) {
    count[array[i]]++;
}
Then just search the count array for any entries greater than 1; those are the values with duplicates. This requires only one pass over the original array and one pass over the count array.
Here's an implementation in Python of @eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used, then it requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrarily large integer sequences (it reads one integer at a time, so the whole sequence doesn't need to be in memory).
def two_repeated(iterable):
    s1, s2 = 0, 0
    for i, j in enumerate(iterable):
        s1 += j - i       # number_of_digits(s1) ~ 2 * number_of_digits(i)
        s2 += j*j - i*i   # number_of_digits(s2) ~ 4 * number_of_digits(i)
    s1 += (i - 1) + i
    s2 += (i - 1)**2 + i**2
    p = (s1 - int((2*s2 - s1**2)**.5)) // 2
    # `math.isqrt()` (Python 3.8+) or `Decimal().sqrt()` could replace
    # `int()**.5` for really large integers: any exact integer square root works
    return p, s1 - p
Example:
>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)
A more verbose version of the above code follows with explanation:
def two_repeated_seq(arr):
    """Return the only two duplicates from `arr`.
    >>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
    (3, 5)
    """
    n = len(arr)
    assert all(0 <= i < n - 2 for i in arr) # all in range [0, n-2)
    assert len(set(arr)) == (n - 2) # number of unique items
    s1 = (n-2) + (n-1)       # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits
    s2 = (n-2)**2 + (n-1)**2 # where k is the number of digits in `max(arr)`
    for i, j in enumerate(arr):
        s1 += j - i
        s2 += j*j - i*i
    """
    s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
       = sum(arr) - sum(range(n-2))
       = sum(range(n-2)) + p + q - sum(range(n-2))
       = p + q
    """
    assert s1 == (sum(arr) - sum(range(n-2)))
    """
    s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
       = sum(i*i for i in arr) - sum(i*i for i in range(n-2))
       = p*p + q*q
    """
    assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))
    """
    s1 = p+q
    -> s1**2 = (p+q)**2
    -> s1**2 = p*p + 2*p*q + q*q
    -> s1**2 - (p*p + q*q) = 2*p*q
    s2 = p*p + q*q
    -> p*q = (s1**2 - s2)/2
    Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1; then by Vieta's formulas
    p and q are the roots of x**2 - B*x + C = 0
    -> p = (B - sqrtD) / 2
    -> q = (B + sqrtD) / 2
    where sqrtD = sqrt(B**2 - 4*C)
    -> p = (s1 - sqrt(2*s2 - s1**2))/2
    """
    sqrtD = (2*s2 - s1**2)**.5
    assert int(sqrtD)**2 == (2*s2 - s1**2) # perfect square
    sqrtD = int(sqrtD)
    assert (s1 - sqrtD) % 2 == 0 # even
    p = (s1 - sqrtD) // 2
    q = s1 - p
    assert q == ((s1 + sqrtD) // 2)
    assert sqrtD == (q - p)
    return p, q
NOTE: calculating integer square root of a number (~ N**4) makes the above algorithm non-linear.
Since a range is specified, you can perform a radix sort, which sorts the array in O(n). Searching for duplicates in a sorted array is then O(n).
You can use a simple nested for loop:
int[] numArray = new int[] { 1, 2, 3, 4, 5, 7, 8, 3, 7 };
for (int i = 0; i < numArray.Length; i++)
{
    for (int j = i + 1; j < numArray.Length; j++)
    {
        if (numArray[i] == numArray[j])
        {
            //DO SOMETHING
        }
    }
}
Or, if you want the count of occurrences, you can filter the array and use a recursive function:
using System;
using System.Linq;
void GetDuplicates(int[] array)
{
    if (array.Length == 0)
        return; // base case: every value has been reported
    int a = 1; // occurrences of array[0]
    for (int j = 1; j < array.Length; j++)
    {
        if (array[0] == array[j])
        {
            a += 1;
        }
    }
    Console.WriteLine(" {0} occurred {1} time/s", array[0], a);
    // drop every copy of array[0] and recurse on what is left
    int[] myNewArray = (from n in array where n != array[0] select n).ToArray();
    GetDuplicates(myNewArray);
}
int[] numbers = { 1, 2, 3, 4, 5, 4, 4, 1, 8, 9, 23, 4, 6, 8, 9, 1, 4 };
GetDuplicates(numbers);
You are taking an array of 9 elements starting from 0, so the maximum element in your array will be 6. Take the sum of the elements from 0 to 6 and the sum of the array elements; their difference (say d) is p + q. Now take the XOR of the elements from 0 to 6 (say x1) and the XOR of the array elements (say x2). x2 is the XOR of all elements from 0 to 6 except the two repeated ones, since their two copies cancel each other out; in other words, x2 = x1 ^ p ^ q. Now, for each element of the array, let p be that element a[i] and compute q = d - p; then check whether p ^ q ^ x2 == x1. Doing this for all elements, you will find the elements for which this condition is true, in O(n). Note that the sum and the XOR together do not always pin down p and q uniquely (p + q = (p ^ q) + 2(p & q) fixes only p & q), so a surviving pair may still need to be confirmed. Keep coding!
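A Python sketch of the sum-plus-XOR check, assuming values in [0, n-3]. Since the sum and XOR can admit more than one candidate pair, it confirms candidates with `list.count`, which costs O(n) per candidate, so the O(n) bound holds only when few candidates survive. The function name is illustrative.

```python
def two_repeated_sum_xor(arr):
    n = len(arr)
    d = sum(arr) - sum(range(n - 2))   # d = p + q
    x1 = 0
    for v in range(n - 2):
        x1 ^= v                        # XOR of 0..n-3
    x2 = 0
    for v in arr:
        x2 ^= v                        # equals x1 ^ p ^ q
    candidates = set()
    for p in arr:
        q = d - p
        # q must be a legal value, and p ^ q ^ x2 must give back x1
        if p < q <= n - 3 and p ^ q ^ x2 == x1:
            candidates.add((p, q))
    # sum and XOR may admit several pairs, so confirm each by counting
    for p, q in sorted(candidates):
        if arr.count(p) == 2 and arr.count(q) == 2:
            return p, q
    return None


print(two_repeated_sum_xor([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
```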
check this out ...
O(n) time and O(1) space complexity
int axb = 0, x = 0, y = 0, i;
for(i = 0; i < n; i++)
    axb = axb ^ arr[i];
for(i = 1; i <= n-3; i++)
    axb = axb ^ i;
So in the given example you will get the XOR of 3 and 5.
axb = axb & -axb; // isolate the lowest set bit
for(i = 0; i < n; i++)
{
    if(arr[i] & axb)
        x = x ^ arr[i];
    else
        y = y ^ arr[i];
}
for(i = 1; i <= n-3; i++)
{
    if(i & axb)
        x = x ^ i;
    else
        y = y ^ i;
}
x and y are your answers
For each number: check if it exists in the rest of the array.
Without sorting, you're going to have to keep track of the numbers you've already visited.
In pseudocode this would basically be (done this way so I'm not just giving you the answer):
for each number in the list
if number not already in unique numbers list
add it to the unique numbers list
else
return that number as it is a duplicate
end if
end for each
How about this:
for (i=0; i<n-1; i++) {
for (j=i+1; j<n; j++) {
if (a[i] == a[j]) {
printf("%d appears more than once\n",a[i]);
break;
}
}
}
Sure, it's not the fastest, but it's simple, easy to understand, and requires no additional memory. If n is a small number like 9 or 100, then it may well be the "best" ("best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop, etc.).
In C:
int arr[] = {2, 3, 6, 1, 5, 4, 0, 3, 5};
int n = sizeof(arr)/sizeof(arr[0]);
int num = 0, i;
for (i = 0; i < n; i++)
    num = num ^ arr[i];   // XOR of every array element
for (i = 0; i <= n-3; i++)
    num = num ^ i;        // XOR of the expected values 0..n-3
Since x^x=0, every value XORed an even number of times cancels out: each non-repeated value occurs once in the array and once in 0..n-3, while each repeated value occurs three times in total. Let's call the repeated numbers a and b. We are left with a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b and use it as a mask, i.e. choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the numbers (both the array elements and the values 0..n-3) into two sublists: one contains all numbers y with y&x == 0, and the rest go in the other. By the way we chose x, we know that a and b land in different sublists. So we can now apply the same method used above to each sublist independently, and discover what a and b are.
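The same bit-partition trick in Python, as a sketch assuming the two repeated values differ and all values lie in [0, n-3]; both the array values and the expected values 0..n-3 are split by the chosen bit. `two_repeated_xor` is an illustrative name.

```python
def two_repeated_xor(arr):
    n = len(arr)
    axb = 0
    for v in arr:
        axb ^= v
    for v in range(n - 2):
        axb ^= v                 # values seen an even number of times cancel
    mask = axb & -axb            # lowest bit in which a and b differ
    a = b = 0
    for v in arr:                # split the array values by that bit
        if v & mask:
            a ^= v
        else:
            b ^= v
    for v in range(n - 2):       # and the expected values 0..n-3 likewise
        if v & mask:
            a ^= v
        else:
            b ^= v
    return min(a, b), max(a, b)


print(two_repeated_xor([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
```

It uses O(1) extra space and two passes over the data, with no sums that could overflow.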
I have written a small program which finds the elements that are not repeated; go through it and let me know your opinion. At the moment it assumes that every repeated element occurs an even number of times (e.g. exactly twice), but it can easily be extended.
My idea is to first sort the numbers and then apply my algorithm; quicksort can be used to sort the elements.
Let's take the input array below:
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
The numbers 2, 10 and 4 are not repeated. The program only needs equal values to be adjacent, as they are here; if they are not, sort the array first (e.g. with quicksort).
Let's apply my program to this:
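#include <cstdio>
#include <vector>
using namespace std;
int main()
{
    //int arr[] = {2, 9, 6, 1, 1, 4, 2, 3, 5};
    int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
    const int size = sizeof(arr)/sizeof(arr[0]);
    vector<int> vec;
    int var = arr[0];
    int i;
    for(i = 1; i < size; i += 2)
    {
        var = var ^ arr[i]; // 0 exactly when arr[i-1] == arr[i]
        if(var != 0)
        {
            // arr[i-1] has no partner: put it in the vector and realign the pairs
            vec.push_back(arr[i-1]);
            i = i-1;
        }
        if(i + 1 < size) // avoid reading past the end of the array
            var = arr[i+1];
    }
    if(i == size) // the last element was left without a partner
        vec.push_back(arr[size-1]);
    for(size_t k = 0; k < vec.size(); k++)
        printf("value not repeated = %d\n", vec[k]);
    return 0;
}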
This gives the output:
value not repeated = 2
value not repeated = 10
value not repeated = 4
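It's simple and very straightforward: just use XOR. Note that comparing neighbours like this only finds repeats that are adjacent, i.e. the array must be sorted first.
for(i = 0; i < n-1; i++) {
    if(!(arr[i] ^ arr[i+1])) // x ^ y == 0 exactly when x == y
        printf("Found repeated number %5d\n", arr[i]);
}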
Here is an algorithm that uses order statistics and runs in O(n).
You can solve this by repeatedly calling SELECT with the median as the parameter.
You also rely on the fact that, after a call to SELECT,
the elements that are less than or equal to the median are moved to the left of the median.
Call SELECT on A with the median as the parameter.
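If the median value is what it would be without duplicates (floor(n/2) above the lowest value the current subarray can hold; in the first call that is simply floor(n/2)), then the repeated values are to the right of the median, so you continue with the right half of the array.
Otherwise, a repeated value is to the left of the median, so you continue with the left half of the array.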
You continue this way recursively.
For example:
When A={2, 3, 6, 1, 5, 4, 0, 3, 5}, n=9, the median would be the value 4 if there were no duplicates.
After the first call to SELECT
A={3, 2, 0, 1, <3>, 4, 5, 6, 5}. The median value 3 is smaller than 4, so we continue with the left half.
A={3, 2, 0, 1, 3}
After the second call to SELECT
A={1, 0, <2>, 3, 3}. The median should be 2, and it is, so we continue with the right half.
A={3, 3}, found.
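This algorithm runs in O(n+n/2+n/4+...)=O(n). Note that it follows only one half at each step, so it finds one repeated value p (3 in this example); the other can then be recovered in O(n) from the sum: q = sum(A) - (n-3)(n-2)/2 - p (here 29 - 21 - 3 = 5).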
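What about using a HyperLogLog (https://en.wikipedia.org/wiki/HyperLogLog)? Keep in mind that it only estimates how many distinct elements there are; it cannot tell you which values are repeated.
Redis implements one (http://redis.io/topics/data-types-intro#hyperloglogs). From its documentation: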
A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.
Using nested for loops, and assuming each repeated number occurs only twice in the array:
def repeated(ar, n):
    for i in range(n):
        count = 0                 # matches of ar[i] among the later elements
        for j in range(i+1, n):
            if ar[i] == ar[j]:
                count += 1
        if count == 1:
            print("repeated:", ar[i])
arr = [2, 3, 6, 1, 5, 4, 0, 3, 5]
n = len(arr)
repeated(arr, n)
Why should we try doing maths (especially solving quadratic equations)? These are costly operations. The best way to solve this would be to construct a bitmap of n-2 bits, one per possible value 0..n-3, i.e. ((n-2) + 7) / 8 bytes. It's better to use calloc for this memory, so every single bit is initialized to 0. Then traverse the list and set the corresponding bit to 1 when a number is encountered; if the bit is already set to 1 for that number, then that is a repeated number.
This can be extended to find out whether any number is missing from the array.
This solution is O(n) in time complexity (with O(n) bits of extra space).
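A Python sketch of the bitmap, assuming values in [0, n-3]. A zeroed `bytearray` plays the role of the calloc'd buffer, one bit per possible value; it keeps scanning so that both repeated numbers are reported. The function name is illustrative.

```python
def find_repeats_bitmap(arr):
    n = len(arr)
    bitmap = bytearray((n - 2 + 7) // 8)   # one zeroed bit per value 0..n-3
    repeats = []
    for v in arr:
        byte, bit = divmod(v, 8)
        if bitmap[byte] >> bit & 1:
            repeats.append(v)              # bit already set: v is repeated
        else:
            bitmap[byte] |= 1 << bit
    return repeats


print(find_repeats_bitmap([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # [3, 5]
```

Packing the flags into bits cuts the memory to about one eighth of a byte-per-flag array.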

Resources