Three-way set Disjointness - algorithm

This is a question from my practice problems for an upcoming test.
I was hoping to get help finding a more efficient solution to this problem. Right now, I know I can solve this type of problem using three simple for loops, but that would be O(N^3).
Furthermore, I believe that incorporating binary search somehow is the way to get the log n factor I'm looking for in the answer. Unfortunately, I'm kind of stuck.
The three-way set disjointness problem is defined as follows: Given three sets of items, A, B, and C, they are three-way disjoint if there is no element common to all three sets, i.e., there exists no x such that x is in A, B, and C.
Assume that A, B, and C are sets of items that can be ordered (integers); furthermore, assume that it is possible to sort n integers in O(n log n) time. Give an O(n log n) algorithm to decide whether the sets are three-way set disjoint.
Thanks for any assistance

The question statement gives an obvious hint on how to solve the problem. Assuming the three sets are mathematical sets (elements are unique within each set), just merge the three sets into one list, sort it, then traverse the list linearly and check whether any item occurs three times in a row. The time complexity is dominated by the sorting, which is O(n log n). The auxiliary space complexity is at most O(n).
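A minimal sketch of the sort-and-scan approach in Java; the class and method names are illustrative, not from the original post.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class SortAndScanDisjointness
{
    static boolean isThreeWayDisjoint(Set<Integer> a, Set<Integer> b, Set<Integer> c)
    {
        List<Integer> merged = new ArrayList<>();
        merged.addAll(a);
        merged.addAll(b);
        merged.addAll(c);
        Collections.sort(merged);                        // O(n log n) dominates
        for (int i = 2; i < merged.size(); i++)
        {
            // Three equal values in a row can only happen if one copy came from each
            // set, because each individual set contains no duplicates.
            if (merged.get(i).equals(merged.get(i - 2)))
            {
                return false;                            // common element found
            }
        }
        return true;
    }
}
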
Another solution is to use a hash-based map/dictionary. Just count the frequency of each item across the three sets. If any item reaches a frequency of 3 (this can be checked when the frequency is retrieved for update), the three sets are not 3-way disjoint. Insertion, access and modification can be done in O(1) amortized time, so the time complexity is O(n). The space complexity is also O(n).
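A minimal sketch of the frequency-counting approach; again, the names are illustrative.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CountingDisjointness
{
    static boolean isThreeWayDisjoint(Set<Integer> a, Set<Integer> b, Set<Integer> c)
    {
        Map<Integer, Integer> counts = new HashMap<>();
        for (Set<Integer> set : List.of(a, b, c))
        {
            for (int x : set)
            {
                int count = counts.merge(x, 1, Integer::sum);   // O(1) amortized update
                if (count == 3)
                {
                    return false;                               // x occurs in all three sets
                }
            }
        }
        return true;
    }
}
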

If complexity is the constraint (and neither space nor the constant term is), this can be solved in O(n). Create two bitmaps, mapping the integers from A into the first and the integers from B into the second. Then traverse the third set (C) until you either exhaust it, or you find an entry where bitmapA(testInt) and bitmapB(testInt) are both set.
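A minimal sketch of the bitmap approach using java.util.BitSet. It assumes the integers are non-negative and bounded by a known maxValue; that assumption is mine, not the answer's.

import java.util.BitSet;
import java.util.Set;

public class BitmapDisjointness
{
    static boolean isThreeWayDisjoint(Set<Integer> a, Set<Integer> b, Set<Integer> c, int maxValue)
    {
        BitSet inA = new BitSet(maxValue + 1);
        BitSet inB = new BitSet(maxValue + 1);
        for (int x : a) inA.set(x);
        for (int x : b) inB.set(x);
        for (int x : c)
        {
            if (inA.get(x) && inB.get(x))
            {
                return false;        // x is present in all three sets
            }
        }
        return true;
    }
}
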

We can solve this problem in O(n). This is possible if we use the Set data structure and take the initial capacity and load factor into consideration.
public static boolean checkThreeWaySetDisjointness(Set<Integer> a, Set<Integer> b, Set<Integer> c)
{
    int capacity = Math.round((a.size() + b.size()) / 0.75f) + 1;
    Set<Integer> container = new HashSet<Integer>(capacity);
    container.addAll(a);
    for (int element : b)
    {
        if (!container.add(element))
        {
            if (c.contains(element))
            {
                return false;
            }
        }
    }
    return true;
}
We create a new Set container because if we start adding directly to any of the existing sets a/b/c, then once 75% of its capacity is reached, Java will internally create a new HashSet and copy the entire existing set into it. That overhead has O(n) complexity. Hence we create a new HashSet with size capacity, which ensures there will be no copying overhead. We then copy the entire set a into it and add the elements of set b one by one. In Java, add() returning false means the element already exists in the collection; if so, we just check for the same element in the third set c. The add and contains methods of HashSet have O(1) complexity, so this entire code runs in O(n).

Related

Given O(n) sets, what is complexity of figuring out distinct ones amongst them?

I have an application where I have a list of O(n) sets.
Each set Set(i) is an n-vector. Suppose n=4, for instance,
Set(1) could be [0|1|1|0]
Set(2) could be [1|1|1|0]
Set(3) could be [1|1|0|0]
Set(4) could be [1|1|1|0]
I'd like to process these sets so that as output, I only get the unique ones amongst them. So, in the example above, I would get as output:
Set(1), Set(2), Set(3). Note that Set(4) is discarded since it is same as Set(2).
A rather brute force way of figuring this gives me a worst-case bound of O(n^3):
Given: Input List of size O(n)
Output List L = Set(1)
for (j = 2 to Length of Input List) {          // Loop Outer: check if Set(j) should be added to L
    for (i = 1 to Length of L currently) {     // Loop Inner
        check if Set(i) is same as Set(j)      // This step is O(n) since Set() has O(n) elements
        if (they are same)
            exit inner loop
        else if (i is length of L currently)   // so, Set(j) is unique thus far
            Append Set(j) to L
    }
}
There is no a priori bound on n: it can be arbitrarily large. This seems to preclude the use of a simple hash function that maps the binary set to a decimal number. I could be wrong.
Is there any other way this can be done in better worst-case running time other than O(n^3)?
O(n) sequences of length n make an input of size O(n^2). You won't get complexity better than that, since you may at least be required to read all the input. All sequences might be the same, for example, but you'd have to read them all to know that.
A binary sequence of length n can be inserted into a trie or radix tree, while checking whether or not it already exists, in O(n) time. That's O(n^2) for all the sequences together, so simply using a trie or radix tree to find duplicates is optimal.
See: https://en.wikipedia.org/wiki/Trie
and: https://en.wikipedia.org/wiki/Radix_tree
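As a minimal sketch, here is a binary trie that detects duplicates while inserting, assuming each set is given as a bitstring such as "0110"; the class and method names are illustrative.

import java.util.ArrayList;
import java.util.List;

public class BitstringTrie
{
    private static final class Node
    {
        Node zero, one;       // children for bits '0' and '1'
        boolean terminal;     // a bitstring ends at this node
    }

    private final Node root = new Node();

    // Inserts the bitstring and returns true if it was not present before: O(length).
    boolean insertIfAbsent(String bits)
    {
        Node node = root;
        for (int i = 0; i < bits.length(); i++)
        {
            if (bits.charAt(i) == '0')
            {
                if (node.zero == null) node.zero = new Node();
                node = node.zero;
            }
            else
            {
                if (node.one == null) node.one = new Node();
                node = node.one;
            }
        }
        boolean isNew = !node.terminal;
        node.terminal = true;
        return isNew;
    }

    // Keeps only the first occurrence of each distinct bitstring: O(total input size).
    static List<String> distinct(List<String> sets)
    {
        BitstringTrie trie = new BitstringTrie();
        List<String> unique = new ArrayList<>();
        for (String s : sets)
        {
            if (trie.insertIfAbsent(s)) unique.add(s);
        }
        return unique;
    }
}
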
You may consider implementing your set using a balanced binary tree. The cost of inserting a new node into such a tree is O(lg m), where m is the number of elements in the tree. Duplicates would implicitly be weeded out, because if we detect that such a node already exists, it is simply not added.
In your example, the total number of lookup/insertion operations would be n*n, since there are n sets, and each set has n values. So the overall time might scale as O(n^2 lg(n^2)). This outperforms O(n^3) by some amount.
First of all, these are not sets but bitstrings.
Next, for every bitstring you can convert it to a number and put that number in a hashset (or simply store the original bitstrings; most hashset implementations can do that). Afterwards, your hashset contains all the unique items. O(N) time, O(N) space. If you need to maintain the original order of the strings, then in the first loop check for each string whether it is already in the hashset, and if not, output it and insert it into the hashset.
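A minimal sketch of that order-preserving variant, storing the original bitstrings directly in a HashSet; names are illustrative.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DistinctViaHashSet
{
    static List<String> distinct(List<String> bitstrings)
    {
        Set<String> seen = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String s : bitstrings)
        {
            if (seen.add(s))       // add() returns false for duplicates
            {
                unique.add(s);
            }
        }
        return unique;
    }
}
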
If you can use O(n) extra space, you can try this:
First of all, let's assume the vectors are binary numbers, so 0110 becomes 6.
This assumes the entries in the vectors are in {0,1}; otherwise you can multiply by 10 instead of 2.
Converting all vectors into decimals would take O(4n).
For each converted number we'll map the vector by its decimal value. To implement this, we'll be using an n-sized hash map.

HM <- n-sized hash-map
for each vector v:
    num <- decimal number converted from v
    map v into HM by num
loop over HM and take only one vector for each key

Runtime by steps:
O(n)
O(n*(4+1)), where 1 is the time for mapping and 4 is the vector length
O(n)
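A minimal sketch of this convert-then-hash idea. It assumes each vector is short enough to encode in a long (the question notes that n can be arbitrarily large, in which case this encoding overflows); names are illustrative.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DistinctViaNumericKeys
{
    // "0110" -> 6; assumes the vector fits in 63 bits.
    static long toKey(String bits)
    {
        long key = 0;
        for (int i = 0; i < bits.length(); i++)
        {
            key = key * 2 + (bits.charAt(i) - '0');
        }
        return key;
    }

    static List<String> distinct(List<String> vectors)
    {
        Map<Long, String> byKey = new LinkedHashMap<>(vectors.size());
        for (String v : vectors)
        {
            byKey.putIfAbsent(toKey(v), v);     // keep only one vector per key
        }
        return new ArrayList<>(byKey.values());
    }
}
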

Sorting given pairwise orderings

I have n variables (Var 1 ... Var n) and do not know their exact values. The n choose 2 pairwise orderings between these n variables are known. For instance, it is known that Var 5 <= Var 9, Var 9 <= Var 10 and so on for all pairs. Further, it is also known that these pairwise orderings are consistent and do not lead to a degenerate case of equality throughout. That is, in the above example the inequality Var 10 <= Var 5 will not be present.
What is the most efficient sorting algorithm for such problems which gives a sorting of all variables?
Pairwise ordering is the only thing that any (comparison-based) sort needs anyway, so your question boils down to "what's the most efficient comparison-based sorting algorithm".
In answer to that, I recommend you look into Quicksort, Heapsort, Timsort, possibly Mergesort and see what will work well for your case in terms of memory requirements, programming complexity etc.
I find Quicksort the quickest to implement for a once-off program.
The question is not so much how to sort (use the standard sort of your language) but how to feed the sort criterion to the sorting algorithm.
In most languages you need to provide an int comparison(T a, T b) function, where T is the type of the elements, that returns -1, 0 or 1 depending on which is larger.
So you need a fast access to the data structure storing (all) pairwise orderings, given a pair of elements.
So the question is not so much whether Var 10 <= Var 5 will be present (inconsistent) but rather whether Var 5 <= Var 10 is guaranteed to be present. If it is, you can test for the constraint in O(1) with a hash set of pairs of elements; otherwise, you need to find a transitive relationship between a and b, which might not even exist (it's unclear from the OP whether we are talking about a partial or total order, i.e. whether for all a, b we are guaranteed a < b, b < a or a = b).
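A minimal sketch of the first case, assuming every ordering "Var i <= Var j" is given explicitly and stored as the pair List.of(i, j) in a hash set; the names are illustrative.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SortByGivenOrderings
{
    static List<Integer> sortVariables(int n, Set<List<Integer>> orderings)
    {
        List<Integer> vars = new ArrayList<>();
        for (int i = 0; i < n; i++) vars.add(i);
        // The comparator is a single O(1) lookup into the hash set of pairs,
        // so a standard O(n log n) sort does the rest.
        vars.sort((a, b) ->
        {
            if (a.equals(b)) return 0;
            return orderings.contains(List.of(a, b)) ? -1 : 1;
        });
        return vars;
    }
}
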
With roughly worst case N^2 entries, this hash is pretty big. Building it still requires exploring transitive links which is costly.
Following the links probably means keeping a map from each element to the set of its (immediately) smaller elements. When comparing a to b, if map(a) contains b or map(b) contains a you can answer immediately; otherwise you need to recurse on the elements of map(a) and map(b), with pretty bad complexity. Ultimately you'll still be accumulating sets of smaller values to build your test.
Perhaps if you have a low number of constraints a <= b, just swapping a and b when they do not respect a constraint and iterating over the constraints to a fixpoint (a full round over all constraints with no effect) could be more efficient. At least it's O(1) in memory.
A variant of that could be sorting using a stable sort (preserves order of incomparable entries) several times with subsets of the constraints.
Last idea, computing a Max with your input data is O(number of constraints), so you could just repeatedly compute the Max, add it at the end of the target, remove constraints that use it, rinse and repeat. I'd use a stack to store the largest element up to a given constraint index, so you can backtrack to that rather than restart from scratch.

Data structure that deletes all elements of a set less than or equal to x in O(1) time

I am self studying for an algorithms course, and I am trying to solve the following problem:
Describe a data structure to store a set of real numbers, which can perform each of the following operations in O(1) amortized time:
Insert(x) : Deletes all elements not greater than x, and adds x to the set.
FindMin() : Find minimum value of set.
I realize that FindMin kind of becomes trivial once you have Insert, and I see how, with a linked list implementation, you could delete multiple elements simultaneously (i.e. in O(1)), but finding out which link to delete (i.e. where x goes) seems like an O(n) or O(log n) operation, not O(1). The problem gave the hint: Consider using a stack, but I don't see how this is helpful.
Any help is appreciated.
Note that your goal is to get O(1) amortized time, not O(1) time. This means that you can do as much work as you'd like per operation as long as n operations don't take more than O(n) time.
Here's a simple solution. Store the elements in a stack in ascending order. To insert an element, keep popping the stack until it's empty or until the top element is greater than x, then push x onto the stack. To do a find-min, read the top of the stack.
Find-min clearly runs in time O(1). Let's now look at insert. Intuitively, each element is pushed and popped at most once, so we can spread the work of an expensive insert across cheaper inserts. More formally, let the potential be n, the number of elements on the stack. Each time you do an insert, you do some number of pops (say, k) and the potential increases by 1 - k (one new element added, k removed). The amortized cost is then k + 1 + 1 - k, which is 2. Therefore, insert is amortized O(1).
Hope this helps!
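A minimal sketch of that stack-based structure, using a Deque as the stack; the class and method names are illustrative.

import java.util.ArrayDeque;
import java.util.Deque;

public class MinOnlySet
{
    // The top of the stack holds the minimum; values increase toward the bottom.
    private final Deque<Double> stack = new ArrayDeque<>();

    // Deletes all elements not greater than x, then adds x: O(1) amortized.
    void insert(double x)
    {
        while (!stack.isEmpty() && stack.peek() <= x)
        {
            stack.pop();      // each element is pushed and popped at most once overall
        }
        stack.push(x);
    }

    // Returns the minimum of the current set: O(1) worst case.
    double findMin()
    {
        if (stack.isEmpty()) throw new IllegalStateException("empty set");
        return stack.peek();
    }
}
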
double is the data structure! In the methods below, ds represents the data structure that the operation is being performed on.
void Insert(ref double ds, double x)
{
    ds = x;
}

double FindMin(double ds)
{
    return ds;
}
The only way to ever observe the state of the data structure is to query its minimum element (FindMin). The only way to modify the state of the data structure is to set its new minimum element (Insert). So the data structure is simply the minimum element of the set.

Algorithm for finding mutual name in lists

I've been reading up on Algorithms from the book Algorithms by Robert Sedgewick and I've been stuck on an exercise problem for a while. Here is the question :
Given 3 lists of N names each, find an algorithm to determine if there is any name common to all three lists. The algorithm must have O(N log N) complexity. You're only allowed to use sorting algorithms, and the only data structures you can use are stacks and queues.
I figured I could solve this problem using a HashMap, but the question restricts us from doing so. Even then, that still wouldn't have a complexity of O(N log N).
If you sort each of the lists, you can trivially check whether all three lists share a name in O(n) time: take the first name of list A and compare it to the first name of list B; while the B element is less than the A element, pop it from B, until the front of B is >= the front of A. If you find a match, repeat the process on C. If you also find a match in C, return true; otherwise move on to the next element of A.
Now you have to sort all of the lists in n log n time, which you can do with your favorite sorting algorithm, though you will have to be a little creative using just stacks and queues. I would probably recommend merge sort.
The pseudo code below is a little messed up because I am changing lists that I am iterating over.
Pseudo code (assume listA, listB and listC are sorted queues where the smallest name is at the front of the queue):

eltB = listB.pop()
eltC = listC.pop()
for eltA in listA:
    while eltB <= eltA:
        if eltB == eltA:
            while eltC <= eltB:
                if eltB == eltC:
                    return true
                if eltC < eltB:
                    eltC = listC.pop()
        eltB = listB.pop()
Steps:
1. Sort the three lists using an O(N lg N) sorting algorithm.
2. Pop one item from each list.
3. If any of the lists from which you tried to pop is empty, then you are done, i.e. no common element exists.
4. Else, compare the three elements.
5. If the elements are equal, you are done - you have found the common element.
6. Else, keep the maximum of the three elements (constant time) and replenish from the same lists from which the two smaller elements were discarded.
7. Go to step 3.
Step 1 takes O(N lg N) and the rest of the steps take O(N) overall, so the overall complexity is O(N lg N).
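A minimal sketch of these steps, assuming the three name lists have already been sorted into queues with the smallest name at the front; the names are illustrative.

import java.util.Queue;

public class CommonName
{
    // Returns a name common to all three sorted queues, or null if none exists.
    static String findCommonName(Queue<String> a, Queue<String> b, Queue<String> c)
    {
        while (!a.isEmpty() && !b.isEmpty() && !c.isEmpty())
        {
            String x = a.peek(), y = b.peek(), z = c.peek();
            if (x.equals(y) && y.equals(z))
            {
                return x;                               // common name found
            }
            // Keep the maximum of the three fronts and advance the other queues:
            // anything smaller than that maximum can no longer be common to all lists.
            String max = x.compareTo(y) >= 0 ? x : y;
            max = max.compareTo(z) >= 0 ? max : z;
            if (x.compareTo(max) < 0) a.poll();
            if (y.compareTo(max) < 0) b.poll();
            if (z.compareTo(max) < 0) c.poll();
        }
        return null;                                    // a list ran out: no common name
    }
}

Each iteration removes at least one element from some queue, so the scan after sorting is O(N).
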

Pairwise priority queue

I have a set of A's and a set of B's, each with an associated numerical priority, where each A may match some or all B's and vice versa, and my main loop basically consists of:
Take the best A and B in priority order, and do stuff with A and B.
The most obvious way to do this is with a single priority queue of (A,B) pairs, but if there are 100,000 A's and 100,000 B's then the set of O(N^2) pairs won't fit in memory (and disk is too slow).
Another possibility is for each A, loop through every B. However this means that global priority ordering is by A only, and I really need to take priority of both components into account.
(The application is theorem proving, where the above options are called the pair algorithm and the given clause algorithm respectively; the shortcomings of each are known, but I haven't found any reference to a good solution.)
Some kind of two layer priority queue would seem indicated, but it's not clear how to do this without using either O(N^2) memory or O(N^2) time in the worst case.
Is there a known method of doing this?
Clarification: each A must be processed with all corresponding B's, not just one.
Maybe there's something I'm not understanding, but why not keep the A's and B's in separate heaps, get_Max on each of the heaps, do your work, remove each max from its associated heap, and continue?
You could handle the best pairs first, and if nothing good comes up mop up the rest with the given clause algorithm for completeness' sake. This may lead to some double work, but I'd bet that this is insignificant.
Have you considered ordered paramodulation or superposition?
It appears that the items in A have an individual priority, the items in B have an individual priority, and the (A,B) pairs have a combined priority. Only the combined priority matters, but hopefully we can use the individual priorities along the way. However, there is also a matching relation between items in A and items in B that is independent of priority.
I assume that, for all a in A and b1, b2 in B such that Match(a,b1) and Match(a,b2), Priority(b1) >= Priority(b2) implies CombinedPriority(a,b1) >= CombinedPriority(a,b2).
Now, begin by sorting B in decreasing order of priority. Let B(j) indicate the jth element in this sorted order. Also, let A(i) indicate the ith element of A (which may or may not be in sorted order).
Let nextb(i,j) be a function that finds the smallest j' >= j such that Match(A(i),B(j')). If no such j' exists, the function returns null (or some other suitable error value). Searching for j' may just involve looping upward from j, or we may be able to do something faster if we know more about the structure of the Match relation.
Create a priority queue Q containing (i,nextb(i,0)) for all indices i in A such that nextb(i,0) != null. The pairs (i,j) in Q are ordered by CombinedPriority(A(i),B(j)).
Now just loop until Q is empty. Pull out the highest-priority pair (i,j) and process (A(i),B(j)) appropriately. Then re-insert (i,nextb(i,j+1)) into Q (unless nextb(i,j+1) is null).
Altogether, this takes O(N^2 log N) time in the worst case where all pairs match. In general, it takes O(N^2 + M log N), where M is the number of matches. The N^2 component can be reduced if there is a faster way of calculating nextb(i,j) than just looping upward, but that depends on knowledge of the Match relation.
(In the above analysis, I assumed both A and B were of size N. The formulas could easily be modified if they are different sizes.)
You seemed to want something better than O(N^2) time in the worst case, but if you need to process every match, then you have a lower bound of M, which can be N^2 itself. I don't think you're going to be able to do better than O(N^2 log N) time unless there is some special structure to the combined priority that lets you use a better-than-log-N priority queue.
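A minimal sketch of this scheme, where B is pre-sorted in decreasing priority and each item of A keeps only its next matching index into B inside the queue, so at most |A| pairs are in memory at once. The functional interfaces standing in for Match and CombinedPriority, and all the names, are illustrative assumptions.

import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.BiPredicate;
import java.util.function.ToDoubleBiFunction;

public class LazyPairQueue<A, B>
{
    private final List<A> as;
    private final List<B> bsByPriorityDesc;                 // B sorted in decreasing priority
    private final BiPredicate<A, B> match;
    private final PriorityQueue<int[]> queue;               // entries are {indexIntoA, indexIntoB}

    LazyPairQueue(List<A> as, List<B> bsByPriorityDesc,
                  BiPredicate<A, B> match, ToDoubleBiFunction<A, B> combinedPriority)
    {
        this.as = as;
        this.bsByPriorityDesc = bsByPriorityDesc;
        this.match = match;
        // Order queue entries by combined priority, highest first.
        this.queue = new PriorityQueue<>(Comparator.comparingDouble(
                (int[] p) -> combinedPriority.applyAsDouble(as.get(p[0]), bsByPriorityDesc.get(p[1]))
        ).reversed());
        for (int i = 0; i < as.size(); i++)
        {
            offerNext(i, 0);                                // seed with (i, nextb(i, 0))
        }
    }

    // nextb: offers the smallest j' >= j such that Match(A(i), B(j')), if any exists.
    private void offerNext(int i, int j)
    {
        for (int jj = j; jj < bsByPriorityDesc.size(); jj++)
        {
            if (match.test(as.get(i), bsByPriorityDesc.get(jj)))
            {
                queue.offer(new int[] { i, jj });
                return;
            }
        }
    }

    // Returns the indices of the next best (A, B) pair, or null when exhausted.
    int[] poll()
    {
        int[] best = queue.poll();
        if (best != null)
        {
            offerNext(best[0], best[1] + 1);                // re-insert with nextb(i, j + 1)
        }
        return best;
    }
}
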
So you have a set of A's and a set of B's, and you need to pick an (A, B) pair from this set such that some f(a, b) is the highest over all (A, B) pairs.
This means you can either store all possible (A, B) pairs and order them, and just pick the highest each time through the loop (O(1) per iteration but O(N*M) memory).
Or you could loop through all possible pairs and keep track of the current maximum and use that (O(N*M) per iteration, but only O(N+M) memory).
If I am understanding you correctly this is what you are asking.
I think it very much depends on f() to determine if there is a better way to do it.
If f(a, b) = a + b, then it is obviously very simple, the highest A, and the highest B are what you want.
I think your original idea will work; you just need to keep your As and Bs in separate collections and stick references to them in your priority queue. If each reference takes 16 bytes (just to pick a number), then 10,000,000 A/B references will only take ~300M. Assuming your As and Bs themselves aren't too big, it should be workable.
