Selecting globally distinct values from multiple sets - algorithm

I have N lists containing various numbers of objects and for each list a number X of required distinct values.
A simple example:
List1 = [ 1,2,3,4 ] : 2
List2 = [ 2,3,4 ] : 1
List3 = [ 1,2,4 ] : 1
Here one solution is to select 1 and 2 from List1, 3 from List2 and 4 from List3
But if the problem looks like this, there is no solution:
List1 = [ 1,2,3,4 ] : 2
List2 = [ 2,3,4 ] : 2
List3 = [ 1,2,4 ] : 1
So, the brute-force solution to this problem is to select the required number of objects from the first list, then select the required number from the second list, excluding anything already selected. If this fails, select different objects from the first list, and so forth.
This, however, is not efficient and I might end up trying all combinations before I find a solution, if any.
So, is there any other way to solve this problem?

This problem can be restated in terms of a flow network, and solved using a number of maximum flow algorithms:
Add a source vertex S
Add a vertex Gi for each of the sets
Add an edge from S to Gi with capacity equal to the number of items to be selected from that set
Add a vertex Ni for each distinct number in the union of all sets
Add an edge with capacity 1 between each Gi and Ni where the set contains the number
Add a sink vertex T
Add an edge with capacity 1 between each Ni and T
(Figure: the flow network built from the example above.)
If the max flow algorithm does not produce a flow equal to the total number of required values, the problem cannot be solved. Otherwise, use the flow it assigns to the edges between Gi and Ni to decide which numbers to take from each set.
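Here is a minimal sketch of this construction using networkx (the node labels and variable names are mine; the data is the first example above):
import networkx as nx

lists = {"G1": [1, 2, 3, 4], "G2": [2, 3, 4], "G3": [1, 2, 4]}
required = {"G1": 2, "G2": 1, "G3": 1}

G = nx.DiGraph()
for gi, items in lists.items():
    G.add_edge("S", gi, capacity=required[gi])   # S -> Gi
    for n in items:
        G.add_edge(gi, ("N", n), capacity=1)     # Gi -> Ni
for n in {x for items in lists.values() for x in items}:
    G.add_edge(("N", n), "T", capacity=1)        # Ni -> T

flow_value, flow = nx.maximum_flow(G, "S", "T")
if flow_value < sum(required.values()):
    print("no solution")
else:
    for gi in lists:
        print(gi, "->", [n for (_, n), f in flow[gi].items() if f > 0])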

Related

Select one element from each set but the selected element should not be repeated

I have a few sets, say 5 of them
{1,2,3}
{2,4}
{1,2}
{4,5}
{2,3,5}
Here, I need to choose at least 3 elements from any three sets (one element per set), given that once an element is selected, it cannot be selected again.
I also need to check whether any solution exists.
E.g.
set {1,2,3} -> choose 1
set {2,4} -> choose 2
set {1,2} -> cannot choose since both 1 and 2 are chosen.
set {2,5} -> can only choose 5
Is there a way to achieve this? Simple explanation would be appreciated.
If you only need 3 elements, then the algorithm is quite simple. Just repeat the following procedure:
1. Select the set with the lowest heuristic, where the heuristic is the length of the set divided by the number of occurrences of that exact set. If the set has zero elements, remove it and go to step 4. If two or more sets tie, you can choose any one of them.
2. Pick an element from that set. This is the element you'll choose.
3. Remove this element from every set.
4. If you have picked 3 elements or there are no more sets remaining, exit. Otherwise go to step 1.
This algorithm gives at least 3 elements whenever it's possible, even in the presence of duplicates. Here's the proof.
If the heuristic for a set is <= 1, picking an element from that set is essentially free: it doesn't hurt the ability to use other sets at all.
If we are in a situation with 2 or more sets with heuristic > 1 and we have to pick at least two elements, this is easy. Just pick one from the first set, and the second set will still have an element left, because its length is > 1 (since its heuristic is > 1).
If we are in a situation with 3 or more sets with heuristic > 1, we can pick from the first set. After this we are left with at least two sets, where at least one of them has more than one element. We can't be left with two size-one sets, because that would imply that the 3 sets we started with contained a duplicated length-2 set, which has heuristic 1. Thus we can pick all 3 elements.
Here is Python code for this algorithm. The generator returns as many elements as it can manage; if it's possible to return at least 3 elements, it will. Beyond that, however, it doesn't always return the maximum possible number of elements.
def choose(sets):
    # Copy the input, to avoid modifying it
    s = [{*e} for e in sets]
    while True:
        # If there are no more sets remaining
        if not s:
            return
        # Select based on length and number of duplicates
        m = min(s, key=lambda x: len(x) / s.count(x))
        s.remove(m)
        # Ignore empty sets
        if m:
            # Remove an arbitrary element
            e = m.pop()
            # Yield it
            yield e
            # Remove the chosen element e from the other sets
            for other in s:
                other.discard(e)

print([*choose([{1,2,3}, {2,4}, {1,2}, {4,5}, {2,3,5}])])
print([*choose([{1}, {2,3}, {2,4}, {1,2,4}])])
print([*choose([{1,2}, {2}, {2,3,4}])])
print([*choose([{1,2}, {2}, {2,1}])])
print([*choose([{1,2}, {1,3}, {1,3}])])
print([*choose([{1}, {1,2,3}, {1,2,3}])])
print([*choose([{1,3}, {2,3,4}, {2,3,4}, {2,3,4}, {2,3,4}])])
print([*choose([{1,5}, {2,3}, {1,3}, {1,2,3}])])
Something like this. Given your sets
0: {1,2,3}
1: {2,4}
2: {1,2}
3: {4,5}
4: {2,3,5}
build an array A of sets, indexed by value:
A[1] = { 0, 2 }       // all sets containing 1
A[2] = { 0, 1, 2, 4 } // all sets containing 2
A[3] = { 0, 4 }       // all sets containing 3
A[4] = { 1, 3 }       // all sets containing 4
A[5] = { 3, 4 }       // all sets containing 5
set<int> result;
for (i = 0; i < 3; i++) {
    find k such that A[k] is not empty
    if no such k exists then return "no solution"
    pick one set index s from A[k]    // the set we take k from
    result.add(k)
    A[k] = empty
    remove s from every other A[j]    // that set is used up now
}
return result
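A runnable Python rendering of this greedy sketch (the naming is mine); note that because it never backtracks, it can report "no solution" even when one exists, which the matching-based approach below avoids:
def choose_greedy(sets, need=3):
    # A[k] = indices of all sets containing the value k
    A = {}
    for i, s in enumerate(sets):
        for k in s:
            A.setdefault(k, set()).add(i)
    result = set()
    for _ in range(need):
        k = next((k for k in A if A[k]), None)   # any value still available
        if k is None:
            return None                          # no solution found
        used = A[k].pop()                        # the set we take k from
        del A[k]
        for indices in A.values():
            indices.discard(used)                # that set is used up now
        result.add(k)
    return result

print(choose_greedy([{1, 2, 3}, {2, 4}, {1, 2}, {4, 5}, {2, 3, 5}]))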
I think my idea is a bit overkill, but it works on any collection of sets, with any number of sets of any size.
The idea is to transform the sets into a bipartite graph: on one side you have the sets, and on the other side the numbers they contain.
If a set contains a number, there is an edge between those two vertices.
Then you find a maximum matching in the graph (maximum cardinality matching).
This can be done with the Hopcroft–Karp algorithm in O(√V·E) time, or with the simpler Ford–Fulkerson algorithm.
Here are some links with more on maximum matching and the algorithms:
https://en.wikipedia.org/wiki/Matching_(graph_theory)
https://en.wikipedia.org/wiki/Maximum_cardinality_matching
https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm
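As a minimal sketch, networkx's bipartite maximum_matching (an implementation of Hopcroft–Karp) can be applied directly; the node labels and example data here are mine:
import networkx as nx

sets = [{1, 2, 3}, {2, 4}, {1, 2}, {4, 5}, {2, 3, 5}]

G = nx.Graph()
for i, s in enumerate(sets):
    for v in s:
        G.add_edge(("set", i), ("num", v))   # edge iff set i contains v

matching = nx.bipartite.maximum_matching(
    G, top_nodes=[("set", i) for i in range(len(sets))])
pairs = {i: v for (tag, i), (_, v) in matching.items() if tag == "set"}
print(pairs)               # set index -> chosen number
print(len(pairs) >= 3)     # can at least 3 sets get distinct values?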

Way to Calculate Distinct Partitions using Subsets of a Set containing only one Kind of element

We know 3 things:
n (the number of elements in the set)
k (the number of parts)
the set S = {x, x, x, ..., x (n times)} (here x can have any integral value)
We have to find, as a number, how many distinct partitions of the set S into k parts are possible.
Is there any way (a formula or procedure) to find the result from the given values?
EXAMPLES:
Input: n = 3, k = 2
Output: 4
Explanation: Let the set be {0,0,0} (assuming x=0). We can partition
it into 2 subsets in the following ways:
{{0,0}, {0}}, {{0}, {0,0}}, {{0,0,0}, {}},
{{}, {0,0,0}}.
Further, note that {{0,0}, {0}} is made up of 2 subsets, namely {0,0}
and {0}, and it uses x(=0) exactly n(=3) times.
Input: n = 3, k = 1
Output: 1
Explanation: There is only one way {{1, 1, 1}} (assuming x=1)
Note:
I know I used the word set in the problem, but a set is defined as a collection of distinct elements. So you can either consider it a multiset or an array, or assume that a set can hold repeated elements for this particular problem.
I am just trying to use the same terminology as the problem.
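Both examples (the order of the parts matters, and empty parts are allowed) are consistent with counting weak compositions of n into k parts, which stars and bars gives as C(n+k-1, k-1). A minimal sketch, assuming that reading of the problem:
from math import comb

def count_partitions(n, k):
    # Stars and bars: ways to write n as an ordered sum of k
    # non-negative integers (one per part, empty parts allowed)
    return comb(n + k - 1, k - 1)

print(count_partitions(3, 2))  # 4, matching the first example
print(count_partitions(3, 1))  # 1, matching the second example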

Looking for algorithm to match up objects from 2 lists depending on distance

So I have 2 lists of objects, each with a position. I would like to match every object from the first list with an object from the second list.
Once an object from the second list is selected for a match, we remove it from the list (so it cannot be matched with another one). Most importantly, the total sum of distances between the matched objects should be the least possible.
For example:
list1 { A, B, C } list2 { X, Y, Z }
So if I match up A->X (dist: 3 meters), B->Z (dist: 2 meters), C->Y (dist: 4 meters):
Total sum = 3 + 2 + 4 = 9 meters
We could have another match up with A->Y (4 meters), B->X (1 meter), C->Z (3 meters):
Total sum = 4 + 1 + 3 = 8 meters <======= Better solution
Thank you for your help.
Extra: the lists could have different lengths.
This problem is known as the Assignment Problem (a weighted matching in bipartite graphs).
An algorithm which solves this is the Hungarian algorithm. At the bottom of the wikipedia article is also a list of implementations.
If your data has special properties, like your two sets are 2D points and the weight of an edge is the euclidean distance, then there are better algorithms for this.
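A minimal sketch using SciPy's implementation of the assignment problem; three of the distances below were not given in the question and are made up here to complete the example:
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i][j] = distance from list1[i] to list2[j]; A->Z, B->Y and C->X
# were not stated in the question, so those values are invented
cost = np.array([[3, 4, 6],    # A -> X, Y, Z
                 [1, 5, 2],    # B -> X, Y, Z
                 [7, 4, 3]])   # C -> X, Y, Z

rows, cols = linear_sum_assignment(cost)   # recent SciPy also accepts rectangular matrices
for r, c in zip(rows, cols):
    print(f"list1[{r}] -> list2[{c}] (distance {cost[r, c]})")
print("total:", cost[rows, cols].sum())    # 8: A->Y, B->X, C->Z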

Does a data structure like this exist?

I'm searching for a data structure that can be sorted as fast as a plain list and that allows removing elements in the following way. Let's say we have a list like this:
[{2,[1]},
{6,[2,1]},
{-4,[3,2,1]},
{-2,[4,3,2,1]},
{-4,[5,4,3,2,1]},
{4,[2]},
{-6,[3,2]},
{-4,[4,3,2]},
{-6,[5,4,3,2]},
{-10,[3]},
{18,[4,3]},
{-10,[5,4,3]},
{2,[4]},
{0,[5,4]},
{-2,[5]}]
i.e. a list containing tuples (this is Erlang syntax). Each tuple contains a number and a list of the members used to compute that number. What I want to do with the list is the following. First, sort it, then take the head of the list, and finally clean the list. By clean I mean removing from the tail all the elements that contain members of the head, or, in other words, all the elements of the tail whose intersection with the head is not empty. For example, after sorting, the head is {18,[4,3]}. The next step is removing all the elements of the list that contain 4 or 3, i.e. the resulting list should be this one:
[{6,[2,1]},
{4,[2]},
{2,[1]},
{-2,[5]}]
The process continues by taking the new head and cleaning again until the whole list is consumed. Note that if the clean process preserves the order, there is no need to re-sort the list on each iteration.
The bottleneck here is the clean process. I would need some structure which allows me to do the cleaning in a faster way than now.
Does anyone know some structure that allows to do this in an efficient way without losing the order or at least allowing fast sorting?
Yes, you can get faster than this. Your problem is that you are representing the second tuple members as lists. Searching them is cumbersome and quite unnecessary. They are all contiguous substrings of 5..1. You could simply represent them as a tuple of indices!
And in fact you don't even need a list with these index tuples. Put them in a two-dimensional array right at the position given by the respective tuple, and you'll get a triangular array:
h\l| 1 2 3 4 5
---+----------------------
1 | 2
2 | 6 2
3 | -4 -6 -10
4 | -2 -4 18 2
5 | -4 -10 -10 0 -2
Instead of storing the data in a two-dimensional array, you might want to store them in a simple array with some index magic to account for the triangular shape (if your programming language only allows for rectangular two-dimensional arrays), but that doesn't affect complexity.
This is all the structure you need to quickly filter the "list" by simply looking the things up.
Instead of sorting first and getting the head, we simply iterate once through the whole structure to find the maximum value and its indices:
max_val = 18
max = (4, 3) // the two indices
The filter is quite simple. If we don't use lists (not (any (substring `contains`) selection)) or sets (isEmpty (intersect substring selection)) but tuples, then it's just sel.high < substring.low || sel.low > substring.high. And we don't even need to iterate the whole triangular array, we can simply iterate the higher and the lower triangles:
result = []
for (i from 1 until max[1])
    for (j from i until max[1])
        result.push({array[j][i], (j,i)})
for (i from max[0] until 5)
    for (j from i until 5)
        result.push({array[j+1][i+1], (j+1,i+1)})
And you've got the elements you need:
[{ 2, (1,1)},
{ 6, (2,1)},
{ 4, (2,2)},
{-2, (5,5)}]
Now you only need to sort that and you've got your result.
Actually the overall complexity doesn't get better with the triangular array. You still have O(n) from building the structure and finding the maximum. Whether you filter in O(n) by testing against every substring index tuple, or in O(|result|) by smart selection, no longer matters, but you were specifically asking about a fast cleaning step. This still might be beneficial in practice if the data is large, or when you need to do multiple cleanings.
The only thing affecting overall complexity is to sort only the result, not the whole input.
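A small Python rendering of the interval idea (the naming is mine): each contiguous run [h, ..., l] becomes the pair (h, l), and cleaning is a plain overlap test. It scans for the maximum instead of indexing the triangle, but the filter is the same:
data = {(1, 1): 2, (2, 1): 6, (3, 1): -4, (4, 1): -2, (5, 1): -4,
        (2, 2): 4, (3, 2): -6, (4, 2): -4, (5, 2): -6,
        (3, 3): -10, (4, 3): 18, (5, 3): -10,
        (4, 4): 2, (5, 4): 0, (5, 5): -2}

def consume(data):
    while data:
        # take the "head": the entry with the maximum value
        (h, l), v = max(data.items(), key=lambda kv: kv[1])
        yield v, (h, l)
        # keep only entries whose range misses [l, h] entirely
        data = {(hi, lo): val for (hi, lo), val in data.items()
                if hi < l or lo > h}

print(list(consume(data)))   # [(18, (4, 3)), (6, (2, 1)), (-2, (5, 5))]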
I wonder if your original data structure can be seen as an adjacency list for a directed graph? E.g.:
{2,[1]},
{6,[2,1]}
means you have these nodes and edges:
node 2 => node 1
node 6 => node 2
node 6 => node 1
So your question can be rewritten as:
If I find a node that links to nodes 4 and 3, what happens to the graph if I delete nodes 4 and 3?
One approach would be to build an adjacency matrix: an NxN bit matrix where every edge is a 1 bit. Your problem now becomes:
set every bit in the 4-row, and every bit in the 4-column, to zero.
That is, nothing links in or out of this deleted node.
As an optimisation, keep a bit array of length N. The bit is set if the node hasn't been deleted. So if nodes 1, 2, 4, and 5 are 'live' and 3 and 6 are 'deleted', the array looks like
[1,1,0,1,1,0]
Now to delete '4', you just clear the bit:
[1,1,0,0,1,0]
When you're done deleting, go through the adjacency matrix, but ignore any edge that's encoded in a row or column whose mask bit is 0.
Full example. Let's say you have
[ {2, [1,3]},
{3, [1]},
{4, [2,3]} ]
That's the adjacency matrix
1 2 3 4
1 0 0 0 0 # no entry for 1
2 1 0 1 0 # 2, [1,3]
3 1 0 0 0 # 3, [1]
4 0 1 1 0 # 4, [2,3]
and the mask
[1 1 1 1]
To delete node 2, you just alter the mask:
[1 0 1 1]
Now, to recover the structure, use pseudocode like:
rows = []
for r in 1..4:
    if mask[r] == false:
        # this row was deleted
        continue
    targets = []
    for c in 1..4:
        if mask[c] == true && matrix[r,c]:
            # this node wasn't deleted and the edge was there before
            targets.add(c)
    if (!targets.empty):
        rows.add({ r, targets })
Adjacency matrices can get large - it's NxN bits, after all - so this will only be better on small, dense matrices, not large, sparse ones.
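A small Python sketch of the matrix-plus-mask idea (the naming is mine), using the full example above:
N = 4
matrix = [[False] * (N + 1) for _ in range(N + 1)]   # 1-based indices
for node, targets in [(2, [1, 3]), (3, [1]), (4, [2, 3])]:
    for t in targets:
        matrix[node][t] = True

mask = [True] * (N + 1)
mask[2] = False                       # "delete" node 2 in O(1)

rows = []
for r in range(1, N + 1):
    if not mask[r]:
        continue                      # row belongs to a deleted node
    targets = [c for c in range(1, N + 1) if mask[c] and matrix[r][c]]
    if targets:
        rows.append((r, targets))
print(rows)                           # [(3, [1]), (4, [3])]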
If this isn't great, you might find that it's easier to google for graph algorithms than invent them yourself :)

Efficient Algorithm for Comparing parts of lists containing sets

In my application I need to compare parts of lists of sets to see if they contain the same elements. I have basically the following structure:
List 1:
Index  Set
1      (1,5)
2      (3,7)
3      ()
4      (1,9,15)
I have about 20 lists, and more than a thousand sets in each list. The sets in a list can be empty or can contain up to hundreds of elements.
I need to create the union of those sets for different intervals of my lists.
So, for example, I want to compare intervals of the former list with the following list:
List 2:
Index  Set
1      (3,6,9)
2      (2)
3      (20)
Comparing the interval of List 1 from index 2 to 4 with the interval of List 2 from index 1 to 2 should give (3,9).
Currently I use a brute force method, simply running through both lists and comparing each set. Is there a more efficient solution?
Thanks in advance
One approach could be to create, for each such list, an auxiliary list that holds at each index a histogram of the elements that have appeared in the sets up to that index.
In your example:
Index  Histogram
1      [1=1, 5=1]
2      [1=1, 3=1, 5=1, 7=1]
3      [1=1, 3=1, 5=1, 7=1]
4      [1=2, 3=1, 5=1, 7=1, 9=1, 15=1]
Now, given two indices i,j, you can create the union set of the sets at indices i,i+1,...,j by taking two histograms, hist1=list[i-1] and hist2=list[j], and returning all elements x such that hist1.get(x) < hist2.get(x). This yields the union set without actually iterating the list.
For example, in the above list, if you want to find the union set for indices 2,3,4:
hist1=list[1] = [1=1, 5=1]
hist2=list[4] = [1=2, 3=1, 5=1, 7=1, 9=1, 15=1]
hist2-hist1 = [1=2-1, 3=1-0, 5=1-1, 7=1-0, 9=1-0, 15=1-0] =
= [1=1, 3=1, 5=0, 7=1, 9=1, 15=1]
union_set = {1,3,7,9,15}
This approach is especially useful when sets are considerably smaller than the lists, which seems to be your case.
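A minimal sketch of the prefix-histogram idea using collections.Counter (the naming is mine): prefix[i] counts everything seen in sets 1..i, so the union over indices i..j is the set of x with prefix[j][x] > prefix[i-1][x]:
from collections import Counter
from itertools import accumulate

def prefix_histograms(sets):
    # prefix[0] is empty; prefix[i] is the histogram of sets 1..i
    return list(accumulate(map(Counter, sets), initial=Counter()))

def interval_union(prefix, i, j):   # 1-based, inclusive
    return {x for x in prefix[j] if prefix[j][x] > prefix[i - 1][x]}

list1 = [{1, 5}, {3, 7}, set(), {1, 9, 15}]
list2 = [{3, 6, 9}, {2}, {20}]
p1, p2 = prefix_histograms(list1), prefix_histograms(list2)
print(interval_union(p1, 2, 4) & interval_union(p2, 1, 2))   # {9, 3}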
