Ways to get combination from set of sets - algorithm

I have a set of sets with possible repeats across sets, say {{1, 2, 3, 7}, {1, 2, 4, 5}, {1, 3, 5, 6}, {4, 5, 6}}. I want to know if I can get a specific combination, say {1, 2, 3, 4} by choosing one element from each set, and if so, I want to know how many ways I can do this.
I can do this via bruteforcing (finding all ways to get the first element, then ways to get the second element, and so on), but this seems rather painful and inefficient. Is there a better way to go about this?

You can reduce your problem to maximum bipartite matching (in fact, the two are equivalent).
On the left side you have the elements of your target combination. On the right side you have your sets. Connect an element on the left with a set on the right iff the set contains that element.
Now you can apply an algorithm like Hopcroft-Karp (https://de.wikipedia.org/wiki/Algorithmus_von_Hopcroft_und_Karp) to find a maximum matching. If it is as big as your target combination, you have an assignment as requested; otherwise you do not.
Counting the number of such matchings is NP-hard, see https://www.sciencedirect.com/science/article/pii/S0012365X03002048: "But the enumeration problem for perfect matchings in general graphs (even in bipartite graphs) is NP-hard."
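As a minimal sketch of the reduction, here is the simpler Kuhn augmenting-path algorithm in place of Hopcroft-Karp (same answer, worse asymptotics), plus the brute-force count for comparison; function names are illustrative, not from any library.

```python
from itertools import product

def has_assignment(sets, target):
    """True iff each element of `target` can be drawn from a distinct
    set in `sets` (bipartite matching via Kuhn's augmenting paths;
    Hopcroft-Karp is asymptotically faster but longer to write)."""
    target = list(target)
    match = {}  # set index -> index into target

    def augment(e, seen):
        for s, members in enumerate(sets):
            if target[e] in members and s not in seen:
                seen.add(s)
                # set s is free, or its current element can be rematched
                if s not in match or augment(match[s], seen):
                    match[s] = e
                    return True
        return False

    return all(augment(e, set()) for e in range(len(target)))

def count_ways(sets, target):
    """Brute-force count of choice tuples (one element per set) that
    form exactly the target combination; fine for small inputs, but
    counting is hard in general, as noted above."""
    want = sorted(target)
    return sum(1 for pick in product(*sets) if sorted(pick) == want)

sets = [{1, 2, 3, 7}, {1, 2, 4, 5}, {1, 3, 5, 6}, {4, 5, 6}]
print(has_assignment(sets, {1, 2, 3, 4}))  # True
print(count_ways(sets, {1, 2, 3, 4}))      # 3
```

For the example from the question there are 3 ways: (2,1,3,4), (3,2,1,4), and (1,2,3,4) reading one pick per set.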

Restore the original order based on many incomplete ordered sets

Let's say my original data is
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
It got corrupted and all I have is a few incomplete sets where the order is valid but not all elements are present.
1, 4, 6, 7, 8, 11, 12
1, 2, 4, 5, 6, 9, 10, 12
2, 4, 7, 9, 10, 11
4, 7, 9, 12
etc.
I also have the list of all original elements without any order.
I need to restore as much original data as possible. I have no guarantee that I have enough information to restore everything. I need to make the most sense of what I have and figure out which parts are reliable.
There may be complications (but I'd solve the problem without them first):
Order of the incomplete sets is mostly valid but may have a few mistakes here and there, it's written by humans.
I may have additional information for every pair of elements in the incomplete sets like
"there is certainly nothing between 5 and 6",
"there is certainly something else between 7 and 12 but I'm not sure how many and what exactly",
"there may be or may not be anything between 3 and 4",
"there is exactly one unknown item between 7 and 9"
I'd like to incorporate that information into the algorithm to restore more data.
My best idea so far:
Use the incomplete arrays in a comparison function like this: conclude that A > B if there exists an incomplete set in which B precedes A. If there is no set in which both A and B are present, report A == B.
What I don't like about this is that I have no idea which parts are completely restored and which are effectively random. To mitigate that, I'm going to shuffle the original list of elements, sort it again, and see which elements change place and which don't, repeating this a few thousand times. (The number of elements in the list is < 50, so I can use the most bulldozerish methods on this problem.)
Any better suggestions?
Build a directed graph from your incomplete sets and perform a topological sort.
Some errors may show up as cycles (a directed acyclic graph contains no cycles).
Build a directed graph as MBo suggests. If the resulting graph is a directed acyclic graph (DAG), that counts as validation of the original data, and it's possible to perform a topological sort to recover information about the original order.
Some of the additional information can be incorporated into the graph, if you think all your information is reliable. For example, "there is certainly nothing between 5 and 6" means (if it is understood that 5 -> 6) that every edge of the graph from v to 6 (v not equal to 5) can be replaced by an edge from v to 5. "There may or may not be anything between 3 and 4": all this tells us is 3 -> 4, if it even says that much.
The other information is harder to use. "Something between 7 and 12" can be incorporated into the digraph as 7 -> 12, but the "something" part can't, as far as I can see. There might be a way to use it by enlarging the graph to include "something" vertices, but I can't get it to work. Instead, I recommend getting your topsort algorithm to spit out every topsort (provided there aren't too many) and evaluating them by how many additional constraints they are consistent with. As a bonus you'll find out how many different answers are possible. You can also use it while you are topsorting, e.g., if you're looking for an item to come immediately after 7, don't pick 12, but that feels messy to me, and you won't get any outcome at all if the information is contradictory.
If the resulting graph is not a DAG, you can still separate it into strongly connected components (e.g., Tarjan's algorithm). The strongly connected components are the parts that are not reliable. The strongly connected components will themselves form a DAG which can be topsorted, but each component of size bigger than 1 vertex will need some further special treatment. One way to handle this is to try to find a minimum feedback arc set, i.e., the minimum number of edges to eliminate in a strongly connected component to turn it into a DAG. The minimum feedback arc set problem is NP-hard, but the problem is "fixed-parameter tractable": http://dl.acm.org/citation.cfm?doid=1411509.1411511. Less reasoned approaches will probably work, too, like identifying a cycle and removing a random edge in the cycle until there are no more cycles.
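The graph-building and topological-sort steps can be sketched with Python's standard-library graphlib (3.9+); the SCC and feedback-arc-set handling described above would need additional machinery. `restore_order` is an illustrative name, not from the question.

```python
from graphlib import TopologicalSorter, CycleError

def restore_order(sequences):
    """Build a precedence digraph from the incomplete ordered sets and
    topologically sort it.  Returns one consistent order, or None when
    the inputs contradict each other (the graph contains a cycle)."""
    preds = {}  # node -> set of nodes known to come before it
    for seq in sequences:
        for earlier, later in zip(seq, seq[1:]):
            preds.setdefault(earlier, set())
            preds.setdefault(later, set()).add(earlier)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None

print(restore_order([[1, 4, 6], [1, 2, 4], [2, 4, 6]]))  # e.g. [1, 2, 4, 6]
print(restore_order([[1, 2], [2, 1]]))                   # None
```

Note that a topological order is generally not unique; enumerating all of them (as suggested above for scoring against the extra constraints) requires a backtracking variant rather than `static_order`.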
Another possible algorithm: take the longest sequence and, for each pair of neighboring symbols in it (the beginning and end included), compare against all the other sequences. If another sequence contains something between that pair, insert it into the longest sequence and start over; if nothing can be added, check the next pair.

Split Algorithm in C++

I have an array with 8 elements:
a[8] = {9, 7, 6, 2, 3, 1, 5, 4}
I want to divide the 8 elements into 3 groups. Each group contains one or more elements, and the sums of the three groups should be as close to each other as possible.
You are describing the k-partition problem with k=3.
Unfortunately, this problem is known to be (strongly) NP-hard, so there is no known efficient solution to it (and the general belief is that one does not exist).
Your best hope will be brute-force search: create all partitions into 3 groups and choose the best one. With 8 elements that is feasible, but it will quickly become too slow for larger arrays, I am afraid.
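A brute-force sketch (in Python rather than C++, for brevity): try all 3^n assignments of elements to groups and keep the one with the smallest spread between group sums. The function name and the spread criterion (max sum minus min sum) are illustrative choices.

```python
from itertools import product

def best_3_partition(a):
    """Exhaustively try all 3^len(a) group assignments and return the
    grouping whose sums are closest together.  Feasible for 8 elements
    (6561 assignments); exponential in general, as noted above."""
    best, best_spread = None, float('inf')
    for assign in product(range(3), repeat=len(a)):
        if len(set(assign)) < 3:      # every group must be non-empty
            continue
        sums = [sum(x for x, g in zip(a, assign) if g == i) for i in range(3)]
        spread = max(sums) - min(sums)
        if spread < best_spread:
            best, best_spread = assign, spread
    groups = [[x for x, g in zip(a, best) if g == i] for i in range(3)]
    return groups, best_spread

groups, spread = best_3_partition([9, 7, 6, 2, 3, 1, 5, 4])
print(groups, spread)  # sums 12/12/13, spread 1 (total 37 is not divisible by 3)
```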

Overlapping Intervals and Minimum Values

How can you determine whether there exists a set of values that satisfies certain given criteria? The criteria are in the form of intervals and the minimum value within each interval.
For instance, given the criteria:
Interval : Minimum value in that interval
{2, 2} : 5
{1, 4} : 1
{4, 4} : 4
One set of values that can satisfy it is
{1, 5, 1, 4}
On the other hand, given the criteria:
Interval : Minimum value in that interval
{2, 3} : 1
{1, 4} : 2
{4, 4} : 4
There exists no such set of values that satisfies them.
I want to determine whether there exists a set of values that satisfy the given criteria (i.e., I only want to find an algorithm that returns true if there exists a set of values that do satisfy the criteria and false if there does not).
I know how to do this using an O(N^2) brute-force, but I want to achieve an O(N lgN) solution if possible.
My first attempt at solving this involved merging overlapping intervals and then checking the merged intervals for the lowest value, but I quickly realized that this does not necessarily guarantee a correct answer.
My second attempt involved segment trees, namely assigning a value to each position and concluding that no valid set existed whenever an interval's values would have to be overwritten. However, this was also quickly abandoned, since you can still arrive at a valid set of values even when some portions are overwritten.
My third attempt involved interval trees: finding the intersection points between pairs of intervals and checking whether a valid set of values could be created. This seemed promising, but it was an O(N^2) algorithm, so it was also abandoned.
Can anyone provide some insight?
The idea of assigning values is correct. You just need to sort the intervals by their minimum value (in increasing order). That is, the solution looks like this:
Build a segment tree (with -infinity values in all nodes).
Process all intervals in sorted order. For each interval, simply assign its value (no matter what was there before).
Run queries for all intervals to check that everything is correct.
The only non-trivial statement is: if this algorithm does not find a solution, there is no solution. I will not post a formal proof, but here are the key observations:
We must assign a new value to the entire interval when we process the intervals in sorted order.
We do not assign a new value anywhere else (that is, we cannot accidentally destroy the value for another interval).
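The steps above can be sketched with a plain array standing in for the segment tree: assignment and querying then cost O(N·L) instead of O(N log L), but the logic is identical. Intervals are taken as 1-based and inclusive, matching the examples in the question.

```python
def satisfiable(constraints, n):
    """Check whether values at positions 1..n exist meeting a list of
    ((lo, hi), m) constraints, each saying the minimum on [lo, hi] is m.
    Plain-array stand-in for the segment tree: assign in increasing
    order of m (later, larger minima overwrite), then verify that every
    interval's minimum comes out exactly m."""
    vals = [float('-inf')] * (n + 1)   # index 0 unused
    for (lo, hi), m in sorted(constraints, key=lambda c: c[1]):
        for i in range(lo, hi + 1):
            vals[i] = m
    return all(min(vals[lo:hi + 1]) == m
               for (lo, hi), m in constraints)

print(satisfiable([((2, 2), 5), ((1, 4), 1), ((4, 4), 4)], 4))  # True
print(satisfiable([((2, 3), 1), ((1, 4), 2), ((4, 4), 4)], 4))  # False
```

The second example fails exactly as the question predicts: the minimum 2 over [1, 4] overwrites the interval [2, 3], whose required minimum 1 can then no longer appear.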

canceling arrays by number of items that I am ready to lose

We are writing a C# program that will help us remove some unnecessary data repeaters, and we have already found some repeaters to remove with the help of Finding overlapping data in arrays. Now we want to check whether we can cancel some repeaters by another criterion. The question is:
We have arrays of numbers
{1, 2, 3, 4, 5, 6, 7, ...}, {4, 5, 10, 100}, {100, 1, 20, 50}
some numbers can be repeated across arrays, while others are unique and belong to only one specific array. We want to remove some arrays when we are ready to lose up to N numbers from them.
Explanation:
{1, 2}
{2, 3, 4, 5}
{2, 7}
We are ready to lose up to 3 numbers from these arrays. That means we can remove array 1, because we would lose only the number "1", its only unique number. We could also remove arrays 1 and 3 together, losing only "1" and "7", or just array 3, losing only "7". In each case we lose no more than 3 numbers.
As output we want the maximum number of arrays that can be removed, under the condition that we lose at most N numbers, where N is the number of items we are ready to lose.
This problem is equivalent to the Set Cover problem (e.g.: take N=0) and thus efficient, exact solutions that work in general are unlikely. However, in practice, heuristics and approximations are often good enough. Given the similarity of your problem with Set Cover, the greedy heuristic is a natural starting point. Instead of stopping when you've covered all elements, stop when you've covered all but N elements.
You first need a number for each array telling you how many numbers are unique to that particular array.
An easy way to do this is O(n²), since for each element you need to check through all arrays whether it is unique.
You can do this much more efficiently by keeping the arrays sorted, sorting first, or using a heap-like data structure.
After that, you only have to find a selection of arrays whose unique-element counts sum to at most N. That is similar to the subset sum problem, but much less complex, because N > 0 and all your counts are > 0.
So you simply sort these counts from smallest to greatest and iterate over the sorted list, taking counts as long as the running sum stays ≤ N.
Finally, you can remove every array whose count you were able to fit into N.
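A sketch of that greedy procedure (in Python rather than C#, for brevity). One caveat worth flagging: summing per-array unique counts ignores elements shared only among the removed arrays, so this is a heuristic sketch of the answer above, not an exact solution.

```python
from collections import Counter

def removable_arrays(arrays, n):
    """Count how many elements are unique to each array, then greedily
    remove the arrays with the fewest unique elements while the total
    lost stays <= n.  Returns the (0-based) indices of removed arrays.
    Heuristic: elements shared only among removed arrays are not
    re-counted as lost."""
    counts = Counter(x for arr in arrays for x in set(arr))
    unique = [(sum(1 for x in set(arr) if counts[x] == 1), i)
              for i, arr in enumerate(arrays)]
    removed, lost = [], 0
    for u, i in sorted(unique):        # cheapest arrays first
        if lost + u <= n:
            removed.append(i)
            lost += u
    return removed

print(removable_arrays([[1, 2], [2, 3, 4, 5], [2, 7]], 3))  # [0, 2]
```

On the question's example this removes arrays 1 and 3 (indices 0 and 2), losing only "1" and "7", matching the expected output.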

In a set of frequency values, how to find some optimal subsets of harmonically related frequency values?

I have a set of frequency values and I'd like to find the most likely subsets of values meeting the following conditions:
values in each subset should be harmonically related (approximately multiples of a given value)
the number of subsets should be as small as possible
every subset should have as few missing harmonics below its highest value as possible
E.g. [1,2,3,4,10,20,30] should return [1,2,3,4] and [10,20,30] (a set with all the values is not optimal because, even if they are harmonically related, there are many missing values)
The brute-force method would be to compute all possible subsets of the set and evaluate some cost for each, but that would take far too long.
Is there any efficient algorithm to perform this task (or something similar)?
I would reduce the problem to minimum set cover, which, although NP-hard, often is efficiently solvable in practice via integer programming. I'm assuming that it would be reasonable to decompose [1, 2, 3, 4, 8, 12, 16] as [1, 2, 3, 4] and [4, 8, 12, 16], with 4 repeating.
To solve set cover (well, to use stock integer-program solvers, anyway), we need to enumerate all of the maximal allowed subsets. If the fundamental (i.e., the given value) must belong to the set, then, for each frequency, we can enumerate its multiples in order until too many in a row are missing. If not, we try all pairs of frequencies, assume that their fundamental is their approximate greatest common divisor, and extend the subset downward and upward until too many frequencies are missing.
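The candidate-enumeration step (fundamental belongs to the set) might look like the sketch below. The tolerance `tol` and the allowed run of missing harmonics `max_gap` are illustrative parameters not given in the question, and the output would still need to go through a set-cover solver to pick the smallest covering family.

```python
def harmonic_subsets(freqs, tol=0.05, max_gap=2):
    """Enumerate candidate subsets of approximately harmonically related
    values, one per candidate fundamental f: walk through the multiples
    k*f, collecting values within a relative tolerance, and stop once
    more than `max_gap` consecutive harmonics are missing."""
    freqs = sorted(freqs)
    subsets = []
    for f in freqs:
        subset, gap, k = [], 0, 1
        while gap <= max_gap and k * f <= freqs[-1] * (1 + tol):
            hits = [x for x in freqs if abs(x - k * f) <= tol * k * f]
            if hits:
                subset.extend(hits)
                gap = 0
            else:
                gap += 1
            k += 1
        if len(subset) > 1:
            subsets.append(subset)
    return subsets

print(harmonic_subsets([1, 2, 3, 4, 10, 20, 30]))
```

On the question's example this yields [1, 2, 3, 4] and [10, 20, 30] among the candidates (plus a spurious candidate such as [2, 4, 10], which the set-cover objective of minimizing the number of subsets would discard).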
