Pecking order of pigeons? - algorithm

I was going through problems on graph theory posted by Prof. Ericksson from my alma mater and came across this rather unique question about pigeons and their innate tendency to form pecking orders. The question goes as follows:
Whenever groups of pigeons gather, they instinctively establish a pecking order. For any pair of pigeons, one pigeon always pecks the other, driving it away from food or potential mates. The same pair of pigeons always chooses the same pecking order, even after years of separation, no matter what other pigeons are around. Surprisingly, the overall pecking order can contain cycles—for example, pigeon A pecks pigeon B, which pecks pigeon C, which pecks pigeon A.
Prove that any finite set of pigeons can be arranged in a row from left to right so that every pigeon pecks the pigeon immediately to its left.
Since this is a question on graph theory, the first thing that crossed my mind was: is this just asking for a topological sort of the graph of relationships (the relationships being the pecking order)? What made it a little more complex was the fact that there can be cyclic relationships between the pigeons. Suppose we have a cyclic dependency as follows:
A->B->C->A
where A pecks B, B pecks C, and C goes back and pecks A.
If we represent it in the way suggested by the problem, we have something as follows:
C B A
But this row ordering does not factor in the pecking order between C and A.
I had another idea: solve it by mathematical induction, where the base case is two pigeons arranged according to their pecking order; assume the arrangement is valid for n pigeons, and then prove it holds for n+1 pigeons.
I am not sure if I am going down the wrong track here. Some insight into how I should be analyzing this problem would be helpful.
Thanks

I would indeed prove it using induction (a > b means a pecks b):
For k = 2 it obviously holds.
Assume that for k = n the required order always exists; let's prove it exists for n+1. Choose any n of the given n+1 pigeons and order them (A1 > A2 > ... > An), and let C be the (n+1)th pigeon.
If C pecks A1, it can be added to the start of the "line" and the statement is proved. If A1 pecks C, then compare C with A2: if C pecks A2, it can be inserted between A1 and A2 and the statement holds. If not, repeat this comparison process down to the last pair, A(n-1) and An; by the time we get there we know A(n-1) > C. If C > An, then C can be inserted between A(n-1) and An; if not, it can be appended to the end of the "line".
qed
P.S. Note that "pecking cycles" do not necessarily exist: if we assign the pigeons numbers from 1 to n and say that a pigeon pecks another if its number is greater, then we can obviously order them in a line, but not in a circle, so that each pigeon pecks its left neighbour.
P.P.S. That proof also gives an algorithm to construct the required order.
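For concreteness, here is a minimal Python sketch of that insertion algorithm (the names are mine; pecks(a, b) is assumed to return True iff a pecks b, with exactly one of pecks(a, b) and pecks(b, a) true for any distinct pair):

def peck_line(pigeons, pecks):
    # Build a line where line[i] pecks line[i+1], inserting pigeons one at
    # a time, exactly as in the inductive step above.
    line = []
    for c in pigeons:
        for i, p in enumerate(line):
            if pecks(c, p):      # c can stand immediately before p
                line.insert(i, c)
                break
        else:                    # everyone already in the line pecks c
            line.append(c)
    return line

Reversing the returned list gives the problem's convention of every pigeon pecking the one immediately to its left. Each insertion scans at most n pigeons, so the whole construction takes O(n^2) comparisons.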

Have you considered constructing a directed graph and then looking for a Hamiltonian path that visits every vertex (pigeon) exactly once? The Hamiltonian path should reveal the sequence. This isn't a proof, though, just a solution.

Related

Algorithm for random sampling under multiple no-repeat conditions

I ran into the following issue:
So, I have an array of 100-1000 objects (the size varies), e.g. something like
[{one:1,two:'A',three: 'a'}, {one:1,two:'A',three: 'b'}, {one:1,two:'A',three: 'c'}, {one:1,two:'A',three: 'd'},
{one:1,two:'B',three: 'a'},{one:2,two:'B',three: 'b'},{one:1,two:'B',three: 'c'}, {one:1,two:'B',three: 'd'},
{one:1,two:'C',three: 'a'},{one:1,two:'C',three: 'b'},{one:1,two:'C',three: 'c'}, {one:2,two:'C',three: 'd'},
{one:1,two:'D',three: 'a'},{one:1,two:'D',three: 'b'},{one:2,two:'D',three: 'c'}, {one:1,two:'D',three: 'd'},...]
The value for 'one' is pretty much arbitrary. 'two' and 'three' have to be balanced in a certain way: in the above, there is some n such that each of 'A', 'B', 'C', 'D' and each of 'a', 'b', 'c', 'd' occurs exactly n = 4 times, and such an n exists in any variant of this problem. It is just not clear what n is, and the combinations themselves can also vary (e.g. if we only had As and Bs, [{1,A,a},{1,A,a},{1,B,b},{1,B,b}] as well as [{1,A,a},{1,A,b},{1,B,a},{1,B,b}] would both be possible arrays, with n = 2).
What I am trying to do now is randomise the original array under the condition that there cannot be close-order repeats for some keys, i.e. the values of 'two' and 'three' for the object at index i-1 cannot equal the corresponding values for the object at index i (and that should hold for all, or as many as possible, of the objects). For example, [{1,B,a},{1,A,a},{1,C,b}] would not be allowed, while [{1,B,a},{1,C,b},{1,A,a}] would be allowed.
I tried a brute-force method (randomise everything, then push wrong indexes to the back) that works rarely; mostly it just loops infinitely over the whole array because it never ends up without repeats. I am not sure if that is because it is mathematically impossible for some original arrays, or just because my solution sucks.
By now, I've been looking for over a week, and I am not even sure how to approach this.
It would be great if someone knew a solution to this problem, or at least a reason why it isn't possible. Any help is greatly appreciated!
First, let us dissect the problem.
Forget about one for now, and separate two and three into two independent sequences (assuming they are indeed independent and not tied to each other).
The underlying problem is then as follows.
Given is a collection of c1 As, c2 Bs, c3 Cs, and so on. Place them randomly in such a way that no two consecutive letters are the same.
The trivial approach is as follows.
Suppose we already placed some letters, and are left with d1 As, d2 Bs, d3 Cs, and so on.
What is the condition when it is impossible to place the remaining letters?
It is when the count for one of the letters, say dk, is greater than one plus the sum of all other counts, 1 + d1 + d2 + ... excluding dk.
Otherwise, we can place them as K . K . K . K ..., where K is the k-th letter, and dots correspond to any letter except the k-th.
We can proceed at least as long as dk is still the greatest of the remaining quantities of letters.
So, on each step, if there is a dk equal to 1 + d1 + d2 + ... excluding dk, we should place the k-th letter right now.
Otherwise, we can place any other letter and still be able to place all others.
If there is no immediate danger of being unable to continue, adjust the probabilities to your liking; for example, weight the k-th letter by dk instead of using uniform probabilities over all remaining letters.
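Here is a Python sketch of that strategy (the function and variable names are mine; key picks out the attribute being balanced, e.g. 'two' or 'three'):

import random

def constrained_shuffle(items, key):
    # Group the items by key value and track the remaining counts.
    pools = {}
    for item in items:
        pools.setdefault(key(item), []).append(item)
    for pool in pools.values():
        random.shuffle(pool)
    counts = {k: len(p) for k, p in pools.items()}
    result, prev = [], None
    while counts:
        total = sum(counts.values())
        # Infeasible if some count exceeds 1 + (all the others combined).
        if any(2 * c > total + 1 for c in counts.values()):
            return None
        # If some letter's count equals 1 + the rest, it must be placed now.
        forced = [k for k, c in counts.items() if 2 * c == total + 1]
        if forced:
            k = forced[0]
            if k == prev:
                return None
        else:
            choices = [k for k in counts if k != prev]
            # Weight by remaining count, as suggested above.
            k = random.choices(choices, [counts[c] for c in choices])[0]
        result.append(pools[k].pop())
        counts[k] -= 1
        if not counts[k]:
            del counts[k]
        prev = k
    return result

Note that 2*dk > total + 1 is just dk > 1 + (sum of the other counts) rewritten to avoid recomputing the sum. Handling both constrained keys at once would need a stronger feasibility test, so treat this as a sketch for a single key.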
This problem smells of NP complete and lots of hard combinatorial optimization problems.
Just to find a solution, I'd always place next the remaining element beside which the fewest other remaining elements can be placed. In other words, try to get the hardest elements out of the way first: if they run into a problem, you're stuck; if that works, you're golden. (There are data structures, like a heap, which can find those fairly efficiently.)
Now, armed with a "good enough" solver, I'd suggest picking the first element randomly until the solver can solve the rest, and repeating. If at any point you find it takes too many guesses, just go with what the solver did last time. That way you know all along that there IS a solution, even though you are trying to do things randomly at every step.
Graph
As I understand it, one does not play a role in the constraints, so I'll label {one:1,two:'A',three: 'a'} as Aa. Thinking of the objects as vertices, place them on a graph, with an edge whenever the two respective vertices can be beside each other. For [{1,A,a},{1,A,a},{1,B,b},{1,B,b}] this gives a 4-cycle (each Aa vertex adjacent to each Bb vertex), and for [{1,A,a},{1,A,b},{1,B,a},{1,B,b}] it gives two disconnected edges, Aa–Bb and Ab–Ba.
The problem becomes: select a random Hamiltonian path, if possible. For the cycle, it is any path along the circuit [Aa, Bb, Aa, Bb] or its reverse. For the disconnected edges, it is not possible.
Possible algorithm
I think that, to be uniformly random, we would have to enumerate all the possibilities and choose one at random. This is probably infeasible, even at 100 vertices.
A naïve algorithm that relaxes the uniformity criterion would be: select (a), a random vertex that does not split the graph in two; then select (b), a random neighbour of (a) that does not split the graph in two. Move (a) to the solution, set (a) = (b), and keep going until the end, backtracking when there are no moves (if possible). There may be further heuristics that could cut down the branching factor. A rough sketch of the path search appears after the example below.
Example
There are no vertices that would disconnect the graph, so we choose Ab uniformly at random.
The neighbours of Ab are {Ca, Bc, Ba, Cc} of which Ca is chosen randomly.
Ab splits the graph, so we must choose Bc.
The only choice left is which of Cc and Ba comes first. We might end up with: [Ab, Ca, Bc, Ab, Ba, Cc].
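Here is a rough backtracking sketch of the path search in Python (the names are mine, and it omits the does-not-split-the-graph pruning described above, relying on backtracking alone; adj maps each vertex to the set of vertices allowed beside it):

import random

def random_hamiltonian_path(adj):
    # Depth-first search with randomised branching; backtracks when stuck.
    def extend(path, remaining):
        if not remaining:
            return path
        candidates = list(remaining if not path else adj[path[-1]] & remaining)
        random.shuffle(candidates)
        for v in candidates:
            found = extend(path + [v], remaining - {v})
            if found:
                return found
        return None
    return extend([], set(adj))

With duplicate objects (two Aa's, say) the vertices need distinct identities, e.g. (index, label) pairs, and adding the articulation-point test would cut the branching considerably.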

What is a sorting algorithm that is robust to a faulty comparison?

I want to sort a list of n items with a comparison sort. However, one of the comparisons made by the algorithm will be flipped from what it's supposed to be. Specifically, there is one pair of items for which the comparator function consistently gives the wrong result.
What is an efficient n*log(n) sorting algorithm that will be robust to this faulty comparison? By robust, I mean that every item is off by at most k spots from its true position, for some reasonably small k.
If possible, I'd like it to be robust in the worst case (faulty comparison chosen adversarially), but I'll settle for robust in the average case.
An example of a robust algorithm (that's not efficient) would be to make all n*(n-1)/2 pairwise comparisons and place each item according to how many comparisons it won. Then, no matter which comparison the adversary flips, each item's index will be off by no more than k=1.
An example of a NON-robust algorithm is quicksort, because the adversary could just choose the largest item to be on the wrong side of the first pivot, making it on average n/2 spots off from its correct index.
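For reference, that quadratic baseline fits in a few lines of Python (my names; less is the faulty comparator):

def sort_by_wins(items, less):
    # Rank each item by how many pairwise comparisons it wins. Flipping one
    # comparison changes the two affected win counts by 1, so after a stable
    # sort every item sits within one spot of its true position.
    wins = [sum(less(b, a) for b in items) for a in items]
    return [item for _, item in sorted(zip(wins, items), key=lambda t: t[0])]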
TL;DR: It's possible to modify quicksort to get the following guarantee: in (expected) time O(n log n), we can do one of the following, depending on which comparison is flipped.
Perfectly sort the array.
Perfectly sort the array, except that an adjacent pair of items somewhere in the array is swapped.
Perfectly sort the array, except that three consecutive items in the array, which can be identified, are permuted.
This guarantees a maximum displacement of 2, which is as good as is theoretically possible.
I mulled over this problem for a couple of hours and everything I'm doing connects back to tournaments.
I'd like to begin by trying to reframe the question as follows. If you have a set of n items and you know the "true" results of the comparisons between them, you can represent that result as a directed graph with one node per item and edges indicating when one item compares less than another. This type of digraph is called a "tournament," since you can think of it as encoding the result of a round-robin tournament where each player plays each other player.
In the case of an honest comparator, our tournament will be acyclic, and in particular it will have the following key property: there's exactly one node of each outdegree 0, 1, 2, ..., n - 1. The idea here is that the smallest element will have outdegree n - 1 (it's smaller than everything else), while the largest element will have outdegree 0 (it's bigger than everything else). And in fact, there's a theorem that a tournament is acyclic if and only if each node in the tournament has a different outdegree. Another useful fact: in an acyclic tournament, there's an edge from U to V if and only if outdeg(U) > outdeg(V).
In the case of a "dishonest comparator," we essentially start with an acyclic tournament, then flip a single edge. Your question asked about doing approximate sorting based on this comparator, but I'd like to step back and ask a different question, which I think can then be used to answer yours more precisely. In what cases can you figure out which edge was flipped? If we can do that, then we can do even better than approximate sorting - we can "unflip" the edge and sort perfectly. On the other hand, in which cases can you not figure out which edge was flipped, and when that happens, how far from sorted will we end up? That corresponds to having to do an approximate sort because we can't recover the original ordering.
Here's a useful fact:
Theorem: Begin with an acyclic tournament and flip a single edge. Then it's possible to determine which edge was flipped if and only if the outdegrees of the two endpoints of the flipped edge originally differ by at least three.
To prove this, we'll show both directions of implication.
First, suppose that we flip an edge between two nodes X and Y whose outdegrees differ by one. We're left with a tournament where all nodes still have different outdegrees (every other node's outdegree is unchanged, and X and Y swap outdegrees, since one goes up by one and the other goes down by one). So we have another acyclic tournament, and in particular we can't tell which edge was flipped, because we could just as well have flipped any edge between any pair of nodes whose outdegrees differ by one. That means any sorted ordering we get from this comparator will have those two items out of place, so we'd have a maximum distance of at least 1.
Next, suppose we flip an edge between nodes X and Y where outdeg(X) = k+1 and outdeg(Y) = k-1. We now have outdeg(X) = k = outdeg(Y), and to begin with there must have been some other node Z with outdegree k as well. So at this point we have three nodes of outdegree k (namely X, Y, and Z), and we know that we must have flipped one of the three edges between them, but we can't tell which one: flipping the XY edge, the XZ edge, or the YZ edge would each give back an acyclic tournament. So in this case there's no way to undo the flip.
An important note for this particular case: this corresponds to the comparator creating a tournament with exactly one cycle containing the nodes X, Y, and Z. Specifically, it'll take on the form X, Z, Y, X. The problem is we can't tell whether the original ordering was (X, Z, Y), or (Z, Y, X), or (Y, X, Z), and so we'd have a maximum distance of at least 2.
And finally, suppose that we have two nodes X and Y and flip the edge XY in the case where outdeg(X) = k, outdeg(Y) = m, and k ≥ m + 3. We're now left with a tournament in which two nodes have outdegree k - 1 and two nodes have outdegree m + 1. But of those four nodes, it's guaranteed that there's exactly one pair of them that can be flipped back to produce an acyclic tournament. One way to see this: take the four nodes that now have repeated outdegrees; call them X and Y (as above) and also W and Z, and suppose we have the cycle X, W, Z, Y, X, where the only flipped edge from the original is (Y, X). What will this cycle look like? Well, since (X, W), (W, Z), and (Z, Y) are edges in the tournament that weren't flipped, back in the original tournament we have outdeg(X) > outdeg(W) > outdeg(Z) > outdeg(Y). That means that we have to have X and W having outdegree k - 1 in the new graph and Z and Y having outdegree m + 1 in the new graph. Therefore, only flipping the edge from Y to X will increase the degree of one of the degree-(k-1) nodes back up to k while also decreasing the degree of one of the degree-(m+1) nodes down to m.
Summarizing:
Theorem: The faulty comparator will either
Behave as a real comparator, in which case we swapped two adjacent elements in the original sequence and we will never know which,
Have exactly one cycle of length three of elements whose original ordering can never be known, or
Have a cycle of length four, in which case we can identify which comparison is reversed.
With this in mind, it seems reasonable to reframe your problem in the following way:
Goal: Design an algorithm that, in time O(n log n), does one of the following things to a list of n elements given a faulty comparator that returns the wrong result when comparing two fixed elements X and Y against one another:
Perfectly sort the list.
Perfectly sort the list, except with two adjacent items swapped.
Perfectly sort the list, except with three adjacent items permuted.
Here's one possible algorithm that does this in expected O(n log n) time that's based on quicksort. The basic idea is the following: we run more or less a regular quicksort, at each point in time checking to see whether we found a triangle. If not, then either we're in case (1) or case (2). If we do find a triangle, we see whether we can identify which comparison got reversed. If we can, then we rerun quicksort, except that we "fix" the comparator in this broken case. If we can't, then we're in case (3) and just finish quicksort as usual.
The specific technique we'll use to detect a triangle works like this. Begin with a regular, vanilla quicksort: pick a pivot, partition the array into things less than the pivot and things bigger than the pivot, then recursively sort the two smaller subarrays. However, after doing so, we do one additional step: assuming the subarray we're sorting has three or more elements in it, look at the pivot p and the element just before and just after it (call those s, p, g for "smaller," "pivot," and "greater"). Then if the comparator says s < p < g < s, we've found a triangle. And in fact, we have something stronger.
Suppose that at some point in quicksort comparator does indeed compare X and Y, the mismatched items. We're assuming X < Y, but that the comparator incorrectly reports that Y < X. The only way that two items can be compared in quicksort is if one of them is a pivot element at a time when the other is in the current subarray. Without loss of generality, let's assume that X was the pivot, and that Y was compared against it.
What should happen here, assuming the comparator was honest, is that Y would be found to be larger than X and therefore placed into the "bigger" subarray. But because the comparator is a lying liar who lies, Y instead gets placed into the "smaller" subarray. Now think about where Y will end up when we recursively sort the "smaller" and "bigger" subarrays. Y is in the "smaller" subarray but is actually bigger than X, which means it will compare larger than everything else in that subarray. Consequently, Y will appear just before X. Now look at the items in the "bigger" subarray. There are two options. The first is that in the "real" ordering there's at least one value between X and Y. That value would appear in the "bigger" subarray because it's larger than X, and in particular the first element of the "bigger" subarray would compare smaller than Y. That would mean that Y, then X, then the item immediately after X after sorting would form a triangle. The other option is that X and Y are adjacent in the true sorted ordering, in which case we'd never find out (as mentioned above). This, combined with the above insight, means that
Theorem: Suppose we run quicksort, and after recursively sorting the left and right subarrays we look at the three items consisting of the pivot, the item just before it, and the item just after it to see if they form a triangle. Then if this algorithm detects a triangle, a triangle exists. Moreover, if this algorithm does not detect a triangle, then either (1) no triangle exists or (2) a triangle does exist, but the comparator was never applied to the bad pair (X, Y) and so the sorted order is correct.
With all this said and done, we can state the full algorithm that, in expected O(n log n) time, sorts the array as best as is possible.
function modifiedQuicksort(array, comparator):
    if array has length 0 or 1, return.
    pick a random pivot element from the array.
    use the comparator to form subarrays smaller and greater based on
        how elements compare against the pivot.
    recursively apply modifiedQuicksort to those two arrays.
    if the comparator finds a triangle formed from the last element of
        smaller, the pivot, and the first element of greater, report those
        three items as a triangle.
    return smaller, pivot, greater.

function sortAsBestWeCan(array, comparator):
    run modifiedQuicksort(array, comparator)
    if it didn't report a triangle, return the result of the call.
    otherwise, it reported a triangle A, B, C.
    for each other item D:
        if comparator(A, D) and comparator(D, B) or
           comparator(B, D) and comparator(D, C) or
           comparator(C, D) and comparator(D, A):
            you have found a 4-cycle from A, B, C, and D.
            detect which comparison is reversed.
            use that knowledge plus the comparator and your favorite
                O(n log n)-time sorting algorithm to perfectly sort
                the input array.
    otherwise, those three items are the only triangle, and the
        array is sorted as well as it can be. return it.
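Here is a rough Python rendering of modifiedQuicksort, covering the triangle detection; the 4-cycle repair proceeds as in the pseudocode above. less stands for the possibly-faulty comparator, and the names are mine:

import random

def modified_quicksort(arr, less):
    # Returns (sorted_list, triangle), where triangle is either None or a
    # tuple (s, p, g) of three items forming a comparison cycle.
    if len(arr) <= 1:
        return list(arr), None
    i = random.randrange(len(arr))
    pivot, rest = arr[i], arr[:i] + arr[i + 1:]
    smaller = [x for x in rest if less(x, pivot)]
    greater = [x for x in rest if not less(x, pivot)]
    smaller, t1 = modified_quicksort(smaller, less)
    greater, t2 = modified_quicksort(greater, less)
    triangle = t1 or t2
    if triangle is None and smaller and greater:
        s, g = smaller[-1], greater[0]
        if less(g, s):  # s < p and p < g already hold, so this closes a cycle
            triangle = (s, pivot, g)
    return smaller + [pivot] + greater, triangle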
I think I've thought up a solution.
First, do a first pass with any decent sorting algorithm you want (like quicksort), which should, at worst, result in only one item that's significantly far from where it should be.
Then, choose a width h that's at least 5.
for i from 0 to n-h, we look at the group of h items at i, i+1, ..., i+h-1. We make all h*(h-1)/2 pairwise comparisons in that group, and rearrange them by who won the most comparisons. We then increment i and move onto the next group.
Afterwards, we do the same thing, but going backwards from i=n-h to i=0.
These two extra passes bubble the displaced item up or down into the correct area, and the extra comparisons within each group of h override the single faulty comparison.
The final number of comparisons will be O(n*log(n)) + n*h*(h-1)/2. Not sure how much better you can do.
This method also works (I think) for more than one faulty comparison. All you need to do is make sure that h is large enough to override those faulty comparisons.
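A sketch of those passes in Python (my names; less is the faulty comparator and h the window width):

def window_repair(arr, less, h=5):
    # Reorder each window of width h by pairwise-comparison wins, so the
    # many correct comparisons inside a window outvote the one faulty pair.
    def fix(i):
        window = arr[i:i + h]
        wins = [sum(less(b, a) for b in window) for a in window]
        order = sorted(range(len(window)), key=lambda j: wins[j])
        arr[i:i + h] = [window[j] for j in order]
    for i in range(len(arr) - h + 1):      # forward pass
        fix(i)
    for i in range(len(arr) - h, -1, -1):  # backward pass
        fix(i)
    return arr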

Select the most elements that do not overlap so that the sum of their size is maximized

I'm trying to find an algorithm to the following problem.
Say I have a number of objects A, B, C,...
I have a list of valid combinations of these objects. Each combination is of length 2 or 4.
For eg. AF, CE, CEGH, ADFG,... and so on.
For combinations of two objects, e.g. AF, the length of the combination is 2. For combinations of four objects, e.g. CEGH, the length of the combination is 4.
I can only pick non-overlapping combinations, i.e. I cannot pick AF and ADFG because both require objects 'A' and 'F'. I can pick combinations AF and CEGH because they do not require common objects.
If my solution consists of only the two combinations AF and CEGH, then my objective is the sum of the length of the combinations, which is 2 + 4 = 6.
Given a list of objects and their valid combinations, how do I pick non-overlapping combinations so as to maximize the sum of their lengths? I do not want to formulate it as an IP, because I am working with a problem instance of 180 objects and 10 million valid combinations, and solving an IP with CPLEX is prohibitively slow. I am looking for some other elegant way to solve it. Can I perhaps convert this to a network and solve it with a max-flow algorithm? Or a dynamic program? I am stuck as to how to go about solving this problem.
My first attempt at showing this problem to be NP-hard was wrong, as it did not take into account the fact that only combinations of size 2 or 4 were allowed. However, using Jim D.'s suggestion to reduce from 3-dimensional matching (3DM), we can show that the problem is nevertheless NP-hard.
I'll show that the natural decision problem form of your problem ("Given a set O of objects, and a set C of combinations of either 2 or 4 objects from O, and an integer m, does there exist a subset D of C such that all sets in D are pairwise disjoint, and the union of all sets in D has size at least m?") is NP-hard. Clearly the optimisation problem (i.e., your original problem, where we seek an actual subset of combinations that maximises m above) is at least as hard as this problem. (To see that the optimisation problem is not "much" harder than the decision problem, notice that you could first find the maximum m value for which a solution exists using a binary search on m in which you solve a decision problem at each step, and then, once this maximal m value has been found, solving a series of decision problems in which each combination in turn is removed: if the solution after removing some particular combination is still "YES", then it may also be left out of all future problem instances, while if the solution becomes "NO", then it is necessary to keep this combination in the solution.)
Given an instance (X, Y, Z, T, k) of 3DM, where X, Y and Z are sets that are pairwise disjoint from each other, T is a subset of X*Y*Z (i.e., a set of ordered triples with first, second and third components from X, Y and Z, respectively) and k is an integer, our task is to determine whether there is any subset U of T such that |U| >= k and all triples in U are pairwise disjoint (i.e., to answer the question, "Are there at least k non-overlapping triples in T?"). To turn any such instance of 3DM into an instance of your problem, all we need to do is create a fresh 4-combination from each triple in T, by adding a distinct dummy value to each. The set of objects in the constructed instance of your problem will consist of the union of X, Y, Z, and the |T| dummy values we created. Finally, set m to k.
Suppose that the answer to the original 3DM instance is "YES", i.e., there are at least k non-overlapping triples in T. Then each of the k triples in such a solution corresponds to a 4-combination in the input C to your problem, and no two of these 4-combinations overlap, since by construction their 4th elements are all distinct, and by assumption the original triples are pairwise disjoint. Thus there are at least m = k non-overlapping 4-combinations in the instance of your problem, so the solution to that problem must also be "YES".
In the other direction, suppose that the solution to the constructed instance of your problem is "YES", i.e., there are at least m non-overlapping 4-combinations in C. We can simply take the first 3 elements of each of the 4-combinations (throwing away the fourth) to produce a set of k = m non-overlapping triples in T, so the answer to the original 3DM instance must also be "YES".
We have shown that a YES-answer to one problem implies a YES-answer to the other, thus a NO-answer to one problem implies a NO-answer to the other. Thus the problems are equivalent. The instance of your problem can clearly be constructed in polynomial time and space. It follows that your problem is NP-hard.
You can reduce this problem to the maximum weighted clique problem, which is, unfortunately, NP-hard.
Build a graph such that every combination is a vertex with weight equal to the length of the combination, and connect two vertices if the corresponding combinations do not share any object (i.e. if you can pick both of them at the same time). Then a solution is valid if and only if it is a clique in that graph.
A simple search on Google brings up a lot of approximation algorithms for this problem, such as this one.
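For illustration, constructing that graph might look like the sketch below (my names; note that with 10 million combinations even this quadratic construction is infeasible, which is part of why the reduction doesn't yield a practical algorithm):

def build_clique_graph(combinations):
    # One weighted vertex per combination; an edge joins two combinations
    # exactly when they share no object, so valid selections are cliques.
    sets = [frozenset(c) for c in combinations]
    weights = [len(s) for s in sets]
    edges = [(i, j)
             for i in range(len(sets))
             for j in range(i + 1, len(sets))
             if sets[i].isdisjoint(sets[j])]
    return weights, edges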

Efficient algorithm for eliminating nodes in "graph"?

Suppose I have a graph with 2^N - 1 nodes, numbered 1 to 2^N - 1. Node i "depends on" node j if all the bits that are 1 in the binary representation of j are also 1 in the binary representation of i. So, for instance, if N=3, then node 7 depends on all other nodes, and node 6 depends on nodes 4 and 2.
The problem is eliminating nodes. I can eliminate a node if no other nodes depend on it. No nodes depend on 7, so I can eliminate 7. After eliminating 7, I can eliminate 6, 5, and 3, etc. What I'd like is an efficient algorithm for listing all the possible unique elimination paths. (That is, 7-6-5 is the same as 7-5-6, so we only need to list one of the two.) I have a dumb algorithm already, but I think there must be a better way.
I have three related questions:
Does this problem have a general name?
What's the best way to solve it?
Is there a general formula for the number of unique elimination paths?
Edit: I should note that a node cannot depend on itself, by definition.
Edit2: Let S = {s_1, s_2, s_3,...,s_m} be the set of all m valid elimination paths. s_i and s_j are "equivalent" (for my purposes) iff the two eliminations s_i and s_j would lead to the same graph after elimination. I suppose to be clearer I could say that what I want is the set of all unique graphs resulting from valid elimination steps.
Edit3: Note that elimination paths may be different lengths. For N=2, the 5 valid elimination paths are (),(3),(3,2),(3,1),(3,2,1). For N=3, there are 19 unique paths.
Edit4: Re: my application - the application is in statistics. Given N factors, there are 2^N - 1 possible terms in a statistical model (see http://en.wikipedia.org/wiki/Analysis_of_variance#ANOVA_for_multiple_factors) that can contain the main effects (the factors alone) and the various (2-, 3-, ... way) interactions between the factors. But an interaction can only be present in a model if all of its sub-interactions (or main effects) are present. For three factors a, b, and c, for example, the 3-way interaction a:b:c can only be present if all the constituent two-way interactions (a:b, a:c, b:c) are present (and likewise for the two-ways). Thus, the model a + b + c + a:b + a:b:c would not be allowed. I'm looking for a quick way to generate all valid models.
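For small N, a brute-force enumeration of the distinct reachable graphs (in the sense of Edit2) is straightforward to write down and reproduces the counts from Edit3; a sketch, with names of my own choosing:

def depends(i, j):
    # Node i depends on node j if every 1-bit of j is also a 1-bit of i.
    return i != j and (i & j) == j

def reachable_states(n):
    # Enumerate every node set reachable through valid eliminations.
    seen = set()
    def explore(nodes):
        if nodes in seen:
            return
        seen.add(nodes)
        for v in nodes:
            if not any(depends(u, v) for u in nodes):  # v is eliminable
                explore(nodes - {v})
    explore(frozenset(range(1, 2 ** n)))
    return len(seen)

print(reachable_states(2))  # 5
print(reachable_states(3))  # 19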
It seems easier to think about this in terms of sets: you are looking for families of subsets of {1, ..., N} such that for each set in the family, all of its subsets are also present. Each such family is determined by its inclusion-wise maximal sets, which must be pairwise incomparable (no one contains another). Families of pairwise incomparable sets are called Sperner families, or antichains. So you are looking for Sperner families, plus the union of all the subsets in the family. Possibly known algorithms for enumerating Sperner families or antichains in general are useful; without knowing what you actually want to do with them, it's hard to tell.
Thanks to Falk Hüffner's answer, I saw that what I wanted to do was equivalent to finding the monotonic Boolean functions of N arguments. The figure on the Wikipedia page for Dedekind numbers (http://en.wikipedia.org/wiki/Dedekind_number) expresses the problem graphically. There is an algorithm for generating monotonic Boolean functions (http://www.mathpages.com/home/kmath094.htm), and it is quite simple to construct.
For my purposes, I use the algorithm, then eliminate the first column and last row of the resulting binary arrays. Starting from the top row down, each row has a 1 in the ith column if one can eliminate the ith node.
Thanks!
You can build a "heap", in which at depth X are all the nodes with X zeros in their binary representation.
Then, starting from the bottom layer, connect each item to a random parent at the layer above, until you get a single-component graph.
Note that this graph is a tree, i.e., each node except for the root has exactly one parent.
Then, traverse the tree (starting from the root) and count the total number of paths in it.
UPDATE:
The method above is bad, because you cannot just pick a random parent for a given item - you have a limited number of items from which you can pick a "legal" parent... But I'm leaving this method here for other people to give their opinion (perhaps it is not "that bad").
In any case, why don't you take your graph, extract a spanning tree (you can use Prim's or Kruskal's algorithm for finding a minimum spanning tree), and then count the number of paths in it?

Algorithm: Removing as few elements as possible from a set in order to enforce no subsets

I got a problem which I do not know how to solve:
I have a set of sets A = {A_1, A_2, ..., A_n} and I have a set B.
The target now is to remove as few elements as possible from B (creating B'), such that, after removing the elements, A_i is not a subset of B' for any 1 <= i <= n.
For example, if we have A_1 = {1,2}, A_2 = {1,3,4}, A_3 = {2,5}, and B = {1,2,3,4,5}, we could e.g. remove 1 and 2 from B (that would yield B' = {3,4,5}, which is not a superset of any of the A_i).
Is there an algorithm for determining the (minimal number of) elements to be removed?
It sounds like you want to remove the minimal hitting set of A from B (this is closely related to the vertex cover problem).
A hitting set for some set-of-sets A is itself a set such that it contains at least one element from each set in A (it "hits" each set). The minimal hitting set is the smallest such hitting set. So, if you have an MHS for your set-of-sets A, you have an element from each set in A. Removing this from B means no set in A can be a subset of B.
All you need to do is calculate the MHS for (A1, A2, ... An), then remove that from B. Unfortunately, finding the MHS is an NP-complete problem. Knowing that though, you have a few options:
If your data set is small, do the obvious brute-force solution
Use a probabilistic algorithm to get a fast, approximate answer (see this PDF)
Run far, far away in the opposite direction
If you just need some approximation, start with the smallest set in A and remove one of its elements from B. (You could grab one at random, or check which element appears in the most sets in A, depending on how accurate and how fast you need to be.)
Now the smallest set in A isn't a subset of B. Move on from there, but first check whether the sets you're examining are still subsets at this point.
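That greedy idea fits in a few lines of Python (my names; A is a list of sets, B a set):

def greedy_removal(A, B):
    # Repeatedly delete from B the element occurring in the most sets of A
    # that are still entirely contained in B. Approximate, not optimal.
    B = set(B)
    while True:
        violated = [a for a in A if a <= B]
        if not violated:
            return B
        counts = {}
        for a in violated:
            for x in a:
                counts[x] = counts.get(x, 0) + 1
        B.remove(max(counts, key=counts.get))

On the question's example it removes 1 and then 2, returning B' = {3,4,5}, matching the removal suggested there.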
This reminds me of the vertex cover problem, and I remember some approximation algorithms for it that are similar to this one.
I think you should find the smallest of the sets A_i and then delete from B the elements that are in that set.
