Related
I am using clingo to solve flood-it problems. I use the predicate frontier([CELL], [COLOR], [TIMESTEP]) to keep track of all cells that are neighbors of the flood. The set of frontiers could look something like this:
frontier(c(1,3),2,3) frontier(c(2,1),2,3) frontier(c(2,2),3,3) frontier(c(2,3),3,3) frontier(c(3,1),3,3) frontier(c(3,2),3,3) frontier(c(4,1),3,3)
We can split this set in two subsets. One where each color value is 2 or 3 respectively. What I need is basically two things:
Determine which subset is bigger, i.e. if there are more cells with color value 2 or 3 (BTW the number of colors is not fixed, thus a solution has to be generic)
Get the color value of a member of the biggest set
How can I compare the cardinalities of n (n>=2) sets in predicate logic?
Thank you in advance!
I found an answer which is more domain (i.e. clingo) specific than general.
What I initially do is count the number of cells that are of color C:
frontier_subset_size(C,N) :- color(C), N = #count{ X : frontier(X,C) }.
Then I filter the biggest set(s) using the #max aggregate:
max_subset_color(C) :- frontier_subset_size(C,N), N = #max{ M : frontier_subset_size(_,M) }.
This works as desired for this specific problem.
Yet I would like to know how to do that in pure predicate logic.
Suppose I have the data set {A,B,C,D}, of arbitrary type, and I want to compare it to another data set. I want the comparison to be true for {A,B,C,D}, {B,C,D,A}, {C,D,A,B}, and {D,A,B,C}, but not for {A,C,B,D} or any other set that is not ordered similarly. What is a fast way to do this?
Storing them in arrays,rotating, and doing comparison that way is an O(n^2) task so that's not very good.
My first intuition would be to store the data as a set like {A,B,C,D,A,B,C} and then search for a subset, which is only O(n). Can this be done any faster?
There is a fast algorithm for finding the minimum rotation of a string - https://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation. So you can store and compare the minimum rotation.
One option is to use a directed graph. Set up a graph with the following transitions:
A -> B
B -> C
C -> D
D -> A
All other transitions will put you in an error state. Thus, provided each member is unique (which is implied by your use of the word set), you will be able to determine membership provided you end on the same graph node on which you started.
If a value can appear multiple times in your search, you'll need a smarter set of states and transitions.
This approach is useful if you precompute a single search and then match it to many data points. It's not so useful if you have to constantly regenerate the graph. It could also be cache-inefficient if your state table is large.
Well Dr Zoidberg, if you are interested in order, as you are, then you need to store your data in a structure that preserves order and also allows for easy rotation.
In Python a list would do.
Find the smallest element of the list then rotate each list you want to compare until the smallest element of them is at the beginning. Note: this is not a sort, but a rotation. With all the lists for comparison so normalised, a straight forward list compare between any two would tell if they are the same after rotation.
>>> def rotcomp(lst1, lst2):
while min(lst1) != lst1[0]:
lst1 = lst1[1:] + [lst1[0]]
while min(lst2) != lst2[0]:
lst2 = lst2[1:] + [lst2[0]]
return lst1 == lst2
>>> rotcomp(list('ABCD'), list('CDAB'))
True
>>> rotcomp(list('ABCD'), list('CDBA'))
False
>>>
>>> rotcomp(list('AABC'), list('ABCA'))
False
>>> def rotcomp2(lst1, lst2):
return repr(lst1)[1:-1] in repr(lst2 + lst2)
>>> rotcomp2(list('ABCD'), list('CDAB'))
True
>>> rotcomp2(list('ABCD'), list('CDBA'))
False
>>> rotcomp2(list('AABC'), list('ABCA'))
True
>>>
NEW SECTION: WITH DUPLICATES?
If the input may contain duplicates then, (from the possible twin question mentioned under the question), An algorithm is to see if one list is a sub-list of the other list repeated twice.
function rotcomp2 uses that algorithm and a textual comparison of the repr of the list contents.
Suppose you have a bunch of sets, whereas each set has a couple of subsets.
Set1 = { (banana, pineapple, orange), (apple, kale, cucumber), (onion, garlic) }
Set2 = { (banana, cucumber, garlic), (avocado, tomato) }
...
SetN = { ... }
The goal now is to select one subset from each set, whereas each subset must be conflict free with any other selected subset. For this toy-size example, a possible solution would be to select (banana, pineapple, orange) (from Set1) and (avocado, tomato) (from Set2).
A conflict would occur, if one would select the first subset of Set1 and Set2 because the banana would be contained in both subsets (which is not possible because it exists only once).
Even though there are many algorithms, I was unable to select a suitable algorithm. I'm somehow stuck and would appreciate answers targeting the following questions:
1) How to find a suitable algorithm and represent this problem in such a way that it can be processed by the algorithm?
2) How a possible solution for this toy-size example may look like (any language is just fine, I just want to get the idea).
Edit1: I was thinking about simulated annealing, too (return one possible solution). This could be of interest to minimize, e.g., the overall cost of selecting the sets. However, I could not figure out how to make an appropriate problem description that takes the 'conflicts' into account.
This problem can be formulated as a generalized exact cover problem.
Create a new atom for each set of sets (Set1, Set2, etc.) and turn your input into an instance like so:
{Set1, banana, pineapple, orange}
{Set1, apple, kale, cucumber}
{Set1, onion, garlic}
{Set2, banana, cucumber, garlic}
{Set2, avocado, tomato}
...
making the Set* atoms primary (covered exactly once) and the other atoms secondary (covered at most once). Then you can solve it with a generalization of Knuth's Algorithm X.
Looking at the list of sets, I had the image of a maze with multiple entrances. The task is akin to tracing paths from top to bottom that are free of subset-intersections. The example in Haskell picks all entrances, and tries each path, returning those that succeed.
My understanding of how the code works (algorithm):
For each subset in the first set, pick each subset in the next set where the intersection of that subset with each of the subsets in the accumulated result is null. If there are no subsets matching the criteria, break that strain of the loop. If there are no sets left to pick from, return that result. Call the function recursively for all chosen subsets (and corresponding accumulating-results).
import Data.List (intersect)
import Control.Monad (guard)
sets = [[["banana", "pineapple", "orange"], ["apple", "kale", "cucumber"], ["onion", "garlic"]]
,[["banana", "cucumber", "garlic"], ["avocado", "tomato"]]]
solve sets = solve' sets [] where
solve' [] result = [result]
solve' (set:rest) result = do
subset <- set
guard (all null (map (intersect subset) result))
solve' rest (result ++ [subset])
OUTPUT:
*Main> solve sets
[[["banana","pineapple","orange"],["avocado","tomato"]]
,[["apple","kale","cucumber"],["avocado","tomato"]]
,[["onion","garlic"],["avocado","tomato"]]]
I have a set of integers, represented as a Reduced Ordered Binary Decision Diagram (ROBDD) (interpreted as a function which evaluates to true iff the input is in the set) which I shall call Domain, and an integer function (which I shall call F) represented as an array of ROBDD's (one entry per bit of the result).
Now I want to calculate the image of the domain for F. It's definitely possible, because it could trivially be done by enumerating all items from the domain, apply F, and insert the result in the image. But that's a horrible algorithm with exponential complexity (linear in the size of the domain), and my gut tells me it can be faster. I've been looking into the direction of:
apply Restrict(Domain) to all bits of F
do magic
But the second step proved difficult. The result of the first step contains the information I need (at least, I'm 90% sure of it), but not in the right form. Is there an efficient algorithm to turn it into a "set encoded as ROBDD"? Do I need an other approach?
Define two set-valued functions:
N(d1...dn): The subset of the image where members start with a particular sequence of digits d0...dn.
D(d1...dn): The subset of the inputs that produce N(d1...dn).
Then when the sequences are empty, we have our full problem:
D(): The entire domain.
N(): The entire image.
From the full domain we can define two subsets:
D(0) = The subset of D() such that F(x)[1]==0 for any x in D().
D(1) = The subset of D() such that F(x)[1]==1 for any x in D().
This process can be applied recursively to generate D for every sequence.
D(d1...d[m+1]) = D(d1...dm) & {x | F(x)[m+1]==d[m+1]}
We can then determine N(x) for the full sequences:
N(d1...dn) = 0 if D(d1...dn) = {}
N(d1...dn) = 1 if D(d1...dn) != {}
The parent nodes can be produced from the two children, until we've produced N().
If at any point we determine that D(d1...dm) is empty, then we know
that N(d1...dm) is also empty, and we can avoid processing that branch.
This is the main optimization.
The following code (in Python) outlines the process:
def createImage(input_set_diagram,function_diagrams,index=0):
if input_set_diagram=='0':
# If the input set is empty, the output set is also empty
return '0'
if index==len(function_diagrams):
# The set of inputs that produce this result is non-empty
return '1'
function_diagram=function_diagrams[index]
# Determine the branch for zero
set0=intersect(input_set_diagram,complement(function_diagram))
result0=createImage(set0,function_diagrams,index+1)
# Determine the branch for one
set1=intersect(input_set_diagram,function_diagram)
result1=createImage(set1,function_diagrams,index+1)
# Merge if the same
if result0==result1:
return result0
# Otherwise create a new node
return {'index':index,'0':result0,'1':result1}
Let S(x1, x2, x3...xn) be the indicator function for the set S, so that S(x1, x2...xn) = true if (x1, x2,...xn) is an element of S. Let F1(x1, x2, x3... xn), F2(),... Fn() be the individual functions that define F. Then I could ask if a particular bit pattern, with wild cards, is in the image of F by forming the equation e.g. S() & F1() & ~F2() for bit-pattern 10 and then solving this equation, which I presume that I can do since it is an ROBDD.
Of course you want a general indicator function, which tells me if abc is in the image. Extending the above, I think you get S() & (a&F1() | ~a&~F1()) & (b&F2() | ~b&~F2()) &... If you then re-order the variables so that the original x1, x2, ... xn occur last in the ROBDD order, then you should be able to prune the tree to return true for the case where any setting of the x1, x2, ... xn leads to the value true, and to return false otherwise.
(of course you could run of space, or patience, waiting for the re-ordering to work).
I will phrase the problem in the precise form that I want below:
Given:
Two floating point lists N and D of the same length k (k is multiple of 2).
It is known that for all i=0,...,k-1, there exists j != i such that D[j]*D[i] == N[i]*N[j]. (I'm using zero-based indexing)
Return:
A (length k/2) list of pairs (i,j) such that D[j]*D[i] == N[i]*N[j].
The pairs returned may not be unique (any valid list of pairs is okay)
The application for this algorithm is to find reciprocal pairs of eigenvalues of a generalized palindromic eigenvalue problem.
The equality condition is equivalent to N[i]/D[i] == D[j]/N[j], but also works when denominators are zero (which is a definite possibility). Degeneracies in the eigenvalue problem cause the pairs to be non-unique.
More generally, the algorithm is equivalent to:
Given:
A list X of length k (k is multiple of 2).
It is known that for all i=0,...,k-1, there exists j != i such that IsMatch(X[i],X[j]) returns true, where IsMatch is a boolean matching function which is guaranteed to return true for at least one j != i for all i.
Return:
A (length k/2) list of pairs (i,j) such that IsMatch(i,j) == true for all pairs in the list.
The pairs returned may not be unique (any valid list of pairs is okay)
Obviously, my first problem can be formulated in terms of the second with IsMatch(u,v) := { (u - 1/v) == 0 }. Now, due to limitations of floating point precision, there will never be exact equality, so I want the solution which minimizes the match error. In other words, assume that IsMatch(u,v) returns the value u - 1/v and I want the algorithm to return a list for which IsMatch returns the minimal set of errors. This is a combinatorial optimization problem. I was thinking I can first naively compute the match error between all possible pairs of indexes i and j, but then I would need to select the set of minimum errors, and I don't know how I would do that.
Clarification
The IsMatch function is reflexive (IsMatch(a,b) implies IsMatch(b,a)), but not transitive. It is, however, 3-transitive: IsMatch(a,b) && IsMatch(b,c) && IsMatch(c,d) implies IsMatch(a,d).
Addendum
This problem is apparently identically the minimum weight perfect matching problem in graph theory. However, in my case I know that there should be a "good" perfect matching, so the distribution of edge weights is not totally random. I feel that this information should be used somehow. The question now is if there is a good implementation to the min-weight-perfect-matching problem that uses my prior knowledge to arrive at a solution early in the search. I'm also open to pointers towards a simple implementation of any such algorithm.
I hope I got your problem.
Well, if IsMatch(i, j) and IsMatch(j, l) then IsMatch(i, l). More generally, the IsMatch relation is transitive, commutative and reflexive, ie. its an equivalence relation. The algorithm translates to which element appears the most times in the list (use IsMatch instead of =).
(If I understand the problem...)
Here is one way to match each pair of products in the two lists.
Multiply each pair N and save it to a structure with the product, and the subscripts of the elements making up the product.
Multiply each pair D and save it to a second instance of the structure with the product, and the subscripts of the elements making up the product.
Sort both structions on the product.
Make a merge-type pass through both sorted structure arrays. Each time you find a product from one array that is close enough to the other, you can record the two subscripts from each sorted list for a match.
You can also use one sorted list for an ismatch function, doing a binary search on the product.
well。。Multiply each pair D and save it to a second instance of the structure with the product, and the subscripts of the elements making up the product.
I just asked my CS friend, and he came up with the algorithm below. He doesn't have an account here (and apparently unwilling to create one), but I think his answer is worth sharing.
// We will find the best match in the minimax sense; we will minimize
// the maximum matching error among all pairs. Alpha maintains a
// lower bound on the maximum matching error. We will raise Alpha until
// we find a solution. We assume MatchError returns an L_1 error.
// This first part finds the set of all possible alphas (which are
// the pairwise errors between all elements larger than maxi-min
// error.
Alpha = 0
For all i:
min = Infinity
For all j > i:
AlphaSet.Insert(MatchError(i,j))
if MatchError(i,j) < min
min = MatchError(i,j)
If min > Alpha
Alpha = min
Remove all elements of AlphaSet smaller than Alpha
// This next part increases Alpha until we find a solution
While !AlphaSet.Empty()
Alpha = AlphaSet.RemoveSmallest()
sol = GetBoundedErrorSolution(Alpha)
If sol != nil
Return sol
// This is the definition of the helper function. It returns
// a solution with maximum matching error <= Alpha or nil if
// no such solution exists.
GetBoundedErrorSolution(Alpha) :=
MaxAssignments = 0
For all i:
ValidAssignments[i] = empty set;
For all j > i:
if MatchError <= Alpha
ValidAssignments[i].Insert(j)
ValidAssignments[j].Insert(i)
// ValidAssignments[i].Size() > 0 due to our choice of Alpha
// in the outer loop
If ValidAssignments[i].Size() > MaxAssignments
MaxAssignments = ValidAssignments[i].Size()
If MaxAssignments = 1
return ValidAssignments
Else
G = graph(ValidAssignments)
// G is an undirected graph whose vertices are all values of i
// and edges between vertices if they have match error less
// than or equal to Alpha
If G has a perfect matching
// Note that this part is NP-complete.
Return the matching
Else
Return nil
It relies on being able to compute a perfect matching of a graph, which is NP-complete, but at least it is reduced to a known problem. It is expected that the solution be NP-complete, but this is OK since in practice the size of the given lists are quite small. I'll wait around for a better answer for a few days, or for someone to expand on how to find the perfect matching in a reasonable way.
You want to find j such that D(i)*D(j) = N(i)*N(j) {I assumed * is ordinary real multiplication}
assuming all N(i) are nonzero, let
Z(i) = D(i)/N(i).
Problem: find j, such that Z(i) = 1/Z(j).
Split set into positives and negatives and process separately.
take logs for clarity. z(i) = log Z(i).
Sort indirectly. Then in the sorted view you should have something like -5 -3 -1 +1 +3 +5, for example. Read off +/- pairs and that should give you the original indices.
Am I missing something, or is the problem easy?
Okay, I ended up using this ported Fortran code, where I simply specify the dense upper triangular distance matrix using:
complex_t num = N[i]*N[j] - D[i]*D[j];
complex_t den1 = N[j]*D[i];
complex_t den2 = N[i]*D[j];
if(std::abs(den1) < std::abs(den2)){
costs[j*(j-1)/2+i] = std::abs(-num/den2);
}else if(std::abs(den1) == 0){
costs[j*(j-1)/2+i] = std::sqrt(std::numeric_limits<double>::max());
}else{
costs[j*(j-1)/2+i] = std::abs(num/den1);
}
This works great and is fast enough for my purposes.
You should be able to sort the (D[i],N[i]) pairs. You don't need to divide by zero -- you can just multiply out, as follows:
bool order(i,j) {
float ni= N[i]; float di= D[i];
if(di<0) { di*=-1; ni*=-1; }
float nj= N[j]; float dj= D[j];
if(dj<0) { dj*=-1; nj*=-1; }
return ni*dj < nj*di;
}
Then, scan the sorted list to find two separation points: (N == D) and (N == -D); you can start matching reciprocal pairs from there, using:
abs(D[i]*D[j]-N[i]*N[j])<epsilon
as a validity check. Leave the (N == 0) and (D == 0) points for last; it doesn't matter whether you consider them negative or positive, as they will all match with each other.
edit: alternately, you could just handle (N==0) and (D==0) cases separately, removing them from the list. Then, you can use (N[i]/D[i]) to sort the rest of the indices. You still might want to start at 1.0 and -1.0, to make sure you can match near-zero cases with exactly-zero cases.