Maximal sets intersection - algorithm

Given 5 finite sets a, b, c, d, e. Each set is assigned an arbitrary number:
a = 100, b = 34, c = 15, d = 89, e = 57
The complement of each set is assigned the same number negated, e.g. a' gets -100.
We need to find an intersection that takes each set or its complement, such that the resulting set is not the null set and the sum of the assigned numbers is maximal.
I only see one brute-force solution to this problem, but it is very inefficient and not elegant. We just generate all combinations and resolve them to see which are non-empty; the combinations look like this:
{a∩b'∩c'∩d'∩e'}, {a'∩b∩c'∩d∩e'}, {a'∩b'∩c∩d'∩e'}, {a'∩b'∩c'∩d∩e'}, {a'∩b'∩c'∩d'∩e} {a∩b∩c'∩d'∩e'}, {a∩b'∩c∩d'∩e'}, {a∩b'∩c'∩d∩e}, {a∩b'∩c'∩d'∩e}, {a'∩b∩c∩d'∩e'} {a'∩b∩c'∩d∩e'} {a'∩b∩c'∩d'∩e} ...
and then just pick the max number.
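In Python, that brute force might look roughly like this (just a sketch; the sentinel for "element in no set" and the label/value dictionaries are my own bookkeeping, not part of the problem statement):
from itertools import product

def brute_force(sets, values):
    # sets:   label -> set, e.g. {'a': {1, 2}, ...}
    # values: label -> assigned number, e.g. {'a': 100, ...}
    labels = list(sets)
    # A sentinel object stands for "an element in none of the sets",
    # so the all-complements region is never wrongly reported as empty.
    universe = set().union(*sets.values()) | {object()}
    best = None
    for signs in product([1, -1], repeat=len(labels)):
        region = universe
        for label, sign in zip(labels, signs):
            region = region & sets[label] if sign == 1 else region - sets[label]
        if region:  # non-empty intersection
            total = sum(sign * values[label] for label, sign in zip(labels, signs))
            best = total if best is None else max(best, total)
    return best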
Looking forward to seeing if someone can think of something better :)

Define score(x, X) to be the value of set X if x is in X, and its negation otherwise.
Then, letting * represent an element that's not in any of the 5 sets, the highest score possible is:
max_{x in union(A, B, C, D, E, {*})} sum_{X in A, B, C, D, E} score(x, X)
This follows from the observation that any particular x is either in a set or its complement. You don't actually have to compute the union here. In Python you might write:
def max_config(A, B, C, D, E):
    best = None
    for S in (A, B, C, D, E, {None}):   # {None} plays the role of *
        for x in S:
            total = sum(score(x, X) for X in (A, B, C, D, E))
            best = total if best is None else max(best, total)
    return best
Assuming a set membership test is O(1), this has complexity O(N), where N is the total size of the given sets.
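For completeness, here is one possible score helper plus a made-up instance to run it on (only the assigned numbers come from the question; the elements are invented for illustration):
a, b, c, d, e = {1, 2}, {2, 3}, {3}, {1, 4}, {4, 5}   # hypothetical elements
VALUES = [(a, 100), (b, 34), (c, 15), (d, 89), (e, 57)]

def score(x, X):
    # value of X if x is in X, otherwise its negation
    value = next(v for s, v in VALUES if s is X)
    return value if x in X else -value

print(max_config(a, b, c, d, e))   # 83 for this made-up data (best witness is x = 1)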

Related

Identifying non-intersecting (super-)sets

I am looking for an algorithm to identify non-intersecting (super-)sets in a set of sets.
Let's assume I have a set of sets containing the sets A, B, C and D, i.e. {A, B, C, D}. Each set may or may not intersect some or all of the other sets.
I would like to identify non-intersecting (super-)sets.
Examples:
If A & B intersect and C & D intersect but (A union B) does not intersect (C union D), I would like the output {(A union B), (C union D)}
If only C & D intersect, I would like the output {A, B, (C union D)}
I am sure this problem has long been solved. Can somebody point me in the right direction?
Even better would be of course if somebody had already done the work and had an implementation in python they were willing to share. :-)
I would turn this from a set problem into a graph problem by constructing a graph whose nodes are the sets, with edges connecting sets that have an intersection.
Here is some code that does it. It takes a dictionary mapping the name of each set to the set. It returns a list of sets of set names that connect.
def set_supersets(sets_by_label):
    # map each element to the labels of the sets containing it
    element_mappings = {}
    for label, this_set in sets_by_label.items():
        for elt in this_set:
            if elt not in element_mappings:
                element_mappings[elt] = set()
            element_mappings[elt].add(label)
    # build the intersection graph: label -> labels it shares an element with
    graph_conn = {}
    for elt, sets in element_mappings.items():
        for s in sets:
            if s not in graph_conn:
                graph_conn[s] = set()
            for t in sets:
                if t != s:
                    graph_conn[s].add(t)
    # collect connected components with a depth-first search
    seen = set()
    answer = []
    for s, sets in graph_conn.items():
        if s not in seen:
            todo = [s]
            this_group = set()
            while 0 < len(todo):
                t = todo.pop()
                if t not in seen:
                    this_group.add(t)
                    seen.add(t)
                    for u in graph_conn[t]:
                        todo.append(u)
            answer.append(this_group)
    return answer

print(set_supersets({
    "A": set([1, 2]),
    "B": set([1, 3]),
    "C": set([4, 5]),
    "D": set([3, 6])
}))

How can I add an element to a list using the delete1() predicate in Prolog?

I need to add an element to a list using the delete1 predicate that I have written:
delete1(H,[H|T],T).
delete1(H,[D|X],[D|Y]):-delete1(H,X,Y).
How can I do that? Well, I know how to delete an element but can't figure out how to add one. I need to show all the possible lists that result from adding 56 to the list [x,y,z,a]. Do you have any ideas?
If they are well-programmed, Prolog predicates can run "backwards":
f(X,Y) should be read as X is related to Y via f.
Given an x, one can compute the Y (possibly several Y via backtracking): f(x,Y) is interpreted as the set of Y such that f(x) = Y.
Given a y, one can compute the X (possibly several X via backtracking): f(X,y) is interpreted as the set of X such that f(X) = y.
Given an (x,y), one can compute the truth value: f(x,y) is interpreted as true if (but not iff) f(x) == y.
(How many Prolog predicates are "well-programmed"? If there is a study about how many Prolog predicates written outside of the classroom can work bidirectionally I would like to know about it; my guess is most decay quickly into unidirectional functions, it's generally not worth the hassle of adding the edge cases and the test code to make predicates work bidirectionally)
The above works best if
f is bijective, i.e.: "no information is thrown away when computing in either direction"
the computation in both directions is tractable (having encryption work backwards is hard)
So, in this case:
delete(Element,ListWith,ListWithout) relates the three arguments (Element,ListWith,ListWithout) as follows:
ListWithout is ListWith without Element.
Note that "going forward" from (Element,ListWith) to ListWithout destroys information, namely the exact position of the deleted element, or even if there was one in the first place. BAD! NOT BIJECTIVE!
To make delete1/3 run backwards, we just query it with the element and the shorter list bound, asking for the original list:
?- delete1(56,L,[a,b,c]).
L = [56, a, b, c] ;
L = [a, 56, b, c] ;
L = [a, b, 56, c] ;
L = [a, b, c, 56] ;
There are four solutions to the reverse deletion problem.
And the program misses one:
L = [a, b, c]
or even a few more:
L = [56, a, b, 56, c]
etc.
As you can see, it is important to retain information!

Solving chain reactions in Prolog

One of the recent Advent of Code challenges tasks me with finding the smallest amount of input material needed to apply a given set of reactions and get 1 unit of output material.
For example, given
10 ORE => 10 A
1 ORE => 1 B
7 A, 1 B => 1 C
7 A, 1 C => 1 D
7 A, 1 D => 1 E
7 A, 1 E => 1 FUEL
We need 31 total ore to make 1 fuel (1 to produce a unit of B, then 30 to make the requisite 28 A).
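(As a sanity check on that arithmetic, here is a small Python computation of the example using the usual leftover-tracking expansion; this is only a cross-check, not the Prolog model I'm asking about:)
from math import ceil

# product: (batch size produced, [(quantity, ingredient), ...]) for the example above
REACTIONS = {
    'A':    (10, [(10, 'ORE')]),
    'B':    (1,  [(1, 'ORE')]),
    'C':    (1,  [(7, 'A'), (1, 'B')]),
    'D':    (1,  [(7, 'A'), (1, 'C')]),
    'E':    (1,  [(7, 'A'), (1, 'D')]),
    'FUEL': (1,  [(7, 'A'), (1, 'E')]),
}

def ore_needed(amount=1, product='FUEL'):
    need = {product: amount}
    leftovers = {}
    ore = 0
    while need:
        chem, qty = need.popitem()
        if chem == 'ORE':
            ore += qty
            continue
        used = min(qty, leftovers.get(chem, 0))   # consume leftovers first
        qty -= used
        leftovers[chem] = leftovers.get(chem, 0) - used
        if qty == 0:
            continue
        batch, inputs = REACTIONS[chem]
        runs = ceil(qty / batch)
        leftovers[chem] = leftovers.get(chem, 0) + runs * batch - qty
        for iq, ing in inputs:
            need[ing] = need.get(ing, 0) + runs * iq
    return ore

print(ore_needed())   # 31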
This year, I've been trying to push my programming-language horizons, so I've done most of the challenges in SML/NJ. This one seems—seemed—like a good fit for Prolog, given the little I know about it: logic programming, constraint solving, etc.
I haven't, however, been able to successfully model the constraints.
I started by turning this simple example into some facts:
makes([ore(10)], a(10)).
makes([ore(1)], b(1)).
makes([a(7), b(1)], c(1)).
makes([a(7), c(1)], d(1)).
makes([a(7), d(1)], e(1)).
makes([a(7), e(1)], fuel(1)).
To be honest, I'm not even sure if the list argument is a good structure, or if the functor notation (ore(10)) is a good model either.
Then I wanted to build the rules that allow you to say, e.g., 10 ore makes enough for 7 a:
% handles the case where we have leftovers?
% is this even the right way to model all this... when we have leftovers, we may
% have to use them in the "reaction"...
makes(In, Out) :-
    Out =.. [F,N],
    Val #>= N,
    OutN =.. [F,Val],
    makes(In, OutN).
This works1, but I'm not sure it's going to be adequate, since we may care about leftovers (this is a minimization problem, after all)?
I'm stuck on the next two pieces though:
I can ask what makes 7 A and get back 10 ore, but I can't ask what is enough for 20 A: how do I write a rule which encodes multiplication/integer factors?
I can say that 7 A and 1 E makes 1 fuel, but I can't state that recursively: that is, I cannot state that 14 A and 1 D also make 1 fuel. How do I write the rule that encodes this?
I'm open to alternate data encodings for the facts I presented—ultimately, I'll be scripting the transformation from Advent's input to Prolog's facts, so that's the least of my worries. I feel that if I can get this small example working, I can solve the larger problem.
?- makes(X, a(7)). gives back X=[ore(10)] infinitely (i.e., if I keep hitting ; at the prompt, it keeps going). Is there a way to fix this?
Not a direct answer to your specific question, but my first thought on this problem was to use CHR (Constraint Handling Rules) in Prolog.
I then thought I would forward chain from fuel to the amount of ore I need.
The basic constraints:
:- chr_constraint ore/1, a/1, b/1,c/1, ab/1, bc/1, ca/1, fuel/0.
a(1),a(1) <=> ore(9).
b(1),b(1),b(1) <=> ore(8).
c(1),c(1),c(1),c(1),c(1) <=> ore(7).
ab(1) <=> a(3),b(4).
bc(1) <=> b(5),c(7).
ca(1) <=> c(4),a(1).
fuel <=> ab(2),bc(3),ca(4).
%Decompose foo/N into foo/1s
a(X) <=> X>1,Y#=X-1|a(Y),a(1).
b(X) <=> X>1,Y#=X-1|b(Y),b(1).
c(X) <=> X>1, Y#=X-1 | c(Y),c(1).
ab(X) <=> X>1, Y#=X-1|ab(Y),ab(1).
bc(X) <=> X>1,Y#=X-1| bc(Y),bc(1).
ca(X) <=> X>1, Y#= X-1| ca(Y),ca(1).
ore(X)<=>X >1, Y #= X -1 |ore(Y),ore(1).
%aggregation (for convenience)
:- chr_constraint ore_add/1, total_ore/1.
total_ore(A), total_ore(Total) <=> NewTotal #= A + Total, total_ore(NewTotal).
ore_add(A) ==> total_ore(A).
ore(1) <=> ore_add(1).
Query:
?-fuel.
b(1),
b(1),
c(1),
c(1),
ore_add(1),
ore_add(1),
...
total_ore(150).
Then you would need to add a search procedure to eliminate the two b/1s and two c/1s.
I have not implemented this but:
?-fuel,b(1),c(3).
ore_add(1),
...
total_ore(165)
This has only ore_add/1 constraints and is the correct result.
In the example there are no "alternative" paths and no multiple "ore sources", so coding the example up in a very non-flexible way in Prolog can be done like this:
need(FUEL,OREOUT) :- need(FUEL,0,0,0,0,0,0,OREOUT).
need(FUEL,E,D,C,A,B,ORE,OREOUT) :- FUEL > 0, A2 is 7*FUEL+A, E2 is FUEL+E, need(0, E2, D, C, A2, B, ORE,OREOUT).
need(0,E,D,C,A,B,ORE,OREOUT) :- E > 0, A2 is 7*E+A, D2 is E+D, need(0, 0, D2, C, A2, B, ORE,OREOUT).
need(0,0,D,C,A,B,ORE,OREOUT) :- D > 0, A2 is 7*D+A, C2 is D+C, need(0, 0, 0, C2, A2, B, ORE,OREOUT).
need(0,0,0,C,A,B,ORE,OREOUT) :- C > 0, A2 is 7*C+A, B2 is C+B, need(0, 0, 0, 0, A2, B2, ORE,OREOUT).
need(0,0,0,0,A,B,ORE,OREOUT) :- X is A + B, X > 0, ORE2 is ORE + (A + 9)//10 + B, need(0, 0, 0, 0, 0, 0, ORE2,OREOUT).
need(0, 0, 0, 0, 0, 0, ORE, ORE).
Then
?- need(1011,ORE).
ORE = 3842
But this is just a silly and inelegant attempt.
There is a major general problem lurking thereunder, which includes parsing the arbitrarily complex reaction directed acyclic graph and building an appropriate structure. The good thing is that it is a DAG, so one cannot generate an "earlier ingredient" from a "later one".
While making coffee, I realized this is clearly something for the CLP(FD) engine.
If we have a directed acyclic graph of reactions with
a FUEL node on the right of the graph and
nodes for intermediate products IP[i] (i in 0..n) in between, with possibly
several FUEL nodes, i.e. several ways of generating FUEL: FUEL[0] ... FUEL[v], and possibly
several nodes for intermediate products IP[i], i.e. several ways of creating intermediate product IP[i>0]: IP[i,1] ... IP[i,ways(i)], and
IP[0] identified with ORE on the left side of the graph
with the last two points giving us a way of choosing a strategy for the product mix, then:
FUEL_NEEDED = mix[0] * FUEL[0] + ... + mix[v] * FUEL[v]
with everything in the above a variable
and the following given by the problem statement, with FUEL[0] ... FUEL[v] variables and the rest constants:
out_fuel[0] * FUEL[0] = ∑_j ( IP[j] * flow(IPj->FUEL0) )
⋮
out_fuel[v] * FUEL[v] = ∑_j ( IP[j] * flow(IPj->FUELv) )
and for each IP[i>0], with the IP[i] variables and the rest constants:
out_ip[i] * IP[i] = ∑_j≠i ( IP[j] * flow(IPj->IPi) )
in case of several ways to generate IP[i], we mix (this is like introducing a graph node for the mix of IP[i] from its possible ways IP[i,j]):
out_ip[i] * IP[i] = ∑_j(0..ways(i)) ( IP[i,j] * mix[i,j] )
out_ip[i,1] * IP[i,1] = ∑_j≠i ( IP[j] * flow(IP[j]->IP[i,1]) )
⋮
out_ip[i,ways(i)] * IP[i,ways(i)] = ∑_j≠i ( IP[j] * flow(IP[j]->IP[i,ways(i)]) )
and IP[0] (i.e. ORE) a free variable to be minimized.
You see an underspecified linear programming problem appearing here, with a matrix having zeroes below the diagonal because it's a DAG, but it contains variables to be optimized in the matrix itself. How to attack that?

Finding size of 'shortest range of indices' which lookup all unique path is passed

Given an array of Strings, find the size of the 'shortest range of indices' in which all unique path is passed.
For example, with A = { E, R, E, R, A, R, T, A } it should be 5. As we can see, the range from A[2] = E to A[6] = T contains all unique path (in this case E, R, A, T).
I can solve it with nested loops like below (in Kotlin):
fun problem(array: Array<String>): Int {
    if (array.isEmpty()) return 0
    val unique = array.distinct()
    var result = 200000
    for (i in 0 until array.size) {
        val tempSet = HashSet<String>()
        val remaining = array.sliceArray(i until array.size)
        var count = 0
        while (true) {
            tempSet.add(remaining[count])
            if (unique.size == tempSet.size) break
            count++
            if (count == remaining.size) {
                count = 200000
                break
            }
        }
        result = Math.min(result, count + 1)
    }
    return result
}
But when a large array (about 100,000 elements) comes in, I don't know how to reduce the time. How can I do that?
Some test cases:
[E, R, E, R, A, R, T, A] -> 5. Because [2..6] contains all unique path. (E, R, A, T)
[C, A, A, R, C, A, A, R] -> 3. Because [3..5] contains all unique path. (C, A, R)
[R, T, A, R, A, R, E, R] -> 6. Because [1..6] contains all unique path. (T, A, R, E)
[A, R, R, C, T, E, A, R] -> 5. Because [2..6] contains all unique path. (R, C, T, E, A)
This problem might be effectively solved with a "two-pointers" approach.
Make a dictionary structure with the char as key and a counter as value (in the simplest case, an array of ints).
Set two indexes L and R to 0.
Move R right; for the current char, increment the counter of the corresponding dict element.
When the dict size (for an array, the number of non-zero counters) becomes equal to the number of unique values, stop.
Now move L right; for the current char, decrement the counter of the corresponding dict element, removing the element when its counter becomes zero. When the dict size becomes smaller than the number of unique values, stop. At this point the L..R interval contains all possible items.
Continue with R and so on.
Choose the shortest interval found during the scan.
Python code for a similar question here
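A minimal Python sketch of the steps above (my own illustration, not the linked code):
def shortest_covering_window(arr):
    need = len(set(arr))          # number of unique values to cover
    counts = {}                   # value -> occurrences inside the window
    best = 0
    left = 0
    for right, value in enumerate(arr):
        counts[value] = counts.get(value, 0) + 1
        # shrink from the left while the window still covers every value
        while len(counts) == need:
            length = right - left + 1
            best = length if best == 0 else min(best, length)
            counts[arr[left]] -= 1
            if counts[arr[left]] == 0:
                del counts[arr[left]]
            left += 1
    return best

print(shortest_covering_window(list("ERERARTA")))   # 5
print(shortest_covering_window(list("CAARCAAR")))   # 3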
The phrase "all unique path" I will interpret to mean "all possible values".
For a string of length n with k unique values this is solvable in time O(n log(k)) using both a dictionary and a priority queue. The key ideas are this:
On a first pass, find all possible values.
The second time around, keep a dictionary most_recently_found of where each value was most recently found.
Keep a priority queue longest_since ordered by how long it has been since each value was last found (the stalest value at the top).
Keep a running minimum of the shortest gap.
Now, as you go back through, once all the values have been found you follow per-iteration logic that looks something like this:
most_recently_found[current_value] = current_position
oldest = longest_since.top()
if current_value == oldest.value:
    # refresh stale heap entries until the top reflects a current position
    while oldest.position != most_recently_found[oldest.value]:
        longest_since.pop()
        longest_since.push({value: oldest.value, position: most_recently_found[oldest.value]})
        oldest = longest_since.top()
if current_position - oldest.position < best_gap:
    best_gap = current_position - oldest.position
The point being that for each value found, you have to update the dictionary (O(1)), might have to pop something off the priority queue (O(log k)), might have to push something new onto the priority queue (O(log k)), and might have to do some arithmetic (O(1)). Hence O(n log(k)) for everything.
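A runnable Python sketch of this idea using heapq; for simplicity it pushes one entry per element and drops stale entries lazily, so it is O(n log n) rather than O(n log(k)), but the bookkeeping is the same:
import heapq

def shortest_window_all_values(arr):
    distinct = set(arr)                 # first pass: all possible values
    most_recently_found = {}            # value -> index where it was last seen
    longest_since = []                  # min-heap of (position, value); may hold stale entries
    best = None
    for position, value in enumerate(arr):
        most_recently_found[value] = position
        heapq.heappush(longest_since, (position, value))
        if len(most_recently_found) < len(distinct):
            continue                    # not every value has been seen yet
        # discard entries whose position is no longer that value's latest
        while longest_since[0][0] != most_recently_found[longest_since[0][1]]:
            heapq.heappop(longest_since)
        window = position - longest_since[0][0] + 1
        best = window if best is None else min(best, window)
    return best or 0

print(shortest_window_all_values(list("ERERARTA")))   # 5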

Spatial clustering algorithm

Given a collection of points on a 2D plane, I want to find collections of X points that are within Y of each other. For example:
8|
7| a b
6|
5| c
4|
3| e
2| d
1|
-------------------------
1 2 3 4 5 6 7 8 9 0 1
a, b, c, d and e are points on the 2D plane. Given arguments of 3 for the number of points (X) and 3 for the distance (Y), the algorithm would return [[a, b, c]]. Some examples:
algorithm(X = 3, Y = 3) returns [[a, b, c]]
algorithm(X = 2, Y = 3) returns [[a, b, c], [d, e]] -- [a, b, c] contains at least two points
algorithm(X = 4, Y = 3) returns [] -- no group of 4 points close enough
algorithm(X = 5, Y = 15) returns [[a, b, c, d, e]]
Constraints:
the x and y axes (the numbers above) are both 10,000 units long
there are 800 points (a, b, c, d etc) on the graph
I don't think it matters, but I'm using JavaScript
Things I've tried:
I actually care about outputting new points that are close to more than one input point, so I tried iterating on a grid and 'looking around' it using Pythagoras to find each point a given distance away. This is too slow given the total area. See the source here.
You can also see the data size in real data test.
DBSCAN, which seems to have a different purpose - I know how big I want my cluster size to be.
I'm currently trying to compare points to each other and build up close pairs, then close triplets, etc, until the end, but this seems to be going down a bit of an inefficiency hole also. I'm going to continue and try some kind of hashing or dictionary to avoid these loops.
With only 800 points, you can probably just build the graph by comparing each pair of points, then run Bron-Kerbosch to find the maximal cliques. Here's a legit-seeming JavaScript implementation of that algorithm: https://github.com/SeregPie/almete.BronKerbosch
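For illustration (in Python rather than JavaScript, with coordinates guessed from the ASCII plot above), a sketch of that approach: build the threshold graph, enumerate maximal cliques with a plain Bron-Kerbosch, and keep the cliques with at least X points:
from itertools import combinations
from math import dist

def clusters(points, x, y):
    names = list(points)
    # threshold graph: connect two points if they are within distance y
    adj = {n: set() for n in names}
    for a, b in combinations(names, 2):
        if dist(points[a], points[b]) <= y:
            adj[a].add(b)
            adj[b].add(a)
    cliques = []
    def bron_kerbosch(r, p, excluded):
        if not p and not excluded:
            cliques.append(r)           # r is a maximal clique
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & adj[v], excluded & adj[v])
            p.remove(v)
            excluded.add(v)
    bron_kerbosch(set(), set(names), set())
    return sorted(sorted(c) for c in cliques if len(c) >= x)

# Hypothetical coordinates, roughly matching the plot in the question
pts = {'a': (2, 7), 'b': (4, 7), 'c': (3, 5), 'd': (5, 2), 'e': (6, 3)}
print(clusters(pts, 3, 3))   # [['a', 'b', 'c']]
print(clusters(pts, 2, 3))   # [['a', 'b', 'c'], ['d', 'e']]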
