Spatial clustering algorithm

Given a collection of points on a 2D plane, I want to find collections of X points that are within Y of each other. For example:
8|
7| a b
6|
5| c
4|
3| e
2| d
1|
-------------------------
1 2 3 4 5 6 7 8 9 0 1
a, b, c, d and e are points on the 2D plane. Given arguments of 3 for the number of points (X) and 3 for the distance (Y), the algorithm would return [[a, b, c]]. Some examples:
algorithm(X = 3, Y = 3) returns [[a, b, c]]
algorithm(X = 2, Y = 3) returns [[a, b, c], [d, e]] -- [a, b, c] contains at least two points
algorithm(X = 4, Y = 3) returns [] -- no group of 4 points close enough
algorithm(X = 5, Y = 15) returns [[a, b, c, d, e]]
Constraints:
the x and y axes (the numbers above) are both 10,000 units long
there are 800 points (a, b, c, d etc) on the graph
I don't think it matters, but I'm using JavaScript
Things I've tried:
I actually care about outputting new points that are close to more than one input point, so I tried iterating on a grid and 'looking around' it using Pythagoras to find each point a given distance away. This is too slow given the total area. See the source here.
You can also see the data size in real data test.
DBSCAN, which seems to have a different purpose - I know how big I want my cluster size to be.
I'm currently trying to compare points to each other and build up close pairs, then close triplets, etc, until the end, but this seems to be going down a bit of an inefficiency hole also. I'm going to continue and try some kind of hashing or dictionary to avoid these loops.

With only 800 points, you can probably just build the graph by comparing each pair (connect two points when they are within Y of each other), then run Bron-Kerbosch to find maximal cliques and keep those with at least X points. Here's a legit-seeming JavaScript implementation of that algorithm: https://github.com/SeregPie/almete.BronKerbosch
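To make the idea concrete, here is a minimal sketch in Python (not the linked JavaScript library). It assumes the Euclidean distance and uses made-up coordinates for a to e, since the exact positions in the plot above are not given; `close_groups` and its parameter names are hypothetical.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def close_groups(points, min_size, max_dist):
    """Return maximal groups of >= min_size points that are pairwise within max_dist."""
    names = list(points)
    # Build the graph: an edge joins two points that are close enough.
    neighbours = {n: set() for n in names}
    for u, v in combinations(names, 2):
        if dist(points[u], points[v]) <= max_dist:
            neighbours[u].add(v)
            neighbours[v].add(u)

    cliques = []

    def expand(r, p, x):
        # Bron-Kerbosch with pivoting: r is the current clique,
        # p the candidates that can extend it, x the already-explored vertices.
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(sorted(r))
            return
        pivot = max(p | x, key=lambda n: len(neighbours[n] & p))
        for v in list(p - neighbours[pivot]):
            expand(r | {v}, p & neighbours[v], x & neighbours[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(names), set())
    return sorted(cliques)


# Hypothetical coordinates, chosen only to resemble the sketch in the question.
points = {"a": (2, 7), "b": (4, 7), "c": (3, 5), "d": (8, 2), "e": (9, 3)}
print(close_groups(points, 3, 3))
print(close_groups(points, 2, 3))
```

With 800 points the pairwise comparison is about 320,000 distance checks, which should be cheap; the clique enumeration is the part whose cost depends on how dense the resulting graph is.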

Related

How to allocate x copy of n objects among k persons

I have the following use case:
I have N distinct items, each of which has x copies. I need to distribute these items among k persons, where each person's capacity varies and can be <= N.
The following condition must be met:
Each person should get one and only one copy of an Item
Example:
Items = apple , banana , orange
copies = 3 ( It means we have 3 apples , 3 bananas and 3 oranges )
So I have an array:
{1,2,3,4,5,6,7,8,9} // 1,2,3 = 3 apples ; 4,5,6 = 3 banana ; 7,8,9 = 3 oranges
Total Person = 5
Person Capacity
P1 3
P2 2
P3 1
P4 1
P5 2
How can I solve this problem? The problem I am facing is that when I allocate for arbitrary values of N, x and k, I sometimes end up with some items left to allocate, because I can't ensure the condition that "Each person should get one and only one copy of an Item".
Since each item has the same "weight", you can actually solve this problem greedily. To ensure that each person receives distinct items, we create the sequence containing all the xN items by repeating the sequence of N distinct items x times. Then, we go through each of the persons and simply remove and assign them the first c items of this sequence, where c is that person's carrying capacity.
This works because of the way we have laid out the items and because c <= N. In our "mega" sequence, duplicates are always N indices away from each other, so c consecutive elements will never contain two duplicates. Duplicates will only appear in contiguous subsequences containing more than N items.
Note that in the implementation of this algorithm, you don't actually have to create the mega sequence; you can simply repeatedly iterate through the distinct item sequence by using modular arithmetic. To keep the explanation simple, I will be forming the "mega" items sequence in the examples, but you don't have to do this in the implementation.
Taking the example in your question, let the 3 distinct items be A, B, C with 3 copies each. The "mega" sequence is formed by repeating the distinct items sequence 3 times: A, B, C, A, B, C, A, B, C. Now we go through each person and simply assign them the number of items they can carry. To illustrate this, consider the following cases for the capacity array (taken from your question and comments below):
[3, 2, 1, 1, 2]: P1 gets A, B, C, P2 gets A, B, P3 gets C, P4 gets A, and P5 gets B, C.
[2, 2, 2, 2, 1]: P1 gets A, B, P2 gets C, A, P3 gets B, C, P4 gets A, B, and P5 gets C.
[3, 1, 1, 1, 1, 2]: P1 gets A, B, C, P2 gets A, P3 gets B, P4 gets C, P5 gets A, P6 gets B, C.
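The modular-arithmetic version described above could look roughly like this in Python. This is only a sketch: `allocate` and its parameters are hypothetical names, and it assumes the total capacity does not exceed x*N (otherwise there are not enough copies to hand out).

```python
def allocate(items, copies, capacities):
    """Give each person `capacity` distinct items by walking the item list cyclically."""
    n = len(items)
    if any(c > n for c in capacities):
        raise ValueError("a capacity larger than N would force a duplicate item")
    if sum(capacities) > copies * n:
        raise ValueError("total capacity exceeds the x*N copies available")

    result = []
    pos = 0  # index into the (virtual) "mega" sequence
    for c in capacities:
        # c consecutive positions modulo n never repeat an item, because c <= n
        result.append([items[(pos + i) % n] for i in range(c)])
        pos += c
    return result


print(allocate(["A", "B", "C"], 3, [3, 2, 1, 1, 2]))
```

Because `pos` only ever increases and never exceeds x*N, no item is handed out more than x times.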

How can I code a specific game in Prolog?

I have a problem with coding the program described below.
Consider the following game. A board with three black stones, three white stones and an empty space is given. The goal of the game is to swap the places of the black pawns with the white pawns. The rules for moving are as follows: The white and black pieces move alternately. Each pawn can move vertically or horizontally into an empty space. Each piece can jump vertically or horizontally over another piece (of any color). Write a program in Prolog to find all possible winning sequences. For example, if we ask the question:
?- play(w, s(w, w, w, e, b, b, b), s(b, b, b, e, w, w, w), S, R).
Prolog should answer, for example:
S = [s(w, w, w, e, b, b, b), s(w, e, w, w, b, b, b), ..., s(b, b, b, e, w, w, w)] R = [[w, 2, 4], [b, 6, 2], [w, 4, 6], ..., [w, 4, 6]]
Here [w, 2, 4] means moving the white pawn from position 2 to position 4. Of course Prolog should return both lists S and R in full (without "...").
What is the maximum number of different pawn settings possible on the board? Check the query:
?- play(_, s(w, w, w, e, b, b, b), s(b, b, e, w, w, b, w), _, _).
What does Prolog's answer mean? Hint: solve the problem for play/4 without R first.
There's also a game board that looks like this:
I have no clue at all even where to start? How can I do that? Could you guys, help me with this one?
This is a standard state space search, a standard paradigm of GOFAI since the mid 50s at least.
The barebones algorithm:
search(State,Path,Path) :- is_final(State),!. % Done, bounce "Path" term
search(State,PathSoFar,PathOut) :-
   generate_applicable_operators(State,Operators),
   (is_empty(Operators) -> fail ; true),
   select_operator(Operators,Op,PathSoFar),
   apply_operator(State,Op,NextState), % depth-first / best first
   search(NextState,[[NextState,Op]|PathSoFar],PathOut).
% Called like this, where Path will contain the reverse Path through
% State Space by which one may reach a final state:
search(InitialState,[[InitialState,nop]],Path).
First you need to represent a given state, in this case the state of the board (at some time t).
We can either list the board positions and their content (w for white, b for black, e for empty token) or list the tokens and their positions. Let's list the board positions.
In Prolog, a term that can be easily pattern-matched is appropriate. The question already provides something: (w, w, w, e, b, b, b). This seems to be inspired by LISP and is not well adapted to Prolog. Let's use a list instead: [w, w, w, e, b, b, b]
The mapping of board positions to list positions shall be:
+---+---+
| 0 | 1 |
+---+---+---+
| 2 | 3 | 4 |
+---+---+---+
| 5 | 6 |
+---+---+
And we are done with setting up a state description!
Then you need to represent/define the operators (operations?) that can be applied to a state: they transform a valid state into another valid state.
An operator corresponds to "moving a token" and of course not all operators apply to a given state (you cannot move a token from field 1 if there is no token there; you cannot move a token to field 1 if there already is a token there).
So you want to write a predicate that links a board state to the operators applicable to that state: generate_applicable_operators/2
Then you need to select the operator that you want to apply. This can be done randomly, exhaustively, according to some heuristic (for example A*), but definitely needs to examine the path taken through the state space till now to avoid cycles: select_operator/3.
Then you apply the operator to generate the next state: apply_operator/3.
And finally recursively call search/3 to find the next move. This continues until the "final state", in this case [b, b, b, e, w, w, w] has been reached!
You can also use iterative deepening if you want breadth-first-like behaviour (shortest solutions found first) instead, but for that the algorithm structure must be modified.
And that's it.
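For readers who find it easier to see the skeleton in an imperative language, here is a rough Python sketch of the same depth-first search with cycle avoidance. It is not a solution to the exercise: the operator generator below is for a hypothetical three-cell, one-dimensional board, it ignores the rule that colours alternate, and all function names are made up for illustration.

```python
def search(state, is_final, successors, path=None):
    """Depth-first state space search.

    Returns a list of (state, operator) pairs from the initial state to a
    final state, or None if no final state is reachable.
    """
    if path is None:
        path = [(state, None)]  # the initial state is reached by "no operator"
    if is_final(state):
        return path
    visited = {s for s, _ in path}  # states on the current path, to avoid cycles
    for op, next_state in successors(state):
        if next_state in visited:
            continue
        result = search(next_state, is_final, successors, path + [(next_state, op)])
        if result is not None:
            return result
    return None


def line_successors(state):
    """Toy operators on a 1-D board: a token steps or jumps into the empty cell."""
    empty = state.index("e")
    for src in (empty - 2, empty - 1, empty + 1, empty + 2):
        if 0 <= src < len(state):
            cells = list(state)
            cells[empty], cells[src] = cells[src], "e"
            # operator written as (colour, from, to), like [w, 2, 4] in the question
            yield (state[src], src, empty), tuple(cells)


start, goal = ("w", "e", "b"), ("b", "e", "w")
for step in search(start, lambda s: s == goal, line_successors):
    print(step)
```

For the real board you would replace `line_successors` with a generator that knows which fields are adjacent and which jumps are possible, and track whose turn it is as part of the state.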

Solving chain reactions in Prolog

One of the recent Advent of code challenges tasks me with solving for the smallest amount of input material that I can use to apply a given set of reactions and get 1 unit of output material.
For example, given
10 ORE => 10 A
1 ORE => 1 B
7 A, 1 B => 1 C
7 A, 1 C => 1 D
7 A, 1 D => 1 E
7 A, 1 E => 1 FUEL
We need 31 total ore to make 1 fuel (1 to produce a unit of B, then 30 to make the requisite 28 A).
This year, I've been trying to push my programming-language horizons, so I've done most of the challenges in SML/NJ. This one seems—seemed—like a good fit for Prolog, given the little I know about it: logic programming, constraint solving, etc.
I haven't, however, been able to successfully model the constraints.
I started by turning this simple example into some facts:
makes([ore(10)], a(10)).
makes([ore(1)], b(1)).
makes([a(7), b(1)], c(1)).
makes([a(7), c(1)], d(1)).
makes([a(7), d(1)], e(1)).
makes([a(7), e(1)], fuel(1)).
To be honest, I'm not even sure if the list argument is a good structure, or if the functor notation (ore(10)) is a good model either.
Then I wanted to build the rules that allow you to say, e.g., 10 ore makes enough for 7 a:
% handles the case where we have leftovers?
% is this even the right way to model all this... when we have leftovers, we may
% have to use them in the "reaction"...
makes(In, Out) :-
    Out =.. [F,N],
    Val #>= N,
    OutN =.. [F,Val],
    makes(In, OutN).
This works[1], but I'm not sure it's going to be adequate, since we may care about leftovers (this is a minimization problem, after all)?
I'm stuck on the next two pieces though:
I can ask what makes 7 A and get back 10 ore, but I can't ask what is enough for 20 A: how do I write a rule which encodes multiplication/integer factors?
I can say that 7 A and 1 E makes 1 fuel, but I can't state that recursively: that is, I cannot state that 14 A and 1 D also make 1 fuel. How do I write the rule that encodes this?
I'm open to alternate data encodings for the facts I presented—ultimately, I'll be scripting the transformation from Advent's input to Prolog's facts, so that's the least of my worries. I feel that if I can get this small example working, I can solve the larger problem.
[1] ?- makes(X, a(7)). gives back X = [ore(10)] infinitely (i.e., if I keep hitting ; at the prompt, it keeps going). Is there a way to fix this?
Not a direct answer to your specific question, but my first thought on this problem was to use CHR (Constraint Handling Rules) in Prolog.
I then thought I would forward chain from fuel to the amount of ore I need.
The basic constraints:
:- use_module(library(chr)).
:- use_module(library(clpfd)).
:- chr_constraint ore/1, a/1, b/1, c/1, ab/1, bc/1, ca/1, fuel/0.
a(1), a(1) <=> ore(9).
b(1), b(1), b(1) <=> ore(8).
c(1), c(1), c(1), c(1), c(1) <=> ore(7).
ab(1) <=> a(3), b(4).
bc(1) <=> b(5), c(7).
ca(1) <=> c(4), a(1).
fuel <=> ab(2), bc(3), ca(4).
% Decompose foo/N into foo/1s
a(X) <=> X > 1, Y #= X-1 | a(Y), a(1).
b(X) <=> X > 1, Y #= X-1 | b(Y), b(1).
c(X) <=> X > 1, Y #= X-1 | c(Y), c(1).
ab(X) <=> X > 1, Y #= X-1 | ab(Y), ab(1).
bc(X) <=> X > 1, Y #= X-1 | bc(Y), bc(1).
ca(X) <=> X > 1, Y #= X-1 | ca(Y), ca(1).
ore(X) <=> X > 1, Y #= X-1 | ore(Y), ore(1).
% aggregation (for convenience)
:- chr_constraint ore_add/1, total_ore/1.
total_ore(A), total_ore(Total) <=> NewTotal #= A + Total, total_ore(NewTotal).
ore_add(A) ==> total_ore(A).
ore(1) <=> ore_add(1).
Query:
?- fuel.
b(1),
b(1),
c(1),
c(1),
ore_add(1),
ore_add(1),
...
total_ore(150).
Then you would need to add a search procedure to eliminate the two b/1s and two c/1s.
I have not implemented this but:
?- fuel, b(1), c(3).
ore_add(1),
...
total_ore(165)
This has only ore_add/1 constraints and is the correct result.
In the example there are no "alternative" paths and no multiple "ore sources", so coding the example up in a very non-flexible way using Prolog can be done like this:
need(FUEL,OREOUT) :- need(FUEL,0,0,0,0,0,0,OREOUT).
need(FUEL,E,D,C,A,B,ORE,OREOUT) :- FUEL > 0, A2 is 7*FUEL+A, E2 is FUEL+E, need(0, E2, D, C, A2, B, ORE,OREOUT).
need(0,E,D,C,A,B,ORE,OREOUT) :- E > 0, A2 is 7*E+A, D2 is E+D, need(0, 0, D2, C, A2, B, ORE,OREOUT).
need(0,0,D,C,A,B,ORE,OREOUT) :- D > 0, A2 is 7*D+A, C2 is D+C, need(0, 0, 0, C2, A2, B, ORE,OREOUT).
need(0,0,0,C,A,B,ORE,OREOUT) :- C > 0, A2 is 7*C+A, B2 is C+B, need(0, 0, 0, 0, A2, B2, ORE,OREOUT).
need(0,0,0,0,A,B,ORE,OREOUT) :- X is A + B, X > 0, ORE2 is ORE + 10*((A + 9)//10) + B, need(0, 0, 0, 0, 0, 0, ORE2,OREOUT).
need(0, 0, 0, 0, 0, 0, ORE, ORE).
Then
?- need(1011,ORE).
ORE = 29321
But this is just a silly and inelegant attempt.
There is a major general problem lurking thereunder, which includes parsing the arbitrarily complex reaction directed acyclic graph and building an appropriate structure. The good thing is that it is a DAG, so one cannot generate an "earlier ingredient" from a "later one".
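As a cross-check on the numbers, here is a small Python sketch (not Prolog, and not part of either approach above) that walks the reaction graph backwards from FUEL while keeping track of leftovers. It assumes every product has exactly one recipe, as in the example; `ore_needed` and the dictionary layout are hypothetical.

```python
def ore_needed(reactions, fuel=1):
    """reactions maps product -> (quantity produced per run, {ingredient: quantity})."""
    need = {"FUEL": fuel}   # outstanding demand per material
    surplus = {}            # leftovers from earlier runs
    ore = 0
    while need:
        product, qty = need.popitem()
        if product == "ORE":
            ore += qty
            continue
        # Use up leftovers before running the reaction again.
        reuse = min(qty, surplus.get(product, 0))
        surplus[product] = surplus.get(product, 0) - reuse
        qty -= reuse
        if qty == 0:
            continue
        out_qty, inputs = reactions[product]
        runs = -(-qty // out_qty)  # ceiling division
        surplus[product] += runs * out_qty - qty
        for ingredient, amount in inputs.items():
            need[ingredient] = need.get(ingredient, 0) + runs * amount
    return ore


example = {
    "A": (10, {"ORE": 10}),
    "B": (1, {"ORE": 1}),
    "C": (1, {"A": 7, "B": 1}),
    "D": (1, {"A": 7, "C": 1}),
    "E": (1, {"A": 7, "D": 1}),
    "FUEL": (1, {"A": 7, "E": 1}),
}
print(ore_needed(example))  # 31, as stated in the question
```

The surplus bookkeeping is what makes the order of processing irrelevant: for each material, the total number of runs ends up being the total demand divided by the batch size, rounded up.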
Having thought about it while making coffee: this is clearly something for the CLP(FD) engine.
If we have directed acyclic graph of reactions with
FUEL node on the right of the graph and
nodes for intermediate products IP[i] (i in 0..n) in between, with possibly
several FUEL nodes, i.e. several ways generating FUEL: FUEL[0] ... FUEL[v] and possibly
several nodes for intermediate products IP[i], i.e. several ways of creating intermediate product IP[i>0]: IP[i,1] ... IP[i,ways(i)] and
IP[0] identified with ORE on the left side of the graph
with the last two points giving us a way of choosing a strategy for the product mix, then:
FUEL_NEEDED = mix[0] * FUEL[0] + ... + mix[v] * FUEL[v]
with everything in the above a variable
and the following given by the problem statement, with FUEL[0] ... FUEL[v] variables and the rest constants:
out_fuel[0] * FUEL[0] = ∑_j ( IP[j] * flow(IPj->FUEL0) )
⋮
out_fuel[v] * FUEL[v] = ∑_j ( IP[j] * flow(IPj->FUELv) )
and for each IP[i>0], with the IP[i] variables and the rest constants:
out_ip[i] * IP[i] = ∑_j≠i ( IP[j] * flow(IPj->IPi) )
in case of several ways to generate IP[i], we mix (this is like introducing a graph node for the mix of IP[i] from its possible ways IP[i,j]):
out_ip[i] * IP[i] = ∑_j(0..ways(i)) ( IP[i,j] * mix[i,j] )
out_ip[i,1] * IP[i,1] = ∑_j≠i ( IP[j] * flow(IP[j]->IP[i,1]) )
⋮
out_ip[i,ways(i)] * IP[i,ways(i)] = ∑_j≠i ( IP[j] * flow(IP[j]->IP[i,ways(i)]) )
and IP[0] (i.e. ORE) a free variable to be minimized.
You see an underspecified linear programming problem appearing here, with a matrix having zeroes below the diagonal because it's a DAG, but it contains variables to be optimized in the matrix itself. How to attack that?

Efficient way of generating graphs from source nodes

Let's say I have a graph G, and around each node I have a few source nodes xs. I have to create a new graph G' using xs=[[a, b, c], [d, e], [f]] nodes such that they won't conflict with grey donuts as shown in the figure below.
Expected output G' is [[a, d, f], [a, e, f], [b, e, f]]; all others are conflicting a gray donut.
I currently solve it by trying all permutations and combinations of the nodes in xs. This works for smaller numbers of nodes, but as the number of nodes in xs increases with a bigger graph G, it soon becomes hundreds of thousands of combinations to try.
I am looking for an efficient algorithm which will help me speed things up and get me all the non-conflicting graphs with a minimum number of iterations.
You have a fairly obvious minimum set of edges for each stage of your path. They are both necessary and sufficient for your solution. For notational convenience, I'll label the original graph X--Y--Z. Your corresponding G' nodes are
X a b c
Y d e
Z f
You do this in two steps:
For each edge in G, you must test for validity each possible edge in G'. This consists of
X--Y [a, b, c] x [d, e]
a total of 6 edges; 3 qualify: set XY = [a--d, a--e, b--e]
Y--Z [d, e] x [f]
a total of 2 edges; 2 qualify: set YZ = [d--f, e--f]
Now, you need only generate all combinations of XY x YZ where the Y nodes match. If you sort the lists by the "inner" node, you can do this very quickly as
[a--d] x [d--f]
[a--e, b--e] x [e--f]
Most current languages have modules to perform combinations for you, so the code will be short enough.
Does that get you going?
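If it helps, here is one way the two steps might be sketched in Python for the case where G is a simple chain X--Y--Z. The validity test is an assumption: `edge_ok` stands in for whatever "does not cross a grey donut" check you already have, and here it is faked with a hard-coded set of good edges taken from the expected output in the question.

```python
def valid_paths(layers, edge_ok):
    """layers[i] holds the candidate nodes around the i-th node of the chain G."""
    paths = [[node] for node in layers[0]]
    for prev_layer, layer in zip(layers, layers[1:]):
        # Step 1: test each candidate edge between adjacent layers exactly once.
        allowed = {u: [v for v in layer if edge_ok(u, v)] for u in prev_layer}
        # Step 2: extend only the partial paths whose last node matches.
        paths = [path + [v] for path in paths for v in allowed[path[-1]]]
    return paths


# Hypothetical validity data, consistent with the expected output G'.
good_edges = {("a", "d"), ("a", "e"), ("b", "e"), ("d", "f"), ("e", "f")}
xs = [["a", "b", "c"], ["d", "e"], ["f"]]
print(valid_paths(xs, lambda u, v: (u, v) in good_edges))
```

The number of validity tests is then the sum over edges of G of the product of the two candidate list sizes, instead of the product over all nodes.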

Maximal sets intersection

Given 5 finite sets a, b, c, d, e, each set is assigned an arbitrary number:
a = 100, b = 34, c = 15, d = 89, e = 57
The complement of each set has the same number assigned but negated, e.g. for (a') it will be -100.
We need to find an intersection of all these sets or their complements such that the resulting set is not the empty set and the sum of the assigned numbers is maximal.
I only see one brute force solution to this problem, but it will be very inefficient and it's not elegant. In this case we just generate all combinations and resolve them to see if they are not empty, combinations look like this:
{a∩b'∩c'∩d'∩e'}, {a'∩b∩c'∩d∩e'}, {a'∩b'∩c∩d'∩e'}, {a'∩b'∩c'∩d∩e'}, {a'∩b'∩c'∩d'∩e} {a∩b∩c'∩d'∩e'}, {a∩b'∩c∩d'∩e'}, {a∩b'∩c'∩d∩e}, {a∩b'∩c'∩d'∩e}, {a'∩b∩c∩d'∩e'} {a'∩b∩c'∩d∩e'} {a'∩b∩c'∩d'∩e} ...
and then just pick the max number.
Looking forward to seeing if someone can think of something better :)
Define score(x, X) to be the value of set X if x is in X, otherwise its negation.
Then, letting * represent an element that's not in any of the 5 sets, the highest score possible is:
max_{x in union(A, B, C, D, E, {*})} sum_{X in {A, B, C, D, E}} score(x, X)
This follows from the observation that any particular x is either in a set or its complement. You don't actually have to compute the union here. In Python you might write:
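def score(x, X, value):
    # value of X if x is in X, otherwise its negation
    return value if x in X else -value

def max_config(sets, values):
    # sets: the five sets; values: their assigned numbers, in the same order
    best = float("-inf")
    # {None} plays the role of {*}: an element that is in none of the sets
    for S in [*sets, {None}]:
        for x in S:
            best = max(best, sum(score(x, X, v) for X, v in zip(sets, values)))
    return best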
Assuming a set membership test is O(1), this has complexity O(N), where N is the total size of the given sets.
