Merge collided bodies as one - algorithm

I'm working on some Physics stuff in Javascript and basically want to handle collided bodies as one.
I've got an array of all the collisions that occur, a collision consists of two bodies, body A and body B.
Let's say six collisions occur, between bodies:
X and Y
Y and Z
C and D
E and F
F and H
G and H
Now I want to merge all the bodies that are in some way connected into a single body. I want those merged bodies in a list. For example in this case I'd want a list that looks like this:
X, Y and Z (Because X collided with Y and Y collided with Z)
C and D (Because C collided with D)
E, F, G and H (Because E collided with F, F with H, and G with H)
Now I'm pretty sure there's some algorithm out there that I need, I just don't know where to look and I'm out of ideas to solve this myself.

How would you do this in real life?
I suppose I would read each rule. For each rule, I'd connect the two pieces. What I'd end up with is a collection of blobs. I could then walk each of the graphs to get the list of nodes in each one. Each "connected component" would be a "blob". Formalizing this algorithm a bit might give this:
// make the graph of connected components
nodes = map<symbol, pair<symbol, list<symbol>>>
for each (a, b) in rules do
    if nodes[a] is null then nodes[a] = node(a, [b])
    else nodes[a].connections.append(b)
    if nodes[b] is null then nodes[b] = node(b, [a])
    else nodes[b].connections.append(a)
loop

blobs = map<symbol, list<symbol>>
for each (a, b) in rules do
    firstNode = nodes[a]
    // do a DFS/BFS search starting from firstNode to find
    // all nodes in the connected component. whenever you
    // follow a link from a node, remove it from the node's
    // list of links. this prevents ever searching from that
    // node again since we know what component it's in already
    // add each node to the list of symbols in blobs[a]
loop
In the first loop, we read each rule once, then do a constant amount of work, so it is O(n) time in the number of rules. It will store two connections for each rule and so is O(n) storage in terms of the number of rules.
In the second loop, we look at each rule and do a DFS or BFS for each rule's LHS symbol. However, note that the searches will only traverse any edge once, and so this is O(n) time in the number of rules. We will end up with some set of blobs the union of whose lists will be the set of symbols which is no more than the number of rules, so it's O(n) storage as well.
So we have an O(n) time, O(n) space complexity algorithm for determining the blobs. Can we do better, asymptotically speaking? Clearly we need to look at all n rules, so the time complexity is optimal. Also note that any solution to this problem must say for each symbol which blob that symbol ends up belonging to, so simply writing the answer down on the output tape takes O(n) space. So this should be optimal as well.
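For illustration, a rough JavaScript sketch of this approach (the function name and the pair-array input format are assumptions for the sketch):

// Connected components ("blobs") from collision pairs via BFS.
// collisions is an array of [a, b] pairs of body ids.
function mergeGroups(collisions) {
    var adj = new Map();
    function link(a, b) {
        if (!adj.has(a)) adj.set(a, []);
        adj.get(a).push(b);
    }
    collisions.forEach(function (c) { link(c[0], c[1]); link(c[1], c[0]); });

    var seen = new Set();
    var groups = [];
    adj.forEach(function (_, start) {
        if (seen.has(start)) return;
        var group = [], queue = [start];
        seen.add(start);
        while (queue.length) {          // BFS over one component
            var v = queue.shift();
            group.push(v);
            adj.get(v).forEach(function (w) {
                if (!seen.has(w)) { seen.add(w); queue.push(w); }
            });
        }
        groups.push(group);
    });
    return groups;
}

// mergeGroups([['X','Y'], ['Y','Z'], ['C','D'], ['E','F'], ['F','H'], ['G','H']])
// -> [['X','Y','Z'], ['C','D'], ['E','F','H','G']]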

If you have an ADT (in this case a map) that contains all objects, and you keep a parent id to track object collisions, you can handle each collision + merge in constant time.
// setup
var X = {id: 1, name:'X'};
var Y = {id: 2, name:'Y'};
var Z = {id: 3, name:'Z'};
var C = {id: 4, name:'C'};
var D = {id: 5, name:'D'};
var E = {id: 6, name:'E'};
var F = {id: 7, name:'F'};
var G = {id: 8, name:'G'};
var H = {id: 9, name:'H'};
var all = { 1:X, 2:Y, 3:Z, 4:C, 5:D, 6:E, 7:F, 8:G, 9:H };
// method to merge collided objects together
function collision(obj1, obj2) {
    var p1 = obj1.parent;
    var p2 = obj2.parent;
    if(p1 === undefined && p2 === undefined) {
        obj1.parent = obj1.id;
        obj2.parent = obj1.id;
        obj1.name += obj2.name;
        delete all[obj2.id];
    } else if(p1 !== undefined && p2 === undefined) {
        obj2.parent = obj1.parent;
        all[obj1.parent].name += obj2.name;
        delete all[obj2.id];
    } else if(p1 === undefined && p2 !== undefined) {
        obj1.parent = obj2.parent;
        all[obj2.parent].name += obj1.name;
        delete all[obj1.id];
    } else if(p1 !== undefined && p2 !== undefined && obj1.parent !== obj2.parent) {
        if(all[obj1.parent] !== undefined) {
            all[obj1.parent].name += all[obj2.parent].name;
            delete all[obj2.parent];
        } else if(all[obj2.parent] !== undefined) {
            all[obj2.parent].name += all[obj1.parent].name;
            delete all[obj1.parent];
        }
    }
}
// test
console.log(JSON.stringify(all));
collision(X, Y);
collision(Y, Z);
collision(C, D);
collision(E, F);
collision(F, H);
collision(G, H);
console.log(JSON.stringify(all));
collision(X, E);
console.log(JSON.stringify(all));
{"1":{"id":1,"name":"X"},"2":{"id":2,"name":"Y"},"3":{"id":3,"name":"Z"},"4":{"id":4,"name":"C"},"5":{"id":5,"name":"D"},"6":{"id":6,"name":"E"},"7":{"id":7,"name":"F"},"8":{"id":8,"name":"G"},"9":{"id":9,"name":"H"}}
{"1":{"id":1,"name":"XYZ","parent":1},"4":{"id":4,"name":"CD","parent":4},"6":{"id":6,"name":"EFHG","parent":6}}
{"1":{"id":1,"name":"XYZEFHG","parent":1},"4":{"id":4,"name":"CD","parent":4}}
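A closely related classic structure is union-find (a disjoint-set forest), which keeps merges near constant time even across long chains of already-merged groups; a minimal, illustrative sketch:

// parent maps each id to its parent id; roots point to themselves.
var parent = new Map();
function find(id) {
    if (!parent.has(id)) parent.set(id, id);
    var root = id;
    while (parent.get(root) !== root) root = parent.get(root);
    while (parent.get(id) !== id) {    // path compression
        var next = parent.get(id);
        parent.set(id, root);
        id = next;
    }
    return root;
}
function union(a, b) { parent.set(find(a), find(b)); }

// union('X','Y'); union('Y','Z');
// find('X') === find('Z')   // -> true: X, Y and Z are one merged body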

Related

Satisfying triples in graph

I'm solving a problem where you have N events (1 <= N <= 100000) over M days (2 <= M <= 10^9). You are trying to find the minimum time of occurrence for each event.
For each event, you know that it couldn't have occurred prior to a day Si. You also have C triples (1 <= C <= 10^5) described by (a, b, x). An event b must have occurred at least x days after a.
Example:
There are 4 events, spread over 10 days. Event 1 had to occur on Day 1 or after. Event 2 had to occur on Day 2 or after. Event 3 had to occur on Day 3 or after. Event 4 had to occur on Day 4 or after.
The triples are (1, 2, 5); (2, 4, 2); (3, 4, 4). This means that Event 2 had to occur at least 5 days after Event 1; Event 4 had to occur at least 2 days after Event 2; and Event 4 had to occur at least 4 days after Event 3.
The solution is that Event 1 occurred on Day 1; Event 2 occurred on Day 6; Event 3 occurred on Day 3; and Event 4 occurred on Day 8. The reasoning behind this is that Event 2 occurred at least five days after Event 1, so it cannot have occurred before Day 1+5=6. Event 4 occurred at least two days after Event 2, so it cannot have occurred before Day 6+2=8.
My solution:
I had the idea to use the triples to create a Directed graph. So in the example above, the graph would look like this:
1 --5-> 2 --2-> 4
3 --4-> 4
Basically you create a directed edge from the Event that happened first to the Event that had to happen after. The edge weight would be the number of days it had to at least happen after.
I thought we would first use the input data to create the graph. Then, you would just binary search on all possible starting dates of the first event (1 through 10^9, so about 30 iterations). In this case, the first event is Event 1. Then, you would go through the graph and see if this starting date was possible. If you ever encountered an event whose date of occurrence was before its Si date, then you would terminate this search and continue binary searching. This solution would have worked easily if it wasn't for the "event b must have occurred AT LEAST x days after a".
Does anyone have any other solutions for solving this problem, or how to alter mine so that it works? Thank you! If you have any questions please let me know :))
This can be mapped to a Simple Temporal Network, where the literature is rich, e.g.:
Dechter, Rina, Itay Meiri, and Judea Pearl. "Temporal constraint networks." Artificial Intelligence 49.1-3 (1991): 61-95.
Planken, Léon Robert. "Algorithms for simple temporal reasoning." (2013). (full dissertation)
As indicated in the comments, all-pairs shortest-paths can calculate the minimal-network (which also generates new arcs/constraints between all these events). If your graph is sparse, Johnson's algorithm is better than Floyd-Warshall.
If you don't care about the complete minimal network, but only about the bounds of your events, you are only interested in the first column and the first row of the all-pairs shortest-paths distance matrix. You can calculate these values by applying Bellman-Ford twice: once on the graph and once with all edges reversed. These values are the distances root -> i and i -> root, where root is time 0.
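A rough sketch of that reduction (illustrative JavaScript; the input shapes - S[i] as the earliest allowed day for event i, triples as [a, b, x] arrays - are assumptions for the sketch):

// Earliest event times as longest paths from a time-0 root node 0.
// Edge root->i with weight S[i]; edge a->b with weight x per triple.
// Bellman-Ford-style relaxation, maximizing instead of minimizing.
function earliestTimes(n, S, triples) {
    var edges = [];
    for (var i = 1; i <= n; i++) edges.push([0, i, S[i]]);
    triples.forEach(function (t) { edges.push(t); });
    var time = new Array(n + 1).fill(-Infinity);
    time[0] = 0;
    for (var pass = 0; pass <= n; pass++) {   // n passes, plus one to detect cycles
        var changed = false;
        edges.forEach(function (e) {
            if (time[e[0]] + e[2] > time[e[1]]) { time[e[1]] = time[e[0]] + e[2]; changed = true; }
        });
        if (!changed) return time;            // converged: time[i] is event i's earliest day
    }
    return null;                              // still changing: positive cycle, infeasible
}

// earliestTimes(4, [0, 1, 2, 3, 4], [[1, 2, 5], [2, 4, 2], [3, 4, 4]])
// -> [0, 1, 6, 3, 8], matching the example above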
Just some remarks about things which Damien indicated (reasoning from scratch it seems: impressive):
we use negative weights in the general problem such that pure Dijkstra won't do
existence of a negative cycle <-> infeasibility / no solution / inconsistency
there will be a need for some root vertex which is the origin of time
Edit: The above somewhat targets strong inference / propagation, like giving tight bounds on the events' value-domains.
If you are only interested in some consistent solution, it might be another idea to just post these constraints as a linear program and use one of the highly-optimized implementations to solve it (open-source world: CoinOR clp; maybe Google's glop). Simplex-based ones should give you an integral solution (I think the problem is totally unimodular). Interior-point based solvers should be faster, but I'm not sure your result will be integral without some additional cross-over step. (It might be a good idea to add some dummy objective like min(max(x)), makespan-like.)
Consider a topological sort of your DAG.
For a list L corresponding to the toposort of your graph, the leaves come at the end.
Then for a vertex just before the leaves,
L = [..., v, leaves]
you know that the edges outgoing from v can only go to the vertices after it (here the leaves).
This allows you to compute the minimal weight associated to v by applying Damien's max.
Do so up to the head of L.
Topological sorting is O(V+E)
Here is an illustration with a more interesting graph (read it from top to bottom):

    5
   / \
  4   7
     / \
    1   2
    |
    0
    |
    6

A topological ordering is (4601275).
So we will visit in order 4, 6, 0, 1, 2, 7 and then 5, and any vertex we visit has all its dependencies already computed.
Assume each vertex k has its event occurring after 2^k days. That "after" date is referred to as the weight.
e.g. vertex 4 is weighted 2^4
Assume each edge (i,j) is weighted 5*i + j
6 is weighted 2^6 = 64
0 is weighted max(2^0, 64 + (0*5+6)) = 70
1 takes max(2^1, 70 + 5) = 75
7 takes max(2^7, 75 + (5*7+1), 2^2 + (5*7+2)) = 2^7
The point to be highlighted (here for 7) is that the minimal date induced by the dependencies of a node may occur before the date attached to that node itself (and we have to keep the biggest one).
function topologicalSort({ V, E }) {
  const visited = new Set()
  const stack = []
  function dfs (v) {
    if (visited.has(v)) { return }
    E.has(v) && E.get(v).forEach(({ to, w }) => dfs(to))
    visited.add(v)
    stack.push(v)
  }
  // process nodes without incoming edges first
  const heads = new Set([...V])
  for (const v of V) {
    const edges = E.get(v)
    edges && edges.forEach(({ to }) => heads.delete(to))
  }
  for (const v of heads) {
    dfs(v)
  }
  for (const v of V) {
    dfs(v)
  }
  return stack
}

class G {
  constructor () {
    this.V = new Set()
    this.E = new Map()
  }
  setEdges (from, tos) {
    this.V.add(from)
    tos.forEach(({ to, w }) => this.V.add(to))
    this.E.set(from, tos)
  }
}

function solve ({ g, vToWeight }) {
  const stack = topologicalSort(g)
  console.log('ordering', stack.join(''))
  stack.forEach(v => {
    const edges = g.E.get(v)
    if (!edges) { return }
    const newval = Math.max(
      vToWeight.get(v),
      ...edges.map(({ to, w }) => vToWeight.get(to) + w)
    )
    console.log('setting best for', v, edges.map(({ to, w }) => [vToWeight.get(to), w].join('+')))
    vToWeight.set(v, newval)
  })
  return vToWeight
}

function demo () {
  const g = new G()
  g.setEdges(2, [{ to: 1, w: 5 }])
  g.setEdges(4, [{ to: 2, w: 2 }, { to: 3, w: 4 }])
  const vToWeight = new Map([
    [1, 1],
    [2, 6],
    [3, 3],
    [4, 4]
  ])
  return { g, vToWeight }
}

function demo2 () {
  const g = new G()
  const addEdges = (i, ...tos) => {
    g.setEdges(i, tos.map(to => ({ to, w: 5 * i + to })))
  }
  addEdges(5, 4, 7)
  addEdges(7, 1, 2)
  addEdges(1, 0)
  addEdges(0, 6)
  const vToWeight = new Map([...g.V].map(v => [v, 2 ** v]))
  return { g, vToWeight }
}

function dump (map) {
  return [...map].map(([k, v]) => k + '->' + v)
}

console.log("----op's sol----\n", dump(solve(demo())))
console.log('----that case---\n', dump(solve(demo2())))
The distance matrix (between all pairs of events = nodes) can be obtained in an iterative way, similar to the Floyd algorithm. Basically, iteratively:

T(x, y) = max(T(x, y), T(x, z) + T(z, y))

However, as mentioned by the OP in a comment, the Floyd algorithm is O(n^3), which is too much for a value of n up to 10^5.
A key point is that no loop exists, and therefore a more efficient algorithm should exist.
A nice proposal was made by grodzi in their answer: use a topological sort of the Directed Acyclic Graph (DAG).
I made an implementation in C++ according to this idea, with one main difference:
I used a simple sort (from the C++ library) for building the topological ordering. Doing it this way is simple and has a complexity of O(n log n). The dedicated method proposed by grodzi could be more efficient (it seems O(n)). However, it is very easy to implement and such a complexity remains low.
After the topological sorting, we know that a given event only depends on the events before it. For this part, this ensures a complexity of O(C), where C is the number of triples, i.e. the number of edges.
#include <iostream>
#include <vector>
#include <set>
#include <unordered_set>
#include <algorithm>
#include <tuple>
#include <numeric>

struct Triple {
    int event1;
    int event2;
    int days;
};

struct Pred {
    int pred;
    int days;
};

void print_result (const std::vector<int> &index, const std::vector<int> &times) {
    int n = times.size();
    for (int i = 0; i < n; i++) {
        std::cout << index[i]+1 << " " << times[index[i]] << "\n";
    }
}

std::tuple<std::vector<int>, std::vector<int>> ordering (int n, const std::vector<Triple> &triples) {
    std::vector<int> index(n);
    std::vector<int> times(n, 0);
    std::iota(index.begin(), index.end(), 0);

    // Build predecessors matrix and sets
    std::vector<std::vector<Pred>> pred (n);
    std::vector<std::unordered_set<int>> set_pred (n);
    for (auto &triple: triples) {
        pred[triple.event2 - 1].emplace_back(Pred{triple.event1 - 1, triple.days});
        set_pred[triple.event2 - 1].insert(triple.event1 - 1);
    }

    // Topological sort
    std::sort (index.begin(), index.end(), [&set_pred] (int &i, int &j) {return set_pred[j].find(i) != set_pred[j].end();});

    // Iterative calculation of times of arrival
    for (int i = 1; i < n; ++i) {
        int ip = index[i];
        for (auto &p: pred[ip]) {
            times[ip] = std::max(times[ip], times[p.pred] + p.days);
        }
    }

    // Final sort, according to times of arrival
    std::sort (index.begin(), index.end(), [&times] (int &i, int &j) {return times[i] < times[j];});

    return {index, times};
}

int main() {
    int n_events = 4;
    std::vector<Triple> triples = {
        {1, 2, 5},
        {1, 3, 1},
        {3, 2, 6},
        {3, 4, 1}
    };
    std::vector<int> index(n_events);
    std::vector<int> times(n_events);
    std::tie (index, times) = ordering (n_events, triples);
    print_result (index, times);
}
Result:
1 0
3 1
4 2
2 7

4x4 2D character matrix permutations

I have a 4x4 2D array of characters like this:
A B C D
U A L E
T S U G
N E Y I
Now, I would need to find all the permutations of 3 characters, 4 characters, etc., up to 10.
So, some words that one could "find" out of this are TEN, BALD, BLUE, GUYS.
I did search SO for this and Googled, but found no concrete help. Can you push me in the right direction as to which algorithm I should learn (A* maybe?)? Please be gentle as I'm no algorithms guy (aren't we all (well, at least a majority :)), but I am willing to learn, I just don't know where exactly to start.
Ahhh, that's the game Boggle isn't it... You don't want permutations, you want a graph and you want to find words in the graph.
Well, I would start by arranging the characters as graph nodes, and join them to their immediate and diagonal neighbours.
Now you just want to search the graph. For each of the 16 starting nodes, you're going to do a recursion. As you move to a new node, you must flag it as being used so that you can't move to it again. When you leave a node (having completely searched it) you unflag it.
I hope you see where this is going...
For each node, you will visit each of its neighbours and add that character to a string. If you have built your dictionary with this search in mind, you will immediately be able to see whether the characters you have so far are the beginning of a word. This narrows the search nicely.
The kind of dictionary I'm talking about is where you have a tree whose nodes have one child for each letter of the alphabet. The beauty of these is that you only need to store which tree node you're currently up to in the search. If you decide you've found a word, you just backtrack via the parent nodes to work out which word it is.
Using this tree style along with a depth-first graph search, you can search ALL possible word lengths at the same time. That's about the most efficient way I can think of.
Let me just write a pseudocodish function for your graph search:
function FindWords( graphNode, dictNode, wordsList )
    # can't use a letter twice
    if graphNode.used then return

    # don't continue if the letter is not part of any word
    if not dictNode.hasChild(graphNode.letter) then return
    nextDictNode = dictNode.getChild(graphNode.letter)

    # if this dictionary node is flagged as a word, add it to our list
    if nextDictNode.isWord() then
        wordsList.addWord( nextDictNode.getWord() )
    end

    # Now do a recursion on all our neighbours
    graphNode.used = true
    foreach nextGraphNode in graphNode.neighbours do
        FindWords( nextGraphNode, nextDictNode, wordsList )
    end
    graphNode.used = false
end
And of course, to kick the whole thing off:
foreach graphNode in graph do
    FindWords( graphNode, dictionary, wordsList )
end
All that remains is to build the graph and the dictionary. And I just remembered what that dictionary data structure is called! It's a Trie. If you need more space-efficient storage, you can compress into a Radix Tree or similar, but by far the easiest (and fastest) is to just use a straight Trie.
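A rough JavaScript sketch of such a trie (illustrative only; the hasChild/getChild/isWord/getWord interface mirrors the pseudocode above):

function TrieNode() {
    this.children = {};   // one child per letter
    this.word = null;     // set to the full word at nodes that end a word
}
TrieNode.prototype.hasChild = function (ch) { return ch in this.children; };
TrieNode.prototype.getChild = function (ch) { return this.children[ch]; };
TrieNode.prototype.isWord = function () { return this.word !== null; };
TrieNode.prototype.getWord = function () { return this.word; };

function buildTrie(words) {
    var root = new TrieNode();
    words.forEach(function (w) {
        var node = root;
        for (var i = 0; i < w.length; i++) {
            var ch = w[i];
            if (!node.hasChild(ch)) node.children[ch] = new TrieNode();
            node = node.getChild(ch);
        }
        node.word = w;   // flag this node as a complete word
    });
    return root;
}

// var dictionary = buildTrie(['TEN', 'BALD', 'BLUE', 'GUYS']);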
As you did not define a preferred language, I implemented it in C#:
private static readonly int[] dx = new int[] { 1, 1, 1, 0, 0, -1, -1, -1 };
private static readonly int[] dy = new int[] { -1, 0, 1, 1, -1, -1, 0, 1 };
private static List<string> words;

private static List<string> GetAllWords(char[,] matrix, int d)
{
    words = new List<string>();
    bool[,] visited = new bool[4, 4];
    char[] result = new char[d];
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            Go(matrix, result, visited, d, i, j);
    return words;
}

private static void Go(char[,] matrix, char[] result, bool[,] visited, int d, int x, int y)
{
    if (x < 0 || x >= 4 || y < 0 || y >= 4 || visited[x, y])
        return;
    if (d == 0)
    {
        words.Add(new String(result));
        return;
    }
    visited[x, y] = true;
    result[d - 1] = matrix[x, y];
    for (int i = 0; i < 8; i++)
    {
        Go(matrix, result, visited, d - 1, x + dx[i], y + dy[i]);
    }
    visited[x, y] = false;
}
Code to get results:
char[,] matrix = new char[,] { { 'A', 'B', 'C', 'D' }, { 'U', 'A', 'L', 'E' }, { 'T', 'S', 'U', 'G' }, { 'N', 'E', 'Y', 'I' } };
List<string> list = GetAllWords(matrix, 3);
Change the parameter 3 to the required text length.
It seems you just use the 4x4 matrix as an array of length 16. If that is the case, you can try the recursive approach to generate permutations up to length k as follows:
findPermutations(chars, i, highLim, downLim, candidate):
    if (i > downLim):
        print candidate
    if (i == highLim): //stop clause
        return
    for j in range(i, length(chars)):
        swap(chars, i, j) // put chars[j] in position i, making it unavailable for re-picking
        candidate.append(chars[i])
        findPermutations(chars, i+1, highLim, downLim, candidate)
        //clean up environment after recursive call:
        candidate.removeLast()
        swap(chars, i, j)
The idea is to print each "candidate" that has more chars than downLim (3 in your case), and to terminate when you reach the upper limit (highLim) - 10 in your case.
Each time, you "guess" which character to put next - you append it to the candidate and recursively invoke the function to extend it.
Repeat the process for all possible guesses.
Note that there are choose(16,10)*10! + choose(16,9)*9! + ... + choose(16,3)*3! different such permutations, so it might be time consuming...
If you want meaningful words, you are going to need some kind of dictionary (or to statistically extract one from some context) in order to match the candidates with the "real words".

Bentley-Ottmann Algorithm in Lua

I'm implementing the Bentley-Ottmann Algorithm in Lua for finding intersecting points in a polygon using the pseudo code located here.
I'm relatively new to implementing algorithms so I couldn't understand all parts of it. Here's my code so far:
local function getPolygonIntersectingVertices( poly )
    -- initializing and sorting X
    local X = {}
    for i = 1, table.getn( poly ) do
        if i == 1 then
            table.insert( X, { x = poly[i].x, y = poly[i].y, endpoint = 'left' } )
        elseif i == table.getn( poly ) then
            table.insert( X, { x = poly[i].x, y = poly[i].y, endpoint = 'right' } )
        else
            table.insert( X, { x = poly[i].x, y = poly[i].y, endpoint = 'right' } )
            table.insert( X, { x = poly[i].x, y = poly[i].y, endpoint = 'left' } )
        end
    end
    local sortxy = function( a, b )
        if a.x < b.x then return true
        elseif a.x > b.x then return false
        elseif a.y < b.y then return true -- strict comparison: table.sort needs a strict order
        else return false end
    end
    table.sort( X, sortxy )
    -- Main loop
    local SL = {}
    local L = {}
    local E
    local i = 1
    while next(X) ~= nil do
        E = { x = X[i].x, y = X[i].y, endpoint = X[i].endpoint }
        if E.endpoint == 'left' then
            -- left endpoint code here
        elseif E.endpoint == 'right' then
            -- right endpoint code here
        else
            -- intersection point code here
        end
        table.remove( X, i )
    end
    return L
end
My polygon is a table using this structure: { { x = 1, y = 3 }, { x = 5, y = 6 }, ... }
How do I determine "the segment above segE in SL;" and "the segment below segE in SL;", and what should I do if the sweep line (SL) is empty? Also, when inserting I into X, should I mark it with endpoint = 'intersect' and append it to the end, so that when the loop comes to this part it goes into the "else" branch of the main loop - or have I got the whole algorithm wrong?
It would be perfect if someone can show me a link to a simple implementation in Python, Ruby, etc., as I find it hard to follow the pseudo code and match it with the C++ example.
Your reference link fails from my location. I will reference the Wikipedia article, which is reasonably good.
How do I determine "the segment above segE in SL;" and "the segment below segE in SL;"
The algorithm requires a BST for current scan line intersections sorted on a key of y, i.e. in order vertically. So the segment above is the BST successor and the one below is the BST predecessor. Finding the predecessor and successor of a given node in a BST is standard stuff. The predecessor of key K is the rightmost node left of K. The successor is the leftmost node right of K. There are several ways of computing these. The simplest is to use parent pointers to walk back up and then down the tree from K. A stack-based iterator is another.
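A rough sketch of the successor walk (illustrative JavaScript, assuming nodes carry left/right/parent pointers):

// Successor = "the segment above": leftmost node of the right subtree,
// else the first ancestor we reach coming up from a left child.
function successor(node) {
    if (node.right) {
        var n = node.right;
        while (n.left) n = n.left;
        return n;
    }
    var cur = node, p = node.parent;
    while (p && cur === p.right) { cur = p; p = p.parent; }
    return p;   // null means there is no segment above
}
// predecessor ("the segment below") is the mirror image: rightmost node of
// the left subtree, else the first ancestor reached from a right child.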
what to do if the sweep line (SL) is empty?
Keep processing the event queue. An empty sweep line just means no segments are crossing at its current x location.
Also when inserting I into X, should I mark it with endpoint = 'intersect' and append it to the end ...?
The event queue must remain sorted on the x-coordinate of points. When you insert an intersection it must be in x-coordinate order, too. It must be marked as an intersection because intersections are processed differently from endpoints. It will be processed in due course when it's the first remaining item in x order.
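A rough sketch of such a sorted insertion (illustrative JavaScript; the x-then-y ordering mirrors sortxy in the Lua code above):

// Binary search for the insertion point, then splice the event in.
function insertEvent(X, ev) {
    var lo = 0, hi = X.length;
    while (lo < hi) {
        var mid = (lo + hi) >> 1;
        if (X[mid].x < ev.x || (X[mid].x === ev.x && X[mid].y <= ev.y)) lo = mid + 1;
        else hi = mid;
    }
    X.splice(lo, 0, ev);   // ev would carry endpoint = 'intersect'
}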
Note that Bentley-Ottmann - just as nearly all geometric algorithms - is notoriously subject to horrendous failures due to floating point inaccuracy. Also, the algorithm is normally given with a "general position" assumption, which excludes all the nasty cases of vertical edges, point-edge coincidence, edge-edge overlaps, etc. My strongest recommendation is to use rational arithmetic. Even then, getting a fully robust, correct implementation is a significant achievement. You can tell this by the very small number of free implementations!

determine if intersection of a set with conjunction of two other sets is empty

For any three given sets A, B and C: is there a way to determine (programmatically) whether there is an element of A that is part of the conjunction (edit: intersection) of B and C?
example:
A: all numbers greater than 3
B: all numbers lesser than 7
C: all numbers that equal 5
In this case there is an element in set A, being the number 5, that fits. I'm implementing this as specifications, so this numerical range is just an example. A, B, C could be anything.
EDIT:
Thanks Niki!
It will be helpful if B.Count <= C.Count <= A.Count.
D = GetCommonElements(B,C);
if( D.Count > 0 && GetCommonElements(D,A).Count > 0 )
{
    // what you want IS NOT EMPTY
}
else
{
    // what you want IS EMPTY
}

SET GetCommonElements(X,Y)
{
    common = {}
    for x in X:
        if Y.Contains(x):
            common.Add(x);
    return common;
}
Look at Efficient Set Intersection Algorithm.
We can use the distributive laws of sets:

if( HasCommonElements(A,B) || HasCommonElements(A,C) )
{
    // what you want IS NOT EMPTY
}
else
{
    // what you want IS EMPTY
}

bool HasCommonElements(X,Y)
{
    // return true immediately as soon as one common element is found
    for x in X:
        if Y.Contains(x):
            return true;
    return false;
}
If I'm understanding your question correctly, you want to programmatically compute the intersection of 3 sets, right? You want to see if there is an element in A that exists in the intersection of B and C, or in other words, you want to know if the intersection of A, B and C is non-empty.
Many languages have set containers and intersection algorithms so you should just be able to use those. Your example in OCaml:
module Int = struct
  type t = int
  let compare i j = if i<j then -1 else if i=j then 0 else 1
end;;
module IntSet = Set.Make(Int);;

let a = List.fold_left (fun a b -> IntSet.add b a) IntSet.empty [4;5;6;7;8;9;10];;
let b = List.fold_left (fun a b -> IntSet.add b a) IntSet.empty [0;1;2;3;4;5;6];;
let c = IntSet.add 5 IntSet.empty;;

let aIbIc = IntSet.inter (IntSet.inter b c) a;;
IntSet.is_empty aIbIc;;
This outputs false, as the intersection of a b and c is non-empty (contains 5). This of course relies on the fact that the elements of the set are comparable (in the example, the function compare defines this property in the Int module).
Alternatively in C++:
#include <iostream>
#include <set>
#include <algorithm>
#include <iterator>

int main()
{
    std::set<int> A, B, C;
    for(int i=10; i>3; --i)
        A.insert(i);
    for(int i=0; i<7; ++i)
        B.insert(i);
    C.insert(5);

    std::set<int> ABC, BC;
    std::set_intersection(B.begin(), B.end(), C.begin(), C.end(), std::inserter(BC, BC.begin()));
    std::set_intersection(BC.begin(), BC.end(), A.begin(), A.end(), std::inserter(ABC, ABC.begin()));

    for(std::set<int>::iterator i = ABC.begin(); i != ABC.end(); ++i)
    {
        std::cout << *i << " ";
    }
    std::cout << std::endl;
    return 0;
}
The question needs further clarification.
First, do you want to work with symbolic sets given by a range?
And secondly, is it a one-time question or is it going to be repeated in some form (if yes, what are the stable parts of the question?)?
If you want to work with ranges, then you could represent these with binary trees and define union and intersection operations on these structures. Building the tree would require O(n log n) and finding the result would require O(log n). This would not pay off with only three sets, but it would be flexible enough to efficiently support any combination of ranges (if that is what you meant by 'it can be anything').
On the other hand, if anything means any set of elements, then the only option is to enumerate elements. In this case building B+ trees on sets B and C will also require O(n log n) time, but here n is the number of elements, whereas in the first case n is the number of ranges. The latter might be several orders of magnitude bigger, and of course it can only represent a finite number of elements.
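To make the range case concrete, a rough sketch (illustrative JavaScript; the three example sets are approximated as closed integer intervals):

// A range intersection is the max of the lows and the min of the highs.
function intersect(r1, r2) {
    var lo = Math.max(r1.lo, r2.lo);
    var hi = Math.min(r1.hi, r2.hi);
    return lo <= hi ? { lo: lo, hi: hi } : null;   // null means the empty set
}
var A = { lo: 4, hi: Infinity };    // all integers greater than 3
var B = { lo: -Infinity, hi: 6 };   // all integers lesser than 7
var C = { lo: 5, hi: 5 };           // all numbers that equal 5
var BC = intersect(B, C);
console.log(BC && intersect(A, BC));   // { lo: 5, hi: 5 } -> non-empty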

Fewest number of turns heuristic

Is there any way to ensure that the fewest number of turns heuristic is met by anything except a breadth first search? Perhaps some more explanation would help.
I have a random graph, much like this:
0 1 1 1 2
3 4 5 6 7
9 a 5 b c
9 d e f f
9 9 g h i
Starting in the top left corner, I need to know the fewest number of steps it would take to get to the bottom right corner. Each set of connected colors is assumed to be a single node, so for instance in this random graph, the three 1's on the top row are all considered a single node, and every adjacent (not diagonal) connected node is a possible next state. So from the start, possible next states are the 1's in the top row or 3 in the second row.
Currently I use a bidirectional search, but the explosiveness of the tree size ramps up pretty quickly. For the life of me, I haven't been able to adjust the problem so that I can safely assign weights to the nodes and have them ensure the fewest number of state changes to reach the goal without it turning into a breadth first search. Thinking of this as a city map, the heuristic would be the fewest number of turns to reach the goal.
It is very important that the fewest number of turns is the result of this search as that value is part of the heuristic for a more complex problem.
You said yourself each group of numbers represents one node, and each node is connected to adjacent nodes. Then this is a simple shortest-path problem, and you could use (for instance) Dijkstra's algorithm, with each edge having weight 1 (for 1 turn).
This sounds like Dijkstra's algorithm. The hardest part would lay in properly setting up the graph (keeping track of which node gets which children), but if you can devote some CPU cycles to that, you'd be fine afterwards.
Why don't you want a breadth-first search?
Here.. I was bored :-) This is in Ruby but may get you started. Mind you, it is not tested.
class Node
  attr_accessor :parents, :children, :value
  def initialize args={}
    @parents = args[:parents] || []
    @children = args[:children] || []
    @value = args[:value]
  end
  def add_parents *args
    args.flatten.each do |node|
      @parents << node
      node.add_children self unless node.children.include? self
    end
  end
  def add_children *args
    args.flatten.each do |node|
      @children << node
      node.add_parents self unless node.parents.include? self
    end
  end
end

class Graph
  attr_accessor :graph, :root
  def initialize args={}
    @graph = args[:graph]
    @root = Node.new
    prepare_graph
    @root = @graph[0][0]
  end

  private

  def prepare_graph
    # We will iterate through the graph, and only check the values above and to the
    # left of the current cell.
    @graph.each_with_index do |row, i|
      row.each_with_index do |cell, j|
        cell = Node.new :value => cell
        # Check above
        unless i.zero?
          above = @graph[i-1][j]
          if above.value == cell.value
            # Here it is safe to do this: the new node has no children, no parents.
            cell = above
          else
            cell.add_parents above
            above.add_children cell # Redundant given the code for both of those
                                    # methods, but implementations may differ.
          end
        end
        # Check to the left!
        unless j.zero?
          left = @graph[i][j-1]
          if left.value == cell.value
            # Well, potentially it's the same as the one above the current cell,
            # so we can't just set one equal to the other: have to merge them.
            left.add_parents cell.parents
            left.add_children cell.children
            cell = left
          else
            cell.add_parents left
            left.add_children cell
          end
        end
        row[j] = cell # write the node back in place; without this, later cells
                      # would still see raw numbers instead of Nodes
      end
    end
  end
end

#        j = 0  1  2  3  4
graph = [
    [3, 4, 4, 4, 2], # i = 0
    [8, 3, 1, 0, 8], # i = 1
    [9, 0, 1, 2, 4], # i = 2
    [9, 8, 0, 3, 3], # i = 3
    [9, 9, 7, 2, 5]] # i = 4

maze = Graph.new :graph => graph
# Now, going from maze.root on, we have a weighted graph, should it matter.
# If it doesn't matter, you can just count the number of steps.
# Dijkstra's algorithm is really simple to find in the wild.
This looks like the same problem as this Project Euler problem: http://projecteuler.net/index.php?section=problems&id=81
The complexity of the solution is O(n), where n is the number of nodes.
What you need is memoization.
At each step you can arrive from at most 2 directions, so pick the solution that is cheaper.
It is something like this (just add the code that takes 0 if on the border):

for i in row:
    for j in column:
        matrix[i][j] = min([matrix[i-1][j], matrix[i][j-1]]) + matrix[i][j]

And now you have the least expensive solution if you move just left or down.
The solution is in matrix[MAX_i][MAX_j].
If you can go left and up too, then the big-O is much higher (I can't figure out an optimal solution).
In order for A* to always find the shortest path, your heuristic needs to always under-estimate the actual cost (such a heuristic is called "admissible"). Simple heuristics like the Euclidean or Manhattan distance on a grid work well because they're fast to compute and are guaranteed to be less than or equal to the actual cost.
Unfortunately, in your case, unless you can make some simplifying assumptions about the size/shape of the nodes, I'm not sure there's much you can do. For example, consider going from A to B in this case:
B 1 2 3 A
C 4 5 6 D
C 7 8 9 C
C e f g C
C C C C C
The shortest path would be A -> D -> C -> B, but using spatial information would probably give 3 a lower heuristic cost than D.
Depending on your circumstances, you might be able to live with a solution that isn't actually the shortest path, as long as you can get the answer sooner. There's a nice blog post here by Christer Ericson (programmer for God of War 3 on PS3) on the topic: http://realtimecollisiondetection.net/blog/?p=56
Here's my idea for a non-admissible heuristic: from the point, move horizontally until you're even with the goal, then move vertically until you reach it, and count the number of state changes that you made. You can compute other test paths (e.g. vertically then horizontally) too, and pick the minimum value as your final heuristic. If your nodes are roughly equal size and regularly shaped (unlike my example), this might do pretty well. The more test paths you do, the more accurate you'd get, but the slower it would be.
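A rough sketch of one such test path (illustrative JavaScript, assuming grid[y][x] holds the colour of a cell):

// Walk horizontally until even with the goal, then vertically,
// counting colour changes (= state changes) along the way.
function lPathChanges(grid, x0, y0, x1, y1) {
    var changes = 0, cur = grid[y0][x0], x = x0, y = y0;
    while (x !== x1) {
        x += (x1 > x) ? 1 : -1;
        if (grid[y][x] !== cur) { changes++; cur = grid[y][x]; }
    }
    while (y !== y1) {
        y += (y1 > y) ? 1 : -1;
        if (grid[y][x] !== cur) { changes++; cur = grid[y][x]; }
    }
    return changes;
}
// The heuristic could be the minimum of this and the analogous
// vertical-then-horizontal walk.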
Hope that's helpful, let me know if any of it doesn't make sense.
This untuned C implementation of breadth-first search can chew through a 100-by-100 grid in less than 1 msec. You can probably do better.
#include <string.h> // for memset

int shortest_path(int *grid, int w, int h) {
    int mark[w * h];      // for each square in the grid:
                          //   0 if not visited
                          //   1 if not visited and slated to be visited "now"
                          //   2 if already visited
    int todo1[4 * w * h]; // buffers for two queues, a "now" queue
    int todo2[4 * w * h]; // and a "later" queue
    int *readp;           // read position in the "now" queue
    int *writep[2] = {todo1 + 1, 0};
    int x, y, same;

    todo1[0] = 0;
    memset(mark, 0, sizeof(mark));

    for (int d = 0; ; d++) {
        readp = (d & 1) ? todo2 : todo1;      // start of "now" queue
        writep[1] = writep[0];                // end of "now" queue
        writep[0] = (d & 1) ? todo1 : todo2;  // "later" queue (empty)

        // Now consume the "now" queue, filling both the "now" queue
        // and the "later" queue as we go. Points in the "now" queue
        // have distance d from the starting square. Points in the
        // "later" queue have distance d+1.
        while (readp < writep[1]) {
            int p = *readp++;
            if (mark[p] < 2) {
                mark[p] = 2;
                x = p % w;
                y = p / w;
                if (x > 0 && !mark[p-1]) {        // go left
                    mark[p-1] = same = (grid[p-1] == grid[p]);
                    *writep[same]++ = p-1;
                }
                if (x + 1 < w && !mark[p+1]) {    // go right
                    mark[p+1] = same = (grid[p+1] == grid[p]);
                    if (y == h - 1 && x == w - 2)
                        return d + !same;
                    *writep[same]++ = p+1;
                }
                if (y > 0 && !mark[p-w]) {        // go up
                    mark[p-w] = same = (grid[p-w] == grid[p]);
                    *writep[same]++ = p-w;
                }
                if (y + 1 < h && !mark[p+w]) {    // go down
                    mark[p+w] = same = (grid[p+w] == grid[p]);
                    if (y == h - 2 && x == w - 1)
                        return d + !same;
                    *writep[same]++ = p+w;
                }
            }
        }
    }
}
This paper has a slightly faster version of Dijkstra's algorithm, which lowers the constant term. Still O(n) though, since you are really going to have to look at every node.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8746&rep=rep1&type=pdf
EDIT: THE PREVIOUS VERSION WAS WRONG AND WAS FIXED
Since Dijkstra is out, I'll recommend a simple DP, which has the benefit of running in optimal time and not requiring you to construct a graph.
D[a][b] is the minimal distance to x=a and y=b, using only nodes where x<=a and y<=b.
And since you can't move diagonally, you only have to look at D[a-1][b] and D[a][b-1] when calculating D[a][b].
This gives you the following recurrence relationship:

D[a][b] = min(if grid[a][b] == grid[a-1][b] then D[a-1][b] else D[a-1][b] + 1,
              if grid[a][b] == grid[a][b-1] then D[a][b-1] else D[a][b-1] + 1)
However doing only the above fails on this case:
0 1 2 3 4
5 6 7 8 9
A b d e g
A f r t s
A z A A A
A A A f d
Therefore you need to cache the minimum of each group of nodes you have found so far. And instead of looking at D[a][b] you look at the minimum of the group at grid[a][b].
Here's some Python code:
Note: grid is the grid that you're given as input, and it's assumed the grid is N by N.

groupmin = {}
for x in xrange(0, N):
    for y in xrange(0, N):
        groupmin[grid[x][y]] = N+1  # N+1 serves as 'infinity'

# init first row and column
groupmin[grid[0][0]] = 0
for x in xrange(1, N):
    gm = groupmin[grid[x-1][0]]
    temp = (gm) if grid[x][0] == grid[x-1][0] else (gm + 1)
    groupmin[grid[x][0]] = min(groupmin[grid[x][0]], temp)
for y in xrange(1, N):
    gm = groupmin[grid[0][y-1]]
    temp = (gm) if grid[0][y] == grid[0][y-1] else (gm + 1)
    groupmin[grid[0][y]] = min(groupmin[grid[0][y]], temp)

# do the rest of the blocks
for x in xrange(1, N):
    for y in xrange(1, N):
        gma = groupmin[grid[x-1][y]]
        gmb = groupmin[grid[x][y-1]]
        a = (gma) if grid[x][y] == grid[x-1][y] else (gma + 1)
        b = (gmb) if grid[x][y] == grid[x][y-1] else (gmb + 1)  # was 'gma + 1', a typo
        temp = min(a, b)
        groupmin[grid[x][y]] = min(groupmin[grid[x][y]], temp)

ans = groupmin[grid[N-1][N-1]]
This will run in O(N^2 * f(x)), where f(x) is the time the hash function takes, which is normally O(1) (about the best you can hope for), and it has a much lower constant factor than Dijkstra's.
You should easily be able to handle N's of up to a few thousand in a second.
Is there anyway to ensure the that the fewest number of turns heuristic is met by anything except a breadth first search?
A faster way, or a simpler way? :)
You can breadth-first search from both ends, alternating, until the two regions meet in the middle. This will be much faster if the graph has a lot of fanout, like a city map, but the worst case is the same. It really depends on the graph.
This is my implementation using a simple BFS. A Dijkstra would also work (substitute a std::priority_queue that sorts by descending costs for the std::queue) but would seriously be overkill.
The thing to notice here is that we are actually searching on a graph whose nodes do not exactly correspond to the cells in the given array. To get to that graph, I used a simple DFS-based floodfill (you could also use BFS, but DFS is slightly shorter for me). What that does is find all connected, same-character components and assign them to the same colour/node. Thus, after the floodfill we can find out what node each cell belongs to in the underlying graph by looking at the value of colour[row][col].

Then I just iterate over the cells and find all the cells where adjacent cells do not have the same colour (i.e. are in different nodes). These therefore are the edges of our graph. I maintain a std::set of edges as I iterate over the cells to eliminate duplicate edges. After that it is a simple matter of building an adjacency list from the list of edges, and we are ready for a BFS.
Code (in C++):
#include <queue>
#include <vector>
#include <iostream>
#include <string>
#include <set>
#include <cstring>
using namespace std;

#define SIZE 1001

vector<string> board;
int colour[SIZE][SIZE];
int dr[]={0,1,0,-1};
int dc[]={1,0,-1,0};

int min(int x,int y){ return (x<y)?x:y;}
int max(int x,int y){ return (x>y)?x:y;}

void dfs(int r, int c, int col, vector<string> &b){
    if (colour[r][c]<0){
        colour[r][c]=col;
        for(int i=0;i<4;i++){
            int nr=r+dr[i],nc=c+dc[i];
            if (nr>=0 && nr<b.size() && nc>=0 && nc<b[0].size() && b[nr][nc]==b[r][c])
                dfs(nr,nc,col,b);
        }
    }
}

int flood_fill(vector<string> &b){
    memset(colour,-1,sizeof(colour));
    int current_node=0;
    for(int i=0;i<b.size();i++){
        for(int j=0;j<b[0].size();j++){
            if (colour[i][j]<0){
                dfs(i,j,current_node,b);
                current_node++;
            }
        }
    }
    return current_node;
}

vector<vector<int> > build_graph(vector<string> &b){
    int total_nodes=flood_fill(b);
    set<pair<int,int> > edge_list;
    for(int r=0;r<b.size();r++){
        for(int c=0;c<b[0].size();c++){
            for(int i=0;i<4;i++){
                int nr=r+dr[i],nc=c+dc[i];
                if (nr>=0 && nr<b.size() && nc>=0 && nc<b[0].size() && colour[nr][nc]!=colour[r][c]){
                    int u=colour[r][c], v=colour[nr][nc];
                    if (u!=v) edge_list.insert(make_pair(min(u,v),max(u,v)));
                }
            }
        }
    }
    vector<vector<int> > graph(total_nodes);
    for(set<pair<int,int> >::iterator edge=edge_list.begin();edge!=edge_list.end();edge++){
        int u=edge->first,v=edge->second;
        graph[u].push_back(v);
        graph[v].push_back(u);
    }
    return graph;
}

int bfs(vector<vector<int> > &G, int start, int end){
    vector<int> cost(G.size(),-1);
    queue<int> Q;
    Q.push(start);
    cost[start]=0;
    while (!Q.empty()){
        int node=Q.front();Q.pop();
        vector<int> &adj=G[node];
        for(int i=0;i<adj.size();i++){
            if (cost[adj[i]]==-1){
                cost[adj[i]]=cost[node]+1;
                Q.push(adj[i]);
            }
        }
    }
    return cost[end];
}

int main(){
    string line;
    int rows,cols;
    cin>>rows>>cols;
    for(int r=0;r<rows;r++){
        line="";
        char ch;
        for(int c=0;c<cols;c++){
            cin>>ch;
            line+=ch;
        }
        board.push_back(line);
    }
    vector<vector<int> > actual_graph=build_graph(board);
    cout<<bfs(actual_graph,colour[0][0],colour[rows-1][cols-1])<<"\n";
}
This is just a quick hack, lots of improvements can be made. But I think it is pretty close to optimal in terms of runtime complexity, and should run fast enough for boards of size of several thousand (don't forget to change the #define of SIZE). Also, I only tested it with the one case you have provided. So, as Knuth said, "Beware of bugs in the above code; I have only proved it correct, not tried it." :).
