Satisfying triples in graph - algorithm

I'm solving a problem where you have N events (1 <= N <= 100000) over M days (2 <= M <= 10^9). You are trying to find the minimum time of occurrence for each event.
For each event, you know that it couldn't have occurred prior to a day Si. You also have C triples (1 <= C <= 10^5) described by (a, b, x). An event b must have occurred at least x days after a.
Example:
There are 4 events, spread over 10 days. Event 1 had to occur on Day 1 or after. Event 2 had to occur on Day 2 or after. Event 3 had to occur on Day 3 or after. Event 4 had to occur on Day 4 or after.
The triples are (1, 2, 5); (2, 4, 2); (3, 4, 4). This means that Event 2 had to occur at least 5 days after Event 1; Event 4 had to occur at least 2 days after Event 2; and Event 4 had to occur at least 4 days after Event 3.
The solution is that Event 1 occurred on Day 1; Event 2 occurred on Day 6; Event 3 occurred on Day 3; and Event 4 occurred on Day 8. The reasoning behind this is that Event 2 occurred at least five days after Event 1, so it cannot have occurred before Day 1+5=6. Event 4 occurred at least two days after Event 2, so it cannot have occurred before Day 6+2=8 (which also satisfies being at least four days after Event 3, Day 3+4=7).
My solution:
I had the idea to use the triples to create a Directed graph. So in the example above, the graph would look like this:
1 --5-> 2 --2-> 4
3 --4-> 4
Basically you create a directed edge from the event that happened first to the event that had to happen after it. The edge weight is the minimum number of days by which the later event must follow the earlier one.
I thought that we would first use the input data to create the graph. Then, you would binary search on all possible starting dates of the first event (1 through 10^9, which is about 30 iterations). In this case, the first event is Event 1. Then, you would go through the graph and see if this starting date was possible. If you ever encountered an event whose date was before its Si date, you would terminate this search and continue binary searching. This solution would have worked easily if it weren't for the "event b must have occurred AT LEAST x days after a".
Does anyone have any other solutions for solving this problem, or how to alter mine so that it works? Thank you! If you have any questions please let me know :))

This can be mapped to a Simple Temporal Network, for which the literature is rich, e.g.:
Dechter, Rina, Itay Meiri, and Judea Pearl. "Temporal constraint networks." Artificial Intelligence 49.1-3 (1991): 61-95.
Planken, Léon Robert. "Algorithms for Simple Temporal Reasoning." (2013). Full dissertation.
As indicated in the comments, all-pairs shortest paths can calculate the minimal network (which also generates new arcs/constraints between all these events). If your graph is sparse, Johnson's algorithm is better than Floyd-Warshall.
If you don't care about the complete minimal network, but only about the bounds of your events, you are only interested in the first column and the first row of the all-pairs shortest-paths distance matrix. You can calculate these values by applying Bellman-Ford twice. These values are the distances root -> i and i -> root, where root is a vertex representing time 0.
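As a rough sketch of the earliest-bound computation (my own minimal Python rendering, not taken from the references; it reuses the example data from the question), the earliest times are longest paths from such a root:

def earliest_times(n, S, triples):
    # root (vertex 0) anchors time 0; an edge root -> i with weight S[i] encodes
    # "event i not before day S_i"; (a, b, x) encodes "b at least x days after a"
    edges = [(0, i + 1, s) for i, s in enumerate(S)]
    edges += list(triples)
    dist = [float("-inf")] * (n + 1)
    dist[0] = 0
    for _ in range(n + 1):          # Bellman-Ford style relaxation
        changed = False
        for a, b, x in edges:
            if dist[a] + x > dist[b]:
                dist[b] = dist[a] + x
                changed = True
        if not changed:
            return dist[1:]
    raise ValueError("positive cycle: the constraints are inconsistent")

print(earliest_times(4, [1, 2, 3, 4], [(1, 2, 5), (2, 4, 2), (3, 4, 4)]))  # [1, 6, 3, 8]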
Just some remarks about things which Damien indicated (reasoning from scratch, it seems: impressive):
we use negative weights in the general problem, so pure Dijkstra won't do
existence of a negative cycle <-> infeasibility / no solution / inconsistency
there will be a need for some root vertex which is the origin of time
Edit: The above somewhat targets strong inference / propagation, like giving tight bounds on the events' value-domains.
If you are only interested in some consistent solution, another idea is to post these constraints as a linear program and use one of the highly optimized implementations to solve it (open-source world: CoinOR Clp; maybe Google's GLOP). Simplex-based solvers should give you an integral solution (I think the constraint matrix is totally unimodular). Interior-point solvers should be faster, but I'm not sure your result will be integral without an additional crossover step. (It might be a good idea to add some dummy objective like min(max(x)), makespan-like.)
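To illustrate the LP route, a hedged sketch using scipy.optimize.linprog (a solver choice of my own; the answer mentions CLP/GLOP instead, and the sum objective stands in for the dummy objective mentioned above):

from scipy.optimize import linprog

S = [1, 2, 3, 4]                             # earliest day S_i per event
triples = [(1, 2, 5), (2, 4, 2), (3, 4, 4)]  # t_b - t_a >= x

n = len(S)
# linprog minimizes c @ t subject to A_ub @ t <= b_ub and the bounds;
# rewrite t_b - t_a >= x as t_a - t_b <= -x
A_ub, b_ub = [], []
for a, b, x in triples:
    row = [0] * n
    row[a - 1], row[b - 1] = 1, -1
    A_ub.append(row)
    b_ub.append(-x)

c = [1] * n                          # dummy objective: minimize the sum of times
bounds = [(s, None) for s in S]      # t_i >= S_i
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)                         # expected: [1. 6. 3. 8.]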

Consider a topological sort of your DAG.
For a list L corresponding to the toposort of your graph, you have at the end the leaves.
Then for a vertex just before
L = [..., v, leaves]
you know that the edges outgoing from v can only go to the vertices after it (here the leaves).
This allows you to compute the minimal weight associated to v by applying Damien's max.
Do so up to the head of L.
Topological sorting is O(V+E)
Here is an illustration with a more interesting graph (read it from top to bottom):
      5
     / \
    4   7
       / \
      1   2
      |
      0
      |
      6
A topological ordering is (4 6 0 1 2 7 5).
So we will visit in order 4, 6, 0, 1, 2, 7, then 5, and any vertex we visit has all its dependencies already computed.
Assume each vertex k has its event occurring no earlier than 2^k days. This "not before" date is referred to as the weight.
e.g. vertex 4 is weighted 2^4
Assume each edge (i,j) is weighted 5*i + j
6 is weighted 2^6 = 64
0 is weighted max(2^0, 64 + (0*5+6)) = 70
1 takes max(2^1, 70 + (5*1+0)) = 75
7 takes max(2^7, 75 + (5*7+1), 2^2 + (5*7+2)) = 2^7
The point to be highlighted (here for 7) is that the minimal date induced by the dependencies of a node may be earlier than the date attached to the node itself (and we have to keep the biggest one).
function topologicalSort({ V, E }) {
  const visited = new Set()
  const stack = []
  function dfs (v) {
    if (visited.has(v)) { return }
    E.has(v) && E.get(v).forEach(({ to, w }) => dfs(to))
    visited.add(v)
    stack.push(v)
  }
  // process nodes without incoming edges first
  const heads = new Set([...V])
  for (const v of V) {
    const edges = E.get(v)
    edges && edges.forEach(({ to }) => heads.delete(to))
  }
  for (const v of heads) {
    dfs(v)
  }
  for (const v of V) {
    dfs(v)
  }
  return stack
}
class G {
  constructor () {
    this.V = new Set()
    this.E = new Map()
  }
  setEdges (from, tos) {
    this.V.add(from)
    tos.forEach(({ to, w }) => this.V.add(to))
    this.E.set(from, tos)
  }
}
function solve ({ g, vToWeight }) {
  const stack = topologicalSort(g)
  console.log('ordering', stack.join(''))
  stack.forEach(v => {
    const edges = g.E.get(v)
    if (!edges) { return }
    const newval = Math.max(
      vToWeight.get(v),
      ...edges.map(({ to, w }) => vToWeight.get(to) + w)
    )
    console.log('setting best for', v, edges.map(({ to, w }) => [vToWeight.get(to), w].join('+')))
    vToWeight.set(v, newval)
  })
  return vToWeight
}
function demo () {
  const g = new G()
  g.setEdges(2, [{ to: 1, w: 5 }])
  g.setEdges(4, [{ to: 2, w: 2 }, { to: 3, w: 4 }])
  // initial weights are the S_i values from the question's example
  const vToWeight = new Map([
    [1, 1],
    [2, 2],
    [3, 3],
    [4, 4]
  ])
  return { g, vToWeight }
}
function demo2 () {
  const g = new G()
  const addEdges = (i, ...tos) => {
    g.setEdges(i, tos.map(to => ({ to, w: 5 * i + to })))
  }
  addEdges(5, 4, 7)
  addEdges(7, 1, 2)
  addEdges(1, 0)
  addEdges(0, 6)
  const vToWeight = new Map([...g.V].map(v => [v, 2 ** v]))
  return { g, vToWeight }
}
function dump (map) {
  return [...map].map(([k, v]) => k + '->' + v)
}
console.log('----op\'s sol----\n', dump(solve(demo())))
console.log('----that case---\n', dump(solve(demo2())))

The distance matrix (between all pairs of events = nodes) can be obtained in an iterative way, similar to the Floyd-Warshall algorithm. Basically, iteratively:
T(x, y) = max(T(x, y), T(x, z) + T(z, y))
However, as mentioned by the OP in a comment, the Floyd-Warshall algorithm is O(n^3), which is too much for a value of n up to 10^5.
A key point is that no loop exists, and therefore a more efficient algorithm should exist.
A nice proposal was made by grodzi: use a topological sort of the Directed Acyclic Graph (DAG).
I made an implementation in C++ according to this idea, with one main difference:
I built the topological order with Kahn's algorithm (repeatedly taking an event whose predecessors have all been processed), which is simple and runs in O(N + C). (A plain std::sort with an "is a direct predecessor" comparator does not work here: that relation is not a strict weak ordering, so std::sort would not be guaranteed to produce a valid topological order.)
After the topological sorting, we know that a given event only depends on the events before it. For this part, this ensures a complexity of O(C), where C is the number of triples, i.e. the number of edges.
#include <iostream>
#include <vector>
#include <queue>
#include <algorithm>
#include <tuple>
struct Triple {
    int event1;
    int event2;
    int days;
};

struct Pred {
    int pred;
    int days;
};

void print_result (const std::vector<int> &index, const std::vector<int> &times) {
    int n = times.size();
    for (int i = 0; i < n; i++) {
        std::cout << index[i]+1 << " " << times[index[i]] << "\n";
    }
}
std::tuple<std::vector<int>, std::vector<int>> ordering (int n, const std::vector<Triple> &triples) {
    std::vector<int> index;
    std::vector<int> times(n, 0);
    index.reserve(n);
    // Build predecessor lists, successor lists and predecessor counts
    std::vector<std::vector<Pred>> pred (n);
    std::vector<std::vector<int>> succ (n);
    std::vector<int> n_pred (n, 0);
    for (auto &triple: triples) {
        pred[triple.event2 - 1].emplace_back(Pred{triple.event1 - 1, triple.days});
        succ[triple.event1 - 1].push_back(triple.event2 - 1);
        n_pred[triple.event2 - 1]++;
    }
    // Topological sort (Kahn's algorithm), O(n + C)
    std::queue<int> ready;
    for (int i = 0; i < n; ++i) {
        if (n_pred[i] == 0) ready.push(i);
    }
    while (!ready.empty()) {
        int v = ready.front();
        ready.pop();
        index.push_back(v);
        for (int w: succ[v]) {
            if (--n_pred[w] == 0) ready.push(w);
        }
    }
    // Iterative calculation of times of arrival
    for (int i = 0; i < n; ++i) {
        int ip = index[i];
        for (auto &p: pred[ip]) {
            times[ip] = std::max(times[ip], times[p.pred] + p.days);
        }
    }
    // Final sort, according to times of arrival
    std::sort (index.begin(), index.end(), [&times] (int i, int j) {return times[i] < times[j];});
    return {index, times};
}
int main() {
    int n_events = 4;
    std::vector<Triple> triples = {
        {1, 2, 5},
        {1, 3, 1},
        {3, 2, 6},
        {3, 4, 1}
    };
    std::vector<int> index(n_events);
    std::vector<int> times(n_events);
    std::tie (index, times) = ordering (n_events, triples);
    print_result (index, times);
}
Result:
1 0
3 1
4 2
2 7

Related

Minimum jumps to reach end when some points are blocked

A person is currently at (0,0) and wants to reach (X,0), and he has to jump a few steps to reach his house. From a point, say (a,0), he can jump either to (a+k1,0), i.e. a forward jump of k1 steps, or to (a-k2,0), i.e. a backward jump of k2 steps. The first jump he takes must be forward. Also, he cannot jump backward twice consecutively, but he can make any number of consecutive forward jumps. There are n points a1, a2, up to an where he cannot jump.
I have to determine the minimum number of jumps to reach his house, or conclude that he cannot reach it. If he can reach the house, print yes and the number of jumps; if not, print no.
Here
X = location of the person's house.
N = no. of points where he cannot jump.
k1 = forward jump.
k2 = backward jump.
example
For inputs
X=6 N=2 k1=4 k2=2
Blocked points = 3 5
the answer is 3 (0 to 4 to 8 to 6, or 0 to 4 to 2 to 6)
For input
6 2 5 2
1 3
the person cannot reach his house
N can be upto 10^4 and X can be upto 10^5
I thought of using dynamic programming but I'm not able to implement it. Can anyone help?
I think your direction of using dynamic programming can work but I will show another way to solve the question with the same asymptotic time complexity as dynamic programming would achieve.
This question can be described as a problem on graphs, where you have X nodes indexed 1 to X, and there is an edge between every a and a + k1 and between every b and b - k2, with the nodes in N removed.
This would be enough if you could jump backward as many times as you liked, but you cannot jump backward twice in a row, so you can add the following modification: duplicate the graph's nodes, duplicate also the forward-going edges but make them go from the duplicate to the original, and make all of the backward-going edges go to the duplicated graph. Now every backward edge sends you to the duplicate, and you will not be able to take a backward edge again until you return to the original via a forward-going edge. This makes sure that after a backward edge you always take a forward edge, so you can never jump backward twice in a row.
Now finding the shortest path from 1 to X gives the smallest number of jumps, since every edge is a jump.
Finding the shortest path in a directed unweighted graph takes O(|V|+|E|) time and memory (using BFS); your graph has 2 * X as |V|, and the number of edges is at most 2 * 2 * X, so the time and memory complexity is O(X).
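A minimal sketch of that layered BFS in Python (my own illustration; the X + k2 pruning bound is an assumption about how far a useful overshoot can go):

from collections import deque

def min_jumps(X, k1, k2, blocked):
    # state = (position, last jump was backward); the duplicated layer is
    # encoded in the boolean, so a backward jump never follows a backward jump
    start = (0, False)
    dist = {start: 0}
    q = deque([start])
    while q:
        pos, back = q.popleft()
        if pos == X:
            return dist[(pos, back)]
        moves = [(pos + k1, False)]
        if not back:                       # backward only after a forward jump
            moves.append((pos - k2, True))
        for nxt in moves:
            npos = nxt[0]
            if 0 <= npos <= X + k2 and npos not in blocked and nxt not in dist:
                dist[nxt] = dist[(pos, back)] + 1
                q.append(nxt)
    return None                            # house unreachable

print(min_jumps(6, 4, 2, {3, 5}))  # 3
print(min_jumps(6, 5, 2, {1, 3}))  # None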
If you could jump backward any number of times in a row, you could use the networkx library in Python for a simple demo (you can also use it for a complicated demo):
import matplotlib.pyplot as plt
import networkx as nx

X = 6
N = 2
k1 = 4
k2 = 2
nodes = [0, 1, 2, 4, 6]

G = nx.DiGraph()
G.add_nodes_from(nodes)
for n in nodes:
    if n + k1 in nodes:
        G.add_edge(n, n + k1)
    if n - k2 in nodes:
        G.add_edge(n, n - k2)

nx.draw(G, with_labels=True, font_weight='bold')
plt.plot()
plt.show()

path = nx.shortest_path(G, 0, X)
print(f"Number of jumps: {len(path) - 1}. path: {str(path)}")
Would a breadth-first search be efficient enough?
Something like this? (Python code)
from collections import deque

def f(x, k1, k2, blocked):
    # state: (position, depth, last jump was backward); the first jump is forward
    queue = deque([(k1, 1, False)])
    visited = set()
    while queue:
        (p, depth, backward) = queue.popleft()
        if p in blocked or (p, backward) in visited or p < 0 or p > x + k2:  # not sure about these boundaries ... ideas welcome
            continue
        if p == x:
            return depth
        visited.add((p, backward))
        queue.append((p + k1, depth + 1, False))
        if not backward:  # no two consecutive backward jumps
            queue.append((p - k2, depth + 1, True))

X = 6
k1 = 4
k2 = 2
blocked = set([3, 5])
print(f(X, k1, k2, blocked))

X = 2
k1 = 3
k2 = 4
blocked = set()
print(f(X, k1, k2, blocked))
Here is the code of גלעדברקן in C++:
#include <iostream>
#include <queue>
using namespace std;

struct node {
    int id;
    int depth;
    int direction; // 1 is backward, 0 is forward
};

int BFS(int start, int end, int k1, int k2, bool blocked[], int length)
{
    queue<node> q;
    blocked[0] = true;
    q.push({start, 0, 0});
    while (!q.empty())
    {
        node f = q.front();
        q.pop();
        if (f.id == end) {
            return f.depth;
        }
        if (f.id + k1 < length and !blocked[f.id + k1])
        {
            blocked[f.id + k1] = true;
            q.push({f.id + k1, f.depth + 1, 0});
        }
        if (f.direction != 1) { // if the last jump was backward - don't jump backward again
            if (f.id - k2 >= 0 and !blocked[f.id - k2])
            {
                blocked[f.id - k2] = true;
                q.push({f.id - k2, f.depth + 1, 1});
            }
        }
    }
    return -1;
}

int main() {
    bool blocked[] = {false, false, false, true, false, true, false}; // points 3 and 5 are blocked
    std::cout << BFS(0, 6, 4, 2, blocked, 7) << std::endl;
    return 0;
}
You can control the lengths of the steps, the start and end, and the blocked nodes.

Dividing K resources fairly to N people

There are K points on a circle that represent the locations of the treasures. N people want to share the treasures. You want to divide the treasures fairly among all of them, such that the difference between the person having the maximum value and the person having the minimum value is as small as possible.
They all take contiguous set of points on the circle. That is, they
cannot own segmented treasures.
All the treasures must be allocated
Each treasure should only belong to only one person.
For example, if there are 4 treasures with values 6, 10, 11, 3 arranged around the circle and 2 people, then the optimal way of dividing would be
(6, 10) and (11, 3) => with a difference of 2.
1 <= n <= 25
1 <= k <= 50
How do I approach solving this problem? I planned to calculate the mean of all the points and keep adding resources to a person while their total is less than the mean. But, as is obvious, this will not work in all cases.
I'd be glad if someone throws some light.
So say we fix x, y as the minimum and maximum sums allowed per person.
I need to figure out if we can get a solution within these constraints.
For that I need to traverse the circle and create exactly N segments with sums between x and y.
This I can solve via dynamic programming: a[i][j][l] = 1 if I can split the elements between i and j into l segments whose sums are between x and y (see above). To compute it we can evaluate a[i][j][l] = is_there_a_q_such_that(a[i][q-1][l-1] == 1 and sum(q -> j) between x and y).
To handle the circle, look for N-1 segments that cover enough elements so that the remaining difference stays between x and y.
So the naive solution is O(total_sum^2) to select x and y, plus O(K^3) to iterate over i, j, l, another O(K) to find a q, and another O(K) to get the sum. That's a total of O(total_sum^2 * K^5), which is likely too slow.
So we need to compute sums a lot. Let's precompute a partial-sums array sums[w] = sum(elements between pos 0 and pos w). To get the sum between q and j you only need to compute sums[j] - sums[q-1]. This takes care of one factor of O(K).
To compute a[i][j][l]:
Since the treasures are always positive, if a partial sum is too small we need to grow the interval, and if the sum is too high we need to shrink the interval. Since we fixed one side of the interval (at j), we can only move q. We can use binary search to find the closest and the furthest q that allow us to be between x and y. Let's call them low_q (the closest to j, lowest sum) and high_q (far from j, largest sum). If low_q < i then the interval is too small, so the value is 0. So now we need to check if there's a 1 between max(high_q, i) and low_q. The max is to make sure we don't look outside of the interval. To do the check we can again precompute partial sums, to count how many 1s are in our interval. We only need to do this once per level, so it will be amortized O(1). So, if we did everything right, this will be O(K^3 log K).
We still have the total_sum^2 in front. Let's say we fix x. If for a given y we have a solution, you might also be able to find a smaller y that still has a solution. If you can't find a solution for a given y, then you won't be able to find one for any smaller value. So we can now do a binary search on y.
So this is now O(total_sum * log(total_sum) * K^3 * log K).
Another optimization would be to not increase i if sum(0 -> i-1) > x.
You might not want to check values of x > total_sum/K since that's the ideal minimum value. This should cancel out one of the K in the complexity.
There might be other things that you can do, but I think this will be fast enough for your constraints.
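As a concrete starting point, here is a rough Python sketch of just the feasibility check (a simplification of the scheme above: it brute-forces the circle by trying every rotation and skips the binary-search speedups, trading speed for clarity):

def feasible(vals, n, x, y):
    k = len(vals)
    for start in range(k):                  # fix one cut position on the circle
        a = vals[start:] + vals[:start]
        # can[i][l]: can the first i elements be split into l in-range segments?
        can = [[False] * (n + 1) for _ in range(k + 1)]
        can[0][0] = True
        for i in range(1, k + 1):
            for l in range(1, n + 1):
                seg = 0
                for j in range(i, 0, -1):   # segment a[j-1 .. i-1]
                    seg += a[j - 1]
                    if seg > y:
                        break
                    if seg >= x and can[j - 1][l - 1]:
                        can[i][l] = True
                        break
        if can[k][n]:
            return True
    return False

print(feasible([6, 10, 11, 3], 2, 14, 16))  # True: (6, 10) and (11, 3)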
You can do brute force in O(N^K), or DP in O(K * N * MAXSUM^(N-1)).
dp[i][val1][val2]...[val(N-1)] = is it possible to distribute the first i items so that the first person has val1, the second val2, and so on (the last person's total follows from the others). There are K * MAXSUM^(N-1) states and you need O(N) per step: you simply choose who takes the i-th item.
I don't think it's possible to solve it faster.
No standard type of algorithm (greedy, divide and conquer, etc.) exists for this problem.
You would have to check each and every combination of (resource, people) and pick the best answer. Once you have solved the problem using recursion, you can add DP to optimize the solution.
The crux of the solution is:
Recurse through all the treasures
If the current treasure is not the last,
    set minimum difference to Infinity
    for each user
        assign the current treasure to the current user
        ans = recurse further by going to the next treasure
        update minimumDifference if necessary
else
    find the maximum and minimum amounts of treasure assigned
    and return the difference
Here is the JavaScript version of the answer.
I have commented it to try to explain the logic as well:
// value of the treasure
const K = [6, 3, 11, 10];
// number of users
const N = 2;
// Array which tracks the amount of treasure with each user
const U = new Array(N).fill(0);
// 2D array to save the whole solution
const bitset = [...new Array(N)].map(() => [...new Array(K.length)]);

const solve = index => {
  /**
   * The base case:
   * We are out of treasure.
   * So far, the assigned treasures will be in the U array
   */
  if (index >= K.length) {
    /**
     * Take the maximum and minimum and return the difference along with the bitset
     */
    const max = Math.max(...U);
    const min = Math.min(...U);
    const answer = { min: max - min, set: bitset };
    return answer;
  }
  /**
   * We have treasures to check
   */
  let answer = { min: Infinity, set: undefined };
  for (let i = 0; i < N; i++) {
    // Let the ith user take the treasure
    U[i] += K[index];
    bitset[i][index] = 1;
    /**
     * Let us recurse and see what the answer will be if the ith user has the treasure at `index`
     * Note that the ith user might also have other treasures for indices > `index`
     */
    const nextAnswer = solve(index + 1);
    /**
     * Did we do better?
     * Was the difference between the distributions of treasure reduced?
     * If so, let us update the current answer
     * If not, we can assign the treasure at `index` to the next user (i + 1) and see if we do any better
     */
    if (nextAnswer.min <= answer.min) {
      answer = JSON.parse(JSON.stringify(nextAnswer));
    }
    /**
     * Had we done any better, the changes would already be recorded in the answer.
     * Because we are going to try and assign this treasure to the next user,
     * let us remove it from the current user before iterating further
     */
    U[i] -= K[index];
    bitset[i][index] = 0;
  }
  return answer;
};

const ans = solve(0);
console.log("Difference: ", ans.min);
console.log("Treasure: [", K.join(", "), "]");
console.log();
ans.set.forEach((x, i) => console.log("User: ", i + 1, " [", x.join(", "), "]"));
Each problem at index i creates exactly N copies of itself and we have K indices in total, so the time complexity is O(N^K).
We can definitely do better by throwing memoization at it.
Here comes the tricky part:
If we have a distribution of treasure for one user, the minimum difference of the distributions of treasure among the users will be the same.
In our case, bitset[i] represents the distribution for the ith user.
Thus, we can memoize the results keyed on the bitset of the user.
Once you realize that, coding it is easy:
// value of the treasure
const K = [6, 3, 11, 10, 1];
// number of users
const N = 2;
// Array which tracks the amount of treasure with each user
const U = new Array(N).fill(0);
// 2D array to save the whole solution
const bitset = [...new Array(N)].map(() => [...new Array(K.length).fill(0)]);
const cache = {};

const solve = (index, userIndex) => {
  /**
   * Do we have a cached answer?
   */
  if (cache[bitset[userIndex]]) {
    return cache[bitset[userIndex]];
  }
  /**
   * The base case:
   * We are out of treasure.
   * So far, the assigned treasures will be in the U array
   */
  if (index >= K.length) {
    /**
     * Take the maximum and minimum and return the difference along with the bitset
     */
    const max = Math.max(...U);
    const min = Math.min(...U);
    const answer = { min: max - min, set: bitset };
    // cache the answer
    cache[bitset[userIndex]] = answer;
    return answer;
  }
  /**
   * We have treasures to check
   */
  let answer = { min: Infinity, set: undefined };
  // Helps us track the index of the user with the optimal answer
  let minIndex = undefined;
  for (let i = 0; i < N; i++) {
    // Let the ith user take the treasure
    U[i] += K[index];
    bitset[i][index] = 1;
    /**
     * Let us recurse and see what the answer will be if the ith user has the treasure at `index`
     * Note that the ith user might also have other treasures for indices > `index`
     */
    const nextAnswer = solve(index + 1, i);
    /**
     * Did we do better?
     * Was the difference between the distributions of treasure reduced?
     * If so, let us update the current answer
     * If not, we can assign the treasure at `index` to the next user (i + 1) and see if we do any better
     */
    if (nextAnswer.min <= answer.min) {
      answer = JSON.parse(JSON.stringify(nextAnswer));
      minIndex = i;
    }
    /**
     * Because we are going to try and assign this treasure to the next user,
     * let us remove it from the current user before iterating further
     */
    U[i] -= K[index];
    bitset[i][index] = 0;
  }
  cache[answer.set[minIndex]] = answer;
  return answer;
};

const ans = solve(0, 0);
console.log("Difference: ", ans.min);
console.log("Treasure: [", K.join(", "), "]");
console.log();
ans.set.forEach((x, i) => console.log("User: ", i + 1, " [", x.join(", "), "]"));
// console.log("Cache:\n", cache);
We can definitely improve the space used by not caching the whole bitset. Removing the bitset from the cache is trivial.
Consider that for each k, we can pair a sum growing from A[i] to the left (sum A[i-j..i]) with all available intervals recorded for f(k-1, i-j-1) and update them: for each interval (low, high), if the sum is greater than high, then new_interval = (low, sum); if the sum is lower than low, then new_interval = (sum, high); otherwise, the interval stays the same. For example,
i: 0 1 2 3 4 5
A: [5 1 1 1 3 2]
k = 3
i = 3, j = 0
The ordered intervals available for f(3-1, 3-0-1) = f(2,2) are:
(2,5), (1,6) // These were the sums, (A[1..2], A[0]) and (A[2], A[0..1])
Sum = A[3..3-0] = 1
Update intervals: (2,5) -> (1,5)
(1,6) -> (1,6) no change
Now, we can make this iteration much more efficient by recognizing and pruning intervals during the previous k round.
Watch:
A: [5 1 1 1 3 2]
K = 1:
N = 0..5; Intervals: (5,5), (6,6), (7,7), (8,8), (11,11), (13,13)
K = 2:
N = 0: Intervals: N/A
N = 1: Intervals: (1,5)
N = 2: (1,6), (2,5)
Prune: remove (1,6) since any sum <= 1 would be better paired with (2,5)
and any sum >= 6 would be better paired with (2,5)
N = 3: (1,7), (2,6), (3,5)
Prune: remove (2,6) and (1,7)
N = 4: (3,8), (4,7), (5,6), (5,6)
Prune: remove (3,8) and (4,7)
N = 5: (2,11), (5,8), (6,7)
Prune: remove (2,11) and (5,8)
For k = 2, we are now left with the following pruned record:
{
k: 2,
n: {
1: (1,5),
2: (2,5),
3: (3,5),
4: (5,6),
5: (6,7)
}
}
We've cut down the iteration of k = 3 from a list of n choose 2 possible splits to n relevant splits!
The general algorithm applied to k = 3:
for k' = 1 to k
  for sum A[i-j..i], for i <- [k'-1..n], j <- [0..i-k'+1]:
    for interval in record[k'-1][i-j-1]: // records are for [k'][n']
      update interval
  prune intervals in k'
k' = 3
i = 2
  sum = 1, record[2][1] = (1,5) -> no change
i = 3
  // sums are accumulating right to left starting from A[i]
  sum = 1, record[2][2] = (2,5) -> (1,5)
  sum = 2, record[2][1] = (1,5) -> no change
i = 4
  sum = 3, record[2][3] = (3,5) -> no change
  sum = 4, record[2][2] = (2,5) -> no change
  sum = 5, record[2][1] = (1,5) -> no change
i = 5
  sum = 2, record[2][4] = (5,6) -> (2,6)
  sum = 5, record[2][3] = (3,5) -> no change
  sum = 6, record[2][2] = (2,5) -> (2,6)
  sum = 7, record[2][1] = (1,5) -> (1,7)
The answer is the sum 5 (A[4..5]) paired with record[2][3] = (3,5), yielding the updated interval (3,5). I'll leave the pruning logic for the reader to work out. If we wanted to continue, here's the pruned list for k = 3:
{
k: 3
n: {
2: (1,5),
3: (1,5),
4: (3,5),
5: (3,5)
}
}

Largest sum of k elements not larger than m

This problem is from a programming competition, and I can't manage to solve it in acceptable time.
You are given an array a of n integers. Find the largest sum s of exactly k elements (not necessarily contiguous) that is strictly less than a given integer m (s < m).
Constraints:
0 < k <= n < 100
m < 3000
0 < a[i] < 100
Info: A solution is guaranteed to exist for the given input.
Now, I guess my best bet is a DP approach, but couldn't come up with the correct formula.
I would try two things. They are both based on the following idea:
If we can solve the problem of deciding if there are k elements that sum exactly to p, then we can binary search for the answer in [1, m].
1. Optimized bruteforce
Simply sort your array and cut your search short when the current sum exceeds p. The idea is that you will generally only have to backtrack very little, since the sorted array should help eliminate bad solutions early.
To be honest, I doubt this will be fast enough however.
2. A randomized algorithm
Keep a used array of size k. Randomly assign elements to it. While their sum is not p, randomly replace an element with another and make sure to update their sum in constant time.
Keep doing this a maximum of e times (experiment with its value for best results; the complexity will be O(e log m) in the end, so it can probably go quite high). If you couldn't get to sum p during this time, assume that it is not possible.
Alternatively, forget the binary search. Run the randomized algorithm directly and return the largest valid sum it finds in e runs or until your allocated running time ends.
I am not sure how DP would efficiently keep track of the number of elements used in the sum. I think the randomized algorithm is worth a shot since it is easy to implement.
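A rough sketch of the direct randomized variant in Python (my own rendering; the swap-acceptance rule below is just one plausible choice, and the result is only a best effort within the allotted restarts):

import random

def randomized_best_sum(a, k, m, e=1000):
    n, best = len(a), None
    for _ in range(e):                       # e random restarts
        idx = random.sample(range(n), k)     # k distinct element indices
        s = sum(a[i] for i in idx)
        for _ in range(n):                   # a handful of local swaps
            i, j = random.randrange(k), random.randrange(n)
            if j in idx:
                continue
            t = s - a[idx[i]] + a[j]
            # accept a swap if it repairs validity or pushes a valid sum upward
            if t < m and (s >= m or t > s):
                idx[i], s = j, t
        if s < m and (best is None or s > best):
            best = s
    return best

print(randomized_best_sum([12, 43, 1, 4, 3, 5, 13, 34, 24, 22, 31], 3, 39))  # usually 38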
Both of the accepted methods are inferior. Also, this is not a problem type that can be solved by DP.
The following is the correct method illustrated via example:
imagine a = { 2, 3, 5, 9, 11, 14, 17, 23 } (hence n = 8), k = 3, and S = 30
Sort the array a.
Define three pointers into the array, P1, P2, and P3 going from 1 to n. P1 < P2 < P3
Set P3 to point at a_max (here 23), P1 at index 1, and P2 at index 2. Calculate the sum s (here 23 + 2 + 3 = 28). If s > S, then decrease P3 by one and try again until you find a solution. If P3 < 3, then there is no solution. Store your first solution as the best known solution so far (BKSSF).
Next, increase P2 until s > S. If you find a better solution, update BKSSF. Then decrease P2 by one.
Next increase P1 until s > S. Update BKSSF if you find a better solution.
Now go back to P2 and decrease it by one.
Then increase P1 until s > S. etc.
You can see this is a recursive algorithm, in which for every increase or decrease there are one or more corresponding decreases and increases.
This algorithm will be much, much faster than the attempts above.
For l <= k and r <= s:
  V[l][r] = true iff it is possible to choose exactly l elements that sum up to r.

V[0][0] = true
for i in 1..n:
    V'[][] - initialize with false
    for l in 0..k-1:
        for r in 0..s:
            if V[l][r] and r + a[i] <= s:
                V'[l + 1][r + a[i]] = true
    V |= V'
That gives you all achievable sums in O(k * n * s).
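In Python, a direct rendering of that DP might look like this (a sketch; it caps sums at m - 1 since only sums strictly below m matter):

def largest_sum_below(a, k, m):
    s = m - 1                                # sums can be at most m - 1
    # V[l][r]: can we pick exactly l elements summing to exactly r?
    V = [[False] * (s + 1) for _ in range(k + 1)]
    V[0][0] = True
    for x in a:
        for l in range(k - 1, -1, -1):       # downward loops: each element
            for r in range(s - x, -1, -1):   # is used at most once
                if V[l][r]:
                    V[l + 1][r + x] = True
    best = [r for r in range(s + 1) if V[k][r]]
    return max(best) if best else None

print(largest_sum_below([12, 43, 1, 4, 3, 5, 13, 34, 24, 22, 31], 3, 39))  # 38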
I think Tyler Durden had the right idea, but you don't have to sum all the elements, and you can basically do it greedily, so you can cut down the loop a lot. In C++:
#include <iostream>
#include <algorithm>
using namespace std;

#define FI(n) for(int i=0;i<(n);i++)

int m, n, k;
int a[] = { 12, 43, 1, 4, 3, 5, 13, 34, 24, 22, 31 },
    e[20];

inline int max(int i) { return n-k+i+1; }

void print(int e[], int ii, int sum)
{   cout << sum << '\t';
    FI(ii+1) cout << e[i]<<','; cout<<'\n';
}

bool desc(int a, int b) { return a>b; }

int solve()
{   sort(a, a+n, desc);
    cout <<"a="; FI(n) cout << a[i]<<','; cout<<"\nsum\tindexes\n";
    int i,sum;
    i = e[0] = sum = 0;
    print (e,i,a[0]);
    while(1)
    {   while (e[i]<max(i) && sum+a[e[i]]>=m) e[i]++;
        if (e[i]==max(i))
        {   if (!i) return -1; // FAIL
            cout<<"*"; print (e,i,sum);
            sum -= a[e[--i]++];
        } else // sum+a[e[i]]<m
        {   sum += a[e[i]];
            print (e,i,sum);
            if (i+1==k) return sum;
            e[i+1] = e[i]+1;
            i++;
        }
    }
}

int main()
{   n = sizeof(a)/sizeof(int);
    k = 3;
    m = 39;
    cout << "n,k,m="<<n<<' '<<k<<' '<<m<<'\n';
    cout << solve();
}
For m=36 it gives the output
n,k,m=11 3 36
a=43,34,31,24,22,13,12,5,4,3,1,
sum indexes
43 0,
34 1,
*34 1,10,
31 2,
35 2,8,
*35 2,8,11,
34 2,9,
35 2,9,10,
35
For m=37 it gives
n,k,m=11 3 37
a=43,34,31,24,22,13,12,5,4,3,1,
sum indexes
43 0,
34 1,
*34 1,10,
31 2,
36 2,7,
*36 2,7,11,
35 2,8,
36 2,8,10,
36
(One last try: for m=39 it also gives the right answer, 38)
Output: the last number is the sum and the line above it has the indexes. Lines with an asterisk are before backtracking, so the last index of the line is one too high. Runtime should be O(k*n).
Sorry for the hard-to-understand code. I can clean it up and provide explanation upon request but I have another project due at the moment ;).

Find the a location in a matrix so that the cost of every one moving to that location is smallest

There is an m×n matrix. Several groups of people are located at certain spots. In the following example, there are three groups, and the number 4 indicates that there are four people in that group. Now we want to find a meeting point in the matrix so that the total cost of all groups moving to that point is minimal. As for how to compute the cost of moving one group to a point, please see the following example.
Group1: (0, 1), 4
Group2: (1, 3), 3
Group3: (2, 0), 5
. 4 . .
. . . 3
5 . . .
If all three groups move to (1, 1), the cost is:
4*((1-0)+(1-1)) + 5*((2-1)+(1-0)) + 3*((1-1)+(3-1))
My idea is :
Firstly, this two-dimensional problem can be reduced to two one-dimensional problems.
In the one-dimensional problem, I can prove that the best spot must be at one of the groups.
In this way, I can give an O(G^2) algorithm (G is the number of groups).
Use iterator's example for illustration:
{(-100,0,100),(100,0,100),(0,1,1)},(x,y,population)
for x, {(-100,100),(100,100),(0,1)}, 0 is the best.
for y, {(0,100),(0,100),(1,1)}, 0 is the best.
So it's (0, 0)
Is there any better solution for this problem?
I like the idea of noticing that the objective function can be decomposed to give the sum of two one-dimensional problems. The remaining problems look a lot like the weighted median to me (note the optimization problem that the weighted median solves, described at http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/weighted.median.html, or consider what happens to the objective function as you move away from the weighted median).
The URL above seems to say the weighted median takes time n log n, which I guess means that you could attain their claim by sorting the data and then doing a linear pass to work out the weighted median. The numbers you have to sort are in the ranges [0, m] and [0, n], so you could in theory do better if m and n are small, or, of course, if you are given the data pre-sorted.
Come to think of it, I don't see why you shouldn't be able to find the weighted median with a linear-time randomized algorithm similar to the one used to find the median (http://en.wikibooks.org/wiki/Algorithms/Randomization#find-median): repeatedly pick a random element, use it to partition the remaining items, and work out which half the weighted median should be in. That gives you expected linear time.
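For reference, a small sketch of the sort-based O(G log G) weighted median in Python (the randomized linear-time variant would replace the sort with pivot partitioning; the "reaches half the total weight" rule is one common convention):

def weighted_median(pos, w):
    # first position where the cumulative weight reaches half the total
    total = sum(w)
    acc = 0
    for p, wt in sorted(zip(pos, w)):
        acc += wt
        if 2 * acc >= total:
            return p

print(weighted_median([-100, 100, 0], [100, 100, 1]))  # 0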
I think this can be solved in O(max(n, m)) time and O(max(n, m)) space.
We have to find the median of the x coordinates and the median of the y coordinates over all k people; the answer will be (x_median, y_median).
Assume this function takes the following inputs:
the total number of people: int k = 4 + 3 + 5 = 12;
an array of coordinates:
struct coord_t c[12] = {(0,1),(0,1),(0,1),(0,1),(1,3),(1,3),(1,3),(2,0),(2,0),(2,0),(2,0),(2,0)};
the histogram size: int size = n > m ? n : m;
Let the input of the coordinates be an array of coordinates, coord_t c[k]:
struct coord_t {
    int x;
    int y;
};
1. My idea is to create an array of size = n > m ? n : m.
2. Build a histogram of the x coordinates and scan it up to the halfway count:
int array[size] = {0}; // initialize all the elements in the array to zero
for (i = 0; i < k; i++)
{
    array[c[i].x] += 1;
    count++;
}
int tempCount = 0;
for (i = 0; i < size; i++)
{
    if (array[i] != 0)
    {
        tempCount += array[i];
    }
    if (tempCount >= count/2)
    {
        break;
    }
}
int x_median = i;

// similarly with the y coordinate
int array[size] = {0}; // initialize all the elements in the array to zero
for (i = 0; i < k; i++)
{
    array[c[i].y] += 1;
    count++;
}
int tempCount = 0;
for (i = 0; i < size; i++)
{
    if (array[i] != 0)
    {
        tempCount += array[i];
    }
    if (tempCount >= count/2)
    {
        break;
    }
}
int y_median = i;

coord_t temp;
temp.x = x_median;
temp.y = y_median;
return temp;
Sample working code for an MxM matrix with k points:
/* Problem:
   Given an MxM grid and N people placed at random positions on the grid,
   find the optimal meeting point of all the people.
   Answer:
   Find the median of all the x coordinates of the positions of the people.
   Find the median of all the y coordinates of the positions of the people. */
#include <stdio.h>
#include <stdlib.h>

typedef struct coord_struct {
    int x;
    int y;
} coord_struct;

typedef struct distance {
    int count;
} distance;

coord_struct toFindTheOptimalDistance (int N, int M, coord_struct input[])
{
    coord_struct z;
    z.x = 0;
    z.y = 0;
    int i, j;
    distance *array_dist;
    array_dist = (distance*) malloc(sizeof(distance) * M);
    for (i = 0; i < M; i++)
    {
        array_dist[i].count = 0;
    }
    for (i = 0; i < N; i++)
    {
        array_dist[input[i].x].count += 1;
        printf("%d and %d\n", input[i].x, array_dist[input[i].x].count);
    }
    j = 0;
    for (i = 0; i <= N/2;)
    {
        printf("%d\n", i);
        if (array_dist[j].count != 0)
            i += array_dist[j].count;
        j++;
    }
    printf("x coordinate = %d", j-1);
    int x = j-1;
    for (i = 0; i < M; i++)
        array_dist[i].count = 0;
    for (i = 0; i < N; i++)
    {
        array_dist[input[i].y].count += 1;
    }
    j = 0;
    for (i = 0; i < N/2;)
    {
        if (array_dist[j].count != 0)
            i += array_dist[j].count;
        j++;
    }
    int y = j-1;
    printf("y coordinate = %d", j-1);
    z.x = x;
    z.y = y;
    return z;
}

int main()
{
    coord_struct input[5];
    input[0].x = 1;
    input[0].y = 2;
    input[1].x = 1;
    input[1].y = 2;
    input[2].x = 4;
    input[2].y = 1;
    input[3].x = 5;
    input[3].y = 2;
    input[4].x = 5;
    input[4].y = 2;
    int size = 6; /* grid dimension: one more than the largest coordinate */
    coord_struct x = toFindTheOptimalDistance(5, size, input);
    printf("(%d, %d)\n", x.x, x.y);
    return 0;
}
Your algorithm is fine: divide the problem into two one-dimensional problems. The time complexity is O(n log n).
You only need to expand every group of people into single people, so that every move left, right, up or down costs 1 per person. Then we only need to find where the ((n + 1) / 2)-th person stands, for the row and the column respectively.
Consider your sample: {(-100,0,100),(100,0,100),(0,1,1)}.
Let's take the row numbers out. It's {(-100,100),(100,100),(0,1)}, which means 100 people stand at -100, 100 people stand at 100, and 1 person stands at 0.
Sort it by x, and it's {(-100,100),(0,1),(100,100)}. There are 201 people in total, so we only need to take the location where the 101st person stands. It's 0, and that's the answer for the row.
The column number uses the same algorithm. {(0,100),(0,100),(1,1)} is already sorted; the 101st person is at 0, so the answer for the column is also 0.
The answer is (0,0).
I can think of an O(n) solution for the one-dimensional problem, which in turn means you can solve the original problem in O(n+m+G).
Suppose people are standing like this: a_0, a_1, ..., a_{n-1}, with a_0 people at spot 0, a_1 at spot 1, and so on. Then the solution in pseudocode is
cur_result = sum(i*a_i, i = 1..n-1)   // total cost of meeting at spot 0
cur_r = sum(a_i, i = 1..n-1)          // people at spots >= 1 (right of the current spot)
cur_l = a_0                           // people at spots < 1 (left of the next spot)
for i = 1:n-1
    cur_result = cur_result - cur_r + cur_l   // move the meeting point from i-1 to i
    cur_r = cur_r - a_i
    cur_l = cur_l + a_i
end
You need to find the point where cur_result is minimal (keeping the initial value for spot 0 in the running minimum).
So you need O(n) + O(m) for solving the 1d problems plus O(G) to build them, meaning the total complexity is O(n+m+G).
Alternatively, you can solve the 1d problem in O(G log G) (or O(G) if the data is sorted) using the same idea. Choose between the two based on the expected number of groups.
You can solve this in O(G log G) time by reducing it to two one-dimensional problems, as you mentioned.
As for how to solve it in one dimension: just sort the groups and go through them one by one, calculating the cost of moving to each point. This calculation can be done in O(1) time per point.
You can also avoid the log(G) factor if your x and y coordinates are small enough for you to use bucket/radix sort.
Inspired by kilotaras's idea, it seems that there is an O(G) solution for this problem.
Since everyone agrees that the two-dimensional problem can be reduced to two one-dimensional problems, I will not repeat that again. I will just focus on how to solve the one-dimensional problem in O(G).
Suppose people are standing like this: a[0], a[1], ..., a[n-1]. There are a[i] people standing at spot i. There are G spots having people (G <= n). Assume these G spots are g[1], g[2], ..., g[G], where g[i] is in [0, ..., n-1]. Without loss of generality, we can also assume that g[1] < g[2] < ... < g[G].
It's not hard to prove that the optimal spot must come from these G spots. I will skip the proof here and leave it as an exercise for those interested.
Given the above observation, we can just compute the cost of moving to the spot of every group and then choose the minimal one. There is an obvious O(G^2) algorithm to do this.
But using kilotaras's idea, we can do it in O(G) (no sorting).
cost[1] = sum((g[i]-g[1])*a[g[i]], i = 2,...,G)
    // the cost of moving to the spot of the first group. This step is O(G).
cur_r = sum(a[g[i]], i = 2,...,G)
    // how many people are on the right side of the second group,
    // including the second group. This step is O(G).
cur_l = a[g[1]]
    // how many people are on the left side of the second group,
    // not including the second group.
for i = 2:G
    gap = g[i] - g[i-1];
    cost[i] = cost[i-1] - cur_r*gap + cur_l*gap;
    if i != G
        cur_r = cur_r - a[g[i]];
        cur_l = cur_l + a[g[i]];
    end
end
The minimum of cost[i] is the answer.
Using the example 5 1 0 3 to illustrate the algorithm:
In this example,
n = 4, G = 3.
g[1] = 0, g[2] = 1, g[3] = 3.
a[0] = 5, a[1] = 1, a[2] = 0, a[3] = 3.
(1) cost[1] = 1*1+3*3 = 10, cur_r = 4, cur_l = 5.
(2) cost[2] = 10 - 4*1 + 5*1 = 11, gap = g[2] - g[1] = 1, cur_r = 4 - a[g[2]] = 3, cur_l = 6.
(3) cost[3] = 11 - 3*2 + 6*2 = 17, gap = g[3] - g[2] = 2.
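For completeness, a direct Python transcription of this sweep (my own rendering of the pseudocode above):

def best_spot_cost(g, a):
    # g: sorted spots with people; a[i]: number of people at spot g[i]
    G = len(g)
    cost = sum((g[i] - g[0]) * a[i] for i in range(1, G))  # cost at g[0]
    cur_r = sum(a[1:])   # people strictly to the right of the current spot
    cur_l = a[0]         # people at or to the left of the current spot
    best = cost
    for i in range(1, G):
        gap = g[i] - g[i - 1]
        cost += (cur_l - cur_r) * gap
        best = min(best, cost)
        cur_r -= a[i]
        cur_l += a[i]
    return best

print(best_spot_cost([0, 1, 3], [5, 1, 3]))  # 10, matching cost[1] above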

Fewest number of turns heuristic

Is there any way to ensure that the fewest number of turns heuristic is met by anything except a breadth-first search? Perhaps some more explanation would help.
I have a random graph, much like this:
0 1 1 1 2
3 4 5 6 7
9 a 5 b c
9 d e f f
9 9 g h i
Starting in the top left corner, I need to know the fewest number of steps it would take to get to the bottom right corner. Each set of connected colors is assumed to be a single node, so for instance in this random graph, the three 1's on the top row are all considered a single node, and every adjacent (not diagonal) connected node is a possible next state. So from the start, possible next states are the 1's in the top row or 3 in the second row.
Currently I use a bidirectional search, but the explosiveness of the tree size ramps up pretty quickly. For the life of me, I haven't been able to adjust the problem so that I can safely assign weights to the nodes and have them ensure the fewest number of state changes to reach the goal without it turning into a breadth first search. Thinking of this as a city map, the heuristic would be the fewest number of turns to reach the goal.
It is very important that the fewest number of turns is the result of this search as that value is part of the heuristic for a more complex problem.
You said yourself each group of numbers represents one node, and each node is connected to adjacent nodes. Then this is a simple shortest-path problem, and you could use (for instance) Dijkstra's algorithm, with each edge having weight 1 (for 1 turn).
This sounds like Dijkstra's algorithm. The hardest part would lie in properly setting up the graph (keeping track of which node gets which children), but if you can devote some CPU cycles to that, you'd be fine afterwards.
Why don't you want a breadth-first search?
Here.. I was bored :-) This is in Ruby but may get you started. Mind you, it is not tested.
class Node
  attr_accessor :parents, :children, :value
  def initialize args={}
    @parents = args[:parents] || []
    @children = args[:children] || []
    @value = args[:value]
  end

  def add_parents *args
    args.flatten.each do |node|
      @parents << node
      node.add_children self unless node.children.include? self
    end
  end

  def add_children *args
    args.flatten.each do |node|
      @children << node
      node.add_parents self unless node.parents.include? self
    end
  end
end

class Graph
  attr_accessor :graph, :root
  def initialize args={}
    @graph = args[:graph]
    @root = Node.new
    prepare_graph
    @root = @graph[0][0]
  end

  private

  def prepare_graph
    # We will iterate through the graph, and only check the values above and to the
    # left of the current cell.
    @graph.each_with_index do |row, i|
      row.each_with_index do |cell, j|
        cell = row[j] = Node.new(:value => cell) # write the node back in place
        # Check above
        unless i.zero?
          above = @graph[i-1][j]
          if above.value == cell.value
            # Here it is safe to do this: the new node has no children, no parents.
            cell = row[j] = above
          else
            cell.add_parents above
            above.add_children cell # Redundant given the code for both of those
                                    # methods, but implementations may differ.
          end
        end
        # Check to the left!
        unless j.zero?
          left = @graph[i][j-1]
          if left.value == cell.value
            # Well, potentially it's the same as the one above the current cell,
            # so we can't just set one equal to the other: have to merge them.
            left.add_parents cell.parents
            left.add_children cell.children
            cell = row[j] = left
          else
            cell.add_parents left
            left.add_children cell
          end
        end
      end
    end
  end
end

# j = 0, 1, 2, 3, 4
graph = [
  [3, 4, 4, 4, 2], # i = 0
  [8, 3, 1, 0, 8], # i = 1
  [9, 0, 1, 2, 4], # i = 2
  [9, 8, 0, 3, 3], # i = 3
  [9, 9, 7, 2, 5]] # i = 4

maze = Graph.new :graph => graph
# Now, going from maze.root on, we have a weighted graph, should it matter.
# If it doesn't matter, you can just count the number of steps.
# Dijkstra's algorithm is really simple to find in the wild.
This looks like the same problem as this Project Euler problem: http://projecteuler.net/index.php?section=problems&id=81
The complexity of the solution is O(n), where n is the number of nodes.
What you need is memoization.
At each step you can arrive from at most 2 directions, so pick the solution that is cheaper.
It is something like this (just add the code that takes 0 if on the border):
for i in row:
    for j in column:
        matrix[i][j] = min(matrix[i-1][j], matrix[i][j-1]) + matrix[i][j]
And now you have the least expensive solution if you move only right or down.
The solution is in matrix[MAX_i][MAX_j].
If you can go left and up too, then the big-O is much higher (I can't figure out the optimal solution).
In order for A* to always find the shortest path, your heuristic needs to always under-estimate the actual cost (such a heuristic is called "admissible"). Simple heuristics like the Euclidean or Manhattan distance on a grid work well because they're fast to compute and are guaranteed to be less than or equal to the actual cost.
Unfortunately, in your case, unless you can make some simplifying assumptions about the size/shape of the nodes, I'm not sure there's much you can do. For example, consider going from A to B in this case:
B 1 2 3 A
C 4 5 6 D
C 7 8 9 C
C e f g C
C C C C C
The shortest path would be A -> D -> C -> B, but using spatial information would probably give 3 a lower heuristic cost than D.
Depending on your circumstances, you might be able to live with a solution that isn't actually the shortest path, as long as you can get the answer sooner. There's a nice blog post on the topic by Christer Ericson (programmer for God of War 3 on PS3): http://realtimecollisiondetection.net/blog/?p=56
Here's my idea for a non-admissible heuristic: from the point, move horizontally until you're even with the goal, then move vertically until you reach it, and count the number of state changes that you made. You can compute other test paths (e.g. vertically then horizontally) too, and pick the minimum value as your final heuristic. If your nodes are roughly equal size and regularly shaped (unlike my example), this might do pretty well. The more test paths you try, the more accurate you'd get, but the slower it would be.
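A sketch of that heuristic in Python (a hypothetical helper of my own; assumes grid is a list of lists and start/goal are (row, col) pairs):

def turns_heuristic(grid, start, goal):
    # walk an L-shaped path (horizontal leg, then vertical leg) and count
    # colour changes; non-admissible, so A* may return a slightly long path
    (r, c), (gr, gc) = start, goal
    changes = 0
    colour = grid[r][c]
    step = 1 if gc > c else -1
    while c != gc:
        c += step
        if grid[r][c] != colour:
            changes += 1
            colour = grid[r][c]
    step = 1 if gr > r else -1
    while r != gr:
        r += step
        if grid[r][c] != colour:
            changes += 1
            colour = grid[r][c]
    return changes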
Hope that's helpful, let me know if any of it doesn't make sense.
This untuned C implementation of breadth-first search can chew through a 100-by-100 grid in less than 1 msec. You can probably do better.
int shortest_path(int *grid, int w, int h) {
    int mark[w * h];  // for each square in the grid:
                      // 0 if not visited
                      // 1 if not visited and slated to be visited "now"
                      // 2 if already visited
    int todo1[4 * w * h];  // buffers for two queues, a "now" queue
    int todo2[4 * w * h];  // and a "later" queue
    int *readp;            // read position in the "now" queue
    int *writep[2] = {todo1 + 1, 0};
    int x, y, same;
    todo1[0] = 0;
    memset(mark, 0, sizeof(mark));
    for (int d = 0; ; d++) {
        readp = (d & 1) ? todo2 : todo1;     // start of "now" queue
        writep[1] = writep[0];               // end of "now" queue
        writep[0] = (d & 1) ? todo1 : todo2; // "later" queue (empty)
        // Now consume the "now" queue, filling both the "now" queue
        // and the "later" queue as we go. Points in the "now" queue
        // have distance d from the starting square. Points in the
        // "later" queue have distance d+1.
        while (readp < writep[1]) {
            int p = *readp++;
            if (mark[p] < 2) {
                mark[p] = 2;
                x = p % w;
                y = p / w;
                if (x > 0 && !mark[p-1]) {  // go left
                    mark[p-1] = same = (grid[p-1] == grid[p]);
                    *writep[same]++ = p-1;
                }
                if (x + 1 < w && !mark[p+1]) {  // go right
                    mark[p+1] = same = (grid[p+1] == grid[p]);
                    if (y == h - 1 && x == w - 2)
                        return d + !same;
                    *writep[same]++ = p+1;
                }
                if (y > 0 && !mark[p-w]) {  // go up
                    mark[p-w] = same = (grid[p-w] == grid[p]);
                    *writep[same]++ = p-w;
                }
                if (y + 1 < h && !mark[p+w]) {  // go down
                    mark[p+w] = same = (grid[p+w] == grid[p]);
                    if (y == h - 2 && x == w - 1)
                        return d + !same;
                    *writep[same]++ = p+w;
                }
            }
        }
    }
}
This paper has a slightly faster version of Dijkstra's algorithm, which lowers the constant term. Still O(n) though, since you are really going to have to look at every node:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8746&rep=rep1&type=pdf
EDIT: THE PREVIOUS VERSION WAS WRONG AND WAS FIXED
Since Dijkstra is out, I'll recommend a simple DP, which has the benefit of running in optimal time and not requiring you to construct a graph.
D[a][b] is the minimal distance to x=a and y=b using only nodes where x<=a and y<=b.
Since you can't move diagonally, you only have to look at D[a-1][b] and D[a][b-1] when calculating D[a][b].
This gives you the following recurrence relationship:
D[a][b] = min(if grid[a][b] == grid[a-1][b] then D[a-1][b] else D[a-1][b] + 1, if grid[a][b] == grid[a][b-1] then D[a][b-1] else D[a][b-1] + 1)
However doing only the above fails on this case:
0 1 2 3 4
5 6 7 8 9
A b d e g
A f r t s
A z A A A
A A A f d
Therefore you need to cache the minimum of each group of nodes found so far, and instead of looking at D[a][b] you look at the minimum of the group at grid[a][b].
Here's some Python code:
Note grid is the grid that you're given as input and it's assumed the grid is N by N
groupmin = {}

for x in range(0, N):
    for y in range(0, N):
        groupmin[grid[x][y]] = N+1  # N+1 serves as 'infinity'

# init first row and column
groupmin[grid[0][0]] = 0
for x in range(1, N):
    gm = groupmin[grid[x-1][0]]
    temp = (gm) if grid[x][0] == grid[x-1][0] else (gm + 1)
    groupmin[grid[x][0]] = min(groupmin[grid[x][0]], temp)

for y in range(1, N):
    gm = groupmin[grid[0][y-1]]
    temp = (gm) if grid[0][y] == grid[0][y-1] else (gm + 1)
    groupmin[grid[0][y]] = min(groupmin[grid[0][y]], temp)

# do the rest of the blocks
for x in range(1, N):
    for y in range(1, N):
        gma = groupmin[grid[x-1][y]]
        gmb = groupmin[grid[x][y-1]]
        a = (gma) if grid[x][y] == grid[x-1][y] else (gma + 1)
        b = (gmb) if grid[x][y] == grid[x][y-1] else (gmb + 1)
        temp = min(a, b)
        groupmin[grid[x][y]] = min(groupmin[grid[x][y]], temp)

ans = groupmin[grid[N-1][N-1]]
This will run in O(N^2 * f(x)), where f(x) is the time the hash function takes, which is normally O(1); this has a much lower constant factor than Dijkstra's.
You should easily be able to handle N up to a few thousand in a second.
Is there any way to ensure that the fewest number of turns heuristic is met by anything except a breadth-first search?
A faster way, or a simpler way? :)
You can breadth-first search from both ends, alternating, until the two regions meet in the middle. This will be much faster if the graph has a lot of fanout, like a city map, but the worst case is the same. It really depends on the graph.
This is my implementation using a simple BFS. A Dijkstra would also work (substitute a std::priority_queue that sorts by descending cost for the std::queue) but would seriously be overkill.
The thing to notice here is that we are actually searching on a graph whose nodes do not exactly correspond to the cells in the given array. To get to that graph, I used a simple DFS-based floodfill (you could also use BFS, but DFS is slightly shorter for me). What that does is find all connected, same-character components and assign them to the same colour/node. Thus, after the floodfill we can find out what node each cell belongs to in the underlying graph by looking at the value of colour[row][col].
Then I just iterate over the cells and find out all the cells where adjacent cells do not have the same colour (i.e. are in different nodes). These, therefore, are the edges of our graph. I maintain a std::set of edges as I iterate over the cells to eliminate duplicate edges. After that it is a simple matter of building an adjacency list from the list of edges, and we are ready for a BFS.
Code (in C++):
#include <queue>
#include <vector>
#include <iostream>
#include <string>
#include <set>
#include <cstring>
using namespace std;

#define SIZE 1001

vector<string> board;
int colour[SIZE][SIZE];
int dr[] = {0, 1, 0, -1};
int dc[] = {1, 0, -1, 0};

int min(int x, int y) { return (x < y) ? x : y; }
int max(int x, int y) { return (x > y) ? x : y; }

void dfs(int r, int c, int col, vector<string> &b) {
    if (colour[r][c] < 0) {
        colour[r][c] = col;
        for (int i = 0; i < 4; i++) {
            int nr = r + dr[i], nc = c + dc[i];
            if (nr >= 0 && nr < b.size() && nc >= 0 && nc < b[0].size() && b[nr][nc] == b[r][c])
                dfs(nr, nc, col, b);
        }
    }
}

int flood_fill(vector<string> &b) {
    memset(colour, -1, sizeof(colour));
    int current_node = 0;
    for (int i = 0; i < b.size(); i++) {
        for (int j = 0; j < b[0].size(); j++) {
            if (colour[i][j] < 0) {
                dfs(i, j, current_node, b);
                current_node++;
            }
        }
    }
    return current_node;
}

vector<vector<int> > build_graph(vector<string> &b) {
    int total_nodes = flood_fill(b);
    set<pair<int,int> > edge_list;
    for (int r = 0; r < b.size(); r++) {
        for (int c = 0; c < b[0].size(); c++) {
            for (int i = 0; i < 4; i++) {
                int nr = r + dr[i], nc = c + dc[i];
                if (nr >= 0 && nr < b.size() && nc >= 0 && nc < b[0].size() && colour[nr][nc] != colour[r][c]) {
                    int u = colour[r][c], v = colour[nr][nc];
                    if (u != v) edge_list.insert(make_pair(min(u,v), max(u,v)));
                }
            }
        }
    }
    vector<vector<int> > graph(total_nodes);
    for (set<pair<int,int> >::iterator edge = edge_list.begin(); edge != edge_list.end(); edge++) {
        int u = edge->first, v = edge->second;
        graph[u].push_back(v);
        graph[v].push_back(u);
    }
    return graph;
}

int bfs(vector<vector<int> > &G, int start, int end) {
    vector<int> cost(G.size(), -1);
    queue<int> Q;
    Q.push(start);
    cost[start] = 0;
    while (!Q.empty()) {
        int node = Q.front(); Q.pop();
        vector<int> &adj = G[node];
        for (int i = 0; i < adj.size(); i++) {
            if (cost[adj[i]] == -1) {
                cost[adj[i]] = cost[node] + 1;
                Q.push(adj[i]);
            }
        }
    }
    return cost[end];
}

int main() {
    string line;
    int rows, cols;
    cin >> rows >> cols;
    for (int r = 0; r < rows; r++) {
        line = "";
        char ch;
        for (int c = 0; c < cols; c++) {
            cin >> ch;
            line += ch;
        }
        board.push_back(line);
    }
    vector<vector<int> > actual_graph = build_graph(board);
    cout << bfs(actual_graph, colour[0][0], colour[rows-1][cols-1]) << "\n";
}
This is just a quick hack; lots of improvements can be made. But I think it is pretty close to optimal in terms of runtime complexity, and should run fast enough for boards of sizes up to several thousand (don't forget to change the #define of SIZE). Also, I only tested it with the one case you have provided. So, as Knuth said, "Beware of bugs in the above code; I have only proved it correct, not tried it." :)
