Fewest number of turns heuristic - algorithm

Is there any way to ensure that the fewest-number-of-turns heuristic is met by anything except a breadth-first search? Perhaps some more explanation would help.
I have a random graph, much like this:
0 1 1 1 2
3 4 5 6 7
9 a 5 b c
9 d e f f
9 9 g h i
Starting in the top left corner, I need to know the fewest number of steps it would take to get to the bottom right corner. Each set of connected colors is assumed to be a single node, so for instance in this random graph, the three 1's on the top row are all considered a single node, and every adjacent (not diagonal) connected node is a possible next state. So from the start, possible next states are the 1's in the top row or 3 in the second row.
Currently I use a bidirectional search, but the explosiveness of the tree size ramps up pretty quickly. For the life of me, I haven't been able to adjust the problem so that I can safely assign weights to the nodes and have them ensure the fewest number of state changes to reach the goal without it turning into a breadth first search. Thinking of this as a city map, the heuristic would be the fewest number of turns to reach the goal.
It is very important that the fewest number of turns is the result of this search as that value is part of the heuristic for a more complex problem.

You said yourself that each group of numbers represents one node, and each node is connected to adjacent nodes. Then this is a simple shortest-path problem, and you could use (for instance) Dijkstra's algorithm, with each edge having weight 1 (for 1 turn).
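As a rough sketch of that unit-weight shortest path, here is a 0-1 BFS in Python. This is a deque-based shortcut for Dijkstra when every edge costs 0 or 1, not Dijkstra itself, and it runs directly on grid cells instead of building the region graph: stepping into a cell of the same value costs 0, stepping into a different value costs 1. The function name `fewest_region_changes` and the list-of-strings grid format are assumptions made for illustration.

```python
from collections import deque

def fewest_region_changes(grid):
    """Fewest region changes from the top-left to the bottom-right cell."""
    rows, cols = len(grid), len(grid[0])
    inf = float("inf")
    dist = [[inf] * cols for _ in range(rows)]
    dist[0][0] = 0
    dq = deque([(0, 0)])
    while dq:
        r, c = dq.popleft()
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Same value: still inside the same node, so the step is free.
                step = 0 if grid[nr][nc] == grid[r][c] else 1
                if dist[r][c] + step < dist[nr][nc]:
                    dist[nr][nc] = dist[r][c] + step
                    # Free steps go to the front so cells are processed
                    # in order of non-decreasing distance.
                    if step == 0:
                        dq.appendleft((nr, nc))
                    else:
                        dq.append((nr, nc))
    return dist[rows - 1][cols - 1]

question_grid = ["01112", "34567", "9a5bc", "9deff", "99ghi"]
print(fewest_region_changes(question_grid))
```

Each cell is relaxed a bounded number of times, so the running time stays linear in the number of cells.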

This sounds like Dijkstra's algorithm. The hardest part would lie in properly setting up the graph (keeping track of which node gets which children), but if you can devote some CPU cycles to that, you'd be fine afterwards.
Why don't you want a breadth-first search?
Here.. I was bored :-) This is in Ruby but may get you started. Mind you, it is not tested.
class Node
attr_accessor :parents, :children, :value
def initialize args={}
@parents = args[:parents] || []
@children = args[:children] || []
@value = args[:value]
end
def add_parents *args
args.flatten.each do |node|
@parents << node
node.add_children self unless node.children.include? self
end
end
def add_children *args
args.flatten.each do |node|
@children << node
node.add_parents self unless node.parents.include? self
end
end
end
class Graph
attr_accessor :graph, :root
def initialize args={}
@graph = args[:graph]
@root = Node.new
prepare_graph
@root = @graph[0][0]
end
private
def prepare_graph
# We will iterate through the graph, and only check the values above and to the
# left of the current cell.
@graph.each_with_index do |row, i|
row.each_with_index do |cell, j|
cell = Node.new :value => cell #in-place modification!
# Check above
unless i.zero?
above = @graph[i-1][j]
if above.value == cell.value
# Here it is safe to do this: the new node has no children, no parents.
cell = above
else
cell.add_parents above
above.add_children cell # Redundant given the code for both of those
# methods, but implementations may differ.
end
end
# Check to the left!
unless j.zero?
left = @graph[i][j-1]
if left.value == cell.value
# Well, potentially it's the same as the one above the current cell,
# so we can't just set one equal to the other: have to merge them.
left.add_parents cell.parents
left.add_children cell.children
cell = left
else
cell.add_parents left
left.add_children cell
end
end
end
end
end
end
#j = 0, 1, 2, 3, 4
graph = [
[3, 4, 4, 4, 2], # i = 0
[8, 3, 1, 0, 8], # i = 1
[9, 0, 1, 2, 4], # i = 2
[9, 8, 0, 3, 3], # i = 3
[9, 9, 7, 2, 5]] # i = 4
maze = Graph.new :graph => graph
# Now, going from maze.root on, we have a weighted graph, should it matter.
# If it doesn't matter, you can just count the number of steps.
# Dijkstra's algorithm is really simple to find in the wild.

This looks like the same problem as Project Euler problem 81: http://projecteuler.net/index.php?section=problems&id=81
The complexity of the solution is O(n), where n is the number of nodes.
What you need is memoization.
At each step you can arrive from at most 2 directions, so pick the cheaper one.
It is something like this (just add the code that takes 0 if on the border):
for i in row:
    for j in column:
        matrix[i][j] = min([matrix[i-1][j], matrix[i][j-1]]) + matrix[i][j]
Now you have the least expensive solution if you only move right or down.
The solution is in matrix[MAX_i][MAX_j].
If you can go left and up too, then the big-O is much higher (I can figure out optimal solution)

In order for A* to always find the shortest path, your heuristic must never over-estimate the actual cost (such a heuristic is called "admissible"). Simple heuristics like the Euclidean or Manhattan distance on a grid work well because they're fast to compute and are guaranteed to be less than or equal to the actual cost.
Unfortunately, in your case, unless you can make some simplifying assumptions about the size/shape of the nodes, I'm not sure there's much you can do. For example, consider going from A to B in this case:
B 1 2 3 A
C 4 5 6 D
C 7 8 9 C
C e f g C
C C C C C
The shortest path would be A -> D -> C -> B, but using spatial information would probably give 3 a lower heuristic cost than D.
Depending on your circumstances, you might be able to live with a solution that isn't actually the shortest path, as long as you can get the answer sooner. There's a nice blog post here by Christer Ericson (programmer for God of War 3 on PS3) on the topic: http://realtimecollisiondetection.net/blog/?p=56
Here's my idea for a non-admissible heuristic: from the point, move horizontally until you're even with the goal, then move vertically until you reach it, and count the number of state changes that you made. You can compute other test paths (e.g. vertically then horizontally) too, and pick the minimum value as your final heuristic. If your nodes are roughly equal in size and regularly shaped (unlike my example), this might do pretty well. The more test paths you try, the more accurate the estimate gets, but the slower it is to compute.
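A minimal Python sketch of that test-path heuristic, assuming the grid is a list of equal-length strings and positions are (row, column) pairs. The names `count_changes`, `straight` and `l_path_estimate` are made up for illustration. It tries the two L-shaped paths and returns the smaller count of state changes.

```python
def count_changes(grid, cells):
    """Number of times the cell value changes along a path of cells."""
    return sum(
        1
        for (r0, c0), (r1, c1) in zip(cells, cells[1:])
        if grid[r0][c0] != grid[r1][c1]
    )

def straight(a, b):
    """Inclusive run of integers from a to b, in either direction."""
    step = 1 if b >= a else -1
    return list(range(a, b + step, step))

def l_path_estimate(grid, start, goal):
    (r0, c0), (r1, c1) = start, goal
    # Horizontal first, then vertical (the corner cell is not repeated).
    h_first = [(r0, c) for c in straight(c0, c1)] + [(r, c1) for r in straight(r0, r1)][1:]
    # Vertical first, then horizontal.
    v_first = [(r, c0) for r in straight(r0, r1)] + [(r1, c) for c in straight(c0, c1)][1:]
    return min(count_changes(grid, h_first), count_changes(grid, v_first))

example = ["B123A", "C456D", "C789C", "CefgC", "CCCCC"]
print(l_path_estimate(example, (0, 4), (0, 0)))
```

On the A-to-B example above this yields 4, while the true shortest path A -> D -> C -> B needs only 3 changes, which shows concretely why the estimate is not admissible.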
Hope that's helpful, let me know if any of it doesn't make sense.

This untuned C implementation of breadth-first search can chew through a 100-by-100 grid in less than 1 msec. You can probably do better.
int shortest_path(int *grid, int w, int h) {
int mark[w * h]; // for each square in the grid:
// 0 if not visited
// 1 if not visited and slated to be visited "now"
// 2 if already visited
int todo1[4 * w * h]; // buffers for two queues, a "now" queue
int todo2[4 * w * h]; // and a "later" queue
int *readp; // read position in the "now" queue
int *writep[2] = {todo1 + 1, 0};
int x, y, same;
todo1[0] = 0;
memset(mark, 0, sizeof(mark));
for (int d = 0; ; d++) {
readp = (d & 1) ? todo2 : todo1; // start of "now" queue
writep[1] = writep[0]; // end of "now" queue
writep[0] = (d & 1) ? todo1 : todo2; // "later" queue (empty)
// Now consume the "now" queue, filling both the "now" queue
// and the "later" queue as we go. Points in the "now" queue
// have distance d from the starting square. Points in the
// "later" queue have distance d+1.
while (readp < writep[1]) {
int p = *readp++;
if (mark[p] < 2) {
mark[p] = 2;
x = p % w;
y = p / w;
if (x > 0 && !mark[p-1]) { // go left
mark[p-1] = same = (grid[p-1] == grid[p]);
*writep[same]++ = p-1;
}
if (x + 1 < w && !mark[p+1]) { // go right
mark[p+1] = same = (grid[p+1] == grid[p]);
if (y == h - 1 && x == w - 2)
return d + !same;
*writep[same]++ = p+1;
}
if (y > 0 && !mark[p-w]) { // go up
mark[p-w] = same = (grid[p-w] == grid[p]);
*writep[same]++ = p-w;
}
if (y + 1 < h && !mark[p+w]) { // go down
mark[p+w] = same = (grid[p+w] == grid[p]);
if (y == h - 2 && x == w - 1)
return d + !same;
*writep[same]++ = p+w;
}
}
}
}
}

This paper has a slightly faster version of Dijkstra's algorithm, which lowers the constant term. It is still O(n) though, since you really have to look at every node.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8746&rep=rep1&type=pdf

EDIT: THE PREVIOUS VERSION WAS WRONG AND WAS FIXED
Since Dijkstra is out, I'll recommend a simple DP, which has the benefit of running in the optimal time and not requiring you to construct a graph.
D[a][b] is the minimal distance to x=a and y=b using only nodes where the x<=a and y<=b.
And since you can't move diagonally you only have to look at D[a-1][b] and D[a][b-1] when calculating D[a][b]
This gives you the following recurrence relationship:
D[a][b] = min(if grid[a][b] == grid[a-1][b] then D[a-1][b] else D[a-1][b] + 1, if grid[a][b] == grid[a][b-1] then D[a][b-1] else D[a][b-1] + 1)
However doing only the above fails on this case:
0 1 2 3 4
5 6 7 8 9
A b d e g
A f r t s
A z A A A
A A A f d
Therefore you need to cache the minimum of each group of nodes you have found so far. Instead of looking at D[a][b], you look at the minimum of the group at grid[a][b].
Here's some Python code:
Note grid is the grid that you're given as input and it's assumed the grid is N by N
groupmin = {}
for x in range(N):
    for y in range(N):
        groupmin[grid[x][y]] = float('inf')  # no path to this group found yet
# init first row and column
groupmin[grid[0][0]] = 0
for x in range(1, N):
    gm = groupmin[grid[x-1][0]]
    temp = gm if grid[x][0] == grid[x-1][0] else gm + 1
    groupmin[grid[x][0]] = min(groupmin[grid[x][0]], temp)
for y in range(1, N):
    gm = groupmin[grid[0][y-1]]
    temp = gm if grid[0][y] == grid[0][y-1] else gm + 1
    groupmin[grid[0][y]] = min(groupmin[grid[0][y]], temp)
# do the rest of the blocks
for x in range(1, N):
    for y in range(1, N):
        gma = groupmin[grid[x-1][y]]
        gmb = groupmin[grid[x][y-1]]
        a = gma if grid[x][y] == grid[x-1][y] else gma + 1
        b = gmb if grid[x][y] == grid[x][y-1] else gmb + 1
        temp = min(a, b)
        groupmin[grid[x][y]] = min(groupmin[grid[x][y]], temp)
ans = groupmin[grid[N-1][N-1]]
This will run in O(N^2 * f(x)), where f(x) is the time the hash function takes, which is normally O(1). This is one of the best running times you can hope for, and it has a much lower constant factor than Dijkstra's.
You should easily be able to handle N's of up to a few thousand in a second.

Is there any way to ensure that the fewest-number-of-turns heuristic is met by anything except a breadth-first search?
A faster way, or a simpler way? :)
You can breadth-first search from both ends, alternating, until the two regions meet in the middle. This will be much faster if the graph has a lot of fanout, like a city map, but the worst case is the same. It really depends on the graph.

This is my implementation using a simple BFS. Dijkstra would also work (substitute a std::priority_queue that sorts by descending costs for the std::queue) but would seriously be overkill.
The thing to notice here is that we are actually searching on a graph whose nodes do not exactly correspond to the cells in the given array. To get to that graph, I used a simple DFS-based flood fill (you could also use BFS, but DFS is slightly shorter for me). It finds all connected components of identical characters and assigns each one the same colour/node. Thus, after the flood fill we can find out which node each cell belongs to in the underlying graph by looking at the value of colour[row][col]. Then I just iterate over the cells and find all the cells whose adjacent cells do not have the same colour (i.e. are in different nodes). These are the edges of our graph. I maintain a std::set of edges as I iterate over the cells to eliminate duplicate edges. After that it is a simple matter of building an adjacency list from the list of edges, and we are ready for a BFS.
Code (in C++):
#include <queue>
#include <vector>
#include <iostream>
#include <string>
#include <set>
#include <cstring>
using namespace std;
#define SIZE 1001
vector<string> board;
int colour[SIZE][SIZE];
int dr[]={0,1,0,-1};
int dc[]={1,0,-1,0};
int min(int x,int y){ return (x<y)?x:y;}
int max(int x,int y){ return (x>y)?x:y;}
void dfs(int r, int c, int col, vector<string> &b){
if (colour[r][c]<0){
colour[r][c]=col;
for(int i=0;i<4;i++){
int nr=r+dr[i],nc=c+dc[i];
if (nr>=0 && nr<b.size() && nc>=0 && nc<b[0].size() && b[nr][nc]==b[r][c])
dfs(nr,nc,col,b);
}
}
}
int flood_fill(vector<string> &b){
memset(colour,-1,sizeof(colour));
int current_node=0;
for(int i=0;i<b.size();i++){
for(int j=0;j<b[0].size();j++){
if (colour[i][j]<0){
dfs(i,j,current_node,b);
current_node++;
}
}
}
return current_node;
}
vector<vector<int> > build_graph(vector<string> &b){
int total_nodes=flood_fill(b);
set<pair<int,int> > edge_list;
for(int r=0;r<b.size();r++){
for(int c=0;c<b[0].size();c++){
for(int i=0;i<4;i++){
int nr=r+dr[i],nc=c+dc[i];
if (nr>=0 && nr<b.size() && nc>=0 && nc<b[0].size() && colour[nr][nc]!=colour[r][c]){
int u=colour[r][c], v=colour[nr][nc];
if (u!=v) edge_list.insert(make_pair(min(u,v),max(u,v)));
}
}
}
}
vector<vector<int> > graph(total_nodes);
for(set<pair<int,int> >::iterator edge=edge_list.begin();edge!=edge_list.end();edge++){
int u=edge->first,v=edge->second;
graph[u].push_back(v);
graph[v].push_back(u);
}
return graph;
}
int bfs(vector<vector<int> > &G, int start, int end){
vector<int> cost(G.size(),-1);
queue<int> Q;
Q.push(start);
cost[start]=0;
while (!Q.empty()){
int node=Q.front();Q.pop();
vector<int> &adj=G[node];
for(int i=0;i<adj.size();i++){
if (cost[adj[i]]==-1){
cost[adj[i]]=cost[node]+1;
Q.push(adj[i]);
}
}
}
return cost[end];
}
int main(){
string line;
int rows,cols;
cin>>rows>>cols;
for(int r=0;r<rows;r++){
line="";
char ch;
for(int c=0;c<cols;c++){
cin>>ch;
line+=ch;
}
board.push_back(line);
}
vector<vector<int> > actual_graph=build_graph(board);
cout<<bfs(actual_graph,colour[0][0],colour[rows-1][cols-1])<<"\n";
}
This is just a quick hack, lots of improvements can be made. But I think it is pretty close to optimal in terms of runtime complexity, and should run fast enough for boards of size of several thousand (don't forget to change the #define of SIZE). Also, I only tested it with the one case you have provided. So, as Knuth said, "Beware of bugs in the above code; I have only proved it correct, not tried it." :).

Related

Minimum jumps to reach end when some points are blocked

A person is currently at (0,0) and wants to reach his house at (X,0) by jumping. From a point (a,0), he can jump either forward k1 steps to (a + k1,0) or backward k2 steps to (a - k2,0). The first jump must be forward. He cannot jump backward twice consecutively, but he can make any number of consecutive forward jumps. There are n points a1, a2, ..., an that he cannot jump to.
I have to determine the minimum number of jumps needed to reach his house, or conclude that he cannot reach it. If he can reach the house, print yes and the number of jumps; if not, print no.
Here
X = location of persons house.
N = no. of points where he cannot jump.
k1 = forward jump.
k2 = backward jump.
example
For inputs
X=6 N=2 k1=4 k2=2
Blocked points = 3 5
the answer is 3 (4 to 8 to 6 or 4 to 2 to 6)
For input
6 2 5 2
1 3
the person cannot reach his house
N can be upto 10^4 and X can be upto 10^5
I thought of using dynamic programming but i'm not able to implement it. Can anyone help?
I think your direction of using dynamic programming can work but I will show another way to solve the question with the same asymptotic time complexity as dynamic programming would achieve.
This question can be described as a graph problem: you have X nodes indexed 1 to X, and there is an edge between every a and a + k1 and between every b and b - k2, after removing the blocked nodes.
That would be enough if you could jump backward as many times as you like, but you cannot jump backward twice in a row, so add the following modification: duplicate the graph's nodes, duplicate the forward edges too but make them go from the duplicate to the original, and make all of the backward edges go to the duplicated graph. Now every backward edge sends you to the duplicate, and you cannot take a backward edge again until you return to the original via a forward edge. This guarantees that a backward edge is always followed by a forward edge, so you cannot jump backward twice in a row.
Now finding the shortest path from 1 to X is like finding smallest number of jumps since edge is a jump.
Finding the shortest path in directed unweighted graph takes O(|V|+|E|) time and memory (using BFS), your graph has 2 * X as |V| and also the number of edges will be 2 * 2 * X so time and memory complexity of O(X).
If jumping backward twice in a row is allowed, you can use the networkx library in Python for a simple demo (you can also use it for a more complicated one):
import matplotlib.pyplot as plt
import networkx as nx
X = 6
N = 2
k1 = 4
k2 = 2
nodes = [0, 1, 2, 4, 6]
G = nx.DiGraph()
G.add_nodes_from(nodes)
for n in nodes:
    if n + k1 in nodes:
        G.add_edge(n, n + k1)
    if n - k2 in nodes:
        G.add_edge(n, n - k2)
nx.draw(G, with_labels=True, font_weight='bold')
plt.plot()
plt.show()
path = nx.shortest_path(G, 0, X)
print(f"Number of jumps: {len(path) - 1}. path: {str(path)}")
Would a breadth-first search be efficient enough?
Something like this? (Python code)
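from collections import deque

def f(x, k1, k2, blocked):
    # Search window: an assumed bound, not sure about these boundaries ... ideas welcome
    limit = max([x] + list(blocked)) + k1 + k2
    # State: (position, jumps made so far, whether the last jump was backward).
    # The first jump must be forward, so start at k1 with one jump already made.
    queue = deque([(k1, 1, False)])
    visited = set()
    while queue:
        p, depth, came_back = queue.popleft()
        if p in blocked or abs(p) > limit or (p, came_back) in visited:
            continue
        if p == x:
            return depth
        visited.add((p, came_back))
        queue.append((p + k1, depth + 1, False))
        if not came_back:  # never jump backward twice in a row
            queue.append((p - k2, depth + 1, True))
    return None  # not reachable inside the search window

X = 6
k1 = 4
k2 = 2
blocked = set([3, 5])
print(f(X, k1, k2, blocked))
X = 2
k1 = 3
k2 = 4
blocked = set()
print(f(X, k1, k2, blocked))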
Here is the code of גלעדברקן in c++:
#include <iostream>
#include <queue>
using namespace std;
struct node {
int id;
int depth;
int direction; // 1 is left, 0 is right
};
int BFS(int start, int end, int k1, int k2, bool blocked[], int length)
{
queue<node> q;
blocked[0] = true;
q.push({start, 0, 0});
while(!q.empty())
{
node f = q.front();
q.pop();
if (f.id == end) {
return f.depth;
}
if(f.id + k1 < length and !blocked[f.id + k1])
{
blocked[f.id + k1] = true;
q.push({f.id + k1, f.depth + 1, 0});
}
if (f.direction != 1) { // If you just went left - don't go left again
if(f.id - k2 >= 0 and !blocked[f.id - k2])
{
blocked[f.id - k2] = true;
q.push({f.id - k2, f.depth + 1, 1});
}
}
}
return -1;
}
int main() {
bool blocked[] = {false, false, false, false, false, false, false};
std::cout << BFS(0, 6, 4, 2, blocked, 7) << std::endl;
return 0;
}
You can control the length of the steps, the start and end, and the blocked nodes.

Lowest Common Ancestor

I am looking for a constant-time implementation of lowest common ancestor given two nodes in a full binary tree (a parent x has children 2*x and 2*x+1).
My problem is that there is a large number of nodes in the tree and many queries. Is there an algorithm that preprocesses the tree so that queries can be answered in constant time?
I looked into LCA using RMQ, but I can't use that technique because I can't allocate an array for this many nodes.
Can someone give me an efficient implementation of the algorithm for answering many queries quickly, knowing that it is a full binary tree and the relation between nodes is as given above?
What I did was to start with the two given nodes and successively find their parents (node/2), keeping a hash list of visited nodes. Whenever we reach a node that is already in the hash list, that node is the lowest common ancestor.
But when there are many queries this algorithm is very time consuming, as in the worst case I may have to traverse a height of 30 (the maximum height of the tree) to reach the root.
If you represent the two indices in binary, then the LCA can be found in two steps:
1. Shift right the larger number until the leading 1 bit is in the same place as the other number.
2. Shift right both numbers until they are the same.
The first step can be done by getting log base 2 of the numbers and shifting the larger number right by the difference:
if a>b:
    a = shift_right(a,log2(a)-log2(b))
else:
    b = shift_right(b,log2(b)-log2(a))
The second step can be done by XORing the resulting two numbers and shifting right by the log base 2 of the result (plus 1):
if a==b:
    return a
else:
    return shift_right(a,log2(xor(a,b))+1)
Log base 2 can be found in O(log(word_size)) time, so as long as you are using integer indices with a fixed number of bits, this is effectively constant.
See this question for information on fast ways to compute log base 2:
Fast computing of log2 for 64-bit integers
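A minimal Python sketch of the two steps, assuming positive heap-style indices (root = 1, children 2*x and 2*x+1). It uses `int.bit_length()` in place of log2: for n > 0, `n.bit_length()` equals floor(log2(n)) + 1, and it is 0 for n = 0, which also covers the case where the two aligned numbers are already equal. The function name `lca` is made up for illustration.

```python
def lca(a: int, b: int) -> int:
    """Lowest common ancestor of nodes a and b in a heap-numbered binary tree."""
    # Step 1: shift the deeper node up until both sit at the same depth.
    depth_a, depth_b = a.bit_length(), b.bit_length()
    if depth_a > depth_b:
        a >>= depth_a - depth_b
    else:
        b >>= depth_b - depth_a
    # Step 2: the XOR marks where the two paths diverge; dropping that many
    # low bits leaves the common prefix, which is the ancestor's index.
    return a >> (a ^ b).bit_length()

print(lca(12, 8))
```

For 12 and 8 this returns 1, matching the worked example further down in this thread.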
Edit :-
Faster way to get the common_ancestor in O(log(logn)) :-
int get_bits(unsigned int x) {
int high = 31;
int low = 0,mid;
while(high>=low) {
mid = (high+low)/2;
if(1<<mid==x)
return mid+1;
if(1<<mid<x) {
low = mid+1;
}
else {
high = mid-1;
}
}
if(1<<mid>x)
return mid;
return mid+1;
}
unsigned int Common_Ancestor(unsigned int x,unsigned int y) {
int xbits = get_bits(x);
int ybits = get_bits(y);
int diff,kbits;
unsigned int k;
if(xbits>ybits) {
diff = xbits-ybits;
x = x >> diff;
}
else if(xbits<ybits) {
diff = ybits-xbits;
y = y >> diff;
}
k = x^y;
kbits = get_bits(k);
return y>>kbits;
}
Explanation :-
Get the number of bits needed to represent x and y, which using binary search is O(log(32)).
The common prefix of the binary notation of x and y is the common ancestor.
Whichever is represented by the larger number of bits is brought to the same bit length by shifting it right by diff.
k = x^y erases the common prefix of x and y.
Find the number of bits representing the remaining suffix.
Shift x or y right by the suffix bits to get the common prefix, which is the common ancestor.
Example :-
x = 12 = b1100
y = 8 = b1000
xbits = 4
ybits = 4
diff = 0
k = x^y = 4 = b0100
kbits = 3
res = x >> kbits = x >> 3 = 1
ans : 1

Obtain forest out of tree with even number of nodes

I'm stuck on a code challenge, and I want a hint.
PROBLEM: You are given a tree data structure (without cycles) and are asked to remove as many "edges" (connections) as possible, creating smaller trees with even numbers of nodes. This problem is always solvable as there are an even number of nodes and connections.
Your task is to count the removed edges.
Input:
The first line of input contains two integers N and M. N is the number of vertices and M is the number of edges. 2 <= N <= 100.
Next M lines contains two integers ui and vi which specifies an edge of the tree. (1-based index)
Output:
Print the number of edges removed.
Sample Input
10 9
2 1
3 1
4 3
5 2
6 1
7 2
8 6
9 8
10 8
Sample Output :
2
Explanation : On removing the edges (1, 3) and (1, 6), we can get the desired result.
I used BFS to travel through the nodes.
First, maintain an array separately to store the total number of child nodes + 1.
So, you can initially assign all the leaf nodes with value 1 in this array.
Now start from the last node and count the number of children for each node. This will work in bottom to top manner and the array that stores the number of child nodes will help in runtime to optimize the code.
Once you get the array after getting the number of children nodes for all the nodes, just counting the nodes with even number of nodes gives the answer. Note: I did not include root node in counting in final step.
This is my solution. I didn't use a BFS tree; I just allocated another array that holds, for each node, the total number of nodes in its subtree (the node itself plus everything below it).
import java.util.Scanner;
import java.util.Arrays;
public class Solution {
public static void main(String[] args) {
int tree[];
int count[];
Scanner scan = new Scanner(System.in);
int N = scan.nextInt(); //points
int M = scan.nextInt();
tree = new int[N];
count = new int[N];
Arrays.fill(count, 1);
for(int i=0;i<M;i++)
{
int u1 = scan.nextInt();
int v1 = scan.nextInt();
tree[u1-1] = v1;
count[v1-1] += count[u1-1];
int root = tree[v1-1];
while(root!=0)
{
count[root-1] += count[u1-1];
root = tree[root-1];
}
}
System.out.println("");
int counter = -1;
for(int i=0;i<count.length;i++)
{
if(count[i]%2==0)
{
counter++;
}
}
System.out.println(counter);
}
}
If you observe the input, you can see that it is quite easy to count the number of nodes under each node. Consider (a b) as the edge input, in every case, a is the child and b is the immediate parent. The input always has edges represented bottom-up.
So the answer is essentially the number of nodes that have an even count (excluding the root node). I submitted the code below on Hackerrank and all the tests passed. I guess all the cases in the input satisfy the rule.
def find_edges(count):
    root = max(count)
    count_even = 0
    for cnt in count:
        if cnt % 2 == 0:
            count_even += 1
    if root % 2 == 0:
        count_even -= 1
    return count_even

def count_nodes(edge_list, n, m):
    count = [1 for i in range(0, n)]
    for i in range(m-1,-1,-1):
        count[edge_list[i][1]-1] += count[edge_list[i][0]-1]
    return find_edges(count)
I know that this has already been answered here lots and lots of time. I still want to know reviews on my solution here. I tried to construct the child count as the edges were coming through the input and it passed all the test cases.
namespace Hackerrank
{
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
var tempArray = Console.ReadLine().Split(' ').Select(x => Convert.ToInt32(x)).ToList();
int verticeNumber = tempArray[0];
int edgeNumber = tempArray[1];
Dictionary<int, int> childCount = new Dictionary<int, int>();
Dictionary<int, int> parentDict = new Dictionary<int, int>();
for (int count = 0; count < edgeNumber; count++)
{
var nodes = Console.ReadLine().Split(' ').Select(x => Convert.ToInt32(x)).ToList();
var node1 = nodes[0];
var node2 = nodes[1];
if (childCount.ContainsKey(node2))
childCount[node2]++;
else childCount.Add(node2, 1);
var parent = node2;
while (parentDict.ContainsKey(parent))
{
var par = parentDict[parent];
childCount[par]++;
parent = par;
}
parentDict[node1] = node2;
}
Console.WriteLine(childCount.Count(x => x.Value % 2 == 1) - 1);
}
}
}
My first inclination is to work up from the leaf nodes because you cannot cut their edges as that would leave single-vertex subtrees.
Here's the approach that I used to successfully pass all the test cases.
1. Mark vertex 1 as the root.
2. Starting at the current root vertex, consider each child. If the sum total of the child and all of its children is even, then you can cut that edge.
3. Descend to the next vertex (a child of the root vertex) and let that be the new root vertex. Repeat step 2 until you have traversed all of the nodes (depth-first search).
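The steps above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the poster's submitted code: the function name `count_removable_edges` is made up, the edges may be given in any order, and an iterative depth-first traversal from vertex 1 replaces recursion. An edge to a child can be cut exactly when the child's subtree has an even number of vertices.

```python
def count_removable_edges(n, edges):
    """Count edges whose removal leaves only even-sized trees (vertices are 1..n)."""
    adj = [[] for _ in range(n + 1)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Depth-first traversal from vertex 1, recording parents and visit order.
    parent = [0] * (n + 1)
    parent[1] = -1  # marks the root
    order = []
    stack = [1]
    while stack:
        node = stack.pop()
        order.append(node)
        for nxt in adj[node]:
            if nxt != parent[node]:
                parent[nxt] = node
                stack.append(nxt)

    # Accumulate subtree sizes bottom-up; children always appear after
    # their parent in `order`, so the reversed order is safe.
    size = [1] * (n + 1)
    removable = 0
    for node in reversed(order):
        if parent[node] > 0:  # skip the root, which has no edge above it
            if size[node] % 2 == 0:
                removable += 1
            size[parent[node]] += size[node]
    return removable

sample = [(2, 1), (3, 1), (4, 3), (5, 2), (6, 1), (7, 2), (8, 6), (9, 8), (10, 8)]
print(count_removable_edges(10, sample))
```

On the sample input this prints 2, corresponding to the edges (1, 3) and (1, 6).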
Here's the general outline of an alternative approach:
Find all of the articulation points in the graph.
Check each articulation point to see if edges can be removed there.
Remove legal edges and look for more articulation points.
Solution - Traverse all the edges, and count the number of even edges
If we remove an edge from the tree and it results in two trees with an even number of vertices, let's call that edge an even edge.
If we remove an edge from the tree and it results in two trees with an odd number of vertices, let's call that edge an odd edge.
Here is my solution in Ruby
num_vertices, num_edges = gets.chomp.split(' ').map { |e| e.to_i }
graph = Graph.new
(1..num_vertices).to_a.each do |vertex|
graph.add_node_by_val(vertex)
end
num_edges.times do |edge|
first, second = gets.chomp.split(' ').map { |e| e.to_i }
graph.add_edge_by_val(first, second, 0, false)
end
even_edges = 0
graph.edges.each do |edge|
dup = graph.deep_dup
first_tree = nil
second_tree = nil
subject_edge = nil
dup.edges.each do |e|
if e.first.value == edge.first.value && e.second.value == edge.second.value
subject_edge = e
first_tree = e.first
second_tree = e.second
end
end
dup.remove_edge(subject_edge)
if first_tree.size.even? && second_tree.size.even?
even_edges += 1
end
end
puts even_edges
Note - Click Here to check out the code for Graph, Node and Edge classes

Find a location in a matrix so that the cost of everyone moving to that location is smallest

There is an m×n matrix, and several groups of people are located at certain spots. In the following example there are three groups, and the number 4 indicates that there are four people in that group. We want to find a meeting point in the matrix that minimizes the total cost of moving all groups to that point. The example below shows how to compute the cost of moving a group to another point.
Group1: (0, 1), 4
Group2: (1, 3), 3
Group3: (2, 0), 5
. 4 . .
. . . 3
5 . . .
If all of these three groups moving to (1, 1), the cost is:
4*((1-0)+(1-1)) + 5*((2-1)+(1-0))+3*((1-1)+(3-1))
My idea is :
Firstly, this two dimensional problem can be reduced to two one dimensional problem.
In the one dimensional problem, I can prove that the best spot must be one of these groups.
In this way, I can give a O(G^2) algorithm.(G is the number of group).
Use iterator's example for illustration:
{(-100,0,100),(100,0,100),(0,1,1)},(x,y,population)
for x, {(-100,100),(100,100),(0,1)}, 0 is the best.
for y, {(0,100),(0,100),(1,1)}, 0 is the best.
So it's (0, 0)
Is there any better solution for this problem.
I like the idea of noticing that the objective function can be decomposed into the sum of two one-dimensional problems. The remaining problems look a lot like the weighted median to me (note "solves the following optimization problem" in http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/weighted.median.html, or consider what happens to the objective function as you move away from the weighted median).
The URL above seems to say the weighted median takes time n log n, which I guess means that you could attain their claim by sorting the data and then doing a linear pass to work out the weighted median. The numbers you have to sort are in the range [0, m] and [0, n] so you could in theory do better if m and n are small, or - of course - if you are given the data pre-sorted.
Come to think of it, I don't see why you shouldn't be able to find the weighted median with a linear time randomized algorithm similar to that used to find the median (http://en.wikibooks.org/wiki/Algorithms/Randomization#find-median) - repeatedly pick a random element, use it to partition the items remaining, and work out which half the weighted median should be in. That gives you expected linear time.
I think this can be solved in O(n>m?n:m) time and O(n>m?n:m) space.
We have to find the median of x coordinates and median of all y coordinates in the k points and the answer will be (x_median,y_median);
Assumption is this function takes in the following inputs:
total number of points :int k= 4+3+5 = 12;
An array of coordinates:
struct coord_t c[12] = {(0,1),(0,1),(0,1), (0,1), (1,3), (1,3),(1,3),(2,0),(2,0),(2,0),(2,0),(2,0)};
c.int size = n>m ? n:m;
Let the input of the coordinates be an array of coordinates. coord_t c[k]
struct coord_t {
int x;
int y;
};
1. My idea is to create an array of size = n>m?n:m;
2. int array[size] = {0} ; //initialize all the elements in the array to zero
for(i=0;i<k;i++)
{
array[c[i].x] += 1;
count++;
}
int tempCount =0;
for(i=0;i<size;i++)
{
if(array[i]!=0)
{
tempCount += array[i];
}
if(tempCount >= count/2)
{
break;
}
}
int x_median = i;
//similarly with y coordinate.
int array[size] = {0} ; //initialize all the elements in the array to zero
for(i=0;i<k;i++)
{
array[c[i].y] += 1;
count++;
}
int tempCount =0;
for(i=0;i<size;i++)
{
if(array[i]!=0)
{
tempCount += array[i];
}
if(tempCount >= count/2)
{
break;
}
}
int y_median = i;
coord_t temp;
temp.x = x_median;
temp.y= y_median;
return temp;
Sample Working code for MxM matrix with k points:
/*
Problem:
Given an MxM grid and N people placed at random positions on the grid, find the optimal meeting point of all the people.

Answer:
Find the median of all the x coordinates of the positions of the people.
Find the median of all the y coordinates of the positions of the people.
*/
#include<stdio.h>
#include<stdlib.h>
typedef struct coord_struct {
int x;
int y;
}coord_struct;
typedef struct distance {
int count;
}distance;
coord_struct toFindTheOptimalDistance (int N, int M, coord_struct input[])
{
coord_struct z ;
z.x=0;
z.y=0;
int i,j;
distance * array_dist;
array_dist = (distance*)(malloc(sizeof(distance)*M));
for(i=0;i<M;i++)
{
array_dist[i].count =0;
}
for(i=0;i<N;i++)
{
array_dist[input[i].x].count +=1;
printf("%d and %d\n",input[i].x,array_dist[input[i].x].count);
}
j=0;
for(i=0;i<=N/2;)
{
printf("%d\n",i);
if(array_dist[j].count !=0)
i+=array_dist[j].count;
j++;
}
printf("x coordinate = %d",j-1);
int x= j-1;
for(i=0;i<M;i++)
array_dist[i].count =0;
for(i=0;i<N;i++)
{
array_dist[input[i].y].count +=1;
}
j=0;
for(i=0;i<=N/2;) // same stopping rule as the x loop, so an odd N lands on the true median
{
if(array_dist[j].count !=0)
i+=array_dist[j].count;
j++;
}
int y =j-1;
printf("y coordinate = %d",j-1);
z.x=x;
z.y =y;
free(array_dist); // release the counting array
return z;
}
int main()
{
coord_struct input[5];
input[0].x =1;
input[0].y =2;
input[1].x =1;
input[1].y =2;
input[2].x =4;
input[2].y =1;
input[3].x = 5;
input[3].y = 2;
input[4].x = 5;
input[4].y = 2;
int size = 6; // grid side M; every coordinate (0..5 here) must be a valid index
coord_struct x = toFindTheOptimalDistance(5,size,input);
return 0;
}
Your algorithm is fine: it divides the problem into two one-dimensional problems, and the time complexity is O(n log n).
You only need to split every group into individual people, so that every move left, right, up or down costs 1 per person. We then only need to find where the ((n + 1) / 2)-th person stands, for the row and the column separately.
Consider your sample, {(-100,0,100),(100,0,100),(0,1,1)}.
Take the second coordinate out first. That leaves {(-100,100),(100,100),(0,1)}, which means 100 people stand at -100, 100 people stand at 100, and 1 person stands at 0.
Sort by x to get {(-100,100),(0,1),(100,100)}. There are 201 people in total, so we only need to pick the location where the 101st person stands. That is 0, which is the answer for this coordinate.
The other coordinate works the same way: {(0,100),(0,100),(1,1)} is already sorted, and the 101st person is at 0, so the answer for that coordinate is also 0.
The answer is (0,0).
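As a rough illustration of the steps above, here is a short Python sketch that sorts each coordinate and walks to the ((n + 1) / 2)-th person. The helper names and the `(x, y, count)` tuple layout are assumptions made for the example, and the input is assumed to be non-empty.

```python
def weighted_median_sorted(pairs):
    """pairs: iterable of (position, count). Returns the position where
    the ((n + 1) // 2)-th person stands, n being the total head count."""
    pairs = sorted(pairs)
    total = sum(count for _, count in pairs)
    target = (total + 1) // 2
    seen = 0
    for position, count in pairs:
        seen += count
        if seen >= target:
            return position

def meeting_point(groups):
    """groups: list of (x, y, count). Solves each axis independently."""
    return (weighted_median_sorted((x, c) for x, _, c in groups),
            weighted_median_sorted((y, c) for _, y, c in groups))

print(meeting_point([(-100, 0, 100), (100, 0, 100), (0, 1, 1)]))  # (0, 0)
```

The sort dominates the running time, so this is O(G log G) in the number of groups G rather than in the number of people.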
I can think of an O(n) solution for the one-dimensional problem, which in turn means you can solve the original problem in O(n+m+G).
Suppose people are standing like this: a_0, a_1, ..., a_n-1, with a_0 people at spot 0, a_1 at spot 1, and so on. Then the solution in pseudocode is
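cur_result = sum(i*a_i, i = 1..n-1)   // cost of meeting at spot 0
cur_r = sum(a_i, i = 1..n-1)          // people to the right of spot 0
cur_l = a_0                           // people at spot 0
for i = 1:n-1
    cur_result = cur_result - cur_r + cur_l   // cost of meeting at spot i
    cur_r = cur_r - a_i
    cur_l = cur_l + a_i
end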
You need to find the point where cur_result is minimal.
So you need O(n) + O(m) to solve the 1D problems plus O(G) to build them, for a total complexity of O(n+m+G).
Alternatively, you can solve each 1D problem in O(G*log G) (or O(G) if the data is sorted) using the same idea. Choose whichever suits the expected number of groups.
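For anyone who wants to try the sweep, here is a small Python sketch of the one-dimensional pass. The name `best_spot` is made up for the example, and `a` is assumed to be a non-empty list of head counts indexed by spot.

```python
def best_spot(a):
    """a[i] = number of people at spot i.
    Returns (spot, total_distance) for the cheapest meeting spot."""
    cost = sum(i * count for i, count in enumerate(a))  # everyone walks to spot 0
    right = sum(a[1:])  # people to the right of the current spot
    left = a[0]         # people at or to the left of the current spot
    best = (0, cost)
    for i in range(1, len(a)):
        # Moving the meeting spot one step right: everyone on the right
        # gets 1 closer, everyone on the left gets 1 farther away.
        cost += left - right
        right -= a[i]
        left += a[i]
        if cost < best[1]:
            best = (i, cost)
    return best
```

Ties are resolved in favour of the leftmost spot, which is an arbitrary choice; any spot with the minimal cost is equally good.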
You can solve this in O(G log G) time by reducing it to two one-dimensional problems, as you mentioned.
As for solving it in one dimension: sort the points, go through them one by one, and calculate the cost of moving everyone to that point. This calculation can be done in O(1) time for each point.
You can also avoid the log G factor if your x and y coordinates are small enough for you to use bucket/radix sort.
Inspired by kilotaras's idea, it seems that there is an O(G) solution for this problem.
Since everyone agrees that the two-dimensional problem can be reduced to two one-dimensional problems, I will not repeat that. I will just focus on how to solve the one-dimensional problem in O(G).
Suppose people are standing like this: a[0], a[1], ..., a[n-1], with a[i] people standing at spot i. There are G spots that have people (G <= n). Call these G spots g[1], g[2], ..., g[G], where each g[i] is in [0,...,n-1]. Without loss of generality, we can also assume that g[1] < g[2] < ... < g[G].
It's not hard to prove that the optimal spot must be one of these G spots. I will skip the proof here and leave it as an exercise for anyone interested.
Given this observation, we can just compute the cost of moving to the spot of every group and then choose the minimal one. There is an obvious O(G^2) algorithm to do this.
But using kilotaras's idea, we can do it in O(G), with no sorting needed once the spots are listed in increasing order.
cost[1] = sum((g[i]-g[1])*a[g[i]], i = 2,...,G) // the cost of moving to the spot of the first group. This step is O(G).
cur_r = sum(a[g[i]], i = 2,...,G) // how many people are at the second group or to its right. This step is O(G).
cur_l = a[g[1]] // how many people are to the left of the second group, not including the second group.
for i = 2:G
    gap = g[i] - g[i-1];
    cost[i] = cost[i-1] - cur_r*gap + cur_l*gap;
    if i != G
        cur_r = cur_r - a[g[i]];
        cur_l = cur_l + a[g[i]];
    end
end
The minimum of cost[i] is the answer.
Let's use the example 5 1 0 3 to illustrate the algorithm.
In this example,
n = 4, G = 3.
g[1] = 0, g[2] = 1, g[3] = 3.
a[0] = 5, a[1] = 1, a[2] = 0, a[3] = 3.
(1) cost[1] = 1*1+3*3 = 10, cur_r = 4, cur_l = 5.
(2) cost[2] = 10 - 4*1 + 5*1 = 11, gap = g[2] - g[1] = 1, cur_r = 4 - a[g[2]] = 3, cur_l = 6.
(3) cost[3] = 11 - 3*2 + 6*2 = 17, gap = g[3] - g[2] = 2.
The minimum is cost[1] = 10, so the best meeting spot is g[1] = 0.
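The same computation over the occupied spots only can be sketched in Python as follows. The name `group_costs` and the `(spot, people)` pair layout are assumptions for the example, and the groups are assumed to be non-empty and already sorted by spot.

```python
def group_costs(groups):
    """groups: list of (spot, people) sorted by spot.
    Returns the cost of meeting at each group's spot, in order."""
    first_spot = groups[0][0]
    cost = sum((spot - first_spot) * people for spot, people in groups)
    right = sum(people for _, people in groups[1:])  # people beyond the first group
    left = groups[0][1]                              # people at the first group
    costs = [cost]
    for (prev_spot, _), (spot, people) in zip(groups, groups[1:]):
        gap = spot - prev_spot
        cost += (left - right) * gap  # left side walks gap farther, right side gap less
        costs.append(cost)
        right -= people
        left += people
    return costs

costs = group_costs([(0, 5), (1, 1), (3, 3)])
print(costs)                           # [10, 11, 17]
print(min(zip(costs, [0, 1, 3])))      # (10, 0): cost 10 at spot 0
```

It reproduces the three costs from the worked example, which is a handy sanity check when porting the loop to another language.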

Algorithm for sampling without replacement?

I am trying to test the likelihood that a particular clustering of data has occurred by chance. A robust way to do this is Monte Carlo simulation, in which the associations between data and groups are randomly reassigned a large number of times (e.g. 10,000), and a metric of clustering is used to compare the actual data with the simulations to determine a p value.
I've got most of this working, with pointers mapping the grouping to the data elements, so I plan to randomly reassign pointers to data. THE QUESTION: what is a fast way to sample without replacement, so that every pointer is randomly reassigned in the replicate data sets?
For example (these data are just a simplified example):
Data (n=12 values) - Group A: 0.1, 0.2, 0.4 / Group B: 0.5, 0.6, 0.8 / Group C: 0.4, 0.5 / Group D: 0.2, 0.2, 0.3, 0.5
For each replicate data set, I would have the same cluster sizes (A=3, B=3, C=2, D=4) and data values, but would reassign the values to the clusters.
To do this, I could generate random numbers in the range 1-12, assign the first element of group A, then generate random numbers in the range 1-11 and assign the second element in group A, and so on. The pointer reassignment is fast, and I will have pre-allocated all data structures, but the sampling without replacement seems like a problem that might have been solved many times before.
Logic or pseudocode preferred.
Here's some code for sampling without replacement based on Algorithm 3.4.2S of Knuth's book Seminumerical Algorithms.
void SampleWithoutReplacement
(
int populationSize, // size of set sampling from
int sampleSize, // size of each sample
vector<int> & samples // output, zero-offset indices to selected items; must already hold sampleSize elements
)
{
// Use Knuth's variable names
int& n = sampleSize;
int& N = populationSize;
int t = 0; // total input records dealt with
int m = 0; // number of items selected so far
double u;
while (m < n)
{
u = GetUniform(); // call a uniform(0,1) random number generator
if ( (N - t)*u >= n - m )
{
t++;
}
else
{
samples[m] = t;
t++; m++;
}
}
}
There is a more efficient but more complex method by Jeffrey Scott Vitter in "An Efficient Algorithm for Sequential Random Sampling," ACM Transactions on Mathematical Software, 13(1), March 1987, 58-67.
Working C++ code based on the answer by John D. Cook:
#include <random>
#include <vector>
// John D. Cook, https://stackoverflow.com/a/311716/15485
void SampleWithoutReplacement
(
int populationSize, // size of set sampling from
int sampleSize, // size of each sample
std::vector<int> & samples // output, zero-offset indices to selected items; must already hold sampleSize elements
)
{
// Use Knuth's variable names
int& n = sampleSize;
int& N = populationSize;
int t = 0; // total input records dealt with
int m = 0; // number of items selected so far
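// Seed once; a default-constructed engine would produce the same sample on every run
static std::default_random_engine re{std::random_device{}()};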
std::uniform_real_distribution<double> dist(0,1);
while (m < n)
{
double u = dist(re); // call a uniform(0,1) random number generator
if ( (N - t)*u >= n - m )
{
t++;
}
else
{
samples[m] = t;
t++; m++;
}
}
}
#include <iostream>
int main(int,char**)
{
const size_t sz = 10;
std::vector< int > samples(sz);
SampleWithoutReplacement(10*sz,sz,samples);
for (size_t i = 0; i < sz; i++ ) {
std::cout << samples[i] << "\t";
}
return 0;
}
See my answer to the question "Unique (non-repeating) random numbers in O(1)?". The same logic should accomplish what you are looking to do.
Inspired by @John D. Cook's answer, I wrote an implementation in Nim. At first I had difficulty understanding how it works, so I commented it extensively and included an example. Maybe it helps to understand the idea. I have also changed the variable names slightly.
import random

iterator uniqueRandomValuesBelow*(N, M: int): int =
  ## Returns a total of M unique random values i with 0 <= i < N
  ## These indices can be used to construct e.g. a random sample without replacement
  assert(M <= N)
  var t = 0 # total input records dealt with
  var m = 0 # number of items selected so far
  while (m < M):
    let u = rand(1.0) # uniform(0,1) random number; older Nim versions spelled this random(1.0)
    # meaning of the following terms:
    # (N - t) is the total number of draws left (initially just N)
    # (M - m) is how many of these remaining draws must be positive (initially just M)
    # => Probability for next draw = (M-m) / (N-t)
    # i.e.: (required positive draws left) / (total draws left)
    #
    # This is implemented by the inequality expression below:
    # - the larger (M-m), the larger the probability of a positive draw
    # - for (N-t) == (M-m), the term on the left is always smaller => we will draw 100%
    # - for (N-t) >> (M-m), we must get a very small u
    #
    # example: (N-t) = 7, (M-m) = 5
    # => we draw the next with prob 5/7
    # let's assume the draw fails
    # => t += 1 => (N-t) = 6
    # => we draw the next with prob 5/6
    # let's assume the draw succeeds
    # => t += 1, m += 1 => (N-t) = 5, (M-m) = 4
    # => we draw the next with prob 4/5
    # let's assume the draw fails
    # => t += 1 => (N-t) = 4
    # => we draw the next with prob 4/4, i.e.,
    # we will draw with certainty from now on
    # (in the next steps we get prob 3/3, 2/2, ...)
    if (N - t).toFloat * u >= (M - m).toFloat: # this is essentially a draw with P = (M-m) / (N-t)
      # no draw -- happens mainly for (N-t) >> (M-m) and/or high u
      t += 1
    else:
      # draw t -- happens when (M-m) gets large and/or low u
      yield t # this is where we output an index, can be used to sample
      t += 1
      m += 1

# example use
for i in uniqueRandomValuesBelow(100, 5):
  echo i
When the population size is much greater than the sample size, the above algorithms become inefficient, since they have complexity O(n), n being the population size.
When I was a student I wrote some algorithms for uniform sampling without replacement with average complexity O(s log s), where s is the sample size. Here is the code for the binary tree algorithm in R:
# The Tree growing algorithm for uniform sampling without replacement
# by Pavel Ruzankin
quicksample = function (n,size)
# n - the number of items to choose from
# size - the sample size
{
s=as.integer(size)
if (s>n) {
stop("Sample size is greater than the number of items to choose from")
}
# upv=integer(s) #level up edge is pointing to
leftv=integer(s) #left edge is pointing to; must be filled with zeros
rightv=integer(s) #right edge is pointing to; must be filled with zeros
samp=integer(s) #the sample
ordn=integer(s) #relative ordinal number
ordn[1L]=1L #initial value for the root vertex
samp[1L]=sample(n,1L)
if (s > 1L) for (j in 2L:s) {
curn=sample(n-j+1L,1L) #current number sampled
curordn=0L #current ordinal number
v=1L #current vertex
from=1L #how we got here: 0 - by left edge, 1 - by right edge
repeat {
curordn=curordn+ordn[v]
if (curn+curordn>samp[v]) { #going down by the right edge
if (from == 0L) {
ordn[v]=ordn[v]-1L
}
if (rightv[v]!=0L) {
v=rightv[v]
from=1L
} else { #creating a new vertex
samp[j]=curn+curordn
ordn[j]=1L
# upv[j]=v
rightv[v]=j
break
}
} else { #going down by the left edge
if (from==1L) {
ordn[v]=ordn[v]+1L
}
if (leftv[v]!=0L) {
v=leftv[v]
from=0L
} else { #creating a new vertex
samp[j]=curn+curordn-1L
ordn[j]=-1L
# upv[j]=v
leftv[v]=j
break
}
}
}
}
return(samp)
}
The complexity of this algorithm is discussed in:
Rouzankin, P. S.; Voytishek, A. V. On the cost of algorithms for random selection. Monte Carlo Methods Appl. 5 (1999), no. 1, 39-54.
http://dx.doi.org/10.1515/mcma.1999.5.1.39
If you find the algorithm useful, please cite it.
See also:
P. Gupta, G. P. Bhattacharjee. (1984) An efficient algorithm for random sampling without replacement. International Journal of Computer Mathematics 16:4, pages 201-209.
DOI: 10.1080/00207168408803438
J. Teuhola, O. Nevalainen. (1982) Two efficient algorithms for random sampling without replacement. International Journal of Computer Mathematics 11:2, pages 127-140.
DOI: 10.1080/00207168208803304
In the last paper the authors use hash tables and claim that their algorithms have O(s) complexity. There is one more fast hash table algorithm, which will soon be implemented in pqR (pretty quick R):
https://stat.ethz.ch/pipermail/r-devel/2017-October/075012.html
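To give a feel for how a hash table gets the cost down to O(s), here is a small Python sketch of a partial Fisher-Yates shuffle over a "virtual" array stored in a dict. To be clear, this is a generic illustration of the idea and not the algorithm from either paper or from the pqR post; the name `sparse_sample` is made up.

```python
import random

def sparse_sample(n, k, rng=random):
    """Return k distinct integers from range(n), in random order,
    using O(k) expected time and memory regardless of n."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    swapped = {}  # only the slots of the virtual array [0, 1, ..., n-1] that were overwritten
    out = []
    for i in range(k):
        j = rng.randrange(i, n)          # pick a slot from the untouched tail
        out.append(swapped.get(j, j))    # value currently sitting in slot j
        swapped[j] = swapped.get(i, i)   # move slot i's value into slot j
    return out
```

For example, `sparse_sample(10**9, 5)` returns five distinct values without ever allocating anything proportional to the population size.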
I wrote a survey of algorithms for sampling without replacement. I may be biased but I recommend my own algorithm, implemented in C++ below, as providing the best performance for many k, n values and acceptable performance for others. randbelow(i) is assumed to return a fairly chosen random non-negative integer less than i.
void cardchoose(uint32_t n, uint32_t k, uint32_t* result) {
auto t = n - k + 1;
for (uint32_t i = 0; i < k; i++) {
uint32_t r = randbelow(t + i);
if (r < t) {
result[i] = r;
} else {
result[i] = result[r - t];
}
}
std::sort(result, result + k);
for (uint32_t i = 0; i < k; i++) {
result[i] += i;
}
}
Another algorithm for sampling without replacement is described here.
It is similar to the one described by John D. Cook in his answer and also comes from Knuth, but it makes different assumptions: the population size is unknown, but the sample can fit in memory. This one is called "Knuth's algorithm S".
Quoting the rosettacode article:
1. Select the first n items as the sample as they become available;
2. For the i-th item where i > n, have a random chance of n/i of keeping it. If failing this chance, the sample remains the same. If not, have it randomly (1/n) replace one of the previously selected n items of the sample.
3. Repeat #2 for any subsequent items.
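The quoted steps translate almost line for line into Python. The sketch below is my own reading of them (the technique is commonly known as reservoir sampling); the function name `reservoir_sample` is an assumption, and the stream can be any iterable of unknown length.

```python
import random

def reservoir_sample(stream, n, rng=random):
    """Keep a uniform random sample of n items from an iterable
    whose length is not known in advance."""
    sample = []
    for i, item in enumerate(stream, start=1):
        if i <= n:
            sample.append(item)              # step 1: take the first n items
        elif rng.random() < n / i:           # step 2: keep item i with chance n/i
            sample[rng.randrange(n)] = item  # it replaces a random earlier pick
    return sample
```

If the stream has fewer than n items, the sketch simply returns all of them, which seemed like the least surprising behaviour.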

Resources