Find() operation for disjoint sets using "Path Halving" - algorithm

I am reading Disjoint-set_data_structure and, in the Union section, I have a problem understanding the implementation of the path halving approach.
function Find(x)
    while x.parent ≠ x
        x.parent := x.parent.parent
        x := x.parent
    return x
My first iteration looks like this :
After the first iteration, x.parent and x point at the same node (which should not happen). I need help with the correct flow and iteration of this function.
I am confused by the 3rd and 4th lines of that function, and also by the statement "Path halving makes every other node on the path point to its grandparent".
Any help will be appreciated, thanks!

The algorithm works as follows: you start from a node x, make x point to its grandparent, then move on to the grandparent itself. You continue until you reach the root, which you recognize because the parent of the root is the root itself.
Look at the picture: you are actually halving the path by transforming the set into a binary tree (it's not a proper binary tree, but it can be represented as such).
Let's say we have a set like this:
where the arrow means the parent (e.g. 8->7 = the parent of 8 is 7)
Say we call Find(8)
First iteration:
    x = 8
    8.parent = 7
    8.parent = 8.parent.parent = 7.parent = 6
    x = 8.parent = 7.parent = 6
Second iteration:
    x = 6
    6.parent = 5
    6.parent = 6.parent.parent = 5.parent = 4
    x = 6.parent = 5.parent = 4
and so on...
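The trace above can be reproduced with a small Python sketch. It assumes the set is a plain chain 8 -> 7 -> ... -> 1 with 1 as the root (the picture is not available, so the chain shape is an assumption), and it stores parents in a dictionary named parent rather than in node objects; both choices are only for illustration.

```python
def find(parent, x):
    """Return the root of x, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # x now points to its grandparent
        x = parent[x]                  # move on to that (former) grandparent
    return x

# Assumed example: chain 8 -> 7 -> ... -> 1, where 1 is the root (its own parent)
parent = {i: i - 1 for i in range(2, 9)}
parent[1] = 1

root = find(parent, 8)
print(root)
print(parent)
```

Only the nodes that x actually visits (8, 6, 4, 2) get a new parent; the nodes in between (7, 5, 3) are skipped, which is what "every other node on the path" refers to.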


Algorithm: path finding with variable path width

Given a grid of paths with different widths, how can I find a path which leads to the end point?
The grid is represented by a two-dimensional array where 0 means the cell cannot be walked on, 1 means it is walkable, 2 represents the starting point and 3 represents the end point. Consider the following example:
In the above example the width of a path varies from 1 to 3, and there exist many solutions which would lead to the end point. I want to find one path which leads to it; the path does not have to be the shortest one (but should not be the longest one either). The width of each path is unknown, which means the grid could be all "1"s except for the starting and end points.
Edited: The path should not contain unnecessary "wasted" walking, meaning that if a vertical path has width 2, the result should not just walk down the path, take one step right, and then walk all the way up.
I agree with Calumn: DFS is the simplest approach here. Here is a simple solution in Python-like pseudocode. It will print the solution as a sequence of 'L', 'R', 'U', 'D' to indicate left, right, up, or down.
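# map is indexed as map[y][x]; visited and map are shared by flood and solve
def flood(x, y, story):
    # Stop at cells that are blocked or were already visited
    if visited[y][x] or map[y][x] == '0': return
    visited[y][x] = True
    if map[y][x] == '3':
        print('done. The path is: ' + story)
    if x < len(map[0]) - 1: flood(x+1, y, story+'R')
    if y < len(map) - 1: flood(x, y+1, story+'D')
    if x > 0: flood(x-1, y, story+'L')
    if y > 0: flood(x, y-1, story+'U')
def solve(map):
    visited = array_of_false_of_same_size_as(map)
    x, y = find_the_two(map)
    flood(x, y, '')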
The optimization of making it stop as soon as it finds a solution is left as an exercise to the reader (you could make flood return a boolean to indicate if it found something, or use a global flag).
(p.s. I made this answer community wiki since I'm just clarifying Calumn's answer. I can't claim much credit)
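One possible take on the exercise mentioned above, as a hedged sketch rather than the answer's own code: an iterative, stack-based depth-first search that returns as soon as it pops the end point. The function name find_path is hypothetical, and the grid is assumed to hold the integers 0 to 3 as described in the question.

```python
def find_path(grid):
    """Stack-based depth-first search from the cell marked 2 to the cell marked 3.

    Returns a string of 'L', 'R', 'U', 'D' moves, or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])
    start = next((r, c) for r in range(rows) for c in range(cols)
                 if grid[r][c] == 2)
    visited = {start}
    stack = [(start, "")]
    moves = [(0, 1, "R"), (1, 0, "D"), (0, -1, "L"), (-1, 0, "U")]
    while stack:
        (r, c), story = stack.pop()
        if grid[r][c] == 3:
            return story  # stop at the first route found
        for dr, dc, step in moves:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in visited and grid[nr][nc] != 0:
                visited.add((nr, nc))
                stack.append(((nr, nc), story + step))
    return None

print(find_path([[2, 1, 0],
                 [0, 1, 3]]))
```

Using an explicit stack instead of recursion also avoids Python's recursion limit on large grids. The route found this way is not necessarily short.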
Breadth-First Search version, also in Python
For what it's worth, and just to show that breadth-first search is not that complicated, an actual runnable program in Python:
def find(grid, xstart=0, ystart=0):
    # Maps (xi,yi) to (x(i-1), y(i-1))
    prev = {(xstart, ystart): None}
    # Prepare for the breadth-first search
    queue = [(xstart, ystart)]
    qpos = 0
    # Possibly enqueue a trial coordinate
    def enqueue(prevxy, dx, dy):
        x = prevxy[0] + dx
        y = prevxy[1] + dy
        xy = (x, y)
        # Check that it hasn't been visited and the coordinates
        # are valid and the grid position is not a 0
        if (xy not in prev
                and x >= 0 and x < len(grid)
                and y >= 0 and y < len(grid[x])
                and grid[x][y] != 0):
            # Record the history (and the fact that we've been here)
            prev[xy] = prevxy
            # If we found the target, signal success
            if grid[x][y] == 3:
                return xy
            # Otherwise, queue the new coordinates
            queue.append(xy)
        return None
    # The actual breadth-first search
    found = None
    while qpos < len(queue):
        xy = queue[qpos]
        qpos += 1
        found = ( enqueue(xy, 1, 0)
                  or enqueue(xy, 0, 1)
                  or enqueue(xy, -1, 0)
                  or enqueue(xy, 0, -1))
        if found: break
    # Recover the path by walking back from the target to the start
    path = []
    while found:
        path.append(found)
        found = prev[found]
    path.reverse()
    return path
# Test run
grid = [ [2,1,1,1,1,1,1,1,1,0,0,0,0,0]
       , [0,0,0,0,0,0,1,1,0,0,0,0,0,0]
       , [0,0,0,0,1,1,1,1,1,1,1,1,1,1]
       , [0,0,0,0,1,1,1,1,1,0,0,1,1,1]
       , [0,0,0,0,1,1,1,0,0,0,0,1,0,1]
       , [0,0,0,0,1,1,1,1,1,0,0,1,1,3]
       ]
for x, y in find(grid): grid[x][y] = '*'
print('\n'.join(''.join(str(p) for p in line) for line in grid))

Fast way of checking if an element is ranked higher than another

I am writing a MATLAB program that checks whether two elements A and B have exchanged ranking positions.
Assume the first ranking is:
list1 = [1 2 3 4]
while the second one is:
list2 = [1 2 4 3]
I want to check whether A = 3 and B = 4 have exchanged relative positions in the rankings, which in this case is true, since in the first ranking 3 comes before 4 and in the second ranking 3 comes after 4.
In order to do this, I have written the following MATLAB code:
positionA1 = find(list1 == A);
positionB1 = find(list1 == B);
positionA2 = find(list2 == A);
positionB2 = find(list2 == B);
if (positionA1 <= positionB1 && positionA2 >= positionB2) || ...
   (positionA1 >= positionB1 && positionA2 <= positionB2)
    % ... do something
end
Unfortunately, I need to run this code a lot of times, and the find function is really slow (but needed to get the element position in the list).
I was wondering if there is a way of speeding up the procedure. I have also tried to write a MEX file that performs in C the find operation, but it did not help.
If the lists don't change within your loop, then you can determine the positions of the items ahead of time.
Assuming that your items are always integers from 1 to N:
[~, positions_1] = sort( list1 );
[~, positions_2] = sort( list2 );
This way you won't need to call find within the loop, you can just do:
positionA1 = positions_1(A);
positionB1 = positions_1(B);
positionA2 = positions_2(A);
positionB2 = positions_2(B);
If your loop is going over all possible combinations of A and B, then you can also vectorize that.
Find the elements that exchanged relative ranking:
rank_diff_1 = bsxfun(@minus, positions_1, positions_1');
rank_diff_2 = bsxfun(@minus, positions_2, positions_2');
rel_rank_changed = sign(rank_diff_1) ~= sign(rank_diff_2);
[A_changed, B_changed] = find(rel_rank_changed);
Optional: Throw out half of the results, because if (3,4) is in the list, then (4,3) also will be, and maybe you don't want that:
mask = (A_changed < B_changed);
A_changed = A_changed(mask);
B_changed = B_changed(mask);
Now loop over only those elements that have exchanged relative ranking:
for ii = 1:length(A_changed)
    A = A_changed(ii);
    B = B_changed(ii);
    % Do something...
end
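For readers who want to check the idea outside MATLAB, here is a rough Python sketch of the same precompute-then-look-up approach. The helper names position_table, swapped and all_swapped_pairs are hypothetical, and the sketch assumes each list contains every item exactly once.

```python
from itertools import combinations

def position_table(ranking):
    """Map each item to its position so that lookups need no search."""
    return {item: pos for pos, item in enumerate(ranking)}

def swapped(pos1, pos2, a, b):
    """True if a and b appear in opposite relative order in the two rankings."""
    return (pos1[a] < pos1[b]) != (pos2[a] < pos2[b])

def all_swapped_pairs(list1, list2):
    """List every pair (a, b) with a < b whose relative order changed."""
    pos1, pos2 = position_table(list1), position_table(list2)
    return [(a, b) for a, b in combinations(sorted(pos1), 2)
            if swapped(pos1, pos2, a, b)]

print(all_swapped_pairs([1, 2, 3, 4], [1, 2, 4, 3]))
```

The tables are built once per ranking, so each later check costs two dictionary lookups per list instead of a scan.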
Instead of find, try computing something like this.
Check whether any values were exchanged:
if logical(sum(abs(list1-list2)))
    % do something
end
For specific values A and B:
if (list1(logical((list1-list2)-abs((list1-list2))))==A)&&(list1(logical((list1-list2)+abs((list1-list2))))==B)
    % do something
end

Using non-continuous integers as identifiers in cells or structs in Matlab

I want to store some results in the following way:
Res.0 = magic(4); % or Res.baseCase = magic(4);
Res.2 = magic(5); % I would prefer to use integers on all other
Res.7 = magic(6); % elements than the first.
Res.2000 = 1:3;
I want to use numbers between 0 and 3000, but I will only use approx 100-300 of them. Is it possible to use 0 as an identifier, or will I have to use a minimum value of 1? (The numbers have meaning, so I would prefer if I don't need to change them). Can I use numbers as identifiers in structs?
I know I can do the following:
Res{(last number + 1)} = magic(4);
Res{2} = magic(5);
Res{7} = magic(6);
Res{2000} = 1:3;
And just remember that the last element is really the "number zero" element.
In this case I will create a bunch of empty cell elements [] in the non-populated positions. Does this cause a problem? I assume it will be best to assign the last element first, to avoid creating a growing cell, or does this not have an effect? Is this an efficient way of doing this?
Which will be most efficient, structs or cells? (If it's possible to use structs, that is.)
My main concern is computational efficiency.
Let's review your options:
Indexing into cell arrays
MATLAB indices start from 1, not from 0. If you want to store your data in cell arrays, in the worst case, you could always use the subscript k + 1 to index into the cell corresponding to the k-th identifier (k ≥ 0). In my opinion, using the last element as the "base case" is more confusing. So what you'll have is:
Res{1} = magic(4); %// Base case
Res{2} = magic(5); %// Corresponds to identifier 1
Res{k + 1} = ... %// Corresponds to identifier k
Accessing fields in structures
Field names in structures are not allowed to begin with numbers, but they are allowed to contain them starting from the second character. Hence, you can build your structure like so:
Res.c0 = magic(4); %// Base case
Res.c1 = magic(5); %// Corresponds to identifier 1
Res.c2 = magic(6); %// Corresponds to identifier 2
%// And so on...
You can use dynamic field referencing to access any field, for instance:
k = 3;
kth_field = Res.(sprintf('c%d', k)); %// Access field k = 3 (i.e field 'c3')
I can't say which alternative seems more elegant, but I believe that indexing into a cell should be faster than dynamic field referencing (but you're welcome to check that out and prove me wrong).
As an alternative to EitanT's answer, it sounds like MATLAB's map containers are exactly what you need. They can deal with any type of key, and the value may be a struct or cell.
In your case this will be:
k = {0,2,7,2000};
Res = {magic(4),magic(5),magic(6),1:3};
ResMap = containers.Map(k, Res);
ResMap(0)
ans =
    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1
I agree with the idea in @wakjah's comment. If you are concerned about the efficiency of your program, it's better to change the interpretation of the problem. In my opinion there is definitely a way that you could prioritize your data. This prioritization could be according to the time you acquired them, or with respect to the inputs from which they are calculated. If you set any kind of priority among them, you can sort them into a structure or cell (a structure might be faster).
Priority    Your current index meaning    Data
1           0                             magic(4)
2           2                             magic(5)
3           7                             magic(6)
4           2000                          1:3
% Initialize Result structure which is different than your Res.
Result(300).Data = 0; % 300 the maximum number of data
Result(300).idx = 0; % idx or anything that represent the meaning of your current index.
% Assigning
k = 1; % Priority index
Result(k).idx = 0; Result(k).Data = magic(4); k = k + 1;
Result(k).idx = 2; Result(k).Data = magic(5); k = k + 1;
Result(k).idx = 7; Result(k).Data = magic(6); k = k + 1;

alternative rank function RBTree (red black tree)

I have an order-statistic augmented red black tree.
It works for the most part, but I need to implement a fast function (O(lg n)) that basically returns the place of a node in sorted order, like the OS-rank function from my textbook, but with one twist: if two nodes have the same score, the return value should be the same. Here is the OS-rank function (in pseudocode, for a given node x, where root is the root of the tree):
r = x.left.size + 1
y = x
while y != root
    if y == y.p.right
        r = r + y.p.left.size + 1
    y = y.p
return r
But what I need is something where, if node A has key 1 and node B has key 1, the function returns 1 for both, and so on. I tried it myself with something like this:
start with value r=1
check that x.right is not Nil
    case x.right has the same key as x
        add x.right.#nodeswithkeyhigher(x.key) to r
    other cases: add x.right.size to r
while y != root
    if y.parent.left == y
        case y.parent.right.key>x.key
            add y.parent.right to r
        other cases
            add y.parent.right.#nodeswithkeyhigher(x.key) to r
return r
Guess what: a test case failed. I'd like to know if this is a correct way of doing things, or if perhaps I made some mistake I am not seeing (else the mistake is in the Node.#nodeswithkeyhigher(key) function).
edit: final paragraph for answer, thanks to Sticky.
tl;dr: skip to last paragraphs
This is the same issue I'm having trouble with. (Yes, DS as well.) So far all runs except 5 are correct. I've tested several things, one being a very simple one: just exchange left and right in OSRank. In some cases it gave a correct answer, but in the harder cases it was quite a bit off. Oh, I also added that if y.score == y.parent.score I only add the right size of y.parent; if not, I add the right size + 1.
public int OSRank(Node x)
{
    int r = x.Right.Size + 1;
    Node y = x;
    while (y != root)
    {
        if (y == y.Parent.Left)
        {
            if (y.Score == y.Parent.Score)
                r = r + y.Parent.Right.Size;
            else
                r = r + y.Parent.Right.Size + 1;
        }
        y = y.Parent;
    }
    return r;
}
Let's first test this method on the tree on page 340 (figure 14.1). We'll search for the rank of 38 (which should return 4 because 39, 47 and 41 are higher):
r = 1 + 1 = 2 //Right side + 1
r = 2 //nothing happens because we're a right child
r = r + 1 + 1 = 4 //we're a left child, the key of our parent is larger and parent.Right.size = 1
r = 4 //nothing happens because we're a right child
So in this case the result is correct. But what if we add another node with key 38 to our tree. That reshapes our tree a bit, the right part of node 26 now looks like:
(I'm not allowed to add images yet so look here:
If we would use the same algorithm we'd get the following result (picking the red one):
r = 0 + 1 = 1 //no right side
r = 1 //we're a right child
r = 1 //we're a right child
r = 1 + 3 + 1 = 5 //The 3 comes from the size of node 41.
r = 5 //we're a right child
Though we expect rank 4 here. While I was typing this out I noticed that we check if y.Score == y.Parent.Score, but I completely forgot y changes. So in line 4 the clause "y.Score == y.Parent.Score" was false because we compared node 30 with 38. So if we change that line to:
if (x.Score == y.Parent.Score)
The algorithm outputs rank 4, which is correct. This means we eliminated another issue. But there are more, which I didn't figure out either:
The case in which y.Parent.Right contains duplicate keys. Technically, if we have 3 nodes with the same key, they should count as 1.
The case in which y.Parent.Right contains keys that are equal to x.Key (the node you want the rank of). That would put us a few ranks back, incorrectly.
I suppose you could keep another integer which holds the number of nodes with a higher score. Upon insertion you could climb the tree and adjust the values if the subtree of that node doesn't contain a node with the same score. But how this is done (and done efficiently) is unknown to me right now.
edit: First find the final successor of x with the same score as x. Then calculate the rank the normal way. The code above works.
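As a cross-check for the tie-aware rank, here is a hedged Python sketch that is not the code from this thread. It assumes "rank" means 1 plus the number of nodes with a strictly higher score, and it uses a plain unbalanced binary search tree with size fields instead of a red-black tree; the names Node, size, insert and tie_rank are hypothetical. The descent from the root relies only on the in-order property and the size fields, so the same loop should carry over to a balanced tree that maintains sizes.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in the subtree rooted here


def size(node):
    return node.size if node else 0


def insert(root, key):
    """Plain BST insert that keeps the size fields current (no rebalancing)."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    root.size = 1 + size(root.left) + size(root.right)
    return root


def tie_rank(root, key):
    """Return 1 + the number of stored keys strictly greater than key."""
    higher = 0
    node = root
    while node:
        if node.key > key:
            # node and its whole right subtree hold larger keys
            higher += 1 + size(node.right)
            node = node.left
        else:
            node = node.right
    return higher + 1


root = None
for k in [26, 41, 30, 47, 38, 39, 38]:
    root = insert(root, k)
print(tie_rank(root, 38))
```

Because the rank depends only on the key, both nodes with key 38 get the same value, and the loop visits one node per level.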

How to easily know if a maze has a road from start to goal?

I implemented a maze using a 0/1 array. The entry and goal are fixed in the maze: the entry is always the point (0,0) and the goal is always the point (m-1,n-1). I'm using a breadth-first search algorithm for now, but the speed is not good enough, especially for large mazes (100*100 or so). Could someone help me with this algorithm?
Here is my solution:
queue = []
position = start_node
queue << position
until queue.empty?
  p = queue.shift # pop the first element
  return true if maze.goal?(p)
  left = p.left
  visit(queue, left) if can_visit?(maze, left)
  right = p.right
  visit(queue, right) if can_visit?(maze, right)
  up = p.up
  visit(queue, up) if can_visit?(maze, up)
  down = p.down
  visit(queue, down) if can_visit?(maze, down)
end
return false
The can_visit? method checks whether the node is inside the maze, whether the node has been visited, and whether the node is blocked.
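For comparison, a breadth-first reachability check can be sketched in Python as follows. This is an illustration, not the asker's code: the name has_path is hypothetical, and nonzero cells are assumed to be open (the question does not say which of 0 and 1 is a wall).

```python
from collections import deque

def has_path(maze):
    """Breadth-first reachability from (0, 0) to (m-1, n-1)."""
    m, n = len(maze), len(maze[0])
    if maze[0][0] == 0 or maze[m - 1][n - 1] == 0:
        return False
    seen = [[False] * n for _ in range(m)]
    seen[0][0] = True
    queue = deque([(0, 0)])
    while queue:
        x, y = queue.popleft()  # constant-time removal from the front
        if (x, y) == (m - 1, n - 1):
            return True
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if (0 <= nx < m and 0 <= ny < n
                    and maze[nx][ny] != 0 and not seen[nx][ny]):
                # Mark on enqueue so that no cell enters the queue twice
                seen[nx][ny] = True
                queue.append((nx, ny))
    return False

print(has_path([[1, 1], [0, 1]]))
```

Each cell is enqueued at most once, so the work is proportional to the number of cells.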
Worst answer possible:
1) Go forward until you can't move.
2) Turn left.
3) Rinse and repeat.
If you make it out, there is an end.
A better solution:
Traverse your maze keeping two lists, one for open and one for closed nodes. Use the famous A-Star algorithm to evaluate the next node and discard nodes which are dead ends. If you run out of nodes on your open list, there is no exit.
Here is a simple algorithm which should be much faster:
From start/goal, move to the first junction. You can ignore anything between that junction and the start/goal.
Locate all places in the maze which are dead ends (they have three walls). Move back to the next junction and take this path out of the search tree.
After you have removed all dead ends this way, there should be a single path left (or several if there are several ways to reach the goal).
I would not use the AStar algorithm there yet, unless I really need to, because this can be done with some simple 'coloring'.
# maze is an m x n array
def canBeTraversed(maze):
    m = len(maze)
    n = len(maze[0])
    colored = [ [ False for i in range(0,n) ] for j in range(0,m) ]
    open = [(0,0),]
    while len(open) != 0:
        (x,y) = open.pop()
        if x == m-1 and y == n-1:
            return True
        # The lower bounds matter too: negative indices would wrap around in Python
        elif 0 <= x < m and 0 <= y < n and maze[x][y] != 0 and not colored[x][y]:
            colored[x][y] = True
            open.extend([(x-1,y), (x,y-1), (x+1,y), (x,y+1)])
    return False
Yes, it's stupid, yes, it's breadth-first and all that (strictly speaking, open.pop() takes the most recently added cell, so the traversal order is depth-first).
Here is the A* implementation:
def dist(x,y):
    # ** is exponentiation in Python; ^ would be bitwise XOR
    return (abs(x[0]-y[0]) + abs(x[1]-y[1]))**2
def heuristic(x,y):
    return (x[0]-y[0])**2 + (x[1]-y[1])**2
def find(open,f):
    # Linear scan for the open node with the smallest f value
    result = None
    min = None
    for x in open:
        tmp = f[x[0]][x[1]]
        if min == None or tmp < min:
            min = tmp
            result = x
    return result
def neighbors(x,m,n):
    def add(result,y,m,n):
        # Keep only coordinates that lie inside the grid
        if 0 <= y[0] < m and 0 <= y[1] < n: result.append(y)
    result = []
    add(result, (x[0]-1,x[1]), m, n)
    add(result, (x[0],x[1]-1), m, n)
    add(result, (x[0]+1,x[1]), m, n)
    add(result, (x[0],x[1]+1), m, n)
    return result
def canBeTraversedAStar(maze):
    m = len(maze)
    n = len(maze[0])
    goal = (m-1,n-1)
    closed = set([])
    open = set([(0,0),])
    g = [ [ 0 for y in range(0,n) ] for x in range(0,m) ]
    h = [ [ heuristic((x,y),goal) for y in range(0,n) ] for x in range(0,m) ]
    f = [ [ h[x][y] for y in range(0,n) ] for x in range(0,m) ]
    while len(open) != 0:
        x = find(open,f)
        if x == (m-1,n-1):
            return True
        # Move x from the open set to the closed set
        open.remove(x)
        closed.add(x)
        for y in neighbors(x,m,n):
            if y in closed: continue
            if maze[y[0]][y[1]] == 0: continue  # blocked cell
            if y not in open:
                open.add(y)
                g[y[0]][y[1]] = g[x[0]][x[1]] + dist(x,y)
                h[y[0]][y[1]] = heuristic(y,goal)
                f[y[0]][y[1]] = g[y[0]][y[1]] + h[y[0]][y[1]]
    return False
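Here is my (simple) benchmark code:
import datetime
def tryIt(func, size, runs):
    maze = [ [ 1 for i in range(0,size) ] for j in range(0,size) ]
    # Timer calls are an assumption: datetime matches the h:mm:ss format of the output below
    begin = datetime.datetime.now()
    for i in range(0,runs): func(maze)
    end = datetime.datetime.now()
    print(size, 'x', size, ':', (end - begin) / runs, 'average on', runs, 'runs')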
Which outputs:
# For canBeTraversed
100 x 100 : 0:00:00.002650 average on 100 runs
1000 x 1000 : 0:00:00.198440 average on 100 runs
# For canBeTraversedAStar
100 x 100 : 0:00:00.016100 average on 100 runs
1000 x 1000 : 0:00:01.679220 average on 100 runs
The obvious conclusion here: getting A* to run smoothly requires a lot of optimizations that I did not bother to go after...
I would say:
Don't optimize
(Expert only) Don't optimize yet
How much time are you talking about when you say too much? Really, a 100x100 grid is so easily parsed by brute force that it's a joke :/
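One of the optimizations alluded to in this answer is replacing the linear scan over the open set with a priority queue. The following is a hedged sketch of that idea, not the answer's code: the name can_be_traversed_astar is hypothetical, every step is assumed to cost 1, the heuristic is the Manhattan distance, and nonzero cells are assumed to be open.

```python
import heapq

def can_be_traversed_astar(maze):
    """A* reachability from (0, 0) to (m-1, n-1) using a binary heap."""
    m, n = len(maze), len(maze[0])
    start, goal = (0, 0), (m - 1, n - 1)
    if maze[0][0] == 0 or maze[m - 1][n - 1] == 0:
        return False

    def h(cell):
        # Manhattan distance to the goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best = {start: 0}                 # cheapest known cost to reach each cell
    heap = [(h(start), 0, start)]     # entries are (f, g, cell)
    while heap:
        _, g, cell = heapq.heappop(heap)
        if cell == goal:
            return True
        if g > best[cell]:
            continue                  # stale entry, a cheaper one was processed
        x, y = cell
        for nxt in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            nx, ny = nxt
            if 0 <= nx < m and 0 <= ny < n and maze[nx][ny] != 0:
                ng = g + 1
                if ng < best.get(nxt, float("inf")):
                    best[nxt] = ng
                    heapq.heappush(heap, (ng + h(nxt), ng, nxt))
    return False

print(can_be_traversed_astar([[1, 0, 1], [1, 0, 1], [1, 1, 1]]))
```

Popping the smallest entry from the heap takes logarithmic time, whereas scanning the whole open set takes time linear in its size. Whether this beats the simple coloring approach on a given maze would still need to be measured.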
I would have solved this with an AStar implementation. If you want even more speed, you can optimize to only generate the nodes from the junctions rather than every tile/square/step.
A method you can use that does not need to visit all nodes in the maze is as follows:
1) Create an integer[][] with one value per maze "room".
2) Create a queue and add [startpoint, count=1, delta=1] and [goal, count=-1, delta=-1].
3) Start coloring the route by:
   a) popping an object from the head of the queue and putting the count at the maze point;
   b) checking all reachable rooms for a count with sign opposite to that of the room's delta; if you find one, the maze is solved: run both ways and connect the routes with the biggest steps up and down in room counts;
   c) otherwise adding all reachable rooms that have no count to the tail of the queue, with delta added to the room count.
4) If the queue is empty, no path through the maze is possible.
This not only determines whether there is a path, but also shows the shortest path possible through the maze.
You don't need to backtrack, so it's O(number of maze rooms).
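The two-sided coloring described above can be sketched roughly as follows. This is one interpretation under stated assumptions, not the poster's code: the name meets_in_the_middle is hypothetical, nonzero cells are assumed to be open, counts are written when a room is enqueued rather than when it is popped, and only the yes/no answer is returned (reconstructing the route from the counts is left out).

```python
from collections import deque

def meets_in_the_middle(maze):
    """Grow positive counts from the start and negative counts from the goal."""
    m, n = len(maze), len(maze[0])
    start, goal = (0, 0), (m - 1, n - 1)
    if maze[0][0] == 0 or maze[m - 1][n - 1] == 0:
        return False
    if start == goal:
        return True
    count = [[0] * n for _ in range(m)]   # 0 means "not colored yet"
    count[0][0] = 1
    count[m - 1][n - 1] = -1
    queue = deque([start, goal])
    while queue:
        x, y = queue.popleft()
        here = count[x][y]
        delta = 1 if here > 0 else -1
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if not (0 <= nx < m and 0 <= ny < n) or maze[nx][ny] == 0:
                continue
            there = count[nx][ny]
            if there == 0:
                count[nx][ny] = here + delta
                queue.append((nx, ny))
            elif (there > 0) != (here > 0):
                return True               # the two colorings touch
    return False

print(meets_in_the_middle([[1, 1], [0, 1]]))
```

Because both frontiers share one queue, they advance in alternation, and the search stops as soon as a room colored from one side is adjacent to a room colored from the other.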
