Finding group sizes in matrices - algorithm

So I was wondering: is there an easy way to detect the sizes of groups of adjacent equal values in a matrix? For example, when looking at the matrix of values between 0 and 12 below:
The size of the group at [0,4] is 14 because there are 14 5's connected to each other. But the 1 and 4 are not connected.

I think you can use a breadth-first search (well, kind of: the recursive code below actually explores depth-first, but either way, visualize the matrix as a graph).
Here's a pseudo-Python implementation that does this. Would this work for you? Did you have a complexity in mind?
Code
visited_nodes = set()

def find_adjacent_vals(target_val, cell_row, cell_column):
    if not inside_matrix(cell_row, cell_column):
        return 0
    cell = (cell_row, cell_column)
    if cell in visited_nodes:
        return 0
    visited_nodes.add(cell)
    if matrix[cell_row][cell_column] != target_val:
        return 0
    return (1
            + find_adjacent_vals(target_val, cell_row + 1, cell_column)   # below
            + find_adjacent_vals(target_val, cell_row - 1, cell_column)   # above
            + find_adjacent_vals(target_val, cell_row, cell_column - 1)   # left
            + find_adjacent_vals(target_val, cell_row, cell_column + 1))  # right

print("Adjacent values count: " + str(find_adjacent_vals(target_val, target_row, target_column)))
Explanation
Let's say you start at a node; you branch out, visiting nodes you haven't visited before, until you encounter no new cells of the same value. Thanks to the visited set, each cell is entered at most once, so no cell is double counted.
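If the groups can be large, the recursion may blow Python's call stack, so here is a minimal iterative breadth-first variant of the same idea (my sketch; it assumes matrix is a list of lists and inlines the bounds check):

from collections import deque

def group_size(matrix, row, col):
    # Count cells connected to (row, col) that share its value (4-neighbour BFS).
    target_val = matrix[row][col]
    visited = {(row, col)}
    queue = deque([(row, col)])
    size = 0
    while queue:
        r, c = queue.popleft()
        size += 1
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < len(matrix) and 0 <= nc < len(matrix[0])
                    and (nr, nc) not in visited
                    and matrix[nr][nc] == target_val):
                visited.add((nr, nc))
                queue.append((nr, nc))
    return size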

Related

Generate random graph with probability p

Write a function in main.cpp, which creates a random graph of a certain size as follows. The function takes two parameters. The first parameter is the number of vertices n. The second parameter p (1 >= p >= 0) is the probability that an edge exists between a pair of nodes. In particular, after instantiating a graph with n vertices and 0 edges, go over all possible vertex pairs one by one, and for each such pair, put an edge between the vertices with probability p.
How do I know whether an edge exists between two vertices?
Here is the full question
PS: I don't need the code implementation
The problem statement clearly says that the first input parameter is the number of nodes and the second parameter is the probability p that an edge exists between any 2 nodes.
What you need to do is as follows (Updated to amend a mistake that was pointed out by #user17732522):
1- Create a bool matrix (2d nested array) of size n*n initialized with false.
2- Run a loop over the rows:
   - Run an inner loop over the columns:
     - if row_index != col_index do:
       - curr_p = random() // random() returns a number between 0 and 1 inclusive
       - if curr_p <= p: set matrix[row_index][col_index] = true
         else: set matrix[row_index][col_index] = false
       - For an undirected graph, also set matrix[col_index][row_index] = true/false based on curr_p
Note: Since we are setting both cells (both directions) in the matrix on each draw, each edge decision may be made twice, with the later draw overwriting the earlier one. This doesn't corrupt the correctness of the probability and isn't much additional work. It helps to keep the code clean.
If you want to optimize this solution, you could run the loop such that you only visit the lower-left triangle (excluding the diagonal) and just mirror the results you get for those cells to the upper-right triangle.
That's it.
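A minimal Python sketch of the triangle-optimized version described above (the adjacency-matrix representation and random.random() are my choices; the question explicitly doesn't need an implementation, so treat this as illustration only):

import random

def random_graph(n, p):
    # Return an n*n adjacency matrix where each undirected edge exists with probability p.
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):            # lower-left triangle only, diagonal excluded
            has_edge = random.random() <= p
            adj[i][j] = has_edge
            adj[j][i] = has_edge      # mirror into the upper-right triangle
    return adj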

Find the path in a graph that requires the least amount of nodes to visit

I'm looking for an algorithm to find the shortest path in a non-circular graph. However, the shortest path is not defined as the one with the least weight, but as the one with the least number of nodes to visit before reaching the destination.
Let me give you an example: let's assume you use a bad timer app on your phone, whose timer is limited to half an hour. There are three buttons:
add 10 minutes
add 1 minute
subtract one minute
The initial value is zero. The input range is 0 to 30 minutes. We now want to set a timer for, let's say, 29 min. The shortest path to accomplish this is by hitting add 10 min three times and subtract 1 min one time.
We represent every sequence of button presses as a path in a graph (where every node represents a click on one of the three buttons). We are now looking for a path from the start to a specific number that has the least number of button presses on it.
The problem can be solved using breadth-first search (BFS). We are given a starting time (which we consider the source node of the graph) and a target time (which we consider the destination node). In every step, we can do either of the following with the current time:
Add 10 minutes
Add 1 minute
Subtract 1 minute
Now, the nodes of the graph are the times we can reach using any of the above steps. For example, starting from time 0 (consider it node 0) we can reach either 10, 1, or -1 in one step. So, in one step we discover three different nodes of the graph, and none of them is our destination node.
With these three newly discovered nodes, we can make similar moves and reach new nodes that are two steps away from the source node (i.e., time 0). Here is the list of nodes we can discover in our second step:
From 10 to: 20, 11, 9
From 1 to: 11, 2, 0
From -1 to: 9, 0, -2
There are a couple of things we can notice here:
If we discover a node we have never discovered before, then the number of steps it took to discover it is its shortest distance. For example, we can reach node 9 in 2 steps from the source node 0 (note that 9 can be reached from both 10 and -1), and clearly that is the shortest distance for node 9.
The graph can be circular. For example, there is an edge from node 0 to node -1, and from node -1 to node 0. You need to handle that.
So, it is clear now that we have transformed this problem into a standard BFS problem, and we can solve it with the following pseudocode:
Shortest-Path(initial-time, target-time):
    Queue<time> Q
    Map<time, step> M   // this map helps track whether we have discovered a node before
    Q.insert(initial-time)
    M[initial-time] = 0
    while (!Q.empty):
        current-time = Q.front()
        Q.pop()
        if current-time is equal to target-time:
            return M[current-time]
        if (current-time + 10) is not in M:
            Q.push(current-time + 10)
            M[current-time + 10] = M[current-time] + 1
        if (current-time + 1) is not in M:
            Q.push(current-time + 1)
            M[current-time + 1] = M[current-time] + 1
        if (current-time - 1) is not in M:
            Q.push(current-time - 1)
            M[current-time - 1] = M[current-time] + 1
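A direct Python translation of the pseudocode above, with one assumption folded in from the question: times are clamped to the timer's valid range [0, 30] so the search space stays finite:

from collections import deque

def shortest_path(initial_time, target_time, low=0, high=30):
    # Return the minimum number of button presses to go from initial_time to target_time.
    steps = {initial_time: 0}        # discovered node -> distance
    queue = deque([initial_time])
    while queue:
        current = queue.popleft()
        if current == target_time:
            return steps[current]
        for move in (10, 1, -1):
            nxt = current + move
            if low <= nxt <= high and nxt not in steps:
                steps[nxt] = steps[current] + 1
                queue.append(nxt)
    return -1                        # target unreachable within [low, high]

print(shortest_path(0, 29))          # 4: +10, +10, +10, -1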
Instead of thinking of the problem as a graph traversal problem, you can solve it simply by performing BFS on a tree that starts with a single root node and grows over time:
#!/usr/bin/python3
DESIRED_SUM = 29
SET = [-1, 1, 10]
QUEUE = [(0, [])]  # For BFS
ANSWER = {}        # For Dynamic Programming

while True:
    (cur_sum, elements) = QUEUE.pop(0)
    if cur_sum not in ANSWER:
        ANSWER[cur_sum] = elements
        if cur_sum == DESIRED_SUM:
            break
        for num in SET:
            new_sum = cur_sum + num
            if new_sum not in ANSWER:
                new_elements = elements.copy()
                new_elements.append(num)
                QUEUE.append((new_sum, new_elements))

for key, item in ANSWER.items():
    print("{0}: {1}".format(key, item))
Every time you append to QUEUE, you are effectively adding leaf nodes. If you just performed BFS without storing the results (i.e. the dynamic programming part), you would hit exponential time. The way the above code copies and carries around the list of elements is highly inefficient. You could instead store a back-reference for each sum (for example, record that 29 was first reached from 28 by adding 1), so that when you actually want the full list of elements for a given sum, you can follow the ANSWER chain recursively to reconstruct it.
Note that the above program will never terminate if the DESIRED_SUM can't be derived from the given list of elements.
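Here is a sketch of that back-pointer idea (the parent map name is mine): instead of carrying full lists, store for each sum the move that first reached it and the sum it came from, then walk the chain backwards:

from collections import deque

DESIRED_SUM = 29
MOVES = [-1, 1, 10]

parent = {0: None}                  # sum -> (previous_sum, move); None marks the root
queue = deque([0])
while DESIRED_SUM not in parent:    # like the original, assumes DESIRED_SUM is reachable
    cur = queue.popleft()
    for move in MOVES:
        nxt = cur + move
        if nxt not in parent:
            parent[nxt] = (cur, move)
            queue.append(nxt)

# Reconstruct the move list by walking back from DESIRED_SUM to the root.
moves = []
node = DESIRED_SUM
while parent[node] is not None:
    prev, move = parent[node]
    moves.append(move)
    node = prev
print(list(reversed(moves)))        # e.g. [10, 10, 10, -1]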

Find minimum distance between points

I have a set of points (x,y).
I need to return the two points with minimal distance.
I use this:
http://www.cs.ucsb.edu/~suri/cs235/ClosestPair.pdf
but I don't really understand how the algorithm works.
Can someone explain more simply how the algorithm works, or suggest another idea?
Thanks!
If the number of points is small, you can use the brute force approach, i.e.:
for each point, compute its distance to every other point, and keep the minimum distance seen so far together with the two indices that achieve it.
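A minimal O(n^2) sketch of that brute force (assuming Python 3.8+ for math.dist):

from math import dist  # Euclidean distance between two points, Python 3.8+

def closest_pair_brute(points):
    # Return the pair of points with the smallest Euclidean distance.
    best = (float('inf'), None, None)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = dist(points[i], points[j])
            if d < best[0]:
                best = (d, points[i], points[j])
    return best[1], best[2]

print(closest_pair_brute([(0, 0), (7, 6), (2, 20), (5, 8)]))  # ((7, 6), (5, 8))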
If the number of points is large, I think you may find the answer in this thread:
Shortest distance between points algorithm
The solution to the Closest Pair Problem with the best time complexity, O(n log n), is the divide-and-conquer approach, as mentioned in the document you have read.
Divide-and-conquer Approach for Closest-Pair Problem
The easiest way to understand this algorithm is to read an implementation of it in a high-level language (sometimes understanding an algorithm from pseudocode is harder than from real code), like Python:
# closest pairs by divide and conquer
# David Eppstein, UC Irvine, 7 Mar 2002
from __future__ import generators

def closestpair(L):
    def square(x): return x*x
    def sqdist(p,q): return square(p[0]-q[0])+square(p[1]-q[1])

    # Work around ridiculous Python inability to change variables in outer scopes
    # by storing a list "best", where best[0] = smallest sqdist found so far and
    # best[1] = pair of points giving that value of sqdist. Then best itself is never
    # changed, but its elements best[0] and best[1] can be.
    #
    # We use the pair L[0],L[1] as our initial guess at a small distance.
    best = [sqdist(L[0],L[1]), (L[0],L[1])]

    # check whether pair (p,q) forms a closer pair than one seen already
    def testpair(p,q):
        d = sqdist(p,q)
        if d < best[0]:
            best[0] = d
            best[1] = p,q

    # merge two sorted lists by y-coordinate
    def merge(A,B):
        i = 0
        j = 0
        while i < len(A) or j < len(B):
            if j >= len(B) or (i < len(A) and A[i][1] <= B[j][1]):
                yield A[i]
                i += 1
            else:
                yield B[j]
                j += 1

    # Find closest pair recursively; returns all points sorted by y coordinate
    def recur(L):
        if len(L) < 2:
            return L
        split = len(L)/2
        splitx = L[split][0]
        L = list(merge(recur(L[:split]), recur(L[split:])))

        # Find possible closest pair across split line
        # Note: this is not quite the same as the algorithm described in class, because
        # we use the global minimum distance found so far (best[0]), instead of
        # the best distance found within the recursive calls made by this call to recur().
        E = [p for p in L if abs(p[0]-splitx) < best[0]]
        for i in range(len(E)):
            for j in range(1,8):
                if i+j < len(E):
                    testpair(E[i],E[i+j])
        return L

    L.sort()
    recur(L)
    return best[1]

closestpair([(0,0),(7,6),(2,20),(12,5),(16,16),(5,8),\
             (19,7),(14,22),(8,19),(7,29),(10,11),(1,13)])
# returns: (7,6),(5,8)
Taken from: https://www.ics.uci.edu/~eppstein/161/python/closestpair.py
Detailed explanation:
First we define a squared-distance helper (the square of the Euclidean distance) to prevent code repetition:
def square(x): return x*x # Define square function
def sqdist(p,q): return square(p[0]-q[0])+square(p[1]-q[1]) # Define squared Euclidean distance function
Then we are taking the first two points as our initial best guess:
best = [sqdist(L[0],L[1]), (L[0],L[1])]
This function compares the squared distance of the next pair with that of our current best pair, updating the best pair when an improvement is found:
def testpair(p,q):
    d = sqdist(p,q)
    if d < best[0]:
        best[0] = d
        best[1] = p,q
def merge(A,B): is the merge step (as in merge sort), combining two halves that were previously divided into a single list sorted by y-coordinate.
The definition of recur(L) is the actual body of the algorithm, so I will explain it in more detail:
if len(L) < 2:
    return L
With this part, the algorithm terminates the recursion when there are fewer than two points left in the list.
Split the list in half (split = len(L)/2) and record the x-coordinate of the split line: splitx = L[split][0]
Create a recursion (by the function calling itself) for each half, merging the sorted results: L = list(merge(recur(L[:split]), recur(L[split:])))
Then, lastly, these nested loops test each point of the strip E (the points lying near the split line) against its next seven neighbors in y-order:
for i in range(len(E)):
    for j in range(1,8):
        if i+j < len(E):
            testpair(E[i],E[i+j])
As a result, if a better pair is found, the best pair is updated.
So they solve the problem using a divide-and-conquer approach. Binary search or divide-and-conquer is mega fast. Basically, if you can split a dataset into two halves, and keep doing that until you find the info you want, you are doing it about as fast as humanly and computerly possible most of the time.
For this question, it means that we divide the data set of points into two sets, S1 and S2.
All the points are numerical, right? So we have to pick some number at which to divide the dataset.
So we pick some number m and say it is the median.
So let's take a look at an example:
(14, 2)
(11, 2)
(5, 2)
(15, 2)
(0, 2)
What's the closest pair?
Well, they all have the same Y coordinate, so we can look at Xs only... the shortest X distance is from 14 to 15, a distance of 1.
How can we figure that out using divide-and-conquer?
We look at the greatest value of X and the smallest value of X and we choose the median as a dividing line to make our two sets.
Our median is 7.5 in this example.
We then make 2 sets
S1: (0, 2) and (5, 2)
S2: (11, 2) and (14, 2) and (15, 2)
Median: 7.5
We must keep track of the median for every split, because that is actually a vital piece of knowledge in this algorithm. They don't show it very clearly on the slides, but knowing the median value (where you split a set to make two sets) is essential to solving this question quickly.
We keep track of a value they call delta in the algorithm. Ugh, I don't know why most computer scientists absolutely suck at naming variables; you need descriptive names when you code so you don't forget what the f000 you coded 10 years ago. So instead of delta, let's call this value our-shortest-twig-from-the-median-so-far.
Since we have the median value of 7.5 let's go and see what our-shortest-twig-from-the-median-so-far is for Set1 and Set2, respectively:
Set1 : shortest-twig-from-the-median-so-far 2.5 (5 to m where m is 7.5)
Set 2: shortest-twig-from-the-median-so-far 3.5 (looking at 11 to m)
So I think the key take-away from the algorithm is that this shortest-twig-from-the-median-so-far is something that you're trying to improve upon every time you divide a set.
Since S1 in our case has 2 elements only, we are done with the left set, and we have 3 in the right set, so we continue dividing:
S2 = { (11,2) (14,2) (15,2) }
What do you do? You make a new median, call it S2-median
S2-median is halfway between 15 and 11... or 13, right? My math may be fuzzy, but I think that's right so far.
So let's look at the shortest-twig-so-far-for-our-right-side-with-median-thirteen ...
15 to 13 is... 2
11 to 13 is .... 2
14 to 13 is ... 1 (!!!)
So our shortest-twig-from-the-median-so-far is improved (we updated our median from before because we're in a new chunk or Set...)
Now that we've found it we know that (14, 2) is one of the points that satisfies the shortest pair equation. You can then check exhaustively against the points in this subset (15, 11, 14) to see which one is the closer one.
Clearly, (15,2) and (14,2) are the winning pair in this case.
Does that make sense? You must keep track of the median when you cut the set, and keep a new median for every time you cut the set, until you have only 2 elements remaining on each side (or in our case 3).
The magic is in the median or shortest-twig-from-the-median-so-far
Thanks for asking this question, I went in not knowing how this algorithm worked but found the right highlighted bullet point on the slide and rolled with it. Do you get it now? I don't know how to explain the median magic other than binary search is f000ing awesome.

Find North-East path with most points [duplicate]

In Cracking the Coding Interview, Fourth Edition, there is such a problem:
A circus is designing a tower routine consisting of people standing
atop one another's shoulders. For practical and aesthetic reasons,
each person must be both shorter and lighter than the person below him
or her. Given the heights and weights of each person in the circus,
write a method to compute the largest possible number of people in
such a tower.
EXAMPLE: Input (ht, wt): (65, 100) (70, 150) (56, 90)
(75, 190) (60, 95) (68, 110)
Output: The longest tower is length 6 and
includes from top to bottom: (56, 90) (60, 95) (65, 100) (68, 110)
(70, 150) (75, 190)
Here is its solution in the book
Step 1: Sort all items by height first, and then by weight. This means that if all the heights are unique, the items will be sorted by their height; if heights are the same, items will be sorted by their weight.
Step 2: Find the longest sequence which contains increasing heights and increasing weights.
To do this, we:
a) Start at the beginning of the sequence. Currently, max_sequence is empty.
b) If, for the next item, the height and the weight are not greater than those of the previous item, we mark this item as "unfit".
c) If the sequence found has more items than "max sequence", it becomes "max sequence".
d) After that, the search is repeated from the "unfit item" until we reach the end of the original sequence.
I have some questions about this solution.
Q1
I believe this solution is wrong.
For example
(3,2) (5,9) (6,7) (7,8)
Obviously, (6,7) is an unfit item, but what about (7,8)? According to the solution, it is NOT unfit, as its h and w are both bigger than those of (6,7); however, it cannot be included in the sequence, because (7,8) does not fit on (5,9).
Am I right?
If I am right, what is the fix?
Q2
I believe that even if there is a fix for the above solution, this style of solution will take at least O(n^2) time, because it needs to iterate again and again, according to step 2-d.
So is it possible to have a O(nlogn) solution?
You can solve the problem with dynamic programming.
Sort the troupe by height. For simplicity, assume all the heights h_i and weights w_j are distinct; thus h_i is an increasing sequence.
We compute a sequence T_i, where T_i is a maximal-size tower with person i at the bottom. T_1 is simply {1}. We can deduce each subsequent T_k from the earlier T_j: find the largest tower T_j whose bottom person j is lighter than k (w_j < w_k), and stand that tower on k's shoulders.
The largest possible tower from the troupe is then the largest of the T_i.
This algorithm takes O(n^2) time, where n is the cardinality of the troupe.
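A short Python sketch of that DP (names are mine; it tracks only tower sizes, so reconstructing the actual tower would additionally need back-pointers):

def largest_tower_size(people):
    # people: list of (height, weight) tuples. Returns the size of the largest legal tower.
    people = sorted(people)            # ascending height, then weight
    n = len(people)
    t = [1] * n                        # t[k] = largest tower with person k at the bottom
    for k in range(n):
        for j in range(k):
            # person j (and the tower above j) can stand on k only if j is
            # strictly shorter and lighter than k
            if people[j][0] < people[k][0] and people[j][1] < people[k][1]:
                t[k] = max(t[k], t[j] + 1)
    return max(t, default=0)

assert largest_tower_size([(65,100),(70,150),(56,90),(75,190),(60,95),(68,110)]) == 6
assert largest_tower_size([(3,2),(5,9),(6,7),(7,8)]) == 3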
I tried solving this myself. I did not mean to give a ready-made solution, but I am posting it anyway, mostly to check my own understanding and whether my code (Python) is OK and would work for all test cases. I tried 3 cases and it seemed to give the correct answer.
#!/usr/bin/python3
# This function takes a list of tuples. Tuple (n): (height, weight) of nth person
def htower_len(ht_wt):
    ht_sorted = sorted(ht_wt, reverse=True)        # tallest first
    max_len = 1
    i = 0
    while i < len(ht_sorted) - 1:
        if ht_sorted[i+1][1] < ht_sorted[i][1]:    # next person is lighter too
            max_len = max_len + 1
        i = i + 1
    print("maximum tower length :", max_len)

### Called above function with below sample app code.
testcase = 1
print("Result of Test case ", testcase)
htower_len([(5, 75), (6.7, 83), (4, 78), (5.2, 90)])
testcase = testcase + 1
print("Result of Test case ", testcase)
htower_len([(65, 100), (70, 150), (56, 90), (75, 190), (60, 95), (68, 110)])
testcase = testcase + 1
print("Result of Test case ", testcase)
htower_len([(3, 2), (5, 9), (6, 7), (7, 8)])
For example
(3,2) (5,9) (6,7) (7,8)
Obviously, (6,7) is an unfit item, but how about (7,8)?
In answer to your question: the algorithm first runs starting with (3,2) and gets the sequence (3,2) (5,9), marking (6,7) and (7,8) as unfit.
It then starts again from (6,7) (the first unfit item) and gets (6,7) (7,8), which makes the answer 2. Since there are no more "unfit" items, the search terminates with maximum length 2.
After first sorting the array by height and weight, my code checks what the largest tower would be if we grabbed any of the remaining tuples in the array (and possible subsequent tuples). In order to avoid re-computing sub-problems, solution_a is used to store the optimal max length from the tail of the input_array.
The beginning_index is the index from which we can consider grabbing elements from (the index from which we can consider people who could go below on the human stack), and beginning_tuple refers to the element/person higher up on the stack.
This solution takes O(n log n) to do the sort. The space used is O(n) for the solution_a array and the copy of the input_array.
def determine_largest_tower(beginning_index, a, beginning_tuple, solution_a):
    # base case
    if beginning_index >= len(a):
        return 0
    if solution_a[beginning_index] != -1:  # already computed
        return solution_a[beginning_index]
    # recursive case
    max_len = 0
    for i in range(beginning_index, len(a)):
        # if we can grab that value, check what the max would be
        if a[i][0] >= beginning_tuple[0] and a[i][1] >= beginning_tuple[1]:
            max_len = max(1 + determine_largest_tower(i+1, a, a[i], solution_a), max_len)
    solution_a[beginning_index] = max_len
    return max_len

def algorithm_for_human_towering(input_array):
    a = sorted(input_array)
    return determine_largest_tower(0, a, (-1, -1), [-1] * len(a))

a = [(3, 2), (5, 9), (6, 7), (7, 8)]
print(algorithm_for_human_towering(a))
Here is another way to approach the problem altogether, with code:
Algorithm
Sorting first by height and then by weight
Sorted array:
[(56, 90), (60, 95), (65, 100), (68, 110), (70, 150), (75, 190)]
Finding the length of the longest increasing subsequence of weights
Why is the longest increasing subsequence of weights the answer?
The people are sorted by increasing height,
so when we find a subsequence of people with increasing weights too,
the selected people are in increasing order of both height and weight and can therefore form a human tower.
For example:
[(56, 90) (60,95) (65,100) (68,110) (70,150) (75,190)]
Efficient Implementation
In the implementation below we maintain a list of increasing numbers and use bisect_left, which is implemented under the hood using binary search, to find the proper insertion index.
Please note: the sequence maintained by the longest_increasing_sequence method might not be an actual longest subsequence; however, its length will equal the length of the longest increasing subsequence.
Kindly refer to Longest increasing subsequence Efficient algorithms for more details.
The overall time complexity is O(n log(n)) as desired.
Code
from bisect import bisect_left

def human_tower(height, weight):
    def longest_increasing_sequence(A, get_property):
        lis = []
        for item in A:
            x = get_property(item)
            pos = bisect_left(lis, x)
            if pos == len(lis):
                lis.append(x)
            else:
                lis[pos] = x
        return len(lis)

    # Edge case, no people
    if 0 == len(height):
        return 0
    # Creating array of heights and weights
    people = [(h, w) for h, w in zip(height, weight)]
    # Sorting array first by height and then by weight
    people.sort()
    # Returning length of longest increasing sequence of weights
    return longest_increasing_sequence(people, lambda t: t[1])

assert 6 == human_tower([65,70,56,75,60,68], [100,150,90,190,95,110])

How to adapt Fenwick tree to answer range minimum queries

A Fenwick tree is a data structure that gives an efficient way to answer two main queries:
add a value at a particular index of an array: update(index, value)
find the sum of elements from 1 to N: find(n)
Both operations are done in O(log(n)) time and I understand the logic and implementation. It is not hard to implement a bunch of other operations, like finding the sum from N to M.
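For reference, a minimal sketch of those two standard operations (my illustration, using the usual lowest-set-bit traversal and 1-based indexing):

class Fenwick:
    def __init__(self, n):
        self.tree = [0] * (n + 1)       # 1-based; tree[i] covers the range i-(i&-i)+1 .. i

    def update(self, index, value):     # add value at index, O(log n)
        while index < len(self.tree):
            self.tree[index] += value
            index += index & (-index)   # jump to the next responsible node

    def find(self, n):                  # sum of elements 1..n, O(log n)
        total = 0
        while n > 0:
            total += self.tree[n]
            n -= n & (-n)               # strip the lowest set bit
        return total

f = Fenwick(10)
f.update(3, 5)
f.update(7, 2)
print(f.find(10) - f.find(2))           # sum from 3 to 10 -> 7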
I wanted to understand how to adapt the Fenwick tree for RMQ (range minimum queries). It is obvious how to change the Fenwick tree for the first two operations, but I am failing to figure out how to find the minimum on the range from N to M.
After searching for solutions, the majority of people think this is not possible, while a small minority claims it actually can be done (approach1, approach2).
The first approach (written in Russian; based on my Google translation it has zero explanation and only two functions) relies on three arrays (initial, left and right) and, upon my testing, was not working correctly for all possible test cases.
The second approach requires only one array and, based on the claims, runs in O(log^2(n)); it also has close to no explanation of why and how it should work. I have not tried to test it.
In light of controversial claims, I wanted to find out whether it is possible to augment Fenwick tree to answer update(index, value) and findMin(from, to).
If it is possible, I would be happy to hear how it works.
Yes, you can adapt Fenwick Trees (Binary Indexed Trees) to
Update value at a given index in O(log n)
Query minimum value for a range in O(log n) (amortized)
We need 2 Fenwick trees and an additional array holding the real values for nodes.
Suppose we have the following array:
index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
value 1 0 2 1 1 3 0 4 2 5 2 2 3 1 0
We wave a magic wand and the following trees appear:
Note that in both trees each node represents the minimum value for all nodes within that subtree. For example, in BIT2 node 12 has value 0, which is the minimum value for nodes 12,13,14,15.
Queries
We can efficiently query the minimum value for any range by calculating the minimum of several subtree values and one additional real node value. For example, the minimum value for range [2,7] can be determined by taking the minimum of BIT2_Node2 (representing nodes 2,3), BIT1_Node7 (representing node 7), BIT1_Node6 (representing nodes 5,6) and REAL_4, therefore covering all nodes in [2,7]. But how do we know which subtrees we want to look at?
Query(int a, int b) {
    int val = infinity  // always holds the known min value for our range

    // Start traversing the first tree, BIT1, from the beginning of range, a
    int i = a
    while (parentOf(i, BIT1) <= b) {
        val = min(val, BIT2[i])  // Note: traversing BIT1, yet looking up values in BIT2
        i = parentOf(i, BIT1)
    }

    // Start traversing the second tree, BIT2, from the end of range, b
    i = b
    while (parentOf(i, BIT2) >= a) {
        val = min(val, BIT1[i])  // Note: traversing BIT2, yet looking up values in BIT1
        i = parentOf(i, BIT2)
    }

    val = min(val, REAL[i])  // Explained below
    return val
}
It can be mathematically proven that both traversals will end in the same node. That node is a part of our range, yet it is not a part of any subtrees we have looked at. Imagine a case where the (unique) smallest value of our range is in that special node. If we didn't look it up our algorithm would give incorrect results. This is why we have to do that one lookup into the real values array.
To help understand the algorithm I suggest you simulate it with pen & paper, looking up data in the example trees above. For example, a query for range [4,14] would return the minimum of values BIT2_4 (rep. 4,5,6,7), BIT1_14 (rep. 13,14), BIT1_12 (rep. 9,10,11,12) and REAL_8, therefore covering all possible values [4,14].
Updates
Since a node represents the minimum value of itself and its children, changing a node will affect its parents, but not its children. Therefore, to update a tree we start from the node we are modifying and move up all the way to the fictional root node (0 or N+1 depending on which tree).
Suppose we are updating some node in some tree:
If new value < old value, we will always overwrite the value and move up
If new value == old value, we can stop since there will be no more changes cascading upwards
If new value > old value, things get interesting.
If the old value still exists somewhere within that subtree, we are done
If not, we have to find the new minimum value between real[node] and each tree[child_of_node], change tree[node] and move up
Pseudocode for updating node with value v in a tree:
while (node <= n+1) {
    if (v > tree[node]) {
        if (oldValue == tree[node]) {
            v = min(v, real[node])
            for-each child of node {
                v = min(v, tree[child])
            }
        } else break
    }
    if (v == tree[node]) break
    tree[node] = v
    node = parentOf(node, tree)
}
Note that oldValue is the original value we replaced, whereas v may be reassigned multiple times as we move up the tree.
Binary Indexing
In my experiments, range minimum queries were about twice as fast as with a segment tree implementation, and updates were marginally faster. The main reason for this is the use of super efficient bitwise operations for moving between nodes. They are very well explained here. Segment trees are really simple to code, though, so consider whether the performance advantage is really worth it: the update method of my Fenwick RMQ is 40 lines and took a while to debug. If anyone wants my code I can put it on GitHub. I also wrote a brute-force checker and test generators to make sure everything works.
I had help understanding this subject & implementing it from the Finnish algorithm community. Source of the image is http://ioinformatics.org/oi/pdf/v9_2015_39_44.pdf, but they credit Fenwick's 1994 paper for it.
The Fenwick tree structure works for addition because addition is invertible. It doesn't work for minimum, because as soon as you have a cell that's supposed to be the minimum of two or more inputs, you've potentially lost information: once a node stores min(3, 7) = 3, that node alone cannot tell you whether the 7 still matters after the 3 is later changed.
If you're willing to double your storage requirements, you can support RMQ with a segment tree that is constructed implicitly, like a binary heap. For an RMQ with n values, store the n values at locations [n, 2n) of an array. Locations [1, n) are aggregates, with the formula A(k) = min(A(2k), A(2k+1)). Location 2n is an infinite sentinel. The update routine should look something like this.
def update(n, a, i, x):  # value[i] = x
    i += n
    a[i] = x
    # update the aggregates
    while i > 1:
        i //= 2
        a[i] = min(a[2*i], a[2*i+1])
The multiplies and divides here can be replaced by shifts for efficiency.
The RMQ pseudocode is more delicate. Here's another untested and unoptimized routine.
def rmq(n, a, i, j):  # min(value[i:j])
    i += n
    j += n
    x = float('inf')
    while i < j:
        if i % 2 == 0:
            i //= 2
        else:
            x = min(x, a[i])
            i = i//2 + 1
        if j % 2 == 0:
            j //= 2
        else:
            x = min(x, a[j-1])
            j //= 2
    return x
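A quick way to sanity-check the two routines above (the bottom-up build loop is my addition; value indices are 0-based, matching update() and rmq()):

n = 8
values = [5, 2, 7, 1, 9, 3, 8, 4]
a = [0] * (2 * n + 1)
a[2 * n] = float('inf')              # the infinite sentinel
for k, v in enumerate(values):
    a[n + k] = v
for k in range(n - 1, 0, -1):        # build the aggregates bottom-up
    a[k] = min(a[2 * k], a[2 * k + 1])

print(rmq(n, a, 2, 6))               # min(values[2:6]) -> 1
update(n, a, 3, 10)                  # value[3] = 10
print(rmq(n, a, 2, 6))               # recomputed minimum -> 3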
