I understand that there are mainly two approaches to dynamic programming solutions:
Fixed optimal order of evaluation (let's call it the Foo approach): the Foo approach usually goes from subproblems to bigger problems, using results obtained earlier for the subproblems to solve bigger problems and thus avoiding "revisiting" subproblems. CLRS also seems to call this the "Bottom Up" approach.
Without a fixed optimal order of evaluation (let's call it the Non-Foo approach): here, evaluation proceeds from problems to their subproblems. It ensures that subproblems are not "re-evaluated" (thus ensuring optimality) by maintaining the results of past evaluations in some data structure and first checking whether the result of the problem at hand already exists in that data structure before starting its evaluation. CLRS seems to call this the "Top Down" approach.
This is roughly what is conveyed as one of the main points by this answer.
I have the following doubts:
Q1. Memoization or not?
CLRS uses the terms "top down with memoization" approach and "bottom up" approach. I feel both approaches require memory to cache the results of subproblems. But then why does CLRS use the term "memoization" only for the top down approach and not for the bottom up approach? After solving some problems with DP, I feel that top down solutions for all problems require memory to cache the results of "all" subproblems. However, that is not the case with the bottom up approach: bottom up solutions for some problems do not need to cache the results of "all" subproblems. Q1. Am I correct in this?
For example, consider this problem:
Given cost[i], the cost of the ith step on a staircase, give the minimum cost of reaching the top of the floor if:
you can climb either one or two steps
you can start from the step with index 0, or the step with index 1
The top down approach solution is as follows:
class Solution:
    def minCostAux(self, curStep, cost):
        if self.minCosts[curStep] > -1:
            return self.minCosts[curStep]
        if curStep == -1:
            return 0
        elif curStep == 0:
            self.minCosts[curStep] = cost[0]
        else:
            self.minCosts[curStep] = min(self.minCostAux(curStep-2, cost) + cost[curStep],
                                         self.minCostAux(curStep-1, cost) + cost[curStep])
        return self.minCosts[curStep]

    def minCostClimbingStairs(self, cost) -> int:
        cost.append(0)
        self.minCosts = [-1] * len(cost)
        return self.minCostAux(len(cost)-1, cost)
The bottom up approach solution is as follows:
class Solution:
    def minCostClimbingStairs(self, cost) -> int:
        cost.append(0)
        secondLastMinCost = cost[0]
        lastMinCost = min(cost[0]+cost[1], cost[1])
        minCost = lastMinCost
        for i in range(2, len(cost)):
            minCost = min(lastMinCost, secondLastMinCost) + cost[i]
            secondLastMinCost = lastMinCost
            lastMinCost = minCost
        return minCost
Note that the top down approach caches the results of all steps in self.minCosts, while the bottom up approach caches the results of only the last two steps in the variables lastMinCost and secondLastMinCost.
Q2. Do all problems have solutions by both approaches?
I feel no. I came to this opinion after solving this problem:
Find the probability that the knight will not go out of an n x n chessboard after k moves, if the knight is initially kept in the cell at index (row, column).
I feel the only way to solve this problem is to find successive probabilities for an increasing number of steps starting from cell (row, column), that is, the probability that the knight will not go out of the chessboard after step 1, then after step 2, then after step 3, and so on. This is the bottom up approach. We cannot do it top down; for example, we cannot start with the kth step and go to the (k-1)th step, then the (k-2)th step, and so on, because:
We cannot know which cells will be reached at the kth step to start with
We cannot ensure that all paths from the kth step will lead to the initial knight position (row, column).
Even one of the top voted answers gives a DP solution as follows:
class Solution {
    private int[][] dir = new int[][]{{-2,-1},{-1,-2},{1,-2},{2,-1},{2,1},{1,2},{-1,2},{-2,1}};
    private double[][][] dp;

    public double knightProbability(int N, int K, int r, int c) {
        dp = new double[N][N][K + 1];
        return find(N, K, r, c);
    }

    public double find(int N, int K, int r, int c) {
        if (r < 0 || r > N - 1 || c < 0 || c > N - 1) return 0;
        if (K == 0) return 1;
        if (dp[r][c][K] != 0) return dp[r][c][K];
        double rate = 0;
        for (int i = 0; i < dir.length; i++) rate += 0.125 * find(N, K - 1, r + dir[i][0], c + dir[i][1]);
        dp[r][c][K] = rate;
        return rate;
    }
}
I feel this is still a bottom up approach, since it starts with the initial knight position (r, c) (and hence proceeds from the 0th step, i.e. no step, to the Kth step), despite the fact that it counts K downwards to 0. So this is the bottom up approach done recursively, not the top down approach. To be precise, this solution does NOT first find:
the probability of the knight not going out of the chessboard after K steps starting at cell (r, c)
and then find:
the probability of the knight not going out of the chessboard after K-1 steps starting at cell (r, c)
but finds them in reverse / bottom up order: first for K-1 steps and then for K steps.
Also, I did not find any solutions among the top voted discussions on leetcode doing it in a truly top down manner, starting from the Kth step and going to the 0th step ending in the (row, column) cell, instead of starting with the (row, column) cell.
Similarly, we cannot solve the following problem with the bottom up approach but only with the top down approach:
Find the probability that the knight ends up in the cell at index (row, column) after K steps, starting at any initial cell.
Q2. So am I correct in my understanding that not all problems have solutions by both top down and bottom up approaches? Or am I just overthinking unnecessarily, and both problems above can indeed be solved with both top down and bottom up approaches?
PS: I indeed seem to have been overthinking here: the knightProbability() function above is indeed top down, and I misinterpreted it as explained in detail above 😑. I have kept this explanation for reference, as there are already some answers below, and also as a hint of how confusion / misinterpretation might happen, so that I will be more cautious in the future. Sorry if this long explanation caused you some confusion / frustration. Regardless, the main question still holds: does every problem have both bottom up and top down solutions?
Q3. Bottom up approach recursively?
I am pondering whether bottom up solutions for all problems can also be implemented recursively. After trying to do so for other problems, I came to the following conclusion:
We can implement bottom up solutions for such problems recursively, only that the recursion won't be meaningful, but kind of hacky.
For example, below is a recursive bottom up solution for the minimum cost climbing stairs problem mentioned in Q1:
class Solution:
    def minCostAux(self, step_i, cost):
        if self.minCosts[step_i] != -1:
            return self.minCosts[step_i]
        self.minCosts[step_i] = min(self.minCostAux(step_i-1, cost),
                                    self.minCostAux(step_i-2, cost)) + cost[step_i]
        if step_i == len(cost)-1:  # returning from non-base case, gives sense of
                                   # not-so meaningful recursion.
                                   # Also, base cases usually appear at the
                                   # beginning, before recursive call.
                                   # Or should we call it "ceil condition"?
            return self.minCosts[step_i]
        return self.minCostAux(step_i+1, cost)

    def minCostClimbingStairs(self, cost: List[int]) -> int:
        cost.append(0)
        self.minCosts = [-1] * len(cost)
        self.minCosts[0] = cost[0]
        self.minCosts[1] = min(cost[0]+cost[1], cost[1])
        return self.minCostAux(2, cost)
Is my quoted understanding correct?
First, context.
Every dynamic programming problem can be solved without dynamic programming using a recursive function. Generally this will take exponential time, but you can always do it. At least in principle. If the problem can't be written that way, then it really isn't a dynamic programming problem.
The idea of dynamic programming is that if I already did a calculation and have a saved result, I can just use that saved result instead of doing the calculation again.
The whole top-down vs bottom-up distinction refers to the naive recursive solution.
In a top-down approach your call stack looks like the naive version except that you make a "memo" of what the recursive result would have given. And then the next time you short-circuit the call and return the memo. This means you can always, always, always solve dynamic programming problems top down. There is always a solution that looks like recursion+memoization. And that solution by definition is top down.
In a bottom up approach you start with what some of the bottom levels would have been and build up from there. Because you know the structure of the data very clearly, frequently you are able to know when you are done with data and can throw it away, saving memory. Occasionally you can filter data on non-obvious conditions that are hard for memoization to duplicate, making bottom up faster as well. For a concrete example of the latter, see Sorting largest amounts to fit total delay.
Start with your summary.
I strongly disagree with your thinking about the distinction in terms of the optimal order of evaluations. I've encountered many cases with top down where optimizing the order of evaluations will cause memoization to start hitting sooner, making code run faster. Conversely while bottom up certainly picks a convenient order of operations, it is not always optimal.
Now to your questions.
Q1: Correct. Bottom up often knows when it is done with data, top down does not. Therefore bottom up gives you the opportunity to delete data when you are done with it. And you gave an example where this happens.
As for why only one is called memoization, it is because memoization is a specific technique for optimizing a function, and you get top down by memoizing recursion. While the data stored in dynamic programming may match up to specific memos in memoization, you aren't using the memoization technique.
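As a tiny illustration, memoization applied to an ordinary recursive function might look like this in Python (functools.lru_cache does the caching; the function here is just an example, not from the question):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; repeated calls return the cached value.
    return n if n < 2 else fib(n - 1) + fib(n - 2)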
Q2: I do not know.
I've personally found cases where I was solving a problem over some complex data structure and simply couldn't find a bottom up approach. Maybe I simply wasn't clever enough, but I don't believe that a bottom up approach always exists to be found.
But top down is always possible. Here is how to do it in Python for the example that you gave.
First the naive recursive solution looks like this:
def prob_in_board(n, i, j, k):
    if i < 0 or j < 0 or n <= i or n <= j:
        return 0
    elif k <= 0:
        return 1
    else:
        moves = [
            (i+1, j+2), (i+1, j-2),
            (i-1, j+2), (i-1, j-2),
            (i+2, j+1), (i+2, j-1),
            (i-2, j+1), (i-2, j-1),
        ]
        answer = 0
        for next_i, next_j in moves:
            answer += prob_in_board(n, next_i, next_j, k-1) / len(moves)
        return answer

print(prob_in_board(8, 3, 4, 7))
And now we just memoize.
def prob_in_board_memoized(n, i, j, k, cache=None):
    if cache is None:
        cache = {}
    if i < 0 or j < 0 or n <= i or n <= j:
        return 0
    elif k <= 0:
        return 1
    elif (i, j, k) not in cache:
        moves = [
            (i+1, j+2), (i+1, j-2),
            (i-1, j+2), (i-1, j-2),
            (i+2, j+1), (i+2, j-1),
            (i-2, j+1), (i-2, j-1),
        ]
        answer = 0
        for next_i, next_j in moves:
            answer += prob_in_board_memoized(n, next_i, next_j, k-1, cache) / len(moves)
        cache[(i, j, k)] = answer
    return cache[(i, j, k)]

print(prob_in_board_memoized(8, 3, 4, 7))
This solution is top down. If it seems otherwise to you, then you do not correctly understand what is meant by top-down.
I found your question (does every dynamic programming problem have bottom up and top down solutions?) very interesting. That's why I'm adding another answer to continue the discussion about it.
To answer the question in its generic form, I need to formulate it more precisely with math. First, I need to formulate precisely what is a dynamic programming problem. Then, I need to define precisely what is a bottom up solution and what is a top down solution.
I will try to put some definitions but I think they are not the most generic ones. I think a really generic definition would need more heavy math.
First, define a state space S of dimension d as a subset of Z^d (Z represents the set of integers).
Let f: S -> R be a function that we are interested in calculating for a given point P of the state space S (R represents the set of real numbers).
Let t: S -> S^k be a transition function (it associates points in the state space with sets of points in the state space).
Consider the problem of calculating f on a point P in S.
We can consider it a dynamic programming problem if there is a function g: R^k -> R such that f(P) = g(f(t(P)[0]), f(t(P)[1]), ..., f(t(P)[k-1])) (a problem can be solved only by using its subproblems) and t defines a directed graph that is not a tree (subproblems have some overlap).
Consider the graph defined by t. We know it has a source (the point P) and some sinks for which we know the value of f (the base cases). We can define a top down solution for the problem as a depth first search through this graph that starts in the source and calculate f for each vertex at its return time (when the depth first search of all its sub graph is completed) using the transition function. On the other hand, a bottom up solution for the problem can be defined as a multi source breadth first search through the transposed graph that starts in the sinks and finishes in the source vertex, calculating f at each visited vertex using the previous visited layer.
The problem is: to navigate through the transposed graph, for each point you visit you need to know what points transition to this point in the original graph. In math terms, for each point Q in the transition graph, you need to know the set J of points such that for each point Pi in J, t(Pi) contains Q and there is no other point Pr in the state space outside of J such that t(Pr) contains Q. Notice that a trivial way to know this is to visit all the state space for each point Q.
The conclusion is that a bottom up solution as defined here always exists, but it only pays off if you have a way to navigate through the transposed graph at least as efficiently as navigating through the original graph. This depends essentially on the properties of the transition function.
In particular, for the leetcode problem you mentioned, the transition function is the function that, for each point on the chessboard, gives all the points to which the knight can go. A very special property of this function is that it's symmetric: if the knight can go from A to B, then it can also go from B to A. So, given a certain point P, you can know to which points the knight can go as efficiently as you can know from which points the knight can come. This is the property that guarantees that there exists a bottom up approach as efficient as the top down approach for this problem.
For the leetcode question you mentioned, the bottom up approach goes like the following:
Let P(x, y, k) be the probability that the knight is at the square (x, y) at the kth step. Look at all the squares that the knight could have come from (you can get them in O(1); just look at the board with pen and paper and derive the formulas for the different cases, like knight in the corner, knight on the border, knight in a central region, etc.). Let them be (x1, y1), ..., (xj, yj). For each of these squares, what is the probability that the knight jumps to (x, y)? Considering that it can go off the board, it's always 1/8. So:
P(x, y, k) = (P(x1, y1, k-1) + ... + P(xj, yj, k-1))/8
The base case is k = 0:
P(x, y, 0) = 1 if (x, y) = (x_start, y_start), and P(x, y, 0) = 0 otherwise.
You iterate through all n^2 squares and use the recurrence formula to calculate P(x, y, k). Many times you will need solutions you already calculated for k-1, and so you can benefit a lot from memoization.
In the end, the final solution will be the sum of P(x, y, K) over all squares of the board.
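A rough Python sketch of this iteration (the function and variable names are mine, not part of the original answer):

def knight_probability(n, k_steps, row, col):
    moves = [(1, 2), (1, -2), (-1, 2), (-1, -2), (2, 1), (2, -1), (-2, 1), (-2, -1)]
    # prev[x][y] = P(x, y, k-1); cur[x][y] = P(x, y, k)
    prev = [[0.0] * n for _ in range(n)]
    prev[row][col] = 1.0                      # base case: k = 0
    for _ in range(k_steps):
        cur = [[0.0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                # sum over the squares the knight could have come from
                # (predecessors equal successors because knight moves are symmetric)
                cur[x][y] = sum(prev[x + dx][y + dy]
                                for dx, dy in moves
                                if 0 <= x + dx < n and 0 <= y + dy < n) / 8
        prev = cur
    return sum(sum(r) for r in prev)          # probability of still being on the board

# e.g. knight_probability(8, 7, 3, 4) should agree with prob_in_board(8, 3, 4, 7) above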
I'm trying to solve a problem that is about minimizing the distance traveled by a group of n entities who have to go through a group of x points in a given order.
The n entities all start in the same position (1,1), and then I'm given x points that are in a queue and have to be "answered" in the correct order. However, I want the total distance traveled to be minimal.
My approach to the algorithm so far was to order the entities in increasing order of their distance to the point that is next in line. Then I'd check, from the closest to the furthest, whether this distance to the next point in line is bigger than the distance to the point that comes after it, in order to minimize the distance. The closest one not to fulfill this condition went to answer. If all were closer to the point that came afterwards, I'd reorder them in increasing order of distance to that later point and send the one furthest from it to answer the current point. Since this is a test problem I'm doing as practice for a competition, I know what the result should be for my test case, and it seems I'm doing this wrong.
How should I implement such an algorithm that guarantees that the distance is minimal?
The algorithm you've described sounds like a greedy search algorithm. Greedy algorithms are not guaranteed to find optimal solutions except under specific conditions that don't seem to hold here.
This looks like a candidate for a dynamic programming formulation. Alternatively, you can use heuristic search such as the A* search algorithm. I'd go with the latter if I was in the competition. See the link for a description of how the algorithm works, how to implement it, and how you might apply it to your problem.
Although, since the points need to be visited in order, there is a bound on the number of possible arrangements, I couldn't think of a more efficient way than the following formulation. Let f(ns, i) represent the optimal arrangement up to the ith point, where ns is the list of the last visited point for each entity that has been assigned at least one point. Then we have two kinds of choices: either start a new entity if we haven't run out, or try the current point as the next visit for each existing entity.
Python recursion:
import math

def d(p1, p2):
    return math.sqrt(math.pow(p1[0] - p2[0], 2) + math.pow(p1[1] - p2[1], 2))

def f(ps, n):
    def g(ns, i):
        if i == len(ps):
            return 0
        # start a new entity if we haven't run out
        best = d((0,0), ps[i]) + g(ns[:] + [i], i + 1) if len(ns) < n else float('inf')
        # try the current point as the next visit for each entity
        for entity_idx, point_idx in enumerate(ns):
            _ns = ns[:]
            _ns[entity_idx] = i
            best = min(best, d(ps[point_idx], ps[i]) + g(_ns, i + 1))
        return best
    return d((0,0), ps[0]) + g([0], 1)
Output:
"""
  p5
  p3
  p1
(0,0) p4 p6
  p2
"""
points = [(0,1), (0,-1), (0,2), (2,0), (0,3), (3,0)]
print(f(points, 3))  # 7.0
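Since g() can be reached with the same (ns, i) state along different branches, the recursion above can also be memoized. A minimal sketch of that (my adaptation, reusing d() from above and switching ns to tuples so the state is hashable):

from functools import lru_cache

def f_memo(ps, n):
    @lru_cache(maxsize=None)
    def g(ns, i):
        if i == len(ps):
            return 0
        # start a new entity if we haven't run out
        best = d((0, 0), ps[i]) + g(ns + (i,), i + 1) if len(ns) < n else float('inf')
        # try the current point as the next visit for each entity
        for entity_idx, point_idx in enumerate(ns):
            _ns = ns[:entity_idx] + (i,) + ns[entity_idx + 1:]
            best = min(best, d(ps[point_idx], ps[i]) + g(_ns, i + 1))
        return best
    return d((0, 0), ps[0]) + g((0,), 1)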
I'm playing a game that has a weapon-forging component, where you combine two weapons to get a new one. The sheer number of weapon combinations (see "6.1. Blade Combination Tables" at http://www.gamefaqs.com/ps/914326-vagrant-story/faqs/8485) makes it difficult to figure out what you can ultimately create out of your current weapons through repeated forging, so I tried writing a program that would do this for me. I give it a list of weapons that I currently have, such as:
francisca
tabarzin
kris
and it gives me the list of all weapons that I can forge:
ball mace
chamkaq
dirk
francisca
large crescent
throwing knife
The problem is that I'm using a brute-force algorithm that scales extremely poorly; it takes about 15 seconds to calculate all possible weapons for seven starting weapons, and a few minutes to calculate for eight starting weapons. I'd like it to be able to calculate up to 64 weapons (the maximum that you can hold at once), but I don't think I'd live long enough to see the results.
function find_possible_weapons(source_weapons)
{
    for (i in source_weapons)
    {
        for (j in source_weapons)
        {
            if (i != j)
            {
                result_weapon = combine_weapons(source_weapons[i], source_weapons[j]);
                new_weapons = array();
                new_weapons.add(result_weapon);
                for (k in source_weapons)
                {
                    if (k != i && k != j)
                        new_weapons.add(source_weapons[k]);
                }
                find_possible_weapons(new_weapons);
            }
        }
    }
}
In English: I attempt every combination of two weapons from my list of source weapons. For each of those combinations, I create a new list of all weapons that I'd have following that combination (that is, the newly-combined weapon plus all of the source weapons except the two that I combined), and then I repeat these steps for the new list.
Is there a better way to do this?
Note that combining weapons in the reverse order can change the result (Rapier + Firangi = Short Sword, but Firangi + Rapier = Spatha), so I can't skip those reversals in the j loop.
Edit: Here's a breakdown of the test example that I gave above, to show what the algorithm is doing. A line in brackets shows the result of a combination, and the following line is the new list of weapons that's created as a result:
francisca,tabarzin,kris
[francisca + tabarzin = chamkaq]
chamkaq,kris
[chamkaq + kris = large crescent]
large crescent
[kris + chamkaq = large crescent]
large crescent
[francisca + kris = dirk]
dirk,tabarzin
[dirk + tabarzin = francisca]
francisca
[tabarzin + dirk = francisca]
francisca
[tabarzin + francisca = chamkaq]
chamkaq,kris
[chamkaq + kris = large crescent]
large crescent
[kris + chamkaq = large crescent]
large crescent
[tabarzin + kris = throwing knife]
throwing knife,francisca
[throwing knife + francisca = ball mace]
ball mace
[francisca + throwing knife = ball mace]
ball mace
[kris + francisca = dirk]
dirk,tabarzin
[dirk + tabarzin = francisca]
francisca
[tabarzin + dirk = francisca]
francisca
[kris + tabarzin = throwing knife]
throwing knife,francisca
[throwing knife + francisca = ball mace]
ball mace
[francisca + throwing knife = ball mace]
ball mace
Also, note that duplicate items in a list of weapons are significant and can't be removed. For example, if I add a second kris to my list of starting weapons so that I have the following list:
francisca
tabarzin
kris
kris
then I'm able to forge the following items:
ball mace
battle axe
battle knife
chamkaq
dirk
francisca
kris
kudi
large crescent
scramasax
throwing knife
The addition of a duplicate kris allowed me to forge four new items that I couldn't before. It also increased the total number of forge tests to 252 for a four-item list, up from 27 for the three-item list.
Edit: I'm getting the feeling that solving this would require more math and computer science knowledge than I have, so I'm going to give up on it. It seemed like a simple enough problem at first, but then, so does the Travelling Salesman. I'm accepting David Eisenstat's answer since the suggestion of remembering and skipping duplicate item lists made such a huge difference in execution time and seems like it would be applicable to a lot of similar problems.
Start by memoizing the brute force solution, i.e., sort source_weapons, make it hashable (e.g., convert to a string by joining with commas), and look it up in a map of input/output pairs. If it isn't there, do the computation as normal and add the result to the map. This often results in big wins for little effort.
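A rough Python sketch of that memoization, assuming a combine_weapons() lookup exists and adapting the pseudocode to return the set of forgeable weapons:

memo = {}

def find_possible_weapons(source_weapons):
    key = ",".join(sorted(source_weapons))       # hashable key for this multiset of weapons
    if key in memo:
        return memo[key]
    results = set()
    for i, a in enumerate(source_weapons):
        for j, b in enumerate(source_weapons):
            if i == j:
                continue
            forged = combine_weapons(a, b)       # assumed lookup into the game's table
            if forged is None:                   # this ordered pair has no recipe
                continue
            results.add(forged)
            remaining = [w for k, w in enumerate(source_weapons) if k not in (i, j)]
            results |= find_possible_weapons(remaining + [forged])
    memo[key] = results
    return results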
Alternatively, you could do a backward search. Given a multiset of weapons, form predecessors by replacing one of the weapons with two weapons that forge it, in all possible ways. Starting with the singleton list consisting of the singleton multiset consisting of the goal weapon, repeatedly expand the list by predecessors of list elements and then cull multisets that are supersets of others. Stop when you reach a fixed point.
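As a rough illustration of that backward search (the data shapes are my assumptions: `recipes` maps an ordered pair (a, b) to the weapon it forges, and Counter serves as the multiset):

from collections import Counter

def backward_search(goal, recipes):
    # Reverse map: weapon -> the ordered pairs that forge it.
    reverse = {}
    for (a, b), result in recipes.items():
        reverse.setdefault(result, []).append((a, b))

    def contains(big, small):
        # True if multiset `big` is a superset of multiset `small`.
        return all(big[k] >= v for k, v in small.items())

    def predecessors(ms):
        # Replace one copy of an item with a pair that forges it, in all possible ways.
        for item in list(ms):
            for a, b in reverse.get(item, []):
                pred = Counter(ms)
                pred[item] -= 1
                pred.update([a, b])
                yield +pred                      # unary + drops zero counts

    kept = [Counter([goal])]                     # multisets found so far
    while True:
        new = []
        for ms in kept:
            for pred in predecessors(ms):
                if not any(contains(pred, other) for other in kept + new):
                    new.append(pred)
        if not new:
            # fixed point: prune any multiset that strictly contains another
            return [m for m in kept
                    if not any(m != o and contains(m, o) for o in kept)]
        kept.extend(new)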
If linear programming is an option, then there are systematic ways to prune search trees. In particular, let's make the problem easier by (i) allowing an infinite supply of "catalysts" (maybe not needed here?) (ii) allowing "fractional" forging, e.g., if X + Y => Z, then 0.5 X + 0.5 Y => 0.5 Z. Then there's an LP formulation as follows. For all i + j => k (i and j forge k), the variable x_{ijk} is the number of times this forge is performed.
minimize sum_{i, j => k} x_{ijk} (to prevent wasteful cycles)
for all i: sum_{j, k: j + k => i} x_{jki}
- sum_{j, k: j + i => k} x_{jik}
- sum_{j, k: i + j => k} x_{ijk} >= q_i,
for all i + j => k: x_{ijk} >= 0,
where q_i is 1 if i is the goal item, else minus the number of i initially available. There are efficient solvers for this easy version. Since the reactions are always 2 => 1, you can always recover a feasible forging schedule for an integer solution. Accordingly, I would recommend integer programming for this problem. The paragraph below may still be of interest.
I know shipping an LP solver may be inconvenient, so here's an insight that will let you do without. This LP is feasible if and only if its dual is bounded. Intuitively, the dual problem is to assign a "value" to each item such that, however you forge, the total value of your inventory does not increase. If the goal item is valued at more than the available inventory, then you can't forge it. You can use any method that you can think of to assign these values.
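For concreteness, here is a rough sketch of the integer-programming version of the LP above using the PuLP library; the input shapes (`recipes` mapping (i, j) to k, `have` giving starting counts, and `goal`) are my assumptions, not part of the original answer:

import pulp

def forge_ip(recipes, have, goal):
    prob = pulp.LpProblem("forging", pulp.LpMinimize)
    x = {}                                        # x[(i, j, k)] = times the forge i + j => k is used
    for (i, j), k in recipes.items():
        name = f"x_{i}_{j}_{k}".replace(" ", "_")
        x[(i, j, k)] = pulp.LpVariable(name, lowBound=0, cat="Integer")

    prob += pulp.lpSum(x.values())                # minimize total forges (prevents wasteful cycles)

    items = set(have) | set(recipes.values()) | {w for pair in recipes for w in pair}
    for item in items:
        produced = pulp.lpSum(v for (i, j, k), v in x.items() if k == item)
        consumed = pulp.lpSum(v for (i, j, k), v in x.items() if i == item) \
                 + pulp.lpSum(v for (i, j, k), v in x.items() if j == item)
        q = 1 if item == goal else -have.get(item, 0)
        prob += produced - consumed >= q          # net-production constraint for this item
    prob.solve()
    return {key: int(v.value()) for key, v in x.items() if v.value()}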
I think you are unlikely to get a good general answer to this question because
if there was an efficient algorithm to solve your problem, then it would also be able to solve NP-complete problems.
For example, consider the problem of finding the maximum number of independent rows in a binary matrix.
This is a known NP-complete problem (e.g. by showing equivalence to the maximum independent set problem).
We can reduce this problem to your question in the following manner:
We can start holding one weapon for each column in the binary matrix, and then we imagine each row describes an alternative way of making a new weapon (say a battle axe).
We construct the weapon translation table such that to make the battle axe using method i, we need all weapons j such that M[i,j] is equal to 1 (this may involve inventing some additional weapons).
Then we construct a series of super weapons which can be made by combining different numbers of our battle axes.
For example, the mega ultimate battle axe may require 4 battle axes to be combined.
If we are able to work out the best weapon that can be constructed from your starting weapons, then we have solved the problem of finding the maximum number of independent rows in the original binary matrix.
It's not a huge saving; however, looking at the source document, there are times when combining weapons produces the same weapon as one that was combined. I assume that you won't want to do this, as you'll end up with fewer weapons.
So if you added a check for if the result_weapon was the same type as one of the inputs, and didn't go ahead and recursively call find_possible_weapons(new_weapons), you'd trim the search down a little.
The other thing I could think of, is you are not keeping a track of work done, so if the return from find_possible_weapons(new_weapons) returns the same weapon that you already have got by combining other weapons, you might well be performing the same search branch multiple times.
e.g. if you have a, b, c, d, e, f, g, and if a + b = x and c + d = x, then your algorithm will be performing two lots of comparing x against e, f, and g. So if you keep track of what you've already computed, you'll be onto a winner...
Basically, you have to trim the search tree. There are loads of different techniques to do this: it's called search. If you want more advice, I'd recommend going to the computer science stack exchange.
If you are still struggling, then you could always start weighting items/resulting items, and only focus on doing the calculation on 'high gain' objects...
You might want to start by creating a Weapon[][] matrix, to show the results of forging each pair. You could map the name of the weapon to the index of the matrix axis, and lookup of the results of a weapon combination would occur in constant time.
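For instance, a small lookup table built from the pairs in the question's example breakdown might look like this in Python (a dict keyed by the ordered pair; a 2D array with a name-to-index map works the same way):

forge_table = {
    ("francisca", "tabarzin"): "chamkaq",
    ("tabarzin", "francisca"): "chamkaq",
    ("francisca", "kris"): "dirk",
    ("kris", "francisca"): "dirk",
    ("tabarzin", "kris"): "throwing knife",
    ("kris", "tabarzin"): "throwing knife",
}

def combine_weapons(a, b):
    return forge_table.get((a, b))   # None when the ordered pair has no recipe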
I want to make a linear fit to a few data points, as shown in the image. Since I know the intercept (in this case, say, 0.05), I want to fit only the points which are in the linear region with this particular intercept. In this case it will be, let's say, points 5:22 (but not 22:30).
I'm looking for a simple algorithm to determine this optimal number of points, based on... hmm, that's the question... R^2? Any ideas how to do it?
I was thinking about probing R^2 for fits using points 1 to 2:30, 2 to 3:30, and so on, but I don't really know how to wrap that into a clear and simple function. For fits with a fixed intercept I'm using polyfit0 (http://www.mathworks.com/matlabcentral/fileexchange/272-polyfit0-m). Thanks for any suggestions!
EDIT:
sample data:
intercept = 0.043;
x = 0.01:0.01:0.3;
y = [0.0530642513911393,0.0600786706929529,0.0673485248329648,0.0794662409166333,0.0895915873196170,0.103837395346484,0.107224784565365,0.120300492775786,0.126318699218730,0.141508831492330,0.147135757370947,0.161734674733680,0.170982455701681,0.191799936622712,0.192312642057298,0.204771365716483,0.222689541632988,0.242582251060963,0.252582727297656,0.267390860166283,0.282890010610515,0.292381165948577,0.307990544720676,0.314264952297699,0.332344368808024,0.355781519885611,0.373277721489254,0.387722683944356,0.413648156978284,0.446500064130389;];
What you have here is a rather difficult problem for which to find a general solution.
One approach would be to compute all the slopes/intercepts between all consecutive pairs of points, and then do cluster analysis on the intercepts:
slopes = diff(y)./diff(x);
intersepts = y(1:end-1) - slopes.*x(1:end-1);
idx = kmeans(intersepts, 3);
x([idx; 3] == 2) % the points with the intersepts closest to the linear one.
This requires the statistics toolbox (for kmeans). This is the best of all methods I tried, although the range of points found this way might have a few small holes in it; e.g., when the slopes of two points in the start and end range lie close to the slope of the line, these points will be detected as belonging to the line. This (and other factors) will require a bit more post-processing of the solution found this way.
Another approach (which I failed to construct successfully) is to do a linear fit in a loop, each time increasing the range of points from some point in the middle towards both of the endpoints, and see if the sum of the squared error remains small. This I gave up very quickly, because defining what "small" is is very subjective and must be done in some heuristic way.
I tried a more systematic and robust approach of the above:
function test
    %% example data
    slope = 2;
    intercept = 1.5;
    x = linspace(0.1, 5, 100).';
    y = slope*x + intercept;
    y(1:12) = log(x(1:12)) + y(12)-log(x(12));
    y(74:100) = y(74:100) + (x(74:100)-x(74)).^8;
    y = y + 0.2*randn(size(y));

    %% simple algorithm
    [X,fn] = fminsearch(@(ii)P(ii, x,y,intercept), [0.5 0.5])
    [~,inds] = P(X, x,y,intercept)
end

function [C, inds] = P(ii, x,y,intercept)
    % ii represents fraction of range from center to end,
    % so ii lies between 0 and 1.
    N = numel(x);
    n = round(N/2);
    ii = round(ii*n);
    inds = min(max(1, n+(-ii(1):ii(2))), N);

    % Solve linear system with fixed intercept
    A = x(inds);
    b = y(inds) - intercept;

    % and return the sum of squared errors, divided by
    % the number of points included in the set. This
    % last step is required to prevent fminsearch from
    % reducing the set to 1 point (= minimum possible
    % squared error).
    C = sum(((A\b)*A - b).^2)/numel(inds);
end
which only finds a rough approximation to the desired indices (12 and 74 in this example).
When fminsearch is run a few dozen times with random starting values (really just rand(1,2)), it gets more reliable, but I still wouldn't bet my life on it.
If you have the statistics toolbox, use the kmeans option.
Depending on the number of data values, I would split the data into a relatively small number of overlapping segments, and for each segment calculate the linear fit, or rather the first-order coefficient (remember you know the intercept, which will be the same for all segments).
Then, for each coefficient, calculate the MSE between this hypothetical line and the entire dataset, choosing the coefficient which yields the smallest MSE.
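A rough sketch of that idea (in Python/NumPy rather than MATLAB; x and y are assumed to be NumPy arrays, and the window and step sizes are arbitrary choices):

import numpy as np

def best_segment(x, y, intercept, win=8, step=2):
    best = (np.inf, None, None)
    for start in range(0, len(x) - win + 1, step):
        seg = slice(start, start + win)
        # least-squares slope for this segment with the intercept held fixed
        a = np.dot(x[seg], y[seg] - intercept) / np.dot(x[seg], x[seg])
        # score the hypothetical line a*x + intercept against the entire dataset
        mse = np.mean((a * x + intercept - y) ** 2)
        if mse < best[0]:
            best = (mse, a, (start, start + win))
    return best   # (mse, slope, (first index, last index exclusive))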
I am currently doing some calculations with trees. Each node has 5 values I am trying to calculate, and a type deciding how these values are calculated. Some calculations can be pretty complicated algorithms. All calculations within a node depend solely on the values of its child nodes, so I am doing the calculations from the bottom up. For each node type, a value depends on different values of the child nodes. I am mainly interested in the 5 values in the root node, which of course depend on all values in all other nodes. All this is working just fine. A node can only have 1 or 2 child nodes, and the tree usually is no deeper than 5 levels.
For some node types there is a tolerance, meaning some values there would not matter; see this picture, I marked those with XX. Sometimes, some values would even be in a relation, like C = XX * A. Currently, these values are just set to some default values. Sometimes there would even be a complicated relationship, like the multiple possible solutions of an algorithm like Newton's method, depending on starting values.
Now there is a rating I can apply to the values of the root node. What I would like is to optimize this rating by adjusting the XX values deep within the tree. The calculations within each node can be any of many possible formulas, and the tolerance can be one of many possible patterns, so I cannot just figure out some formula; I would need an algorithm which is very flexible. I do not know of such an algorithm. Does anyone have an idea?
/Edit: To clarify, it is unclear how many values in the tree will be free. There is not just one XX; there may be any number of them (I guess max. 10), so my first step would be identifying these values. Also, I will be doing this on many generated trees within a time window, so speed matters as well. Thanks :)
If you have 3 input values XX, YY, and ZZ, you are searching a 3-dimensional space. What you are looking to do is apply an optimisation algorithm, or heuristic algorithm. Your choice of algorithm is key: it is a cost-benefit trade-off between your time and the computer's time. I am guessing that you just want to do this once.
Whatever method you use, you need to understand the problem, which means understanding how your algorithm changes with different input values. Some solutions have a very nice minimum that is easy to find (e.g. using Newton's method), some don't.
I suggest starting simple. One of the most basic approaches is just to do an iterative search. It's slow, but it works. You need to make sure that your iteration step is not too large, so that you don't miss some sweet spots.
For XX = XXmin to XXmax
    For YY = YYmin to YYmax
        For ZZ = ZZmin to ZZmax
            result = GetRootNodeValue(XX, YY, ZZ)
            If result < best_result then
                print result, XX, YY, ZZ
                best_result = result
            End if
        End For
    End For
End For
Below is another method; it's a stochastic optimisation method (it uses random points to converge on the best solution), and its results are reasonable for most conditions. I have used this successfully and it's good at converging to the minimum value. This is good to use if there is no clear global minimum. You will have to configure the parameters for your problem.
' How long to search for; a larger value will result in a longer search time
max_depth = 20000

' Initial values
x0 = initial XX value
y0 = initial YY value
z0 = initial ZZ value

' These are the delta values, how far should the default values range
dx = 5
dy = 5
dz = 5

' Set this at a large value (assuming the best result is a small number)
best_result = inf

' Loop for a long time
For i = 1 To max_depth
    ' New random values near the best result
    xx = x0 + dx * (Rnd() - 0.5) * (Rnd() - 0.5)
    yy = y0 + dy * (Rnd() - 0.5) * (Rnd() - 0.5)
    zz = z0 + dz * (Rnd() - 0.5) * (Rnd() - 0.5)
    ' Do the test
    result = GetRootNodeValue(xx, yy, zz)
    ' We have found the best solution so far
    If result < best_result Then
        x0 = xx
        y0 = yy
        z0 = zz
        best_result = result
    End If
    Print progress
Next i
There are many optimisation algorithms to choose from. Above are some very simple ones, but they may not be the best for your problem.
As another answer has pointed out, this looks like an optimization problem. You may consider using a genetic algorithm. Basically, you try to mimic the process of evolution by "mating" different individuals (in your case trees) with different traits (in your case the values on the leaves) and making them survive based on an objective function (in your case, what you obtain at the root node). The algorithm can be improved by adding mutations to your population (as in nature's evolution).
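A very small sketch of that idea, assuming a hypothetical rate_tree(values) that plugs a candidate vector of XX values into the tree and returns the root rating (higher taken as better here):

import random

def genetic_search(num_free, rate_tree, pop_size=30, generations=200,
                   lo=-10.0, hi=10.0, mutation=0.1):
    # Start from a random population of candidate value vectors.
    pop = [[random.uniform(lo, hi) for _ in range(num_free)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=rate_tree, reverse=True)        # fittest first
        survivors = pop[:pop_size // 2]              # selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = [random.choice(pair) for pair in zip(a, b)]   # crossover
            if random.random() < mutation:                        # mutation
                i = random.randrange(num_free)
                child[i] += random.gauss(0, (hi - lo) * 0.05)
            children.append(child)
        pop = survivors + children
    return max(pop, key=rate_tree)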