Big O notation and branching factor - performance

Let's say that you are trying to figure out what the best path to take is. You have z possible moves and can make x moves at the same time. You always make exactly x moves at once, no more and no less. How can you figure out the branching factor in terms of x and z?

The branching factor in this example is 1. The size of the problem is not increasing: you had x options to start with, you followed them all, and you have the same number of available moves. You are effectively taking one step down each of x straight lines at once. No branching is occurring, unless I have misunderstood your question (which is possible, because I don't see what z has to do with it).

If you are generating x new states (one for each valid move you can make) at every node, then the branching factor is x if x is always less than z. If z is always less than x, then the branching factor is z (as you can only make valid moves).
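As a rough sketch of the rule in the answer above (the function names are made up for illustration): if every node expands min(x, z) children, the number of states at a given depth grows as that factor raised to the depth.

```python
def branching_factor(x: int, z: int) -> int:
    """Children generated per node: x moves are attempted, but only z are valid."""
    return min(x, z)


def states_at_depth(x: int, z: int, depth: int) -> int:
    """Number of states on one level of the search tree, assuming a constant factor."""
    return branching_factor(x, z) ** depth
```

This assumes the factor is the same at every node, which the question does not guarantee.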

Related

Robot moving in a grid algorithm possible paths and time complexity ?

I'm not able to understand why, for the problem below, the number of paths is (x+y)!/x!y!. I understand it comes from choosing x items out of a path of x+y items, but why is it not choosing x items out of x+y plus choosing y items out of x+y? Why does it have to be only x?
A robot is located at the top-left corner of a m x n grid (marked
‘Start’ in the diagram below). The robot can only move either down or
right at any point in time. The robot is trying to reach the
bottom-right corner of the grid (marked ‘Finish’ in the diagram
below). How many possible paths are there?
Are all of these paths unique ?
How do I determine that ?
And what would be the time complexity for the backtracking algorithm ?
This is somewhat based on Mukul Joshi's answer, but hopefully a little clearer.
To go from 0,0 to x,y, you need to move right exactly x times and down exactly y times.
Let each right movement be represented by a 0 and a down movement by a 1.
Let a string of 0s and 1s then indicate a path from 0,0 to x,y. This path will contain x 0s and y 1s.
Now we want to count all such strings. This is equivalent to counting the number of permutations of any string containing x 0s and y 1s. This string is a multiset (each element can appear more than once), thus we want a multiset permutation, which can be calculated as n!/(m1!m2!...mk!) where n is the total number of characters, k is the number of unique characters and mi is the number of times the ith unique character is repeated. Since there are x+y characters in total, and 0 is repeated x times and 1 is repeated y times, we get to (x+y)!/x!y!.
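To make the counting argument concrete, here is a small sketch that compares the multiset-permutation formula with a direct enumeration of the 0/1 strings. The function names are illustrative, and the enumeration is only practical for small x and y.

```python
from itertools import permutations
from math import factorial


def count_paths_formula(x: int, y: int) -> int:
    """(x+y)! / (x! y!): arrangements of x zeros and y ones."""
    return factorial(x + y) // (factorial(x) * factorial(y))


def count_paths_enumerated(x: int, y: int) -> int:
    """Brute force: generate every ordering and keep the distinct strings."""
    moves = "0" * x + "1" * y
    return len(set(permutations(moves)))
```

For a 2-right, 2-down trip both functions give 6, matching the strings 0011, 0101, 0110, 1001, 1010 and 1100.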
Time Complexity:
The time complexity of backtracking / brute force would involve having to explore all of these paths. Think of it as a tree, with there being (x+y)!/x!y! leaves. I might be wrong, but I think the number of nodes in trees with a branching factor > 1 can be represented as the big-O of the number of leaves, thus we end up with O((x+y)!/x!y!) nodes, and thus the same time complexity.
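A minimal backtracking sketch, assuming the search state is simply the number of right and down moves still to be made, shows the tree being described. It returns both the number of leaves (paths) and the number of nodes visited; the name explore is an assumption.

```python
def explore(rights_left: int, downs_left: int) -> tuple[int, int]:
    """Return (paths_found, nodes_visited) for the subtree below this state."""
    if rights_left == 0 and downs_left == 0:
        return 1, 1  # reached the finish: one path, one node
    paths, nodes = 0, 1  # count the current node
    if rights_left > 0:
        p, n = explore(rights_left - 1, downs_left)
        paths, nodes = paths + p, nodes + n
    if downs_left > 0:
        p, n = explore(rights_left, downs_left - 1)
        paths, nodes = paths + p, nodes + n
    return paths, nodes
```

Every leaf sits at depth x+y, so the node count can never exceed (x+y+1) times the leaf count.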
OK, I'll give you a solution to that problem so that you have an easier time understanding it.
First of all, let us decide on a solution algorithm. For every cell, we will count all possible paths that reach the end from it. The algorithm visits the cells and writes into each one the sum of its right and bottom neighbours. We do this because the robot can either move down and follow any of the paths from the bottom cell, or move right and follow any of the paths from the right cell, so the total number of different paths is the sum of the two. Proving that these paths are all distinct is straightforward. If you want, I can do it in the comments.
The initial value is 1 for the bottom-right cell (the finish), because there is only 1 way to get to the finish from this cell (not moving at all). If a cell doesn't exist (e.g. the bottom neighbour of a cell in the bottom row), it has a value of 0.
Building the cell values one by one results in a Pascal's triangle whose value is (x + y)! / x! / y! in cell (x, y), where x is the horizontal (Ox) distance from the finish and y is the vertical (Oy) distance.
As for complexity, we have x * y iterations over the grid cells, and each iteration takes constant time. If you don't want to use the backtracking algorithm, you can use the formula mentioned above and get O(x + y) instead of O(x * y).
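The cell-filling procedure above can be sketched as follows. The indexing is an assumption made for illustration: table[i][j] holds the count for the cell that is i steps to the left of the finish and j steps above it.

```python
def count_paths_dp(x: int, y: int) -> int:
    """Fill the grid from the finish outward; each cell is right + bottom."""
    table = [[0] * (y + 1) for _ in range(x + 1)]
    table[0][0] = 1  # the finish itself: one way (do not move)
    for i in range(x + 1):
        for j in range(y + 1):
            if i == 0 and j == 0:
                continue
            # A neighbour outside the grid contributes 0.
            from_right = table[i - 1][j] if i > 0 else 0
            from_below = table[i][j - 1] if j > 0 else 0
            table[i][j] = from_right + from_below
    return table[x][y]
```

The table has (x+1) * (y+1) entries and each is filled in constant time, which is where the O(x * y) bound comes from.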
Well here is the explanation.
To reach the destination, no matter how you go, the path has to cross m rows and n columns.
Suppose you represent a row move by 1 and a column move by 0. Your path is then a string of m+n characters, but it can contain only m 1s and n 0s.
If you have m+n different characters, the number of permutations is (m+n)!, but when characters repeat it becomes (m+n)!/m!n!. Refer to this
Of course these paths are unique. Test it on a 4*3 grid and you can see it.
You don't add "How many ways can I distribute my X moves?" to "How many ways can I distribute my Y moves?" for two reasons:
1. The distribution of X moves and Y moves are not independent. For each configuration of X moves, there is only 1 possible configuration of Y moves.
2. If they were independent, you wouldn't add them, you would multiply them. For example, if I have X different color shirts and Y different color pants, there are X * Y different combinations of shirts and pants.
Note that for #1 there is nothing special about X - I could just have easily chosen Y and said: "The distribution of Y moves and X moves are not independent. For each configuration of Y moves, there is only 1 possible configuration of X moves." Which is why, as others have pointed out, counting the number of ways to distribute your Y moves gives the same result as counting the number of ways to distribute your X moves.

Approximation of a common divisor closest to some value?

Say we have two numbers (not necessarily integers) x1 and x2, and the user inputs a number y. What I want to find is a number y' close to y such that x1 % y' and x2 % y' are very small (smaller than 0.02, for example, but let's call this number LIMIT). In other words, I don't need an optimal algorithm, just a good approximation.
I thank you all for your time and effort, that's really kind!
Let me explain what the problem is in my application : say, a screen size is given, with a width of screenWidth and a height of screenHeight (in pixels). I fill the screen with squares of a length y'. Say, the user wants the square size to be y. If y is not a divisor of screenWidth and/or screenHeight, there will be non-used space at the sides of the screen, not big enough to fit squares. If that non-used space is small (e.g. one row of pixels), it's not that bad, but if it's not, it won't look good. How can I find common divisors of screenWidth and screenHeight?
I don't see how you can ensure that x1%y' and x2%y' are both below some value - if x1 is prime, nothing is going to be below your limit (if the limit is below 1) except x1 (or very close) and 1.
So the only answer that always works is the trivial y'=1.
If you are permitting non-integer divisors, then just pick y'=1/(x1*x2): for integer x1 and x2 the remainder is always 0.
Without restricting the common divisor to integers, it can be anything, and the whole 'greatest common divisor' concept goes out the window.
x1 and x2 are not very large, so a simple brute force algorithm should be good enough.
Divide x1 and x2 by y and compute the floor and ceiling of each result. This gives four numbers: x1f and x1c (from x1/y), and y1f and y1c (from x2/y).
Choose whichever of these numbers is closest to the exact value of x1/y (for x1f, x1c) or x2/y (for y1f, y1c). Let it be x1f, for example. Set y' = x1/x1f. If neither x1%y' nor x2%y' is greater than the limit, you are done (y' is the best approximation). Otherwise add x1f - 1 to the pool of four numbers (or y1f - 1, or x1c + 1, or y1c + 1, depending on which number you used), choose the next closest number, and repeat.
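Here is a simplified brute-force sketch in the spirit of the procedure above, not a literal transcription of it. It generates every size of the form x1/k or x2/k, visits them in order of distance from y, and returns the first one whose remainders are both within the limit. The function name and the max_count cap are assumptions, and Fraction is used so the remainders are exact rather than subject to floating-point rounding.

```python
from fractions import Fraction


def closest_fitting_size(x1, x2, y, limit, max_count=1000):
    """Return the size y' nearest to y with x1 % y' <= limit and x2 % y' <= limit."""
    x1, x2, y, limit = (Fraction(v) for v in (x1, x2, y, limit))
    candidates = set()
    for count in range(1, max_count + 1):
        candidates.add(x1 / count)  # sizes that divide x1 exactly
        candidates.add(x2 / count)  # sizes that divide x2 exactly
    for size in sorted(candidates, key=lambda s: abs(s - y)):
        if x1 % size <= limit and x2 % size <= limit:
            return size
    return None  # nothing within the cap satisfied the limit
```

Pass the limit as a string such as "0.02" or as a Fraction, so that it is not distorted by binary floating point.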
You want to fit the maximum amount of evenly spaced squares inside a fixed area. It's possible to find the optimal solution for your problem with some simple math.
Let's say you have a region with width W and height H, and you are trying to fit squares with sides of length x. The maximum numbers of squares horizontally and vertically, which I will call max_hor and max_vert respectively, are max_hor=floor(W/x) and max_vert=floor(H/x). If you draw all the squares side by side, without any spacing, there will be a remainder in each row and each column. Let's call the horizontal and vertical remainders rest_w and rest_h. This case is illustrated in the figure below:
Note that rest_w=W-max_hor*x and rest_h=H-max_vert*x.
What you want is to divide rest_w and rest_h equally, generating small horizontal and vertical gaps of sizes space_w and space_h, as in the figure below:
Note that space_w=rest_w/(max_hor+1) and space_h=rest_h/(max_vert+1).
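The formulas above translate directly into a few lines of code. This is only a sketch: the function name square_layout is an assumption, and the variable names follow the answer.

```python
import math


def square_layout(W: float, H: float, x: float):
    """Return (max_hor, max_vert, space_w, space_h) for squares of side x."""
    max_hor = math.floor(W / x)   # squares that fit in one row
    max_vert = math.floor(H / x)  # squares that fit in one column
    rest_w = W - max_hor * x      # leftover width
    rest_h = H - max_vert * x     # leftover height
    # Spread the leftover over the gaps between squares and at both edges.
    space_w = rest_w / (max_hor + 1)
    space_h = rest_h / (max_vert + 1)
    return max_hor, max_vert, space_w, space_h
```

For a 1440 by 900 screen and squares of side 100, this gives 14 by 9 squares with 40 leftover pixels of width spread over 15 gaps and no leftover height.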
Is that the number you are looking for?
I believe I made a mistake, but I don't see why. Based on Phil H's answer, I decided to restrict to integer values, but multiply x1 and x2 by a power of 10. Afterwards, I'd divide the common integer divisors by that number.
Online, I found a common factors calculator. Experimenting with it made me realize it wouldn't give me any common divisors... I tried multiple cases (x1 = 878000 and x2 = 1440000 and some others), and none of them had good results.
In other words, you probably have to multiply with very high numbers to achieve results, but that would make the calculation very, very slow.
If anyone has a solution to this problem, that would be awesome. For now though, I decided to take advantage of the fact that screenWidth and screenHeight are good numbers to work with, since they are the dimension of a computer screen. 900 and 1440 have more than enough common divisors, so I can work with that...
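For integer screen dimensions, the common divisors are exactly the divisors of the greatest common divisor, so they can be listed directly. A small sketch, with function names chosen for illustration:

```python
from math import gcd


def common_divisors(a: int, b: int) -> list[int]:
    """All positive integers that divide both a and b."""
    g = gcd(a, b)
    return [d for d in range(1, g + 1) if g % d == 0]


def closest_common_divisor(a: int, b: int, target: float) -> int:
    """The common divisor nearest to the size the user asked for."""
    return min(common_divisors(a, b), key=lambda d: abs(d - target))
```

For 900 and 1440 the gcd is 180, which has 18 divisors. If the user asks for squares of size 100, the nearest common divisor is 90.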
Thank you all for your answers on this thread and on my previous thread about an optimal algorithm for this problem.

Cycle detection in a linked list : Exhaustive theory

This is NOT the problem about detecting cycle in a linked list using the famous Hare and Tortoise method.
In the Hare & Tortoise method we have pointers running at 1x and 2x speeds to determine whether they meet. I am convinced that it's the most efficient way and that this type of search is O(n).
The problem is that I have to come up with a proof (proving or disproving) that two pointers will always meet when their moving speeds are Ax (A times x) and Bx (B times x) and A is not equal to B, where A and B are two arbitrary integers and the pointers operate on a linked list with a cycle present.
This was asked in one of the interviews I recently attended, and I was not able to prove comprehensively to myself whether the above is possible. Any help appreciated.
Suppose there is a loop, say of length L.
Easy case first
To make it easier, first consider the case where the two particles enter the loop at the same time. These particles are at the same position whenever n*A = n*B (mod L) for some positive integer n, which is the number of steps until they meet again. Taking n=L gives one solution (though there may be a smaller solution). So after L units of time, particle A has made A trips around the loop to be back at the beginning and particle B has made B trips around the loop to be back at the beginning, where they happily collide.
General Case
Now what happens when they do not enter the loop at the same time? Let A be the slower particle, i.e. A<B, and suppose A enters the loop at time m, and let's call the position at which A enters the loop 0 (since they're in the loop, they can never leave it, so I'm just renaming positions by subtracting A*m, the distance A has traveled after m time units). Then, at that time, B is already at position m*(B-A) (its real position after m time units is B*m and its renamed position is therefore B*m-A*m). Then we need to show that there is a time n such that n*A = n*B+m*(B-A) (mod L). That is, we need a solution to the modular equation
(n+m) * (A-B) = 0 (mod L)
Taking n = k*L-m for k large enough that k*L>m does the trick, though again, there may be a smaller solution.
Therefore, yes, they always meet.
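The argument can also be checked empirically with a small simulation. This is a sketch under assumed conventions: the list has `tail` nodes before the loop and `loop` nodes in it, both pointers start at the head, and per tick they advance a and b nodes respectively.

```python
def meeting_time(tail: int, loop: int, a: int, b: int):
    """Ticks until both pointers land on the same node, or None if they do not
    within tail + loop ticks (by the argument above, they always should)."""
    last = tail + loop - 1  # index of the node that links back into the loop

    def advance(pos: int, steps: int) -> int:
        for _ in range(steps):
            pos = tail if pos == last else pos + 1
        return pos

    p = q = 0  # both pointers start at the head
    for tick in range(1, tail + loop + 1):
        p, q = advance(p, a), advance(q, b)
        if p == q:
            return tick
    return None
```

The cap of tail + loop ticks is enough because at any tick that is a multiple of the loop length and at least tail, both pointers are inside the loop and their travelled distances differ by a multiple of the loop length.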
If your two step-sizes have a common factor x: let's say the step sizes are Ax and Bx, then just consider the sequence you get from taking the original sequence and taking every x'th element. This new sequence has a cycle if and only if the original sequence does, and taking steps of size A and B on it is equivalent to taking steps of size Ax and Bx on the original sequence.
This reduction means that it's sufficient to prove that the algorithm works when A and B are coprime.
The hypothesis is false. For instance, if both pointers make leaps of an even size, the loop is also of even size, and distance between the pointers is odd, they will never meet.
UPD this is apparently an impossible situation. Because the two pointers start at the same point, the distance between them will always be even.

Minimize a function

Suppose you are given a function of a single variable and arguments a and b and are asked to find the minimum value that the function takes on the interval [a, b]. (You can assume that the argument is a double, though in my application I may need to use an arbitrary-precision library.)
In general this is a hard problem because functions can be weird. A simple version of this problem would be to minimize the function assuming that it is continuous (no gaps or jumps) and single-peaked (there is a unique minimum; to the left of the minimum the function is decreasing and to the right it is increasing). Is there a good way to solve this easier (but perhaps not easy!) problem?
Assume that the function may be difficult to calculate but not particularly expensive to store an answer that you've computed. (Obviously, it's better if you don't have to make giant arrays of key/value pairs.)
Bonus points for good ideas on improving the algorithm in the fortunate case in which it's nice (e.g.: derivative exists, function is smooth/analytic, derivative can be computed in closed form, derivative can be computed at no cost when the function is evaluated).
The version you describe, with a single minimum, is easy to solve.
The idea is this. Suppose that I have 3 points with a < b < c and f(b) < f(a) and f(b) < f(c). Then the true minimum is between a and c. Furthermore if I pick another point d somewhere in the interval, then I can throw away one of a or d and still have an interval with the true minimum in the middle. My approximations will improve exponentially quickly as I do more iterations.
We don't quite start with this. We start with 2 points, a and b, and know that the answer is somewhere in the middle. Take the mid-point. If f there is below the end points, we're into the case I discussed above. Otherwise it must be below one of the end points, and above the other. We can throw away the higher end point and repeat.
If the function is nice, i.e., single-peaked and strictly monotonic (i.e., strictly decreasing to the left of the minimum and strictly increasing to the right), then you can find the minimum with binary search:
Set x = (a+b)/2
test whether x is to the right of the minimum or to the left
if x is left of the minimum: a = x
if x is right of the minimum: b = x
repeat from start until you get bored
the minimum is at x
To test whether x is left/right of the minimum, invent a small value epsilon and check whether f(x - epsilon) < f(x + epsilon). If it is, the minimum is to the left, otherwise it's to the right. By "until you get bored", I mean: invent another small value delta and stop if fabs(f(x - epsilon) - f(x + epsilon)) < delta.
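A minimal sketch of this bisection idea, taking the midpoint of [a, b] and shrinking the interval toward the side on which the minimum lies. The default values of epsilon and delta and the iteration cap are assumptions, not tuned recommendations.

```python
def minimize_by_bisection(f, a, b, epsilon=1e-6, delta=1e-12, max_iter=200):
    """Approximate the minimizer of a single-peaked f on [a, b]."""
    x = (a + b) / 2
    for _ in range(max_iter):
        x = (a + b) / 2
        left, right = f(x - epsilon), f(x + epsilon)
        if abs(left - right) < delta:
            break  # f is essentially flat around x: "bored" enough
        if left < right:
            b = x  # f is rising through x, so the minimum is to the left
        else:
            a = x  # f is falling through x, so the minimum is to the right
    return x
```

Each round costs two evaluations of f and halves the interval, so roughly log2((b - a) / tolerance) rounds are needed.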
Note that in the general case where you don't know anything about the behavior of a function f, it's not possible to decide a non-trivial property of f. Well, unless you're willing to try all possible inputs. See Rice's Theorem for details.
The Boost project has an implementation of Brent's algorithm that may be useful.
It seems to assume that the function is continuous, and has no maxima (only a minimum) in the input interval.
Not a direct answer but a pointer to more reading:
scipy.optimize: http://docs.scipy.org/doc/scipy/reference/optimize.html
section e04 of naglib: http://www.nag.co.uk/numeric/cl/nagdoc_cl09/html/genint/libconts.html
For the special case where the function is differentiable twice (and the two derivatives can be calculated easily), one can use Newton's method for optimization, i.e. essentially finding the roots of the first derivative (which is a necessary condition for the minimum).
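As a sketch of that idea, assuming the caller can supply the first and second derivatives as functions (df and d2f are hypothetical parameter names), Newton's method repeatedly moves x by -f'(x)/f''(x):

```python
def newton_minimize(df, d2f, x0, tol=1e-12, max_iter=50):
    """Find a stationary point of f by applying Newton's method to f'."""
    x = x0
    for _ in range(max_iter):
        curvature = d2f(x)
        if curvature == 0:
            break  # the Newton step is undefined here
        step = df(x) / curvature
        x -= step
        if abs(step) < tol:
            break
    return x
```

For example, f(x) = x - ln(x) has f'(x) = 1 - 1/x and f''(x) = 1/x^2, and starting from x0 = 0.5 the iteration converges to the minimum at x = 1. The method only finds a point where the derivative vanishes, so the result still has to be checked to be a minimum, and a poor starting point can make it diverge.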
Concerning the general case, note that the extreme case of 'weird' is a function which is continuous nowhere and for which it is very hard if not impossible to find the minimum (in finite time). So I guess you should try to make at least some assumptions about the function you are trying to minimize.
What you want is to optimize a unimodal function. The correct algorithm is similar to btilly's, but you need extra points.
Take 4 points a < b < c < d.
We want to minimize f in [a,d].
If f(b) < f(c) we know the minimum is in [a, c]
If f(b) > f(c) we know the minimum is in [b, d]
This can give an algorithm by itself, but there is a nice trick involving the golden ratio that allows you to reuse the intermediate values (in a way you only need to compute f once per iteration instead of twice)
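The golden-ratio trick is usually called golden-section search. A minimal sketch, assuming f is unimodal on [a, d]; the tolerance default is an assumption:

```python
import math


def golden_section_min(f, a, d, tol=1e-6):
    """Approximate the minimizer of a unimodal f on [a, d]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    b = d - inv_phi * (d - a)
    c = a + inv_phi * (d - a)
    fb, fc = f(b), f(c)
    while d - a > tol:
        if fb < fc:
            # Minimum is in [a, c]; the old b becomes the new c.
            d, c, fc = c, b, fb
            b = d - inv_phi * (d - a)
            fb = f(b)
        else:
            # Minimum is in [b, d]; the old c becomes the new b.
            a, b, fb = b, c, fc
            c = a + inv_phi * (d - a)
            fc = f(c)
    return (a + d) / 2
```

Because the interior points are placed at golden-ratio positions, one of them can always be reused after the interval shrinks, so each iteration needs only one new evaluation of f.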
If you have an expression for the function, there are global optimization algorithms based on interval analysis.

Which algorithm will be required to do this?

I have data of this form:
for x=1, y is one of {1,4,6,7,9,18,16,19}
for x=2, y is one of {1,5,7,4}
for x=3, y is one of {2,6,4,8,2}
....
for x=100, y is one of {2,7,89,4,5}
Only one of the values in each set is the correct value, the rest is random noise.
I know that the correct values describe a sinusoid function whose parameters are unknown. How can I find the correct combination of values, one from each set?
I am looking for something like a "travelling salesman" combinatorial optimization algorithm.
You're trying to do curve fitting, for which there are several algorithms depending on the type of curve you want to fit your curve to (linear, polynomial, etc.). I have no idea whether there is a specific algorithm for sinusoidal curves (Fourier approximations), but my first idea would be to use a polynomial fitting algorithm with a polynomial approximation of the sine.
I wonder whether you need to do this in the course of another larger program, or whether you are trying to do this task on its own. If so, then you'd be much better off using a statistical package, my preferred one being R. It allows you to import your data and fit curves and draw graphs in just a few lines, and you could also use R in batch-mode to call it from a script or even a program (this is what I tend to do).
It depends on what you mean by "exactly", and what you know beforehand. If you know the frequency w, and that the sinusoid is unbiased, you have an equation
a cos(w * x) + b sin(w * x)
with two (x,y) points at different x values you can find a and b, and then check the generated curve against all the other points. Choose the two x values with the smallest number of y observations and try it for all the y's. If there is a bias, i.e. your equation is
a cos(w * x) + b sin(w * x) + c
You need to look at three x values.
If you do not know the frequency, you can try the same technique. Unfortunately the solutions may not be unique: there may be more than one w that fits.
Edit: As I understand your problem, you have a real y value for each x and a bunch of incorrect ones. You want to find the real values. The best way to do this is to fit curves through a small number of points and check to see if the curve fits some y value in the other sets.
If not all the x values have valid y values then the same technique applies, but you need to look at a much larger set of pairs, triples or quadruples (essentially every pair, triple, or quad of points with different y values)
If your problem is something else, and I suspect it is, please specify it.
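The fit-and-check idea for the unbiased case with a known frequency can be sketched as below. Everything here is an illustrative assumption: the function name, the dict-of-lists input format, and the tolerance. It solves for a and b from every pair of candidates at two anchor x values and keeps the curve that matches a candidate at the most x values.

```python
import math
from itertools import product


def pick_values(candidates, w, tol=1e-6):
    """candidates maps each x to its list of possible y values.

    Assumes y = a*cos(w*x) + b*sin(w*x) with known w and no bias term.
    Returns {x: chosen y} for the best-supported curve, or None.
    """
    # Anchor on the two x values with the fewest candidates.
    x1, x2 = sorted(candidates, key=lambda x: len(candidates[x]))[:2]
    c1, s1 = math.cos(w * x1), math.sin(w * x1)
    c2, s2 = math.cos(w * x2), math.sin(w * x2)
    det = c1 * s2 - s1 * c2
    if abs(det) < 1e-12:
        return None  # these two points do not determine a and b
    best, best_hits = None, -1
    for y1, y2 in product(candidates[x1], candidates[x2]):
        # Solve the 2x2 linear system for a and b (Cramer's rule).
        a = (y1 * s2 - s1 * y2) / det
        b = (c1 * y2 - c2 * y1) / det
        chosen, hits = {}, 0
        for x, ys in candidates.items():
            predicted = a * math.cos(w * x) + b * math.sin(w * x)
            nearest = min(ys, key=lambda y: abs(y - predicted))
            chosen[x] = nearest
            if abs(nearest - predicted) <= tol:
                hits += 1
        if hits > best_hits:
            best, best_hits = chosen, hits
    return best
```

With a bias term c the same approach would need three anchor points and a 3x3 system, and with real measurement error the tolerance would have to be much looser than the default used here.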
1. Define sinusoid. Most people take that to mean a function of the form a cos(w * x) + b sin(w * x) + c. If you mean something different, specify it.
2. Specify exactly what success looks like. An example with, say, 10 points instead of 100 would be nice.
3. It is extremely unclear what this has to do with combinatorial optimization.
Sinusoidal equations are so general that if you pick any random value from each set of y's, those values can be fitted by a sinusoidal function. Unless you impose conditions (e.g. frequency < 100, or all parameters are integers), it is theoretically not possible to differentiate noise from data, so work on finding such conditions from your data source or experiment first.
By sinusoidal, do you mean a function that is increasing for n steps, then decreasing for n steps, etc.? If so, you can model your data as a sequence of nodes connected by up-links and down-links. For each node (possible value of y), record the length and end-value of chains of only ascending or descending links (there will be multiple chains per node). Then you scan for consecutive runs of equal length and opposite direction, modulo some initial offset.

Resources