I can't see why the number of paths in the problem below is (x+y)!/(x!y!). I understand that it comes from choosing x items out of a path of x+y items, but why isn't it choosing x items out of x+y plus choosing y items out of x+y? Why does it have to be only x?
A robot is located at the top-left corner of a m x n grid (marked
‘Start’ in the diagram below). The robot can only move either down or
right at any point in time. The robot is trying to reach the
bottom-right corner of the grid (marked ‘Finish’ in the diagram
below). How many possible paths are there?
Are all of these paths unique?
How do I determine that?
And what would be the time complexity of the backtracking algorithm?
This is somewhat based on Mukul Joshi's answer, but hopefully a little clearer.
To go from 0,0 to x,y, you need to move right exactly x times and down exactly y times.
Let each right movement be represented by a 0 and a down movement by a 1.
Let a string of 0s and 1s then indicate a path from 0,0 to x,y. This path will contain x 0s and y 1s.
Now we want to count all such strings. This is equivalent to counting the number of permutations of any string containing x 0s and y 1s. This string is a multiset (each element can appear more than once), thus we want a multiset permutation, which can be calculated as n!/(m1!m2!...mk!) where n is the total number of characters, k is the number of unique characters and mi is the number of times the ith unique character is repeated. Since there are x+y characters in total, and 0 is repeated x times and 1 is repeated y times, we get to (x+y)!/x!y!.
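As a quick sanity check of the multiset-permutation count, here is a minimal Python sketch that enumerates the distinct 0/1 strings directly and compares the total with (x+y)!/(x!y!). The function names are my own, and the enumeration is only feasible for small x and y.

```python
from itertools import permutations
from math import factorial


def count_by_formula(x: int, y: int) -> int:
    """Multiset-permutation count: (x + y)! / (x! * y!)."""
    return factorial(x + y) // (factorial(x) * factorial(y))


def count_by_enumeration(x: int, y: int) -> int:
    """Count distinct strings with x zeros (right) and y ones (down)."""
    moves = "0" * x + "1" * y
    # permutations() treats equal characters as distinct, so deduplicate.
    return len(set(permutations(moves)))


if __name__ == "__main__":
    for x, y in [(1, 1), (2, 2), (3, 2)]:
        print(x, y, count_by_formula(x, y), count_by_enumeration(x, y))
```

Both columns should agree for every pair, which is the point of the argument: each distinct string is exactly one path.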
Time Complexity:
The time complexity of backtracking / brute force would involve having to explore all of these paths. Think of it as a tree, with there being (x+y)!/x!y! leaves. I might be wrong, but I think the number of nodes in trees with a branching factor > 1 can be represented as the big-O of the number of leaves, thus we end up with O((x+y)!/x!y!) nodes, and thus the same time complexity.
OK, here is a solution to the problem that should make it easier to grasp.
First of all, let us decide on an algorithm. For every cell, we count all possible paths from that cell to the finish. The algorithm visits the cells and writes into each one the sum of the values in the cell to its right and the cell below it. We do this because the robot can either move down and follow any of the paths from the lower cell, or move right and follow any of the paths from the right-hand cell, so the two counts add up. That these paths are all distinct seems fairly obvious to me. If you want, I can prove it in the comments.
The initial value is 1 for the bottom-right cell (the finish), because there is only 1 way to get there from this cell (not moving at all). If a cell doesn't exist (e.g. the cell below a bottom-row cell), it counts as 0.
Building the cell values one by one produces Pascal's triangle, whose value in cell (x, y) is (x + y)! / x! / y!, where x is the Ox distance from the finish and y is the Oy distance.
As for complexity, we have x * y iterations over grid cells, and each iteration takes constant time. If you don't want to use a backtracking algorithm, you can use the formula mentioned above and have O(x + y) instead of O(x * y).
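The table-filling idea described in this answer can be sketched in Python as follows. This is only an illustration under my own naming: `right_moves` and `down_moves` are the Ox and Oy distances from the start to the finish, and the result is compared against the closed formula via `math.comb`.

```python
from math import comb


def count_paths_dp(right_moves: int, down_moves: int) -> int:
    """Fill the table from the finish cell back to the start cell."""
    # paths[i][j] = number of paths to the finish from the cell that is
    # i steps to the left of it and j steps above it.
    paths = [[0] * (down_moves + 1) for _ in range(right_moves + 1)]
    for i in range(right_moves + 1):
        for j in range(down_moves + 1):
            if i == 0 and j == 0:
                paths[i][j] = 1  # already at the finish: one (empty) path
                continue
            via_right = paths[i - 1][j] if i > 0 else 0  # missing cell -> 0
            via_down = paths[i][j - 1] if j > 0 else 0   # missing cell -> 0
            paths[i][j] = via_right + via_down
    return paths[right_moves][down_moves]


if __name__ == "__main__":
    for x, y in [(2, 2), (3, 2), (6, 4)]:
        print(x, y, count_paths_dp(x, y), comb(x + y, x))
```

The table costs O(x * y) additions, while the formula needs only the factorials up to x + y.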
Well here is the explanation.
To reach the destination, no matter which way you go, the path has to cover m rows and n columns.
Consider that you represent a row by 1 and a column by 0. Your path is a string of m+n characters, but it can have only m 1s and n 0s.
If you have m+n different characters, the number of permutations is (m+n)!, but when you have repeated characters it is (m+n)!/(m!n!). Refer to this
Of course these will be unique. Test it on a 4*3 grid and you can see it.
You don't add "How many ways can I distribute my X moves?" to "How many ways can I distribute my Y moves?" for two reasons:
1. The distributions of X moves and Y moves are not independent. For each configuration of X moves, there is only 1 possible configuration of Y moves.
2. If they were independent, you wouldn't add them, you would multiply them. For example, if I have X different color shirts and Y different color pants, there are X * Y different combinations of shirts and pants.
Note that for #1 there is nothing special about X - I could just as easily have chosen Y and said: "The distributions of Y moves and X moves are not independent. For each configuration of Y moves, there is only 1 possible configuration of X moves." Which is why, as others have pointed out, counting the number of ways to distribute your Y moves gives the same result as counting the number of ways to distribute your X moves.
Most of the implementations of the algorithm to find the closest pair of points in the plane that I've seen online have one of two deficiencies: either they fail to meet an O(nlogn) runtime, or they fail to accommodate the case where some points share an x-coordinate. Is a hash map (or equivalent) required to solve this problem optimally?
Roughly, the algorithm in question is (per CLRS Ch. 33.4):
For an array of points P, create additional arrays X and Y such that X contains all points in P, sorted by x-coordinate and Y contains all points in P, sorted by y-coordinate.
Divide the points in half - drop a vertical line so that you split X into two arrays, XL and XR, and divide Y similarly, so that YL contains all points left of the line and YR contains all points right of the line, both sorted by y-coordinate.
Make recursive calls for each half, passing XL and YL to one and XR and YR to the other, and finding the minimum distance, d in each of those halves.
Lastly, determine if there's a pair with one point on the left and one point on the right of the dividing line with distance smaller than d; through a geometric argument, we find that we can adopt the strategy of just searching through the next 7 points for every point within distance d of the dividing line, meaning the recombination of the divided subproblems is only an O(n) step (even if it looks like O(n^2) at first glance).
This has some tricky edge cases. One way people deal with this is sorting the strip of points within distance d of the dividing line at every recombination step (e.g. here), but this is known to result in an O(n log^2 n) solution.
Another way people deal with edge cases is by assuming each point has a distinct x-coordinate (e.g. here): note the snippet in closestUtil which adds to Pyl (or YL as we call it) if the x-coordinate of a point in Y is <= the line, or to Pyr (YR) otherwise. Note that if all points lie on the same vertical line, this would result in us writing past the end of the array in C++, as we write all n points to YL.
So the tricky bit when points can have the same x-coordinate is dividing the points in Y into YL and YR depending on whether a point p in Y is in XL or XR. The pseudocode in CLRS for this is (edited slightly for brevity):
for i = 1 to Y.length
    if Y[i] in X_L
        Y_L.length = Y_L.length + 1;
        Y_L[Y_L.length] = Y[i]
    else Y_R.length = Y_R.length + 1;
         Y_R[Y_R.length] = Y[i]
However, outside of pseudocode, if we're working with plain arrays, we don't have a magic function that can determine whether Y[i] is in X_L in O(1) time. If we're assured that all x-coordinates are distinct, sure - we know that anything with an x-coordinate less than the dividing line is in XL, so with one comparison we know which array to partition any point p in Y into. But in the case where x-coordinates are not necessarily distinct (e.g. when the points all lie on the same vertical line), do we require a hash map to determine whether a point in Y is in XL or XR and successfully break down Y into YL and YR in O(n) time? Or is there another strategy?
Yes, there are at least two approaches that work here.
The first, as Bing Wang suggests, is to apply a rotation. If the angle is sufficiently small, this amounts to breaking ties by y coordinate after comparing by x, no other math needed.
The second is to adjust the algorithm on G4G to use a linear-time partitioning algorithm to divide the instance, and a linear-time sorted merge to conquer it. Presumably this was not done because the author valued the simplicity of sorting relative to the previously mentioned algorithms in most programming languages.
Kleinberg & Tardos suggest annotating each point with its position (index) in X.
You could do this in O(n) time, or, if you really, really want to, you could do it "for free" in the sorting operation.
With this annotation, you could do your O(1) partitioning, and then take the position pr of the right-most point in Xl in O(1), using it to determine whether a point in Y goes in Yl (position <= pr) or Yr (position > pr). This does not require an extra data structure like a hash map, but it does require that those same positions are used in X and Y.
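To make the annotation idea concrete, here is a rough Python sketch of a single divide step. It is only an illustration: the function name `split_by_rank` is mine, points are assumed to be (x, y) tuples, ties in x are broken by y when sorting, and for self-containment both sorts happen inside the function, whereas the real algorithm would sort once up front and carry the positions down the recursion.

```python
def split_by_rank(points):
    """Return (XL, XR, YL, YR) for one divide step of closest-pair."""
    n = len(points)
    # Sort indices by (x, y); a point's position in this order is its rank.
    order_x = sorted(range(n), key=lambda i: points[i])
    rank = [0] * n
    for position, i in enumerate(order_x):
        rank[i] = position

    # Y order: by y-coordinate (x only breaks ties deterministically).
    order_y = sorted(range(n), key=lambda i: (points[i][1], points[i][0]))

    mid = (n + 1) // 2
    pr = mid - 1  # rank of the right-most point in XL
    xl = [points[i] for i in order_x[:mid]]
    xr = [points[i] for i in order_x[mid:]]
    # One O(1) rank comparison per point: O(n) in total, and the
    # y-order of the input is preserved in both halves.
    yl = [points[i] for i in order_y if rank[i] <= pr]
    yr = [points[i] for i in order_y if rank[i] > pr]
    return xl, xr, yl, yr


if __name__ == "__main__":
    # All points on one vertical line: the distinct-x shortcut fails here.
    print(split_by_rank([(0, 3), (0, 1), (0, 2), (0, 0)]))
```

Because ranks are attached to array indices rather than to coordinate values, even exact duplicate points end up split evenly between the two halves.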
NB:
It is not immediately obvious to me that the partitioning of Y is the only problem that arises when multiple points have the same coordinate on the x-axis. It seems to me that the proof of linearity of the comparisons necessary across partitions breaks, but I have seen only the proof that you need only 15 comparisons, not the proof for the stricter 7-point version, so I cannot be sure.
Given this question, what about the special case when the start point and end point are the same?
Another change in my case is that we must move at every step. How many such paths can be found and what would be the most efficient approach? I guess this would be a random walk of some sort?
My thinking so far is that, since we must always return to our starting point, thinking about n/2 might be easier. At every step, except at step n/2, we have 6 choices. At n/2 we have a different number of choices depending on whether n is even or odd. We also have a different number of choices depending on where we are (what previous choices we made). For example, if n is even and we went straight out, we only have one choice at n/2: going back. But if n is even and we didn't go straight out, we have more choices.
It is all the cases at this turning point that I have trouble getting straight.
Am I on the right track?
To be clear, I just want to count the paths. So I guess we are looking for some conditioned permutation?
This version of the combinatorial problem looks like it actually has a short formula as an answer.
Nevertheless, the general version, both this and the original question's, can be solved by dynamic programming in O (n^3) time and O (n^2) memory.
Consider a hexagonal grid which spans at least n steps in all directions from the target cell.
Introduce a coordinate system, so that every cell has coordinates of the form (x, y).
Let f (k, x, y) be the number of ways to arrive at cell (x, y) from the starting cell after making exactly k steps.
These can be computed either recursively or iteratively:
f (k, x, y) is just the sum of f (k-1, x', y') for the six neighboring cells (x', y').
The base case is f (0, xs, ys) = 1 for the starting cell (xs, ys), and f (0, x, y) = 0 for every other cell (x, y).
The answer for your particular problem is the value f (n, xs, ys).
The general structure of an iterative solution is as follows:
let f be an array [0..n] [-n-1..n+1] [-n-1..n+1] (all inclusive) of integers
f[0][*][*] = 0
f[0][xs][ys] = 1
for k = 1, 2, ..., n:
    for x = -n, ..., n:
        for y = -n, ..., n:
            f[k][x][y] =
                f[k-1][x-1][y] +
                f[k-1][x][y-1] +
                f[k-1][x+1][y] +
                f[k-1][x][y+1]
answer = f[n][xs][ys]
OK, I cheated here: the solution above is for a rectangular grid, where the cell (x, y) has four neighbors.
The six neighbors of a hexagon depend on how exactly we introduce a coordinate system.
I'd prefer other coordinate systems than the one in the original question.
This link gives an overview of the possibilities, and here is a short summary of that page on StackExchange, to protect against link rot.
My personal preference would be axial coordinates.
Note that, if we allow standing still instead of moving to one of the neighbors, that just adds one more term, f[k-1][x][y], to the formula.
The same goes for using triangular, rectangular, or hexagonal grid, for using 4 or 8 or some other subset of neighbors in a grid, and so on.
If you want to arrive to some other target cell (xt, yt), that is also covered: the answer is the value f[n][xt][yt].
Similarly, if you have multiple start or target cells, and you can start and finish at any of them, just alter the base case or sum the answers in the cells.
The general layout of the solution remains the same.
This obviously works in n * (2n+1) * (2n+1) * number-of-neighbors, which is O(n^3) for any constant number of neighbors (4 or 6 or 8...) a cell may have in our particular problem.
Finally, note that, at step k of the main loop, we need only two layers of the array f: f[k-1] is the source layer, and f[k] is the target layer.
So, instead of storing all layers for the whole time, we can store just two layers, as we don't need more: one for odd k and one for even k.
Using only two layers is as simple as changing all f[k] and f[k-1] to f[k%2] and f[(k-1)%2], respectively.
This lowers the memory requirement from O(n^3) down to O(n^2), as advertised in the beginning.
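Putting the pieces above together for the hexagonal case, here is a minimal Python sketch. It assumes axial coordinates, so the six neighbour offsets are the ones listed in `HEX_STEPS`, and it uses the two-layer trick. The function name and the choice of (0, 0) as the starting cell are mine.

```python
# Six neighbour offsets of a hex cell in axial coordinates (an assumption;
# other coordinate systems give a different list).
HEX_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def count_hex_walks(n, target=(0, 0)):
    """Number of n-step walks from (0, 0) to target on a hexagonal grid."""
    tx, ty = target
    if max(abs(tx), abs(ty)) > n:
        return 0  # outside the table, hence unreachable in n steps
    size = 2 * n + 3  # coordinates -n-1 .. n+1, as in the pseudocode
    off = n + 1       # index 0 corresponds to coordinate -n-1
    # Only two layers are kept: one for even k, one for odd k.
    layers = [[[0] * size for _ in range(size)] for _ in range(2)]
    layers[0][off][off] = 1
    for k in range(1, n + 1):
        src, dst = layers[(k - 1) % 2], layers[k % 2]
        for x in range(-n, n + 1):
            for y in range(-n, n + 1):
                # The border cells at -n-1 and n+1 are never written,
                # so they always contribute 0.
                dst[x + off][y + off] = sum(
                    src[x + dx + off][y + dy + off] for dx, dy in HEX_STEPS
                )
    return layers[n % 2][tx + off][ty + off]


if __name__ == "__main__":
    print([count_hex_walks(n) for n in range(5)])
```

Allowing the robot to stand still would only require adding (0, 0) to `HEX_STEPS`.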
For a more mathematical solution, here are some steps that would perhaps lead to one.
First, consider the following problem: what is the number of ways to go from (xs, ys) to (xt, yt) in n steps, each step moving one square north, west, south, or east?
To arrive from x = xs to x = xt, we need H = |xt - xs| steps in the right direction (without loss of generality, let it be east).
Similarly, we need V = |yt - ys| steps in another right direction to get to the desired y coordinate (let it be south).
We are left with k = n - H - V "free" steps, which can be split arbitrarily into pairs of north-south steps and pairs of east-west steps.
Obviously, if k is odd or negative, the answer is zero.
So, for each possible split k = 2h + 2v of "free" steps into horizontal and vertical steps, what we have to do is construct a path of H+h steps east, h steps west, V+v steps south, and v steps north. These steps can be done in any order.
The number of such sequences is a multinomial coefficient, and is equal to n! / (H+h)! / h! / (V+v)! / v!.
To finally get the answer, just sum these over all possible h and v such that k = 2h + 2v.
This solution calculates the answer in O(n) if we precalculate the factorials, also in O(n), and consider all arithmetic operations to take O(1) time.
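The sum over splits for the rectangular grid can be sketched in Python like this. It is an illustration only: the function name is mine, and `dx`, `dy` stand for xt - xs and yt - ys. For brevity it calls `factorial` directly instead of precalculating a table.

```python
from math import factorial


def count_square_walks(n, dx, dy):
    """n-step walks on a square grid (N/W/S/E moves) with net offset (dx, dy)."""
    H, V = abs(dx), abs(dy)
    k = n - H - V  # "free" steps that must cancel out in pairs
    if k < 0 or k % 2 == 1:
        return 0
    total = 0
    for h in range(k // 2 + 1):  # h east-west pairs, v north-south pairs
        v = k // 2 - h
        # Multinomial coefficient n! / ((H+h)! h! (V+v)! v!)
        total += factorial(n) // (
            factorial(H + h) * factorial(h) * factorial(V + v) * factorial(v)
        )
    return total


if __name__ == "__main__":
    print(count_square_walks(4, 0, 0))  # closed walks of length 4
```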
For a hexagonal grid, a complicating feature is that there is no such clear separation into horizontal and vertical steps.
Still, given the starting cell and the number of steps in each of the six directions, we can find the final cell, regardless of the order of these steps.
So, a solution can go as follows:
Enumerate all possible partitions of n into six summands a1, ..., a6.
For each such partition, find the final cell.
For each partition where the final cell is the cell we want, add multinomial coefficient n! / a1! / ... / a6! to the answer.
Just so, this takes O(n^6) time and O(1) memory.
By carefully studying the relations between different directions on a hexagonal grid, perhaps we can actually consider only the partitions which arrive at the target cell, and completely ignore all other partitions.
If so, this solution can be optimized into at least some O(n^3) or O(n^2) time, maybe further with decent algebraic skills.
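For reference, the unoptimized enumeration could look like the Python sketch below. It is meant only for small n. It assumes axial coordinates with the six offsets in `HEX_STEPS` and a start at (0, 0), both of which are my choices. Since the sixth summand is determined by the first five, the loop runs over (n+1)^5 tuples.

```python
from itertools import product
from math import factorial

# Six step directions of a hex grid in axial coordinates (an assumption).
HEX_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def count_hex_walks_multinomial(n, target=(0, 0)):
    """Sum multinomial coefficients over step counts that reach target."""
    total = 0
    for first_five in product(range(n + 1), repeat=5):
        last = n - sum(first_five)  # a6 is whatever remains of n
        if last < 0:
            continue
        counts = first_five + (last,)
        # The final cell depends only on how often each direction is used.
        x = sum(c * dx for c, (dx, _) in zip(counts, HEX_STEPS))
        y = sum(c * dy for c, (_, dy) in zip(counts, HEX_STEPS))
        if (x, y) != target:
            continue
        ways = factorial(n)  # n! / (a1! * ... * a6!) orderings
        for c in counts:
            ways //= factorial(c)
        total += ways
    return total


if __name__ == "__main__":
    print([count_hex_walks_multinomial(n) for n in range(5)])
```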
I am trying to create a grid with n separate labels, where each cell is labelled with one of the n labels such that all labels neighbour (edge-wise) all other labels somewhere in the grid (I don't care where). Labels are free to appear as many times as necessary, and I'd like the grid to be as small as possible. As an example, here's a grid for five labels, 1 to 5:
3 2 4
5 1 3
2 4 5
While generating this by hand is not too bad for small numbers of labels, it appears to be very hard to generate a grid of reasonable size for larger numbers and so I'm looking to write a program to generate them, without having to resort to a brute-force search. I imagine this must have been investigated before, but the closest I've found are De Bruijn tori, which are not quite what I'm looking for. Any help would be appreciated.
EDIT: Thanks to Benawii for the following improved description:
"Given an integer n, generate the smallest possible matrix where for every pair (x,y) where x≠y and x,y ∈ {1,...,n} there exists a pair of adjacent cells in the matrix whose values are x and y."
You can experiment with a simple greedy algorithm.
I don't think I can give you a strict mathematical proof, if only because the question is not strictly defined, but the algorithm is quite intuitive.
First, if you have 1...K numbers (K labels) then you need at least K*(K-1)/2 adjacent cells (connections) for full coverage. A matrix of size NxM generates (N-1)*M+(M-1)*N=2*N*M-(N+M) connections.
Since you didn't mention what you mean by 'smallest matrix', let's assume that you meant the area. In that case it is clear that, for a given area, a square matrix generates a larger number of connections because it has more 'inner' cells adjacent to 4 others. For example, for area 16 the 4x4 matrix is better than 2x8. 'Better' is intuitive - more connections and more chances to reach the goal. So let's target square matrices and expand them if needed. The above formula then becomes 2*N*(N-1).
Then we can experiment with the following greedy algorithm:
1. For the input number K, find the N such that 2*N*(N-1)>K*(K-1)/2. A simple school equation.
2. Keep an adjacency matrix M; set M[i][i]=1 for all i, and 0 for the rest of the pairs.
3. Initialize a resulting matrix R of size NxN, filled with 'empty value' markers, for example -1.
4. Start from the top-left corner and iterate right and down:
   for (int i = 0; i < N; ++i)
       for (int j = 0; j < N; ++j)
           R[i][j];
5. For each such R[i][j] (which is -1 now), find the value that will 'fit best'. Again, 'fit best' is an intuitive definition; here we mean a value that will contribute a new, unused connection. To that end, build the set S of the numbers in the already filled neighbor cells; its size is 2 at most (upper and left neighbor). Then find the first k such that M[x][k]=0 for both numbers x in S. If there is no such number, try for at least one new connection. If there is no number even for one, then both neighbors are completely covered; put some still-uncovered number here - probably the one in the 'worst situation', i.e. the x for which Sum(M[x][i]) is the smallest. You should also choose the one in the 'worst situation' whenever there are several candidates to choose from.
6. After setting the value for R[i][j], don't forget to mark the new connections with the numbers x from S: M[R[i][j]][x] = M[x][R[i][j]] = 1.
7. If the matrix is filled and there are still unmarked connections in M, append another row to the matrix and continue. If all the connections are found before the end, remove the extra rows.
You can check this algorithm and see what will happen. Step 5 is the place for playing around, particularly in guessing which one to choose in equal situation (several numbers can be in equally 'worst situation').
Example:
for K=6 we need 15 connections:
N=4, we need 4x4 square matrix. The theory says that 4x3 matrix has 17 connections, so it can possibly fit, but we will try 4x4.
Here is the output of the algorithm above:
1234
5615
2413
36**
I'm not sure if you can do it with 4x3, maybe yes... :)
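The counting formulas and the two example grids in this thread can be checked with a few lines of Python. This is just a helper sketch, not the greedy algorithm itself. All names are mine, `None` stands for an unfilled cell (the `*` above), and `min_square_side` uses "at least" rather than a strict inequality, which is my assumption.

```python
def connection_count(rows, cols):
    """Number of edge-adjacent cell pairs in a rows x cols matrix."""
    return (rows - 1) * cols + (cols - 1) * rows


def min_square_side(k):
    """Smallest N with 2*N*(N-1) >= k*(k-1)/2 (assumed non-strict)."""
    needed = k * (k - 1) // 2
    n = 1
    while 2 * n * (n - 1) < needed:
        n += 1
    return n


def missing_pairs(grid, k):
    """Label pairs from 1..k that are not edge-adjacent anywhere in grid."""
    seen = set()
    for i, row in enumerate(grid):
        for j, a in enumerate(row):
            # Looking only down and right visits every adjacent pair once.
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < len(grid) and nj < len(grid[ni]):
                    b = grid[ni][nj]
                    if a is not None and b is not None and a != b:
                        seen.add(frozenset((a, b)))
    return [
        (a, b)
        for a in range(1, k + 1)
        for b in range(a + 1, k + 1)
        if frozenset((a, b)) not in seen
    ]


if __name__ == "__main__":
    five = [[3, 2, 4], [5, 1, 3], [2, 4, 5]]
    six = [[1, 2, 3, 4], [5, 6, 1, 5], [2, 4, 1, 3], [3, 6, None, None]]
    print(missing_pairs(five, 5), missing_pairs(six, 6))
```

An empty list from `missing_pairs` means every pair of labels is adjacent somewhere, so the same check can be used to validate whatever a greedy run produces.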
Imagine you have a dancing robot in n-dimensional euclidean space starting at origin P_0 = (0,0,...,0).
The robot can make m types of dance moves D_1, D_2, ..., D_m
D_i is an n-vector of integers (D_i_1, D_i_2, ..., D_i_n)
If the robot makes dance move i, then its position changes by D_i:
P_{t+1} = P_t + D_i
The robot can make any of the dance moves as many times as he wants and in any order.
Let a k-dance be defined as a sequence of k dance moves.
Clearly there are m^k possible k-dances.
We are interested to know the set of possible end positions of a k-dance, and for each end position, how many k-dances end at that location.
One way to do this is as follows:
P0 = (0, 0, ..., 0);
S[0][P0] = 1
for I in 1 to k
    for J in 1 to m
        for P in S[I-1]
            S[I][P + D_J] += S[I-1][P]
Now S[k][Q] will tell you how many k-dances end at position Q
Assume that n, m, |D_i| are small (less than 5) and k is less than 40.
Is there a faster way? Can we calculate S[k][Q] "directly" somehow with some sort of linear algebra related trick? or some other approach?
You could create an adjacency matrix that contains the dance-move transitions in your space (only the part of it that's reachable in k moves, otherwise it would be infinite). Then the P_0 row of the k-th power of this matrix contains the S[k] values.
The matrix in question quickly gets enormous, something like (k*(max(D_i_j)-min(D_i_j)))^n (every dimension can be halved if Q is close to the origin), but that's true for your S matrix as well.
Since dance moves are interchangeable, you can assume that for any i < j the robot makes all the D_i moves before the D_j moves, thus reducing the number of combinations you actually have to calculate.
If you keep track of the number of times each dance move was made calculating the total number of combinations should be easy.
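Here is a rough Python sketch of that idea next to the step-by-step table from the question, so the two can be compared. The function names are mine, and moves are assumed to be tuples of integers. The multiset version looks at C(k+m-1, m-1) combinations of move counts instead of m^k sequences, and weights each by the number of orderings k!/(c_1! ... c_m!).

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial


def end_positions_dp(moves, k):
    """Step-by-step table: position -> number of k-dances ending there."""
    current = Counter({(0,) * len(moves[0]): 1})
    for _ in range(k):
        nxt = Counter()
        for pos, ways in current.items():
            for move in moves:
                nxt[tuple(p + d for p, d in zip(pos, move))] += ways
        current = nxt
    return current


def end_positions_by_multiset(moves, k):
    """Same table, built from how often each move is used."""
    dim = len(moves[0])
    result = Counter()
    # Each sorted index tuple is one way to pick k moves ignoring order.
    for chosen in combinations_with_replacement(range(len(moves)), k):
        counts = Counter(chosen)
        orderings = factorial(k)
        for c in counts.values():
            orderings //= factorial(c)
        pos = tuple(
            sum(moves[i][d] * c for i, c in counts.items()) for d in range(dim)
        )
        result[pos] += orderings
    return result


if __name__ == "__main__":
    demo_moves = [(1, 0), (0, 1), (-1, 0), (0, -1)]
    print(end_positions_by_multiset(demo_moves, 4)[(0, 0)])
```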
Since the 1-dimensional problem is closely related to the subset sum problem, you could probably take a similar approach - find all of the combinations of dance vectors that add together to have the correct first coordinate with exactly k moves; then take that subset of combinations and check to see which of those have the right sum for the second, and take the subset which matches both and check it for the third, and so on.
In this way, you get to at least only have to perform a very simple addition for the extremely painful O(n^k) step. It will indeed find all of the vectors which will hit a given value.
Given a 2-D grid starting at (0,0) and extending to infinity along the positive x and y axes, and given a number k > 0, find the number of cells reachable from (0,0) such that at every moment (sum of digits of x) + (sum of digits of y) <= k. Moves can be up, down, left, or right, with x, y >= 0. DFS gives the answer but is not fast enough for large values of k. Can anyone help me with a better algorithm for this?
I think they asked you to calculate the number of cells (x,y) reachable with k>=x+y. If x=1 for example, then y can take any number between 0 and k-1 and the sum would be <=k. The total number of possibilities can be calculated by
sum(sum(1,y=0..k-x),x=0..k) = 1/2*k²+3/2*k+1
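A quick way to convince yourself of the closed form under this first interpretation (plain x + y <= k, no digit sums) is to compare it with a direct count. A small Python sketch, with function names of my own:

```python
def count_cells_brute(k):
    """Count cells (x, y) with x, y >= 0 and x + y <= k directly."""
    return sum(1 for x in range(k + 1) for y in range(k + 1) if x + y <= k)


def count_cells_formula(k):
    """Closed form k^2/2 + 3k/2 + 1, written with integer arithmetic."""
    return (k * k + 3 * k + 2) // 2


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, count_cells_brute(k), count_cells_formula(k))
```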
That should be able to do the trick for large k.
I am somewhat confused by the "digits" in your question. The digits make up the index, e.g. three 9s make 999. The sum of digits for the cell (999,888) would be 51. If you allowed the sum of digits to be 10^9, then you could potentially have 10^8 digits for an index, resulting in something around 10^(10^8) entries, well beyond normal sizes for a table. I am therefore assuming my first interpretation. If that's not correct, could you explain it a bit more?
EDIT:
Okay, so my answer is not going to solve it. I'm afraid I don't see a nice formula or answer. I would approach it as a coloring/marking problem and mark all valid cells, then use some other technique to make sure all the parts are connected and to count them.
I have tried to come up with something, but it's too messy. Basically I would try to mark large parts at once based on the index and k. If k=20, you can mark the cell range (0,0..299) at once (as any lower index will have a lower digit sum) and continue to check the rest of the range. I start with 299 by fixing the last 2 digits to their maximum value and looking for the maximum value of the first digit. Then I continue that process for the remaining hundreds (300-999) and fix only the last digit, to end up with 300..389 and 390..398. However, you can already see that it's a mess... (nevertheless I wanted to give it to you, as you might get a better idea from it).
Another thing you can see immediately is that your problem is symmetric in the index, so any valid cell (x,y) tells you there's another valid cell (y,x). In a marking scheme / DFS / BFS this can be exploited.
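For comparison, a plain BFS over the valid cells looks like the Python sketch below. It is only the straightforward baseline for the digit-sum interpretation, without the range-marking or symmetry ideas, and the function names are mine. It terminates because every column x = 99...9 with more than k/9 nines is blocked for all y, and likewise for rows.

```python
from collections import deque


def digit_sum(value):
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(ch) for ch in str(value))


def count_reachable(k):
    """Cells reachable from (0, 0) with digit_sum(x) + digit_sum(y) <= k."""
    if k < 0:
        return 0
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nx < 0 or ny < 0 or (nx, ny) in seen:
                continue
            if digit_sum(nx) + digit_sum(ny) <= k:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return len(seen)


if __name__ == "__main__":
    print([count_reachable(k) for k in range(1, 9)])
```

For k below 9 no index can reach two digits, so the reachable region is just the triangle x + y <= k.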