Longest Increasing Subsequence variations dynamic programming - algorithm

I have this question:
Given the following:
A = [9,6,9,3,8,9,2,0,4,12]
C = [r,g,r,g,r,g,r,r,r,g]
Where
- r = red
- g = green
This list represents the color of the number at the same index in array A, i.e. A[0] = 9 = red, A[1] = 6 = green, ...
We need to pick a number N to start. If the current number N is green, we can only move right (by any number of positions) to a number that is >= N, i.e. greater than the current one.
If the current number N is red, we can only move left (by any number of positions) to a number that is >= N, i.e. greater than the current one.
Objective: find the longest sequence of moves possible and return the indices of the path. If there are multiple longest sequences of the same length, return any one of them:
Example 1:
A = [9,6,9,3,8,9,2,0,4,12]
C = [r,g,r,g,r,g,r,r,r,g]
output: [7,6,3,8,1,4,0]
Example 2:
A = [1,2,3,4,5,6,7,10]
C =[r,r,r,r,r,r,r,r]
output:[7]
Example 3:
A = [5,3,2,0,24,9,20]
C = [g,g,g,g,r,r,g]
output: [0,5,4]
Current idea of my algorithm:
Consider the possible moves for every element in A. For the first example, A[0] = 9 = red.
As there are no elements to its left, there is only 1 move (choose A[0]).
So, OPT[0] = 1. For A[1] = 6 = green.
Possible moves are: A[2] = 9, A[4] = 8, A[5] = 9, A[9] = 12.
Recursion is OPT[i] = max{1, 1+ OPT[j]} where j is the next possible move.
Am I on the right track using dynamic programming? The runtime is O(n²) isn't it?
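A minimal Python sketch of the recurrence above, under one assumption that the question leaves open: a move must go to a strictly greater value. The three examples are consistent with that reading (if ties were allowed, example 1 would admit a path of length 8). The function name longest_colored_path and the helper names are made up for illustration.

```python
from functools import lru_cache

def longest_colored_path(A, C):
    """Return the indices of one longest path (assumes strictly increasing moves)."""
    n = len(A)

    def moves(i):
        # green: look right of i, red: look left of i
        candidates = range(i + 1, n) if C[i] == 'g' else range(i)
        return [j for j in candidates if A[j] > A[i]]

    @lru_cache(maxsize=None)
    def opt(i):
        # OPT[i] = max{1, 1 + OPT[j]} over all possible next moves j
        return 1 + max((opt(j) for j in moves(i)), default=0)

    # pick a best start, then follow successors whose OPT drops by exactly one
    path = [max(range(n), key=opt)]
    while True:
        nxt = [j for j in moves(path[-1]) if opt(j) == opt(path[-1]) - 1]
        if not nxt:
            break
        path.append(nxt[0])
    return path

print(longest_colored_path([9, 6, 9, 3, 8, 9, 2, 0, 4, 12], list("rgrgrgrrrg")))
```

Each OPT[i] is computed once and scans at most n candidates, so the running time is O(n²), as guessed in the question. The recursion terminates because every move strictly increases the value.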

Related

Finding a Hamiltonian path in Julia Language

I'm a beginner.
I am starting my studies in image processing and I am using the Julia Language.
I have tried, so far unsuccessfully, to implement Algorithm 1 of this article https://hal.archives-ouvertes.fr/hal-03330433/document, on page 3.
I noticed that there is a similar algorithm implemented in Matlab and in C on this site https://elad.cs.technion.ac.il/software/?pn=1427. The authors provided the algorithm and the article where these ideas were proposed.
I read and studied both, but I could not implement it in Julia.
The algorithm proposed is
where, the graph G is a pair (V, E), with V the color values {v1, v2, ..., vm} and E the edge eij = (vi, vj).
I think that the constant alpha is 10^6. But I need to read the second paper again.
To show my doubts, I am placing a random RGB color image of size 6 x 5 (30 pixels), putting the color set T as a 3 x 30 matrix and putting the random probability vector as p.
After that I am generating the list L and choose j in L.
julia>
using Colors, FixedPointNumbers
using Images
using Statistics
using LinearAlgebra
img = rand(RGB{N0f8}, 6, 5) # random RGB color image of size 6 x 5
a, b = size(img)
T = reshape(channelview(img), 3, a*b) # matrix T of 3 x 30. Every column is a r, g, b color of img
p = rand(a*b) # random vector p of 30 elements
L = [i for i=1:30] #index list
j = rand(L) #choose a random j in L
#find the pixel (pix_i, pix_j) of the image
if j % a == 0
pix_i = a
pix_j = div(j, a)
else
pix_i = j % a
pix_j = 1 + div(j, a)
end
#The edges eij = (vj, vk)#
for i = pix_i - 1: pix_i+1
for j = pix_j -1 : pix_j + 1
N = CartesianIndex(i, j)
end
end
After that, I don't know how to continue.
In that last loop I already have a problem, because I'm not able to build the set N(vj) from the indices.
In fact, I also don't know how to append the vectors that we get at each iteration.
That is, my problem is in the construction of the set P, and so on.
Regarding the N(vj) inside the "for" loop that the author proposes, I am considering a 3 x 3 square. So, for example, for a pixel at position (4, 3) in the first iteration (i.e., j = (4,3)), the cardinality of N(vj) would be 8, because we get the pixels (3, 2), (3,3), (3,4), (4,2), (4,3), (4,4), (5,2), (5,3) and (5,4), and then exclude (4,3) itself.
So N(vj) = {(3, 2), (3,3), (3,4), (4,2), (4,3), (4,4), (5,2), (5,3),(5,4)} \ {(4,3)}.
So the absolute value is 8.
But, I can't do it.
If you can help me, I would be very grateful.
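One possible way to build the set N(vj) described above, sketched in Python rather than Julia and only for the 3 x 3 neighborhood (it does not attempt the rest of Algorithm 1). It keeps the 1-based, column-major indexing of the Julia code; the function names index_to_pixel and neighbors are hypothetical, and clipping the square at the image border is an assumption.

```python
def index_to_pixel(j, rows):
    """Convert a 1-based, column-major linear index j into a (row, col) pair."""
    return ((j - 1) % rows + 1, (j - 1) // rows + 1)

def neighbors(pixel, rows, cols):
    """All pixels in the 3 x 3 square around `pixel`, clipped to the image,
    excluding `pixel` itself."""
    r, c = pixel
    return [(i, k)
            for i in range(max(1, r - 1), min(rows, r + 1) + 1)
            for k in range(max(1, c - 1), min(cols, c + 1) + 1)
            if (i, k) != (r, c)]

# 6 x 5 image as in the question: an interior pixel has 8 neighbors
print(len(neighbors((4, 3), 6, 5)))
# a corner pixel has only 3
print(neighbors((1, 1), 6, 5))
```

Collecting the neighbors into a list (instead of overwriting a single variable inside the loops) is what makes the set available after the loops finish.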

Number of ways to form a string from a matrix of characters with the optimal approach in terms of time complexity?

(UPDATED)
We need to find the number of ways a given string can be formed from a matrix of characters.
We can start forming the word from any position(i, j) in the matrix and can go in any unvisited direction from the 8 directions available across every cell(i, j) of the matrix, i.e
(i + 1, j)
(i + 1, j + 1)
(i + 1, j - 1)
(i - 1, j)
(i - 1, j + 1)
(i - 1, j - 1)
(i, j + 1)
(i, j - 1)
Sample test cases:
(1) input:
N = 3 (length of string)
string = "fit"
matrix: fitptoke
orliguek
ifefunef
tforitis
output: 7
(2) input:
N = 5 (length of string)
string = "pifit"
matrix: qiq
tpf
pip
rpr
output: 5
Explanation:
num of ways to make 'fit' are as given below:
(0,0)(0,1)(0,2)
(2,1)(2,0)(3,0)
(2,3)(1,3)(0,4)
(3,1)(2,0)(3,0)
(2,3)(3,4)(3,5)
(2,7)(3,6)(3,5)
(2,3)(1,3)(0,2)
I approached the solution in a naive way: go to every possible position (i,j) in the matrix and start forming the string from that cell by performing a DFS on the matrix, adding the number of ways to form the given string from that position (i, j) to a total_num_ways variable.
pseudocode:
W = 0
for i : 0 - n:
    for j: 0 - m:
        visited[n][m] = {false}
        W += DFS(i, j, 0, str, matrix, visited);
But it turns out that this solution would be exponential in time complexity as we are going to every possible n * m position and then traversing to every possible k(length of the string) length path to form the string.
How can we improve the solution efficiency?
Suggestion - 1: Preprocessing the matrix and the input string
We are only concerned about a cell of the matrix if the character in the cell appears anywhere in the input string. So, we aren't concerned about a cell containing the alphabet 'z' if our input string is 'fit'.
Using that, following is a suggestion.
Taking the input string, first put its characters in a set S. It is an O(k) step, where k is the length of the string;
Next we iterate over the matrix (a O(m*n) step) and:
If the character in the cell does not appear in the S, we continue to the next one;
If the character in the cell appears, we add an entry for the cell position to a map from character to list of cell positions, called M.
Now, iterating over the input (not the matrix), for each position where current char c appears, get the unvisited positions of the right, left, above and below of the current cell;
If any of these positions are present in the list of cells in M where the next character is present in the matrix, then:
Recursively go to the next character of the input string, until you have exhausted all the characters.
What is better in this solution? We are getting the next cell we need to explore in O(1) because it is already present in the map. As a result, the complexity is not exponential anymore, but it is actually O(c) where c is the total occurrences of the input string in the matrix.
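A rough Python sketch of the preprocessing idea, assuming the 8 directions from the question (the suggestion itself mentions only four) and using a set per character for O(1) membership checks. The name count_ways is made up for illustration. It is still a backtracking search, so it is not a proof of the complexity claim above.

```python
from collections import defaultdict

def count_ways(word, matrix):
    # M maps each character of the word to the set of cells that hold it
    needed = set(word)
    M = defaultdict(set)
    for r, row in enumerate(matrix):
        for c, ch in enumerate(row):
            if ch in needed:
                M[ch].add((r, c))

    def dfs(cell, k, visited):
        # k is the index of the next character to match
        if k == len(word):
            return 1
        r, c = cell
        total = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nxt = (r + dr, c + dc)
                if nxt != cell and nxt in M[word[k]] and nxt not in visited:
                    visited.add(nxt)
                    total += dfs(nxt, k + 1, visited)
                    visited.remove(nxt)  # backtrack
        return total

    return sum(dfs(start, 1, {start}) for start in M[word[0]])

print(count_ways("fit", ["fitptoke", "orliguek", "ifefunef", "tforitis"]))
print(count_ways("pifit", ["qiq", "tpf", "pip", "rpr"]))
```

Looking cells up in M also handles the bounds check implicitly, since out-of-range coordinates never appear in the map.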
Suggestion - 2: Dynamic Programming
DP helps in case where there is Optimal Substructure and Overlapping Subproblems. So, in situations where the same substring is a part of multiple solutions, using DP could help.
Ex: If we found 'fit' somewhere then if there is an 'f' in an adjacent cell, it could use the substring 'it' from the first 'fit' we found. This way we would prevent recursing down the rest of the string the moment we encounter a substring that was previously explored.
# Checking if the given (x,y) coordinates are within the boundaries
# of the matrix
def in_bounds(x, y, rows, cols):
    return x >= 0 and x < rows and y >= 0 and y < cols

# Finding all possible moves from the current (x,y) position
def possible_moves(position, path_set, rows, cols):
    moves = []
    move_range = [-1,0,1]
    for i in range(len(move_range)):
        for j in range(len(move_range)):
            x = position[0] + move_range[i]
            y = position[1] + move_range[j]
            if in_bounds(x,y,rows,cols):
                # skip cells already on the current path
                if x in path_set:
                    if y in path_set[x]:
                        continue
                moves.append((x,y))
    return moves

# Determine which of the possible moves lead to the next letter
# of the goal string
def check_moves(goal_letter, candidates, search_space):
    moves = []
    for x, y in candidates:
        if search_space[x][y] == goal_letter:
            moves.append((x,y))
    return moves

# Recursively expanding the paths of each starting coordinate
def search(goal, path, search_space, path_set, rows, cols):
    # Base Case
    if goal == '':
        return [path]
    x = path[-1][0]
    y = path[-1][1]
    if x in path_set:
        path_set[x].add(y)
    else:
        path_set.update([(x,set([y]))])
    results = []
    moves = possible_moves(path[-1],path_set,rows,cols)
    moves = check_moves(goal[0],moves,search_space)
    for move in moves:
        result = search(goal[1:], path + [move], search_space, path_set, rows, cols)
        if result is not None:
            results += result
    # Backtrack: unmark this cell so that sibling branches can still use it
    path_set[x].discard(y)
    return results

# Finding the coordinates in the matrix where the first letter from the goal
# string appears which is where all potential paths will begin from.
def find_paths(goal, search_space):
    results = []
    rows, cols = len(search_space), len(search_space[0])
    # Finding starting coordinates for candidate paths
    for i in range(len(search_space)):
        for j in range(len(search_space[i])):
            if search_space[i][j] == goal[0]:
                # Expanding path from root letter
                results += search(goal[1:],[(i,j)],search_space,dict(),rows,cols)
    return results

goal = "fit"
matrix = [
    'fitptoke',
    'orliguek',
    'ifefunef',
    'tforitis'
]
paths = find_paths(goal, matrix)
for path in paths:
    print(path)
print('# of paths:',len(paths))
Instead of expanding the paths from every coordinate of the matrix, the matrix can first be iterated over to find all the (i,j) coordinates that have the same letter as the first letter from the goal string. This takes O(n^2) time.
Then, for each (i,j) coordinate found which contained the first letter from the goal string, expand the paths from there by searching for the second letter from the goal string and expand only the paths that match the second letter. This action is repeated for each letter in the goal string to recursively find all valid paths from the starting coordinates.

What is the sublist array that can give us maximum 'flip-flop' sum?

My problem is that I'm given an array of length l.
Let's say this is my array: [1,5,4,2,9,3,6]; let's call it A.
This array can have multiple subarrays whose elements are adjacent to each other, so we can have [1,5,4] or [2,9,3,6] and so on. The length of each subarray does not matter.
But the trick is the sum part. We cannot just add all the numbers; it works like a flip-flop. So for the sublist [2,9,3,6] the sum would be that of [2,-9,3,-6], which is -10 and is pretty small.
What would be the sublist (or subarray, if you like) of this array A that produces the maximum sum?
One possible way would be (from intuition) that the sublist [4,2,9] will output a decent result: [4, -2, 9] = (add all the elements) = 11.
The question is, how to come up with a result like this?
what is the sub-array that gives us the maximum flip-flop sum?
and mainly, what is the algorithm that takes any array as an input and outputs a sub-array with all numbers being adjacent and with the maximum sum?
I haven't come up with anything but I'm pretty sure I should pick either dynamic programming or divide and conquer to solve this issue. again, I don't know, I may be totally wrong.
The problem can indeed be solved using dynamic programming, by keeping track of the maximum sum ending at each position.
However, since the current element can be either added to or subtracted from a sum (depending on the length of the subsequence), we will keep track of the maximum sums ending here, separately, for both even as well as odd subsequence lengths.
The code below (implemented in python) does that (please see comments in the code for additional details).
The time complexity is O(n).
a = [1, 5, 4, 2, 9, 3, 6]

# initialize the best sequences which end at element a[0]
# best sequence with odd length ending at the current position
best_ending_here_odd = a[0] # the sequence sum value
best_ending_here_odd_start_idx = 0
# best sequence with even length ending at the current position
best_ending_here_even = 0 # the sequence sum value
best_ending_here_even_start_idx = 1
# the single element a[0] is the best sequence seen so far
best_sum = a[0]
best_start_idx = 0
best_end_idx = 0

for i in range(1, len(a)):
    # add/subtract the current element to the best sequences that
    # ended in the previous element
    best_ending_here_even, best_ending_here_odd = \
        best_ending_here_odd - a[i], best_ending_here_even + a[i]
    # swap starting positions (since a sequence which had odd length when it
    # was ending at the previous element has even length now, and vice-versa)
    best_ending_here_even_start_idx, best_ending_here_odd_start_idx = \
        best_ending_here_odd_start_idx, best_ending_here_even_start_idx
    # we can always make a sequence of even length with sum 0 (empty sequence)
    if best_ending_here_even < 0:
        best_ending_here_even = 0
        best_ending_here_even_start_idx = i + 1
    # update the best known sub-sequence if it is the case
    if best_ending_here_even > best_sum:
        best_sum = best_ending_here_even
        best_start_idx = best_ending_here_even_start_idx
        best_end_idx = i
    if best_ending_here_odd > best_sum:
        best_sum = best_ending_here_odd
        best_start_idx = best_ending_here_odd_start_idx
        best_end_idx = i

print(best_sum, best_start_idx, best_end_idx)
For the example sequence in the question, the above code outputs the following flip-flop sub-sequence:
4 - 2 + 9 - 3 + 6 = 14
As quertyman wrote, we can use dynamic programming. This is similar to Kadane's algorithm but with a few twists. We need a second temporary variable to keep track of trying each element both as an addition and as a subtraction. Note that a subtraction must be preceded by an addition but not vice versa. O(1) space, O(n) time.
JavaScript code:
function f(A){
  let prevAdd = [A[0], 1] // sum, length
  let prevSubt = [0, 0]
  // sum, idx, len, op (start with A[0] alone as the best so far)
  let best = [A[0], 0, 1, ' + ']
  let add
  let subt
  for (let i=1; i<A.length; i++){
    // Try adding
    add = [A[i] + prevSubt[0], 1 + prevSubt[1]]
    if (add[0] > best[0])
      best = [add[0], i, add[1], ' + ']
    // Try subtracting
    if (prevAdd[0] - A[i] > 0)
      subt = [prevAdd[0] - A[i], 1 + prevAdd[1]]
    else
      subt = [0, 0]
    if (subt[0] > best[0])
      best = [subt[0], i, subt[1], ' - ']
    prevAdd = add
    prevSubt = subt
  }
  return best
}

function show(A, sol){
  let [sum, i, len, op] = sol
  let str = A[i] + ' = ' + sum
  for (let l=1; l<len; l++){
    str = A[i-l] + op + str
    op = op == ' + ' ? ' - ' : ' + '
  }
  return str
}

var A = [1, 5, 4, 2, 9, 3, 6]
console.log(JSON.stringify(A))
var sol = f(A)
console.log(JSON.stringify(sol))
console.log(show(A, sol))
Update
Per OP's request in the comments, here is some theoretical elaboration on the general recurrence (pseudocode): let f(i, subtract) represent the maximum sum up to and including the element indexed at i, where subtract indicates whether or not the element is subtracted or added. Then:
// Try subtracting
f(i, true) =
if f(i-1, false) - A[i] > 0
then f(i-1, false) - A[i]
otherwise 0
// Try adding
f(i, false) =
A[i] + f(i-1, true)
(Note that when f(i-1, true) evaluates
to zero, the best ending at
i as an addition is just A[i])
The recurrence only depends on the evaluation at the previous element, which means we can code it with O(1) space, just saving the very last evaluation after each iteration, and updating the best so far (including the sequence's ending index and length if we want).
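The recurrence can be written in a few lines of Python. This is only a sketch that returns the maximum sum (not the indices), and it assumes the array is non-empty and holds non-negative numbers, since the zero floor in f(i, true) otherwise lets an empty sequence win. The name flip_flop_max is made up for illustration.

```python
def flip_flop_max(A):
    # add = f(i, false): best sum ending at i with A[i] added
    # sub = f(i, true):  best sum ending at i with A[i] subtracted (floored at 0)
    add, sub = A[0], 0
    best = add
    for x in A[1:]:
        # both right-hand sides use the values from the previous element
        add, sub = x + sub, max(0, add - x)
        best = max(best, add, sub)
    return best

print(flip_flop_max([1, 5, 4, 2, 9, 3, 6]))  # 14, i.e. 4 - 2 + 9 - 3 + 6
```

Only the two values from the previous element are kept, which is the O(1) space property mentioned above.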

Number of Paths in a Triangle

I recently encountered a much more difficult variation of this problem, but realized I couldn't generate a solution for this very simple case. I searched Stack Overflow but couldn't find a resource that previously answered this.
You are given a triangle ABC, and you must compute the number of paths of certain length that start at and end at 'A'. Say our function f(3) is called, it must return the number of paths of length 3 that start and end at A: 2 (ABA,ACA).
I'm having trouble formulating an elegant solution. Right now, I've written a solution that generates all possible paths, but for larger lengths, the program is just too slow. I know there must be a nice dynamic programming solution that reuses sequences that we've previously computed but I can't quite figure it out. All help greatly appreciated.
My dumb code:
def paths(n,sequence):
    t = ['A','B','C']
    if len(sequence) < n:
        for node in set(t) - set(sequence[-1]):
            paths(n,sequence+node)
    else:
        if sequence[0] == 'A' and sequence[-1] == 'A':
            print(sequence)
Let PA(n) be the number of paths of length n from A back to A (length counts the nodes visited, as in the question, so ABA has length 3).
Let P!A(n) be the number of paths of length n from B (or C) to A.
Then:
PA(1) = 1
PA(n) = 2 * P!A(n - 1)
P!A(1) = 0
P!A(2) = 1
P!A(n) = P!A(n - 1) + PA(n - 1)
= P!A(n - 1) + 2 * P!A(n - 2) (for n > 2) (substituting for PA(n-1))
We can solve the difference equations for P!A analytically, as we do for Fibonacci, by noting that (-1)^n and 2^n are both solutions of the difference equation, and then finding coefficients a, b such that P!A(n) = a*2^n + b*(-1)^n.
We end up with the equation P!A(n) = 2^n/6 + (-1)^n/3, and PA(n) being 2^(n-1)/3 - 2(-1)^n/3.
This gives us code:
def PA(n):
    # integer division keeps the result an exact integer
    return (pow(2, n-1) + 2*pow(-1, n-1)) // 3

for n in range(1, 30):
    print(n, PA(n))
Which gives output:
1 1
2 0
3 2
4 2
5 6
6 10
7 22
8 42
9 86
10 170
11 342
12 682
13 1366
14 2730
15 5462
16 10922
17 21846
18 43690
19 87382
20 174762
21 349526
22 699050
23 1398102
24 2796202
25 5592406
26 11184810
27 22369622
28 44739242
29 89478486
The trick is not to try to generate all possible sequences. The number of them increases exponentially so the memory required would be too great.
Instead, let f(n) be the number of sequences of length n beginning and ending with A, and let g(n) be the number of sequences of length n beginning with A but ending with B. To get things started, clearly f(1) = 1 and g(1) = 0. For n > 1 we have f(n) = 2g(n - 1), because the penultimate letter will be B or C and there are equal numbers of each. We also have g(n) = f(n - 1) + g(n - 1), because if a sequence begins with A and ends with B, the penultimate letter is either A or C.
These rules allow you to compute the numbers really quickly using memoization.
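Since each value depends only on the previous pair, the two rules can also be iterated bottom-up without a memo table. A small Python sketch (the function name count_closed_paths is made up for illustration):

```python
def count_closed_paths(n):
    """Number of sequences of length n that begin and end with A."""
    f, g = 1, 0  # f(1) = 1, g(1) = 0
    for _ in range(n - 1):
        # f(n) = 2 g(n-1), g(n) = f(n-1) + g(n-1)
        f, g = 2 * g, f + g
    return f

print([count_closed_paths(n) for n in range(1, 8)])  # [1, 0, 2, 2, 6, 10, 22]
```

This takes O(n) additions and constant extra space.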
My method is like this:
Define DP(l, end) = # of paths end at end and having length l
Then DP(l,'A') = DP(l-1, 'B') + DP(l-1,'C'), similar for DP(l,'B') and DP(l,'C')
Then for base case i.e. l = 1 I check if the end is not 'A', then I return 0, otherwise return 1, so that all bigger states only counts those starts at 'A'
Answer is simply calling DP(n, 'A') where n is the length
Below is a sample code in C++, you can call it with 3 which gives you 2 as answer; call it with 5 which gives you 6 as answer:
ABCBA, ACBCA, ABABA, ACACA, ABACA, ACABA
#include <bits/stdc++.h>
using namespace std;

int dp[500][500], n;

int DP(int l, int end){
    if(l<=0) return 0;
    if(l==1){
        if(end != 'A') return 0;
        return 1;
    }
    if(dp[l][end] != -1) return dp[l][end];
    if(end == 'A') return dp[l][end] = DP(l-1, 'B') + DP(l-1, 'C');
    else if(end == 'B') return dp[l][end] = DP(l-1, 'A') + DP(l-1, 'C');
    else return dp[l][end] = DP(l-1, 'A') + DP(l-1, 'B');
}

int main() {
    memset(dp,-1,sizeof(dp));
    scanf("%d", &n);
    printf("%d\n", DP(n, 'A'));
    return 0;
}
EDITED
To answer OP's comment below:
Firstly, DP (dynamic programming) is always about state.
Remember that here our state is DP(l,end), which represents the # of paths having length l and ending at end. To implement states in a program we usually use an array, so DP[500][500] is nothing special, just the space to store the states DP(l,end) for all possible l and end (that's why I said that if you need a bigger length, change the size of the array).
But then you may ask: I understand the first dimension, which is for l (500 means l can be as large as 500), but what about the second dimension? I only need 'A', 'B', 'C', so why use 500?
Here is another trick (of C/C++): the char type can be used as an int type by default, with a value equal to its ASCII code. I do not remember the ASCII table, of course, but I know that around 300 will be enough to represent all the ASCII characters, including A(65), B(66), C(67).
So I just declare any size large enough to represent 'A','B','C' in the second dimension (that means 100 is actually more than enough, but I did not think that much and declared 500, as they are almost the same in terms of order).
You asked what DP[3][1] means: it means nothing, as I do not need or calculate the second dimension when it is 1. (Or one can think that the state dp(3,1) does not have any physical meaning in our problem.)
In fact, I always use 65, 66, 67.
so DP[3][65] means the # of paths of length 3 and ends at char(65) = 'A'
You can do better than the dynamic programming/recursion solution others have posted, for the given triangle and more general graphs. Whenever you are trying to compute the number of walks in a (possibly directed) graph, you can express this in terms of the entries of powers of a transfer matrix. Let M be a matrix whose entry m[i][j] is the number of paths of length 1 from vertex i to vertex j. For a triangle, the transfer matrix is
0 1 1
1 0 1
1 1 0
Then M^n is a matrix whose i,j entry is the number of paths of length n from vertex i to vertex j. If A corresponds to vertex 1, you want the 1,1 entry of M^n.
Dynamic programming and recursion for the counts of paths of length n in terms of the paths of length n-1 are equivalent to computing M^n with n multiplications, M * M * M * ... * M, which can be fast enough. However, if you want to compute M^100, instead of doing 100 multiplies, you can use repeated squaring: Compute M, M^2, M^4, M^8, M^16, M^32, M^64, and then M^64 * M^32 * M^4. For larger exponents, the number of multiplies is about c log_2(exponent).
Instead of using that a path of length n is made up of a path of length n-1 and then a step of length 1, this uses that a path of length n is made up of a path of length k and then a path of length n-k.
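A short Python sketch of repeated squaring on the triangle's transfer matrix, using plain lists so it has no dependencies. The helper names mat_mul, mat_pow and closed_walks_from_a are made up for illustration. Note that here the exponent counts steps (edges), so a walk with k steps corresponds to a sequence of k + 1 letters in the other answers.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, e):
    """Compute M**e with O(log e) matrix multiplications."""
    n = len(M)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while e > 0:
        if e & 1:               # this power of two is part of the exponent
            result = mat_mul(result, M)
        M = mat_mul(M, M)       # M, M^2, M^4, M^8, ...
        e >>= 1
    return result

TRIANGLE = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

def closed_walks_from_a(steps):
    # entry (1,1) of M^steps, with A as vertex 1 (index 0 here)
    return mat_pow(TRIANGLE, steps)[0][0]

print(closed_walks_from_a(2))  # 2: ABA and ACA
```

Python integers do not overflow, so large exponents such as 100 work without extra care.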
We can solve this with a for loop, although Anonymous described a closed form for it.
function f(n){
var as = 0, abcs = 1;
for (n=n-3; n>0; n--){
as = abcs - as;
abcs *= 2;
}
return 2*(abcs - as);
}
Here's why:
Look at one strand of the decision tree (the other one is symmetrical):
A
B C...
A C
B C A B
A C A B B C A C
B C A B B C A C A C A B B C A B
Num A's Num ABC's (starting with first B on the left)
0 1
1 (1-0) 2
1 (2-1) 4
3 (4-1) 8
5 (8-3) 16
11 (16-5) 32
Clearly, we can't use the strands that end with the A's...
You can write a recursive brute force solution and then memoize it (aka top down dynamic programming). Recursive solutions are more intuitive and easy to come up with. Here is my version:
# search space (we have triangle with nodes)
nodes = ["A", "B", "C"]

#cache # memoize!
def recurse(length, steps):
    # if length of the path is n and the last node is "A", then it's
    # a valid path and we can count it.
    if length == n and ((steps-1)%3 == 0 or (steps+1)%3 == 0):
        return 1
    # we don't want paths having len > n.
    if length > n:
        return 0
    # from each position, we have two possibilities, either go to next
    # node or previous node. Total paths will be sum of both the
    # possibilities. We do this recursively.
    return recurse(length+1, steps+1) + recurse(length+1, steps-1)

How to approach Vertical Sticks challenge?

This problem is taken from interviewstreet.com
Given array of integers Y=y1,...,yn, we have n line segments such that
endpoints of segment i are (i, 0) and (i, yi). Imagine that from the
top of each segment a horizontal ray is shot to the left, and this ray
stops when it touches another segment or it hits the y-axis. We
construct an array of n integers, v1, ..., vn, where vi is equal to
length of ray shot from the top of segment i. We define V(y1, ..., yn)
= v1 + ... + vn.
For example, if we have Y=[3,2,5,3,3,4,1,2], then v1, ..., v8 =
[1,1,3,1,1,3,1,2], as shown in the picture below:
For each permutation p of [1,...,n], we can calculate V(yp1, ...,
ypn). If we choose a uniformly random permutation p of [1,...,n], what
is the expected value of V(yp1, ..., ypn)?
Input Format
First line of input contains a single integer T (1 <= T <= 100). T
test cases follow.
First line of each test-case is a single integer N (1 <= N <= 50).
Next line contains positive integer numbers y1, ..., yN separated by a
single space (0 < yi <= 1000).
Output Format
For each test-case output expected value of V(yp1, ..., ypn), rounded
to two digits after the decimal point.
Sample Input
6
3
1 2 3
3
3 3 3
3
2 2 3
4
10 2 4 4
5
10 10 10 5 10
6
1 2 3 4 5 6
Sample Output
4.33
3.00
4.00
6.00
5.80
11.15
Explanation
Case 1: We have V(1,2,3) = 1+2+3 = 6, V(1,3,2) = 1+2+1 = 4, V(2,1,3) =
1+1+3 = 5, V(2,3,1) = 1+2+1 = 4, V(3,1,2) = 1+1+2 = 4, V(3,2,1) =
1+1+1 = 3. Average of these values is 4.33.
Case 2: No matter what the permutation is, V(yp1, yp2, yp3) = 1+1+1 =
3, so the answer is 3.00.
Case 3: V(y1 ,y2 ,y3)=V(y2 ,y1 ,y3) = 5, V(y1, y3, y2)=V(y2, y3, y1) =
4, V(y3, y1, y2)=V(y3, y2, y1) = 3, and average of these values is
4.00.
A naive solution to the problem will run forever for N=50. I believe that the problem can be solved by independently calculating a value for each stick. I still need to know if there is any other efficient approach for this problem. On what basis do we have to independently calculate value for each stick?
We can solve this problem by figuring out:
if the k-th stick is put in the i-th position, what is the expected ray length of this stick?
Then the problem can be solved by adding up the expected lengths for all sticks over all positions.
Let expected[k][i] be the expected ray-length of k th stick put in i th position, let num[k][i][length] be the number of permutations that k th stick put in i th position with ray-length equals to length, then
expected[k][i] = sum( num[k][i][length] * length ) / N!
How to compute num[k][i][length]? For example, for length=3, consider the following graph:
...GxxxI...
Where I is the position, the 3 'x' mean we need 3 sticks that are strictly lower than I, and G means we need a stick that is at least as high as I.
Let s_i be the number of sticks that are smaller than the k-th stick, and g_i be the number of sticks that are greater than or equal to the k-th stick; then we can choose any one of the g_i to put in the G position and any length of the s_i to fill the x positions, so we have:
num[k][i][length] = P(s_i, length) * g_i * P(n-length-1-1)
In case all the positions before I are smaller than I, we don't need a greater stick in G, i.e. xxxI...., and we have:
num[k][i][length] = P(s_i, length) * P(n-length-1)
And here's a piece of Python code that can solve this problem:
# math.comb and math.factorial stand in for the precomputed
# combination[][] and factorial[] tables, which were not shown
from math import comb, factorial

def solve(n, ys):
    ret = 0
    for y_i in ys:
        # s_i: sticks strictly shorter than y_i
        # g_i: other sticks at least as tall as y_i
        s_i = sum(1 for x in ys if x < y_i)
        g_i = sum(1 for x in ys if x >= y_i) - 1
        for i in range(n):
            for length in range(1, i+1):
                if length == i:
                    t_ret = comb(s_i, length) * factorial(length) * factorial(n - length - 1)
                else:
                    t_ret = comb(s_i, length) * factorial(length) * g_i * factorial(n - length - 1 - 1)
                ret += t_ret * length
    return ret * 1.0 / factorial(n) + n
This is the same question as https://cs.stackexchange.com/questions/1076/how-to-approach-vertical-sticks-challenge and my answer there (which is a little simpler than those given earlier here) was:
Imagine a different problem: if you had to place k sticks of equal heights in n slots then the expected distance between sticks (and the expected distance between the first stick and a notional slot 0, and the expected distance between the last stick and a notional slot n+1) is (n+1)/(k+1) since there are k+1 gaps to fit in a length n+1.
Returning to this problem, a particular stick is interested in how many sticks (including itself) are as high or higher. If this is k, then the expected gap before it is also (n+1)/(k+1).
So the algorithm is simply to find this value for each stick and add up the expectation. For example, starting with heights of 3,2,5,3,3,4,1,2, the number of sticks with a greater or equal height is 5,7,1,5,5,2,8,7 so the expectation is 9/6+9/8+9/2+9/6+9/6+9/3+9/9+9/8 = 15.25.
This is easy to program: for example a single line in R
V <- function(Y){(length(Y) + 1) * sum(1 / (rowSums(outer(Y, Y, "<=")) + 1) )}
gives the values in the sample output in the original problem
> V(c(1,2,3))
[1] 4.333333
> V(c(3,3,3))
[1] 3
> V(c(2,2,3))
[1] 4
> V(c(10,2,4,4))
[1] 6
> V(c(10,10,10,5,10))
[1] 5.8
> V(c(1,2,3,4,5,6))
[1] 11.15
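For readers who do not use R, the same (n+1)/(k+1) computation can be sketched in Python. The function name expected_v is made up for illustration; it mirrors the R one-liner and runs in O(n²).

```python
def expected_v(ys):
    """Expected value of V over a uniformly random permutation of ys."""
    n = len(ys)
    total = 0.0
    for y in ys:
        # k = number of sticks (including this one) that are as high or higher
        k = sum(1 for other in ys if other >= y)
        total += (n + 1) / (k + 1)
    return total

for case in ([1, 2, 3], [3, 3, 3], [2, 2, 3], [10, 2, 4, 4],
             [10, 10, 10, 5, 10], [1, 2, 3, 4, 5, 6]):
    print("%.2f" % expected_v(case))
```

Sorting the heights first would bring the counting step down to O(n log n), but with N <= 50 the quadratic version is already instant.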
As you correctly noted, we can solve the problem independently for each stick.
Let F(i, len) be the number of permutations in which the ray from stick i has length exactly len.
Then answer is
(Sum(by i, len) F(i,len)*len)/(n!)
All that is left is to count F(i, len). Let a(i) be the number of sticks j such that y_j<=y_i, and b(i) the number of sticks j such that y_j>y_i.
In order to get ray of length len, we need to have situation like this.
B, l...l, O
len-1 times
Where O is stick #i, B is a stick with bigger length or the beginning, and l is a stick with height less than the i-th.
This gives us 2 cases:
1) B is the beginning, this can be achieved in P(a(i), len-1) * (b(i)+a(i)-(len-1))! ways.
2) B is bigger stick, this can be achieved in P(a(i), len-1)*b(i)*(b(i)+a(i)-len)!*(n-len) ways.
edit: corrected b(i) as 2nd term in (mul)in place of a(i) in case 2.
