How Could One Implement the K-Means++ Algorithm?

I am having trouble fully understanding the K-Means++ algorithm. I am interested in exactly how the first k centroids are picked (the initialization), since the rest is as in the original K-Means algorithm.
Is the probability function used based on distance or on a Gaussian?
At the same time, is the most distant point (from the other centroids) picked as a new centroid?
I would appreciate a step-by-step explanation and an example. The one on Wikipedia is not clear enough. A very well commented source code would also help. If you are using 6 arrays, then please tell us which one is for what.

Interesting question. Thank you for bringing this paper to my attention - K-Means++: The Advantages of Careful Seeding
In simple terms, cluster centers are initially chosen at random from the set of input observation vectors, where the probability of choosing vector x is high if x is not near any previously chosen centers.
Here is a one-dimensional example. Our observations are [0, 1, 2, 3, 4]. Let the first center, c1, be 0. The probability that the next cluster center, c2, is x is proportional to ||c1-x||^2. So, P(c2 = 1) = 1a, P(c2 = 2) = 4a, P(c2 = 3) = 9a, P(c2 = 4) = 16a, where a = 1/(1+4+9+16).
Suppose c2=4. Then, P(c3 = 1) = 1a, P(c3 = 2) = 4a, P(c3 = 3) = 1a, where a = 1/(1+4+1).
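As a quick numerical check of the example above (my addition, not part of the original answer):
import numpy as np

X = np.array([0, 1, 2, 3, 4])
c1 = 0
D2 = (X - c1) ** 2           # squared distance of every point to the only center
probs = D2 / D2.sum()        # [0, 1/30, 4/30, 9/30, 16/30], i.e. a = 1/30
print(probs)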
I've coded the initialization procedure in Python; I don't know if this helps you.
import scipy

def initialize(X, K):
    """Choose K centers from the rows of X by k-means++ seeding."""
    C = [X[0]]
    for k in range(1, K):
        # Squared distance from each point to its nearest already-chosen center
        D2 = scipy.array([min([scipy.inner(c-x, c-x) for c in C]) for x in X])
        probs = D2/D2.sum()
        cumprobs = probs.cumsum()
        r = scipy.rand()
        # Sample an index with probability proportional to D2
        for j, p in enumerate(cumprobs):
            if r < p:
                i = j
                break
        C.append(X[i])
    return C
EDIT with clarification: The output of cumsum gives us boundaries that partition the interval [0, 1]; each partition has length equal to the probability of the corresponding point being chosen as a center. Since r is chosen uniformly in [0, 1], it falls into exactly one of these intervals, and the for loop (with its break) determines which one.
Example:
probs = [0.1, 0.2, 0.3, 0.4]
cumprobs = [0.1, 0.3, 0.6, 1.0]
if r < cumprobs[0]:
    # this event has probability 0.1
    i = 0
elif r < cumprobs[1]:
    # this event has probability 0.2
    i = 1
elif r < cumprobs[2]:
    # this event has probability 0.3
    i = 2
elif r < cumprobs[3]:
    # this event has probability 0.4
    i = 3
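As an aside (my suggestion, not in the original answer), the linear scan can be replaced by a binary search, since cumprobs is sorted:
i = int(cumprobs.searchsorted(r, side='right'))  # first index with cumprobs[i] > r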

One liner:
Say we need to select 2 cluster centers. Instead of selecting them all randomly (as we do in simple k-means), we select the first one randomly, then find the points that are farthest from the first center (these points most probably do not belong to the first cluster center, as they are far from it) and assign the second cluster center near those far points.

I have prepared a full source implementation of k-means++, based on the book "Programming Collective Intelligence" by Toby Segaran and the k-means++ initialization provided here.
Indeed there are two distance functions here. For the initial centroids a standard one based on numpy.inner is used, and then for fixing the centroids the Pearson one is used. Maybe the Pearson one can also be used for the initial centroids; they say it is better.
from __future__ import division

def readfile(filename):
    lines = [line for line in file(filename)]
    rownames = []
    data = []
    for line in lines:
        p = line.strip().split(' ')  # single space as separator
        # First column in each row is the rowname
        rownames.append(p[0])
        # The data for this row is the remainder of the row
        data.append([float(x) for x in p[1:]])
    return rownames, data
from math import sqrt

def pearson(v1, v2):
    # Simple sums
    sum1 = sum(v1)
    sum2 = sum(v2)
    # Sums of the squares
    sum1Sq = sum([pow(v, 2) for v in v1])
    sum2Sq = sum([pow(v, 2) for v in v2])
    # Sum of the products
    pSum = sum([v1[i]*v2[i] for i in range(len(v1))])
    # Calculate r (Pearson score)
    num = pSum - (sum1*sum2/len(v1))
    den = sqrt((sum1Sq - pow(sum1, 2)/len(v1)) * (sum2Sq - pow(sum2, 2)/len(v1)))
    if den == 0: return 0
    return 1.0 - num/den
import numpy
from numpy.random import *

def initialize(X, K):
    C = [X[0]]
    for _ in range(1, K):
        # Squared distance from each point to its nearest already-chosen centroid
        D2 = numpy.array([min([numpy.inner(numpy.array(c)-numpy.array(x), numpy.array(c)-numpy.array(x)) for c in C]) for x in X])
        probs = D2/D2.sum()
        cumprobs = probs.cumsum()
        r = rand()
        i = -1
        for j, p in enumerate(cumprobs):
            if r < p:
                i = j
                break
        C.append(X[i])
    return C

def kcluster(rows, distance=pearson, k=4):
    # Seed the centroids with k-means++ instead of placing them randomly
    clusters = initialize(rows, k)
    lastmatches = None
    for t in range(100):
        print 'Iteration %d' % t
        bestmatches = [[] for i in range(k)]
        # Find which centroid is the closest for each row
        for j in range(len(rows)):
            row = rows[j]
            bestmatch = 0
            for i in range(k):
                d = distance(clusters[i], row)
                if d < distance(clusters[bestmatch], row): bestmatch = i
            bestmatches[bestmatch].append(j)
        # If the results are the same as last time, this is complete
        if bestmatches == lastmatches: break
        lastmatches = bestmatches
        # Move the centroids to the average of their members
        for i in range(k):
            avgs = [0.0]*len(rows[0])
            if len(bestmatches[i]) > 0:
                for rowid in bestmatches[i]:
                    for m in range(len(rows[rowid])):
                        avgs[m] += rows[rowid][m]
                for j in range(len(avgs)):
                    avgs[j] /= len(bestmatches[i])
                clusters[i] = avgs
    return bestmatches
rows, data = readfile('/home/toncho/Desktop/data.txt')
kclust = kcluster(data, k=4)
print "Result:"
for c in kclust:
    out = ""
    for r in c:
        out += rows[r] + ' '
    print "[" + out[:-1] + "]"
print 'done'
data.txt:
p1 1 5 6
p2 9 4 3
p3 2 3 1
p4 4 5 6
p5 7 8 9
p6 4 5 4
p7 2 5 6
p8 3 4 5
p9 6 7 8
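For comparison (my addition), scikit-learn ships k-means++ as its default seeding, so the example above can be reproduced in a few lines, using Euclidean rather than Pearson distance since that is what sklearn's KMeans implements:
import numpy as np
from sklearn.cluster import KMeans

# Load the numeric columns of data.txt shown above (column 0 holds the row names)
data = np.loadtxt('data.txt', usecols=(1, 2, 3))
labels = KMeans(n_clusters=4, init='k-means++').fit_predict(data)
print(labels)  # cluster index assigned to each of p1..p9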

Related

Find second minimum for each row of a matrix

I have a set i of customers and a set j of facilities. I have two binary variables: y[i,j], which is 1 if client i is served by a primary facility and 0 otherwise; and b[i,j], which is 1 if client i is served by a backup facility and 0 otherwise.
Given the starting matrix d:
- I must set y[i,j] = 1 based on the minimum distance in each row of the matrix (this I have done);
- I have to fix b[i,j] = 1 according to the second minimum distance in each row of the matrix (I don't know how to do this; I wrote max, but that is not what I need). I've tried removing the first minimum from each row with the various pop, deleteat, splice, etc., but the solver gives me an error.
using JuMP
using Gurobi
using DelimitedFiles
import Random
import LinearAlgebra
import Plots

n = 3
m = 5
model = Model(Gurobi.Optimizer);
@variable(model, y[1:m,1:n] >= 0, Bin);
@variable(model, b[1:m,1:n] >= 0, Bin);
d = [
    80 20 40
    71 55 24
    56 47 81
    10 20 30
    31 41 21
];
# PRIMARY ASSIGNMENTS
# 1) For each customer find the minimum d[i,j] and its position in the matrix,
#    and create a vector V composed of all the d[i,j] just found
V = [];
for i = 1:m
    c = findmin(d[i,j] for j = 1:n);
    push!(V, [c[1], c[2], i]);
end
println(V)
# 2) Sort the vector's elements from smallest to largest
S = sort(V)
println(S)
for i = 1:m
    println(S[i][2])
    println(S[i][3])
end
# 3) Fix primary assignments for the first 50% of customers
for i = 1:3
    fix(y[S[i][3], S[i][2]], 1.0, force = true);
end
# SECONDARY ASSIGNMENTS
# 1) For each customer find the second minimum d[i,j] and its position in the matrix,
#    and create a vector W composed of all the d[i,j] just found
W = [];
for i = 1:m
    f = findmax(d[i,j] for j = 1:n);   # wrong: this gives the maximum, not the second minimum
    push!(W, [f[1], f[2], i]);
end
println(W)
# 2) Sort the vector's elements from smallest to largest
T = sort(W)
println(T)
for i = 1:3
    println(T[i][2])
    println(T[i][3])
end
# 3) Fix secondary assignments for the first 50% of customers
for i = 1:3
    fix(b[T[i][3], T[i][2]], 1.0, force = true);
end
optimize!(model)
I tried to find for each line the second minimum, but I could not.
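For what it's worth, the "second minimum of each row" step can be isolated from the JuMP model. A minimal numpy sketch of the idea (my addition; in Julia, partialsortperm(d[i, :], 2) plays the same role):
import numpy as np

d = np.array([[80, 20, 40],
              [71, 55, 24],
              [56, 47, 81],
              [10, 20, 30],
              [31, 41, 21]])

order = np.argsort(d, axis=1)  # per row: column indices from smallest to largest distance
first = order[:, 0]            # column of the minimum        -> primary assignment y[i, j]
second = order[:, 1]           # column of the second minimum -> backup assignment b[i, j]
print(first, second)           # e.g. first row: min is 20 (col 1), second min is 40 (col 2), 0-based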

Finding a Hamiltonian path in Julia Language

I'm a beginner.
I am starting my studies in image processing, and I am using the Julia language.
I have tried, so far unsuccessfully, to implement Algorithm 1 of this article https://hal.archives-ouvertes.fr/hal-03330433/document, on page 3.
I noticed that there is a similar algorithm, with Matlab and C implementations, on this site https://elad.cs.technion.ac.il/software/?pn=1427. The authors provided the algorithm and the article where these ideas were proposed.
I read and studied both, but I could not do it in the Julia language.
The proposed algorithm (given as a figure in the paper, not reproduced here) operates on a graph G = (V, E), where V is the set of color values {v1, v2, ..., vm} and E is the set of edges e_ij = (v_i, v_j).
I think the constant alpha is 10^6, but I need to read the second paper again.
To show my doubts, I am using a random RGB color image of size 6 x 5 (30 pixels), storing the color set T as a 3 x 30 matrix, and taking a random probability vector p.
After that I generate the list L and choose a random j in L.
using Colors, FixedPointNumbers
using Images
using Statistics
using LinearAlgebra

img = rand(RGB{N0f8}, 6, 5)            # random RGB color image of size 6 x 5
a, b = size(img)
T = reshape(channelview(img), 3, a*b)  # 3 x 30 matrix; every column is the r, g, b color of a pixel
p = rand(a*b)                          # random vector p of 30 elements
L = [i for i = 1:30]                   # index list
j = rand(L)                            # choose a random j in L

# find the pixel (pix_i, pix_j) of the image
if j % a == 0
    pix_i = a
    pix_j = div(j, a)
else
    pix_i = j % a
    pix_j = 1 + div(j, a)
end

# the edges eij = (vj, vk)
for i = pix_i-1 : pix_i+1
    for j = pix_j-1 : pix_j+1
        N = CartesianIndex(i, j)
    end
end
After that, I don't know how to continue.
Already in that last line I have a problem: I am not able to build the set N(vj) from the indices, and I also don't know how to append to the end the vectors that we get at each iteration. That is, my problem is the construction of the set P and everything that follows.
Regarding the N(vj) inside the for loop that the author proposes, I am considering a 3 x 3 square. So, for example, for a pixel at position (4, 3) in the first iteration (i.e., j = (4, 3)), the cardinality |N(vj)| would be 8, because in the first iteration we get the pixels (3,2), (3,3), (3,4), (4,2), (4,3), (4,4), (5,2), (5,3) and (5,4).
So N(vj) = {(3,2), (3,3), (3,4), (4,2), (4,3), (4,4), (5,2), (5,3), (5,4)} \ {(4,3)}, whose cardinality is 8.
But I can't do it.
If you can help me, I would be very grateful.
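To isolate just the neighborhood step: a minimal sketch (my addition, in Python for brevity) that collects the 3 x 3 neighbors of a pixel, clipped to the image border and excluding the pixel itself, matching the example above:
def neighbors(pi, pj, a, b):
    """1-based 8-neighborhood of pixel (pi, pj) in an a x b image, clipped to the border."""
    return [(i, j)
            for i in range(max(1, pi - 1), min(a, pi + 1) + 1)
            for j in range(max(1, pj - 1), min(b, pj + 1) + 1)
            if (i, j) != (pi, pj)]

print(neighbors(4, 3, 6, 5))       # the 8 pixels listed in the example
print(len(neighbors(4, 3, 6, 5)))  # 8 = |N(vj)|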

Pseudo-Random Variable

I have a variable between 0 and 1 which should dictate the likelihood that a second variable, a random number between 0 and 1, is greater than 0.5. In other words, if I were to generate the second variable 1000 times, the average should be approximately equal to the first variable's value. How do I write this code?
Oh, and the second variable should always be capable of taking any value between 0 and 1, just with values more or less likely depending on the value of the first variable. Here is a link to a graph which models approximately how I would like the program to behave. Each equation represents a separate value for the first variable.
You have a variable p, and you are looking for a mapping function f(x) that maps uniform random rolls x in [0, 1] to the same interval [0, 1] such that the expected value, i.e. the average of all rolls, is p.
You have chosen the function prototype
f(x) = pow(x, c)
where c must be chosen appropriately. If x is uniformly distributed in [0, 1], the expected value is the integral of f over [0, 1]:
int(f(x) dx, [0, 1]) == p
With the antiderivative
int(pow(x, c) dx) == pow(x, c + 1) / (c + 1) + K
the integral evaluates to 1 / (c + 1), so 1 / (c + 1) == p and one gets:
c = 1/p - 1
A different approach is to make p the median value of the distribution, such that half of the rolls fall below p, the other half above p. This yields a different distribution. (I am aware that you didn't ask for that.) Now, we have to satisfy the condition:
f(0.5) == pow(0.5, c) == p
which yields:
c = log(p) / log(0.5)
With the current function prototype, you cannot satisfy both requirements. Your function is also asymmetric (f(x, p) != f(1-x, 1-p)).
Python functions below:
import math
import random

def medianrand(p):
    """Random number between 0 and 1 whose median is p"""
    c = math.log(p) / math.log(0.5)
    return math.pow(random.random(), c)

def averagerand(p):
    """Random number between 0 and 1 whose expected value is p"""
    c = 1/p - 1
    return math.pow(random.random(), c)
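A quick empirical check (my addition), using the two functions above: the sample mean of averagerand(p) and the sample median of medianrand(p) should both approach p:
import statistics

rolls = [averagerand(0.3) for _ in range(100000)]
print(statistics.mean(rolls))    # close to 0.3

rolls = [medianrand(0.3) for _ in range(100000)]
print(statistics.median(rolls))  # also close to 0.3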
You can do this by using a dummy. First set the first variable to a value between 0 and 1. Then create a random number in the dummy between 0 and 1. If this dummy is bigger than the first variable, you generate a random number between 0 and 0.5, and otherwise you generate a number between 0.5 and 1.
In pseudocode:
real a = 0.7
real total = 0.0
for i between 0 and 1000 begin
    real dummy = rand(0,1)
    real b
    if dummy > a then
        b = rand(0,0.5)
    else
        b = rand(0.5,1)
    end if
    total = total + b
end for
real avg = total / 1000
Please note that this algorithm will generate average values between 0.25 and 0.75. For a = 1 it will only generate random values between 0.5 and 1, which should average to 0.75. For a=0 it will generate only random numbers between 0 and 0.5, which should average to 0.25.
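Translating that pseudocode into Python and running it confirms the compressed range (my addition):
import random

def dummy_method(a):
    if random.random() > a:
        return random.uniform(0.0, 0.5)
    return random.uniform(0.5, 1.0)

for a in (0.0, 0.5, 1.0):
    avg = sum(dummy_method(a) for _ in range(100000)) / 100000.0
    print(a, round(avg, 3))  # averages land near 0.25 + a/2: 0.25, 0.5, 0.75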
I've made a sort of pseudo-solution to this problem, which I think is acceptable.
Here is the algorithm I made:
a = 0.2 # variable one
b = 0 # variable two
b = random.random()
b = b^(1/(2^(4*a-1)))
It doesn't actually produce the average results that I wanted, but it's close enough for my purposes.
Edit: Here's a graph I made from a large number of datapoints generated with a Python script using this algorithm:
import random

mod = 6
div = 100
for z in xrange(div):
    s = 0
    for i in xrange(100000):
        a = (z+1)/float(div)  # variable one
        b = random.random()   # variable two
        c = b**(1/(2**((mod*a*2)-mod)))
        s += c
    print str((z+1)/float(div)) + "\t" + str(round(s/100000.0, 3))
Each point is the average of 100000 randomly generated values from the algorithm; its x position is the given a value, and its y position is the resulting average. Ideally the points would fit the straight line y = x, but as you can see they fit closer to an arctan curve. I'm trying to tweak the algorithm so that the averages fit the line, but I haven't had much luck as of yet.
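For what it's worth, plugging the exponent from the derivation in the first answer, c = 1/a - 1, into the same script makes the averages fall on the y = x line; a quick check (my addition):
import random

div, trials = 100, 100000
for z in range(div):
    a = (z + 1) / float(div)
    s = sum(random.random() ** (1.0 / a - 1.0) for _ in range(trials))
    print("%.2f\t%.3f" % (a, s / trials))  # second column tracks the first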

Vectorized search for permutations (with repetitions) that contain given subpermutations (with repetitions)

This question can be viewed as a continuation/extension/generalization of a previous question of mine from here.
Some definitions: I have a set of integers S = {1, 2, ..., s}, say s = 20, and two matrices N and M whose rows are finite sequences of numbers from S (i.e. permutations with possible repetitions), of order n and m respectively, where 1 <= n <= m. Think of N as a collection of candidate sub-sequences for the sequences in M.
Example: [2 3 4 3] is a sub-sequence of [1 2 2 3 5 4 1 3] that occurs with multiplicity 2 (= the number of different ways the sub-sequence can be found inside the main sequence), whereas [3 2 2 3] is not a sub-sequence of it. In particular, a valid sub-sequence must by definition preserve the order of the indices.
Problem statement:
(P1) For each row of M, obtain the number of sub-sequences of it, with multiplicity and without multiplicity, that occur in N as rows (it can be zero if none are contained in N);
(P2) For each row of N, find out how many times, with multiplicity and without multiplicity, it is contained in M as a sub-sequence (again, this number can be zero);
Example: Let N = [1 2 2; 2 3 4] and M = [1 1 2 2 3; 1 2 2 3 4; 1 2 3 5 6]. Then (P1) returns [2; 3; 0] for 'with multiplicities' and [1; 2; 0] for 'without multiplicities'. (P2) returns [3; 2] for 'with multiplicities' and [2; 1] without multiplicities.
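Before the vectorized MATLAB answers below, it may help to see the "with multiplicity" count for one pair of rows in isolation; a small dynamic-programming sketch in Python (my addition, not from the answers), which reproduces the numbers in this example:
def count_subseq(seq, sub):
    # ways[j] = number of ways to match the first j elements of sub so far
    ways = [1] + [0] * len(sub)
    for x in seq:
        for j in range(len(sub), 0, -1):   # backwards so each x is used at most once
            if sub[j - 1] == x:
                ways[j] += ways[j - 1]
    return ways[-1]

print(count_subseq([1, 1, 2, 2, 3], [1, 2, 2]))  # 2, as in the example
print(count_subseq([1, 2, 2, 3, 4], [2, 3, 4]))  # 2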
Order of magnitude: M could typically have up to 30-40 columns and a few thousand rows, although I currently have M with only a few hundred rows and ~10 columns. N could approach the size of M or could be much smaller.
What I have so far: Not much, to be honest. I believe I might be able to slightly modify my not-very-well-vectorized solution from my previous question to handle permutations with repetitions; I am still thinking about it and will update as soon as I have something working. But given my (lack of) experience so far, it would in all likelihood be very suboptimal :(
Thanks!
Introduction: Owing to the repetitions in the input data in each row, the combination-finding process doesn't have the sort of "uniqueness" among elements that was exploited in your previous problem, hence the loops used here. Also, note that the without-multiplicity codes don't use nchoosek, and as such I feel more optimistic about their performance.
Notation:
p1wim -> P1 with multiplicity
p2wim -> P2 with multiplicity
p1wom -> P1 without multiplicity
p2wom -> P2 without multiplicity
Codes :
I. Code for P1, 2 with multiplicity
permN = permute(N,[3 2 1]);
p1wim(size(M,1),1) = 0;  %// pre-allocate
p2wim(size(N,1),1) = 0;
for k1 = 1:size(M,1)
    d1 = nchoosek(M(k1,:),3);  %// 3 = size(N,2) in the example
    t1 = all(bsxfun(@eq,d1,permN),2);
    p1wim(k1) = sum(t1(:));
    p2wim = p2wim + squeeze(sum(t1,1));
end
II. Code for P1, 2 without multiplicity
eqmat = bsxfun(@eq,M,permute(N,[3 4 2 1])); %// equality matrix
[m,n,p,q] = size(eqmat); %// get sizes
inds = zeros(size(M,1),p,q); %// pre-allocate for indices array
vec1 = [1:m]'; %//' setup constants to loop
vec2 = [0:q-1]*m*n*p;
vec3 = permute([0:p-1]*m*n,[1 3 2]);
for iter = 1:p
    [~,ind1] = max(eqmat(:,:,iter,:),[],2);
    inds(:,iter,:) = reshape(ind1,m,1,q);
    ind2 = squeeze(ind1);
    ind3 = bsxfun(@plus,vec1,(ind2-1)*m); %//' setup forward moving equalities
    ind4 = bsxfun(@plus,ind3,vec2);
    ind5 = bsxfun(@plus,ind4,vec3);
    eqmat(ind5(:)) = 0;
end
p1wom = sum(all(diff(inds,[],2)>0,2),3);
p2wom = squeeze(sum(all(diff(inds,[],2)>0,2),1));
As usual, I would encourage you to try gpuArray and parfor as well.
This approach uses only one loop over the rows of M (P1) or N (P2). The code makes use of linear indexing and the very powerful bsxfun function. Note that if the number of columns is large, you may experience problems because of nchoosek.
[mr mc] = size(M);
[nr nc] = size(N);

%// P1
combs = nchoosek(1:mc, nc)-1;
P1mu = NaN(mr,1);
P1nm = NaN(mr,1);
for r = 1:mr
    aux = M(r+mr*combs);
    P1mu(r) = sum(ismember(aux, N, 'rows'));
    P1nm(r) = sum(ismember(unique(aux, 'rows'), N, 'rows'));
end

%// P2. Multiplicity defined to span across different rows
rr = reshape(repmat(1:mr, size(combs,1), 1),[],1);
P2mu = NaN(nr,1);
P2nm = NaN(nr,1);
for r = 1:nr
    aux = M(bsxfun(@plus, rr, mr*repmat(combs, mr, 1)));
    P2mu(r) = sum(all(bsxfun(@eq, N(r,:), aux), 2));
    P2nm(r) = sum(all(bsxfun(@eq, N(r,:), unique(aux, 'rows')), 2));
end

%// P2. Multiplicity defined restricted to within one row
rr = reshape(repmat(1:mr, size(combs,1), 1),[],1);
P2mur = NaN(nr,1);
P2nmr = NaN(nr,1);
for r = 1:nr
    aux = M(bsxfun(@plus, rr, mr*repmat(combs, mr, 1)));
    P2mur(r) = sum(all(bsxfun(@eq, N(r,:), aux), 2));
    aux2 = unique([aux rr], 'rows'); %// concat rr to differentiate rows...
    aux2 = aux2(:,1:end-1); %// ...and now remove it
    P2nmr(r) = sum(all(bsxfun(@eq, N(r,:), aux2), 2));
end
Results for your example data:
P1mu =
2
3
0
P1nm =
1
2
0
P2mu =
3
2
P2nm =
1
1
P2mur =
3
2
P2nmr =
2
1
Some optimizations to the code would be possible; I'm not sure they are worth the effort:
Replace repmat by another bsxfun (using a third dimension). That may save some memory.
Transpose the original matrices and work down columns instead of along rows. That may be faster.

How to approach Vertical Sticks challenge?

This problem is taken from interviewstreet.com
Given array of integers Y=y1,...,yn, we have n line segments such that
endpoints of segment i are (i, 0) and (i, yi). Imagine that from the
top of each segment a horizontal ray is shot to the left, and this ray
stops when it touches another segment or it hits the y-axis. We
construct an array of n integers, v1, ..., vn, where vi is equal to
length of ray shot from the top of segment i. We define V(y1, ..., yn)
= v1 + ... + vn.
For example, if we have Y=[3,2,5,3,3,4,1,2], then v1, ..., v8 =
[1,1,3,1,1,3,1,2] (illustrated by a picture in the original problem statement).
For each permutation p of [1,...,n], we can calculate V(yp1, ...,
ypn). If we choose a uniformly random permutation p of [1,...,n], what
is the expected value of V(yp1, ..., ypn)?
Input Format
First line of input contains a single integer T (1 <= T <= 100). T
test cases follow.
First line of each test-case is a single integer N (1 <= N <= 50).
Next line contains positive integer numbers y1, ..., yN separated by a
single space (0 < yi <= 1000).
Output Format
For each test-case output expected value of V(yp1, ..., ypn), rounded
to two digits after the decimal point.
Sample Input
6
3
1 2 3
3
3 3 3
3
2 2 3
4
10 2 4 4
5
10 10 10 5 10
6
1 2 3 4 5 6
Sample Output
4.33
3.00
4.00
6.00
5.80
11.15
Explanation
Case 1: We have V(1,2,3) = 1+2+3 = 6, V(1,3,2) = 1+2+1 = 4, V(2,1,3) =
1+1+3 = 5, V(2,3,1) = 1+2+1 = 4, V(3,1,2) = 1+1+2 = 4, V(3,2,1) =
1+1+1 = 3. Average of these values is 4.33.
Case 2: No matter what the permutation is, V(yp1, yp2, yp3) = 1+1+1 =
3, so the answer is 3.00.
Case 3: V(y1 ,y2 ,y3)=V(y2 ,y1 ,y3) = 5, V(y1, y3, y2)=V(y2, y3, y1) =
4, V(y3, y1, y2)=V(y3, y2, y1) = 3, and average of these values is
4.00.
A naive solution will run forever for N=50. I believe the problem can be solved by calculating a value for each stick independently. I would still like to know whether there is another efficient approach to this problem. On what basis can we calculate the value for each stick independently?
We can solve this problem by figuring out the following: if the k-th stick is put in the i-th position, what is the expected ray length of this stick? The problem can then be solved by adding up the expected lengths for all sticks in all positions.
Let expected[k][i] be the expected ray length of the k-th stick put in the i-th position, and let num[k][i][length] be the number of permutations in which the k-th stick, put in the i-th position, has ray length equal to length. Then
expected[k][i] = sum( num[k][i][length] * length ) / N!
How do we compute num[k][i][length]? For example, for length=3, consider the following picture:
...GxxxI...
where I is the position, the 3 'x' are sticks strictly lower than I, and G is a stick at least as high as I.
Let s_i be the number of sticks that are smaller than the k-th stick, and g_i the number of sticks that are greater than or equal to the k-th stick. We can choose any one of the g_i sticks to put in the G position, and any length of the s_i sticks to fill the x positions, so (writing P(a, b) = a!/(a-b)! for ordered choices):
num[k][i][length] = P(s_i, length) * g_i * (n-length-2)!
In the case that all the positions before I are smaller than I, i.e. xxxI..., we don't need a greater stick in G, and we have:
num[k][i][length] = P(s_i, length) * (n-length-1)!
And here's a piece of Python code that can solve this problem:
def solve(n, ys):
ret = 0
for y_i in ys:
s_i = len(filter(lambda x: x < y_i, ys))
g_i = len(filter(lambda x: x >= y_i, ys)) - 1
for i in range(n):
for length in range(1, i+1):
if length == i:
t_ret = combination[s_i][length] * factorial[length] * factorial[ n - length - 1 ]
else:
t_ret = combination[s_i][length] * factorial[length] * g_i * factorial[ n - length - 1 - 1 ]
ret += t_ret * length
return ret * 1.0 / factorial[n] + n
This is the same question as https://cs.stackexchange.com/questions/1076/how-to-approach-vertical-sticks-challenge and my answer there (which is a little simpler than those given earlier here) was:
Imagine a different problem: if you had to place k sticks of equal heights in n slots then the expected distance between sticks (and the expected distance between the first stick and a notional slot 0, and the expected distance between the last stick and a notional slot n+1) is (n+1)/(k+1) since there are k+1 gaps to fit in a length n+1.
Returning to this problem, a particular stick is interested in how many sticks (including itself) are as high or higher. If this number is k, then the expected gap before it is also (n+1)/(k+1).
So the algorithm is simply to find this value for each stick and add up the expectation. For example, starting with heights of 3,2,5,3,3,4,1,2, the number of sticks with a greater or equal height is 5,7,1,5,5,2,8,7 so the expectation is 9/6+9/8+9/2+9/6+9/6+9/3+9/9+9/8 = 15.25.
This is easy to program: for example a single line in R
V <- function(Y){(length(Y) + 1) * sum(1 / (rowSums(outer(Y, Y, "<=")) + 1) )}
gives the values in the sample output in the original problem
> V(c(1,2,3))
[1] 4.333333
> V(c(3,3,3))
[1] 3
> V(c(2,2,3))
[1] 4
> V(c(10,2,4,4))
[1] 6
> V(c(10,10,10,5,10))
[1] 5.8
> V(c(1,2,3,4,5,6))
[1] 11.15
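The same one-liner translates directly to Python, for anyone cross-checking without R (my addition):
def V(Y):
    n = len(Y)
    return sum((n + 1.0) / (sum(1 for x in Y if x >= y) + 1) for y in Y)

print(round(V([1, 2, 3]), 2))                 # 4.33
print(round(V([3, 2, 5, 3, 3, 4, 1, 2]), 2))  # 15.25, as computed above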
As you correctly noted, we can solve the problem independently for each stick.
Let F(i, len) be the number of permutations in which the ray from stick i has length exactly len. Then the answer is
(Sum over i, len of F(i, len) * len) / n!
All that is left is to count F(i, len). Let a(i) be the number of sticks j with y_j <= y_i, and b(i) the number of sticks with y_j > y_i.
In order to get a ray of length len, we need a situation like this:
B, l...l, O
   (len-1 times)
where O is stick #i, B is a stick with bigger height (or the beginning of the row), and l is a stick with height less than the i-th.
This gives us two cases:
1) B is the beginning; this can be achieved in P(a(i), len-1) * (b(i)+a(i)-(len-1))! ways.
2) B is a bigger stick; this can be achieved in P(a(i), len-1) * b(i) * (b(i)+a(i)-len)! * (n-len) ways.
edit: corrected b(i) as the 2nd term in the product, in place of a(i), in case 2.
