EM Problem involving 3 coins - algorithm

I'm working on an estimation problem using the EM algorithm. The problem is as follows:
You have 3 coins with probabilities of being heads P1, P2, and P3 respectively. You flip coin 1. If coin 1=H, then you flip coin 2; if coin 1=T, then you flip coin 3. You only record whether coin 2 or 3 is heads or tails, not which coin was flipped. So the observations are strings of heads and tails, but nothing else. The problem is to estimate P1, P2, and P3.
My R code to do this is below. It's not working, and I can't figure out why. Any thoughts would be appreciated as I think this is a pretty crafty problem.
Ben
###############
#simulate data
p1<-.8
p2<-.8
p3<-.3
tosses<-1000
rbinom(tosses,size=1,prob=p1)->coin.1
pv<-rep(p3,tosses)
pv[coin.1==1]<-p2
#face now contains the probabilities of a head
rbinom(tosses,size=1,prob=pv)->face
rm(list=(ls()[ls()!="face"]))
#face is all you get to see!
################
#e-step
e.step <- function(x, theta.old) {
  fun <- function(p, theta.old, x) {
    theta.old[1] -> p1
    theta.old[2] -> p2
    theta.old[3] -> p3
    log(p1*p2^x*(1-p2)^(1-x))*(x*p1*p2+(1-x)*p1*(1-p2)) -> tmp1 #first part of the expectation
    log((1-p1)*p3^x*(1-p3)^(1-x))*(x*(1-p1)*p3+(1-x)*(1-p1)*(1-p3)) -> tmp2 #second part
    mean(tmp1+tmp2)
  }
  return(fun)
}
#m-step
m.step <- function(fun, theta.old, face) {
  nlminb(start=runif(3), objective=fun, theta.old=theta.old, x=face,
         lower=rep(.01,3), upper=rep(.99,3))$par
}
#initial estimates
length(face)->N
iter<-200
theta<-matrix(NA,iter,3)
c(.5,.5,.5)->theta[1,]
for (i in 2:iter) {
  e.step(face, theta[i-1,]) -> tmp
  m.step(tmp, theta[i-1,], face) -> theta[i,]
  print(c(i, theta[i,]))
  if (max(abs(theta[i,] - theta[i-1,])) < .005) break
}
#note that this thing isn't going anywhere!

You can't estimate P1, P2 and P3 separately. The only useful information is the proportion of recorded heads and the total number of sets of flips (each set of flips is independent, so the order does not matter). This is like trying to solve one equation for three unknowns, and it cannot be done.
The probability of recording a head is P1*P2 + (1-P1)*P3, which in your example is 0.7,
and of recording a tail is one minus that, i.e. P1*(1-P2) + (1-P1)*(1-P3), in your example 0.3.
Here is a simple simulator
#simulate data
sim <- function(tosses, p1, p2, p3) {
  coin.1 <- rbinom(tosses, size=1, prob=p1)
  coin.2 <- rbinom(tosses, size=1, prob=p2)
  coin.3 <- rbinom(tosses, size=1, prob=p3)
  ifelse(coin.1 == 1, coin.2, coin.3) # returned
}
The following are illustrations all producing 0.7 (with some random fluctuations)
> mean(sim(100000, 0.8, 0.8, 0.3))
[1] 0.70172
> mean(sim(100000, 0.2, 0.3, 0.8))
[1] 0.69864
> mean(sim(100000, 0.5, 1.0, 0.4))
[1] 0.69795
> mean(sim(100000, 0.3, 0.7, 0.7))
[1] 0.69892
> mean(sim(100000, 0.5, 0.5, 0.9))
[1] 0.70054
> mean(sim(100000, 0.6, 0.9, 0.4))
[1] 0.70201
Nothing you can do subsequently will distinguish these.
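To see this numerically, here is a quick check (a minimal Python sketch, simply evaluating the head-probability formula above for each triple used in the simulations):

# probability of recording a head: P1*P2 + (1-P1)*P3
def head_prob(p1, p2, p3):
    return p1 * p2 + (1 - p1) * p3

triples = [(0.8, 0.8, 0.3), (0.2, 0.3, 0.8), (0.5, 1.0, 0.4),
           (0.3, 0.7, 0.7), (0.5, 0.5, 0.9), (0.6, 0.9, 0.4)]
for t in triples:
    print(t, head_prob(*t))  # every triple gives exactly 0.7

Since each recorded toss is Bernoulli with this single success probability, the likelihood depends on (P1, P2, P3) only through P1*P2 + (1-P1)*P3, which is why all of these triples are indistinguishable.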

The plots of co-variance functions should start from 0-shift

The following question was given to me by my teacher:
Generate a sequence of N = 1000 independent observations of a random variable with distribution: (c) Exponential with parameter λ = 1, by the inversion method.
Present the obtained sequences graphically (except for those generated in point e), e.g. for (a):
i. plot in the coordinates (obs. no., value of the obs.);
ii. plot in the coordinates (obs. no. n, obs. no. n + i) for i = 1, 2, 3;
iii. plot the so-called covariance function for some values of the shift, i.e. K(tau) = 1/(n - tau) * Sum[(x_i - avg)*(x_{i+tau} - avg)], using the average avg = (1/n) * Sum[x_i].
I have written the following code,
(*****************************************************************)
(*Task 01(c) and 02(a)*)
(*****************************************************************)
n = 1000;
taskC = Table[-Log[RandomReal[]], {n}];
ListPlot[taskC, AxesLabel->{"No. obs", "value of the obs"}]
i = 1;
ListPlot[Table[{taskC[[k]], taskC[[k + i]]}, {k, 1, n - i, 1}],
  AxesLabel -> {"obs.no.n", "obs.no.n+1"}]
i++;
ListPlot[Table[{taskC[[k]], taskC[[k + i]]}, {k, 1, n - i, 1}],
  AxesLabel -> {"obs.no.n", "obs.no.n+2"}]
i++;
ListPlot[Table[{taskC[[k]], taskC[[k + i]]}, {k, 1, n - i, 1}],
  AxesLabel -> {"obs.no.n", "obs.no.n+3"}]
avg = (1/n)*Sum[taskC[[i]], {i, n}];
ListPlot[Table[1/(n - tau)*Sum[(taskC[[i]] - avg)*(taskC[[i + tau]] - avg), n], {tau, 1, 100}],
  Joined -> True,
  AxesLabel -> "Covariance Function"]
He has commented,
The plots of the covariance function should start from the 0-shift. Note that for shifts larger than 0 you are estimating the covariance between independent observations, which is zero, while for the 0-shift you are estimating the variance of an observation, which is large. Thus the contrast between these two cases is a clear indication that the observations are uncorrelated.
What did I do wrong?
How can I correct my code?
Zero-shift means calculating the covariance for tau = 0, which is simply the variance.
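In symbols, with avg the sample mean, K(0) = (1/n) * Sum[(x_i - avg)^2], the sample variance, while for tau >= 1 each product (x_i - avg)*(x_{i+tau} - avg) pairs independent observations, so K(tau) estimates zero.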
Labeled[ListPlot[Table[{tau,
1/(n - tau)*Sum[(taskC[[i]] - avg)*(taskC[[i + tau]] - avg), {i, n - tau}]},
{tau, 0, 5}], Filling -> Axis, FillingStyle -> Thick, PlotRange -> All,
Frame -> True, PlotRangePadding -> 0.2, AspectRatio -> 1],
{"Covariance Function K(n)", "n"}, {{Top, Left}, Bottom}]
Variance[taskC]
0.93484
Covariance[taskC, taskC]
0.93484
(* n = 1 *)
Covariance[Most[taskC], Rest[taskC]]
0.00926913
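For comparison, here is the same check as a small Python sketch (assuming numpy is available; plotting is left out): the lag-0 value estimates Var(Exp(1)) = 1, and the other lags estimate 0.

import numpy as np

rng = np.random.default_rng(0)
x = -np.log(rng.random(1000))   # Exp(1) by the inversion method
avg = x.mean()

def acov(x, tau):
    # sample covariance between the series and its tau-shifted copy
    n = len(x)
    return np.sum((x[:n - tau] - avg) * (x[tau:] - avg)) / (n - tau)

print([round(acov(x, tau), 4) for tau in range(6)])
# lag 0 comes out near 1, lags 1..5 near 0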

Issue with getting correct highest average speed

I'm trying to do this task: codewars kata
Description:
In John's car the GPS records every s seconds the distance travelled from an origin (distances are measured in an arbitrary but consistent unit). For example, below is part of a record with s = 15:
x = [0.0, 0.19, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25]
The sections are:
0.0-0.19, 0.19-0.5, 0.5-0.75, 0.75-1.0, 1.0-1.25, 1.25-1.50, 1.5-1.75, 1.75-2.0, 2.0-2.25
We can calculate John's average hourly speed on every section and we get:
[45.6, 74.4, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0]
Given s and x, the task is to return as an integer the floor of the maximum average speed per hour obtained on the sections of x. If the length of x is less than or equal to 1, return 0: the car didn't move.
Example:
with the above data your function gps(s, x) should return 74
My code:
def gps(s, x)
  i = 0
  speed = 0
  max = 0
  0 if x.length <= 1
  while i < x.length - 2
    speed = get_speed(x[i].to_f, x[i + 1].to_f, s)
    max = speed if speed > max
    i += 1
  end
  print max.floor
end

def get_speed(a, b, s)
  ((b - a).abs * ((60 / s) * 60))
end
The problem is that some tests fail.
Tests:
gps(20, [0.0, 0.23, 0.46, 0.69, 0.92, 1.15, 1.38, 1.61]) result: 41 - correct
gps(12, [0.0, 0.11, 0.22, 0.33, 0.44, 0.65, 1.08, 1.26, 1.68, 1.89, 2.1, 2.31, 2.52, 3.25]) result: 77 - incorrect, should be 219.
I have no idea where I went wrong. Could someone give me a hint to resolve the problem?
#mcfinnigan's answer correctly identifies the immediate mistake in your code, but the real underlying cause is that you are not writing idiomatic Ruby. If you were writing idiomatic Ruby (instead of FORTRAN in Ruby syntax, as you are doing), then you would use iterators instead of manually fiddling with loop indices and the problem wouldn't even arise in the first place. Something like this:
def gps(interval, measurements)
  compute_result(interval, measurements).tap(&method(:print))
end

private

def compute_result(interval, measurements)
  return 0 if measurements.length <= 1
  hourly_speed(max_distance(*distances(*measurements)), interval)
end

def distances(*measurements)
  measurements.
    each_cons(2).        # iterate over all consecutive pairs
    map {|a, b| b - a }  # transform to list of distances travelled
end

def max_distance(*distances)
  distances.max
end

def hourly_speed(distance, time_in_seconds)
  seconds_per_hour = 60.0 * 60
  (distance * seconds_per_hour / time_in_seconds).floor
end
As you can see, there are no loops, no indices, and no loop conditions; in fact, apart from the edge case of an empty measurements array, there are no conditions at all, so there is nothing of that kind to get wrong.
The problem is broken down into smaller subproblems that can be tested and debugged individually. Every method returns a value (instead of just printing to the console), which makes it possible to easily test it automatically (and also to reuse it in other methods).
while i < x.length - 2
This appears to be the issue. A classic off-by-one error; you are not considering the final element in your array.
Change your condition to
while i < x.length - 1
and your bug goes away.
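As a quick cross-check (a Python sketch of the same section-speed arithmetic, not part of the kata's Ruby solution), including the final section indeed yields 219 for the failing test:

from math import floor

def max_speed(s, x):
    # average hourly speed on each section; floor of the maximum
    if len(x) <= 1:
        return 0
    return floor(max((b - a) * 3600 / s for a, b in zip(x, x[1:])))

print(max_speed(12, [0.0, 0.11, 0.22, 0.33, 0.44, 0.65, 1.08, 1.26,
                     1.68, 1.89, 2.1, 2.31, 2.52, 3.25]))  # 219 (from the 2.52 -> 3.25 section)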

Calculating radial velocity from an inverse linear scale

I have a user input that can have an integer value of 1 through 50.
I have, let's imagine, a needle that turns, like the hand of a clock.
The speed of that turn is determined by the delta in radians it moves every frame. So, if I have a speed of PI/2, the needle turns a quarter circle every frame.
I have come to the conclusion that the possible speed should be between PI/8 (the fastest) and PI/256 (the slowest).
I am trying to build an algorithm that translates the user input of 1 (the slowest) and 50 (the fastest) into PI/256 and PI/8 (the max value of 50 is arbitrary and can be something else); obviously the numbers in between should be in reverse correspondence.
What I need is a formula like:
delta = userInput * (.............)
I've been trying for hours; if someone could help me out, it would be very much appreciated.
Solve the line equation: y = m * x + b. I.e., plug in your two points to get two equations with m and b as unknowns, then solve for m and b.
(See my answer here for a more detailed explanation of why this works.)
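Concretely, for this problem the two points are (1, pi/256) and (50, pi/8); a small Python sketch of the line-equation approach:

from math import pi

x0, y0 = 1.0, pi / 256   # slowest
x1, y1 = 50.0, pi / 8    # fastest

m = (y1 - y0) / (x1 - x0)   # slope
b = y0 - m * x0             # intercept

delta = lambda user_input: m * user_input + b
print(delta(1), delta(50))  # ~0.01227 (pi/256) and ~0.39270 (pi/8)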
This was just slightly too long for a comment. Whether the two scales agree in direction doesn't matter, although yours do: you said 1 (the slowest) corresponds to pi/256 (the slowest), and 50 (the fastest) corresponds to pi/8 (the fastest). 1 < 50, and pi/256 < pi/8.
So if that's the right ordering:
>>> a0, a1 = 1., 50.
>>> b0, b1 = pi/256, pi/8
>>> def rescale(x):
... return ((x-a0)/(a1-a0)) * (b1-b0) + b0
...
>>> rescale(1)
0.01227184630308513
>>> rescale(1) == pi/256
True
>>>
>>> rescale(50)
0.39269908169872414
>>> rescale(50) == pi/8
True
with 25 somewhere close to the middle:
>>> rescale(25)
0.198603553435643
If you want 1 to correspond to the fastest speed instead, then simply flip b0 and b1:
>>> a0, a1 = 1., 50.
>>> b0, b1 = pi/8, pi/256
>>> def rescale(x):
... return ((x-a0)/(a1-a0)) * (b1-b0) + b0
...
>>> rescale(1)
0.39269908169872414
>>> rescale(50)
0.012271846303085143
The formula continues to apply.

Can't DSolve two-body problem using Mathematica?

EDIT:
#auxsvr is correct that I had the force equations wrong, and about the -3/2 exponent.
Another way to see this is to simplify to 2 dimensions and consider a force acting from the origin, proportional to 1/r^2 just like gravity, where r is the distance from the origin.
At (x,y), the force acts in the direction (-x,-y). However, that's just the direction, not the magnitude. If we use k as the constant of proportionality, the force is (-kx, -ky).
The magnitude of the force is thus Sqrt[(-kx)^2+(-ky)^2], or k*Sqrt[x^2+y^2], or k*Sqrt[r^2], or k*r.
Since the force magnitude must also be 1/r^2, this gives us k = 1/r^3. The force is thus (-x/r^3, -y/r^3).
Since I was initially using r^2 as my primary quantity, that's (r^2)^(-3/2), which is where the 3/2 comes from.
This effectively invalidates my question, although it still makes an interesting theoretical discussion.
I retried this in Mathematica with the correct equations, but still got no answer. As others point out, the result is only an ellipse under certain conditions (it could be a parabola or hyperbola in other cases). Additionally, although the eventual orbit is a conic section, the initial orbit may spiral in or out until the final conic-section orbit is achieved.
EDIT ENDS HERE
I'm using Mathematica to solve the two-body problem:
DSolve[{
d2[t] == (x1[t]-x0[t])^2 + (y1[t]-y0[t])^2 + (z1[t]-z0[t])^2,
D[x0[t], t,t] == (x1[t]-x0[t])/d2[t],
D[y0[t], t,t] == (y1[t]-y0[t])/d2[t],
D[z0[t], t,t] == (z1[t]-z0[t])/d2[t],
D[x1[t], t,t] == -(x1[t]-x0[t])/d2[t],
D[y1[t], t,t] == -(y1[t]-y0[t])/d2[t],
D[z1[t], t,t] == -(z1[t]-z0[t])/d2[t]
},
{x0,y0,z0,x1,y1,x1,d2},
t
]
But I get back:
There are fewer dependent variables than equations, so the system is overdetermined.
I count 7 equations and 7 dependent variables?
If anything, the system is underdetermined, since I don't provide positions and velocities at time 0.
I realize my equations themselves might be wrong for the two-body problem, but I'd still like to know why Mathematica complains about this.
How about NDSolve?
d2[t_] = (-x0[t] + x1[t])^2 + (-y0[t] + y1[t])^2 + (-z0[t] + z1[t])^2;

sol = {x0, y0, z0, x1, y1, z1} /.
  NDSolve[{
    x0''[t] == (-x0[t] + x1[t])/d2[t],
    y0''[t] == (-y0[t] + y1[t])/d2[t],
    z0''[t] == (-z0[t] + z1[t])/d2[t],
    x1''[t] == -x0''[t],
    y1''[t] == -y0''[t],
    z1''[t] == -z0''[t],
    x0[0] == 0, y0[0] == 0, z0[0] == 0,
    x1[0] == 1, y1[0] == 0, z1[0] == 0,
    x0'[0] == -0.5, y0'[0] == 1, z0'[0] == 0.5,
    x1'[0] == 0.5, y1'[0] == -1, z1'[0] == -0.5},
   {x0, y0, z0, x1, y1, z1}, {t, 0, 120}][[1]]
r = 3;
Animate[
Graphics3D[
{
PointSize -> 0.05,
Point[{sol[[1]][t], sol[[2]][t], sol[[3]][t]}],
Point[{sol[[4]][t], sol[[5]][t], sol[[6]][t]}],
Red,
Line[Table[{sol[[1]][t1], sol[[2]][t1], sol[[3]][t1]}, {t1, 0, t, 0.1}]],
Green,
Line[Table[{sol[[4]][t1], sol[[5]][t1], sol[[6]][t1]}, {t1, 0, t, 0.1}]]
},
PlotRange -> {{-r, r}, {-r, r}, {-r, r}}
], {t, 0, 120}, AnimationRate -> 4
]
I'm surprised no one else noticed that everyone wrote the equations of motion incorrectly, which is apparent from the plot, because bounded orbits in the gravitational potential of two bodies are always closed (Bertrand's theorem). The correct equations of motion are
{x0''[t] == (-x0[t] + x1[t])/d2[t]^(3/2),
y0''[t] == (-y0[t] + y1[t])/d2[t]^(3/2),
x1''[t] == -x0''[t],
y1''[t] == -y0''[t]}
with
d2[t_]:= (x1[t]-x0[t])^2 + (y1[t]-y0[t])^2
since the motion is planar for central force fields. Also, one must set the initial conditions appropriately, otherwise the centre of mass moves and the orbits are no longer conic sections.
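To check that the corrected equations give closed orbits, here is a minimal numerical sketch (Python with scipy assumed; equal unit masses and opposite initial velocities, so the centre of mass stays fixed):

import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, s):
    # state s = [x0, y0, x1, y1, vx0, vy0, vx1, vy1]
    x0, y0, x1, y1, vx0, vy0, vx1, vy1 = s
    d2 = (x1 - x0)**2 + (y1 - y0)**2      # squared separation
    ax = (x1 - x0) / d2**1.5              # acceleration of body 0
    ay = (y1 - y0) / d2**1.5
    return [vx0, vy0, vx1, vy1, ax, ay, -ax, -ay]

s0 = [0, 0, 1, 0, 0, -0.5, 0, 0.5]        # zero total momentum, bound energy
sol = solve_ivp(rhs, (0, 40), s0, max_step=0.01)
# plotting sol.y[0] vs sol.y[1] and sol.y[2] vs sol.y[3] traces two closed ellipses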

How Could One Implement the K-Means++ Algorithm?

I am having trouble fully understanding the K-Means++ algorithm. I am interested in exactly how the first k centroids are picked, namely the initialization, as the rest is like in the original K-Means algorithm.
Is the probability function used based on distance or Gaussian?
At the same time, is the point most distant from the other centroids picked as a new centroid?
I would appreciate a step-by-step explanation and an example. The one in Wikipedia is not clear enough. Well-commented source code would also help. If you are using 6 arrays, please tell us which one is for what.
Interesting question. Thank you for bringing this paper to my attention - K-Means++: The Advantages of Careful Seeding
In simple terms, cluster centers are initially chosen at random from the set of input observation vectors, where the probability of choosing vector x is high if x is not near any previously chosen centers.
Here is a one-dimensional example. Our observations are [0, 1, 2, 3, 4]. Let the first center, c1, be 0. The probability that the next cluster center, c2, is x is proportional to ||c1-x||^2. So, P(c2 = 1) = 1a, P(c2 = 2) = 4a, P(c2 = 3) = 9a, P(c2 = 4) = 16a, where a = 1/(1+4+9+16).
Suppose c2=4. Then, P(c3 = 1) = 1a, P(c3 = 2) = 4a, P(c3 = 3) = 1a, where a = 1/(1+4+1).
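For instance, the arithmetic of this example can be checked with a few lines of Python:

X = [0, 1, 2, 3, 4]

def d2_to_nearest(x, centers):
    # squared distance from x to its nearest already-chosen center
    return min((x - c)**2 for c in centers)

for centers in ([0], [0, 4]):
    D2 = [d2_to_nearest(x, centers) for x in X if x not in centers]
    total = sum(D2)
    print(centers, [d / total for d in D2])
# [0]    -> 1/30, 4/30, 9/30, 16/30 for x = 1, 2, 3, 4
# [0, 4] -> 1/6, 4/6, 1/6 for x = 1, 2, 3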
I've coded the initialization procedure in Python; I don't know if this helps you.
import numpy as np

def initialize(X, K):
    # first center is X[0]; the rest are drawn with probability proportional
    # to the squared distance to the nearest already-chosen center
    C = [X[0]]
    for k in range(1, K):
        D2 = np.array([min([np.inner(c - x, c - x) for c in C]) for x in X])
        probs = D2 / D2.sum()
        cumprobs = probs.cumsum()
        r = np.random.rand()
        for j, p in enumerate(cumprobs):
            if r < p:
                i = j
                break
        C.append(X[i])
    return C
EDIT with clarification: The output of cumsum gives us boundaries to partition the interval [0,1]. These partitions have length equal to the probability of the corresponding point being chosen as a center. So then, since r is uniformly chosen in [0,1], it will fall into exactly one of these intervals (because of break). The for loop checks to see which partition r is in.
Example:
probs = [0.1, 0.2, 0.3, 0.4]
cumprobs = [0.1, 0.3, 0.6, 1.0]
if r < cumprobs[0]:
    # this event has probability 0.1
    i = 0
elif r < cumprobs[1]:
    # this event has probability 0.2
    i = 1
elif r < cumprobs[2]:
    # this event has probability 0.3
    i = 2
elif r < cumprobs[3]:
    # this event has probability 0.4
    i = 3
One liner.
Say we need to select 2 cluster centers. Instead of selecting them all randomly (as we do in plain k-means), we select the first one randomly, then find the points farthest from the first center (these points most probably do not belong to the first cluster center, as they are far from it) and assign the second cluster center near those far points.
I have prepared a full source implementation of k-means++, based on the book "Programming Collective Intelligence" by Toby Segaran and the k-means++ initialization provided here.
Indeed there are two distance functions here. For the initial centroids a standard one is used, based on numpy.inner, and then for fixing the centroids the Pearson one is used. Maybe the Pearson one can also be used for the initial centroids; they say it is better.
from __future__ import division
def readfile(filename):
    lines = [line for line in open(filename)]
    rownames = []
    data = []
    for line in lines:
        p = line.strip().split(' ')  # single space as separator
        # First column in each row is the rowname
        rownames.append(p[0])
        # The data for this row is the remainder of the row
        data.append([float(x) for x in p[1:]])
    return rownames, data
from math import sqrt

def pearson(v1, v2):
    # Simple sums
    sum1 = sum(v1)
    sum2 = sum(v2)
    # Sums of the squares
    sum1Sq = sum([pow(v, 2) for v in v1])
    sum2Sq = sum([pow(v, 2) for v in v2])
    # Sum of the products
    pSum = sum([v1[i]*v2[i] for i in range(len(v1))])
    # Calculate r (Pearson score)
    num = pSum - (sum1*sum2/len(v1))
    den = sqrt((sum1Sq - pow(sum1, 2)/len(v1)) * (sum2Sq - pow(sum2, 2)/len(v1)))
    if den == 0: return 0
    return 1.0 - num/den
import numpy
from numpy.random import rand

def initialize(X, K):
    C = [X[0]]
    for _ in range(1, K):
        D2 = numpy.array([min([numpy.inner(numpy.array(c) - numpy.array(x),
                                           numpy.array(c) - numpy.array(x)) for c in C]) for x in X])
        probs = D2/D2.sum()
        cumprobs = probs.cumsum()
        r = rand()
        i = -1
        for j, p in enumerate(cumprobs):
            if r < p:
                i = j
                break
        C.append(X[i])
    return C

def kcluster(rows, distance=pearson, k=4):
    # seed the centroids with k-means++ instead of random placement,
    # then run the book's standard k-means iterations
    clusters = initialize(rows, k)
    lastmatches = None
    for t in range(100):
        bestmatches = [[] for i in range(k)]
        # find which centroid is closest to each row
        for j in range(len(rows)):
            row = rows[j]
            bestmatch = 0
            for i in range(k):
                d = distance(clusters[i], row)
                if d < distance(clusters[bestmatch], row): bestmatch = i
            bestmatches[bestmatch].append(j)
        # if the results are the same as last time, this is complete
        if bestmatches == lastmatches: break
        lastmatches = bestmatches
        # move the centroids to the average of their members
        for i in range(k):
            avgs = [0.0]*len(rows[0])
            if len(bestmatches[i]) > 0:
                for rowid in bestmatches[i]:
                    for m in range(len(rows[rowid])):
                        avgs[m] += rows[rowid][m]
                for j in range(len(avgs)):
                    avgs[j] /= len(bestmatches[i])
                clusters[i] = avgs
    return bestmatches
rows, data = readfile('/home/toncho/Desktop/data.txt')
kclust = kcluster(data, k=4)

print("Result:")
for c in kclust:
    out = ""
    for r in c:
        out += rows[r] + ' '
    print("[" + out[:-1] + "]")
print('done')
data.txt:
p1 1 5 6
p2 9 4 3
p3 2 3 1
p4 4 5 6
p5 7 8 9
p6 4 5 4
p7 2 5 6
p8 3 4 5
p9 6 7 8
