Best iterative way to calculate the fundamental matrix of an absorbing Markov Chain? - algorithm

I have a very large absorbing Markov chain. I want to obtain the fundamental matrix of this chain to calculate the expected number of steps before absorption. From this question I know that the expected number of steps t can be calculated by solving the equation
(I - Q) t = 1
where Q is the transient-to-transient block of the transition matrix and 1 is a vector of ones. This can be done with the following Python code:
import numpy
def expected_steps_fast(Q):
    I = numpy.identity(Q.shape[0])
    o = numpy.ones(Q.shape[0])
    return numpy.linalg.solve(I-Q, o)
However, I would like to calculate it using some kind of iterative method similar to the power iteration method used to calculate PageRank. This method would allow me to calculate an approximation to the expected number of steps before absorption in a MapReduce-like system.
Does something similar exist?

If you have a sparse matrix, check if scipy.sparse.linalg.spsolve works. No guarantees about numerical robustness, but at least for trivial examples it's significantly faster than solving with dense matrices.
import networkx as nx
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
def example(n):
    """Generate a very simple transition matrix from a directed graph."""
    g = nx.DiGraph()
    for i in range(n-1):
        g.add_edge(i+1, i)
        g.add_edge(i, i+1)
    g.add_edge(n-1, n)
    g.add_edge(n, n)
    # to_numpy_matrix was removed in NetworkX 3.0; this call needs an older release
    m = nx.to_numpy_matrix(g)
    # normalize rows to ensure m is a valid right stochastic matrix
    m = m / np.sum(m, axis=1)
    return m
# drop the last (absorbing) row and column to obtain Q
A = sp.csr_matrix(example(2000)[:-1,:-1])
Ad = np.array(A.todense())
def sparse_solve(Q):
    I = sp.identity(Q.shape[0], format='csr')
    o = np.ones(Q.shape[0])
    return spla.spsolve(I-Q, o)
def dense_solve(Q):
    I = np.identity(Q.shape[0])
    o = np.ones(Q.shape[0])
    return np.linalg.solve(I-Q, o)
Timings for sparse solution:
%timeit sparse_solve(A)
1000 loops, best of 3: 1.08 ms per loop
Timings for dense solution:
%timeit dense_solve(Ad)
1 loops, best of 3: 216 ms per loop
Like Tobias mentions in the comments, I would have expected other solvers to outperform the generic one, and they may for very large systems. For this toy example, the generic solve seems to work well enough.

I arrived at this answer thanks to @tobias-ribizel's suggestion of using the Neumann series. If we start from the following equation:
(I - Q) t = 1
Using the Neumann series:
t = (I - Q)^{-1} 1 = \sum_{k=0}^{\infty} Q^k 1
If we multiply each term of the series by the vector 1, we can operate separately on each row of the matrix Q and approximate successively with:
r_0 = 1, r_k = Q r_{k-1}, t \approx \sum_{k=0}^{n} r_k
This is the python code I use to calculate this:
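def expected_steps_iterative(Q, n=10):
    N = Q.shape[0]
    acc = np.ones(N)      # running sum of Q^k 1, starting with the k = 0 term
    r_k_1 = np.ones(N)    # previous term Q^(k-1) 1
    for k in range(1, n):
        r_k = np.zeros(N)
        for i in range(N):
            for j in range(N):
                r_k[i] += r_k_1[j] * Q[i, j]
        if np.allclose(acc, acc+r_k, rtol=1e-8):
            # the new term no longer changes the sum: add it and stop
            acc += r_k
            break
        acc += r_k
        r_k_1 = r_k
    return acc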
And this is the code using Spark. This code expects Q to be an RDD where each row is a tuple (row_id, dict of weights for that row of the matrix).
from functools import reduce
def expected_steps_spark(sc, Q, n=10):
    def dict2np(d, sz):
        vec = np.zeros(sz)
        for k, v in d.items():
            vec[k] = v
        return vec
    sz = Q.count()
    acc = np.ones(sz)
    x = {i:1.0 for i in range(sz)}
    for k in range(1, n):
        bc_x = sc.broadcast(x)
        x_old = x
        # The start of this statement was lost in the original post; Q.map(...) is a reconstruction.
        # Each row (u, ol) is mapped to (u, sum over j of x[j] * ol[j]).
        x = Q.map(lambda row: (row[0], reduce(lambda s, j: s + bc_x.value[j]*row[1][j], row[1], 0.0)))
        x = x.collectAsMap()
        v_old = dict2np(x_old, sz)
        v = dict2np(x, sz)
        acc += v
        if np.allclose(v, v_old, rtol=1e-8):
            return acc
    return acc
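For readers without a Spark cluster, the same row-wise accumulation can be sketched in plain Python. This is only an illustration under stated assumptions: `expected_steps_rows`, `max_iter` and `tol` are hypothetical names, and `rows` is an in-memory list of dicts standing in for the RDD layout described above. Each pass computes the next term Q^k 1 one row at a time, which is the part a map step would distribute.

```python
def expected_steps_rows(rows, max_iter=10000, tol=1e-12):
    """Approximate t = (I - Q)^{-1} 1 by summing the terms Q^k 1.

    rows[i] is a dict {j: Q[i][j]} with the nonzero entries of row i
    (a hypothetical stand-in for the RDD of (row_id, dict) tuples).
    """
    n = len(rows)
    acc = [1.0] * n   # running sum, starting with the k = 0 term
    r = [1.0] * n     # current term Q^k 1
    for _ in range(max_iter):
        # One "map" step: every row is processed independently of the others.
        r = [sum(q * r[j] for j, q in row.items()) for row in rows]
        acc = [a + v for a, v in zip(acc, r)]
        # Entries of Q are non-negative, so the terms shrink towards zero.
        if max(r) < tol:
            break
    return acc


# Two transient states that each move to the other with probability 0.5
# and are absorbed with probability 0.5.
steps = expected_steps_rows([{1: 0.5}, {0: 0.5}])
```

For this small chain the exact solution of (I - Q) t = 1 is t = (2, 2), so the result can be checked by hand.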


Faster way to compute distributions from Markov chain?

Suppose that I have a probability transition matrix, say of dimensions 2000x2000, that represents a homogeneous Markov chain, and I want to get some statistics of the probability distribution at each of the first 200 steps of the chain (the distribution given by the first row at each step). I've written the following:
using Distributions, LinearAlgebra
# This function defines our transition matrix:
function tm(N::Int, n0::Int)
    [pdf(Hypergeometric(N-l,l,n0),k-l) for l in 0:N, k in 0:N]
end
# This computes the 5-percentile of a probability vector
function percentile5(M::Vector)
    s = 0.0
    i = 0
    while s <= 0.05
        i += 1
        s += M[i]
    end
    return i-1
end
# This function computes a matrix with three rows: means, 5-percentiles
# and standard deviations. Each column represents a session.
function stats(N::Int, n0::Int, m::Int)
    A = tm(N,n0)
    B = Matrix{Float64}(I, N+1, N+1) # Initializing B with the identity matrix
    sup = 0:N # The support of each distribution
    sup2 = [k^2 for k in sup]
    stats = zeros(3,m)
    for i in 1:m
        C = B[1,:]
        stats[1,i] = sum(C .* sup) # Mean
        stats[2,i] = percentile5(C) # 5-percentile
        stats[3,i] = sqrt(sum(C .* sup2) - stats[1,i]^2) # Standard deviation
        B = A*B
    end
    return stats
end
data = stats(2000,50,200)
My question is: is there a more efficient (faster) way to do the same computation? I don't see a better way to do it, but maybe there are some tricks that speed up this computation.
This is what I have running so far:
using Distributions, LinearAlgebra, SparseArrays
# This function defines our transition matrix:
function tm(N::Int, n0::Int)
    [pdf(Hypergeometric(N-l,l,n0),k-l) for l in 0:N, k in 0:N]
end
# This computes the 5-percentile of a probability vector
function percentile5(M::AbstractVector)
    s = zero(eltype(M))
    res = length(M)
    @inbounds for i = 1:length(M)
        s += M[i]
        if s > 0.05
            res = i - 1
            break
        end
    end
    return res
end
# This function computes a matrix with three rows: means, 5-percentiles
# and standard deviations. Each column represents a session.
function stats(N::Int, n0::Int, m::Int)
    A = sparse(transpose(tm(N, n0)))
    C = zeros(size(A, 1))
    C[1] = 1.0
    sup = 0:N # The support of each distribution
    sup2 = sup .^ 2
    stats = zeros(3, m)
    for i = 1:m
        stats[1, i] = sum(C .* sup) # Mean
        stats[2, i] = percentile5(C) # 5-percentile
        stats[3, i] = sqrt(sum(C .* sup2) - stats[1, i]^2) # Standard deviation
        C = A * C
    end
    return stats
end
It is around 4x faster (on smaller parameters; possibly a much larger speedup on large parameters). It basically uses the tips I made in the comments:
using sparse arrays.
avoiding the whole matrix-matrix multiply and using a matrix-vector multiply instead.
Further improvements are possible (like the simulation/ensemble method I mentioned).
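The two tips can be illustrated with a small sketch, written here in plain Python for brevity rather than Julia. It is not the code above: `step`, `percentile5` and `chain_stats` are hypothetical helpers, and the transition matrix is assumed to be stored as one dict of nonzero entries per row. Only a distribution vector is propagated, so each step costs on the order of the number of nonzeros instead of a full matrix product.

```python
import math


def step(dist, rows):
    """Return dist * P, where rows[i] = {j: P[i][j]} holds the nonzeros of row i."""
    out = [0.0] * len(dist)
    for i, p_i in enumerate(dist):
        if p_i == 0.0:
            continue  # skip states that carry no probability mass
        for j, p_ij in rows[i].items():
            out[j] += p_i * p_ij
    return out


def percentile5(dist):
    """Smallest support value whose cumulative probability exceeds 0.05."""
    s = 0.0
    for k, p in enumerate(dist):
        s += p
        if s > 0.05:
            return k
    return len(dist) - 1


def chain_stats(rows, start, steps):
    """(mean, 5-percentile, standard deviation) for each of the first `steps` distributions."""
    dist = [0.0] * len(rows)
    dist[start] = 1.0
    stats = []
    for _ in range(steps):
        mean = sum(k * p for k, p in enumerate(dist))
        second = sum(k * k * p for k, p in enumerate(dist))
        std = math.sqrt(max(second - mean * mean, 0.0))
        stats.append((mean, percentile5(dist), std))
        dist = step(dist, rows)
    return stats


# Deterministic toy chain 0 -> 1 -> 2 -> 2, started in state 0.
toy = chain_stats([{1: 1.0}, {2: 1.0}, {2: 1.0}], start=0, steps=3)
```

On the toy chain the mass simply moves one state per step, so the mean and 5-percentile both equal the step index and the standard deviation is zero.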

Memory efficient version for the construction of disjoint random lists

We have two positive integers b,n with b < n/2. We want to generate two random disjoint lists I1, I2, both with b elements from {0,1,...,n}. A simple way to do this is the following.
def disjoint_sets(bound,n):
    import random
    L = random.sample(range(0,n+1), n+1)
    I1 = L[0:bound]
    I2 = L[bound:2*bound]
    return I1,I2
For large b,n (say b=100, n>1e7) the previous approach is not memory efficient, since L is large. I am wondering if there is a method to get I1,I2 without using range(0,n+1)?
Here is a hit-and-miss approach which works well for numbers in the range that you mentioned:
import random
def rand_sample(k,n):
    #picks k distinct random integers from range(n)
    #assumes that k is much smaller than n
    choices = set()
    sample = []
    for i in range(k): #xrange(k) in Python 2
        choice = random.randint(0,n-1)
        while choice in choices:
            choice = random.randint(0,n-1)
        #record the accepted value so that it is never picked again
        choices.add(choice)
        sample.append(choice)
    return sample
For your problem, you could do something like:
def rand_pair(b,n):
    sample = rand_sample(2*b,n)
    return sample[:b],sample[b:]
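As a possible alternative in Python 3, where `range` is a lazy sequence, `random.sample` can draw directly from `range(n + 1)` without building the full list in memory. The sketch below assumes Python 3; `disjoint_lists` and its `rng` parameter are hypothetical names used only for illustration.

```python
import random


def disjoint_lists(b, n, rng=random):
    """Return two disjoint lists of b distinct integers each, drawn from {0, ..., n}."""
    if 2 * b > n + 1:
        raise ValueError("need 2*b <= n + 1 to fill two disjoint lists")
    # range(n + 1) is never materialised; only the 2*b picks are stored.
    picked = rng.sample(range(n + 1), 2 * b)
    return picked[:b], picked[b:]


i1, i2 = disjoint_lists(100, 10**7, random.Random(1))
```

Because the 2*b values come from a single sample without replacement, the two halves cannot share an element.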

numpy: evaluating function in matrix, using previous array as argument in calculating the next

I have an m x n array: a, where the integers m > 1E6, and n <= 5.
I have functions F and G, which are composed like this: F( u, G ( u, t)). u is a 1 x n array, t is a scalar, and F and G returns 1 x n arrays.
I need to evaluate each row of a in F, and use previously evaluated row as the u-array for the next evaluation. I need to make m such evaluations.
This has to be really fast. I was previously impressed by scitools.std StringFunction evaluation for a whole array, but this problem requires using the previously calculated array as an argument in calculating the next. I don't know if StringFunction can do this.
For example:
from numpy import zeros, asarray, cos
a = zeros((1000000, 4))
a[0] = asarray([1.,69.,3.,4.1])
# A is a float defined elsewhere, h is a function which accepts a float as its argument and returns an arbitrary float. h is defined elsewhere.
def G(u, t):
    return asarray([u[0], u[1]*A, cos(u[2]), t*h(u[3])])
def F(u, t):
    return u + G(u, t)
dt = 1E-6
for i in range(1, 1000000):
    a[i] = F(a[i-1], i*dt)
The problem with the above code is that it is slow as hell. I need to get these calculations done with numpy in milliseconds.
How can I do what I want?
Thank you for your time.
Kind regards,
This sort of thing is very difficult to do in numpy. If we look at this by column we see a few simpler solutions.
a[:,0] is very easy:
col0 = np.ones((1000))*2
col0[0] = 1 #Or whatever start value.
np.cumprod(col0, out=col0)
np.allclose(col0, a[:1000,0])
As mentioned earlier this will overflow very quickly. a[:,1] can be done much along the same lines.
I do not believe there is a way to do the next two columns inside numpy alone quickly. We can turn to numba for this:
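import numpy as np
# autojit comes from older numba releases; newer releases expose jit/njit instead
from numba import autojit
def python_loop(start, count):
    out = np.zeros((count), dtype=np.double)
    out[0] = start
    for x in range(count-1):
        # u_next = u + cos(u), using the previous value out[x]
        out[x+1] = out[x] + np.cos(out[x])
    return out
numba_loop = autojit(python_loop)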
%timeit python_loop(3,1000000)
1 loops, best of 3: 4.14 s per loop
%timeit numba_loop(3,1000000)
1 loops, best of 3: 42.5 ms per loop
Although it's worth pointing out that this converges to pi/2 very, very quickly, and there is little point in calculating this recursion past ~20 values for any start value. This returns the exact same answer to double precision. I didn't bother finding the cutoff, but it is much less than 50:
%timeit tmp = np.empty((1000000));
tmp[:50] = numba_loop(3,50);
tmp[50:] = np.pi/2
100 loops, best of 3: 2.25 ms per loop
You can do something similar with the fourth column. Of course you can autojit all of the functions, but this gives you several different options to try out depending on numba usage:
Use cumprod for the first two columns
Use an approximation for column 3 (and possibly 4) where only the first few iterations are calculated
Implement columns 3 and 4 in numba using autojit
Wrap everything inside of an autojit loop (the best option)
The way you have presented this all rows past ~200 will either be np.inf or np.pi/2. Exploit this.
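The convergence claim for the third column can be checked with a few lines of plain Python. This is a minimal sketch, with `cos_recursion` as a hypothetical helper name; it iterates u <- u + cos(u) from the start value 3 used in the timings above and relies only on the standard library.

```python
import math


def cos_recursion(start, count):
    """Return the first `count` values of the recursion u <- u + cos(u)."""
    out = [float(start)]
    for _ in range(count - 1):
        out.append(out[-1] + math.cos(out[-1]))
    return out


values = cos_recursion(3.0, 20)
# Distance from pi/2 after each step; it collapses within a handful of iterations.
errors = [abs(v - math.pi / 2) for v in values]
```

Near pi/2 the update behaves like e -> e - sin(e), roughly e**3 / 6 for a small error e, which is why so few iterations are needed before the value stops changing in double precision.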
Slightly faster. Your first column is basically 2^n. Calculating 2^n for n up to 1000000 is going to overflow, and the second column is even worse.
def calc(arr, t0=1E-6):
    # A is the float defined elsewhere in the question
    u = arr[0]
    dt = 1E-6
    # placeholder for the question's h; returns an arbitrary float
    h = lambda x: np.random.random()*50.0
    def firstColGen(uStart):
        u = uStart
        while True:
            u += u
            yield u
    def secondColGen(uStart, A):
        u = uStart
        while True:
            u += u*A
            yield u
    def thirdColGen(uStart):
        u = uStart
        while True:
            u += np.cos(u)
            yield u
    def fourthColGen(uStart, h, t0, dt):
        u = uStart
        t = t0
        while True:
            u += h(u) * dt
            t += dt
            yield u
    first = firstColGen(u[0])
    second = secondColGen(u[1], A)
    third = thirdColGen(u[2])
    fourth = fourthColGen(u[3], h, t0, dt)
    for i in range(1, len(arr)):
        # the list contents were lost in the original post; reconstructed as one value per generator
        arr[i] = [next(first), next(second), next(third), next(fourth)]

Improving performance of interpolation (Barycentric formula)

I have been given an assignment in which I am supposed to write an algorithm which performs polynomial interpolation by the barycentric formula. The formula states that:
p(x) = \frac{\sum_{j=0}^{n} \frac{w_j f_j}{x - x_j}}{\sum_{j=0}^{n} \frac{w_j}{x - x_j}}
I have written an algorithm which works just fine, and I get the polynomial output I desire. However, this requires the use of some quite long loops, and for a large number of grid points, lots of nasty loop operations will have to be done. Thus, I would appreciate it greatly if anyone has any hints as to how I may improve this, so that I can avoid all these loops.
In the algorithm, x and f stand for the given points we are supposed to interpolate. w stands for the barycentric weights, which have been calculated before running the algorithm. And grid is the linspace over which the interpolation should take place:
function p = barycentric_formula(x,f,w,grid)
%Assert x-vectors and f-vectors have same length.
if length(x) ~= length(f)
    sprintf('Not equal amounts of x- and y-values. Function is terminated.')
    return;
end
n = length(x);
m = length(grid);
p = zeros(1,m);
% Loops for finding polynomial values at grid points. All values are
% calculated by the barycentric formula.
for i = 1:m
    var = 0;
    sum1 = 0;
    sum2 = 0;
    for j = 1:n
        if grid(i) == x(j)
            % grid point coincides with a node: use the given value
            p(i) = f(j);
            var = 1;
        else
            sum1 = sum1 + (w(j)*f(j))/(grid(i) - x(j));
            sum2 = sum2 + (w(j)/(grid(i) - x(j)));
        end
    end
    if var == 0
        p(i) = sum1/sum2;
    end
end
This is a classical case for matlab 'vectorization'. I would say - just remove the loops. It is almost that simple. First, have a look at this code:
function p = bf2(x, f, w, grid)
m = length(grid);
p = zeros(1,m);
for i = 1:m
    var = grid(i)==x;
    if any(var)
        p(i) = f(var);
    else
        sum1 = sum((w.*f)./(grid(i) - x));
        sum2 = sum(w./(grid(i) - x));
        p(i) = sum1/sum2;
    end
end
I have removed the inner loop over j. All I did here was remove the (j) indexing and change the arithmetic operators from / to ./ and from * to .* (the same operators, but with a dot in front to signify that the operation is performed on an element-by-element basis). These are called array operators, in contrast to ordinary matrix operators. Also note that the special case where grid points fall onto x is treated much as in your original implementation, only using a logical vector var such that x(var)==grid(i).
Now, you can also remove the outermost loop. This is a bit more tricky and there are two major approaches how you can do that in MATLAB. I will do it the simpler way, which can be less efficient, but more clear to read - using repmat:
function p = bf3(x, f, w, grid)
% Find grid points that coincide with x.
% The below compares all grid values with all x values
% and returns a matrix of 0/1. 1 is in the (row,col)
% for which grid(row)==x(col)
var = bsxfun(@eq, grid', x);
% find the logical indexes of those x entries
varx = sum(var, 1)~=0;
% and of those grid entries
varp = sum(var, 2)~=0;
% Outer-most loop removal - use repmat to
% replicate the vectors into matrices.
% Thus, instead of having a loop over j
% you have matrices of values that would be
% referenced in the loop
ww = repmat(w, numel(grid), 1);
ff = repmat(f, numel(grid), 1);
xx = repmat(x, numel(grid), 1);
gg = repmat(grid', 1, numel(x));
% perform the calculations element-wise on the matrices
sum1 = sum((ww.*ff)./(gg - xx),2);
sum2 = sum(ww./(gg - xx),2);
p = sum1./sum2;
% fix the case where grid==x and return
p(varp) = f(varx);
The fully vectorized version can be implemented with bsxfun rather than repmat. This can potentially be a bit faster, since the matrices are not explicitly formed. However, the speed difference may not be large for small system sizes.
Also, the first solution with one loop is also not too bad performance-wise. I suggest you test those and see, what is better. Maybe it is not worth it to fully vectorize? The first code looks a bit more readable..
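For comparison outside MATLAB, the same formula and the same special case for coinciding points can be sketched in plain Python. The helper names `barycentric_weights` and `barycentric_eval` are hypothetical; the weights use the standard definition w_j = 1 / prod_{k != j} (x_j - x_k), which the question assumes has been computed beforehand.

```python
def barycentric_weights(nodes):
    """w_j = 1 / prod_{k != j} (x_j - x_k) for distinct nodes."""
    weights = []
    for j, xj in enumerate(nodes):
        prod = 1.0
        for k, xk in enumerate(nodes):
            if k != j:
                prod *= (xj - xk)
        weights.append(1.0 / prod)
    return weights


def barycentric_eval(nodes, values, weights, grid):
    """Evaluate the interpolating polynomial at every point of grid."""
    result = []
    for g in grid:
        num = den = 0.0
        exact = None
        for xj, fj, wj in zip(nodes, values, weights):
            if g == xj:
                exact = fj  # grid point coincides with a node
                break
            term = wj / (g - xj)
            num += term * fj
            den += term
        result.append(exact if exact is not None else num / den)
    return result


# Interpolate f(x) = x**2 through the nodes 0, 1, 2.
nodes = [0.0, 1.0, 2.0]
values = [0.0, 1.0, 4.0]
p = barycentric_eval(nodes, values, barycentric_weights(nodes), [0.5, 1.0, 1.5])
```

Since a quadratic is reproduced exactly by three nodes, the results should match 0.25, 1.0 and 2.25 up to rounding.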

Randomly Generate a set of numbers of n length totaling x

I'm working on a project for fun and I need an algorithm to do as follows:
Generate a list of numbers of Length n which add up to x
I would settle for list of integers, but ideally, I would like to be left with a set of floating point numbers.
I would be very surprised if this problem wasn't heavily studied, but I'm not sure what to look for.
I've tackled similar problems in the past, but this one is decidedly different in nature. Before I've generated different combinations of a list of numbers that will add up to x. I'm sure that I could simply bruteforce this problem but that hardly seems like the ideal solution.
Anyone have any idea what this may be called, or how to approach it? Thanks all!
Edit: To clarify, I mean that the list should be length N while the numbers themselves can be of any size.
edit2: Sorry for my improper use of 'set', I was using it as a catch all term for a list or an array. I understand that it was causing confusion, my apologies.
This is how to do it in Python
import random
def random_values_with_prescribed_sum(n, total):
    x = [random.random() for i in range(n)]
    k = total / sum(x)
    return [v * k for v in x]
Basically you pick n random numbers, compute their sum and compute a scale factor so that the sum will be what you want it to be.
Note that this approach will not produce "uniform" slices, i.e. the distribution you will get will tend to be more "egalitarian" than it would be if it were picked at random among all distributions with the given sum.
To see the reason you can just picture what the algorithm does in the case of two numbers with a prescribed sum (e.g. 1):
The point P is a generic point obtained by picking two random numbers, and it will be uniform inside the square [0,1]x[0,1]. The point Q is the point obtained by scaling P so that the sum is required to be 1. As is clear from the picture, the points close to the center have a higher probability; for example, the exact center of the square will be found by projecting any point on the diagonal (0,0)-(1,1), while the point (0, 1) will be found by projecting only points from (0,0)-(0,1)... the diagonal length is sqrt(2)=1.4142... while the square side is only 1.0.
Actually, you need to generate a partition of x into n parts. This is usually done in the following way: a partition of x into n non-negative parts can be represented as follows: reserve n + x - 1 free places, put n - 1 borders in arbitrary places, and stones in the rest. The stone groups add up to x, thus the number of possible partitions is the binomial coefficient \binom{n + x - 1}{n - 1}.
So your algorithm could be as follows: choose an arbitrary (n - 1)-subset of an (n + x - 1)-set; it uniquely determines a partition of x into n parts.
In Knuth's TAOCP, section 3.4.2 discusses random sampling. See Algorithm S there.
Algorithm S: (choose n arbitrary records from a total of N)
1. t = 0, m = 0;
2. u = random, uniformly distributed on (0, 1)
3. if (N - t)*u >= n - m, skip the t-th record and increase t by 1; otherwise include the t-th record in the sample and increase m and t by 1
4. if m < n, return to 2; otherwise the algorithm is finished
The solution for non-integers is algorithmically trivial: you just select n arbitrary numbers that don't sum up to 0, and normalize them by their sum.
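For the integer case, the borders-and-stones idea combined with Algorithm S can be sketched as follows. This is an illustration under stated assumptions rather than Knuth's exact formulation: `selection_sample` and `random_composition` are hypothetical names, and the sketch places n - 1 borders among n + x - 1 places so that exactly n non-negative parts result.

```python
import random


def selection_sample(N, n, rng=random):
    """Algorithm S style selection: n distinct indices from range(N), in increasing order."""
    chosen = []
    t = m = 0
    while m < n:
        u = rng.random()
        if (N - t) * u >= n - m:
            t += 1            # skip record t
        else:
            chosen.append(t)  # include record t
            t += 1
            m += 1
    return chosen


def random_composition(x, n, rng=random):
    """Split the non-negative integer x into n non-negative integer parts."""
    places = n + x - 1
    bars = selection_sample(places, n - 1, rng)
    parts = []
    prev = -1
    for b in bars:
        parts.append(b - prev - 1)  # stones between consecutive borders
        prev = b
    parts.append(places - prev - 1)  # stones after the last border
    return parts


parts = random_composition(10, 4, random.Random(0))
```

Every choice of border positions is equally likely under selection sampling, so each way of splitting x into n ordered non-negative parts comes up with the same probability.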
If you want to sample uniformly in the region of N-1-dimensional space defined by x1 + x2 + ... + xN = x, then you're looking at a special case of sampling from a Dirichlet distribution. The sampling procedure is a little more involved than generating uniform deviates for the xi. Here's one way to do it, in Python:
xs = [random.gammavariate(1,1) for a in range(N)]
xs = [x*v/sum(xs) for v in xs]
If you don't care too much about the sampling properties of your results, you can just generate uniform deviates and correct their sum afterwards.
Here is a version of the above algorithm in Javascript
function getRandomArbitrary(min, max) {
  return Math.random() * (max - min) + min;
}
function getRandomArray(min, max, n) {
  var arr = [];
  for (var i = 0, l = n; i < l; i++) {
    arr.push(getRandomArbitrary(min, max))
  }
  return arr;
}
function randomValuesPrescribedSum(min, max, n, total) {
  var arr = getRandomArray(min, max, n);
  var sum = arr.reduce(function(pv, cv) { return pv + cv; }, 0);
  var k = total/sum;
  // scale every value so that the scaled values add up to total
  var delays = arr.map(function(x) { return k*x; })
  return delays;
}
You can call it with
var myarray = randomValuesPrescribedSum(0,1,3,3);
And then check it with
var sum = myarray.reduce(function(pv, cv) { return pv + cv;},0);
This code does a reasonable job. I think it produces a different distribution than 6502's answer, but I am not sure which is better or more natural. Certainly his code is clearer/nicer.
import random
def parts(total_sum, num_parts):
    points = [random.random() for i in range(num_parts-1)]
    # add the endpoints and sort so that consecutive differences are the part sizes
    points.append(0)
    points.append(1)
    points.sort()
    ret = []
    for i in range(1, len(points)):
        ret.append((points[i] - points[i-1]) * total_sum)
    return ret
def test(total_sum, num_parts):
    ans = parts(total_sum, num_parts)
    assert abs(sum(ans) - total_sum) < 1e-7
    print(ans)
test(5.5, 3)
test(10, 1)
test(10, 5)
In python:
a: create a list of (random #'s 0 to 1) times total; append 0 and total to the list
b: sort the list, measure the distance between each element
c: round the list elements
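import random
import time
TOTAL = 15
PARTS = 4   # value taken from the log output below
PLACES = 3  # value taken from the log output below
def random_sum_split(parts, total, places):
    a = [0, total] + [random.random()*total for i in range(parts-1)]
    a.sort()  # step b: sort before measuring the gaps
    b = [(a[i] - a[i-1]) for i in range(1, (parts+1))]
    if places is None:
        return b
    # round all but the last part, then let the last part absorb the rounding error
    c = [round(x, places) for x in b[:-1]]
    c.append(round(total-sum(c), places))
    return c
# info and log are provided by the author's host environment
def tick():
    if info.tick == 1:
        start = time.time()
        alpha = random_sum_split(PARTS, TOTAL, PLACES)
        end = time.time()
        log('alpha: %s' % alpha)
        log('total: %.7f' % sum(alpha))
        log('parts: %s' % PARTS)
        log('places: %s' % PLACES)
        log('elapsed: %.7f' % (end-start))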
[2014-06-13 01:00:00] alpha: [0.154, 3.617, 6.075, 5.154]
[2014-06-13 01:00:00] total: 15.0000000
[2014-06-13 01:00:00] parts: 4
[2014-06-13 01:00:00] places: 3
[2014-06-13 01:00:00] elapsed: 0.0005839
To the best of my knowledge, this distribution is uniform.
