randomized SVD singular values - performance

Randomized SVD approximates the first k singular values/vectors of a matrix using k+p random projections. This works surprisingly well for large matrices.
My question concerns the singular values that the algorithm outputs. Why aren't they equal to the first k singular values from the full SVD?
Below I have a simple implementation in R. Any suggestions on improving the performance would be appreciated.
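rsvd = function(A, k=10, p=5) {
  n = ncol(A)  # the random test matrix needs ncol(A) rows (same as nrow(A) for square A)
  y = A %*% matrix(rnorm(n * (k+p)), nrow=n)
  q = qr.Q(qr(y))  # orthonormal basis for the sampled range of A
  b = t(q) %*% A
  s = svd(b)  # SVD of the small (k+p) x ncol(A) matrix; avoids shadowing svd()
  list(u=q %*% s$u, d=s$d, v=s$v)
}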
> set.seed(10)
> A <- matrix(rnorm(500*500),500,500)
> svd(A)$d[1:15]
[1] 44.94307 44.48235 43.78984 43.44626 43.27146 43.15066 42.79720 42.54440 42.27439 42.21873 41.79763 41.51349 41.48338 41.35024 41.18068
> rsvd(A,10,5)$d
[1] 34.83741 33.83411 33.09522 32.65761 32.34326 31.80868 31.38253 30.96395 30.79063 30.34387 30.04538 29.56061 29.24128 29.12612 27.61804

I reckon that your algorithm is a modification of the algorithm of Martinsson et al. If I understood it correctly, it is meant especially for approximating low-rank matrices. I might be wrong though.
The difference is easily explained by the huge gap between the actual rank of A (500) and the values of k (10) and p (5). Plus, Martinsson et al. mention that the value for p should actually be larger than the chosen value for k.
So if we apply your solution taking their considerations into account, using:
A <- matrix(rnorm(500*500),500,500) # rank 500
B <- matrix(rnorm(500*50),500,500) # rank 50 (the 25000 values are recycled, so the columns repeat)
The timings show that even with a larger p value there is still a huge speed-up compared to the original svd algorithm.
> system.time(t1 <- svd(A)$d[1:5])
user system elapsed
0.8 0.0 0.8
> system.time(t2 <- rsvd(A,10,5)$d[1:5])
user system elapsed
0.01 0.00 0.02
> system.time(t3 <- rsvd(A,10,30)$d[1:5])
user system elapsed
0.04 0.00 0.03
> system.time(t4 <- svd(B)$d[1:5] )
user system elapsed
0.55 0.00 0.55
> system.time(t5 <-rsvd(B,10,5)$d[1:5] )
user system elapsed
0.02 0.00 0.02
> system.time(t6 <-rsvd(B,10,30)$d[1:5] )
user system elapsed
0.05 0.00 0.05
> system.time(t7 <-rsvd(B,25,30)$d[1:5] )
user system elapsed
0.06 0.00 0.06
But we see that using a higher p for a lower-rank matrix indeed gives a better approximation. If we also let k approach the rank more closely, the ratio of the approximated to the true singular values becomes approximately 1 (that is, the difference becomes approximately 0), while the speed gain is still substantial.
> round(mean(t2/t1),2)
[1] 0.77
> round(mean(t3/t1),2)
[1] 0.82
> round(mean(t5/t4),2)
[1] 0.92
> round(mean(t6/t4),2)
[1] 0.97
> round(mean(t7/t4),2)
[1] 1
So in general I believe one could conclude that:
p should be chosen so p > k (Martinsson calls it l if I'm right)
k shouldn't be too much different from rank(A)
For low rank matrices the result is generally better.
As far as I'm concerned, it's a neat way of doing it. I couldn't really find a more optimal way actually. The only thing I could say is that the construct t(q) %*% A is advised against. One should use crossprod(q,A) for that, which is supposed to be a tiny bit faster. But in your example the difference was nonexistent.

The paper by Halko, Martinsson and Tropp also recommends doing a couple of power iterations before computing the QR. We do 3 power iterations by default in the implementation in scikit-learn and we found it to work very well in practice.
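For what it's worth, here is a rough sketch of the power-iteration idea in Python with NumPy (not the R code from the question, and not the scikit-learn implementation). The `n_iter` and `seed` arguments are names I made up for illustration; `n_iter=0` corresponds to the plain scheme from the question.

```python
import numpy as np

def rsvd(A, k=10, p=5, n_iter=0, seed=None):
    """Randomized SVD sketch: rank-k approximation with oversampling p.

    n_iter is the number of power iterations (0 gives the plain scheme).
    """
    rng = np.random.default_rng(seed)
    # Sample the range of A with k + p Gaussian random vectors.
    Y = A @ rng.standard_normal((A.shape[1], k + p))
    Q = np.linalg.qr(Y)[0]
    for _ in range(n_iter):
        # Each pass applies A.T and then A, re-orthonormalizing for stability.
        Z = np.linalg.qr(A.T @ Q)[0]
        Q = np.linalg.qr(A @ Z)[0]
    B = Q.T @ A                      # small (k+p) x n matrix
    U_small, d, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :k], d[:k], Vt[:k]

# Compare against the exact leading singular values of a full-rank matrix.
rng = np.random.default_rng(10)
A = rng.standard_normal((300, 300))
exact = np.linalg.svd(A, compute_uv=False)[:10]
for n_iter in (0, 3):
    d = rsvd(A, k=10, p=5, n_iter=n_iter, seed=0)[1]
    print(n_iter, round(float(np.mean(d / exact)), 2))
```

The printed number is the mean ratio of approximate to exact singular values, the same measure used in the answer above. Because the small matrix is a compression of A, the estimates can only undershoot the true values; for a matrix whose rank is at most k+p they should match up to rounding.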


Reduced chi-square too low (close to 0) after weighted fit - convolution integral - Python lmfit

I'm fitting the following data where t: time (s), G: counts per second, f: impulse function (mm/s):
t G f
0 4.58 0
900 11.73 (11/900)
1800 18.23 (8.25/900)
2700 19.33 (3/900)
3600 19.04 (0.5/900)
4500 17.21 0
5400 12.98 0
6300 11.59 0
7200 9.26 0
8100 7.66 0
9000 6.59 0
9900 5.68 0
10800 5.1 0
Using the following convolution integral:
And more specifically:
Where: lambda_1 = 0.000431062 and lambda_2 = 0.000580525.
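Reading the model line of the code that follows, the fitted function appears to be the following (this is an inference from that code, not a formula quoted from the question; the discrete form assumes a uniform time step Δt and t_0 = 0):

```latex
G(t) = A \int_0^t f(\tau)\, e^{-\lambda_1 (t-\tau)}\, d\tau
     + B \int_0^t f(\tau)\, e^{-\lambda_2 (t-\tau)}\, d\tau + C
\quad\Longrightarrow\quad
G(t_i) \approx A\,\Delta t \sum_{j=0}^{i} f_j\, e^{-\lambda_1 (t_i - t_j)}
             + B\,\Delta t \sum_{j=0}^{i} f_j\, e^{-\lambda_2 (t_i - t_j)} + C
```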
The code used to perform that fitting is:
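import numpy as np
from lmfit import minimize, Parameters
# Extract data into numpy arrays (values from the table above)
t = np.arange(13) * 900.0
g = np.array([4.58, 11.73, 18.23, 19.33, 19.04, 17.21, 12.98, 11.59, 9.26, 7.66, 6.59, 5.68, 5.1])
f = np.array([0, 11, 8.25, 3, 0.5] + [0] * 8) / 900
dt, lambda_1, lambda_2 = 900.0, 0.000431062, 0.000580525  # dt assumed equal to the sampling interval
weights = 1 / np.sqrt(g)  # weighted fit; set to 1.0 for the non-weighted fit
# add parameters (only 'c' was shown; 'a' and 'b' are assumed to be added the same way)
params = Parameters()
params.add('a', value=1)
params.add('b', value=1)
params.add('c', value=1)
# define functions
def exp(x, k):
    return np.exp(-x * k)
def residuals(params, x, y):
    A, B, C = params['a'].value, params['b'].value, params['c'].value
    model = (A * np.convolve(exp(x, lambda_1), f)[:len(x)] * dt
             + B * np.convolve(exp(x, lambda_2), f)[:len(x)] * dt + C)
    return (model - y) * weights
# perform fit using leastsq
result = minimize(residuals, params, args=(t, g))
final = g + result.residual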
It works; however, I obtain a very low reduced chi-square (around 0) when I multiply the residual to be minimized by the weight 1/np.sqrt(g) (weighted fit). If I do not take the weight into account (non-weighted fit), I obtain a reduced chi-square of 0.254. I would like to obtain a reduced chi-square around 1.
A reduced chi-square far below 1 would imply that your estimate of the uncertainty in the data is far too large. If I read your example correctly, you are using the square-root of G as the uncertainty in G. Using the square root is a standard approach for estimating uncertainties in values dominated by counting statistics.
But... your G is a floating point number that you describe as counts per second. I might assume counts per second over 900 seconds.
If that is right (and we assume for simplicity no significant uncertainty in that time duration), then the uncertainties should be 30x smaller than you have them. That is, you are using
g_values = [4.58 , 11.73, 18.23]
g_uncertainties = sqrt(g_values) = [2.1401, 3.4249, 4.2697]
but the uncertainties in the counts would be sqrt(g_values*900), and so the uncertainties in counts per second would be sqrt(g_values*900)/900 = sqrt(g_values)/30.
More formally, the uncertainties in a value representing "counts per time" would add the uncertainties in counts and the uncertainties in time in quadrature. But again, the uncertainties in your time are probably very small (or, at least your time data implies that it is below 1 second).
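To make the factor of 30 concrete, here is a small sketch in plain Python. The 900-second counting interval is the same assumption as above, and `rate_uncertainty` is just a helper name chosen for this illustration.

```python
import math

def rate_uncertainty(rate, duration):
    """Poisson uncertainty of a count rate measured over `duration` seconds.

    Assumes the uncertainty in the duration itself is negligible.
    """
    counts = rate * duration          # total counts collected
    return math.sqrt(counts) / duration

g_values = [4.58, 11.73, 18.23]
duration = 900.0                      # assumed counting interval in seconds

for g in g_values:
    naive = math.sqrt(g)              # what sqrt(G) gives directly
    scaled = rate_uncertainty(g, duration)
    print(f"G={g:6.2f}  sqrt(G)={naive:.4f}  sqrt(G*T)/T={scaled:.4f}  ratio={naive / scaled:.1f}")
```

Since the weights are the inverse uncertainties, making the uncertainties 30 times smaller makes each weighted residual 30 times larger, and chi-square 900 times larger.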

How do you compute the constant, c, for the asymptotic runtimes of heapSort?

I am trying to understand how to compute the constant, c, when given the data. Before showing the data, I will inform you that I have already graphed the data with a linear trend on Excel. I am still quite baffled as to what I should use to calculate c.
Key question: How do you find some c that makes O(g(n)) true?
Expecting that you do not need to find T(n). The graphs you create should be sufficient.
Data for HeapSort (first column: input size n; second column: measured running time):
1 0
5 0
10 0
50 0
100 0
500 0
1000 0
5000 0
10,000 0.01
50,000 0.04
100,000 0.1
500,000 0.484
1,000,000 1.346
5,000,000 6.596667
10,000,000 14.854
Generally, this sort of problem is solved by fitting the data to an expected function (such as t = cn + b, or t = cnlogn + b) using a least-squares method. Assuming that the "c" you are requesting is the constant factor in front of the main term of your runtime, you will get c with that method.
The value of c will of course be dependent on the particular code that is running and the particular machine on which it is running.
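As a rough sketch of that least-squares approach in plain Python, the snippet below fits t = c*n*log2(n) + b to the data from the question. The base-2 logarithm is an assumption (a different base only rescales c), and `fit_linear` and `fit_nlogn` are helper names invented here.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_nlogn(ns, ts):
    """Fit t = c * n * log2(n) + b and return (c, b)."""
    xs = [n * math.log2(n) for n in ns]
    return fit_linear(xs, ts)

# Data from the question: input sizes and measured times.
ns = [1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000,
      100000, 500000, 1000000, 5000000, 10000000]
ts = [0, 0, 0, 0, 0, 0, 0, 0, 0.01, 0.04,
      0.1, 0.484, 1.346, 6.596667, 14.854]

c, b = fit_nlogn(ns, ts)
print(f"c = {c:.3e}, b = {b:.3e}")
```

The small inputs all measured 0, so the fit is dominated by the largest few sizes; dropping the zero rows before fitting is a reasonable variation.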

How to multiply each column of matrix A by each row of matrix B and sum resulting matrices in Matlab?

I have a problem which I hope can be easily solved.
A is an N×G matrix, B is an N×G matrix. The goal is to get the matrix C,
which is equal to multiplying each column of transposed A by each row of B and summing the resulting matrices; the total number of such matrices before summing is N×N, and their size is G×G.
This can be easily done in MatLab with two for-loops:
for n=1:1:N
    for m=1:1:N
        C = C + A(n,:).'*B(m,:); % G×G outer product, accumulated into C (initialized to zeros(G))
    end
end
However, for large matrices it is quite slow.
So, my question is:
is there a more efficient way for calculating C matrix in Matlab?
Thank you
If you write it all out for two 3×3 matrices, you'll find that the operation basically equals this:
C = bsxfun(@times, sum(B), sum(A).');
Running each of the answers here for N=50, G=100 and repeating each method 100 times:
Elapsed time is 13.839893 seconds. %// OP's original method
Elapsed time is 19.773445 seconds. %// Luis' method
Elapsed time is 0.306447 seconds. %// Robert's method
Elapsed time is 0.005036 seconds. %// Rody's method
(a factor of ≈ 4000 between the fastest and slowest method...)
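The identity behind the bsxfun one-liner above is that summing all N×N outer products equals the outer product of the column sums of A and B. A quick way to convince yourself, sketched in plain Python with small made-up integer matrices (the function names are just for this illustration):

```python
def outer(u, v):
    """Outer product of two vectors as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def double_loop(A, B):
    """Sum of outer(A[n], B[m]) over all row pairs (n, m)."""
    G = len(A[0])
    C = [[0] * G for _ in range(G)]
    for row_a in A:
        for row_b in B:
            P = outer(row_a, row_b)
            for i in range(G):
                for j in range(G):
                    C[i][j] += P[i][j]
    return C

def column_sums(M):
    return [sum(col) for col in zip(*M)]

def fast(A, B):
    """Same result from a single outer product of the column sums."""
    return outer(column_sums(A), column_sums(B))

A = [[1, 2, 3], [4, 5, 6]]   # N = 2, G = 3
B = [[7, 8, 9], [1, 0, 2]]
print(double_loop(A, B) == fast(A, B))
```

The double loop costs on the order of N²G² operations, while the column-sum version costs about NG + G², which is consistent with the large gap in the timings.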
I think this should improve the performance significantly
C = zeros(G);
for n = 1:N
    C = C + sum(A,1)'*B(n,:);
end
You avoid one loop, and should also avoid the problems of running out of memory. According to my benchmarking, it's about 20 times faster than the approach with two loops. (Note: I had to benchmark in Octave since I don't have MATLAB on this PC.)
Use bsxfun instead of the loops, and then sum twice:
C = sum(sum(bsxfun(@times, permute(A, [2 3 1]), permute(B,[3 2 4 1])), 3), 4);

Algorithm to smooth numbers with variable input time

I have an app that accepts integers at a variable rate every .25 to 2 seconds.
I'd like to output the data in a smoothed format for 3, 5 or 7 seconds depending on user input.
If the data always came in at the same rate, let's say every .25 seconds, then this would be easy. The variable rate is what confuses me.
Data might come in like this:
Time - Data
0.25 - 100
0.50 - 102
1.00 - 110
1.25 - 108
2.25 - 107
2.50 - 102
I'd like to display a 3 second rolling average every .25 seconds on my display.
The simplest form of doing this is to put each item into an array with a time stamp.
array.push([0.25, 100])
array.push([0.50, 102])
array.push([1.00, 110])
array.push([1.25, 108])
Then every .25 seconds I would read through the array, back to front, until I got to a time that was less than now() - rollingAverageTime. I would sum that and display it. I would then .Shift() the beginning of the array.
That seems not very efficient though. I was wondering if someone had a better way to do this.
Why don't you save the timestamp of the starting value and then accumulate the values and the number of samples until you get a timestamp that is >= startingTime + rollingAverageTime and then divide the accumulator by the number of samples taken?
If you want to preserve the number of samples, you can do this way:
Take the accumulator, and for each input value add it to the accumulator and store the value and its timestamp in a shift register. At every cycle, compare the latest sample's timestamp with the oldest timestamp in the shift register plus the smoothing time; if it is equal or greater, subtract the oldest saved value from the accumulator, delete that entry from the shift register and output the accumulator, divided by the smoothing time. If you iterate, you obtain a rolling average with (I think) the least amount of computation for each cycle:
a sum (to increment the accumulator)
a sum and a subtraction (to compare the timestamp)
a subtraction (from the accumulator)
a division (to calculate the average, done in a smart way can be a shift right)
For a total of about 4 algebraic sums and a division (or shift)
For taking into account the time from the last sample as a weighting factor, you can divide the value for the ratio between this time and the averaging time, and you obtain an already weighted average, without having to divide the accumulator.
I added this part because it doesn't add computational load, so you can implement quite easy if you want to.
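Here is a minimal sketch of the accumulator-plus-shift-register idea in Python, using a deque as the shift register. Note one assumption: it divides by the number of samples currently in the window to get a plain mean, and a sample that is exactly one window old is treated as expired. The class name is made up for this example.

```python
from collections import deque

class RollingMean:
    """Rolling mean over the last `window` seconds of irregular samples."""

    def __init__(self, window):
        self.window = window
        self.samples = deque()   # (timestamp, value), oldest first
        self.total = 0.0         # running sum of the values in the deque

    def add(self, timestamp, value):
        self.samples.append((timestamp, value))
        self.total += value
        # Drop samples that have fallen out of the window.
        while self.samples[0][0] <= timestamp - self.window:
            _, old_value = self.samples.popleft()
            self.total -= old_value
        return self.total / len(self.samples)

rm = RollingMean(window=3.0)
for t, v in [(0.25, 100), (0.50, 102), (1.00, 110),
             (1.25, 108), (2.25, 107), (2.50, 102)]:
    print(t, rm.add(t, v))
```

Each sample is pushed once and popped at most once, so the cost per update is constant on average, regardless of how many samples the window holds.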
The answer from clabacchio has the basics right, but perhaps you need a bit more sophisticated answer.
Calculating the average:
0.25 - 100
0.50 - 102
1.00 - 110
In the above subset of the data what is the answer you want? You could use the mean of these numbers or you could do it in a weighted fashion. You could convert the data into:
0.50 - 0.25 = 0.25 ---- (100+102)/2 = 101
1.00 - 0.50 = 0.50 ---- (102+110)/2 = 106
Then you can take the weighted average of these values, weight being the time difference, and value being the average value.
The final answer = (0.25*101 + 0.5*106)/(0.25+0.5) = 78.25/0.75 ≈ 104.33.
Now coming to "moving" averages:
You can either use previous k values or previous k seconds worth of data. In both cases you can keep two sums: weighted sum and sum of weights.
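A possible sketch of the "previous k seconds" variant with the two running sums, in Python. It assumes strictly increasing timestamps and drops whole segments once their end time is a full window old (it does not split a segment at the window edge); the class and attribute names are invented for illustration.

```python
from collections import deque

class TimeWeightedAverage:
    """Time-weighted moving average over roughly the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.segments = deque()    # (end_time, weight, weight * mean_value)
        self.weighted_sum = 0.0    # sum of weight * mean_value
        self.weight_sum = 0.0      # sum of weights (time differences)
        self.last = None           # previous (timestamp, value)

    def add(self, t, value):
        if self.last is not None:
            t0, v0 = self.last
            weight = t - t0                       # time between samples
            contribution = weight * (v0 + value) / 2
            self.segments.append((t, weight, contribution))
            self.weighted_sum += contribution
            self.weight_sum += weight
        self.last = (t, value)
        # Remove segments that ended a full window ago.
        while self.segments and self.segments[0][0] <= t - self.window:
            _, w, c = self.segments.popleft()
            self.weighted_sum -= c
            self.weight_sum -= w
        if not self.segments:
            return float(value)                   # nothing to average over yet
        return self.weighted_sum / self.weight_sum

avg = TimeWeightedAverage(window=3.0)
for t, v in [(0.25, 100), (0.50, 102), (1.00, 110)]:
    print(t, avg.add(t, v))
```

For the three samples above, the last value printed is the same (0.25*101 + 0.5*106)/(0.25+0.5) computed by hand earlier.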
So... the worst case scenario is 4 readings per second over 7 seconds = 28 values in your array to process. That will be done in nanoseconds anyway, so not worth optimizing IMHO.

Performance of rbind.data.frame

I have a list of dataframes for which I am certain that they all contain at least one row (in fact, some contain only one row, and others contain a given number of rows), and that they all have the same columns (names and types). In case it matters, I am also certain that there are no NA's anywhere in the rows.
The situation can be simulated like this:
#create one row
onerowdfr<-do.call(data.frame, c(list(), rnorm(100) , lapply(sample(letters[1:2], 100, replace=TRUE), function(x){factor(x, levels=letters[1:2])})))
colnames(onerowdfr)<-c(paste("cnt", 1:100, sep=""), paste("cat", 1:100, sep=""))
#reuse it in a list
someParts<-lapply(rbinom(200, 1, 14/200)*6+1, function(reps){onerowdfr[rep(1, reps),]})
I've set the parameters (of the randomization) so that they approximate my true situation.
Now, I want to unite all these dataframes in one dataframe. I thought using rbind would do the trick, like this:
result<-do.call(rbind, someParts)
Now, on my system (which is not particularly slow), and with the settings above, this is the output of system.time:
user system elapsed
5.61 0.00 5.62
Nearly 6 seconds for rbind-ing 254 (in my case) rows of 200 variables? Surely there has to be a way to improve the performance here? In my code, I have to do similar things very often (it is a form of multiple imputation), so I need this to be as fast as possible.
Can you build your matrices with numeric variables only and convert to a factor at the end? rbind is a lot faster on numeric matrices.
On my system, using data frames:
> system.time(result<-do.call(rbind, someParts))
user system elapsed
2.628 0.000 2.636
Building the list with all numeric matrices instead:
onerowdfr2 <- matrix(as.numeric(onerowdfr), nrow=1)
someParts2<-lapply(rbinom(200, 1, 14/200)*6+1,
function(reps){onerowdfr2[rep(1, reps),]})
results in a lot faster rbind.
> system.time(result2<-do.call(rbind, someParts2))
user system elapsed
0.001 0.000 0.001
EDIT: Here's another possibility; it just combines each column in turn.
> system.time({
+ n <- 1:ncol(someParts[[1]])
+ names(n) <- names(someParts[[1]])
+ result <- as.data.frame(lapply(n, function(i)
+ unlist(lapply(someParts, `[[`, i))))
+ })
user system elapsed
0.810 0.000 0.813
Still not nearly as fast as using matrices though.
If you only have numerics and factors, it's not that hard to convert everything to numeric, rbind them, and convert the necessary columns back to factors. This assumes all factors have exactly the same levels. Converting to a factor from an integer is also faster than from a numeric so I force to integer first.
someParts2 <- lapply(someParts, function(x)
matrix(unlist(x), ncol=ncol(x)))
result<-as.data.frame(do.call(rbind, someParts2))
a <- someParts[[1]]
f <- which(sapply(a, class)=="factor")
for(i in f) {
  lev <- levels(a[[i]])
  result[[i]] <- factor(as.integer(result[[i]]), levels=seq_along(lev), labels=lev)
}
The timing on my system is:
user system elapsed
0.090 0.00 0.091
Not a huge boost, but swapping rbind for rbind.fill from the plyr package knocks about 10% off the running time (with the sample dataset, on my machine).
If you really want to manipulate your data.frames faster, I would suggest using the data.table package and its function rbindlist(). I did not perform extensive tests, but for my dataset (3000 dataframes, 1000 rows x 40 columns each) rbindlist() takes only 20 seconds.
This is ~25% faster, but there has to be a better way...
N <- do.call(sum, lapply(someParts, nrow))
SP <- as.data.frame(lapply(someParts[[1]], function(x) rep(x,N)))
k <- 0
for(i in 1:length(someParts)) {
  j <- k+1
  k <- k + nrow(someParts[[i]])
  SP[j:k,] <- someParts[[i]]
}
Make sure you're binding dataframe to dataframe. Ran into huge perf degradation when binding list to dataframe.
