Is it possible to speed up devectorized Julia code?

As part of my script, I have some code, which is as follows (devectorized Julia, as far as possible):
for kk=1:n # Main loop
for j=1:m
rhs[j]=2*u0[j]-alf*dt*u1[j]-2*mu*u2[j];
end
c=lhs\rhs'; #c: coefficients to be obtained
u2=c'*h;
u1=c'*p.-c'*f;
u0=c'*Q-c'*f*x;
for j=1:m
for i=1:m
lhs[j,i]=2*(Q[i,j]-x[j]*f[i])+alf*dt*(p[i,j]-f[i])+eps*dt*(Q[i,j]-x[j]*f[i])*u1[j]+eps*u0[j]*dt*(p[i,j]-f[i])-2*mu*h[i,j];
end
end
end
where h, p, Q and lhs are mxm matrices; u0, u1, u2, rhs and x are 1xm arrays; alf, dt, mu and eps are scalar constants; and f and c are mx1 arrays. I preallocated the matrices and arrays at the start of the script. The vectorized form of the above code is as follows:
for kk=1:n # Main loop
rhs=2*u0-alf*dt*u1-2*mu*u2;
c=lhs\rhs'; #c coefficients to be obtained
u2=c'*h;
u1=c'*p.-c'*f;
u0=c'*Q-c'*f*x;
lhs=2*(Q-f*x)+alf*dt*(p.-f)+eps*dt*(Q-f*x).*u1+eps*dt*u0.*(p.-f)-2*mu*h;
lhs=lhs';
end
For example, for n=100 and m=64 the elapsed times are as follows:
devectorized julia: 1.8 seconds
vectorized julia: 0.2 seconds
vectorized numpy: 0.04 seconds
The vectorized Julia code is approximately 9 times faster than the devectorized Julia code, and the vectorized NumPy code is approximately 5 times faster than the vectorized Julia code.
For n=500 and m=256
devectorized julia: 85.589233013 seconds
vectorized julia: 8.232898003 seconds
vectorized numpy: 1.62000012398 seconds
My question: is it possible to increase the performance of Julia in this case?

I think it's also possible to devectorize the calculation of u0, u1, u2, like this:
function vectorized()
m = [1.0 2.0 3.0; 1.0 2.0 3.0; 1.0 2.0 3.0]
c = [1.0, 2.0, 3.0]
for i in 1:100000
x1 = c'*m
x2 = c'*m
x3 = c'*m
end
return
end
function vectime(N)
timings = Array(Float64, N)
# Force compilation
vectorized()
for itr in 1:N
timings[itr] = @elapsed vectorized()
end
return timings
end
println("vectorized=",mean(vectime(20)))
function devectorized()
m = [1.0 2.0 3.0; 1.0 2.0 3.0; 1.0 2.0 3.0]
c = [1.0, 2.0, 3.0]
x1 = [0.0, 0.0, 0.0]
x2 = [0.0, 0.0, 0.0]
x3 = [0.0, 0.0, 0.0]
mx = 3
for i in 1:100000
for k in 1:mx
for kk in 1:mx
x1[k]=x1[k]+c[k]*m[k,kk];
x2[k]=x2[k]+c[k]*m[k,kk];
x3[k]=x3[k]+c[k]*m[k,kk];
end
end
end
return
end
function dvectime(N)
timings = Array(Float64, N)
# Force compilation
devectorized()
for itr in 1:N
timings[itr] = @elapsed devectorized()
end
return timings
end
println("devectorized=",mean(dvectime(20)))
The above code gives these results:
vectorized=0.17680755404999998
devectorized=0.00441064295
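More generally, the devectorized version in the question will only be fast if it lives inside a function and reuses preallocated arrays. Here is a minimal sketch of one time step along those lines (the name step! is mine, and I'm assuming u0, u1, u2, x and rhs are stored as ordinary length-m vectors rather than 1xm rows); it keeps every operation in place with explicit loops, dot and three-argument mul!:
using LinearAlgebra

# Sketch of one time step of the main loop, fully in place.
# All arrays are preallocated by the caller; eps is the question's constant, not Base.eps.
function step!(rhs, c, u0, u1, u2, lhs, h, p, Q, f, x, alf, dt, mu, eps)
    @inbounds for j in eachindex(rhs)
        rhs[j] = 2*u0[j] - alf*dt*u1[j] - 2*mu*u2[j]
    end
    c .= lhs \ rhs                 # the solve itself still allocates a factorization
    cf = dot(c, f)                 # the scalar c'*f
    mul!(u2, transpose(h), c)      # u2 = (c'*h)'
    mul!(u1, transpose(p), c); u1 .-= cf
    mul!(u0, transpose(Q), c); u0 .-= cf .* x
    @inbounds for i in axes(lhs, 2), j in axes(lhs, 1)
        lhs[j,i] = 2*(Q[i,j] - x[j]*f[i]) + alf*dt*(p[i,j] - f[i]) +
                   eps*dt*(Q[i,j] - x[j]*f[i])*u1[j] +
                   eps*u0[j]*dt*(p[i,j] - f[i]) - 2*mu*h[i,j]
    end
    return nothing
end
Wrapping the whole kk loop in a function like this (instead of running it at global scope) is usually the single biggest win, because Julia can then infer concrete types for every variable.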

Related

Faster way to compute distributions from Markov chain?

Suppose that I have a probability transition matrix, say of dimensions 2000x2000, that represents a homogeneous Markov chain, and I want to get some statistics of the probability distribution at each of the first 200 steps of the chain (the distribution of the first row at each step). To that end I've written the following:
using Distributions, LinearAlgebra
# This function defines our transition matrix:
function tm(N::Int, n0::Int)
[pdf(Hypergeometric(N-l,l,n0),k-l) for l in 0:N, k in 0:N]
end
# This computes the 5-percentile of a probability vector
function percentile5(M::Vector)
s=0
i=0
while s <= 0.05
i += 1
s += M[i]
end
return i-1
end
# This function computes a matrix with three rows: means, 5-percentiles
# and standard deviations. Each column represents a session.
function stats(N::Int, n0::Int, m::Int)
A = tm(N,n0)
B = I # Initializing B with the identity matrix
sup = 0:N # The support of each distribution
sup2 = [k^2 for k in sup]
stats = zeros(3,m)
for i in 1:m
C = B[1,:]
stats[1,i] = sum(C .* sup) # Mean
stats[2,i] = percentile5(C) # 5-percentile
stats[3,i] = sqrt(sum(C .* sup2) - stats[1,i]^2) # Standard deviation
B = A*B
end
return stats
end
data = stats(2000,50,200)
My question is: is there a more efficient (faster) way to do the same computation? I don't see a better way to do it, but maybe there are some tricks that speed up this computation.
This is what I have running so far:
using Distributions, LinearAlgebra, SparseArrays
# This function defines our transition matrix:
function tm(N::Int, n0::Int)
[pdf(Hypergeometric(N-l,l,n0),k-l) for l in 0:N, k in 0:N]
end
# This computes the 5-percentile of a probability vector
function percentile5(M::AbstractVector)
s = zero(eltype(M))
res = length(M)
@inbounds for i = 1:length(M)
s += M[i]
if s > 0.05
res = i - 1
break
end
end
return res
end
# This function computes a matrix with three rows: means, 5-percentiles
# and standard deviations. Each column represents a session.
function stats(N::Int, n0::Int, m::Int)
A = sparse(transpose(tm(N, n0)))
C = zeros(size(A, 1))
C[1] = 1.0
sup = 0:N # The support of each distribution
sup2 = sup .^ 2
stats = zeros(3, m)
for i = 1:m
stats[1, i] = sum(C .* sup) # Mean
stats[2, i] = percentile5(C) # 5-percentile
stats[3, i] = sqrt(sum(C .* sup2) - stats[1, i]^2) # Standard deviation
C = A * C
end
return stats
end
It is around 4x faster (on smaller parameters; possibly much more speedup on large parameters). It basically uses the tips I gave in the comments:
using sparse arrays.
avoiding whole matrix multiply but using vector-matrix multiply instead.
Further improvements are possible (like the simulation/ensemble method I mentioned).
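For example, the line C = A * C still allocates a fresh vector on every step. A minimal sketch of the same loop with a preallocated buffer and three-argument mul! (the name iterate_chain! is mine; percentile5 is the function defined above):
using LinearAlgebra, SparseArrays

function iterate_chain!(stats, A, C, sup, sup2, m)
    buf = similar(C)                  # work vector, allocated once
    for i in 1:m
        μ = dot(C, sup)               # mean, without a temporary array
        stats[1, i] = μ
        stats[2, i] = percentile5(C)  # 5-percentile, as defined above
        stats[3, i] = sqrt(dot(C, sup2) - μ^2)
        mul!(buf, A, C)               # buf = A * C, no new allocation
        C, buf = buf, C               # swap the two vectors for the next step
    end
    return stats
end
Whether this is worth it depends on the problem size; the sparse matrix-vector product itself still dominates.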

Iterating a custom function efficiently in Julia

I have an operator T_ implemented quite efficiently in Julia, and I want to iterate it using a while loop. My operator is given by:
# parameters
β = 0.987
δ = 0.012;
# grids
Kss = 48.1905148382166
kgrid = range(0.75*Kss, stop=1.25*Kss, length=500);
zgrid = [-0.06725382459813659, -0.044835883065424395, -0.0224179415327122, 0 , 0.022417941532712187, 0.04483588306542438, 0.06725382459813657]
# auxiliary functions to build my operator
F_(z,k) = exp(z) * (k^(1/3));
u_(c) = (c^(1-2) - 1)/(1-2)
# T_operator
function T_(V, P, kgrid, zgrid, β, δ)
E = V * P'
T1 = similar(V)
for i in axes(T1, 2)
for j in axes(T1, 1)
temp = F_(zgrid[i], kgrid[j]) + (1-δ)*kgrid[j]
aux = -Inf
for l in eachindex(kgrid)
c = max(0.0, temp - kgrid[l])
aux = max(aux, u_(c) + β * E[l, i])
end
T1[j,i] = aux
end
end
return T1
end
Explaining briefly, this operator takes as input:
V, a 500x7 matrix, and P, a 7x7 transition matrix (i.e. each row sums to one)
kgrid, a grid of length 500, and zgrid, a grid of length 7
β and δ, particular parameters
T_ returns a T1 (500x7) matrix. More details about this operator and the correct way to run it can be found in this other question that I asked: Tricks to improve the performance of a custom function in Julia
Running this operator only once takes very little time (it is almost instant). However, I need to iterate it until the error falls below an acceptable tolerance, and my implementation results in an inefficient process that takes a long time:
max_it = 1000
it = 1
tol = 1e-3
dist = tol +1
V0 = repeat(sqrt.(a_grid), outer = [1,7]);
while it < max_it && dist > tol
TV= T_(V0,P,kgrid, zgrid, β, δ)
dist = maximum(abs.(TV - V0)) # Computing distance or error
V0 = TV # update
it = it + 1 # Updating iterations
# Some information about the state of the iteration
if rem(it, 100) == 0
println("Current iteration:")
println(it)
println("Current norm:")
println(dist)
end
end
I think a more efficient solution is to incorporate the while loop directly into the implementation of the T_ operator, but I spent the whole day trying this out and couldn't do it. Help.
UPDATE
This is the MATLAB version. It is more efficient:
V0 = repmat(sqrt(kgrid), 1, 7); % Concave and increasing guess
max_it = 1000;
tol = 1e-3;
%% Iteration
tic
norm = tol + 1;
it = 1;
tic;
[K, Z, new_K] = meshgrid(kgrid, zgrid, kgrid);
K = permute(K, [2, 1, 3]);
Z = permute(Z, [2, 1, 3]);
new_K = permute(new_K, [2, 1, 3]);
% Computing consumption on each possible state and choice
C = max(f(Z,K) + (1-delta)*K - new_K,0);
% All possible utilities
U = u(C);
disp('Starting value function iteration through the good and old brute force...')
while it < max_it & norm > tol
EV = V0 * P';
EV = permute(repmat(EV, 1, 1, nk), [3, 2, 1]);
H = U + beta*EV;
[TV, index] = max(H, [], 3);
it = it + 1; % Updating iterations
norm = max(max(abs(TV - V0))); % Computing error
V0 = TV;
if rem(it, 100) == 0
disp('Current iteration:')
disp(it)
disp('Current norm:')
disp(norm)
end
end
V = TV;
toc;
Just to get an idea of where we're starting from, let's wrap your initial implementation in a function
function iterate_T_firstattempt(; max_it=1000, it=1, tol=1e-3, dist=tol+1)
V0 = repeat(sqrt.(kgrid), outer = [1,7]) # Assuming the `a_grid` was a typo from your comments
while it < max_it && dist > tol
TV = T_(V0, P, kgrid, zgrid, β, δ)
dist = maximum(abs.(TV - V0)) # Computing distance or error
V0 = TV # update
it += 1 # Updating iterations
# Some information about the state of the iteration
if rem(it, 100) == 0
println("Current iteration:")
println(it)
println("Current norm:")
println(dist)
end
end
end
and benchmark it with BenchmarkTools.jl
julia> @benchmark iterate_T_firstattempt()
BenchmarkTools.Trial: 1 sample with 1 evaluation.
Single result which took 7.056 s (0.00% GC) to evaluate,
with a memory estimate of 52.33 MiB, over 5875 allocations.
Oof, that's a lot of allocations. Some of these are coming from the use of global variables, others from type instability, yet others from the design of your functions. A few specific points:
The compiler's probably already making the right call, but we might as well add an @inline to your definitions of u_(c) and F_(z,k) to make sure they get inlined. And why not on T_ itself too while we're at it.
You're doing a lot of indexing in the nested for loops, so we might as well throw an @inbounds on there, given that there should be no way of getting out-of-bounds indexing.
One better: the loops in T_ look to be safely reorderable, so we can go ahead and upgrade that @inbounds to a @turbo or @tturbo from LoopVectorization.jl for an even bigger speedup by using your CPU's SIMD instructions / Advanced Vector Extensions.
The calculation of dist = maximum(abs.(TV - V0)) involves at least two large allocations; we can avoid those with a simple mapreduce, or, to use those SIMD instructions again, vmapreduce from LoopVectorization.jl.
The line TV = T_(V0, P, kgrid, zgrid, β, δ) is also allocating, let's switch that out for an in-place version T_!.
As mentioned above, global variables are bad news. We can just move them into the function signature of iterate_T easily enough though, which should fix that problem.
While we're at it, let's also break out three-arg mul! from the LinearAlgebra stdlib for a non-allocating calculation of E = V * P'. And to get rid of one last sneaky source of type-instability (which was causing a final ~2k allocations), we should change that outer=[1,7] to outer=(1,7) -- a nice stable tuple instead of an array.
Putting it all together:
using LinearAlgebra, LoopVectorization
# parameters
β = 0.987
δ = 0.012
# grids
Kss = 48.1905148382166
kgrid = range(0.75*Kss, stop=1.25*Kss, length=500)
zgrid = [-0.06725382459813659, -0.044835883065424395, -0.0224179415327122, 0 , 0.022417941532712187, 0.04483588306542438, 0.06725382459813657]
P = rand(7,7)
P ./= sum(P,dims=2) # Rows sum to one
# auxiliary functions to build operator
@inline F_(z,k) = exp(z) * (k^(1/3))
@inline u_(c) = (c^(1-2) - 1)/(1-2)
# T_operator, in-place version
@inline function T_!(TV, E, V, P, kgrid, zgrid, β, δ)
mul!(E, V, P')
@tturbo for i in axes(TV, 2)
for j in axes(TV, 1)
temp = F_(zgrid[i], kgrid[j]) + (1-δ)*kgrid[j]
aux = -Inf
for l in eachindex(kgrid)
c = max(0.0, temp - kgrid[l])
aux = max(aux, u_(c) + β * E[l, i])
end
TV[j,i] = aux
end
end
return TV
end
function iterate_T(P, kgrid, zgrid, β, δ; max_it=1000, it=1, tol=1e-3, dist=tol+1)
V0 = repeat(sqrt.(kgrid), outer=(1,7))
# Preallocate temporary arrays
TV = similar(V0)
E = similar(V0)
# Iterate
for it = 1:max_it
# Non-allocating in-place T_!
TV = T_!(TV, E, V0, P, kgrid, zgrid, β, δ)
# Compute distance or error
dist = vmapreduce((a,b)->abs(a-b), max, TV, V0)
copyto!(V0, TV) # update
# # Some information about the state of the iteration
# if rem(it, 100) == 0
# println("Current iteration:")
# println(it)
# println("Current norm:")
# println(dist)
# end
(dist < tol) && break
end
return V0
end
we get
julia> @benchmark iterate_T($P, $kgrid, $zgrid, $β, $δ)
BenchmarkTools.Trial: 11 samples with 1 evaluation.
Range (min … max): 460.246 ms … 599.820 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 474.826 ms ┊ GC (median): 0.00%
Time (mean ± σ): 486.661 ms ± 40.359 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
Memory estimate: 86.42 KiB, allocs estimate: 9.
That's a bit more like it!

Parallel computation in Julia

I'm working with some code in Julia and I'm using parallel computation. Here is the function I'm using (which is simplified from the real one).
I'm trying to evaluate what I call the "hitting time":
using Distributed
using LinearAlgebra
function action(Ntraj::Int64,
Tfinal::Float64,
dt::Float64)
# Output time vector
t = (1 : Ntime) * dt
#Vectors of Hitting Times
HittingTime = zeros(Ntraj)
@distributed for ktraj = 1 : Ntraj
HittingTimeBool = false
for jt=1:Ntime
if (HittingTimeBool == false && jt >0.1)
HittingTimeBool=true
HittingTime[ktraj] = jt*dt
println(HittingTime[ktraj])
end
end
end
println(HittingTime)
return (HittingTime)
end
So I run the function for 5 trajectories (just to see what happens), and the results are as follows:
using Distributed
addprocs(4)
@everywhere include("untitled.jl")
(t, Fid, HittingTime) = @time action(5,10.,0.01);
From worker 2: 0.01
From worker 5: 0.01
From worker 3: 0.01
From worker 4: 0.01
From worker 6: 0.01
[0.0, 0.0, 0.0, 0.0, 0.0]
0.723837 seconds (121.59 k allocations: 6.331 MiB)
HittingTime
5-element Array{Float64,1}:
0.0
0.0
0.0
0.0
0.0
As you can see, inside the for loop the function enters the if branch and the value HittingTime[ktraj] = jt*dt is stored. But when the loop ends, the values in the HittingTime array seem to disappear! I cannot use hcat as I do for FidelitytoTarget, since the arrays have different dimensions, so how can I write some code to store these values?
You need to have a SharedArray to mutate the state across the workers.
using Distributed, SharedArrays
addprocs(4)
HittingTime=SharedArray{Float64}(nworkers())
res = @distributed (+) for i in 1:length(HittingTime)
HittingTime[i] = rand()
HittingTime[i]
end
@assert res ≈ sum(HittingTime)
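Applied to the action function from the question, that might look something like the sketch below. Ntime is not defined in the posted snippet, so I'm assuming Ntime = Tfinal/dt, and the jt*dt > 0.1 test is only a placeholder for the real hitting condition:
using Distributed, SharedArrays
addprocs(4)

function action(Ntraj::Int, Tfinal::Float64, dt::Float64)
    Ntime = round(Int, Tfinal / dt)             # assumed definition of Ntime
    t = (1:Ntime) * dt
    HittingTime = SharedArray{Float64}(Ntraj)   # visible to every worker
    @sync @distributed for ktraj in 1:Ntraj
        for jt in 1:Ntime
            if jt * dt > 0.1                    # placeholder hitting condition
                HittingTime[ktraj] = jt * dt    # record the first hit...
                break                           # ...and stop scanning this trajectory
            end
        end
    end
    return t, sdata(HittingTime)                # sdata copies back into a plain Array
end
The @sync matters here: without it the function can return before the workers have finished writing into the SharedArray.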

BLAS v. parallel updates for Julia SharedArray objects

I am interested in using Julia SharedArrays for a scientific computing project. My current implementation appeals to BLAS for all matrix-vector operations, but I thought that perhaps a SharedArray would offer some speedup on multicore machines. My idea is to simply update an output vector index-by-index, farming the index updates to worker processes.
Previous discussions here about SharedArrays and here about shared memory objects did not offer clear guidance on this issue. It seems intuitively simple enough, but after testing, I'm somewhat confused as to why this approach works so poorly (see code below). For starters, it seems like @parallel for allocates a lot of memory. And if I prefix the loop with @sync, which seems like a smart thing to do if the whole output vector is required later, then the parallel loop is substantially slower (though without @sync, the loop is mighty quick).
Have I incorrectly interpreted the proper use of the SharedArray object? Or perhaps did I inefficiently assign the calculations?
### test for speed gain w/ SharedArray vs. Array ###
# problem dimensions
n = 10000; p = 25000
# set BLAS threads; 64 seems reasonable in testing
blas_set_num_threads(64)
# make normal Arrays
x = randn(n,p)
y = ones(p)
z = zeros(n)
# make SharedArrays
X = convert(SharedArray{Float64,2}, x)
Y = convert(SharedArray{Float64,1}, y)
Z = convert(SharedArray{Float64,1}, z)
# run BLAS.gemv! on Arrays twice, time second case
BLAS.gemv!('N', 1.0, x, y, 0.0, z)
@time BLAS.gemv!('N', 1.0, x, y, 0.0, z)
# does BLAS work equally well for SharedArrays?
# check timing result and ensure same answer
BLAS.gemv!('N', 1.0, X, Y, 0.0, Z)
@time BLAS.gemv!('N', 1.0, X, Y, 0.0, Z)
println("$(isequal(z,Z))") # should be true
# SharedArrays can be updated in parallel
# code a loop to farm updates to worker nodes
# use transposed X to place rows of X in columnar format
# should (hopefully) help with performance issues from stride
Xt = X'
@parallel for i = 1:n
Z[i] = dot(Y, Xt[:,i])
end
# now time the synchronized copy of this
@time @sync @parallel for i = 1:n
Z[i] = dot(Y, Xt[:,i])
end
# still get same result?
println("$(isequal(z,Z))") # should be true
Output from test with 4 workers + 1 master node:
elapsed time: 0.109010169 seconds (80 bytes allocated)
elapsed time: 0.110858551 seconds (80 bytes allocated)
true
elapsed time: 1.726231048 seconds (119936 bytes allocated)
true
You're running into several issues, of which the most important is that Xt[:,i] creates a new array (allocating memory). Here's a demonstration that gets you closer to what you want:
n = 10000; p = 25000
# make normal Arrays
x = randn(n,p)
y = ones(p)
z = zeros(n)
# make SharedArrays
X = convert(SharedArray, x)
Y = convert(SharedArray, y)
Z = convert(SharedArray, z)
Xt = X'
@everywhere function dotcol(a, B, j)
length(a) == size(B,1) || throw(DimensionMismatch("a and B must have the same number of rows"))
s = 0.0
@inbounds @simd for i = 1:length(a)
s += a[i]*B[i,j]
end
s
end
function run1!(Z, Y, Xt)
for j = 1:size(Xt, 2)
Z[j] = dotcol(Y, Xt, j)
end
Z
end
function runp!(Z, Y, Xt)
@sync @parallel for j = 1:size(Xt, 2)
Z[j] = dotcol(Y, Xt, j)
end
Z
end
run1!(Z, Y, Xt)
runp!(Z, Y, Xt)
@time run1!(Z, Y, Xt)
zc = copy(sdata(Z))
fill!(Z, -1)
@time runp!(Z, Y, Xt)
@show sdata(Z) == zc
Results (when starting julia -p 8):
julia> include("/tmp/paralleldot.jl")
elapsed time: 0.465755791 seconds (80 bytes allocated)
elapsed time: 0.076751406 seconds (282 kB allocated)
sdata(Z) == zc = true
For comparison, when running on this same machine:
julia> blas_set_num_threads(8)
julia> @time A_mul_B!(Z, X, Y);
elapsed time: 0.067611858 seconds (80 bytes allocated)
So the raw Julia implementation is at least competitive with BLAS.
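For reference, on current Julia (1.x, where @distributed from the Distributed stdlib replaces @parallel), the column copy from Xt[:,i] can also be avoided with a view instead of a hand-written dotcol. A minimal sketch, assuming Z, Y and Xt are SharedArrays as above:
using Distributed, SharedArrays, LinearAlgebra

function runp_views!(Z, Y, Xt)
    @sync @distributed for j in 1:size(Xt, 2)
        Z[j] = dot(Y, view(Xt, :, j))   # the view avoids copying the column
    end
    return Z
end
Whether this beats the dotcol approach or threaded BLAS will depend on the machine; the main point is that the per-iteration allocation is gone.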

How can I generate random numbers whose average follows a sine wave in Ruby?

I'm not a math guy, so I don't really know what the thing I'm trying to do is called, but I'm sure there's a name for it. ;-)
I want to generate an array of random numbers in Ruby whose average at each element follows a sine wave. By the average at element n I mean ary[0..n].inject(:+).to_f / (n + 1). So, if I loop from 0..n over the array of random numbers and compute the average as described, I'd like the resulting values to follow a sine wave. I just don't know how to actually generate the random numbers in such a way...
# assuming `ary` is the array of random numbers
# I'm trying to figure out how to generate...
averages = []
(0..ary.size).each do |n|
averages << ary[0..n].inject(:+).to_f / (n + 1)
end
# `averages` should plot as a sine wave now...
Here's an idea. Create a class that has some sample size over which it generates points in a sine wave plus some random "fudge factor" (variance) above or below that point. This way, if you plot the number of points in the sample size you should see a sine wave with "roughness" according to the configured variance (fudge factor).
class RandomSineWave
attr_reader :size
def initialize(size=20, variance=0.2)
@size = size
@step = 2 * Math::PI / size
@position = 0
@variance = variance
end
def next
@position = 0 if @position >= 2 * Math::PI
next_rand = Math.sin(@position) + (rand * @variance) - (@variance / 2)
@position += @step
next_rand
end
end
# Generate TSV output for demonstration.
rsw = RandomSineWave.new
rsw.size.times { |i| puts [i, rsw.next].join "\t" }
You can fiddle with the "roughness" by modifying the second argument to the constructor:
rsw = RandomSineWave.new(20, 0.8) # Results plotted below...
As I understand, given some positive integer n, you want to construct an array of n probability distributions such that the expected values of partial sums of random variables describes a sine wave. I presume the sine wave is over the interval (0..2*π) and that the expected values are to be evenly spaced over that interval.
We must first ask if these probability distributions are statistically independent. If they are not, it becomes hopelessly complex, so I will assume they are independent. Whether they are identical distributions, after adjusting for differences in their means, is not necessary or even important. I'll come back to that later.
Since you want the expected values of partial sums of the random variables X_i to describe a sine wave, we require that:
E[∑_{j=0..i} X_j] = k * sin(2*π*i/n)
for all i = 0...n-1, for a given scale factor k (with E[...] denoting "expected value"). We can assume, without loss of generality, that k=1, as we can always scale the random variables by k, resulting in their means being scaled by the same constant.
Because the distributions are independent, we can write:
∑_{j=0..i} m_j = sin(2*π*i/n)
where
m_i = E[X_i] is X_i's mean.
In Ruby-speak, for an array x of n values (floats), this is:
x[0,i].reduce(:+) = Math::sin(2.0 * Math::PI * i.to_f/n)
We can easily compute x. Assume n = 36.
For i = 0:
x[0,0].reduce(:+) = Math::sin(2.0 * Math::PI * 0.0/36)
# x[0] = 0
Let:
s = x[0]
#=> 0.0
For i = 1:
x[0,1].reduce(:+) = Math::sin(2.0 * Math::PI * 1.0/36).round(6)
#=> 0.0 + x[1] = Math::sin(0.17453292519943295).round(6)
#=> = 0.173648
So
x[1] = 0.173648 - 0.0
#=> 0.173648
Now let
s += x[1]
#=> 0.173648
For i = 2:
x[0,2].reduce(:+) = Math::sin(2.0 * Math::PI * 2.0/36).round(6)
#=> s + x[2] = Math::sin(0.3490658503988659).round(6)
#=> 0.173648 + x[2] = 0.342020
So
x[2] = 0.342020 - 0.173648
#=> 0.168372
We then update s:
s += 0.168372
#=> 0.173648 += 0.168372
#=> 0.342020
and then compute x[3] similarly, then each of the remaining x's:
def compute(n, p=6)
sum = 0.0
n.times.map do |i|
if i.zero?
[0.0, 0.0, 0.0, 0.0]
else
x = Math::sin(2.0 * Math::PI * i.to_f/n) - sum
sum += x
[(2.0*(i.to_f/n)*Math::PI).round(p), x.round(p),
sum.round(p), Math::sin(sum).round(p)]
end
end
end
compute(36)
# radians x sum sin(sum) degrees
# [[0.0, 0.0, 0.0, 0.0 ], 0
# [0.174533, 0.173648, 0.173648, 0.172777],
# ...
# [1.396263, 0.045115, 0.984808, 0.833166],
# [1.570796, 0.015192, 1.0, 0.841471], 90
# [1.745329, -0.015192, 0.984808, 0.833166],
# ...
# [2.967060, -0.168372, 0.173648, 0.172777],
# [3.141593, -0.173648, 0.0, 0.0 ], 180
# [3.316126, -0.173648, -0.173648, -0.172777],
# ...
# [4.537856, -0.045115, -0.984808, -0.833166],
# [4.712389, -0.015192, -1.0, -0.841471], 270
# [4.886922, 0.015192, -0.984808, -0.833166],
# ...
# [5.934119, 0.15798, -0.34202, -0.335391],
# [6.108652, 0.168372, -0.173648, -0.172777]] 350
I will add a plot of these values when I have time to familiarize myself with @maeric's nifty plotting tool.
Now that we have the means, we can consider constructing probability distributions having those means.
Suppose, for example, we assume each random variable has the same uniform distribution with range (max-min) of rng, but with varying means. Since rand has mean 0.5, a variate with range rng and mean m is rng * (rand - 0.5) + m. For instance, with rng = 1.0 and a desired mean of 0.174533 this is
rand - 0.325467
where
(0.5-0.174533).round(6)
#=> 0.325467
We therefore can generate pseudo-random variates for a uniform distribution with a given range and mean as follows:
def uniform_rv(rng, mean)
rng.to_f * (rand - 0.5) + mean
end
