Inverse of cumsum in Julia - matrix

The matrix Y is defined as
Y = cumsum(cumsum(X,dims=1), dims=2)
For example,
julia> X = [1 4 2 3; 2 4 5 2; 4 3 4 1; 2 5 4 2];
julia> Y = cumsum(cumsum(X,dims=1), dims=2)
4×4 Matrix{Int64}:
 1   5   7  10
 3  11  18  23
 7  18  29  35
 9  25  40  48
I want to reproduce the matrix X from Y. It seems that the function diff is helpful. However, as you can see below, we cannot reproduce the first row and first column of X.
julia> diff(diff(Y, dims=1), dims=2)
3×3 Matrix{Int64}:
 4  5  2
 3  4  1
 5  4  2
So I concatenate zeros on the top and left, and then it works.
julia> Y00 = vcat(zeros(Int, 5)', hcat(zeros(Int, 4), Y))
5×5 Matrix{Int64}:
 0  0   0   0   0
 0  1   5   7  10
 0  3  11  18  23
 0  7  18  29  35
 0  9  25  40  48
julia> diff(diff(Y00, dims=1), dims=2)
4×4 Matrix{Int64}:
 1  4  2  3
 2  4  5  2
 4  3  4  1
 2  5  4  2
But I think concatenating takes time and memory.
Is there any better idea to reproduce X from Y?
Context
I want to extend the above matrices X and Y to arrays of any dimension. For example, I want to reconstruct a three-dimensional array X from a given three-dimensional array
Y = cumsum( cumsum( cumsum(X, dims=1), dims=2), dims=3)

When both speed and succinctness are required, it's hard to beat powerful Julia packages like Tullio.jl. Here is a one-liner that's about 4X faster than the fastest solution by @DanGetz.
using Tullio
cumdiff(Y) = @tullio X[i,j] = Y[i,j] - Y[i,j-1] - Y[i-1,j] + Y[i-1,j-1]
Benchmarking with a 100-by-100 matrix gives:
X = rand(0:100, 100, 100)
Y = cumsum(cumsum(X, dims=1), dims=2)
@btime cumdiff($Y)
@btime decumsum3($Y)
  4.957 μs (17 allocations: 464 bytes)
  21.300 μs (2 allocations: 78.17 KiB)
Fix: The code above was using the predefined X instead of creating a new one. This is fixed below, and the speedup is more like 3.5X rather than 4X.
function cumdiff(Y)
    X = similar(Y)
    X[1] = Y[1]
    for i = 2:size(Y,1) X[i,1] = Y[i,1] - Y[i-1,1] end
    for j = 2:size(Y,2) X[1,j] = Y[1,j] - Y[1,j-1] end
    @tullio X[i,j] = Y[i,j] - Y[i,j-1] - Y[i-1,j] + Y[i-1,j-1]
end

@btime cumdiff($Y)
@btime decumsum3($Y)
  6.000 μs (4 allocations: 78.23 KiB)
  21.300 μs (2 allocations: 78.17 KiB)
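As a quick sanity check (not part of the original benchmarks), the fixed cumdiff should round-trip the small example from the question; Xq and Yq are assumed names so the 100-by-100 benchmark arrays above are not clobbered, and Tullio and the definition above are assumed to be in scope:
Xq = [1 4 2 3; 2 4 5 2; 4 3 4 1; 2 5 4 2]   # the 4×4 example from the question
Yq = cumsum(cumsum(Xq, dims=1), dims=2)
cumdiff(Yq) == Xq   # expected: true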

See EDIT section below.
Some options so far:
decumsum1(X) = begin
    Z = copy(X)
    Z[2:end,:] .-= Z[1:end-1,:]
    Z[:,2:end] .-= Z[:,1:end-1]
    return Z
end
decumsum2(X) = begin # This is the approach from the question
    r, c = size(X)
    Z = vcat(zeros(eltype(X), c+1)',
             hcat(zeros(eltype(X), r), X))
    return diff(diff(Z, dims=1), dims=2)
end
decumsum3(Y) = [Y[I] - (I[2]==1 ? 0 : Y[I[1],I[2]-1]) -
                       (I[1]==1 ? 0 : Y[I[1]-1,I[2]]) +
                       ((I[1]==1 || I[2]==1) ? 0 : Y[I[1]-1,I[2]-1])
                for I in CartesianIndices(Y)]
function decumsum5(Y)
    R = similar(Y)
    h, w = size(Y)
    R[1,1] = Y[1,1]
    @inbounds for i = 2:h R[i,1] = Y[i,1] - Y[i-1,1] end
    @inbounds for j = 2:w R[1,j] = Y[1,j] - Y[1,j-1] end
    @inbounds for i = 2:h, j = 2:w R[i,j] = Y[i,j] - Y[i-1,j] - Y[i,j-1] + Y[i-1,j-1] end
    return R
end
Giving the following benchmarks:
julia> using BenchmarkTools
julia> decumsum1(Y) == decumsum2(Y) == decumsum3(Y) == X
true
julia> @btime decumsum1($Y);
  352.571 ns (5 allocations: 832 bytes)
julia> @btime decumsum2($Y);
  475.438 ns (9 allocations: 1.14 KiB)
julia> @btime decumsum3($Y);
  96.875 ns (1 allocation: 192 bytes)
julia> @btime decumsum5($Y);
  60.805 ns (1 allocation: 192 bytes)
EDIT: Perhaps the prettiest solution is:
decumsum(Y; dims) = [Y[I] - (
        I[dims] == 1 ? 0 : Y[(ifelse(k == dims, I[k]-1, I[k])
                              for k in 1:ndims(Y))...]
    ) for I in CartesianIndices(Y)]
and with it, the cumsum can be walked back:
julia> decumsum(decumsum(Y, dims=1), dims=2)
4×4 Matrix{Int64}:
 1  4  2  3
 2  4  5  2
 4  3  4  1
 2  5  4  2
julia> decumsum(decumsum(Y, dims=1), dims=2) == X
true
julia> @btime decumsum(decumsum($Y, dims=1), dims=2);
  165.656 ns (2 allocations: 384 bytes)
with nice performance and also generalized to any Array dimension.
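For example, a quick check of the three-dimensional case from the question, where X3 and Y3 are assumed names for a random test array and its triple cumsum:
X3 = rand(0:9, 3, 4, 5)
Y3 = cumsum(cumsum(cumsum(X3, dims=1), dims=2), dims=3)
decumsum(decumsum(decumsum(Y3, dims=1), dims=2), dims=3) == X3   # expected: true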
Update: another version decumsum5 added. Still faster.

Related

Generate array of complex numbers with absolute value one in Julia?

In Julia, I would like to randomly generate an array of arbitrary size in which every element is a complex number with absolute value one. Is there a way to do this?
I've got four options so far:
f1(n) = exp.((2*im*π) .* rand(n))
f2(n) = map(x -> (z = x[1] + im*x[2]; z / abs(z)), eachcol(randn(2, n)))
f3(n) = [im*x[1] + x[2] for x in sincos.(2π*rand(n))]
f4(n) = cispi.(2 .* rand(n))
We have:
julia> using BenchmarkTools
julia> begin
           @btime f1(1_000);
           @btime f2(1_000);
           @btime f3(1_000);
           @btime f4(1_000);
       end;
  29.390 μs (2 allocations: 23.69 KiB)
  15.559 μs (2 allocations: 31.50 KiB)
  25.733 μs (4 allocations: 47.38 KiB)
  27.662 μs (2 allocations: 23.69 KiB)
Not a crucial difference.
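All four produce values on the unit circle; here is a quick hedged check, assuming the four definitions above are in scope:
for f in (f1, f2, f3, f4)
    @assert all(z -> abs(z) ≈ 1, f(10))   # every element has modulus 1 (up to rounding)
end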
One way is:
randcomplex() = (c = Complex(randn(2)...); c / abs(c))  # randn (not rand) so the phase covers the whole circle
randcomplex(numwanted) = [randcomplex() for _ in 1:numwanted]
or
randcomplex(dims...) = (a = zeros(ComplexF64, dims...); for i in eachindex(a) a[i] = randcomplex() end; a)
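A quick check that the results lie on the unit circle (assuming the definitions above are in scope):
zs = randcomplex(5)
all(z -> abs(z) ≈ 1, zs)   # expected: true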
If you are looking for something faster, here are two options. They return a perhaps slightly unfamiliar type, but it is equivalent to a regular Vector.
function f5(n)
    r = rand(2, n)
    for i in 1:n
        a = sqrt(r[1, i]^2 + r[2, i]^2)
        r[1, i] /= a
        r[2, i] /= a
    end
    return reinterpret(reshape, ComplexF64, r)
end

using LoopVectorization: @turbo

function f5t(n)
    r = rand(2, n)
    @turbo for i in 1:n
        a = sqrt(r[1, i]^2 + r[2, i]^2)
        r[1, i] /= a
        r[2, i] /= a
    end
    return reinterpret(reshape, ComplexF64, r)
end
julia> @btime f5(1000);
  4.186 μs (1 allocation: 15.75 KiB)
julia> @btime f5t(1000);
  2.900 μs (1 allocation: 15.75 KiB)
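If an ordinary Vector{ComplexF64} is needed downstream, the reinterpreted result can simply be collected (a minor note, not from the original benchmarks):
z = f5(1000)
zv = collect(z)              # materialize as an ordinary Vector{ComplexF64}
zv isa Vector{ComplexF64}    # true
all(x -> abs(x) ≈ 1, zv)     # all values lie on the unit circle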

Average over the columns of the matrix in Julia

I have a big matrix with float entries of the form
[ a b c d
e f g h
i j k l
m n o p ]
Some of the values are outliers, so I want to average each entry with the k most recent entries in its column (a running mean) while preserving the shape. In other words, for k = 3 I want something like this:
[ a              b              c              d
  (e + a)/2      (f + b)/2      (g + c)/2      (h + d)/2
  (e + a + i)/3  (f + b + j)/3  (g + c + k)/3  (h + d + l)/3
  (e + i + m)/3  (f + j + n)/3  (g + k + o)/3  (h + l + p)/3 ]
etc.
You can do this with RollingFunctions and mapslices:
julia> a = reshape(1:16, 4, 4)
4×4 reshape(::UnitRange{Int64}, 4, 4) with eltype Int64:
 1  5   9  13
 2  6  10  14
 3  7  11  15
 4  8  12  16
julia> using RollingFunctions
julia> mapslices(x -> runmean(x, 3), a, dims = 1)
4×4 Matrix{Float64}:
 1.0  5.0   9.0  13.0
 1.5  5.5   9.5  13.5
 2.0  6.0  10.0  14.0
 3.0  7.0  11.0  15.0
I didn't know about RollingFunctions, but a regular loop is about 3.5X faster. I'm not sure if it's some kind of type instability caused by mapslices.
using Statistics: mean

function runmean_loop(a, W)   # renamed to avoid clashing with RollingFunctions.runmean
    A = similar(a)
    for j in axes(A,2), i in axes(A,1)
        l = max(1, i - W + 1)
        A[i,j] = mean(a[k,j] for k = l:i)
    end
    A
end
Testing yields:
using RollingFunctions
@btime mapslices(x -> runmean(x, 3), A, dims = 1) setup=(A = rand(0.0:9, 1000, 1000))
@btime runmean_loop(A, 3) setup=(A = rand(0.0:9, 1000, 1000))
  15.326 ms (10498 allocations: 23.45 MiB)
  4.410 ms (2 allocations: 7.63 MiB)
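A hedged equality check, assuming runmean_loop above is in scope and that RollingFunctions tapers the initial window as in the output shown earlier:
A = rand(10, 4)
mapslices(x -> runmean(x, 3), A, dims = 1) ≈ runmean_loop(A, 3)   # expected: true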

Optimal way to generate and call many random numbers?

I have a general question. I've got a Julia programme that needs to use a random number each time it iterates through a for loop. I'm wondering whether there are any performance benefits to be gained if I make batches of random numbers before the loop, store them in an array, and call these pre-made random numbers instead of generating them on the fly. And if so, is there an optimum batch size?
As Peter O. commented, it depends. But let me give you an example where batching is desirable:
julia> using Random, BenchmarkTools
julia> function f1()
           x = Vector{Float64}(undef, 10^6)
           y = zeros(10^6)
           for i in 1:100
               rand!(x)
               y .+= x
           end
           return y
       end
f1 (generic function with 1 method)
julia> function f2()
           y = zeros(10^6)
           @inbounds for i in 1:100
               @simd for j in 1:10^6
                   y[j] += rand()
               end
           end
           return y
       end
f2 (generic function with 1 method)
julia> function f3()
           y = zeros(10^6)
           @inbounds for i in 1:100
               for j in 1:10^6
                   y[j] += rand()
               end
           end
           return y
       end
f3 (generic function with 1 method)
julia> function f4()
           x = Vector{Float64}(undef, 10^6)
           y = zeros(10^6)
           @inbounds for i in 1:100
               rand!(x)
               @simd for j in 1:10^6
                   y[j] += x[j]
               end
           end
           return y
       end
f4 (generic function with 1 method)
julia> function f5()
           x = Vector{Float64}(undef, 10^6)
           y = zeros(10^6)
           @inbounds for i in 1:100
               rand!(x)
               for j in 1:10^6
                   y[j] += x[j]
               end
           end
           return y
       end
f5 (generic function with 1 method)
julia> @btime f1();
  171.816 ms (4 allocations: 15.26 MiB)
julia> @btime f2();
  370.950 ms (2 allocations: 7.63 MiB)
julia> @btime f3();
  412.871 ms (2 allocations: 7.63 MiB)
julia> @btime f4();
  172.355 ms (4 allocations: 15.26 MiB)
julia> @btime f5();
  174.676 ms (4 allocations: 15.26 MiB)
As you can see, f1 (and the two loop variants f4 and f5) is much faster than the versions that do not use a cache of pre-generated random numbers (f2 and f3). I have shown variants with and without @simd for comparison.
EDIT
The comment by rafak is very good. Here are the benchmarks when an explicit RNG is passed to the functions. As you can see there is still a difference, but a much smaller one (most of the cost is generating the random numbers, not the addition).
julia> function g1(rnd)
           x = Vector{Float64}(undef, 10^6)
           y = zeros(10^6)
           for i in 1:100
               rand!(rnd, x)
               y .+= x
           end
           return y
       end
g1 (generic function with 1 method)
julia> function g2(rnd)
           y = zeros(10^6)
           @inbounds for i in 1:100
               @simd for j in 1:10^6
                   y[j] += rand(rnd)
               end
           end
           return y
       end
g2 (generic function with 1 method)
julia> function g3(rnd)
           y = zeros(10^6)
           @inbounds for i in 1:100
               for j in 1:10^6
                   y[j] += rand(rnd)
               end
           end
           return y
       end
g3 (generic function with 1 method)
julia> using Random
julia> rnd = MersenneTwister();
julia> @btime g1($rnd);
  168.874 ms (4 allocations: 15.26 MiB)
julia> @btime g2($rnd);
  193.398 ms (2 allocations: 7.63 MiB)
julia> @btime g3($rnd);
  192.320 ms (2 allocations: 7.63 MiB)
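Coming back to the original question, here is a minimal sketch of the batching pattern itself: pre-fill a buffer of random numbers and refill it in place whenever it runs out. The helper name batched_sum and the batch size of 10_000 are assumptions for illustration, not a recommendation.
using Random

function batched_sum(niter; batch = 10_000)
    buf = Vector{Float64}(undef, batch)   # pre-allocated batch of random numbers
    rand!(buf)
    s = 0.0
    idx = 0
    for _ in 1:niter
        idx += 1
        if idx > batch        # buffer exhausted: refill in place, no new allocation
            rand!(buf)
            idx = 1
        end
        s += buf[idx]
    end
    return s
end

batched_sum(10^6)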

How to write a parallel loop in julia?

I have the following Julia code and I would like to parallelize it.
using DistributedArrays
function f(x)
    return x^2;
end
y = DArray[]
@parallel for i in 1:100
    y[i] = f(i)
end
println(y)
The output is DistributedArrays.DArray[]. I would like to have the value of y as follows: y=[1,4,9,16,...,10000]
You can use n-dimensional distributed array comprehensions:
First you need to add some more processes, either local or remote:
julia> addprocs(CPU_CORES - 1);
Then you must use DistributedArrays at every one of the spawned processes:
julia> @everywhere using DistributedArrays
Finally you can use the @DArray macro, like this:
julia> x = @DArray [@show x^2 for x = 1:10];
From worker 2: x ^ 2 = 1
From worker 2: x ^ 2 = 4
From worker 4: x ^ 2 = 64
From worker 2: x ^ 2 = 9
From worker 4: x ^ 2 = 81
From worker 4: x ^ 2 = 100
From worker 3: x ^ 2 = 16
From worker 3: x ^ 2 = 25
From worker 3: x ^ 2 = 36
From worker 3: x ^ 2 = 49
You can see it does what you expect:
julia> x
10-element DistributedArrays.DArray{Int64,1,Array{Int64,1}}:
1
4
9
16
25
36
49
64
81
100
Remember it works with an arbitrary number of dimensions:
julia> y = @DArray [@show i + j for i = 1:3, j = 4:6];
From worker 4: i + j = 7
From worker 4: i + j = 8
From worker 4: i + j = 9
From worker 2: i + j = 5
From worker 2: i + j = 6
From worker 2: i + j = 7
From worker 3: i + j = 6
From worker 3: i + j = 7
From worker 3: i + j = 8
julia> y
3x3 DistributedArrays.DArray{Int64,2,Array{Int64,2}}:
 5  6  7
 6  7  8
 7  8  9
julia>
This is the most Julian way to do what you intended, IMHO.
We can look at macroexpand output in order to see what's going on:
Note: this output has been slightly edited for readability; T stands for:
DistributedArrays.Tuple{DistributedArrays.Vararg{DistributedArrays.UnitRange{DistributedArrays.Int}}}
julia> macroexpand(:(@DArray [i^2 for i = 1:10]))
:(
DistributedArrays.DArray(
(
#231#I::T -> begin
[i ^ 2 for i = (1:10)[#231#I[1]]]
end
),
DistributedArrays.tuple(DistributedArrays.length(1:10))
)
)
Which basically is the same as manually typing:
julia> n = 10; dims = (n,);
julia> DArray(x -> [i^2 for i = (1:n)[x[1]]], dims)
10-element DistributedArrays.DArray{Any,1,Array{Any,1}}:
1
4
9
16
25
36
49
64
81
100
julia>
Hi Kira,
I am new to Julia, but I faced the same problem. Try this approach and see if it fits your needs.
function f(x)
    return x^2;
end
y = @parallel (vcat) for i = 1:100
    f(i);
end;
println(y)
Regards, RN
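Note that on Julia 1.x @parallel no longer exists; it was replaced by @distributed in the Distributed standard library. A minimal sketch of the same reducer pattern, assuming four local worker processes:
using Distributed
addprocs(4)                      # assumption: four local workers

@everywhere f(x) = x^2           # make f available on every worker

# @distributed with a reducer replaces the old @parallel vcat pattern
y = @distributed (vcat) for i in 1:100
    f(i)
end

println(y)   # [1, 4, 9, ..., 10000]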

Matrix manipulation in Octave

I want to map an m×1 matrix X into an m×p matrix Y where each row of the new matrix is as follows:
Y = [X X.^2 X.^3 ... X.^p]
I tried to use the following code:
Y = zeros(m, p);
for i = 1:m
  Y(i,:) = X(i);
  for c = 2:p
    Y(i,:) = [Y(i,:) X(i).^p];
  end
end
What you want to do is called broadcasting. If you are using Octave 3.8 or later, the following will work fine:
octave> X = (1:5)'
X =
   1
   2
   3
   4
   5
octave> P = (1:5)
P =
   1   2   3   4   5
octave> X .^ P
ans =
      1      1      1      1      1
      2      4      8     16     32
      3      9     27     81    243
      4     16     64    256   1024
      5     25    125    625   3125
The important thing to note is how X and P are a column and row vector respectively. See the octave manual on the topic.
For older versions of Octave (without automatic broadcasting), the same can be accomplished with bsxfun (@power, X, P).
