How to check for an undef value in a Matrix (in Julia) and assign a new value?

I want to create a matrix A of undefined values and have the following code that works just fine.
A = Matrix{Tuple{Float64, Array{Int64, 1}}}(undef, 100, 100)
Later, I want to check if a particular cell is undefined and if so, assign a value after computing it. I tried isdefined(A, i, j) but that gave an error for too many arguments. How can I check for #undef and assign only if it is undefined?
The documentation on isdefined provides a method only for one-dimensional arrays; how do I achieve the same for a matrix?

Use isassigned:
julia> A[2,3]=(3.0, [])
(3.0, Any[])
julia> isassigned(A,2,3)
true
julia> isassigned(A,3,3)
false
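The question also asks how to assign a value only when the cell is still undefined. Combining isassigned with that assignment, a small helper can compute a value only the first time a cell is requested. This is a sketch: the name get_or_compute! and the placeholder computation are hypothetical, and it relies on the isassigned(A, i, j) form shown above.

```julia
# Hypothetical helper: fill A[i, j] with compute(i, j) only if the cell is still #undef
function get_or_compute!(compute, A::AbstractMatrix, i::Integer, j::Integer)
    if !isassigned(A, i, j)
        A[i, j] = compute(i, j)
    end
    return A[i, j]
end

A = Matrix{Tuple{Float64, Vector{Int64}}}(undef, 3, 3)
ncalls = Ref(0)   # counts how often the computation actually runs

# First request: the cell is #undef, so the do-block runs and its result is stored
v1 = get_or_compute!(A, 2, 3) do i, j
    ncalls[] += 1
    (Float64(i + j), [i, j])
end

# Second request: the cell is assigned now, so the do-block is skipped
v2 = get_or_compute!(A, 2, 3) do i, j
    ncalls[] += 1
    (0.0, Int64[])
end
```

The do-block syntax passes the anonymous function as the first argument, which is why compute comes first in the signature.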

You can use the isassigned function (which is mentioned in the help string of isdefined, btw). Like isdefined it appears to only accept linear indices, but you can get those from LinearIndices.
julia> A = Matrix{Tuple{Float64, Array{Int64, 1}}}(undef, 100, 100);
julia> A[5, 4] = (2.1, [5])
(2.1, [5])
julia> isassigned(A, LinearIndices(A)[1, 1])
false
julia> isassigned(A, LinearIndices(A)[5, 4])
true
Edit: As demonstrated in the answer from Przemyslaw Szufel, you don't need linear indices. This seems to be undocumented, though, up to and including v1.5.1.
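To fill every cell that is still #undef in one pass, iterating with eachindex should work: for a plain Matrix it yields linear indices, which isassigned accepts in every version discussed here. The placeholder value below is hypothetical; substitute your own computation.

```julia
A = Matrix{Tuple{Float64, Vector{Int64}}}(undef, 2, 2)
A[1, 2] = (1.0, [1])                 # one cell is already assigned

for k in eachindex(A)                # linear indices 1:length(A) for a Matrix
    if !isassigned(A, k)
        A[k] = (0.0, Int64[])        # hypothetical placeholder value
    end
end
```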

Related

Replace values with for loop

Suppose I have the following function:
function y1(x)
y = x^(2) - 4
return y
end
Now, I want to evaluate all the values from this sequence: collect(range(-10,10, 1000))
I tried this
y_1 = zeros(1000);
for x in collect(range(-10, 10, 1000))
y_1 = y1.(x)
end
Note that I use the broadcast operator to apply the function y1 to every value the iterator takes. But if I don't use it, I get the same result.
But as a result, I only get 96.0.
How can I refill the y_1 vector with the for loop, so I get the evaluated values?
The evaluated vector should be of size 1000
Thanks in advance!
Edit:
I found a way to get to my desired result without the for loop:
y_1 = y1.(collect(range(-10, 10, 1000)))
But I still want to know how can I do it in a loop.
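Your loop rebinds y_1 on every iteration: y_1 = y1.(x) replaces the preallocated vector with the result for the single value x, so after the loop y_1 holds only the last result, y1(10.0) = 96.0. (Because x is a scalar there, y1.(x) and y1(x) give the same thing, which is why the broadcast made no difference.) To fill the vector, assign into y_1[i] instead, as in the last variant below.
The broadcast operator already applies the function over the entire collection by itself, without a loop, i.e. y1.(arr) will:
call y1 on each of the elements of the array arr,
collect the results of all those calls, and
allocate memory to store those results as an array.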
So the following are all equivalent in terms of functionality:
julia> arr = range(-4, 5, length = 10) #define a simple range
-4.0:1.0:5.0
julia> y1.(arr)
10-element Vector{Float64}:
12.0
5.0
0.0
-3.0
-4.0
-3.0
0.0
5.0
12.0
21.0
julia> [y1(x) for x in arr]
10-element Vector{Float64}:
(same values as above)
julia> map(y1, arr)
10-element Vector{Float64}:
(same values as above)
julia> y_1 = zeros(10);
julia> for (i, x) in pairs(arr)
y_1[i] = y1(x)
end
julia> y_1
10-element Vector{Float64}:
(same values as above)
In practice, there may be other considerations, including performance, that decide between these and other choices.
As an aside, note that very often you don't want to collect a range in Julia, i.e. don't think of collect as somehow equivalent to c() in R. For many operations, ranges can be used directly, including for iteration in for loops. collect should only be necessary in the rare cases where an actual Vector is needed, e.g. when a value in the middle of the array must be changed. As a general rule, use ranges as they are until you get an error that requires you to change that.
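Applied to the original 1000-point problem, the loop can iterate over the range directly, without collect, and write into a preallocated vector by index. The in-place broadcast y_2 .= y1.(xs) at the end is a further option (not from the answer above) that also fills an existing vector instead of rebinding the name.

```julia
y1(x) = x^2 - 4

xs = range(-10, 10, length = 1000)   # lazy range; no collect needed
y_1 = zeros(length(xs))              # preallocated output

for (i, x) in enumerate(xs)
    y_1[i] = y1(x)                   # write into slot i instead of rebinding y_1
end

y_2 = similar(y_1)
y_2 .= y1.(xs)                       # fused in-place broadcast into the existing y_2
```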

Manipulating several variables within a for loop in Julia

I'm new to Julia. I want to write code which, for each of several vectors, outputs a new vector, the name of which depends on the name of the input vector.
For example, the following code works
a = ones(10)
b = ones(10)
for var in [a, b]
global log_var = log.(var)
end
except I want the resulting vectors to be named log_a and log_b (rather than have the loop overwrite log_var). I had thought this would be simple, but having read a few tutorials about locals in Julia, I'm still lost! Is there a simple way to go about this?
In case this question is unclear, I'll describe how I would do this in Stata, with which I'm more familiar:
clear
set obs 10
gen a = 1
gen b = 1
foreach var in a b {
gen log_`var' = log(`var')
}
Thank you!
If you are looking for something similar to what you do in Stata, you can use DataFrames.jl:
julia> using DataFrames
julia> df = DataFrame(a=ones(10), b=ones(10))
julia> for col in ["a", "b"]
df[:, "log_"*col] = log.(df[:, col])
end
julia> df
You probably really don't want to do that. But if you had to, you could do it pretty easily with metaprogramming. In this case, for example:
macro logify(variable)
quote
# esc both names so they refer to the caller's variables, not the macro's module
$(esc(Symbol("log_$variable"))) = log.($(esc(variable)))
end
end
then
julia> b = rand(5)
5-element Vector{Float64}:
0.29129581739244315
0.21098023915449915
0.8736387630142392
0.34378216482772417
0.621583372934101
julia> @logify b;
julia> log_b
5-element Vector{Float64}:
-1.2334159735391819
-1.555990803188027
-0.13508830339365252
-1.0677470639708686
-0.4754852291054692
In general, any time you need to depend on the name of a variable rather than its contents, you're going to need metaprogramming.
However, to emphasize, again, this feels like a bad idea.
Rather than defining new top-level variables, you might consider instead using some sort of data structure like a Dict or a NamedTuple or a DataFrame, or even just a multidimensional Array. For example, with NamedTuples:
julia> data = (a = rand(5), b = rand(5));
julia> typeof(data)
NamedTuple{(:a, :b), Tuple{Vector{Float64}, Vector{Float64}}}
julia> data.a
5-element Vector{Float64}:
0.7146929585896256
0.5248314042991269
0.040560190890127856
0.9714549101298824
0.9477790450084252
julia> data.b
5-element Vector{Float64}:
0.6856764745285641
0.3066093923258396
0.5655243277481422
0.13478854894985115
0.8495720250298817
julia> logdata = NamedTuple{keys(data)}(log.(data[x]) for x in keys(data));
julia> logdata.a
5-element Vector{Float64}:
-0.335902257064951
-0.6446782026336225
-3.204968213346185
-0.02896042387181646
-0.05363387877891503
julia> logdata.b
5-element Vector{Float64}:
-0.3773493739743169
-1.182180679204628
-0.5700019644606769
-2.0040480325554944
-0.1630225562612911
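Two shorter ways to build the log-transformed collection, sketched in the same spirit of keeping the vectors in a data structure: map over a NamedTuple preserves its field names, and a Dict can carry derived names such as log_a and log_b (the ones the question asked for) as Symbol keys.

```julia
data = (a = ones(3), b = [2.0, 4.0])

# map applies the function to each field and keeps the names :a and :b
logdata = map(v -> log.(v), data)

# Dict keyed by derived names such as :log_a and :log_b
logdict = Dict(Symbol("log_", k) => log.(v) for (k, v) in pairs(data))
```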
Not really recommended for such usage, but a quick-and-dirty variant is:
for var in [:a, :b]
@eval global $(Symbol("log_", var)) = log.($var)
end

Is there a unified syntax for element-wise in-place operations on scalars and arrays in Julia?

Consider the following accumulator type, which works like an array in that you can push things to it, but only tracks its mean:
mutable struct Accumulator{T}
data::T
count::Int64
end
function Base.push!(acc::Accumulator, term)
acc.data += term # <-- in-place addition
acc.count += 1
acc
end
mean(acc::Accumulator) = acc.data ./ acc.count
I want this to work for T being a scalar or an array type. However, it turns out that for T being an array type, the addition in push! creates a temporary. This is because in Julia, x += a is equivalent to x = x + a, and I suspect Julia cannot guarantee that acc.data and term do not alias.
A simple fix is to replace += with element-wise addition, .+=. However, this will then break scalar types, which do not allow this. So the only way I came up with to fix this problem is to add a specialization of the following form:
function Base.push!(acc::Accumulator, term::AbstractArray)
acc.data .+= term # <-- element-wise addition
acc.count += 1
acc
end
This is, however, somewhat ugly and also brittle... does anyone know a better way of doing this, preferably in a generic fashion and without creating the temporary?
Oddly enough, Numbers are iterable in Julia, but that doesn't seem to help us here, because there is no setindex! method for Numbers.
Here are two different approaches. The first uses iterator traits and the second just patches up the method signatures a bit to address corner cases.
Iterator traits
We can use the IteratorSize trait to distinguish between scalars and arrays. For a scalar type such as Int64, Base.IteratorSize returns Base.HasShape{0}(); for an array type, it returns Base.HasShape{N}(), where N is the number of dimensions of the array. The code below calls it on the types T and S.
mutable struct Accumulator{T}
data::T
count::Int64
end
function Base.push!(acc::Accumulator{T}, term::S) where {T, S}
_push_acc!(Base.IteratorSize(T), Base.IteratorSize(S), acc, term)
end
function _push_acc!(::Base.HasShape{0}, ::Base.HasShape{0}, acc::Accumulator, term)
acc.data += term
acc.count += 1
acc
end
function _push_acc!(::Base.HasShape{N}, ::Base.HasShape{N}, acc::Accumulator, term) where {N}
acc.data .+= term
acc.count += 1
acc
end
function _push_acc!(::Base.HasShape{M}, ::Base.HasShape{N}, ::Accumulator, ::Any) where {M, N}
throw(ArgumentError("Accumulator and term have inconsistent shapes"))
end
In action at the REPL:
julia> a = Accumulator(1, 0)
Accumulator{Int64}(1, 0)
julia> b = Accumulator([1, 2], 0)
Accumulator{Array{Int64,1}}([1, 2], 0)
julia> push!(a, 42)
Accumulator{Int64}(43, 1)
julia> push!(b, [3, 4])
Accumulator{Array{Int64,1}}([4, 6], 1)
julia> push!(a, [5, 6])
ERROR: ArgumentError: Accumulator and term have inconsistent shapes
Stacktrace:
[1] _push_acc!(::Base.HasShape{0}, ::Base.HasShape{1}, ::Accumulator{Int64}, ::Array{Int64,1}) at ...
[2] push!(::Accumulator{Int64}, ::Array{Int64,1}) at ...
[3] top-level scope at REPL[6]:1
julia> push!(b, 10)
ERROR: ArgumentError: Accumulator and term have inconsistent shapes
Stacktrace:
[1] _push_acc!(::Base.HasShape{1}, ::Base.HasShape{0}, ::Accumulator{Array{Int64,1}}, ::Int64) at ...
[2] push!(::Accumulator{Array{Int64,1}}, ::Int64) at ...
[3] top-level scope at REPL[7]:1
Patching the method signatures
Instead of using iterator traits, we could just make a couple of small tweaks to your push! method signatures to prevent pushing an array onto a scalar.
mutable struct Accumulator{T}
data::T
count::Int64
end
function Base.push!(acc::Accumulator, term)
acc.data += term
acc.count += 1
acc
end
function Base.push!(acc::Accumulator{T}, term::AbstractArray) where {T <: AbstractArray}
acc.data .+= term
acc.count += 1
acc
end
function Base.push!(::Accumulator, ::AbstractArray)
throw(ArgumentError("Can't push an array onto a scalar"))
end
Now we get a sensible error message if we try to push an array onto a scalar:
julia> a = Accumulator(42, 0)
Accumulator{Int64}(42, 0)
julia> push!(a, [1, 2])
ERROR: ArgumentError: Can't push an array onto a scalar
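A further variant, not taken from either approach above and offered only as a sketch, moves the dispatch from push! onto the stored data. A hypothetical helper _accumulate! mutates an array in place and returns it, or returns a new value for an immutable scalar, and push! always rebinds acc.data to that result. For arrays the rebinding is to the same object, so no temporary array is created.

```julia
mutable struct Accumulator{T}
    data::T
    count::Int64
end

# Hypothetical helper: arrays are updated in place, scalars produce a new value
_accumulate!(x::AbstractArray, y) = (x .+= y; x)
_accumulate!(x::Number, y::Number) = x + y

function Base.push!(acc::Accumulator, term)
    acc.data = _accumulate!(acc.data, term)  # for arrays this reassigns the same object
    acc.count += 1
    acc
end

mean(acc::Accumulator) = acc.data ./ acc.count
```

Unlike the two approaches above, pushing a scalar onto an array accumulator here broadcasts the scalar over the array, while pushing an array onto a scalar raises a MethodError rather than a custom ArgumentError.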

Faster image resizing

I have a stack of images (a 3D array) and I want to increase their resolution (upsampling). The following code snippet runs a little slower than I would like.
Is there any way to speed it up (without using multiprocessing)?
using BenchmarkTools
using Interpolations
function doInterpol(arr::Array{Int, 2}, h, w)
A = interpolate(arr, BSpline(Linear()))
return A[1:2/(h-1)/2:2, 1:2/(w-1)/2:2]
end
function applyResize!(arr3D_hd::Array, arr3D_ld::Array, t::Int, h::Int, w::Int)
for i = 1:1:t
@inbounds arr3D_hd[i, :, :] = doInterpol(arr3D_ld[i, :, :], h, w)
end
end
t, h, w = 502, 65, 47
h_target, w_target = 518, 412
arr3D_ld = reshape(collect(1:t*h*w), (t, h, w))
arr3D_hd = Array{Float32}(undef, t, h_target, w_target)
applyResize!(arr3D_hd, arr3D_ld, t, h_target, w_target)
When I benchmark it with
@btime applyResize!(arr3D_hd, arr3D_ld, t, h_target, w_target)
I get:
2.334 s (68774 allocations: 858.01 MiB)
I ran it multiple times and the results fall between 1.8 s and 2.8 s.
Julia stores arrays in column-major order. This means that slices like arr[i, :, :] perform much worse than arr[:, :, i] (which is contiguous in memory). Therefore, one way to gain some speed is to order your array dimensions as (h, w, t) rather than (t, h, w).
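The column-major layout can be checked directly with strides, which gives the memory distance between neighbouring elements along each dimension. In this small sketch (array sizes chosen arbitrarily), a slice over the last index is one contiguous block, while a slice over the first index skips through memory.

```julia
arr = reshape(collect(1:24), (2, 3, 4))   # dimensions (h, w, t) = (2, 3, 4)

strides(arr)          # (1, 2, 6): the first index moves fastest in memory

frame = arr[:, :, 2]  # one "image": elements 7 to 12, stored next to each other
row   = arr[1, :, :]  # every second element: 1, 3, 5, ..., 23
```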
A second issue is that taking slices like arr[i, :, :] copies data. It seems to have a negligible impact here, but it is a good habit to use array views instead of slices when you can. A view is a small wrapper object that behaves like a slice of a larger array but does not hold a copy of the data: it accesses the data of the parent array directly (see the example below for an illustration of what a view is).
Note that both these issues are mentioned in the Julia performance tips; it might be useful to read the remaining pieces of advice in this page.
Putting this together, your example can be rewritten like:
function applyResize2!(arr3D_hd::Array, arr3D_ld::Array, h::Int, w::Int, t)
@inbounds for i = 1:1:t
A = interpolate(@view(arr3D_ld[:, :, i]), BSpline(Linear()))
arr3D_hd[:, :, i] .= A(1:2/(h-1)/2:2, 1:2/(w-1)/2:2)
end
end
which is used with arrays stored a bit differently from your case:
# Note the order of indices
julia> arr3D_ld = reshape(collect(1:t*h*w), (h, w, t));
julia> arr3D_hd = Array{Float32}(undef, h_target, w_target, t);
# Don't forget to interpolate global variables with $ when using @btime
# (not really an issue here, but it could have been one)
julia> @btime applyResize2!($arr3D_hd, $arr3D_ld, h_target, w_target, t)
506.449 ms (6024 allocations: 840.11 MiB)
This is a speed-up of roughly a factor of 3.4 with respect to your original code, which benchmarks like this on my machine:
julia> arr3D_ld = reshape(collect(1:t*h*w), (t, h, w));
julia> arr3D_hd = Array{Float32}(undef, t, h_target, w_target);
julia> @btime applyResize!($arr3D_hd, $arr3D_ld, t, h_target, w_target)
1.733 s (50200 allocations: 857.30 MiB)
NB: Your original code uses a syntax like A[x, y] to get interpolated values. This seems to be deprecated in favor of A(x, y). I might not have the same version of Interpolations as you, though...
Example illustrating the behavior of views
julia> a = rand(3,3)
3×3 Array{Float64,2}:
0.042097 0.767261 0.0433798
0.791878 0.764044 0.605218
0.332268 0.197196 0.722173
julia> v = @view(a[:,2]) # creates a view instead of a slice
3-element view(::Array{Float64,2}, :, 2) with eltype Float64:
0.7672610491393876
0.7640443797187411
0.19719581867637093
julia> v[3] = 42 # equivalent to a[3,2] = 42
42
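To see the difference from an ordinary slice, the sketch below writes through a view and through a copied slice and then inspects the parent array: only the write through the view reaches a.

```julia
a = zeros(3, 3)
v = @view a[:, 2]   # view: shares memory with a
s = a[:, 2]         # slice: an independent copy

v[3] = 42.0         # changes a[3, 2]
s[1] = -1.0         # leaves a untouched
```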
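Use
itp = interpolate(arr3D_ld, (NoInterp(), BSpline(Linear()), BSpline(Linear())));
A = itp(1:size(itp,1), 1:1/517:2, 1:1/411:2);
Here NoInterp() leaves the first (time) dimension uninterpolated, so all frames are handled in a single call, and the steps 1/517 and 1/411 reproduce the 518 by 412 output grid of your original code (2/(h-1)/2 with h = 518 and w = 412).
It should give a ~7x performance improvement compared to your version.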
As François Févotte noted, it's also important to pay attention to deprecation warnings, as they slow down execution.

Parallel programming in Julia

I have been following the docs for parallel programming in Julia, and to my mind, which thinks in terms of OpenMP or MPI, the design choice seems quite strange.
I have an application where I want data to be distributed among processes, and then I want to tell each process to apply some operation to whatever data it is assigned, yet I do not see a way of doing this in Julia. Here is an example
julia> r = remotecall(2, rand, 2)
RemoteRef{Channel{Any}}(2,1,30)
julia> fetch(r)
2-element Array{Float64,1}:
0.733308
0.45227
So on process 2 lives a random array with 2 elements. I can apply some function to this array via
julia> remotecall_fetch(2, getindex, r, 1)
0.7333080770447185
But why does it not work if I apply a function that should modify the vector, like:
julia> remotecall_fetch(2, setindex!, r, 1,1)
ERROR: On worker 2:
MethodError: `setindex!` has no method matching setindex!(::RemoteRef{Channel{Any}}, ::Int64, ::Int64)
in anonymous at multi.jl:892
in run_work_thunk at multi.jl:645
[inlined code] from multi.jl:892
in anonymous at task.jl:63
in remotecall_fetch at multi.jl:731
in remotecall_fetch at multi.jl:734
I don't quite know how to describe it, but it seems like the workers can only return "new" things. I don't see how I can send some variables and a function to a worker and have the function modify the variables in place. In the above example, I'd like the array to live on a single process and ideally I'd be able to tell that process to perform some operations on that array. After all the operations are finished I could then fetch results etc.
I think you can achieve this with the macro @spawnat:
julia> addprocs(2)
2-element Array{Int64,1}:
2
3
julia> r = remotecall(2, rand, 2)
RemoteRef{Channel{Any}}(2,1,3)
julia> fetch(r)
2-element Array{Float64,1}:
0.149753
0.687653
julia> remotecall_fetch(2, getindex, r, 1)
0.14975250913699378
julia> @spawnat 2 setindex!(fetch(r), 320.0, 1)
RemoteRef{Channel{Any}}(2,1,6)
julia> fetch(r)
2-element Array{Float64,1}:
320.0
0.687653
julia> @spawnat 2 setindex!(fetch(r), 950.0, 2)
RemoteRef{Channel{Any}}(2,1,8)
julia> fetch(r)
2-element Array{Float64,1}:
320.0
950.0
But with remotecall_fetch, it looks like the returned array is really a copy:
julia> remotecall_fetch(2, setindex!, fetch(r), 878.99, 1)
2-element Array{Float64,1}:
878.99
950.0
julia> remotecall_fetch(2, setindex!, fetch(r), 232.99, 2)
2-element Array{Float64,1}:
320.0
232.99
julia> fetch(r)
2-element Array{Float64,1}:
320.0
950.0
Tested with Julia version 0.4.3.
Based on the description of your needs, you may also find Distributed Arrays (the DistributedArrays.jl package) useful.
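For readers on Julia 1.x, where the API has changed since 0.4 (remotecall now takes the function first and returns a Future instead of a RemoteRef), the same idea can be sketched as follows. The assumption it relies on is that a closure run on the worker that owns the Future gets the worker's own array from fetch, not a copy, so mutations stick. A Future also caches its value on the process that fetches it, so the master should read the data after the updates (or through remotecall_fetch) rather than rely on an earlier fetch.

```julia
using Distributed
addprocs(1)
w = workers()[1]                  # id of the new worker

# Create the array on the worker; the master only holds a reference (a Future)
r = remotecall(rand, w, 2)

# Run code on the owning worker: there fetch(ref) is the stored array itself
remotecall_wait(w, r) do ref
    fetch(ref)[1] = 320.0
end

# Read the value back on the worker, without caching an older copy on the master
first_elem = remotecall_fetch(ref -> fetch(ref)[1], w, r)
```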
