Replace values with for loop

Suppose I have the following function:
function y1(x)
    y = x^2 - 4
    return y
end
Now, I want to evaluate all the values from this sequence: collect(range(-10,10, 1000))
I tried this
y_1 = zeros(1000);
for x in collect(range(-10, 10, 1000))
    y_1 = y1.(x)
end
Note that I use the broadcast operator to apply the function y1 to every value the iterator takes, but I get the same result even without it.
But as an answer, I just get 96.0.
How can I fill the y_1 vector inside the for loop so that I get the evaluated values? The resulting vector should be of size 1000.
Thanks in advance!
Edit:
I found a way to get to my desired result without the for loop:
y_1 = y1.(collect(range(-10, 10, 1000)))
But I still want to know how to do it in a loop.

The broadcast operator broadcasts the function over the entire iterator by itself, i.e. y1.(arr) will:
1. call y1 on each of the elements of the array arr,
2. collect the results of all those calls, and
3. allocate memory to store those results as an array too.
So the following are all equivalent in terms of functionality:
julia> arr = range(-4, 5, length = 10) # define a simple range
-4.0:1.0:5.0
julia> y1.(arr)
10-element Vector{Float64}:
12.0
5.0
0.0
-3.0
-4.0
-3.0
0.0
5.0
12.0
21.0
julia> [y1(x) for x in arr]
10-element Vector{Float64}:
(same values as above)
julia> map(y1, arr)
10-element Vector{Float64}:
(same values as above)
julia> y_1 = zeros(10);
julia> for (i, x) in pairs(arr)
           y_1[i] = y1(x)
       end
julia> y_1
10-element Vector{Float64}:
(same values as above)
In practice, there may be other considerations, including performance, that decide between these and other choices.
As an aside, note that very often you don't want to collect a range in Julia, i.e. don't think of collect as somehow equivalent to c() in R. For many operations, ranges can be used directly, including for iteration in for loops. collect should only be necessary in the rare cases where an actual Vector is required, e.g. when a value in the middle of the array needs to be changed for some reason. As a general rule, use the range results as they are, until and unless you get an error that requires you to change it.
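To illustrate that aside, here is a minimal sketch showing that a range can be iterated and indexed directly, with no collect (and no 1000-element allocation) needed:

```julia
r = range(-10, 10, length = 1000)   # lazy range, not a materialized vector

total = 0.0
for x in r                          # ranges iterate just like vectors
    total += x
end

# indexing also works directly on the range
@assert length(r) == 1000
@assert r[1] == -10.0 && r[end] == 10.0
```

Only if you later needed to mutate individual elements would `collect(r)` become necessary.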

Related

Julia: random multinomial numbers allocation

I have to generate a vector of random number drawn from a Multinomial distribution. Since I need to do it inside a loop I would like to reduce the number of allocations.
This is my code:
pop = zeros(100)
p = rand(100)
p /= sum(p)
#imagine the next line inside a for loop
@time pop = rand(Multinomial(100, p)) #test1
@time pop .= rand(Multinomial(100, p)) #test2
Why does test1 make 2 allocations while test2 makes 4? Is there a way of doing it with 0 allocations?
Thanks in advance.
Each sample of a multinomial distribution Multinomial(n, p) is an n-dimensional integer vector that sums to n. So this vector is allocated inside of rand.
I believe you want rand!, which works in place:
julia> m = Multinomial(100, p);
julia> @time rand!(m, pop);
0.000010 seconds
Note that I create the m object first, since constructing it allocates, so it should be done only once.
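Putting the pieces together, an allocation-avoiding loop might look like the following sketch (assuming the Distributions and Random packages are available; the loop count of 1000 is arbitrary):

```julia
using Distributions, Random

p = rand(100)
p ./= sum(p)                 # normalize the probabilities in place
pop = zeros(Int, 100)        # multinomial samples are integer counts
m = Multinomial(100, p)      # construct the distribution once, outside the loop

for _ in 1:1000
    rand!(m, pop)            # refills pop in place on every iteration
end

@assert sum(pop) == 100      # the counts of a Multinomial(n, p) sample sum to n
```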

Faster image resizing

I have a stack of images (3D array) and I want to improve their resolution (upsampling). I run the following code snippet that I find a little slow ...
Is there any way to improve the speed of this piece of code? (without using multiprocessing)
using BenchmarkTools
using Interpolations
function doInterpol(arr::Array{Int, 2}, h, w)
    A = interpolate(arr, BSpline(Linear()))
    return A[1:2/(h-1)/2:2, 1:2/(w-1)/2:2]
end

function applyResize!(arr3D_hd::Array, arr3D_ld::Array, t::Int, h::Int, w::Int)
    for i = 1:1:t
        @inbounds arr3D_hd[i, :, :] = doInterpol(arr3D_ld[i, :, :], h, w)
    end
end
t, h, w = 502, 65, 47
h_target, w_target = 518, 412
arr3D_ld = reshape(collect(1:t*h*w), (t, h, w))
arr3D_hd = Array{Float32}(undef, t, h_target, w_target)
applyResize!(arr3D_hd, arr3D_ld, t, h_target, w_target)
When I benchmark the following:
@btime applyResize!(arr3D_hd, arr3D_ld, t, h_target, w_target)
I got :
2.334 s (68774 allocations: 858.01 MiB)
I ran it multiple time and results are in [1.8s - 2.8s] interval.
Julia stores arrays in column-major order. This means that slices like arr[i, :, :] perform much worse than arr[:, :, i] (which is contiguous in memory). Therefore, a way to gain some speed is to index your arrays using (h, w, t) rather than (t, h, w).
A second issue is that taking slices like arr[i,:,:] copies data. It seems to have negligible impact here, but it might be good to get into the habit of using array views instead of slices when you can. A view is a small wrapper object that behaves in the same way as a slice of a larger array, but does not hold a copy of the data: it directly accesses the data of the parent array (see the example below to maybe better understand what a view is).
Note that both these issues are mentioned in the Julia performance tips; it might be useful to read the remaining pieces of advice on that page.
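Both points can be illustrated in a small self-contained sketch (the array size here is arbitrary):

```julia
A = rand(64, 64, 64)

# Contiguous slices: A[:, :, i] walks memory in storage order
sum_contig(A)  = sum(sum(@view A[:, :, i]) for i in axes(A, 3))

# Strided slices: A[i, :, :] jumps through memory
sum_strided(A) = sum(sum(@view A[i, :, :]) for i in axes(A, 1))

# Same result either way; the contiguous version is typically faster,
# and @view avoids copying each slice into a fresh array
@assert sum_contig(A) ≈ sum_strided(A)
```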
Putting this together, your example can be rewritten like:
function applyResize2!(arr3D_hd::Array, arr3D_ld::Array, h::Int, w::Int, t)
    @inbounds for i = 1:1:t
        A = interpolate(@view(arr3D_ld[:, :, i]), BSpline(Linear()))
        arr3D_hd[:, :, i] .= A(1:2/(h-1)/2:2, 1:2/(w-1)/2:2)
    end
end
which is used with arrays stored a bit differently from your case:
# Note the order of indices
julia> arr3D_ld = reshape(collect(1:t*h*w), (h, w, t));
julia> arr3D_hd = Array{Float32}(undef, h_target, w_target, t);
# Don't forget to escape arguments with a $ when using @btime
# (not really an issue here, but could have been one)
julia> @btime applyResize2!($arr3D_hd, $arr3D_ld, h_target, w_target, t)
506.449 ms (6024 allocations: 840.11 MiB)
This is roughly a speed-up by a factor of 3.4 w.r.t. your original code, which benchmarks like this on my machine:
julia> arr3D_ld = reshape(collect(1:t*h*w), (t, h, w));
julia> arr3D_hd = Array{Float32}(undef, t, h_target, w_target);
julia> @btime applyResize!($arr3D_hd, $arr3D_ld, t, h_target, w_target)
1.733 s (50200 allocations: 857.30 MiB)
NB: Your original code uses a syntax like A[x, y] to get interpolated values. This seems to be deprecated in favor of A(x, y). I might not have the same version of Interpolations as you, though...
Example illustrating the behavior of views
julia> a = rand(3,3)
3×3 Array{Float64,2}:
0.042097 0.767261 0.0433798
0.791878 0.764044 0.605218
0.332268 0.197196 0.722173
julia> v = @view(a[:,2]) # creates a view instead of a slice
3-element view(::Array{Float64,2}, :, 2) with eltype Float64:
0.7672610491393876
0.7640443797187411
0.19719581867637093
julia> v[3] = 42 # equivalent to a[3,2] = 42
42
Use
itp = interpolate(arr3D_ld, (NoInterp(), BSpline(Linear()), BSpline(Linear())));
A = itp(1:size(itp,1), 1:2/517:2, 1:2/411:2);
It should give a ~7x performance improvement compared to your version.
As François Févotte noted, it's also important to pay attention to deprecation warnings, as they slow down execution.

reduction parallel loop in julia

We can use
c = @parallel (vcat) for i=1:10
    (i,i+1)
end
But when I try to use push!() instead of vcat(), I get an error. How can I use push!() in this parallel loop?
c = @parallel (push!) for i=1:10
    (c, (i,i+1))
end
@parallel is somewhat similar to foldl(op, itr) in that it uses the first value of itr as the initial first parameter for op. push! lacks the required symmetry between the operands. Perhaps what you are looking for is:
julia> c = @parallel (append!) for i=1:10
           [(i,i+1)]
       end
Elaborating a bit on Dan's point: to see how the @parallel macro works, compare the following two invocations:
julia> @parallel print for i in 1:10
           (i,i+1)
       end
(1, 2)(2, 3)nothing(3, 4)nothing(4, 5)nothing(5, 6)nothing(6, 7)nothing(7, 8)nothing(8, 9)nothing(9, 10)nothing(10, 11)
julia> @parallel string for i in 1:10
           (i,i+1)
       end
"(1, 2)(2, 3)(3, 4)(4, 5)(5, 6)(6, 7)(7, 8)(8, 9)(9, 10)(10, 11)"
From the top one it should be clear what's going on. Each iteration produces an output. When the specified function is applied to those outputs, it is applied in pairs. The first pair of outputs is fed to print, and the result of that print call then becomes the first item in the next pair to be processed. Since print returns nothing, the next call prints nothing followed by (3,4). That call returns nothing as well, so the next pair to be printed is nothing and (4,5), and so on until all elements are consumed. In terms of pseudocode, this is what's happening:
Step 1: state = print((1,2), (2,3)); # state becomes nothing
Step 2: state = print(state, (3,4)); # state becomes nothing again
Step 3: state = print(state, (4,5)); # and so forth
The reason string works as expected is because what's happening is the following steps:
Step 1: state = string((1,2),(2,3));
Step 2: state = string(state, (3,4));
Step 3: state = string(state, (4,5);
etc
In general, the function you pass to the @parallel macro should be something that takes two inputs of the same type and outputs an object of that same type.
Therefore you cannot use push!, because it always takes two inputs of different types (one array and one plain element) and outputs an array. You need to use append! instead, which fits the specification.
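The fold-like pairing described above can be sketched without the macro, which also shows why append! fits where push! does not (a minimal sketch using Base's foldl):

```julia
itr = [(i, i + 1) for i in 1:10]

# append!-style reduction: wrap each element in a one-element vector,
# so the operator maps (Vector, Vector) -> Vector, as required
c = foldl(append!, ([x] for x in itr))
@assert c == itr

# A push!-style operator would need (Vector, Tuple) -> Vector: the two
# operand types differ, so it cannot serve as the reduction operator.
```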
Also note that the order of outputs is not guaranteed (here it happens to be in order because I only used 1 worker). If you want something where the order of operations matters, then you shouldn't use this construct. In something like addition it obviously doesn't matter, because addition is fully associative and commutative; but with string, if outputs are processed in a different order, you could end up with a different string than you'd expect.
EDIT - addressing benchmark between vcat / append! / indexed assignment
I think the most efficient way to do this is in fact via normal indexing onto a preallocated array. But between append! and vcat, append will most certainly be faster as vcat always makes a copy (as I understand it).
Benchmarks:
function parallelWithVcat!( A::Array{Tuple{Int64, Int64}, 1} )
    A = @parallel vcat for i = 1:10000
        (i, i+1)
    end
end;

function parallelWithFunction!( A::Array{Tuple{Int64, Int64}, 1} )
    A = @parallel append! for i in 1:10000
        [(i, i+1)];
    end
end;

function parallelWithPreallocation!( A::Array{Tuple{Int64, Int64}, 1} )
    @parallel for i in 1:10000
        A[i] = (i, i+1);
    end
end;
A = Array{Tuple{Int64, Int64}, 1}(10000);
### first runs omitted, all benchmarks here are from 2nd runs ###
# first on a single worker:
@time for n in 1:100; parallelWithVcat!(A); end
#> 8.050429 seconds (24.65 M allocations: 75.341 GiB, 15.42% gc time)
@time for n in 1:100; parallelWithFunction!(A); end
#> 0.072325 seconds (1.01 M allocations: 141.846 MiB, 52.69% gc time)
@time for n in 1:100; parallelWithPreallocation!(A); end
#> 0.000387 seconds (4.21 k allocations: 234.750 KiB)
# now with true parallelism:
addprocs(10);
@time for n in 1:100; parallelWithVcat!(A); end
#> 1.177645 seconds (160.02 k allocations: 109.618 MiB, 0.75% gc time)
@time for n in 1:100; parallelWithFunction!(A); end
#> 0.060813 seconds (111.87 k allocations: 70.585 MiB, 3.91% gc time)
@time for n in 1:100; parallelWithPreallocation!(A); end
#> 0.058134 seconds (116.16 k allocations: 4.174 MiB)
If someone can suggest an even more efficient way, please do so!
Note in particular that the indexed assignment is much faster than the rest; for this example at least, most of the time in the parallel case seems to be spent on the parallelisation overhead itself.
Disclaimer: I make no claim that the above are correct summonings of the @parallel spell. I have not delved into the inner workings of the macro in detail to be able to claim otherwise. In particular, I am not aware which parts the macro causes to be processed remotely vs local (e.g. the assignment part). Caution is advised, ymmv, etc.

Efficient partial permutation sort in Julia

I am dealing with a problem that requires a partial permutation sort by magnitude in Julia. If x is a vector of dimension p, then what I need are the first k indices corresponding to the k components of x that would appear first in a partial sort by absolute value of x.
Refer to Julia's documentation on sorting functions. Basically, I want a cross between sortperm and select!. When Julia 0.4 is released, I will be able to obtain the same answer by applying sortperm! to the vector of indices and choosing the first k of them. However, using sortperm! is not ideal here because it will sort the remaining p-k indices of x, which I do not need.
What would be the most memory-efficient way to do the partial permutation sort? I hacked a solution by looking at the sortperm source code. However, since I am not versed in the ordering modules that Julia uses there, I am not sure if my approach is intelligent.
One important detail: I can ignore repeats or ambiguities here. In other words, I do not care about the ordering by abs() of indices for two components 2 and -2. My actual code uses floating point values, so exact equality never occurs for practical purposes.
# initialize a vector for testing
x = [-3,-2,4,1,0,-1]
x2 = copy(x)
k = 3 # num components desired in partial sort
p = 6 # num components in x, x2
# what are the indices that sort x by magnitude?
indices = sortperm(x, by = abs, rev = true)
# now perform partial sort on x2
select!(x2, k, by = abs, rev = true)
# check if first k components are sorted here
# should evaluate to "true"
isequal(x2[1:k], x[indices[1:k]])
# now try my partial permutation sort
# I only need indices2[1:k] at end of day!
indices2 = [1:p]
select!(indices2, 1:k, 1, p, Base.Perm(Base.ord(isless, abs, true, Base.Forward), x))
# same result? should evaluate to "true"
isequal(indices2[1:k], indices[1:k])
EDIT: With the suggested code, we can briefly compare performance on much larger vectors:
p = 10000; k = 100; # asking for largest 1% of components
x = randn(p); x2 = copy(x);
# run following code twice for proper timing results
@time {indices = sortperm(x, by = abs, rev = true); indices[1:k]};
@time {indices2 = [1:p]; select!(indices2, 1:k, 1, p, Base.Perm(Base.ord(isless, abs, true, Base.Forward), x))};
@time selectperm(x,k);
My output:
elapsed time: 0.048876901 seconds (19792096 bytes allocated)
elapsed time: 0.007016534 seconds (2203688 bytes allocated)
elapsed time: 0.004471847 seconds (1657808 bytes allocated)
The following version appears to be relatively space-efficient because it uses only an integer array of the same length as the input array:
function selectperm(x, k)
    # use a scalar index when k == 1 to avoid a BoundsError in select!
    kk = k > 1 ? (1:k) : 1
    z = collect(1:length(x))
    return select!(z, kk, by = i -> abs(x[i]), rev = true)
end
x = [-3,-2,4,1,0,-1]
k = 3 # num components desired in partial sort
print(selectperm(x, k))
The output is:
[3,1,2]
... as expected.
I'm not sure if it uses less memory than the originally-proposed solution (though I suspect the memory usage is similar) but the code may be clearer and it does produce only the first k indices whereas the original solution produced all p indices.
(Edit)
selectperm() has been edited to handle the BoundsError that occurs when k=1 in the call to select!().
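For readers on modern Julia (0.7+), note that select! was renamed to partialsort! and Base now provides partialsortperm, which does exactly this partial permutation sort. A quick sketch on the same test vector:

```julia
x = [-3, -2, 4, 1, 0, -1]
k = 3

# indices of the k largest-magnitude entries, in decreasing magnitude order;
# the remaining p-k indices are left unsorted internally
idx = partialsortperm(x, 1:k, by = abs, rev = true)

@assert idx == [3, 1, 2]
@assert x[idx] == [4, -3, -2]
```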

Tuple unpacking: dummy variable vs index

What is the usual/clearest way to write this in Python?
value, _ = func_returning_a_tuple()
or:
value = func_returning_a_tuple()[0]
value = func_returning_a_tuple()[0] seems clearer and also can be generalized.
What if the function was returning a tuple with more than 2 values?
What if the program logic is interested in the 4th element of an umpteen tuple?
What if the size of the returned tuple varies?
None of these questions affects the subscript-based idiom, but they do affect the multi-assignment idiom.
If you'd appreciate a handy way to do this in python3.x, check out the python enhancement proposal (PEP) 3132 on this page of What's New in Python:
Extended Iterable Unpacking. You can now write things like a, b, *rest = some_sequence. And even *rest, a = stuff. The rest object is always a (possibly empty) list; the right-hand side may be any iterable. Example:
(a, *rest, b) = range(5)
This sets a to 0, b to 4, and rest to [1, 2, 3].
For extracting a single item, indexing is a bit more idiomatic. When you're extracting two or more items, unpacking becomes more idiomatic. That's just empirical observation on my part; I don't know of any style guides recommending or mandating either choice!-)
For list/generator comprehensions with key/value pairs I think the usage of the dummy variable can be quite neat, especially where the unpacked value needs to be used more than once (avoiding repeated indexing), e.g.:
l = [('a', 1.54), ('b', 4.34), ('c', 3.22), ('d', 6.43)]
s = [x * (1.0 - x) * (2.0 - x) for _, x in l]
versus:
s = [x[0] * (1.0 - x[0]) * (2.0 - x[0]) for x in l]
Another thing to note is that while unpacking and indexing are roughly as expensive as one another, extended unpacking seems to be an order of magnitude slower.
With Python 3.2 using %timeit in IPython:
Regular unpacking:
>>> x = (1, 2)
>>> %timeit y, _ = x
10000000 loops, best of 3: 50 ns per loop
>>> %timeit y, _ = x
10000000 loops, best of 3: 50.4 ns per loop
Extended unpacking:
>>> x = (1, 2, 3)
>>> %timeit y, *_ = x
1000000 loops, best of 3: 1.02 us per loop
>>> %timeit y = x[0]
10000000 loops, best of 3: 68.9 ns per loop
