Sorting array of struct in Julia

Suppose I have the following in Julia:
mutable struct emptys
    begin_time::Dict{Float64,Float64}
    finish_time::Dict{Float64,Float64}
    Revenue::Float64
end

population = [emptys(Dict(), Dict(), -Inf) for i in 1:n_pop]  # n_pop is a large positive integer
for ind in 1:n_pop
    r = rand()
    population[ind].Revenue = r                       # Revenue is a scalar field, so assign rather than append!
    population[ind].begin_time[r] = cld(r^2, rand())  # insert into the Dict fields by key
    population[ind].finish_time[r] = r^3 / rand()
end
Now I want to sort this population based on the Revenue value. Is there a way to achieve this in Julia? If I were to do it in Python, it would be something like this:
sorted(population, key = lambda x: x.Revenue) # The population in Python can be prepared using https://pypi.org/project/ypstruct/ library.
Please help.

There is a whole range of sorting functions in Julia. The key functions are sort (corresponding to Python's sorted) and sort! (corresponding to Python's list.sort).
And as in Python, they have a couple of keyword arguments, one of which is by, corresponding to key.
Hence the translation of
sorted(population, key = lambda x: x.Revenue)
would be
getrevenue(e::emptys) = e.Revenue
sort(population, by=getrevenue)
Or e -> e.Revenue, but having a getter function is good style anyway.
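Putting it together, a minimal runnable sketch (with made-up Revenue values, since the Dict fields don't matter for sorting):
population = [emptys(Dict(), Dict(), rand()) for _ in 1:5]
sorted_pop = sort(population, by=getrevenue)  # ascending, like Python's sorted
sort!(population, by=getrevenue, rev=true)    # in place and descending, like list.sort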

Related

Julia type instability: Array of LinearInterpolations

I am trying to improve the performance of my code by removing any sources of type instability.
For example, I have several instances of Array{Any} declarations, which I know generally destroy performance. Here is a minimal example (greatly simplified compared to my code) of a 2D Array of LinearInterpolation objects, i.e.:
using Interpolations

n, m = 5, 5
abstract_arr = Array{Any}(undef, n+1, m+1)
arr_x = LinRange(1, 10, 100)
for l in 1:n
    for alpha in 1:m
        abstract_arr[l, alpha] = LinearInterpolation(arr_x, alpha .* arr_x.^n)
    end
end
so that typeof(abstract_arr) gives Array{Any,2}.
How can I initialize abstract_arr to avoid using Array{Any} here?
And how can I do this in general for Arrays whose entries are structures like Dicts, where the Dicts are dictionaries of 2-tuples of Float64?
If you make a comprehension, the type will be figured out for you:
arr = [LinearInterpolation(arr_x, alpha .* arr_x.^n) for l in 1:n, alpha in 1:m]
isconcretetype(eltype(arr)) # true
When it can predict the type & length, it will make the right array the first time. When it cannot, it will widen or extend it as necessary. So probably some of these will be Vector{Int}, and some Vector{Union{Nothing, Int}}:
[rand()>0.8 ? nothing : 0 for i in 1:3]
[rand()>0.8 ? nothing : 0 for i in 1:3]
[rand()>0.8 ? nothing : 0 for i in 1:10]
The main trick is that you just need to know the type of the object that is returned by LinearInterpolation, and then you can specify that instead of Any when constructing the array. To determine that, let's look at the typeof of one of these objects:
julia> typeof(LinearInterpolation(arr_x,arr_x.^2))
Interpolations.Extrapolation{Float64, 1, ScaledInterpolation{Float64, 1, Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, BSpline{Linear{Throw{OnGrid}}}, Tuple{Base.OneTo{Int64}}}, BSpline{Linear{Throw{OnGrid}}}, Tuple{LinRange{Float64}}}, BSpline{Linear{Throw{OnGrid}}}, Throw{Nothing}}
This gives a fairly complicated type, but we don't necessarily need to use the whole thing (though in some cases it might be more efficient to). So for instance, we can say
using Interpolations

n, m = 5, 5
abstract_arr = Array{Interpolations.Extrapolation}(undef, n+1, m+1)
arr_x = LinRange(1, 10, 100)
for l in 1:n
    for alpha in 1:m
        abstract_arr[l, alpha] = LinearInterpolation(arr_x, alpha .* arr_x.^n)
    end
end
which gives us a result of type
julia> typeof(abstract_arr)
Matrix{Interpolations.Extrapolation} (alias for Array{Interpolations.Extrapolation, 2})
Since the return type of this LinearInterpolation does not seem to be of known size, and
julia> isbitstype(typeof(LinearInterpolation(arr_x,arr_x.^2)))
false
each assignment to this array will still trigger allocations, and consequently there actually may not be much or any performance gain from the added type stability when it comes to filling the array. Nonetheless, there may still be performance gains down the line when it comes to using values stored in this array (depending on what is subsequently done with them).
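If you do want a fully concrete element type, one option (a sketch reusing the names above) is to build a sample object, capture its full type with typeof, and allocate with that:
itp = LinearInterpolation(arr_x, arr_x.^2)           # sample object
concrete_arr = Matrix{typeof(itp)}(undef, n+1, m+1)  # fully concrete eltype
isconcretetype(eltype(concrete_arr))                 # true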

How do I tell Julia in which order to iterate in an Array?

This answer led me to another question.
When defining a new structure like this one:
struct ReversedRowMajor{T,A} <: AbstractMatrix{T}
    data::A
end
ReversedRowMajor(data::AbstractMatrix{T}) where {T} = ReversedRowMajor{T, typeof(data)}(data)

Base.size(R::ReversedRowMajor) = reverse(size(R.data))
Base.getindex(R::ReversedRowMajor, i::Int, j::Int) = R.data[end-j+1, end-i+1]
If R is a ReversedRowMajor array, then when accessing R[:,:] Julia will iterate through the CartesianIndices in the order that should be fastest for the array, i.e. memory order (see the array performance tips). But in this case that order is not the expected one, since we are permuting the indices: (i,j) → (end-j+1, end-i+1).
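For example, a small sketch using the definitions above:
A = [1 2; 3 4]
R = ReversedRowMajor(A)
R[1, 1] == A[end, end]  # true: the indices are permuted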
So the question is: given an array, is there a way to tell Julia which axis is the fastest one?
(see also Multidimensional algorithms and iteration)

Bloom filters and their multiple hash functions

I'm implementing a simple Bloom filter as an exercise.
Bloom filters require multiple hash functions, which for practical purposes I don't have.
Assuming I want to have 3 hash functions, isn't it enough to take the murmur3 hash of the object I'm checking membership for, add +1, +2, +3 to it (for the 3 different hashes), and hash each of those again?
As the murmur3 function has a very good avalanche effect (really spreads out results) wouldn't this for all purposes be reasonable?
Pseudo-code:
function generateHashes(obj) {
    long hash = murmur3_hash(obj);
    long hash1 = murmur3_hash(hash + 1);
    long hash2 = murmur3_hash(hash + 2);
    long hash3 = murmur3_hash(hash + 3);
    return (hash1, hash2, hash3);
}
If not, what would be a simple, useful approach to this? I'd like a solution that would allow me to easily scale to more hash functions if need be.
AFAIK, the usual approach is to not actually use multiple hash functions. Rather, hash once and split the resulting hash into 2, 3, or how many parts you want for your Bloom filter. So for example create a hash of 128 bits and split it into 2 hashes 64 bit each.
https://github.com/Claudenw/BloomFilter/wiki/Bloom-Filters----An-overview
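A Julia sketch of the split idea (rand(UInt128) stands in here for a real 128-bit digest, purely for illustration):
h = rand(UInt128)                  # placeholder for a 128-bit hash of the object
h1 = UInt64(h & typemax(UInt64))   # low 64 bits
h2 = UInt64(h >> 64)               # high 64 bits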
The hash functions of a Bloom filter should be independent and sufficiently random. MurmurHash is great for this purpose. So your approach is correct, and you can generate as many new hashes this way as you need. For educational purposes it is fine.
But in the real world, running a hash function multiple times is slow, so the usual approach is to create ad-hoc hashes without actually recomputing the hash.
To correct @memo: this is done not by splitting the hash into multiple parts, as the width of the hash should remain constant (and you can't split a 64-bit hash into more than 64 parts ;) ). The approach is to take two independent hashes and combine them.
function generateHashes(obj) {
    // initialization phase
    long h1 = murmur3_hash(obj);
    long h2 = murmur3_hash(h1);
    int k = 3; // number of desired hash functions
    long hash[k];
    // generation phase
    for (int i = 0; i < k; i++) {
        hash[i] = h1 + (i * h2);
    }
    return hash;
}
As you can see, creating each new hash this way is a simple multiply-add operation.
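A Julia sketch of the same double-hashing scheme (often attributed to Kirsch and Mitzenmacher; Base.hash stands in for murmur3, and the mod maps each value onto a filter of size m — both are assumptions for illustration):
function generate_indices(obj, k::Integer, m::Integer)
    h1 = hash(obj)       # first base hash
    h2 = hash(obj, h1)   # second, independently seeded hash
    return [1 + mod(h1 + i * h2, m) for i in 0:k-1]  # k one-based bit positions
end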
It would not be a good approach. Let me try and explain. A Bloom filter allows you to test whether an element most likely belongs to a set, or whether it definitely does not. In other words, false positives may occur, but false negatives won't.
Reference: https://sc5.io/posts/what-are-bloom-filters-and-why-are-they-useful/
Let us consider an example:
You have an input string 'foo' and you pass it to the multiple hash functions. The murmur3 hash gives the output K, and the subsequent hashes on this hash value give x, y and z.
Now assume you have another string 'bar' and, as it happens, its murmur3 hash is also K. What about the remaining hash values? They will also be x, y and z, because in your proposed approach the subsequent hash functions depend not on the input, but on the output of the first hash function:
long hash1 = murmur3_hash(hash+1);
long hash2 = murmur3_hash(hash+2);
long hash3 = murmur3_hash(hash+3);
As explained in the link, the purpose is to perform a probabilistic search in a set. If we search for 'foo' or for 'bar', we would say that it is 'likely' that both of them are present, so the percentage of false positives will increase.
In other words, this Bloom filter will behave like a simple hash function. The 'Bloom' aspect of it will not come into play, because only the first hash function determines the outcome of the search.
Hope I was able to explain sufficiently. Let me know in the comments if you have any follow-up queries. I'd be happy to assist.

Cannot convert object of type 'list' to a numeric value

I am making a Pyomo model where I want to use random numbers for my two-dimensional parameters. I wrote a small Python script for random numbers that produces exactly what I wanted for my two-dimensional parameter, but I am getting TypeError: Cannot convert object of type 'list' (value=[[....]]) to a numeric value in my objective function. Below are my objective function and the random-number script.
model.obj = Objective(expr=sum(model.C[v,l] * model.T[v,l] for v in model.V for l in model.L) +
                           sum(model.OC[o,l] * model.D[o,l] for o in model.O for l in model.L),
                      sense=minimize)

import random

C = [[] for i in range(7)]
for i in range(7):
    for j in range(5):
        C[i] += [random.randint(100, 500)]

model.C = Param(model.V, model.L, initialize=C)
Please let me know if someone can help fix this.
You should initialize your parameter using a function instead of a nested list; Pyomo will call it once per (i, j) index pair:
def init_c(m, i, j):
    return random.randint(100, 500)

model.C = Param(model.V, model.L, initialize=init_c)

Shared Array with more complicated element type

A simplified example:
nsize = 100
vsize = 10000
varray = [rand(vsize) for i in 1:nsize] #say, I have a set of vectors.
for k in 1:nsize
varray[k] = rand(vsize, vsize) * varray[k]
end
Obviously, the above for loop can be parallelized.
According to the Parallel Map and Loops section of the Julia manual, I need to use a SharedArray. However, a SharedArray cannot have Array{Float64,1} as its element type.
julia> a = SharedArray(Array{Float64,1}, nsize)
ERROR: ArgumentError: type of SharedArray elements must be bits types, got Array{Float64,1}
in __SharedArray#138__ at sharedarray.jl:45
in SharedArray at sharedarray.jl:116
How can I solve this problem?
Currently, you can't, because a SharedArray requires a contiguous block of memory, which means that its elements must be "bits types," and this isn't true for Array. (Array is implemented in C and has some header information, which makes it not densely packable.)
However, if all of your "element" arrays have the same size, and you don't absolutely require the ability to modify single elements of the "inner" arrays, you could try using StaticArrays as elements. (Thanks to @Wouter in the comments below for pointing out that this needed updating.)
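A minimal sketch of that idea in current Julia syntax (practical only when the fixed inner length is small, since SVector lengths are compile-time constants; the sizes here are invented):
using SharedArrays, StaticArrays
sa = SharedArray{SVector{8,Float64}}(100)  # SVector is isbits, so this is allowed
sa[1] = SVector{8}(rand(8))                # replace whole elements rather than mutating their entries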
