I'm trying to define a struct NodeLocator
@with_kw mutable struct NodeLocator{T<:Int64}
    length::T = 0
    value::Vector{T} = [0]
    maxtreelength::T = 0
end
And there are no problems with it.
However, I thought that my definition would be cleaner if I instead did
@with_kw mutable struct NodeLocator{T<:Union{Int64, Vector{Int64}}} # changed definition of T
    length::T = 0
    value::T = [0] # changed type declaration here
    maxtreelength::T = 0
end
since
julia> Vector{Int64} <: Union{Int64,Vector{Int64}}
true
But in this case I get an error when constructing objects of this type:
ERROR: MethodError: no method matching NodeLocator(::Int64, ::Vector{Int64}, ::Int64)
Closest candidates are:
  NodeLocator(::T, !Matched::T, ::T) where T<:Union{Int64, Vector{Int64}} at C:\Users\ivica\.julia\packages\Parameters\MK0O4\src\Parameters.jl:526
For context, constructing these objects involves elementwise addition of Int64s into NodeLocator.value, and I think what's happening is that after the first step the type of that field gets locked at Int64 and cannot be changed to Vector{Int64}.
How can I parameterise the type declaration of NodeLocator.value without running into this error? - and does it make sense to do so? My main issue is computation time so performance is quite important for me.
It appears that you are using the @with_kw macro from the Parameters package, so let's start by loading that package:
julia> using Parameters
Now, copying your definition
julia> @with_kw mutable struct NodeLocator{T<:Union{Int64, Vector{Int64}}}
           length::T = 0
           value::T = [0] # changed type declaration here
           maxtreelength::T = 0
       end
NodeLocator
we can see that this works fine if we construct a struct with either all ints or all vectors
julia> NodeLocator(0,0,0)
NodeLocator{Int64}
length: Int64 0
value: Int64 0
maxtreelength: Int64 0
julia> NodeLocator([0],[0],[0])
NodeLocator{Vector{Int64}}
length: Array{Int64}((1,)) [0]
value: Array{Int64}((1,)) [0]
maxtreelength: Array{Int64}((1,)) [0]
but fails for mixtures, including the default you have specified (effectively, (0, [0], 0))
julia> NodeLocator([0],[0],0)
ERROR: MethodError: no method matching NodeLocator(::Vector{Int64}, ::Vector{Int64}, ::Int64)
Closest candidates are:
NodeLocator(::T, ::T, ::T) where T<:Union{Int64, Vector{Int64}} at ~/.julia/packages/Parameters/MK0O4/src/Parameters.jl:526
Stacktrace:
 [1] top-level scope
   @ REPL[38]:1
julia> NodeLocator() # use defaults
ERROR: MethodError: no method matching NodeLocator(::Int64, ::Vector{Int64}, ::Int64)
Closest candidates are:
NodeLocator(::T, ::T, ::T) where T<:Union{Int64, Vector{Int64}} at ~/.julia/packages/Parameters/MK0O4/src/Parameters.jl:526
Stacktrace:
 [1] NodeLocator(; length::Int64, value::Vector{Int64}, maxtreelength::Int64)
   @ Main ~/.julia/packages/Parameters/MK0O4/src/Parameters.jl:545
 [2] NodeLocator()
   @ Main ~/.julia/packages/Parameters/MK0O4/src/Parameters.jl:545
 [3] top-level scope
   @ REPL[39]:1
This is because, as written, you have forced every field of the struct to be the same type T, which is in turn a subtype of the union.
If you really want each type to be allowed to be a Union, you could instead write something along the lines of:
julia> @with_kw mutable struct NodeLocator{T<:Int64}
           length::Union{T,Vector{T}} = 0
           value::Union{T,Vector{T}} = [0] # changed type declaration here
           maxtreelength::Union{T,Vector{T}} = 0
       end
julia> NodeLocator() # use defaults
NodeLocator{Int64}
length: Int64 0
value: Array{Int64}((1,)) [0]
maxtreelength: Int64 0
However, unless you really need the flexibility for all three fields to be unions, your original definition is actually much preferable. Adding Unions where you don't need them can only slow you down.
One thing that would make the code cleaner and faster would be if you could use a non-mutable struct. This is often possible even when you might not realize it at first, since an Array within a non-mutable struct is still itself mutable, so you can swap out elements within that array without needing the whole struct to be mutable. But there are certainly cases where you do indeed need the mutability -- and if you do need that, go for it.
Finally, I should note that there is not necessarily much point in constraining the numeric type to be T<:Int64 unless that is really the only numeric type you'll ever need. You could easily make your code more general by instead writing something like T<:Integer or T<:Number, which should not generally have any adverse performance costs.
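Putting those suggestions together, a sketch of what such a more general definition could look like (still assuming the Parameters package is installed; the defaults are illustrative):

```julia
using Parameters

# A sketch: keep value as a Vector while generalizing the element type.
@with_kw mutable struct NodeLocator{T<:Integer}
    length::T = 0
    value::Vector{T} = [0]
    maxtreelength::T = 0
end

# NodeLocator() gives a NodeLocator{Int64} with the defaults, and the
# same definition also accepts, e.g., all-Int32 fields:
# NodeLocator(length = Int32(0), value = Int32[0], maxtreelength = Int32(0))
```

This keeps every field concretely typed for a given T, so there is no Union to slow down field access.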
While researching interface values in Go, I found a great (maybe outdated) article by Russ Cox.
According to it:
The itable begins with some metadata about the types involved and then becomes a list of function pointers.
The implementation for this itable should be the one from src/runtime/runtime2.go:
type itab struct {
    inter *interfacetype
    _type *_type
    hash  uint32 // copy of _type.hash. Used for type switches.
    _     [4]byte
    fun   [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter.
}
The first confusing thing is: how is an array variable sized?
Second, assuming that we have a function pointer at index 0 for a method that satisfies the interface, where could we store a second/third/... function pointer?
The compiled code and runtime access fun as if the field were declared fun [n]uintptr, where n is the number of methods in the interface. The second method is stored at fun[1], the third at fun[2], and so on. The Go language does not have a variable-size array feature like this, but unsafe shenanigans can be used to simulate it.
Here's how itab is allocated:
m = (*itab)(persistentalloc(unsafe.Sizeof(itab{})+uintptr(len(inter.mhdr)-1)*goarch.PtrSize, 0, &memstats.other_sys))
The function persistentalloc allocates memory. The first argument to the function is the size to allocate. The expression len(inter.mhdr) is the number of methods in the interface.
Here's code that creates a slice on the variable size array:
methods := (*[1 << 16]unsafe.Pointer)(unsafe.Pointer(&m.fun[0]))[:ni:ni]
The expression methods[i] refers to the same element as m.fun[i] in a hypothetical world where m.fun is a variable size array with length > i. Later code uses normal slice syntax with methods to access the variable size array m.fun.
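The same trick can be reproduced in ordinary Go. This is a minimal sketch with hypothetical names (header, newHeader, funSlice are not the runtime's API): a struct ending in a nominally one-element array is over-allocated and then accessed through a slice, just as the runtime does with itab.fun.

```go
package main

import (
	"fmt"
	"unsafe"
)

// header mimics itab's layout: fixed metadata followed by a
// nominally one-element array that is really variable sized.
type header struct {
	count int
	fun   [1]uintptr // "variable sized", like itab.fun
}

// newHeader over-allocates: room for the header plus n-1 extra
// uintptr slots, the same arithmetic persistentalloc is given.
func newHeader(n int) *header {
	size := unsafe.Sizeof(header{}) + uintptr(n-1)*unsafe.Sizeof(uintptr(0))
	buf := make([]byte, size)
	h := (*header)(unsafe.Pointer(&buf[0]))
	h.count = n
	return h
}

// funSlice builds a slice over the variable size array, in the same
// style as the runtime's (*[1 << 16]unsafe.Pointer)(...)[:ni:ni].
func funSlice(h *header) []uintptr {
	return (*[1 << 16]uintptr)(unsafe.Pointer(&h.fun[0]))[:h.count:h.count]
}

func main() {
	h := newHeader(3)
	fs := funSlice(h)
	for i := range fs {
		fs[i] = uintptr(100 + i) // stand-ins for method pointers
	}
	fmt.Println(fs[0], fs[1], fs[2]) // 100 101 102
}
```

Note that slots fun[1] and fun[2] live past the declared end of the array; only the deliberate over-allocation makes this safe, which is why it needs unsafe.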
I am trying to improve the performance of my code by removing any sources of type instability.
For example, I have several instances of Array{Any} declarations, which I know generally destroy performance. Here is a minimal example (greatly simplified compared to my code) of a 2D Array of LinearInterpolation objects, i.e.
n, m = 5, 5
abstract_arr = Array{Any}(undef, n+1, m+1)
arr_x = LinRange(1, 10, 100)
for l in 1:n
    for alpha in 1:m
        abstract_arr[l, alpha] = LinearInterpolation(arr_x, alpha .* arr_x .^ n)
    end
end
so that typeof(abstract_arr) gives Array{Any,2}.
How can I initialize abstract_arr to avoid using Array{Any} here?
And how can I do this in general for Arrays whose entries are structures like Dicts, where the Dicts are dictionaries of 2-tuples of Float64?
If you make a comprehension, the type will be figured out for you:
arr = [LinearInterpolation(arr_x, alpha .* arr_x .^ n) for l in 1:n, alpha in 1:m]
isconcretetype(eltype(arr)) # true
When it can predict the type & length, it will make the right array the first time. When it cannot, it will widen or extend it as necessary. So probably some of these will be Vector{Int}, and some Vector{Union{Nothing, Int}}:
[rand()>0.8 ? nothing : 0 for i in 1:3]
[rand()>0.8 ? nothing : 0 for i in 1:3]
[rand()>0.8 ? nothing : 0 for i in 1:10]
The main trick is that you just need to know the type of the object that is returned by LinearInterpolation, and then you can specify that instead of Any when constructing the array. To determine that, let's look at the typeof one of these objects
julia> typeof(LinearInterpolation(arr_x,arr_x.^2))
Interpolations.Extrapolation{Float64, 1, ScaledInterpolation{Float64, 1, Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, BSpline{Linear{Throw{OnGrid}}}, Tuple{Base.OneTo{Int64}}}, BSpline{Linear{Throw{OnGrid}}}, Tuple{LinRange{Float64}}}, BSpline{Linear{Throw{OnGrid}}}, Throw{Nothing}}
This gives a fairly complicated type, but we don't necessarily need to use the whole thing (though in some cases it might be more efficient to). So for instance, we can say
using Interpolations
n, m = 5, 5
abstract_arr = Array{Interpolations.Extrapolation}(undef, n+1, m+1)
arr_x = LinRange(1, 10, 100)
for l in 1:n
    for alpha in 1:m
        abstract_arr[l, alpha] = LinearInterpolation(arr_x, alpha .* arr_x .^ n)
    end
end
which gives us a result of type
julia> typeof(abstract_arr)
Matrix{Interpolations.Extrapolation} (alias for Array{Interpolations.Extrapolation, 2})
Since the return type of this LinearInterpolation does not seem to be of known size, and
julia> isbitstype(typeof(LinearInterpolation(arr_x,arr_x.^2)))
false
each assignment to this array will still trigger allocations, and consequently there actually may not be much or any performance gain from the added type stability when it comes to filling the array. Nonetheless, there may still be performance gains down the line when it comes to using values stored in this array (depending on what is subsequently done with them).
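If you do want the fully concrete element type without writing out that long signature by hand, one hedged middle ground (assuming Interpolations.jl, and that every element is built the same way so all elements share one type) is to construct one element first and let typeof pick the type:

```julia
using Interpolations

# Sketch: let the first interpolant determine the concrete eltype.
n, m = 5, 5
arr_x = LinRange(1, 10, 100)
first_itp = LinearInterpolation(arr_x, 1 .* arr_x .^ n)

typed_arr = Array{typeof(first_itp)}(undef, n, m)
for l in 1:n, alpha in 1:m
    typed_arr[l, alpha] = LinearInterpolation(arr_x, alpha .* arr_x .^ n)
end

isconcretetype(eltype(typed_arr))  # should be true
```

This gives the same concretely typed result as the comprehension, but also works when you need to fill the array incrementally rather than in one expression.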
I have read almost every definition of immutable/mutable variables on the internet but as a beginner I just do not grasp it fully so I was wondering if someone could really explain it in layman terms.
Immutable variables (or objects) in any programming language is what I understand when you cannot change the value of that variable after it has been assigned a value. For example, I am using the Haskell programming language, and I write:
let x = 5
Since Haskell has immutable variables, x can never have any other value than 5. So if I after that line of code write:
x = 2
I have in fact not changed the value of x but created a new variable with the same name, which is now the one referenced when I call x; so after both lines of code I can only reach an x with the value of 2.
But what is a mutable variable then, and which programming languages have them? This is where it gets foggy for me. Because when people say mutable variable, they are obviously referring to a variable or object whose value you can indeed change after it has been assigned an initial value.
Does this mean that with a mutable variable you actually manipulate that place in the computer's memory for that variable, whereas with an immutable variable you cannot manipulate that place in the computer's memory, or what?
I don't know how to explain my question any further, as I said, I understand that mutable = can change value of variable after initial value assignment, immutable = cannot. I get the definition. But I don't understand what it actually means in terms of what is going on "behind the scenes". I guess I am looking for easy examples on actual mutable variables.
This has nothing to do with immutability
let x = 5
x = 2
This is reassignment, and it is definitely not allowed in Haskell.
First let's look at a regular let assignment
Prelude> let x = 5 in x
5
it :: Num a => a
You can bind x using let, and rebind a new x in a nested let – this effectively shadows the outer x
Prelude> let x = 5 in let x = 2 in x
2
it :: Num a => a
Remember a let is basically a lambda
Prelude> (\x -> x) 5
5
it :: Num a => a
And of course a lambda can return a lambda, which again illustrates shadowing
Prelude> (\x -> (\x -> x)) 5 2
2
it :: Num a => a
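To contrast shadowing with genuine mutation: Haskell does have real mutable variables, but as first-class values rather than a property of names. A minimal sketch using IORef from the standard library:

```haskell
import Data.IORef

main :: IO ()
main = do
  let x = 5 :: Int          -- immutable binding
  let x = 2                 -- a new binding shadowing the old x, not mutation
  print x                   -- 2

  r <- newIORef (5 :: Int)  -- a genuinely mutable cell
  writeIORef r 2            -- overwrite the contents in place
  readIORef r >>= print     -- 2
```

Both paths end at 2, but only the IORef actually changed a value in memory; the let merely introduced a second x.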
I believe a shorthand answer to your question would be that a mutable variable is one holding a value that you may later want to adjust.
How you declare this depends on the language you're using.
In Kotlin val and var are used to declare variables.
val is a constant while var is adjustable.
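A minimal sketch of that distinction in Kotlin:

```kotlin
fun main() {
    var count = 0        // var: mutable, reassignment is allowed
    count = count + 1

    val limit = 10       // val: read-only reference
    // limit = 11        // would not compile: "val cannot be reassigned"

    println(count)       // 1
    println(limit)       // 10
}
```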
Mutability and immutability concern not the variables but the values. (One also says the type is (im)mutable.)
For instance if the value is of an immutable type StudentCard, with fields ID and Name, then after creation the fields no longer can be changed. On a name change the student card must be reissued, a new StudentCard must be created.
A more elementary immutable type is String in Java. Assigning the same value to two variables is no problem, as neither variable can change the shared value. Sharing a mutable value, on the other hand, could be dangerous.
A common mutable type is the array, in Java and other languages. Sharing an array, say by storing an array parameter in a field, risks the array content being changed somewhere else, inadvertently changing your field.
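A small sketch of that array-sharing hazard (the class name is hypothetical, just for illustration):

```java
// Sketch: storing a caller's array aliases mutable state, so a later
// change by the caller changes the field too.
public class SharedState {
    private final int[] data;

    public SharedState(int[] data) {
        this.data = data; // keeps a reference to the caller's array
    }

    public int first() {
        return data[0];
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3};
        SharedState s = new SharedState(arr);
        arr[0] = 99;                   // mutates the shared array
        System.out.println(s.first()); // 99 - the field changed too

        String t = "ab";               // Strings are immutable, so
        String u = t;                  // sharing them is always safe
        u = u + "c";                   // creates a new String instead
        System.out.println(t);         // ab - unchanged
    }
}
```

A defensive copy in the constructor (this.data = data.clone()) is the usual way to break the aliasing when you need to.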
A simplified example:
nsize = 100
vsize = 10000
varray = [rand(vsize) for i in 1:nsize] # say, I have a set of vectors
for k in 1:nsize
    varray[k] = rand(vsize, vsize) * varray[k]
end
Obviously, the above for loop can be parallelized.
According to Parallel Map and Loops in Julia manual,
I need to use a SharedArray. However, a SharedArray cannot have Array{Float64,1} as its element type.
julia> a = SharedArray(Array{Float64,1}, nsize)
ERROR: ArgumentError: type of SharedArray elements must be bits types, got Array{Float64,1}
in __SharedArray#138__ at sharedarray.jl:45
in SharedArray at sharedarray.jl:116
How can I solve this problem?
Currently, you can't, because a SharedArray requires a contiguous block of memory, which means that its elements must be "bits types," and this isn't true for Array. (Array is implemented in C and has some header information, which makes them not densely-packable.)
However, if all of your "element" arrays have the same size, and you don't absolutely require the ability to modify single elements of the "inner" arrays, you could try using StaticArrays as elements. (Thanks to @Wouter in the comments below for pointing out that this needed updating.)
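If all the inner vectors have the same length, another workaround in the same spirit is to flatten the data into a single 2-D SharedArray of the bits type Float64, with column k playing the role of varray[k]. A sketch (note the loop macro in modern Julia is @distributed rather than the old @parallel; sizes are shrunk for illustration):

```julia
using Distributed, SharedArrays

# Sketch: store the vectors as columns of one bits-type SharedArray.
nsize, vsize = 8, 10
S = SharedArray{Float64}(vsize, nsize)
for k in 1:nsize
    S[:, k] = rand(vsize)
end

@sync @distributed for k in 1:nsize
    S[:, k] = rand(vsize, vsize) * S[:, k]  # update column k in place
end
```

Because the element type is now Float64 (a bits type), the "elements must be bits types" restriction no longer applies, and each worker writes only its own columns.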
I found the following features of Fixnum in the docs.
Fixnum objects have immediate value. This means that when they are assigned or passed as parameters, the actual object is passed, rather than a reference to that object.
Can the same be shown in IRB? I think only then will I understand it correctly.
Assignment does not alias Fixnum objects.
What does it actually do, then?
There is effectively only one Fixnum object instance for any given integer value, so, for example, you cannot add a singleton method to a Fixnum.
I couldn't understand the reason one cannot add singleton methods to Fixnum object instances.
I gave some try to the second point as below:
a = 1 # => 1
a.class # => Fixnum
b = a # => 1
b.class # => Fixnum
a == b # => true
a.equal? b # => true
a.object_id == b.object_id # => true
But still I am in confusion. Can anyone help me here to understand the core of those features please?
In Ruby, most objects require memory to store their class and instance variables. Once this memory is allocated, Ruby represents each object by this memory location. When the object is assigned to a variable or passed to a function, it is the location of this memory that is passed, not the data at this memory. Singleton methods make use of this: when you define a singleton method, Ruby silently replaces the object's class with a new singleton class. Because each object stores its class, Ruby can easily replace an object's class with a new class that implements the singleton methods (and inherits from the original class).
This is no longer true for objects that are immediate values: true, false, nil, all symbols, and integers small enough to fit within a Fixnum. Ruby does not allocate memory for instances of these objects and does not internally represent them as a location in memory. Instead, it infers the instance of the object from its internal representation. This has two consequences:
The class of each object is no longer stored in memory at a particular location, and is instead implicitly determined by the type of immediate object. This is why Fixnums cannot have singleton methods.
Immediate objects with the same state (e.g., two Fixnums of integer 2378) are actually the same instance. This is because the instance is determined by this state.
To get a better sense of this, consider the following operations on a Fixnum:
>> x = 3 + 7
=> 10
>> x.object_id == 10.object_id
=> true
>> x.object_id == (15-5).object_id
=> true
Now, consider them using strings:
>> x = "a" + "b"
=> "ab"
>> x.object_id == "ab".object_id
=> false
>> x.object_id == "Xab"[1...3].object_id
=> false
>> x == "ab"
=> true
>> x == "Xab"[1...3]
=> true
The reason the object ids of the Fixnums are equal is that they're immediate objects with the same internal representation. The strings, on the other hand, exist in allocated memory. The object id of each string is the location of its object state in memory.
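The same contrast, plus the singleton-method restriction from the docs, can be reproduced directly in IRB. A small sketch:

```ruby
# Immediates share one instance; heap objects (strings) do not.
a = 42
b = 40 + 2
puts a.equal?(b)        # true: the very same immediate instance

s = "a" + "b"
puts s.equal?("ab")     # false: two separate heap-allocated objects
puts s == "ab"          # true: equal state, different instances

# An immediate has no per-instance memory to hold a singleton class,
# so defining a singleton method on it fails:
begin
  def a.double; self * 2; end
rescue TypeError => e
  puts e.class          # TypeError
end
```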
Some low-level information
To understand this, you have to understand how Ruby (at least 1.8 and 1.9) treat Fixnums internally. In Ruby, all objects are represented in C code by variables of type VALUE. Ruby imposes the following requirements for VALUE:
The type VALUE is the smallest integer type of sufficient size to hold a pointer. This means, in C, that sizeof(VALUE) == sizeof(void*).
Any non-immediate object must be aligned on a 4-byte boundary. This means that any object allocated by Ruby will have address 4*i for some integer i. This also means that all pointers have zero values in their two least significant bits.
The first requirement allows Ruby to store both pointers to objects and immediate values in a variable of type VALUE. The second requirement allows Ruby to detect Fixnum and Symbol objects based on the two least significant bits.
To make this more concrete, consider the internal binary representation of a Ruby object z, which we'll call Rz in a 32-bit architecture:
MSB                                  LSB
3         2           1
1098 7654 3210 9876 5432 1098 7654 32 10
XXXX XXXX XXXX XXXX XXXX XXXX XXXX AB CD
(the two rows of digits give each bit's number: tens digit above, ones digit below; A, B, C, D name the four least significant bits)
Ruby then interprets Rz, the representation of z, as follows:
If D==1, then z is a Fixnum. The integer value of this Fixnum is stored in the upper 31 bits of the representation, and is recovered by performing an arithmetic right shift to recover the signed integer stored in these bits.
Three special representations are tested (all with D==0)
if Rz==0, then z is false
if Rz==2, then z is true
if Rz==4, then z is nil
If ABCD == 1110, then z is a Symbol. The symbol is converted into a unique ID by right-shifting away the eight least-significant bits (i.e., Rz>>8 in C). On 32-bit architectures, this allows 2^24 different IDs (over 16 million). On 64-bit architectures, this allows 2^56 different IDs.
Otherwise, Rz represents an address in memory for an instance of a Ruby object, and the type of z is determined by the class information at that location.
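You can glimpse the D==1 tag bit from Ruby itself: in MRI, the object_id of a small integer n is the tagged word itself, 2n + 1. (This is an implementation detail of MRI, not a language guarantee, so treat it as illustrative.)

```ruby
# In MRI, a small integer n is represented as the word (n << 1) | 1,
# and object_id exposes that same value.
(0..4).each do |n|
  puts "#{n} => object_id #{n.object_id}"  # 1, 3, 5, 7, 9
end
puts 1.object_id == ((1 << 1) | 1)  # true
```

Every one of these ids is odd, because the low bit is always the Fixnum tag D==1 described above.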
Well...
This is an internal implementation detail of MRI. You see, in Ruby (on the C side of things), all Ruby objects are a VALUE. In most cases, a VALUE is a pointer to an object living on the heap. But the immediate values (Fixnums, true, false, nil, and a few other things) live in the VALUE itself, where a pointer to an object would usually live.
It makes the variable the exact same object: the immediate value is assigned to the VALUE itself rather than referenced through a pointer, so there is no separate copy to alias.
Because every time I use the number 1 in my program, I'm using the same object. So if I defined a singleton method on 1, every place in my program, in required libraries, etc., would see that singleton method. And singleton methods are usually used for local monkeypatching. So, to prevent this, Ruby just doesn't let you.