Understanding immutable composite types with fields of mutable types in Julia

Understanding immutable composite types with fields of mutable types in Julia - immutability

Initial note: I'm working in Julia, but this question probably applies to many languages.
Setup: I have a composite type as follows:
type MyType
x::Vector{String}
end
I write some methods to act on MyType. For example, I write a method that allows me to insert a new element in x, e.g. function insert!(d::MyType, itemToInsert::String).
Question: Should MyType be mutable or immutable?
My understanding: I've read the Julia docs on this, as well as more general (and highly upvoted) questions on Stackoverflow (e.g. here or here), but I still don't really have a good handle on what it means to be mutable/immutable from a practical perspective (especially for the case of an immutable composite type, containing a mutable array of immutable types!)
Nonetheless, here is my attempt: If MyType is immutable, then it means that the field x must always point to the same object. That object itself (a vector of Strings) is mutable, so it is perfectly okay for me to insert new elements into it. What I am not allowed to do is try and alter MyType so that the field x points to an entirely different object. For example, methods that do the following are okay:
MyType.x[1] = "NewValue"
push!(MyType.x, "NewElementToAdd")
But methods that do the following are not okay:
MyType.x = ["a", "different", "string", "array"]
Is this right? Also, is the idea that the object that an immutable types field values are locked to are those that are created within the constructor?
Final Point: I apologise if this appears to duplicate other questions on SO. As stated, I have looked through them and wasn't able to get the understanding that I was after.

So here is something mind bending to consider (at least to me):
julia> immutable Foo
data::Vector{Float64}
end
julia> x = Foo([1.0, 2.0, 4.0])
Foo([1.0,2.0,4.0])
julia> append!(x.data, x.data); pointer(x.data)
Ptr{Float64} #0x00007ffbc3332018
julia> append!(x.data, x.data); pointer(x.data)
Ptr{Float64} #0x00007ffbc296ac28
julia> append!(x.data, x.data); pointer(x.data)
Ptr{Float64} #0x00007ffbc34809d8
So the data address is actually changing as the vector grows and needs to be reallocated! But - you can't change data yourself, as you point out.
I'm not sure there is a 100% right answer is really. I primarily use immutable for simple types like the Complex example in the docs in some performance critical situations, and I do it for "defensive programming" reasons, e.g. the code has no need to write to the fields of this type so I make it an error to do so. They are a good choice IMO whenever the type is a sort of an extension of a number, e.g. Complex, RGBColor, and I use them in place of tuples, as a kind of named tuple (tuples don't seem to perform well with Julia right now anyway, wheres immutable types perform excellently).

Related

Julia: Abstract types vs Union of types

I am quite new to Julia, and I still have some doubts on which style is better when trying to do certain things... For instance, I have lots of doubts on the performance or style differences of using Abstract types vs defining Unions.
An example: Let's imagine we want to implement several types of units (Mob, Knight, ...) which should share most (if not all) of their attributes and most (if not all) of their methods.
I see two options to provide structure: First, one could declare an abstract type AbstractUnit from which the other types derive from, and then create methods for the abstract type. It would look something like this:
abstract type AbstractUnit end
mutable struct Knight <: AbstractUnit
id :: Int
[...]
end
mutable struct Peasant <: AbstractUnit
id :: Int
[...]
end
id(u::T) where T <: AbstractUnit = u.id
[...]
Alternatively, one could define a union of types and create methods for the union. It would look something like this:
mutable struct Knight
id :: Int
[...]
end
mutable struct Peasant
id :: Int
[...]
end
const Unit = Union{Knight,Peasant}
id(u::Unit) = u.id
[...]
I understand some of the conceptual differences between these two approaches, and think the first one is more expandable. However, I have a lot of doubts in terms of performance. For instance, how bad would it be creating arrays of the AbstractUnit vs arrays of the union type in terms of memory allocation at runtime?
Thanks!

Absolutely use AbstractUnit instead of Unit in your methods' argument type annotations. Unions are abstract types too, actually, but as you pointed out, you can't add new types to it. In either case, the method is compiled to a specialization for each concrete type like Knight or Peasant, so your methods' performance won't be different.
As for Arrays' element type parameters, there is the isbits Union optimization, but as the name suggests it only works if all the types in your Union are isbitstypes (no pointers, immutable). Your structs are mutable so that's already not applicable. See, memory access is faster when instances are directly stored in Arrays, and the element type parameter (T in Vector{T}) must be a concrete isbitstype to allow this. When the element type parameter is abstract or mutable, generally Arrays only directly store pointers to the actual instances because multiple or mutable concrete types can have unknown and varying memory size. If the abstract type is an isbits Union though, instances could be stored directly in the Array: enough memory is allocated per element to contain the largest concrete type in the Union as well as a tag byte for each element specifying its type. A byte only has 256 values, so presumably this only works for Unions of at most 256 concrete types.
Another possible optimization by using a Vector{Unit} over Vector{AbstractUnit} is Union-splitting a type instability. I'm really not going to be able to make an example method that explains it better than the linked blog, so I'll just give the short version. When Julia's compiler fails to infer the type of a variable in your method at all (::Any annotations in #code_warntype), inner method calls involving the variable must do type checks and dispatch (selecting the specialization) at run-time, which can cost significant time. However, if Julia's compiler can infer the variable to be a Union of a few concrete types (in practice, at most 4), conditional branches for each type can be used to eliminate most type checks and do dispatch at compile-time. Vector{AbstractUnit} can contain any number of types <: AbstractUnit, so the compiler cannot use Union-splitting. Vector{Unit} however lets the compiler know the elements must be either Knight or Peasant, which allows Union-splitting.
P.S. This is often a source of confusion for beginners, but while Unit is an abstract type, Vector{Unit} is a concrete type, just with an abstract type parameter. After all, it can have instances, all Arrays directly containing pointers to Knight or Peasant instances.

Vectorize object oriented implementation in MATLAB

I'm trying to optimize a given object oriented code in matlab. It is an economical model and consists of a Market and Agents. The time consuming part is to update certain attributes of all Agents during each timestep which is implemented in a for loop.
However, I fail to vectorize the object oriented code.
Here is an example (Note, the second thing that slows down the code so far is the fact, that new entries are attached to the end of the vector. I'm aware of that and will fix that also):
for i=1:length(obj.traders)
obj.traders(i).update(obj.Price,obj.Sentiment(end),obj.h);
end
Where update looks like
function obj=update(obj,price,s,h)
obj.pos(end+1)=obj.p;
obj.wealth(end+1)=obj.w(1,1,1);
obj.g(end+1)=s;
obj.price=price;
obj.Update_pos(sentiment,h);
if (obj.c)
obj.Switch_Pos;
end
...
My first idea was to try something like
obj.traders(:).update(obj.Price,obj.Sentiment(end),obj.h);
Which didn't work. If someone has any suggestions how to vectorize this code, while keeping the object oriented implementation, I would be very happy.

I cannot provide a complete solution as this depends on the details of your implementation, but here are some tips which you could use to improve your code:
Remembering that a MATLAB object generally behaves like a struct, assignment of a constant value to a field can be done using [obj.field] =deal(val); e.g.:
[obj.trader.price] = deal(obj.Price);
This can also be extended to non-constant RHS, using cell, like so:
[aStruct.(fieldNamesCell{idx})] = deal(valueCell{:}); %// or deal(numericVector(:));
To improve the update function, I would suggest making several lines where you create the RHS vectors\cells followed by "simultaneous" assignment to all relevant fields of the objects in the array.
Other than that consider:
setfield: s = setfield(s,{sIndx1,...,sIndxM},'field',{fIndx1,...,fIndxN},value);
structfun:
s = structfun(#(x)x(1:3), s, 'UniformOutput', false, 'ErrorHandler', #errfn);
"A loop-based solution can be flexible and easily readable".
P.S.
On a side note, I'd suggest you name the obj in your functions according to the class name, which would make it more readable to others, i.e.:
function obj=update(obj,price,s,h) => function traderObj=update(traderObj,price,s,h)

Performance of thenComparing vs thenComparingInt - which to use?

I have a question, if I'm comparing ints, is there a performance difference in calling thenComparingInt(My::intMethod) vs thenComparing(My::intMethod), in other words, if I'm comparing differemt types, both reference and primitive, e.g. String, int, etc. Part of me just wants to say comparing().thenComparing().thenComparing() etc, but should I do comparing.thenComparing().thenComparingInt() if the 3rd call was comparing an int or Integer value?
I am assuming that comparing() and thenComparing() use the compareTo method to compare any given type behind the scenes or possibly for ints, the Integer.compare? I'm also assuming the answer to my original question may involve performance in that thenComparingInt would know an int is being passed in, whereas, thenComparing would have to autobox int to Integer then maybe cast to Object?
Also, another question whilst I think of it - is there a way of chaining method references, e.g. Song::getArtist::length where getArtist returns a string? Reason is I wanted to do something like this:
songlist.sort(
Comparator.comparing((Song s) -> s.getArtist().length()));
songlist.sort(
Comparator.comparing(Song::getArtist,
Comparator.comparingInt(String::length)));
songlist.sort(
Comparator.comparing(Song::getArtist, String::length));
Of the 3 examples, the top two compile but the bottom seems to throw a compilation error in Eclipse, I would have thought the 2nd argument of String::length was valid? But maybe not as it's expecting a Comparator not a function?

Question 1
I would think thenComparingInt(My::intMethod) might be better since it should avoid boxing, but you would have to try out both versions to see if it really makes a difference.
Question 2
songlist.sort(
Comparator.comparing(Song::getArtist, String::length));
Is invalid because the 2nd parameter should be a Comparator not a method that returns int.

Initialize member variables in a method and not the constructor

I have a public method which uses a variable (only in the scope of the public method) I pass as a parameter we will call A, this method calls a private method multiple times which also requires the parameter.
At present I am passing the parameter every time but it looks weird, is it bad practice to make this member variable of the class or would the uncertainty about whether it is initialized out way the advantages of not having to pass it?
Simplified pseudo code:
public_method(parameter a)
do something with a
private_method(string_a, a)
private_method(string_b, a)
private_method(string_c, a)
private_method(String, parameter a)
do something with String and a
Additional information: parameter a is a read only map with over 100 entries and in reality I will be calling private_method about 50 times

I had this same problem myself.
I implemented it differently in 3 different contexts to see hands-on what are result using 3 different strategies, see below.
Note that I am type of programmer that makes many changes to the code always trying to improve it. Thus I settle only for the code that is amenable to changes, readbale, would you call this "flexible" code. I settle only for very clear code.
After experimentation, I came to these results:
Passing a as parameter is perfectly OK if you have one or two - short number - of such values. Passing in parmeters has very good visibility, clarity, clear passing lines, well visible lifetime (initialization points, destruction points), amenable to changes, easy to track.
If number of such values begin to grow to >= 5-6 values, I swithc to approach #3 below.
Passing values through class members -- did not do good to clarity of my code, eventually I got rid of it. It makes for less clear code. Code becomes muddled. I did not like it. It had no advantages.
As alternative to (1) and (2), I adopted Inner class approach, in cases when amount of such values is > 5 (which makes for too long argument list).
I pack those values into small Inner class and pass such object by reference as argument to all internal members.
Public function of a class usually creates an object of Inner class (I call is Impl or Ctx or Args) and passes it down to private functions.
This combines clarity of arg passing with brevity. It's perfect.
Good luck
Edit
Consider preparing array of strings and using a loop rather than writing 50 almost-identical calls. Something like char *strings[] = {...} (C/C++).

This really depends on your use case. Does 'a' represent a state that your application/object care about? Then you might want to make it a member of your object. Evaluate the big picture, think about maintenance, extensibility when designing structures.

If your parameter a is a of a class of your own, you might consider making the private_method a public method for the variable a.
Otherwise, I do not think this looks weird. If you only need a in just 1 function, making it a private variable of your class would be silly (at least to me). However, if you'd need it like 20 times I would do so :P Or even better, just make 'a' an object of your own that has that certain function you need.

A method should ideally not pass more than 7 parameters. Using the number of parameters more than 6-7 usually indicates a problem with the design (do the 7 parameters represent an object of a nested class?).
As for your question, if you want to make the parameter private only for the sake of passing between private methods without the parameter having anything to do with the current state of the object (or some information about the object), then it is not recommended that you do so.
From a performance point of view (memory consumption), reference parameters can be passed around as method parameters without any significant impact on the memory consumption as they are passed by reference rather than by value (i.e. a copy of the data is not created). For small number of parameters that can be grouped together you can use a struct. For example, if the parameters represent x and y coordinates of a point, then pass them in a single Point structure.
Bottomline
Ask yourself this question, does the parameter that you are making as a members represent any information (data) about the object? (data can be state or unique identification information). If the answer to his question is a clear no, then do not include the parameter as a member of the class.
More information
Limit number of parameters per method?
Parameter passing in C#

How does Integer === 3 work?

So as I understand it, the === operator tests to see if the RHS object is a member of the LHS object. That makes sense. But how does this work in Ruby? I'm looking at the Ruby docs and I only see === defined in Object, I don't see it in Integer itself. Is it just not documented?

Integer is a class, which (at least in Ruby) means that it is just a boring old normal object like any other object, which just happens to be an instance of the Class class (instead of, say, Object or String or MyWhateverFoo).
Class in turn is a subclass of Module (although arguably it shouldn't be, because it violates the Liskov Substition Principle, but that is a discussion for another forum, and is also a dead horse that has already been beaten many many times). And in Module#=== you will find the definition you are looking for, which Class inherits from Module and instances of Class (like Integer) understand.
Module#=== is basically defined symmetric to Object#kind_of?, it returns true if its argument is an instance of itself. So, 3 is an instance of Integer, therefore Integer === 3 returns true, just as 3.kind_of?(Integer) would.
So as I understand it, the === operator tests to see if the RHS object is a member of the LHS object.
Not necessarily. === is a method, just like any other method. It does whatever I want it to do. And in some cases the "is member of" analogy breaks down. In this case it is already pretty hard to swallow. If you are a hardcore type theory freak, then viewing a type as a set and instances of that type as members of a set is totally natural. And of course for Array and Hash the definition of "member" is also obvious.
But what about Regexp? Again, if you are formal languages buff and know your Chomsky backwards, then interpreting a Regexp as an infinite set of words and Strings as members of that set feels completely natural, but if not, then it sounds kind of weird.
So far, I have failed to come up with a concise description of precisely what === means. In fact, I haven't even come up with a good name for it. It is usually called the triple equals operator, threequals operator or case equality operator, but I strongly dislike those names, because it has absolutely nothing to do with equality.
So, what does it do? The best I have come up with is: imagine you are making a table, and one of the column headers is Integer. Would it make sense to write 3 in that column? If one of the column headers is /ab*a/, would it make sense to write 'abbbba' in that column?
Based on that definition, it could be called the subsumption operator, but that's even worse than the other examples ...

It's defined on Module, which Class is a subclass of, which Integer is an instance of.
In other words, when you run Integer === 3, you're calling '===' (with the parameter 3) on the object referred to to by the constant Integer, which is an instance of the class named Class. Since Class is a subclass of Module and doesn't define its own ===, you get the implementation of === defined on Module.
See the API docs for Module for more information.

Umm, Integer is a subclass of Object.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio