Why are so few things @specialized in Scala's standard library?

I've searched for uses of @specialized in the source code of the Scala 2.8.1 standard library. It looks like only a handful of traits and classes use this annotation: Function0, Function1, Function2, Tuple1, Tuple2, Product1, Product2, AbstractFunction0, AbstractFunction1, AbstractFunction2.
None of the collection classes are @specialized. Why not? Would it generate too many classes?
This means that using collection classes with primitive types is very inefficient, because there will be a lot of unnecessary boxing and unboxing going on.
What's the most efficient way to have an immutable list or sequence (with IndexedSeq characteristics) of Ints, avoiding boxing and unboxing?

Specialization carries a high cost in generated class-file size, so it must be added with careful consideration. In the particular case of collections, I imagine the impact would be huge.
Still, it is an ongoing effort -- the Scala library has barely started to be specialized.

Specialization can be expensive (exponential) in both the number of generated classes and compile time. It's not just class size, as the accepted answer suggests.
Open your Scala REPL and type this:
import scala.{specialized => sp}
trait S1[@sp A, @sp B, @sp C, @sp D] { def f(p1: A): Unit }
Sorry :-). It's like a compiler bomb.
Now, let's take a simple trait:
trait Foo[A] { }
The above results in two compiled classes: Foo, the pure interface, and Foo$class, the implementation class.
Now consider
trait Foo[@specialized A] { }
A specialized type parameter gets expanded/rewritten for nine primitive types (Unit, Boolean, Byte, Char, Int, Long, Short, Double, Float), so you basically end up with 20 classes instead of 2.
Going back to the trait with four specialized type parameters: classes get generated for every combination of possible primitive types, i.e. the growth is exponential in the number of parameters:
2 * 10 ^ (number of specialized parameters)
For S1 above, that is 2 * 10^4 = 20,000 classes.
If you only need specialization for specific primitive types, you should be explicit about it, such as
trait Foo[@specialized(Int) A, @specialized(Int, Double) B] { }
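For instance, a minimal sketch (Id and IntId are made-up names): specializing on Int alone adds just one extra variant per parameter, and calls at type Int go through the generated specialized method without boxing:
import scala.{specialized => sp}

trait Id[@sp(Int) A] { def id(a: A): A }

object IntId extends Id[Int] { def id(a: Int): Int = a }

// Dispatches to the Int-specialized method (internally named along the
// lines of id$mcI$sp), so the argument below is never boxed:
println(IntId.id(42))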
Understandably, one has to be frugal with @specialized when building general-purpose libraries.
Here is Paul Phillips ranting about it.

Partial answer to my own question: I can wrap an array in an IndexedSeq like this:
import scala.collection.immutable.IndexedSeq

def arrayToIndexedSeq[@specialized(Int) T](array: Array[T]): IndexedSeq[T] =
  new IndexedSeq[T] {
    def apply(idx: Int): T = array(idx)
    def length: Int = array.length
  }
(Of course you could still modify the contents if you have access to the underlying array, but I would make sure that the array isn't passed to other parts of my program.)
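A quick usage sketch of the wrapper:
val seq = arrayToIndexedSeq(Array(1, 2, 3))
println(seq(1))      // 2, read straight from the underlying Array[Int]
println(seq.length)  // 3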

Related

Is there a way to map an array of objects in golang?

Coming from Node.js, I could do something like:
// given an array `list` of objects with a field `fruit`:
const fruits = list.map(el => el.fruit); // returns an array of fruit strings
Is there any way to do that in an elegant one-liner in Go?
I know I can do it with a range loop, but I am looking for a one-liner solution.
In Go, arrays are inflexible (because their length is encoded in their type) and costly to pass to functions (because a function operates on copies of its array arguments). I'm assuming you'd like to operate on slices rather than on arrays.
Because methods cannot take additional type arguments, you cannot simply declare a generic Map method in Go. However, you can define Map as a generic top-level function:
func Map[T, U any](ts []T, f func(T) U) []U {
	us := make([]U, len(ts))
	for i := range ts {
		us[i] = f(ts[i])
	}
	return us
}
Then you can write the following code,
names := []string{"Alice", "Bob", "Carol"}
fmt.Println(Map(names, utf8.RuneCountInString))
which prints [5 3 5] to stdout.
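For reference, here is a minimal self-contained sketch of the above (assuming Go 1.18+ for generics):
package main

import (
	"fmt"
	"unicode/utf8"
)

// Map applies f to each element of ts and returns the results in a new slice.
func Map[T, U any](ts []T, f func(T) U) []U {
	us := make([]U, len(ts))
	for i := range ts {
		us[i] = f(ts[i])
	}
	return us
}

func main() {
	names := []string{"Alice", "Bob", "Carol"}
	fmt.Println(Map(names, utf8.RuneCountInString)) // [5 3 5]
}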
Go 1.18 saw the addition of a golang.org/x/exp/slices package, which provides many convenient operations on slices, but a Map function is noticeably absent from it. The omission of that function was the result of a long discussion in the GitHub issue dedicated to the golang.org/x/exp/slices proposal; concerns included the following:
hidden cost (O(n)) of operations behind a one-liner
uncertainty about error handling inside Map
risk of encouraging a style that strays too far from Go's traditional style
Russ Cox ultimately elected to drop Map from the proposal because it's
probably better as part of a more comprehensive streams API somewhere else.

Struct with abstract-type field or with concrete-type Union in Julia?

I have two concrete types called CreationOperator and AnnihilationOperator, and I want to define a new concrete type that represents a string of operators together with a real coefficient that multiplies it.
I found it natural to define an abstract type FermionicOperator, from which both CreationOperator and AnnihilationOperator inherit, i.e.
abstract type FermionicOperator end

struct CreationOperator <: FermionicOperator
    ...
end

struct AnnihilationOperator <: FermionicOperator
    ...
end
because I can define many functions with signatures of the form f(op1::FermionicOperator, op2::FermionicOperator) = ..., such as arithmetic operations (I am building an algebra system, so I have to define operations such as *, +, etc. on the operators).
Then I would go on and define a concrete type OperatorString:
struct OperatorString
    coef::Float64
    ops::Vector{FermionicOperator}
end
However, according to the Julia manual, I believe OperatorString is not ideal for performance: the compiler knows nothing about the concrete types behind FermionicOperator, so functions involving OperatorString will be inefficient (and I will have many functions manipulating strings of operators).
I found the following solution, but I am not sure about its implications and whether it really makes a difference.
Instead of defining FermionicOperator as an abstract type, I define it as the Union of CreationOperator and AnnihilationOperator, i.e.
struct CreationOperator
    ...
end

struct AnnihilationOperator
    ...
end

FermionicOperator = Union{CreationOperator,AnnihilationOperator}
This would still allow functions of the form f(op1::FermionicOperator, op2::FermionicOperator) = ..., but at the same time, to my understanding, Union{CreationOperator,AnnihilationOperator} is a concrete type, so that OperatorString is well-defined and the compiler can optimize where possible.
I am particularly in doubt because I also considered using the built-in Expr struct to define my string of operators (it would actually be more general), whose args field is a vector with abstract-type elements: very similar to my first design attempt. However, while implementing arithmetic operations on Expr I had the feeling I was doing something "wrong", and that I was better off defining my own types.
If your ops field is a vector that, in any given instance, holds either all CreationOperators or all AnnihilationOperators, then the recommended solution is to use a parameterized struct:
abstract type FermionicOperator end

struct CreationOperator <: FermionicOperator
    ...
end

struct AnnihilationOperator <: FermionicOperator
    ...
end

struct OperatorString{T<:FermionicOperator}
    coef::Float64
    ops::Vector{T}
    function OperatorString(coef::Float64, ops::Vector{T}) where {T<:FermionicOperator}
        return new{T}(coef, ops)
    end
end
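A quick usage sketch (the value::Int field below is an assumption, since the struct bodies are elided above):
# Assume, for illustration, the elided bodies are e.g.:
#   struct CreationOperator <: FermionicOperator
#       value::Int
#   end
ops = [CreationOperator(1), CreationOperator(2)]  # Vector{CreationOperator}
s = OperatorString(0.5, ops)                      # OperatorString{CreationOperator}
# The element type of s.ops is concrete, so loops over it compile to tight code.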
If your ops field is a vector that, in any given instance, may be a mixture of CreationOperators and AnnihilationOperators, then you can use a Union. Because the union is small (2 types), your code will remain performant.
struct CreationOperator
    value::Int
end

struct AnnihilationOperator
    value::Int
end

const Fermionic = Union{CreationOperator, AnnihilationOperator}

struct OperatorString
    coef::Float64
    ops::Vector{Fermionic}
    function OperatorString(coef::Float64, ops::Vector{Fermionic})
        return new(coef, ops)
    end
end
Although not shown, even with the Union approach you may want to keep the abstract type as well, just for future simplicity and flexibility in function dispatch; it is helpful in developing robust multiple-dispatch-driven logic. A sketch of the combination follows.
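Here is a minimal sketch of that combination (reusing the illustrative value field): functions dispatch on the abstract type while storage keeps the small Union:
abstract type FermionicOperator end

struct CreationOperator <: FermionicOperator
    value::Int
end

struct AnnihilationOperator <: FermionicOperator
    value::Int
end

const Fermionic = Union{CreationOperator, AnnihilationOperator}

# Dispatch uses the abstract type; the storage type stays a small Union:
combine(a::FermionicOperator, b::FermionicOperator) = Fermionic[a, b]

ops = combine(CreationOperator(1), AnnihilationOperator(2))  # Vector{Fermionic}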

Why is it that traits for operator overloading require ownership of self? [duplicate]

I made a two-element Vector struct and I want to overload the + operator.
I made all my functions and methods take references rather than values, and I want the + operator to work the same way.
impl Add for Vector {
    fn add(&self, other: &Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}
Depending on which variation I try, I either get lifetime problems or type mismatches. Specifically, the &self argument seems to not get treated as the right type.
I have seen examples with template arguments on impl as well as Add, but they just result in different errors.
I found How can an operator be overloaded for different RHS types and return values? but the code in the answer doesn't work even if I put a use std::ops::Mul; at the top.
I am using rustc 1.0.0-nightly (ed530d7a3 2015-01-16 22:41:16 +0000)
I won't accept "you only have two fields, why use a reference" as an answer; what if I wanted a 100-element struct? I will accept an answer that demonstrates that even with a large struct I should be passing by value, if that is the case (I don't think it is, though). I am interested in a good rule of thumb for struct size and passing by value vs by reference, but that is not the current question.
You need to implement Add on &Vector rather than on Vector.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;

    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}
In its definition, Add::add always takes self by value. But references are types like any other1, so they can implement traits too. When a trait is implemented on a reference type, the type of self is a reference; the reference is passed by value. Normally, passing by value in Rust implies transferring ownership, but when references are passed by value, they're simply copied (or reborrowed/moved if it's a mutable reference), and that doesn't transfer ownership of the referent (because a reference doesn't own its referent in the first place). Considering all this, it makes sense for Add::add (and many other operators) to take self by value: if you need to take ownership of the operands, you can implement Add on structs/enums directly, and if you don't, you can implement Add on references.
Here, self is of type &'a Vector, because that's the type we're implementing Add on.
Note that I also specified the RHS type parameter with a different lifetime to emphasize the fact that the lifetimes of the two input parameters are unrelated.
1 Actually, reference types are special in that you can implement traits for references to types defined in your crate (i.e. if you're allowed to implement a trait for T, then you're also allowed to implement it for &T). &mut T and Box<T> have the same behavior, but that's not true in general for U<T> where U is not defined in the same crate.
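To make the footnote concrete, here is a small sketch (Describe and Point are made-up names) showing a by-value self method implemented on a reference type:
trait Describe {
    fn describe(self) -> String; // takes self by value, just like Add::add
}

struct Point {
    x: i32,
    y: i32,
}

// Self = &Point here, so `self` is a reference; passing it by value
// merely copies the reference and takes no ownership of the Point.
impl Describe for &Point {
    fn describe(self) -> String {
        format!("({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{}", (&p).describe());
    println!("{}", p.x); // p is still usable: it was never moved
}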
If you want to support all scenarios, you must support all the combinations:
&T op U
T op &U
&T op &U
T op U
In Rust itself, this was done through an internal macro.
Luckily, there is a Rust crate, impl_ops, that offers a macro to write that boilerplate for us: the crate's impl_op_ex! macro generates all the combinations.
Here is their sample, lightly expanded (the DonkeyKong definition is assumed here to make it self-contained; the crate's sample elides it):
#[macro_use] extern crate impl_ops;
use std::ops;

// Minimal definition assumed for completeness:
struct DonkeyKong {
    bananas: i32,
}

impl DonkeyKong {
    fn new(bananas: i32) -> DonkeyKong {
        DonkeyKong { bananas }
    }
}

impl_op_ex!(+ |a: &DonkeyKong, b: &DonkeyKong| -> i32 { a.bananas + b.bananas });

fn main() {
    let total_bananas = &DonkeyKong::new(2) + &DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = &DonkeyKong::new(2) + DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = DonkeyKong::new(2) + &DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = DonkeyKong::new(2) + DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
}
Even better, they have an impl_op_ex_commutative! macro that will also generate the operators with the parameters reversed, if your operator happens to be commutative; see the sketch below.
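For instance, a hedged sketch of the commutative variant (this DonkeyKong-plus-i32 pairing is an assumption modeled on the crate's docs, not a verified sample):
// Generates DonkeyKong + i32 in all reference combinations *and* the
// reversed i32 + DonkeyKong forms, since the operation is commutative:
impl_op_ex_commutative!(+ |a: &DonkeyKong, b: i32| -> DonkeyKong {
    DonkeyKong::new(a.bananas + b)
});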

fast copying object content in scala

I have a class with a few Int and Double fields. What is the fastest way to copy all data from one object to another?
class IntFields {
  private val data: Array[Int] = Array(0, 0)

  def first: Int = data(0)
  def first_=(value: Int) = data(0) = value

  def second: Int = data(1)
  def second_=(value: Int) = data(1) = value

  def copyFrom(another: IntFields) =
    Array.copy(another.data, 0, data, 0, 2)
}
This is the way I would suggest, but I doubt it is really effective, since I have no clear understanding of Scala's internals.
update1:
In fact, I'm searching for a Scala equivalent of C++'s memcpy. I just need to take one simple object and copy its contents byte by byte.
Array copying is just a hack; I've googled for a normal Scala-supported method and found none.
update2:
I've tried to microbenchmark two holders: a simple case class with 12 variables, and one backed by an array. In all benchmarks (simple copying and complex calculations over a collection) the array-based solution was about 7% slower.
So I need other means of simulating memcpy.
Since both arrays used by Array.copy hold primitive integers (i.e. it is not the case that one of them holds boxed integers, in which case a while loop with boxing/unboxing would have been used to copy the elements), it is exactly as efficient as Java's System.arraycopy. Which is to say: if this were a huge array, you would probably see a performance difference compared to a while loop that copies the elements one by one. Since the array only has 2 elements, it is probably more efficient to just do:
def copyFrom(another: IntFields) {
  data(0) = another.data(0)
  data(1) = another.data(1)
}
EDIT:
I'd say that the fastest thing is to just copy the fields one by one. If performance is really important, you should consider using Unsafe.getInt; some report it to be faster than System.arraycopy for small blocks: https://stackoverflow.com/questions/5574241/interesting-uses-of-sun-misc-unsafe. A field-by-field sketch follows.
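For the 12-variable holder from update2, copying field by field is just a block of plain assignments. A minimal sketch with made-up names (shortened to five fields):
class Holder {
  var i1, i2, i3 = 0 // Int fields
  var d1, d2 = 0.0   // Double fields

  // Field-by-field copy: no loop, no boxing, trivial for the JIT to inline.
  def copyFrom(other: Holder): Unit = {
    i1 = other.i1; i2 = other.i2; i3 = other.i3
    d1 = other.d1; d2 = other.d2
  }
}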

Is Odersky serious with "bills !*&^%~ code!"?

In his book Programming in Scala (Chapter 5, Section 5.9, p. 93),
Odersky mentions the expression "bills !*&^%~ code!"
In the footnote on the same page:
"By now you should be able to figure out that given this code, the Scala compiler would
invoke (bills.!*&^%~(code)).!()."
That's a bit too cryptic for me; could someone explain what's going on here?
What Odersky means to say is that it would be possible to have valid code looking like that. For instance, the code below:
class BadCode(whose: String, source: String) {
  def ! = println(whose + ", what the hell do you mean by '" + source + "'???")
}

class Programmer(who: String) {
  def !*&^%~(source: String) = new BadCode(who, source)
}

val bills = new Programmer("Bill")
val code = "def !*&^%~(source: String) = new BadCode(who, source)"

bills !*&^%~ code!
Just copy & paste it into the REPL.
The period is optional for calling a method that takes a single parameter, or has an empty parameter list.
When this feature is utilized, the next chunk after the space following the method name is assumed to be the single parameter.
Therefore,
(bills.!*&^%~(code)).!().
is identical to
bills !*&^%~ code!
The second exclamation mark calls a method on the returned value from the first method call.
I'm not sure if the book provides method signatures, but I assume it's just a comment on Scala's syntactic sugar: if you type
bill add monkey
where bill is an object with a method add that takes a parameter, the compiler automatically interprets it as
bill.add(monkey)
Being a little Scala-rusty, I'm not entirely sure how it splits code! into (code).!(), except for a vague tickling of the grey cells that the ! operator is used to fire off an actor, which in compiler terms might be interpreted as an implicit .!() method on the object.
The combination of the '.()' being optional in method calls (as Wysawyg explained above) and the ability to use (almost) whatever characters you like when naming methods makes it possible to write methods in Scala that look like operator overloading. You can even invent your own operators.
For example, I have a program that deals with 3D computer graphics. I have my own class Vector for representing a 3D vector:
class Vector(val x: Double, val y: Double, val z: Double) {
  def +(v: Vector) = new Vector(x + v.x, y + v.y, z + v.z)
  // ...etc.
}
I've also defined a method ** (not shown above) to compute the cross product of two vectors. It's very convenient that you can create your own operators like that in Scala; not many other programming languages have this flexibility. A sketch of ** follows.
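Since ** isn't shown, here is a sketch of what such a cross-product method could look like inside that Vector class (the standard formula; the author's actual code may differ):
class Vector(val x: Double, val y: Double, val z: Double) {
  def +(v: Vector) = new Vector(x + v.x, y + v.y, z + v.z)

  // Cross product via the standard determinant formula:
  def **(v: Vector) = new Vector(
    y * v.z - z * v.y,
    z * v.x - x * v.z,
    x * v.y - y * v.x
  )
}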
