Assume I have implemented a Vector class. In C++ it is possible to do "scaling" in natural math expressions by overloading operator* at global scope:
template <typename T> // T can be int, double, complex<>, etc.
Vector operator*(const T& t, const Vector& v);
template <typename T> // T can be int, double, complex<>, etc.
Vector operator*(const Vector& v, const T& t);
However, when it goes to Ruby, since parameters are not typed, it would be possible to write
class Vector
def *(another)
case another
when Vector then ...
when Numeric then ...
end
end
end
This allows Vector * Numeric, but not Numeric * Vector. Is there a way of solve it?
[Using Numeric rather than Numerical in my reply.]
The most general way to do this is to add a coerce method to Vector. When Ruby encounters 5 * your_vector, the call to 5.*(your_vector) fails, it will then call your_vector.coerce(5). Your coerce method will pass back two items and the * method will be retried on those items.
Conceptually, something like this happens after the 5.*(your_vector) failure:
first, second = your_vector.coerce(5)
first.*(second)
The most simple approach is to pass back your_vector as the first item and 5 as the second.
def coerce(other)
case other
when Numeric
return self, other
else
raise TypeError, "#{self.class} can't be coerced into #{other.class}"
end
end
That works for commutative operations, but not so well for non-commutative operations. If you have a simple, self-contained program that only needs * to work, you could get away with it. If you're developing a library or need something more generic, and it makes sense to transform 5 into a Vector, you can do that in coerce:
def coerce(other)
case other
when Numeric
return Vector.new(other), self
else
raise TypeError, "#{self.class} can't be coerced into #{other.class}"
end
end
This is a much more robust solution, if it makes semantic sense. If it doesn't make semantic sense, you can create an intermediate type that you can transform Numeric into that does know how to multiply with Vector. This the approach that Matrix takes.
As a last resort, you can pull out the big guns and use alias_method to redefine * on Numeric to handle Vector. I'm not going to add the code for this approach, since doing it wrong will lead to disaster, and I haven't thought though any edge cases involved.
Related
When casting a vector integers (i.e. Eigen::VectorXi) to a vector of doubles, and then operating on that vector of doubles, the generated assembly is dramatically different if the return type of the cast is auto.
In other words, using:
Eigen::VectorXi int_vec(3);
int_vec << 1, 2, 3;
Eigen::VectorXd dbl_vec = int_vec.cast<double>();
Compared to:
Eigen::VectorXi int_vec(3);
int_vec << 1, 2, 3;
auto dbl_vec = int_vec.cast<double>();
Here are two examples on godbolt:
VectorXd return type: https://godbolt.org/z/0FLC4r
auto return type: https://godbolt.org/z/MGxCaL
What are the ramifications of using auto for the return here? I thought it would be more efficient by avoiding a copy, but now I'm not sure.
Indeed, in your code in the question you avoid a copy (indeed, until dbl_vec is used, it's essentially a noop). However, in the code on godbolt, you traverse the original int_vec and evaluate dbl_vec at least twice, possibly thrice:
max + std::log((dbl_vec.array() - max)
^^^ ^^^^^^^ ^^^
I'm not sure if the two calls to max are collapsed into a temporary or not. I'd hope so.
In any case, kmdreko is right and you should avoid using auto with Eigen unless you know exactly what you're doing. In this case, the auto is an expression template that does not get evaluated until used. If you use it more than once, then it gets evaluated more than once. If the evaluation is expensive, then the savings from not using a copy are lost (with interest) to the additional evaluation times.
I have two concrete types called CreationOperator and AnnihilationOperator and I want to define a new concrete type that represents a string of operators and a real coefficient that multiplies it.
I found natural to define an abstract type FermionicOperator, from which both CreationOperator and AnnihilationOperator inherit, i.e.
abstract type FermionicOperator end
struct CreationOperator <: FermionicOperator
...
end
struct AnnihilationOperator <: FermionicOperator
...
end
because I can define many functions with signatures of the type function(op1::FermionicOperator, op2::FermionicOperator) = ..., such as arithmetic operations (I am building an algebra system, so I have to define operations such as *, +, etc. on the operators).
Then I would go on and define a concrete type OperatorString
struct OperatorString
coef::Float64
ops::Vector{FermionicOperator}
end
However, according to the Julia manual, I believe that OperatorString is not ideal for performance, because the compiler does not know anything about FermionicOperator and thus functions involving OperatorString will be inefficient (and I will have many functions manipulating strings of operators).
I found the following solution, however I am not sure about its implication and if it really makes a difference.
Instead of defining FermionicOperator as an abstract type, I define it is as the Union of CreationOperator and AnnihilationOperator, i.e.
struct CreationOperator
...
end
struct AnnihilationOperator
...
end
FermionicOperator = Union{CreationOperator,AnnihilationOperator}
This would still allow functions of the form function(op1::FermionicOperator, op2::FermionicOperator) = ..., but at the same time, to my understanding, Union{CreationOperator,AnnihilationOperator} is a concrete type, such that OperatorString is well-defined and the compilers can optimize things if it's the case.
I am particularly in doubt because I also considered to use the built-in Expr struct to define my string of operators (actually it would be more general), whose field args is a vector with abstract-type elements: very similar to my first design attempt. However, while implementing arithmetic operations on Expr I had the feeling I was doing something "wrong" and I was better off defining my own types.
If your ops field is a vector that, in any given instance, is either all CreationOperators or all AnnihilationOperators, then the recommended solution is to use a parameterized struct.
abstract type FermionicOperator end
struct CreationOperator <: FermionicOperator
...
end
struct AnnihilationOperator <: FermionicOperator
...
end
struct OperatorString{T<:FermionicOperator}
coef::Float64
ops::Vector{T}
function OperatorString(coef::Float64, ops::Vector{T}) where {T<:FermionicOperator}
return new{T}(coef, ops)
end
end
If your ops field is a vector that, in any given instance, may be a mixture of CreationOperators and AnnihilationOperators, then you can use a Union. Because the union is small (2 types), your code will remain performant.
struct CreationOperator
value::Int
end
struct AnnihilationOperator
value::Int
end
const Fermionic = Union{CreationOperator, AnnihilationOperator}
struct OperatorString
coef::Float64
ops::Vector{Fermionic}
function OperatorString(coef::Float64, ops::Vector{Fermionic})
return new(coef, ops)
end
end
Although not shown, even with the Union approach, you may want to use the abstract type also -- just for future simplicity and flexibility in function dispatch. It is helpful in developing robust multidispatch-driven logic.
I made a two element Vector struct and I want to overload the + operator.
I made all my functions and methods take references, rather than values, and I want the + operator to work the same way.
impl Add for Vector {
fn add(&self, other: &Vector) -> Vector {
Vector {
x: self.x + other.x,
y: self.y + other.y,
}
}
}
Depending on which variation I try, I either get lifetime problems or type mismatches. Specifically, the &self argument seems to not get treated as the right type.
I have seen examples with template arguments on impl as well as Add, but they just result in different errors.
I found How can an operator be overloaded for different RHS types and return values? but the code in the answer doesn't work even if I put a use std::ops::Mul; at the top.
I am using rustc 1.0.0-nightly (ed530d7a3 2015-01-16 22:41:16 +0000)
I won't accept "you only have two fields, why use a reference" as an answer; what if I wanted a 100 element struct? I will accept an answer that demonstrates that even with a large struct I should be passing by value, if that is the case (I don't think it is, though.) I am interested in knowing a good rule of thumb for struct size and passing by value vs struct, but that is not the current question.
You need to implement Add on &Vector rather than on Vector.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
type Output = Vector;
fn add(self, other: &'b Vector) -> Vector {
Vector {
x: self.x + other.x,
y: self.y + other.y,
}
}
}
In its definition, Add::add always takes self by value. But references are types like any other1, so they can implement traits too. When a trait is implemented on a reference type, the type of self is a reference; the reference is passed by value. Normally, passing by value in Rust implies transferring ownership, but when references are passed by value, they're simply copied (or reborrowed/moved if it's a mutable reference), and that doesn't transfer ownership of the referent (because a reference doesn't own its referent in the first place). Considering all this, it makes sense for Add::add (and many other operators) to take self by value: if you need to take ownership of the operands, you can implement Add on structs/enums directly, and if you don't, you can implement Add on references.
Here, self is of type &'a Vector, because that's the type we're implementing Add on.
Note that I also specified the RHS type parameter with a different lifetime to emphasize the fact that the lifetimes of the two input parameters are unrelated.
1 Actually, reference types are special in that you can implement traits for references to types defined in your crate (i.e. if you're allowed to implement a trait for T, then you're also allowed to implement it for &T). &mut T and Box<T> have the same behavior, but that's not true in general for U<T> where U is not defined in the same crate.
If you want to support all scenarios, you must support all the combinations:
&T op U
T op &U
&T op &U
T op U
In rust proper, this was done through an internal macro.
Luckily, there is a rust crate, impl_ops, that also offers a macro to write that boilerplate for us: the crate offers the impl_op_ex! macro, which generates all the combinations.
Here is their sample:
#[macro_use] extern crate impl_ops;
use std::ops;
impl_op_ex!(+ |a: &DonkeyKong, b: &DonkeyKong| -> i32 { a.bananas + b.bananas });
fn main() {
let total_bananas = &DonkeyKong::new(2) + &DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = &DonkeyKong::new(2) + DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = DonkeyKong::new(2) + &DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = DonkeyKong::new(2) + DonkeyKong::new(4);
assert_eq!(6, total_bananas);
}
Even better, they have a impl_op_ex_commutative! that'll also generate the operators with the parameters reversed if your operator happens to be commutative.
I've searched for the use of #specialized in the source code of the standard library of Scala 2.8.1. It looks like only a handful of traits and classes use this annotation: Function0, Function1, Function2, Tuple1, Tuple2, Product1, Product2, AbstractFunction0, AbstractFunction1, AbstractFunction2.
None of the collection classes are #specialized. Why not? Would this generate too many classes?
This means that using collection classes with primitive types is very inefficient, because there will be a lot of unnecessary boxing and unboxing going on.
What's the most efficient way to have an immutable list or sequence (with IndexedSeq characteristics) of Ints, avoiding boxing and unboxing?
Specialization has a high cost on the size of classes, so it must be added with careful consideration. In the particular case of collections, I imagine the impact will be huge.
Still, it is an on-going effort -- Scala library has barely started to be specialized.
Specialized can be expensive ( exponential ) in both size of classes and compile time. Its not just the size like the accepted answer says.
Open your scala REPL and type this.
import scala.{specialized => sp}
trait S1[#sp A, #sp B, #sp C, #sp D] { def f(p1:A): Unit }
Sorry :-). Its like a compiler bomb.
Now, lets take a simple trait
trait Foo[Int]{ }
The above will result in two compiled classes. Foo, the pure interface and Foo$1, the class implementation.
Now,
trait Foo[#specialized A] { }
A specialized template parameter here gets expanded/rewritten for 9 different primitive types ( void, boolean, byte, char, int, long, short, double, float ). So, basically you end up with 20 classes instead of 2.
Going back to the trait with 5 specialized template parameters, the classes get generated for every combination of possible primitive types. i.e its exponential in complexity.
2 * 10 ^ (no of specialized parameters)
If you are defining a class for a specific primitive type, you should be more explicit about it such as
trait Foo[#specialized(Int) A, #specialized(Int,Double) B] { }
Understandably one has to be frugal using specialized when building general purpose libraries.
Here is Paul Phillips ranting about it.
Partial answer to my own question: I can wrap an array in an IndexedSeq like this:
import scala.collection.immutable.IndexedSeq
def arrayToIndexedSeq[#specialized(Int) T](array: Array[T]): IndexedSeq[T] = new IndexedSeq[T] {
def apply(idx: Int): T = array(idx)
def length: Int = array.length
}
(Ofcourse you could still modify the contents if you have access to the underlying array, but I would make sure that the array isn't passed to other parts of my program).
first of all, I apologize for the overly verbose question. I couldn't think of any other way to accurately summarize my problem... Now on to the actual question:
I'm currently experimenting with C++0x rvalue references... The following code produces unwanted behavior:
#include <iostream>
#include <utility>
struct Vector4
{
float x, y, z, w;
inline Vector4 operator + (const Vector4& other) const
{
Vector4 r;
std::cout << "constructing new temporary to store result"
<< std::endl;
r.x = x + other.x;
r.y = y + other.y;
r.z = z + other.z;
r.w = w + other.w;
return r;
}
Vector4&& operator + (Vector4&& other) const
{
std::cout << "reusing temporary 2nd operand to store result"
<< std::endl;
other.x += x;
other.y += y;
other.z += z;
other.w += w;
return std::move(other);
}
friend inline Vector4&& operator + (Vector4&& v1, const Vector4& v2)
{
std::cout << "reusing temporary 1st operand to store result"
<< std::endl;
v1.x += v2.x;
v1.y += v2.y;
v1.z += v2.z;
v1.w += v2.w;
return std::move(v1);
}
};
int main (void)
{
Vector4 r,
v1 = {1.0f, 1.0f, 1.0f, 1.0f},
v2 = {2.0f, 2.0f, 2.0f, 2.0f},
v3 = {3.0f, 3.0f, 3.0f, 3.0f},
v4 = {4.0f, 4.0f, 4.0f, 4.0f},
v5 = {5.0f, 5.0f, 5.0f, 5.0f};
///////////////////////////
// RELEVANT LINE HERE!!! //
///////////////////////////
r = v1 + v2 + (v3 + v4) + v5;
return 0;
}
results in the output
constructing new temporary to store result
constructing new temporary to store result
reusing temporary 1st operand to store result
reusing temporary 1st operand to store result
while I had hoped for something like
constructing new temporary to store result
reusing temporary 1st operand to store result
reusing temporary 2nd operand to store result
reusing temporary 2nd operand to store result
After trying to re-enact what the compiler was doing (I'm using MinGW G++ 4.5.2 with option -std=c++0x in case it matters), it actually seems quite logical. The standard says that arithmetic operations of equal precedence are evaluated/grouped left-to-right (why I assumed right-to-left I don't know, I guess it's more intuitive to me). So what happened here is that the compiler evaluated the sub-expression (v3 + v4) first (since it's in parentheses?), and then began matching the operations in the expression left-to-right against the operator overloads, resulting in a call to Vector4 operator + (const Vector4& other) for the sub-expression v1 + v2. If I want to avoid the unnecessary temporary, I'd have to make sure that no more than one lvalue operand appears to the immediate left of any parenthesized sub-expression, which is counter-intuitive to anyone using this "library" and innocently expecting optimal performance (as in minimizing the creation of temporaries).
(I'm aware that there's ambiguity in my code regarding operator + (Vector4&& v1, const Vector4& v2) and operator + (Vector4&& other) when (v3 + v4) is to be added to the result of v1 + v2, resulting in a warning. But it's harmless in my case and I don't want to add yet another overload for two rvalue reference operands - anyone know if there's a way to disable this warning in gcc?)
Long story short, my question boils down to: Is there any way or pattern (preferably compiler-independent) this vector class could be rewritten to enable arbitrary use of parentheses in expressions that still results in the "optimal" choice of operator overloads (optimal in terms of "performance", i.e. maximizing the binding to rvalue references)? Perhaps I'm asking for too much though and it's impossible... if so, then that's fine too. I just want to make sure I'm not missing anything.
Many thanks in advance
Addendum
First thanks to the quick responses I got, within minutes (!) - I really should have started posting here sooner...
It's becoming tedious replying in the comments, so I think a clarification of my intent with this class design is in order. Maybe you can point me to a fundamental conceptual flaw in my thought process if there is one.
You may notice that I don't hold any resources in the class like heap memory. Its members are only scalar types even. At first sight this makes it a suspect candidate for move-semantics based optimizations (see also this question that actually helped me a great deal grasping the concepts behind rvalue references).
However, since the classes this one is supposed to be a prototype for will be used in a performance-critical context (a 3D engine to be precise), I want to optimize every little thing possible. Low-complexity algorithms and maths-related techniques like look-up tables should of course make up the bulk of the optimizations as anything else would simply be addressing the symptoms and not eradicating the real reason for bad performance. I am well aware of that.
With that out of the way, my intent here is to optimize algebraic expressions with vectors and matrices that are essentially plain-old-data structs without pointers to data in them (mainly due to the performance drawbacks you get with data on the heap [having to dereference additional pointers, cache considerations etc.]).
I don't care about move-assignment or construction, I just don't want more temporaries being created during the evaluation of a complicated algebraic expression than absolutely necessary (usually just one or two, e.g. a matrix and a vector).
Those are my thoughts that might be erroneous. If they are, please correct me:
To achieve this without relying on RVO, return-by-reference is necessary (again: keep in mind I don't have remote resources, only scalar data members).
Returning by reference makes the function-call expression an lvalue, implying the returned object is not a temporary, which is bad, but returning by rvalue reference makes the function-call expression an xvalue (see 3.10.1), which is okay in the context of my approach (see 4)
Returning by reference is dangerous, because of the possibly short lifetime of objects, but:
temporaries are guaranteed to live until the end of the evaluation of the expression they were created in, therefore:
making it safe to return by reference from those operators that take at least one rvalue-reference as their argument, if the object referenced by this rvalue reference argument is the one being returned by reference. Therefore:
Any arbitrary expression that only employs binary operators can be evaluated by creating only one temporary when not more than one PoD-like type is involved, and the binary operations don't require a temporary by nature (like matrix multiplication)
(Another reason to return by rvalue-reference is because it behaves like returning by value in terms of rvalue-ness of the function-call expression; and it's required for the operator/function-call expression to be an rvalue in order to bind to subsequent calls to operators that take rvalue references. As stated in (2), calls to functions that return by reference are lvalues, and would therefore bind to operators with the signature T operator+(const T&, const T&), resulting in the creation of an unnecessary temporary)
I could achieve the desired performance by using a C-style approach of functions like add(Vector4 *result, Vector4 *v1, Vector4 *v2), but come on, we're living in the 21st century...
In summary, my goal is creating a vector class that achieves the same performance as the C-approach using overloaded operators. If that in itself is impossible, than I guess it can't be helped. But I'd appreciate if someone could explain to me why my approach is doomed to fail (the left-to-right operator evaluation issue that was the initial reason for my post aside, of course).
As a matter of fact, I've been using the "real" vector class this one is a simplification of for a while without any crashes or corrupted memory so far. And in fact, I never actually return local objects as references, so there shouldn't be any problems. I dare say what I'm doing is standard-compliant.
Any help on the original issue would of course be appreciated as well!
many thanks for all the patience again
You should not return an rvalue reference, you should return a value. In addition, you should not specify both a member and a free operator+. I'm amazed that even compiled.
Edit:
r = v1 + v2 + (v3 + v4) + v5;
How could you possibly only have one temporary value when you're performing two sub-computations? That's just impossible. You can't re-write the Standard and change this.
You will just have to trust your users to do something not completely stupid, like write the above line of code, and expect to have just one temporary.
I recommend modeling your code after the basic_string operator+() found in chapter 21 of N3225.