What does the 'where' clause within a trait do?

If I have this code:
trait Trait {
fn f(&self) -> i32 where Self: Sized;
fn g(&self) -> i32;
}
fn object_safety_dynamic(x: &Trait) {
x.f(); // error
x.g(); // works
}
What does the where clause actually do?
Naively, I was thinking where Self: Sized; dictates something about the type implementing Trait, like 'if you implement Trait for type A, your type A must be sized', i.e., it can be i32 but not [i32].
However, such a constraint would rather go as trait Trait: Sized (correct me if I am wrong)?
Now I noticed where Self: Sized; actually determines if I can call f or g from within object_safety_dynamic.
My questions:
What happens here behind the scenes?
What (in simple English) am I actually telling the compiler by where Self: Sized; that makes g() work but f() not?
In particular: Since &self is a reference anyway, what compiled difference exists between f and g for various (sized or unsized) types. Wouldn't it always boil down to something like _vtable_f_or_g(*self) -> i32, regardless of where or if the type is sized or not?
Why can I implement Trait for both u8 and [u8]. Shouldn't the compiler actually stop me from implementing f() for [u8], instead of throwing an error at the call site?

fn f(&self) -> i32 where Self: Sized;
This says that f is only defined for types that also implement Sized. Unsized types may still implement Trait, but f will not be available.
Inside object_safety_dynamic, calling x.f() is actually doing: (*x).f(). While x is sized because it's a pointer, *x might not be because it could be any implementation of Trait. But code inside the function has to work for any valid argument, so you are not allowed to call x.f() there.
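Here is a minimal sketch of that rule. It assumes the current `&dyn Trait` spelling for what the question writes as `&Trait`; the `u8` impl and the helper names `via_trait_object` and `via_concrete_type` are made up for illustration.

```rust
trait Trait {
    // Only available when the implementing type is Sized.
    fn f(&self) -> i32
    where
        Self: Sized;
    // Available for every implementor, including the trait object type.
    fn g(&self) -> i32;
}

impl Trait for u8 {
    fn f(&self) -> i32 {
        i32::from(*self) + 1
    }
    fn g(&self) -> i32 {
        i32::from(*self)
    }
}

fn via_trait_object(x: &dyn Trait) -> i32 {
    // x.f(); // rejected: `f` needs `Self: Sized`, but here `Self` is the unsized `dyn Trait`
    x.g()
}

fn via_concrete_type(x: &u8) -> i32 {
    // Here `Self` is `u8`, which is Sized, so both methods are callable.
    x.f() + x.g()
}

fn main() {
    let n: u8 = 7;
    println!("{}", via_trait_object(&n)); // prints 7
    println!("{}", via_concrete_type(&n)); // prints 15
}
```

Uncommenting the `x.f()` line should make the compiler reject `via_trait_object`, which is the same error the question runs into.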

What does the where clause actually do?
Naively, I was thinking where Self: Sized; dictates something about the type implementing Trait, like 'if you implement Trait for type A, your type A must be sized', i.e., it can be i32 but not [i32].
However, such a constraint would rather go as trait Trait: Sized
This is correct.
However, in this case, the bound applies only to the function, and where bounds on functions are only checked at the call site.
What happens here behind the scenes?
There is a confusing bit about Rust's syntax, which is that Trait can refer to either
The trait Trait; or
The "trait object" Trait, which is actually a type, not an object.
Sized is a trait, and any type T that is Sized may have its size taken as a constant, by std::mem::size_of::<T>(). Examples of types that are not sized are str and [u8], whose contents do not have a fixed size.
The type Trait is also unsized. Intuitively, this is because Trait as a type consists of all values of types that implement the trait Trait, which may have varying size. This means you can never have a value of type Trait - you can only refer to one via a "fat pointer" such as &Trait or Box<Trait> and so on. These have the size of 2 pointers - one for a vtable, one for the data. It looks roughly like this:
struct &Trait {
pub data: *mut (),
pub vtable: *mut (),
}
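If you want to see the two-pointer size for yourself, here is a small check. The helper function names are hypothetical, and `&dyn Trait` is the current spelling of `&Trait`. The doubling is what current compilers do rather than a layout you should rely on, so treat it as an observation.

```rust
use std::mem::size_of;

trait Trait {
    fn g(&self) -> i32;
}

// A reference to a sized type is a single ("thin") pointer.
fn thin_pointer_size() -> usize {
    size_of::<&u8>()
}

// A reference to a trait object carries a data pointer and a vtable pointer.
fn trait_object_pointer_size() -> usize {
    size_of::<&dyn Trait>()
}

// The same holds for an owning pointer to a trait object.
fn boxed_trait_object_size() -> usize {
    size_of::<Box<dyn Trait>>()
}

// Slices are also unsized: the second word is the length instead of a vtable.
fn slice_pointer_size() -> usize {
    size_of::<&[u8]>()
}

fn main() {
    println!("&u8:            {} bytes", thin_pointer_size());
    println!("&dyn Trait:     {} bytes", trait_object_pointer_size());
    println!("Box<dyn Trait>: {} bytes", boxed_trait_object_size());
    println!("&[u8]:          {} bytes", slice_pointer_size());
}
```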
There is automatically an impl of the form:
impl Trait /* the trait */ for Trait /* the type */ {
fn f(&self) -> i32 where Self: Sized { .. }
fn g(&self) -> i32 {
/* vtable magic: something like (self.vtable.g)(self.data) */
}
}
What (in simple English) am I actually telling the compiler by where Self: Sized; that makes g() work but f() not?
Note that since, as I mentioned, Trait is not Sized, the bound Self: Sized is not satisfied and so the function f cannot be called where Self == Trait.
In particular: Since &self is a reference anyway, what compiled difference exists between f and g for various (sized or unsized) types. Wouldn't it always boil down to something like _vtable_f_or_g(*self) -> i32, regardless of where or if the type is sized or not?
The type Trait is always unsized. It doesn't matter which type has been coerced to Trait. The way you call the function with a Sized variable is to use it directly:
fn generic<T: Trait + Sized>(x: &T) { // the `Sized` bound is implicit, added here for clarity
x.f(); // compiles just fine
x.g();
}
Why can I implement Trait for both u8 and [u8]. Shouldn't the compiler actually stop me from implementing f() for [u8], instead of throwing an error at the call site?
Because the trait is not bounded by Self: Sized - the function f is. So there is nothing stopping you from implementing the function - it's just that the bounds on the function can never be satisfied, so you can never call it.


What is the difference between method call syntax `foo.method()` and UFCS `Foo::method(&foo)`?

Is there any difference in Rust between calling a method on a value, like this:
struct A { e: u32 }
impl A {
fn show(&self) {
println!("{}", self.e)
}
}
fn main() {
A { e: 0 }.show();
}
...and calling it on the type, like this:
fn main() {
A::show(&A { e: 0 })
}
Summary: The most important difference is that the universal function call syntax (UFCS) is more explicit than the method call syntax.
With UFCS there is basically no ambiguity about which function you want to call (there is still a longer form of the UFCS for trait methods, but let's ignore that for now). The method call syntax, on the other hand, requires more work in the compiler to figure out which method to call and how to call it. This shows up mainly in two things:
Method resolution: figure out if the method is inherent (bound to the type, not a trait) or a trait method. And in the latter case, also figure out which trait it belongs to.
Figure out the correct receiver type (self) and potentially use type coercions to make the call work.
Receiver type coercions
Let's take a look at this example to understand the type coercions to the receiver type:
struct Foo;
impl Foo {
fn on_ref(&self) {}
fn on_mut_ref(&mut self) {}
fn on_value(self) {}
}
fn main() {
let reference = &Foo; // type `&Foo`
let mut_ref = &mut Foo; // type `&mut Foo`
let mut value = Foo; // type `Foo`
// ...
}
So we have three methods that take Foo, &Foo and &mut Foo receivers, and we have three variables with those types. Let's try out all 9 combinations with both the method call syntax and UFCS.
UFCS
Foo::on_ref(reference);
//Foo::on_mut_ref(reference); error: mismatched types
//Foo::on_value(reference); error: mismatched types
Foo::on_ref(mut_ref); // works: `&mut Foo` is implicitly coerced to `&Foo`
Foo::on_mut_ref(mut_ref);
//Foo::on_value(mut_ref); error: mismatched types
//Foo::on_ref(value); error: mismatched types
//Foo::on_mut_ref(value); error: mismatched types
Foo::on_value(value);
As we can see, the calls succeed only where the types are correct, with one exception: a &mut Foo argument is implicitly coerced to &Foo. To make the other calls work we would have to manually add & or &mut or * in front of the argument. That's the standard behavior for all function arguments.
Method call syntax
reference.on_ref();
//reference.on_mut_ref(); error: cannot borrow `*reference` as mutable
//reference.on_value(); error: cannot move out of `*reference`
mut_ref.on_ref();
mut_ref.on_mut_ref();
//mut_ref.on_value(); error: cannot move out of `*mut_ref`
value.on_ref();
value.on_mut_ref();
value.on_value();
Only three of the method calls lead to an error while the others succeed. Here, the compiler automatically inserts deref (dereferencing) or autoref (adding a reference) coercions to make the call work. Also note that the three errors are not "type mismatch" errors: the compiler already tried to adjust the type correctly, but this led to other errors.
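As a rough sketch of what autoref does, the pairs of calls below should be equivalent; the method form lets the compiler add the borrow, the UFCS form spells it out. The `Counter` type and the `bump_twice` helper are hypothetical names used only for this illustration.

```rust
struct Counter {
    hits: u32,
}

impl Counter {
    fn peek(&self) -> u32 {
        self.hits
    }
    fn bump(&mut self) {
        self.hits += 1;
    }
    fn into_hits(self) -> u32 {
        self.hits
    }
}

fn bump_twice() -> u32 {
    let mut counter = Counter { hits: 0 };

    counter.bump(); // method call: the compiler inserts `&mut counter`
    Counter::bump(&mut counter); // UFCS: the borrow is written by hand

    let seen_by_method = counter.peek(); // the compiler inserts `&counter`
    let seen_by_ufcs = Counter::peek(&counter);
    assert_eq!(seen_by_method, seen_by_ufcs);

    // By-value receiver: both forms move `counter`; only one can be used.
    Counter::into_hits(counter)
}

fn main() {
    println!("{}", bump_twice()); // prints 2
}
```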
There are some additional coercions:
Unsize coercions, described by the Unsize trait. Allows you to call slice methods on arrays and to coerce types into trait objects of traits they implement.
Advanced deref coercions via the Deref trait. This allows you to call slice methods on Vec, for example.
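A small sketch of both coercions, using the slice method `first`, which is defined on `[T]` and not on arrays or `Vec`. The helper function names are made up for this example.

```rust
// `first` is a method of the slice type `[T]`.

fn first_of_array() -> Option<i32> {
    let array = [3, 1, 2];
    // Unsize coercion during method lookup: [i32; 3] -> [i32]
    array.first().copied()
}

fn first_of_vec() -> Option<i32> {
    let vector = vec![3, 1, 2];
    // Deref coercion during method lookup: Vec<i32> -> [i32]
    vector.first().copied()
}

fn first_via_ufcs() -> Option<i32> {
    let vector = vec![3, 1, 2];
    // With UFCS the target type is named explicitly; the argument
    // `&Vec<i32>` is still deref-coerced to `&[i32]` like any other argument.
    <[i32]>::first(&vector).copied()
}

fn main() {
    println!("{:?}", first_of_array()); // prints Some(3)
    println!("{:?}", first_of_vec()); // prints Some(3)
    println!("{:?}", first_via_ufcs()); // prints Some(3)
}
```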
Method resolution: figuring out what method to call
When you write lhs.method_name(), the method method_name could be an inherent method of the type of lhs or it could belong to a trait that's in scope (imported). The compiler has to figure out which one to call and has a number of rules for this. When getting into the details, these rules are actually really complex and can lead to some surprising behavior. Luckily, most programmers will never have to deal with that and it "just works" most of the time.
To give a coarse overview of how it works, the compiler tries the following things in order, using the first method that is found.
Is there an inherent method with the name method_name where the receiver type fits exactly (does not need coercions)?
Is there a trait method with the name method_name where the receiver type fits exactly (does not need coercions)?
Is there an inherent method with the name method_name? (type coercions will be performed)
Is there a trait method with the name method_name? (type coercions will be performed)
(Again, note that this is still a simplification. Some types of coercions are preferred over others, for example.)
This shows one rule that most programmers know: inherent methods have a higher priority than trait methods. Less well known is the fact that whether or not the receiver type fits exactly is a more important factor. There is a quiz that nicely demonstrates this: Rust Quiz #23. More details on the exact method resolution algorithm can be found in this StackOverflow answer.
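The following sketch is modeled on the idea behind that quiz (it is not the quiz's exact code). The names `S` and `Trait` and the returned strings are chosen for illustration.

```rust
struct S;

impl S {
    fn f(&self) -> &'static str {
        "inherent f"
    }
    // Note the `&mut self` receiver.
    fn g(&mut self) -> &'static str {
        "inherent g"
    }
}

trait Trait {
    fn f(&self) -> &'static str;
    fn g(&self) -> &'static str;
}

impl Trait for S {
    fn f(&self) -> &'static str {
        "trait f"
    }
    fn g(&self) -> &'static str {
        "trait g"
    }
}

fn main() {
    let mut s = S;

    // Both `f` candidates take `&self`, so the inherent method wins.
    println!("{}", s.f()); // prints "inherent f"

    // `&S` is tried before `&mut S`, and only the trait's `g` accepts `&S`,
    // so the trait method is picked even though an inherent `g` exists.
    println!("{}", s.g()); // prints "trait g"

    // UFCS removes the guesswork.
    println!("{}", S::g(&mut s)); // prints "inherent g"
    println!("{}", <S as Trait>::g(&s)); // prints "trait g"
}
```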
This set of rules can turn many seemingly harmless API changes into breaking changes. We currently have to deal with that in the attempt to add an IntoIterator impl for arrays.
Another – minor and probably very obvious – difference is that for the method call syntax, the type name does not have to be imported.
Apart from that it's worth pointing out what is not different about the two syntaxes:
Runtime behavior: no difference whatsoever.
Performance: the method call syntax is "converted" (desugared) into basically the UFCS pretty early inside the compiler, meaning that there aren't any performance differences either.

How do I constrain associated types on a non-owned trait? [duplicate]

I have this code:
extern crate serde;
use serde::de::DeserializeOwned;
use serde::Serialize;
trait Bar<'a, T: 'a>
where
T: Serialize,
&'a T: DeserializeOwned,
{
}
I would like to write this using an associated type, because the type T is unimportant to the users of this type. I got this far:
trait Bar {
type T: Serialize;
}
I cannot figure out how to specify the other bound.
Ultimately, I want to use a function like this:
extern crate serde_json;
fn test<I: Bar>(t: I::T) -> String {
serde_json::to_string(&t).unwrap()
}
The "correct" solution is to place the bounds on the trait, but referencing the associated type. In this case, you can also use higher ranked trait bounds to handle the reference:
trait Bar
where
Self::T: Serialize,
// ^^^^^^^ Bounds on an associated type
for<'a> &'a Self::T: DeserializeOwned,
// ^^^^^^^^^^^ Higher-ranked trait bounds
{
type T;
}
However, this doesn't work yet.
I believe that you will need to either:
wait for issue 20671 and/or issue 50346 to be fixed.
wait for the generic associated types feature which introduces where clauses on associated types.
In the meantime, the workaround is to duplicate the bound everywhere it's needed:
fn test<I: Bar>(t: I::T) -> String
where
for<'a> &'a I::T: DeserializeOwned,
{
serde_json::to_string(&t).unwrap()
}
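Because the snippets above need serde, here is a std-only sketch of the same workaround that you can compile on its own. It assumes `IntoIterator` as a stand-in for `DeserializeOwned`; `Bytes` and `total` are hypothetical names.

```rust
trait Bar {
    type T;
}

// Hypothetical implementor whose associated type is a byte vector.
struct Bytes;

impl Bar for Bytes {
    type T = Vec<u8>;
}

// The higher-ranked bound on `&I::T` cannot be stated once on the trait,
// so it is repeated on every function that needs it.
fn total<I: Bar>(t: I::T) -> u32
where
    for<'a> &'a I::T: IntoIterator<Item = &'a u8>,
{
    let mut sum = 0;
    // Iterating over `&t` uses the `IntoIterator` impl required above.
    for byte in &t {
        sum += u32::from(*byte);
    }
    sum
}

fn main() {
    println!("{}", total::<Bytes>(vec![1, 2, 3])); // prints 6
}
```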

"too many parameters" on perfectly fine function

I have code similar to this:
pub trait WorldImpl {
fn new(size: (usize, usize), seed: u32) -> World;
fn three() -> bool;
fn other() -> bool;
fn non_self_methods() -> bool;
}
pub type World = Vec<Vec<UnitOfSpace>>;
// I'm doing this because I want a SPECIAL version of Vec<Vec<UnitOfSpace>>, so I can treat it like a struct but have it be a normal type underneath.
impl WorldImpl for World {
fn new(size: (usize, usize), seed: u32) -> World {
// Code
vec![/* vector stuff */]
}
// Implement other three methods
}
let w = World::new((120, 120), /* seed from UNIX_EPOCH stuff */);
And I get this error, which is clearly wrong:
error[E0061]: this function takes 0 parameters but 2 parameters were supplied
--> src/main.rs:28:28
|
28 | let world = World::new((120 as usize, 120 as usize),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected 0 parameters
I'm thinking two things:
This is not idiomatic and Rust was never meant to be used this way. In this case, I need to know how to really do this.
It's a stupid error that I'm missing.
When I try similar code to the above on the playground, it works just fine, no errors. I have not found any information on any errors like this anywhere else, so I'll not be surprised to find out I'm just using the language wrong. I have no particular attachment to any of my code, so please tell me what the idiom is for this!
What you are trying to do doesn't quite make sense. You have made World a type alias for Vec<Vec<UnitOfSpace>>, so they are completely interchangeable - the implementations you add for one will apply to the other and vice versa.
If you want to treat this type differently then wrap it in a newtype:
struct World(Vec<Vec<UnitOfSpace>>);
This is now a distinct type from Vec<Vec<UnitOfSpace>>, but with zero runtime overhead.
Your actual error is because you have added a method called new to World as part of its implementation of WorldImpl, but World is a Vec which already has a new method (with zero args!).
Your type World is an alias for Vec<Vec<UnitOfSpace>>. Vec<T> provides an inherent associated function called new that takes no parameters. The compiler prefers selecting inherent associated functions to associated functions defined in traits, thus it selects the inherent new with no parameters instead of your own new that takes 2 parameters.
Here are a few options to solve this:
Invoke the trait's associated function explicitly:
let w = <World as WorldImpl>::new((120, 120), /* seed from UNIX_EPOCH stuff */);
Make World a newtype (struct World(Vec<Vec<UnitOfSpace>>);), which will let you define inherent associated functions (but then Vec's inherent methods won't be available on World).
Rename WorldImpl::new to a name that is not used by an inherent associated function on Vec.
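A sketch of the first two options, assuming `u8` as a stand-in for `UnitOfSpace` and a made-up fill rule for the grid. The newtype is called `Map` here only so that it can sit next to the alias in one file.

```rust
// Stand-in: `u8` replaces `UnitOfSpace` so the example is self-contained.
pub type World = Vec<Vec<u8>>;

pub trait WorldImpl {
    fn new(size: (usize, usize), seed: u32) -> World;
}

impl WorldImpl for World {
    fn new(size: (usize, usize), seed: u32) -> World {
        // Placeholder fill: every cell gets the low byte of the seed.
        vec![vec![(seed % 256) as u8; size.0]; size.1]
    }
}

// Option 2: a newtype is a distinct type, so its inherent `new` cannot
// collide with `Vec::new`.
pub struct Map(Vec<Vec<u8>>);

impl Map {
    pub fn new(size: (usize, usize), seed: u32) -> Map {
        Map(<World as WorldImpl>::new(size, seed))
    }

    pub fn rows(&self) -> usize {
        self.0.len()
    }
}

fn main() {
    // let w = World::new((2, 3), 7); // E0061: resolves to the inherent `Vec::new()`

    // Option 1: name the trait explicitly.
    let w = <World as WorldImpl>::new((2, 3), 7);
    println!("{} rows of {} cells", w.len(), w[0].len()); // prints "3 rows of 2 cells"

    let m = Map::new((2, 3), 7);
    println!("{} rows", m.rows()); // prints "3 rows"
}
```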

Generalizing a function for an enum

I have an enum that looks like this
pub enum IpNetwork {
V4(Ipv4Network),
V6(Ipv6Network),
}
Each of those variants represents either an IPv4 or an IPv6 CIDR. Now, Ipv4Network and Ipv6Network each have a method to get the prefix, defined like this
// For Ipv4Network
pub fn prefix(&self) -> u8
// For Ipv6Network
pub fn prefix(&self) -> u128
How do I generalize the prefix method for the IpNetwork enum? I know that I can just have u128 as the return type, but is that approach idiomatic?
So you want a prefix function that operates on the IpNetwork type, but are unsure what the return type should be. Below is a possible approach you could follow.
The argument against using an enum
As bheklilr mentioned in a comment, one of the alternatives is introducing an enum: pub enum Prefix { V4(u8), V6(u128) }.
This could make sense depending on your use case, but it seems like overkill to me here. In the end, you would end up pattern matching on the result of your generic prefix function. In that case, you would be better off pattern matching on the IpNetwork object itself and calling its corresponding prefix function.
The case for u128
If you just want to obtain the integer value and don't need to differentiate between IPv4 and IPv6, returning an integer seems to be the way to go. A u8 can be cast to u128 without any problem and the overhead is negligible.
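A minimal sketch of the u128 approach. The two network structs below are bare stand-ins with the signatures from the question, not the real types.

```rust
// Stand-ins for the real network types, reduced to the prefix field.
pub struct Ipv4Network {
    prefix: u8,
}

pub struct Ipv6Network {
    prefix: u128,
}

impl Ipv4Network {
    pub fn prefix(&self) -> u8 {
        self.prefix
    }
}

impl Ipv6Network {
    pub fn prefix(&self) -> u128 {
        self.prefix
    }
}

pub enum IpNetwork {
    V4(Ipv4Network),
    V6(Ipv6Network),
}

impl IpNetwork {
    /// Returns the prefix, widened to the larger of the two integer types.
    pub fn prefix(&self) -> u128 {
        match self {
            // `u128::from` is a lossless widening conversion from `u8`.
            IpNetwork::V4(net) => u128::from(net.prefix()),
            IpNetwork::V6(net) => net.prefix(),
        }
    }
}

fn main() {
    let v4 = IpNetwork::V4(Ipv4Network { prefix: 24 });
    let v6 = IpNetwork::V6(Ipv6Network { prefix: 64 });
    println!("{} {}", v4.prefix(), v6.prefix()); // prints "24 64"
}
```

Using `u128::from` instead of `as` keeps the conversion visibly lossless.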
As far as I know, the standard library doesn't provide functionality for generic numeric types. You could, however, define a trait and implement it for u8 and u128.
Also, there is the num crate, which does basically that.

Explicit lifetime error in rust

I have a Rust enum that I want to use, however I receive the error:
error: explicit lifetime bound required
numeric(Num),
~~~
The enum in question:
enum expr{
numeric(Num),
symbol(String),
}
I don't think I understand what is being borrowed here. My intent was for the Num or String to have the same lifetime as the containing expr allowing me to return them from functions.
The error message is somewhat misleading. Num is a trait, and when used as a type it is dynamically sized, so you can't have values of it without some kind of indirection (a reference or a Box). The reason for this is simple; just ask yourself a question: what size (in bytes) must values of the expr enum have? It is certainly at least as large as String, but what about Num? Arbitrary types can implement this trait, so in order to be sound expr would have to have infinite size!
Hence you can use traits as types only with some kind of pointer: &Num or Box<Num>. Pointers always have fixed size, and trait objects are "fat" pointers, keeping additional information within them to help with method dispatching.
Also traits are usually used as bounds for generic type parameters. Because generics are monomorphized, they turn into static types in the compiled code, so their size is always statically known and they don't need pointers. Using generics should be the default approach, and you should switch to trait objects only when you know why generics won't work for you.
These are possible variants of your type definition. With generics:
enum Expr<N: Num> {
Numeric(N),
Symbol(String)
}
Trait object through a reference:
enum Expr<'a> {
Numeric(&'a Num + 'a),
Symbol(String)
}
Trait object with a box:
enum Expr {
Numeric(Box<Num + 'static>), // I used 'static because numbers usually don't contain references inside them
Symbol(String)
}
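For comparison, here is a sketch of the generic and boxed variants in the current `dyn` syntax. `Num` below is a stand-in trait with a made-up `as_f64` method, and `GenericExpr`, `BoxedExpr` and `value` are hypothetical names.

```rust
// Stand-in trait for illustration.
trait Num {
    fn as_f64(&self) -> f64;
}

impl Num for i32 {
    fn as_f64(&self) -> f64 {
        f64::from(*self)
    }
}

impl Num for f64 {
    fn as_f64(&self) -> f64 {
        *self
    }
}

// Generic variant: one concrete number type per `GenericExpr<N>`.
enum GenericExpr<N: Num> {
    Numeric(N),
    Symbol(String),
}

// Trait-object variant: any number type behind a pointer.
// `Box<dyn Num>` is short for `Box<dyn Num + 'static>`.
enum BoxedExpr {
    Numeric(Box<dyn Num>),
    Symbol(String),
}

impl<N: Num> GenericExpr<N> {
    fn value(&self) -> Option<f64> {
        match self {
            GenericExpr::Numeric(n) => Some(n.as_f64()),
            GenericExpr::Symbol(_) => None,
        }
    }
}

impl BoxedExpr {
    fn value(&self) -> Option<f64> {
        match self {
            BoxedExpr::Numeric(n) => Some(n.as_f64()),
            BoxedExpr::Symbol(_) => None,
        }
    }
}

fn main() {
    let generic = GenericExpr::Numeric(3_i32);
    println!("{:?}", generic.value());

    // A single `Vec<BoxedExpr>` can mix different number types.
    let mixed = vec![
        BoxedExpr::Numeric(Box::new(1_i32)),
        BoxedExpr::Numeric(Box::new(2.5_f64)),
        BoxedExpr::Symbol(String::from("x")),
    ];
    for expr in &mixed {
        println!("{:?}", expr.value());
    }
}
```

The generic version fixes the number type at compile time, while the boxed version trades that for the ability to mix types at run time.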
You can read more about generics and traits in the official guide, though at the moment it lacks information on trait objects. Please do ask if you don't understand something.
Update
'a in
enum Expr<'a> {
Numeric(&'a Num + 'a),
Symbol(String)
}
is a lifetime parameter. It defines both the lifetime of a reference and of trait object internals inside Numeric variant. &'a Num + 'a is a type that you can read as "a trait object behind a reference which lives at least as long as 'a with references inside it which also live at least as long as 'a". That is, first, you specify 'a as a reference lifetime: &'a, and second, you specify the lifetime of trait object internals: Num + 'a. The latter is needed because traits can be implemented for any types, including ones which contain references inside them, so you need to put the minimum lifetime of these references into trait object type too, otherwise borrow checking won't work correctly with trait objects.
With Box the situation is very similar. Box<Num + 'static> is "a trait object inside a heap-allocated box with references inside it which live at least as long as 'static". The Box type is a smart pointer for heap-allocated owned data. Because it owns the data it holds, it does not need a lifetime parameter like references do. However, the trait object can still contain references inside it, and that's why a lifetime bound such as Num + 'a is still needed; I just chose to use the 'static lifetime instead of adding another lifetime parameter. This is because numerical types are usually simple and don't have references inside them, which is equivalent to a 'static bound. You are free to add a lifetime parameter if you want, of course.
Note that all of these variants are correct:
&'a SomeTrait + 'a
&'a SomeTrait + 'static
Box<SomeTrait + 'a>
Box<SomeTrait + 'static>
Even this is correct, with 'a and 'b as different lifetime parameters:
&'a SomeTrait + 'b
though this is rarely useful, because 'b must be at least as long as 'a (otherwise internals of the trait object could be invalidated while it itself is still alive), so you can just as well use &'a SomeTrait + 'a.
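A short sketch of why the lifetime bound on the trait object matters, written in the current syntax where the types above are spelled `&'a (dyn SomeTrait + 'b)` and `Box<dyn SomeTrait + 'a>`. The trait `Describe`, the struct `Borrowed` and both helper functions are hypothetical.

```rust
trait Describe {
    fn describe(&self) -> String;
}

// An implementor that holds a reference, so it cannot be `'static` in general.
struct Borrowed<'s>(&'s str);

impl<'s> Describe for Borrowed<'s> {
    fn describe(&self) -> String {
        format!("borrowed {}", self.0)
    }
}

// The `+ 's` bound records that the boxed value may contain references
// that are valid only for `'s`.
fn boxed<'s>(text: &'s str) -> Box<dyn Describe + 's> {
    Box::new(Borrowed(text))
}

// Two separate lifetimes: `'a` for the reference itself and `'b` for the
// references inside the trait object.
fn describe_ref<'a, 'b>(item: &'a (dyn Describe + 'b)) -> String {
    item.describe()
}

fn main() {
    let text = String::from("hello");
    let object = boxed(&text);
    println!("{}", object.describe()); // prints "borrowed hello"
    println!("{}", describe_ref(&*object)); // prints "borrowed hello"
}
```

Changing the return type of `boxed` to plain `Box<dyn Describe>` (which means `+ 'static`) should be rejected, because `Borrowed<'s>` may not outlive `'s`.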
