Explicit lifetime error in rust - enums

I have a rust enum that I want to use, however I receive the error:
error: explicit lifetime bound required
numeric(Num),
~~~
The enum in question:
enum expr{
numeric(Num),
symbol(String),
}
I don't think I understand what is being borrowed here. My intent was for the Num or String to have the same lifetime as the containing expr allowing me to return them from functions.

The error message is somewhat misleading. Num is a trait, and when used as a type it is dynamically sized, so you can't have values of it without some kind of indirection (a reference or a Box). The reason is simple; just ask yourself: what size (in bytes) must expr enum values have? They must be at least as large as String, but what about Num? Arbitrary types can implement this trait, so to be sound expr would have to have infinite size!
Hence you can use traits as types only with some kind of pointer: &Num or Box<Num>. Pointers always have fixed size, and trait objects are "fat" pointers, keeping additional information within them to help with method dispatching.
Also traits are usually used as bounds for generic type parameters. Because generics are monomorphized, they turn into static types in the compiled code, so their size is always statically known and they don't need pointers. Using generics should be the default approach, and you should switch to trait objects only when you know why generics won't work for you.
These are possible variants of your type definition. With generics:
enum Expr<N: Num> {
Numeric(N),
Symbol(String)
}
Trait object through a reference:
enum Expr<'a> {
Numeric(&'a (Num + 'a)),
Symbol(String)
}
Trait object with a box:
enum Expr {
Numeric(Box<Num + 'static>), // I used 'static because numbers usually don't contain references inside them
Symbol(String)
}
You can read more about generics and traits in the official guide, though at the moment it lacks information on trait objects. Please do ask if you don't understand something.
Update
'a in
enum Expr<'a> {
Numeric(&'a (Num + 'a)),
Symbol(String)
}
is a lifetime parameter. It defines both the lifetime of the reference and the lifetime of the trait object internals inside the Numeric variant. &'a (Num + 'a) is a type that you can read as "a trait object behind a reference which lives at least as long as 'a, with references inside it which also live at least as long as 'a". That is, first, you specify 'a as the reference lifetime: &'a, and second, you specify the lifetime of the trait object internals: Num + 'a. The latter is needed because traits can be implemented for any type, including ones which contain references inside them, so you need to put the minimum lifetime of those references into the trait object type too; otherwise borrow checking won't work correctly with trait objects.
With Box the situation is very similar. Box<Num + 'static> is "a trait object inside a heap-allocated box with references inside it which live at least as long as 'static". The Box type is a smart pointer for heap-allocated owned data. Because it owns the data it holds, it does not need a lifetime parameter like references do. However, the trait object can still contain references inside it, and that's why the Num + 'a form is still used; I just chose the 'static lifetime instead of adding another lifetime parameter. Numerical types are usually simple and don't hold references, which is exactly what the 'static bound expresses. You are free to add a lifetime parameter if you want, of course.
Note that all of these variants are correct:
&'a (SomeTrait + 'a)
&'a (SomeTrait + 'static)
Box<SomeTrait + 'a>
Box<SomeTrait + 'static>
Even this is correct, with 'a and 'b as different lifetime parameters:
&'a (SomeTrait + 'b)
though this is rarely useful, because 'b must be at least as long as 'a (otherwise the internals of the trait object could be invalidated while it itself is still alive), so you can just as well use &'a (SomeTrait + 'a).
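To make the lifetime story concrete, here is a minimal sketch in current Rust syntax (with dyn and the required parentheses around the trait object type), using a made-up NumLike trait standing in for Num. It shows the case the + 'a bound exists for: an implementing type that borrows data.

trait NumLike {
    fn value(&self) -> i32;
}

// A type that borrows data, so it is only valid for the lifetime 'a.
struct Borrowed<'a> {
    n: &'a i32,
}

impl<'a> NumLike for Borrowed<'a> {
    fn value(&self) -> i32 {
        *self.n
    }
}

// The `+ 'a` bound tells the borrow checker that the boxed trait object
// may contain references that live no longer than 'a.
enum Expr<'a> {
    Numeric(Box<dyn NumLike + 'a>),
    Symbol(String),
}

fn main() {
    let x = 42;
    let e = Expr::Numeric(Box::new(Borrowed { n: &x }));
    if let Expr::Numeric(num) = &e {
        println!("{}", num.value()); // prints 42
    }
    let _s = Expr::Symbol("pi".to_string());
}

If Expr only ever needs to hold owned numbers, Box<dyn NumLike + 'static> (or the generic version above) removes the lifetime parameter entirely.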

Related

What is the difference between method call syntax `foo.method()` and UFCS `Foo::method(&foo)`?

Is there any difference in Rust between calling a method on a value, like this:
struct A { e: u32 }
impl A {
fn show(&self) {
println!("{}", self.e)
}
}
fn main() {
A { e: 0 }.show();
}
...and calling it on the type, like this:
fn main() {
A::show(&A { e: 0 })
}
Summary: The most important difference is that the universal function call syntax (UFCS) is more explicit than the method call syntax.
With UFCS there is basically no ambiguity about which function you want to call (there is still a longer form of UFCS for trait methods, but let's ignore that for now). The method call syntax, on the other hand, requires more work in the compiler to figure out which method to call and how to call it. This manifests mostly in two things:
Method resolution: figure out if the method is inherent (bound to the type, not a trait) or a trait method. And in the latter case, also figure out which trait it belongs to.
Figure out the correct receiver type (self) and potentially use type coercions to make the call work.
Receiver type coercions
Let's take a look at this example to understand the type coercions to the receiver type:
struct Foo;
impl Foo {
fn on_ref(&self) {}
fn on_mut_ref(&mut self) {}
fn on_value(self) {}
}
fn main() {
let reference = &Foo; // type `&Foo`
let mut_ref = &mut Foo; // type `&mut Foo`
let mut value = Foo; // type `Foo`
// ...
}
So we have three methods whose receivers are &Foo, &mut Foo and Foo, and three variables with those types. Let's try out all 9 combinations with each, method call syntax and UFCS.
UFCS
Foo::on_ref(reference);
//Foo::on_mut_ref(reference); error: mismatched types
//Foo::on_value(reference); error: mismatched types
//Foo::on_ref(mut_ref); error: mismatched types
Foo::on_mut_ref(mut_ref);
//Foo::on_value(mut_ref); error: mismatched types
//Foo::on_ref(value); error: mismatched types
//Foo::on_mut_ref(value); error: mismatched types
Foo::on_value(value);
As we can see, only the calls succeed where the types are correct. To make the other calls work we would have to manually add & or &mut or * in front of the argument. That's the standard behavior for all function arguments.
Method call syntax
reference.on_ref();
//reference.on_mut_ref(); error: cannot borrow `*reference` as mutable
//reference.on_value(); error: cannot move out of `*reference`
mut_ref.on_ref();
mut_ref.on_mut_ref();
//mut_ref.on_value(); error: cannot move out of `*mut_ref`
value.on_ref();
value.on_mut_ref();
value.on_value();
Only three of the method calls lead to an error while the others succeed. Here, the compiler automatically inserts deref (dereferencing) or autoref (adding a reference) coercions to make the call work. Also note that the three errors are not "type mismatch" errors: the compiler already tried to adjust the type correctly, but this led to other errors.
There are some additional coercions:
Unsize coercions, described by the Unsize trait. Allows you to call slice methods on arrays and to coerce types into trait objects of traits they implement.
Advanced deref coercions via the Deref trait. This allows you to call slice methods on Vec, for example.
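These coercions are easy to see with slice methods; here is a quick sketch using only the standard library, with first as the example slice method:

fn main() {
    let v = vec![1, 2, 3];
    // `first` is defined on the slice type `[i32]`, not on `Vec<i32>` itself;
    // the method call syntax inserts a deref coercion (`Vec<i32>` -> `[i32]`).
    let first_of_vec = v.first();

    let a = [4, 5, 6];
    // On an array, `first` is reached through an unsize coercion (`[i32; 3]` -> `[i32]`).
    let first_of_array = a.first();

    // With the explicit form we spell out the argument ourselves;
    // `&v` still deref-coerces from `&Vec<i32>` to `&[i32]` at the call site.
    let explicit = <[i32]>::first(&v);

    println!("{:?} {:?} {:?}", first_of_vec, first_of_array, explicit);
}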
Method resolution: figuring out what method to call
When writing lhs.method_name(), the method method_name could be an inherent method of the type of lhs, or it could belong to a trait that's in scope (imported). The compiler has to figure out which one to call and has a number of rules for this. When you get into the details, these rules are actually really complex and can lead to some surprising behavior. Luckily, most programmers will never have to deal with that, and it "just works" most of the time.
To give a coarse overview of how it works, the compiler tries the following things in order, using the first method that is found.
Is there an inherent method with the name method_name where the receiver type fits exactly (does not need coercions)?
Is there a trait method with the name method_name where the receiver type fits exactly (does not need coercions)?
Is there an inherent method with the name method_name? (type coercions will be performed)
Is there a trait method with the name method_name? (type coercions will be performed)
(Again, note that this is still a simplification. Different types of coercions are preferred over others, for example.)
This shows one rule that most programmers know: inherent methods have a higher priority than trait methods. But it is less well known that whether or not the receiver type fits exactly is an even more important factor. There is a quiz that nicely demonstrates this: Rust Quiz #23. More details on the exact method resolution algorithm can be found in this StackOverflow answer.
This set of rules can actually turn a number of seemingly harmless API changes into breaking changes. We currently have to deal with that in the attempt to add an IntoIterator impl for arrays.
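Here is a minimal sketch of the basic priority rule, with an inherent method and a trait method sharing a name (Hello and S are made-up names; the subtler receiver-type rule from the quiz is not shown):

struct S;

trait Hello {
    fn hello(&self) -> &'static str;
}

impl Hello for S {
    fn hello(&self) -> &'static str {
        "trait method"
    }
}

impl S {
    // An inherent method with the same name as the trait method.
    fn hello(&self) -> &'static str {
        "inherent method"
    }
}

fn main() {
    let s = S;
    // Method call syntax picks the inherent method first.
    assert_eq!(s.hello(), "inherent method");
    // UFCS lets us name the trait explicitly to reach the other one.
    assert_eq!(Hello::hello(&s), "trait method");
    assert_eq!(<S as Hello>::hello(&s), "trait method");
}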
Another – minor and probably very obvious – difference is that for the method call syntax, the type name does not have to be imported.
Apart from that it's worth pointing out what is not different about the two syntaxes:
Runtime behavior: no difference whatsoever.
Performance: the method call syntax is "converted" (desugared) into basically the UFCS pretty early inside the compiler, meaning that there aren't any performance differences either.

Composition vs inheritance with anonymous struct

I was reading this slideshow, which says:
var hits struct {
sync.Mutex
n int
}
hits.Lock()
hits.n++
hits.Unlock()
How does that work exactly? Seems like hits isn't composed of a mutex and integer, but is a mutex and integer?
It is composition. Using an anonymous field (embedded field), the containing struct will have a value of the embedded type, and you can refer to it: the unqualified type name acts as the field name.
So you could just as easily write:
hits.Mutex.Lock()
hits.n++
hits.Mutex.Unlock()
When you embed a type, fields and methods of the embedded type get promoted, so you can refer to them without specifying the field name (which is the embedded type name), but this is just syntactic sugar. Quoting from Spec: Selectors:
A selector f may denote a field or method f of a type T, or it may refer to a field or method f of a nested embedded field of T.
Beyond the field / method promotion, the method set of the embedder type will also contain the method set of the embedded type. Quoting from Spec: Struct types:
Given a struct type S and a defined type T, promoted methods are included in the method set of the struct as follows:
If S contains an embedded field T, the method sets of S and *S both include promoted methods with receiver T. The method set of *S also includes promoted methods with receiver *T.
If S contains an embedded field *T, the method sets of S and *S both include promoted methods with receiver T or *T.
This is not inheritance in the OOP sense, but something similar. It comes in handy when you want to implement an interface: if you embed a type that already implements the interface, so does your struct type. You can also provide your own implementation of some methods, which gives the feeling of method overriding. But it must not be forgotten that selectors that denote methods of the embedded type get the embedded value as the receiver (not the embedder value), while selectors that denote your own methods defined on the struct type (which may or may not "shadow" a method of the embedded type) get the embedder struct value as the receiver.
It's called embedding: hits is composed of a sync.Mutex and an int. It has to be composition, since there is no inheritance in Go. This is more of a "has a" than an "is a" relationship between the members and the struct.
Read here for a more complete explanation
Quoted from the link
The methods of embedded types come along for free
That means you can access them like hits.Lock() instead of the longer form hits.Mutex.Lock() because the function Lock() is not ambiguous.
See the Go-syntax representation of hits variable:
fmt.Printf("%#v\n", &hits)
// &struct { sync.Mutex; n int }{Mutex:sync.Mutex{state:0, sema:0x0}, n:1}
When you declare the variable, the fields in the struct are simply initialized with their zero values.
Also, the compiler automatically uses the name of the embedded type as a field name. So you can also access it like:
hits.Mutex.Lock()
hits.Mutex.Unlock()
And you have access to all methods and exported fields (if any) of sync.Mutex.

Why is it impossible to instantiate a data structure due to "overflow while adding drop-check rules"? [duplicate]

Here is a data structure I can write down and which is accepted by the Rust compiler:
pub struct Pair<S, T>(S, T);
pub enum List<T> {
Nil,
Cons(T, Box<List<Pair<i32, T>>>),
}
However, I cannot write
let x: List<i32> = List::Nil;
as Rust will complain about an "overflow while adding drop-check rules".
Why shouldn't it be possible to instantiate List::Nil?
It should be noted that the following works:
pub struct Pair<S, T>(S, T);
pub enum List<T> {
Nil,
Cons(T, Box<List<T>>),
}
fn main() {
let x: List<i32> = List::Nil;
}
When the type hasn't been instantiated, the compiler is mostly worried about the size of the type being constant and known at compile-time so it can be placed on the stack. The Rust compiler will complain if the type is infinite, and quite often a Box will fix that by creating a level of indirection to a child node, which is also of known size because it boxes its own child too.
This won't work for your type though.
When you instantiate List<T>, the type of the second argument of the Cons variant is:
Box<List<Pair<i32, T>>>
Notice that the inner List has a type argument Pair<i32, T>, not T.
That inner list also has a Cons, whose second argument has type:
Box<List<Pair<i32, Pair<i32, T>>>>
Which has a Cons, whose second argument has type:
Box<List<Pair<i32, Pair<i32, Pair<i32, T>>>>>
And so on.
Now this doesn't exactly explain why you can't use this type. The size of the element type grows linearly with how deeply it is nested within the List structure, but when the list is short (or empty) it doesn't reference any particularly complex types.
Based on the error text, the reason for the overflow is related to drop-checking. The compiler is checking that the type is dropped correctly, and if it comes across another type in the process it will check if that type is dropped correctly too. The problem is that each successive Cons contains a completely new type, which gets bigger the deeper you go, and the compiler has to check if each of these will be dropped correctly. The process will never end.

Why is Vec::len a method instead of a public property?

I noticed that Rust's Vec::len method just accesses the vector's len property. Why isn't len just a public property, rather than wrapping a method around it?
I assume this is so that in case the implementation changes in the future, nothing will break because Vec::len can change the way it gets the length without any users of Vec knowing, but I don't know if there are any other reasons.
The second part of my question is about when I'm designing an API. If I am building my own API, and I have a struct with a len property, should I make len private and create a public len() method? Is it bad practice to make fields public in Rust? I wouldn't think so, but I don't notice this being done often in Rust. For example, I have the following struct:
pub struct Segment {
pub dol_offset: u64,
pub len: usize,
pub loading_address: u64,
pub seg_type: SegmentType,
pub seg_num: u64,
}
Should any of those fields be private and instead have a wrapper function like Vec does? If so, then why? Is there a good guideline to follow for this in Rust?
One reason is to provide the same interface for all containers that implement some idea of length. (Such as std::iter::ExactSizeIterator.)
In the case of Vec, len() is acting like a getter:
impl<T> Vec<T> {
pub fn len(&self) -> usize {
self.len
}
}
While this ensures consistency across the standard library, there is another reason underlying this design choice...
This getter protects len from external modification. If the invariant Vec::len <= Vec::buf::cap is ever violated, Vec's methods may try to access memory illegally. For instance, the implementation of Vec::push:
pub fn push(&mut self, value: T) {
if self.len == self.buf.cap() {
self.buf.double();
}
unsafe {
let end = self.as_mut_ptr().offset(self.len as isize);
ptr::write(end, value);
self.len += 1;
}
}
will attempt to write past the actual end of the memory owned by the container. Because of this critical requirement, external modification of len is forbidden.
Philosophy
It's definitely good to use a getter like this in library code (crazy people out there might try to modify it!).
However, one should design their code in a manner that minimizes the requirement of getters/setters. A class should act on its own members as much as possible. These actions should be made available to the public through methods. And here I mean methods that do useful things -- not just a plain ol' getter/setter that returns/sets a variable. Setters in particular can be made redundant through the use of constructors or methods. Vec shows us some of these "setters":
push
insert
pop
reserve
...
Thus, Vec implements algorithms that provide access to the outside world. But it manages its innards by itself.
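As a hypothetical sketch of the same philosophy applied to a Segment-like type (the field names here are invented, not the asker's actual layout): keep len private, expose it through a getter, and change it only through methods that maintain the invariant.

pub struct Segment {
    data: Vec<u8>,
    len: usize, // invariant: len <= data.len()
}

impl Segment {
    pub fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        Segment { data, len }
    }

    // Getter: read-only access to the private field.
    pub fn len(&self) -> usize {
        self.len
    }

    // A method that does something useful and keeps the invariant intact itself.
    pub fn truncate(&mut self, new_len: usize) {
        if new_len < self.len {
            self.data.truncate(new_len);
            self.len = new_len;
        }
    }
}

fn main() {
    let mut seg = Segment::new(vec![1, 2, 3, 4]);
    seg.truncate(2);
    assert_eq!(seg.len(), 2);
    // From another module, `seg.len = 10;` would be rejected because `len` is private.
}

Fields that carry no invariant of their own (say, an offset or a type tag) can arguably stay public; it's the fields whose values other code relies on, like len here, that benefit from being wrapped.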
The Vec struct looks something like this[1]:
pub struct Vec<T> {
ptr: *mut T,
capacity: usize,
len: usize,
}
The idea is that ptr points at a block of allocated memory of size capacity. If the size of the Vec needs to be bigger than the capacity then new memory is allocated. The unused portion of the allocated memory is uninitialised and could contain arbitrary data.
When you call mutating methods on Vec like push or pop, they carefully manage the Vec's internal state, increase capacity when necessary, and ensure that items that are removed are properly dropped.
If len was a public field, any code with an owned Vec, or a mutable reference to one, could set len to any value. Set it higher than it should be and you'll be able to read from uninitialised memory, causing Undefined Behaviour. Set it lower and you'll be effectively removing elements without properly dropping them.
In some other programming languages (e.g. JavaScript) the API for arrays or vectors specifically lets you change the size by setting a length property. It's not unreasonable to think that a programmer who is used to that approach could do this accidentally in Rust.
Keeping all the fields private and using a getter method for len() allows Vec to protect the mutability of its internals, make strong memory guarantees and prevent users from accidentally doing bad things to themselves.
[1] In practice, there are abstraction layers built over this data structure, so it looks a little different.

What does the 'where' clause within a trait do?

If I have this code:
trait Trait {
fn f(&self) -> i32 where Self: Sized;
fn g(&self) -> i32;
}
fn object_safety_dynamic(x: &Trait) {
x.f(); // error
x.g(); // works
}
What does the where clause actually do?
Naively, I was thinking where Self: Sized; dictates something about the type implementing Trait, like 'if you implement Trait for type A, your type A must be sized, i.e., it can be i32 but not [i32]'.
However, such a constraint would rather go as trait Trait: Sized (correct me if I am wrong)?
Now I noticed where Self: Sized; actually determines if I can call f or g from within object_safety_dynamic.
My questions:
What happens here behind the scenes?
What (in simple English) am I actually telling the compiler by where Self: Sized; that makes g() work but f() not?
In particular: Since &self is a reference anyway, what compiled difference exists between f and g for various (sized or unsized) types. Wouldn't it always boil down to something like _vtable_f_or_g(*self) -> i32, regardless of where or if the type is sized or not?
Why can I implement Trait for both u8 and [u8]. Shouldn't the compiler actually stop me from implementing f() for [u8], instead of throwing an error at the call site?
fn f(&self) -> i32 where Self: Sized;
This says that f is only defined for types that also implement Sized. Unsized types may still implement Trait, but f will not be available.
Inside object_safety_dynamic, calling x.f() is actually doing: (*x).f(). While x is sized because it's a pointer, *x might not be because it could be any implementation of Trait. But code inside the function has to work for any valid argument, so you are not allowed to call x.f() there.
What does the where clause actually do?
Naively, I was thinking where Self: Sized; dictates something about the type implementing Trait, like 'if you implement Trait for type A, your type A must be sized, i.e., it can be i32 but not [i32]'.
However, such a constraint would rather go as trait Trait: Sized
This is correct.
However, in this case, the bound applies only to the function. where bounds on functions are only checked at the callsite.
What happens here behind the scenes?
There is a confusing bit about rust's syntax which is that Trait can refer to either
The trait Trait; or
The "trait object" Trait, which is actually a type, not an object.
Sized is a trait, and any type T that is Sized has a size known at compile time, which you can obtain with std::mem::size_of::<T>(). Examples of types that are not Sized are str and [u8], whose contents do not have a fixed size.
The type Trait is also unsized. Intuitively, this is because Trait as a type consists of all values of types that implement the trait Trait, which may have varying size. This means you can never have a value of type Trait - you can only refer to one via a "fat pointer" such as &Trait or Box<Trait> and so on. These have the size of 2 pointers - one for a vtable, one for the data. It looks roughly like this:
struct &Trait {
pub data: *mut (),
pub vtable: *mut (),
}
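A quick way to check the "two pointers" claim, sketched in current syntax (dyn Trait); the exact layout is not guaranteed by the language, but on today's compilers this holds:

use std::mem::size_of;

#[allow(dead_code)]
trait Trait {
    fn g(&self) -> i32;
}

fn main() {
    // A reference to a sized type is one pointer wide...
    assert_eq!(size_of::<&u8>(), size_of::<usize>());
    // ...while a reference to a trait object carries a data pointer plus a vtable pointer.
    assert_eq!(size_of::<&dyn Trait>(), 2 * size_of::<usize>());
}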
There is automatically an impl of the form:
impl Trait /* the trait */ for Trait /* the type */ {
fn f(&self) -> i32 where Self: Sized { .. }
fn g(&self) -> i32 {
/* vtable magic: something like (self.vtable.g)(self.data) */
}
}
What (in simple English) am I actually telling the compiler by where Self: Sized; that makes g() work but f() not?
Note that since, as I mentioned, Trait is not Sized, the bound Self: Sized is not satisfied and so the function f cannot be called where Self == Trait.
In particular: Since &self is a reference anyway, what compiled difference exists between f and g for various (sized or unsized) types. Wouldn't it always boil down to something like _vtable_f_or_g(*self) -> i32, regardless of where or if the type is sized or not?
The type Trait is always unsized. It doesn't matter which type has been coerced to Trait. The way you call the function with a Sized variable is to use it directly:
fn generic<T: Trait + Sized>(x: &T) { // the `Sized` bound is implicit, added here for clarity
x.f(); // compiles just fine
x.g();
}
Why can I implement Trait for both u8 and [u8]. Shouldn't the compiler actually stop me from implementing f() for [u8], instead of throwing an error at the call site?
Because the trait is not bounded by Self: Sized - the function f is. So there is nothing stopping you from implementing the function - it's just that the bounds on the function can never be satisfied, so you can never call it.
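To see the whole answer in one place, here is a sketch (restating the question's trait in current syntax) where Trait is implemented for both u8 and [u8]; the impl for [u8] is accepted, and only the calls that need Self: Sized are rejected:

trait Trait {
    fn f(&self) -> i32
    where
        Self: Sized;
    fn g(&self) -> i32;
}

impl Trait for u8 {
    fn f(&self) -> i32 { 1 }
    fn g(&self) -> i32 { 2 }
}

impl Trait for [u8] {
    // Nothing stops us from writing this body; we just can never call it.
    fn f(&self) -> i32 { 3 }
    fn g(&self) -> i32 { 4 }
}

fn main() {
    let byte: u8 = 0;
    let bytes: &[u8] = &[0, 1, 2];

    assert_eq!(byte.f(), 1); // fine: u8 is Sized
    assert_eq!(byte.g(), 2);

    assert_eq!(bytes.g(), 4); // fine: g has no Sized bound
    // bytes.f(); // error: the `Self: Sized` bound is not satisfied for `[u8]`

    let obj: &dyn Trait = &byte;
    assert_eq!(obj.g(), 2); // fine through the trait object
    // obj.f();   // error: `dyn Trait` is not Sized either
}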
