Confusion in Validating References with Lifetimes in Rust [duplicate]

This question already has answers here:
Why are explicit lifetimes needed in Rust?
(10 answers)
Semantics of lifetime parameters
(2 answers)
Closed 6 months ago.
I am a beginner in Rust, following rust-lang/book.
In its ch. 10.3, Validating References with Lifetimes, there is Listing 10-20:
fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";
    let result = longest(string1.as_str(), string2);
    println!("The longest string is {}", result);
}
fn longest(x: &str, y: &str) -> &str { // <-- ERROR
    if x.len() > y.len() {
        x
    } else {
        y
    }
}
They mention two points:
Rust can’t tell whether the reference being returned refers to x or y. // <-- not needed, in my opinion
We also don’t know the concrete lifetimes of the references that will be passed in, to determine whether the reference we return will always be valid.
In the code below, there is no error (as expected):
fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";
    let x: &str = &string1.as_str();
    let y: &str = &string2;
    let result =
        if x.len() > y.len() {
            x
        } else {
            y
        };
    println!("The longest string is {}", result);
}
Confusion:
Why does Rust need to know whether the reference being returned refers to x or y?
Silly question, but I want to know...
Edited
Solution:
Think of the function call as the customer and the function as the seller.
In snippet one,
the function call expects to get back one of the values it passed as arguments.
But the seller may be biased, or may accidentally return a value other than its parameters, like:
fn longest(x: &str, y: &str) -> &str {
    let z = "Other String";
    &z
}
Then both the function call and the function would get an error message,
even though the customer made no mistake.
Therefore, with the help of lifetime annotations, Rust ensures that the customer does not get an error for the seller's mistake.
This is also the reason TypeScript was introduced for JavaScript.
In snippet two,
the customer and the seller are the same function.
The related question is mentioned below:
Why are explicit lifetimes needed in Rust?

In the second snippet, the lifetime used is the shorter of x and y.
But Rust does not do lifetime inference (or any type inference at all) across function boundaries. A function signature has to state the types and lifetimes involved, and the lifetime elision rules don't cover this case because there are two reference parameters the result could borrow from. Thus, the lifetime that was inferred in the second snippet needs to be specified explicitly in the first.
The most important reason for that is to avoid unintentional breakage. If function signatures were inferred, it would be too easy to break APIs accidentally. Rust therefore requires, by design, that you write signatures explicitly.
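As a minimal sketch of what "specified explicitly" looks like for the first snippet, the signature can tie both inputs and the output to one lifetime parameter, as the same chapter of the book goes on to do. The name 'a is arbitrary; it only says that the returned reference is valid for no longer than the shorter-lived of the two arguments.

```rust
// The returned reference is valid only as long as both `x` and `y` are.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";
    let result = longest(string1.as_str(), string2);
    println!("The longest string is {}", result);
}
```

With this signature the body is checked against the contract once, and every caller is checked against the signature alone, without looking at the body.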

First Case
Suppose that Rust didn't give an error with your definition of longest(). Then it would be possible to use longest() such that the returned reference is stored in a variable that outlives the string slices passed in. For example, something like this:
let result: &str;
{
    let x = String::from("welcome");
    let y = String::from("bye");
    result = longest(&x, &y);
} // `x` and `y` go out of scope, so `&x` and `&y` are no longer valid.
// This would be undefined behavior, because the data pointed to
// by `result` is no longer valid.
println!("result: {}", result);
Since result is used after x and y go out of scope, and result points to the data in either x or y, this would lead to undefined behavior. But Rust doesn't allow this; instead, the Rust compiler forces you to ensure that the value returned by longest() has a sufficiently long lifetime.
So if the compiler accepted longest() as you wrote it, your particular example would indeed have no undefined behavior (because x, y, and result are all valid over the same region), but in general, other invocations of longest() followed by variables going out of scope could lead to undefined behavior. To prevent this, Rust makes you annotate the lifetimes so that it can check that the returned reference lives long enough.
Second Case
The variables x, y, and result are all cleaned up at the same time when they go out of scope. So the address referenced by result is always valid whether it's the address of x or the address of y. So there's no error.

Related

Matching borrowed enum - why is this syntax equivalent? [duplicate]

This question already has an answer here:
Why does pattern matching on &Option<T> yield something of type Some(&T)?
(1 answer)
Closed 3 years ago.
I have the following piece of code, which compiles using rustc v1.36:
enum Number {
    Integer(i32),
    Real(f32),
}
fn foo1(number: &mut Number) {
    if let Number::Integer(n) = number {
        let _y: &mut i32 = n;
    }
}
fn foo2(number: &mut Number) {
    if let &mut Number::Integer(ref mut n) = number {
        let _y: &mut i32 = n;
    }
}
Funnily enough, I can understand how 'foo2' does the matching, but not 'foo1', even though 'foo1' is the kind of code you will see in any Rust project. Can someone explain how the matching syntax in these two is equivalent? And does it extend to other patterns (structs?) as well?
This functionality was added in Rust 1.26, and is called 'default binding modes' (or 'match ergonomics', after the RFC that proposed it). It effectively allows pattern matching to automatically dereference values, and to add ref and ref mut to variable bindings where needed.
The rules for this behaviour are discussed in detail in the RFC, but it effectively boils down to:
Variable bindings within a pattern can be resolved in one of three modes:
    'move' (the default), which will move the value.
    'ref', which will immutably reference the value.
    'ref mut', which will mutably reference the value.
When a variable binding is encountered within a pattern without an explicit ref, mut or ref mut, the current binding mode will be used.
When a reference is pattern matched using a non-reference pattern:
    The value will be auto-dereferenced.
    The binding mode may change for any nested patterns:
        If the type of the reference is &T, the binding mode will change to 'ref'.
        If the type of the reference is &mut T and the current binding mode is not 'ref', the binding mode will change to 'ref mut'.
This may sound complicated, but as you can see from the end result, it tends to line up with how you'd intuitively write the match!
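To make the rules concrete, here is a small sketch that applies them to the Number enum from the question and to a struct. Point, double_integer, sum, and describe are hypothetical names added for illustration only.

```rust
enum Number {
    Integer(i32),
    Real(f32),
}

// Hypothetical struct, used to show that struct patterns follow the same rules.
struct Point {
    x: i32,
    y: i32,
}

fn double_integer(number: &mut Number) {
    // Non-reference pattern matched against `&mut Number`:
    // the binding mode becomes 'ref mut', so `n` is a `&mut i32`.
    if let Number::Integer(n) = number {
        *n *= 2;
    }
}

fn sum(point: &Point) -> i32 {
    // Non-reference pattern matched against `&Point`:
    // the binding mode becomes 'ref', so `x` and `y` are `&i32`.
    let Point { x, y } = point;
    *x + *y
}

fn describe(number: &Number) -> String {
    // The same applies in `match` arms: `n` is `&i32`, `r` is `&f32`.
    match number {
        Number::Integer(n) => format!("integer {}", n),
        Number::Real(r) => format!("real {}", r),
    }
}

fn main() {
    let mut n = Number::Integer(21);
    double_integer(&mut n);
    println!("{}", describe(&n)); // prints "integer 42"
    println!("{}", describe(&Number::Real(1.5))); // prints "real 1.5"
    println!("{}", sum(&Point { x: 1, y: 2 })); // prints "3"
}
```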

Why does _ destroy at the end of statement?

I've seen a few other questions and answers stating that let _ = foo() destroys the result at the end of the statement rather than at scope exit, which is what let _a = foo() does.
I am unable to find any official description of this, nor any rationale for this syntax.
I'm interested in a few intertwined things:
Is there even a mention of it in the official documentation?
What is the history behind this choice? Is it simply natural fall-out from Rust's binding / destructuring rules? Is it something inherited from another language? Or does it have some other origin?
Is there some use-case this syntax addresses that could not have been achieved using explicit scoping?
Is it simply natural fall-out from Rust's binding / destructuring rules?
Yes. You use _ to indicate that you don't care about a value in a pattern and that it should not be bound in the first place. If a value is never bound to a variable, there's nothing to hold on to the value, so it must be dropped.
All the Places Patterns Can Be Used:
match Arms
Conditional if let Expressions
while let Conditional Loops
for Loops
let Statements
Function Parameters
Is there even a mention of it in the official documentation?
Ignoring an Entire Value with _
Of note is that _ isn't a valid identifier, thus you can't use it as a name:
fn main() {
    let _ = 42;
    println!("{}", _);
}
error: expected expression, found reserved identifier `_`
 --> src/main.rs:3:20
  |
3 |     println!("{}", _);
  |                    ^ expected expression
"achieved using explicit scoping"
I suppose the language could have gone that route and made expressions like these just "hang around" until the end of the scope, but I don't see any value in it:
let _ = vec![5];
vec![5]; // Equivalent
// Gotta wait for the scope to end to clean these up, or call `drop` explicitly
The only reason that you'd use let _ = foo() is when the function requires that you use its result, and you know that you don't need it. Otherwise, this:
let _ = foo();
is exactly the same as this:
foo();
For example, suppose foo has a signature like this:
fn foo() -> Result<String, ()>;
You will get a warning if you don't use the result, because Result has the #[must_use] attribute. Destructuring and ignoring the result immediately is a concise way of avoiding this warning in cases where you know it's ok, without introducing a new variable that lasts for the full scope.
If you didn't pattern match against the result then the value would be dropped as soon as the foo function returns. It seems reasonable that Rust would behave the same regardless of whether you explicitly said you don't want it or just didn't use it.
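A small sketch can make the timing difference visible. Noisy, make, and drop_order are hypothetical names introduced here for illustration; the type does nothing except record the moment it is dropped in a shared log.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type that appends to a shared log when it is dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn make(name: &'static str, log: &Rc<RefCell<Vec<String>>>) -> Noisy {
    Noisy { name, log: Rc::clone(log) }
}

fn drop_order() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _ = make("wildcard", &log); // never bound: dropped at the end of this statement
        let _named = make("named", &log); // bound: dropped when the scope ends
        log.borrow_mut().push("end of scope".to_string());
    }
    let events = log.borrow().clone();
    events
}

fn main() {
    // Prints, in order: "drop wildcard", "end of scope", "drop named".
    for event in drop_order() {
        println!("{}", event);
    }
}
```

Note that this concerns values produced by an expression such as a function call. Writing let _ = x; for an existing variable x does not move or drop x, since nothing is bound.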

Pybind11: Follow up to binding a function with std::initializer_list

I know that there is a similar question here: Binding a function with std::initializer_list argument using pybind11, but because I cannot comment (not enough reputation) I ask my question here: do the results from the above-linked question also apply to constructors? That is, if I have a constructor which takes std::initializer_list<T>, is there no way to bind it?
There's no simple way to bind it, at least. Basically, as mentioned in the other post (and my original response in the pybind11 issue tracker), we can't dynamically construct a std::initializer_list: it's a compile-time construct. Constructor vs method vs function doesn't matter here: we cannot convert a set of dynamic arguments into the compile-time initializer_list construct.
But let me give you a way that you could, partially, wrap it if you're really stuck with a C++ design that requires it. You first have to decide how many arguments you're going to support. For example, let's say you want to support 1, 2, or 3 int arguments passed via initializer_list<int> in the bound constructor for a MyType. You could write:
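#include <pybind11/pybind11.h>
#include <pybind11/stl.h>  // enables the std::vector<int> <-> Python list conversion
#include <stdexcept>
namespace py = pybind11;
// Inside your module definition, where `m` is the py::module:
py::class_<MyType>(m, "MyClass")
    // One overload per supported initializer_list length.
    .def(py::init([](int a) { return new MyType({ a }); }))
    .def(py::init([](int a, int b) { return new MyType({ a, b }); }))
    .def(py::init([](int a, int b, int c) { return new MyType({ a, b, c }); }))
    // Alternatively, accept a list and dispatch on its size at run time.
    .def(py::init([](std::vector<int> v) {
        if (v.size() == 1) return new MyType({ v[0] });
        else if (v.size() == 2) return new MyType({ v[0], v[1] });
        else if (v.size() == 3) return new MyType({ v[0], v[1], v[2] });
        else throw std::runtime_error("Wrong number of ints for a MyClass");
    }));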
where the first three overloads take integer values as arguments and the last one takes a list. (There's no reason you'd have to use both approaches--I'm just doing it for the sake of example).
Both of these are rather gross, and don't scale well, but they exhibit the fundamental issue: each size of an initializer_list needs to be compiled from a different piece of C++ code. And that's why pybind11 can't support it: we'd have to compile different versions of the conversion code for each possible initializer_list argument length--and so either the binary size explodes for any number of arguments that might be used, or there's an arbitrary argument size cut-off beyond which you start getting a fatal error. Neither of those is a nice option.
Edit: As for your question specifically about constructors: there's no difference here. The issue is that we can't convert arguments into the required type, and argument conversion is identical whether for a constructor, method, or function.

Why is it that traits for operator overloading require ownership of self? [duplicate]

I made a two element Vector struct and I want to overload the + operator.
I made all my functions and methods take references, rather than values, and I want the + operator to work the same way.
impl Add for Vector {
    fn add(&self, other: &Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}
Depending on which variation I try, I either get lifetime problems or type mismatches. Specifically, the &self argument seems to not get treated as the right type.
I have seen examples with template arguments on impl as well as Add, but they just result in different errors.
I found How can an operator be overloaded for different RHS types and return values? but the code in the answer doesn't work even if I put a use std::ops::Mul; at the top.
I am using rustc 1.0.0-nightly (ed530d7a3 2015-01-16 22:41:16 +0000)
I won't accept "you only have two fields, why use a reference" as an answer; what if I wanted a 100 element struct? I will accept an answer that demonstrates that even with a large struct I should be passing by value, if that is the case (I don't think it is, though.) I am interested in knowing a good rule of thumb for struct size and passing by value vs. by reference, but that is not the current question.
You need to implement Add on &Vector rather than on Vector.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;
    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}
In its definition, Add::add always takes self by value. But references are types like any other [1], so they can implement traits too. When a trait is implemented on a reference type, the type of self is a reference; the reference is passed by value. Normally, passing by value in Rust implies transferring ownership, but when references are passed by value, they're simply copied (or reborrowed/moved if it's a mutable reference), and that doesn't transfer ownership of the referent (because a reference doesn't own its referent in the first place). Considering all this, it makes sense for Add::add (and many other operators) to take self by value: if you need to take ownership of the operands, you can implement Add on structs/enums directly, and if you don't, you can implement Add on references.
Here, self is of type &'a Vector, because that's the type we're implementing Add on.
Note that I also specified the RHS type parameter with a different lifetime to emphasize the fact that the lifetimes of the two input parameters are unrelated.
[1] Actually, reference types are special in that you can implement traits for references to types defined in your crate (i.e. if you're allowed to implement a trait for T, then you're also allowed to implement it for &T). &mut T and Box<T> have the same behavior, but that's not true in general for U<T> where U is not defined in the same crate.
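Put together as a complete program, the approach might look like the sketch below. The Vector definition with two f64 fields is an assumption, since the question does not show the struct; the derives are only there so the result can be printed and compared.

```rust
use std::ops::Add;

// Assumed definition; the question only says "a two element Vector struct".
#[derive(Debug, PartialEq)]
struct Vector {
    x: f64,
    y: f64,
}

impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;

    // `self` is a `&'a Vector` here: the reference itself is passed by value.
    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let a = Vector { x: 1.0, y: 2.0 };
    let b = Vector { x: 3.0, y: 4.0 };
    let c = &a + &b;
    // `a` and `b` were only borrowed, so they are still usable here.
    println!("{:?} + {:?} = {:?}", a, b, c);
}
```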
If you want to support all scenarios, you must support all the combinations:
&T op U
T op &U
&T op &U
T op U
In the Rust standard library, this is done through an internal macro.
Luckily, there is a Rust crate, impl_ops, that also offers a macro to write that boilerplate for us: the crate offers the impl_op_ex! macro, which generates all the combinations.
Here is their sample:
#[macro_use] extern crate impl_ops;
use std::ops;
impl_op_ex!(+ |a: &DonkeyKong, b: &DonkeyKong| -> i32 { a.bananas + b.bananas });
fn main() {
    let total_bananas = &DonkeyKong::new(2) + &DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = &DonkeyKong::new(2) + DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = DonkeyKong::new(2) + &DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = DonkeyKong::new(2) + DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
}
Even better, they have an impl_op_ex_commutative! macro that'll also generate the operators with the parameters reversed if your operator happens to be commutative.
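If you would rather not pull in a crate, the four combinations listed above can also be generated with a small macro_rules! macro of your own. The following is only a sketch using the standard library: impl_add_for is a hypothetical macro name, and the Vector struct with f64 fields is assumed. It is not how impl_ops or the standard library's internal macro is actually written.

```rust
use std::ops::Add;

// Assumed definition; Copy keeps the by-value cases cheap to demonstrate.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vector {
    x: f64,
    y: f64,
}

// Hypothetical helper: emits one `Add` impl for the given LHS and RHS types.
macro_rules! impl_add_for {
    ($lhs:ty, $rhs:ty) => {
        impl Add<$rhs> for $lhs {
            type Output = Vector;

            fn add(self, other: $rhs) -> Vector {
                // Field access auto-dereferences, so the same body works
                // for both owned and borrowed operands.
                Vector {
                    x: self.x + other.x,
                    y: self.y + other.y,
                }
            }
        }
    };
}

impl_add_for!(Vector, Vector); // T op U
impl_add_for!(Vector, &Vector); // T op &U
impl_add_for!(&Vector, Vector); // &T op U
impl_add_for!(&Vector, &Vector); // &T op &U

fn main() {
    let a = Vector { x: 1.0, y: 2.0 };
    let b = Vector { x: 3.0, y: 4.0 };
    println!("{:?}", a + b);
    println!("{:?}", a + &b);
    println!("{:?}", &a + b);
    println!("{:?}", &a + &b);
}
```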

Does an equivalent function in OCaml exist that works the same way as "set!" in Scheme?

I'm trying to make a function that defines a vector that varies based on the function's input, and set! works great for this in Scheme. Is there a functional equivalent for this in OCaml?
I agree with sepp2k that you should expand your question, and give more detailed examples.
Maybe what you need are references.
As a rough approximation, you can see them as variables to which you can assign:
let a = ref 5;;
!a;; (* This evaluates to 5 *)
a := 42;;
!a;; (* This evaluates to 42 *)
Here is a more detailed explanation from http://caml.inria.fr/pub/docs/u3-ocaml/ocaml-core.html:
The language we have described so far is purely functional. That is, several evaluations of the same expression will always produce the same answer. This prevents, for instance, the implementation of a counter whose interface is a single function next : unit -> int that increments the counter and returns its new value. Repeated invocation of this function should return a sequence of consecutive integers — a different answer each time.
Indeed, the counter needs to memorize its state in some particular location, with read/write accesses, but before all, some information must be shared between two calls to next. The solution is to use mutable storage and interact with the store by so-called side effects.
In OCaml, the counter could be defined as follows:
let new_count =
  let r = ref 0 in
  let next () = r := !r+1; !r in
  next;;
Another, maybe more concrete, example of mutable storage is a bank account. In OCaml, record fields can be declared mutable, so that new values can be assigned to them later. Hence, a bank account could be a two-field record, its number, and its balance, where the balance is mutable.
type account = { number : int; mutable balance : float }
let retrieve account requested =
  let s = min account.balance requested in
  account.balance <- account.balance -. s; s;;
In fact, in OCaml, references are not primitive: they are special cases of mutable records. For instance, one could define:
type 'a ref = { mutable content : 'a }
let ref x = { content = x }
let deref r = r.content
let assign r x = r.content <- x; x
set! in Scheme assigns to a variable. You cannot assign to a variable in OCaml, at all. (So "variables" are not really "variable".) So there is no equivalent.
But OCaml is not a pure functional language. It has mutable data structures. The following things can be assigned to:
Array elements
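String elements (in older versions only; since OCaml 4.06 strings are immutable by default and the Bytes type is used for mutable byte sequences)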
Mutable fields of records
Mutable fields of objects
In these situations, the <- syntax is used for assignment.
The ref type mentioned by @jrouquie is a simple, built-in mutable record type that acts as a mutable container of one thing. OCaml also provides ! and := operators for working with refs.
