I have the following piece of code, which compiles using rustc v1.36:
enum Number {
    Integer(i32),
    Real(f32),
}

fn foo1(number: &mut Number) {
    if let Number::Integer(n) = number {
        let _y: &mut i32 = n;
    }
}

fn foo2(number: &mut Number) {
    if let &mut Number::Integer(ref mut n) = number {
        let _y: &mut i32 = n;
    }
}
Funnily enough, I can understand how foo2 does the matching, but not how foo1 does, even though foo1 is the kind of code you will see in any Rust project. Can someone explain how the matching syntax in these two is equivalent? And does it extend to other code (structs?) as well?
This functionality was added in Rust 1.26, and is called 'default binding modes' (or 'match ergonomics', after the RFC that proposed it). It effectively allows pattern matching to automatically dereference values, and to add ref and ref mut to variable bindings where needed.
The rules for this behaviour are discussed in detail in the RFC, but they effectively boil down to:

Variable bindings within a pattern can be resolved in one of three modes:
- 'move' (the default), which will move the value.
- 'ref', which will immutably reference the value.
- 'ref mut', which will mutably reference the value.

When a variable binding is encountered within a pattern without an explicit ref, mut or ref mut, the current binding mode will be used.

When a reference is pattern matched using a non-reference pattern:
- The value will be auto-dereferenced.
- The binding mode may change for any nested patterns:
  - If the type of the reference is &T, the binding mode will change to 'ref'.
  - If the type of the reference is &mut T and the current binding mode is not 'ref', the binding mode will change to 'ref mut'.
This may sound complicated, but as you can see from the end result, it tends to line up with how you'd intuitively write the match!
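And yes, it extends to structs (and any other pattern) in the same way. Here is a minimal sketch, with a Point struct invented purely for illustration, showing the ergonomic form next to its explicit desugaring:

struct Point {
    x: i32,
    y: i32,
}

fn bump(point: &mut Point) {
    // A non-reference pattern matched against `&mut Point`: the value is
    // auto-dereferenced and the binding mode becomes 'ref mut', so `x` and
    // `y` are `&mut i32` without writing `ref mut` anywhere.
    let Point { x, y } = point;
    *x += 1;
    *y += 1;
}

// The explicit equivalent, spelling out what the compiler infers:
fn bump_explicit(point: &mut Point) {
    let &mut Point { ref mut x, ref mut y } = point;
    *x += 1;
    *y += 1;
}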
I am a beginner in Rust, following rust-lang/book.
In its ch. 10.3, Validating References with Lifetimes, there is Listing 10-20:
fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";
    let result = longest(string1.as_str(), string2);
    println!("The longest string is {}", result);
}

fn longest(x: &str, y: &str) -> &str { // <-- ERROR
    if x.len() > y.len() {
        x
    } else {
        y
    }
}
There are two points they have mentioned:
Rust can’t tell whether the reference being returned refers to x or y. // <-- no need, according to me
We also don’t know the concrete lifetimes of the references that will be passed in, to determine whether the reference we return will always be valid.
In the code below, there is no error (as expected):
fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";
    let x: &str = &string1.as_str();
    let y: &str = &string2;
    let result =
        if x.len() > y.len() {
            x
        } else {
            y
        };
    println!("The longest string is {}", result);
}
Confusion:
Why does Rust need to tell whether the reference being returned refers to x or y?
Silly question, but I want to know...
Edited
Solution:
Think of the function call as the customer, and the function as the seller.
In snippet one, the function call (customer) expects to get back one of the values passed in as arguments. But the seller might be biased, or might accidentally give back a value other than the parameters, like this:
fn longest(x: &str, y: &str) -> &str {
    let z = "Other String";
    &z
}
Then both the function call and the function will get an error message, even though the customer made no mistake. Therefore, with the help of lifetime parameter annotations, Rust ensures that the customer does not get an error for the seller's mistake. (This is also the reason why TypeScript was introduced on top of JavaScript.)
In snippet two, the customer and the seller are the same function.
The related question is mentioned below:
Why are explicit lifetimes needed in Rust?
In the second snippet, the lifetime used is the shorter of the lifetimes of x and y.
But Rust does not do lifetime inference (or any inference at all) across function boundaries. It always requires you to specify explicitly the types and lifetimes involved. Thus, the lifetime that was inferred in the second snippet needs to be specified explicitly in the first.
The most important reason for that is to avoid unintentional breakage. If function signatures were inferred, it would be too easy to break APIs accidentally. Thus, by design, Rust requires you to specify signatures explicitly.
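For reference, the fix the book arrives at shortly afterwards is to state that relationship explicitly with a named lifetime parameter; the annotation says the returned reference is valid only for as long as both inputs are:

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}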
First Case
Suppose that Rust didn't give an error with your definition of longest(). Then it's possible to use longest() such that the returned address is stored in a variable that has a longer lifetime than the string slices passed in. For example, something like this:
let result: &str;
{
    let x = String::from("welcome");
    let y = String::from("bye");
    result = longest(&x, &y);
} // `x` and `y` go out of scope, so `&x` and `&y` are no longer valid.
  // This would be undefined behavior, because the data pointed to
  // by `result` is no longer valid.
println!("result: {}", result);
Since result is used after x and y go out of scope, and result points to the data in either x or y, this would lead to undefined behavior. But Rust doesn't allow this; instead, the Rust compiler forces you to ensure that the returned value of longest() has a sufficiently long lifetime.
So if the compiler didn't give an error with how you wrote longest(), then yes in your example there wouldn't be undefined behavior (because x, y, and result all have the same lifetime), but in general certain invocations of longest() and variables subsequently going out of scope could lead to undefined behavior. So to prevent this, Rust forces you to annotate the lifetimes to make sure the returned address has a long enough lifetime.
Second Case
The variables x, y, and result are all cleaned up at the same time when they go out of scope. So the address referenced by result is always valid whether it's the address of x or the address of y. So there's no error.
I am having trouble trying to understand pattern matching rules in Rust. I originally thought that the idea behind patterns is to match the left-hand side and right-hand side, like so:
struct S {
    x: i32,
    y: (i32, i32),
}

let S { x: a, y: (b, c) } = S { x: 1, y: (2, 3) };
// `a` matches `1`, `(b, c)` matches `(2, 3)`
However, when we want to bind a reference to a value on the right-hand side, we need to use the ref keyword.
let &(ref a, ref b) = &(3, 4);
This feels rather inconsistent.
Why can't we use the dereferencing operator * to match the left-hand side and right-hand side like this?
let &(*a, *b) = &(3, 4);
// `*a` matches `3`, `*b` matches `4`
Why isn't this the way patterns work in Rust? Is there a reason why this isn't the case, or have I totally misunderstood something?
Using the dereferencing operator would be very confusing in this case. ref effectively takes a reference to the value. These are more-or-less equivalent:
let bar1 = &42;
let ref bar2 = 42;
Note that in let &(ref a, ref b) = &(3, 4), a and b both have the type &i32; they are references. Also note that since match ergonomics were introduced, let (a, b) = &(3, 4) is equivalent and shorter.
Furthermore, the ampersand (&) and asterisk (*) symbols are used for types. As you mention, pattern matching wants to "line up" the value with the pattern. The ampersand is already used to match and remove one layer of references in patterns:
let foo: &i32 = &42;

match foo {
    &v => println!("{}", v),
}
By analogy, it's possible that some variant of this syntax might be supported in the future for raw pointers:
let foo: *const i32 = std::ptr::null();

match foo {
    *v => println!("{}", v),
}
Since both ampersand and asterisk could be used to remove one layer of reference/pointer, they cannot be used to add one layer. Thus some new keyword was needed and ref was chosen.
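As a small illustration of what ref buys you (a sketch, not part of the original answer): binding by reference lets a pattern inspect a non-Copy value without moving it:

let maybe_name = Some(String::from("Alice"));

match maybe_name {
    // `ref` adds a layer: `name` is a `&String`, so the `String`
    // is borrowed rather than moved out of `maybe_name`.
    Some(ref name) => println!("hello, {}", name),
    None => println!("no name"),
}

// Still valid afterwards, because the match only borrowed the value.
println!("{:?}", maybe_name);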
See also:
Meaning of '&variable' in arguments/patterns
What is the syntax to match on a reference to an enum?
How can the ref keyword be avoided when pattern matching in a function taking &self or &mut self?
How does Rust pattern matching determine if the bound variable will be a reference or a value?
Why does pattern matching on &Option<T> yield something of type Some(&T)?
In this specific case, you can achieve the same with neither ref nor asterisk:
fn main() {
    let (a, b) = &(3, 4);
    show_type_name(a);
    show_type_name(b);
}

fn show_type_name<T>(_: T) {
    println!("{}", std::any::type_name::<T>()); // rust 1.38.0 and above
}
It shows both a and b to be of type &i32. This ergonomics feature is called binding modes.
But it still doesn't answer the question of why the ref pattern exists in the first place. I don't think there is a definite answer to that; the syntax simply settled on what it is now for identifier patterns.
I have seen the # symbol used in macros but I cannot find mention of it in the Rust Book or in any official documentation or blog posts. For example, in this Stack Overflow answer it is used like this:
macro_rules! instructions {
    (enum $ename:ident {
        $($vname:ident ( $($vty: ty),* )),*
    }) => {
        enum $ename {
            $($vname ( $($vty),* )),*
        }

        impl $ename {
            fn len(&self) -> usize {
                match self {
                    $($ename::$vname(..) => instructions!(#count ($($vty),*))),*
                }
            }
        }
    };
    (#count ()) => (0);
    (#count ($a:ty)) => (1);
    (#count ($a:ty, $b:ty)) => (2);
    (#count ($a:ty, $b:ty, $c:ty)) => (3);
}
instructions! {
    enum Instruction {
        None(),
        One(u8),
        Two(u8, u8),
        Three(u8, u8, u8)
    }
}

fn main() {
    println!("{}", Instruction::None().len());
    println!("{}", Instruction::One(1).len());
    println!("{}", Instruction::Two(1, 2).len());
    println!("{}", Instruction::Three(1, 2, 3).len());
}
From the usage, it appears that it is used for declaring another macro that is local to the main one.
What does this symbol mean and why would you use it rather than just creating another top-level macro?
In the pattern-matching part of a macro, symbols can mean whatever the author desires them to mean. A leading symbol # is often used to denote an "implementation detail" of the macro — a part of the macro that an external user is not expected to use.
In this example, I used it to pattern-match the tuple parameters to get a count of the tuple parameters.
Outside of macros, the @ symbol is used to match a pattern while also assigning a name to the entire pattern:

match age {
    x @ 0 => println!("0: {}", x),
    y @ 1 => println!("1: {}", y),
    z => println!("{}", z),
}

With a bit of a stretch, this same logic can be applied to the use in the macro: we are pattern-matching the tuple, but also attaching a name to that specific pattern. I think I've even seen people use something even more parallel: (count @ .... However, The Little Book of Rust Macros points out:
The reason for using # is that, as of Rust 1.2, the # token is not used in prefix position; as such, it cannot conflict with anything. Other symbols or unique prefixes may be used as desired, but use of # has started to become widespread, so using it may aid readers in understanding your code.
rather than just creating another top-level macro
Creating another macro is likely better practice, but only in modern Rust. Before recent changes to Rust that made it so you could import macros directly, having multiple macros could be tricky for end users who tried to selectively import macros.
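For comparison, a rough sketch of that alternative (the count_tys name is invented): the counting rules live in their own top-level macro that instructions! calls instead of recursing into itself:

macro_rules! count_tys {
    () => (0);
    ($a:ty) => (1);
    ($a:ty, $b:ty) => (2);
    ($a:ty, $b:ty, $c:ty) => (3);
}

// The matching arm inside `instructions!` would then read:
//     $($ename::$vname(..) => count_tys!($($vty),*)),*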
See also:
Internal rules in The Little Book of Rust Macros
Why must I use macros only used by my dependencies
What does the '#' symbol do in Rust?
Appendix B: Operators and Symbols
I've seen a few other questions and answers stating that let _ = foo() destroys the result at the end of the statement rather than at scope exit, which is what let _a = foo() does.
I am unable to find any official description of this, nor any rationale for this syntax.
I'm interested in a few inter-twined things:
Is there even a mention of it in the official documentation?
What is the history behind this choice? Is it simply natural fall-out from Rust's binding / destructuring rules? Is it something inherited from another language? Or does it have some other origin?
Is there some use-case this syntax addresses that could not have been achieved using explicit scoping?
Is it simply natural fall-out from Rust's binding / destructuring rules?
Yes. You use _ to indicate that you don't care about a value in a pattern and that it should not be bound in the first place. If a value is never bound to a variable, there's nothing to hold on to the value, so it must be dropped.
All the Places Patterns Can Be Used:
match Arms
Conditional if let Expressions
while let Conditional Loops
for Loops
let Statements
Function Parameters
Is there even a mention of it in the official documentation?
Ignoring an Entire Value with _
Of note is that _ isn't a valid identifier, thus you can't use it as a name:
fn main() {
    let _ = 42;
    println!("{}", _);
}
error: expected expression, found reserved identifier `_`
--> src/main.rs:3:20
|
3 | println!("{}", _);
| ^ expected expression
achieved using explicit scoping
I suppose you could have gone this route and made expressions doing this just "hang around" until the scope was over, but I don't see any value to it:
let _ = vec![5];
vec![5]; // Equivalent
// Gotta wait for the scope to end to clean these up, or call `drop` explicitly
The only reason that you'd use let _ = foo() is when the function requires that you use its result, and you know that you don't need it. Otherwise, this:
let _ = foo();
is exactly the same as this:
foo();
For example, suppose foo has a signature like this:
fn foo() -> Result<String, ()>;
You will get a warning if you don't use the result, because Result has the #[must_use] attribute. Destructuring and ignoring the result immediately is a concise way of avoiding this warning in cases where you know it's ok, without introducing a new variable that lasts for the full scope.
If you didn't pattern match against the result then the value would be dropped as soon as the foo function returns. It seems reasonable that Rust would behave the same regardless of whether you explicitly said you don't want it or just didn't use it.
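A small sketch (the Noisy type here is invented) makes the timing difference observable through Drop:

struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    let _ = Noisy("a");  // not bound: dropped right here, at the end of this statement
    let _b = Noisy("b"); // bound: lives until the end of main
    println!("end of main");
}

This prints "dropping a", then "end of main", then "dropping b".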
I made a two element Vector struct and I want to overload the + operator.
I made all my functions and methods take references, rather than values, and I want the + operator to work the same way.
impl Add for Vector {
    fn add(&self, other: &Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}
Depending on which variation I try, I either get lifetime problems or type mismatches. Specifically, the &self argument seems to not get treated as the right type.
I have seen examples with template arguments on impl as well as Add, but they just result in different errors.
I found How can an operator be overloaded for different RHS types and return values? but the code in the answer doesn't work even if I put a use std::ops::Mul; at the top.
I am using rustc 1.0.0-nightly (ed530d7a3 2015-01-16 22:41:16 +0000)
I won't accept "you only have two fields, why use a reference" as an answer; what if I wanted a 100-element struct? I will accept an answer that demonstrates that even with a large struct I should be passing by value, if that is the case (I don't think it is, though). I am interested in knowing a good rule of thumb for struct size and passing by value vs. by reference, but that is not the current question.
You need to implement Add on &Vector rather than on Vector.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;

    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}
In its definition, Add::add always takes self by value. But references are types like any other1, so they can implement traits too. When a trait is implemented on a reference type, the type of self is a reference; the reference is passed by value. Normally, passing by value in Rust implies transferring ownership, but when references are passed by value, they're simply copied (or reborrowed/moved if it's a mutable reference), and that doesn't transfer ownership of the referent (because a reference doesn't own its referent in the first place). Considering all this, it makes sense for Add::add (and many other operators) to take self by value: if you need to take ownership of the operands, you can implement Add on structs/enums directly, and if you don't, you can implement Add on references.
Here, self is of type &'a Vector, because that's the type we're implementing Add on.
Note that I also specified the RHS type parameter with a different lifetime to emphasize the fact that the lifetimes of the two input parameters are unrelated.
1 Actually, reference types are special in that you can implement traits for references to types defined in your crate (i.e. if you're allowed to implement a trait for T, then you're also allowed to implement it for &T). &mut T and Box<T> have the same behavior, but that's not true in general for U<T> where U is not defined in the same crate.
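For completeness, a usage sketch of the implementation above (assuming a plain Vector with i32 fields, which the question doesn't spell out):

use std::ops::Add;

#[derive(Debug)]
struct Vector {
    x: i32,
    y: i32,
}

impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;

    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let a = Vector { x: 1, y: 2 };
    let b = Vector { x: 3, y: 4 };
    let c = &a + &b; // both operands are only borrowed, so `a` and `b` stay usable
    println!("{:?} {:?} {:?}", a, b, c);
}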
If you want to support all scenarios, you must support all the combinations:
&T op U
T op &U
&T op &U
T op U
In Rust itself (the standard library), this was done through an internal macro.
Luckily, there is a Rust crate, impl_ops, that offers a macro to write that boilerplate for us: its impl_op_ex! macro generates all the combinations.
Here is their sample:
#[macro_use] extern crate impl_ops;
use std::ops;

impl_op_ex!(+ |a: &DonkeyKong, b: &DonkeyKong| -> i32 { a.bananas + b.bananas });

fn main() {
    let total_bananas = &DonkeyKong::new(2) + &DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = &DonkeyKong::new(2) + DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = DonkeyKong::new(2) + &DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = DonkeyKong::new(2) + DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
}
Even better, they have an impl_op_ex_commutative! macro that will also generate the operators with the parameters reversed, if your operator happens to be commutative.