I am having trouble understanding the pattern matching rules in Rust. I originally thought that the idea behind patterns is to match the left-hand side against the right-hand side, like so:
struct S {
x: i32,
y: (i32, i32)
}
let S { x: a, y: (b, c) } = S { x: 1, y: (2, 3) };
// `a` matches `1`, `(b, c)` matches `(2, 3)`
However, when we want to bind a reference to a value on the right-hand side, we need to use the ref keyword.
let &(ref a, ref b) = &(3, 4);
This feels rather inconsistent.
Why can't we use the dereferencing operator * to match the left-hand side and right-hand side like this?
let &(*a, *b) = &(3, 4);
// `*a` matches `3`, `*b` matches `4`
Why isn't this the way patterns work in Rust? Is there a reason why this isn't the case, or have I totally misunderstood something?
Using the dereferencing operator would be very confusing in this case. ref effectively takes a reference to the value. These are more-or-less equivalent:
let bar1 = &42;
let ref bar2 = 42;
Note that in let &(ref a, ref b) = &(3, 4), a and b both have the type &i32: they are references. Also note that since the introduction of match ergonomics, let (a, b) = &(3, 4) is equivalent and shorter.
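As a rough check of the equivalences above, here is a minimal sketch. The function names sum_with_ref and sum_with_ergonomics are made up for illustration; the type annotations are there only so the compiler confirms that each binding really is a &i32.

```rust
// Explicit form: `&` in the pattern matches the reference,
// and `ref` makes each binding a reference into the tuple.
fn sum_with_ref() -> i32 {
    let &(ref a, ref b) = &(3, 4);
    // Compile-time check: both bindings are `&i32`.
    let (a, b): (&i32, &i32) = (a, b);
    *a + *b
}

// Match-ergonomics form: no `&` and no `ref`, same binding types.
fn sum_with_ergonomics() -> i32 {
    let (a, b) = &(3, 4);
    let (a, b): (&i32, &i32) = (a, b);
    *a + *b
}

fn main() {
    let bar1 = &42;
    let ref bar2 = 42;
    // Both bindings are references to an `i32`.
    let _: &i32 = bar1;
    let _: &i32 = bar2;
    assert_eq!(bar1, bar2);

    assert_eq!(sum_with_ref(), sum_with_ergonomics());
    println!("both forms bind references");
}
```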
Furthermore, the ampersand (&) and asterisk (*) symbols are used for types. As you mention, pattern matching wants to "line up" the value with the pattern. The ampersand is already used to match and remove one layer of references in patterns:
let foo: &i32 = &42;
match foo {
&v => println!("{}", v),
}
By analogy, it's possible that some variant of this syntax might be supported in the future for raw pointers:
let foo: *const i32 = std::ptr::null();
match foo {
*v => println!("{}", v),
}
Since both ampersand and asterisk could be used to remove one layer of reference/pointer, they cannot be used to add one layer. Thus some new keyword was needed and ref was chosen.
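To make the "remove a layer" versus "add a layer" distinction concrete, here is a small sketch. The helper names strip_reference and first_len are hypothetical, chosen only for this example.

```rust
// `&v` in a pattern removes one layer of reference:
// `foo` is an `&i32`, so `v` is a plain `i32` (copied out).
fn strip_reference(foo: &i32) -> i32 {
    match foo {
        &v => v,
    }
}

// `ref` adds one layer of reference: `first` is a `&String` that
// borrows from the tuple. Without `ref`, this pattern would try to
// move the `String` out of borrowed data and would not compile.
fn first_len(pair: &(String, String)) -> usize {
    let &(ref first, _) = pair;
    first.len()
}

fn main() {
    assert_eq!(strip_reference(&42), 42);

    let pair = (String::from("abc"), String::from("de"));
    assert_eq!(first_len(&pair), 3);
    // `pair` is still fully usable: nothing was moved out of it.
    println!("{} {}", pair.0, pair.1);
}
```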
See also:
Meaning of '&variable' in arguments/patterns
What is the syntax to match on a reference to an enum?
How can the ref keyword be avoided when pattern matching in a function taking &self or &mut self?
How does Rust pattern matching determine if the bound variable will be a reference or a value?
Why does pattern matching on &Option<T> yield something of type Some(&T)?
In this specific case, you can achieve the same with neither ref nor asterisk:
fn main() {
let (a, b) = &(3, 4);
show_type_name(a);
show_type_name(b);
}
fn show_type_name<T>(_: T) {
println!("{}", std::any::type_name::<T>()); // rust 1.38.0 and above
}
It shows both a and b to be of type &i32. This ergonomics feature is called binding modes.
But that still doesn't answer the question of why the ref pattern exists in the first place. I don't think there is a definitive answer to that. The syntax for identifier patterns simply settled on what it is now.
I am a beginner in Rust, following rust-lang/book.
In its ch. 10.3, Validating References with Lifetimes, there is a Listing 10-20:
fn main() {
let string1 = String::from("abcd");
let string2 = "xyz";
let result = longest(string1.as_str(), string2);
println!("The longest string is {}", result);
}
fn longest(x: &str, y: &str) -> &str { // <-- ERROR
if x.len() > y.len() {
x
} else {
y
}
}
They mention two points:
Rust can’t tell whether the reference being returned refers to x or y. // <-- this seems unnecessary to me
We also don’t know the concrete lifetimes of the references that will be passed in, to determine whether the reference we return will always be valid.
In the code below, there is no error (as expected):
fn main() {
let string1 = String::from("abcd") ;
let string2 = "xyz";
let x: &str = &string1.as_str();
let y: &str = &string2;
let result =
if x.len() > y.len() {
x
} else {
y
};
println!("The longest string is {}", result);
}
Confusion:
Why does Rust need to tell whether the reference being returned refers to x or y?
Silly question, but I want to know...
Edited
Solution:
Think of the function call as the customer and
the function as the seller.
In snippet one:
The function call expects to get back one of the values passed in as arguments.
But suppose the seller is biased, or accidentally returns a value other than its parameters, like this:
fn longest(x: &str, y: &str) -> &str {
let z = "Other String";
&z
}
Then both the function call and the function would get an error message,
even though the customer made no mistake.
Therefore, Rust uses lifetime annotations to ensure that the customer does not get an error for the seller's mistake.
This is similar to the reason TypeScript was introduced for JavaScript.
In snippet two:
The customer and the seller are the same function.
The related question is mentioned below:
Why are explicit lifetimes needed in Rust?
In the second snippet, the lifetime used is the shorter of x and y.
But Rust does not do lifetime inference (or any inference at all) across function boundaries. It always requires you to specify explicitly the types and lifetimes involved. Thus, the lifetime that was inferred in the second snippet needs to be specified explicitly in the first.
The most important reason for that is to avoid unintentional breakage. If function signatures were inferred, it would be too easy to break APIs accidentally. Thus Rust, by design, requires you to specify signatures explicitly.
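For reference, here is a sketch of the first snippet with the lifetime spelled out in the signature. The parameter name 'a is an arbitrary choice; it says that the returned reference is valid only as long as both arguments are, which is the "shorter of x and y" lifetime that was inferred in the second snippet.

```rust
// The signature is the contract: the result may borrow from either
// `x` or `y`, so it must not outlive either of them.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";

    let result = longest(string1.as_str(), string2);
    println!("The longest string is {}", result);
}
```

With this signature the caller can be checked against the signature alone, without the compiler looking at the body of longest.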
First Case
Suppose that Rust didn't give an error with your definition of longest(). Then it's possible to use longest() such that the returned address is stored in a variable that has a longer lifetime than the string slices passed in. For example, something like this:
let result: &str;
{
let x = String::from("welcome");
let y = String::from("bye");
result = longest(&x, &y);
} // `x` and `y` go out of scope, so `&x` and `&y` are no longer valid.
// This would be undefined behavior, because the data pointed to
// by `result` is no longer valid.
println!("result: {}", result);
Since result is used after x and y go out of scope, and result points to the data in either x or y, this would lead to undefined behavior. But Rust doesn't allow this; instead, the compiler forces you to ensure that the value returned by longest() has a sufficiently long lifetime.
So if the compiler didn't give an error with how you wrote longest(), then yes in your example there wouldn't be undefined behavior (because x, y, and result all have the same lifetime), but in general certain invocations of longest() and variables subsequently going out of scope could lead to undefined behavior. So to prevent this, Rust forces you to annotate the lifetimes to make sure the returned address has a long enough lifetime.
Second Case
The variables x, y, and result are all cleaned up at the same time when they go out of scope. So the address referenced by result is always valid whether it's the address of x or the address of y. So there's no error.
I have the following piece of code, which compiles using rustc v1.36:
enum Number {
Integer(i32),
Real(f32),
}
fn foo1(number: &mut Number) {
if let Number::Integer(n) = number {
let _y: &mut i32 = n;
}
}
fn foo2(number: &mut Number) {
if let &mut Number::Integer(ref mut n) = number {
let _y: &mut i32 = n;
}
}
Funnily enough, I can understand how 'foo2' does the matching, but not 'foo1', even though 'foo1' is the kind of code you will see in any Rust project. Can someone explain how the matching syntax in these two is equivalent? And does it extend to other code (structs?) as well?
This functionality was added in Rust 1.26, and is called 'default binding modes' (or 'match ergonomics', after the RFC that proposed it). It effectively allows pattern matching to automatically dereference values, and to add ref and ref mut to variable bindings where needed.
The rules for this behaviour are discussed in detail in the RFC, but it effectively boils down to:
Variable bindings within a pattern can be resolved in one of three modes:
'move' (the default), which will move the value.
'ref', which will immutably reference the value.
'ref mut', which will mutably reference the value.
When a variable binding is encountered within a pattern without an explicit ref, mut or ref mut, the current binding mode will be used.
When a reference is pattern matched using a non-reference pattern:
The value will be auto-dereferenced.
The binding mode may change for any nested patterns:
If the type of the reference is &T, the binding mode will change to 'ref'.
If the type of the reference is &mut T and the current binding mode is not 'ref', the binding mode will change to 'ref mut'.
This may sound complicated, but as you can see from the end result, it tends to line up with how you'd intuitively write the match!
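Here is a small sketch of those rules in action, reusing the Number enum from the question. The function names are invented for this example, and the derives are added only so the results can be compared.

```rust
#[derive(Debug, PartialEq)]
enum Number {
    Integer(i32),
    Real(f32),
}

// `number` is a `&mut Number` matched against a non-reference pattern,
// so it is auto-dereferenced and the binding mode becomes 'ref mut':
// `n` is a `&mut i32`.
fn increment_ergonomic(number: &mut Number) {
    if let Number::Integer(n) = number {
        *n += 1;
    }
}

// The same match with everything written out by hand.
fn increment_explicit(number: &mut Number) {
    if let &mut Number::Integer(ref mut n) = number {
        *n += 1;
    }
}

// With a `&Number` the binding mode becomes 'ref': `n` is a `&i32`.
fn integer_part(number: &Number) -> Option<&i32> {
    match number {
        Number::Integer(n) => Some(n),
        Number::Real(_) => None,
    }
}

fn main() {
    let mut a = Number::Integer(1);
    increment_ergonomic(&mut a);
    increment_explicit(&mut a);
    assert_eq!(a, Number::Integer(3));
    assert_eq!(integer_part(&a), Some(&3));

    // A non-matching variant is left untouched.
    let mut r = Number::Real(1.5);
    increment_ergonomic(&mut r);
    assert_eq!(r, Number::Real(1.5));
    assert_eq!(integer_part(&r), None);
}
```

The same rules apply to structs and tuples, since they are defined on patterns in general and not on enums specifically.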
So I have some JSON response content represented as a string and I want to get its property names.
What I am doing:
let properties = Newtonsoft.Json.Linq.JObject.Parse(responseContent).Properties()
let propertyNames, (jprop: JProperty) = properties.Select(jprop => jprop.Name);
According to this answer I needed to annotate the call to the extension method, however, I still get the error.
A unique overload for method 'Select' could not be determined based on type information prior to this program point. A type annotation may be needed. Candidates: (extension) Collections.Generic.IEnumerable.Select<'TSource,'TResult>(selector: Func<'TSource,'TResult>) : Collections.Generic.IEnumerable<'TResult>, (extension) Collections.Generic.IEnumerable.Select<'TSource,'TResult>(selector: Func<'TSource,int,'TResult>) : Collections.Generic.IEnumerable<'TResult>
Am I doing something wrong?
First, the syntax x => y you're trying to use is C# syntax for lambda expressions, not F# syntax. In F#, the correct syntax for lambda-expressions is fun x -> y.
Second, the syntax let a, b = c means "destructure the pair". For example:
let pair = (42, "foo")
let a, b = pair // Here, a = 42 and b = "foo"
You can provide a type annotation for one of the pair elements:
let a, (b: string) = pair
But this won't have any effect on pair the way you apparently expect it to work.
In order to provide type annotation for the argument of a lambda expression, just annotate the argument, what could be simpler?
fun (x: string) -> y
So, putting all of the above together, this is how your line should look:
let propertyNames = properties.Select(fun (jprop: JProperty) -> jprop.Name)
(also, note the absence of semicolon at the end. F# doesn't require semicolons)
If you have this level of difficulty with basic syntax, I suggest you read up on F# and work your way through a few examples before trying to implement something complex.
I made a two element Vector struct and I want to overload the + operator.
I made all my functions and methods take references, rather than values, and I want the + operator to work the same way.
impl Add for Vector {
fn add(&self, other: &Vector) -> Vector {
Vector {
x: self.x + other.x,
y: self.y + other.y,
}
}
}
Depending on which variation I try, I either get lifetime problems or type mismatches. Specifically, the &self argument seems to not get treated as the right type.
I have seen examples with template arguments on impl as well as Add, but they just result in different errors.
I found How can an operator be overloaded for different RHS types and return values? but the code in the answer doesn't work even if I put a use std::ops::Mul; at the top.
I am using rustc 1.0.0-nightly (ed530d7a3 2015-01-16 22:41:16 +0000)
I won't accept "you only have two fields, why use a reference" as an answer; what if I wanted a 100 element struct? I will accept an answer that demonstrates that even with a large struct I should be passing by value, if that is the case (I don't think it is, though.) I am interested in knowing a good rule of thumb for struct size and passing by value vs. by reference, but that is not the current question.
You need to implement Add on &Vector rather than on Vector.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
type Output = Vector;
fn add(self, other: &'b Vector) -> Vector {
Vector {
x: self.x + other.x,
y: self.y + other.y,
}
}
}
In its definition, Add::add always takes self by value. But references are types like any other [1], so they can implement traits too. When a trait is implemented on a reference type, the type of self is a reference; the reference is passed by value. Normally, passing by value in Rust implies transferring ownership, but when references are passed by value, they're simply copied (or reborrowed/moved if it's a mutable reference), and that doesn't transfer ownership of the referent (because a reference doesn't own its referent in the first place). Considering all this, it makes sense for Add::add (and many other operators) to take self by value: if you need to take ownership of the operands, you can implement Add on structs/enums directly, and if you don't, you can implement Add on references.
Here, self is of type &'a Vector, because that's the type we're implementing Add on.
Note that I also specified the RHS type parameter with a different lifetime to emphasize the fact that the lifetimes of the two input parameters are unrelated.
[1] Actually, reference types are special in that you can implement traits for references to types defined in your crate (i.e. if you're allowed to implement a trait for T, then you're also allowed to implement it for &T). &mut T and Box<T> have the same behavior, but that's not true in general for U<T> where U is not defined in the same crate.
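Put together as a complete program, the impl above might look like the sketch below. The question does not show the struct definition, so the f64 field types and the derives here are assumptions.

```rust
use std::ops::Add;

// Assumed definition: the question does not show the field types.
#[derive(Debug, PartialEq)]
struct Vector {
    x: f64,
    y: f64,
}

// Implemented on `&Vector`, so `self` is a reference passed by value.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;

    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let a = Vector { x: 1.0, y: 2.0 };
    let b = Vector { x: 3.0, y: 4.0 };

    // Only references are consumed by `+`, so `a` and `b` stay usable.
    let sum = &a + &b;
    assert_eq!(sum, Vector { x: 4.0, y: 6.0 });
    println!("{:?} + {:?} = {:?}", a, b, sum);
}
```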
If you want to support all scenarios, you must support all the combinations:
&T op U
T op &U
&T op &U
T op U
In Rust itself (the standard library), this is done through an internal macro.
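Written out by hand, the four combinations could look like the following sketch. The Vector type with i32 fields and the helper v are assumptions made for the example; only the reference-plus-reference impl does real work, and the other three forward to it.

```rust
use std::ops::Add;

#[derive(Debug, PartialEq)]
struct Vector {
    x: i32,
    y: i32,
}

// &T + &U: the one real implementation.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;
    fn add(self, rhs: &'b Vector) -> Vector {
        Vector { x: self.x + rhs.x, y: self.y + rhs.y }
    }
}

// T + U: borrow both operands and forward.
impl Add<Vector> for Vector {
    type Output = Vector;
    fn add(self, rhs: Vector) -> Vector {
        &self + &rhs
    }
}

// T + &U
impl<'b> Add<&'b Vector> for Vector {
    type Output = Vector;
    fn add(self, rhs: &'b Vector) -> Vector {
        &self + rhs
    }
}

// &T + U
impl<'a> Add<Vector> for &'a Vector {
    type Output = Vector;
    fn add(self, rhs: Vector) -> Vector {
        self + &rhs
    }
}

// Hypothetical helper to keep the calls below short.
fn v(x: i32, y: i32) -> Vector {
    Vector { x, y }
}

fn main() {
    assert_eq!(&v(1, 2) + &v(3, 4), v(4, 6));
    assert_eq!(v(1, 2) + v(3, 4), v(4, 6));
    assert_eq!(v(1, 2) + &v(3, 4), v(4, 6));
    assert_eq!(&v(1, 2) + v(3, 4), v(4, 6));
}
```

This is exactly the kind of repetition that a macro removes.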
Luckily, there is a rust crate, impl_ops, that also offers a macro to write that boilerplate for us: the crate offers the impl_op_ex! macro, which generates all the combinations.
Here is their sample:
#[macro_use] extern crate impl_ops;
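use std::ops;
// DonkeyKong is not defined in the excerpt; a minimal definition is assumed here.
struct DonkeyKong {
bananas: i32,
}
impl DonkeyKong {
fn new(bananas: i32) -> DonkeyKong {
DonkeyKong { bananas }
}
}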
impl_op_ex!(+ |a: &DonkeyKong, b: &DonkeyKong| -> i32 { a.bananas + b.bananas });
fn main() {
let total_bananas = &DonkeyKong::new(2) + &DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = &DonkeyKong::new(2) + DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = DonkeyKong::new(2) + &DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = DonkeyKong::new(2) + DonkeyKong::new(4);
assert_eq!(6, total_bananas);
}
Even better, they have an impl_op_ex_commutative! macro that also generates the operators with the parameters reversed, if your operator happens to be commutative.
Ok, so let's say I have a type defined like so:
type Foo =
| Bar of (SomeType * SomeType * SomeType * SomeType)
| ...(other defs)
So I have a Bar, which is basically a tuple of four SomeTypes. I want to access individual members of the tuple. I tried this:
let Bar (one, two, three, four) = someBar
But when I try to refer to one or two later on in the function, it says "the value or constructor is not defined". So it is not treating the assignment as I expected. What is the correct way to do this?
Also, if I try:
let one,two,three,four = someBar
It complains:
someBar was expected to have type 'a*'b*'c*'d but here has type Foo
thanks,
You just need to add another set of parentheses:
let (Bar(one,two,three,four)) = someBar
As Stephen points out, without the additional parens the compiler treats this line of code as the definition of a new function called Bar. He is also right that pattern matching would probably be more appropriate if there are other cases in the discriminated union.
Given
type Foo =
| Bar of (int * int * int * int)
| Bar2 of string
let x = Bar(1,2,3,4)
let Bar(y1,y2,y3,y4) = x
the last let binding is interpreted as a function, Bar : 'a * 'b * 'c * 'd -> Foo. The function name is throwing you off, since it is the same as your union case, but it's the same as if you had defined let some_func_takes_a_tuple_and_returns_x (y1,y2,y3,y4) = x.
I think you may have to be a little more verbose:
let y1,y2,y3,y4 =
match x with
| Bar(y1,y2,y3,y4) -> y1,y2,y3,y4
Which is fair enough, since unlike tuple decomposition let bindings, decomposing Bar here is dangerous because the match is incomplete (x could actually be some other Foo case, like Bar2).
Edit
@kvb knows the secret to making this work as you expect!