What does an @ symbol mean in a Rust declarative macro? - syntax

I have seen the @ symbol used in macros but I cannot find mention of it in the Rust Book or in any official documentation or blog posts. For example, in this Stack Overflow answer it is used like this:
macro_rules! instructions {
    (enum $ename:ident {
        $($vname:ident ( $($vty: ty),* )),*
    }) => {
        enum $ename {
            $($vname ( $($vty),* )),*
        }
        impl $ename {
            fn len(&self) -> usize {
                match self {
                    $($ename::$vname(..) => instructions!(@count ($($vty),*))),*
                }
            }
        }
    };
    (@count ()) => (0);
    (@count ($a:ty)) => (1);
    (@count ($a:ty, $b:ty)) => (2);
    (@count ($a:ty, $b:ty, $c:ty)) => (3);
}
instructions! {
    enum Instruction {
        None(),
        One(u8),
        Two(u8, u8),
        Three(u8, u8, u8)
    }
}
fn main() {
    println!("{}", Instruction::None().len());
    println!("{}", Instruction::One(1).len());
    println!("{}", Instruction::Two(1, 2).len());
    println!("{}", Instruction::Three(1, 2, 3).len());
}
From the usage, it appears that it is used for declaring another macro that is local to the main one.
What does this symbol mean and why would you use it rather than just creating another top-level macro?

In the pattern-matching part of a macro, symbols can mean whatever the author desires them to mean. A leading @ symbol is often used to denote an "implementation detail" of the macro: a part of the macro that an external user is not expected to use.
In this example, I used it to pattern-match the tuple parameters to get a count of the tuple parameters.
Outside of macros, the @ symbol is used to match a pattern while also assigning a name to the entire pattern:
match age {
    x @ 0 => println!("0: {}", x),
    y @ 1 => println!("1: {}", y),
    z => println!("{}", z),
}
With a bit of a stretch, this same logic can be applied to the use in the macro: we are pattern-matching the tuple, but also attaching a name to that specific pattern. I think I've even seen people use something even more parallel: (count @ .... However, The Little Book of Rust Macros points out:
The reason for using @ is that, as of Rust 1.2, the @ token is not used in prefix position; as such, it cannot conflict with anything. Other symbols or unique prefixes may be used as desired, but use of @ has started to become widespread, so using it may aid readers in understanding your code.
rather than just creating another top-level macro
Creating another macro is likely better practice, but only in modern Rust. Before recent changes to Rust made it possible to import macros directly, having multiple macros could be tricky for end users who tried to selectively import macros.
See also:
Internal rules in The Little Book of Rust Macros
Why must I use macros only used by my dependencies
What does the '@' symbol do in Rust?
Appendix B: Operators and Symbols
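For comparison, the "separate top-level macro" alternative discussed in the answer could be sketched as below. The helper name count_tys! is hypothetical (it is not in the original answer), and it counts recursively instead of listing one rule per arity; treat it as one possible shape, not the canonical one.

```rust
// Hypothetical helper macro: counts a comma-separated list of types.
macro_rules! count_tys {
    () => { 0usize };
    ($head:ty $(, $tail:ty)*) => { 1usize + count_tys!($($tail),*) };
}

macro_rules! instructions {
    (enum $ename:ident {
        $($vname:ident ( $($vty:ty),* )),*
    }) => {
        enum $ename {
            $($vname ( $($vty),* )),*
        }

        impl $ename {
            fn len(&self) -> usize {
                match self {
                    // Delegates to the helper instead of an internal `@count` rule.
                    $($ename::$vname(..) => count_tys!($($vty),*)),*
                }
            }
        }
    };
}

instructions! {
    enum Instruction {
        None(),
        One(u8),
        Two(u8, u8),
        Three(u8, u8, u8)
    }
}

fn main() {
    println!("{}", Instruction::None().len());
    println!("{}", Instruction::Three(1, 2, 3).len());
}
```

The trade-off is the one the answer describes: the helper is now a second public name that users of the macro have to have in scope, whereas an internal rule stays hidden inside instructions!.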

Related

Confusion in Validating References with Lifetimes in Rust [duplicate]

This question already has answers here:
Why are explicit lifetimes needed in Rust?
(10 answers)
Semantics of lifetime parameters
(2 answers)
Closed 6 months ago.
I am a beginner in Rust, following rust-lang/book.
In its chapter 10.3, Validating References with Lifetimes, there is Listing 10-20:
fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";
    let result = longest(string1.as_str(), string2);
    println!("The longest string is {}", result);
}
fn longest(x: &str, y: &str) -> &str { // <-- ERROR
    if x.len() > y.len() {
        x
    } else {
        y
    }
}
They mention two points:
Rust can’t tell whether the reference being returned refers to x or y. // <-- not needed, in my opinion
We also don’t know the concrete lifetimes of the references that will be passed in, to determine whether the reference we return will always be valid.
In the code below, there is no error (as expected):
fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";
    let x: &str = &string1.as_str();
    let y: &str = &string2;
    let result =
        if x.len() > y.len() {
            x
        } else {
            y
        };
    println!("The longest string is {}", result);
}
Confusion:
Why does Rust need to tell whether the reference being returned refers to x or y?
Silly question, but I want to know...
Edited
Solution:
Think of the function call as the customer and the function as the seller.
In snippet one:
The function call expects to get back one of the values it passed as arguments.
But the seller may be biased, or may accidentally return a value other than its parameters, like this:
fn longest(x: &str, y: &str) -> &str {
    let z = "Other String";
    &z
}
Then both the function call and the function would get an error message,
even though the customer made no mistake.
Therefore, with the help of lifetime annotations, Rust ensures that the customer does not get an error for the seller's mistake.
This is also the reason TypeScript was introduced for JavaScript.
In snippet two:
The customer and the seller are the same function.
The related question is mentioned below:
Why are explicit lifetimes needed in Rust?
In the second snippet, the lifetime used is the shorter of x and y.
But Rust does not do lifetime inference (or any inference at all) across function boundaries. It always requires you to specify explicitly the types and lifetimes involved. Thus, the lifetime that was inferred in the second snippet needs to be specified explicitly in the first.
The most important reason for that is to avoid unintentional breakage. If function signatures were inferred, it would be too easy to break APIs accidentally. Thus Rust by design requires you to specify signatures explicitly.
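To make "specified explicitly" concrete, here is a minimal sketch of the annotated signature, following the usual form shown in the Rust Book. The single lifetime 'a says the returned reference is valid only as long as both inputs are.

```rust
// `'a` ties the returned reference to both inputs: the caller may use the
// result only while both `x` and `y` are still valid.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";
    let result = longest(string1.as_str(), string2);
    println!("The longest string is {}", result);
}
```

With this signature the compiler can check every call site without looking at the function body, and it can check the body without knowing any call site.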
First Case
Suppose that Rust didn't give an error with your definition of longest(). Then it would be possible to use longest() such that the returned address is stored in a variable that has a longer lifetime than the string slices passed in. For example, something like this:
let result: &str;
{
    let x = String::from("welcome");
    let y = String::from("bye");
    result = longest(&x, &y);
} // `x` and `y` go out of scope, so `&x` and `&y` are no longer valid.
// This would be undefined behavior, because the data pointed to
// by `result` is no longer valid.
println!("result: {}", result);
Since result is used after x and y go out of scope, and result points to the data in either x or y, this would lead to undefined behavior. But Rust doesn't allow this; instead, the Rust compiler forces you to make sure that the value returned by longest() has a sufficiently long lifetime.
So if the compiler didn't give an error with how you wrote longest(), then yes, in your example there wouldn't be undefined behavior (because x, y, and result all have the same lifetime), but in general certain invocations of longest() followed by variables going out of scope could lead to undefined behavior. To prevent this, Rust forces you to annotate the lifetimes to make sure the returned address has a long enough lifetime.
Second Case
The variables x, y, and result are all cleaned up at the same time when they go out of scope. So the address referenced by result is always valid whether it's the address of x or the address of y. So there's no error.

Matching borrowed enum - why is this syntax equivalent? [duplicate]

This question already has an answer here:
Why does pattern matching on &Option<T> yield something of type Some(&T)?
(1 answer)
Closed 3 years ago.
I have the following piece of code, which compiles using rustc v1.36:
enum Number {
    Integer(i32),
    Real(f32),
}
fn foo1(number: &mut Number) {
    if let Number::Integer(n) = number {
        let _y: &mut i32 = n;
    }
}
fn foo2(number: &mut Number) {
    if let &mut Number::Integer(ref mut n) = number {
        let _y: &mut i32 = n;
    }
}
Funnily enough, I can understand how 'foo2' does the matching, but not 'foo1', even though 'foo1' is the kind of code you will see in any Rust project. Can someone explain how the matching syntax in these two is equivalent? And does it extend to other code (structs?) as well?
This functionality was added in Rust 1.26, and is called 'default binding modes' (or 'match ergonomics', after the RFC that proposed it). It effectively allows pattern matching to automatically dereference values, and to add ref and ref mut to variable bindings where needed.
The rules for this behaviour are discussed in detail in the RFC, but it effectively boils down to:
Variable bindings within a pattern can be resolved in one of three modes:
'move' (the default), which will move the value.
'ref', which will immutably reference the value.
'ref mut', which will mutably reference the value.
When a variable binding is encountered within a pattern without an explicit ref, mut or ref mut, the current binding mode will be used.
When a reference is pattern matched using a non-reference pattern:
The value will be auto-dereferenced.
The binding mode may change for any nested patterns:
If the type of the reference is &T, the binding mode will change to 'ref'.
If the type of the reference is &mut T and the current binding mode is not 'ref', the binding mode will change to 'ref mut'.
This may sound complicated, but as you can see from the end result, it tends to line up with how you'd intuitively write the match!
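As a small illustration of those rules, the sketch below matches a `&mut Number` and a `&Number` with non-reference patterns. The function names are made up for this example; the type annotations in the bodies only compile because the bindings pick up 'ref mut' and 'ref' mode respectively.

```rust
enum Number {
    Integer(i32),
    Real(f32),
}

// Matching `&mut Number` with a non-reference pattern: the binding mode
// switches to 'ref mut', so `n` is a `&mut i32`.
fn bump_implicit(number: &mut Number) {
    if let Number::Integer(n) = number {
        let n: &mut i32 = n;
        *n += 1;
    }
}

// The same thing spelled out the pre-1.26 way.
fn bump_explicit(number: &mut Number) {
    if let &mut Number::Integer(ref mut n) = number {
        *n += 1;
    }
}

// Matching `&Number`: the binding mode switches to 'ref', so the
// bindings are `&i32` and `&f32`.
fn describe(number: &Number) -> String {
    match number {
        Number::Integer(n) => {
            let n: &i32 = n;
            format!("integer {}", n)
        }
        Number::Real(r) => format!("real {}", r),
    }
}

fn main() {
    let mut a = Number::Integer(1);
    bump_implicit(&mut a);
    bump_explicit(&mut a);
    println!("{}", describe(&a));
    println!("{}", describe(&Number::Real(1.5)));
}
```

The same mechanism applies to struct and tuple patterns, not only to enums.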

Why is there no semicolon after a macro call?

I am following some tutorial I found on Rust, and I ran across something that my Java/C/C++ mind cannot comprehend:
impl fmt::Display for Matrix {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})\n({}, {})", self.0, self.1, self.2, self.3)
    }
}
I don't understand the lack of semicolon at the end of the write! macro call. I get an error from the compiler if I add it.
I am guessing that if the semicolon is not there then the Result from write! is used as return value of fmt, but can anybody provide a more specific explanation to why that is and if it always applies?
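The write! macro uses the write_fmt() method from either std::fmt::Write or std::io::Write; both return a Result<(), Error> (with fmt::Error and io::Error respectively), and you need to omit the semicolon so that this value becomes the return value of fmt(). With a trailing semicolon, the function body would evaluate to () instead, which does not match the declared fmt::Result return type.
From The Rust Book, 1st edition: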
Rust functions return exactly one value, and you declare the type
after an ‘arrow’, which is a dash (-) followed by a greater-than sign
(>). The last line of a function determines what it returns. You’ll
note the lack of a semicolon here. If we added it in we would
get an error.
This reveals two interesting things about Rust: it is an
expression-based language, and semicolons are different from
semicolons in other ‘curly brace and semicolon’-based languages.
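A minimal sketch of the same idea: the Matrix type here is assumed to be a tuple struct of four i32 values (the question does not show its definition), and double/double_explicit are illustrative names showing that a trailing expression and an explicit return are interchangeable.

```rust
use std::fmt;

// Assumed definition; the tutorial's actual field types are not shown.
struct Matrix(i32, i32, i32, i32);

impl fmt::Display for Matrix {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // No trailing semicolon: the value of `write!` (a `fmt::Result`)
        // is the value of the block, and therefore the return value.
        write!(f, "({}, {})\n({}, {})", self.0, self.1, self.2, self.3)
    }
}

// Tail expression: the last expression is returned.
fn double(x: i32) -> i32 {
    x * 2
}

// Equivalent, using an explicit `return` statement.
fn double_explicit(x: i32) -> i32 {
    return x * 2;
}

fn main() {
    println!("{}", Matrix(1, 2, 3, 4));
    println!("{} {}", double(21), double_explicit(21));
}
```

So the rule is general: it applies to any block, not only to macro calls at the end of a function.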

Why is it that traits for operator overloading require ownership of self? [duplicate]

I made a two element Vector struct and I want to overload the + operator.
I made all my functions and methods take references, rather than values, and I want the + operator to work the same way.
impl Add for Vector {
    fn add(&self, other: &Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}
Depending on which variation I try, I either get lifetime problems or type mismatches. Specifically, the &self argument seems to not get treated as the right type.
I have seen examples with template arguments on impl as well as Add, but they just result in different errors.
I found How can an operator be overloaded for different RHS types and return values? but the code in the answer doesn't work even if I put a use std::ops::Mul; at the top.
I am using rustc 1.0.0-nightly (ed530d7a3 2015-01-16 22:41:16 +0000)
I won't accept "you only have two fields, why use a reference" as an answer; what if I wanted a 100 element struct? I will accept an answer that demonstrates that even with a large struct I should be passing by value, if that is the case (I don't think it is, though.) I am interested in knowing a good rule of thumb for struct size and passing by value vs. by reference, but that is not the current question.
You need to implement Add on &Vector rather than on Vector.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;
    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}
In its definition, Add::add always takes self by value. But references are types like any other [1], so they can implement traits too. When a trait is implemented on a reference type, the type of self is a reference; the reference is passed by value. Normally, passing by value in Rust implies transferring ownership, but when references are passed by value, they're simply copied (or reborrowed/moved if it's a mutable reference), and that doesn't transfer ownership of the referent (because a reference doesn't own its referent in the first place). Considering all this, it makes sense for Add::add (and many other operators) to take self by value: if you need to take ownership of the operands, you can implement Add on structs/enums directly, and if you don't, you can implement Add on references.
Here, self is of type &'a Vector, because that's the type we're implementing Add on.
Note that I also specified the RHS type parameter with a different lifetime to emphasize the fact that the lifetimes of the two input parameters are unrelated.
[1] Actually, reference types are special in that you can implement traits for references to types defined in your crate (i.e. if you're allowed to implement a trait for T, then you're also allowed to implement it for &T). &mut T and Box<T> have the same behavior, but that's not true in general for U<T> where U is not defined in the same crate.
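Put together as a complete program, the impl above could be used like this. The Vector definition (two f64 fields) is an assumption, since the question does not show it.

```rust
use std::ops::Add;

// Assumed definition of the asker's two-element struct.
#[derive(Debug, PartialEq)]
struct Vector {
    x: f64,
    y: f64,
}

impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;

    // `self` is a `&'a Vector` here: the reference itself is passed by value.
    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let a = Vector { x: 1.0, y: 2.0 };
    let b = Vector { x: 3.0, y: 4.0 };
    let sum = &a + &b;
    // `a` and `b` were only borrowed, so they are still usable.
    println!("{:?} + {:?} = {:?}", a, b, sum);
}
```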
If you want to support all scenarios, you must support all the combinations:
&T op U
T op &U
&T op &U
T op U
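Written by hand, the four combinations might look like the sketch below: one real implementation on references, and three thin impls that forward to it. The Vector type is again an assumed two-field struct, and this is only one way to lay it out.

```rust
use std::ops::Add;

#[derive(Debug, Clone, PartialEq)]
struct Vector {
    x: f64,
    y: f64,
}

// &T op &U: the one real implementation.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;
    fn add(self, other: &'b Vector) -> Vector {
        Vector { x: self.x + other.x, y: self.y + other.y }
    }
}

// T op U: consumes both operands, forwards to the reference impl.
impl Add<Vector> for Vector {
    type Output = Vector;
    fn add(self, other: Vector) -> Vector {
        &self + &other
    }
}

// T op &U
impl<'b> Add<&'b Vector> for Vector {
    type Output = Vector;
    fn add(self, other: &'b Vector) -> Vector {
        &self + other
    }
}

// &T op U
impl<'a> Add<Vector> for &'a Vector {
    type Output = Vector;
    fn add(self, other: Vector) -> Vector {
        self + &other
    }
}

fn main() {
    let a = Vector { x: 1.0, y: 2.0 };
    let b = Vector { x: 3.0, y: 4.0 };
    println!("{:?}", &a + &b);
    println!("{:?}", a.clone() + &b);
    println!("{:?}", &a + b.clone());
    println!("{:?}", a + b);
}
```

This is exactly the kind of repetition that a macro removes.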
In Rust itself (the standard library), this is done through an internal macro.
Luckily, there is a Rust crate, impl_ops, that offers a macro to write that boilerplate for us: the impl_op_ex! macro, which generates all the combinations.
Here is their sample:
#[macro_use] extern crate impl_ops;
use std::ops;
impl_op_ex!(+ |a: &DonkeyKong, b: &DonkeyKong| -> i32 { a.bananas + b.bananas });
fn main() {
    let total_bananas = &DonkeyKong::new(2) + &DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = &DonkeyKong::new(2) + DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = DonkeyKong::new(2) + &DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = DonkeyKong::new(2) + DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
}
Even better, they have an impl_op_ex_commutative! macro that will also generate the operators with the parameters reversed if your operator happens to be commutative.

General-purpose language to specify value constraints

I am looking for a general-purpose way of defining textual expressions which allow a value to be validated.
For example, I have a value which should only be set to 1, 2, 3, 10, 11, or 12.
Its constraint might be defined as: (value >= 1 && value <= 3) || (value >= 10 && value <= 12)
Or another value which can be 1, 3, 5, 7, 9 etc... would have a constraint like value % 2 == 1 or IsOdd(value).
(To help the user correct invalid values, I'd like to show the constraint - so something descriptive like IsOdd is preferable.)
These constraints would be evaluated both on client-side (after user input) and server-side.
Therefore a multi-platform solution would be ideal (specifically Win C#/Linux C++).
Is there an existing language/project which allows evaluation or parsing of similar simple expressions?
If not, where might I start creating my own?
I realise this question is somewhat vague as I am not entirely sure what I am after. Searching turned up no results, so even some terms as a starting point would be helpful. I can then update/tag the question accordingly.
You may want to investigate dependently typed languages like Idris or Agda.
The type system of such languages allows encoding of value constraints in types. Programs that cannot guarantee the constraints will simply not compile. The usual example is that of matrix multiplication, where the dimensions must match. But this is so to speak the "hello world" of dependently typed languages, the type system can do much more for you.
If you end up starting your own language I'd try to stay implementation-independent as long as possible. Look for the formal expression grammars of a suitable programming language (e.g. C) and add special keywords/functions as required. Once you have a formal definition of your language, implement a parser using your favourite parser generator.
That way, even if your parser is not portable to a certain platform you at least have a formal standard from where to start a separate parser implementation.
You may also want to look at creating a Domain Specific Language (DSL) in Ruby. (Here's a good article on what that means and what it would look like: http://jroller.com/rolsen/entry/building_a_dsl_in_ruby)
This would definitely give you the portability you're looking for, including maybe using IronRuby in your C# environment, and you'd be able to leverage the existing logic and mathematical operations of Ruby. You could then have constraint definition files that looked like this:
constrain 'wakeup_time' do
  6 <= value && value <= 10
end
constrain 'something_else' do
  check (value % 2 == 1), MustBeOdd
end
# constrain is a method that takes one argument and a code block
# check is a function you've defined that takes two arguments
# MustBeOdd is the name of an exception type you've created in your standard set
But really, the great thing about a DSL is that you have a lot of control over what the constraint files look like.
There are a number of ways to verify a list of values across multiple languages. My preferred method is to make a list of the permitted values, load them into a dictionary/hashmap/list/vector (depending on the language and your preference), and write a simple isIn() or isValid() function that checks whether the supplied value is present in the data structure. The beauty of this is that the code is trivial and can be implemented in just about any language very easily.
For odd-only or even-only numeric validity, a small library of isOdd() functions in the different languages will suffice: if a number isn't odd, it must by definition be even (this includes 0, which is even, so no special case is needed).
I normally cart around a set of C++ and C# functions to evaluate isOdd() for reasons similar to the ones you have alluded to, and the code is as follows:
C++
bool isOdd( int integer ){ return integer % 2 != 0; }
You can also mark the function inline and/or fastcall depending on need or preference; I tend to use both unless there is a need to do otherwise (I have seen a large performance boost on Xeon processors, though any gain depends on the compiler and platform).
C#
Conveniently, the same line works in C#; just add static to the front so that it can be called without an instance of the containing class:
static bool isOdd( int integer ){ return integer % 2 != 0; }
Hope this helps, in any event let me know if you need any further info:)
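The lookup approach from this answer is sketched below in Rust rather than C++ or C#, to keep one language for the added examples. The names is_valid and is_odd are illustrative, and the permitted values are the ones from the question.

```rust
use std::collections::HashSet;

// Membership test against a set of permitted values.
fn is_valid(allowed: &HashSet<i32>, value: i32) -> bool {
    allowed.contains(&value)
}

// `%` keeps the sign of the dividend in Rust, so compare against 0
// (not 1) to handle negative numbers correctly.
fn is_odd(n: i32) -> bool {
    n % 2 != 0
}

fn main() {
    let allowed: HashSet<i32> = [1, 2, 3, 10, 11, 12].iter().copied().collect();
    println!("2 allowed: {}", is_valid(&allowed, 2));
    println!("5 allowed: {}", is_valid(&allowed, 5));
    println!("0 odd: {}, -3 odd: {}", is_odd(0), is_odd(-3));
}
```

A set works well for small enumerated domains; for ranges or open-ended rules such as "any odd number", a predicate function is the better fit.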
Not sure if it's what you're looking for, but judging from your starting conditions (Win C#/Linux C++) you may not need it to be totally language agnostic. You can implement such a parser yourself in C++ with all the desired features and then just use it in both C++ and C# projects, which also bypasses the need to add external libraries.
On application design level, it would be (relatively) simple - you create a library which is buildable cross-platform and use it in both projects. The interface may be something simple like:
bool VerifyConstraint_int(int value, const char* constraint);
bool VerifyConstraint_double(double value, const char* constraint);
// etc
Such an interface will be usable both in Linux C++ (by static or dynamic linking) and in Windows C# (using P/Invoke). You can have the same codebase compiling on both platforms.
The parser (again, judging from what you've described in the question) may be pretty simple - a tree holding elements of types Variable and Expression which can be Evaluated with a given Variable value.
Example class definitions:
// boost::variant may be used, typedef'd as VARIANT
class Entity {
public:
    virtual ~Entity() = default;
    virtual VARIANT Evaluate() = 0;
};
class BinaryOperation : public Entity {
private:
    Entity& left;
    Entity& right;
    enum Operation {PLUS, MINUS, EQUALS, AND, OR, GREATER_OR_EQUALS, LESS_OR_EQUALS};
public:
    VARIANT Evaluate() override; // Evaluates left and right operands and combines them
};
class Variable : public Entity {
private:
    VARIANT value;
public:
    VARIANT Evaluate() override { return value; }
};
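The same expression-tree idea can be sketched in Rust (used here instead of C++ only to keep the added examples in one language). The Constraint enum and its method names are hypothetical; the point is that one tree can both evaluate a value and produce the descriptive text the question asks for.

```rust
// Hypothetical constraint tree: leaves are simple checks, `Or` combines them.
enum Constraint {
    Range { min: i64, max: i64 },
    IsOdd,
    Or(Box<Constraint>, Box<Constraint>),
}

impl Constraint {
    // Evaluate the constraint against a concrete value.
    fn check(&self, value: i64) -> bool {
        match self {
            Constraint::Range { min, max } => *min <= value && value <= *max,
            Constraint::IsOdd => value % 2 != 0,
            Constraint::Or(left, right) => left.check(value) || right.check(value),
        }
    }

    // Produce a human-readable description to show to the user.
    fn describe(&self) -> String {
        match self {
            Constraint::Range { min, max } => format!("{} <= value <= {}", min, max),
            Constraint::IsOdd => "IsOdd(value)".to_string(),
            Constraint::Or(left, right) => {
                format!("({}) || ({})", left.describe(), right.describe())
            }
        }
    }
}

// The first example from the question: 1..=3 or 10..=12.
fn question_example() -> Constraint {
    Constraint::Or(
        Box::new(Constraint::Range { min: 1, max: 3 }),
        Box::new(Constraint::Range { min: 10, max: 12 }),
    )
}

fn main() {
    let c = question_example();
    println!("{} -> {}", c.describe(), c.check(11));
    println!("{} -> {}", Constraint::IsOdd.describe(), Constraint::IsOdd.check(7));
}
```

A parser for the textual form would build this tree; the tree itself stays independent of how the text is written.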
Or, you can just write validation code in C++ and use it both in C# and C++ applications :)
My personal choice would be Lua. The downside to any DSL is the learning curve of a new language and how to glue the code with the scripts but I've found Lua has lots of support from the user base and several good books to help you learn.
If you are after making somewhat generic code that a non programmer can inject rules for allowable input it's going to take some upfront work regardless of the route you take. I highly suggest not rolling your own because you'll likely find people wanting more features that an already made DSL will have.
If you are using Java, then you can use OGNL, the Object-Graph Navigation Language.
It enables you to write Java applications that can parse, compile, and evaluate OGNL expressions.
OGNL expressions cover the basic expression syntax shared by Java, C, C++, and C#.
You can compile an expression that uses some variables, and then evaluate that expression for some given variables.
An easy way to achieve validation of expressions is to use Python's eval function. It can evaluate expressions just like the one you wrote. Python's syntax for simple expressions is English-like and easy to learn. Your example expression translates to:
(value >= 1 and value <= 3) or (value >= 10 and value <= 12)
Evaluating code provided by users can pose a security risk, though, because certain functions could be used to act on the host machine (such as the open function, to open a file). But eval takes extra arguments that restrict which names the expression can see, so you can create a restricted evaluation environment. The session below uses Python 2 syntax (print statement and raw_input).
# Import math functions, and we'll use a few of them to create
# a list of safe functions from the math module to be used by eval.
from math import *
# A user-defined method won't be reachable in the evaluation, as long
# as we provide the list of allowed functions and vars to eval.
def dangerous_function(filename):
print open(filename).read()
# We're building the list of safe functions to use by eval:
safe_list = ['math','acos', 'asin', 'atan', 'atan2', 'ceil', 'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'floor', 'fmod', 'frexp', 'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh']
safe_dict = dict([ (k, locals().get(k, None)) for k in safe_list ])
# Let's test the eval method with your example:
exp = "(value >= 1 and value <= 3) or (value >= 10 and value <= 12)"
safe_dict['value'] = 2
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation: True
# Test with a forbidden method, such as 'abs'
exp = raw_input("type an expression: ")
-> type an expression: (abs(-2) >= 1 and abs(-2) <= 3) or (abs(-2) >= 10 and abs(-2) <= 12)
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation:
-> Traceback (most recent call last):
-> File "<stdin>", line 1, in <module>
-> File "<string>", line 1, in <module>
-> NameError: name 'abs' is not defined
# Let's test it again, without any extra parameters to the eval method
# that would prevent its execution
print "expression evaluation: ", eval(exp)
-> expression evaluation: True
# Works fine without the safe dict! So the restrictions were active
# in the previous example..
# is odd?
def isodd(x): return bool(x & 1)
safe_dict['isodd'] = isodd
print "expression evaluation: ", eval("isodd(7)", {"__builtins__":None},safe_dict)
-> expression evaluation: True
print "expression evaluation: ", eval("isodd(42)", {"__builtins__":None},safe_dict)
-> expression evaluation: False
# A bit more complex this time, let's ask the user a function:
user_func = raw_input("type a function: y = ")
-> type a function: y = exp(x)
# Let's test it:
for x in range(1,10):
# add x in the safe dict
safe_dict['x']=x
print "x = ", x , ", y = ", eval(user_func,{"__builtins__":None},safe_dict)
-> x = 1 , y = 2.71828182846
-> x = 2 , y = 7.38905609893
-> x = 3 , y = 20.0855369232
-> x = 4 , y = 54.5981500331
-> x = 5 , y = 148.413159103
-> x = 6 , y = 403.428793493
-> x = 7 , y = 1096.63315843
-> x = 8 , y = 2980.95798704
-> x = 9 , y = 8103.08392758
So you can control which functions are available to eval, and get a restricted environment in which to evaluate expressions.
This is what we used in a previous project I worked in. We used Python expressions in custom Eclipse IDE plug-ins, using Jython to run in the JVM. You could do the same with IronPython to run in the CLR.
The examples I used were in part inspired by / copied from the Lybniz project's explanation of how to run a safe Python eval environment. Read it for more details!
You might want to look at regular expressions (regex). They are proven and have been around for a long time. There's a regex library for all the major programming/scripting languages out there.
Libraries:
C++: what regex library should I use?
C# Regex Class
Usage
Regex Email validation
Regex to validate date format dd/mm/yyyy

Resources