How do I constrain associated types on a non-owned trait? [duplicate] - syntax

I have this code:
extern crate serde;
use serde::de::DeserializeOwned;
use serde::Serialize;
trait Bar<'a, T: 'a>
where
T: Serialize,
&'a T: DeserializeOwned,
{
}
I would like to write this using an associated type, because the type T is unimportant to the users of this trait. I got this far:
trait Bar {
type T: Serialize;
}
I cannot figure out how to specify the other bound.
Ultimately, I want to use a function like this:
extern crate serde_json;
fn test<I: Bar>(t: I::T) -> String {
serde_json::to_string(&t).unwrap()
}

The "correct" solution is to place the bounds on the trait, but referencing the associated type. In this case, you can also use higher ranked trait bounds to handle the reference:
trait Bar
where
Self::T: Serialize,
// ^^^^^^^ Bounds on an associated type
for<'a> &'a Self::T: DeserializeOwned,
// ^^^^^^^^^^^ Higher-ranked trait bounds
{
type T;
}
However, this doesn't work yet.
I believe that you will need to either:
wait for issue 20671 and/or issue 50346 to be fixed.
wait for the generic associated types feature which introduces where clauses on associated types.
In the meantime, the workaround is to duplicate the bound everywhere it's needed:
fn test<I: Bar>(t: I::T) -> String
where
for<'a> &'a I::T: DeserializeOwned,
{
serde_json::to_string(&t).unwrap()
}
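serde is not part of the standard library, so the sketch below shows the same workaround with std traits as stand-ins. `Debug` plays the role of `Serialize`: it is a bound on the associated type itself, and it is implied wherever `I: Bar` appears. `Add` on references plays the role of the `for<'a> &'a I::T: ...` bound, which is not implied and has to be repeated on each function. The names `UsesI64`, `describe` and `sum_twice` are hypothetical.

```rust
use std::fmt::Debug;
use std::ops::Add;

trait Bar {
    // A bound declared on the associated type itself is implied by `I: Bar`.
    type T: Debug;
}

// Hypothetical implementor, used only for this example.
struct UsesI64;

impl Bar for UsesI64 {
    type T = i64;
}

// Only `I: Bar` is needed here: `I::T: Debug` comes from the trait.
fn describe<I: Bar>(t: &I::T) -> String {
    format!("{:?}", t)
}

// The bound on *references* to the associated type is not implied,
// so it is duplicated here as a higher-ranked trait bound.
fn sum_twice<I: Bar>(t: I::T) -> I::T
where
    for<'a> &'a I::T: Add<&'a I::T, Output = I::T>,
{
    &t + &t
}
```

With `UsesI64`, `describe::<UsesI64>(&7)` gives `"7"` and `sum_twice::<UsesI64>(21)` gives `42`.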

Related

Rust cannot infer an appropriate lifetime for autoref due to conflicting requirements [duplicate]

I have a value and I want to store that value and a reference to
something inside that value in my own type:
struct Thing {
count: u32,
}
struct Combined<'a>(Thing, &'a u32);
fn make_combined<'a>() -> Combined<'a> {
let thing = Thing { count: 42 };
Combined(thing, &thing.count)
}
Sometimes, I have a value and I want to store that value and a reference to
that value in the same structure:
struct Combined<'a>(Thing, &'a Thing);
fn make_combined<'a>() -> Combined<'a> {
let thing = Thing::new();
Combined(thing, &thing)
}
Sometimes, I'm not even taking a reference of the value and I get the
same error:
struct Combined<'a>(Parent, Child<'a>);
fn make_combined<'a>() -> Combined<'a> {
let parent = Parent::new();
let child = parent.child();
Combined(parent, child)
}
In each of these cases, I get an error that one of the values "does
not live long enough". What does this error mean?
Let's look at a simple implementation of this:
struct Parent {
count: u32,
}
struct Child<'a> {
parent: &'a Parent,
}
struct Combined<'a> {
parent: Parent,
child: Child<'a>,
}
impl<'a> Combined<'a> {
fn new() -> Self {
let parent = Parent { count: 42 };
let child = Child { parent: &parent };
Combined { parent, child }
}
}
fn main() {}
This will fail with the error:
error[E0515]: cannot return value referencing local variable `parent`
--> src/main.rs:19:9
|
17 | let child = Child { parent: &parent };
| ------- `parent` is borrowed here
18 |
19 | Combined { parent, child }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function
error[E0505]: cannot move out of `parent` because it is borrowed
--> src/main.rs:19:20
|
14 | impl<'a> Combined<'a> {
| -- lifetime `'a` defined here
...
17 | let child = Child { parent: &parent };
| ------- borrow of `parent` occurs here
18 |
19 | Combined { parent, child }
| -----------^^^^^^---------
| | |
| | move out of `parent` occurs here
| returning this value requires that `parent` is borrowed for `'a`
To completely understand this error, you have to think about how the
values are represented in memory and what happens when you move
those values. Let's annotate Combined::new with some hypothetical
memory addresses that show where values are located:
let parent = Parent { count: 42 };
// `parent` lives at address 0x1000 and takes up 4 bytes
// The value of `parent` is 42
let child = Child { parent: &parent };
// `child` lives at address 0x1010 and takes up 4 bytes
// The value of `child` is 0x1000
Combined { parent, child }
// The return value lives at address 0x2000 and takes up 8 bytes
// `parent` is moved to 0x2000
// `child` is ... ?
What should happen to child? If the value was just moved like parent
was, then it would refer to memory that no longer is guaranteed to
have a valid value in it. Any other piece of code is allowed to store
values at memory address 0x1000. Accessing that memory assuming it was
an integer could lead to crashes and/or security bugs, and is one of
the main categories of errors that Rust prevents.
This is exactly the problem that lifetimes prevent. A lifetime is a
bit of metadata that allows you and the compiler to know how long a
value will be valid at its current memory location. That's an
important distinction, as it's a common mistake Rust newcomers make.
Rust lifetimes are not the time period between when an object is
created and when it is destroyed!
As an analogy, think of it this way: During a person's life, they will
reside in many different locations, each with a distinct address. A
Rust lifetime is concerned with the address you currently reside at,
not about when you will die in the future (although dying also
changes your address). Every time you move it's relevant because your
address is no longer valid.
It's also important to note that lifetimes do not change your code; your
code controls the lifetimes, your lifetimes don't control the code. The
pithy saying is "lifetimes are descriptive, not prescriptive".
Let's annotate Combined::new with some line numbers which we will use
to highlight lifetimes:
{ // 0
let parent = Parent { count: 42 }; // 1
let child = Child { parent: &parent }; // 2
// 3
Combined { parent, child } // 4
} // 5
The concrete lifetime of parent is from 1 to 4, inclusive (which I'll
represent as [1,4]). The concrete lifetime of child is [2,4], and
the concrete lifetime of the return value is [4,5]. It's
possible to have concrete lifetimes that start at zero - that would
represent the lifetime of a parameter to a function or something that
existed outside of the block.
Note that the lifetime of child itself is [2,4], but that it refers
to a value with a lifetime of [1,4]. This is fine as long as the
referring value becomes invalid before the referred-to value does. The
problem occurs when we try to return child from the block. This would
"over-extend" the lifetime beyond its natural length.
This new knowledge should explain the first two examples. The third
one requires looking at the implementation of Parent::child. Chances
are, it will look something like this:
impl Parent {
fn child(&self) -> Child { /* ... */ }
}
This uses lifetime elision to avoid writing explicit generic
lifetime parameters. It is equivalent to:
impl Parent {
fn child<'a>(&'a self) -> Child<'a> { /* ... */ }
}
In both cases, the method says that a Child structure will be
returned that has been parameterized with the concrete lifetime of
self. Said another way, the Child instance contains a reference
to the Parent that created it, and thus cannot live longer than that
Parent instance.
This also lets us recognize that something is really wrong with our
creation function:
fn make_combined<'a>() -> Combined<'a> { /* ... */ }
Although you are more likely to see this written in a different form:
impl<'a> Combined<'a> {
fn new() -> Combined<'a> { /* ... */ }
}
In both cases, there is no lifetime parameter being provided via an
argument. This means that the lifetime that Combined will be
parameterized with isn't constrained by anything - it can be whatever
the caller wants it to be. This is nonsensical, because the caller
could specify the 'static lifetime and there's no way to meet that
condition.
How do I fix it?
The easiest and most recommended solution is to not attempt to put
these items in the same structure together. By doing this, your
structure nesting will mimic the lifetimes of your code. Place types
that own data into a structure together and then provide methods that
allow you to get references or objects containing references as needed.
There is a special case where the lifetime tracking is overzealous:
when you have something placed on the heap. This occurs when you use a
Box<T>, for example. In this case, the structure that is moved
contains a pointer into the heap. The pointed-at value will remain
stable, but the address of the pointer itself will move. In practice,
this doesn't matter, as you always follow the pointer.
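To see why the heap case differs, this sketch moves a `Box` and compares the address of the heap allocation before and after the move. The `Box` itself (the pointer) moves, but the value it points to stays put. The function name is hypothetical.

```rust
fn heap_address_survives_move() -> bool {
    let boxed = Box::new(42u32);
    // Address of the heap allocation, not of the `Box` variable itself.
    let before: *const u32 = &*boxed;

    // Moving the Box copies the pointer, not the pointed-at value.
    let moved = boxed;
    let after: *const u32 = &*moved;

    before == after && *moved == 42
}
```

The borrow checker cannot see this distinction, which is why it still rejects storing a reference into the boxed value next to the `Box`.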
Some crates provide ways of representing this case, but they
require that the base address never move. This rules out mutating
vectors, which may cause a reallocation and a move of the
heap-allocated values.
rental (no longer maintained or supported)
owning_ref (has multiple soundness issues)
ouroboros
self_cell
Examples of problems solved with Rental:
Is there an owned version of String::chars?
Returning a RWLockReadGuard independently from a method
How can I return an iterator over a locked struct member in Rust?
How to return a reference to a sub-value of a value that is under a mutex?
How do I store a result using Serde Zero-copy deserialization of a Futures-enabled Hyper Chunk?
How to store a reference without having to deal with lifetimes?
In other cases, you may wish to move to some type of reference-counting, such as by using Rc or Arc.
More information
After moving parent into the struct, why is the compiler not able to get a new reference to parent and assign it to child in the struct?
While it is theoretically possible to do this, doing so would introduce a large amount of complexity and overhead. Every time that the object is moved, the compiler would need to insert code to "fix up" the reference. This would mean that moving a struct is no longer a very cheap operation that just moves some bits around. It could even mean that code like this is expensive, depending on how good a hypothetical optimizer would be:
let a = Object::new();
let b = a;
let c = b;
Instead of forcing this to happen for every move, the programmer gets to choose when this will happen by creating methods that will take the appropriate references only when you call them.
A type with a reference to itself
There's one specific case where you can create a type with a reference to itself. You need to use something like Option to make it in two steps though:
#[derive(Debug)]
struct WhatAboutThis<'a> {
name: String,
nickname: Option<&'a str>,
}
fn main() {
let mut tricky = WhatAboutThis {
name: "Annabelle".to_string(),
nickname: None,
};
tricky.nickname = Some(&tricky.name[..4]);
println!("{:?}", tricky);
}
This does work, in some sense, but the created value is highly restricted - it can never be moved. Notably, this means it cannot be returned from a function or passed by-value to anything. A constructor function shows the same problem with the lifetimes as above:
fn creator<'a>() -> WhatAboutThis<'a> { /* ... */ }
If you try to do this same code with a method, you'll need the alluring but ultimately useless &'a mut self. When that's involved, this code is even more restricted and you will get borrow-checker errors after the first method call:
#[derive(Debug)]
struct WhatAboutThis<'a> {
name: String,
nickname: Option<&'a str>,
}
impl<'a> WhatAboutThis<'a> {
fn tie_the_knot(&'a mut self) {
self.nickname = Some(&self.name[..4]);
}
}
fn main() {
let mut tricky = WhatAboutThis {
name: "Annabelle".to_string(),
nickname: None,
};
tricky.tie_the_knot();
// cannot borrow `tricky` as immutable because it is also borrowed as mutable
// println!("{:?}", tricky);
}
See also:
Cannot borrow as mutable more than once at a time in one code - but can in another very similar
What about Pin?
Pin, stabilized in Rust 1.33, has this in the module documentation:
A prime example of such a scenario would be building self-referential structs, since moving an object with pointers to itself will invalidate them, which could cause undefined behavior.
It's important to note that "self-referential" doesn't necessarily mean using a reference. Indeed, the example of a self-referential struct specifically says (emphasis mine):
We cannot inform the compiler about that with a normal reference,
since this pattern cannot be described with the usual borrowing rules.
Instead we use a raw pointer, though one which is known to not be null,
since we know it's pointing at the string.
The ability to use a raw pointer for this behavior has existed since Rust 1.0. Indeed, owning-ref and rental use raw pointers under the hood.
The only thing that Pin adds to the table is a common way to state that a given value is guaranteed to not move.
See also:
How to use the Pin struct with self-referential structures?
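For illustration, here is a sketch modelled on the self-referential example in the `std::pin` documentation. The struct stores a raw `NonNull` pointer to its own `data` field, and `PhantomPinned` together with `Box::pin` ensures the struct is not moved after that pointer is set. The type name `Unmovable` and its fields are illustrative.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr::NonNull;

struct Unmovable {
    data: String,
    // Raw pointer back into `data`; a normal reference cannot express this.
    slice: NonNull<String>,
    // Opts the type out of `Unpin`, so safe code cannot move it out of the Pin.
    _pin: PhantomPinned,
}

impl Unmovable {
    fn new(data: String) -> Pin<Box<Self>> {
        let res = Unmovable {
            data,
            // Placeholder until the struct has its final heap address.
            slice: NonNull::dangling(),
            _pin: PhantomPinned,
        };
        let mut boxed = Box::pin(res);

        let slice = NonNull::from(&boxed.data);
        // SAFETY: assigning one field does not move the pinned struct.
        unsafe {
            let mut_ref: Pin<&mut Self> = Pin::as_mut(&mut boxed);
            Pin::get_unchecked_mut(mut_ref).slice = slice;
        }
        boxed
    }
}
```

Moving the `Pin<Box<Unmovable>>` only moves the box pointer, so `slice` keeps pointing at `data`. Note that the self-reference is a raw pointer, exactly as the quoted documentation describes.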
A slightly different issue which causes very similar compiler messages is object lifetime dependency, rather than storing an explicit reference. An example of that is the ssh2 library. When developing something bigger than a test project, it is tempting to try to put the Session and Channel obtained from that session alongside each other into a struct, hiding the implementation details from the user. However, note that the Channel definition has the 'sess lifetime in its type annotation, while Session doesn't.
This causes similar compiler errors related to lifetimes.
One very simple way to solve it is to declare the Session outside, in the caller, and then annotate the reference within the struct with a lifetime, similar to the answer in this Rust User's Forum post talking about the same issue while encapsulating SFTP. This will not look elegant and may not always apply, because now you have two entities to deal with rather than the one you wanted!
It turns out the rental crate or the owning_ref crate from the other answer are the solutions for this issue too. Let's consider owning_ref, which has a special type for this exact purpose: OwningHandle. To avoid the underlying object moving, we allocate it on the heap using a Box, which gives us the following possible solution:
use ssh2::{Channel, Session};
use std::net::TcpStream;
use owning_ref::OwningHandle;
struct DeviceSSHConnection {
tcp: TcpStream,
channel: OwningHandle<Box<Session>, Box<Channel<'static>>>,
}
impl DeviceSSHConnection {
fn new(targ: &str, c_user: &str, c_pass: &str) -> Self {
let mut session = Session::new().unwrap();
let tcp = TcpStream::connect(targ).unwrap();
session.handshake(&tcp).unwrap();
session.set_timeout(5000);
session.userauth_password(c_user, c_pass).unwrap();
// Box the session so its address stays fixed while the channel borrows from it
let sess = Box::new(session);
// The closure receives a raw pointer to the boxed Session; dereferencing it is unsafe
let mut oref = OwningHandle::new_with_fn(
sess,
unsafe { |x| Box::new((*x).channel_session().unwrap()) },
);
oref.shell().unwrap();
DeviceSSHConnection {
tcp,
channel: oref,
}
}
}
The result of this code is that we cannot use the Session anymore, but it is stored alongside the Channel which we will be using. Because the OwningHandle object dereferences to Box, which dereferences to Channel, when storing it in a struct, we name it as such. NOTE: This is just my understanding. I have a suspicion this may not be correct, since it appears to be quite close to the discussion of OwningHandle unsafety.
One curious detail here is that Session logically has a similar relationship with TcpStream as Channel has with Session, yet it does not take ownership of the stream and there are no type annotations expressing that dependency. Instead, it is up to the user to take care of this, as the documentation of the handshake method says:
This session does not take ownership of the socket provided, it is
recommended to ensure that the socket persists the lifetime of this
session to ensure that communication is correctly performed.
It is also highly recommended that the stream provided is not used
concurrently elsewhere for the duration of this session as it may
interfere with the protocol.
So with the TcpStream usage, it is completely up to the programmer to ensure the correctness of the code. With the OwningHandle, attention is drawn to where the "dangerous magic" happens by the unsafe {} block.
A further, higher-level discussion of this issue is in this Rust User's Forum thread, which includes a different example and its solution using the rental crate, which does not contain unsafe blocks.
I've found the Arc (read-only) or Arc<Mutex> (read-write with locking) patterns to be a sometimes quite useful tradeoff between performance and code complexity (mostly caused by lifetime annotations).
Arc:
use std::sync::Arc;
struct Parent {
child: Arc<Child>,
}
struct Child {
value: u32,
}
struct Combined(Parent, Arc<Child>);
fn main() {
let parent = Parent { child: Arc::new(Child { value: 42 }) };
let child = parent.child.clone();
let combined = Combined(parent, child.clone());
assert_eq!(combined.0.child.value, 42);
assert_eq!(child.value, 42);
// combined.0.child.value = 50; // fails, Arc is not DerefMut
}
Arc + Mutex:
use std::sync::{Arc, Mutex};
struct Child {
value: u32,
}
struct Parent {
child: Arc<Mutex<Child>>,
}
struct Combined(Parent, Arc<Mutex<Child>>);
fn main() {
let parent = Parent { child: Arc::new(Mutex::new(Child {value: 42 }))};
let child = parent.child.clone();
let combined = Combined(parent, child.clone());
assert_eq!(combined.0.child.lock().unwrap().value, 42);
assert_eq!(child.lock().unwrap().value, 42);
child.lock().unwrap().value = 50;
assert_eq!(combined.0.child.lock().unwrap().value, 50);
}
See also RwLock (When or why should I use a Mutex over an RwLock?)
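As a sketch of the RwLock alternative, here is the same shape as the Arc<Mutex> example with `RwLock` swapped in. Many readers may hold a read lock at the same time, while a writer gets exclusive access. The function name `update_shared_child` is hypothetical.

```rust
use std::sync::{Arc, RwLock};

struct Child {
    value: u32,
}

struct Parent {
    child: Arc<RwLock<Child>>,
}

struct Combined(Parent, Arc<RwLock<Child>>);

fn update_shared_child() -> (u32, u32) {
    let parent = Parent { child: Arc::new(RwLock::new(Child { value: 42 })) };
    let child = Arc::clone(&parent.child);
    let combined = Combined(parent, Arc::clone(&child));

    // The write guard is a temporary, released at the end of this statement.
    child.write().unwrap().value = 50;

    // Both handles observe the change; each read guard is released immediately.
    let via_parent = combined.0.child.read().unwrap().value;
    let via_tuple = combined.1.read().unwrap().value;
    (via_parent, via_tuple)
}
```

Each guard is dropped before the next lock is taken; the std documentation warns that taking a lock the current thread already holds may panic or deadlock.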
As a newcomer to Rust, I had a case similar to your last example:
struct Combined<'a>(Parent, Child<'a>);
fn make_combined<'a>() -> Combined<'a> {
let parent = Parent::new();
let child = parent.child();
Combined(parent, child)
}
In the end, I solved it by using this pattern:
fn make_parent_and_child<'a>(anchor: &'a mut DataAnchorFor1<Parent>) -> Child<'a> {
// construct parent, then store it in anchor object the caller gave us a mut-ref to
*anchor = DataAnchorFor1::holding(Parent::new());
// now retrieve parent from storage-slot we assigned to in the previous line
let parent = anchor.val1.as_mut().unwrap();
// now proceed with regular code, except returning only the child
// (the parent can already be accessed by the caller through the anchor object)
let child = parent.child();
child
}
// this is a generic struct that we can define once, and use whenever we need this pattern
// (it can also be extended to have multiple slots, naturally)
struct DataAnchorFor1<T> {
val1: Option<T>,
}
impl<T> DataAnchorFor1<T> {
fn empty() -> Self {
Self { val1: None }
}
fn holding(val1: T) -> Self {
Self { val1: Some(val1) }
}
}
// for my case, this was all I needed
fn main_simple() {
let mut anchor = DataAnchorFor1::empty();
let child = make_parent_and_child(&mut anchor);
let child_processing_result = do_some_processing(child);
println!("ChildProcessingResult:{}", child_processing_result);
}
// but if access to parent-data later on is required, you can use this
fn main_complex() {
let mut anchor = DataAnchorFor1::empty();
// if you want to use the parent object (which is stored in anchor), you must...
// ...wrap the child-related processing in a new scope, so the mut-ref to anchor...
// ...gets dropped at its end, letting us access anchor.val1 (the parent) directly
let child_processing_result = {
let child = make_parent_and_child(&mut anchor);
// do the processing you want with the child here (avoiding ref-chain...
// ...back to anchor-data, if you need to access parent-data afterward)
do_some_processing(child)
};
// now that scope is ended, we can access parent data directly
// so print out the relevant data for both parent and child (adjust to your case)
let parent = anchor.val1.unwrap();
println!("Parent:{} ChildProcessingResult:{}", parent, child_processing_result);
}
This is far from a universal solution! But it worked in my case, and only required usage of the main_simple pattern above (not the main_complex variant), because in my case the "parent" object was just something temporary (a database "Client" object) that I had to construct to pass to the "child" object (a database "Transaction" object) so I could run some database commands.
Anyway, it accomplished the encapsulation/simplification-of-boilerplate that I needed (since I had many functions that needed creation of a Transaction/"child" object, and now all they need is that generic anchor-object creation line), while avoiding the need for using a whole new library.
These are the libraries I'm aware of that may be relevant:
owning-ref
rental
ouroboros
reffers
self_cell
escher
rust-viewbox
However, I scanned through them, and they all seem to have issues of one kind or another (not being updated in years, having multiple unsoundness issues/concerns raised, etc.), so I was hesitant to use them.
So while this isn't as generic a solution, I figured I would mention it for people with similar use cases:
Where the caller only needs the "child" object returned.
But the called function needs to construct a "parent" object to perform its functions.
And the borrowing rules require that the "parent" object be stored somewhere that persists beyond the "make_parent_and_child" function (in my case, this was a start_transaction function).
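A self-contained version of the anchor pattern, with hypothetical stand-ins for the undefined `Parent::new`, `parent.child()` and `do_some_processing`. It uses `Option::insert`, which stores the value and returns a mutable reference to it in one step, in place of the `holding` plus `as_mut().unwrap()` sequence.

```rust
// Hypothetical stand-ins for the database "Client" (parent) and "Transaction" (child).
struct Parent {
    name: String,
}

struct Child<'a> {
    parent: &'a Parent,
}

impl Parent {
    fn new() -> Self {
        Parent { name: String::from("client") }
    }

    fn child(&self) -> Child<'_> {
        Child { parent: self }
    }
}

struct DataAnchorFor1<T> {
    val1: Option<T>,
}

impl<T> DataAnchorFor1<T> {
    fn empty() -> Self {
        Self { val1: None }
    }
}

fn make_parent_and_child<'a>(anchor: &'a mut DataAnchorFor1<Parent>) -> Child<'a> {
    // Store the parent in the caller's anchor and borrow it back for `'a`.
    let parent: &'a Parent = anchor.val1.insert(Parent::new());
    parent.child()
}

// Hypothetical processing step; returns owned data so no borrow escapes.
fn do_some_processing(child: Child<'_>) -> usize {
    child.parent.name.len()
}
```

The anchor must be declared `let mut anchor` because `make_parent_and_child` takes `&mut` to it. Once the child has been consumed, the parent can be read back out of `anchor.val1`.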

What does the 'where' clause within a trait do?

If I have this code:
trait Trait {
fn f(&self) -> i32 where Self: Sized;
fn g(&self) -> i32;
}
fn object_safety_dynamic(x: &dyn Trait) {
x.f(); // error
x.g(); // works
}
What does the where clause actually do?
Naively, I was thinking where Self: Sized; dictates something about the type implementing Trait, like 'if you implement Trait for type A your type A must be sized, i.e., it can be i32 but not [i32].
However, such a constraint would rather go as trait Trait: Sized (correct me if I am wrong)?
Now I noticed where Self: Sized; actually determines if I can call f or g from within object_safety_dynamic.
My questions:
What happens here behind the scenes?
What (in simple English) am I actually telling the compiler by where Self: Sized; that makes g() work but f() not?
In particular: Since &self is a reference anyway, what compiled difference exists between f and g for various (sized or unsized) types. Wouldn't it always boil down to something like _vtable_f_or_g(*self) -> i32, regardless of where or if the type is sized or not?
Why can I implement Trait for both u8 and [u8]? Shouldn't the compiler actually stop me from implementing f() for [u8], instead of throwing an error at the call site?
fn f(&self) -> i32 where Self: Sized;
This says that f is only defined for types that also implement Sized. Unsized types may still implement Trait, but f will not be available.
Inside object_safety_dynamic, calling x.f() is actually doing: (*x).f(). While x is sized because it's a pointer, *x might not be because it could be any implementation of Trait. But code inside the function has to work for any valid argument, so you are not allowed to call x.f() there.
What does the where clause actually do?
Naively, I was thinking where Self: Sized; dictates something about the type implementing Trait, like 'if you implement Trait for type A your type A must be sized, i.e., it can be i32 but not [i32].
However, such a constraint would rather go as trait Trait: Sized
This is correct.
However, in this case, the bound applies only to the function. where bounds on functions are only checked at the callsite.
What happens here behind the scenes?
There is a confusing bit about Rust's older syntax, which is that Trait can refer to either
The trait Trait; or
The "trait object" Trait, which is actually a type, not an object (modern Rust spells this type dyn Trait).
Sized is a trait, and any type T that is Sized has a size known at compile time, which can be obtained as a constant with std::mem::size_of::<T>(). Examples of types that are not Sized are str and [u8], whose contents do not have a fixed size.
The type Trait is also unsized. Intuitively, this is because Trait as a type consists of all values of types that implement the trait Trait, which may have varying size. This means you can never have a value of type Trait - you can only refer to one via a "fat pointer" such as &Trait or Box<Trait> and so on. These have the size of 2 pointers - one for a vtable, one for the data. It looks roughly like this:
struct &Trait {
pub data: *mut (),
pub vtable: *mut (),
}
There is automatically an impl of the form:
impl Trait /* the trait */ for Trait /* the type */ {
fn f(&self) -> i32 where Self: Sized { .. }
fn g(&self) -> i32 {
/* vtable magic: something like (self.vtable.g)(self.data) */
}
}
What (in simple English) am I actually telling the compiler by where Self: Sized; that makes g() work but f() not?
Note that since, as I mentioned, Trait is not Sized, the bound Self: Sized is not satisfied and so the function f cannot be called where Self == Trait.
In particular: Since &self is a reference anyway, what compiled difference exists between f and g for various (sized or unsized) types. Wouldn't it always boil down to something like _vtable_f_or_g(*self) -> i32, regardless of where or if the type is sized or not?
The type Trait is always unsized. It doesn't matter which type has been coerced to Trait. The way you call the function with a Sized variable is to use it directly:
fn generic<T: Trait + Sized>(x: &T) { // the `Sized` bound is implicit, added here for clarity
x.f(); // compiles just fine
x.g();
}
Why can I implement Trait for both u8 and [u8]? Shouldn't the compiler actually stop me from implementing f() for [u8], instead of throwing an error at the call site?
Because the trait is not bounded by Self: Sized - the function f is. So there is nothing stopping you from implementing the function - it's just that the bounds on the function can never be satisfied, so you can never call it.
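A runnable sketch of the points above, written with `dyn Trait` (the modern spelling of the trait object type that the question writes as `Trait`). The implementations of `f` and `g` and the helper `fat_pointer_words` are illustrative.

```rust
use std::mem::size_of;

trait Trait {
    fn f(&self) -> i32 where Self: Sized;
    fn g(&self) -> i32;
}

impl Trait for u8 {
    fn f(&self) -> i32 {
        i32::from(*self)
    }
    fn g(&self) -> i32 {
        i32::from(*self) * 2
    }
}

// Implementing the trait for an unsized type compiles. `f` is defined here,
// but callers can only use it where `Self: Sized`, which `[u8]` never satisfies.
impl Trait for [u8] {
    fn f(&self) -> i32 {
        self.len() as i32
    }
    fn g(&self) -> i32 {
        self.len() as i32
    }
}

// Through a trait object only `g` is callable.
fn object_safety_dynamic(x: &dyn Trait) -> i32 {
    // x.f(); // would not compile: `dyn Trait` is not `Sized`
    x.g()
}

// `T` is implicitly `Sized`, so both methods are callable.
fn generic<T: Trait>(x: &T) -> i32 {
    x.f() + x.g()
}

// A `&dyn Trait` is a fat pointer: one data pointer plus one vtable pointer.
fn fat_pointer_words() -> usize {
    size_of::<&dyn Trait>() / size_of::<usize>()
}
```

`g` can still be called on a `[u8]` directly (for example `bytes.g()` with `bytes: &[u8]`), and `generic(&5u8)` returns `15`.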

Generalizing a function for an enum

I have an enum that looks like this
pub enum IpNetwork {
V4(Ipv4Network),
V6(Ipv6Network),
}
Each of those variants represents either an IPv4 or an IPv6 CIDR. Now, Ipv4Network and Ipv6Network each have a method to get the prefix, defined like this:
// For Ipv4Network
pub fn prefix(&self) -> u8
// For Ipv6Network
pub fn prefix(&self) -> u128
How do I generalize the prefix method for the IpNetwork enum? I know that I can just have u128 as the return type, but is that approach idiomatic?
So you want a prefix function that operates on the IpNetwork type, but are unsure what the return type should be. Below is a possible approach you could follow.
The argument against using an enum
As bheklilr mentioned in a comment, one of the alternatives is introducing an enum: pub enum Prefix { V4(u8), V6(u128) }.
This could make sense depending on your use case, but it seems like overkill to me here. In the end, you would end up pattern matching on the result of your generic prefix function. In that case, you might as well pattern match on the IpNetwork object itself and call its corresponding prefix function.
The case for u128
If you just want to obtain the integer value and don't need to differentiate between IPv4 and IPv6, returning an integer seems to be the way to go. A u8 can be cast to u128 without any loss, and the overhead is negligible.
As far as I know, the standard library doesn't provide functionality for generic numeric types. You could, however, define a trait and implement it for u8 and u128.
Also, there is the num crate, which does basically that.
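A minimal sketch of the u128 approach. `Ipv4Network` and `Ipv6Network` here are hypothetical stand-ins with only a `prefix` field, matching the signatures given in the question rather than the types from any particular crate. `u128::from` performs the lossless widening from `u8`.

```rust
// Hypothetical stand-ins matching the signatures in the question.
pub struct Ipv4Network {
    prefix: u8,
}

impl Ipv4Network {
    pub fn prefix(&self) -> u8 {
        self.prefix
    }
}

pub struct Ipv6Network {
    prefix: u128,
}

impl Ipv6Network {
    pub fn prefix(&self) -> u128 {
        self.prefix
    }
}

pub enum IpNetwork {
    V4(Ipv4Network),
    V6(Ipv6Network),
}

impl IpNetwork {
    // Widen to the larger of the two return types.
    pub fn prefix(&self) -> u128 {
        match self {
            IpNetwork::V4(net) => u128::from(net.prefix()),
            IpNetwork::V6(net) => net.prefix(),
        }
    }
}
```

Using `u128::from` instead of `as` makes the intent explicit: the conversion compiles only because it can never truncate.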

Can Rust attributes be used for something like mapping URL routes to functions? [duplicate]

I've gotten as far as having the custom attribute invoked:
#[plugin_registrar]
pub fn registrar(reg: &mut rustc::plugin::Registry) {
use syntax::parse::token::intern;
use syntax::ext::base;
// Register the `#[dummy]` attribute.
reg.register_syntax_extension(intern("dummy"),
base::ItemDecorator(dummy_expand));
}
// Decorator for `dummy` attribute
pub fn dummy_expand(context: &mut ext::base::ExtCtxt, span: codemap::Span, meta_item: Gc<ast::MetaItem>, item: Gc<ast::Item>, push: |Gc<ast::Item>|) {
match item.node {
ast::ItemFn(decl, ref style, ref abi, ref generics, block) => {
trace!("{}", decl);
// ...? Add something here.
}
_ => {
context.span_err(span, "dummy is only permissible on functions");
}
}
}
Invoked via:
#![feature(phase)]
#[phase(plugin)]
extern crate dummy_ext;
#[test]
#[dummy]
fn hello() {
println!("Put something above this...");
}
...and I've seen a few examples around of things that use quote_expr!( ... ) to do this, but I don't really understand them.
Let's say I want to add this statement (or is it expression?) to the top of any function tagged #[dummy]:
println!("dummy");
How do I achieve that?
There are two tasks here:
creating the AST you wish to insert
transforming the AST of some function (e.g. inserting another piece)
Notes:
when I say "item" in this answer, I specifically mean the item AST node, e.g. fn, struct, impl.
when doing anything with macros, rustc --pretty expanded foo.rs is your best friend (works best on smallest examples, e.g. avoiding #[deriving] and println!, unless you're trying to debug those specifically).
AST creation
There are three basic ways to create chunks of AST from scratch:
manually writing out the structs & enums,
using the methods of AstBuilder to abbreviate that, and
using quotation to avoid that altogether.
In this case, we can use quoting, so I won't waste time on the others. The quote macros take an ExtCtxt ("extension context") and an expression or item etc. and create an AST value that represents that item, e.g.
let x: Gc<ast::Expr> = quote_expr!(cx, 1 + 2);
creates an Expr_ with value ExprBinary, that contains two ExprLits (for the 1 and 2 literals).
Hence, to create the desired expression, quote_expr!(cx, println!("dummy")) should work. Quotation is more powerful than just this: you can use $ to splice a variable storing AST into an expression, e.g., if we have the x as above, then
let y = quote_expr!(cx, if $x > 0 { println!("dummy") });
will create an AST representing if 1 + 2 > 0 { println!("dummy") }.
This is all very unstable, and the macros are feature gated. A full "working" example:
#![feature(quote)]
#![crate_type = "dylib"]
extern crate syntax;
use syntax::ext::base::ExtCtxt;
use syntax::ast;
use std::gc::Gc;
fn basic_print(cx: &mut ExtCtxt) -> Gc<ast::Expr> {
quote_expr!(cx, println!("dummy"))
}
fn quoted_print(cx: &mut ExtCtxt) -> Gc<ast::Expr> {
let p = basic_print(cx);
quote_expr!(cx, if true { $p })
}
As of 2014-08-29, the list of quoting macros is: quote_tokens, quote_expr, quote_ty, quote_method, quote_item, quote_pat, quote_arm, quote_stmt. (Each essentially creates the similarly-named type in syntax::ast.)
(Be warned: they are implemented in a very hacky way at the moment, by just stringifying their argument and reparsing, so it's relatively easy to encounter confusing behaviour.)
AST transformation
We now know how to make isolated chunks of AST, but how can we feed them back into the main code?
Well, the exact method depends on what you are trying to do. There is a variety of different types of syntax extensions.
If you just wanted to expand to some expression in place (like println!), NormalTT is correct,
if you want to create new items based on an existing one, without modifying anything, use ItemDecorator (e.g. #[deriving] creates some impl blocks based on the struct and enum items to which it is attached)
if you want to take an item and actually change it, use ItemModifier
Thus, in this case, we want an ItemModifier, so that we can change #[dummy] fn foo() { ... } into #[dummy] fn foo() { println!("dummy"); .... }. Let's declare a function with the right signature:
fn dummy_expand(cx: &mut ExtCtxt, sp: Span, _: Gc<ast::MetaItem>, item: Gc<ast::Item>) -> Gc<ast::Item>
This is registered with
reg.register_syntax_extension(intern("dummy"), base::ItemModifier(dummy_expand));
We've got the boilerplate set up; we just need to write the implementation. There are two approaches. We could just add the println! to the start of the function's contents, or we could change the contents from foo(); bar(); ... to println!("dummy"); { foo(); bar(); ... } by just creating two new expressions.
As you found, an ItemFn can be matched with
ast::ItemFn(decl, ref style, ref abi, ref generics, block)
where block is the actual contents. The second approach mentioned above is the easiest; just write
let new_contents = quote_expr!(cx,
println!("dummy");
$block
);
and then to preserve the old information, we'll construct a new ItemFn and wrap it back up with the right method on AstBuilder. In total:
#![feature(quote, plugin_registrar)]
#![crate_type = "dylib"]
// general boilerplate
extern crate syntax;
extern crate rustc;
use syntax::ast;
use syntax::codemap::Span;
use syntax::ext::base::{ExtCtxt, ItemModifier};
// NB. this is important or the method calls don't work
use syntax::ext::build::AstBuilder;
use syntax::parse::token;
use std::gc::Gc;
#[plugin_registrar]
pub fn registrar(reg: &mut rustc::plugin::Registry) {
// Register the `#[dummy]` attribute.
reg.register_syntax_extension(token::intern("dummy"),
ItemModifier(dummy_expand));
}
fn dummy_expand(cx: &mut ExtCtxt, sp: Span, _: Gc<ast::MetaItem>,
item: Gc<ast::Item>) -> Gc<ast::Item> {
match item.node {
ast::ItemFn(decl, ref style, ref abi, ref generics, block) => {
let new_contents = quote_expr!(&mut *cx,
println!("dummy");
$block
);
let new_item_ = ast::ItemFn(decl, style.clone(),
abi.clone(), generics.clone(),
// AstBuilder to create block from expr
cx.block_expr(new_contents));
// copying info from old to new
cx.item(item.span, item.ident, item.attrs.clone(), new_item_)
}
_ => {
cx.span_err(sp, "dummy is only permissible on functions");
item
}
}
}

Using traits as types in enums

Here's my code:
trait UnaryOperator {
fn apply(&self, expr: Expression) -> Expression;
}
pub enum Expression {
UnaryOp(UnaryOperator, Expression),
Value(i64)
}
Which gives the following errors:
error: the trait 'core::marker::Sized' is not implemented for type 'parser::UnaryOperator'
note: 'parser::UnaryOperator' does not have a constant size known at compile-time
I'm not sure how to accomplish what I want. I've tried:
trait UnaryOperator: Sized {
...
}
As well as
pub enum Expression {
UnaryOp(UnaryOperator + Sized, Expression),
...
}
And neither solved the issue.
I've seen ways to possibly accomplish this with generics, but then it seems that two expressions with different operators would have different types, and that's not what I want. I want all expressions to be the same type regardless of their operators.
Trait objects do not have a known size; they are unsized. To see why, check this out:
trait AddOne {
fn add_one(&self) -> u8;
}
struct Alpha {
a: u8,
}
struct Beta {
a: [u8; 1024],
}
impl AddOne for Alpha {
fn add_one(&self) -> u8 { 0 }
}
impl AddOne for Beta {
fn add_one(&self) -> u8 { 0 }
}
Both Alpha and Beta implement AddOne, so how big should some arbitrary AddOne be? Oh, and remember that other crates may implement your trait sometime in the future.
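The size mismatch can be checked directly with `std::mem::size_of`. This sketch repeats the `Alpha`/`Beta` example; the `add_one` bodies here actually add one (with wrapping) instead of returning `0`, purely so the fields are used. It also shows that, once they sit behind a reference, both types fit into the same array as `&dyn AddOne`.

```rust
use std::mem::size_of;

trait AddOne {
    fn add_one(&self) -> u8;
}

struct Alpha {
    a: u8,
}

struct Beta {
    a: [u8; 1024],
}

impl AddOne for Alpha {
    fn add_one(&self) -> u8 {
        self.a.wrapping_add(1)
    }
}

impl AddOne for Beta {
    fn add_one(&self) -> u8 {
        self.a[0].wrapping_add(1)
    }
}

/// Two implementors of the same trait with very different sizes:
/// there is no single size that a bare `AddOne` value could have.
fn implementor_sizes() -> (usize, usize) {
    (size_of::<Alpha>(), size_of::<Beta>())
}
```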
That's why you get the first error. There are three main solutions (note that none of these solutions immediately fixes your problem...):
Use a Box<dyn Trait>. This is similar to, but different from, accepting an interface in languages like Java. It has a known size (two pointers' worth: one pointer to the data and one to the vtable) and owns the trait object. The downside is that it requires a heap allocation.
trait UnaryOperator {
fn apply(&self, expr: Expression) -> Expression;
}
pub enum Expression {
UnaryOp(Box<dyn UnaryOperator>, Expression),
Value(i64)
}
Use a reference to the trait object. This also has a known size (two pointers' worth, see Matthieu M.'s comment). The downside is that something else has to own the object, and you need to track the lifetime:
trait UnaryOperator {
fn apply<'a>(&self, expr: Expression<'a>) -> Expression<'a>;
}
pub enum Expression<'a> {
UnaryOp(&'a dyn UnaryOperator, Expression<'a>),
Value(i64)
}
Use a generic. This has a fixed size because each usage of the enum will have been specialized for the specific type. The downside is that it can cause code bloat if you have many distinct specializations. Update: as you point out, this means that Expression<A> and Expression<B> are different types. Depending on your usage, this could be a problem: you wouldn't be able to easily create a single Vec holding both an Expression<A> and an Expression<B>.
trait UnaryOperator {
fn apply<U: UnaryOperator>(&self, expr: Expression<U>) -> Expression<U>;
}
pub enum Expression<U>
where U: UnaryOperator
{
UnaryOp(U, Expression<U>),
Value(i64)
}
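The trade-offs of the three options can be compared side by side in a sketch that compiles. To keep it compiling, the trait is simplified (an assumption for illustration) to act on a plain `i64` rather than on a recursive `Expression`; `Boxed`, `Borrowed` and `Generic` are hypothetical non-recursive stand-ins for the three enums above, and `Negate` and `Scale` are made-up operators.

```rust
use std::mem::size_of;

// Simplified, hypothetical trait: operates on an i64 instead of a
// recursive Expression so that all three variants compile.
trait UnaryOperator {
    fn apply(&self, value: i64) -> i64;
}

struct Negate; // zero-sized operator
struct Scale(i64); // operator that carries data

impl UnaryOperator for Negate {
    fn apply(&self, value: i64) -> i64 {
        -value
    }
}

impl UnaryOperator for Scale {
    fn apply(&self, value: i64) -> i64 {
        value * self.0
    }
}

/// Option 1: boxed trait object; owns the operator.
struct Boxed {
    op: Box<dyn UnaryOperator>,
    value: i64,
}

/// Option 2: borrowed trait object; someone else owns the operator.
struct Borrowed<'a> {
    op: &'a dyn UnaryOperator,
    value: i64,
}

/// Option 3: generic; the operator is stored inline.
struct Generic<U: UnaryOperator> {
    op: U,
    value: i64,
}

impl Boxed {
    fn eval(&self) -> i64 {
        self.op.apply(self.value)
    }
}

impl<'a> Borrowed<'a> {
    fn eval(&self) -> i64 {
        self.op.apply(self.value)
    }
}

impl<U: UnaryOperator> Generic<U> {
    fn eval(&self) -> i64 {
        self.op.apply(self.value)
    }
}

/// Boxed values with different operators share one type, so one Vec holds both.
fn eval_boxed_mix() -> Vec<i64> {
    let exprs: Vec<Boxed> = vec![
        Boxed { op: Box::new(Negate), value: 3 },
        Boxed { op: Box::new(Scale(4)), value: 3 },
    ];
    exprs.iter().map(|e| e.eval()).collect()
}

/// The borrowed expressions must not outlive `negate` and `scale`.
fn eval_borrowed_mix() -> Vec<i64> {
    let negate = Negate;
    let scale = Scale(4);
    let exprs = [
        Borrowed { op: &negate, value: 3 },
        Borrowed { op: &scale, value: 3 },
    ];
    exprs.iter().map(|e| e.eval()).collect()
}

// A Vec mixing Generic<Negate> and Generic<Scale> does NOT compile:
// they are two different types.

/// Both trait-object pointers are fat (data + vtable); each generic
/// specialization has its own size, depending on the operator type.
fn sizes() -> [usize; 4] {
    [
        size_of::<Box<dyn UnaryOperator>>(),
        size_of::<&dyn UnaryOperator>(),
        size_of::<Generic<Negate>>(),
        size_of::<Generic<Scale>>(),
    ]
}
```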
Now, all of these fail as written because you have a recursive type definition. Let's look at this simplification:
enum Expression {
A(Expression),
B(u8),
}
How big is Expression? Well, it needs to have enough space to hold... an Expression! Which needs to be able to hold an Expression... you see where this is going.
You need to add some amount of indirection here. The same ideas as in the first two solutions apply: you can use a Box or a reference to get a fixed size:
enum Expression {
A(Box<Expression>),
B(u8),
}
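Combining both kinds of indirection gives a version of the original enum that compiles: `Box<dyn UnaryOperator>` for the operator and `Box<Expression>` for the recursion. The trait keeps the signature from the question. The `eval` method and the `Negate` and `Double` operators are hypothetical additions, just enough to exercise the type.

```rust
trait UnaryOperator {
    fn apply(&self, expr: Expression) -> Expression;
}

enum Expression {
    // Box<dyn UnaryOperator>: the operator has a known (fat pointer) size.
    // Box<Expression>: breaks the infinitely sized recursion.
    UnaryOp(Box<dyn UnaryOperator>, Box<Expression>),
    Value(i64),
}

impl Expression {
    /// Hypothetical helper: reduce the expression to a plain number.
    fn eval(self) -> i64 {
        match self {
            Expression::Value(v) => v,
            // Move the operand out of its box and let the operator act on it.
            Expression::UnaryOp(op, inner) => op.apply(*inner).eval(),
        }
    }
}

/// Hypothetical operator: negates the evaluated operand.
struct Negate;

/// Hypothetical operator: doubles the evaluated operand.
struct Double;

impl UnaryOperator for Negate {
    fn apply(&self, expr: Expression) -> Expression {
        Expression::Value(-expr.eval())
    }
}

impl UnaryOperator for Double {
    fn apply(&self, expr: Expression) -> Expression {
        Expression::Value(expr.eval() * 2)
    }
}
```

Because every operator is erased behind `Box<dyn UnaryOperator>`, a `Vec<Expression>` can mix expressions built from different operators, which is what the question asked for. The cost is one heap allocation per boxed operator and per nested expression.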

Resources