rust: weird behaviour from a "file repository" struct - windows

I was running some tests to see what is the best way to implement some kind of repository struct containing every file used by the program for an application I want to develop. (if this makes any sense).
This is what I made:
use std::path::Path;
use std::fs::{File, OpenOptions};
use std::io::{Write, Read};
#[derive(Debug)]
struct Repo<'a> {
path: Box<&'a Path>,
}
impl Repo<'_> {
pub fn new() -> Self {
Self { path: Box::new(Path::new("test.txt")) }
}
pub fn get_mut(&self) -> File {
OpenOptions::new().write(true).read(false).open(&*self.path).unwrap()
}
pub fn get_imm(&self) -> File {
OpenOptions::new().write(false).read(true).open(&*self.path).unwrap()
}
}
fn main() {
let f = Repo::new();
let mut f_mut = f.get_mut();
let mut f_imm = f.get_imm();
let mut buf1 = String::new();
f_imm.read_to_string(&mut buf1).unwrap();
println!("1: {}", buf1);
f_mut.write("123456".to_string().as_bytes()).unwrap();
let mut buf2 = String::new();
f_imm.read_to_string(&mut buf2).unwrap();
println!("2: {}", buf2);
}
Imagine having an empty "test.txt" file, the expected output from the two println!() would be:
1:
2: 123456
and in fact it is. bu if i try to change the written value ("123456") to something else and then recompile and run the program some weird things happen; sometimes the expected value of the second output is missing some characters or is completely empty. this code behaves correctly only when "test.txt" is empty or just by seemingly random chance.
could someone explain to me why this happens and why this code is probably not very good practice?
or maybe it's a problem with my machine or even the rustc itself.
running this on windows 10. (also english is not my first language)

You seem to be opening the file twice with different access modes at the same time. What probably is happening is the operating system has not finished writing the data passed to f_mut to the file.
This is because file I/O operations incorporate buffering to increase performance, and the buffer gets flushed occasionally, but the frequency of it is not universally defined. Luckily, it is possible to flush the data manually, before reading the contents.
The code would not be the best practice, due to the fact that you are keeping 2 references to the same file open, and the said buffering does not work nicely when the handles are separate. Keeping the structure the same, I would open a file with both read and write access, and abstract away the reading/writing operations. For individual reads/writes, you would need to perform file seeking - seek to the end of the file before writing, and seek to the start of the file before reading.

Related

Rust cannot infer an appropriate lifetime for autoref due to conflicting requirements [duplicate]

I have a value and I want to store that value and a reference to
something inside that value in my own type:
struct Thing {
count: u32,
}
struct Combined<'a>(Thing, &'a u32);
fn make_combined<'a>() -> Combined<'a> {
let thing = Thing { count: 42 };
Combined(thing, &thing.count)
}
Sometimes, I have a value and I want to store that value and a reference to
that value in the same structure:
struct Combined<'a>(Thing, &'a Thing);
fn make_combined<'a>() -> Combined<'a> {
let thing = Thing::new();
Combined(thing, &thing)
}
Sometimes, I'm not even taking a reference of the value and I get the
same error:
struct Combined<'a>(Parent, Child<'a>);
fn make_combined<'a>() -> Combined<'a> {
let parent = Parent::new();
let child = parent.child();
Combined(parent, child)
}
In each of these cases, I get an error that one of the values "does
not live long enough". What does this error mean?
Let's look at a simple implementation of this:
struct Parent {
count: u32,
}
struct Child<'a> {
parent: &'a Parent,
}
struct Combined<'a> {
parent: Parent,
child: Child<'a>,
}
impl<'a> Combined<'a> {
fn new() -> Self {
let parent = Parent { count: 42 };
let child = Child { parent: &parent };
Combined { parent, child }
}
}
fn main() {}
This will fail with the error:
error[E0515]: cannot return value referencing local variable `parent`
--> src/main.rs:19:9
|
17 | let child = Child { parent: &parent };
| ------- `parent` is borrowed here
18 |
19 | Combined { parent, child }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function
error[E0505]: cannot move out of `parent` because it is borrowed
--> src/main.rs:19:20
|
14 | impl<'a> Combined<'a> {
| -- lifetime `'a` defined here
...
17 | let child = Child { parent: &parent };
| ------- borrow of `parent` occurs here
18 |
19 | Combined { parent, child }
| -----------^^^^^^---------
| | |
| | move out of `parent` occurs here
| returning this value requires that `parent` is borrowed for `'a`
To completely understand this error, you have to think about how the
values are represented in memory and what happens when you move
those values. Let's annotate Combined::new with some hypothetical
memory addresses that show where values are located:
let parent = Parent { count: 42 };
// `parent` lives at address 0x1000 and takes up 4 bytes
// The value of `parent` is 42
let child = Child { parent: &parent };
// `child` lives at address 0x1010 and takes up 4 bytes
// The value of `child` is 0x1000
Combined { parent, child }
// The return value lives at address 0x2000 and takes up 8 bytes
// `parent` is moved to 0x2000
// `child` is ... ?
What should happen to child? If the value was just moved like parent
was, then it would refer to memory that no longer is guaranteed to
have a valid value in it. Any other piece of code is allowed to store
values at memory address 0x1000. Accessing that memory assuming it was
an integer could lead to crashes and/or security bugs, and is one of
the main categories of errors that Rust prevents.
This is exactly the problem that lifetimes prevent. A lifetime is a
bit of metadata that allows you and the compiler to know how long a
value will be valid at its current memory location. That's an
important distinction, as it's a common mistake Rust newcomers make.
Rust lifetimes are not the time period between when an object is
created and when it is destroyed!
As an analogy, think of it this way: During a person's life, they will
reside in many different locations, each with a distinct address. A
Rust lifetime is concerned with the address you currently reside at,
not about whenever you will die in the future (although dying also
changes your address). Every time you move it's relevant because your
address is no longer valid.
It's also important to note that lifetimes do not change your code; your
code controls the lifetimes, your lifetimes don't control the code. The
pithy saying is "lifetimes are descriptive, not prescriptive".
Let's annotate Combined::new with some line numbers which we will use
to highlight lifetimes:
{ // 0
let parent = Parent { count: 42 }; // 1
let child = Child { parent: &parent }; // 2
// 3
Combined { parent, child } // 4
} // 5
The concrete lifetime of parent is from 1 to 4, inclusive (which I'll
represent as [1,4]). The concrete lifetime of child is [2,4], and
the concrete lifetime of the return value is [4,5]. It's
possible to have concrete lifetimes that start at zero - that would
represent the lifetime of a parameter to a function or something that
existed outside of the block.
Note that the lifetime of child itself is [2,4], but that it refers
to a value with a lifetime of [1,4]. This is fine as long as the
referring value becomes invalid before the referred-to value does. The
problem occurs when we try to return child from the block. This would
"over-extend" the lifetime beyond its natural length.
This new knowledge should explain the first two examples. The third
one requires looking at the implementation of Parent::child. Chances
are, it will look something like this:
impl Parent {
fn child(&self) -> Child { /* ... */ }
}
This uses lifetime elision to avoid writing explicit generic
lifetime parameters. It is equivalent to:
impl Parent {
fn child<'a>(&'a self) -> Child<'a> { /* ... */ }
}
In both cases, the method says that a Child structure will be
returned that has been parameterized with the concrete lifetime of
self. Said another way, the Child instance contains a reference
to the Parent that created it, and thus cannot live longer than that
Parent instance.
This also lets us recognize that something is really wrong with our
creation function:
fn make_combined<'a>() -> Combined<'a> { /* ... */ }
Although you are more likely to see this written in a different form:
impl<'a> Combined<'a> {
fn new() -> Combined<'a> { /* ... */ }
}
In both cases, there is no lifetime parameter being provided via an
argument. This means that the lifetime that Combined will be
parameterized with isn't constrained by anything - it can be whatever
the caller wants it to be. This is nonsensical, because the caller
could specify the 'static lifetime and there's no way to meet that
condition.
How do I fix it?
The easiest and most recommended solution is to not attempt to put
these items in the same structure together. By doing this, your
structure nesting will mimic the lifetimes of your code. Place types
that own data into a structure together and then provide methods that
allow you to get references or objects containing references as needed.
There is a special case where the lifetime tracking is overzealous:
when you have something placed on the heap. This occurs when you use a
Box<T>, for example. In this case, the structure that is moved
contains a pointer into the heap. The pointed-at value will remain
stable, but the address of the pointer itself will move. In practice,
this doesn't matter, as you always follow the pointer.
Some crates provide ways of representing this case, but they
require that the base address never move. This rules out mutating
vectors, which may cause a reallocation and a move of the
heap-allocated values.
rental (no longer maintained or supported)
owning_ref (has multiple soundness issues)
ouroboros
self_cell
Examples of problems solved with Rental:
Is there an owned version of String::chars?
Returning a RWLockReadGuard independently from a method
How can I return an iterator over a locked struct member in Rust?
How to return a reference to a sub-value of a value that is under a mutex?
How do I store a result using Serde Zero-copy deserialization of a Futures-enabled Hyper Chunk?
How to store a reference without having to deal with lifetimes?
In other cases, you may wish to move to some type of reference-counting, such as by using Rc or Arc.
More information
After moving parent into the struct, why is the compiler not able to get a new reference to parent and assign it to child in the struct?
While it is theoretically possible to do this, doing so would introduce a large amount of complexity and overhead. Every time that the object is moved, the compiler would need to insert code to "fix up" the reference. This would mean that copying a struct is no longer a very cheap operation that just moves some bits around. It could even mean that code like this is expensive, depending on how good a hypothetical optimizer would be:
let a = Object::new();
let b = a;
let c = b;
Instead of forcing this to happen for every move, the programmer gets to choose when this will happen by creating methods that will take the appropriate references only when you call them.
A type with a reference to itself
There's one specific case where you can create a type with a reference to itself. You need to use something like Option to make it in two steps though:
#[derive(Debug)]
struct WhatAboutThis<'a> {
name: String,
nickname: Option<&'a str>,
}
fn main() {
let mut tricky = WhatAboutThis {
name: "Annabelle".to_string(),
nickname: None,
};
tricky.nickname = Some(&tricky.name[..4]);
println!("{:?}", tricky);
}
This does work, in some sense, but the created value is highly restricted - it can never be moved. Notably, this means it cannot be returned from a function or passed by-value to anything. A constructor function shows the same problem with the lifetimes as above:
fn creator<'a>() -> WhatAboutThis<'a> { /* ... */ }
If you try to do this same code with a method, you'll need the alluring but ultimately useless &'a self. When that's involved, this code is even more restricted and you will get borrow-checker errors after the first method call:
#[derive(Debug)]
struct WhatAboutThis<'a> {
name: String,
nickname: Option<&'a str>,
}
impl<'a> WhatAboutThis<'a> {
fn tie_the_knot(&'a mut self) {
self.nickname = Some(&self.name[..4]);
}
}
fn main() {
let mut tricky = WhatAboutThis {
name: "Annabelle".to_string(),
nickname: None,
};
tricky.tie_the_knot();
// cannot borrow `tricky` as immutable because it is also borrowed as mutable
// println!("{:?}", tricky);
}
See also:
Cannot borrow as mutable more than once at a time in one code - but can in another very similar
What about Pin?
Pin, stabilized in Rust 1.33, has this in the module documentation:
A prime example of such a scenario would be building self-referential structs, since moving an object with pointers to itself will invalidate them, which could cause undefined behavior.
It's important to note that "self-referential" doesn't necessarily mean using a reference. Indeed, the example of a self-referential struct specifically says (emphasis mine):
We cannot inform the compiler about that with a normal reference,
since this pattern cannot be described with the usual borrowing rules.
Instead we use a raw pointer, though one which is known to not be null,
since we know it's pointing at the string.
The ability to use a raw pointer for this behavior has existed since Rust 1.0. Indeed, owning-ref and rental use raw pointers under the hood.
The only thing that Pin adds to the table is a common way to state that a given value is guaranteed to not move.
See also:
How to use the Pin struct with self-referential structures?
A slightly different issue which causes very similar compiler messages is object lifetime dependency, rather than storing an explicit reference. An example of that is the ssh2 library. When developing something bigger than a test project, it is tempting to try to put the Session and Channel obtained from that session alongside each other into a struct, hiding the implementation details from the user. However, note that the Channel definition has the 'sess lifetime in its type annotation, while Session doesn't.
This causes similar compiler errors related to lifetimes.
One way to solve it in a very simple way is to declare the Session outside in the caller, and then for annotate the reference within the struct with a lifetime, similar to the answer in this Rust User's Forum post talking about the same issue while encapsulating SFTP. This will not look elegant and may not always apply - because now you have two entities to deal with, rather than one that you wanted!
Turns out the rental crate or the owning_ref crate from the other answer are the solutions for this issue too. Let's consider the owning_ref, which has the special object for this exact purpose:
OwningHandle. To avoid the underlying object moving, we allocate it on the heap using a Box, which gives us the following possible solution:
use ssh2::{Channel, Error, Session};
use std::net::TcpStream;
use owning_ref::OwningHandle;
struct DeviceSSHConnection {
tcp: TcpStream,
channel: OwningHandle<Box<Session>, Box<Channel<'static>>>,
}
impl DeviceSSHConnection {
fn new(targ: &str, c_user: &str, c_pass: &str) -> Self {
use std::net::TcpStream;
let mut session = Session::new().unwrap();
let mut tcp = TcpStream::connect(targ).unwrap();
session.handshake(&tcp).unwrap();
session.set_timeout(5000);
session.userauth_password(c_user, c_pass).unwrap();
let mut sess = Box::new(session);
let mut oref = OwningHandle::new_with_fn(
sess,
unsafe { |x| Box::new((*x).channel_session().unwrap()) },
);
oref.shell().unwrap();
let ret = DeviceSSHConnection {
tcp: tcp,
channel: oref,
};
ret
}
}
The result of this code is that we can not use the Session anymore, but it is stored alongside with the Channel which we will be using. Because the OwningHandle object dereferences to Box, which dereferences to Channel, when storing it in a struct, we name it as such. NOTE: This is just my understanding. I have a suspicion this may not be correct, since it appears to be quite close to discussion of OwningHandle unsafety.
One curious detail here is that the Session logically has a similar relationship with TcpStream as Channel has to Session, yet its ownership is not taken and there are no type annotations around doing so. Instead, it is up to the user to take care of this, as the documentation of handshake method says:
This session does not take ownership of the socket provided, it is
recommended to ensure that the socket persists the lifetime of this
session to ensure that communication is correctly performed.
It is also highly recommended that the stream provided is not used
concurrently elsewhere for the duration of this session as it may
interfere with the protocol.
So with the TcpStream usage, is completely up to the programmer to ensure the correctness of the code. With the OwningHandle, the attention to where the "dangerous magic" happens is drawn using the unsafe {} block.
A further and a more high-level discussion of this issue is in this Rust User's Forum thread - which includes a different example and its solution using the rental crate, which does not contain unsafe blocks.
I've found the Arc (read-only) or Arc<Mutex> (read-write with locking) patterns to be sometimes quite useful tradeoff between performance and code complexity (mostly caused by lifetime-annotation).
Arc:
use std::sync::Arc;
struct Parent {
child: Arc<Child>,
}
struct Child {
value: u32,
}
struct Combined(Parent, Arc<Child>);
fn main() {
let parent = Parent { child: Arc::new(Child { value: 42 }) };
let child = parent.child.clone();
let combined = Combined(parent, child.clone());
assert_eq!(combined.0.child.value, 42);
assert_eq!(child.value, 42);
// combined.0.child.value = 50; // fails, Arc is not DerefMut
}
Arc + Mutex:
use std::sync::{Arc, Mutex};
struct Child {
value: u32,
}
struct Parent {
child: Arc<Mutex<Child>>,
}
struct Combined(Parent, Arc<Mutex<Child>>);
fn main() {
let parent = Parent { child: Arc::new(Mutex::new(Child {value: 42 }))};
let child = parent.child.clone();
let combined = Combined(parent, child.clone());
assert_eq!(combined.0.child.lock().unwrap().value, 42);
assert_eq!(child.lock().unwrap().value, 42);
child.lock().unwrap().value = 50;
assert_eq!(combined.0.child.lock().unwrap().value, 50);
}
See also RwLock (When or why should I use a Mutex over an RwLock?)
As a newcomer to Rust, I had a case similar to your last example:
struct Combined<'a>(Parent, Child<'a>);
fn make_combined<'a>() -> Combined<'a> {
let parent = Parent::new();
let child = parent.child();
Combined(parent, child)
}
In the end, I solved it by using this pattern:
fn make_parent_and_child<'a>(anchor: &'a mut DataAnchorFor1<Parent>) -> Child<'a> {
// construct parent, then store it in anchor object the caller gave us a mut-ref to
*anchor = DataAnchorFor1::holding(Parent::new());
// now retrieve parent from storage-slot we assigned to in the previous line
let parent = anchor.val1.as_mut().unwrap();
// now proceed with regular code, except returning only the child
// (the parent can already be accessed by the caller through the anchor object)
let child = parent.child();
child
}
// this is a generic struct that we can define once, and use whenever we need this pattern
// (it can also be extended to have multiple slots, naturally)
struct DataAnchorFor1<T> {
val1: Option<T>,
}
impl<T> DataAnchorFor1<T> {
fn empty() -> Self {
Self { val1: None }
}
fn holding(val1: T) -> Self {
Self { val1: Some(val1) }
}
}
// for my case, this was all I needed
fn main_simple() {
let anchor = DataAnchorFor1::empty();
let child = make_parent_and_child(&mut anchor);
let child_processing_result = do_some_processing(child);
println!("ChildProcessingResult:{}", child_processing_result);
}
// but if access to parent-data later on is required, you can use this
fn main_complex() {
let anchor = DataAnchorFor1::empty();
// if you want to use the parent object (which is stored in anchor), you must...
// ...wrap the child-related processing in a new scope, so the mut-ref to anchor...
// ...gets dropped at its end, letting us access anchor.val1 (the parent) directly
let child_processing_result = {
let child = make_parent_and_child(&mut anchor);
// do the processing you want with the child here (avoiding ref-chain...
// ...back to anchor-data, if you need to access parent-data afterward)
do_some_processing(child)
};
// now that scope is ended, we can access parent data directly
// so print out the relevant data for both parent and child (adjust to your case)
let parent = anchor.val1.unwrap();
println!("Parent:{} ChildProcessingResult:{}", parent, child_processing_result);
}
This is far from a universal solution! But it worked in my case, and only required usage of the main_simple pattern above (not the main_complex variant), because in my case the "parent" object was just something temporary (a database "Client" object) that I had to construct to pass to the "child" object (a database "Transaction" object) so I could run some database commands.
Anyway, it accomplished the encapsulation/simplification-of-boilerplate that I needed (since I had many functions that needed creation of a Transaction/"child" object, and now all they need is that generic anchor-object creation line), while avoiding the need for using a whole new library.
These are the libraries I'm aware of that may be relevant:
owning-ref
rental
ouroboros
reffers
self_cell
escher
rust-viewbox
However, I scanned through them, and they all seem to have issues of one kind or another (not being updated in years, having multiple unsoundness issues/concerns raised, etc.), so I was hesitant to use them.
So while this isn't as generic of a solution, I figured I would mention it for people with similar use-cases:
Where the caller only needs the "child" object returned.
But the called-function needs to construct a "parent" object to perform its functions.
And the borrowing rules requires that the "parent" object be stored somewhere that persists beyond the "make_parent_and_child" function. (in my case, this was a start_transaction function)

What are the performance implications of .await on a Ready future? [duplicate]

In a language like C#, giving this code (I am not using the await keyword on purpose):
async Task Foo()
{
var task = LongRunningOperationAsync();
// Some other non-related operation
AnotherOperation();
result = task.Result;
}
In the first line, the long operation is run in another thread, and a Task is returned (that is a future). You can then do another operation that will run in parallel of the first one, and at the end, you can wait for the operation to be finished. I think that it is also the behavior of async/await in Python, JavaScript, etc.
On the other hand, in Rust, I read in the RFC that:
A fundamental difference between Rust's futures and those from other languages is that Rust's futures do not do anything unless polled. The whole system is built around this: for example, cancellation is dropping the future for precisely this reason. In contrast, in other languages, calling an async fn spins up a future that starts executing immediately.
In this situation, what is the purpose of async/await in Rust? Seeing other languages, this notation is a convenient way to run parallel operations, but I cannot see how it works in Rust if the calling of an async function does not run anything.
You are conflating a few concepts.
Concurrency is not parallelism, and async and await are tools for concurrency, which may sometimes mean they are also tools for parallelism.
Additionally, whether a future is immediately polled or not is orthogonal to the syntax chosen.
async / await
The keywords async and await exist to make creating and interacting with asynchronous code easier to read and look more like "normal" synchronous code. This is true in all of the languages that have such keywords, as far as I am aware.
Simpler code
This is code that creates a future that adds two numbers when polled
before
fn long_running_operation(a: u8, b: u8) -> impl Future<Output = u8> {
struct Value(u8, u8);
impl Future for Value {
type Output = u8;
fn poll(self: Pin<&mut Self>, _ctx: &mut Context) -> Poll<Self::Output> {
Poll::Ready(self.0 + self.1)
}
}
Value(a, b)
}
after
async fn long_running_operation(a: u8, b: u8) -> u8 {
a + b
}
Note that the "before" code is basically the implementation of today's poll_fn function
See also Peter Hall's answer about how keeping track of many variables can be made nicer.
References
One of the potentially surprising things about async/await is that it enables a specific pattern that wasn't possible before: using references in futures. Here's some code that fills up a buffer with a value in an asynchronous manner:
before
use std::io;
fn fill_up<'a>(buf: &'a mut [u8]) -> impl Future<Output = io::Result<usize>> + 'a {
futures::future::lazy(move |_| {
for b in buf.iter_mut() { *b = 42 }
Ok(buf.len())
})
}
fn foo() -> impl Future<Output = Vec<u8>> {
let mut data = vec![0; 8];
fill_up(&mut data).map(|_| data)
}
This fails to compile:
error[E0597]: `data` does not live long enough
--> src/main.rs:33:17
|
33 | fill_up_old(&mut data).map(|_| data)
| ^^^^^^^^^ borrowed value does not live long enough
34 | }
| - `data` dropped here while still borrowed
|
= note: borrowed value must be valid for the static lifetime...
error[E0505]: cannot move out of `data` because it is borrowed
--> src/main.rs:33:32
|
33 | fill_up_old(&mut data).map(|_| data)
| --------- ^^^ ---- move occurs due to use in closure
| | |
| | move out of `data` occurs here
| borrow of `data` occurs here
|
= note: borrowed value must be valid for the static lifetime...
after
use std::io;
async fn fill_up(buf: &mut [u8]) -> io::Result<usize> {
for b in buf.iter_mut() { *b = 42 }
Ok(buf.len())
}
async fn foo() -> Vec<u8> {
let mut data = vec![0; 8];
fill_up(&mut data).await.expect("IO failed");
data
}
This works!
Calling an async function does not run anything
The implementation and design of a Future and the entire system around futures, on the other hand, is unrelated to the keywords async and await. Indeed, Rust has a thriving asynchronous ecosystem (such as with Tokio) before the async / await keywords ever existed. The same was true for JavaScript.
Why aren't Futures polled immediately on creation?
For the most authoritative answer, check out this comment from withoutboats on the RFC pull request:
A fundamental difference between Rust's futures and those from other
languages is that Rust's futures do not do anything unless polled. The
whole system is built around this: for example, cancellation is
dropping the future for precisely this reason. In contrast, in other
languages, calling an async fn spins up a future that starts executing
immediately.
A point about this is that async & await in Rust are not inherently
concurrent constructions. If you have a program that only uses async &
await and no concurrency primitives, the code in your program will
execute in a defined, statically known, linear order. Obviously, most
programs will use some kind of concurrency to schedule multiple,
concurrent tasks on the event loop, but they don't have to. What this
means is that you can - trivially - locally guarantee the ordering of
certain events, even if there is nonblocking IO performed in between
them that you want to be asynchronous with some larger set of nonlocal
events (e.g. you can strictly control ordering of events inside of a
request handler, while being concurrent with many other request
handlers, even on two sides of an await point).
This property gives Rust's async/await syntax the kind of local
reasoning & low-level control that makes Rust what it is. Running up
to the first await point would not inherently violate that - you'd
still know when the code executed, it would just execute in two
different places depending on whether it came before or after an
await. However, I think the decision made by other languages to start
executing immediately largely stems from their systems which
immediately schedule a task concurrently when you call an async fn
(for example, that's the impression of the underlying problem I got
from the Dart 2.0 document).
Some of the Dart 2.0 background is covered by this discussion from munificent:
Hi, I'm on the Dart team. Dart's async/await was designed mainly by
Erik Meijer, who also worked on async/await for C#. In C#, async/await
is synchronous to the first await. For Dart, Erik and others felt that
C#'s model was too confusing and instead specified that an async
function always yields once before executing any code.
At the time, I and another on my team were tasked with being the
guinea pigs to try out the new in-progress syntax and semantics in our
package manager. Based on that experience, we felt async functions
should run synchronously to the first await. Our arguments were
mostly:
Always yielding once incurs a performance penalty for no good reason. In most cases, this doesn't matter, but in some it really
does. Even in cases where you can live with it, it's a drag to bleed a
little perf everywhere.
Always yielding means certain patterns cannot be implemented using async/await. In particular, it's really common to have code like
(pseudo-code here):
getThingFromNetwork():
if (downloadAlreadyInProgress):
return cachedFuture
cachedFuture = startDownload()
return cachedFuture
In other words, you have an async operation that you can call multiple times before it completes. Later calls use the same
previously-created pending future. You want to ensure you don't start
the operation multiple times. That means you need to synchronously
check the cache before starting the operation.
If async functions are async from the start, the above function can't use async/await.
We pleaded our case, but ultimately the language designers stuck with
async-from-the-top. This was several years ago.
That turned out to be the wrong call. The performance cost is real
enough that many users developed a mindset that "async functions are
slow" and started avoiding using it even in cases where the perf hit
was affordable. Worse, we see nasty concurrency bugs where people
think they can do some synchronous work at the top of a function and
are dismayed to discover they've created race conditions. Overall, it
seems users do not naturally assume an async function yields before
executing any code.
So, for Dart 2, we are now taking the very painful breaking change to
change async functions to be synchronous to the first await and
migrating all of our existing code through that transition. I'm glad
we're making the change, but I really wish we'd done the right thing
on day one.
I don't know if Rust's ownership and performance model place different
constraints on you where being async from the top really is better,
but from our experience, sync-to-the-first-await is clearly the better
trade-off for Dart.
cramert replies (note that some of this syntax is outdated now):
If you need code to execute immediately when a function is called
rather than later on when the future is polled, you can write your
function like this:
fn foo() -> impl Future<Item=Thing> {
println!("prints immediately");
async_block! {
println!("prints when the future is first polled");
await!(bar());
await!(baz())
}
}
Code examples
These examples use the async support in Rust 1.39 and the futures crate 0.3.1.
Literal transcription of the C# code
use futures; // 0.3.1
async fn long_running_operation(a: u8, b: u8) -> u8 {
println!("long_running_operation");
a + b
}
fn another_operation(c: u8, d: u8) -> u8 {
println!("another_operation");
c * d
}
async fn foo() -> u8 {
println!("foo");
let sum = long_running_operation(1, 2);
another_operation(3, 4);
sum.await
}
fn main() {
let task = foo();
futures::executor::block_on(async {
let v = task.await;
println!("Result: {}", v);
});
}
If you called foo, the sequence of events in Rust would be:
Something implementing Future<Output = u8> is returned.
That's it. No "actual" work is done yet. If you take the result of foo and drive it towards completion (by polling it, in this case via futures::executor::block_on), then the next steps are:
Something implementing Future<Output = u8> is returned from calling long_running_operation (it does not start work yet).
another_operation does work as it is synchronous.
the .await syntax causes the code in long_running_operation to start. The foo future will continue to return "not ready" until the computation is done.
The output would be:
foo
another_operation
long_running_operation
Result: 3
Note that there are no thread pools here: this is all done on a single thread.
async blocks
You can also use async blocks:
use futures::{future, FutureExt}; // 0.3.1
fn long_running_operation(a: u8, b: u8) -> u8 {
println!("long_running_operation");
a + b
}
fn another_operation(c: u8, d: u8) -> u8 {
println!("another_operation");
c * d
}
async fn foo() -> u8 {
println!("foo");
let sum = async { long_running_operation(1, 2) };
let oth = async { another_operation(3, 4) };
let both = future::join(sum, oth).map(|(sum, _)| sum);
both.await
}
Here we wrap synchronous code in an async block and then wait for both actions to complete before this function will be complete.
Note that wrapping synchronous code like this is not a good idea for anything that will actually take a long time; see What is the best approach to encapsulate blocking I/O in future-rs? for more info.
With a threadpool
// Requires the `thread-pool` feature to be enabled
use futures::{executor::ThreadPool, future, task::SpawnExt, FutureExt};
async fn foo(pool: &mut ThreadPool) -> u8 {
println!("foo");
let sum = pool
.spawn_with_handle(async { long_running_operation(1, 2) })
.unwrap();
let oth = pool
.spawn_with_handle(async { another_operation(3, 4) })
.unwrap();
let both = future::join(sum, oth).map(|(sum, _)| sum);
both.await
}
The purpose of async/await in Rust is to provide a toolkit for concurrency—same as in C# and other languages.
In C# and JavaScript, async methods start running immediately, and they're scheduled whether you await the result or not. In Python and Rust, when you call an async method, nothing happens (it isn't even scheduled) until you await it. But it's largely the same programming style either way.
The ability to spawn another task (that runs concurrent with and independent of the current task) is provided by libraries: see async_std::task::spawn and tokio::task::spawn.
As for why Rust async is not exactly like C#, well, consider the differences between the two languages:
Rust discourages global mutable state. In C# and JS, every async method call is implicitly added to a global mutable queue. It's a side effect to some implicit context. For better or worse, that's not Rust's style.
Rust is not a framework. It makes sense that C# provides a default event loop. It also provides a great garbage collector! Lots of things that come standard in other languages are optional libraries in Rust.
Consider this simple pseudo-JavaScript code that fetches some data, processes it, fetches some more data based on the previous step, summarises it, and then prints a result:
getData(url)
.then(response -> parseObjects(response.data))
.then(data -> findAll(data, 'foo'))
.then(foos -> getWikipediaPagesFor(foos))
.then(sumPages)
.then(sum -> console.log("sum is: ", sum));
In async/await form, that's:
async {
let response = await getData(url);
let objects = parseObjects(response.data);
let foos = findAll(objects, 'foo');
let pages = await getWikipediaPagesFor(foos);
let sum = sumPages(pages);
console.log("sum is: ", sum);
}
It introduces a lot of single-use variables and is arguably worse than the original version with promises. So why bother?
Consider this change, where the variables response and objects are needed later on in the computation:
async {
let response = await getData(url);
let objects = parseObjects(response.data);
let foos = findAll(objects, 'foo');
let pages = await getWikipediaPagesFor(foos);
let sum = sumPages(pages, objects.length);
console.log("sum is: ", sum, " and status was: ", response.status);
}
And try to rewrite it in the original form with promises:
getData(url)
.then(response -> Promise.resolve(parseObjects(response.data))
.then(objects -> Promise.resolve(findAll(objects, 'foo'))
.then(foos -> getWikipediaPagesFor(foos))
.then(pages -> sumPages(pages, objects.length)))
.then(sum -> console.log("sum is: ", sum, " and status was: ", response.status)));
Each time you need to refer back to a previous result, you need to nest the entire structure one level deeper. This can quickly become very difficult to read and maintain, but the async/await version does not suffer from this problem.

How to write fuctions that takes IntoIter more generally

I was reading an answer to stackoverflow question and tried to modify the function history to take IntoIter where item can be anything that can be transformed into reference and has some traits Debug in this case.
If I will remove V: ?Sized from the function definition rust compiler would complain that it doesn't know the size of str at compile time.
use std::fmt::Debug;
pub fn history<I: IntoIterator, V: ?Sized>(i: I) where I::Item: AsRef<V>, V: Debug {
for s in i {
println!("{:?}", s.as_ref());
}
}
fn main() {
history::<_, str>(&["st", "t", "u"]);
}
I don't understand why compiler shows error in the first place and not sure why the program is working properly if I kind of cheat with V: ?Sized.
I kind of cheat with V: ?Sized
It isn't cheating. All generic arguments are assumed to be Sized by default. This default is there because it's the most common case - without it, nearly every type parameter would have to be annotated with : Sized.
In your case, V is only ever accessed by reference, so it doesn't need to be Sized. Relaxing the Sized constraint makes your function as general as possible, allowing it to be used with the most possible types.
The type str is unsized, so this is not just about generalisation, you actually need to relax the default Sized constraint to be able to use your function with str.

Read from an enum without pattern matching

The Rust documentation gives this example where we have an instance of Result<T, E> named some_value:
match some_value {
Ok(value) => println!("got a value: {}", value),
Err(_) => println!("an error occurred"),
}
Is there any way to read from some_value without pattern matching? What about without even checking the type of the contents at runtime? Perhaps we somehow know with absolute certainty what type is contained or perhaps we're just being a bad programmer. In either case, I'm just curious to know if it's at all possible, not if it's a good idea.
It strikes me as a really interesting language feature that this branch is so difficult (or impossible?) to avoid.
At the lowest level, no, you can't read enum fields without a match1.
Methods on an enum can provide more convenient access to data within the enum (e.g. Result::unwrap), but under the hood, they're always implemented with a match.
If you know that a particular case in a match is unreachable, a common practice is to write unreachable!() on that branch (unreachable!() simply expands to a panic!() with a specific message).
1 If you have an enum with only one variant, you could also write a simple let statement to deconstruct the enum. Patterns in let and match statements must be exhaustive, and pattern matching the single variant from an enum is exhaustive. But enums with only one variant are pretty much never used; a struct would do the job just fine. And if you intend to add variants later, you're better off writing a match right away.
enum Single {
S(i32),
}
fn main() {
let a = Single::S(1);
let Single::S(b) = a;
println!("{}", b);
}
On the other hand, if you have an enum with more than one variant, you can also use if let and while let if you're interested in the data from a single variant only. While let and match require exhaustive patterns, if let and while let accept non-exhaustive patterns. You'll often see them used with Option:
fn main() {
if let Some(x) = std::env::args().len().checked_add(1) {
println!("{}", x);
} else {
println!("too many args :(");
}
}

"too many parameters" on perfectly fine function

I have code similar to this:
pub trait WorldImpl {
fn new(size: (usize, usize), seed: u32) -> World;
fn three() -> bool;
fn other() -> bool;
fn non_self_methods() -> bool;
}
pub type World = Vec<Vec<UnitOfSpace>>;
// I'm doing this because I want a SPECIAL version of Vec<Vec<UnitOfSpace>>, so I can treat it like a struct but have it be a normal type underneath.
impl WorldImpl for World {
fn new(size: (usize, usize), seed: u32) -> World {
// Code
vec![/* vector stuff */]
}
// Implement other three methods
}
let w = World::new((120, 120), /* seed from UNIX_EPOCH stuff */);
And I get this error, which is clearly wrong:
error[E0061]: this function takes 0 parameters but 2 parameters were supplied
--> src/main.rs:28:28
|
28 | let world = World::new((120 as usize, 120 as usize),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected 0 parameters
I'm thinking two things:
This is not idiomatic and Rust was never meant to be used this way. In this case, I need to know how to really do this.
It's a stupid error that I'm missing.
When I try similar code to the above on the playground, it works just fine, no errors. I have not found any information on any errors like this anywhere else, so I'll not be surprised to find out I'm just using the language wrong. I have no particular attachment to any of my code, so please tell me what the idiom is for this!
What you are trying to do doesn't quite make sense. You have made World a type alias for Vec<Vec<UnitOfSpace>>, so they are completely interchangeable - the implementations you add for one will apply to the other and vice versa.
If you want to treat this type differently then wrap it in a newtype:
struct World(Vec<Vec<UnitOfSpace>>);
This is now a distinct type from Vec<Vec<UnitOfSpace>>, but with zero runtime overhead.
Your actual error is because you have added a method called new to World as part of its implementation of WorldImpl, but World is a Vec which already has a new method (with zero args!).
Your type World is an alias for Vec<Vec<UnitOfSpace>>. Vec<T> provides an inherent associated function called new that takes no parameters. The compiler prefers selecting inherent associated functions to associated functions defined in traits, thus it selects the inherent new with no parameters instead of your own new that takes 2 parameters.
Here are a few options to solve this:
Invoke the trait's associated function explicitly:
let w = <World as WorldImpl>::new((120, 120), /* seed from UNIX_EPOCH stuff */);
Make World a newtype (struct World(Vec<Vec<UnitOfSpace>>);), which will let you define inherent associated functions (but then Vec's inherent methods won't be available on World).
Rename WorldImpl::new to a name that is not used by an inherent associated function on Vec.

Resources