How to cache expensive async tasks to await those already in-progress?

I have a forward cache which computes some expensive values. In some cases I have to perform an expensive call to the same resource. In a situation where the forward cache is already computing the value, I'd like to .await until this in-flight computation has completed.
My current (simplified) code is structured similarly to this:
struct MyStruct {
cache: Cache, // cache for results
}
impl MyStruct {
async fn compute(&self) -> ExpensiveThing { ... }
async fn forward_cache_compute(&self, identifier: &str) {
// do some expensive computation and cache it:
...
let value = self.compute().await; // .... takes 100 ms ...
self.cache.insert(identifier, value);
// consider if possible to save a future of compute() or conditional variable to wait upon for "identifier"
}
async fn get_from_cache_or_compute_if_needed(&self, identifier: &str) -> ExpensiveThing {
// would like to check if the forward cache is already computing and return that value if possible (share a future?)
if let Some(cached_value) = self.cache.get(identifier) {
// use this cached_value and don't compute
} else if ... inflight computation is in progress... {
// block on that
// can I save the future and await it from multiple places?
}
}
}

Here is a poor-man's implementation of an asynchronous cache:
# Cargo.toml
[dependencies]
async-once-cell = { version = "0.4.2", features = ["unpin"] }
tokio = { version = "1.21.0", features = ["full"] }
use std::collections::HashMap;
use std::sync::{Mutex, Arc};
use async_once_cell::unpin::Lazy;
struct MyStruct {
cache: Mutex<HashMap<&'static str, Arc<Lazy<i32>>>>,
}
impl MyStruct {
async fn get_or_compute(&self, key: &'static str) -> i32 {
let fut = self
.cache
.lock()
.unwrap()
.entry(key)
.or_insert_with(|| Arc::new(Lazy::new(Box::pin(async move {
println!("calculating value for: {}", key);
tokio::time::sleep(std::time::Duration::from_secs(1)).await;
1
}))))
.clone();
*fut.get().await
}
}
#[tokio::main]
async fn main() {
let my_struct = MyStruct { cache: Default::default() };
tokio::join![
my_struct.get_or_compute("a"),
my_struct.get_or_compute("a"),
my_struct.get_or_compute("b"),
my_struct.get_or_compute("b"),
my_struct.get_or_compute("c"),
my_struct.get_or_compute("a"),
my_struct.get_or_compute("b"),
];
}
calculating value for: a
calculating value for: b
calculating value for: c
As you can see, .get_or_compute() is called multiple times for the same keys concurrently but the task is only executed once for each. The secret sauce is provided by Lazy from the async-once-cell crate; it represents a Future that can be .await-d from multiple places, but will only execute once.
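The same "one cell per key" idea can be sketched without any async runtime. The following is a thread-based analogue, not the async-once-cell API: it assumes blocking callers and uses std::sync::OnceLock (Rust 1.70+) in place of Lazy. BlockingCache and its computations counter are hypothetical names added for illustration.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical blocking counterpart of the async cache shown above.
struct BlockingCache {
    cells: Mutex<HashMap<String, Arc<OnceLock<u64>>>>,
    /// Counts how often an expensive closure actually ran (illustration only).
    computations: AtomicUsize,
}

impl BlockingCache {
    fn new() -> Self {
        BlockingCache {
            cells: Mutex::new(HashMap::new()),
            computations: AtomicUsize::new(0),
        }
    }

    fn get_or_compute(&self, key: &str, compute: impl FnOnce() -> u64) -> u64 {
        // Hold the map lock only long enough to fetch or create the per-key cell.
        let cell = {
            let mut cells = self.cells.lock().unwrap();
            Arc::clone(cells.entry(key.to_string()).or_default())
        };
        // get_or_init runs the closure at most once per cell; callers that
        // arrive while it is running block until the value is available.
        *cell.get_or_init(|| {
            self.computations.fetch_add(1, Ordering::SeqCst);
            compute()
        })
    }
}
```

If several threads call get_or_compute("a", ...) at the same time, only one closure runs and the rest wait for its result, which mirrors what Lazy does for futures.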


I want to keep a reference inside an HashMap but I'm not able to specify correctly the lifetime

I'm using ws-rs to build a chat app. I need to keep associations between a Sender and a Username but I'm having issues in referencing the Sender in my HashMap.
I'm 99.99% sure that Handler keeps the ownership of Sender.
I had solved this problem by cloning the sender every time and passing it to another thread, together with the username, via an mpsc::channel, but I want to try using smart pointers and references.
Here is a Minimal, Reproducible Example:
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;
trait Factory {
fn connection_made(&mut self, _: Sender) -> MHandler;
}
trait Handler {
fn on_open(&mut self) -> ();
}
struct MFactory<'a> {
connections: Arc<HashMap<String, &'a Sender>>,
}
struct MHandler<'a> {
sender: Sender,
connections: Arc<HashMap<String, &'a Sender>>,
}
struct Sender{}
fn main() {
let mut connections: Arc<HashMap<String, &Sender>> = Arc::new(HashMap::new());
// Server thread
let server = thread::Builder::new()
.name(format!("server"))
.spawn(|| {
let mFactory = MFactory {
connections: connections.clone(),
};
let mHandler = mFactory.connection_made(Sender{});
mHandler.on_open();
})
.unwrap();
}
impl Factory for MFactory<'_> {
fn connection_made(&mut self, s: Sender) -> MHandler {
MHandler {
sender: s,
connections: self.connections.clone(),
}
}
}
impl Handler for MHandler<'_> {
fn on_open(&mut self) -> () {
self.connections.insert(format!("Alan"), &self.sender);
}
}
Playground.
PS: I'm aware that Arc doesn't guarantee mutual exclusion, so I have to wrap my HashMap in a Mutex. I've decided to ignore that for the moment.
What you're trying to do is unsafe. You're keeping, in a map that lives for the duration of your program, references to a structure that is owned by another object inside a thread. So the map outlives the objects it stores references to, which Rust prevents.
Following on my comment, this code compiles (I've removed the factory for clarity):
use std::collections::HashMap;
use std::sync::{Arc,Mutex};
use std::thread;
use std::ptr::NonNull;
struct MHandler {
sender: Sender,
}
struct Sender{}
struct Wrapper(NonNull<Sender>);
unsafe impl std::marker::Send for Wrapper { }
fn main() {
let connections: Arc<Mutex<HashMap<String, Wrapper>>> = Arc::new(Mutex::new(HashMap::new()));
// Server thread
let server = thread::Builder::new()
.name(format!("server"))
.spawn(move || {
let mut handler = MHandler {
sender: Sender{},
};
let w = Wrapper(NonNull::new(&mut handler.sender as *mut Sender).unwrap());
Arc::clone(&connections).lock().unwrap().insert(format!("Alan"), w);
})
.unwrap();
}
This is using raw pointers (https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#dereferencing-a-raw-pointer) and NonNull to be able to implement Send (see https://github.com/rust-lang/rust/issues/21709 and https://play.rust-lang.org/?gist=1ce2532a0eefc60695663c26faddebe1&version=stable)
Not sure this helps you.
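If shared ownership is acceptable, a safe alternative is to store Arc<Sender> in the map instead of a reference or raw pointer, so the map keeps the sender alive for as long as it holds an entry. This is only a sketch under that assumption: the id field, the username parameter of on_open, and register_on_server_thread are made up for the example and are not part of ws-rs.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

struct Sender {
    id: u32, // hypothetical field, only here so the stored sender can be inspected
}

type Connections = Arc<Mutex<HashMap<String, Arc<Sender>>>>;

struct MHandler {
    sender: Arc<Sender>,
    connections: Connections,
}

impl MHandler {
    fn on_open(&mut self, username: &str) {
        // Cloning the Arc only bumps a reference count; the Sender is not copied.
        self.connections
            .lock()
            .unwrap()
            .insert(username.to_string(), Arc::clone(&self.sender));
    }
}

/// Spawns the "server" thread, registers one connection, and waits for it.
fn register_on_server_thread(connections: Connections) {
    let server = thread::Builder::new()
        .name("server".to_string())
        .spawn(move || {
            let mut handler = MHandler {
                sender: Arc::new(Sender { id: 1 }),
                connections,
            };
            handler.on_open("Alan");
        })
        .unwrap();
    server.join().unwrap();
}
```

The entry stays valid after the handler and its thread are gone, because the map owns one of the Arc handles.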

How to run workload in the background?

I've a GUI application that is based on a loop. The loop can run more often than every frame, so it needs to be lightweight. There's a heavy workload that needs to be done from time to time. I'm not sure how to implement that. I'm imagining something like:
extern crate tokio; // 0.1.7
extern crate tokio_threadpool; // 0.1.2
use std::{thread, time::Duration};
use tokio::{prelude::*, runtime::Runtime};
fn delay_for(
seconds: u64
) -> impl Future<Item = u64, Error = tokio_threadpool::BlockingError>
{
future::poll_fn(move || {
tokio_threadpool::blocking(|| {
thread::sleep(Duration::from_secs(seconds));
seconds
})
})
}
fn render_frame(n: i8) {
println!("rendering frame {}", n);
thread::sleep(Duration::from_millis(500));
}
fn send_processed_data_to_gui(n: i8) {
println!("view updated. result of background processing was {}", n);
}
fn main() {
let mut frame_n = 0;
let frame_where_some_input_triggers_heavy_work = 2;
let mut parallel_work: Option<BoxedFuture> = None;
loop {
render_frame(frame_n);
if frame_n == frame_where_some_input_triggers_heavy_work {
parallel_work = Some(execute_in_background(delay_for(1)));
}
// check if there's parallel processing going on
// and handle result if it's finished
parallel_work
.take()
.map(|parallel_work| {
if parallel_work.done() {
// giving result back to app
send_processed_data_to_gui(parallel_work.result())
}
});
frame_n += 1;
if frame_n == 10 {
break;
}
}
}
fn execute_in_background(work: /* ... */) -> BoxedFuture {
unimplemented!()
}
Playground link
The above example is based on the linked answer's tokio-threadpool example. That example has a data flow like this:
let a = delay_for(3);
let b = delay_for(1);
let sum = a.join(b).map(|(a, b)| a + b);
The main difference between that example and my case is that task a triggers task b and when b is finished a gets passed result of b and continues working. It will also repeat this any number of times.
I feel like I'm trying to approach this in a way that is not idiomatic async programming in Rust.
How to run that workload in the background? Or to rephrase in terms of the code sketch above: how do I execute the future in parallel_work in parallel? If my approach is indeed severely off-track, can you nudge me in the right direction?
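One possible direction, sketched with plain std threads and a channel instead of tokio futures: hand the heavy work to a worker thread and poll for its result with a non-blocking try_recv once per loop iteration. The run_frames function, its log vector, and the sleep durations are placeholders invented for this sketch; only the execute_in_background name comes from the question.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Runs `work` on a new thread and returns a receiver for its result.
fn execute_in_background<T, F>(work: F) -> Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver was dropped in the meantime, the result is discarded.
        let _ = tx.send(work());
    });
    rx
}

/// Simulated main loop; returns a log of what happened in each iteration.
fn run_frames(total_frames: u32, trigger_frame: u32) -> Vec<String> {
    let mut log = Vec::new();
    let mut parallel_work: Option<Receiver<u64>> = None;
    for frame_n in 0..total_frames {
        log.push(format!("frame {}", frame_n));
        thread::sleep(Duration::from_millis(20)); // stand-in for rendering
        if frame_n == trigger_frame {
            parallel_work = Some(execute_in_background(|| {
                thread::sleep(Duration::from_millis(50)); // stand-in for heavy work
                42
            }));
        }
        // Check for a result without blocking the loop.
        if let Some(rx) = parallel_work.take() {
            match rx.try_recv() {
                Ok(result) => log.push(format!("result {}", result)),
                // Not finished yet: keep the receiver for the next iteration.
                Err(TryRecvError::Empty) => parallel_work = Some(rx),
                Err(TryRecvError::Disconnected) => log.push("worker failed".to_string()),
            }
        }
    }
    log
}
```

The loop never waits on the worker; it only pays for one try_recv per iteration, and the receiver is put back into parallel_work until a result arrives.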

What is the idiomatic way to implement caching on a function that is not a struct method?

I have an expensive function like this:
pub fn get_expensive_value(n: u64) -> u64 {
let mut ret = 0;
for _ in 0..n {
// expensive stuff
}
ret
}
And it gets called very frequently with the same argument. It's pure, so that means it will return the same result and can make use of a cache.
If this was a struct method, I would add a member to the struct that acts as a cache, but it isn't. So my option seems to be to use a static:
static mut LAST_VAL: Option<(u64, u64)> = None;
pub fn cached_expensive(n: u64) -> u64 {
unsafe {
LAST_VAL = LAST_VAL.and_then(|(k, v)| {
if k == n {
Some((n,v))
} else {
None
}
}).or_else(|| {
Some((n, get_expensive_value(n)))
});
let (_, v) = LAST_VAL.unwrap();
v
}
}
Now, I've had to use unsafe. Instead of the static mut, I could put a RefCell in a const. But I'm not convinced that is any safer - it just avoids having to use the unsafe block. I thought about a Mutex, but I don't think that will get me thread safety either.
Redesigning the code to use a struct for storage is not really an option.
I think the best alternative is to use a global variable with a mutex. Using lazy_static makes it easy and allows the "global" declaration inside the function:
pub fn cached_expensive(n: u64) -> u64 {
use std::sync::Mutex;
lazy_static! {
static ref LAST_VAL: Mutex<Option<(u64, u64)>> = Mutex::new(None);
}
let mut last = LAST_VAL.lock().unwrap();
let r = last.and_then(|(k, v)| {
if k == n {
Some((n, v))
} else {
None
}
}).or_else(|| Some((n, get_expensive_value(n))));
let (_, v) = r.unwrap();
*last = r;
v
}
You can also check out the cached project / crate. It memoizes the function with a simple macro.
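On recent toolchains (Rust 1.63 and later) Mutex::new is a const fn, so the same single-entry cache can be written with a plain static and no extra crate. The sketch below assumes that toolchain; the body of get_expensive_value and the CALLS counter are stand-ins added only to make the caching observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical counter: how many times the expensive function really ran.
static CALLS: AtomicUsize = AtomicUsize::new(0);

fn get_expensive_value(n: u64) -> u64 {
    CALLS.fetch_add(1, Ordering::SeqCst);
    // Placeholder for the expensive work: the sum of 0..n.
    (0..n).sum()
}

pub fn cached_expensive(n: u64) -> u64 {
    // A function-local static: global lifetime, but only visible in here.
    static LAST_VAL: Mutex<Option<(u64, u64)>> = Mutex::new(None);

    let mut last = LAST_VAL.lock().unwrap();
    match *last {
        // Same argument as last time: reuse the stored result.
        Some((k, v)) if k == n => v,
        // Different argument or empty cache: compute and remember.
        _ => {
            let v = get_expensive_value(n);
            *last = Some((n, v));
            v
        }
    }
}
```

Note that the lock is held while the value is computed, so concurrent callers with a different argument wait; that matches the lazy_static version above.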

How can I lock the internals of my Rust data structure?

I'm trying to implement a collection that stores values in both a vector and a hashmap and this is what I have so far:
pub struct CollectionWrapper {
items: Vec<Item>,
items_map: HashMap<ItemKey, Item>,
}
impl CollectionWrapper {
pub fn new() -> Self {
CollectionWrapper {
items: Vec::new(),
items_map: HashMap::new(),
}
}
pub fn add(&mut self, item: Item) {
let key = item.get_key();
self.items.push(item.clone());
self.items_map.insert(key, item.clone());
}
}
I obviously need some kind of lock. I've looked at the Mutex Rust has, but I do not understand how to use it. When I search for the problem, I only find use cases where people spawn a bunch of threads and synchronize them. I'm looking for something like:
try {
lock.lock();
// insert into both collections
} finally {
lock.unlock();
}
"I obviously need some kind of lock"
I don't know that I agree with this need. I'd only introduce a lock when multiple threads could be modifying the object concurrently. Note that's two conditions: multiple threads AND concurrent modification.
If you only have one thread, then Rust's enforcement of a single mutable reference to an item will prevent any issues. Likewise, if you have multiple threads and fully transfer ownership of the item between them, you don't need any locking because only one thread can mutate it.
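To illustrate the ownership-transfer case: a value can be moved into a worker thread and handed back through join with no lock at all, because at every moment exactly one thread owns it. A minimal sketch (build_on_worker is a made-up name):

```rust
use std::thread;

fn build_on_worker() -> Vec<i32> {
    let mut items = vec![1, 2];
    items.push(3); // mutated by the current thread

    // `move` transfers ownership of `items` into the worker thread.
    let handle = thread::spawn(move || {
        items.push(4); // mutated by the worker; nobody else can touch it now
        items // hand ownership back as the thread's return value
    });

    // join returns the value the closure returned.
    let mut items = handle.join().unwrap();
    items.push(5); // mutated by the current thread again
    items
}
```

Trying to use items on the spawning thread between spawn and join would be a compile error, which is the guarantee that makes a lock unnecessary here.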
"I'm looking for something like:"
try {
lock.lock();
// insert into both collections
} finally {
lock.unlock();
}
If you need something like that, then you can create a Mutex<()> — a mutex that locks the unit type, which takes no space:
use std::sync::Mutex;
struct Thing {
lock: Mutex<()>,
nums: Vec<i32>,
names: Vec<String>,
}
impl Thing {
fn new() -> Thing {
Thing {
lock: Mutex::new(()),
nums: vec![],
names: vec![],
}
}
fn add(&mut self) {
let _lock = self.lock.lock().unwrap();
// Lock is held until the end of the block
self.nums.push(42);
self.names.push("The answer".to_string());
}
}
fn main() {
let mut thing = Thing::new();
thing.add();
}
Note that there is no explicit unlock required. When you call lock, you get back a MutexGuard. This type implements Drop, which allows for code to be run when it goes out of scope. In this case, the lock will be automatically released. This is commonly called Resource Acquisition Is Initialization (RAII).
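The release-on-drop behaviour can be observed with try_lock, which fails while a guard is alive and succeeds again once it has gone out of scope. A small sketch (the function name is made up for the example):

```rust
use std::sync::Mutex;

/// Returns (lock was held inside the block, lock was free after the block).
fn lock_is_released_after_scope() -> (bool, bool) {
    let lock = Mutex::new(());
    let held_inside;
    {
        let _guard = lock.lock().unwrap();
        // While `_guard` is alive, a second attempt to take the lock fails.
        held_inside = lock.try_lock().is_err();
    } // `_guard` is dropped here, which releases the lock

    // The temporary guard created by this try_lock is dropped right away.
    let free_after = lock.try_lock().is_ok();
    (held_inside, free_after)
}
```

Binding the guard to `_guard` rather than `_` matters: a bare `_` pattern would drop the guard immediately and release the lock before the critical section.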
I wouldn't recommend this practice in most cases. It's generally better to wrap the item that you want to lock. This enforces that access to the item can only happen when the lock is locked:
use std::sync::Mutex;
struct Thing {
nums: Vec<i32>,
names: Vec<String>,
}
impl Thing {
fn new() -> Thing {
Thing {
nums: vec![],
names: vec![],
}
}
fn add(&mut self) {
self.nums.push(42);
self.names.push("The answer".to_string());
}
}
fn main() {
let thing = Thing::new();
let protected = Mutex::new(thing);
let mut locked_thing = protected.lock().unwrap();
locked_thing.add();
}
Note that the MutexGuard also implements Deref and DerefMut, which allow it to "look" like the locked type.
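When several threads really do need to modify the same item, the wrapped value is usually shared through an Arc. The sketch below assumes that setup; fill_from_threads, the add parameters, and the Default derive are additions for the example, not part of the code above.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Default)]
struct Thing {
    nums: Vec<i32>,
    names: Vec<String>,
}

impl Thing {
    fn add(&mut self, n: i32, name: &str) {
        // Both collections are updated under the same lock, so they stay in sync.
        self.nums.push(n);
        self.names.push(name.to_string());
    }
}

fn fill_from_threads(workers: i32) -> Thing {
    let shared = Arc::new(Mutex::new(Thing::default()));
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // The guard derefs to Thing, so `add` can be called directly.
                shared.lock().unwrap().add(i, &format!("worker {}", i));
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Every worker has finished and dropped its clone, so we hold the only Arc.
    Arc::try_unwrap(shared)
        .ok()
        .expect("all worker threads have been joined")
        .into_inner()
        .unwrap()
}
```

Because Thing's fields are only reachable through the Mutex, there is no way to push to one vector and forget to take the lock.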

How to specify a lifetime for an Option<closure>?

I'm trying to put a field on a struct that should hold an Option<closure>.
However, Rust is yelling at me that I have to specify the lifetime (not that I would have really grokked that yet). I'm trying my best to do so but Rust is never happy with what I come up with. Take a look at my inline comments for the compile errors I got.
struct Floor{
handler: Option<|| ->&str> //this gives: missing lifetime specifier
//handler: Option<||: 'a> // this gives: use of undeclared lifetime name `'a`
}
impl Floor {
// I guess I need to specify life time here as well
// but I can't figure out for the life of me what's the correct syntax
fn get(&mut self, handler: || -> &str){
self.handler = Some(handler);
}
}
This gets a bit trickier.
As a general rule of thumb, whenever you're storing a borrowed reference (i.e., an & type) in a data structure, then you need to name its lifetime. In this case, you were on the right track by using a 'a, but that 'a has to be introduced in the current scope. It's done the same way you introduce type variables. So to define your Floor struct:
struct Floor<'a> {
handler: Option<|| -> &'a str>
}
But there's another problem here. The closure itself is also a reference with a lifetime, which also must be named. So there are two different lifetimes at play here! Try this:
struct Floor<'cl, 'a> {
handler: Option<||:'cl -> &'a str>
}
For your impl Floor, you also need to introduce these lifetimes into scope:
impl<'cl, 'a> Floor<'cl, 'a> {
fn get(&mut self, handler: ||:'cl -> &'a str){
self.handler = Some(handler);
}
}
You could technically reduce this down to one lifetime and use ||:'a -> &'a str, but this implies that the &str returned always has the same lifetime as the closure itself, which I think is a bad assumption to make.
Answer for current Rust version 1.x:
There are two possibilities to get what you want: either an unboxed closure or a boxed one. Unboxed closures are incredibly fast (most of the time, they are inlined), but they add a type parameter to the struct. Boxed closures add a bit of freedom here: their type is erased by one level of indirection, which sadly is a bit slower.
My code has some example functions and for that reason it's a bit longer, please excuse that ;)
Unboxed Closure
Full code:
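// Note: `for<'a> FnMut() -> &'a str` is rejected by current compilers (E0582),
// because 'a would appear only in the return type; 'static is used instead.
struct Floor<F>
where F: FnMut() -> &'static str
{
handler: Option<F>,
}
impl<F> Floor<F>
where F: FnMut() -> &'static str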
{
pub fn with_handler(handler: F) -> Self {
Floor {
handler: Some(handler),
}
}
pub fn empty() -> Self {
Floor {
handler: None,
}
}
pub fn set_handler(&mut self, handler: F) {
self.handler = Some(handler);
}
pub fn do_it(&mut self) {
if let Some(ref mut h) = self.handler {
println!("Output: {}", h());
}
}
}
fn main() {
let mut a = Floor::with_handler(|| "hi");
a.do_it();
let mut b = Floor::empty();
b.set_handler(|| "cheesecake");
b.do_it();
}
Now this has some typical problems: you can't simply have a Vec of multiple Floors, and every function using a Floor object needs to have a type parameter of its own. Also, if you remove the line b.set_handler(|| "cheesecake");, the code won't compile, because the compiler is lacking type information for b.
In some cases you won't run into those problems -- in others you'll need another solution.
Boxed closures
Full code:
type HandlerFun = Box<dyn FnMut() -> &'static str>;
struct Floor {
handler: Option<HandlerFun>,
}
impl Floor {
pub fn with_handler(handler: HandlerFun) -> Self {
Floor {
handler: Some(handler),
}
}
pub fn empty() -> Self {
Floor {
handler: None,
}
}
pub fn set_handler(&mut self, handler: HandlerFun) {
self.handler = Some(handler);
}
pub fn do_it(&mut self) {
if let Some(ref mut h) = self.handler {
println!("Output: {}", h());
}
}
}
fn main() {
let mut a = Floor::with_handler(Box::new(|| "hi"));
a.do_it();
let mut b = Floor::empty();
b.set_handler(Box::new(|| "cheesecake"));
b.do_it();
}
It's a bit slower, because we have a heap allocation for every closure and when calling a boxed closure it's an indirect call most of the time (CPUs don't like indirect calls...).
But the Floor struct does not have a type parameter, so you can have a Vec of them. You can also remove b.set_handler(Box::new(|| "cheesecake")); and it will still work.
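As a sketch of the Vec point, written for a current toolchain (so the alias uses dyn and a 'static return type, which is an adaptation of the code above): floors with different closures, or with none at all, can sit in the same Vec. The call method and collect_outputs are hypothetical helpers for the example.

```rust
type HandlerFun = Box<dyn FnMut() -> &'static str>;

struct Floor {
    handler: Option<HandlerFun>,
}

impl Floor {
    fn with_handler(handler: HandlerFun) -> Self {
        Floor { handler: Some(handler) }
    }

    fn empty() -> Self {
        Floor { handler: None }
    }

    /// Calls the handler if one is set and returns its output.
    fn call(&mut self) -> Option<&'static str> {
        self.handler.as_mut().map(|h| h())
    }
}

fn collect_outputs() -> Vec<Option<&'static str>> {
    // Each closure has its own anonymous type, but all are erased to HandlerFun.
    let mut floors = vec![
        Floor::with_handler(Box::new(|| "hi")),
        Floor::empty(),
        Floor::with_handler(Box::new(|| "cheesecake")),
    ];
    floors.iter_mut().map(|floor| floor.call()).collect()
}
```

With the generic Floor<F> version the vec! above would not type-check, since the two closures are different types.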
