How to run a workload in the background? - parallel-processing

I have a GUI application that is based on a loop. The loop can run more often than once per frame, so it needs to be lightweight. There's a heavy workload that needs to be done from time to time, and I'm not sure how to implement that. I'm imagining something like:
extern crate tokio; // 0.1.7
extern crate tokio_threadpool; // 0.1.2

use std::{thread, time::Duration};
use tokio::{prelude::*, runtime::Runtime};

fn delay_for(
    seconds: u64,
) -> impl Future<Item = u64, Error = tokio_threadpool::BlockingError> {
    future::poll_fn(move || {
        tokio_threadpool::blocking(|| {
            thread::sleep(Duration::from_secs(seconds));
            seconds
        })
    })
}

fn render_frame(n: i8) {
    println!("rendering frame {}", n);
    thread::sleep(Duration::from_millis(500));
}

fn send_processed_data_to_gui(n: i8) {
    println!("view updated. result of background processing was {}", n);
}

fn main() {
    let mut frame_n = 0;
    let frame_where_some_input_triggers_heavy_work = 2;
    let mut parallel_work: Option<BoxedFuture> = None;

    loop {
        render_frame(frame_n);

        if frame_n == frame_where_some_input_triggers_heavy_work {
            parallel_work = Some(execute_in_background(delay_for(1)));
        }

        // check if there's parallel processing going on
        // and handle result if it's finished
        parallel_work.take().map(|parallel_work| {
            if parallel_work.done() {
                // giving result back to app
                send_processed_data_to_gui(parallel_work.result())
            }
        });

        frame_n += 1;
        if frame_n == 10 {
            break;
        }
    }
}

fn execute_in_background(work: /* ... */) -> BoxedFuture {
    unimplemented!()
}
Playground link
The above example is based on the linked answer's tokio-threadpool example. That example has a data flow like this:
let a = delay_for(3);
let b = delay_for(1);
let sum = a.join(b).map(|(a, b)| a + b);
The main difference between that example and my case is that task a triggers task b, and when b is finished, a gets passed the result of b and continues working. It will also repeat this any number of times.
I feel like I'm trying to approach this in a way that is not idiomatic async programming in Rust.
How to run that workload in the background? Or to rephrase in terms of the code sketch above: how do I execute the future in parallel_work in parallel? If my approach is indeed severely off-track, can you nudge me in the right direction?
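One way to get this behavior without fighting tokio is a plain thread plus a channel that the frame loop polls with try_recv once per frame. A minimal sketch under that assumption; spawn_heavy_work and the other names are illustrative, not from the question's code:

use std::sync::mpsc::{channel, Receiver, TryRecvError};
use std::{thread, time::Duration};

// Illustrative helper: runs the heavy work on its own thread and hands
// back a Receiver the GUI loop can poll without blocking.
fn spawn_heavy_work(seconds: u64) -> Receiver<u64> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        thread::sleep(Duration::from_secs(seconds)); // stand-in for real work
        let _ = tx.send(seconds); // ignore the error if the GUI is gone
    });
    rx
}

fn main() {
    let mut pending: Option<Receiver<u64>> = None;
    for frame_n in 0..10 {
        println!("rendering frame {}", frame_n);
        thread::sleep(Duration::from_millis(500));

        if frame_n == 2 {
            pending = Some(spawn_heavy_work(1));
        }

        // non-blocking check each frame; put the receiver back if not done yet
        if let Some(rx) = pending.take() {
            match rx.try_recv() {
                Ok(result) => println!("view updated. result was {}", result),
                Err(TryRecvError::Empty) => pending = Some(rx), // still running
                Err(TryRecvError::Disconnected) => {} // worker died; drop it
            }
        }
    }
}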

Related

How to cache expensive async tasks to await those already in-progress?

I have a forward cache which computes some expensive values. In some cases I have to perform an expensive call to the same resource. In a situation where the forward cache is already computing the value, I'd like to .await until this in-flight computation has completed.
My current (simplified) code is structured similar to this:
struct MyStruct {
    cache: Cache, // cache for results
}

impl MyStruct {
    async fn compute(&self) -> ExpensiveThing { ... }

    async fn forward_cache_compute(&self, identifier: &str) {
        // do some expensive computation and cache it:
        ...
        let value = self.compute().await; // ... takes 100 ms ...
        self.cache.insert(identifier, value);
        // consider saving a future of compute() or a condition variable to wait upon for "identifier"
    }

    async fn get_from_cache_or_compute_if_needed(&self, identifier: &str) -> ExpensiveThing {
        // would like to check if the forward cache is already computing and return that value if possible (share a future?)
        if let Some(cached_value) = self.cache.get(identifier) {
            // use this cached_value and don't compute
        } else if ... in-flight computation is in progress ... {
            // block on that
            // can I save the future and await it from multiple places?
        }
    }
}
Here is a poor-man's implementation of an asynchronous cache:
# Cargo.toml
[dependencies]
async-once-cell = { version = "0.4.2", features = ["unpin"] }
tokio = { version = "1.21.0", features = ["full"] }
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use async_once_cell::unpin::Lazy;

struct MyStruct {
    cache: Mutex<HashMap<&'static str, Arc<Lazy<i32>>>>,
}

impl MyStruct {
    async fn get_or_compute(&self, key: &'static str) -> i32 {
        let fut = self
            .cache
            .lock()
            .unwrap()
            .entry(key)
            .or_insert_with(|| {
                Arc::new(Lazy::new(Box::pin(async move {
                    println!("calculating value for: {}", key);
                    tokio::time::sleep(std::time::Duration::from_secs(1)).await;
                    1
                })))
            })
            .clone();
        *fut.get().await
    }
}
#[tokio::main]
async fn main() {
    let my_struct = MyStruct { cache: Default::default() };

    tokio::join![
        my_struct.get_or_compute("a"),
        my_struct.get_or_compute("a"),
        my_struct.get_or_compute("b"),
        my_struct.get_or_compute("b"),
        my_struct.get_or_compute("c"),
        my_struct.get_or_compute("a"),
        my_struct.get_or_compute("b"),
    ];
}
calculating value for: a
calculating value for: b
calculating value for: c
As you can see, .get_or_compute() is called multiple times for the same keys concurrently, but the task is only executed once for each. The secret sauce is provided by Lazy from the async-once-cell crate; it represents a Future that can be .await-ed from multiple places, but will only execute once.
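If pulling in an extra crate is undesirable, the same pattern can be sketched with tokio's own tokio::sync::OnceCell (assuming tokio 1.x with the sync feature enabled): the first caller runs the initialization future and every concurrent caller awaits the same result. This is a variant of the MyStruct above, not the original answer's code:

use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use tokio::sync::OnceCell;

// Variant of the example above using tokio's OnceCell (assumes tokio 1.x).
struct MyStruct {
    cache: Mutex<HashMap<&'static str, Arc<OnceCell<i32>>>>,
}

impl MyStruct {
    async fn get_or_compute(&self, key: &'static str) -> i32 {
        // grab (or create) this key's cell, holding the lock only briefly
        let cell = self
            .cache
            .lock()
            .unwrap()
            .entry(key)
            .or_insert_with(|| Arc::new(OnceCell::new()))
            .clone();
        // the first caller runs the future; concurrent callers await its result
        *cell
            .get_or_init(|| async move {
                println!("calculating value for: {}", key);
                tokio::time::sleep(std::time::Duration::from_secs(1)).await;
                1
            })
            .await
    }
}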

How can I speed up the process of parsing a 40 GB txt file and inserting a row per line into an SQLite db

Background
I'm currently trying to parse a .txt com zone file. It is structured like so:
blahblah.com xx xx examplens.com
Currently my code is structured like so:
extern crate regex;

use regex::Regex;
use rusqlite::{params, Connection, Result};

#[derive(Debug)]
struct Domain {
    id: i32,
    domain: String,
}

fn main() -> std::io::Result<()> {
    let mut reader = my_reader::BufReader::open("data/com_practice.txt")?;
    let mut buffer = String::new();
    let conn: Connection =
        Connection::open("/Users/alex/Projects/domain_randomizer/data/domains.db").unwrap();

    while let Some(line) = reader.read_line(&mut buffer) {
        let regexed_i_think = rexeginton_the_domain_only(line?.trim());
        println!("{:?}", regexed_i_think);
        sqliting(regexed_i_think, &conn).unwrap();
    }

    let mut stmt = conn.prepare("SELECT id, domain FROM domains").unwrap();
    let domain_iter = stmt
        .query_map(params![], |row| {
            Ok(Domain {
                id: row.get(0)?,
                domain: row.get(1)?,
            })
        })
        .unwrap();

    for domain in domain_iter {
        println!("Found person {:?}", domain.unwrap());
    }
    Ok(())
}

pub fn sqliting(domain_name: &str, conn: &Connection) -> Result<()> {
    let yeah = Domain {
        id: 0,
        domain: domain_name.to_string(),
    };
    conn.execute(
        "INSERT INTO domains (domain) VALUES (?1)",
        params![yeah.domain],
    )?;
    Ok(())
}

mod my_reader {
    use std::{
        fs::File,
        io::{self, prelude::*},
    };

    pub struct BufReader {
        reader: io::BufReader<File>,
    }

    impl BufReader {
        pub fn open(path: impl AsRef<std::path::Path>) -> io::Result<Self> {
            let file = File::open(path)?;
            let reader = io::BufReader::new(file);
            Ok(Self { reader })
        }

        pub fn read_line<'buf>(
            &mut self,
            buffer: &'buf mut String,
        ) -> Option<io::Result<&'buf mut String>> {
            buffer.clear();
            self.reader
                .read_line(buffer)
                .map(|u| if u == 0 { None } else { Some(buffer) })
                .transpose()
        }
    }
}

pub fn rexeginton_the_domain_only(full_line: &str) -> &str {
    let regex_str = Regex::new(r"(?m).*?.com").unwrap();
    regex_str.captures(full_line).unwrap().get(0).unwrap().as_str()
}
Issue
So I am parsing a single domain at a time, each time making an insert. As I've gathered, inserting would be far more efficient if I were making thousands of inserts in a single transaction. However, I'm not quite sure what an efficient approach to refactoring my parsing around this would look like.
Question
How should I reorient my parsing process around my insertion process? And how can I measure the speed and efficiency of the process in the first place, so I can compare approaches in an articulate manner?
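A common approach is to batch many rows into a single transaction and reuse one prepared statement. Below is a rough sketch against the same domains table and rusqlite setup as above; insert_batched and the single-transaction policy are illustrative, and std::time::Instant gives a simple way to time a run:

use rusqlite::{params, Connection, Result};
use std::time::Instant;

// Sketch: one transaction for the whole batch, one prepared statement
// reused for every row, and a timer around the run for comparison.
fn insert_batched(conn: &mut Connection, lines: impl Iterator<Item = String>) -> Result<()> {
    let start = Instant::now();
    let mut count = 0usize;

    let tx = conn.transaction()?;
    {
        let mut stmt = tx.prepare("INSERT INTO domains (domain) VALUES (?1)")?;
        for line in lines {
            stmt.execute(params![line])?;
            count += 1;
        }
    } // statement dropped here, releasing its borrow of `tx`
    tx.commit()?;

    println!("inserted {} rows in {:?}", count, start.elapsed());
    Ok(())
}

For a 40 GB file you would likely commit every N rows (say, every 100,000) rather than holding one huge transaction, but the shape stays the same. Timing each variant with Instant::elapsed over the same input gives the comparison numbers the question asks about.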

Iterative filter function that modifies tree like structure

I have a structure like
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    pub id: String,
    pub dis: String,
    pub parent: Option<NodeRefWeak>,
    pub children: Vec<NodeRef>,
}

pub type NodeRef = Rc<RefCell<Node>>;
pub type NodeRefWeak = Weak<RefCell<Node>>;
I also have the start of a function that can iterate this structure to pull out a match on a node id, but it has issues.
What I would like is for this function to return the root node of the whole tree with ONLY the branches that have a match somewhere along them.
Children past the matched node can be removed.
I.e., a function that filters all other nodes out of the tree.
For example, with my Rust playground link, I would like it to return:
level0_node_#1 (level0_node_#1)
    level1_node_4 (level1_node_4)
        level2_node_4_3 (level2_node_4_3)
            level3_node_4_3_2 (level3_node_4_3_2)
However, the recursive approach below causes real issues with "already borrowed" errors when trying to remove branches, etc.
Is there a way to achieve this filter function?
I have a test in the Rust playground.
fn tree_filter_node_objects<F>(node: &NodeRef, f: F) -> Vec<NodeRef>
where
    F: Fn(&str) -> bool + Copy,
{
    let mut filtered_nodes: Vec<NodeRef> = vec![];
    let mut borrow = node.borrow_mut();
    if f(&borrow.id) {
        filtered_nodes.push(node.clone());
    }
    for n in borrow.children.iter() {
        let children_filtered = tree_filter_node_objects(n, f);
        for c in children_filtered.iter() {
            filtered_nodes.push(c.clone());
        }
    }
    filtered_nodes
}
In the end I used this iterative approach.
pub fn tree_filter_node_dfs<F>(root: &NodeRef, f: F) -> Vec<NodeRef>
where
    F: Fn(&str) -> bool + Copy,
{
    let mut filtered_nodes: Vec<NodeRef> = vec![];
    let mut cur_node: Option<NodeRef> = Some(root.clone());
    let mut last_visited_child: Option<NodeRef> = None;
    let mut next_child: Option<NodeRef>;
    let mut run_visit: bool = true;

    while cur_node.is_some() {
        if run_visit {
            let n = cur_node.as_ref().unwrap();
            if f(&n.borrow().id) {
                filtered_nodes.push(n.clone());
            }
        }
        if last_visited_child.is_none() {
            let children = cur_node.as_ref().unwrap().borrow().children.clone();
            if children.len() > 0 {
                next_child = Some(children[0].clone());
            } else {
                next_child = None;
            }
        } else {
            // next-sibling / parent lookups use helpers from my codebase (not shown)
            next_child = tree_filter_node_get_next_sibling(last_visited_child.as_ref().unwrap());
        }
        if next_child.is_some() {
            last_visited_child = None;
            cur_node = next_child;
            run_visit = true;
        } else {
            last_visited_child = cur_node;
            cur_node = tree_node_parent_node(last_visited_child.as_ref().unwrap());
            run_visit = false;
        }
    }
    filtered_nodes
}
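For what it's worth, a borrow-friendly recursive variant is also possible: prune each node's children in place, keeping a child only if its subtree contains a match. The sketch below builds on the Node/NodeRef types above (tree_prune is an illustrative name, not from the original code). Each child is its own RefCell, so recursing while the parent is mutably borrowed does not conflict, as long as the parent Weak links are not followed during the walk:

// Returns true if `node` or any descendant matches `f`; prunes
// non-matching branches from `children` as the recursion unwinds.
fn tree_prune<F>(node: &NodeRef, f: F) -> bool
where
    F: Fn(&str) -> bool + Copy,
{
    let mut n = node.borrow_mut();
    if f(&n.id) {
        // children past the matched node can be dropped entirely
        n.children.clear();
        return true;
    }
    // keep only children whose subtree contains a match
    n.children.retain(|child| tree_prune(child, f));
    !n.children.is_empty()
}

Called as tree_prune(&root, |id| id == "level3_node_4_3_2"), this leaves only the chain of nodes leading to the match, which is the output shape described above.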

Run code after a loop of background tasks ends

I am running a background thread in a while loop where I am doing some file handling. I have some code after the loop, but the code after the loop is executed before the loop ends (because I am using a background thread). Is there any way I can execute some code exactly after the loop ends?
Here is my code:
while i < testCount {
    let task = AsyncTask(
        backgroundTask: { () -> Double in
            // some file handling
            return 234.09
        },
        afterTask: { (val: Double) in
            self.showVal(val)
        }
    )
    task.execute()
    i += 1
}

// I want to run this code after the loop ends
print("average: \(avg)")
showVal(avg)
My showVal(Double) function:
func showVal(val: Double) {
    print("val found: \(val)")
    display.text = "\(val) found"
}
And here is my AsyncTask class:
public class AsyncTask<BGParam, BGResult> {
    private var pre: (() -> ())?                      // optional closure, called before the backgroundTask
    private var bgTask: (param: BGParam) -> BGResult  // the background task
    private var post: ((param: BGResult) -> ())?      // optional closure, called after the backgroundTask

    public init(beforeTask: (() -> ())? = nil,
                backgroundTask: (param: BGParam) -> BGResult,
                afterTask: ((param: BGResult) -> ())? = nil) {
        self.pre = beforeTask
        self.bgTask = backgroundTask
        self.post = afterTask
    }

    public func execute(param: BGParam) {
        pre?()  // if beforeTask exists, invoke it before bgTask
        dispatch_async(dispatch_get_global_queue(QOS_CLASS_BACKGROUND, 0), {
            let bgResult = self.bgTask(param: param)  // run backgroundTask on a background thread
            if self.post != nil {  // if afterTask exists, invoke it on the UI thread after bgTask
                dispatch_async(dispatch_get_main_queue(), { self.post!(param: bgResult) })
            }
        })
    }
}
Note: I am a beginner in Swift.
Edit:
I actually want to do:
1. a background file handling task
2. after the task ends, show a text in a UILabel
I need to do these two tasks several times (say 100 times). If Swift has easier methods for my purpose, please advise.
I'll try with some pseudo-code based on our discussion, to see if that might help you on the way. The manager object would probably look a little something like this:
class RequestManager {
    private let requiredNumberOfRequests: Int
    private let completionHandler: (Double) -> ()  // receives the average
    private var sum: Double = 0
    private var counter: Int = 0 {
        didSet {
            if counter == requiredNumberOfRequests {
                completionHandler(sum / Double(requiredNumberOfRequests))
            }
        }
    }

    init(numberOfRequests: Int, completionHandler: (Double) -> ()) {
        self.requiredNumberOfRequests = numberOfRequests
        self.completionHandler = completionHandler
    }

    // Don't know exactly what you want to do here, but something like...
    func performRequest(success: (value: Double) -> ()) {
        // Network stuff; on success it looks like you'll have a Double(?)
        let value: Double = 234.09
        sum += value
        success(value: value)
        counter += 1
    }
}
Then, before your loop, you'll create an instance of the RequestManager:
let manager = RequestManager(numberOfRequests: 100) { average in
    print("average: \(average)")
}
...and call the performRequest as part of your loop.

What is the idiomatic way to implement caching on a function that is not a struct method?

I have an expensive function like this:
pub fn get_expensive_value(n: u64) -> u64 {
    let mut ret = 0;
    for _ in 0..n {
        // expensive stuff
    }
    ret
}
And it gets called very frequently with the same argument. It's pure, so that means it will return the same result and can make use of a cache.
If this was a struct method, I would add a member to the struct that acts as a cache, but it isn't. So my option seems to be to use a static:
static mut LAST_VAL: Option<(u64, u64)> = None;

pub fn cached_expensive(n: u64) -> u64 {
    unsafe {
        LAST_VAL = LAST_VAL
            .and_then(|(k, v)| {
                if k == n {
                    Some((n, v))
                } else {
                    None
                }
            })
            .or_else(|| Some((n, get_expensive_value(n))));
        let (_, v) = LAST_VAL.unwrap();
        v
    }
}
Now, I've had to use unsafe. Instead of the static mut, I could put a RefCell in a const. But I'm not convinced that is any safer - it just avoids having to use the unsafe block. I thought about a Mutex, but I don't think that will get me thread safety either.
Redesigning the code to use a struct for storage is not really an option.
I think the best alternative is to use a global variable with a mutex. Using lazy_static makes it easy and allows the "global" declaration inside the function:
pub fn cached_expensive(n: u64) -> u64 {
    use std::sync::Mutex;

    lazy_static! {
        static ref LAST_VAL: Mutex<Option<(u64, u64)>> = Mutex::new(None);
    }

    let mut last = LAST_VAL.lock().unwrap();
    let r = last
        .and_then(|(k, v)| {
            if k == n {
                Some((n, v))
            } else {
                None
            }
        })
        .or_else(|| Some((n, get_expensive_value(n))));
    let (_, v) = r.unwrap();
    *last = r;
    v
}
You can also check out the cached project / crate. It memoizes the function with a simple macro.
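For illustration, a sketch of what that can look like, assuming a recent version of the cached crate (the exact macro path and feature flags may vary by version):

use cached::proc_macro::cached;

// The macro generates a thread-safe memoizing wrapper keyed on `n`;
// repeated calls with the same argument return the cached value.
#[cached]
fn get_expensive_value(n: u64) -> u64 {
    (0..n).sum() // stand-in for the expensive loop
}

fn main() {
    println!("{}", get_expensive_value(1_000_000)); // computed
    println!("{}", get_expensive_value(1_000_000)); // served from the cache
}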
