Immutable Strings and Cloning - memory-management

I have the mindset of keeping my Strings immutable, with a single source of truth.
As I take the same mindset into Rust, I find I have to do a lot of cloning.
Since the Strings do not change, all the cloning is unnecessary.
Below is an example of this and a link to the relevant playground.
Borrowing does not seem like an option, as I would have to deal with references and their lifetimes. My next thought is to use something like Rc or Cow. But wrapping all the Strings in something like Rc feels unnatural. In my limited experience of Rust, I have never seen ownership/memory-management types such as Rc and Cow exposed in an API. I am curious how a more experienced Rust developer would handle such a problem.
Is it actually natural in Rust to expose ownership/memory management structs like Rc and Cow? Should I be using slices?
use std::collections::HashSet;
#[derive(Debug)]
enum Check {
Known(String),
Duplicate(String),
Missing(String),
Unknown(String)
}
fn main() {
let known_values: HashSet<_> = [
"a".to_string(),
"b".to_string(),
"c".to_string()]
.iter().cloned().collect();
let provided_values = vec![
"a".to_string(),
"b".to_string(),
"z".to_string(),
"b".to_string()
];
let mut found = HashSet::new();
let mut check_values: Vec<_> = provided_values.iter().cloned()
.map(|v| {
if known_values.contains(&v) {
if found.contains(&v) {
Check::Duplicate(v)
} else {
found.insert(v.clone());
Check::Known(v)
}
} else {
Check::Unknown(v)
}
}).collect();
let missing = known_values.difference(&found);
check_values = missing
.cloned()
.fold(check_values, |mut cv, m| {
cv.push(Check::Missing(m));
cv
});
println!("check_values: {:#?}", check_values);
}

From the discussion in the comments on my question, all the cloning of immutable Strings in the example is correct. The cloning is necessary because Rust manages memory through ownership, rather than through references as in other languages.
At best, without using Rc, I can see some reduction in the cloning by using move semantics on provided_values.
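To make the move-semantics idea concrete, here is a minimal sketch, not the only way to do it. It assumes a hypothetical helper function check (not part of the original example) that consumes provided_values and tracks duplicates with string slices borrowed from known_values, so only the Missing values still need a clone.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Check {
    Known(String),
    Duplicate(String),
    Missing(String),
    Unknown(String),
}

/// Hypothetical helper: classifies `provided` against `known`.
/// `provided` is consumed, so each String is moved into its `Check`
/// variant instead of being cloned.
fn check(known: &HashSet<String>, provided: Vec<String>) -> Vec<Check> {
    // `found` borrows from `known`, so tracking duplicates needs no clone.
    let mut found: HashSet<&str> = HashSet::new();
    let mut out: Vec<Check> = provided
        .into_iter()
        .map(|v| match known.get(&v) {
            Some(k) => {
                // `insert` returns false when the value was already present.
                if found.insert(k.as_str()) {
                    Check::Known(v)
                } else {
                    Check::Duplicate(v)
                }
            }
            None => Check::Unknown(v),
        })
        .collect();
    // Only the missing values are cloned, because `known` is merely borrowed.
    out.extend(
        known
            .iter()
            .filter(|k| !found.contains(k.as_str()))
            .cloned()
            .map(Check::Missing),
    );
    out
}
```

The lifetimes here never appear in a signature, because the borrowed slices stay local to the function. The order of the Missing entries follows HashSet iteration order, which is not specified.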
Update: Some interesting reading
https://www.reddit.com/r/rust/comments/5xjl95/rc_or_cloning/
https://medium.com/swlh/ownership-managing-memory-in-rust-ce7bf3f5c9d5
How to create a Rust struct with string members?
Cow would not work in my example as it involves borrowing a reference. Rc would be what I would have to use. In my example everything has to be converted to Rc, but I can see the potential for this to be hidden away through encapsulation.
use std::collections::HashSet;
use std::rc::Rc;
#[derive(Debug)]
enum Check {
Known(Rc<String>),
Duplicate(Rc<String>),
Missing(Rc<String>),
Unknown(Rc<String>)
}
fn main() {
let known_values: HashSet<_> = [
Rc::new("a".to_string()),
Rc::new("b".to_string()),
Rc::new("c".to_string())]
.iter().cloned().collect();
let provided_values = vec![
Rc::new("a".to_string()),
Rc::new("b".to_string()),
Rc::new("z".to_string()),
Rc::new("b".to_string())
];
let mut found = HashSet::new();
let mut check_values: Vec<_> = provided_values.iter().cloned()
.map(|v| {
if known_values.contains(&v) {
if found.contains(&v) {
Check::Duplicate(v)
} else {
found.insert(v.clone());
Check::Known(v)
}
} else {
Check::Unknown(v)
}
}).collect();
let missing = known_values.difference(&found);
check_values = missing
.cloned()
.fold(check_values, |mut cv, m| {
cv.push(Check::Missing(m));
cv
});
println!("check_values: {:#?}", check_values);
}
Playground
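As a rough sketch of the encapsulation idea, the Rc can be kept as a private field of a small wrapper type. The names Name, shares_buffer_with and unknown_names are hypothetical, and Rc<str> is used here instead of Rc<String> purely as an assumption to save one level of indirection.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical wrapper: an immutable string that is cheap to clone.
/// Callers never see the Rc.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Name(Rc<str>);

impl Name {
    pub fn new(s: &str) -> Self {
        Name(Rc::from(s))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// True when both handles point at the same heap allocation.
    pub fn shares_buffer_with(&self, other: &Name) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}

/// Returns the provided names that are not in `known`.
/// Each clone copies a pointer and bumps a counter; the text is not copied.
pub fn unknown_names(known: &HashSet<Name>, provided: &[Name]) -> Vec<Name> {
    provided
        .iter()
        .filter(|n| !known.contains(*n))
        .cloned()
        .collect()
}
```

Equality and hashing go through the string contents, so a Name built from a fresh "a" still matches one already stored in the set.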

Related

Trying to read MacOS clipboard contents

On my adventure to learn Rust I decided to try to print the contents of the clipboard to the CLI. I've done this before in Swift, so I thought I wouldn't have many issues in Rust.
However I'm having a hard time printing the contents of the returned NSArray. I've spent a few hours playing around with different functions but haven't made much progress.
The Swift code I have that works:
import Foundation
import AppKit
let pasteboard = NSPasteboard.general
func reload() -> [String]{
var clipboardItems: [String] = []
for element in pasteboard.pasteboardItems! {
if let str = element.string(forType: NSPasteboard.PasteboardType(rawValue: "public.utf8-plain-text")) {
clipboardItems.append(str)
}
}
return clipboardItems;
}
// Access the item in the clipboard
while true {
let firstClipboardItem = reload()
print(firstClipboardItem);
sleep(1);
}
Here is the Rust code:
use cocoa::appkit::{NSApp, NSPasteboard, NSPasteboardReading, NSPasteboardTypeString};
use cocoa::foundation::NSArray;
fn main() {
unsafe {
let app = NSApp();
let pid = NSPasteboard::generalPasteboard(app);
let changec = pid.changeCount();
let pid_item = pid.pasteboardItems();
if pid_item.count() != 0 {
let items = &*pid_item.objectAtIndex(0);
println!("{:?}", items);
}
println!("{:?}", *pid.stringForType(NSPasteboardTypeString));
}
}
The code above produces: *<NSPasteboardItem: 0x6000021a3de0>*
EDIT:
I've made a little progress but am stuck on one last bit. I've managed to get the first UTF8 char out of the clipboard.
The issue I have is that if I copy the text World, the program loops the correct number of times for the word length but only prints the first letter, in this case W. Output below:
TEXT 'W'
TEXT 'W'
TEXT 'W'
TEXT 'W'
TEXT 'W'
The bit I'm trying to get my head around is how to move to the next i8. I can't seem to find a way to point to the next i8.
The NSString function UTF8String() returns *const i8. I'm scratching my head with how one would walk the text.
use cocoa::appkit::{NSApp, NSPasteboard, NSPasteboardTypeString};
use cocoa::foundation::{NSArray, NSString};
fn main() {
unsafe {
let app = NSApp();
let pid = NSPasteboard::generalPasteboard(app);
let changec = pid.changeCount();
let nsarray_ptr = pid.pasteboardItems();
if nsarray_ptr.count() != 0 {
for i in 0..NSArray::count(nsarray_ptr) {
let raw_item_ptr = NSArray::objectAtIndex(nsarray_ptr, i);
let itm = raw_item_ptr.stringForType(NSPasteboardTypeString);
for u in 0..itm.len() {
let stri = itm.UTF8String();
println!("TEXT {:?}", *stri as u8 as char);
}
}
}
}
}
To everyone who's looked at or commented on this so far, thank you.
After reading some tests provided by cocoa I figured out what I needed to do.
The code below prints the contents of the clipboard. Thanks to those who pointed me in the right direction.
use cocoa::appkit::{NSApp, NSPasteboard, NSPasteboardTypeString};
use cocoa::foundation::{NSArray, NSString};
use std::{str, slice};
fn main() {
unsafe {
let app = NSApp();
let pid = NSPasteboard::generalPasteboard(app);
let nsarray_ptr = pid.pasteboardItems();
if nsarray_ptr.count() != 0 {
for i in 0..NSArray::count(nsarray_ptr) {
let raw_item_ptr = NSArray::objectAtIndex(nsarray_ptr, i);
let itm = raw_item_ptr.stringForType(NSPasteboardTypeString);
let stri = itm.UTF8String() as *const u8;
let clipboard = str::from_utf8(slice::from_raw_parts(stri, itm.len()))
.unwrap();
println!("{}", clipboard);
}
}
}
}
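The pointer-walking part of the question can be illustrated without Cocoa. The sketch below uses only the standard library and assumes a valid NUL-terminated buffer; walk_bytes and copy_to_string are hypothetical helper names, not part of the cocoa crate.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Walks a NUL-terminated C string one byte at a time by offsetting the pointer.
///
/// # Safety
/// `ptr` must point to a valid, NUL-terminated buffer.
pub unsafe fn walk_bytes(ptr: *const c_char) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    loop {
        // `ptr.add(i)` addresses the i-th element; `*ptr` alone is always the first,
        // which is why the earlier loop printed 'W' every time.
        let byte = unsafe { *ptr.add(i) } as u8;
        if byte == 0 {
            break;
        }
        out.push(byte);
        i += 1;
    }
    out
}

/// Copies the same buffer into an owned String via `CStr`,
/// replacing invalid UTF-8 instead of panicking.
///
/// # Safety
/// `ptr` must point to a valid, NUL-terminated buffer.
pub unsafe fn copy_to_string(ptr: *const c_char) -> String {
    unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned()
}
```

With a pointer such as the one returned by UTF8String(), either helper could stand in for the manual slice construction, as long as the NSString is still alive while the bytes are read.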

Rust proc_macro_derive (with syn crate) generating enum variant for matching

I'm a Rust newbie. I started one week ago, but this language is already very exciting. I'm rewriting a Node.js project in Rust to get better performance, and for the moment it's just crazy how much faster it is.
I'm currently writing a proc_macro_derive (using the "syn" crate) to generate methods on some specific structs. I'm almost done, but I can't figure out how to generate an enum variant. I will try to explain myself.
Here is my code generation (using quote!):
quote! {
// The generated impl.
impl #name /*#ty_generics #where_clause*/ {
pub fn from_config(config: &IndicatorConfig) -> Result<Self, Error> {
let mut #name_lower = #name::default()?;
for (k, v) in config.opts.iter() {
println!("{:?} {:?}", k, v);
match (k.as_str(), v) {
("label", Values::String(val)) => {
#name_lower.label = val.clone();
}
("agg_time", Values::String(val)) => {
#name_lower.agg_time = Some(val.clone());
}
#(
(#fields_name_str, Values::Unteger(val)) => {
#name_lower.#fields_name = val.clone();
}
)*
(&_, _) => {}
}
}
#name_lower.init()?;
Ok(#name_lower)
}
}
};
As you can see, I'm generating much of my code here:
(#fields_name_str, Values::Unteger(val)) => {
#name_lower.#fields_name = val.clone();
}
But I can't find a way to generate the enum variant pattern used in the match (I don't know what it is called, I hope you understand):
Values::String(val)
OR
Values::Unteger(val)
...
I'm writing a function which will create the variant pattern according to the parameter type found inside the struct:
fn create_variant_match(ty: &str) -> PatTupleStruct {
let variant = match ty {
"u32" => Ident::new("Unteger", Span::call_site()),
...
_ => unimplemented!(),
};
}
For now I'm creating an Ident, but I want to create the "enum variant match" -> Values::Unteger(val).
I read the docs of the syn crate and spent hours trying to find a way, but it's a bit complex for my current level, so I hope someone can explain how to do that.
I found a simple way of doing that: just parse a string (which I can format beforehand) using the syn parser.
I didn't think of it before because I was trying to construct the Expr by hand (a bit stupid ^^).
syn::parse_str::<Expr>("Values::Unteger(val)")
which returns a Result containing the Expr needed.

How do I convert a C-style enum generated by bindgen to another enum?

I am creating bindings in Rust for a C library, and bindgen generated enums like:
// Rust
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
pub enum rmw_qos_history_policy_t {
RMW_QOS_POLICY_HISTORY_SYSTEM_DEFAULT = 0,
RMW_QOS_POLICY_HISTORY_KEEP_LAST = 1,
RMW_QOS_POLICY_HISTORY_KEEP_ALL = 2,
RMW_QOS_POLICY_HISTORY_UNKNOWN = 3,
}
I need to convert these to:
// Rust
pub enum QoSHistoryPolicy {
SystemDefault = 0,
KeepLast = 1,
KeepAll = 2,
Unknown = 3,
}
When importing constant values from this C library:
// C library
const rmw_qos_history_policy_t some_value_from_C = RMW_QOS_POLICY_HISTORY_SYSTEM_DEFAULT;
I would like to do something like:
let some_value: QoSHistoryPolicy = some_value_from_C;
How can I go about it?
The compiler does not inspect enums for ABI compatibility, and as such does not provide a direct way to convert values between these types. A few possible solutions follow.
1. One-by-one matching
This is trivial and safe, albeit verbose.
impl From<rmw_qos_history_policy_t> for QoSHistoryPolicy {
fn from(x: rmw_qos_history_policy_t) -> Self {
use rmw_qos_history_policy_t::*;
match x {
RMW_QOS_POLICY_HISTORY_SYSTEM_DEFAULT => QoSHistoryPolicy::SystemDefault,
RMW_QOS_POLICY_HISTORY_KEEP_LAST => QoSHistoryPolicy::KeepLast,
RMW_QOS_POLICY_HISTORY_KEEP_ALL => QoSHistoryPolicy::KeepAll,
RMW_QOS_POLICY_HISTORY_UNKNOWN => QoSHistoryPolicy::Unknown,
}
}
}
2. Casting + FromPrimitive
Rust allows you to convert field-less enums into an integer type using the as operator. The opposite conversion is not always safe however. Derive FromPrimitive using the num crate to obtain the missing piece.
#[derive(FromPrimitive)]
pub enum QoSHistoryPolicy { ... }
impl From<rmw_qos_history_policy_t> for QoSHistoryPolicy {
fn from(x: rmw_qos_history_policy_t) -> Self {
FromPrimitive::from_u32(x as _).expect("1:1 enum variant matching, all good")
}
}
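If pulling in a derive crate is not wanted, the integer-to-enum direction can also be written by hand with the standard TryFrom trait. This is only a sketch under assumptions: the #[repr(u32)] on the bindgen enum is assumed (check what bindgen actually emitted), and the error type is simply the rejected integer.

```rust
use std::convert::TryFrom;

// Stand-in for the bindgen output; `repr(u32)` is an assumption.
#[allow(non_camel_case_types, dead_code)]
#[repr(u32)]
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
pub enum rmw_qos_history_policy_t {
    RMW_QOS_POLICY_HISTORY_SYSTEM_DEFAULT = 0,
    RMW_QOS_POLICY_HISTORY_KEEP_LAST = 1,
    RMW_QOS_POLICY_HISTORY_KEEP_ALL = 2,
    RMW_QOS_POLICY_HISTORY_UNKNOWN = 3,
}

#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum QoSHistoryPolicy {
    SystemDefault = 0,
    KeepLast = 1,
    KeepAll = 2,
    Unknown = 3,
}

impl TryFrom<u32> for QoSHistoryPolicy {
    // The rejected raw value is handed back as the error.
    type Error = u32;

    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(QoSHistoryPolicy::SystemDefault),
            1 => Ok(QoSHistoryPolicy::KeepLast),
            2 => Ok(QoSHistoryPolicy::KeepAll),
            3 => Ok(QoSHistoryPolicy::Unknown),
            other => Err(other),
        }
    }
}

impl From<rmw_qos_history_policy_t> for QoSHistoryPolicy {
    fn from(x: rmw_qos_history_policy_t) -> Self {
        // Cast to the integer, then map back; panics only if the two enums drift apart.
        QoSHistoryPolicy::try_from(x as u32).expect("1:1 enum variant matching")
    }
}
```

The fallible TryFrom<u32> is also handy when the C side hands over a plain integer rather than the enum type.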
3. Need an enum?
In the event that you just want an abstraction to low-level bindings, you might go without a new enum type.
#[repr(transparent)]
pub struct QoSHistoryPolicy(rmw_qos_history_policy_t);
The type above contains the same information and binary representation, but can expose an encapsulated API. The conversion from the low-level type to the high-level type becomes trivial. The main downside is that you lose pattern matching over its variants.
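A possible shape for such an encapsulated API is sketched below. The associated constants and the keeps_everything method are hypothetical names chosen for illustration, and the bindgen enum is repeated only to keep the sketch self-contained.

```rust
// Stand-in for the bindgen output; `repr(u32)` is an assumption.
#[allow(non_camel_case_types, dead_code)]
#[repr(u32)]
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
pub enum rmw_qos_history_policy_t {
    RMW_QOS_POLICY_HISTORY_SYSTEM_DEFAULT = 0,
    RMW_QOS_POLICY_HISTORY_KEEP_LAST = 1,
    RMW_QOS_POLICY_HISTORY_KEEP_ALL = 2,
    RMW_QOS_POLICY_HISTORY_UNKNOWN = 3,
}

/// Same bits as the raw enum, but with its own API surface.
#[repr(transparent)]
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct QoSHistoryPolicy(rmw_qos_history_policy_t);

impl QoSHistoryPolicy {
    // Named values replace the variants callers would otherwise match on.
    pub const KEEP_LAST: Self =
        QoSHistoryPolicy(rmw_qos_history_policy_t::RMW_QOS_POLICY_HISTORY_KEEP_LAST);
    pub const KEEP_ALL: Self =
        QoSHistoryPolicy(rmw_qos_history_policy_t::RMW_QOS_POLICY_HISTORY_KEEP_ALL);

    /// Hypothetical convenience query.
    pub fn keeps_everything(self) -> bool {
        self == Self::KEEP_ALL
    }

    /// Hands the raw value back for calls into the C library.
    pub fn into_raw(self) -> rmw_qos_history_policy_t {
        self.0
    }
}

impl From<rmw_qos_history_policy_t> for QoSHistoryPolicy {
    fn from(raw: rmw_qos_history_policy_t) -> Self {
        // No mapping needed: the wrapper stores the raw value as-is.
        QoSHistoryPolicy(raw)
    }
}
```

Comparisons against the named constants take the place of a match on variants.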
4. You're on your own
When absolutely sure that the two enums are equivalent in their binary representation, you can transmute between them. The compiler won't help you here; this is far from recommended.
unsafe {
let policy: QoSHistoryPolicy = std::mem::transmute(val);
}
See also:
How do I match enum values with an integer?
It looks to be a good candidate for the From trait on QoSHistoryPolicy.
impl From<rmw_qos_history_policy_t> for QoSHistoryPolicy {
fn from(raw: rmw_qos_history_policy_t) -> Self {
match raw {
rmw_qos_history_policy_t::RMW_QOS_POLICY_HISTORY_SYSTEM_DEFAULT => QoSHistoryPolicy::SystemDefault,
rmw_qos_history_policy_t::RMW_QOS_POLICY_HISTORY_KEEP_LAST => QoSHistoryPolicy::KeepLast,
rmw_qos_history_policy_t::RMW_QOS_POLICY_HISTORY_KEEP_ALL => QoSHistoryPolicy::KeepAll,
rmw_qos_history_policy_t::RMW_QOS_POLICY_HISTORY_UNKNOWN => QoSHistoryPolicy::Unknown
}
}
}
So this should now work:
let some_value: QoSHistoryPolicy = some_value_from_C.into();

Get the current memory usage of a variable? [duplicate]

I notice that Rust's test has a benchmark mode that will measure execution time in ns/iter, but I could not find a way to measure memory usage.
How would I implement such a benchmark? Let us assume for the moment that I only care about heap memory (though stack usage would certainly also be interesting).
Edit: I found this issue which asks for the exact same thing.
You can use the jemalloc allocator to print the allocation statistics. For example,
Cargo.toml:
[package]
name = "stackoverflow-30869007"
version = "0.1.0"
edition = "2018"
[dependencies]
jemallocator = "0.5"
jemalloc-sys = {version = "0.5", features = ["stats"]}
libc = "0.2"
src/main.rs:
use libc::{c_char, c_void};
use std::ptr::{null, null_mut};
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
extern "C" fn write_cb(_: *mut c_void, message: *const c_char) {
print!("{}", String::from_utf8_lossy(unsafe {
std::ffi::CStr::from_ptr(message).to_bytes()
}));
}
fn mem_print() {
unsafe { jemalloc_sys::malloc_stats_print(Some(write_cb), null_mut(), null()) }
}
fn main() {
mem_print();
let _heap = Vec::<u8>::with_capacity(1024 * 128);
mem_print();
}
In a single-threaded program that should allow you to get a good measurement of how much memory a structure takes. Just print the statistics before the structure is created and after and calculate the difference.
(The "total:" of "allocated" in particular.)
You can also use Valgrind (Massif) to get the heap profile. It works just like with any other C program. Make sure you have debug symbols enabled in the executable (e.g. using debug build or custom Cargo configuration). You can use, say, http://massiftool.sourceforge.net/ to analyse the generated heap profile.
(I verified this to work on Debian Jessie, in a different setting your mileage may vary).
(In order to use Rust with Valgrind you'll probably have to switch back to the system allocator).
P.S. There is now also a better DHAT.
jemalloc can be told to dump a memory profile. You can probably do this with the Rust FFI but I haven't investigated this route.
As far as measuring data structure sizes is concerned, this can be done fairly easily through the use of traits and a small compiler plugin. Nicholas Nethercote in his article Measuring data structure sizes: Firefox (C++) vs. Servo (Rust) demonstrates how it works in Servo; it boils down to adding #[derive(HeapSizeOf)] (or occasionally a manual implementation) to each type you care about. This is a good way of allowing precise checking of where memory is going, too; it is, however, comparatively intrusive as it requires changes to be made in the first place, where something like jemalloc’s print_stats() doesn’t. Still, for good and precise measurements, it’s a sound approach.
Currently, the only way to get allocation information is the alloc::heap::stats_print(); method (behind #![feature(alloc)]), which calls jemalloc's print_stats().
I'll update this answer with further information once I have learned what the output means.
(Note that I'm not going to accept this answer, so if someone comes up with a better solution...)
Now there is the jemalloc_ctl crate, which provides a convenient, safe, typed API. Add it to your Cargo.toml:
[dependencies]
jemalloc-ctl = "0.3"
jemallocator = "0.3"
Then configure jemalloc to be the global allocator and use methods from the jemalloc_ctl::stats module:
jemalloc_ctl::stats::allocated
jemalloc_ctl::stats::resident
Here is the official example:
use std::thread;
use std::time::Duration;
use jemalloc_ctl::{stats, epoch};
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
fn main() {
loop {
// many statistics are cached and only updated when the epoch is advanced.
epoch::advance().unwrap();
let allocated = stats::allocated::read().unwrap();
let resident = stats::resident::read().unwrap();
println!("{} bytes allocated/{} bytes resident", allocated, resident);
thread::sleep(Duration::from_secs(10));
}
}
There's a neat little solution someone put together here: https://github.com/discordance/trallocator/blob/master/src/lib.rs
use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicU64, Ordering};
pub struct Trallocator<A: GlobalAlloc>(pub A, AtomicU64);
unsafe impl<A: GlobalAlloc> GlobalAlloc for Trallocator<A> {
unsafe fn alloc(&self, l: Layout) -> *mut u8 {
self.1.fetch_add(l.size() as u64, Ordering::SeqCst);
self.0.alloc(l)
}
unsafe fn dealloc(&self, ptr: *mut u8, l: Layout) {
self.0.dealloc(ptr, l);
self.1.fetch_sub(l.size() as u64, Ordering::SeqCst);
}
}
impl<A: GlobalAlloc> Trallocator<A> {
pub const fn new(a: A) -> Self {
Trallocator(a, AtomicU64::new(0))
}
pub fn reset(&self) {
self.1.store(0, Ordering::SeqCst);
}
pub fn get(&self) -> u64 {
self.1.load(Ordering::SeqCst)
}
}
Usage: (from: https://www.reddit.com/r/rust/comments/8z83wc/comment/e2h4dp9)
// needed for Trallocator struct (as written, anyway)
#![feature(integer_atomics, const_fn_trait_bound)]
use std::alloc::System;
#[global_allocator]
static GLOBAL: Trallocator<System> = Trallocator::new(System);
fn main() {
GLOBAL.reset();
println!("memory used: {} bytes", GLOBAL.get());
{
let mut vec = vec![1, 2, 3, 4];
for i in 5..20 {
vec.push(i);
println!("memory used: {} bytes", GLOBAL.get());
}
for v in vec {
println!("{}", v);
}
}
// For some reason this does not print zero =/
println!("memory used: {} bytes", GLOBAL.get());
}
I've just started using it, and it seems to work well! Straightforward, real-time, requires no external packages, and doesn't require changing your base memory allocator.
It's also nice that, because it's intercepting the allocate/deallocate calls, you should be able to add custom logic if desired (eg. if memory usage goes above X, print the stack-trace to see what's triggering the allocations) -- although I haven't tried this yet.
I also haven't yet tested to see how much overhead this approach adds. If someone does a test for this, let me know!
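For reference, the same idea can be sketched on stable Rust without any feature flags. This is an adaptation rather than the linked crate's code: the names Counting, live_bytes and heap_delta are hypothetical, and the counter only tracks the sizes requested through the allocator, not allocator overhead.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps another allocator and counts the bytes currently allocated.
pub struct Counting<A: GlobalAlloc> {
    inner: A,
    live: AtomicUsize,
}

impl<A: GlobalAlloc> Counting<A> {
    pub const fn new(inner: A) -> Self {
        Counting { inner, live: AtomicUsize::new(0) }
    }

    pub fn live_bytes(&self) -> usize {
        self.live.load(Ordering::SeqCst)
    }
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for Counting<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { self.inner.alloc(layout) };
        // Count only allocations that actually succeeded.
        if !ptr.is_null() {
            self.live.fetch_add(layout.size(), Ordering::SeqCst);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(ptr, layout) };
        self.live.fetch_sub(layout.size(), Ordering::SeqCst);
    }
}

#[global_allocator]
static GLOBAL: Counting<System> = Counting::new(System);

/// Runs `build` and reports how many more bytes are live afterwards.
/// Only meaningful while no other thread is allocating.
pub fn heap_delta<T>(build: impl FnOnce() -> T) -> (T, usize) {
    let before = GLOBAL.live_bytes();
    let value = build();
    let after = GLOBAL.live_bytes();
    (value, after.saturating_sub(before))
}
```

Something like heap_delta(|| Vec::<u8>::with_capacity(128 * 1024)) would then report at least the requested capacity. Measuring a delta avoids depending on the counter being zero at any point, since the runtime and the standard output buffer also allocate.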

Hand-over-hand locking with Rust

I'm trying to write an implementation of union-find in Rust. This is famously very simple to implement in languages like C, while still having a complex run time analysis.
I'm having trouble getting Rust's mutex semantics to allow iterative hand-over-hand locking.
Here's how I got where I am now.
First, this is a very simple implementation of part of the structure I want in C:
#include <stdlib.h>
struct node {
struct node * parent;
};
struct node * create(struct node * parent) {
struct node * ans = malloc(sizeof(struct node));
ans->parent = parent;
return ans;
}
struct node * find_root(struct node * x) {
while (x->parent) {
x = x->parent;
}
return x;
}
int main() {
struct node * foo = create(NULL);
struct node * bar = create(foo);
struct node * baz = create(bar);
baz->parent = find_root(bar);
}
Note that the structure of the pointers is that of an inverted tree; multiple pointers may point at a single location, and there are no cycles.
At this point, there is no path compression.
Here is a Rust translation. I chose to use Rust's reference-counted pointer type to support the inverted tree type I referenced above.
Note that this implementation is much more verbose, possibly due to the increased safety that Rust offers, but possibly due to my inexperience with Rust.
use std::rc::Rc;
struct Node {
parent: Option<Rc<Node>>
}
fn create(parent: Option<Rc<Node>>) -> Node {
Node {parent: parent.clone()}
}
fn find_root(x: Rc<Node>) -> Rc<Node> {
let mut ans = x.clone();
while ans.parent.is_some() {
ans = ans.parent.clone().unwrap();
}
ans
}
fn main() {
let foo = Rc::new(create(None));
let bar = Rc::new(create(Some(foo.clone())));
let mut prebaz = create(Some(bar.clone()));
prebaz.parent = Some(find_root(bar.clone()));
}
Path compression re-parents each node along a path to the root every time find_root is called. To add this feature to the C code, only two new small functions are needed:
void change_root(struct node * x, struct node * root) {
while (x) {
struct node * tmp = x->parent;
x->parent = root;
x = tmp;
}
}
struct node * root(struct node * x) {
struct node * ans = find_root(x);
change_root(x, ans);
return ans;
}
The function change_root does all the re-parenting, while the function root is just a wrapper to use the results of find_root to re-parent the nodes on the path to the root.
In order to do this in Rust, I decided I would have to use a Mutex rather than just a reference counted pointer, since the Rc interface only allows mutable access by copy-on-write when more than one pointer to the item is live. As a result, all of the code would have to change. Before even getting to the path compression part, I got hung up on find_root:
use std::sync::{Mutex,Arc};
struct Node {
parent: Option<Arc<Mutex<Node>>>
}
fn create(parent: Option<Arc<Mutex<Node>>>) -> Node {
Node {parent: parent.clone()}
}
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
let mut inner = ans.lock();
while inner.parent.is_some() {
ans = inner.parent.clone().unwrap();
inner = ans.lock();
}
ans.clone()
}
This produces the error (with 0.12.0)
error: cannot assign to `ans` because it is borrowed
ans = inner.parent.clone().unwrap();
note: borrow of `ans` occurs here
let mut inner = ans.lock();
What I think I need here is hand-over-hand locking. For the path A -> B -> C -> ..., I need to lock A, lock B, unlock A, lock C, unlock B, ... Of course, I could keep all of the locks open: lock A, lock B, lock C, ... unlock C, unlock B, unlock A, but this seems inefficient.
However, Mutex does not offer unlock, and uses RAII instead. How can I achieve hand-over-hand locking in Rust without being able to directly call unlock?
EDIT: As the comments noted, I could use Rc<RefCell<Node>> rather than Arc<Mutex<Node>>. Doing so leads to the same compiler error.
For clarity about what I'm trying to avoid by using hand-over-hand locking, here is a RefCell version that compiles but uses space linear in the length of the path.
fn find_root(x: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
let mut inner : RefMut<Node> = x.borrow_mut();
if inner.parent.is_some() {
find_root(inner.parent.clone().unwrap())
} else {
x.clone()
}
}
We can pretty easily do full hand-over-hand locking as we traverse this list using just a bit of unsafe, which is necessary to tell the borrow checker a small bit of insight that we are aware of, but that it can't know.
But first, let's clearly formulate the problem:
We want to traverse a linked list whose nodes are stored as Arc<Mutex<Node>> to get the last node in the list.
We need to lock each node in the list as we go along the way, such that another concurrent traversal has to follow strictly behind us and cannot muck with our progress.
Before we get into the nitty-gritty details, let's try to write the signature for this function:
fn find_root(node: Arc<Mutex<Node>>) -> Arc<Mutex<Node>>;
Now that we know our goal, we can start to get into the implementation - here's a first attempt:
fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
// We have to separate this from incoming since the lock must
// be borrowed from incoming, not this local node.
let mut node = incoming.clone();
let mut lock = incoming.lock().unwrap();
// Could use while let but that leads to borrowing issues.
while lock.parent.is_some() {
node = lock.parent.as_ref().unwrap().clone(); // !! uh-oh !!
lock = node.lock().unwrap();
}
node
}
If we try to compile this, rustc will error on the line marked !! uh-oh !!, telling us that we can't move out of node while lock still exists, since lock is borrowing node. This is not a spurious error! The data in lock might go away as soon as node does - it's only because we know that we can keep the data lock is pointing to valid and in the same memory location even if we move node that we can fix this.
The key insight here is that the lifetime of data contained within an Arc is dynamic, and it is hard for the borrow checker to make the inferences we can about exactly how long data inside an Arc is valid.
This happens every once in a while when writing rust; you have more knowledge about the lifetime and organization of your data than rustc, and you want to be able to express that knowledge to the compiler, effectively saying "trust me". Enter: unsafe - our way of telling the compiler that we know more than it, and it should allow us to inform it of the guarantees that we know but it doesn't.
In this case, the guarantee is pretty simple - we are going to replace node while lock still exists, but we are going to ensure that the data inside lock continues to be valid even though node goes away. To express this guarantee we can use mem::transmute, a function which allows us to reinterpret the type of any variable, by just using it to change the lifetime of the lock returned by node to be slightly longer than it actually is.
To make sure we keep our promise, we are going to use another handoff variable to hold node while we reassign lock - even though this moves node (changing its address) and the borrow checker will be angry at us, we know it's ok since lock doesn't point at node, it points at data inside of node, whose address (in this case, since it's behind an Arc) will not change.
Before we get to the solution, it's important to note that the trick we are using here is only valid because we are using an Arc. The borrow checker is warning us of a possibly serious error - if the Mutex was held inline and not in an Arc, this error would be a correct prevention of a use-after-free, where the MutexGuard held in lock would attempt to unlock a Mutex which has already been dropped, or at least moved to another memory location.
use std::mem;
use std::sync::{Arc, Mutex};
fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut node = incoming.clone();
let mut handoff_node;
let mut lock = incoming.lock().unwrap();
// Could use while let but that leads to borrowing issues.
while lock.parent.is_some() {
// Keep the data in node around by holding on to this `Arc`.
handoff_node = node;
node = lock.parent.as_ref().unwrap().clone();
// We are going to move out of node while this lock is still around,
// but since we kept the data around it's ok.
lock = unsafe { mem::transmute(node.lock().unwrap()) };
}
node
}
And, just like that, rustc is happy, and we have hand-over-hand locking, since the last lock is released only after we have acquired the new lock!
There is one unanswered question in this implementation which I have not yet received an answer to, which is whether the drop of the old value and the assignment of a new value to a variable is guaranteed to be atomic - if not, there is a race condition where the old lock is released before the new lock is acquired in the assignment of lock. (As far as I can tell, the right-hand side of an assignment is evaluated before the old value is dropped, so the new lock is acquired first.) It's pretty trivial to make this explicit by just having another holdover_lock variable and moving the old lock into it before reassigning, then dropping it after reassigning lock.
Hopefully this fully addresses your question and shows how unsafe can be used to work around "deficiencies" in the borrow checker when you really do know more. I would still like to warn that the cases where you know more than the borrow checker are rare, and transmuting lifetimes is not "usual" behavior.
Using Mutex in this way, as you can see, is pretty complex, and you have to deal with many possible sources of race conditions; I may not even have caught all of them! Unless you really need this structure to be accessible from many threads, it would probably be best to just use Rc and RefCell, if you need it, as this makes things much easier.
I believe this to fit the criteria of hand-over-hand locking.
use std::sync::Mutex;
fn main() {
    // Create a set of mutexes to lock hand-over-hand
    let mutexes: Vec<Mutex<bool>> = (0..4).map(|_| Mutex::new(false)).collect();
    // Lock the first one
    let val_0 = mutexes[0].lock().unwrap();
    if !*val_0 {
        // Lock the second one
        let mut val_1 = mutexes[1].lock().unwrap();
        // Unlock the first one
        drop(val_0);
        // Do logic
        *val_1 = true;
    }
    for mutex in mutexes.iter() {
        println!("{}", *mutex.lock().unwrap());
    }
}
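The same acquire-then-release pattern can be applied to a parent chain if the nodes live in a slice and refer to each other by index. Note that this swaps the Arc<Mutex<Node>> layout from the question for an index-based arena, which is an assumption of this sketch; it also assumes the parent links contain no cycles.

```rust
use std::sync::Mutex;

pub struct Node {
    /// Index of the parent in the arena, or None for a root.
    pub parent: Option<usize>,
}

/// Follows parent links from `start` to the root, holding the lock of the
/// current node until the lock of its parent has been acquired.
pub fn find_root(nodes: &[Mutex<Node>], start: usize) -> usize {
    let mut current = start;
    let mut guard = nodes[current].lock().unwrap();
    loop {
        let next = match guard.parent {
            Some(index) => index,
            None => return current,
        };
        // The right-hand side runs first, so the parent is locked before
        // the assignment drops the old guard and releases the child.
        guard = nodes[next].lock().unwrap();
        current = next;
    }
}
```

Because every guard borrows from the slice rather than from a local Arc that is about to be replaced, the borrow checker accepts this without unsafe.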
Edit #1
Does it work when access to lock n+1 is guarded by lock n?
If you mean something that could be shaped like the following, then I think the answer is no.
struct Level {
data: bool,
child: Option<Mutex<Box<Level>>>,
}
However, it is sensible that this should not work. When you wrap an object in a mutex, then you are saying "The entire object is safe". You can't say both "the entire pie is safe" and "I'm eating the stuff below the crust" at the same time. Perhaps you jettison the safety by creating a Mutex<()> and lock that?
This is still not an answer to your literal question of how to do hand-over-hand locking, which should only be important in a concurrent setting (or if someone else forced you to use Mutex references to nodes). It is instead how to do this with Rc and RefCell, which you seem to be interested in.
RefCell only allows mutable writes when one mutable reference is held. Importantly, the Rc<RefCell<Node>> objects are not mutable references. The mutable references it is talking about are the results of calling borrow_mut() on the Rc<RefCell<Node>> object, and as long as you do that in a limited scope (e.g. the body of the while loop), you'll be fine.
The important thing happening in path compression is that the next Rc object will keep the rest of the chain alive while you swing the parent pointer for node to point at root. However, it is not a reference in the Rust sense of the word.
struct Node
{
parent: Option<Rc<RefCell<Node>>>
}
fn find_root(mut node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>>
{
while let Some(parent) = node.borrow().parent.clone()
{
node = parent;
}
return node;
}
fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>)
{
while node.borrow().parent.is_some()
{
let next = node.borrow().parent.clone().unwrap();
node.borrow_mut().parent = Some(root.clone());
node = next;
}
}
This runs fine for me with the test harness I used, though there may still be bugs. It certainly compiles and runs without a panic! due to trying to borrow_mut() something that is already borrowed. It may actually produce the right answer, that's up to you.
On IRC, Jonathan Reem pointed out that inner is borrowing until the end of its lexical scope, which is too far for what I was asking. Inlining it produces the following, which compiles without error:
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
while ans.lock().parent.is_some() {
ans = ans.lock().parent.clone().unwrap();
}
ans
}
EDIT: As Francis Gagné points out, this has a race condition, since the lock doesn't extend long enough. Here's a modified version that only has one lock() call; perhaps it is not vulnerable to the same problem.
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
loop {
ans = {
let tmp = ans.lock();
match tmp.parent.clone() {
None => break,
Some(z) => z
}
}
}
ans
}
EDIT 2: This only holds one lock at a time, and so is racy. I still don't know how to do hand-over-hand locking.
As pointed out by Frank Sherry and others, you shouldn't use Arc/Mutex when single threaded. But his code was outdated, so here is the new one (for version 1.0.0alpha2).
This does not take linear space either (like the recursive code given in the question).
use std::cell::RefCell;
use std::rc::Rc;
struct Node {
parent: Option<Rc<RefCell<Node>>>
}
fn find_root(node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
let mut ans = node.clone(); // Rc<RefCell<Node>>
loop {
ans = {
let ans_ref = ans.borrow(); // std::cell::Ref<Node>
match ans_ref.parent.clone() {
None => break,
Some(z) => z
}
} // ans_ref goes out of scope, and ans becomes mutable
}
ans
}
fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>) {
while node.borrow().parent.is_some() {
let next = {
let node_ref = node.borrow();
node_ref.parent.clone().unwrap()
};
node.borrow_mut().parent = Some(root.clone());
// RefMut<Node> from borrow_mut() is out of scope here...
node = next; // therefore we can mutate node
}
}
Note for beginners: Pointers are automatically dereferenced by the dot operator. ans.borrow() actually means (*ans).borrow(). I intentionally used different styles for the two functions.
Although not the answer to your literal question (hand-over-hand locking), union-find with weighted-union and path-compression can be very simple in Rust:
use std::cmp::Ordering;
// Processes each (x, y) pair as a union and returns the parent array.
fn unionfind<I: Iterator<Item = (usize, usize)>>(iterator: I, nodes: usize) -> Vec<usize> {
    let mut root: Vec<usize> = (0..nodes).collect();
    let mut rank = vec![0u8; nodes];
    for (mut x, mut y) in iterator {
        // find roots for x and y; do path compression on look-ups
        while x != root[x] { root[x] = root[root[x]]; x = root[x]; }
        while y != root[y] { root[y] = root[root[y]]; y = root[y]; }
        if x != y {
            // weighted union swings roots
            match rank[x].cmp(&rank[y]) {
                Ordering::Less => root[x] = y,
                Ordering::Greater => root[y] = x,
                Ordering::Equal => {
                    root[y] = x;
                    rank[x] += 1
                }
            }
        }
    }
    root
}
Maybe the meta-point is that the union-find algorithm may not be the best place to handle node ownership, and that using references to existing memory (in this case, just integer identifiers for the nodes) without affecting the lifecycle of the nodes makes for a much simpler implementation, if you can get away with it of course.
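When queries need to be interleaved with unions, the same index-based idea can be packaged as a small struct. This is only a sketch: DisjointSet and its method names are hypothetical, and find uses path halving, a common variant of path compression.

```rust
use std::cmp::Ordering;

/// Index-based union-find: nodes are identified by integers, so there is
/// no ownership or locking to manage.
pub struct DisjointSet {
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl DisjointSet {
    /// Creates `n` singleton sets, each node being its own root.
    pub fn new(n: usize) -> Self {
        DisjointSet { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Returns the root of `x`, pointing each visited node at its grandparent.
    pub fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            let grandparent = self.parent[self.parent[x]];
            self.parent[x] = grandparent;
            x = grandparent;
        }
        x
    }

    /// Merges the sets of `a` and `b`; returns false if they were already joined.
    pub fn union(&mut self, a: usize, b: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        // Attach the lower-ranked root under the higher-ranked one.
        match self.rank[ra].cmp(&self.rank[rb]) {
            Ordering::Less => self.parent[ra] = rb,
            Ordering::Greater => self.parent[rb] = ra,
            Ordering::Equal => {
                self.parent[rb] = ra;
                self.rank[ra] += 1;
            }
        }
        true
    }
}
```

If several threads need it, wrapping the whole structure in a single Mutex is usually simpler than locking individual nodes.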

Resources