I'm trying to write an implementation of union-find in Rust. This is famously very simple to implement in languages like C, while still having a complex run time analysis.
I'm having trouble getting Rust's mutex semantics to allow iterative hand-over-hand locking.
Here's how I got where I am now.
First, this is a very simple implementation of part of the structure I want in C:
#include <stdlib.h>
struct node {
struct node * parent;
};
struct node * create(struct node * parent) {
struct node * ans = malloc(sizeof(struct node));
ans->parent = parent;
return ans;
}
struct node * find_root(struct node * x) {
while (x->parent) {
x = x->parent;
}
return x;
}
int main() {
struct node * foo = create(NULL);
struct node * bar = create(foo);
struct node * baz = create(bar);
baz->parent = find_root(bar);
}
Note that the structure of the pointers is that of an inverted tree; multiple pointers may point at a single location, and there are no cycles.
At this point, there is no path compression.
Here is a Rust translation. I chose to use Rust's reference-counted pointer type to support the inverted tree type I referenced above.
Note that this implementation is much more verbose, possibly due to the increased safety that Rust offers, but possibly due to my inexperience with Rust.
use std::rc::Rc;
struct Node {
parent: Option<Rc<Node>>
}
fn create(parent: Option<Rc<Node>>) -> Node {
Node {parent: parent.clone()}
}
fn find_root(x: Rc<Node>) -> Rc<Node> {
let mut ans = x.clone();
while ans.parent.is_some() {
ans = ans.parent.clone().unwrap();
}
ans
}
fn main() {
let foo = Rc::new(create(None));
let bar = Rc::new(create(Some(foo.clone())));
let mut prebaz = create(Some(bar.clone()));
prebaz.parent = Some(find_root(bar.clone()));
}
Path compression re-parents each node along a path to the root every time find_root is called. To add this feature to the C code, only two new small functions are needed:
void change_root(struct node * x, struct node * root) {
while (x) {
struct node * tmp = x->parent;
x->parent = root;
x = tmp;
}
}
struct node * root(struct node * x) {
struct node * ans = find_root(x);
change_root(x, ans);
return ans;
}
The function change_root does all the re-parenting, while the function root is just a wrapper to use the results of find_root to re-parent the nodes on the path to the root.
In order to do this in Rust, I decided I would have to use a Mutex rather than just a reference counted pointer, since the Rc interface only allows mutable access by copy-on-write when more than one pointer to the item is live. As a result, all of the code would have to change. Before even getting to the path compression part, I got hung up on find_root:
use std::sync::{Mutex,Arc};
struct Node {
parent: Option<Arc<Mutex<Node>>>
}
fn create(parent: Option<Arc<Mutex<Node>>>) -> Node {
Node {parent: parent.clone()}
}
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
let mut inner = ans.lock();
while inner.parent.is_some() {
ans = inner.parent.clone().unwrap();
inner = ans.lock();
}
ans.clone()
}
This produces the error (with 0.12.0)
error: cannot assign to `ans` because it is borrowed
ans = inner.parent.clone().unwrap();
note: borrow of `ans` occurs here
let mut inner = ans.lock();
What I think I need here is hand-over-hand locking. For the path A -> B -> C -> ..., I need to lock A, lock B, unlock A, lock C, unlock B, ... Of course, I could keep all of the locks open: lock A, lock B, lock C, ... unlock C, unlock B, unlock A, but this seems inefficient.
However, Mutex does not offer unlock, and uses RAII instead. How can I achieve hand-over-hand locking in Rust without being able to directly call unlock?
EDIT: As the comments noted, I could use Rc<RefCell<Node>> rather than Arc<Mutex<Node>>. Doing so leads to the same compiler error.
For clarity about what I'm trying to avoid by using hand-over-hand locking, here is a RefCell version that compiles but used space linear in the length of the path.
fn find_root(x: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
let mut inner : RefMut<Node> = x.borrow_mut();
if inner.parent.is_some() {
find_root(inner.parent.clone().unwrap())
} else {
x.clone()
}
}
We can pretty easily do full hand-over-hand locking as we traverse this list using just a bit of unsafe, which is necessary to tell the borrow checker a small bit of insight that we are aware of, but that it can't know.
But first, let's clearly formulate the problem:
We want to traverse a linked list whose nodes are stored as Arc<Mutex<Node>> to get the last node in the list
We need to lock each node in the list as we go along the way such that another concurrent traversal has to follow strictly behind us and cannot muck with our progress.
Before we get into the nitty-gritty details, let's try to write the signature for this function:
fn find_root(node: Arc<Mutex<Node>>) -> Arc<Mutex<Node>>;
Now that we know our goal, we can start to get into the implementation - here's a first attempt:
fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
// We have to separate this from incoming since the lock must
// be borrowed from incoming, not this local node.
let mut node = incoming.clone();
let mut lock = incoming.lock();
// Could use while let but that leads to borrowing issues.
while lock.parent.is_some() {
node = lock.parent.as_ref().unwrap().clone(); // !! uh-oh !!
lock = node.lock();
}
node
}
If we try to compile this, rustc will error on the line marked !! uh-oh !!, telling us that we can't move out of node while lock still exists, since lock is borrowing node. This is not a spurious error! The data in lock might go away as soon as node does - it's only because we know that we can keep the data lock is pointing to valid and in the same memory location even if we move node that we can fix this.
The key insight here is that the lifetime of data contained within an Arc is dynamic, and it is hard for the borrow checker to make the inferences we can about exactly how long data inside an Arc is valid.
This happens every once in a while when writing rust; you have more knowledge about the lifetime and organization of your data than rustc, and you want to be able to express that knowledge to the compiler, effectively saying "trust me". Enter: unsafe - our way of telling the compiler that we know more than it, and it should allow us to inform it of the guarantees that we know but it doesn't.
In this case, the guarantee is pretty simple - we are going to replace node while lock still exists, but we are not going to ensure that the data inside lock continues to be valid even though node goes away. To express this guarantee we can use mem::transmute, a function which allows us to reinterpret the type of any variable, by just using it to change the lifetime of the lock returned by node to be slightly longer than it actually is.
To make sure we keep our promise, we are going to use another handoff variable to hold node while we reassign lock - even though this moves node (changing its address) and the borrow checker will be angry at us, we know it's ok since lock doesn't point at node, it points at data inside of node, whose address (in this case, since it's behind an Arc) will not change.
Before we get to the solution, it's important to note that the trick we are using here is only valid because we are using an Arc. The borrow checker is warning us of a possibly serious error - if the Mutex was held inline and not in an Arc, this error would be a correct prevention of a use-after-free, where the MutexGuard held in lock would attempt to unlock a Mutex which has already been dropped, or at least moved to another memory location.
use std::mem;
use std::sync::{Arc, Mutex};
fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut node = incoming.clone();
let mut handoff_node;
let mut lock = incoming.lock().unwrap();
// Could use while let but that leads to borrowing issues.
while lock.parent.is_some() {
// Keep the data in node around by holding on to this `Arc`.
handoff_node = node;
node = lock.parent.as_ref().unwrap().clone();
// We are going to move out of node while this lock is still around,
// but since we kept the data around it's ok.
lock = unsafe { mem::transmute(node.lock().unwrap()) };
}
node
}
And, just like that, rustc is happy, and we have hand-over-hand locking, since the last lock is released only after we have acquired the new lock!
There is one unanswered question in this implementation which I have not yet received an answer too, which is whether the drop of the old value and assignment of a new value to a variable is a guaranteed to be atomic - if not, there is a race condition where the old lock is released before the new lock is acquired in the assignment of lock. It's pretty trivial to work around this by just having another holdover_lock variable and moving the old lock into it before reassigning, then dropping it after reassigning lock.
Hopefully this fully addresses your question and shows how unsafe can be used to work around "deficiencies" in the borrow checker when you really do know more. I would still like to want that the cases where you know more than the borrow checker are rare, and transmuting lifetimes is not "usual" behavior.
Using Mutex in this way, as you can see, is pretty complex and you have to deal with many, many, possible sources of a race condition and I may not even have caught all of them! Unless you really need this structure to be accessible from many threads, it would probably be best to just use Rc and RefCell, if you need it, as this makes things much easier.
I believe this to fit the criteria of hand-over-hand locking.
use std::sync::Mutex;
fn main() {
// Create a set of mutexes to lock hand-over-hand
let mutexes = Vec::from_fn(4, |_| Mutex::new(false));
// Lock the first one
let val_0 = mutexes[0].lock();
if !*val_0 {
// Lock the second one
let mut val_1 = mutexes[1].lock();
// Unlock the first one
drop(val_0);
// Do logic
*val_1 = true;
}
for mutex in mutexes.iter() {
println!("{}" , *mutex.lock());
}
}
Edit #1
Does it work when access to lock n+1 is guarded by lock n?
If you mean something that could be shaped like the following, then I think the answer is no.
struct Level {
data: bool,
child: Option<Mutex<Box<Level>>>,
}
However, it is sensible that this should not work. When you wrap an object in a mutex, then you are saying "The entire object is safe". You can't say both "the entire pie is safe" and "I'm eating the stuff below the crust" at the same time. Perhaps you jettison the safety by creating a Mutex<()> and lock that?
This is still not the answer your literal question of to how to do hand-over-hand locking, which should only be important in a concurrent setting (or if someone else forced you to use Mutex references to nodes). It is instead how to do this with Rc and RefCell, which you seem to be interested in.
RefCell only allows mutable writes when one mutable reference is held. Importantly, the Rc<RefCell<Node>> objects are not mutable references. The mutable references it is talking about are the results from calling borrow_mut() on the Rc<RefCell<Node>>object, and as long as you do that in a limited scope (e.g. the body of the while loop), you'll be fine.
The important thing happening in path compression is that the next Rc object will keep the rest of the chain alive while you swing the parent pointer for node to point at root. However, it is not a reference in the Rust sense of the word.
struct Node
{
parent: Option<Rc<RefCell<Node>>>
}
fn find_root(mut node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>>
{
while let Some(parent) = node.borrow().parent.clone()
{
node = parent;
}
return node;
}
fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>)
{
while node.borrow().parent.is_some()
{
let next = node.borrow().parent.clone().unwrap();
node.borrow_mut().parent = Some(root.clone());
node = next;
}
}
This runs fine for me with the test harness I used, though there may still be bugs. It certainly compiles and runs without a panic! due to trying to borrow_mut() something that is already borrowed. It may actually produce the right answer, that's up to you.
On IRC, Jonathan Reem pointed out that inner is borrowing until the end of its lexical scope, which is too far for what I was asking. Inlining it produces the following, which compiles without error:
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
while ans.lock().parent.is_some() {
ans = ans.lock().parent.clone().unwrap();
}
ans
}
EDIT: As Francis Gagné points out, this has a race condition, since the lock doesn't extend long enough. Here's a modified version that only has one lock() call; perhaps it is not vulnerable to the same problem.
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
loop {
ans = {
let tmp = ans.lock();
match tmp.parent.clone() {
None => break,
Some(z) => z
}
}
}
ans
}
EDIT 2: This only holds one lock at a time, and so is racey. I still don't know how to do hand-over-hand locking.
As pointed out by Frank Sherry and others, you shouldn't use Arc/Mutex when single threaded. But his code was outdated, so here is the new one (for version 1.0.0alpha2).
This does not take linear space either (like the recursive code given in the question).
struct Node {
parent: Option<Rc<RefCell<Node>>>
}
fn find_root(node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
let mut ans = node.clone(); // Rc<RefCell<Node>>
loop {
ans = {
let ans_ref = ans.borrow(); // std::cell::Ref<Node>
match ans_ref.parent.clone() {
None => break,
Some(z) => z
}
} // ans_ref goes out of scope, and ans becomes mutable
}
ans
}
fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>) {
while node.borrow().parent.is_some() {
let next = {
let node_ref = node.borrow();
node_ref.parent.clone().unwrap()
};
node.borrow_mut().parent = Some(root.clone());
// RefMut<Node> from borrow_mut() is out of scope here...
node = next; // therefore we can mutate node
}
}
Note for beginners: Pointers are automatically dereferenced by dot operator. ans.borrow() actually means (*ans).borrow(). I intentionally used different styles for the two functions.
Although not the answer to your literal question (hand-over locking), union-find with weighted-union and path-compression can be very simple in Rust:
fn unionfind<I: Iterator<(uint, uint)>>(mut iterator: I, nodes: uint) -> Vec<uint>
{
let mut root = Vec::from_fn(nodes, |x| x);
let mut rank = Vec::from_elem(nodes, 0u8);
for (mut x, mut y) in iterator
{
// find roots for x and y; do path compression on look-ups
while (x != root[x]) { root[x] = root[root[x]]; x = root[x]; }
while (y != root[y]) { root[y] = root[root[y]]; y = root[y]; }
if x != y
{
// weighted union swings roots
match rank[x].cmp(&rank[y])
{
Less => root[x] = y,
Greater => root[y] = x,
Equal =>
{
root[y] = x;
rank[x] += 1
},
}
}
}
}
Maybe the meta-point is that the union-find algorithm may not be the best place to handle node ownership, and by using references to existing memory (in this case, by just using uint identifiers for the nodes) without affecting the lifecycle of the nodes makes for a much simpler implementation, if you can get away with it of course.
Related
I am quite new to Rust. I'm trying to build a global cache using lru::LruCache and a RwLock for safety. It needs to be globally accessible based on my program's architecture.
//Size to take up 5MB
const CACHE_ENTRIES: usize = (GIBYTE as usize/ 200) / (BLOCK_SIZE);
pub type CacheEntry = LruCache<i32, Bytes>;
static mut CACHE : CacheEntry = LruCache::new(CACHE_ENTRIES);
lazy_static!{
static ref BLOCKCACHE: RwLock<CacheEntry> = RwLock::new(CACHE);
}
//this gets called from another setup function
async fn download_block(&self,
context: &BlockContext,
buf: &mut [u8],
count: u32, //bytes to read
offset: u64,
buf_index: u32
) -> Result<u32> {
let block_index = context.block_index;
let block_data : Bytes = match {BLOCKCACHE.read().unwrap().get(&block_index)} {
Some(data) => data.to_owned(),
None => download_block_from_remote(context).await.unwrap().to_owned(),
};
//the rest of the function does stuff with the block_data
}
async fn download_block_from_remote(context: &BlockContext) -> Result<Bytes>{
//code to download block data from remote into block_data
{BLOCKCACHE.write().unwrap().put(block_index, block_data.clone())};
Ok(block_data)
}
Right now I am getting an error on this line:
let block_data = match {BLOCKCACHE.read().unwrap().get(&block_index)} {
"cannot borrow as mutable" for the value inside the braces.
"help: trait DerefMut is required to modify through a dereference, but it is not implemented for std::sync::RwLockReadGuard<'_, LruCache<i32, bytes::Bytes>>"
I have gotten some other errors involving ownership and mutability, but I can't seem to get rid of this. Is anyone able to offer guidance on how to get this to work, or set me on the right path if it's just not possible/feasible?
The problem here is that LruCache::get() requires mutable access to the cache object. (Reason: it's a cache object, which has to change its internal state when querying things for the actual caching)
Therefore, if you use an RwLock, you need to use the write() method instead of the read() method.
That said, the fact that get() requires mutable access makes the entire RwLock pretty pointless, and I'd use a normal Mutex instead. There is very rarely the necessity to use an RwLock as it has more overhead compared to a simple Mutex.
So, I'm trying to do a exhaustive search of a hash. The hash itself is not important here. As I want to use all processing power of my CPU, I'm using Rayon to get a thread pool and lots of tasks. The search algorithm is the following:
let (tx, rx) = mpsc::channel();
let original_hash = String::from(original_hash);
rayon::spawn(move || {
let mut i = 0;
let mut iter = SenhaIterator::from(initial_pwd);
while i < max_iteracoes {
let pwd = iter.next().unwrap();
let clone_tx = tx.clone();
rayon::spawn(move || {
let hash = calcula_hash(&pwd);
clone_tx.send((pwd, hash)).unwrap();
});
i += 1;
}
});
let mut last_pwd = None;
let bar = ProgressBar::new(max_iteracoes as u64);
while let Ok((pwd, hash)) = rx.recv() {
last_pwd = Some(pwd);
if hash == original_hash {
bar.finish();
return last_pwd.map_or(ResultadoSenha::SenhaNaoEncontrada(None), |s| {
ResultadoSenha::SenhaEncontrada(s)
});
}
bar.inc(1);
}
bar.finish();
ResultadoSenha::SenhaNaoEncontrada(last_pwd)
Just a high level explanation: as the tasks go completing their work, they send a pair of (password, hash) to the main thread, which will compare the hash with the original hash (the one I'm trying to find a password for). If they match, great, I return to main with an enum value that indicates success, and the password that produces the original hash. After all iterations end, I'll return to main with an enum value that indicates that no hash was found, but with the last password, so I can retry from this point in some future run.
I'm trying to use Indicatif to show a progress bar, so I can get a glimpse of the progress.
But my problem is that the program is growing it's memory usage without a clear reason why. If I make it run with, let's say, 1 billion iterations, it goes slowly adding memory until it fills all available system memory.
But when I comment the line bar.inc(1);, the program behaves as expected, with normal memory usage.
I've created a test program, with Rayon and Indicatif, but without the hash calculation and it works correctly, no memory misbehavior.
That makes me think that I'm doing something wrong with memory management in my code, but I can't see anything obvious.
I found a solution, but I'm still not sure why it solves the original problem.
What I did to solve it is to transfer the progress code to the first spawn closure. Look at lines 6 and 19 below:
let (tx, rx) = mpsc::channel();
let original_hash = String::from(original_hash);
rayon::spawn(move || {
let mut bar = ProgressBar::new(max_iteracoes as u64);
let mut i = 0;
let mut iter = SenhaIterator::from(initial_pwd);
while i < max_iteracoes {
let pwd = iter.next().unwrap();
let clone_tx = tx.clone();
rayon::spawn(move || {
let hash = calcula_hash(&pwd);
clone_tx.send((pwd, hash)).unwrap();
});
i += 1;
bar.inc();
}
bar.finish();
});
let mut latest_pwd = None;
while let Ok((pwd, hash)) = rx.recv() {
latest_pwd = Some(pwd);
if hash == original_hash {
return latest_pwd.map_or(PasswordOutcome::PasswordNotFound(None), |s| {
PasswordOutcome::PasswordFound(s)
})
}
}
PasswordOutcome::PasswordNotFound(latest_pwd)
That first spawn closure has the role of fetching the next password to try and pass it over to a worker task, which calculates the corresponding hash and sends the pair (password, hash) to the main thread. The main thread will wait for the pairs from the rx channel and compare with the expected hash.
What is still missing from me is why tracking the progress on the outer thread leaks memory. I couldn't identify what is really leaking. But it's working now and I'm happy with the result.
Suppose, for example, I have a linked list which does not allow removal of nodes.
Would it be possible to return shared references to values which have already been inserted, while still allowing the relative order of the nodes to be changed, or new nodes inserted?
Even mutation through one of the nodes should be safe "on paper" as long as only one node is used to mutate the list at a time. Is it possible to represent this in rust's ownership system?
I'm specifically interested in doing so without runtime overhead (potentially using unsafe in the implementation, but not in the interface).
EDIT: As requested, here is an example that gives the outline of what I am thinking of.
let list = MyLinkedList::<i32>::new()
let handle1 = list.insert(1); // Returns a handle to the inserted element.
let handle2 = list.insert(2);
let value1 : &i32 = handle1.get();
let value2 : &i32 = handle2.prev().get(); // Ok to have two immutable references to the same element.
list.insert(3); // Also ok to insert and swap nodes, while the references are held.
list.swap(handle1,handl2);
foo(value1,value2);
let exclusive_value: &mut i32 = handle1.get_mut(); // While this reference is held, no other handles can be used, but insertion and permutation are ok
handle5 = list.insert(4);
list.swap(handle1, handle2);
In other words, the data contained inside the nodes of the list is treated as one resource that can be borrowed shared/mutably, and the links between the nodes are another resource that can be borrowed shared/mutably.
In other words, the data contained inside the nodes of the list is treated as one resource that can be borrowed shared/mutably, and the links between the nodes are another resource that can be borrowed shared/mutably.
The idea to deal with such spatial partitioning is to introduce a different "key" for each partition; it's easy since they are static. This has been dubbed the PassKey pattern.
In the absence of brands, it will still require a run-time check: verifying that the elements-key is tied to this specific list instance is mandatory for safety. This is, however, a read-only comparison that will always be true, so the performance is about as good as it gets as far as run-time checks go.
The idea, in a nutshell:
let (handles, elements) = list.keys();
let h0 = handles.create(4);
handles.swap(h0, h1);
let e = elements.get(h0);
In your usecase:
It is always possible to change the links, so we will use internal mutability for this.
The borrow-check on elements inside handles will be carried out by borrowing elements.
The full implementation can be found here. It heavily uses unsafe, and I make no promise that it is fully safe, however it is hopefully sufficient for a demonstration.
In this implementation, I have opted for dumb handles and implemented the operations on the key types themselves. This limited the number of types who needed to borrow from the main list, and simplified borrowing.
The core idea, then:
struct LinkedList<T> {
head: *mut Node<T>,
tail: *mut Node<T>
}
struct Handles<'a, T> {
list: ptr::NonNull<LinkedList<T>>,
_marker: PhantomData<&'a mut LinkedList<T>>,
}
struct Elements<'a, T> {
list: ptr::NonNull<LinkedList<T>>,
_marker: PhantomData<&'a mut LinkedList<T>>,
}
LinkedList<T> will act as the storage, however will implement only 3 operations:
construction,
destruction,
handing out the keys.
The two keys Handles and Elements will both borrow the list mutably, guaranteeing that a single of (each of them) can exist simultaneously. Borrow-checking will prevent a new Handles or Elements from being created if any instance of them still lives for this list:
list: grants access to the list storage; Elements will only use it for checking (necessary) run-time invariants and never dereference it.
_marker: is the key to the borrow-checking actually guaranteeing exclusitivity.
Sounds cool so far? For completion, the last two structures then:
struct Handle<'a, T> {
node: ptr::NonNull<Node<T>>,
list: ptr::NonNull<LinkedList<T>>,
_marker: PhantomData<&'a LinkedList<T>>,
}
struct Node<T> {
data: T,
prev: *mut Node<T>,
next: *mut Node<T>,
}
Node is the most obvious representation of a doubly-linked list ever, so we're doing something right. The list in Handle<T> is there for the exact same purpose as the one in Elements: verifying that both Handle and Handles/Elements are talking about the same instance of list. It's critical for get_mut to be safe, and otherwise helps avoiding bugs.
There's a subtle reason for Handle<'a, T> having a lifetime tying to the LinkedList. I was tempted to remove it, however this would allow creating a handle from a list, destroying the list, then recreating a list at the same address... and handle.node would now be dangling!
And with, we only need to implement the methods we need on Handles and Elements. A few samples:
impl<'a, T> Handles<'a, T> {
pub fn push_front(&self, data: T) -> Handle<'a, T> {
let list = unsafe { &mut *self.list.as_ptr() };
let node = Box::into_raw(Box::new(Node { data, prev: ptr::null_mut(), next: list.head }));
unsafe { &mut *node }.set_neighbours();
list.head = node;
if list.tail.is_null() {
list.tail = node;
}
Handle {
node: unsafe { ptr::NonNull::new_unchecked(node) },
list: self.list, _marker: PhantomData,
}
}
pub fn prev(&self, handle: Handle<'a, T>) -> Option<Handle<'a, T>> {
unsafe { handle.node.as_ref() }.prev().map(|node| Handle {
node,
list: self.list,
_marker: PhantomData
})
}
}
And:
impl<'a, T> Elements<'a, T> {
pub fn get<'b>(&'b self, handle: Handle<'a, T>) -> &'b T {
assert_eq!(self.list, handle.list);
let node = unsafe { &*handle.node.as_ptr() };
&node.data
}
pub fn get_mut<'b>(&'b mut self, handle: Handle<'a, T>) -> &'b mut T {
assert_eq!(self.list, handle.list);
let node = unsafe { &mut *handle.node.as_ptr() };
&mut node.data
}
}
And this should be safe because:
Handles, after creating a new handle, only ever accesses its links.
Elements only ever returns references to data, and the links cannot be modified while it accesses them.
Example of usage:
fn main() {
let mut linked_list = LinkedList::default();
{
let (handles, mut elements) = linked_list.access();
let h0 = handles.push_front("Hello".to_string());
assert!(handles.prev(h0).is_none());
assert!(handles.next(h0).is_none());
println!("{}", elements.get(h0));
let h1 = {
let first = elements.get_mut(h0);
first.replace_range(.., "Hallo");
let h1 = handles.push_front("World".to_string());
assert!(handles.prev(h0).is_some());
first.replace_range(.., "Goodbye");
h1
};
println!("{} {}", elements.get(h0), elements.get(h1));
handles.swap(h0, h1);
println!("{} {}", elements.get(h0), elements.get(h1));
}
{
let (handles, elements) = linked_list.access();
let h0 = handles.front().unwrap();
let h1 = handles.back().unwrap();
let h2 = handles.push_back("And thanks for the fish!".to_string());
println!("{} {}! {}", elements.get(h0), elements.get(h1), elements.get(h2));
}
}
vector& vector::operator = (const vector& a)
//make this vector a copy of a
{
double* p = new double [ a.sz ]; // allocate new space
copy(a.elem, a.elem+a.sz, elem); // copy elements
delete[] elem; // deallocate old space
elem = p; // now we can reset elem
sz = a.sz;
return *this; // return a self-reference
}
I thought that the third argument of std::copy() should be the pointer p, but the book (Programming principles and practice using C++ - 2nd edition) says:
"When implementing the assignment, you could consider simplifying the code by freeing the memory for the old elements before creating the copy, but it is usually a very good idea not to throw away information before you know that you can replace it. Also, if you did that, strange things would happen if you assigned a vector to itself" - Page 635 and 636.
So, the pointer elem must be third argument of std::copy() to not let the pointer be invalid for a moment. I think...
But from where does p gets the information to be put in the array it points to, to be able to do: elem = p ?
I already know copy and swap strategy exist, you don't have to explain that.
I want to comprehend what is above.
No, that is a typo.
std::copy(a.elem, a.elem+a.sz, p);
is what the code should read.
First off, I've read this question:
What's the right way to make a stack (or other dynamically resizable vector-like thing) in Rust?
The problem
The selected answer just tells the question asker to use the standard library instead of explaining the implementation, which is fine if my goal was to build something. Except I'm trying to learn about the implementation of a stack, while following a data structure textbook written for Java (Algorithms by Robert Sedgwick & Kevin Wayne), where they implement a stack via resizing an array (Page 136).
I'm in the process of implementing the resize method, and it turns out the size of the array needs to be a constant expression.
meta: are arrays in rust called slices?
use std::mem;
struct DynamicStack<T> {
length: uint,
internal: Box<[T]>,
}
impl<T> DynamicStack<T> {
fn new() -> DynamicStack<T> {
DynamicStack {
length: 0,
internal: box [],
}
}
fn resize(&mut self, new_size: uint) {
let mut temp: Box<[T, ..new_size]> = box unsafe { mem::uninitialized() };
// ^^ error: expected constant expr for array
// length: non-constant path in constant expr
// code for copying elements from self.internal
self.internal = temp;
}
}
For brevity the compiler error was this
.../src/lib.rs:51:23: 51:38 error: expected constant expr for array length: non-constant path in constant expr
.../src/lib.rs:51 let mut temp: Box<[T, ..new_size]> = box unsafe { mem::uninitialized() };
^~~~~~~~~~~~~~~
.../src/lib.rs:51:23: 51:38 error: expected constant expr for array length: non-constant path in constant expr
.../src/lib.rs:51 let mut temp: Box<[T, ..new_size]> = box unsafe { mem::uninitialized() };
^~~~~~~~~~~~~~~
The Question
Surely there is a way in rust to initialize an array with it's size determined at runtime (even if it's unsafe)? Could you also provide an explanation of what's going on in your answer?
Other attempts
I've considered it's probably possible to implement the stack in terms of
struct DynamicStack<T> {
length: uint,
internal: Box<Optional<T>>
}
But I don't want the overhead of matching optional value to remove the unsafe memory operations, but this still doesn't resolve the issue of unknown array sizes.
I also tried this (which doesn't even compile)
fn resize(&mut self, new_size: uint) {
let mut temp: Box<[T]> = box [];
let current_size = self.internal.len();
for i in range(0, current_size) {
temp[i] = self.internal[i];
}
for i in range(current_size, new_size) {
temp[i] = unsafe { mem::uninitialized() };
}
self.internal = temp;
}
And I got this compiler error
.../src/lib.rs:55:17: 55:21 error: cannot move out of dereference of `&mut`-pointer
.../src/lib.rs:55 temp[i] = self.internal[i];
^~~~
.../src/lib.rs:71:19: 71:30 error: cannot use `self.length` because it was mutably borrowed
.../src/lib.rs:71 self.resize(self.length * 2);
^~~~~~~~~~~
.../src/lib.rs:71:7: 71:11 note: borrow of `*self` occurs here
.../src/lib.rs:71 self.resize(self.length * 2);
^~~~
.../src/lib.rs:79:18: 79:22 error: cannot move out of dereference of `&mut`-pointer
.../src/lib.rs:79 let result = self.internal[self.length];
^~~~
.../src/lib.rs:79:9: 79:15 note: attempting to move value to here
.../src/lib.rs:79 let result = self.internal[self.length];
^~~~~~
.../src/lib.rs:79:9: 79:15 help: to prevent the move, use `ref result` or `ref mut result` to capture value by reference
.../src/lib.rs:79 let result = self.internal[self.length];
I also had a look at this, but it's been awhile since I've done any C/C++
How should you do pointer arithmetic in rust?
Surely there is a way in Rust to initialize an array with it's size determined at runtime?
No, Rust arrays are only able to be created with a size known at compile time. In fact, each tuple of type and size constitutes a new type! The Rust compiler uses that information to make optimizations.
Once you need a set of things determined at runtime, you have to add runtime checks to ensure that Rust's safety guarantees are always valid. For example, you can't access uninitialized memory (such as by walking off the beginning or end of a set of items).
If you truly want to go down this path, I expect that you are going to have to get your hands dirty with some direct memory allocation and unsafe code. In essence, you will be building a smaller version of Vec itself! To that end, you can check out the source of Vec.
At a high level, you will need to allocate chunks of memory big enough to hold N objects of some type. Then you can provide ways of accessing those elements, using pointer arithmetic under the hood. When you add more elements, you can allocate more space and move old values around. There are lots of nuanced things that may or may not come up, but it sounds like you are on the beginning of a fun journey!
Edit
Of course, you could choose to pretend that most of the methods of Vec don't even exist, and just use the ones that are analogs of Java's array. You'll still need to use Option to avoid uninitialized values though.