Referencing / dereferencing a vector element in a for loop - syntax

In the code below, I want to retain number_list, after iterating over it, since the .into_iter() that for uses by default will consume. Thus, I am assuming that n: &i32 and I can get the value of n by dereferencing.
fn main() {
let number_list = vec![24, 34, 100, 65];
let mut largest = number_list[0];
for n in &number_list {
if *n > largest {
largest = *n;
}
}
println!("{}", largest);
}
It was revealed to me that instead of this, we can use &n as a 'pattern':
fn main() {
let number_list = vec![24, 34, 100, 65];
let mut largest = number_list[0];
for &n in &number_list {
if n > largest {
largest = n;
}
}
println!("{}", largest);
number_list;
}
My confusion (and bear in mind I haven't covered patterns) is that I would expect that since n: &i32, then &n: &&i32 rather than it resolving to the value (if a double ref is even possible). Why does this happen, and does the meaning of & differ depending on context?

It can help to think of a reference as a kind of container. For comparison, consider Option, where we can "unwrap" the value using pattern-matching, for example in an if let statement:
let n = 100;
let opt = Some(n);
if let Some(p) = opt {
// do something with p
}
We call Some and None constructors for Option, because they each produce a value of type Option. In the same way, you can think of & as a constructor for a reference. And the syntax is symmetric:
let n = 100;
let reference = &n;
if let &p = reference {
// do something with p
}
You can use this feature in any place where you are binding a value to a variable, which happens all over the place. For example:
if let, as above
match expressions:
match opt {
Some(1) => { ... },
Some(p) => { ... },
None => { ... },
}
match reference {
&1 => { ... },
&p => { ... },
}
In function arguments:
fn foo(&p: &i32) { ... }
Loops:
for &p in iter_of_i32_refs {
...
}
And probably more.
Note that the last two won't work for Option because they would panic if a None was found instead of a Some, but that can't happen with references because they only have one constructor, &.
does the meaning of & differ depending on context?
Hopefully, if you can interpret & as a constructor instead of an operator, then you'll see that its meaning doesn't change. It's a pretty cool feature of Rust that you can use constructors on the right hand side of an expression for creating values and on the left hand side for taking them apart (destructuring).

As apart from other languages (C++), &n in this case isn't a reference, but pattern matching, which means that this is expecting a reference.
The opposite of this would be ref n which would give you &&i32 as a type.
This is also the case for closures, e.g.
(0..).filter(|&idx| idx < 10)...
Please note, that this will move the variable, e.g. you cannot do this with types, that don't implement the Copy trait.

My confusion (and bear in mind I haven't covered patterns) is that I would expect that since n: &i32, then &n: &&i32 rather than it resolving to the value (if a double ref is even possible). Why does this happen, and does the meaning of & differ depending on context?
When you do pattern matching (for example when you write for &n in &number_list), you're not saying that n is an &i32, instead you are saying that &n (the pattern) is an &i32 (the expression) from which the compiler infers that n is an i32.
Similar things happen for all kinds of pattern, for example when pattern-matching in if let Some (x) = Some (42) { /* … */ } we are saying that Some (x) is Some (42), therefore x is 42.

Related

F# record: ref vs mutable field

While refactoring my F# code, I found a record with a field of type bool ref:
type MyType =
{
Enabled : bool ref
// other, irrelevant fields here
}
I decided to try changing it to a mutable field instead
// Refactored version
type MyType =
{
mutable Enabled : bool
// other fields unchanged
}
Also, I applied all the changes required to make the code compile (i.e. changing := to <-, removing incr and decr functions, etc).
I noticed that after the changes some of the unit tests started to fail.
As the code is pretty large, I can't really see what exactly changed.
Is there a significant difference in implementation of the two that could change the behavior of my program?
Yes, there is a difference. Refs are first-class values, while mutable variables are a language construct.
Or, from a different perspective, you might say that ref cells are passed by reference, while mutable variables are passed by value.
Consider this:
type T = { mutable x : int }
type U = { y : int ref }
let t = { x = 5 }
let u = { y = ref 5 }
let mutable xx = t.x
xx <- 10
printfn "%d" t.x // Prints 5
let mutable yy = u.y
yy := 10
printfn "%d" !u.y // Prints 10
This happens because xx is a completely new mutable variable, unrelated to t.x, so that mutating xx has no effect on x.
But yy is a reference to the exact same ref cell as u.y, so that pushing a new value into that cell while referring to it via yy has the same effect as if referring to it via u.y.
If you "copy" a ref, the copy ends up pointing to the same ref, but if you copy a mutable variable, only its value gets copied.
You have the difference not because one is first-value, passed by reference/value or other things. It's because a ref is just a container (class) on its own.
The difference is more obvious when you implement a ref by yourself. You could do it like this:
type Reference<'a> = {
mutable Value: 'a
}
Now look at both definitions.
type MyTypeA = {
mutable Enabled: bool
}
type MyTypeB = {
Enabled: Reference<bool>
}
MyTypeA has a Enabled field that can be directly changed or with other word is mutable.
On the other-side you have MyTypeB that is theoretically immutable but has a Enabled that reference to a mutable class.
The Enabled from MyTypeB just reference to an object that is mutable like the millions of other classes in .NET. From the above type definitions, you can create objects like these.
let t = { MyTypeA.Enabled = true }
let u = { MyTypeB.Enabled = { Value = true }}
Creating the types makes it more obvious, that the first is a mutable field, and the second contains an object with a mutable field.
You find the implementation of ref in FSharp.Core/prim-types.fs it looks like this:
[<DebuggerDisplay("{contents}")>]
[<StructuralEquality; StructuralComparison>]
[<CompiledName("FSharpRef`1")>]
type Ref<'T> =
{
[<DebuggerBrowsable(DebuggerBrowsableState.Never)>]
mutable contents: 'T }
member x.Value
with get() = x.contents
and set v = x.contents <- v
and 'T ref = Ref<'T>
The ref keyword in F# is just the built-in way to create such a pre-defined mutable Reference object, instead that you create your own type for this. And it has some benefits that it works well whenever you need to pass byref, in or out values in .NET. So you should use ref. But you also can use a mutable for this. For example, both code examples do the same.
With a reference
let parsed =
let result = ref 0
match System.Int32.TryParse("1234", result) with
| true -> result.Value
| false -> result.Value
With a mutable
let parsed =
let mutable result = 0
match System.Int32.TryParse("1234", &result) with
| true -> result
| false -> result
In both examples you get a 1234 as an int parsed. But the first example will create a FSharpRef and pass it to Int32.TryParse while the second example creates a field or variable and passes it with out to Int32.TryParse

What does Some() do on the left hand side of a variable assignment?

I was reading some Rust code and I came across this line
if let Some(path) = env::args().nth(1) {
Inside of this function
fn main() {
if let Some(path) = env::args().nth(1) {
// Try reading the file provided by the path.
let mut file = File::open(path).expect("Failed reading file.");
let mut content = String::new();
file.read_to_string(&mut content);
perform_conversion(content.as_str()).expect("Conversion failed.");
} else {
println!(
"provide a path to a .cue file to be converted into a MusicBrainz compatible tracklist."
)
}
}
The line seems to be assigning the env argument to the variable path but I can't work out what the Some() around it is doing.
I took a look at the documentation for Option and I understand how it works when used on the right hand side of = but on the left hand side I am a little confused.
Am I right in thinking this line is equivalent to
if let path = Some(env::args().nth(1)) {
From the reference :
An if let expression is semantically similar to an if expression but
in place of a condition expression it expects the keyword let followed
by a refutable pattern, an = and an expression. If the value of the
expression on the right hand side of the = matches the pattern, the
corresponding block will execute, otherwise flow proceeds to the
following else block if it exists. Like if expressions, if let
expressions have a value determined by the block that is evaluated.
In here the important part is refutability. What it means refutable pattern in here it can be in different forms. For example :
enum Test {
First(String, i32, usize),
Second(i32, usize),
Third(i32),
}
You can check the x's value for a value for 3 different pattern like :
fn main() {
let x = Test::Second(14, 55);
if let Test::First(a, b, c) = x {}
if let Test::Second(a, b) = x {} //This block will be executed
if let Test::Third(a) = x {}
}
This is called refutability. But consider your code like this:
enum Test {
Second(i32, usize),
}
fn main() {
let x = Test::Second(14, 55);
if let Test::Second(a, b) = x {}
}
This code will not compile because x's pattern is obvious, it has single pattern.
You can get more information from the reference of refutability.
Also you are not right thinking for this:
if let path = Some(env::args().nth(1)) {
Compiler will throw error like irrefutable if-let pattern because as the reference says: "keyword let followed by a refutable pattern". In here there is no refutable pattern after "let". Actually this code tries to create a variable named path which is an Option and this make no sense because there is no "If" needed,
Instead Rust expects from you to write like this:
let path = Some(env::args().nth(1)); // This will be seem like Some(Some(value))
The other answers go into a lot of detail, which might be more than you need to know.
Essentially, this:
if let Some(path) = env::args().nth(1) {
// Do something with path
} else {
// otherwise do something else
}
is identical to this:
match env::args().nth(1) {
Some(path) => { /* Do something with path */ }
_ => { /* otherwise do something else */ }
}

A way to resize a stack (or other dynamically resizable types)

First off, I've read this question:
What's the right way to make a stack (or other dynamically resizable vector-like thing) in Rust?
The problem
The selected answer just tells the question asker to use the standard library instead of explaining the implementation, which is fine if my goal was to build something. Except I'm trying to learn about the implementation of a stack, while following a data structure textbook written for Java (Algorithms by Robert Sedgwick & Kevin Wayne), where they implement a stack via resizing an array (Page 136).
I'm in the process of implementing the resize method, and it turns out the size of the array needs to be a constant expression.
meta: are arrays in rust called slices?
use std::mem;
struct DynamicStack<T> {
length: uint,
internal: Box<[T]>,
}
impl<T> DynamicStack<T> {
fn new() -> DynamicStack<T> {
DynamicStack {
length: 0,
internal: box [],
}
}
fn resize(&mut self, new_size: uint) {
let mut temp: Box<[T, ..new_size]> = box unsafe { mem::uninitialized() };
// ^^ error: expected constant expr for array
// length: non-constant path in constant expr
// code for copying elements from self.internal
self.internal = temp;
}
}
For brevity the compiler error was this
.../src/lib.rs:51:23: 51:38 error: expected constant expr for array length: non-constant path in constant expr
.../src/lib.rs:51 let mut temp: Box<[T, ..new_size]> = box unsafe { mem::uninitialized() };
^~~~~~~~~~~~~~~
.../src/lib.rs:51:23: 51:38 error: expected constant expr for array length: non-constant path in constant expr
.../src/lib.rs:51 let mut temp: Box<[T, ..new_size]> = box unsafe { mem::uninitialized() };
^~~~~~~~~~~~~~~
The Question
Surely there is a way in rust to initialize an array with it's size determined at runtime (even if it's unsafe)? Could you also provide an explanation of what's going on in your answer?
Other attempts
I've considered it's probably possible to implement the stack in terms of
struct DynamicStack<T> {
length: uint,
internal: Box<Optional<T>>
}
But I don't want the overhead of matching optional value to remove the unsafe memory operations, but this still doesn't resolve the issue of unknown array sizes.
I also tried this (which doesn't even compile)
fn resize(&mut self, new_size: uint) {
let mut temp: Box<[T]> = box [];
let current_size = self.internal.len();
for i in range(0, current_size) {
temp[i] = self.internal[i];
}
for i in range(current_size, new_size) {
temp[i] = unsafe { mem::uninitialized() };
}
self.internal = temp;
}
And I got this compiler error
.../src/lib.rs:55:17: 55:21 error: cannot move out of dereference of `&mut`-pointer
.../src/lib.rs:55 temp[i] = self.internal[i];
^~~~
.../src/lib.rs:71:19: 71:30 error: cannot use `self.length` because it was mutably borrowed
.../src/lib.rs:71 self.resize(self.length * 2);
^~~~~~~~~~~
.../src/lib.rs:71:7: 71:11 note: borrow of `*self` occurs here
.../src/lib.rs:71 self.resize(self.length * 2);
^~~~
.../src/lib.rs:79:18: 79:22 error: cannot move out of dereference of `&mut`-pointer
.../src/lib.rs:79 let result = self.internal[self.length];
^~~~
.../src/lib.rs:79:9: 79:15 note: attempting to move value to here
.../src/lib.rs:79 let result = self.internal[self.length];
^~~~~~
.../src/lib.rs:79:9: 79:15 help: to prevent the move, use `ref result` or `ref mut result` to capture value by reference
.../src/lib.rs:79 let result = self.internal[self.length];
I also had a look at this, but it's been awhile since I've done any C/C++
How should you do pointer arithmetic in rust?
Surely there is a way in Rust to initialize an array with it's size determined at runtime?
No, Rust arrays are only able to be created with a size known at compile time. In fact, each tuple of type and size constitutes a new type! The Rust compiler uses that information to make optimizations.
Once you need a set of things determined at runtime, you have to add runtime checks to ensure that Rust's safety guarantees are always valid. For example, you can't access uninitialized memory (such as by walking off the beginning or end of a set of items).
If you truly want to go down this path, I expect that you are going to have to get your hands dirty with some direct memory allocation and unsafe code. In essence, you will be building a smaller version of Vec itself! To that end, you can check out the source of Vec.
At a high level, you will need to allocate chunks of memory big enough to hold N objects of some type. Then you can provide ways of accessing those elements, using pointer arithmetic under the hood. When you add more elements, you can allocate more space and move old values around. There are lots of nuanced things that may or may not come up, but it sounds like you are on the beginning of a fun journey!
Edit
Of course, you could choose to pretend that most of the methods of Vec don't even exist, and just use the ones that are analogs of Java's array. You'll still need to use Option to avoid uninitialized values though.

Hand-over-hand locking with Rust

I'm trying to write an implementation of union-find in Rust. This is famously very simple to implement in languages like C, while still having a complex run time analysis.
I'm having trouble getting Rust's mutex semantics to allow iterative hand-over-hand locking.
Here's how I got where I am now.
First, this is a very simple implementation of part of the structure I want in C:
#include <stdlib.h>
struct node {
struct node * parent;
};
struct node * create(struct node * parent) {
struct node * ans = malloc(sizeof(struct node));
ans->parent = parent;
return ans;
}
struct node * find_root(struct node * x) {
while (x->parent) {
x = x->parent;
}
return x;
}
int main() {
struct node * foo = create(NULL);
struct node * bar = create(foo);
struct node * baz = create(bar);
baz->parent = find_root(bar);
}
Note that the structure of the pointers is that of an inverted tree; multiple pointers may point at a single location, and there are no cycles.
At this point, there is no path compression.
Here is a Rust translation. I chose to use Rust's reference-counted pointer type to support the inverted tree type I referenced above.
Note that this implementation is much more verbose, possibly due to the increased safety that Rust offers, but possibly due to my inexperience with Rust.
use std::rc::Rc;
struct Node {
parent: Option<Rc<Node>>
}
fn create(parent: Option<Rc<Node>>) -> Node {
Node {parent: parent.clone()}
}
fn find_root(x: Rc<Node>) -> Rc<Node> {
let mut ans = x.clone();
while ans.parent.is_some() {
ans = ans.parent.clone().unwrap();
}
ans
}
fn main() {
let foo = Rc::new(create(None));
let bar = Rc::new(create(Some(foo.clone())));
let mut prebaz = create(Some(bar.clone()));
prebaz.parent = Some(find_root(bar.clone()));
}
Path compression re-parents each node along a path to the root every time find_root is called. To add this feature to the C code, only two new small functions are needed:
void change_root(struct node * x, struct node * root) {
while (x) {
struct node * tmp = x->parent;
x->parent = root;
x = tmp;
}
}
struct node * root(struct node * x) {
struct node * ans = find_root(x);
change_root(x, ans);
return ans;
}
The function change_root does all the re-parenting, while the function root is just a wrapper to use the results of find_root to re-parent the nodes on the path to the root.
In order to do this in Rust, I decided I would have to use a Mutex rather than just a reference counted pointer, since the Rc interface only allows mutable access by copy-on-write when more than one pointer to the item is live. As a result, all of the code would have to change. Before even getting to the path compression part, I got hung up on find_root:
use std::sync::{Mutex,Arc};
struct Node {
parent: Option<Arc<Mutex<Node>>>
}
fn create(parent: Option<Arc<Mutex<Node>>>) -> Node {
Node {parent: parent.clone()}
}
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
let mut inner = ans.lock();
while inner.parent.is_some() {
ans = inner.parent.clone().unwrap();
inner = ans.lock();
}
ans.clone()
}
This produces the error (with 0.12.0)
error: cannot assign to `ans` because it is borrowed
ans = inner.parent.clone().unwrap();
note: borrow of `ans` occurs here
let mut inner = ans.lock();
What I think I need here is hand-over-hand locking. For the path A -> B -> C -> ..., I need to lock A, lock B, unlock A, lock C, unlock B, ... Of course, I could keep all of the locks open: lock A, lock B, lock C, ... unlock C, unlock B, unlock A, but this seems inefficient.
However, Mutex does not offer unlock, and uses RAII instead. How can I achieve hand-over-hand locking in Rust without being able to directly call unlock?
EDIT: As the comments noted, I could use Rc<RefCell<Node>> rather than Arc<Mutex<Node>>. Doing so leads to the same compiler error.
For clarity about what I'm trying to avoid by using hand-over-hand locking, here is a RefCell version that compiles but used space linear in the length of the path.
fn find_root(x: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
let mut inner : RefMut<Node> = x.borrow_mut();
if inner.parent.is_some() {
find_root(inner.parent.clone().unwrap())
} else {
x.clone()
}
}
We can pretty easily do full hand-over-hand locking as we traverse this list using just a bit of unsafe, which is necessary to tell the borrow checker a small bit of insight that we are aware of, but that it can't know.
But first, let's clearly formulate the problem:
We want to traverse a linked list whose nodes are stored as Arc<Mutex<Node>> to get the last node in the list
We need to lock each node in the list as we go along the way such that another concurrent traversal has to follow strictly behind us and cannot muck with our progress.
Before we get into the nitty-gritty details, let's try to write the signature for this function:
fn find_root(node: Arc<Mutex<Node>>) -> Arc<Mutex<Node>>;
Now that we know our goal, we can start to get into the implementation - here's a first attempt:
fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
// We have to separate this from incoming since the lock must
// be borrowed from incoming, not this local node.
let mut node = incoming.clone();
let mut lock = incoming.lock();
// Could use while let but that leads to borrowing issues.
while lock.parent.is_some() {
node = lock.parent.as_ref().unwrap().clone(); // !! uh-oh !!
lock = node.lock();
}
node
}
If we try to compile this, rustc will error on the line marked !! uh-oh !!, telling us that we can't move out of node while lock still exists, since lock is borrowing node. This is not a spurious error! The data in lock might go away as soon as node does - it's only because we know that we can keep the data lock is pointing to valid and in the same memory location even if we move node that we can fix this.
The key insight here is that the lifetime of data contained within an Arc is dynamic, and it is hard for the borrow checker to make the inferences we can about exactly how long data inside an Arc is valid.
This happens every once in a while when writing rust; you have more knowledge about the lifetime and organization of your data than rustc, and you want to be able to express that knowledge to the compiler, effectively saying "trust me". Enter: unsafe - our way of telling the compiler that we know more than it, and it should allow us to inform it of the guarantees that we know but it doesn't.
In this case, the guarantee is pretty simple - we are going to replace node while lock still exists, but we are not going to ensure that the data inside lock continues to be valid even though node goes away. To express this guarantee we can use mem::transmute, a function which allows us to reinterpret the type of any variable, by just using it to change the lifetime of the lock returned by node to be slightly longer than it actually is.
To make sure we keep our promise, we are going to use another handoff variable to hold node while we reassign lock - even though this moves node (changing its address) and the borrow checker will be angry at us, we know it's ok since lock doesn't point at node, it points at data inside of node, whose address (in this case, since it's behind an Arc) will not change.
Before we get to the solution, it's important to note that the trick we are using here is only valid because we are using an Arc. The borrow checker is warning us of a possibly serious error - if the Mutex was held inline and not in an Arc, this error would be a correct prevention of a use-after-free, where the MutexGuard held in lock would attempt to unlock a Mutex which has already been dropped, or at least moved to another memory location.
use std::mem;
use std::sync::{Arc, Mutex};
fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut node = incoming.clone();
let mut handoff_node;
let mut lock = incoming.lock().unwrap();
// Could use while let but that leads to borrowing issues.
while lock.parent.is_some() {
// Keep the data in node around by holding on to this `Arc`.
handoff_node = node;
node = lock.parent.as_ref().unwrap().clone();
// We are going to move out of node while this lock is still around,
// but since we kept the data around it's ok.
lock = unsafe { mem::transmute(node.lock().unwrap()) };
}
node
}
And, just like that, rustc is happy, and we have hand-over-hand locking, since the last lock is released only after we have acquired the new lock!
There is one unanswered question in this implementation which I have not yet received an answer too, which is whether the drop of the old value and assignment of a new value to a variable is a guaranteed to be atomic - if not, there is a race condition where the old lock is released before the new lock is acquired in the assignment of lock. It's pretty trivial to work around this by just having another holdover_lock variable and moving the old lock into it before reassigning, then dropping it after reassigning lock.
Hopefully this fully addresses your question and shows how unsafe can be used to work around "deficiencies" in the borrow checker when you really do know more. I would still like to want that the cases where you know more than the borrow checker are rare, and transmuting lifetimes is not "usual" behavior.
Using Mutex in this way, as you can see, is pretty complex and you have to deal with many, many, possible sources of a race condition and I may not even have caught all of them! Unless you really need this structure to be accessible from many threads, it would probably be best to just use Rc and RefCell, if you need it, as this makes things much easier.
I believe this to fit the criteria of hand-over-hand locking.
use std::sync::Mutex;
fn main() {
// Create a set of mutexes to lock hand-over-hand
let mutexes = Vec::from_fn(4, |_| Mutex::new(false));
// Lock the first one
let val_0 = mutexes[0].lock();
if !*val_0 {
// Lock the second one
let mut val_1 = mutexes[1].lock();
// Unlock the first one
drop(val_0);
// Do logic
*val_1 = true;
}
for mutex in mutexes.iter() {
println!("{}" , *mutex.lock());
}
}
Edit #1
Does it work when access to lock n+1 is guarded by lock n?
If you mean something that could be shaped like the following, then I think the answer is no.
struct Level {
data: bool,
child: Option<Mutex<Box<Level>>>,
}
However, it is sensible that this should not work. When you wrap an object in a mutex, then you are saying "The entire object is safe". You can't say both "the entire pie is safe" and "I'm eating the stuff below the crust" at the same time. Perhaps you jettison the safety by creating a Mutex<()> and lock that?
This is still not the answer your literal question of to how to do hand-over-hand locking, which should only be important in a concurrent setting (or if someone else forced you to use Mutex references to nodes). It is instead how to do this with Rc and RefCell, which you seem to be interested in.
RefCell only allows mutable writes when one mutable reference is held. Importantly, the Rc<RefCell<Node>> objects are not mutable references. The mutable references it is talking about are the results from calling borrow_mut() on the Rc<RefCell<Node>>object, and as long as you do that in a limited scope (e.g. the body of the while loop), you'll be fine.
The important thing happening in path compression is that the next Rc object will keep the rest of the chain alive while you swing the parent pointer for node to point at root. However, it is not a reference in the Rust sense of the word.
struct Node
{
parent: Option<Rc<RefCell<Node>>>
}
fn find_root(mut node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>>
{
while let Some(parent) = node.borrow().parent.clone()
{
node = parent;
}
return node;
}
fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>)
{
while node.borrow().parent.is_some()
{
let next = node.borrow().parent.clone().unwrap();
node.borrow_mut().parent = Some(root.clone());
node = next;
}
}
This runs fine for me with the test harness I used, though there may still be bugs. It certainly compiles and runs without a panic! due to trying to borrow_mut() something that is already borrowed. It may actually produce the right answer, that's up to you.
On IRC, Jonathan Reem pointed out that inner is borrowing until the end of its lexical scope, which is too far for what I was asking. Inlining it produces the following, which compiles without error:
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
while ans.lock().parent.is_some() {
ans = ans.lock().parent.clone().unwrap();
}
ans
}
EDIT: As Francis Gagné points out, this has a race condition, since the lock doesn't extend long enough. Here's a modified version that only has one lock() call; perhaps it is not vulnerable to the same problem.
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
loop {
ans = {
let tmp = ans.lock();
match tmp.parent.clone() {
None => break,
Some(z) => z
}
}
}
ans
}
EDIT 2: This only holds one lock at a time, and so is racey. I still don't know how to do hand-over-hand locking.
As pointed out by Frank Sherry and others, you shouldn't use Arc/Mutex when single threaded. But his code was outdated, so here is the new one (for version 1.0.0alpha2).
This does not take linear space either (like the recursive code given in the question).
struct Node {
parent: Option<Rc<RefCell<Node>>>
}
fn find_root(node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
let mut ans = node.clone(); // Rc<RefCell<Node>>
loop {
ans = {
let ans_ref = ans.borrow(); // std::cell::Ref<Node>
match ans_ref.parent.clone() {
None => break,
Some(z) => z
}
} // ans_ref goes out of scope, and ans becomes mutable
}
ans
}
fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>) {
while node.borrow().parent.is_some() {
let next = {
let node_ref = node.borrow();
node_ref.parent.clone().unwrap()
};
node.borrow_mut().parent = Some(root.clone());
// RefMut<Node> from borrow_mut() is out of scope here...
node = next; // therefore we can mutate node
}
}
Note for beginners: Pointers are automatically dereferenced by dot operator. ans.borrow() actually means (*ans).borrow(). I intentionally used different styles for the two functions.
Although not the answer to your literal question (hand-over locking), union-find with weighted-union and path-compression can be very simple in Rust:
fn unionfind<I: Iterator<(uint, uint)>>(mut iterator: I, nodes: uint) -> Vec<uint>
{
let mut root = Vec::from_fn(nodes, |x| x);
let mut rank = Vec::from_elem(nodes, 0u8);
for (mut x, mut y) in iterator
{
// find roots for x and y; do path compression on look-ups
while (x != root[x]) { root[x] = root[root[x]]; x = root[x]; }
while (y != root[y]) { root[y] = root[root[y]]; y = root[y]; }
if x != y
{
// weighted union swings roots
match rank[x].cmp(&rank[y])
{
Less => root[x] = y,
Greater => root[y] = x,
Equal =>
{
root[y] = x;
rank[x] += 1
},
}
}
}
}
Maybe the meta-point is that the union-find algorithm may not be the best place to handle node ownership, and by using references to existing memory (in this case, by just using uint identifiers for the nodes) without affecting the lifecycle of the nodes makes for a much simpler implementation, if you can get away with it of course.

Class set method in Haskell using State-Monad

I've recently had a look at Haskell's Monad - State. I've been able to create functions that operate with this Monad, but I'm trying to encapsulate the behavior into a class, basically I'm trying to replicate in Haskell something like this:
class A {
public:
int x;
void setX(int newX) {
x = newX;
}
void getX() {
return x;
}
}
I would be very grateful if anyone can help with this. Thanks!
I would start off by noting that Haskell, to say the least, does not encourage traditional OO-style development; instead, it has features and characteristics that lend themselves well to the sort of 'pure functional' manipulation that you won't really find in many other languages; the short on this is that trying to 'bring over' concepts from other (traditional languages) can often be a very bad idea.
but I'm trying to encapsulate the behavior into a class
Hence, my first major question that comes to mind is why? Surely you must want to do something with this (traditional OO concept of a) class?
If an approximate answer to this question is: "I'd like to model some sort of data construct", then you'd be better off working with something like
data A = A { xval :: Int }
> let obj1 = A 5
> xval obj1
5
> let obj2 = obj1 { xval = 10 }
> xval obj2
10
Which demonstrates pure, immutable data structures, along with 'getter' functions and destructive updates (utilizing record syntax). This way, you'd do whatever work you need to do as some combination of functions mapping these 'data constructs' to new data constructs, as appropriate.
Now, if you absolutely needed some sort of model of State (and indeed, answering this question requires a bit of experience in knowing exactly what local versus global state is), only then would you delve into using the State Monad, with something like:
module StateGame where
import Control.Monad.State
-- Example use of State monad
-- Passes a string of dictionary {a,b,c}
-- Game is to produce a number from the string.
-- By default the game is off, a C toggles the
-- game on and off. A 'a' gives +1 and a b gives -1.
-- E.g
-- 'ab' = 0
-- 'ca' = 1
-- 'cabca' = 0
-- State = game is on or off & current score
-- = (Bool, Int)
type GameValue = Int
type GameState = (Bool, Int)
playGame :: String -> State GameState GameValue
playGame [] = do
(_, score) <- get
return score
playGame (x:xs) = do
(on, score) <- get
case x of
'a' | on -> put (on, score + 1)
'b' | on -> put (on, score - 1)
'c' -> put (not on, score)
_ -> put (on, score)
playGame xs
startState = (False, 0)
main = print $ evalState (playGame "abcaaacbbcabbab") startState
(shamelessly lifted from this tutorial). Note the use of the analogous 'pure immutable data structures' within the context of a state monad, in addition to 'put' and 'get' monadic functions, which facilitate access to the state contained within the State Monad.
Ultimately, I'd suggest you ask yourself: what is it that you really want to accomplish with this model of an (OO) class? Haskell is not your typical OO-language, and trying to map concepts over 1-to-1 will only frustrate you in the short (and possibly long) term. This should be a standard mantra, but I'd highly recommend learning from the book Real World Haskell, where the authors are able to delve into far more detailed 'motivation' for picking any one tool or abstraction over another. If you were adamant, you could model traditional OO constructs in Haskell, but I wouldn't suggest going about doing this - unless you have a really good reason for doing so.
It takes a bit of permuting to transform imperative code into a purely functional context.
A setter mutates an object. We're not allowed to do that directly in Haskell because of laziness and purity.
Perhaps, if we transcribe the State monad to another language it'll be more apparent. Your code is in C++, but because I at least want garbage collection I'll use java here.
Since Java never got around to defining anonymous functions, first, we'll define an interface for pure functions.
public interface Function<A,B> {
B apply(A a);
}
Then we can make up a pure immutable pair type.
public final class Pair<A,B> {
private final A a;
private final B b;
public Pair(A a, B b) {
this.a = a;
this.b = b;
}
public A getFst() { return a; }
public B getSnd() { return b; }
public static <A,B> Pair<A,B> make(A a, B b) {
return new Pair<A,B>(a, b);
}
public String toString() {
return "(" + a + ", " + b + ")";
}
}
With those in hand, we can actually define the State monad:
public abstract class State<S,A> implements Function<S, Pair<A, S> > {
// pure takes a value a and yields a state action, that takes a state s, leaves it untouched, and returns your a paired with it.
public static <S,A> State<S,A> pure(final A a) {
return new State<S,A>() {
public Pair<A,S> apply(S s) {
return new Pair<A,S>(a, s);
}
};
}
// we can also read the state as a state action.
public static <S> State<S,S> get() {
return new State<S,S>() {
public Pair<S,S> apply(S, s) {
return new Pair<S,S>(s, s);
}
}
}
// we can compose state actions
public <B> State<S,B> bind(final Function<A, State<S,B>> f) {
return new State<S,B>() {
public Pair<B,S> apply(S s0) {
Pair<A,S> p = State.this.apply(s0);
return f.apply(p.getFst()).apply(p.getSnd());
}
};
}
// we can also read the state as a state action.
public static <S> State<S,S> put(final S newS) {
return new State<S,S>() {
public Pair<S,S> apply(S, s) {
return new Pair<S,S>(s, newS);
}
}
}
}
Now, there does exist a notion of a getter and a setter inside of the state monad. These are called lenses. The basic presentation in Java would look like:
public abstract class Lens[A,B] {
public abstract B get(A a);
public abstract A set(B b, A a);
// .. followed by a whole bunch of concrete methods.
}
The idea is that a lens provides access to a getter that knows how to extract a B from an A, and a setter that knows how to take a B and some old A, and replace part of the A, yielding a new A. It can't mutate the old one, but it can construct a new one with one of the fields replaced.
I gave a talk on these at a recent Boston Area Scala Enthusiasts meeting. You can watch the presentation here.
To come back into Haskell, rather than talk about things in an imperative setting. We can import
import Data.Lens.Lazy
from comonad-transformers or one of the other lens libraries mentioned here. That link provides the laws that must be satisfied to be a valid lens.
And then what you are looking for is some data type like:
data A = A { x_ :: Int }
with a lens
x :: Lens A Int
x = lens x_ (\b a -> a { x_ = b })
Now you can write code that looks like
postIncrement :: State A Int
postIncrement = do
old_x <- access x
x ~= (old_x + 1)
return old_x
using the combinators from Data.Lens.Lazy.
The other lens libraries mentioned above provide similar combinators.
First of all, I agree with Raeez that this probably the wrong way to go, unless you really know why! If you want to increase some value by 42 (say), why not write a function that does that for you?
It's quite a change from the traditional OO mindset where you have boxes with values in them and you take them out, manipulate them and put them back in. I would say that until you start noticing the pattern "Hey! I always take some value as an argument, and at the end return it slightly modified, tupled with some other value and all the plumbing is getting messy!" you don't really need the State monad. Part of the fun of (learning) Haskell is finding new ways to get around the stateful OO thinking!
That said, if you absolutely want a box with an x of type Int in it, you could try making your own get and put variants, something like this:
import Control.Monad.State
data Point = Point { x :: Int, y :: Int } deriving Show
getX :: State Point Int
getX = get >>= return . x
putX :: Int -> State Point ()
putX newVal = do
pt <- get
put (pt { x = newVal })
increaseX :: State Point ()
increaseX = do
x <- getX
putX (x + 1)
test :: State Point Int
test = do
x1 <- getX
increaseX
x2 <- getX
putX 7
x3 <- getX
return $ x1 + x2 + x3
Now, if you evaluate runState test (Point 2 9) in ghci, you'll get back (12,Point {x = 7, y = 9}) (since 12 = 2 + (2+1) + 7 and the x in the state gets set to 7 at the end). If you don't care about the returned point, you can use evalState and you'll get just the 12.
There's also more advanced things to do here like abstracting out Point with a typeclass in case you have multiple datastructures which have something which behaves like x but that's better left for another question, in my opinion!

Resources