Why is Vec::len a method instead of a public property?

I noticed that Rust's Vec::len method just accesses the vector's len property. Why isn't len just a public property, rather than wrapping a method around it?
I assume this is so that in case the implementation changes in the future, nothing will break because Vec::len can change the way it gets the length without any users of Vec knowing, but I don't know if there are any other reasons.
The second part of my question is about when I'm designing an API. If I am building my own API, and I have a struct with a len property, should I make len private and create a public len() method? Is it bad practice to make fields public in Rust? I wouldn't think so, but I don't notice this being done often in Rust. For example, I have the following struct:
pub struct Segment {
    pub dol_offset: u64,
    pub len: usize,
    pub loading_address: u64,
    pub seg_type: SegmentType,
    pub seg_num: u64,
}
Should any of those fields be private and instead have a wrapper function like Vec does? If so, then why? Is there a good guideline to follow for this in Rust?

One reason is to provide the same interface for all containers that implement some idea of length. (Such as std::iter::ExactSizeIterator.)
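As a quick sketch of that shared interface (the Countdown type below is made up purely for illustration), any iterator whose size_hint is exact can opt into the very same len() method by implementing ExactSizeIterator:

struct Countdown(usize);

impl Iterator for Countdown {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0)
        }
    }

    // The bounds are exact, which is what ExactSizeIterator requires.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.0, Some(self.0))
    }
}

impl ExactSizeIterator for Countdown {}

fn main() {
    let it = Countdown(3);
    // Same `len()` interface as Vec, slices, and other exact-size iterators.
    assert_eq!(it.len(), 3);
}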
In the case of Vec, len() is acting like a getter:
impl<T> Vec<T> {
    pub fn len(&self) -> usize {
        self.len
    }
}
While this ensures consistency across the standard library, there is another reason underlying this design choice...
Keeping len private and exposing it through a getter protects it from external modification. If the invariant Vec::len <= Vec::buf::cap is ever violated, Vec's methods may access memory out of bounds. For instance, the implementation of Vec::push:
pub fn push(&mut self, value: T) {
    if self.len == self.buf.cap() {
        self.buf.double();
    }
    unsafe {
        let end = self.as_mut_ptr().offset(self.len as isize);
        ptr::write(end, value);
        self.len += 1;
    }
}
will attempt to write to memory past the actual end of the memory owned by the container. Because this invariant is critical, external modification of len is forbidden.
Philosophy
It's definitely good to use a getter like this in library code (crazy people out there might try to modify it!).
However, one should design their code in a manner that minimizes the requirement of getters/setters. A class should act on its own members as much as possible. These actions should be made available to the public through methods. And here I mean methods that do useful things -- not just a plain ol' getter/setter that returns/sets a variable. Setters in particular can be made redundant through the use of constructors or methods. Vec shows us some of these "setters":
push
insert
pop
reserve
...
Thus, Vec implements algorithms that provide access to the outside world. But it manages its innards by itself.
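Applied to the Segment struct from the question, a minimal sketch might look like the following. It is trimmed to the fields that don't need extra type definitions, and extend_by is just a hypothetical example of a "useful" method; which fields actually deserve privacy depends on your invariants. Conventionally, a type with len() also offers is_empty().

pub struct Segment {
    pub dol_offset: u64,
    pub loading_address: u64,
    len: usize, // private: only Segment's own methods may change it
}

impl Segment {
    pub fn new(dol_offset: u64, loading_address: u64, len: usize) -> Self {
        Segment { dol_offset, loading_address, len }
    }

    /// Plain getter, like Vec::len.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Hypothetical "useful" method: mutation goes through the type itself,
    /// so it can uphold whatever invariants relate `len` to the other fields.
    pub fn extend_by(&mut self, extra: usize) {
        self.len += extra;
    }
}

fn main() {
    let mut seg = Segment::new(0x100, 0x8000_0000, 16);
    seg.extend_by(4);
    assert_eq!(seg.len(), 20);
    println!("segment at {:#x}, {} bytes", seg.loading_address, seg.len());
}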

The Vec struct looks something like this[1]:
pub struct Vec<T> {
    ptr: *mut T,
    capacity: usize,
    len: usize,
}
The idea is that ptr points at a block of allocated memory of size capacity. If the size of the Vec needs to be bigger than the capacity then new memory is allocated. The unused portion of the allocated memory is uninitialised and could contain arbitrary data.
When you call mutating methods on Vec like push or pop, they carefully manage the Vec's internal state, increase capacity when necessary, and ensure that items that are removed are properly dropped.
If len was a public field, any code with an owned Vec, or a mutable reference to one, could set len to any value. Set it higher than it should be and you'll be able to read from uninitialised memory, causing Undefined Behaviour. Set it lower and you'll be effectively removing elements without properly dropping them.
In some other programming languages (e.g. JavaScript) the API for arrays or vectors specifically lets you change the size by setting a length property. It's not unreasonable to think that a programmer who is used to that approach could do this accidentally in Rust.
Keeping all the fields private and using a getter method for len() allows Vec to protect the mutability of its internals, make strong memory guarantees and prevent users from accidentally doing bad things to themselves.
[1] In practice, there are abstraction layers built over this data structure, so it looks a little different.
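To make the contrast concrete, here is a tiny sketch of what the private field plus getter buys you in practice:

fn main() {
    let mut v = vec![1, 2, 3];
    assert_eq!(v.len(), 3);   // the getter
    // v.len = 10;            // does not compile: `len` is a private field of `Vec`
    v.truncate(1);            // mutation goes through methods that keep the invariants
    assert_eq!(v.len(), 1);
}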

Related

In rust, what's the idiomatic way of expressing a struct that can be ordered, but only in reference to a standard value?

I'm trying to implement a game similar to GeoGuessr, where players enter geographic coordinates according to a street-view image, and are ranked by their distance to the correct location of the image.
I need a data structure to represent the submission of a player, and I want it to implement PartialEq and PartialOrd so that it can be easily sorted within container structures. However, unlike ordinary PartialOrd structures that are comparable by themselves, my structure is only comparable in reference to the correct answer.
I would like the rankings to be accessible at any time, so I'd prefer a container that always maintains the order of its elements, to avoid repeated sorting costs; in my case I chose skiplist::ordered_skiplist::OrderedSkipList. That means methods like sort_by_key are unavailable to me, and I have to implement PartialOrd for my structure.
So I ended up keeping a reference to the correct answer as a field in my structure:
struct Submission<'a> {
    submitted: Location,
    correct: &'a Location,
}

impl Submission<'_> {
    fn distance(&self) -> f64 {
        self.submitted.distance(*self.correct)
    }
}

impl PartialEq for Submission<'_> {
    fn eq(&self, other: &Self) -> bool {
        let d1 = self.distance();
        let d2 = other.distance();
        d1.eq(&d2)
    }
}

impl PartialOrd for Submission<'_> {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        let d1 = self.distance();
        let d2 = other.distance();
        d1.partial_cmp(&d2)
    }
}
But this doesn't seem idiomatic to me, as it doesn't restrict comparison between Submissions with different correct references, which would be invalid. Also, maintaining the same correct reference in each Submission seems a redundant cost. Is there a more idiomatic way of defining the data structure for this scenario?
Edit:
I've considered comparing the correct references in partial_cmp and returning None for invalid comparisons, but that's also redundant for my case, as I can prevent this in my own code. I'm looking for a compile-time way of preventing invalid comparisons, rather than a runtime one.

Rust cannot infer an appropriate lifetime for autoref due to conflicting requirements [duplicate]

I have a value and I want to store that value and a reference to
something inside that value in my own type:
struct Thing {
    count: u32,
}

struct Combined<'a>(Thing, &'a u32);

fn make_combined<'a>() -> Combined<'a> {
    let thing = Thing { count: 42 };
    Combined(thing, &thing.count)
}
Sometimes, I have a value and I want to store that value and a reference to
that value in the same structure:
struct Combined<'a>(Thing, &'a Thing);

fn make_combined<'a>() -> Combined<'a> {
    let thing = Thing::new();
    Combined(thing, &thing)
}
Sometimes, I'm not even taking a reference of the value and I get the
same error:
struct Combined<'a>(Parent, Child<'a>);

fn make_combined<'a>() -> Combined<'a> {
    let parent = Parent::new();
    let child = parent.child();
    Combined(parent, child)
}
In each of these cases, I get an error that one of the values "does
not live long enough". What does this error mean?
Let's look at a simple implementation of this:
struct Parent {
    count: u32,
}

struct Child<'a> {
    parent: &'a Parent,
}

struct Combined<'a> {
    parent: Parent,
    child: Child<'a>,
}

impl<'a> Combined<'a> {
    fn new() -> Self {
        let parent = Parent { count: 42 };
        let child = Child { parent: &parent };
        Combined { parent, child }
    }
}

fn main() {}
fn main() {}
This will fail with the error:
error[E0515]: cannot return value referencing local variable `parent`
--> src/main.rs:19:9
|
17 | let child = Child { parent: &parent };
| ------- `parent` is borrowed here
18 |
19 | Combined { parent, child }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function
error[E0505]: cannot move out of `parent` because it is borrowed
--> src/main.rs:19:20
|
14 | impl<'a> Combined<'a> {
| -- lifetime `'a` defined here
...
17 | let child = Child { parent: &parent };
| ------- borrow of `parent` occurs here
18 |
19 | Combined { parent, child }
| -----------^^^^^^---------
| | |
| | move out of `parent` occurs here
| returning this value requires that `parent` is borrowed for `'a`
To completely understand this error, you have to think about how the
values are represented in memory and what happens when you move
those values. Let's annotate Combined::new with some hypothetical
memory addresses that show where values are located:
let parent = Parent { count: 42 };
// `parent` lives at address 0x1000 and takes up 4 bytes
// The value of `parent` is 42
let child = Child { parent: &parent };
// `child` lives at address 0x1010 and takes up 4 bytes
// The value of `child` is 0x1000
Combined { parent, child }
// The return value lives at address 0x2000 and takes up 8 bytes
// `parent` is moved to 0x2000
// `child` is ... ?
What should happen to child? If the value was just moved like parent
was, then it would refer to memory that no longer is guaranteed to
have a valid value in it. Any other piece of code is allowed to store
values at memory address 0x1000. Accessing that memory assuming it was
an integer could lead to crashes and/or security bugs, and is one of
the main categories of errors that Rust prevents.
This is exactly the problem that lifetimes prevent. A lifetime is a
bit of metadata that allows you and the compiler to know how long a
value will be valid at its current memory location. That's an
important distinction, as it's a common mistake Rust newcomers make.
Rust lifetimes are not the time period between when an object is
created and when it is destroyed!
As an analogy, think of it this way: During a person's life, they will
reside in many different locations, each with a distinct address. A
Rust lifetime is concerned with the address you currently reside at,
not about whenever you will die in the future (although dying also
changes your address). Every time you move it's relevant because your
address is no longer valid.
It's also important to note that lifetimes do not change your code; your
code controls the lifetimes, your lifetimes don't control the code. The
pithy saying is "lifetimes are descriptive, not prescriptive".
Let's annotate Combined::new with some line numbers which we will use
to highlight lifetimes:
{ // 0
let parent = Parent { count: 42 }; // 1
let child = Child { parent: &parent }; // 2
// 3
Combined { parent, child } // 4
} // 5
The concrete lifetime of parent is from 1 to 4, inclusive (which I'll
represent as [1,4]). The concrete lifetime of child is [2,4], and
the concrete lifetime of the return value is [4,5]. It's
possible to have concrete lifetimes that start at zero - that would
represent the lifetime of a parameter to a function or something that
existed outside of the block.
Note that the lifetime of child itself is [2,4], but that it refers
to a value with a lifetime of [1,4]. This is fine as long as the
referring value becomes invalid before the referred-to value does. The
problem occurs when we try to return child from the block. This would
"over-extend" the lifetime beyond its natural length.
This new knowledge should explain the first two examples. The third
one requires looking at the implementation of Parent::child. Chances
are, it will look something like this:
impl Parent {
fn child(&self) -> Child { /* ... */ }
}
This uses lifetime elision to avoid writing explicit generic
lifetime parameters. It is equivalent to:
impl Parent {
fn child<'a>(&'a self) -> Child<'a> { /* ... */ }
}
In both cases, the method says that a Child structure will be
returned that has been parameterized with the concrete lifetime of
self. Said another way, the Child instance contains a reference
to the Parent that created it, and thus cannot live longer than that
Parent instance.
This also lets us recognize that something is really wrong with our
creation function:
fn make_combined<'a>() -> Combined<'a> { /* ... */ }
Although you are more likely to see this written in a different form:
impl<'a> Combined<'a> {
fn new() -> Combined<'a> { /* ... */ }
}
In both cases, there is no lifetime parameter being provided via an
argument. This means that the lifetime that Combined will be
parameterized with isn't constrained by anything - it can be whatever
the caller wants it to be. This is nonsensical, because the caller
could specify the 'static lifetime and there's no way to meet that
condition.
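To make "it can be whatever the caller wants" concrete, here is a tiny sketch. The types are placeholders, and the body can only panic, precisely because no valid value exists; the point is that the signature still lets the caller demand 'static:

struct Parent;
struct Child<'a>(&'a Parent);
struct Combined<'a>(Parent, Child<'a>);

// The signature promises a Combined<'a> for *any* 'a the caller chooses,
// even though the function takes nothing that could live that long.
fn make_combined<'a>() -> Combined<'a> {
    unimplemented!()
}

fn main() {
    // Perfectly legal according to the signature: the caller asks for 'static.
    // (Running this panics at the unimplemented!(), since no real value can be produced.)
    let _forever: Combined<'static> = make_combined();
}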
How do I fix it?
The easiest and most recommended solution is to not attempt to put
these items in the same structure together. By doing this, your
structure nesting will mimic the lifetimes of your code. Place types
that own data into a structure together and then provide methods that
allow you to get references or objects containing references as needed.
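A minimal sketch of that restructuring, reusing the Parent/Child shapes from above: own only the data, and hand out the borrowing view on demand instead of storing it.

struct Parent {
    count: u32,
}

struct Child<'a> {
    parent: &'a Parent,
}

// Owns only data; stores no references.
struct Combined {
    parent: Parent,
}

impl Combined {
    fn new() -> Self {
        Combined { parent: Parent { count: 42 } }
    }

    // Produce the borrowing view whenever it's needed.
    fn child(&self) -> Child<'_> {
        Child { parent: &self.parent }
    }
}

fn main() {
    let combined = Combined::new();
    let child = combined.child();
    println!("{}", child.parent.count);
}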
There is a special case where the lifetime tracking is overzealous:
when you have something placed on the heap. This occurs when you use a
Box<T>, for example. In this case, the structure that is moved
contains a pointer into the heap. The pointed-at value will remain
stable, but the address of the pointer itself will move. In practice,
this doesn't matter, as you always follow the pointer.
Some crates provide ways of representing this case, but they
require that the base address never move. This rules out mutating
vectors, which may cause a reallocation and a move of the
heap-allocated values.
rental (no longer maintained or supported)
owning_ref (has multiple soundness issues)
ouroboros
self_cell
Examples of problems solved with Rental:
Is there an owned version of String::chars?
Returning a RWLockReadGuard independently from a method
How can I return an iterator over a locked struct member in Rust?
How to return a reference to a sub-value of a value that is under a mutex?
How do I store a result using Serde Zero-copy deserialization of a Futures-enabled Hyper Chunk?
How to store a reference without having to deal with lifetimes?
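As a rough sketch of what one of the crates listed above looks like in use, here is the Parent/Child case expressed with ouroboros. This is from memory of its documentation, so treat the exact attribute names and generated items (the 'this lifetime, the builder struct, the borrow_* accessors) as assumptions to verify against the crate docs:

use ouroboros::self_referencing;

struct Parent {
    count: u32,
}

struct Child<'a> {
    parent: &'a Parent,
}

#[self_referencing]
struct Combined {
    parent: Parent,
    #[borrows(parent)]
    #[covariant]
    child: Child<'this>,
}

fn main() {
    // The macro generates a builder that takes the owned value plus a
    // closure constructing the borrowing value from it.
    let combined = CombinedBuilder {
        parent: Parent { count: 42 },
        child_builder: |parent| Child { parent },
    }
    .build();

    // Access goes through generated methods rather than plain fields.
    println!("{}", combined.borrow_child().parent.count);
}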
In other cases, you may wish to move to some type of reference-counting, such as by using Rc or Arc.
More information
After moving parent into the struct, why is the compiler not able to get a new reference to parent and assign it to child in the struct?
While it is theoretically possible to do this, doing so would introduce a large amount of complexity and overhead. Every time that the object is moved, the compiler would need to insert code to "fix up" the reference. This would mean that copying a struct is no longer a very cheap operation that just moves some bits around. It could even mean that code like this is expensive, depending on how good a hypothetical optimizer would be:
let a = Object::new();
let b = a;
let c = b;
Instead of forcing this to happen for every move, the programmer gets to choose when this will happen by creating methods that will take the appropriate references only when you call them.
A type with a reference to itself
There's one specific case where you can create a type with a reference to itself. You need to use something like Option to make it in two steps though:
#[derive(Debug)]
struct WhatAboutThis<'a> {
    name: String,
    nickname: Option<&'a str>,
}

fn main() {
    let mut tricky = WhatAboutThis {
        name: "Annabelle".to_string(),
        nickname: None,
    };
    tricky.nickname = Some(&tricky.name[..4]);
    println!("{:?}", tricky);
}
This does work, in some sense, but the created value is highly restricted - it can never be moved. Notably, this means it cannot be returned from a function or passed by-value to anything. A constructor function shows the same problem with the lifetimes as above:
fn creator<'a>() -> WhatAboutThis<'a> { /* ... */ }
If you try to do this same code with a method, you'll need the alluring but ultimately useless &'a self. When that's involved, this code is even more restricted and you will get borrow-checker errors after the first method call:
#[derive(Debug)]
struct WhatAboutThis<'a> {
    name: String,
    nickname: Option<&'a str>,
}

impl<'a> WhatAboutThis<'a> {
    fn tie_the_knot(&'a mut self) {
        self.nickname = Some(&self.name[..4]);
    }
}

fn main() {
    let mut tricky = WhatAboutThis {
        name: "Annabelle".to_string(),
        nickname: None,
    };
    tricky.tie_the_knot();
    // cannot borrow `tricky` as immutable because it is also borrowed as mutable
    // println!("{:?}", tricky);
}
See also:
Cannot borrow as mutable more than once at a time in one code - but can in another very similar
What about Pin?
Pin, stabilized in Rust 1.33, has this in the module documentation:
A prime example of such a scenario would be building self-referential structs, since moving an object with pointers to itself will invalidate them, which could cause undefined behavior.
It's important to note that "self-referential" doesn't necessarily mean using a reference. Indeed, the example of a self-referential struct specifically says (emphasis mine):
We cannot inform the compiler about that with a normal reference,
since this pattern cannot be described with the usual borrowing rules.
Instead we use a raw pointer, though one which is known to not be null,
since we know it's pointing at the string.
The ability to use a raw pointer for this behavior has existed since Rust 1.0. Indeed, owning-ref and rental use raw pointers under the hood.
The only thing that Pin adds to the table is a common way to state that a given value is guaranteed to not move.
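For reference, a sketch of that pattern, loosely adapted from the example in the std::pin module documentation (the Unmovable name and fields follow that example):

use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr::NonNull;

// A value that stores a raw pointer into itself, so it must never move.
struct Unmovable {
    data: String,
    // Points at `data` once the value is pinned; a raw pointer, not a
    // reference, because the usual borrowing rules cannot describe this.
    slice: NonNull<String>,
    // Opts out of `Unpin`, so `Pin` actually prevents moves.
    _pin: PhantomPinned,
}

impl Unmovable {
    fn new(data: String) -> Pin<Box<Self>> {
        let res = Unmovable {
            data,
            // Dangling until we know the final address.
            slice: NonNull::dangling(),
            _pin: PhantomPinned,
        };
        let mut boxed = Box::pin(res);

        let slice = NonNull::from(&boxed.data);
        // Safe because modifying a field never moves the whole struct.
        unsafe {
            let mut_ref: Pin<&mut Self> = Pin::as_mut(&mut boxed);
            Pin::get_unchecked_mut(mut_ref).slice = slice;
        }
        boxed
    }
}

fn main() {
    let unmovable = Unmovable::new("hello".to_string());
    // The self-pointer stays valid even though the Pin<Box<_>> handle moves.
    assert_eq!(unmovable.slice, NonNull::from(&unmovable.data));
}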
See also:
How to use the Pin struct with self-referential structures?
A slightly different issue which causes very similar compiler messages is object lifetime dependency, rather than storing an explicit reference. An example of that is the ssh2 library. When developing something bigger than a test project, it is tempting to try to put the Session and Channel obtained from that session alongside each other into a struct, hiding the implementation details from the user. However, note that the Channel definition has the 'sess lifetime in its type annotation, while Session doesn't.
This causes similar compiler errors related to lifetimes.
One very simple way to solve it is to declare the Session outside, in the caller, and then annotate the reference within the struct with a lifetime, similar to the answer in this Rust User's Forum post talking about the same issue while encapsulating SFTP. This does not look elegant and may not always apply, because now you have two entities to deal with, rather than the one that you wanted!
It turns out that the rental crate or the owning_ref crate from the other answer are solutions to this issue too. Let's consider owning_ref, which has a special object for this exact purpose: OwningHandle. To keep the underlying object from moving, we allocate it on the heap using a Box, which gives us the following possible solution:
use ssh2::{Channel, Error, Session};
use std::net::TcpStream;
use owning_ref::OwningHandle;

struct DeviceSSHConnection {
    tcp: TcpStream,
    channel: OwningHandle<Box<Session>, Box<Channel<'static>>>,
}

impl DeviceSSHConnection {
    fn new(targ: &str, c_user: &str, c_pass: &str) -> Self {
        use std::net::TcpStream;

        let mut session = Session::new().unwrap();
        let mut tcp = TcpStream::connect(targ).unwrap();
        session.handshake(&tcp).unwrap();
        session.set_timeout(5000);
        session.userauth_password(c_user, c_pass).unwrap();

        let mut sess = Box::new(session);
        let mut oref = OwningHandle::new_with_fn(
            sess,
            unsafe { |x| Box::new((*x).channel_session().unwrap()) },
        );
        oref.shell().unwrap();

        let ret = DeviceSSHConnection {
            tcp: tcp,
            channel: oref,
        };
        ret
    }
}
The result of this code is that we cannot use the Session anymore, but it is stored alongside the Channel which we will be using. Because the OwningHandle object dereferences to Box, which dereferences to Channel, we name it as such when storing it in a struct. NOTE: This is just my understanding. I have a suspicion this may not be correct, since it appears to be quite close to the discussion of OwningHandle unsafety.
One curious detail here is that the Session logically has a similar relationship with TcpStream as Channel has to Session, yet its ownership is not taken and there are no type annotations around doing so. Instead, it is up to the user to take care of this, as the documentation of handshake method says:
This session does not take ownership of the socket provided, it is
recommended to ensure that the socket persists the lifetime of this
session to ensure that communication is correctly performed.
It is also highly recommended that the stream provided is not used
concurrently elsewhere for the duration of this session as it may
interfere with the protocol.
So with the TcpStream, it is completely up to the programmer to ensure the correctness of the code. With OwningHandle, attention is drawn to where the "dangerous magic" happens by the unsafe {} block.
A further and a more high-level discussion of this issue is in this Rust User's Forum thread - which includes a different example and its solution using the rental crate, which does not contain unsafe blocks.
I've found the Arc (read-only) and Arc<Mutex> (read-write with locking) patterns to sometimes be quite useful tradeoffs between performance and code complexity (the complexity mostly caused by lifetime annotations).
Arc:
use std::sync::Arc;

struct Parent {
    child: Arc<Child>,
}

struct Child {
    value: u32,
}

struct Combined(Parent, Arc<Child>);

fn main() {
    let parent = Parent { child: Arc::new(Child { value: 42 }) };
    let child = parent.child.clone();
    let combined = Combined(parent, child.clone());

    assert_eq!(combined.0.child.value, 42);
    assert_eq!(child.value, 42);
    // combined.0.child.value = 50; // fails, Arc is not DerefMut
}
Arc + Mutex:
use std::sync::{Arc, Mutex};

struct Child {
    value: u32,
}

struct Parent {
    child: Arc<Mutex<Child>>,
}

struct Combined(Parent, Arc<Mutex<Child>>);

fn main() {
    let parent = Parent { child: Arc::new(Mutex::new(Child { value: 42 })) };
    let child = parent.child.clone();
    let combined = Combined(parent, child.clone());

    assert_eq!(combined.0.child.lock().unwrap().value, 42);
    assert_eq!(child.lock().unwrap().value, 42);

    child.lock().unwrap().value = 50;
    assert_eq!(combined.0.child.lock().unwrap().value, 50);
}
See also RwLock (When or why should I use a Mutex over an RwLock?)
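For completeness, a minimal sketch of the same shape with std::sync::RwLock, which allows many concurrent readers while writers get exclusive access:

use std::sync::{Arc, RwLock};

struct Child {
    value: u32,
}

struct Parent {
    child: Arc<RwLock<Child>>,
}

fn main() {
    let parent = Parent { child: Arc::new(RwLock::new(Child { value: 42 })) };
    let child = parent.child.clone();

    // Many readers may hold the lock at the same time.
    assert_eq!(child.read().unwrap().value, 42);

    // A writer takes exclusive access.
    child.write().unwrap().value = 50;
    assert_eq!(parent.child.read().unwrap().value, 50);
}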
As a newcomer to Rust, I had a case similar to your last example:
struct Combined<'a>(Parent, Child<'a>);

fn make_combined<'a>() -> Combined<'a> {
    let parent = Parent::new();
    let child = parent.child();
    Combined(parent, child)
}
In the end, I solved it by using this pattern:
fn make_parent_and_child<'a>(anchor: &'a mut DataAnchorFor1<Parent>) -> Child<'a> {
    // construct parent, then store it in the anchor object the caller gave us a mut-ref to
    *anchor = DataAnchorFor1::holding(Parent::new());

    // now retrieve the parent from the storage slot we assigned to in the previous line
    let parent = anchor.val1.as_mut().unwrap();

    // now proceed with regular code, except returning only the child
    // (the parent can already be accessed by the caller through the anchor object)
    let child = parent.child();
    child
}

// this is a generic struct that we can define once, and use whenever we need this pattern
// (it can also be extended to have multiple slots, naturally)
struct DataAnchorFor1<T> {
    val1: Option<T>,
}

impl<T> DataAnchorFor1<T> {
    fn empty() -> Self {
        Self { val1: None }
    }
    fn holding(val1: T) -> Self {
        Self { val1: Some(val1) }
    }
}

// for my case, this was all I needed
fn main_simple() {
    let mut anchor = DataAnchorFor1::empty();
    let child = make_parent_and_child(&mut anchor);
    let child_processing_result = do_some_processing(child);
    println!("ChildProcessingResult:{}", child_processing_result);
}

// but if access to parent-data later on is required, you can use this
fn main_complex() {
    let mut anchor = DataAnchorFor1::empty();

    // if you want to use the parent object (which is stored in anchor), you must...
    // ...wrap the child-related processing in a new scope, so the mut-ref to anchor...
    // ...gets dropped at its end, letting us access anchor.val1 (the parent) directly
    let child_processing_result = {
        let child = make_parent_and_child(&mut anchor);
        // do the processing you want with the child here (avoiding a ref-chain...
        // ...back to the anchor data, if you need to access parent data afterward)
        do_some_processing(child)
    };

    // now that scope has ended, we can access the parent data directly,
    // so print out the relevant data for both parent and child (adjust to your case)
    let parent = anchor.val1.unwrap();
    println!("Parent:{} ChildProcessingResult:{}", parent, child_processing_result);
}
This is far from a universal solution! But it worked in my case, and only required usage of the main_simple pattern above (not the main_complex variant), because in my case the "parent" object was just something temporary (a database "Client" object) that I had to construct to pass to the "child" object (a database "Transaction" object) so I could run some database commands.
Anyway, it accomplished the encapsulation/simplification-of-boilerplate that I needed (since I had many functions that needed creation of a Transaction/"child" object, and now all they need is that generic anchor-object creation line), while avoiding the need for using a whole new library.
These are the libraries I'm aware of that may be relevant:
owning-ref
rental
ouroboros
reffers
self_cell
escher
rust-viewbox
However, I scanned through them, and they all seem to have issues of one kind or another (not being updated in years, having multiple unsoundness issues/concerns raised, etc.), so I was hesitant to use them.
So while this isn't as generic of a solution, I figured I would mention it for people with similar use-cases:
Where the caller only needs the "child" object returned.
But the called-function needs to construct a "parent" object to perform its functions.
And the borrowing rules require that the "parent" object be stored somewhere that persists beyond the make_parent_and_child function. (In my case, this was a start_transaction function.)

How to write functions that take IntoIterator more generally

I was reading an answer to a Stack Overflow question and tried to modify the function history to take an IntoIterator whose items can be anything that can be converted into a reference to some type with certain traits (Debug in this case).
If I remove V: ?Sized from the function definition, the Rust compiler complains that it doesn't know the size of str at compile time.
use std::fmt::Debug;

pub fn history<I: IntoIterator, V: ?Sized>(i: I)
where
    I::Item: AsRef<V>,
    V: Debug,
{
    for s in i {
        println!("{:?}", s.as_ref());
    }
}

fn main() {
    history::<_, str>(&["st", "t", "u"]);
}
I don't understand why the compiler shows the error in the first place, and I'm not sure why the program works properly if I kind of cheat with V: ?Sized.
I kind of cheat with V: ?Sized
It isn't cheating. All generic arguments are assumed to be Sized by default. This default is there because it's the most common case - without it, nearly every type parameter would have to be annotated with : Sized.
In your case, V is only ever accessed by reference, so it doesn't need to be Sized. Relaxing the Sized constraint makes your function as general as possible, allowing it to be used with the most possible types.
The type str is unsized, so this is not just about generalisation, you actually need to relax the default Sized constraint to be able to use your function with str.
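To see the extra generality the relaxed bound buys, here is the same function (repeated from the question) called with a few different item and target types; the only requirement is that each item type implements AsRef<V> for the chosen V:

use std::fmt::Debug;
use std::path::Path;

pub fn history<I: IntoIterator, V: ?Sized>(i: I)
where
    I::Item: AsRef<V>,
    V: Debug,
{
    for s in i {
        println!("{:?}", s.as_ref());
    }
}

fn main() {
    // &str items viewed as the unsized str
    history::<_, str>(&["st", "t", "u"]);
    // owned Strings also work, because String: AsRef<str>
    history::<_, str>(vec![String::from("a"), String::from("b")]);
    // the same function can view string slices as paths, since str: AsRef<Path>
    history::<_, Path>(&["/tmp", "/usr"]);
}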

Cannot use Rayon's `.par_iter()`

I have a struct which implements Iterator and it works fine as an iterator. It produces values, and using .map(), I download each item from a local HTTP server and save the results. I now want to parallelize this operation, and Rayon looks friendly.
I am getting a compiler error when trying to follow the example in the documentation.
This is the code that works sequentially. generate_values returns the struct which implements Iterator. dl downloads the values and saves them (i.e. it has side effects). Since iterators are lazy in Rust, I have put a .count() at the end so that it will actually run it.
generate_values(14).map(|x| { dl(x, &path, &upstream_url); }).count();
Following the Rayon example I tried this:
generate_values(14).par_iter().map(|x| { dl(x, &path, &upstream_url); }).count();
and got the following error:
src/main.rs:69:27: 69:37 error: no method named `par_iter` found for type `MyIterator` in the current scope
Interestingly, when I use .iter(), which many Rust things use, I get a similar error:
src/main.rs:69:27: 69:33 error: no method named `iter` found for type `MyIterator` in the current scope
src/main.rs:69 generate_values(14).iter().map(|tile| { dl_tile(tile, &tc_path, &upstream_url); }).count();
Since I implement Iterator, I should get .iter() for free right? Is this why .par_iter() doesn't work?
Rust 1.6 and Rayon 0.3.1
$ rustc --version
rustc 1.6.0 (c30b771ad 2016-01-19)
Rayon 0.3.1 defines par_iter as:
pub trait IntoParallelRefIterator<'data> {
    type Iter: ParallelIterator<Item = &'data Self::Item>;
    type Item: Sync + 'data;

    fn par_iter(&'data self) -> Self::Iter;
}
There is only one type that implements this trait in Rayon itself: [T]:
impl<'data, T: Sync + 'data> IntoParallelRefIterator<'data> for [T] {
    type Item = T;
    type Iter = SliceIter<'data, T>;

    fn par_iter(&'data self) -> Self::Iter {
        self.into_par_iter()
    }
}
That's why Lukas Kalbertodt's answer to collect to a Vec will work; Vec dereferences to a slice.
Generally, Rayon cannot assume that an arbitrary iterator is amenable to parallelization, so it does not provide a blanket implementation for every Iterator.
Since you have defined generate_values, you could implement the appropriate Rayon trait for it as well:
IntoParallelIterator
IntoParallelRefIterator
IntoParallelRefMutIterator
That should allow you to avoid collecting into a temporary vector.
No, the Iterator trait has nothing to do with the iter() method. Yes, this is slightly confusing.
There are a few different concepts here. An Iterator is a type that can spit out values; it only needs to implement next() and has many other methods, but none of these is iter(). Then there is IntoIterator which says that a type can be transformed into an Iterator. This trait has the into_iter() method. Now the iter() method is not really related to any of those two traits. It's just a normal method of many types, that often works similar to into_iter().
Now to your Rayon problem: it looks like you can't just take any normal iterator and turn it into a parallel one. However, I have never used this library, so take this with a grain of salt. To me it looks like you need to collect your iterator into a Vec to be able to use par_iter().
And just as a note: when you use a normal iterator only for its side effects, you shouldn't combine map() and count(), but rather use a standard for loop.
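A minimal sketch of the collect-first approach is below. generate_values and dl are stand-ins for the functions from the question, and this assumes a Rayon version that ships the prelude; newer Rayon also offers par_bridge() to parallelize an arbitrary Send iterator directly, but that did not exist in 0.3.1.

use rayon::prelude::*;

// Stand-ins for the question's functions.
fn generate_values(n: u32) -> impl Iterator<Item = u32> {
    0..n
}

fn dl(x: &u32, path: &str, upstream_url: &str) {
    println!("downloading {} from {} into {}", x, upstream_url, path);
}

fn main() {
    let path = String::from("/tmp/out");
    let upstream_url = String::from("http://localhost:8000");

    // Collect the custom iterator into a Vec, which does support par_iter(),
    // then use for_each for the side effect instead of map().count().
    let values: Vec<u32> = generate_values(14).collect();
    values
        .par_iter()
        .for_each(|x| dl(x, &path, &upstream_url));
}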

Code duplication for functions that take shared_ptr and unique_ptr

Problem:
Let's assume I have an algorithm that takes a unique_ptr to some type:
void FancyAlgo(unique_ptr<SomeType>& ptr);
Now I have shared_ptr sPtr to SomeType, and I need to apply the same algorithm on sPtr. Does this mean I have to duplicate the algorithm just for the shared_ptr?
void FancyAlgo(shared_ptr<SomeType>& sPtr);
I know smart pointers come with ownership of the underlying managed object on the heap. Here in my FancyAlgo, ownership is usually not an issue. I thought about stripping off the smart pointer layer and do something like:
void FancyAlgo(SomeType& value);
and when I need to call it with unique_ptr:
FancyAlgo(*ptr);
likewise for shared_ptr.
1. Is this an acceptable style in PRODUCTION code? (I saw somewhere that, in the context of smart pointers, you should NOT manipulate raw pointers in a similar way; it has the danger of introducing mysterious bugs.)
2. Can you suggest a better way (without code duplication) if 1 is not a good idea?
Thanks.
Smart pointers are about ownership. Asking for a smart pointer is asking for ownership information or control.
Asking for a non-const lvalue reference to a smart pointer is asking for permission to change the ownership status of that value.
Asking for a const lvalue reference to a smart pointer is asking for permission to query the ownership status of that value.
Asking for an rvalue reference to a smart pointer is being a "sink", and promising to take that ownership away from the caller.
Asking for a const rvalue reference is a bad idea.
If you are accessing the pointed to value, and you want it to be non-nullable, a reference to the underlying type is good.
If you want it to be nullable, a boost::optional<T&> or a T* are acceptable, as is the std::experimental "world's dumbest smart pointer" (or an equivalent hand-written one). All of these are non-owning nullable references to some variable.
In an interface, don't ask for things you don't need and won't need in the future. That makes reasoning about what the function does harder, and leads to problems like you have in the OP. A function that reseats a reference is a very different function from one that reads a value.
Now, a more interesting question based off yours is one where you want the function to reseat the smart pointer, but you want to be able to do it to both shared and unique pointer inputs. This is sort of a strange case, but I could imagine writing a type-erase-down-to-emplace type (an emplace_sink<T>).
#include <functional>
#include <memory>
#include <new>
#include <utility>

template<class T>
using later_ctor = std::function<T*(void*)>;

template<class T, class... Args>
later_ctor<T> delayed_emplace(Args&&... args) {
    // captures the arguments by reference; the returned callable must be
    // invoked before they go out of scope
    return [&](void* ptr) -> T* {
        // placement-new into the storage the target provides
        return new (ptr) T(std::forward<Args>(args)...);
    };
}

namespace details {
    template<class T>
    struct emplace_target {
        virtual ~emplace_target() {}
        virtual T* emplace(later_ctor<T> ctor) = 0;
    };
}

template<class T>
struct emplacer {
    std::unique_ptr<details::emplace_target<T>> pImpl;

    template<class... Args>
    T* emplace(Args&&... args) {
        return pImpl->emplace(delayed_emplace<T>(std::forward<Args>(args)...));
    }

    // shared_ptr type-erases its deleter, so no deleter parameter here
    emplacer(std::shared_ptr<T>& target)
        : pImpl(new details::emplace_shared_ptr<T>(&target)) // TODO
    {}

    template<class D>
    emplacer(std::unique_ptr<T, D>& target)
        : pImpl(new details::emplace_unique_ptr<T, D>(&target)) // TODO
    {}
};
etc. Lots of polish needed. The idea is to type-erase construction of an object T into an arbitrary context. We might need to special case shared_ptr so we can call make_shared, as a void*->T* delayed ctor is not good enough to pull that off (not fundamentally, but because of lack of API hooks).
Aha! I can get a make_shared-style shared_ptr without special-casing it much.
We allocate a block of memory (char[sizeof(T)]) with a deleter that converts the buffer back to a T and destroys it, construct the T in place in that buffer (getting the T*), then convert to a shared_ptr<T> via the aliasing shared_ptr<T>(shared_ptr<char[sizeof(T)]>, T*) constructor. With careful exception handling this should be safe, and we can emplace into a make_shared-style combined buffer using our emplacement function.
