Cannot use Rayon's `.par_iter()` - parallel-processing

I have a struct which implements Iterator and it works fine as an iterator. It produces values, and using .map(), I download each item from a local HTTP server and save the results. I now want to parallelize this operation, and Rayon looks friendly.
I am getting a compiler error when trying to follow the example in the documentation.
This is the code that works sequentially. generate_values returns the struct which implements Iterator. dl downloads the values and saves them (i.e. it has side effects). Since iterators are lazy in Rust, I have put a .count() at the end so that it will actually run it.
generate_values(14).map(|x| { dl(x, &path, &upstream_url); }).count();
Following the Rayon example I tried this:
generate_values(14).par_iter().map(|x| { dl(x, &path, &upstream_url); }).count();
and got the following error:
src/main.rs:69:27: 69:37 error: no method named `par_iter` found for type `MyIterator` in the current scope
Interestingly, when I use .iter(), which many Rust things use, I get a similar error:
src/main.rs:69:27: 69:33 error: no method named `iter` found for type `MyIterator` in the current scope
src/main.rs:69 generate_values(14).iter().map(|tile| { dl_tile(tile, &tc_path, &upstream_url); }).count();
Since I implement Iterator, I should get .iter() for free right? Is this why .par_iter() doesn't work?
Rust 1.6 and Rayon 0.3.1
$ rustc --version
rustc 1.6.0 (c30b771ad 2016-01-19)

Rayon 0.3.1 defines par_iter as:
pub trait IntoParallelRefIterator<'data> {
type Iter: ParallelIterator<Item=&'data Self::Item>;
type Item: Sync + 'data;
fn par_iter(&'data self) -> Self::Iter;
}
There is only one type that implements this trait in Rayon itself: [T]:
impl<'data, T: Sync + 'data> IntoParallelRefIterator<'data> for [T] {
type Item = T;
type Iter = SliceIter<'data, T>;
fn par_iter(&'data self) -> Self::Iter {
self.into_par_iter()
}
}
That's why Lukas Kalbertodt's answer to collect to a Vec will work; Vec dereferences to a slice.
Generally, Rayon could not assume that any iterator would be amenable to parallelization, so it cannot default to including all Iterators.
Since you have defined generate_values, you could implement the appropriate Rayon trait for it as well:
IntoParallelIterator
IntoParallelRefIterator
IntoParallelRefMutIterator
That should allow you to avoid collecting into a temporary vector.

No, the Iterator trait has nothing to do with the iter() method. Yes, this is slightly confusing.
There are a few different concepts here. An Iterator is a type that can spit out values; it only needs to implement next() and has many other methods, but none of these is iter(). Then there is IntoIterator which says that a type can be transformed into an Iterator. This trait has the into_iter() method. Now the iter() method is not really related to any of those two traits. It's just a normal method of many types, that often works similar to into_iter().
Now to your Rayon problem: it looks like you can't just take any normal iterator and turn it into a parallel one. However, I never used this library, so takes this with a grain of salt. To me it looks like you need to collect your iterator into a Vec to be able to use par_iter().
And just as a note: when using normal iterators, you shouldn't use map() and count(), but rather use a standard for loop.

Related

What is the difference between method call syntax `foo.method()` and UFCS `Foo::method(&foo)`?

Is there any difference in Rust between calling a method on a value, like this:
struct A { e: u32 }
impl A {
fn show(&self) {
println!("{}", self.e)
}
}
fn main() {
A { e: 0 }.show();
}
...and calling it on the type, like this:
fn main() {
A::show(&A { e: 0 })
}
Summary: The most important difference is that the universal function call syntax (UFCS) is more explicit than the method call syntax.
With UFCS there is basically no ambiguity what function you want to call (there is still a longer form of the UFCS for trait methods, but let's ignore that for now). The method call syntax, on the other hand, requires more work in the compiler to figure out which method to call and how to call it. This manifests in mostly two things:
Method resolution: figure out if the method is inherent (bound to the type, not a trait) or a trait method. And in the latter case, also figure out which trait it belongs to.
Figure out the correct receiver type (self) and potentially use type coercions to make the call work.
Receiver type coercions
Let's take a look at this example to understand the type coercions to the receiver type:
struct Foo;
impl Foo {
fn on_ref(&self) {}
fn on_mut_ref(&mut self) {}
fn on_value(self) {}
}
fn main() {
let reference = &Foo; // type `&Foo`
let mut_ref = &mut Foo; // type `&mut Foo`
let mut value = Foo; // type `Foo`
// ...
}
So we have three methods that take Foo, &Foo and &mut Foo receiver and we have three variables with those types. Let's try out all 9 combinations with each, method call syntax and UFCS.
UFCS
Foo::on_ref(reference);
//Foo::on_mut_ref(reference); error: mismatched types
//Foo::on_value(reference); error: mismatched types
//Foo::on_ref(mut_ref); error: mismatched types
Foo::on_mut_ref(mut_ref);
//Foo::on_value(mut_ref); error: mismatched types
//Foo::on_ref(value); error: mismatched types
//Foo::on_mut_ref(value); error: mismatched types
Foo::on_value(value);
As we can see, only the calls succeed where the types are correct. To make the other calls work we would have to manually add & or &mut or * in front of the argument. That's the standard behavior for all function arguments.
Method call syntax
reference.on_ref();
//reference.on_mut_ref(); error: cannot borrow `*reference` as mutable
//reference.on_value(); error: cannot move out of `*reference`
mut_ref.on_ref();
mut_ref.on_mut_ref();
//mut_ref.on_value(); error: cannot move out of `*mut_ref`
value.on_ref();
value.on_mut_ref();
value.on_value();
Only three of the method calls lead to an error while the others succeed. Here, the compiler automatically inserts deref (dereferencing) or autoref (adding a reference) coercions to make the call work. Also note that the three errors are not "type mismatch" errors: the compiler already tried to adjust the type correctly, but this lead to other errors.
There are some additional coercions:
Unsize coercions, described by the Unsize trait. Allows you to call slice methods on arrays and to coerce types into trait objects of traits they implement.
Advanced deref coercions via the Deref trait. This allows you to call slice methods on Vec, for example.
Method resolution: figuring out what method to call
When writing lhs.method_name(), then the method method_name could be an inherent method of the type of lhs or it could belong to a trait that's in scope (imported). The compiler has to figure out which one to call and has a number of rules for this. When getting into the details, these rules are actually really complex and can lead to some surprising behavior. Luckily, most programmers will never have to deal with that and it "just works" most of the time.
To give a coarse overview how it works, the compiler tries the following things in order, using the first method that is found.
Is there an inherent method with the name method_name where the receiver type fits exactly (does not need coercions)?
Is there a trait method with the name method_name where the receiver type fits exactly (does not need coercions)?
Is there an inherent method with the name method_name? (type coercions will be performed)
Is there a trait method with the name method_name? (type coercions will be performed)
(Again, note that this is still a simplification. Different type of coercions are preferred over others, for example.)
This shows one rule that most programmers know: inherent methods have a higher priority than trait methods. But a bit unknown is the fact that whether or not the receiver type fits perfectly is a more important factor. There is a quiz that nicely demonstrates this: Rust Quiz #23. More details on the exact method resolution algorithm can be found in this StackOverflow answer.
This set of rules can actually make a bunch of changes to an API to be breaking changes. We currently have to deal with that in the attempt to add an IntoIterator impl for arrays.
Another – minor and probably very obvious – difference is that for the method call syntax, the type name does not have to be imported.
Apart from that it's worth pointing out what is not different about the two syntaxes:
Runtime behavior: no difference whatsoever.
Performance: the method call syntax is "converted" (desugared) into basically the UFCS pretty early inside the compiler, meaning that there aren't any performance differences either.

How to write fuctions that takes IntoIter more generally

I was reading an answer to stackoverflow question and tried to modify the function history to take IntoIter where item can be anything that can be transformed into reference and has some traits Debug in this case.
If I will remove V: ?Sized from the function definition rust compiler would complain that it doesn't know the size of str at compile time.
use std::fmt::Debug;
pub fn history<I: IntoIterator, V: ?Sized>(i: I) where I::Item: AsRef<V>, V: Debug {
for s in i {
println!("{:?}", s.as_ref());
}
}
fn main() {
history::<_, str>(&["st", "t", "u"]);
}
I don't understand why compiler shows error in the first place and not sure why the program is working properly if I kind of cheat with V: ?Sized.
I kind of cheat with V: ?Sized
It isn't cheating. All generic arguments are assumed to be Sized by default. This default is there because it's the most common case - without it, nearly every type parameter would have to be annotated with : Sized.
In your case, V is only ever accessed by reference, so it doesn't need to be Sized. Relaxing the Sized constraint makes your function as general as possible, allowing it to be used with the most possible types.
The type str is unsized, so this is not just about generalisation, you actually need to relax the default Sized constraint to be able to use your function with str.

Why is Vec::len a method instead of a public property?

I noticed that Rust's Vec::len method just accesses the vector's len property. Why isn't len just a public property, rather than wrapping a method around it?
I assume this is so that in case the implementation changes in the future, nothing will break because Vec::len can change the way it gets the length without any users of Vec knowing, but I don't know if there are any other reasons.
The second part of my question is about when I'm designing an API. If I am building my own API, and I have a struct with a len property, should I make len private and create a public len() method? Is it bad practice to make fields public in Rust? I wouldn't think so, but I don't notice this being done often in Rust. For example, I have the following struct:
pub struct Segment {
pub dol_offset: u64,
pub len: usize,
pub loading_address: u64,
pub seg_type: SegmentType,
pub seg_num: u64,
}
Should any of those fields be private and instead have a wrapper function like Vec does? If so, then why? Is there a good guideline to follow for this in Rust?
One reason is to provide the same interface for all containers that implement some idea of length. (Such as std::iter::ExactSizeIterator.)
In the case of Vec, len() is acting like a getter:
impl<T> Vec<T> {
pub fn len(&self) -> usize {
self.len
}
}
While this ensures consistency across the standard library, there is another reason underlying this design choice...
This getter protects from external modification of len. If the condition Vec::len <= Vec::buf::cap is not ever satisfied, Vec's methods may try to access memory illegally. For instance, the implementation of Vec::push:
pub fn push(&mut self, value: T) {
if self.len == self.buf.cap() {
self.buf.double();
}
unsafe {
let end = self.as_mut_ptr().offset(self.len as isize);
ptr::write(end, value);
self.len += 1;
}
}
will attempt to write to memory past the actual end of the memory owned by the container. Because of this critical requirement, modification to len is forbidden.
Philosophy
It's definitely good to use a getter like this in library code (crazy people out there might try to modify it!).
However, one should design their code in a manner that minimizes the requirement of getters/setters. A class should act on its own members as much as possible. These actions should be made available to the public through methods. And here I mean methods that do useful things -- not just a plain ol' getter/setter that returns/sets a variable. Setters in particular can be made redundant through the use of constructors or methods. Vec shows us some of these "setters":
push
insert
pop
reserve
...
Thus, Vec implements algorithms that provide access to the outside world. But it manages its innards by itself.
The Vec struct looks something like this[1]:
pub struct Vec<T> {
ptr: *mut T,
capacity: usize,
len: usize,
}
The idea is that ptr points at a block of allocated memory of size capacity. If the size of the Vec needs to be bigger than the capacity then new memory is allocated. The unused portion of the allocated memory is uninitialised and could contain arbitrary data.
When you call mutating methods on Vec like push or pop, they carefully manage the Vec's internal state, increase capacity when necessary, and ensure that items that are removed are properly dropped.
If len was a public field, any code with an owned Vec, or a mutable reference to one, could set len to any value. Set it higher than it should be and you'll be able to read from uninitialised memory, causing Undefined Behaviour. Set it lower and you'll be effectively removing elements without properly dropping them.
In some other programming languages (e.g. JavaScript) the API for arrays or vectors specifically lets you change the size by setting a length property. It's not unreasonable to think that a programmer who is used to that approach could do this accidentally in Rust.
Keeping all the fields private and using a getter method for len() allows Vec to protect the mutability of its internals, make strong memory guarantees and prevent users from accidentally doing bad things to themselves.
[1] In practice, there are abstraction layers built over this data structure, so it looks a little different.

What does the 'where' clause within a trait do?

If I have this code:
trait Trait {
fn f(&self) -> i32 where Self: Sized;
fn g(&self) -> i32;
}
fn object_safety_dynamic(x: &Trait) {
x.f(); // error
x.g(); // works
}
What does the where clause actually do?
Naively, I was thinking where Self: Sized; dictates something about the type implementing Trait, like 'if you implement Trait for type A your type A must be sized, i.e., it can be i32 but not [i32].
However, such a constraint would rather go as trait Trait: Sized (correct me if I am wrong)?
Now I noticed where Self: Sized; actually determines if I can call f or g from within object_safety_dynamic.
My questions:
What happens here behind the scenes?
What (in simple English) am I actually telling the compiler by where Self: Sized; that makes g() work but f() not?
In particular: Since &self is a reference anyway, what compiled difference exists between f and g for various (sized or unsized) types. Wouldn't it always boil down to something like _vtable_f_or_g(*self) -> i32, regardless of where or if the type is sized or not?
Why can I implement Trait for both u8 and [u8]. Shouldn't the compiler actually stop me from implementing f() for [u8], instead of throwing an error at the call site?
fn f(&self) -> i32 where Self: Sized;
This says that f is only defined for types that also implement Sized. Unsized types may still implement Trait, but f will not be available.
Inside object_safety_dynamic, calling x.f() is actually doing: (*x).f(). While x is sized because it's a pointer, *x might not be because it could be any implementation of Trait. But code inside the function has to work for any valid argument, so you are not allowed to call x.f() there.
What does the where clause actually do?
Naively, I was thinking where Self: Sized; dictates something about the type implementing Trait, like 'if you implement Trait for type A your type A must be sized, i.e., it can be i32 but not [i32].
However, such a constraint would rather go as trait Trait: Sized
This is correct.
However, in this case, the bound applies only to the function. where bounds on functions are only checked at the callsite.
What happens here behind the scenes?
There is a confusing bit about rust's syntax which is that Trait can refer to either
The trait Trait; or
The "trait object" Trait, which is actually a type, not an object.
Sized is a trait, and any type T that is Sized may have its size taken as a constant, by std::mem::size_of::<T>(). Such types that are not sized are str and [u8], whose contents do not have a fixed size.
The type Trait is also unsized. Intuitively, this is because Trait as a type consists of all values of types that implement the trait Trait, which may have varying size. This means you can never have a value of type Trait - you can only refer to one via a "fat pointer" such as &Trait or Box<Trait> and so on. These have the size of 2 pointers - one for a vtable, one for the data. It looks roughly like this:
struct &Trait {
pub data: *mut (),
pub vtable: *mut (),
}
There is automatically an impl of the form:
impl Trait /* the trait */ for Trait /* the type */ {
fn f(&self) -> i32 where Self: Sized { .. }
fn g(&self) -> i32 {
/* vtable magic: something like (self.vtable.g)(self.data) */
}
}
What (in simple English) am I actually telling the compiler by where Self: Sized; that makes g() work but f() not?
Note that since, as I mentioned, Trait is not Sized, the bound Self: Sized is not satisfied and so the function f cannot be called where Self == Trait.
In particular: Since &self is a reference anyway, what compiled difference exists between f and g for various (sized or unsized) types. Wouldn't it always boil down to something like _vtable_f_or_g(*self) -> i32, regardless of where or if the type is sized or not?
The type Trait is always unsized. It doesn't matter which type has been coerced to Trait. The way you call the function with a Sized variable is to use it directly:
fn generic<T: Trait + Sized>(x: &T) { // the `Sized` bound is implicit, added here for clarity
x.f(); // compiles just fine
x.g();
}
Why can I implement Trait for both u8 and [u8]. Shouldn't the compiler actually stop me from implementing f() for [u8], instead of throwing an error at the call site?
Because the trait is not bounded by Self: Sized - the function f is. So there is nothing stopping you from implementing the function - it's just that the bounds on the function can never be satisfied, so you can never call it.

Kotlin Instantiate Immutable List

I've started using Kotlin as a substitute for java and quite like it. However, I've been unable to find a solution to this without jumping back into java-land:
I have an Iterable<SomeObject> and need to convert it to a list so I can iterate through it more than once. This is an obvious application of an immutable list, as all I need to do is read it several times. How do I actually put that data in the list at the beginning though? (I know it's an interface, but I've been unable to find an implementation of it in documentation)
Possible (if unsatisfactory) solutions:
val valueList = arrayListOf(values)
// iterate through valuelist
or
fun copyIterableToList(values: Iterable<SomeObject>) : List<SomeObject> {
var outList = ArrayList<SomeObject>()
for (value in values) {
outList.add(value)
}
return outList
}
Unless I'm misunderstanding, these end up with MutableLists, which works but feels like a workaround. Is there a similar immutableListOf(Iterable<SomeObject>) method that will instantiate an immutable list object?
In Kotlin, List<T> is a read-only list interface, it has no functions for changing the content, unlike MutableList<T>.
In general, List<T> implementation may be a mutable list (e.g. ArrayList<T>), but if you pass it as a List<T>, no mutating functions will be exposed without casting. Such a list reference is called read-only, stating that the list is not meant to be changed. This is immutability through interfaces which was chosen as the approach to immutability for Kotlin stdlib.
Closer to the question, toList() extension function for Iterable<T> in stdlib will fit: it returns read-only List<T>.
Example:
val iterable: Iterable<Int> = listOf(1, 2, 3)
val list: List<Int> = iterable.toList()

Resources