Is there a way to detect an overflow in Rayon and force it to panic instead of having an infinite loop?
extern crate rayon;
use rayon::prelude::*;
fn main() {
    let sample: Vec<u32> = (0..50000000).collect();
    let sum: u32 = sample.par_iter().sum();
    println!("{}", sum);
}
Playground
You are looking for ParallelIterator::try_reduce. The documentation example does what you are looking for (and more):
use rayon::prelude::*;
// Compute the sum of squares, being careful about overflow.
fn sum_squares<I: IntoParallelIterator<Item = i32>>(iter: I) -> Option<i32> {
    iter.into_par_iter()
        .map(|i| i.checked_mul(i)) // square each item,
        .try_reduce(|| 0, i32::checked_add) // and add them up!
}
assert_eq!(sum_squares(0..5), Some(0 + 1 + 4 + 9 + 16));
// The sum might overflow
assert_eq!(sum_squares(0..10_000), None);
// Or the squares might overflow before it even reaches `try_reduce`
assert_eq!(sum_squares(1_000_000..1_000_001), None);
Specifically for your example:
extern crate rayon;
use rayon::prelude::*;
fn main() {
    let sample: Vec<u32> = (0..50000000).collect();
    let sum = sample
        .into_par_iter()
        .map(Some)
        .try_reduce(|| 0, |a, b| a.checked_add(b));
    println!("{:?}", sum);
}
The collect is an unneeded inefficiency, but I've left it in for now.
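For reference, here is a sketch of the same check without the intermediate Vec; Rayon can parallelize an integer range directly, so the collect can simply be dropped:

extern crate rayon;
use rayon::prelude::*;

fn main() {
    // Rayon parallelizes the integer range itself, so no Vec is allocated.
    let sum = (0..50_000_000u32)
        .into_par_iter()
        .map(Some)
        .try_reduce(|| 0, |a, b| a.checked_add(b));

    // Prints `None` because the sum overflows a u32.
    println!("{:?}", sum);
}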
I have looked at multiple answers online to the same question, but I cannot figure out why my program is so slow. I suspect it is the for loops, but I am not sure.
P.S. I am quite new to Rust and am not very proficient in it yet. Any tips or tricks, or any good coding practices that I am not using are more than welcome :)
math.rs
pub fn number_to_vector(number: i32) -> Vec<i32> {
    let mut numbers: Vec<i32> = Vec::new();
    for i in 1..number + 1 {
        numbers.push(i);
    }
    return numbers;
}
user_input.rs
use std::io;
pub fn get_user_input(prompt: &str) -> i32 {
    println!("{}", prompt);
    let mut user_input: String = String::new();
    io::stdin().read_line(&mut user_input).expect("Failed to read line");
    let number: i32 = user_input.trim().parse().expect("Please enter an integer!");
    return number;
}
main.rs
mod math;
mod user_input;
fn main() {
    let user_input: i32 = user_input::get_user_input("Enter a positive integer: ");
    let mut numbers: Vec<i32> = math::number_to_vector(user_input);
    numbers.remove(numbers.iter().position(|x| *x == 1).unwrap());
    let mut numbers_to_remove: Vec<i32> = Vec::new();
    let ceiling_root: i32 = (user_input as f64).sqrt().ceil() as i32;
    for i in 2..ceiling_root + 1 {
        for j in i..user_input + 1 {
            numbers_to_remove.push(i * j);
        }
    }
    numbers_to_remove.sort_unstable();
    numbers_to_remove.dedup();
    numbers_to_remove.retain(|x| *x <= user_input);
    for number in numbers_to_remove {
        if numbers.iter().any(|&i| i == number) {
            numbers.remove(numbers.iter().position(|x| *x == number).unwrap());
        }
    }
    println!("Prime numbers up to {}: {:?}", user_input, numbers);
}
There are two main problems in your code: the i * j loop has the wrong upper limit for j, and the composites-removal loop uses O(n) operations for each entry, making it quadratic overall.
The corrected code:
fn main() {
    let user_input: i32 = get_user_input("Enter a positive integer: ");
    let mut numbers: Vec<i32> = number_to_vector(user_input);
    numbers.remove(numbers.iter().position(|x| *x == 1).unwrap());
    let mut numbers_to_remove: Vec<i32> = Vec::new();
    let mut primes: Vec<i32> = Vec::new(); // new code
    let mut i = 0; // new code
    let ceiling_root: i32 = (user_input as f64).sqrt().ceil() as i32;
    for i in 2..ceiling_root + 1 {
        for j in i..(user_input / i) + 1 { // FIX #1: user_input/i
            numbers_to_remove.push(i * j);
        }
    }
    numbers_to_remove.sort_unstable();
    numbers_to_remove.dedup();
    //numbers_to_remove.retain(|x| *x <= user_input); // not needed now
    for n in numbers { // FIX #2: two linear enumerations, advancing in unison
        if i < numbers_to_remove.len() && n == numbers_to_remove[i] {
            i += 1; // n is the next composite: skip it
        } else {
            primes.push(n);
        }
    }
    println!("Last prime number up to {}: {:?}", user_input, primes.last());
    println!("Total prime numbers up to {}: {:?}", user_input,
             primes.iter().count());
}
Your i * j loop was actually O(N^1.5), whereas your numbers-removal loop was actually quadratic -- remove is O(n) because it needs to move back all the elements past the removed one, so there is no gap.
The mended code now runs at ~N^1.05 empirically in the 10^6...2*10^6 range, and is orders of magnitude faster in absolute terms as well.
Oh, and that's a sieve, but not the sieve of Eratosthenes. To qualify as such, the i's should range over primes only, not all numbers.
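For comparison, here is a minimal sketch of an actual sieve of Eratosthenes (the function name and layout are mine, not your code): composites are crossed off in a boolean table, and only values of i that are still unmarked, i.e. prime, do any marking work.

fn primes_up_to(n: usize) -> Vec<usize> {
    let mut is_composite = vec![false; n + 1];
    let mut primes = Vec::new();
    for i in 2..=n {
        if !is_composite[i] {
            // i is prime: record it and cross off its multiples starting at i*i.
            primes.push(i);
            let mut j = i * i;
            while j <= n {
                is_composite[j] = true;
                j += i;
            }
        }
    }
    primes
}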
As AKX commented, your function's big O is O(m * n); that's why it's slow.
For this kind of "expensive" calculation, you can use multithreading to make it run faster.
This part of the answer is not about the right algorithm to choose, but about code style (tips/tricks).
I think the idiomatic way to do this is with iterators (which are lazy); it makes the code more readable and simpler, and it runs about 2 times faster in this case.
fn primes_up_to() {
    let num = get_user_input("Enter a positive integer greater than 2: ");
    let primes = (2..=num).filter(is_prime).collect::<Vec<i32>>();
    println!("{:?}", primes);
}

fn is_prime(num: &i32) -> bool {
    let bound = (*num as f32).sqrt() as i32;
    *num == 2 || !(2..=bound).any(|n| num % n == 0)
}
Edit: This style also gives you the ability to easily switch to parallel iterators for "expensive" calculations with rayon (Link).
Edit 2: Algorithm fix. Before, this used a quadratic algorithm. Thanks to @WillNess.
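To illustrate the rayon switch mentioned in the first edit, here is a rough sketch (the function name is mine, and it assumes rayon is added as a dependency); only the iterator adaptor changes:

use rayon::prelude::*;

// Same trial-division check as above.
fn is_prime(num: &i32) -> bool {
    let bound = (*num as f32).sqrt() as i32;
    *num == 2 || !(2..=bound).any(|n| num % n == 0)
}

// `into_par_iter` spreads the filtering across threads; the rest is unchanged.
fn primes_up_to_par(num: i32) -> Vec<i32> {
    (2..num + 1).into_par_iter().filter(is_prime).collect()
}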
I have created the following simple Fibonacci implementations:
#![feature(test)]
extern crate test;
pub fn fibonacci_recursive(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2),
    }
}

pub fn fibonacci_imperative(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => {
            let mut penultimate;
            let mut last = 1;
            let mut fib = 0;
            for _ in 0..n {
                penultimate = last;
                last = fib;
                fib = penultimate + last;
            }
            fib
        }
    }
}
I created them to try out cargo bench, so I wrote the following benchmarks:
#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[bench]
    fn bench_fibonacci_recursive(b: &mut Bencher) {
        b.iter(|| {
            let n = test::black_box(20);
            fibonacci_recursive(n)
        });
    }

    #[bench]
    fn bench_fibonacci_imperative(b: &mut Bencher) {
        b.iter(|| {
            let n = test::black_box(20);
            fibonacci_imperative(n)
        });
    }
}
I know that a recursive implementation is generally slower than an imperative one, especially since Rust doesn't support tail-recursion optimization (which this implementation couldn't use anyway). But I was not expecting the following difference of nearly 2,000 times:
running 2 tests
test tests::bench_fibonacci_imperative ... bench: 15 ns/iter (+/- 3)
test tests::bench_fibonacci_recursive ... bench: 28,435 ns/iter (+/- 1,114)
I ran it both on Windows and Ubuntu with the newest Rust nightly compiler (rustc 1.25.0-nightly) and obtained similar results.
Is this speed difference normal? Did I write something "wrong"? Or are my benchmarks flawed?
As Shepmaster said, you should use accumulators to keep the previously calculated fib(n - 1) and fib(n - 2); otherwise you keep recalculating the same values:
pub fn fibonacci_recursive(n: u32) -> u32 {
    fn inner(n: u32, penultimate: u32, last: u32) -> u32 {
        match n {
            0 => penultimate,
            1 => last,
            _ => inner(n - 1, last, penultimate + last),
        }
    }
    inner(n, 0, 1)
}

fn main() {
    assert_eq!(fibonacci_recursive(0), 0);
    assert_eq!(fibonacci_recursive(1), 1);
    assert_eq!(fibonacci_recursive(2), 1);
    assert_eq!(fibonacci_recursive(20), 6765);
}
last is equivalent to fib(n - 1).
penultimate is equivalent to fib(n - 2).
The algorithmic complexity between the two implementations differs:
your iterative implementation uses an accumulator: O(N),
your recursive implementation doesn't: O(1.6^N).
Since 20 (N) << 12,089 (≈1.6^N for N = 20), it's pretty normal to have a large difference.
See this answer for an exact computation of the complexity in the naive implementation case.
Note: the method you use for the iterative case is called dynamic programming.
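To make the call-count argument concrete, here is a small sketch (not part of the benchmark above, and the function name is mine) that counts how many times the naive recursion is entered:

// Counts the calls made by the naive recursion; the total is 2 * fib(n + 1) - 1,
// which grows like 1.6^n and is what the benchmark is paying for.
fn fib_counting(n: u32, calls: &mut u64) -> u32 {
    *calls += 1;
    match n {
        0 => 0,
        1 => 1,
        _ => fib_counting(n - 1, calls) + fib_counting(n - 2, calls),
    }
}

fn main() {
    let mut calls = 0;
    let result = fib_counting(20, &mut calls);
    // For n = 20 this reports 6765 computed with 21891 calls.
    println!("fib(20) = {} after {} calls", result, calls);
}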
I need to implement a for loop that goes from one floating point number to another with the step as another floating point number.
I know how to implement that in C-like languages:
for (float i = -1.0; i < 1.0; i += 0.01) { /* ... */ }
I also know that in Rust I can specify the loop step using step_by, and that gives me what I want if I have the boundary values and step as integers:
#![feature(iterator_step_by)]
fn main() {
    for i in (0..30).step_by(3) {
        println!("Index {}", i);
    }
}
When I do that with floating point numbers, it results in a compilation error:
#![feature(iterator_step_by)]
fn main() {
    for i in (-1.0..1.0).step_by(0.01) {
        println!("Index {}", i);
    }
}
And here is the compilation output:
error[E0599]: no method named `step_by` found for type `std::ops::Range<{float}>` in the current scope
--> src/main.rs:4:26
|
4 | for i in (-1.0..1.0).step_by(0.01) {
| ^^^^^^^
|
= note: the method `step_by` exists but the following trait bounds were not satisfied:
`std::ops::Range<{float}> : std::iter::Iterator`
`&mut std::ops::Range<{float}> : std::iter::Iterator`
How can I implement this loop in Rust?
If you haven't yet, I invite you to read Goldberg's What Every Computer Scientist Should Know About Floating-Point Arithmetic.
The problem with floating points is that your code may be doing 200 or 201 iterations, depending on whether the last step of the loop ends up being i = 0.99 or i = 0.999999 (which is still < 1 even if really close).
To avoid this footgun, Rust does not allow iterating over a range of f32 or f64. Instead, it forces you to use integral steps:
for i in -100i8..100 {
    let i = f32::from(i) * 0.01;
    // ...
}
See also:
How do I convert between numeric types safely and idiomatically?
As a real iterator:
Playground
/// produces: [ linear_interpol(start, end, i/steps) | i <- 0..steps ]
/// (does NOT include "end")
///
/// linear_interpol(a, b, p) = (1 - p) * a + p * b
pub struct FloatIterator {
    current: u64,
    current_back: u64,
    steps: u64,
    start: f64,
    end: f64,
}

impl FloatIterator {
    pub fn new(start: f64, end: f64, steps: u64) -> Self {
        FloatIterator {
            current: 0,
            current_back: steps,
            steps: steps,
            start: start,
            end: end,
        }
    }

    /// calculates number of steps from (end - start) / step
    pub fn new_with_step(start: f64, end: f64, step: f64) -> Self {
        let steps = ((end - start) / step).abs().round() as u64;
        Self::new(start, end, steps)
    }

    pub fn length(&self) -> u64 {
        self.current_back - self.current
    }

    fn at(&self, pos: u64) -> f64 {
        let f_pos = pos as f64 / self.steps as f64;
        (1. - f_pos) * self.start + f_pos * self.end
    }

    /// panics (in debug) when len doesn't fit in usize
    fn usize_len(&self) -> usize {
        let l = self.length();
        debug_assert!(l <= ::std::usize::MAX as u64);
        l as usize
    }
}

impl Iterator for FloatIterator {
    type Item = f64;

    fn next(&mut self) -> Option<Self::Item> {
        if self.current >= self.current_back {
            return None;
        }
        let result = self.at(self.current);
        self.current += 1;
        Some(result)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let l = self.usize_len();
        (l, Some(l))
    }

    fn count(self) -> usize {
        self.usize_len()
    }
}

impl DoubleEndedIterator for FloatIterator {
    fn next_back(&mut self) -> Option<Self::Item> {
        if self.current >= self.current_back {
            return None;
        }
        self.current_back -= 1;
        let result = self.at(self.current_back);
        Some(result)
    }
}

impl ExactSizeIterator for FloatIterator {
    fn len(&self) -> usize {
        self.usize_len()
    }

    //fn is_empty(&self) -> bool {
    //    self.length() == 0u64
    //}
}

pub fn main() {
    println!(
        "count: {}",
        FloatIterator::new_with_step(-1.0, 1.0, 0.01).count()
    );
    for f in FloatIterator::new_with_step(-1.0, 1.0, 0.01) {
        println!("{}", f);
    }
}
This is basically doing the same as in the accepted answer, but you might prefer to write something like:
for i in (-100..100).map(|x| x as f64 * 0.01) {
    println!("Index {}", i);
}
Another answer using iterators, but in a slightly different way (playground):
extern crate num;
use num::{Float, FromPrimitive};
fn linspace<T>(start: T, stop: T, nstep: u32) -> Vec<T>
where
    T: Float + FromPrimitive,
{
    let delta: T = (stop - start) / T::from_u32(nstep - 1).expect("out of range");
    return (0..(nstep))
        .map(|i| start + T::from_u32(i).expect("out of range") * delta)
        .collect();
}

fn main() {
    for f in linspace(-1f32, 1f32, 3) {
        println!("{}", f);
    }
}
Under nightly, you can use the conservative impl trait feature to avoid the Vec allocation (playground):
#![feature(conservative_impl_trait)]
extern crate num;
use num::{Float, FromPrimitive};
fn linspace<T>(start: T, stop: T, nstep: u32) -> impl Iterator<Item = T>
where
    T: Float + FromPrimitive,
{
    let delta: T = (stop - start) / T::from_u32(nstep - 1).expect("out of range");
    return (0..(nstep))
        .map(move |i| start + T::from_u32(i).expect("out of range") * delta);
}

fn main() {
    for f in linspace(-1f32, 1f32, 3) {
        println!("{}", f);
    }
}
For the reasons mentioned by others, one shouldn't be looping using floats under most circumstances.
For those cases where it is appropriate, it can be done (although not as ergonomically, which is probably good design--Rust should make it more difficult to juggle running chainsaws).
Since Rust 1.34, std::iter::successors() enables looping directly with a floating point index:
use std::iter;
const START: f64 = -1.0;
const END: f64 = 1.0;
// Increment by 0.1 (instead of 0.01 per the question) for output brevity
const INCREMENT: f64 = 0.1;
fn main() {
    iter::successors(Some(START), |i| {
        let next = i + INCREMENT;
        (next < END).then_some(next)
    })
    .for_each(|i| println!("{i}"));
}
Note there are 21 lines of output, although only 20 were probably expected given the condition of i < 1.0 (as opposed to i <= 1.0) in the sample code of your question.
This is due to the precision and/or cumulative rounding errors present in the output, even though the source code specifies iterating from -1.0 to 1.0 in increments of exactly 0.1. (Feel free to switch the START value to 0.0 or 0.3 to see different series output, also with precision/cumulative rounding errors).
Playground example
I want a function to
allocate a basic variable-length "array" (in the generic sense of the word, not necessarily the Rust type) of floats on the heap
initialize it with values
implement Drop, so I don't have to worry about freeing memory
implement something for indexing or iterating
The obvious choice is Vec, but how does it compare to a boxed slice on the heap? Vec is more powerful, but I need the array for numerical math and, in my case, don't need stuff like push/pop. The idea is to have something with fewer features, but faster.
Below I have two versions of a "linspace" function (a la Matlab and numpy),
"linspace_vec" (see listing below) uses Vec
"linspace_boxed_slice" (see listing below) uses a boxed slice
Both are used like
let y = linspace_*(start, stop, len);
where y is a linearly spaced "array" (i.e. a Vec in (1) and a boxed slice in (2)) of length len.
For small "arrays" of length 1000, (1) is FASTER. For large arrays of length 4*10^6, (1) is SLOWER. Why is that? Am I doing something wrong in (2)?
When the argument len = 1000, benchmarking by just calling the function results in
(1) ... bench: 879 ns/iter (+/- 12)
(2) ... bench: 1,295 ns/iter (+/- 38)
When the argument len = 4000000, benchmarking results in
(1) ... bench: 5,802,836 ns/iter (+/- 90,209)
(2) ... bench: 4,767,234 ns/iter (+/- 121,596)
Listing of (1):
pub fn linspace_vec<'a, T: 'a>(start: T, stop: T, len: usize) -> Vec<T>
where
    T: Float,
{
    // get 0, 1 and the increment dx as T
    let (one, zero, dx) = get_values_as_type_t::<T>(start, stop, len);
    let mut v = vec![zero; len];
    let mut c = zero;
    let ptr: *mut T = v.as_mut_ptr();
    unsafe {
        for ii in 0..len {
            let x = ptr.offset(ii as isize);
            *x = start + c * dx;
            c = c + one;
        }
    }
    return v;
}
Listing of (2):
pub fn linspace_boxed_slice<'a, T: 'a>(start: T, stop: T, len: usize) -> Box<&'a mut [T]>
where
    T: Float,
{
    let (one, zero, dx) = get_values_as_type_t::<T>(start, stop, len);
    let size = len * mem::size_of::<T>();
    unsafe {
        let ptr = heap::allocate(size, align_of::<T>()) as *mut T;
        let mut c = zero;
        for ii in 0..len {
            let x = ptr.offset(ii as isize);
            *x = start + c * dx;
            c = c + one;
        }
        // IS THIS WHAT MAKES IT SLOW?:
        let sl = slice::from_raw_parts_mut(ptr, len);
        return Box::new(sl);
    }
}
In your second version, you use the type Box<&'a mut [T]>, which means there are two levels of indirection to reach a T, because both Box and & are pointers.
What you want instead is a Box<[T]>. I think the only sane way to construct such a value is from a Vec<T>, using the into_boxed_slice method. Note that the only benefit is that you lose the capacity field that a Vec would have. Unless you need to have a lot of these arrays in memory at the same time, the overhead is likely to be insignificant.
pub fn linspace_vec<'a, T: 'a>(start: T, stop: T, len: usize) -> Box<[T]>
where
    T: Float,
{
    // get 0, 1 and the increment dx as T
    let (one, zero, dx) = get_values_as_type_t::<T>(start, stop, len);
    let mut v = vec![zero; len].into_boxed_slice();
    let mut c = zero;
    let ptr: *mut T = v.as_mut_ptr();
    unsafe {
        for ii in 0..len {
            let x = ptr.offset(ii as isize);
            *x = start + c * dx;
            c = c + one;
        }
    }
    v
}
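To make the "you only lose the capacity field" point concrete, here is a small sketch using only std; the printed byte counts assume a typical 64-bit target:

use std::mem;

fn main() {
    // A Box<[f64]> is just a fat pointer (data pointer + length), while a
    // Vec<f64> also carries a capacity field, so it is one usize larger.
    println!("Box<[f64]>: {} bytes", mem::size_of::<Box<[f64]>>()); // 16 on 64-bit
    println!("Vec<f64>:   {} bytes", mem::size_of::<Vec<f64>>()); // 24 on 64-bit

    // into_boxed_slice shrinks to fit and drops the capacity field.
    let v: Vec<f64> = vec![0.0; 1000];
    let b: Box<[f64]> = v.into_boxed_slice();
    println!("len = {}", b.len());
}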
I have a Vec that is the allocation for a circular buffer. Let's assume the buffer is full, so there are no elements in the allocation that aren't in the circular buffer. I now want to turn that circular buffer into a Vec where the first element of the circular buffer is also the first element of the Vec. As an example I have this (allocating) function:
fn normalize(tail: usize, buf: Vec<usize>) -> Vec<usize> {
    let n = buf.len();
    buf[tail..n]
        .iter()
        .chain(buf[0..tail].iter())
        .cloned()
        .collect()
}
Playground
Obviously this can also be done without allocating anything, since we already have an allocation that is large enough, and we have a swap operation to swap arbitrary elements of the allocation.
fn normalize(tail: usize, mut buf: Vec<usize>) -> Vec<usize> {
    for _ in 0..tail {
        for i in 0..(buf.len() - 1) {
            buf.swap(i, i + 1);
        }
    }
    buf
}
Playground
Sadly this requires buf.len() * tail swap operations. I'm fairly sure it can be done in buf.len() + tail swap operations. For concrete values of tail and buf.len() I have been able to figure out solutions, but I'm not sure how to do it in the general case.
My recursive partial solution can be seen in action.
The simplest solution is to use 3 reversals; indeed, this is what is recommended in Algorithm to rotate an array in linear time.
// rotate to the left by "k".
fn rotate<T>(array: &mut [T], k: usize) {
if array.is_empty() { return; }
let k = k % array.len();
array[..k].reverse();
array[k..].reverse();
array.reverse();
}
While this is linear, this requires reading and writing each element at most twice (reversing a range with an odd number of elements does not require touching the middle element). On the other hand, the very predictable access pattern of the reversal plays nice with prefetching, YMMV.
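Wired into your normalize signature, this would look something like the following sketch (reusing the rotate above); rotating left by tail brings the element at index tail to the front:

// No allocation: the rotation happens in place inside the existing buffer.
fn normalize(tail: usize, mut buf: Vec<usize>) -> Vec<usize> {
    rotate(&mut buf, tail);
    buf
}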
This operation is typically called a "rotation" of the vector, e.g. the C++ standard library has std::rotate to do this. There are known algorithms for doing the operation, although you may have to be quite careful when porting if you're trying to do it generically/with non-Copy types, where swaps become key, as one can't generally just read something straight out of a vector.
That said, one is likely to be able to use unsafe code with std::ptr::read/std::ptr::write for this, since data is just being moved around, and hence there's no need to execute caller-defined code or very complicated concerns about exception safety.
A port of the C code in the link above (by #ker):
fn rotate(k: usize, a: &mut [i32]) {
    if k == 0 { return }
    let mut c = 0;
    let n = a.len();
    let mut v = 0;
    while c < n {
        let mut t = v;
        let mut tp = v + k;
        let tmp = a[v];
        c += 1;
        while tp != v {
            a[t] = a[tp];
            t = tp;
            tp += k;
            if tp >= n { tp -= n; }
            c += 1;
        }
        a[t] = tmp;
        v += 1;
    }
}
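As an aside, the standard library's slices now have rotate_left and rotate_right (stable since Rust 1.26), which perform exactly this in-place rotation, so a sketch of normalize can be as short as:

fn normalize(tail: usize, mut buf: Vec<usize>) -> Vec<usize> {
    // Moves the element at index `tail` to the front without allocating.
    buf.rotate_left(tail);
    buf
}

fn main() {
    // The buffer's logical start is at index 2; after normalizing it is at index 0.
    assert_eq!(normalize(2, vec![3, 4, 0, 1, 2]), vec![0, 1, 2, 3, 4]);
}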