My prime number sieve is extremely slow even with --release - performance

I have looked at multiple answers online to the same question but I cannot figure out why my program is so slow. I think it is the for loops but I am unsure.
P.S. I am quite new to Rust and am not very proficient in it yet. Any tips or tricks, or any good coding practices that I am not using are more than welcome :)
math.rs
pub fn number_to_vector(number: i32) -> Vec<i32> {
    let mut numbers: Vec<i32> = Vec::new();
    for i in 1..number + 1 {
        numbers.push(i);
    }
    return numbers;
}
user_input.rs
use std::io;

pub fn get_user_input(prompt: &str) -> i32 {
    println!("{}", prompt);
    let mut user_input: String = String::new();
    io::stdin().read_line(&mut user_input).expect("Failed to read line");
    let number: i32 = user_input.trim().parse().expect("Please enter an integer!");
    return number;
}
main.rs
mod math;
mod user_input;

fn main() {
    let user_input: i32 = user_input::get_user_input("Enter a positive integer: ");
    let mut numbers: Vec<i32> = math::number_to_vector(user_input);
    numbers.remove(numbers.iter().position(|x| *x == 1).unwrap());
    let mut numbers_to_remove: Vec<i32> = Vec::new();
    let ceiling_root: i32 = (user_input as f64).sqrt().ceil() as i32;
    for i in 2..ceiling_root + 1 {
        for j in i..user_input + 1 {
            numbers_to_remove.push(i * j);
        }
    }
    numbers_to_remove.sort_unstable();
    numbers_to_remove.dedup();
    numbers_to_remove.retain(|x| *x <= user_input);
    for number in numbers_to_remove {
        if numbers.iter().any(|&i| i == number) {
            numbers.remove(numbers.iter().position(|x| *x == number).unwrap());
        }
    }
    println!("Prime numbers up to {}: {:?}", user_input, numbers);
}

There are two main problems in your code: the i * j loop has the wrong upper limit for j, and the composites-removal loop uses an O(n) operation for each entry, making it quadratic overall.
The corrected code:
fn main() {
    let user_input: i32 = get_user_input("Enter a positive integer: ");
    let mut numbers: Vec<i32> = number_to_vector(user_input);
    numbers.remove(numbers.iter().position(|x| *x == 1).unwrap());
    let mut numbers_to_remove: Vec<i32> = Vec::new();
    let mut primes: Vec<i32> = Vec::new(); // new code
    let mut i = 0;                         // new code
    let ceiling_root: i32 = (user_input as f64).sqrt().ceil() as i32;
    for i in 2..ceiling_root + 1 {
        for j in i..(user_input / i) + 1 { // FIX #1: user_input/i
            numbers_to_remove.push(i * j);
        }
    }
    numbers_to_remove.sort_unstable();
    numbers_to_remove.dedup();
    // numbers_to_remove.retain(|x| *x <= user_input); // not needed now
    for n in numbers {                     // FIX #2: two linear enumerations,
        // advanced in unison; the length check guards against running past
        // the last composite when the remaining numbers are all prime
        if i >= numbers_to_remove.len() || n < numbers_to_remove[i] {
            primes.push(n);
        } else {
            i += 1;
        }
    }
    println!("Last prime number up to {}: {:?}", user_input, primes.last());
    println!("Total prime numbers up to {}: {:?}", user_input,
             primes.iter().count());
}
Your i * j loop was actually O(N^1.5), whereas your numbers-removal loop was actually quadratic -- remove is O(n) because it has to shift back every element past the removed one, so that no gap is left.
The mended code now runs at ~N^1.05 empirically in the 10^6...2*10^6 range, and is orders of magnitude faster in absolute terms as well.
Oh, and that's a sieve, but not of Eratosthenes. To qualify as such, the i should range over primes, not just all numbers.
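For reference, here is a minimal boolean-flag Sieve of Eratosthenes sketch (my addition, not the answer's code): only still-unmarked indices, i.e. primes, trigger any marking work.

fn primes_up_to(n: usize) -> Vec<usize> {
    let mut is_prime = vec![true; n + 1];
    let mut i = 2;
    while i * i <= n {
        if is_prime[i] {
            // start at i*i: smaller multiples were already marked by smaller primes
            let mut j = i * i;
            while j <= n {
                is_prime[j] = false;
                j += i;
            }
        }
        i += 1;
    }
    (2..=n).filter(|&k| is_prime[k]).collect()
}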

As AKX commented, your function's big O is O(m * n); that's why it's slow.
To make this kind of "expensive" calculation run faster, you can use multithreading.
This part of the answer is not about choosing the right algorithm, but about code style (tips/tricks).
I think the idiomatic way to do this is with iterators (which are lazy); it makes the code more readable and simpler, and it runs about 2 times faster in this case.
fn primes_up_to() {
    let num = get_user_input("Enter a positive integer greater than 2: ");
    let primes = (2..=num).filter(is_prime).collect::<Vec<i32>>();
    println!("{:?}", primes);
}

fn is_prime(num: &i32) -> bool {
    let bound = (*num as f32).sqrt() as i32;
    *num == 2 || !(2..=bound).any(|n| num % n == 0)
}
Edit: This style also lets you easily switch to parallel iterators for "expensive" calculations with rayon (Link); see the sketch after these notes.
Edit 2: Algorithm fix. Before, this used a quadratic algorithm. Thanks to @WillNess.
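As a rough illustration of that rayon switch (my sketch, not the answerer's code; it assumes rayon is added as a dependency and reuses the is_prime function above):

use rayon::prelude::*; // assumes rayon = "1" in Cargo.toml

fn primes_up_to_parallel(num: i32) -> Vec<i32> {
    // into_par_iter() splits the range across rayon's thread pool,
    // so the per-number trial division runs on several cores.
    (2..num + 1)
        .into_par_iter()
        .filter(is_prime) // the same is_prime as above, taking &i32
        .collect()
}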

Related

Rust - why is my program performing very slowly - over 5 times slower than the same program written in JavaScript using Node

I have finished converting an application that I made in JavaScript to Rust for increased performance. I am learning to program, and all the application does is work out the multiplicative persistence of any number in a range. It multiplies all digits together to form a new number, then repeats until the number becomes less than 10.
My issue is, my program written in JavaScript is over 5 times faster than the same in Rust. I must be doing something wrong with converting Strings to ints somewhere; I even tried swapping i128 to i64 and it made little difference.
If I run "cargo run --release" it is still slower!
Please can somebody look through my code to work out if there is any part of it that is causing the issues? Thank you in advance :)
fn multiplicative_persistence(mut user_input: i128) -> i128 {
    let mut steps: i128 = 0;
    let mut numbers: Vec<i128> = Vec::new();
    while user_input > 10 {
        let string_number: String = user_input.to_string();
        let digits: Vec<&str> = string_number.split("").collect();
        let mut sum: i128 = 1;
        let digits_count = digits.len();
        for number in 1..digits_count - 1 {
            sum *= digits[number].parse::<i128>().unwrap();
        }
        numbers.push(sum);
        steps += 1;
        user_input = sum;
    }
    return steps;
}

fn main() {
    // let _user_input: i128 = 277777788888899;
    let mut highest_steps_count: i128 = 0;
    let mut highest_steps_number: i128 = 0;
    let start: i128 = 77551000000;
    let finish: i128 = 1000000000000000;
    for number in start..=finish {
        // println!("{}: {}", number, multiplicative_persistence(number));
        if multiplicative_persistence(number) > highest_steps_count {
            highest_steps_count = multiplicative_persistence(number);
            highest_steps_number = number;
        }
        if number % 1000000 == 0 {
            println!("Upto {} so far: {}", number, highest_steps_number);
        }
    }
    println!("Highest step count: {} at {}", highest_steps_number, highest_steps_count);
}
I do plan to use the numbers variable in the function but I have not learnt enough to know how to properly return it as an associative array.
Maybe the issue is that converting a number to a string and then re-converting it back into a number is not that fast, and it is avoidable. You don't need this intermediate step:
fn step(mut x: i128) -> i128 {
    let mut result = 1;
    while x > 0 {
        result *= x % 10;
        x /= 10;
    }
    result
}

fn multiplicative_persistence(mut user_input: i128) -> i128 {
    let mut steps = 0;
    while user_input > 10 {
        user_input = step(user_input);
        steps += 1;
    }
    steps
}
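A quick sanity check of the digit-arithmetic version (my own usage example, not part of the answer): 277777788888899, the value commented out in the question's main, is known to have multiplicative persistence 11.

fn main() {
    // 277777788888899 is the smallest number whose persistence is 11.
    assert_eq!(multiplicative_persistence(277777788888899), 11);
}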
EDIT Just out of curiosity, I'd like to know whether the bottleneck is really due to the string conversion or to the rest of the code being somehow wasteful. Here is an example that does not call .split(""), does not re-allocate that intermediate vector, and allocates the string only once, not at each step.
#![feature(fmt_internals)]
use std::fmt::{Display, Formatter};

fn multiplicative_persistence(mut user_input: i128) -> i128 {
    let mut steps = 0;
    let mut digits = user_input.to_string();
    while user_input > 10 {
        let product = digits
            .chars()
            .map(|x| x.to_digit(10).unwrap() as i128)
            .fold(1, |acc, i| acc * i);
        digits.clear();
        // Re-use the already-allocated String instead of calling to_string() again.
        let mut formatter = Formatter::new(&mut digits);
        Display::fmt(&product, &mut formatter).unwrap();
        // Track the current value so the loop terminates.
        user_input = product;
        steps += 1;
    }
    steps
}
I have basically inlined the string conversion that would be performed by .to_string() in order to re-use the already-allocated buffer, instead of re-allocating one each iteration. You can try it out on the playground. Note that you need a nightly compiler because it makes use of an unstable feature.
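For completeness, here is a sketch of the same buffer-reuse idea on stable Rust (my own variant, not from the answer), using the std::fmt::Write trait and the write! macro instead of the unstable Formatter internals:

use std::fmt::Write;

fn multiplicative_persistence(user_input: i128) -> i128 {
    let mut steps = 0;
    let mut current = user_input;
    let mut digits = user_input.to_string();
    while current > 10 {
        let product: i128 = digits
            .chars()
            .map(|c| c.to_digit(10).unwrap() as i128)
            .product();
        digits.clear();
        // write! appends into the existing String, reusing its allocation.
        write!(digits, "{}", product).unwrap();
        current = product;
        steps += 1;
    }
    steps
}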

Why is my recursive Fibonacci implementation so slow compared to an iterative one?

I have created the following simple Fibonacci implementations:
#![feature(test)]
extern crate test;

pub fn fibonacci_recursive(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2),
    }
}

pub fn fibonacci_imperative(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => {
            let mut penultimate;
            let mut last = 1;
            let mut fib = 0;
            for _ in 0..n {
                penultimate = last;
                last = fib;
                fib = penultimate + last;
            }
            fib
        }
    }
}
I created them to try out cargo bench, so I wrote the following benchmarks:
#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[bench]
    fn bench_fibonacci_recursive(b: &mut Bencher) {
        b.iter(|| {
            let n = test::black_box(20);
            fibonacci_recursive(n)
        });
    }

    #[bench]
    fn bench_fibonacci_imperative(b: &mut Bencher) {
        b.iter(|| {
            let n = test::black_box(20);
            fibonacci_imperative(n)
        });
    }
}
I know that a recursive implementation is generally slower than an imperative one, especially since Rust doesn't support tail recursion optimization (which this implementation couldn't use anyways). But I was not expecting the following difference of nearly 2'000 times:
running 2 tests
test tests::bench_fibonacci_imperative ... bench: 15 ns/iter (+/- 3)
test tests::bench_fibonacci_recursive ... bench: 28,435 ns/iter (+/- 1,114)
I ran it both on Windows and Ubuntu with the newest Rust nightly compiler (rustc 1.25.0-nightly) and obtained similar results.
Is this speed difference normal? Did I write something "wrong"? Or are my benchmarks flawed?
As said by Shepmaster, you should use accumulators to keep the previously calculated fib(n - 1) and fib(n - 2) otherwise you keep calculating the same values:
pub fn fibonacci_recursive(n: u32) -> u32 {
    fn inner(n: u32, penultimate: u32, last: u32) -> u32 {
        match n {
            0 => penultimate,
            1 => last,
            _ => inner(n - 1, last, penultimate + last),
        }
    }
    inner(n, 0, 1)
}

fn main() {
    assert_eq!(fibonacci_recursive(0), 0);
    assert_eq!(fibonacci_recursive(1), 1);
    assert_eq!(fibonacci_recursive(2), 1);
    assert_eq!(fibonacci_recursive(20), 6765);
}
last is equivalent to fib(n - 1).
penultimate is equivalent to fib(n - 2).
The algorithmic complexity between the two implementations differs:
your iterative implementation uses an accumulator: O(N),
your recursive implementation doesn't: O(1.6^N).
Since 20 (N) << 12089 (1.6^N), it's pretty normal to have a large difference.
See this answer for an exact computation of the complexity in the naive implementation case.
Note: the method you use for the iterative case is called dynamic programming.
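To make the dynamic-programming point concrete, here is a small memoized variant (my own illustration, not from the answer): caching each result also turns the naive recursion into O(N) work.

fn fibonacci_memo(n: u32, cache: &mut Vec<Option<u32>>) -> u32 {
    if let Some(v) = cache[n as usize] {
        return v;
    }
    let v = match n {
        0 => 0,
        1 => 1,
        _ => fibonacci_memo(n - 1, cache) + fibonacci_memo(n - 2, cache),
    };
    cache[n as usize] = Some(v); // each value is computed once and then reused
    v
}

fn main() {
    let n = 20;
    let mut cache = vec![None; n as usize + 1];
    assert_eq!(fibonacci_memo(n, &mut cache), 6765);
}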

Is the big integer implementation in the num crate slow?

I implemented the Miller-Rabin Strong Pseudoprime Test in Rust using BigUint to support arbitrary large primes. To run through the numbers between 5 and 10^6, it took about 40s with cargo run --release.
I implemented the same algorithm with Java's BigInteger and the same test took 10s to finish. Rust appears to be 4 times slower. I assume this is caused by the implementation of num::bigint.
Is this just the current state of num::bigint, or can anyone spot an obvious improvement in my code? (Mainly about how I used the language. Regardless of whether my implementation of the algorithm is good or bad, it is implemented almost exactly the same way in both languages, so it should not cause the difference in performance.)
I did notice that a lot of clone() calls are required due to Rust's ownership model, which could well impact the speed to some degree. But I guess there is no way around that, am I right?
Here is the code:
extern crate rand;
extern crate num;
extern crate core;
extern crate time;

use std::time::{Duration};
use time::{now, Tm};
use rand::Rng;
use num::{Zero, One};
use num::bigint::{RandBigInt, BigUint, ToBigUint};
use num::traits::{ToPrimitive};
use num::integer::Integer;
use core::ops::{Add, Sub, Mul, Div, Rem, Shr};

fn find_r_and_d(i: BigUint) -> (u64, BigUint) {
    let mut d = i;
    let mut r = 0;
    loop {
        if d.clone().rem(&2u64.to_biguint().unwrap()) == Zero::zero() {
            d = d.shr(1usize);
            r = r + 1;
        } else {
            break;
        }
    }
    return (r, d);
}

fn might_be_prime(n: &BigUint) -> bool {
    let nsub1 = n.sub(1u64.to_biguint().unwrap());
    let two = 2u64.to_biguint().unwrap();
    let (r, d) = find_r_and_d(nsub1.clone());
    'WitnessLoop: for kk in 0..6u64 {
        let a = rand::thread_rng().gen_biguint_range(&two, &nsub1);
        let mut x = mod_exp(&a, &d, &n);
        if x == 1u64.to_biguint().unwrap() || x == nsub1 {
            continue;
        }
        for rr in 1..r {
            x = x.clone().mul(x.clone()).rem(n);
            if x == 1u64.to_biguint().unwrap() {
                return false;
            } else if x == nsub1 {
                continue 'WitnessLoop;
            }
        }
        return false;
    }
    return true;
}

fn mod_exp(base: &BigUint, exponent: &BigUint, modulus: &BigUint) -> BigUint {
    let one = 1u64.to_biguint().unwrap();
    let mut result = one.clone();
    let mut base_clone = base.clone();
    let mut exponent_clone = exponent.clone();
    while exponent_clone > 0u64.to_biguint().unwrap() {
        if exponent_clone.clone() & one.clone() == one {
            result = result.mul(&base_clone).rem(modulus);
        }
        base_clone = base_clone.clone().mul(base_clone).rem(modulus);
        exponent_clone = exponent_clone.shr(1usize);
    }
    return result;
}

fn main() {
    let now1 = now();
    for n in 5u64..1_000_000u64 {
        let b = n.to_biguint().unwrap();
        if might_be_prime(&b) {
            println!("{}", n);
        }
    }
    let now2 = now();
    println!("{}", now2.to_timespec().sec - now1.to_timespec().sec);
}
You can remove most of the clones pretty easily. BigUint implements all the ops traits for operations on &BigUint as well, not just on owned values. With that it becomes faster, but still about half as fast as the Java version...
Also (not related to performance, just readability): you don't need to call add, sub, mul and shr explicitly; those trait methods are what the regular +, -, * and >> operators resolve to.
For instance, you could rewrite might_be_prime and mod_exp like this, which already gives a good speedup on my machine (from 40 to 24 seconds on average):
fn might_be_prime(n: &BigUint) -> bool {
    let one = BigUint::one();
    let nsub1 = n - &one;
    let two = BigUint::new(vec![2]);
    let mut rng = rand::thread_rng();
    let (r, mut d) = find_r_and_d(nsub1.clone());
    let mut x;
    let mut a: BigUint;
    'WitnessLoop: for kk in 0..6u64 {
        a = rng.gen_biguint_range(&two, &nsub1);
        x = mod_exp(&mut a, &mut d, &n);
        if &x == &one || x == nsub1 {
            continue;
        }
        for rr in 1..r {
            x = (&x * &x) % n;
            if &x == &one {
                return false;
            } else if x == nsub1 {
                continue 'WitnessLoop;
            }
        }
        return false;
    }
    true
}

fn mod_exp(base: &mut BigUint, exponent: &mut BigUint, modulus: &BigUint) -> BigUint {
    let one = BigUint::one();
    let zero = BigUint::zero();
    let mut result = BigUint::one();
    while &*exponent > &zero {
        if &*exponent & &one == one {
            result = (result * &*base) % modulus;
        }
        *base = (&*base * &*base) % modulus;
        *exponent = &*exponent >> 1usize;
    }
    result
}
Note that I've moved the println! out of the timing, so that we're not benchmarking IO.
fn main() {
    let now1 = now();
    let v = (5u64..1_000_000u64)
        .filter_map(|n| n.to_biguint())
        .filter(|n| might_be_prime(&n))
        .collect::<Vec<BigUint>>();
    let now2 = now();
    for n in v {
        println!("{}", n);
    }
    println!("time spent seconds: {}", now2.to_timespec().sec - now1.to_timespec().sec);
}
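To make the by-reference operator point above concrete, here is a tiny standalone illustration (my own, assuming the same num crate as the question):

extern crate num;
use num::bigint::{BigUint, ToBigUint};

fn main() {
    let a = 12345u64.to_biguint().unwrap();
    let b = 67890u64.to_biguint().unwrap();

    // `a * b` would move (consume) both values; `&a * &b` only borrows them,
    // so no clone() is needed and both remain usable afterwards.
    let product = &a * &b;
    let sum = &a + &b;
    println!("{} * {} = {}", a, b, product);
    println!("{} + {} = {}", a, b, sum);
}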

How do I turn a circular buffer into a vector in O(n) without an allocation?

I have a Vec that is the allocation for a circular buffer. Let's assume the buffer is full, so there are no elements in the allocation that aren't in the circular buffer. I now want to turn that circular buffer into a Vec where the first element of the circular buffer is also the first element of the Vec. As an example I have this (allocating) function:
fn normalize(tail: usize, buf: Vec<usize>) -> Vec<usize> {
    let n = buf.len();
    buf[tail..n]
        .iter()
        .chain(buf[0..tail].iter())
        .cloned()
        .collect()
}
Playground
Obviously this can also be done without allocating anything, since we already have an allocation that is large enough, and we have a swap operation to swap arbitrary elements of the allocation.
fn normalize(tail: usize, mut buf: Vec<usize>) -> Vec<usize> {
    for _ in 0..tail {
        for i in 0..(buf.len() - 1) {
            buf.swap(i, i + 1);
        }
    }
    buf
}
Playground
Sadly this requires buf.len() * tail swap operations. I'm fairly sure it can be done in buf.len() + tail swap operations. For concrete values of tail and buf.len() I have been able to figure out solutions, but I'm not sure how to do it in the general case.
My recursive partial solution can be seen in action.
The simplest solution is to use 3 reversals, indeed this is what is recommended in Algorithm to rotate an array in linear time.
// rotate to the left by "k".
fn rotate<T>(array: &mut [T], k: usize) {
    if array.is_empty() { return; }
    let k = k % array.len();
    array[..k].reverse();
    array[k..].reverse();
    array.reverse();
}
While this is linear, it requires reading and writing each element at most twice (reversing a range with an odd number of elements does not require touching the middle element). On the other hand, the very predictable access pattern of the reversals plays nicely with prefetching, YMMV.
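As a side note of mine (not part of either answer): current Rust ships this rotation in the standard library as <[T]>::rotate_left / rotate_right (stable since 1.26), which works in place without allocating, so the question's normalize can simply be:

fn normalize(tail: usize, mut buf: Vec<usize>) -> Vec<usize> {
    // Built-in in-place rotation; the element at index `tail` becomes the first one.
    buf.rotate_left(tail);
    buf
}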
This operation is typically called a "rotation" of the vector; for example, the C++ standard library has std::rotate to do this. There are known algorithms for doing the operation, although you may have to be quite careful when porting if you're trying to do it generically/with non-Copy types, where swaps become key, since one can't generally just read something straight out of a vector.
That said, one is likely to be able to use unsafe code with std::ptr::read/std::ptr::write for this, since data is just being moved around, and hence there's no need to execute caller-defined code or deal with very complicated exception-safety concerns.
A port of the C code in the link above (by @ker):
fn rotate(k: usize, a: &mut [i32]) {
    if k == 0 { return }
    let mut c = 0;
    let n = a.len();
    let mut v = 0;
    while c < n {
        let mut t = v;
        let mut tp = v + k;
        let tmp = a[v];
        c += 1;
        while tp != v {
            a[t] = a[tp];
            t = tp;
            tp += k;
            if tp >= n { tp -= n; }
            c += 1;
        }
        a[t] = tmp;
        v += 1;
    }
}

How to reverse a singly-linked list and convert it to a vector?

While writing the A* algorithm, I tried to reverse a singly-linked list of actions and pack it into Vec.
Here's the structure for my singly-linked list:
use std::rc::Rc;

struct FrontierElem<A> {
    prev: Option<Rc<FrontierElem<A>>>,
    action: A,
}
My first thought was to push actions into Vec then reverse the vector:
fn rev1<A>(fel: &Rc<FrontierElem<A>>) -> Vec<A>
where
    A: Clone,
{
    let mut cur = fel;
    let mut ret = Vec::new();
    while let Some(ref prev) = cur.prev {
        ret.push(cur.action.clone());
        cur = prev;
    } // First action (where cur.prev == None) is ignored by design
    ret.as_mut_slice().reverse();
    ret
}
I didn't find the SliceExt::reverse method at the time, so I proceeded to the second plan: fill the vector from the end to the start. I didn't find a way to do that safely.
/// Copies action fields from the singly-linked list to a vector in reverse order.
/// `fel` stands for first element
fn rev2<A>(fel: &Rc<FrontierElem<A>>) -> Vec<A>
where
    A: Clone,
{
    let mut cnt = 0usize;
    // First pass. Let's find the length of list `fel`
    {
        let mut cur = fel;
        while let Some(ref prev) = cur.prev {
            cnt = cnt + 1;
            cur = prev;
        }
    } // Lexical scoping to unborrow `fel`
    // Second pass. Create and fill the `ret` vector
    let mut ret = Vec::<A>::with_capacity(cnt);
    {
        let mut idx = cnt - 1;
        let mut cur = fel;
        // I didn't find a safe and fast way to populate the vector from the end to the beginning.
        unsafe {
            ret.set_len(cnt); // unsafe: vector values aren't initialized
            while let Some(ref prev) = cur.prev {
                ret[idx] = cur.action.clone();
                idx = idx - 1;
                cur = prev;
            }
        }
        assert_eq!(idx, std::usize::MAX);
    } // Lexical scoping to make `fel` usable again
    ret
}
While I was writing this, it occurred to me that I can also implement Iterator for the linked list and then use rev and from_iter to create a vector. Alas, this requires significant overhead, as I must implement DoubleEndedIterator trait for rev to work.
At this point my question seems trivial, but I post it in hope that it will be of some use.
Benchmark:
running 2 tests
test bench_rev1 ... bench: 1537061 ns/iter (+/- 14466)
test bench_rev2 ... bench: 1556088 ns/iter (+/- 17165)
Fill the vector, then reverse it using .as_mut_slice().reverse().
fn rev1<A>(fel: &Rc<FrontierElem<A>>) -> Vec<A>
where
    A: Clone,
{
    let mut cur = fel;
    let mut ret = Vec::new();
    while let Some(ref prev) = cur.prev {
        ret.push(cur.action.clone());
        cur = prev;
    } // First action (where cur.prev == None) is ignored by design
    ret.as_mut_slice().reverse();
    ret
}
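Regarding the Iterator idea mentioned in the question: a plain forward Iterator is already enough if you collect first and then reverse; no DoubleEndedIterator is needed. A minimal sketch (my own, assuming the FrontierElem definition from the question is in scope):

struct Ancestors<'a, A> {
    cur: Option<&'a Rc<FrontierElem<A>>>,
}

impl<'a, A> Iterator for Ancestors<'a, A> {
    type Item = &'a A;

    fn next(&mut self) -> Option<&'a A> {
        let el = self.cur?;
        // Step to the previous element; the first element (prev == None)
        // is skipped, matching rev1's design.
        self.cur = el.prev.as_ref();
        self.cur.map(|_| &el.action)
    }
}

fn rev3<A: Clone>(fel: &Rc<FrontierElem<A>>) -> Vec<A> {
    let mut ret: Vec<A> = Ancestors { cur: Some(fel) }.cloned().collect();
    ret.reverse();
    ret
}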
