Strange bug using Rust and ndarray - debugging

I have a very strange bug in my Rust code using ndarray. The following code runs with no error:
// ...other code before, producing a lot of print on screen
//
let rank = 2;
let mut Q = Array2::<f64>::zeros((rank, rank));
// U is an Array2<F> where F is a struct that can call eval(&self, &f64) -> f64
for i in 0..rank {
    println!("i = {}", i);
    for j in 0..rank {
        let v: f64 = U[[0, i]].eval(&1.0);
        println!("i,j = {}-{} -- v = {}", i, j, v);
        Q[[i, j]] = v;
    }
}
but if I swap the order of the indices used to write to an entry of the matrix Q, for example
for i in 0..rank {
    println!("i = {}", i);
    for j in 0..rank {
        let v: f64 = U[[0, i]].eval(&1.0);
        println!("i,j = {}-{} -- v = {}", i, j, v);
        Q[[j, i]] = v;
    }
}
the executable runs forever without printing anything to the screen (not even the output that is supposed to be produced before that point).
I've tried replacing the line
    let v: f64 = U[[0,i]].eval(&1.0);
with
    let v: f64 = f64::sin(1.0);
and then both versions work fine.
Do you have any idea how to debug this kind of situation?

Related

Rust - why is my program performing very slowly - over 5 times slower than the same program written in JavaScript using Node

I have finished converting an application that I made in JavaScript to Rust for increased performance. I am learning to program, and all the application does is work out the multiplicative persistence of any number in a range. It multiplies all digits together to form a new number, then repeats until the number becomes less than 10.
My issue is, my program written in JavaScript is over 5 times faster than the same program in Rust. I must be doing something wrong with converting Strings to ints somewhere; I even tried swapping i128 to i64 and it made little difference.
If I run "cargo run --release" it is still slower!
Please can somebody look through my code to work out if there is any part of it that is causing the issues? Thank you in advance :)
fn multiplicative_persistence(mut user_input: i128) -> i128 {
    let mut steps: i128 = 0;
    let mut numbers: Vec<i128> = Vec::new();
    while user_input > 10 {
        let string_number: String = user_input.to_string();
        let digits: Vec<&str> = string_number.split("").collect();
        let mut sum: i128 = 1;
        let digits_count = digits.len();
        for number in 1..digits_count - 1 {
            sum *= digits[number].parse::<i128>().unwrap();
        }
        numbers.push(sum);
        steps += 1;
        user_input = sum;
    }
    return steps;
}
fn main() {
    // let _user_input: i128 = 277777788888899;
    let mut highest_steps_count: i128 = 0;
    let mut highest_steps_number: i128 = 0;
    let start: i128 = 77551000000;
    let finish: i128 = 1000000000000000;
    for number in start..=finish {
        // println!("{}: {}", number, multiplicative_persistence(number));
        if multiplicative_persistence(number) > highest_steps_count {
            highest_steps_count = multiplicative_persistence(number);
            highest_steps_number = number;
        }
        if number % 1000000 == 0 {
            println!("Upto {} so far: {}", number, highest_steps_number);
        }
    }
    println!("Highest step count: {} at {}", highest_steps_count, highest_steps_number);
}
I do plan to use the numbers variable in the function but I have not learnt enough to know how to properly return it as an associative array.
Maybe the issue is that converting a number to a string and then re-converting it back into a number is slow, and avoidable. You don't need this intermediate step:
fn step(mut x: i128) -> i128 {
    let mut result = 1;
    while x > 0 {
        result *= x % 10;
        x /= 10;
    }
    result
}

fn multiplicative_persistence(mut user_input: i128) -> i128 {
    let mut steps = 0;
    while user_input > 10 {
        user_input = step(user_input);
        steps += 1;
    }
    steps
}
EDIT: Just out of curiosity, I'd like to know whether the bottleneck is really the string conversion or the rest of the code being somehow wasteful. Here is an example that does not call .split(""), does not allocate that intermediate vector, and allocates the string only once rather than at each step.
#![feature(fmt_internals)]
use std::fmt::{Display, Formatter};

fn multiplicative_persistence(user_input: i128) -> i128 {
    let mut steps = 0;
    let mut digits = user_input.to_string();
    // Loop until a single digit remains; looping on `user_input > 10` would
    // never terminate, since user_input is not updated inside the loop.
    while digits.len() > 1 {
        let product = digits
            .chars()
            .map(|x| x.to_digit(10).unwrap())
            .fold(1i128, |acc, i| acc * i as i128);
        digits.clear();
        let mut formatter = Formatter::new(&mut digits);
        Display::fmt(&product, &mut formatter).unwrap();
        steps += 1;
    }
    steps
}
I have basically inlined the string conversion that would be performed by .to_string() in order to re-use the already-allocated buffer, instead of re-allocating one each iteration. You can try it out on the playground. Note that you need a nightly compiler because it makes use of an unstable feature.
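For what it's worth, the same buffer reuse is also possible on stable Rust through the std::fmt::Write trait, since write! can append into an existing String. This variant is my sketch, not the original answer's code:

```rust
use std::fmt::Write;

// Buffer-reusing variant on stable Rust: write! appends the next number's
// digits into the already-allocated String instead of allocating a new one.
fn multiplicative_persistence(n: u64) -> u32 {
    let mut steps = 0;
    let mut digits = n.to_string();
    while digits.len() > 1 {
        let product: u64 = digits.bytes().map(|b| u64::from(b - b'0')).product();
        digits.clear();
        write!(digits, "{}", product).unwrap(); // reuses the buffer's capacity
        steps += 1;
    }
    steps
}

fn main() {
    assert_eq!(multiplicative_persistence(39), 3);
    assert_eq!(multiplicative_persistence(277777788888899), 11);
    println!("ok");
}
```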

How can I make sure my RNG numbers are unique?

I'm trying to select 2 random items out of a list using the RNG class. The problem is that occasionally I get the same 2 numbers, and I'd like them to be unique. I tried using a while loop to get another number if it's the same as the last one, but adding even a simple while loop results in an "Exceeded prepaid gas" error. What am I not understanding?
// simplified for posting question
var lengthOfList = 10
var numItemsWanted = 2
// Get more RNG numbers than I need in case of duplicates
const rng = new RNG<u32>(lengthOfList, lengthOfList)
for (let i = 0; i < numItemsWanted; i++) {
    var r = rng.next()
    while (r == rng.last()) {
        r = rng.next()
    }
    newList.push(oldList[r])
}
Working:
// simplified for posting question
var lengthOfList = 10
var numItemsWanted = 2
// Get more RNG numbers than I need in case of duplicates
const rng = new RNG<u32>(lengthOfList, lengthOfList)
let r = rng.next()
let last = r + 1
for (let i = 0; i < numItemsWanted; i++) {
    newList.push(oldList[r])
    last = r
    r = rng.next()
    while (r == last) {
        r = rng.next()
    }
}
This is about near-sdk-as, the smart contract development kit for AssemblyScript on the NEAR platform.
You can see how RNG is used in this example:
https://github.com/Learn-NEAR/NCD.L1.sample--lottery/blob/ff6cddaa8cac4d8fe29dd1a19b38a6e3c7045363/src/lottery/assembly/lottery.ts#L12-L13
class Lottery {
    private chance: f64 = 0.20

    play(): bool {
        const rng = new RNG<u32>(1, u32.MAX_VALUE);
        const roll = rng.next();
        logging.log("roll: " + roll.toString());
        return roll <= <u32>(<f64>u32.MAX_VALUE * this.chance);
    }
}
and how the constructor is implemented here:
https://github.com/near/near-sdk-as/blob/f3707a1672d6da6f6d6a75cd645f8cbdacdaf495/sdk-core/assembly/math.ts#L152
The first argument is the length of the buffer holding random numbers generated from the seed. You can use the next() method to get more numbers from this buffer with each call:
export class RNG<T> {
    constructor(len: u32, public max: u32 = 10_000) {
        let real_len = len * sizeof<T>();
        this.buffer = math.randomBuffer(real_len);
        this._last = this.get(0);
    }
    next(): T {}
}
If you remove each item from oldList once it is picked, it becomes impossible to pick it again.
Another approach is to shuffle your oldList and then pick the first two items.
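Language aside, the remove-once-picked idea can be sketched as follows (in Rust, with a tiny deterministic LCG standing in for the platform RNG, since near-sdk-as is not available here):

```rust
// Sketch of "remove once picked": each chosen element is swap_remove'd
// from the pool, so drawing it twice is impossible by construction.
// next_u32 is a stand-in for the platform RNG (a tiny LCG, not secure).
fn next_u32(state: &mut u64) -> u32 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 33) as u32
}

fn pick_unique(mut pool: Vec<u32>, wanted: usize, seed: u64) -> Vec<u32> {
    let mut state = seed;
    let mut picked = Vec::with_capacity(wanted);
    for _ in 0..wanted.min(pool.len()) {
        let idx = (next_u32(&mut state) as usize) % pool.len();
        picked.push(pool.swap_remove(idx)); // O(1) removal, order not preserved
    }
    picked
}

fn main() {
    let old_list: Vec<u32> = (0..10).collect();
    let new_list = pick_unique(old_list, 2, 42);
    assert_eq!(new_list.len(), 2);
    assert_ne!(new_list[0], new_list[1]); // duplicates cannot happen
    println!("{:?}", new_list);
}
```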

How can i make my rust code run faster in parallel?

#![feature(map_first_last)]
use num_cpus;
use std::collections::BTreeMap;
use ordered_float::OrderedFloat;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

const MONO_FREQ: [f64; 26] = [
    8.55, 1.60, 3.16, 3.87, 12.1, 2.18, 2.09, 4.96, 7.33, 0.22, 0.81, 4.21, 2.53, 7.17, 7.47, 2.07,
    0.10, 6.33, 6.73, 8.94, 2.68, 1.06, 1.83, 0.19, 1.72, 0.11,
];

fn main() {
    let ciphertext: String = "helloworldthisisatest".to_string();
    concurrent(&ciphertext);
    parallel(&ciphertext);
}

fn concurrent(ciphertext: &String) {
    let start = Instant::now();
    for _ in 0..50000 {
        let mut best_fit: f64 = chi_squared(&ciphertext);
        let mut best_key: u8 = 0;
        for i in 1..26 {
            let test_fit = chi_squared(&decrypt(&ciphertext, i));
            if test_fit < best_fit {
                best_key = i;
                best_fit = test_fit;
            }
        }
    }
    let elapsed = start.elapsed();
    println!("Concurrent : {} ms", elapsed.as_millis());
}

fn parallel(ciphertext: &String) {
    let cpus = num_cpus::get() as u8;
    let start = Instant::now();
    for _ in 0..50000 {
        let mut best_result: f64 = chi_squared(&ciphertext);
        for i in (0..26).step_by(cpus.into()) {
            let results = Arc::new(Mutex::new(BTreeMap::new()));
            let mut threads = vec![];
            for ii in i..i + cpus {
                threads.push(thread::spawn({
                    let clone = Arc::clone(&results);
                    let test = OrderedFloat(chi_squared(&decrypt(&ciphertext, ii)));
                    move || {
                        let mut v = clone.lock().unwrap();
                        v.insert(test, ii);
                    }
                }));
            }
            for t in threads {
                t.join().unwrap();
            }
            let lock = Arc::try_unwrap(results).expect("Lock still has multiple owners");
            let hold = lock.into_inner().expect("Mutex cannot be locked");
            if hold.last_key_value().unwrap().0.into_inner() > best_result {
                best_result = hold.last_key_value().unwrap().0.into_inner();
            }
        }
    }
    let elapsed = start.elapsed();
    println!("Parallel : {} ms", elapsed.as_millis());
}

fn decrypt(ciphertext: &String, shift: u8) -> String {
    ciphertext.chars().map(|x| ((x as u8 + shift - 97) % 26 + 97) as char).collect()
}

pub fn chi_squared(text: &str) -> f64 {
    let mut result: f64 = 0.0;
    for (pos, i) in get_letter_counts(text).iter().enumerate() {
        let expected = MONO_FREQ[pos] * text.len() as f64 / 100.0;
        result += (*i as f64 - expected).powf(2.0) / expected;
    }
    return result;
}

fn get_letter_counts(text: &str) -> [u64; 26] {
    let mut results: [u64; 26] = [0; 26];
    for i in text.chars() {
        results[((i as u64) - 97) as usize] += 1;
    }
    return results;
}
Sorry to dump so much code, but I have no idea where the problem is; no matter what I try, the parallel code seems to be around 100x slower.
I think the problem may be in the chi_squared function, as I don't know whether it is running in parallel.
I have tried Arc/Mutex, rayon and message passing, and all of them slow it down when they should speed it up. What could I do to make this faster?
Your code calculates the chi_squared function on the main thread; here is the corrected version:
for ii in i..i + cpus {
    let cp = ciphertext.clone();
    let clone = Arc::clone(&results);
    threads.push(thread::spawn(move || {
        let test = OrderedFloat(chi_squared(&decrypt(&cp, ii)));
        let mut v = clone.lock().unwrap();
        v.insert(test, ii);
    }));
}
Note that it does not matter whether it is calculated in parallel or not: spawning 50000*26 threads, plus the synchronization overhead between them, is what makes up the 100x difference in the first place. Using a threadpool implementation would reduce the overhead, but the result would still be much slower than the single-threaded version. The only thing you can do is assign work in the outer loop (0..50000); however, I am guessing you are trying to parallelize inside the main loop.

A vector is created inside an if block in Rust. How to use it after the scope of if ends?

I am learning Rust. I am trying to calculate a list of prime numbers up to some number. For that I need to create a vector (vec1) inside an if block and use it outside the scope of the if.
I tried a code with the same logic in MATLAB and it works.
A simplified version of the actual code looks like this:
fn main() {
    let mut initiate = 1;
    let mut whilechecker = 2;
    while whilechecker > 0 {
        whilechecker = whilechecker - 1;
        if initiate == 1 {
            let mut vec1 = vec![2];
        }
        for i in &vec1 {
            if *i == 2 {
                break;
            }
        } // for
        initiate = 2;
        vec1.push(5);
    } // while
} // main
It is supposed to put a list of prime numbers in vec1. But since this is simplified code, it only needs to compile and produce the vector (vec1).
But the compiler says:
cannot find value vec1 in this scope
at both for i in &vec1 { and vec1.push(5);.
Can you make it compile?
There's no reason for the complicated if initiate == 1 check. Just move the initialization of the vector outside the while loop, so it gets done only once:
fn main() {
    let mut whilechecker = 2;
    let mut vec1 = vec![2];
    while whilechecker > 0 {
        whilechecker = whilechecker - 1;
        for i in &vec1 {
            if *i == 2 {
                break;
            }
        } // for
        vec1.push(5);
    } // while
} // main
I don't quite get what you actually want, but here is an example that may help you define a variable in the outer scope.
fn main() {
    let mut initiate = 1;
    let mut whilechecker = 2;
    let mut vec1 = Vec::new();
    while whilechecker > 0 {
        if initiate == 1 {
            let mut vec1 = vec![2]; // shadows the outer vec1; dropped at the end of this block
        }
        for i in &vec1 {
            if *i == 2 {
                break;
            }
        }
        initiate = 2;
        vec1.push(5);
        whilechecker = whilechecker - 1;
    }
    println!("{:?}", vec1);
}
The output of the given code is:
[5, 5]

Is the big integer implementation in the num crate slow?

I implemented the Miller-Rabin Strong Pseudoprime Test in Rust using BigUint to support arbitrary large primes. To run through the numbers between 5 and 10^6, it took about 40s with cargo run --release.
I implemented the same algorithm with Java's BigInteger, and the same test took 10s to finish. Rust appears to be 4 times slower. I assume this is caused by the implementation of num::bigint.
Is this just the current state of num::bigint, or can anyone spot an obvious improvement in my code? (Mainly in how I used the language. Whether my implementation of the algorithm is good or bad, it is implemented almost exactly the same way in both languages, so it should not cause the difference in performance.)
I did notice that lots of clone() calls are required due to Rust's ownership model, which could well impact the speed to some degree. But I guess there is no way around that, am I right?
Here is the code:
extern crate rand;
extern crate num;
extern crate core;
extern crate time;

use std::time::Duration;
use time::{now, Tm};
use rand::Rng;
use num::{Zero, One};
use num::bigint::{RandBigInt, BigUint, ToBigUint};
use num::traits::ToPrimitive;
use num::integer::Integer;
use core::ops::{Add, Sub, Mul, Div, Rem, Shr};

fn find_r_and_d(i: BigUint) -> (u64, BigUint) {
    let mut d = i;
    let mut r = 0;
    loop {
        if d.clone().rem(&2u64.to_biguint().unwrap()) == Zero::zero() {
            d = d.shr(1usize);
            r = r + 1;
        } else {
            break;
        }
    }
    return (r, d);
}

fn might_be_prime(n: &BigUint) -> bool {
    let nsub1 = n.sub(1u64.to_biguint().unwrap());
    let two = 2u64.to_biguint().unwrap();
    let (r, d) = find_r_and_d(nsub1.clone());
    'WitnessLoop: for kk in 0..6u64 {
        let a = rand::thread_rng().gen_biguint_range(&two, &nsub1);
        let mut x = mod_exp(&a, &d, &n);
        if x == 1u64.to_biguint().unwrap() || x == nsub1 {
            continue;
        }
        for rr in 1..r {
            x = x.clone().mul(x.clone()).rem(n);
            if x == 1u64.to_biguint().unwrap() {
                return false;
            } else if x == nsub1 {
                continue 'WitnessLoop;
            }
        }
        return false;
    }
    return true;
}

fn mod_exp(base: &BigUint, exponent: &BigUint, modulus: &BigUint) -> BigUint {
    let one = 1u64.to_biguint().unwrap();
    let mut result = one.clone();
    let mut base_clone = base.clone();
    let mut exponent_clone = exponent.clone();
    while exponent_clone > 0u64.to_biguint().unwrap() {
        if exponent_clone.clone() & one.clone() == one {
            result = result.mul(&base_clone).rem(modulus);
        }
        base_clone = base_clone.clone().mul(base_clone).rem(modulus);
        exponent_clone = exponent_clone.shr(1usize);
    }
    return result;
}

fn main() {
    let now1 = now();
    for n in 5u64..1_000_000u64 {
        let b = n.to_biguint().unwrap();
        if might_be_prime(&b) {
            println!("{}", n);
        }
    }
    let now2 = now();
    println!("{}", now2.to_timespec().sec - now1.to_timespec().sec);
}
You can remove most of the clones pretty easily. BigUint implements all the ops traits for &BigUint as well, not just for owned values. With that it becomes faster, but still only about half as fast as Java...
Also (not related to performance, just readability): you don't need to call add, sub, mul and shr explicitly; those traits are what provide the regular +, -, * and >> operators.
For instance, you could rewrite might_be_prime and mod_exp like this, which already gives a good speedup on my machine (from 40 to 24 sec on average):
fn might_be_prime(n: &BigUint) -> bool {
    let one = BigUint::one();
    let nsub1 = n - &one;
    let two = BigUint::new(vec![2]);
    let mut rng = rand::thread_rng();
    let (r, d) = find_r_and_d(nsub1.clone());
    let mut x;
    let mut a: BigUint;
    'WitnessLoop: for kk in 0..6u64 {
        a = rng.gen_biguint_range(&two, &nsub1);
        // mod_exp zeroes the exponent in place, so hand it a scratch clone
        // and keep d intact for the remaining witnesses.
        x = mod_exp(&mut a, &mut d.clone(), &n);
        if &x == &one || x == nsub1 {
            continue;
        }
        for rr in 1..r {
            x = (&x * &x) % n;
            if &x == &one {
                return false;
            } else if x == nsub1 {
                continue 'WitnessLoop;
            }
        }
        return false;
    }
    true
}

fn mod_exp(base: &mut BigUint, exponent: &mut BigUint, modulus: &BigUint) -> BigUint {
    let one = BigUint::one();
    let zero = BigUint::zero();
    let mut result = BigUint::one();
    while &*exponent > &zero {
        if &*exponent & &one == one {
            result = (result * &*base) % modulus;
        }
        *base = (&*base * &*base) % modulus;
        *exponent = &*exponent >> 1usize;
    }
    result
}
Note that I've moved the println! out of the timing, so that we're not benchmarking IO.
fn main() {
    let now1 = now();
    let v = (5u64..1_000_000u64)
        .filter_map(|n| n.to_biguint())
        .filter(|n| might_be_prime(&n))
        .collect::<Vec<BigUint>>();
    let now2 = now();
    for n in v {
        println!("{}", n);
    }
    println!("time spent seconds: {}", now2.to_timespec().sec - now1.to_timespec().sec);
}
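For reference, the square-and-multiply loop in mod_exp can also be checked against plain machine integers. This u64 sketch (mine, independent of the num crate) mirrors the structure of the BigUint version:

```rust
// Right-to-left binary modular exponentiation on u64, mirroring the
// BigUint mod_exp above. u128 intermediates keep the squaring step
// from overflowing.
fn mod_exp_u64(base: u64, mut exponent: u64, modulus: u64) -> u64 {
    let m = modulus as u128;
    let mut b = base as u128 % m;
    let mut result = 1u128;
    while exponent > 0 {
        if exponent & 1 == 1 {
            result = result * b % m;
        }
        b = b * b % m;
        exponent >>= 1;
    }
    result as u64
}

fn main() {
    // Classic textbook check: 4^13 mod 497 = 445.
    assert_eq!(mod_exp_u64(4, 13, 497), 445);
    // Fermat's little theorem: 2^(p-1) ≡ 1 (mod p) for the prime 1_000_003.
    assert_eq!(mod_exp_u64(2, 1_000_002, 1_000_003), 1);
    println!("ok");
}
```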