Why is Vec::with_capacity slower than Vec::new for small final lengths? - performance

Consider this code.
type Int = i32;
const MAX_NUMBER: Int = 1_000_000;

fn main() {
    let result1 = with_new();
    let result2 = with_capacity();
    assert_eq!(result1, result2);
}

fn with_new() -> Vec<Int> {
    let mut result = Vec::new();
    for i in 0..MAX_NUMBER {
        result.push(i);
    }
    result
}

fn with_capacity() -> Vec<Int> {
    let mut result = Vec::with_capacity(MAX_NUMBER as usize);
    for i in 0..MAX_NUMBER {
        result.push(i);
    }
    result
}
Both functions produce the same output. One uses Vec::new, the other Vec::with_capacity. For small values of MAX_NUMBER (like in the example), with_capacity is slower than new; only for larger final vector lengths (e.g. 100 million) is the with_capacity version as fast as the new one.
Flamegraph for 1 million elements
Flamegraph for 100 million elements
It is my understanding that with_capacity should always be faster when the final length is known, because the heap allocation happens once, up front, in a single chunk. In contrast, the version with new has to grow the vector repeatedly as it fills up, which results in more allocations.
What am I missing?
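For what it's worth, the premise that new reallocates once per push can be checked directly. Thanks to amortized growth (the current std roughly doubles capacity, an implementation detail rather than a guarantee), pushing n elements triggers only about log2(n) reallocations, not n. A small sketch that counts them:

```rust
fn main() {
    let mut v: Vec<i32> = Vec::new();
    let mut last_cap = v.capacity();
    let mut reallocations = 0;
    for i in 0..1_000_000 {
        v.push(i);
        if v.capacity() != last_cap {
            // capacity grew, so the buffer was reallocated and copied
            last_cap = v.capacity();
            reallocations += 1;
        }
    }
    println!("1,000,000 pushes caused only {} reallocations", reallocations);
}
```

So new never allocates a million times; the asymptotic difference between the two versions is modest, which is why other effects can dominate at small sizes.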
Edit
The first section was compiled with the debug profile. If I use the release profile with the following settings in Cargo.toml
[package]
name = "vec_test"
version = "0.1.0"
edition = "2021"
[profile.release]
opt-level = 3
debug = 2
I still get the following result for a length of 10 million.

I was not able to reproduce this in a synthetic benchmark:
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

fn with_new(size: i32) -> Vec<i32> {
    let mut result = Vec::new();
    for i in 0..size {
        result.push(i);
    }
    result
}

fn with_capacity(size: i32) -> Vec<i32> {
    let mut result = Vec::with_capacity(size as usize);
    for i in 0..size {
        result.push(i);
    }
    result
}

pub fn with_new_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("with_new");
    for size in [100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000].iter() {
        group.bench_with_input(BenchmarkId::from_parameter(size), size, |b, &size| {
            b.iter(|| with_new(size));
        });
    }
    group.finish();
}

pub fn with_capacity_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("with_capacity");
    for size in [100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000].iter() {
        group.bench_with_input(BenchmarkId::from_parameter(size), size, |b, &size| {
            b.iter(|| with_capacity(size));
        });
    }
    group.finish();
}

criterion_group!(benches, with_new_benchmark, with_capacity_benchmark);
criterion_main!(benches);
Here's the output (with outliers and other benchmarking stuff removed):
with_new/100 time: [331.17 ns 331.38 ns 331.61 ns]
with_new/1000 time: [1.1719 us 1.1731 us 1.1745 us]
with_new/10000 time: [8.6784 us 8.6840 us 8.6899 us]
with_new/100000 time: [77.524 us 77.596 us 77.680 us]
with_new/1000000 time: [1.6966 ms 1.6978 ms 1.6990 ms]
with_new/10000000 time: [22.063 ms 22.084 ms 22.105 ms]
with_capacity/100 time: [76.796 ns 76.859 ns 76.926 ns]
with_capacity/1000 time: [497.90 ns 498.14 ns 498.39 ns]
with_capacity/10000 time: [5.0058 us 5.0084 us 5.0112 us]
with_capacity/100000 time: [50.358 us 50.414 us 50.470 us]
with_capacity/1000000 time: [1.0861 ms 1.0868 ms 1.0876 ms]
with_capacity/10000000 time: [10.644 ms 10.656 ms 10.668 ms]
The with_capacity runs were consistently faster than with_new. The closest ones being in the 10,000- to 1,000,000-element runs where with_capacity still only took ~60% of the time, while others had it taking half the time or less.
It crossed my mind that there could be some strange const-propagation behavior going on, but even with individual functions with hard-coded sizes (playground for brevity), the behavior didn't significantly change:
with_new/100 time: [313.87 ns 314.22 ns 314.56 ns]
with_new/1000 time: [1.1498 us 1.1505 us 1.1511 us]
with_new/10000 time: [7.9062 us 7.9095 us 7.9130 us]
with_new/100000 time: [77.925 us 77.990 us 78.057 us]
with_new/1000000 time: [1.5675 ms 1.5683 ms 1.5691 ms]
with_new/10000000 time: [20.956 ms 20.990 ms 21.023 ms]
with_capacity/100 time: [76.943 ns 76.999 ns 77.064 ns]
with_capacity/1000 time: [535.00 ns 536.22 ns 537.21 ns]
with_capacity/10000 time: [5.1122 us 5.1150 us 5.1181 us]
with_capacity/100000 time: [50.064 us 50.092 us 50.122 us]
with_capacity/1000000 time: [1.0768 ms 1.0776 ms 1.0784 ms]
with_capacity/10000000 time: [10.600 ms 10.613 ms 10.628 ms]
Your testing code only calls each strategy once, so it's conceivable that your testing environment has changed by the time the second one runs (a potential culprit being heap fragmentation, as suggested by @trent_formerly_cl in the comments, though there could be others: CPU boosting/throttling, spatial and/or temporal cache locality, OS behavior, etc.). A benchmarking framework like criterion helps avoid a lot of these problems by iterating each test multiple times (including warmup iterations).
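If you want to sanity-check this without pulling in criterion, a minimal stdlib-only sketch of the same idea (warmup runs plus averaging over repeated timed runs, with std::hint::black_box keeping the optimizer from discarding the result) could look like this; the function and run counts here are illustrative, not tuned:

```rust
use std::hint::black_box;
use std::time::Instant;

fn with_new(size: i32) -> Vec<i32> {
    let mut result = Vec::new();
    for i in 0..size {
        result.push(i);
    }
    result
}

// Times `f` over `runs` iterations after a few warmup calls,
// returning the average seconds per iteration.
fn time_it<F: FnMut() -> Vec<i32>>(mut f: F, runs: u32) -> f64 {
    for _ in 0..3 {
        black_box(f()); // warmup: stabilize allocator and cache state
    }
    let start = Instant::now();
    for _ in 0..runs {
        black_box(f());
    }
    start.elapsed().as_secs_f64() / runs as f64
}

fn main() {
    let size = 1_000_000;
    println!("with_new: {:.6} s/iter", time_it(|| with_new(size), 10));
}
```

This won't match criterion's statistical rigor (no outlier detection, no confidence intervals), but it already removes the one-shot measurement problem.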

Related

Faster HashMap for sequential keys

Initially I was very surprised to find out that Rust's HashMap, even with the FNV hasher, was considerably slower than the equivalents in Java, .NET, and PHP. I am talking about optimized release mode, not debug mode. I did some calculations and realized the timings in Java/.NET/PHP were suspiciously low. Then it hit me: even though I was testing with a big hash table (millions of entries), I was reading mostly sequential key values (like 14, 15, 16, ...), which apparently resulted in lots of CPU cache hits, due to the way the standard hash tables (and hash-code functions for integers and short strings) in those languages are implemented, so that entries with nearby keys usually end up in nearby memory locations.
Rust's HashMap, on the other hand, uses the so-called SwissTable implementation, which apparently distributes values differently. When I tested reading by random keys, everything fell into place - the "competitors" scored behind Rust.
So if we are in a situation where we need to perform lots of gets sequentially, for example iterating some DB IDs that are ordered and mostly sequential (with not too many gaps), is there a good Rust hash map implementation that can compete with Java's HashMap or .NET's Dictionary?
P.S. As requested in the comments, I paste an example here. I ran lots of tests, but here is a simple example that takes 75 ms in Rust (release mode) and 20 ms in Java:
In Rust:
let hm: FnvHashMap<i32, i32> = ...;

// Start timer here
let mut sum: i64 = 0;
for i in 0..1_000_000 {
    if let Some(x) = hm.get(&i) {
        sum += *x as i64;
    }
}
println!("The sum is: {}", sum);
In Java:
Map<Integer, Integer> hm = ...;

// Start timer here
long sum = 0;
for (int i = 0; i < 1_000_000; i++) {
    sum += hm.get(i);
}
With HashMap<i32, i32> and its default SipHash hasher it took 190 ms. I know why it's slower than FnvHashMap. I'm just mentioning that for completeness.
First, here is some runnable code to measure the efficiency of the different implementations:
use std::{collections::HashMap, time::Instant};

fn main() {
    let hm: HashMap<i32, i32> = (0..1_000_000).map(|i| (i, i)).collect();
    let t0 = Instant::now();
    let mut sum = 0;
    for i in 0..1_000_000 {
        if let Some(x) = hm.get(&i) {
            sum += x;
        }
    }
    let elapsed = t0.elapsed().as_secs_f64();
    println!("{} - The sum is: {}", elapsed, sum);
}
On the old desktop machine I'm writing this on, it reports 76 ms to run. Since the machine is 10+ years old, I find it baffling that your hardware would take 190 ms to run the same code, so I'm wondering how and what you're actually measuring. But let's ignore that and concentrate on the relative numbers.
When you care about hashmap efficiency in Rust, and when the keys don't come from an untrusted source, the first thing to try should always be to switch to a non-DOS-resistant hash function. One possibility is the FNV hash function from the fnv crate, which you can get by switching HashMap to fnv::FnvHashMap. That brings performance to 34 ms, i.e. a 2.2x speedup.
If this is not enough, you can try the hash from the rustc-hash crate (almost the same as fxhash, but allegedly better maintained), which uses the same function as the Rust compiler, adapted from the hash used by Firefox. It is not based on any formal analysis and performs badly on hash function test suites, but it is reported to consistently outperform FNV. That's confirmed on the above example, where switching from FnvHashMap to rustc_hash::FxHashMap drops the time to 28 ms, i.e. a 2.7x speedup from the initial timing.
Finally, if you want to just imitate what C# and Java do, and couldn't care less about certain patterns of inserted numbers leading to degraded performance, you can use the aptly-named nohash_hasher crate that gives you an identity hash. Changing HashMap<i32, i32> to HashMap<i32, i32, nohash_hasher::BuildNoHashHasher<i32>> drops the time to just under 4 ms, i.e. a whopping 19x speedup from the initial timing.
Since you report the Java example to be 9.5x faster than Rust, a 19x speedup should make your code approximately twice as fast as Java.
Rust's HashMap by default uses an implementation of SipHash as the hash function. SipHash is designed to avoid denial-of-service attacks based on predicting hash collisions, which is an important security property for a hash function used in a hash map.
If you don't need this guarantee, you can use a simpler hash function. One option is using the fxhash crate, which should speed up reading integers from a HashMap<i32, i32> by about a factor of 3.
Other options are implementing your own trivial hash function (e.g. by simply using the identity function, which is a decent hash function for mostly consecutive keys), or using a vector instead of a hash map.
.NET uses the identity function for hashes of Int32 by default, so it's not resistant to hash flooding attacks. Of course this is faster, but the downside is not even mentioned in the documentation of Dictionary. For what it's worth, I prefer Rust's "safe by default" approach over .NET's any day, since many developers aren't even aware of the problems predictable hash functions can cause. Rust still allows you to use a more performant hash function if you don't need the hash flooding protection, so to me personally this seems to be a strength of Rust compared to at least .NET rather than a weakness.
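The earlier suggestion of using a vector instead of a hash map is worth spelling out: for dense, mostly-sequential i32 keys, direct indexing gives exactly the cache behavior the question is after, with no hashing at all. A sketch, assuming the keys fit in a known range:

```rust
fn main() {
    // Dense keys 0..1_000_000 stored by index; None marks a missing key.
    let n = 1_000_000;
    let mut table: Vec<Option<i32>> = vec![None; n];
    for i in 0..n {
        table[i] = Some(i as i32);
    }

    let mut sum: i64 = 0;
    for i in 0..n {
        if let Some(x) = table[i] {
            sum += x as i64;
        }
    }
    println!("The sum is: {}", sum); // same sum as the hash map versions
}
```

The obvious trade-off is memory proportional to the key range rather than the number of entries, so this only pays off when the keys really are dense.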
I decided to run some more tests, based on the suggestions by user4815162342. This time I used another machine with Ubuntu 20.04.
Rust code
println!("----- HashMap (with its default SipHash hasher) -----------");
let hm: HashMap<i32, i32> = (0..1_000_000).map(|i| (i, i)).collect();
for k in 0..6 {
    let t0 = Instant::now();
    let mut sum: i64 = 0;
    for i in 0..1_000_000 {
        if let Some(x) = hm.get(&i) {
            sum += *x as i64;
        }
    }
    let elapsed = t0.elapsed().as_secs_f64();
    println!("The sum is: {}. Time elapsed: {:.3} sec", sum, elapsed);
}

println!("----- FnvHashMap (fnv 1.0.7) ------------------------------");
let hm: FnvHashMap<i32, i32> = (0..1_000_000).map(|i| (i, i)).collect();
for k in 0..6 {
    let t0 = Instant::now();
    let mut sum: i64 = 0;
    for i in 0..1_000_000 {
        if let Some(x) = hm.get(&i) {
            sum += *x as i64;
        }
    }
    let elapsed = t0.elapsed().as_secs_f64();
    println!("The sum is: {}. Time elapsed: {:.3} sec", sum, elapsed);
}

println!("----- FxHashMap (rustc-hash 1.1.0) ------------------------");
let hm: FxHashMap<i32, i32> = (0..1_000_000).map(|i| (i, i)).collect();
for k in 0..6 {
    let t0 = Instant::now();
    let mut sum: i64 = 0;
    for i in 0..1_000_000 {
        if let Some(x) = hm.get(&i) {
            sum += *x as i64;
        }
    }
    let elapsed = t0.elapsed().as_secs_f64();
    println!("The sum is: {}. Time elapsed: {:.3} sec", sum, elapsed);
}

println!("----- HashMap/BuildNoHashHasher (nohash-hasher 0.2.0) -----");
let hm: HashMap<i32, i32, nohash_hasher::BuildNoHashHasher<i32>> = (0..1_000_000).map(|i| (i, i)).collect();
for k in 0..6 {
    let t0 = Instant::now();
    let mut sum: i64 = 0;
    for i in 0..1_000_000 {
        if let Some(x) = hm.get(&i) {
            sum += *x as i64;
        }
    }
    let elapsed = t0.elapsed().as_secs_f64();
    println!("The sum is: {}. Time elapsed: {:.3} sec", sum, elapsed);
}
BTW the last one can be replaced with this shorter type:
let hm: IntMap<i32, i32> = (0..1_000_000).map(|i| (i, i)).collect();
For those interested, this is IntMap's definition:
pub type IntMap<K, V> = std::collections::HashMap<K, V, BuildNoHashHasher<K>>;
Java code
On the same machine I tested a Java example. I don't have a JVM installed on it, so I used a Docker image, adoptopenjdk/openjdk14, and pasted the code below directly into jshell (not sure whether that hurts Java's timings). So this is the Java code:
Map<Integer, Integer> hm = new HashMap<>();
for (int i = 0; i < 1_000_000; i++) {
    hm.put(i, i);
}
for (int k = 0; k < 6; k++) {
    long t0 = System.currentTimeMillis();
    long sum = 0;
    for (int i = 0; i < 1_000_000; i++) {
        sum += hm.get(i);
    }
    System.out.println("The sum is: " + sum + ". Time elapsed: " + (System.currentTimeMillis() - t0) + " ms");
}
Results
Rust (release mode):
----- HashMap (with its default SipHash hasher) -----------
The sum is: 499999500000. Time elapsed: 0.149 sec
The sum is: 499999500000. Time elapsed: 0.140 sec
The sum is: 499999500000. Time elapsed: 0.167 sec
The sum is: 499999500000. Time elapsed: 0.150 sec
The sum is: 499999500000. Time elapsed: 0.261 sec
The sum is: 499999500000. Time elapsed: 0.189 sec
----- FnvHashMap (fnv 1.0.7) ------------------------------
The sum is: 499999500000. Time elapsed: 0.055 sec
The sum is: 499999500000. Time elapsed: 0.052 sec
The sum is: 499999500000. Time elapsed: 0.053 sec
The sum is: 499999500000. Time elapsed: 0.058 sec
The sum is: 499999500000. Time elapsed: 0.051 sec
The sum is: 499999500000. Time elapsed: 0.056 sec
----- FxHashMap (rustc-hash 1.1.0) ------------------------
The sum is: 499999500000. Time elapsed: 0.039 sec
The sum is: 499999500000. Time elapsed: 0.076 sec
The sum is: 499999500000. Time elapsed: 0.064 sec
The sum is: 499999500000. Time elapsed: 0.048 sec
The sum is: 499999500000. Time elapsed: 0.057 sec
The sum is: 499999500000. Time elapsed: 0.061 sec
----- HashMap/BuildNoHashHasher (nohash-hasher 0.2.0) -----
The sum is: 499999500000. Time elapsed: 0.004 sec
The sum is: 499999500000. Time elapsed: 0.003 sec
The sum is: 499999500000. Time elapsed: 0.003 sec
The sum is: 499999500000. Time elapsed: 0.003 sec
The sum is: 499999500000. Time elapsed: 0.003 sec
The sum is: 499999500000. Time elapsed: 0.003 sec
Java:
The sum is: 499999500000. Time elapsed: 49 ms // see notes below
The sum is: 499999500000. Time elapsed: 41 ms // see notes below
The sum is: 499999500000. Time elapsed: 18 ms
The sum is: 499999500000. Time elapsed: 29 ms
The sum is: 499999500000. Time elapsed: 19 ms
The sum is: 499999500000. Time elapsed: 23 ms
(With Java the first 1-2 runs are normally slower, as the JVM HotSpot still hasn't fully optimized the relevant piece of code.)
Try hashbrown
It uses aHash, which has a full comparison against other hashing algorithms here. (Note that since Rust 1.36 the standard library's HashMap is itself based on hashbrown; the main remaining difference is that the hashbrown crate defaults to the faster aHash instead of SipHash.)

Why rayon-based parallel processing takes more time than serial processing?

Learning Rayon, I wanted to compare the performance of parallel calculation and serial calculation of the Fibonacci series. Here's my code:
use rayon;
use std::time::Instant;

fn main() {
    let nth = 30;

    let now = Instant::now();
    let fib = fibonacci_serial(nth);
    println!(
        "[s] The {}th number in the fibonacci sequence is {}, elapsed: {}",
        nth,
        fib,
        now.elapsed().as_micros()
    );

    let now = Instant::now();
    let fib = fibonacci_parallel(nth);
    println!(
        "[p] The {}th number in the fibonacci sequence is {}, elapsed: {}",
        nth,
        fib,
        now.elapsed().as_micros()
    );
}

fn fibonacci_parallel(n: u64) -> u64 {
    if n <= 1 {
        return n;
    }
    let (a, b) = rayon::join(|| fibonacci_parallel(n - 2), || fibonacci_parallel(n - 1));
    a + b
}

fn fibonacci_serial(n: u64) -> u64 {
    if n <= 1 {
        return n;
    }
    fibonacci_serial(n - 2) + fibonacci_serial(n - 1)
}
Run in Rust Playground
I expected the elapsed time of the parallel calculation to be smaller than that of the serial calculation, but the result was the opposite:
# `s` stands for serial calculation and `p` for parallel
[s] The 30th number in the fibonacci sequence is 832040, elapsed: 12127
[p] The 30th number in the fibonacci sequence is 832040, elapsed: 990379
My implementation for serial/parallel calculation would have flaws. But if not, why am I seeing these results?
I think the real reason is that you create an enormous number of rayon tasks, which is not good. Every call of fibonacci_parallel spawns another pair of tasks via rayon::join, and because the closures call fibonacci_parallel again, those spawn yet more pairs, all the way down the recursion - exponentially many tasks in total, each doing a trivial amount of work.
This is utterly terrible for the OS/rayon: the scheduling overhead dwarfs the actual computation.
An approach to solve this problem could be this:
fn fibonacci_parallel(n: u64) -> u64 {
    fn inner(n: u64) -> u64 {
        if n <= 1 {
            return n;
        }
        inner(n - 2) + inner(n - 1)
    }

    if n <= 1 {
        return n;
    }
    let (a, b) = rayon::join(|| inner(n - 2), || inner(n - 1));
    a + b
}
You create just two parallel tasks, which both execute the serial inner function. With this change I get
op#VBOX /t/t/foo> cargo run --release 40
Finished release [optimized] target(s) in 0.03s
Running `target/release/foo 40`
[s] The 40th number in the fibonacci sequence is 102334155, elapsed: 1373741
[p] The 40th number in the fibonacci sequence is 102334155, elapsed: 847343
But as said, for low numbers parallel execution is not worth it:
op#VBOX /t/t/foo> cargo run --release 20
Finished release [optimized] target(s) in 0.02s
Running `target/release/foo 20`
[s] The 10th number in the fibonacci sequence is 6765, elapsed: 82
[p] The 10th number in the fibonacci sequence is 6765, elapsed: 241
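The two-way split above generalizes with a cutoff: recurse in parallel only near the top of the call tree and fall back to the serial function once subproblems get small. A stdlib-only sketch using std::thread::scope (the function names and cutoff value are illustrative; rayon::join would slot in the same way, with far cheaper tasks than OS threads):

```rust
use std::thread;

fn fib_serial(n: u64) -> u64 {
    if n <= 1 { n } else { fib_serial(n - 2) + fib_serial(n - 1) }
}

// Spawn real parallelism only while n is above the cutoff; below it,
// plain recursion is cheaper than any task or thread overhead.
fn fib_parallel(n: u64, cutoff: u64) -> u64 {
    if n <= cutoff {
        return fib_serial(n);
    }
    thread::scope(|s| {
        let handle = s.spawn(|| fib_parallel(n - 2, cutoff));
        let b = fib_parallel(n - 1, cutoff);
        handle.join().unwrap() + b
    })
}

fn main() {
    // Only the calls with n > 24 spawn threads - a handful in total.
    println!("{}", fib_parallel(30, 24)); // prints 832040
}
```

This keeps the number of spawned units proportional to the number of cores' worth of useful work rather than to the size of the recursion tree.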

Why is my for loop code slower than an iterator?

I am trying to solve the leetcode problem distribute-candies. It is easy: just find the minimum of the number of candy kinds and half the total number of candies.
Here's my solution (cost 48ms):
use std::collections::HashSet;

pub fn distribute_candies(candies: Vec<i32>) -> i32 {
    let sister_candies = (candies.len() / 2) as i32;
    let mut kind = 0;
    let mut candies_kinds = HashSet::new();

    for candy in candies.into_iter() {
        if candies_kinds.insert(candy) {
            kind += 1;
            if kind > sister_candies {
                return sister_candies;
            }
        }
    }
    kind
}
However, I found a solution using an iterator:
use std::collections::HashSet;
use std::cmp::min;

pub fn distribute_candies(candies: Vec<i32>) -> i32 {
    min(candies.iter().collect::<HashSet<_>>().len(), candies.len() / 2) as i32
}
and it costs 36ms.
I can't quite understand why the iterator solution is faster than my for loop solution. Are there some magic optimizations that Rust is performing in the background?
The main difference is that the iterator version internally uses Iterator::size_hint to determine how much space to reserve in the HashSet before collecting into it. This prevents repeatedly having to reallocate as the set grows.
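The mechanism is easy to observe directly: slice iterators report their exact length through size_hint, and collect passes that hint along so the HashSet can reserve enough room up front. A small sketch:

```rust
use std::collections::HashSet;

fn main() {
    let candies: Vec<i32> = (0..1_000).collect();

    // Slice iterators know exactly how many items remain.
    assert_eq!(candies.iter().size_hint(), (1_000, Some(1_000)));

    // collect() uses that hint, so the set starts with enough room
    // and never has to reallocate while filling up.
    let kinds: HashSet<&i32> = candies.iter().collect();
    assert!(kinds.capacity() >= 1_000);

    println!("capacity after collect: {}", kinds.capacity());
}
```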
You can do the same using HashSet::with_capacity instead of HashSet::new:
let mut candies_kinds = HashSet::with_capacity(candies.len());
In my benchmark this single change makes your code significantly faster than the iterator. However, if I simplify your code to remove the early bailout optimisation, it runs in almost exactly the same time as the iterator version.
pub fn distribute_candies(candies: &[i32]) -> i32 {
    let sister_candies = (candies.len() / 2) as i32;
    let mut candies_kinds = HashSet::with_capacity(candies.len());

    for candy in candies.iter() {
        candies_kinds.insert(candy);
    }
    sister_candies.min(candies_kinds.len() as i32)
}
Timings:
test tests::bench_iter ... bench: 262,315 ns/iter (+/- 23,704)
test tests::bench_loop ... bench: 307,697 ns/iter (+/- 16,119)
test tests::bench_loop_with_capacity ... bench: 112,194 ns/iter (+/- 18,295)
test tests::bench_loop_with_capacity_no_bailout ... bench: 259,961 ns/iter (+/- 17,712)
This suggests to me that the HashSet preallocation is the dominant difference. Your additional optimisation also proves to be very effective - at least with the dataset that I happened to choose.

What part of my code to filter prime numbers causes it to slow down as it processes?

I am doing some problems on Project Euler. This challenge requires filtering prime numbers from an array. I was halfway near my solution when I found out that Rust was a bit slow. I added a progressbar to check the progress.
Here is the code:
extern crate pbr;
use self::pbr::ProgressBar;

pub fn is_prime(i: i32) -> bool {
    for d in 2..i {
        if i % d == 0 {
            return false;
        }
    }
    true
}

pub fn calc_sum_loop(max_num: i32) -> i32 {
    let mut pb = ProgressBar::new(max_num as u64);
    pb.format("[=>_]");
    let mut sum_primes = 0;
    for i in 1..max_num {
        if is_prime(i) {
            sum_primes += i;
        }
        pb.inc();
    }
    sum_primes
}

pub fn solve() {
    println!("About to calculate sum of primes in the first 20000");
    println!("When using a forloop {:?}", calc_sum_loop(400000));
}
I am calling the solve function from my main.rs file. It turns out that my for loop runs through iterations much faster in the beginning and much slower later on.
➜ euler-rust git:(master) ✗ cargo run --release
Finished release [optimized] target(s) in 0.05s
Running `target/release/euler-rust`
About to calculate sum of primes..
118661 / 400000 [===========>__________________________] 29.67 % 48780.25/s 6s
...
...
400000 / 400000 [=======================================] 100.00 % 23725.24/s
I am sort of drawing a blank in what might be causing this slowdown. It feels like Rust should be able to be much faster than what I am currently seeing. Note that I am telling Cargo to build with the --release flag. I am aware that not doing this might slow things down even further.
The function which is slowing the execution is:
is_prime(i: i32)
It does trial division by every d in 2..i, so a single call costs O(i) work: the larger the candidates get, the more divisions each one takes, which is exactly the progressive slowdown your progress bar shows. You may consider using a more efficient crate like primes, or you can read up on efficient prime-checking algorithms here.
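To make that concrete, here is a sketch of a cheaper test: trial division only needs to check divisors up to sqrt(i) and can skip even numbers, bringing each call from O(i) down to O(sqrt(i)). (It also fixes an edge case: the original version returns true for 1, which the `for i in 1..max_num` loop then adds to the sum.)

```rust
pub fn is_prime(i: i32) -> bool {
    if i < 2 {
        return false; // the original version wrongly returns true for 0 and 1
    }
    if i % 2 == 0 {
        return i == 2;
    }
    // Any composite i has a factor <= sqrt(i), so stop there.
    let mut d = 3;
    while d * d <= i {
        if i % d == 0 {
            return false;
        }
        d += 2;
    }
    true
}

fn main() {
    let primes: Vec<i32> = (0..30).filter(|&i| is_prime(i)).collect();
    println!("{:?}", primes); // prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
}
```

For a full Project Euler-style sum over a fixed range, a sieve of Eratosthenes would be faster still, but this drop-in change alone removes the "slower as it goes" effect almost entirely.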

Why is the Rust random number generator slower with multiple instances running?

I am doing some random number generation for my Lotto Simulation and was wondering why would it be MUCH slower when running multiple instances?
I am running this program under Ubuntu 15.04 (linux kernel 4.2). rustc 1.7.0-nightly (d5e229057 2016-01-04)
Overall CPU utilization is about 45% during these tests, but each individual process is taking up 100% of the CPU thread it runs on.
Here is my script I am using to start multiple instances at the same time.
#!/usr/bin/env bash
pkill lotto_sim
for _ in `seq 1 14`; do
    ./lotto_sim 15000000 1>> /var/log/syslog &
done
Output:
Took PT38.701900316S seconds to generate 15000000 random tickets
Took PT39.193917241S seconds to generate 15000000 random tickets
Took PT39.412279484S seconds to generate 15000000 random tickets
Took PT39.492940352S seconds to generate 15000000 random tickets
Took PT39.715433024S seconds to generate 15000000 random tickets
Took PT39.726609237S seconds to generate 15000000 random tickets
Took PT39.884151996S seconds to generate 15000000 random tickets
Took PT40.025874106S seconds to generate 15000000 random tickets
Took PT40.088332517S seconds to generate 15000000 random tickets
Took PT40.112601899S seconds to generate 15000000 random tickets
Took PT40.205958636S seconds to generate 15000000 random tickets
Took PT40.227956170S seconds to generate 15000000 random tickets
Took PT40.393753486S seconds to generate 15000000 random tickets
Took PT40.465173616S seconds to generate 15000000 random tickets
However, a single run gives this output:
$ ./lotto_sim 15000000
Took PT9.860698141S seconds to generate 15000000 random tickets
My understanding is that each process has its own memory and doesn't share anything. Correct?
Here is the relevant code:
extern crate rand;
extern crate itertools;
extern crate time;

use std::env;
use rand::{Rng, Rand};
use itertools::Itertools;
use time::PreciseTime;

struct Ticket {
    whites: Vec<u8>,
    power_ball: u8,
    is_power_play: bool,
}

const POWER_PLAY_PERCENTAGE: u8 = 15;
const WHITE_MIN: u8 = 1;
const WHITE_MAX: u8 = 69;
const POWER_BALL_MIN: u8 = 1;
const POWER_BALL_MAX: u8 = 26;

impl Rand for Ticket {
    fn rand<R: Rng>(rng: &mut R) -> Self {
        let pp_guess = rng.gen_range(0, 100);
        let pp_value = pp_guess < POWER_PLAY_PERCENTAGE;
        let mut whites_vec: Vec<_> = (0..)
            .map(|_| rng.gen_range(WHITE_MIN, WHITE_MAX + 1))
            .unique()
            .take(5)
            .collect();
        whites_vec.sort();
        let pb_value = rng.gen_range(POWER_BALL_MIN, POWER_BALL_MAX + 1);
        Ticket { whites: whites_vec, power_ball: pb_value, is_power_play: pp_value }
    }
}

fn gen_test(num_tickets: i64) {
    let mut rng = rand::thread_rng();
    let _: Vec<_> = rng.gen_iter::<Ticket>()
        .take(num_tickets as usize)
        .collect();
}

fn main() {
    let args: Vec<_> = env::args().collect();
    let num_tickets: i64 = args[1].parse::<i64>().unwrap();
    let start = PreciseTime::now();
    gen_test(num_tickets);
    let end = PreciseTime::now();
    println!("Took {} seconds to generate {} random tickets", start.to(end), num_tickets);
}
Edit:
Maybe a better question would be: how do I debug and figure this out? Where would I look, within the program or within my OS, to find the performance hindrances? I am new to Rust and to lower-level programming like this that relies so heavily on the OS.
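As for where to look: on the OS side, running something like `perf stat -d` against one of the processes while the others are active will show whether cache misses and stalled cycles climb under contention. Inside the program, one concrete thing to check is that every Ticket heap-allocates a Vec<u8> for its five white balls, so 14 processes each making 15 million allocations will hammer the allocator and memory bandwidth simultaneously. A hypothetical, stdlib-only sketch of an allocation-free ticket (the XorShift type is a stand-in RNG purely to keep the example self-contained, not a replacement for a real generator):

```rust
// Stand-in RNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    // Half-open range [lo, hi), mirroring the old rand::gen_range.
    fn gen_range(&mut self, lo: u8, hi: u8) -> u8 {
        lo + (self.next_u64() % (hi - lo) as u64) as u8
    }
}

// Fixed-size array instead of Vec: no heap allocation per ticket.
struct Ticket {
    whites: [u8; 5],
    power_ball: u8,
    is_power_play: bool,
}

fn gen_ticket(rng: &mut XorShift) -> Ticket {
    let mut whites = [0u8; 5];
    let mut n = 0;
    while n < 5 {
        let w = rng.gen_range(1, 70);
        if !whites[..n].contains(&w) {
            whites[n] = w;
            n += 1;
        }
    }
    whites.sort();
    Ticket {
        whites,
        power_ball: rng.gen_range(1, 27),
        is_power_play: rng.gen_range(0, 100) < 15,
    }
}

fn main() {
    let mut rng = XorShift(0x9E3779B97F4A7C15);
    let t = gen_ticket(&mut rng);
    println!("whites: {:?}, power ball: {}, power play: {}",
             t.whites, t.power_ball, t.is_power_play);
}
```

If the multi-instance slowdown shrinks with a change like this, allocator/memory contention was the culprit; if not, the next suspects are shared memory bandwidth and hyperthreaded cores (14 processes on fewer physical cores).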