Speedup counter game - performance

I'm trying to solve a Rust algorithm question on HackerRank. My answer times out on some of the larger test cases. About 5 people have completed it, so I believe it is possible, and I assume they compile in release mode. Are there any speed-ups I'm missing?
The gist of the game: a counter (inp in main) is conditionally reduced, and the winner is decided by who can no longer reduce it.
use std::io;

fn main() {
    let n: usize = read_one_line().trim().parse().unwrap();
    for _i in 0..n {
        let inp: u64 = read_one_line().trim().parse().unwrap();
        println!("{:?}", find_winner(inp));
    }
    return;
}
fn find_winner(mut n: u64) -> String {
    let mut win = 0;
    while n > 1 {
        if n.is_power_of_two() {
            n /= 2;
        } else {
            n -= n.next_power_of_two() / 2;
        }
        win += 1;
    }
    let winner = if win % 2 == 0 {
        String::from("Richard")
    } else {
        String::from("Louise")
    };
    winner
}
fn read_one_line() -> String {
    let mut input = String::new();
    io::stdin().read_line(&mut input).expect("Failed to read");
    input
}

Your inner loop can be replaced by a combination of builtin functions:
let win = if n > 0 {
    n.count_ones() + n.trailing_zeros() - 1
} else {
    0
};
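To see why this closed form matches the loop: each "subtract the largest power of two below n" step clears one set bit, and once a single bit remains, each halving removes one trailing zero. A quick sanity check (my sketch, using hypothetical helper names):

```rust
// Count moves with the original loop from the question.
fn moves_loop(mut n: u64) -> u32 {
    let mut win = 0;
    while n > 1 {
        if n.is_power_of_two() {
            n /= 2; // halve a power of two
        } else {
            n -= n.next_power_of_two() / 2; // clear the highest set bit
        }
        win += 1;
    }
    win
}

// Closed form: (count_ones - 1) bit-clearing steps, then trailing_zeros halvings.
fn moves_closed(n: u64) -> u32 {
    if n > 0 {
        n.count_ones() + n.trailing_zeros() - 1
    } else {
        0
    }
}

fn main() {
    for n in 1..10_000u64 {
        assert_eq!(moves_loop(n), moves_closed(n));
    }
}
```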
Also, instead of allocating a string every time find_winner is called,
a string slice may be returned:
fn find_winner(n: u64) -> &'static str {
    let win = if n > 0 {
        n.count_ones() + n.trailing_zeros() - 1
    } else {
        0
    };
    if win % 2 == 0 {
        "Richard"
    } else {
        "Louise"
    }
}

Avoiding memory allocation can help speed up the application.
At the moment, the read_one_line function performs one memory allocation per call, which can be avoided if you supply the String as a &mut parameter:
fn read_one_line(input: &mut String) -> &str {
    input.clear(); // read_line appends, so discard any previous contents first
    io::stdin().read_line(input).expect("Failed to read");
    input
}
Note how I also altered the return type to return a slice (which borrows input): callers here do not need to modify the original string.
Another improvement is I/O. Rust is all about explicitness, and it means that io::stdin() is raw I/O: each call to read_line triggers interaction with the kernel.
You can (and should) instead use buffered I/O with std::io::BufReader. Build it once, then pass it as an argument:
fn read_one_line<'a, R>(reader: &mut R, input: &'a mut String) -> &'a str
where
    R: io::BufRead,
{
    input.clear(); // read_line appends, so discard any previous contents first
    reader.read_line(input).expect("Failed to read");
    input
}
Note:
it's easier to make it generic (R) than to specify the exact type of BufReader :)
annotating the lifetime is mandatory because the return type could borrow either parameter
Putting it altogether:
fn read_one_line<'a, R>(reader: &mut R, input: &'a mut String) -> &'a str
where
    R: io::BufRead,
{
    input.clear(); // read_line appends, so discard any previous contents first
    reader.read_line(input).expect("Failed to read");
    input
}
fn main() {
    let mut reader = io::BufReader::new(io::stdin());
    let mut input = String::new();
    let n: usize = read_one_line(&mut reader, &mut input).trim().parse().unwrap();
    for _i in 0..n {
        let inp: u64 = read_one_line(&mut reader, &mut input).trim().parse().unwrap();
        println!("{:?}", find_winner(inp));
    }
    return;
}
The bigger win is probably I/O; it might even be sufficient by itself.
Don't forget to also apply #John's advice, so that you'll be allocation-free in your main loop!
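As an aside (my sketch, not part of the original answer): making the reader generic over BufRead also means the function can be exercised with an in-memory reader such as io::Cursor, since both StdinLock and Cursor implement BufRead. Note the input.clear() guarding against read_line's appending behavior:

```rust
use std::io::{BufRead, Cursor};

fn read_one_line<'a, R: BufRead>(reader: &mut R, input: &'a mut String) -> &'a str {
    input.clear(); // read_line appends, so discard any previous contents first
    reader.read_line(input).expect("Failed to read");
    input
}

fn main() {
    // Any BufRead works: an in-memory Cursor here, io::stdin().lock() in real code.
    let mut reader = Cursor::new("2\n17\n");
    let mut input = String::new();
    assert_eq!(read_one_line(&mut reader, &mut input).trim(), "2");
    assert_eq!(read_one_line(&mut reader, &mut input).trim(), "17");
}
```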

Related

Random string generator with minimal allocations

I want to generate a large file of pseudo-random ASCII characters given the parameters: size per line and number of lines. I cannot figure out a way to do this without allocating new Strings for each line. This is what I have: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=42f5b803910e3a15ff20561117bf9176
use rand::{Rng, SeedableRng};
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let mut data: Vec<u8> = Vec::new();
    write_random_lines(&mut data, 10, 10)?;
    println!("{}", std::str::from_utf8(&data)?);
    Ok(())
}
fn write_random_lines<W>(
    file: &mut W,
    line_size: usize,
    line_count: usize,
) -> Result<(), Box<dyn Error>>
where
    W: std::io::Write,
{
    for _ in 0..line_count {
        let mut s: String = rand::rngs::SmallRng::from_entropy()
            .sample_iter(rand::distributions::Alphanumeric)
            .take(line_size)
            .collect();
        s.push('\n');
        file.write(s.as_bytes())?;
    }
    Ok(())
}
I'm creating a new String every line, so I believe this is not memory efficient. There is fn fill_bytes(&mut self, dest: &mut [u8]), but it works on raw bytes.
I would preferably not create a new SmallRng for each line, but it is used in a loop and SmallRng cannot be copied.
How can I generate a random file in a more memory and time efficient way?
You can easily reuse a String in a loop by creating it outside the loop and clearing it after using the contents:
// Use Kevin's suggestion not to make a new `SmallRng` each time:
let mut rng_iter =
rand::rngs::SmallRng::from_entropy().sample_iter(rand::distributions::Alphanumeric);
let mut s = String::with_capacity(line_size + 1); // allocate the buffer
for _ in 0..line_count {
s.extend(rng_iter.by_ref().take(line_size)); // fill the buffer
s.push('\n');
file.write(s.as_bytes())?; // use the contents
s.clear(); // clear the buffer
}
String::clear erases the contents of the String (dropping if necessary), but does not free its backing buffer, so it can be reused without needing to reallocate.
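A small illustration of that guarantee (clear_keeps_buffer is a hypothetical helper name, not from the answer):

```rust
// Show that String::clear resets the length but keeps the backing allocation.
fn clear_keeps_buffer() -> (usize, usize) {
    let mut s = String::with_capacity(16);
    s.push_str("hello");
    s.clear(); // contents gone...
    (s.len(), s.capacity()) // ...allocation kept
}

fn main() {
    let (len, cap) = clear_keeps_buffer();
    assert_eq!(len, 0);
    assert!(cap >= 16);
}
```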
See also
Weird behaviour when using read_line in a loop
Why does Iterator::take_while take ownership of the iterator? explains why by_ref is needed
This modification of your code does not allocate any Strings and also does not construct a new SmallRng each time, but I have not benchmarked it:
fn write_random_lines<W>(
    file: &mut W,
    line_size: usize,
    line_count: usize,
) -> Result<(), Box<dyn Error>>
where
    W: std::io::Write,
{
    // One random data iterator.
    let mut rng_iter = rand::rngs::SmallRng::from_entropy()
        .sample_iter(rand::distributions::Alphanumeric);

    // Temporary storage for the encoding of chars. If the characters used
    // are not all ASCII then its size should be increased to 4.
    let mut char_buffer = [0; 1];

    for _ in 0..line_count {
        for _ in 0..line_size {
            file.write(
                rng_iter
                    .next()
                    .unwrap() // iterator is infinite so this never fails
                    .encode_utf8(&mut char_buffer)
                    .as_bytes(),
            )?;
        }
        file.write("\n".as_bytes())?;
    }
    Ok(())
}
I am new to Rust so it may be missing some ways to tidy it up. Also, note that this writes only one character at a time; if your W is more expensive per operation than an in-memory buffer, you probably want to wrap it in std::io::BufWriter, which will batch writes to the destination (using a buffer that needs to be allocated, but only once).
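The BufWriter suggestion can be sketched like this (a minimal illustration against any Write destination; write_lines is a hypothetical helper, not from the original code):

```rust
use std::io::{BufWriter, Write};

// Batch many small writes through an in-memory buffer.
fn write_lines<W: Write>(out: W, line_count: usize) -> std::io::Result<()> {
    let mut buffered = BufWriter::new(out); // one buffer allocation, reused for all writes
    for i in 0..line_count {
        writeln!(buffered, "line {}", i)?; // small writes hit the buffer, not the OS
    }
    buffered.flush() // push any remaining bytes to the underlying writer
}

fn main() -> std::io::Result<()> {
    let mut data = Vec::new();
    write_lines(&mut data, 3)?;
    assert_eq!(data, b"line 0\nline 1\nline 2\n".to_vec());
    Ok(())
}
```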
I (MakotoE) benchmarked Kevin Reid's answer, and their method is faster, though memory allocation seems to be about the same.
Benchmarking time-wise:
#[cfg(test)]
mod tests {
    extern crate test;
    use test::Bencher;
    use super::*;

    #[bench]
    fn bench_write_random_lines0(b: &mut Bencher) {
        let mut data: Vec<u8> = Vec::new();
        data.reserve(100 * 1000000);
        b.iter(|| {
            write_random_lines0(&mut data, 100, 1000000).unwrap();
            data.clear();
        });
    }

    #[bench]
    fn bench_write_random_lines1(b: &mut Bencher) {
        let mut data: Vec<u8> = Vec::new();
        data.reserve(100 * 1000000);
        b.iter(|| {
            // This is Kevin's implementation
            write_random_lines1(&mut data, 100, 1000000).unwrap();
            data.clear();
        });
    }
}
test tests::bench_write_random_lines0 ... bench: 764,953,658 ns/iter (+/- 7,597,989)
test tests::bench_write_random_lines1 ... bench: 360,662,595 ns/iter (+/- 886,456)
Benchmarking memory usage using valgrind's Massif shows that both are about the same. Mine used 3.072 Gi total, 101.0 MB at peak level. Kevin's used 4.166 Gi total, 128.0 MB peak.

How to loop certain (variable) number of times?

This question may seem extremely basic, but I'm having a hard time figuring out how to do this. I have an integer, and I need to use a for loop that iterates that many times.
First, I tried -
fn main() {
    let number = 10; // Any value is ok
    for num in number {
        println!("success");
    }
}
this prints the error
error[E0277]: `{integer}` is not an iterator
--> src/main.rs:3:16
|
3 | for num in number{
| ^^^^^^ `{integer}` is not an iterator
|
= help: the trait `std::iter::Iterator` is not implemented for `{integer}`
= note: if you want to iterate between `start` until a value `end`, use the exclusive range syntax `start..end` or the inclusive range syntax `start..=end`
= note: required by `std::iter::IntoIterator::into_iter`
Next, I tried -
fn main() {
    let number = 10; // Any value is ok
    for num in number.iter() {
        println!("success");
    }
}
the compiler says there is no method iter for integer
error[E0599]: no method named `iter` found for type `{integer}` in the current scope
--> src/main.rs:3:23
|
3 | for num in number.iter() {
| ^^^^
How am I supposed to do this?
This is because you are telling the compiler to iterate over a num contained in number, but number is neither an iterator nor does it implement iter. What you want is to iterate over the range 0..number, which is an iterator.
The documentation describes the for loop as:
for loop_variable in iterator {
    code()
}
Change the code to:
fn main() {
    let number = 10;
    for num in 0..number { // change it to get a range
        println!("success");
    }
}
You can also change it to:
fn main() {
    let number = 10;
    for num in 1..=number { // inclusive range
        println!("success");
    }
}
Or to:
fn main() {
    let number = 10;
    for _ in 0..number { // where _ is a "throw away" variable
        println!("success");
    }
}
Also see the documentation for for loops.
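Since a range is an ordinary iterator, the usual iterator adapters also apply to it; a small illustration:

```rust
fn main() {
    let number = 4;
    // A range is an ordinary iterator, so adapters like rev and step_by work:
    let forwards: Vec<i32> = (0..number).collect();
    let backwards: Vec<i32> = (0..number).rev().collect();
    let evens: Vec<i32> = (0..number).step_by(2).collect();
    assert_eq!(forwards, [0, 1, 2, 3]);
    assert_eq!(backwards, [3, 2, 1, 0]);
    assert_eq!(evens, [0, 2]);
}
```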
I would like to share this closure-based groovy-inspired way of looping n times in Rust.
pub trait Times {
    fn times(&self, f: fn(Self));
}

impl Times for u8 {
    fn times(&self, f: fn(u8)) {
        for x in 0..*self {
            f(x)
        }
    }
}

fn main() {
    const K: u8 = 7;
    4.times(|v: u8| { println!("Inline Closure {v}.{K}"); });
}
The output is:
Inline Closure 0.7
Inline Closure 1.7
Inline Closure 2.7
Inline Closure 3.7
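A possible generalization (my sketch, not from the original post): accepting any FnMut instead of a bare fn pointer also allows closures that capture their environment, which would not coerce to fn(u8):

```rust
pub trait Times {
    fn times<F: FnMut(Self)>(&self, f: F)
    where
        Self: Sized;
}

impl Times for u8 {
    fn times<F: FnMut(u8)>(&self, mut f: F) {
        for x in 0..*self {
            f(x)
        }
    }
}

fn main() {
    let mut collected = Vec::new();
    // This closure mutably borrows `collected`, so it is FnMut but not fn(u8).
    3u8.times(|v| collected.push(v));
    assert_eq!(collected, [0, 1, 2]);
}
```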
Rust for loops take an iterator (actually, anything that can be converted into an iterator). A lone integer cannot be converted into an iterator, but a range can.
fn main() {
    let number = 10; // Any value is ok
    for num in 0..number {
        println!("success");
    }
}
https://play.integer32.com/?version=stable&mode=debug&edition=2018&gist=029803cf8ac6efaa3113b2f32ae6ef0d

How do I partially sort a Vec or slice?

I need to get the top N items from a Vec, which is quite large in production. Currently I do it in this inefficient way:
let mut v = vec![6, 4, 3, 7, 2, 1, 5];
v.sort_unstable();
v = v[0..3].to_vec();
In C++, I'd use std::partial_sort, but I can't find an equivalent in the Rust docs.
Am I just overlooking it, or does it not exist (yet)?
The standard library doesn't contain this functionality, but it looks like the lazysort crate is exactly what you need:
So what's the point of lazy sorting? As per the linked blog post, they're useful when you do not need or intend to need every value; for example you may only need the first 1,000 ordered values from a larger set.
#![feature(test)]
extern crate lazysort;
extern crate rand;
extern crate test;

use std::cmp::Ordering;

trait SortLazy<T> {
    fn sort_lazy<F>(&mut self, cmp: F, n: usize)
    where
        F: Fn(&T, &T) -> Ordering;
    unsafe fn sort_lazy_fast<F>(&mut self, cmp: F, n: usize)
    where
        F: Fn(&T, &T) -> Ordering;
}

impl<T> SortLazy<T> for [T] {
    fn sort_lazy<F>(&mut self, cmp: F, n: usize)
    where
        F: Fn(&T, &T) -> Ordering,
    {
        fn sort_lazy<F, T>(data: &mut [T], accu: &mut usize, cmp: &F, n: usize)
        where
            F: Fn(&T, &T) -> Ordering,
        {
            if !data.is_empty() && *accu < n {
                let mut pivot = 1;
                let mut lower = 0;
                let mut upper = data.len();
                while pivot < upper {
                    match cmp(&data[pivot], &data[lower]) {
                        Ordering::Less => {
                            data.swap(pivot, lower);
                            lower += 1;
                            pivot += 1;
                        }
                        Ordering::Greater => {
                            upper -= 1;
                            data.swap(pivot, upper);
                        }
                        Ordering::Equal => pivot += 1,
                    }
                }
                sort_lazy(&mut data[..lower], accu, cmp, n);
                sort_lazy(&mut data[upper..], accu, cmp, n);
            } else {
                *accu += 1;
            }
        }
        sort_lazy(self, &mut 0, &cmp, n);
    }

    unsafe fn sort_lazy_fast<F>(&mut self, cmp: F, n: usize)
    where
        F: Fn(&T, &T) -> Ordering,
    {
        fn sort_lazy<F, T>(data: &mut [T], accu: &mut usize, cmp: &F, n: usize)
        where
            F: Fn(&T, &T) -> Ordering,
        {
            if !data.is_empty() && *accu < n {
                unsafe {
                    use std::mem::swap;
                    let mut pivot = 1;
                    let mut lower = 0;
                    let mut upper = data.len();
                    while pivot < upper {
                        match cmp(data.get_unchecked(pivot), data.get_unchecked(lower)) {
                            Ordering::Less => {
                                swap(
                                    &mut *(data.get_unchecked_mut(pivot) as *mut T),
                                    &mut *(data.get_unchecked_mut(lower) as *mut T),
                                );
                                lower += 1;
                                pivot += 1;
                            }
                            Ordering::Greater => {
                                upper -= 1;
                                swap(
                                    &mut *(data.get_unchecked_mut(pivot) as *mut T),
                                    &mut *(data.get_unchecked_mut(upper) as *mut T),
                                );
                            }
                            Ordering::Equal => pivot += 1,
                        }
                    }
                    sort_lazy(&mut data[..lower], accu, cmp, n);
                    sort_lazy(&mut data[upper..], accu, cmp, n);
                }
            } else {
                *accu += 1;
            }
        }
        sort_lazy(self, &mut 0, &cmp, n);
    }
}
#[cfg(test)]
mod tests {
    use test::Bencher;
    use lazysort::Sorted;
    use std::collections::BinaryHeap;
    use SortLazy;
    use rand::{thread_rng, Rng};

    const SIZE_VEC: usize = 100_000;
    const N: usize = 42;

    #[bench]
    fn sort(b: &mut Bencher) {
        b.iter(|| {
            let mut rng = thread_rng();
            let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
                .take(SIZE_VEC)
                .collect();
            v.sort_unstable();
        })
    }

    #[bench]
    fn lazysort(b: &mut Bencher) {
        b.iter(|| {
            let mut rng = thread_rng();
            let v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
                .take(SIZE_VEC)
                .collect();
            let _: Vec<_> = v.iter().sorted().take(N).collect();
        })
    }

    #[bench]
    fn lazysort_in_place(b: &mut Bencher) {
        b.iter(|| {
            let mut rng = thread_rng();
            let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
                .take(SIZE_VEC)
                .collect();
            v.sort_lazy(i32::cmp, N);
        })
    }

    #[bench]
    fn lazysort_in_place_fast(b: &mut Bencher) {
        b.iter(|| {
            let mut rng = thread_rng();
            let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
                .take(SIZE_VEC)
                .collect();
            unsafe { v.sort_lazy_fast(i32::cmp, N) };
        })
    }

    #[bench]
    fn binaryheap(b: &mut Bencher) {
        b.iter(|| {
            let mut rng = thread_rng();
            let v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
                .take(SIZE_VEC)
                .collect();
            let mut iter = v.iter();
            let mut heap: BinaryHeap<_> = iter.by_ref().take(N).collect();
            for i in iter {
                heap.push(i);
                heap.pop();
            }
            let _ = heap.into_sorted_vec();
        })
    }
}
running 5 tests
test tests::binaryheap ... bench: 3,283,938 ns/iter (+/- 413,805)
test tests::lazysort ... bench: 1,669,229 ns/iter (+/- 505,528)
test tests::lazysort_in_place ... bench: 1,781,007 ns/iter (+/- 443,472)
test tests::lazysort_in_place_fast ... bench: 1,652,103 ns/iter (+/- 691,847)
test tests::sort ... bench: 5,600,513 ns/iter (+/- 711,927)
test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out
This code shows that lazysort is faster than the solution with BinaryHeap. We can also see that the BinaryHeap solution gets worse as N increases.
The problem with lazysort is that it creates a second Vec<_>. A "better" solution would be to implement the partial sort in place; I provided an example of such an implementation above.
Keep in mind that all these solutions come with overhead. When N is about SIZE_VEC / 3, the classic sort wins.
You could submit an RFC/issue to ask about adding this feature to the standard library.
There is select_nth_unstable, the equivalent of C++'s std::nth_element. Its result can then be sorted to achieve what you want.
Example:
let mut v = vec![6, 4, 3, 7, 2, 1, 5];
let top_three = v.select_nth_unstable(3).0;
top_three.sort();
3 here is the index of the "nth" element, so we're actually picking the 4th element. That's because select_nth_unstable returns a tuple of:
- a slice to the left of the nth element
- a reference to the nth element
- a slice to the right of the nth element
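Put into a complete, checkable form with the question's example vector (top_three_smallest is a hypothetical helper name):

```rust
// Get the three smallest elements via select_nth_unstable, then sort just those.
fn top_three_smallest(mut v: Vec<i32>) -> Vec<i32> {
    // Index 3 is the pivot: afterwards, everything left of it is <= v[3].
    let (left, _nth, _right) = v.select_nth_unstable(3);
    let mut top = left.to_vec();
    top.sort_unstable(); // only the three extracted elements get sorted
    top
}

fn main() {
    let v = vec![6, 4, 3, 7, 2, 1, 5];
    assert_eq!(top_three_smallest(v), [1, 2, 3]);
}
```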

Why can't I mutably borrow a primitive from an enum?

I would like to be able to obtain references (both immutable and mutable) to the usize wrapped in Bar in the Foo enum:
use Foo::*;

#[derive(Debug, PartialEq, Clone)]
pub enum Foo {
    Bar(usize)
}

impl Foo {
    /* this works */
    fn get_bar_ref(&self) -> &usize {
        match *self {
            Bar(ref n) => &n
        }
    }

    /* this doesn't */
    fn get_bar_ref_mut(&mut self) -> &mut usize {
        match *self {
            Bar(ref mut n) => &mut n
        }
    }
}
But I can't obtain the mutable reference because:
n does not live long enough
I was able to provide both variants of similar functions accessing other contents of Foo that are Boxed - why does the mutable borrow (and why only it) fail with an unboxed primitive?
You need to replace Bar(ref mut n) => &mut n with Bar(ref mut n) => n.
When you use ref mut n in Bar(ref mut n), it creates a mutable reference to the data in Bar, so the type of n is &mut usize.
You then try to return &mut n, of type &mut &mut usize.
(This part is most likely incorrect.)
Now deref coercion kicks in and converts &mut n into &mut *n, creating a temporary value *n of type usize, which doesn't live long enough.
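With that change, both accessors compile; a minimal sketch of the corrected impl (derives omitted):

```rust
use Foo::*;

pub enum Foo {
    Bar(usize),
}

impl Foo {
    fn get_bar_ref(&self) -> &usize {
        match *self {
            Bar(ref n) => n, // n is already &usize
        }
    }

    fn get_bar_ref_mut(&mut self) -> &mut usize {
        match *self {
            Bar(ref mut n) => n, // n is already &mut usize; no extra &mut
        }
    }
}

fn main() {
    let mut foo = Bar(1);
    *foo.get_bar_ref_mut() += 41;
    assert_eq!(*foo.get_bar_ref(), 42);
}
```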
These examples show the sample problem:
fn implicit_reborrow<T>(x: &mut T) -> &mut T {
    x
}

fn explicit_reborrow<T>(x: &mut T) -> &mut T {
    &mut *x
}

fn implicit_reborrow_bad<T>(x: &mut T) -> &mut T {
    &mut x
}

fn explicit_reborrow_bad<T>(x: &mut T) -> &mut T {
    &mut **&mut x
}
The explicit_ versions show what the compiler deduces through deref coercions.
The _bad versions both error in the exact same way, while the other two compile.
This is either a bug or a limitation in how lifetimes are currently implemented in the compiler. The invariance of &mut T over T might have something to do with it: it results in &mut &'a mut T being invariant over 'a, and thus more restrictive during inference than the shared-reference case (&&'a T), even though the strictness is unnecessary in this situation.

What's the most straightforward way to chain comparisons, yielding the first non-equal?

It's quite common to compare data with precedence, for a struct which has multiple members which can be compared, or for a sort_by callback.
// Example of sorting a: Vec<[f64; 2]>, sort first by y, then x,
xy_coords.sort_by(|co_a, co_b| {
    let ord = co_a[1].cmp(&co_b[1]);
    if ord != std::cmp::Ordering::Equal {
        ord
    } else {
        co_a[0].cmp(&co_b[0])
    }
});
Is there a more straightforward way to perform multiple cmp functions, where only the first non-equal result is returned?
perform multiple cmp functions, where only the first non-equal result is returned
That's basically how Ord is defined for tuples. Create a function that converts your type into a tuple and compare those:
fn main() {
    let mut xy_coords = vec![[1, 0], [-1, -1], [0, 1]];

    fn sort_key(coord: &[i32; 2]) -> (i32, i32) {
        (coord[1], coord[0])
    }

    xy_coords.sort_by(|a, b| sort_key(a).cmp(&sort_key(b)));
}
Since that's common, there's a method just for it:
xy_coords.sort_by_key(sort_key);
It won't help your case, because floating point doesn't implement Ord.
One of many possibilities is to kill the program on NaN:
xy_coords.sort_by(|a, b| {
    sort_key(a)
        .partial_cmp(&sort_key(b))
        .expect("Don't know how to handle NaN")
});
See also
Using max_by_key on a vector of floats
How to do a binary search on a Vec of floats?
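A std-only option the answers don't mention: Ordering provides then and then_with combinators that implement exactly "first non-equal result wins", evaluating the fallback comparison lazily. A sketch with the question's shape (integers here, since floats would still need partial_cmp):

```rust
fn main() {
    let mut xy_coords = vec![[1, 0], [-1, -1], [0, 1]];
    // Compare by y first; only if equal, fall back to comparing x.
    xy_coords.sort_by(|a, b| a[1].cmp(&b[1]).then_with(|| a[0].cmp(&b[0])));
    assert_eq!(xy_coords, [[-1, -1], [1, 0], [0, 1]]);
}
```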
There are times when you may not want to create a large tuple to compare values which will be ignored because higher priority values will early-exit the comparison.
Stealing a page from Guava's ComparisonChain, we can make a small builder that allows us to use closures to avoid extra work:
use std::cmp::Ordering;

struct OrdBuilder<T> {
    a: T,
    b: T,
    ordering: Ordering,
}

impl<T> OrdBuilder<T> {
    fn new(a: T, b: T) -> OrdBuilder<T> {
        OrdBuilder {
            a: a,
            b: b,
            ordering: Ordering::Equal,
        }
    }

    fn compare_with<F, V>(mut self, mut f: F) -> OrdBuilder<T>
    where
        F: for<'a> FnMut(&'a T) -> V,
        V: Ord,
    {
        if self.ordering == Ordering::Equal {
            self.ordering = f(&self.a).cmp(&f(&self.b));
        }
        self
    }

    fn finish(self) -> Ordering {
        self.ordering
    }
}
This can be used like
struct Thing {
    a: u8,
}

impl Thing {
    fn b(&self) -> u8 {
        println!("I'm slow!");
        42
    }
}

fn main() {
    let a = Thing { a: 0 };
    let b = Thing { a: 1 };

    let res = OrdBuilder::new(&a, &b)
        .compare_with(|x| x.a)
        .compare_with(|x| x.b())
        .finish();

    println!("{:?}", res);
}
