Random string generator with minimal allocations


I want to generate a large file of pseudo-random ASCII characters given the parameters: size per line and number of lines. I cannot figure out a way to do this without allocating new Strings for each line. This is what I have: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=42f5b803910e3a15ff20561117bf9176
use rand::{Rng, SeedableRng};
use std::error::Error;
fn main() -> Result<(), Box<dyn Error>> {
let mut data: Vec<u8> = Vec::new();
write_random_lines(&mut data, 10, 10)?;
println!("{}", std::str::from_utf8(&data)?);
Ok(())
}
fn write_random_lines<W>(
file: &mut W,
line_size: usize,
line_count: usize,
) -> Result<(), Box<dyn Error>>
where
W: std::io::Write,
{
for _ in 0..line_count {
let mut s: String = rand::rngs::SmallRng::from_entropy()
.sample_iter(rand::distributions::Alphanumeric)
.take(line_size)
.collect();
s.push('\n');
file.write_all(s.as_bytes())?; // write_all retries until the whole line is written
}
Ok(())
}
I'm creating a new String every line, so I believe this is not memory efficient. There is fn fill_bytes(&mut self, dest: &mut [u8]) but this is for bytes.
I would preferably not create a new SmallRng for each line, but it is used in a loop and SmallRng cannot be copied.
How can I generate a random file in a more memory and time efficient way?

You can easily reuse a String in a loop by creating it outside the loop and clearing it after using the contents:
// Use Kevin's suggestion not to make a new `SmallRng` each time:
let mut rng_iter =
rand::rngs::SmallRng::from_entropy().sample_iter(rand::distributions::Alphanumeric);
let mut s = String::with_capacity(line_size + 1); // allocate the buffer
for _ in 0..line_count {
s.extend(rng_iter.by_ref().take(line_size)); // fill the buffer
s.push('\n');
file.write_all(s.as_bytes())?; // use the contents (write alone may write only part of them)
s.clear(); // clear the buffer
}
String::clear erases the contents of the String (dropping if necessary), but does not free its backing buffer, so it can be reused without needing to reallocate.
See also
Weird behaviour when using read_line in a loop
Why does Iterator::take_while take ownership of the iterator? explains why by_ref is needed
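To illustrate the point about String::clear, here is a minimal std-only sketch that tracks the buffer's capacity across iterations. The function name count_capacity_changes is hypothetical, and a repeating letter stands in for the random characters so that the example does not depend on the rand crate.

```rust
/// Fills and clears one reusable `String` `line_count` times and returns how
/// often its capacity changed, which would indicate a reallocation.
fn count_capacity_changes(line_size: usize, line_count: usize) -> usize {
    let mut s = String::with_capacity(line_size + 1); // room for the line plus '\n'
    let mut last_capacity = s.capacity();
    let mut changes = 0;
    for i in 0..line_count {
        // Stand-in for the random data: one repeating ASCII letter per line.
        let c = (b'a' + (i % 26) as u8) as char;
        s.extend(std::iter::repeat(c).take(line_size));
        s.push('\n');
        if s.capacity() != last_capacity {
            changes += 1;
            last_capacity = s.capacity();
        }
        s.clear(); // length drops to 0, the backing buffer is kept
    }
    changes
}

fn main() {
    println!("capacity changes: {}", count_capacity_changes(100, 1_000));
}
```

Because every line is ASCII and fits in line_size + 1 bytes, the buffer allocated up front should be the only allocation made for the String.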

This modification of your code does not allocate any Strings and also does not construct a new SmallRng each time, but I have not benchmarked it:
fn write_random_lines<W>(
file: &mut W,
line_size: usize,
line_count: usize,
) -> Result<(), Box<dyn Error>>
where
W: std::io::Write,
{
// One random data iterator.
let mut rng_iter = rand::rngs::SmallRng::from_entropy()
.sample_iter(rand::distributions::Alphanumeric);
// Temporary storage for encoding of chars. If the characters used
// are not all ASCII then its size should be increased to 4.
let mut char_buffer = [0; 1];
for _ in 0..line_count {
for _ in 0..line_size {
file.write_all(
rng_iter.next()
.unwrap() // iterator is infinite so this never fails
.encode_utf8(&mut char_buffer)
.as_bytes())?;
}
// write_all, unlike write, guarantees that every byte is written
file.write_all(b"\n")?;
}
Ok(())
}
I am new to Rust so it may be missing some ways to tidy it up. Also, note that this writes only one character at a time; if your W is more expensive per operation than an in-memory buffer, you probably want to wrap it in std::io::BufWriter, which will batch writes to the destination (using a buffer that needs to be allocated, but only once).
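The effect of std::io::BufWriter on one-character writes can be sketched with a writer that counts how often it is called. CountingWriter and the two helper functions are hypothetical names introduced only for this illustration; the exact number of batched calls depends on BufWriter's default buffer size.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical destination that records how many `write` calls reach it.
struct CountingWriter {
    calls: usize,
    bytes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `count` bytes, one `write_all` call per byte.
fn write_one_byte_at_a_time<W: Write>(out: &mut W, count: usize) -> io::Result<()> {
    for _ in 0..count {
        out.write_all(b"x")?;
    }
    Ok(())
}

/// Returns (calls reaching the destination unbuffered, calls with `BufWriter`).
fn compare_write_calls(count: usize) -> io::Result<(usize, usize)> {
    let mut direct = CountingWriter { calls: 0, bytes: 0 };
    write_one_byte_at_a_time(&mut direct, count)?;

    let mut buffered = BufWriter::new(CountingWriter { calls: 0, bytes: 0 });
    write_one_byte_at_a_time(&mut buffered, count)?;
    buffered.flush()?; // push out whatever is still sitting in the buffer
    assert_eq!(buffered.get_ref().bytes, count);

    Ok((direct.calls, buffered.get_ref().calls))
}

fn main() -> io::Result<()> {
    let (direct, buffered) = compare_write_calls(100_000)?;
    println!("direct: {} calls, buffered: {} calls", direct, buffered);
    Ok(())
}
```

For a real file, each call that reaches the destination is typically a system call, which is why batching tends to matter far more there than for a Vec<u8>.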

I (MakotoE) benchmarked Kevin Reid's answer. Their method is roughly twice as fast as mine, while memory allocation seems to be about the same.
Benchmarking time-wise:
#[cfg(test)]
mod tests {
extern crate test;
use test::Bencher;
use super::*;
#[bench]
fn bench_write_random_lines0(b: &mut Bencher) {
let mut data: Vec<u8> = Vec::new();
data.reserve(100 * 1000000);
b.iter(|| {
write_random_lines0(&mut data, 100, 1000000).unwrap();
data.clear();
});
}
#[bench]
fn bench_write_random_lines1(b: &mut Bencher) {
let mut data: Vec<u8> = Vec::new();
data.reserve(100 * 1000000);
b.iter(|| {
// This is Kevin's implementation
write_random_lines1(&mut data, 100, 1000000).unwrap();
data.clear();
});
}
}
test tests::bench_write_random_lines0 ... bench: 764,953,658 ns/iter (+/- 7,597,989)
test tests::bench_write_random_lines1 ... bench: 360,662,595 ns/iter (+/- 886,456)
Benchmarking memory usage using valgrind's Massif shows that both are about the same. Mine used 3.072 Gi total, 101.0 MB at peak level. Kevin's used 4.166 Gi total, 128.0 MB peak.

Related

How to generate an array of random bytes from a seed? [duplicate]

This question already has answers here:
Is there a more idiomatic way to initialize an array with random numbers than a for loop?
I want to generate a UUID from a custom random number generator:
use uuid::{Builder, Uuid, Variant, Version};
use rand::{Rng, SeedableRng, rngs::StdRng, RngCore};
fn main() {
let seed = [5u8; 32];
let mut rng: StdRng = SeedableRng::from_seed(seed);
let bytes = ???
let uuid = Builder::from_bytes(bytes)
.set_variant(Variant::RFC4122)
.set_version(Version::Random)
.build();
println!("{:?}", uuid);
}
How do I get the bytes?

I think I have done it.
use rand::{rngs::StdRng, RngCore, SeedableRng};
use uuid::{Builder, Variant, Version};
fn main() {
let seed = [0u8; 32];
let mut rng: StdRng = SeedableRng::from_seed(seed);
let mut bytes = [0u8; 16];
rng.fill_bytes(&mut bytes);
let uuid = Builder::from_bytes(bytes)
.set_variant(Variant::RFC4122)
.set_version(Version::Random)
.build();
println!("{:?}", uuid);
}

How do I partially sort a Vec or slice?

I need to get the top N items from a Vec which is quite large in production. Currently I do it in this inefficient way:
let mut v = vec![6, 4, 3, 7, 2, 1, 5];
v.sort_unstable();
v = v[0..3].to_vec();
In C++, I'd use std::partial_sort, but I can't find an equivalent in the Rust docs.
Am I just overlooking it, or does it not exist (yet)?

The standard library doesn't contain this functionality, but it looks like the lazysort crate is exactly what you need:
So what's the point of lazy sorting? As per the linked blog post, they're useful when you do not need or intend to need every value; for example you may only need the first 1,000 ordered values from a larger set.
#![feature(test)]
extern crate lazysort;
extern crate rand;
extern crate test;
use std::cmp::Ordering;
trait SortLazy<T> {
fn sort_lazy<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering;
unsafe fn sort_lazy_fast<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering;
}
impl<T> SortLazy<T> for [T] {
fn sort_lazy<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
fn sort_lazy<F, T>(data: &mut [T], accu: &mut usize, cmp: &F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
if !data.is_empty() && *accu < n {
let mut pivot = 1;
let mut lower = 0;
let mut upper = data.len();
while pivot < upper {
match cmp(&data[pivot], &data[lower]) {
Ordering::Less => {
data.swap(pivot, lower);
lower += 1;
pivot += 1;
}
Ordering::Greater => {
upper -= 1;
data.swap(pivot, upper);
}
Ordering::Equal => pivot += 1,
}
}
sort_lazy(&mut data[..lower], accu, cmp, n);
sort_lazy(&mut data[upper..], accu, cmp, n);
} else {
*accu += 1;
}
}
sort_lazy(self, &mut 0, &cmp, n);
}
unsafe fn sort_lazy_fast<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
fn sort_lazy<F, T>(data: &mut [T], accu: &mut usize, cmp: &F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
if !data.is_empty() && *accu < n {
unsafe {
use std::mem::swap;
let mut pivot = 1;
let mut lower = 0;
let mut upper = data.len();
while pivot < upper {
match cmp(data.get_unchecked(pivot), data.get_unchecked(lower)) {
Ordering::Less => {
swap(
&mut *(data.get_unchecked_mut(pivot) as *mut T),
&mut *(data.get_unchecked_mut(lower) as *mut T),
);
lower += 1;
pivot += 1;
}
Ordering::Greater => {
upper -= 1;
swap(
&mut *(data.get_unchecked_mut(pivot) as *mut T),
&mut *(data.get_unchecked_mut(upper) as *mut T),
);
}
Ordering::Equal => pivot += 1,
}
}
sort_lazy(&mut data[..lower], accu, cmp, n);
sort_lazy(&mut data[upper..], accu, cmp, n);
}
} else {
*accu += 1;
}
}
sort_lazy(self, &mut 0, &cmp, n);
}
}
#[cfg(test)]
mod tests {
use test::Bencher;
use lazysort::Sorted;
use std::collections::BinaryHeap;
use SortLazy;
use rand::{thread_rng, Rng};
const SIZE_VEC: usize = 100_000;
const N: usize = 42;
#[bench]
fn sort(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
v.sort_unstable();
})
}
#[bench]
fn lazysort(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
let _: Vec<_> = v.iter().sorted().take(N).collect();
})
}
#[bench]
fn lazysort_in_place(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
v.sort_lazy(i32::cmp, N);
})
}
#[bench]
fn lazysort_in_place_fast(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
unsafe { v.sort_lazy_fast(i32::cmp, N) };
})
}
#[bench]
fn binaryheap(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
let mut iter = v.iter();
let mut heap: BinaryHeap<_> = iter.by_ref().take(N).collect();
for i in iter {
heap.push(i);
heap.pop();
}
let _ = heap.into_sorted_vec();
})
}
}
running 5 tests
test tests::binaryheap ... bench: 3,283,938 ns/iter (+/- 413,805)
test tests::lazysort ... bench: 1,669,229 ns/iter (+/- 505,528)
test tests::lazysort_in_place ... bench: 1,781,007 ns/iter (+/- 443,472)
test tests::lazysort_in_place_fast ... bench: 1,652,103 ns/iter (+/- 691,847)
test tests::sort ... bench: 5,600,513 ns/iter (+/- 711,927)
test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out
This code shows that lazysort is faster than the solution with BinaryHeap. We can also see that the BinaryHeap solution gets worse as N increases.
The problem with lazysort is that it creates a second Vec<_>. A "better" solution would be to implement the partial sort in-place. I provided an example of such an implementation.
Keep in mind that all these solutions come with overhead. When N is about SIZE_VEC / 3, the classic sort wins.
You could submit an RFC/issue to ask about adding this feature to the standard library.

Slices have a select_nth_unstable method, the equivalent of C++'s std::nth_element. The result of this can then be sorted to achieve what you want.
Example:
let mut v = vec![6, 4, 3, 7, 2, 1, 5];
let top_three = v.select_nth_unstable(3).0;
top_three.sort();
3 here is the index of the "nth" element, so we're actually selecting the 4th smallest element. Taking .0 still gives exactly the three smallest items, because select_nth_unstable returns a tuple of
a slice to the left of the nth element
a reference to the nth element
a slice to the right of the nth element
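The tuple described above can be put to work in a small helper. This is a sketch under the assumption that "top N" means the N smallest values, as in the question's code; the name smallest_n is hypothetical.

```rust
/// Returns the `n` smallest values of `v` in ascending order.
fn smallest_n(mut v: Vec<i32>, n: usize) -> Vec<i32> {
    if n < v.len() {
        // `left` holds the n values that sort before the element at index n,
        // in unspecified order; only that part needs a full sort.
        let (left, _nth, _right) = v.select_nth_unstable(n);
        left.sort_unstable();
        v.truncate(n);
    } else {
        // select_nth_unstable panics when the index is out of bounds,
        // so fall back to sorting everything.
        v.sort_unstable();
    }
    v
}

fn main() {
    let v = vec![6, 4, 3, 7, 2, 1, 5];
    println!("{:?}", smallest_n(v, 3)); // prints [1, 2, 3]
}
```

The selection step is linear on average, so the only sort performed is over the n selected items rather than the whole Vec.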

Why can't I mutably borrow a primitive from an enum?

I would like to be able to obtain references (both immutable and mutable) to the usize wrapped in Bar in the Foo enum:
use Foo::*;
#[derive(Debug, PartialEq, Clone)]
pub enum Foo {
Bar(usize)
}
impl Foo {
/* this works */
fn get_bar_ref(&self) -> &usize {
match *self {
Bar(ref n) => &n
}
}
/* this doesn't */
fn get_bar_ref_mut(&mut self) -> &mut usize {
match *self {
Bar(ref mut n) => &mut n
}
}
}
But I can't obtain the mutable reference because:
n does not live long enough
I was able to provide both variants of similar functions accessing other contents of Foo that are Boxed - why does the mutable borrow (and why only it) fail with an unboxed primitive?

You need to replace Bar(ref mut n) => &mut n with Bar(ref mut n) => n.
When you use ref mut n in Bar(ref mut n), it creates a mutable
reference to the data in Bar, so the type of n is &mut usize.
Then you try to return &mut n, which has type &mut &mut usize.
This part is most likely incorrect.
Now deref coercion kicks in
and converts &mut n into &mut *n, creating a temporary value *n
of type usize, which doesn't live long enough.
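For reference, a minimal sketch of the enum with the suggested replacement applied might look as follows. The main function is only a usage example and is not part of the question.

```rust
#[derive(Debug, PartialEq, Clone)]
pub enum Foo {
    Bar(usize),
}

impl Foo {
    fn get_bar_ref(&self) -> &usize {
        match *self {
            Foo::Bar(ref n) => n,
        }
    }

    fn get_bar_ref_mut(&mut self) -> &mut usize {
        match *self {
            // `n` already has type `&mut usize`, so it is returned as-is
            // instead of borrowing the local binding again with `&mut n`.
            Foo::Bar(ref mut n) => n,
        }
    }
}

fn main() {
    let mut foo = Foo::Bar(1);
    *foo.get_bar_ref_mut() += 41;
    println!("{:?} holds {}", foo, foo.get_bar_ref());
}
```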

These examples show the same problem:
fn implicit_reborrow<T>(x: &mut T) -> &mut T {
x
}
fn explicit_reborrow<T>(x: &mut T) -> &mut T {
&mut *x
}
fn implicit_reborrow_bad<T>(x: &mut T) -> &mut T {
&mut x
}
fn explicit_reborrow_bad<T>(x: &mut T) -> &mut T {
&mut **&mut x
}
The explicit_ versions show what the compiler deduces through deref coercions.
The _bad versions both error in the exact same way, while the other two compile.
This is either a bug, or a limitation in how lifetimes are currently implemented in the compiler. The invariance of &mut T over T might have something to do with it, because it results in &mut &'a mut T being invariant over 'a and thus more restrictive during inference than the shared reference (&&'a T) case, even though in this situation the strictness is unnecessary.

Speedup counter game

I'm trying to solve a Rust algorithm question on hackerrank. My answer times out on some of the larger test cases. There are about 5 people who've completed it, so I believe it is possible and I assume they compile in release mode. Are there any speed-ups I'm missing?
The gist of the game is a counter (inp in main) is conditionally reduced and based on who can't reduce it any more, the winner is chosen.
use std::io;
fn main() {
let n: usize = read_one_line().
trim().parse().unwrap();
for _i in 0..n{
let inp: u64 = read_one_line().
trim().parse().unwrap();
println!("{:?}", find_winner(inp));
}
return;
}
fn find_winner(mut n: u64) -> String{
let mut win = 0;
while n>1{
if n.is_power_of_two(){
n /= 2;
}
else{
n -= n.next_power_of_two()/2;
}
win += 1;
}
let winner =
if win % 2 == 0{
String::from("Richard")
} else{
String::from("Louise")
};
winner
}
fn read_one_line() -> String{
let mut input = String::new();
io::stdin().read_line(&mut input).expect("Failed to read");
input
}

Your inner loop can be replaced by a combination of builtin functions:
let win = if n > 0 {
n.count_ones() + n.trailing_zeros() - 1
} else {
0
};
Also, instead of allocating a string every time find_winner is called,
a string slice may be returned:
fn find_winner(n: u64) -> &'static str {
let win = if n > 0 {
n.count_ones() + n.trailing_zeros() - 1
} else {
0
};
if win % 2 == 0 {
"Richard"
} else{
"Louise"
}
}
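The closed form works because subtracting the largest power of two below n clears the highest set bit, and halving a power of two removes one trailing zero. A quick way to gain confidence in it is to compare it with the original loop over a range of inputs. The function names below are hypothetical, and the loop version is only exercised on small values, where next_power_of_two cannot overflow.

```rust
/// Number of moves counted by the loop from the question.
fn moves_by_loop(mut n: u64) -> u32 {
    let mut moves = 0;
    while n > 1 {
        if n.is_power_of_two() {
            n /= 2; // removes one trailing zero
        } else {
            n -= n.next_power_of_two() / 2; // clears the highest set bit
        }
        moves += 1;
    }
    moves
}

/// Number of moves from the bit-counting formula.
fn moves_closed_form(n: u64) -> u32 {
    if n > 0 {
        n.count_ones() + n.trailing_zeros() - 1
    } else {
        0
    }
}

fn main() {
    for n in 0..=100_000u64 {
        assert_eq!(moves_by_loop(n), moves_closed_form(n), "mismatch at n = {}", n);
    }
    println!("closed form matches the loop for 0..=100000");
}
```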

Avoiding memory allocation can help speed up the application.
At the moment, the read_one_line function is doing one memory allocation per call, which can be avoided if you supply the String as a &mut parameter:
fn read_one_line(input: &mut String) -> &str {
input.clear(); // read_line appends, so discard the previous line first
io::stdin().read_line(input).expect("Failed to read");
input
}
Note how I also alter the return type to return a slice (which borrows input): further uses here do not need to modify the original string.
Another improvement is I/O. io::stdin() returns a handle to a shared, internally buffered reader that is protected by a mutex, so each call to read_line through the handle has to lock it again.
You can instead use a reader that implements std::io::BufRead, such as std::io::BufReader (or io::stdin().lock()). Build it once, then pass it as an argument:
fn read_one_line<'a, R>(reader: &mut R, input: &'a mut String) -> &'a str
where R: io::BufRead
{
input.clear(); // read_line appends, so discard the previous line first
reader.read_line(input).expect("Failed to read");
input
}
Note:
it's easier to make it generic (R) than to specify the exact type of BufReader :)
annotating the lifetime is mandatory because the return type could borrow either parameter
Putting it all together:
fn read_one_line<'a, R>(reader: &mut R, input: &'a mut String) -> &'a str
where R: io::BufRead
{
input.clear(); // read_line appends, so discard the previous line first
reader.read_line(input).expect("Failed to read");
input
}
fn main() {
let mut reader = io::BufReader::new(io::stdin());
let mut input = String::new();
let n: usize = read_one_line(&mut reader, &mut input).
trim().parse().unwrap();
for _i in 0..n{
let inp: u64 = read_one_line(&mut reader, &mut input).
trim().parse().unwrap();
println!("{:?}", find_winner(inp));
}
return;
}
The bigger win is probably the I/O change, which might even be sufficient by itself.
Don't forget to also apply @John's advice; this way you'll be allocation-free in your main loop!
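One detail worth keeping in mind when reusing a single String is that BufRead::read_line appends to the buffer rather than overwriting it, so the buffer has to be cleared between lines. The sketch below shows this with an in-memory reader in place of stdin; read_trimmed_line and read_numbers are hypothetical names, and the input format (a count followed by that many numbers) is taken from the question's main function.

```rust
use std::io::{self, BufRead};

/// Reads one line into `input`, reusing its allocation, and returns it trimmed.
fn read_trimmed_line<'a, R: BufRead>(
    reader: &mut R,
    input: &'a mut String,
) -> io::Result<&'a str> {
    input.clear(); // `read_line` appends, so drop the previous line first
    reader.read_line(input)?;
    Ok(input.trim())
}

/// Reads a count, then that many integers, one per line.
fn read_numbers<R: BufRead>(reader: &mut R) -> Vec<u64> {
    let mut input = String::new();
    let n: usize = read_trimmed_line(reader, &mut input)
        .expect("Failed to read")
        .parse()
        .expect("Expected a count");
    let mut numbers = Vec::with_capacity(n);
    for _ in 0..n {
        let value: u64 = read_trimmed_line(reader, &mut input)
            .expect("Failed to read")
            .parse()
            .expect("Expected a number");
        numbers.push(value);
    }
    numbers
}

fn main() {
    // A byte slice implements `BufRead`; `io::stdin().lock()` could be
    // passed in the same way when reading real input.
    let mut reader = "2\n6\n8\n".as_bytes();
    println!("{:?}", read_numbers(&mut reader)); // prints [6, 8]
}
```

Making the helper generic over BufRead also keeps it easy to test, since no real standard input is needed.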

Let &mut syntax

It is possible to make the following binding in Rust:
let &mut a = &mut 5;
But what does it mean exactly? For example, let a = &mut 5 creates an immutable binding of type &mut i32, let mut a = &mut 5 creates a mutable binding of type &mut i32. What about let &mut?

An easy way to test the type of something is to assign it to the wrong type:
let _: () = a;
In this case the value is an "integral variable", or a by-value integer. It is not mutable (as testing with a += 1 shows).
This is because you are using destructuring syntax. You are pattern matching your &mut 5 against an &mut _, much like if you wrote
match &mut 5 { &mut a => {
// rest of code
} };
Thus you are adding a mutable reference and immediately dereferencing it.
To bind a mutable reference to a value instead, you can do
let ref mut a = 5;
This is useful in destructuring to take references to multiple inner values.
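The difference between the two forms can be seen side by side in a small sketch. All function names here are hypothetical and exist only to make the behaviour observable.

```rust
/// `let &mut a = ...` destructures the reference and copies the integer out.
fn destructured_copy() -> (i32, i32) {
    let mut source = 5;
    let &mut a = &mut source; // `a` is a plain `i32`, not a reference
    source += 1; // allowed: the mutable borrow ended with the pattern match
    (a, source) // `a` keeps the old value
}

/// `let ref mut b = ...` binds a mutable reference to the value instead.
fn ref_mut_binding() -> i32 {
    let ref mut b = 5; // `b` has type `&mut i32`
    *b += 1;
    *b
}

/// `ref mut` inside a pattern borrows several inner values at once.
fn bump_both(pair: &mut (i32, i32)) {
    let (ref mut x, ref mut y) = *pair;
    *x += 1;
    *y += 10;
}

fn main() {
    println!("{:?}", destructured_copy()); // prints (5, 6)
    println!("{}", ref_mut_binding()); // prints 6
    let mut pair = (1, 2);
    bump_both(&mut pair);
    println!("{:?}", pair); // prints (2, 12)
}
```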
