Why is enum value binding in Rust so slow? - performance

I am currently learning Rust because I want to use it in a project that requires very high performance. I initially fell in love with enums, but then I started to evaluate their performance and found something that really boggles me. Here is an example:
use std::time::Instant;

pub enum MyEnum<'a> {
    V1,
    V2(&'a MyEnum<'a>),
    V3,
}

impl MyEnum<'_> {
    pub fn eval(&self) -> i64 {
        match self {
            MyEnum::V1 => 1,
            MyEnum::V2(_) => 2,
            MyEnum::V3 => 3,
        }
    }

    pub fn eval2(&self) -> i64 {
        match self {
            MyEnum::V1 => 1,
            MyEnum::V2(a) => a.eval2(),
            MyEnum::V3 => 3,
        }
    }
}

fn main() {
    const EXAMPLES: usize = 10000000;
    let en = MyEnum::V1;

    let start = Instant::now();
    let mut sum = 0;
    for _ in 0..EXAMPLES {
        sum += en.eval()
    }
    println!("enum without fields func call sum: {} at {:?}", sum, start.elapsed());

    let start = Instant::now();
    let mut sum = 0;
    for _ in 0..EXAMPLES {
        sum += en.eval2()
    }
    println!("enum with field func call sum: {} at {:?}", sum, start.elapsed());
}
Results I get:
enum without fields func call sum: 10000000 at 100ns
enum with field func call sum: 10000000 at 6.3425ms
For the V1 variant, eval2 should execute exactly the same instructions as eval, yet it comes out tens of thousands of times slower (100 ns vs. ~6.3 ms). Why is this happening?

Viewing the assembly, your first loop is optimized away entirely into a single mov of 10000000 (that is, the compiler replaces the whole loop with something equivalent to sum = EXAMPLES), while the second is not. I do not know why the second loop is not constant-folded as aggressively.
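One way to check this yourself without reading assembly is to hide the value from the optimizer so neither loop can be constant-folded. This is not from the answer above, just a minimal sketch using std::hint::black_box (stable since Rust 1.66) together with the question's MyEnum:

use std::hint::black_box;
use std::time::Instant;

fn main() {
    const EXAMPLES: usize = 10_000_000;
    let en = MyEnum::V1;
    let start = Instant::now();
    let mut sum = 0;
    for _ in 0..EXAMPLES {
        // black_box makes `en` opaque to the optimizer, so the match in
        // eval() must actually run and the loop cannot fold to a constant
        sum += black_box(&en).eval();
    }
    println!("eval sum: {} at {:?}", sum, start.elapsed());
}

With both loops treated this way, the two methods should time much closer together, since the asymmetry comes from the optimizer rather than from enum binding itself.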

I see no difference in performance, as one would expect.
$ ./test
enum without fields func call sum: 10000000 at 307.543596ms
enum with field func call sum: 10000000 at 312.196195ms
$ rustc --version
rustc 1.43.1 (8d69840ab 2020-05-04)
$ uname -a
Darwin Windhund.local 18.7.0 Darwin Kernel Version 18.7.0: Mon Feb 10 21:08:45 PST 2020; root:xnu-4903.278.28~1/RELEASE_X86_64 x86_64 i386 MacBookPro15,2 Darwin
One problem might be the use of simple "wall clock" time for benchmarking. This simple count of how much time passed is vulnerable to anything else running that might consume resources: anti-virus, a web browser, any program. Instead, use benchmark tests; a sketch follows.
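For example, a minimal Criterion benchmark might look like the sketch below. Everything here is an assumption rather than part of the original post: criterion = "0.5" under [dev-dependencies], and the question's MyEnum living in a library crate that I am calling myenum.

// benches/enum_eval.rs
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;
use myenum::MyEnum; // hypothetical crate name for the question's code

fn bench_eval(c: &mut Criterion) {
    let en = MyEnum::V1;
    // black_box keeps the optimizer from folding the calls into constants
    c.bench_function("eval", |b| b.iter(|| black_box(&en).eval()));
    c.bench_function("eval2", |b| b.iter(|| black_box(&en).eval2()));
}

criterion_group!(benches, bench_eval);
criterion_main!(benches);

Criterion runs warm-up iterations and reports statistical outliers, which makes it far less sensitive to background load than a single Instant::now() measurement.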

Related

Slow Rust Performance for a SSH Log Parsing project

I'm a student who is interested in learning Rust. For a class project I wrote a Rust program that parses an SSH log file, specifically capturing the dates and IP addresses in the log.
When I first finished the project, the script took 3 minutes to run through a log file with 655,147 entries. After major optimizations I got the processing down to 30 seconds. This is fine, but other students' Python programs did it in 3 seconds. So I know it's definitely my fault and I want to know how to write it better. Could someone show me where I went wrong?
Here are the structs I made for reference:
struct DateLogins {
    date: NaiveDate,
    success: i32,
    failure: i32,
}

struct IpAuth {
    success: i32,
    failure: i32,
    first_attempt: NaiveDateTime,
    successful_attempt: NaiveDateTime,
    failed_reverse: bool,
    break_in_attempt: bool,
    ip_addr: String,
}

struct MinedReport {
    start_date: NaiveDateTime,
    end_date: NaiveDateTime,
    total_success: i32,
    total_failure: i32,
    total_addrs: i32,
    login_attempts: HashMap<String, DateLogins>,
    unique_addrs: HashMap<String, IpAuth>,
}
And here is the main processing logic:
lazy_static! {
    static ref IP_RGX: Regex = Regex::new(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})").unwrap(); // regex for capturing the IP address of a message.
    static ref LOGIN_GOOD_RGX: Regex = Regex::new(r"Accepted password").unwrap(); // regex for successful login attempts -- for total, date, and IP address.
    static ref LOGIN_FAIL_RGX: Regex = Regex::new(r"Failed password").unwrap(); // regex for failed login attempts -- for total, date, and IP address
    static ref REVERSE_LOOK_RGX: Regex = Regex::new(r"reverse mapping checking getaddrinfo").unwrap(); // regex for a failed reverse lookup
    static ref BREAK_IN_RGX: Regex = Regex::new(r"POSSIBLE BREAK-IN ATTEMPT!").unwrap(); // regex for a break-in attempt -- for IP address
}
fn main() {
    let start: std::time::Instant;
    let duration: std::time::Duration;
    let file: File = File::open("./SSH.log").expect("Could not open log file!");
    let reader: BufReader<File> = BufReader::new(file);
    let origin_date: NaiveDateTime = NaiveDateTime::parse_from_str("0000 01 01 00:00:00", "%Y %m %d %H:%M:%S").expect("Could not parse start time!");
    let mut report: MinedReport = MinedReport::new(origin_date.clone(), origin_date.clone());
    start = Instant::now();
    for line in reader.lines() {
        let l = line.expect("Could not read a line!");
        parse_line(l, &mut report, origin_date);
    }
    duration = start.elapsed();
    store_log(report, "./report.txt");
    println!("Total time elapsed: {:?}\n", duration);
}
// parses the values in each line
fn parse_line(line: String, report: &mut MinedReport, origin_date: NaiveDateTime) {
    let time: &str = &line[..15];
    let message: &str = &line[15..];
    // parse the time to a DateTime and store things in the report.
    let date_time: NaiveDateTime = parse_time(time, report, origin_date).unwrap();
    // parse the IP address of the line and store things in the report.
    parse_ip(message, report, origin_date, date_time);
}
// parses the time value for each line.
fn parse_time(time_cap: &str, report: &mut MinedReport, origin_date: NaiveDateTime) -> Result<NaiveDateTime, ParseError> {
    // add a year just to have a full date string.
    let time_str: String = time_cap.to_string();
    let full_date: String;
    let date_time: NaiveDateTime;
    let date: NaiveDate;
    let d: String;
    // add a year. Move to the next year if it's January.
    // (I know this is a bad solution, but the only months in the log are Dec and Jan, with no year)
    if &time_str[..3] == "Dec" {
        full_date = format!("{}{}", "0000 ", time_cap); // No year given, set it to 0000
    } else {
        full_date = format!("{}{}", "0001 ", time_cap); // No year given, set it to 0001
    }
    // get the date-time and the date only.
    date_time = NaiveDateTime::parse_from_str(&full_date, "%Y %b %d %H:%M:%S").unwrap();
    date = date_time.date();
    d = date.to_string();
    if report.start_date == origin_date {
        report.start_date = date_time;
    }
    report.end_date = date_time;
    if !report.login_attempts.contains_key(&d) {
        report.login_attempts.insert(d, DateLogins::new(date));
    }
    Ok(date_time)
}
fn parse_ip(message: &str, report: &mut MinedReport, origin_date: NaiveDateTime, current_date: NaiveDateTime) {
    for ip in IP_RGX.captures_iter(message) {
        let new_ip: String = String::from(&ip[1]);
        let ip_clone: String = new_ip.clone();
        let date: NaiveDate = current_date.date();
        let d: String = date.to_string();
        report.total_addrs += 1;
        if !report.unique_addrs.contains_key(&new_ip) {
            report.unique_addrs.insert(new_ip, IpAuth::new(ip_clone.clone(), origin_date.clone(), current_date.clone()));
        }
        let login_date = report.login_attempts.get_mut(&d).unwrap();
        let unique_ip = report.unique_addrs.get_mut(&ip_clone).unwrap();
        if LOGIN_FAIL_RGX.is_match(message) {
            report.total_failure += 1;
            login_date.failure += 1;
            unique_ip.failure += 1;
        } else if LOGIN_GOOD_RGX.is_match(message) {
            report.total_success += 1;
            login_date.success += 1;
            unique_ip.success += 1;
            unique_ip.successful_attempt = current_date.clone();
        } else {
            if REVERSE_LOOK_RGX.is_match(message) {
                unique_ip.failed_reverse = true;
            }
            if BREAK_IN_RGX.is_match(message) {
                unique_ip.break_in_attempt = true;
            }
        }
    }
}
Like I said, I'm new to Rust, and to programming in general, so there may be something I just don't know about. I already switched from a vector to a hash map, but maybe there's something better I can use? I don't know. I have also wondered whether the chrono or regex crates are my issue here and whether there's a faster alternative. Either way, thanks to anyone who tries to understand and correct my code!
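Two cheap wins stand out in the code as posted, independent of chrono or regex speed. First, every line runs captures_iter plus up to four is_match calls; since the patterns are fixed strings, a plain message.contains("Failed password") check before (or instead of) the regex work avoids most of the scanning. Second, each map access does the contains_key / insert / get_mut dance, which hashes the key several times; HashMap::entry does it once. A distilled, hypothetical sketch of the entry pattern (the names are made up for illustration):

use std::collections::HashMap;

// One lookup instead of contains_key + insert + get_mut.
fn record_attempt(counts: &mut HashMap<String, i32>, ip: &str) {
    *counts.entry(ip.to_string()).or_insert(0) += 1;
}

fn main() {
    let mut counts = HashMap::new();
    for ip in ["1.2.3.4", "5.6.7.8", "1.2.3.4"] {
        record_attempt(&mut counts, ip);
    }
    println!("{:?}", counts); // e.g. {"1.2.3.4": 2, "5.6.7.8": 1} (order varies)
}

The same pattern would apply to both login_attempts and unique_addrs in parse_ip.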

How to optimize brainf*ck instructions

I'm trying to write an optimisation feature for my brainf*ck interpreter.
It basically combines runs of the same instruction into a single instruction.
I wrote this function, but it doesn't work properly:
pub fn optimize_multiple(instructions: &Vec<Instruction>) -> Vec<OptimizedInstruction> {
    let mut opt: Vec<OptimizedInstruction> = Vec::new();
    let mut last_instruction = instructions.get(0).unwrap();
    let mut last_count = 0;
    for instruction in instructions.iter() {
        if instruction == last_instruction {
            last_count += 1;
        } else if let Instruction::Loop(i) = instruction {
            opt.push(OptimizedInstruction::Loop(optimize_multiple(i)));
            last_count = 1;
        } else {
            opt.push(OptimizedInstruction::new(last_instruction.clone(), last_count));
            last_instruction = instruction;
            last_count = 1;
        }
    }
    opt
}
Here's the OptimizedInstruction enum and the "new" method (the Instruction::Loop arm is just a placeholder; I didn't use it):
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OptimizedInstruction {
    IncrementPointer(usize),
    DecrementPointer(usize),
    Increment(usize),
    Decrement(usize),
    Write,
    Read,
    Loop(Vec<OptimizedInstruction>),
}

impl OptimizedInstruction {
    pub fn new(instruction: Instruction, count: usize) -> OptimizedInstruction {
        match instruction {
            Instruction::IncrementPointer => OptimizedInstruction::IncrementPointer(count),
            Instruction::DecrementPointer => OptimizedInstruction::DecrementPointer(count),
            Instruction::Increment => OptimizedInstruction::Increment(count),
            Instruction::Decrement => OptimizedInstruction::Decrement(count),
            Instruction::Write => OptimizedInstruction::Write,
            Instruction::Read => OptimizedInstruction::Read,
            Instruction::Loop(_i) => OptimizedInstruction::Loop(Vec::new()),
        }
    }
}
I ran it with this input:
++-[+-++]
But it gave me this output:
[Increment(2), Loop([Increment(1), Decrement(1)])]
Instead of this:
[Increment(2), Decrement(1), Loop([Increment(1), Decrement(1), Increment(2)])]
I've been trying to solve it for two days and I still have no idea why it doesn't work.
~ Derin
First off, just to rain a little on your parade, I want to point out that Wilfred has already made a brainf*ck compiler in Rust that can compile to a native binary through LLVM (bfc). If you are getting stuck, you may want to check his implementation to see how he does it. If you ignore the LLVM part, it isn't too difficult to read through, and he has a good approach.
When broken down into its core components, this problem revolves around merging two adjacent elements. The most elegant way I have come across to solve this is to use an iterator with a merge function. I wrote an example of what I imagine that would look like below. I shortened some of the variable names since they were a bit long, but the general idea is the same. The merge function has a very simple job: when given two elements, attempt to merge them into a single new element. The iterator then handles feeding elements through that function and returning items once they can no longer be merged. A sort of optional fold, if you will.
pub struct MergeIter<I: Iterator, F> {
    iter: I,
    func: Box<F>,
    held: Option<<I as Iterator>::Item>,
}

impl<I, F> Iterator for MergeIter<I, F>
where
    I: Iterator,
    F: FnMut(&<I as Iterator>::Item, &<I as Iterator>::Item) -> Option<<I as Iterator>::Item>,
{
    type Item = <I as Iterator>::Item;

    fn next(&mut self) -> Option<Self::Item> {
        let mut first = match self.held.take() {
            Some(v) => v,
            None => self.iter.next()?,
        };
        loop {
            let second = match self.iter.next() {
                Some(v) => v,
                None => return Some(first),
            };
            match (self.func)(&first, &second) {
                // If the merge succeeds, attempt to merge again
                Some(v) => first = v,
                // If the merge fails, store the second value for the next iteration and return the result
                None => {
                    self.held = Some(second);
                    return Some(first);
                }
            }
        }
    }
}

pub trait ToMergeIter: Iterator {
    fn merge<F>(self, func: F) -> MergeIter<Self, F>
    where
        Self: Sized,
        F: FnMut(&Self::Item, &Self::Item) -> Option<Self::Item>;
}

impl<I: Sized + Iterator> ToMergeIter for I {
    fn merge<F>(self, func: F) -> MergeIter<Self, F>
    where
        Self: Sized,
        F: FnMut(&Self::Item, &Self::Item) -> Option<Self::Item>,
    {
        MergeIter {
            iter: self,
            func: Box::new(func),
            held: None,
        }
    }
}
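To see the combinator in isolation before wiring it into the interpreter, here is a small run-length example using the trait above (my own toy demo, not part of the original answer):

// assumes the MergeIter / ToMergeIter definitions above are in scope
fn main() {
    let runs: Vec<(char, usize)> = "aaabcc"
        .chars()
        .map(|c| (c, 1))
        .merge(|a, b| if a.0 == b.0 { Some((a.0, a.1 + b.1)) } else { None })
        .collect();
    assert_eq!(runs, vec![('a', 3), ('b', 1), ('c', 2)]);
}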
Then we can apply this process recursively to get our result. Here is a brief example of what that looks like. It isn't quite as memory-efficient, since it makes a new Vec each time, but it makes specifying which instructions to merge far easier for you and helps keep your work easy to read and debug.
pub fn optimize(instructions: Vec<Instruction>) -> Vec<Instruction> {
    instructions
        .into_iter()
        // Recursively optimize the bodies of loops
        .map(|instruction| match instruction {
            Instruction::Loop(x) => Instruction::Loop(optimize(x)),
            x => x,
        })
        // Merge adjacent elements using the merge iterator
        .merge(|a, b| {
            // State whether any two adjacent elements should be merged together
            match (a, b) {
                (Instruction::IncPtr(x), Instruction::IncPtr(y)) => Some(Instruction::IncPtr(x + y)),
                (Instruction::DecPtr(x), Instruction::DecPtr(y)) => Some(Instruction::DecPtr(x + y)),
                (Instruction::Increment(x), Instruction::Increment(y)) => Some(Instruction::Increment(x + y)),
                (Instruction::Decrement(x), Instruction::Decrement(y)) => Some(Instruction::Decrement(x + y)),
                // Etc...
                _ => None,
            }
        })
        // Collect the results to return
        .collect()
}
playground link

Trait bound not satisfied building an ndarray from a tuple trait

I am very new to Rust. Currently, I am looking for a way to generate a matrix whose dimensions are based on a tuple.
use itertools::zip;
use ndarray::Array;

fn main() {
    let mut layer_width: [u64; 4] = [784, 512, 256, 10]; // in- & output layers of the nn
    init_nn(&mut layer_width);
}

fn init_nn(layer_width: &mut [u64; 4]) {
    for (layer_in, layer_out) in zip(&layer_width[..4], &layer_width[1..]) {
        let mut params = Array::zeros((layer_in, layer_out)); // error
    }
}
Iterating through the zip works fine and I get output for both layer_in and layer_out, but when creating the matrix I get the following error:
the trait bound `(&i64, &i64): ndarray::Dimension` is not satisfied
the trait `ndarray::Dimension` is not implemented for `(&i64, &i64)`
note: required because of the requirements on the impl of `ndarray::IntoDimension` for `(&i64, &i64)`rustc(E0277)
main.rs(13, 39): the trait `ndarray::Dimension` is not implemented for `(&i64, &i64)`
I very much need the community's help on this issue. Many thanks.
The issue is you're passing in (&i64, &i64) to Array::zeros(), which is not valid. Instead, you can pass in (usize, usize). After fixing that, the code will still not compile, as we haven't given the compiler any way of knowing the element type, but that error will resolve itself once you do something like assign to the array.
Here's working code:
use itertools::zip;
use ndarray::Array;

fn main() {
    let mut layer_width: [usize; 4] = [784, 512, 256, 10]; // in- & output layers of the nn
    init_nn(&mut layer_width);
}

fn init_nn(layer_width: &mut [usize; 4]) {
    for (&layer_in, &layer_out) in zip(&layer_width[..4], &layer_width[1..]) {
        let mut params = Array::zeros((layer_in, layer_out));
        // Dummy assignment so the compiler can infer the element type
        params[(0, 0)] = 1;
    }
}
Notice the added & in for (&layer_in, &layer_out). The output of the zip() function is (&usize, &usize), so we use destructuring to dereference the references into plain usizes. Equivalently, you could have written Array::zeros((*layer_in, *layer_out)).
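If you'd rather not rely on a dummy assignment, you can also name the element type up front with the turbofish; a minimal sketch (init_layer is a made-up helper name):

use ndarray::Array2;

// Array2<f64> pins both the dimensionality and the element type, so no
// later assignment is needed for inference.
fn init_layer(rows: usize, cols: usize) -> Array2<f64> {
    Array2::<f64>::zeros((rows, cols))
}

fn main() {
    let params = init_layer(784, 512);
    println!("{:?}", params.dim()); // (784, 512)
}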

Generate random float from Standard Normal distribution and multiply by another float

I'm trying to generate a random number from the standard normal distribution, and I need to multiply the value by 0.1 to get the number range I'm looking for. I tried using the documentation from rand_distr, which you can find here: https://docs.rs/rand_distr/0.3.0/rand_distr/struct.StandardNormal.html
My Cargo.toml is the following:
[package]
name = "test_rng"
version = "0.1.0"
authors = ["Jack"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
rand = "0.7.3"
rand_distr = "0.3.0"
The starting rust code is the example provided in the rand_dist docs from above:
use rand::prelude::*;
use rand_distr::StandardNormal;

fn main() {
    let val: f64 = thread_rng().sample(StandardNormal);
    println!("{}", val);
}
When I run this it works as expected and the output is:
C:\Users\Jack\Desktop\projects\software\rust\test_rng>cargo run
Compiling test_rng v0.1.0 (C:\Users\Jack\Desktop\projects\software\rust\test_rng)
Finished dev [unoptimized + debuginfo] target(s) in 2.11s
Running `target\debug\test_rng.exe`
0.48398855288705356
C:\Users\Jack\Desktop\projects\software\rust\test_rng>
This is where I hit an issue: when I try to multiply the number by 0.1 in the following code, I get the resulting error:
fn main() {
    let val: f64 = 0.1 * thread_rng().sample(StandardNormal);
    println!("{}", val);
}
C:\Users\Jack\Desktop\projects\software\rust\test_rng>cargo run
Compiling test_rng v0.1.0 (C:\Users\Jack\Desktop\projects\software\rust\test_rng)
error[E0284]: type annotations needed: cannot satisfy `<f64 as std::ops::Mul<_>>::Output == f64`
--> src\main.rs:5:24
|
5 | let val: f64 = 0.1 * thread_rng().sample(StandardNormal);
| ^ cannot satisfy `<f64 as std::ops::Mul<_>>::Output == f64`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0284`.
error: could not compile `test_rng`.
To learn more, run the command again with --verbose.
C:\Users\Jack\Desktop\projects\software\rust\test_rng>
I tried changing 0.1 to 0.1_f64, but that gave the same error.
I tried converting the random number to f64 (which it should already be) with as f64, but that resulted in the following:
fn main() {
    let val: f64 = 0.1 * thread_rng().sample(StandardNormal) as f64;
    println!("{}", val);
}
C:\Users\Jack\Desktop\projects\software\rust\test_rng>cargo run
Compiling test_rng v0.1.0 (C:\Users\Jack\Desktop\projects\software\rust\test_rng)
error[E0282]: type annotations needed
--> src\main.rs:5:39
|
5 | let val: f64 = 0.1 * thread_rng().sample(StandardNormal) as f64;
| ^^^^^^ cannot infer type for type parameter `T` declared on the associated function `sample`
|
= note: type must be known at this point
help: consider specifying the type arguments in the method call
|
5 | let val: f64 = 0.1 * thread_rng().sample::<T, D>(StandardNormal) as f64;
| ^^^^^^^^
error: aborting due to previous error
For more information about this error, try `rustc --explain E0282`.
error: could not compile `test_rng`.
To learn more, run the command again with --verbose.
C:\Users\Jack\Desktop\projects\software\rust\test_rng>
I thought it was a precedence issue, so I tried wrapping the second half in parentheses, but got the same error.
I can get it to work by making the variable mutable and splitting the line into two operations, like the following:
fn main() {
    let mut val: f64 = thread_rng().sample(StandardNormal);
    val *= 0.1;
    println!("{}", val);
}
C:\Users\Jack\Desktop\projects\software\rust\test_rng>cargo run
Compiling test_rng v0.1.0 (C:\Users\Jack\Desktop\projects\software\rust\test_rng)
Finished dev [unoptimized + debuginfo] target(s) in 1.62s
Running `target\debug\test_rng.exe`
-0.034993448117065
C:\Users\Jack\Desktop\projects\software\rust\test_rng>
Any idea what is going on with the multiplication of the f64 with the output of the random number?
You can use the following:
fn main() {
    let val: f64 = 0.1 * thread_rng().sample::<f64, _>(StandardNormal);
    println!("{}", val);
}
This explicitly forces the sample function to return an f64. What was likely going on is that Rust's type inference doesn't realize that the RHS needs to be f64, though I'm not sure exactly why.
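Equivalently, you can pin the type with an intermediate binding, which is the OP's two-line workaround without the mut:

use rand::prelude::*;
use rand_distr::StandardNormal;

fn main() {
    // annotating the binding fixes the sample type as f64
    let sample: f64 = thread_rng().sample(StandardNormal);
    let val = 0.1 * sample;
    println!("{}", val);
}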
Edit:
I think some of the blame here goes to the definition of sample, in that it uses an unrestricted type parameter. An MVE for this would be:
pub trait Marker {}
impl Marker for f64 {}
impl Marker for f32 {}

fn does_not_work<T>() -> T {
    unimplemented!()
}

fn does_work<T: Marker>() -> T {
    unimplemented!()
}

fn main() {
    let val: f64 = 0.1 * does_work();
    let val: f64 = 0.1 * does_not_work();
}
It's somewhat understandable that the compiler can't infer types for does_not_work, because how is it meant to know about every possible type that could be multiplied with f64? However, if we restrict things to only certain types with a trait, then the list of possible types becomes finite and type inference works again.

Get the current memory usage of a variable? [duplicate]

I notice that Rust's test crate has a benchmark mode that measures execution time in ns/iter, but I could not find a way to measure memory usage.
How would I implement such a benchmark? Let us assume for the moment that I only care about heap memory (though stack usage would also certainly be interesting).
Edit: I found this issue which asks for the exact same thing.
You can use the jemalloc allocator to print the allocation statistics. For example,
Cargo.toml:
[package]
name = "stackoverflow-30869007"
version = "0.1.0"
edition = "2018"
[dependencies]
jemallocator = "0.5"
jemalloc-sys = {version = "0.5", features = ["stats"]}
libc = "0.2"
src/main.rs:
use libc::{c_char, c_void};
use std::ptr::{null, null_mut};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

extern "C" fn write_cb(_: *mut c_void, message: *const c_char) {
    print!("{}", String::from_utf8_lossy(unsafe {
        std::ffi::CStr::from_ptr(message).to_bytes()
    }));
}

fn mem_print() {
    unsafe { jemalloc_sys::malloc_stats_print(Some(write_cb), null_mut(), null()) }
}

fn main() {
    mem_print();
    let _heap = Vec::<u8>::with_capacity(1024 * 128);
    mem_print();
}
In a single-threaded program, that should allow you to get a good measurement of how much memory a structure takes. Just print the statistics before the structure is created and again after, and calculate the difference (the "total:" of "allocated" in particular).
You can also use Valgrind (Massif) to get the heap profile. It works just like with any other C program. Make sure you have debug symbols enabled in the executable (e.g. using debug build or custom Cargo configuration). You can use, say, http://massiftool.sourceforge.net/ to analyse the generated heap profile.
(I verified this to work on Debian Jessie, in a different setting your mileage may vary).
(In order to use Rust with Valgrind you'll probably have to switch back to the system allocator).
P.S. There is now also a better DHAT.
jemalloc can be told to dump a memory profile. You can probably do this with the Rust FFI but I haven't investigated this route.
As far as measuring data structure sizes is concerned, this can be done fairly easily through the use of traits and a small compiler plugin. Nicholas Nethercote in his article Measuring data structure sizes: Firefox (C++) vs. Servo (Rust) demonstrates how it works in Servo; it boils down to adding #[derive(HeapSizeOf)] (or occasionally a manual implementation) to each type you care about. This is a good way of allowing precise checking of where memory is going, too; it is, however, comparatively intrusive as it requires changes to be made in the first place, where something like jemalloc’s print_stats() doesn’t. Still, for good and precise measurements, it’s a sound approach.
Currently, the only way to get allocation information is the alloc::heap::stats_print(); method (behind #![feature(alloc)]), which calls jemalloc's print_stats().
I'll update this answer with further information once I have learned what the output means.
(Note that I'm not going to accept this answer, so if someone comes up with a better solution...)
Now there is the jemalloc_ctl crate, which provides a convenient, safe, typed API. Add it to your Cargo.toml:
[dependencies]
jemalloc-ctl = "0.3"
jemallocator = "0.3"
Then configure jemalloc to be the global allocator and use the methods from the jemalloc_ctl::stats module:
jemalloc_ctl::stats::allocated
jemalloc_ctl::stats::resident
Here is the official example:
use std::thread;
use std::time::Duration;
use jemalloc_ctl::{stats, epoch};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    loop {
        // many statistics are cached and only updated when the epoch is advanced.
        epoch::advance().unwrap();

        let allocated = stats::allocated::read().unwrap();
        let resident = stats::resident::read().unwrap();
        println!("{} bytes allocated/{} bytes resident", allocated, resident);
        thread::sleep(Duration::from_secs(10));
    }
}
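Building on that example, you can difference stats::allocated around a construction to approximate the heap cost of one structure, as the jemalloc answer above suggests. A sketch under the same dependencies (the 128 KiB Vec is just a stand-in):

use jemalloc_ctl::{epoch, stats};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    epoch::advance().unwrap();
    let before = stats::allocated::read().unwrap();

    let buf = Vec::<u8>::with_capacity(1024 * 128);

    epoch::advance().unwrap();
    let after = stats::allocated::read().unwrap();
    println!("buf costs ~{} bytes on the heap", after - before);
    drop(buf); // keep buf alive past the second reading
}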
There's a neat little solution someone put together here: https://github.com/discordance/trallocator/blob/master/src/lib.rs
use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicU64, Ordering};

pub struct Trallocator<A: GlobalAlloc>(pub A, AtomicU64);

unsafe impl<A: GlobalAlloc> GlobalAlloc for Trallocator<A> {
    unsafe fn alloc(&self, l: Layout) -> *mut u8 {
        self.1.fetch_add(l.size() as u64, Ordering::SeqCst);
        self.0.alloc(l)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, l: Layout) {
        self.0.dealloc(ptr, l);
        self.1.fetch_sub(l.size() as u64, Ordering::SeqCst);
    }
}

impl<A: GlobalAlloc> Trallocator<A> {
    pub const fn new(a: A) -> Self {
        Trallocator(a, AtomicU64::new(0))
    }
    pub fn reset(&self) {
        self.1.store(0, Ordering::SeqCst);
    }
    pub fn get(&self) -> u64 {
        self.1.load(Ordering::SeqCst)
    }
}
Usage (from https://www.reddit.com/r/rust/comments/8z83wc/comment/e2h4dp9):
// needed for the Trallocator struct (as written, anyway)
#![feature(integer_atomics, const_fn_trait_bound)]

use std::alloc::System;

#[global_allocator]
static GLOBAL: Trallocator<System> = Trallocator::new(System);

fn main() {
    GLOBAL.reset();
    println!("memory used: {} bytes", GLOBAL.get());
    {
        let mut vec = vec![1, 2, 3, 4];
        for i in 5..20 {
            vec.push(i);
            println!("memory used: {} bytes", GLOBAL.get());
        }
        for v in vec {
            println!("{}", v);
        }
    }
    // For some reason this does not print zero =/
    println!("memory used: {} bytes", GLOBAL.get());
}
I've just started using it, and it seems to work well! It's straightforward, realtime, requires no external packages, and doesn't require changing your base memory allocator.
It's also nice that, because it intercepts the allocate/deallocate calls, you should be able to add custom logic if desired (e.g. if memory usage goes above X, print the stack trace to see what's triggering the allocations; see the sketch below) -- although I haven't tried this yet.
I also haven't yet tested to see how much overhead this approach adds. If someone does a test for this, let me know!
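For the custom-logic idea above, the alloc method of Trallocator could be extended roughly like this. This is an untested sketch with an arbitrary 64 MiB threshold, and note the caveat in the comment:

// Sketch: a threshold check inside Trallocator::alloc. Anything that itself
// allocates in here (formatting, capturing a backtrace) can recurse into the
// allocator, so a real version should only set a flag (e.g. an AtomicBool)
// and do the actual reporting from ordinary code.
unsafe fn alloc(&self, l: Layout) -> *mut u8 {
    let live = self.1.fetch_add(l.size() as u64, Ordering::SeqCst) + l.size() as u64;
    if live > 64 * 1024 * 1024 {
        // raise a flag here and inspect it from the main thread
    }
    self.0.alloc(l)
}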
