Does not using references hurt performance in Rust?

In Rust, I often write code like the transform function below. I have some immutable input and return a newly allocated output:
// the transform function doesn't modify inp. instead, it returns something new.
fn transform(inp: String) -> String { // with reference: inp: &String
let buf: Vec<_> = inp.as_bytes().iter().map(|x| x + 1).collect();
String::from_utf8_lossy(&buf).to_string()
}
// using the transform function
fn main() {
let msg: String = String::from("HAL9000");
println!("{}", msg);
println!("{}", transform(msg)); // with reference: &msg
}
Does it hurt performance if I don't use a reference for the inp parameter to the transform function? Or does the compiler recognize that inp, because it's not mutable, can be reused/referenced? Put another way: Is it worth the extra effort to declare immutable input parameters as references, like in C++, to avoid copying huge data structures?
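A minimal sketch that bears on the question. Moving a String into a function copies only its three-word header (pointer, length, capacity), not the heap buffer. The real cost of taking `String` by value is that the caller gives up `msg`, and would have to clone it to keep using it. Borrowing as `&str` is the usual choice for read-only input. `buffer_address_after_move` is a hypothetical helper added only to make the "no copy" point checkable.

```rust
// The question's transform, changed to borrow its input as &str.
fn transform(inp: &str) -> String {
    // 0xFF never occurs in valid UTF-8, so adding 1 to a byte cannot overflow here.
    let buf: Vec<u8> = inp.bytes().map(|x| x + 1).collect();
    String::from_utf8_lossy(&buf).into_owned()
}

// Hypothetical helper: takes a String by value and reports where its heap
// buffer lives. Moving the String copies only its (pointer, length, capacity)
// header, so this address is the same one the caller saw before the move.
fn buffer_address_after_move(s: String) -> *const u8 {
    s.as_ptr()
}
```

With the `&str` signature, `main` can call `transform(&msg)` and still use `msg` afterwards.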

Related

generic callback with data

There is already a very popular question about this topic, but I don't fully understand the answer.
The goal is:
I need a list (read: a Vec) of "function pointers" that modify data stored elsewhere in a program. The simplest example I can come up with is callbacks to be called when a key is pressed: when any key is pressed, all functions passed to the object are called in some order.
Reading the answer, it is not clear to me how I would be able to make such a list. It sounds like I would need to restrict the type of the callback to something known, else I don't know how you would be able to make an array of it.
It's also not clear to me how to store the data pointers/references.
Say I have
struct Processor<CB>
where
CB: FnMut(),
{
callback: CB,
}
Like the answer suggests, I can't make an array of processors, can I? Since each Processor is technically a different type depending on the generic instantiation.
Indeed, you can't make a vector of such processors: every closure has its own distinct, unnameable type. What you want instead are trait objects, which give you dynamic dispatch of the callback calls. Since trait objects are not Sized, you'd put them behind a pointer such as Box. The final type is Vec<Box<dyn FnMut()>>.
fn add_callback(list: &mut Vec<Box<dyn FnMut()>>, cb: impl FnMut() + 'static) {
list.push(Box::new(cb))
}
fn run_callback(list: &mut [Box<dyn FnMut()>]) {
for cb in list {
cb()
}
}
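A minimal usage sketch of the two functions above, repeated so the block compiles on its own. Because of the implicit 'static bound, the closure cannot borrow a local counter. Here it owns a clone of a shared Rc<Cell<i32>> instead. `count_key_presses` is a hypothetical name for the demo.

```rust
use std::cell::Cell;
use std::rc::Rc;

fn add_callback(list: &mut Vec<Box<dyn FnMut()>>, cb: impl FnMut() + 'static) {
    list.push(Box::new(cb))
}

fn run_callback(list: &mut [Box<dyn FnMut()>]) {
    for cb in list {
        cb()
    }
}

// Hypothetical usage: simulate `presses` key presses and return how many
// times the counting callback ran.
fn count_key_presses(presses: usize) -> i32 {
    // The 'static bound rules out borrowing a local counter, so the closure
    // owns its own handle to a shared, reference-counted cell instead.
    let counter = Rc::new(Cell::new(0));
    let handle = Rc::clone(&counter);

    let mut callbacks: Vec<Box<dyn FnMut()>> = Vec::new();
    add_callback(&mut callbacks, move || handle.set(handle.get() + 1));
    add_callback(&mut callbacks, || println!("key pressed"));

    for _ in 0..presses {
        run_callback(&mut callbacks);
    }
    counter.get()
}
```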
If you do it like that, however, you might run into lifetime issues: you are forced either to move everything into the closures or to only modify values that live for 'static, which isn't very convenient. The following might be better:
#[derive(Default)]
struct Producer<'a> {
list: Vec<Box<dyn FnMut() + 'a>>,
}
impl<'a> Producer<'a> {
fn add_callback(&mut self, cb: impl FnMut() + 'a) {
self.list.push(Box::new(cb))
}
fn run_callbacks(&mut self) {
for cb in &mut self.list {
cb()
}
}
}
fn callback_1() {
println!("Hello!");
}
fn main() {
let mut modified = 0;
let mut prod = Producer::default();
prod.add_callback(callback_1);
prod.add_callback(
|| {
modified += 1;
println!("World!");
}
);
prod.run_callbacks();
drop(prod);
println!("{}", modified);
}
Just a few things to note:
You have to drop the producer manually. Otherwise it is only dropped at the end of the scope, and since it still holds (through the closure) an exclusive reference to modified, the borrow checker rejects reading modified before that point.
Currently, run_callbacks takes &mut self, because we only require FnMut. If you wanted it to take only &self, you'd need to replace FnMut with Fn, which means the callbacks can no longer mutate their captured state directly (short of interior mutability such as Cell or RefCell), although they can still have side effects such as printing.
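A sketch of the Fn variant described in the last note, assuming a Cell for the shared counter. This Producer is a hypothetical modification of the one above, not the original answer's code. Because the callbacks only hold shared borrows, reading the counter while the producer is still alive is fine, and no manual drop is needed.

```rust
use std::cell::Cell;

// Hypothetical variant of the Producer above that stores `Fn` callbacks,
// so running them only needs `&self`.
#[derive(Default)]
struct Producer<'a> {
    list: Vec<Box<dyn Fn() + 'a>>,
}

impl<'a> Producer<'a> {
    fn add_callback(&mut self, cb: impl Fn() + 'a) {
        self.list.push(Box::new(cb))
    }

    fn run_callbacks(&self) {
        for cb in &self.list {
            cb()
        }
    }
}

fn presses_counted(times: usize) -> i32 {
    // `Fn` closures cannot mutate captured variables directly, so the
    // counter uses interior mutability (Cell) instead.
    let counter = Cell::new(0);
    let mut prod = Producer::default();
    prod.add_callback(|| counter.set(counter.get() + 1));
    for _ in 0..times {
        prod.run_callbacks();
    }
    // Only shared borrows are outstanding, so reading here is allowed.
    counter.get()
}
```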
Yes, every closure has a different type, so if you want a Vec of different closures you need to turn them into trait objects. This can be achieved with Box<dyn Trait> (or any other smart pointer). Box<dyn FnMut()> implements FnMut(), so you can have Processor<Box<dyn FnMut()>>, make a Vec of them, and call the callbacks on them.
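A minimal sketch of that idea, using the Processor struct from the question. The Rc<Cell<i32>> total and the name `weighted_total` are assumptions made only for the demo.

```rust
use std::cell::Cell;
use std::rc::Rc;

struct Processor<CB>
where
    CB: FnMut(),
{
    callback: CB,
}

// Hypothetical demo: two different closures stored as
// Processor<Box<dyn FnMut()>> in a single Vec.
fn weighted_total(rounds: usize) -> i32 {
    let total = Rc::new(Cell::new(0));
    let (a, b) = (Rc::clone(&total), Rc::clone(&total));

    // Annotating the boxes as Box<dyn FnMut()> erases the closure types,
    // so both processors end up with the same type.
    let add_one: Box<dyn FnMut()> = Box::new(move || a.set(a.get() + 1));
    let add_ten: Box<dyn FnMut()> = Box::new(move || b.set(b.get() + 10));
    let mut processors = vec![
        Processor { callback: add_one },
        Processor { callback: add_ten },
    ];

    for _ in 0..rounds {
        for p in &mut processors {
            (p.callback)();
        }
    }
    total.get()
}
```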

Want to know about automatic drop of data that no one references

I don't think the "garbage collection" of Rust is fully explained by the scope of ownership alone.
I have googled it, and this is what I've got.
[ temporary data ]
If you reference temporary data, the lifetime of the data depends on the expression it comes from: it is dropped either at the end of the enclosing statement or, if its lifetime gets extended, at the end of the scope.
Look for the further explanation in here :
https://doc.bccnsoft.com/docs/rust-1.36.0-docs-html/reference/expressions.html#temporary-lifetimes
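A minimal sketch of both cases. It uses a hypothetical `Noisy` type that records its drop in a shared log instead of printing, so the order can be checked.

```rust
use std::cell::RefCell;

// Hypothetical type that appends a message to a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn noisy<'a>(name: &'static str, log: &'a RefCell<Vec<String>>) -> Noisy<'a> {
    Noisy { name, log }
}

fn temporary_drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // The temporary is only used inside this statement, so it is
        // dropped at the end of the statement (the `;`).
        let len = noisy("temp", &log).name.len();
        log.borrow_mut().push(format!("len = {}", len));

        // Binding a reference to a temporary with `let` extends the
        // temporary's lifetime to the end of the enclosing block.
        let extended = &noisy("extended", &log);
        log.borrow_mut().push(format!("still alive: {}", extended.name));
    } // the extended temporary is dropped here
    log.borrow_mut().push("after block".to_string());
    log.into_inner()
}
```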
[ reassignment of variables ]
let mut a = String::from("first");
a = String::from("second");
In the above case, the first string is dropped automatically when the second value is assigned.
However, I couldn't find more information beyond that.
My guess is this:
Fields of structs and elements of arrays may be treated as independent variables, so that assigning to them counts as reassigning a variable.
struct A {
a: String,
b: String
}
let mut x = A {
a: String::from("first"),
b: String::from("second")
};
x.a = String::from("reassignment"); // first string drops here
Also, we all know that when a variable is dropped, all of its contents are dropped too, as below:
{
let a = vec!(String::from("first"), String::from("second"));
} // all the strings are dropped here.
OK. Then what about more complicated types?
Box<T>, HashMap<String, i32>, etc., i.e. data that takes ownership of other data.
What happens if we change their inner data?
Is that the same as reassigning a struct field?
I wonder whether they're just complex structures or totally different kinds of objects.
Are there any other rules about automatic dropping that I should know?
If you want to know more and observe automatic dropping, I recommend implementing the Drop trait for a type. Rust doesn't have a garbage collector; instead, the compiler inserts instructions that clean up objects when they go out of scope.
See also the RAII pattern in Rust.
If you implement Drop and log or print something inside drop, you can see exactly when a value goes out of scope and is reclaimed. In the example below, observe when the drop method of the Data struct gets called.
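use std::collections::HashMap;
#[derive(Debug)]
struct Data {
    val: String,
}
impl Drop for Data {
    fn drop(&mut self) {
        println!("Dropping for {:?}", self.val);
    }
}
// A plain main is enough here; no async runtime is needed to observe drops.
fn main() {
    {
        let mut dataref = Data { val: "d0".to_string() };
        {
            // The leading underscore only silences the unused-variable warning;
            // the value is still dropped at the end of this scope.
            let _data = Data { val: "d1".to_string() };
            let data2 = Data { val: "d2".to_string() };
            dataref = data2; // the old value "d0" is dropped here
            println!("scope marker 1");
        } // "d1" is dropped here; data2 was moved, so it is not dropped
        println!("scope marker 2");
        let mut map = HashMap::new();
        map.insert("k1".to_string(), Data { val: "v1".to_string() });
        println!("Before changing k1");
        // insert returns the replaced value ("v1"), which is dropped right away
        map.insert("k1".to_string(), Data { val: "v2".to_string() });
        println!("Changed k1");
        println!("dataref holds {:?}", dataref.val);
    } // map (holding "v2") and then dataref ("d2") are dropped here
}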
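To address the Box/HashMap part of the question directly, here is a sketch that records drops in a log instead of printing them. It assumes a hypothetical `Tracked` type. Assigning through a Box, overwriting a Vec element, and replacing a HashMap entry all drop the replaced value. In the HashMap case, the old value comes back from `insert` and is dropped at the end of that statement if it is not used.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical logging type: records its name in a shared log when dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn replacement_drops() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let log_ref = &log;
        let t = move |name: &'static str| Tracked { name, log: log_ref };

        // Assigning through a Box drops the old boxed value.
        let mut boxed = Box::new(t("box-old"));
        *boxed = t("box-new");

        // Overwriting a Vec element drops the old element.
        let mut v = vec![t("vec-old")];
        v[0] = t("vec-new");

        // HashMap::insert returns the old value; unused, it is dropped at the `;`.
        let mut map = HashMap::new();
        map.insert("k", t("map-old"));
        map.insert("k", t("map-new"));

        log.borrow_mut().push("end of block");
    } // map, v and boxed are dropped here, in reverse declaration order
    log.into_inner()
}
```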

"Recycling" items in iterators for better performance

I have a file that contains multiple instances of some complex data type (think of a trajectory of events). The API to read this file is written in C and I don't have much control over it. To expose it to Rust, I implemented the following interface:
// a single event read from the file
struct Event {
a: u32,
b: f32,
}
// A handle to the file used for I/O
struct EventFile;
impl EventFile {
fn open() -> Result<EventFile, Error> {
unimplemented!()
}
// read the next step of the trajectory into event
fn read(&self, event: &mut Event) -> Result<(), Error> {
event.a = unimplemented!();
event.b = unimplemented!();
}
}
To access the file contents, I could call the read function until it returns an Err similar to this:
let event_file = EventFile::open().expect("failed to open the event file");
let mut event = Event::new();
let mut result = event_file.read(&mut event);
while let Ok(_) = result {
println!("{:?}", event);
result = event_file.read(&mut event);
}
Because event is reused for each call of read, there's no repeated allocation/deallocation of memory which hopefully results in some performance improvement (the event struct is much bigger in the actual implementation).
Now, it would be nice to be able to access this data through an iterator. However, to my understanding, this means that I have to create a new instance of Event each time the iterator yields, because I cannot reuse the event with an iterator. And this will hurt the performance:
struct EventIterator {
event_file: EventFile,
}
impl Iterator for EventIterator {
type Item = Event;
fn next(&mut self) -> Option<Event> {
let mut event = Event::new(); // costly allocation
let result = self.event_file.read(&mut event);
match result {
Ok(_) => Some(event),
Err(_) => None,
}
}
}
let it = EventIterator { event_file };
it.map(|event| unimplemented!())
Is there a way to somehow "recycle" or "reuse" events inside the iterator? Or is this a concept that is simply not transferable to Rust and I have to live with worse performance using iterators in this case?
You can "recycle" items between iterations by wrapping the Item in a reference counter. The idea here is that if the caller keeps the item around between iterations, the iterator allocates a new object and returns that new object. If the item is dropped by the caller before the next iteration begins, the item is recycled. This is ensured by std::rc::Rc::get_mut(), which will only return a reference if the reference-count is exactly 1.
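A minimal check of the `Rc::get_mut` behavior this relies on. `get_mut_checks` is a hypothetical helper for illustration: get_mut succeeds only while there is a single Rc (and no Weak) pointing at the allocation.

```rust
use std::rc::Rc;

// Returns whether Rc::get_mut succeeds at three points in time.
fn get_mut_checks() -> (bool, bool, bool) {
    let mut item = Rc::new(String::from("event"));
    let alone = Rc::get_mut(&mut item).is_some(); // only one Rc: Some

    let kept_by_caller = Rc::clone(&item);
    let shared = Rc::get_mut(&mut item).is_some(); // two Rcs: None

    drop(kept_by_caller);
    let alone_again = Rc::get_mut(&mut item).is_some(); // back to one: Some

    (alone, shared, alone_again)
}
```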
This has the downside that your Iterator yields Rc<Foo> instead of Foo. There is also the added code complexity and (maybe) some runtime cost due to the reference counting (which may be elided completely if the compiler can prove it is unnecessary).
You will, therefore, need to measure if this actually gets you a performance win. Allocating a new object on every single iteration may seem costly, but allocators are good at this...
Something along these lines:
use std::rc::Rc;
#[derive(Default)]
struct FoobarIterator {
item: Rc<String>,
}
impl Iterator for FoobarIterator {
type Item = Rc<String>;
fn next(&mut self) -> Option<Self::Item> {
let item = match Rc::get_mut(&mut self.item) {
Some(item) => {
// This path is only taken if the caller
// did not keep the item around
// so we are the only reference-holder!
println!("Item is re-used!");
item
},
None => {
// Let go of the item (the caller gets to keep it)
// and create a new one
println!("Creating new item!");
self.item = Rc::new(String::new());
Rc::get_mut(&mut self.item).unwrap()
}
};
// Create the item, possibly reusing the same allocation...
item.clear();
item.push('a');
Some(Rc::clone(&self.item))
}
}
fn main() {
// This will only print "Item is re-used"
// because `item` is dropped before the next cycle begins
for item in FoobarIterator::default().take(5) {
println!("{}", item);
}
// This will allocate a new object on every call after the first
// because the Vec retains ownership of the previous items.
let _: Vec<_> = FoobarIterator::default().take(5).collect();
}
The compiler (or LLVM) will most likely employ return value optimization in this case, so you do not need to prematurely optimize by yourself.
See this Godbolt example, particularly lines 43 to 47. My understanding of assembly is limited, but it seems that next() simply writes the Event value to the memory passed by the caller via a pointer (initially in rdi). In subsequent loop iterations this memory location can be reused.
Note that you get a much longer assembly output (which I did not analyze in depth) if you compile without the -O flag (e.g. when building in the "debug" mode as opposed to "release").

Is there a way to make an immutable reference mutable?

I want to solve a LeetCode question in Rust (Remove Nth Node From End of List). My solution uses two pointers to find the node to remove:
#[derive(PartialEq, Eq, Debug)]
pub struct ListNode {
pub val: i32,
pub next: Option<Box<ListNode>>,
}
impl ListNode {
#[inline]
fn new(val: i32) -> Self {
ListNode { next: None, val }
}
}
// two-pointer sliding window
impl Solution {
pub fn remove_nth_from_end(head: Option<Box<ListNode>>, n: i32) -> Option<Box<ListNode>> {
let mut dummy_head = Some(Box::new(ListNode { val: 0, next: head }));
let mut start = dummy_head.as_ref();
let mut end = dummy_head.as_ref();
for _ in 0..n {
end = end.unwrap().next.as_ref();
}
while end.as_ref().unwrap().next.is_some() {
end = end.unwrap().next.as_ref();
start = start.unwrap().next.as_ref();
}
// TODO: fix the borrow problem
// ERROR!
// start.unwrap().next = start.unwrap().next.unwrap().next.take();
dummy_head.unwrap().next
}
}
I borrow two immutable references of the linked-list. After I find the target node to remove, I want to drop one and make the other mutable. Each of the following code examples leads to a compiler error:
// ERROR
drop(end);
let next = start.as_mut().unwrap().next.take();
// ERROR
let mut node = *start.unwrap();
I don't know whether this solution can be written in Rust. If I can make an immutable reference mutable, how do I do it? If not, is there any way to implement the same logic while keeping the borrow checker happy?
The correct answer is that you should not be doing this. This is undefined behavior, and breaks many assumptions made by the compiler when compiling your program.
However, it is possible to do this. Other people have also mentioned why this is not a good idea, but they haven't actually shown what the code to do something like this would look like. Even though you should not do this, this is what it would look like:
unsafe fn very_bad_function<T>(reference: &T) -> &mut T {
let const_ptr = reference as *const T;
let mut_ptr = const_ptr as *mut T;
&mut *mut_ptr
}
Essentially, you convert a constant pointer into a mutable one, and then make the mutable pointer into a reference.
Here's one example why this is very unsafe and unpredictable:
fn main() {
static THIS_IS_IMMUTABLE: i32 = 0;
unsafe {
let bad_reference = very_bad_function(&THIS_IS_IMMUTABLE);
*bad_reference = 5;
}
}
If you run this, you will most likely get a segfault. What happened? You broke the memory rules by trying to write to an area of memory that had been marked as read-only. When you use a function like this, you break the trust the compiler places in you not to mess with immutable memory.
This is why you should never use this, especially in a public API: if someone passes an innocent immutable reference to your function, your function mutates it, and the reference points to memory that is not meant to be written to, you may get a segfault (or other undefined behavior).
In short: don't try to cheat the borrow checker. It's there for a reason.
EDIT: In addition to the reasons I just mentioned for why this is undefined behavior, another reason is that it breaks the reference aliasing rules. Since this lets you hold a mutable and an immutable reference to the same variable at the same time, it causes loads of problems when you pass both to the same function, because the compiler assumes a mutable reference is never aliased. Read this page from the Rust docs for more information about this.
Is there a way to make an immutable reference mutable?
No.
You could write unsafe Rust code to force the types to line up, but the code would actually be unsafe and lead to undefined behavior. You do not want this.
For your specific problem, see:
How to remove the Nth node from the end of a linked list?
How to use two pointers to iterate a linked list in Rust?
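For the specific problem, here is one safe approach that differs from the two-pointer technique in the question. First count the nodes using only shared borrows. Then walk a single mutable cursor to the node just before the one to remove. This is a minimal sketch. It assumes 1 <= n <= length, as the LeetCode problem guarantees. It is written as a free function instead of inside the Solution impl, and `from_vec`/`to_vec` are hypothetical helpers for testing.

```rust
#[derive(PartialEq, Eq, Debug)]
pub struct ListNode {
    pub val: i32,
    pub next: Option<Box<ListNode>>,
}

pub fn remove_nth_from_end(head: Option<Box<ListNode>>, n: i32) -> Option<Box<ListNode>> {
    // First pass: count the nodes with shared borrows only.
    let mut len = 0;
    let mut cur = head.as_ref();
    while let Some(node) = cur {
        len += 1;
        cur = node.next.as_ref();
    }
    // Second pass: a single mutable cursor stops just before the target node.
    let mut dummy = Box::new(ListNode { val: 0, next: head });
    let mut cur = &mut dummy;
    for _ in 0..(len - n) {
        cur = cur.next.as_mut().unwrap();
    }
    // Unlink the target by taking it out and splicing in its successor.
    let removed = cur.next.take();
    cur.next = removed.and_then(|node| node.next);
    dummy.next
}

// Hypothetical test helpers.
fn from_vec(vals: &[i32]) -> Option<Box<ListNode>> {
    let mut head = None;
    for &v in vals.iter().rev() {
        head = Some(Box::new(ListNode { val: v, next: head }));
    }
    head
}

fn to_vec(mut list: &Option<Box<ListNode>>) -> Vec<i32> {
    let mut out = Vec::new();
    while let Some(node) = list {
        out.push(node.val);
        list = &node.next;
    }
    out
}
```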

Rust: How to specify lifetimes in closure arguments?

I'm writing a parser generator as a project to learn Rust, and I'm running into something I can't figure out with lifetimes and closures. Here's my simplified case (sorry it's as complex as it is, but I need to have the custom iterator in the real version and it seems to make a difference in the compiler's behavior):
Playpen link: http://is.gd/rRm2aa
struct MyIter<'stat, T:Iterator<&'stat str>>{
source: T
}
impl<'stat, T:Iterator<&'stat str>> Iterator<&'stat str> for MyIter<'stat, T>{
fn next(&mut self) -> Option<&'stat str>{
self.source.next()
}
}
struct Scanner<'stat,T:Iterator<&'stat str>>{
input: T
}
impl<'main> Scanner<'main, MyIter<'main,::std::str::Graphemes<'main>>>{
fn scan_literal(&'main mut self) -> Option<String>{
let mut token = String::from_str("");
fn get_chunk<'scan_literal,'main>(result:&'scan_literal mut String,
input: &'main mut MyIter<'main,::std::str::Graphemes<'main>>)
-> Option<&'scan_literal mut String>{
Some(input.take_while(|&chr| chr != "\"")
.fold(result, |&mut acc, chr|{
acc.push_str(chr);
&mut acc
}))
}
get_chunk(&mut token,&mut self.input);
println!("token is {}", token);
Some(token)
}
}
fn main(){
let mut scanner = Scanner{input:MyIter{source:"\"foo\"".graphemes(true)}};
scanner.scan_literal();
}
There are two problems I know of here. First, I have to shadow the 'main lifetime in the get_chunk function (I tried using the one in the impl, but the compiler complains that 'main is undefined inside get_chunk). I think it will still work out because the call to get_chunk later will match the 'main from the impl with the 'main from get_chunk, but I'm not sure that's right.
The second problem is that the &mut acc inside the closure needs to have a lifetime of 'scan_literal in order to work like I want it to (accumulating characters until the first " is encountered for this example). I can't add an explicit lifetime to &mut acc though, and the compiler says its lifetime is limited to the closure itself, and thus I can't return the reference to use in the next iteration of fold. I've gotten the function to compile and run in various other ways, but I don't understand what the problem is here.
My main question is: Is there any way to explicitly specify the lifetime of an argument to a closure? If not, is there a better way to accumulate the string using fold without doing multiple copies?
First, about lifetimes. Functions defined inside other functions are static, they are not connected with their outside code in any way. Consequently, their lifetime parameters are completely independent. You don't want to use 'main as a lifetime parameter for get_chunk() because it will shadow the outer 'main lifetime and give nothing but confusion.
Next, about closures. This expression:
|&mut acc, chr| ...
very likely does not do what you think it does. Closure/function arguments allow irrefutable patterns, and & has a special meaning in patterns. Namely, it dereferences the value it is matched against and binds its identifier to this dereferenced value:
let x: i32 = 10;
let p: &i32 = &x;
match p {
&y => println!("{}", y) // prints 10
}
You can think of & in a pattern as an opposite to & in an expression: in an expression it means "take a reference", in a pattern it means "remove the reference".
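mut, however, does not belong to & in patterns (in the Rust version this question was written for); it belongs to the identifier and means that the variable with this identifier is mutable, i.e. you should read not
|&mut acc, chr| ...
but
|&(mut acc), chr| ...
You may be interested in this RFC, which is exactly about this quirk in the language syntax. Note that the language has changed since then: in current Rust, &mut acc in a pattern matches a &mut reference and binds acc to the value behind it.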
It looks like you want to do something very strange; I'm not sure I understand what you're getting at. It is very likely that you are confusing different kinds of strings. First of all, you should read the official guide, which explains ownership and borrowing and when to use them (you may also want to read the unfinished ownership guide; it will soon get into the main documentation tree), and then you should read the strings guide.
Anyway, your problem can be solved in a much simpler and more generic way:
#[deriving(Clone)]
struct MyIter<'s, T: Iterator<&'s str>> {
source: T
}
impl<'s, T: Iterator<&'s str>> Iterator<&'s str> for MyIter<'s, T>{
fn next(&mut self) -> Option<&'s str>{
self.source.next()
}
}
#[deriving(Clone)]
struct Scanner<'s, T: Iterator<&'s str>> {
input: T
}
impl<'m, T: Iterator<&'m str>> Scanner<'m, T> {
fn scan_literal(&mut self) -> Option<String>{
fn get_chunk<'a, T: Iterator<&'a str>>(input: T) -> Option<String> {
Some(
input.take_while(|&chr| chr != "\"")
.fold(String::new(), |mut acc, chr| {
acc.push_str(chr);
acc
})
)
}
let token = get_chunk(self.input.by_ref());
println!("token is {}", token);
token
}
}
fn main(){
let mut scanner = Scanner{
input: MyIter {
source: "\"foo\"".graphemes(true)
}
};
scanner.scan_literal();
}
You don't need to pass external references into the closure; you can generate a String directly in fold() operation. I also generified your code and made it more idiomatic.
Note that the impl for Scanner now also works with arbitrary iterators returning &str. It is very likely that you want to write this instead of specializing Scanner to work only with MyIter with Graphemes inside it. The by_ref() operation takes a mutable reference to an iterator and returns something that is itself an iterator over the same items, which allows further chaining of iterators even if you only have a mutable reference to the original iterator.
By the way, your code is also incomplete: it will only return Some("") because take_while() stops at the first quote and won't scan further. You should rewrite it to take the initial quote into account.
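Here is a rough modern-Rust translation of the answer's approach, since the code above predates Rust 1.0 and no longer compiles. Iterator now uses an associated Item type. graphemes() is no longer part of the standard library (it lives in the unicode-segmentation crate), so this sketch swaps in a hypothetical `char_slices` helper that yields one &str per char rather than per grapheme cluster. It also skips the opening quote, as suggested above. Returning None when the input does not start with a quote is an assumption of this sketch.

```rust
struct Scanner<I> {
    input: I,
}

impl<'a, I: Iterator<Item = &'a str>> Scanner<I> {
    fn scan_literal(&mut self) -> Option<String> {
        // Skip the opening quote (assumption: anything else is not a literal).
        if self.input.next()? != "\"" {
            return None;
        }
        // by_ref() lets take_while consume from the iterator we only borrow;
        // take_while also consumes the closing quote.
        let token = self
            .input
            .by_ref()
            .take_while(|&chunk| chunk != "\"")
            .fold(String::new(), |mut acc, chunk| {
                acc.push_str(chunk);
                acc
            });
        Some(token)
    }
}

// Hypothetical std-only stand-in for graphemes(): one &str slice per char.
fn char_slices(s: &str) -> impl Iterator<Item = &str> + '_ {
    s.char_indices().map(move |(i, c)| &s[i..i + c.len_utf8()])
}
```

After a successful scan, the rest of the input is still available in `scanner.input`, because `by_ref()` only borrowed it.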

Resources