Why is the efficient example of BufRead in the Rust book efficient?

The Rust book gives two examples of how to use BufRead that are relevant here.
They first give a "beginner friendly" example before going on to a more "Efficient method".
The beginner friendly example reads a file line by line:
use std::fs::File;
use std::io::{ self, BufRead, BufReader };
fn read_lines(filename: String) -> io::Lines<BufReader<File>> {
    // Open the file in read-only mode.
    let file = File::open(filename).unwrap();
    // Read the file line by line, and return an iterator of the lines of the file.
    return io::BufReader::new(file).lines();
}
fn main() {
    // Stores the iterator of lines of the file in lines variable.
    let lines = read_lines("./hosts".to_string());
    // Iterate over the lines of the file, and in this case print them.
    for line in lines {
        println!("{}", line.unwrap());
    }
}
The "efficient method" does nearly the same:
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;
fn main() {
    // File hosts must exist in current path before this produces output
    if let Ok(lines) = read_lines("./hosts") {
        // Consumes the iterator, returns an (Optional) String
        for line in lines {
            if let Ok(ip) = line {
                println!("{}", ip);
            }
        }
    }
}
// The output is wrapped in a Result to allow matching on errors
// Returns an Iterator to the Reader of the lines of the file.
fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where P: AsRef<Path>, {
    let file = File::open(filename)?;
    Ok(io::BufReader::new(file).lines())
}
The Rust book says of the latter:
This process is more efficient than creating a String in memory especially working with larger files.
While the latter is slightly cleaner, using if let instead of unwrap, why is it more efficient to return a Result?
I assume that once we unwrap the iterator in the second example (in if let Ok(lines) = read_lines("./hosts")), it should be identical performance-wise to the first example.
Why does it differ then?
Why does the iterator in the second example return a result each time?
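One possible reading of the quoted sentence is that it compares streaming through a BufReader against "creating a String in memory", that is, against loading the whole file first, rather than against the other BufReader example. The sketch below contrasts the two under that assumption; the function names count_lines_eager and count_lines_streaming are made up for illustration and do not come from the book.

```rust
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Hypothetical helper: loads the entire file into one String, then splits it.
/// Memory use grows with the size of the file.
fn count_lines_eager<P: AsRef<Path>>(path: P) -> io::Result<usize> {
    let contents = fs::read_to_string(path)?;
    Ok(contents.lines().count())
}

/// Hypothetical helper: streams the file through a BufReader.
/// Only the internal buffer and the current line are held in memory.
fn count_lines_streaming<P: AsRef<Path>>(path: P) -> io::Result<usize> {
    let reader = BufReader::new(File::open(path)?);
    let mut count = 0;
    for line in reader.lines() {
        // Each item is an io::Result<String>, because every underlying
        // read can fail independently of opening the file.
        line?;
        count += 1;
    }
    Ok(count)
}
```

Both BufReader examples in the question stream the file in this second way; they differ in how errors are surfaced (unwrap versus Result), not in how the data is read.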

Related

Rust returns integer when doing usize.clone()?

I have the following code that stores a usize in an enum variant, and I want to iterate up to that usize. When I pass the usize directly to the for loop, I get the compilation error "expected Integer but found &usize". However, when I clone the usize, the for loop works.
Looking at the documentation, the clone() method is expected to return usize as well. Does this code work because clone() gives an owned value to the for loop, whereas the original size variable is bound by reference?
pub enum Command {
    Uppercase,
    Trim,
    Append(usize),
}
fn some_fun(command: Command, string: String) {
    match command {
        Command::Append(size) => {
            let mut str = string.clone();
            let s = size.clone();
            for i in 0..s {
                str.push_str("bar");
            }
        }
        // The remaining variants are not handled in the original snippet.
        _ => {}
    }
}
For a range expression, you need values, not references. The type of size ends up being a reference due to "match ergonomics". You don't show the expression you are matching on, but it's likely the type of your match value is &Command. If you add an & at the beginning of your pattern, i.e. &Command::Append(size), the type of size will be usize, and iterating over 0..size should work fine.
Yes. Iterating over ranges requires values, not references. However, since usize is Copy, it is better to just dereference: for i in 0..*size.
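The two fixes suggested above can be sketched side by side. This assumes the value being matched is a &Command, which the question does not show; the function names and the handling of the other variants are illustrative only.

```rust
pub enum Command {
    Uppercase,
    Trim,
    Append(usize),
}

/// Fix 1: keep the pattern as-is and dereference the binding.
fn append_by_deref(command: &Command, input: &str) -> String {
    let mut out = input.to_string();
    match command {
        Command::Append(size) => {
            // Matching on `&Command` binds `size` as `&usize`;
            // `usize` is `Copy`, so `*size` copies the value out.
            for _ in 0..*size {
                out.push_str("bar");
            }
        }
        // Illustrative handling of the other variants.
        Command::Uppercase => out = out.to_uppercase(),
        Command::Trim => out = out.trim().to_string(),
    }
    out
}

/// Fix 2: put `&` in the pattern so `size` is bound as a plain `usize`.
fn append_by_pattern(command: &Command, input: &str) -> String {
    let mut out = input.to_string();
    match command {
        &Command::Append(size) => {
            for _ in 0..size {
                out.push_str("bar");
            }
        }
        _ => {}
    }
    out
}
```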

Does not using references hurt performance in rust?

In Rust, I often write code like the transform function below. I have some immutable input and return a newly allocated output:
// the transform function doesn't modify inp. instead, it returns something new.
fn transform(inp: String) -> String { // with reference: inp: &String
    let buf: Vec<_> = inp.as_bytes().iter().map(|x| x + 1).collect();
    String::from_utf8_lossy(&buf).to_string()
}
// using the transform function
fn main() {
    let msg: String = String::from("HAL9000");
    println!("{}", msg);
    println!("{}", transform(msg)); // with reference: &msg
}
Does it hurt performance if I don't use a reference for the inp parameter to the transform function? Or does the compiler recognize that inp, because it's not mutable, can be reused/referenced? Put another way: Is it worth the extra effort to declare immutable input parameters as references, like in C++, to avoid copying huge data structures?
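As a rough sketch of the two options being asked about: passing a String by value moves it (the pointer, length, and capacity are handed over; the heap buffer is not duplicated), while taking &str borrows it and leaves the caller free to keep using the original. The names transform_borrowed and transform_owned are made up for this comparison, and wrapping_add is used here only to avoid an overflow panic on the byte 255.

```rust
/// Borrowing version: the caller keeps ownership of the input.
fn transform_borrowed(inp: &str) -> String {
    let buf: Vec<u8> = inp.bytes().map(|b| b.wrapping_add(1)).collect();
    String::from_utf8_lossy(&buf).into_owned()
}

/// Owning version: the String is moved in, not deep-copied, and its
/// existing buffer is reused for the shifted bytes.
fn transform_owned(inp: String) -> String {
    let mut buf = inp.into_bytes(); // takes over the String's allocation
    for b in buf.iter_mut() {
        *b = b.wrapping_add(1);
    }
    String::from_utf8_lossy(&buf).into_owned()
}
```

With the borrowed version, `transform_borrowed(&msg)` leaves `msg` usable afterwards; with the owned version, `transform_owned(msg)` consumes it, and a copy only happens if the caller explicitly writes `msg.clone()`.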

"Recycling" items in iterators for better performance

I have a file that contains multiple instances of some complex data type (think of a trajectory of events). The API to read this file is written in C and I don't have much control over it. To expose it to Rust, I implemented the following interface:
// a single event read from the file
struct Event {
    a: u32,
    b: f32,
}
// A handle to the file used for I/O
struct EventFile;
impl EventFile {
    fn open() -> Result<EventFile, Error> {
        unimplemented!()
    }
    // read the next step of the trajectory into event
    fn read(&self, event: &mut Event) -> Result<(), Error> {
        event.a = unimplemented!();
        event.b = unimplemented!();
    }
}
To access the file contents, I could call the read function until it returns an Err similar to this:
let event_file = EventFile::open();
let mut event = Event::new();
let mut result = event_file.read(&mut event);
while let Ok(_) = result {
    println!("{:?}", event);
    result = event_file.read(&mut event);
}
Because event is reused for each call of read, there's no repeated allocation/deallocation of memory which hopefully results in some performance improvement (the event struct is much bigger in the actual implementation).
Now, it would be nice to be able to access this data through an iterator. However, to my understanding, this means that I have to create a new instance of Event each time the iterator yields, because I cannot reuse the event inside an iterator. This will hurt performance:
struct EventIterator {
    event_file: EventFile,
}
impl Iterator for EventIterator {
    type Item = Event;
    fn next(&mut self) -> Option<Event> {
        let mut event = Event::new(); // costly allocation
        let result = self.event_file.read(&mut event);
        match result {
            Ok(_) => Some(event),
            Err(_) => None,
        }
    }
}
let it = EventIterator { event_file };
it.map(|event| unimplemented!())
Is there a way to somehow "recycle" or "reuse" events inside the iterator? Or is this a concept that is simply not transferable to Rust and I have to live with worse performance using iterators in this case?
You can "recycle" items between iterations by wrapping the Item in a reference counter. The idea here is that if the caller keeps the item around between iterations, the iterator allocates a new object and returns that new object. If the item is dropped by the caller before the next iteration begins, the item is recycled. This is ensured by std::rc::Rc::get_mut(), which will only return a reference if the reference-count is exactly 1.
This has the downside that your Iterator yields Rc<Foo> instead of Foo. There is also the added code-complexity and (maybe) some runtime-cost due to the reference-counting (which may get elided completely if the compiler can prove that).
You will, therefore, need to measure if this actually gets you a performance win. Allocating a new object on every single iteration may seem costly, but allocators are good at this...
Something along these lines:
use std::rc::Rc;
#[derive(Default)]
struct FoobarIterator {
    item: Rc<String>,
}
impl Iterator for FoobarIterator {
    type Item = Rc<String>;
    fn next(&mut self) -> Option<Self::Item> {
        let item = match Rc::get_mut(&mut self.item) {
            Some(item) => {
                // This path is only taken if the caller
                // did not keep the item around
                // so we are the only reference-holder!
                println!("Item is re-used!");
                item
            },
            None => {
                // Let go of the item (the caller gets to keep it)
                // and create a new one
                println!("Creating new item!");
                self.item = Rc::new(String::new());
                Rc::get_mut(&mut self.item).unwrap()
            }
        };
        // Create the item, possibly reusing the same allocation...
        item.clear();
        item.push('a');
        Some(Rc::clone(&self.item))
    }
}
fn main() {
    // This will only print "Item is re-used"
    // because `item` is dropped before the next cycle begins
    for item in FoobarIterator::default().take(5) {
        println!("{}", item);
    }
    // This will allocate new objects every time
    // because the Vec retains ownership.
    let _: Vec<_> = FoobarIterator::default().take(5).collect();
}
The compiler (or LLVM) will most likely employ return value optimization in this case, so you do not need to prematurely optimize by yourself.
See this Godbolt example, particularly lines 43 to 47. My comprehension of Assembly is limited, but it seems that next() simply writes the Event value to the memory passed by the caller via a pointer (initially in rdi). In subsequent loop iterations this memory place can be reused.
Note that you get a much longer assembly output (which I did not analyze in depth) if you compile without the -O flag (e.g. when building in the "debug" mode as opposed to "release").
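A different approach, not taken from either answer above, is to give up on the Iterator trait and expose a method that lends out a reference to a single reused Event. The sketch below is only an illustration: EventReader, next_event, and sum_a are hypothetical names, and the EventFile here is a mock with a made-up remaining counter standing in for the C-backed reader (it also takes &mut self, unlike the read in the question).

```rust
#[derive(Debug, Default)]
struct Event {
    a: u32,
    b: f32,
}

/// Mock stand-in for the real file handle: yields `remaining` events.
struct EventFile {
    remaining: u32,
}

impl EventFile {
    /// Fills `event` in place, mimicking the out-parameter style of the C API.
    fn read(&mut self, event: &mut Event) -> Result<(), String> {
        if self.remaining == 0 {
            return Err("end of file".to_string());
        }
        self.remaining -= 1;
        event.a += 1;
        event.b = event.a as f32 * 0.5;
        Ok(())
    }
}

/// Owns one Event and overwrites it on every step.
struct EventReader {
    file: EventFile,
    current: Event,
}

impl EventReader {
    fn new(file: EventFile) -> Self {
        EventReader { file, current: Event::default() }
    }

    /// Not `Iterator::next`: the returned reference borrows `self`, so the
    /// caller must finish with one event before asking for the next one.
    fn next_event(&mut self) -> Option<&Event> {
        match self.file.read(&mut self.current) {
            Ok(()) => Some(&self.current),
            Err(_) => None,
        }
    }
}

/// Example consumer: sums the `a` field of every event.
fn sum_a(mut reader: EventReader) -> u32 {
    let mut total = 0;
    while let Some(event) = reader.next_event() {
        total += event.a;
    }
    total
}
```

The trade-off is that adapters such as map and filter are not available on this type, so whether it is worth it depends on measuring against the plain Iterator version.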

Wrapping RefCell and Rc in a struct type

I would like to have a struct which has a writable field, but explicitly borrowable:
struct App<W: Clone<BorrowMut<Write>>> {
    stdout: W,
}
... so it can internally use it:
impl<W: Clone<BorrowMut<Write>>> App<W> {
    fn hello(&mut self) -> Result<()> {
        Rc::clone(&self.stdout).borrow_mut().write(b"world\n")?;
        Ok(())
    }
}
I tried to pass it a cursor and then use it:
let mut cursor = Rc::new(RefCell::new(Cursor::new(vec![0])));
let mut app = App { stdout: cursor };
app.hello().expect("failed to write");
let mut line = String::new();
Rc::clone(&cursor).borrow_mut().read_line(&mut line).unwrap();
Rust barks:
error[E0107]: wrong number of type arguments: expected 0, found 1
 --> src/bin/play.rs:6:21
  |
6 | struct App<W: Clone<BorrowMut<Write>>> {
  |                     ^^^^^^^^^^^^^^^^ unexpected type argument
My end goal: pass stdin, stdout and stderr to an App struct. In fn main, these would be real stdin/stdout/stderr. In tests, these could be cursors. Since I need to access these outside of App (e.g. in tests), I need multiple owners (thus Rc) and runtime mutable borrow (thus RefCell).
How can I implement this?
This isn't how you apply multiple constraints to a type parameter. Instead you use the + operator, like this: <W: Clone + Write + BorrowMut>
But, if you want BorrowMut to be an abstraction for RefCell, it won't work. The borrow_mut method of RefCell is not part of any trait so you will need to depend on RefCell directly in your data structure:
struct App<W: Clone + Write> {
    stdout: Rc<RefCell<W>>,
}
Having said that, it's considered best practice not to put unneeded constraints on a struct. You can actually leave them off here, and just mention them on the impl later.
struct App<W> {
    stdout: Rc<RefCell<W>>,
}
In order to access the contents of a Rc, you need to dereference with *. This can be a bit tricky in your case because there is a blanket impl of BorrowMut, which means that Rc has a different borrow_mut, which you definitely don't want.
impl<W: Clone + Write> App<W> {
    fn hello(&mut self) -> Result<()> {
        (*self.stdout).borrow_mut().write(b"world\n")?;
        Ok(())
    }
}
Again, when you use this, you'll need to dereference the Rc:
let cursor = Rc::new(RefCell::new(Cursor::new(vec![0])));
let mut app = App { stdout: cursor.clone() };
app.hello().expect("failed to write");
let mut line = String::new();
let mut cursor = (&*cursor).borrow_mut();
// move to the beginning or else there's nothing to read
cursor.set_position(0);
cursor.read_line(&mut line).unwrap();
println!("result = {:?}", line);
Also, notice that the Rc was cloned when it was passed to App. Otherwise it would be moved and you couldn't use it again later.
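Putting the pieces of this answer together, a self-contained version might look like the sketch below. It makes a few assumptions of its own: the bound on the impl is reduced to Write, write_all is used instead of write so a short write is not silently ignored, the cursor starts empty, and run_with_cursor is a hypothetical helper that plays the role of a test.

```rust
use std::cell::RefCell;
use std::io::{self, BufRead, Cursor, Write};
use std::rc::Rc;

struct App<W> {
    stdout: Rc<RefCell<W>>,
}

impl<W: Write> App<W> {
    fn hello(&mut self) -> io::Result<()> {
        // Dereference the Rc first so that RefCell::borrow_mut is the
        // method being called, as described above.
        (*self.stdout).borrow_mut().write_all(b"world\n")?;
        Ok(())
    }
}

/// Hypothetical test helper: runs the app against an in-memory cursor
/// and returns the first line it wrote.
fn run_with_cursor() -> String {
    let cursor = Rc::new(RefCell::new(Cursor::new(Vec::new())));
    // Clone the Rc so the cursor stays accessible after App takes its copy.
    let mut app = App { stdout: Rc::clone(&cursor) };
    app.hello().expect("failed to write");

    let mut guard = (*cursor).borrow_mut();
    // Move to the beginning or else there's nothing to read.
    guard.set_position(0);
    let mut line = String::new();
    guard.read_line(&mut line).unwrap();
    line
}
```

In fn main, the same struct could be built with `App { stdout: Rc::new(RefCell::new(io::stdout())) }`, since io::Stdout implements Write.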

Rust: How to specify lifetimes in closure arguments?

I'm writing a parser generator as a project to learn rust, and I'm running into something I can't figure out with lifetimes and closures. Here's my simplified case (sorry it's as complex as it is, but I need to have the custom iterator in the real version and it seems to make a difference in the compiler's behavior):
Playpen link: http://is.gd/rRm2aa
struct MyIter<'stat, T:Iterator<&'stat str>>{
    source: T
}
impl<'stat, T:Iterator<&'stat str>> Iterator<&'stat str> for MyIter<'stat, T>{
    fn next(&mut self) -> Option<&'stat str>{
        self.source.next()
    }
}
struct Scanner<'stat,T:Iterator<&'stat str>>{
    input: T
}
impl<'main> Scanner<'main, MyIter<'main,::std::str::Graphemes<'main>>>{
    fn scan_literal(&'main mut self) -> Option<String>{
        let mut token = String::from_str("");
        fn get_chunk<'scan_literal,'main>(result:&'scan_literal mut String,
                                          input: &'main mut MyIter<'main,::std::str::Graphemes<'main>>)
                                          -> Option<&'scan_literal mut String>{
            Some(input.take_while(|&chr| chr != "\"")
                 .fold(result, |&mut acc, chr|{
                     acc.push_str(chr);
                     &mut acc
                 }))
        }
        get_chunk(&mut token,&mut self.input);
        println!("token is {}", token);
        Some(token)
    }
}
fn main(){
    let mut scanner = Scanner{input:MyIter{source:"\"foo\"".graphemes(true)}};
    scanner.scan_literal();
}
There are two problems I know of here. First, I have to shadow the 'main lifetime in the get_chunk function (I tried using the one in the impl, but the compiler complains that 'main is undefined inside get_chunk). I think it will still work out because the call to get_chunk later will match the 'main from the impl with the 'main from get_chunk, but I'm not sure that's right.
The second problem is that the &mut acc inside the closure needs to have a lifetime of 'scan_literal in order to work like I want it to (accumulating characters until the first " is encountered for this example). I can't add an explicit lifetime to &mut acc though, and the compiler says its lifetime is limited to the closure itself, and thus I can't return the reference to use in the next iteration of fold. I've gotten the function to compile and run in various other ways, but I don't understand what the problem is here.
My main question is: Is there any way to explicitly specify the lifetime of an argument to a closure? If not, is there a better way to accumulate the string using fold without doing multiple copies?
First, about lifetimes. Functions defined inside other functions are static; they are not connected with the surrounding code in any way. Consequently, their lifetime parameters are completely independent. You don't want to use 'main as a lifetime parameter for get_chunk() because it will shadow the outer 'main lifetime and cause nothing but confusion.
Next, about closures. This expression:
|&mut acc, chr| ...
very likely does not do what you think it does. Closure/function arguments allow irrefutable patterns in them, and & has a special meaning in patterns. Namely, it dereferences the value it is matched against, and binds the identifier to this dereferenced value:
let x: int = 10i;
let p: &int = &x;
match p {
    &y => println!("{}", y) // prints 10
}
You can think of & in a pattern as an opposite to & in an expression: in an expression it means "take a reference", in a pattern it means "remove the reference".
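For readers on current Rust, where int and the 10i suffix no longer exist, the same idea can be sketched as follows. The function names are made up for illustration, and the second function shows the & pattern in a closure argument, which is the position it occupies in the question.

```rust
/// `&y` in the pattern strips the reference, so `y` is a plain `i32`.
fn deref_in_match(p: &i32) -> i32 {
    match p {
        &y => y,
    }
}

/// The same pattern in a closure argument: `iter()` yields `&i32`,
/// and `&v` binds `v` to the `i32` behind the reference.
fn sum_with_ref_pattern(values: &[i32]) -> i32 {
    values.iter().fold(0, |acc, &v| acc + v)
}
```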
mut, however, does not belong to & in patterns; it belongs to the identifier and means that the variable with this identifier is mutable, i.e. you should write not
|&mut acc, chr| ...
but
|& mut acc, chr| ...
You may be interested in this RFC which is exactly about this quirk in the language syntax.
It looks like you want to do a very strange thing, and I'm not sure I understand what you're getting at. It is very likely that you are confusing different string kinds. First of all, you should read the official guide, which explains ownership and borrowing and when to use them (you may also want to read the unfinished ownership guide; it will soon get into the main documentation tree), and then you should read the strings guide.
Anyway, your problem can be solved in a much simpler and more generic way:
#[deriving(Clone)]
struct MyIter<'s, T: Iterator<&'s str>> {
    source: T
}
impl<'s, T: Iterator<&'s str>> Iterator<&'s str> for MyIter<'s, T>{
    fn next(&mut self) -> Option<&'s str>{
        self.source.next()
    }
}
#[deriving(Clone)]
struct Scanner<'s, T: Iterator<&'s str>> {
    input: T
}
impl<'m, T: Iterator<&'m str>> Scanner<'m, T> {
    fn scan_literal(&mut self) -> Option<String>{
        fn get_chunk<'a, T: Iterator<&'a str>>(input: T) -> Option<String> {
            Some(
                input.take_while(|&chr| chr != "\"")
                     .fold(String::new(), |mut acc, chr| {
                         acc.push_str(chr);
                         acc
                     })
            )
        }
        let token = get_chunk(self.input.by_ref());
        println!("token is {}", token);
        token
    }
}
fn main(){
    let mut scanner = Scanner{
        input: MyIter {
            source: "\"foo\"".graphemes(true)
        }
    };
    scanner.scan_literal();
}
You don't need to pass external references into the closure; you can generate a String directly in fold() operation. I also generified your code and made it more idiomatic.
Note that now impl for Scanner also works with arbitrary iterators returning &str. It is very likely that you want to write this instead of specializing Scanner to work only with MyIter with Graphemes inside it. by_ref() operation turns &mut I where I is an Iterator<T> into J, where J is an Iterator<T>. It allows further chaining of iterators even if you only have a mutable reference to the original iterator.
By the way, your code is also incomplete; it will only return Some("") because take_while() will stop at the first quote and won't scan further. You should rewrite it to take the initial quote into account.
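A minimal sketch of that last point in current Rust syntax is shown below. It is a simplification, not a port of the code above: it iterates over char values instead of graphemes (which now live in an external crate), and scan_literal is written as a free function that first consumes the opening quote.

```rust
/// Scans a double-quoted literal from the front of `input`.
/// Returns None if the input does not start with a quote.
fn scan_literal<I: Iterator<Item = char>>(input: &mut I) -> Option<String> {
    // Consume the opening quote first, so take_while does not stop at once.
    if input.next()? != '"' {
        return None;
    }
    let token = input
        .by_ref() // borrow the iterator so the caller can keep using it
        .take_while(|&c| c != '"')
        // The accumulator is passed by value and returned by value,
        // so no reference has to escape the closure.
        .fold(String::new(), |mut acc, c| {
            acc.push(c);
            acc
        });
    Some(token)
}
```

Note that take_while also consumes the closing quote, so the iterator is left positioned just after the literal; an unterminated literal is returned as-is rather than reported as an error.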
