I saw in the Rust book that you can define two different variables with the same name:
let hello = "Hello";
let hello = "Goodbye";
println!("My variable hello contains: {}", hello);
This prints out:
My variable hello contains: Goodbye
What happens with the first hello? Does it get freed up? How could I access it?
I know it would be bad to name two variables the same, but if this happens by accident because I declare it 100 lines below it could be a real pain.
Rust does not have a garbage collector.
Does Rust free up the memory of overwritten variables?
Yes, otherwise it'd be a memory leak, which would be a pretty terrible design decision. The memory is freed when the variable is reassigned:
struct Noisy;
impl Drop for Noisy {
fn drop(&mut self) {
eprintln!("Dropped")
}
}
fn main() {
eprintln!("0");
let mut thing = Noisy;
eprintln!("1");
thing = Noisy;
eprintln!("2");
}
0
1
Dropped
2
Dropped
what happens with the first hello
It is shadowed.
Nothing "special" happens to the data referenced by the variable, other than the fact that you can no longer access it. It is still dropped when the variable goes out of scope:
struct Noisy;
impl Drop for Noisy {
fn drop(&mut self) {
eprintln!("Dropped")
}
}
fn main() {
eprintln!("0");
let thing = Noisy;
eprintln!("1");
let thing = Noisy;
eprintln!("2");
}
0
1
2
Dropped
Dropped
See also:
Is the resource of a shadowed variable binding freed immediately?
I know it would be bad to name two variables the same
It's not "bad", it's a design decision. I would say that using shadowing like this is a bad idea:
let x = "Anna";
println!("User's name is {}", x);
let x = 42;
println!("The tax rate is {}", x);
Using shadowing like this is reasonable to me:
let name = String::from(" Vivian ");
let name = name.trim();
println!("User's name is {}", name);
See also:
Why do I need rebinding/shadowing when I can have mutable variable binding?
but if this happens by accident because I declare it 100 lines below it could be a real pain.
Don't have functions that are so big that you "accidentally" do something. That's applicable in any programming language.
Is there a way of cleaning memory manually?
You can call drop:
eprintln!("0");
let thing = Noisy;
drop(thing);
eprintln!("1");
let thing = Noisy;
eprintln!("2");
0
Dropped
1
2
Dropped
However, as oli_obk - ker points out, the stack memory taken by the variable will not be freed until the function exits, only the resources taken by the variable.
All discussions of drop require showing its (very complicated) implementation:
fn drop<T>(_: T) {}
What if I declare the variable in a global scope outside of the other functions?
Global variables are never freed, if you can even create them to start with.
There is a difference between shadowing and reassigning (overwriting) a variable when it comes to drop order.
All local variables are normally dropped when they go out of scope, in reverse order of declaration (see The Rust Programming Language's chapter on Drop). This includes shadowed variables. It's easy to check this by wrapping the value in a simple wrapper struct that prints something when it (the wrapper) is dropped (just before the value itself is dropped):
use std::fmt::Debug;
struct NoisyDrop<T: Debug>(T);
impl<T: Debug> Drop for NoisyDrop<T> {
fn drop(&mut self) {
println!("dropping {:?}", self.0);
}
}
fn main() {
let hello = NoisyDrop("Hello");
let hello = NoisyDrop("Goodbye");
println!("My variable hello contains: {}", hello.0);
}
prints the following (playground):
My variable hello contains: Goodbye
dropping "Goodbye"
dropping "Hello"
That's because a new let binding in a scope does not overwrite the previous binding, so it's just as if you had written
let hello1 = NoisyDrop("Hello");
let hello2 = NoisyDrop("Goodbye");
println!("My variable hello contains: {}", hello2.0);
Notice that this behavior is different from the following, superficially very similar, code (playground):
fn main() {
let mut hello = NoisyDrop("Hello");
hello = NoisyDrop("Goodbye");
println!("My variable hello contains: {}", hello.0);
}
which not only drops them in the opposite order, but drops the first value before printing the message! That's because when you assign to a variable (instead of shadowing it with a new one), the original value gets dropped first, before the new value is moved in.
I began by saying that local variables are "normally" dropped when they go out of scope. Because you can move values into and out of variables, the analysis of figuring out when variables need to be dropped can sometimes not be done until runtime. In such cases, the compiler actually inserts code to track "liveness" and drop those values when necessary, so you can't accidentally cause leaks by overwriting a value. (However, it's still possible to safely leak memory by calling mem::forget, or by creating an Rc-cycle with internal mutability.)
See also
What's the semantic of assignment in Rust?
There are a few things to note here:
In the program you gave, when compiling it, the "Hello" string does not appear in the binary. This might be a compiler optimization because the first value is not used.
fn main(){
let hello = "Hello xxxxxxxxxxxxxxxx"; // Added for searching more easily.
let hello = "Goodbye";
println!("My variable hello contains: {}", hello);
}
Then test:
$ rustc ./stackoverflow.rs
$ cat stackoverflow | grep "xxx"
# No results
$ cat stackoverflow | grep "Goodbye"
Binary file (standard input) matches
$ cat stackoverflow | grep "My variable hello contains"
Binary file (standard input) matches
Note that if you print the first value, the string does appear in the binary though, so this proves that this is a compiler optimization to not store unused values.
Another thing to consider is that both values assigned to hello (i.e. "Hello" and "Goodbye") have a &str type. This is a pointer to a string stored statically in the binary after compiling. An example of a dynamically generated string would be when you generate a hash from some data, like MD5 or SHA algorithms (the resulting string does not exist statically in the binary).
fn main(){
// Added the type to make it more clear.
let hello: &str = "Hello";
let hello: &str = "Goodbye";
// This is wrong (does not compile):
// let hello: String = "Goodbye";
println!("My variable hello contains: {}", hello);
}
This means that the variable is simply pointing to a location in the static memory. No memory gets allocated during runtime, nor gets freed. Even if the optimization mentioned above didn't exist (i.e. omit unused strings), only the memory address location pointed by hello would change, but the memory is still used by static strings.
The story would be different for a String type, and for that refer to the other answers.
Related
I am quite new to Rust. I'm trying to build a global cache using lru::LruCache and a RwLock for safety. It needs to be globally accessible based on my program's architecture.
//Size to take up 5MB
const CACHE_ENTRIES: usize = (GIBYTE as usize/ 200) / (BLOCK_SIZE);
pub type CacheEntry = LruCache<i32, Bytes>;
static mut CACHE : CacheEntry = LruCache::new(CACHE_ENTRIES);
lazy_static!{
static ref BLOCKCACHE: RwLock<CacheEntry> = RwLock::new(CACHE);
}
//this gets called from another setup function
async fn download_block(&self,
context: &BlockContext,
buf: &mut [u8],
count: u32, //bytes to read
offset: u64,
buf_index: u32
) -> Result<u32> {
let block_index = context.block_index;
let block_data : Bytes = match {BLOCKCACHE.read().unwrap().get(&block_index)} {
Some(data) => data.to_owned(),
None => download_block_from_remote(context).await.unwrap().to_owned(),
};
//the rest of the function does stuff with the block_data
}
async fn download_block_from_remote(context: &BlockContext) -> Result<Bytes>{
//code to download block data from remote into block_data
{BLOCKCACHE.write().unwrap().put(block_index, block_data.clone())};
Ok(block_data)
}
Right now I am getting an error on this line:
let block_data = match {BLOCKCACHE.read().unwrap().get(&block_index)} {
"cannot borrow as mutable" for the value inside the braces.
"help: trait DerefMut is required to modify through a dereference, but it is not implemented for std::sync::RwLockReadGuard<'_, LruCache<i32, bytes::Bytes>>"
I have gotten some other errors involving ownership and mutability, but I can't seem to get rid of this. Is anyone able to offer guidance on how to get this to work, or set me on the right path if it's just not possible/feasible?
The problem here is that LruCache::get() requires mutable access to the cache object. (Reason: it's a cache object, which has to change its internal state when querying things for the actual caching)
Therefore, if you use an RwLock, you need to use the write() method instead of the read() method.
That said, the fact that get() requires mutable access makes the entire RwLock pretty pointless, and I'd use a normal Mutex instead. There is very rarely the necessity to use an RwLock as it has more overhead compared to a simple Mutex.
I saw in the Rust book that you can define two different variables with the same name:
let hello = "Hello";
let hello = "Goodbye";
println!("My variable hello contains: {}", hello);
This prints out:
My variable hello contains: Goodbye
What happens with the first hello? Does it get freed up? How could I access it?
I know it would be bad to name two variables the same, but if this happens by accident because I declare it 100 lines below it could be a real pain.
Rust does not have a garbage collector.
Does Rust free up the memory of overwritten variables?
Yes, otherwise it'd be a memory leak, which would be a pretty terrible design decision. The memory is freed when the variable is reassigned:
struct Noisy;
impl Drop for Noisy {
fn drop(&mut self) {
eprintln!("Dropped")
}
}
fn main() {
eprintln!("0");
let mut thing = Noisy;
eprintln!("1");
thing = Noisy;
eprintln!("2");
}
0
1
Dropped
2
Dropped
what happens with the first hello
It is shadowed.
Nothing "special" happens to the data referenced by the variable, other than the fact that you can no longer access it. It is still dropped when the variable goes out of scope:
struct Noisy;
impl Drop for Noisy {
fn drop(&mut self) {
eprintln!("Dropped")
}
}
fn main() {
eprintln!("0");
let thing = Noisy;
eprintln!("1");
let thing = Noisy;
eprintln!("2");
}
0
1
2
Dropped
Dropped
See also:
Is the resource of a shadowed variable binding freed immediately?
I know it would be bad to name two variables the same
It's not "bad", it's a design decision. I would say that using shadowing like this is a bad idea:
let x = "Anna";
println!("User's name is {}", x);
let x = 42;
println!("The tax rate is {}", x);
Using shadowing like this is reasonable to me:
let name = String::from(" Vivian ");
let name = name.trim();
println!("User's name is {}", name);
See also:
Why do I need rebinding/shadowing when I can have mutable variable binding?
but if this happens by accident because I declare it 100 lines below it could be a real pain.
Don't have functions that are so big that you "accidentally" do something. That's applicable in any programming language.
Is there a way of cleaning memory manually?
You can call drop:
eprintln!("0");
let thing = Noisy;
drop(thing);
eprintln!("1");
let thing = Noisy;
eprintln!("2");
0
Dropped
1
2
Dropped
However, as oli_obk - ker points out, the stack memory taken by the variable will not be freed until the function exits, only the resources taken by the variable.
All discussions of drop require showing its (very complicated) implementation:
fn drop<T>(_: T) {}
What if I declare the variable in a global scope outside of the other functions?
Global variables are never freed, if you can even create them to start with.
There is a difference between shadowing and reassigning (overwriting) a variable when it comes to drop order.
All local variables are normally dropped when they go out of scope, in reverse order of declaration (see The Rust Programming Language's chapter on Drop). This includes shadowed variables. It's easy to check this by wrapping the value in a simple wrapper struct that prints something when it (the wrapper) is dropped (just before the value itself is dropped):
use std::fmt::Debug;
struct NoisyDrop<T: Debug>(T);
impl<T: Debug> Drop for NoisyDrop<T> {
fn drop(&mut self) {
println!("dropping {:?}", self.0);
}
}
fn main() {
let hello = NoisyDrop("Hello");
let hello = NoisyDrop("Goodbye");
println!("My variable hello contains: {}", hello.0);
}
prints the following (playground):
My variable hello contains: Goodbye
dropping "Goodbye"
dropping "Hello"
That's because a new let binding in a scope does not overwrite the previous binding, so it's just as if you had written
let hello1 = NoisyDrop("Hello");
let hello2 = NoisyDrop("Goodbye");
println!("My variable hello contains: {}", hello2.0);
Notice that this behavior is different from the following, superficially very similar, code (playground):
fn main() {
let mut hello = NoisyDrop("Hello");
hello = NoisyDrop("Goodbye");
println!("My variable hello contains: {}", hello.0);
}
which not only drops them in the opposite order, but drops the first value before printing the message! That's because when you assign to a variable (instead of shadowing it with a new one), the original value gets dropped first, before the new value is moved in.
I began by saying that local variables are "normally" dropped when they go out of scope. Because you can move values into and out of variables, the analysis of figuring out when variables need to be dropped can sometimes not be done until runtime. In such cases, the compiler actually inserts code to track "liveness" and drop those values when necessary, so you can't accidentally cause leaks by overwriting a value. (However, it's still possible to safely leak memory by calling mem::forget, or by creating an Rc-cycle with internal mutability.)
See also
What's the semantic of assignment in Rust?
There are a few things to note here:
In the program you gave, when compiling it, the "Hello" string does not appear in the binary. This might be a compiler optimization because the first value is not used.
fn main(){
let hello = "Hello xxxxxxxxxxxxxxxx"; // Added for searching more easily.
let hello = "Goodbye";
println!("My variable hello contains: {}", hello);
}
Then test:
$ rustc ./stackoverflow.rs
$ cat stackoverflow | grep "xxx"
# No results
$ cat stackoverflow | grep "Goodbye"
Binary file (standard input) matches
$ cat stackoverflow | grep "My variable hello contains"
Binary file (standard input) matches
Note that if you print the first value, the string does appear in the binary though, so this proves that this is a compiler optimization to not store unused values.
Another thing to consider is that both values assigned to hello (i.e. "Hello" and "Goodbye") have a &str type. This is a pointer to a string stored statically in the binary after compiling. An example of a dynamically generated string would be when you generate a hash from some data, like MD5 or SHA algorithms (the resulting string does not exist statically in the binary).
fn main(){
// Added the type to make it more clear.
let hello: &str = "Hello";
let hello: &str = "Goodbye";
// This is wrong (does not compile):
// let hello: String = "Goodbye";
println!("My variable hello contains: {}", hello);
}
This means that the variable is simply pointing to a location in the static memory. No memory gets allocated during runtime, nor gets freed. Even if the optimization mentioned above didn't exist (i.e. omit unused strings), only the memory address location pointed by hello would change, but the memory is still used by static strings.
The story would be different for a String type, and for that refer to the other answers.
Given the following simplified program:
#[macro_use] extern crate log;
extern crate ansi_term;
extern crate fern;
extern crate time;
extern crate threadpool;
extern crate id3;
mod logging;
use std::process::{exit, };
use ansi_term::Colour::{Yellow, Green};
use threadpool::ThreadPool;
use std::sync::mpsc::channel;
use std::path::{Path};
use id3::Tag;
fn main() {
logging::setup_logging();
let n_jobs = 2;
let files = vec!(
"/tmp/The Dynamics - Version Excursions/01-13- Move on Up.mp3",
"/tmp/The Dynamics - Version Excursions/01-09- Whole Lotta Love.mp3",
"/tmp/The Dynamics - Version Excursions/01-10- Feel Like Making Love.mp3"
);
let pool = ThreadPool::new(n_jobs);
let (tx, rx) = channel();
let mut counter = 0;
for file_ in files {
let file_ = Path::new(file_);
counter = counter + 1;
let tx = tx.clone();
pool.execute(move || {
debug!("sending {} from thread", Yellow.paint(counter.to_string()));
let tag = Tag::read_from_path(file_).unwrap();
let a_name = tag.artist().unwrap();
debug!("recursed file from: {} {}",
Green.paint(a_name), file_.display());
tx.send(".").unwrap();
// TODO amb: not working..
// tx.send(a_name).unwrap();
});
}
for value in rx.iter().take(counter) {
debug!("receiving {} from thread", Green.paint(value));
}
exit(0);
}
Everything works as expected, unless the one commented line (tx.send(a_name).unwrap();) is put back in. In that case I get the following error:
error: `tag` does not live long enough
let a_name = tag.artist().unwrap();
^~~
note: reference must be valid for the static lifetime...
note: ...but borrowed value is only valid for the block suffix following statement 1 at 39:58
let tag = Tag::read_from_path(file_).unwrap();
let a_name = tag.artist().unwrap();
debug!("recursed file from: {} {}",
Green.paint(a_name), file_.display());
...
Generally I understand what the compiler tells me, but I don't see a problem since the variable tag is defined inside of the closure block. The only problem that I can guess is, that the variable tx is cloned outside and therefore can collide with the lifetime of tag.
My goal is to put all the current logic in the thread-closure inside of the thread, since this is the "processing" I want to spread to multiple threads. How can I accomplish this, but still send some value to the longer existing tx?
I'm using the following Rust version:
$ rustc --version
rustc 1.9.0 (e4e8b6668 2016-05-18)
$ cargo --version
cargo 0.10.0-nightly (10ddd7d 2016-04-08)
a_name is &str borrowed from tag. Its lifetime is therefore bounded by tag. Sending non 'static references down a channel to another thread is unsafe. It refers to something on the threads stack which might not even exist anymore once the receiver tries to access it.
In your case you should promote a_name to an owned value of type String, which will be moved to the receiver thread.
tx.send(a_name.to_owned()).unwrap();
First off, I've read this question:
What's the right way to make a stack (or other dynamically resizable vector-like thing) in Rust?
The problem
The selected answer just tells the question asker to use the standard library instead of explaining the implementation, which is fine if my goal was to build something. Except I'm trying to learn about the implementation of a stack, while following a data structure textbook written for Java (Algorithms by Robert Sedgwick & Kevin Wayne), where they implement a stack via resizing an array (Page 136).
I'm in the process of implementing the resize method, and it turns out the size of the array needs to be a constant expression.
meta: are arrays in rust called slices?
use std::mem;
struct DynamicStack<T> {
length: uint,
internal: Box<[T]>,
}
impl<T> DynamicStack<T> {
fn new() -> DynamicStack<T> {
DynamicStack {
length: 0,
internal: box [],
}
}
fn resize(&mut self, new_size: uint) {
let mut temp: Box<[T, ..new_size]> = box unsafe { mem::uninitialized() };
// ^^ error: expected constant expr for array
// length: non-constant path in constant expr
// code for copying elements from self.internal
self.internal = temp;
}
}
For brevity the compiler error was this
.../src/lib.rs:51:23: 51:38 error: expected constant expr for array length: non-constant path in constant expr
.../src/lib.rs:51 let mut temp: Box<[T, ..new_size]> = box unsafe { mem::uninitialized() };
^~~~~~~~~~~~~~~
.../src/lib.rs:51:23: 51:38 error: expected constant expr for array length: non-constant path in constant expr
.../src/lib.rs:51 let mut temp: Box<[T, ..new_size]> = box unsafe { mem::uninitialized() };
^~~~~~~~~~~~~~~
The Question
Surely there is a way in rust to initialize an array with it's size determined at runtime (even if it's unsafe)? Could you also provide an explanation of what's going on in your answer?
Other attempts
I've considered it's probably possible to implement the stack in terms of
struct DynamicStack<T> {
length: uint,
internal: Box<Optional<T>>
}
But I don't want the overhead of matching optional value to remove the unsafe memory operations, but this still doesn't resolve the issue of unknown array sizes.
I also tried this (which doesn't even compile)
fn resize(&mut self, new_size: uint) {
let mut temp: Box<[T]> = box [];
let current_size = self.internal.len();
for i in range(0, current_size) {
temp[i] = self.internal[i];
}
for i in range(current_size, new_size) {
temp[i] = unsafe { mem::uninitialized() };
}
self.internal = temp;
}
And I got this compiler error
.../src/lib.rs:55:17: 55:21 error: cannot move out of dereference of `&mut`-pointer
.../src/lib.rs:55 temp[i] = self.internal[i];
^~~~
.../src/lib.rs:71:19: 71:30 error: cannot use `self.length` because it was mutably borrowed
.../src/lib.rs:71 self.resize(self.length * 2);
^~~~~~~~~~~
.../src/lib.rs:71:7: 71:11 note: borrow of `*self` occurs here
.../src/lib.rs:71 self.resize(self.length * 2);
^~~~
.../src/lib.rs:79:18: 79:22 error: cannot move out of dereference of `&mut`-pointer
.../src/lib.rs:79 let result = self.internal[self.length];
^~~~
.../src/lib.rs:79:9: 79:15 note: attempting to move value to here
.../src/lib.rs:79 let result = self.internal[self.length];
^~~~~~
.../src/lib.rs:79:9: 79:15 help: to prevent the move, use `ref result` or `ref mut result` to capture value by reference
.../src/lib.rs:79 let result = self.internal[self.length];
I also had a look at this, but it's been awhile since I've done any C/C++
How should you do pointer arithmetic in rust?
Surely there is a way in Rust to initialize an array with it's size determined at runtime?
No, Rust arrays are only able to be created with a size known at compile time. In fact, each tuple of type and size constitutes a new type! The Rust compiler uses that information to make optimizations.
Once you need a set of things determined at runtime, you have to add runtime checks to ensure that Rust's safety guarantees are always valid. For example, you can't access uninitialized memory (such as by walking off the beginning or end of a set of items).
If you truly want to go down this path, I expect that you are going to have to get your hands dirty with some direct memory allocation and unsafe code. In essence, you will be building a smaller version of Vec itself! To that end, you can check out the source of Vec.
At a high level, you will need to allocate chunks of memory big enough to hold N objects of some type. Then you can provide ways of accessing those elements, using pointer arithmetic under the hood. When you add more elements, you can allocate more space and move old values around. There are lots of nuanced things that may or may not come up, but it sounds like you are on the beginning of a fun journey!
Edit
Of course, you could choose to pretend that most of the methods of Vec don't even exist, and just use the ones that are analogs of Java's array. You'll still need to use Option to avoid uninitialized values though.
I want to open a file, replace some characters, and make some splits. Then I want to return the list of strings. however I get error: broken does not live long enough. My code works when it is in main, so it is only an issue with lifetimes.
fn tokenize<'r>(fp: &'r str) -> Vec<&'r str> {
let data = match File::open(&Path::new(fp)).read_to_string(){
Ok(n) => n,
Err(e) => fail!("couldn't read file: {}", e.desc)
};
let broken = data.replace("'", " ' ").replace("\"", " \" ").replace(" ", " ");
let mut tokens = vec![];
for t in broken.as_slice().split_str(" ").filter(|&x| *x != "\n"){
tokens.push(t)
}
return tokens;
}
How can I make the value returned by this function live in the scope of the caller?
The problem is that your function signature says "the result has the same lifetime as the input fp", but that's simply not true. The result contains references to data, which is allocated inside your function; it has nothing to do with fp! As it stands, data will cease to exist at the end of your function.
Because you're effectively creating new values, you can't return references; you need to transfer ownership of that data out of the function. There are two ways I can think of to do this, off the top of my head:
Instead of returning Vec<&str>, return Vec<String>, where each token is a freshly-allocated string.
Return data inside a wrapper type which implements the splitting logic. Then, you can have fn get_tokens(&self) -> Vec<&str>; the lifetime of the slices can be tied to the lifetime of the object which contains data.