I want to open a file, replace some characters, and make some splits. Then I want to return the list of strings. However, I get the error "broken does not live long enough". My code works when it is in main, so it is only an issue with lifetimes.
fn tokenize<'r>(fp: &'r str) -> Vec<&'r str> {
    let data = match File::open(&Path::new(fp)).read_to_string(){
        Ok(n) => n,
        Err(e) => fail!("couldn't read file: {}", e.desc)
    };
    let broken = data.replace("'", " ' ").replace("\"", " \" ").replace(" ", " ");
    let mut tokens = vec![];
    for t in broken.as_slice().split_str(" ").filter(|&x| *x != "\n"){
        tokens.push(t)
    }
    return tokens;
}
How can I make the value returned by this function live in the scope of the caller?
The problem is that your function signature says "the result has the same lifetime as the input fp", but that's simply not true. The result contains references into broken, which is built from data; both are allocated inside your function and have nothing to do with fp! As it stands, they will cease to exist at the end of your function.
Because you're effectively creating new values, you can't return references; you need to transfer ownership of that data out of the function. There are two ways I can think of to do this, off the top of my head:
Instead of returning Vec<&str>, return Vec<String>, where each token is a freshly-allocated string.
Return data inside a wrapper type which implements the splitting logic. Then, you can have fn get_tokens(&self) -> Vec<&str>; the lifetime of the slices can be tied to the lifetime of the object which contains data.
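A minimal sketch of both options, assuming today's std::fs::read_to_string in place of the pre-1.0 File/read_to_string/fail! calls in the question. The split rule (single spaces, dropping empty pieces and lone "\n" tokens), the Tokenizer name, and the input.txt path are illustrative assumptions, not the asker's exact logic.

```rust
use std::fs;
use std::io;

// Simplified splitting rule (an assumption): split on single spaces,
// dropping empty pieces and lone "\n" tokens.
fn split_tokens(text: &str) -> Vec<&str> {
    text.split(' ').filter(|t| !t.is_empty() && *t != "\n").collect()
}

// Option 1: return owned Strings, so nothing borrows from a local.
fn tokenize(path: &str) -> io::Result<Vec<String>> {
    let data = fs::read_to_string(path)?;
    let broken = data.replace('\'', " ' ").replace('"', " \" ");
    // `broken` is freed when this function returns, so copy each token out.
    let tokens: Vec<String> = split_tokens(&broken).into_iter().map(String::from).collect();
    Ok(tokens)
}

// Option 2: a wrapper that owns the text and lends out slices of it.
struct Tokenizer {
    data: String,
}

impl Tokenizer {
    fn new(text: &str) -> Tokenizer {
        Tokenizer { data: text.replace('\'', " ' ").replace('"', " \" ") }
    }

    // The returned slices borrow from `self`, so they live as long as the Tokenizer.
    fn get_tokens(&self) -> Vec<&str> {
        split_tokens(&self.data)
    }
}

fn main() {
    let t = Tokenizer::new("say \"hi\" now");
    println!("{:?}", t.get_tokens());

    // "input.txt" is a hypothetical path.
    match tokenize("input.txt") {
        Ok(tokens) => println!("{:?}", tokens),
        Err(e) => eprintln!("couldn't read file: {}", e),
    }
}
```

Option 1 costs one allocation per token; option 2 allocates once, but its tokens cannot outlive the Tokenizer they came from.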
I am quite new to Rust. I'm trying to build a global cache using lru::LruCache and a RwLock for safety. It needs to be globally accessible based on my program's architecture.
//Size to take up 5MB
const CACHE_ENTRIES: usize = (GIBYTE as usize/ 200) / (BLOCK_SIZE);
pub type CacheEntry = LruCache<i32, Bytes>;
static mut CACHE : CacheEntry = LruCache::new(CACHE_ENTRIES);
lazy_static!{
    static ref BLOCKCACHE: RwLock<CacheEntry> = RwLock::new(CACHE);
}
//this gets called from another setup function
async fn download_block(&self,
    context: &BlockContext,
    buf: &mut [u8],
    count: u32, //bytes to read
    offset: u64,
    buf_index: u32
) -> Result<u32> {
    let block_index = context.block_index;
    let block_data : Bytes = match {BLOCKCACHE.read().unwrap().get(&block_index)} {
        Some(data) => data.to_owned(),
        None => download_block_from_remote(context).await.unwrap().to_owned(),
    };
    //the rest of the function does stuff with the block_data
}
async fn download_block_from_remote(context: &BlockContext) -> Result<Bytes>{
    //code to download block data from remote into block_data
    {BLOCKCACHE.write().unwrap().put(block_index, block_data.clone())};
    Ok(block_data)
}
Right now I am getting an error on this line:
let block_data = match {BLOCKCACHE.read().unwrap().get(&block_index)} {
"cannot borrow as mutable" for the value inside the braces.
"help: trait DerefMut is required to modify through a dereference, but it is not implemented for std::sync::RwLockReadGuard<'_, LruCache<i32, bytes::Bytes>>"
I have gotten some other errors involving ownership and mutability, but I can't seem to get rid of this. Is anyone able to offer guidance on how to get this to work, or set me on the right path if it's just not possible/feasible?
The problem here is that LruCache::get() requires mutable access to the cache object. (Reason: a lookup has to update the cache's internal state, namely which entry was used most recently, so that the least recently used entry can be evicted later.)
Therefore, if you use an RwLock, you need to use the write() method instead of the read() method.
That said, because get() requires mutable access, every lookup needs the write lock anyway, which makes the RwLock pretty pointless; I'd use a normal Mutex instead. An RwLock is rarely necessary, as it has more overhead than a simple Mutex.
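A rough sketch of the Mutex approach. To stay self-contained it uses a hypothetical TinyLru type as a stand-in for lru::LruCache (its get also takes &mut self) and Vec<u8> instead of Bytes; the names BLOCK_CACHE and cached_block are made up. It copies the block out in a single statement so the guard is released before any slow work, such as awaiting a download that locks the cache again.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for lru::LruCache<i32, Vec<u8>>: like the real
// LruCache::get, `get` takes &mut self because a hit updates the recency order.
struct TinyLru {
    capacity: usize,               // assumed to be > 0
    entries: Vec<(i32, Vec<u8>)>, // least recently used first
}

impl TinyLru {
    const fn new(capacity: usize) -> Self {
        TinyLru { capacity, entries: Vec::new() }
    }

    fn get(&mut self, key: &i32) -> Option<&Vec<u8>> {
        let pos = self.entries.iter().position(|(k, _)| k == key)?;
        let entry = self.entries.remove(pos);
        self.entries.push(entry); // mark as most recently used
        self.entries.last().map(|(_, v)| v)
    }

    fn put(&mut self, key: i32, value: Vec<u8>) {
        if let Some(pos) = self.entries.iter().position(|(k, _)| *k == key) {
            self.entries.remove(pos);
        } else if self.entries.len() == self.capacity {
            self.entries.remove(0); // evict the least recently used entry
        }
        self.entries.push((key, value));
    }
}

// Mutex::new is a const fn (Rust 1.63+), so this global needs no lazy_static!.
static BLOCK_CACHE: Mutex<TinyLru> = Mutex::new(TinyLru::new(4));

fn cached_block(block_index: i32) -> Option<Vec<u8>> {
    // lock() returns a MutexGuard, which implements DerefMut, so the
    // &mut self method can be called. `.cloned()` copies the data out, and
    // the guard is released when this function returns.
    BLOCK_CACHE.lock().unwrap().get(&block_index).cloned()
}

fn main() {
    BLOCK_CACHE.lock().unwrap().put(7, vec![1, 2, 3]);
    println!("{:?}", cached_block(7));
    println!("{:?}", cached_block(8));
}
```

With a real async download, the miss path would call cached_block first and only then await the download outside the lock; holding a std MutexGuard across an .await would block other tasks and, if the download locks the cache again, deadlock.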
I saw in the Rust book that you can define two different variables with the same name:
let hello = "Hello";
let hello = "Goodbye";
println!("My variable hello contains: {}", hello);
This prints out:
My variable hello contains: Goodbye
What happens with the first hello? Does it get freed up? How could I access it?
I know it would be bad to name two variables the same, but if this happens by accident because I declare it 100 lines below it could be a real pain.
Rust does not have a garbage collector.
Does Rust free up the memory of overwritten variables?
Yes, otherwise it'd be a memory leak, which would be a pretty terrible design decision. The memory is freed when the variable is reassigned:
struct Noisy;
impl Drop for Noisy {
    fn drop(&mut self) {
        eprintln!("Dropped")
    }
}
fn main() {
    eprintln!("0");
    let mut thing = Noisy;
    eprintln!("1");
    thing = Noisy;
    eprintln!("2");
}
0
1
Dropped
2
Dropped
what happens with the first hello
It is shadowed.
Nothing "special" happens to the data referenced by the variable, other than the fact that you can no longer access it. It is still dropped when the variable goes out of scope:
struct Noisy;
impl Drop for Noisy {
    fn drop(&mut self) {
        eprintln!("Dropped")
    }
}
fn main() {
    eprintln!("0");
    let thing = Noisy;
    eprintln!("1");
    let thing = Noisy;
    eprintln!("2");
}
0
1
2
Dropped
Dropped
See also:
Is the resource of a shadowed variable binding freed immediately?
I know it would be bad to name two variables the same
It's not "bad", it's a design decision. I would say that using shadowing like this is a bad idea:
let x = "Anna";
println!("User's name is {}", x);
let x = 42;
println!("The tax rate is {}", x);
Using shadowing like this is reasonable to me:
let name = String::from(" Vivian ");
let name = name.trim();
println!("User's name is {}", name);
See also:
Why do I need rebinding/shadowing when I can have mutable variable binding?
but if this happens by accident because I declare it 100 lines below it could be a real pain.
Don't have functions that are so big that you "accidentally" do something. That's applicable in any programming language.
Is there a way of cleaning memory manually?
You can call drop:
eprintln!("0");
let thing = Noisy;
drop(thing);
eprintln!("1");
let thing = Noisy;
eprintln!("2");
0
Dropped
1
2
Dropped
However, as oli_obk - ker points out, drop only frees the resources owned by the value; the stack memory taken by the variable itself is not freed until the function exits.
All discussions of drop require showing its (very complicated) implementation:
fn drop<T>(_: T) {}
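To see that there is nothing magic about it, here is a sketch with a hypothetical my_drop that has the same empty body as std::mem::drop: taking T by value moves the argument in, and it is dropped when the function returns. The static DROPS counter is an addition for checking the effect.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many Noisy values have been dropped, so the effect is checkable.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        eprintln!("Dropped");
    }
}

// Hypothetical copy of std::mem::drop's body: the argument is moved in by
// value and dropped when the function returns.
fn my_drop<T>(_x: T) {}

fn main() {
    let thing = Noisy;
    my_drop(thing); // prints "Dropped" here, just like drop(thing)
    println!("drops so far: {}", DROPS.load(Ordering::SeqCst));
    // `thing` has been moved, so using it again would be a compile error.
}
```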
What if I declare the variable in a global scope outside of the other functions?
Global variables are never freed, if you can even create them to start with.
There is a difference between shadowing and reassigning (overwriting) a variable when it comes to drop order.
All local variables are normally dropped when they go out of scope, in reverse order of declaration (see The Rust Programming Language's chapter on Drop). This includes shadowed variables. It's easy to check this by wrapping the value in a simple wrapper struct that prints something when it (the wrapper) is dropped (just before the value itself is dropped):
use std::fmt::Debug;
struct NoisyDrop<T: Debug>(T);
impl<T: Debug> Drop for NoisyDrop<T> {
    fn drop(&mut self) {
        println!("dropping {:?}", self.0);
    }
}
fn main() {
    let hello = NoisyDrop("Hello");
    let hello = NoisyDrop("Goodbye");
    println!("My variable hello contains: {}", hello.0);
}
prints the following (playground):
My variable hello contains: Goodbye
dropping "Goodbye"
dropping "Hello"
That's because a new let binding in a scope does not overwrite the previous binding, so it's just as if you had written
let hello1 = NoisyDrop("Hello");
let hello2 = NoisyDrop("Goodbye");
println!("My variable hello contains: {}", hello2.0);
Notice that this behavior is different from the following, superficially very similar, code (playground):
fn main() {
    let mut hello = NoisyDrop("Hello");
    hello = NoisyDrop("Goodbye");
    println!("My variable hello contains: {}", hello.0);
}
which not only drops them in the opposite order, but drops the first value before printing the message! That's because when you assign to a variable (instead of shadowing it with a new one), the original value gets dropped first, before the new value is moved in.
I began by saying that local variables are "normally" dropped when they go out of scope. Because you can move values into and out of variables, figuring out when a variable needs to be dropped sometimes cannot be done until runtime. In such cases, the compiler inserts code to track "liveness" and drop those values when necessary, so you can't accidentally cause leaks by overwriting a value. (However, it's still possible to safely leak memory by calling mem::forget, or by creating an Rc cycle with interior mutability.)
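A small sketch of the runtime case, using a hypothetical Tracked type with a drop counter: whether value is moved depends on a run-time bool, so the compiler has to remember (with what the Rustonomicon calls a drop flag) whether to drop it at the end of the scope. The last lines show the mem::forget leak mentioned above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type that counts and reports its drops.
struct Tracked(&'static str);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        println!("dropping {:?}", self.0);
    }
}

fn maybe_move(move_it: bool) {
    let value = Tracked("maybe moved");
    if move_it {
        drop(value); // moved out: dropped right here
    }
    println!("end of scope (move_it = {})", move_it);
    // If `value` was not moved, it is dropped here. Which case applies is
    // only known at run time, so the compiler tracks it with a hidden flag.
}

fn main() {
    maybe_move(true);
    maybe_move(false);
    // Each call dropped its value exactly once.
    println!("drops: {}", DROPS.load(Ordering::SeqCst));

    // Safe, but skips the destructor entirely: the value is leaked.
    std::mem::forget(Tracked("forgotten"));
    println!("drops after forget: {}", DROPS.load(Ordering::SeqCst));
}
```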
See also
What's the semantic of assignment in Rust?
There are a few things to note here:
When compiling the program you gave, the "Hello" string does not appear in the binary. This is probably a compiler optimization, because the first value is never used.
fn main(){
    let hello = "Hello xxxxxxxxxxxxxxxx"; // Added for searching more easily.
    let hello = "Goodbye";
    println!("My variable hello contains: {}", hello);
}
Then test:
$ rustc ./stackoverflow.rs
$ cat stackoverflow | grep "xxx"
# No results
$ cat stackoverflow | grep "Goodbye"
Binary file (standard input) matches
$ cat stackoverflow | grep "My variable hello contains"
Binary file (standard input) matches
Note that if you print the first value, the string does appear in the binary, which indicates that the compiler simply does not store unused values.
Another thing to consider is that both values assigned to hello (i.e. "Hello" and "Goodbye") have type &str. This is a reference (a pointer plus a length) to a string stored statically in the compiled binary. By contrast, a dynamically generated string, such as a hash computed from some data with MD5 or SHA, does not exist statically in the binary.
fn main(){
    // Added the type to make it more clear.
    let hello: &str = "Hello";
    let hello: &str = "Goodbye";
    // This is wrong (does not compile):
    // let hello: String = "Goodbye";
    println!("My variable hello contains: {}", hello);
}
This means that the variable simply points to a location in static memory: no memory is allocated or freed at runtime. Even without the optimization mentioned above (omitting unused strings), shadowing would only change the address hello points to; the string data itself stays in the binary's static memory.
The story would be different for a String, which is heap-allocated; for that, refer to the other answers.
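A sketch of what this means in practice (first_hello is a made-up name): because the literal lives in static memory, a copy of the &'static str taken before shadowing stays valid even after the function returns, which is also one way to keep access to the first value.

```rust
// The literal's bytes live in the compiled binary's read-only data, so a
// `&'static str` can outlive the binding (and the function) that held it.
fn first_hello() -> &'static str {
    let hello = "Hello";
    let first = hello; // copies the reference (pointer + length), not the text
    let hello = "Goodbye";
    println!("My variable hello contains: {}", hello);
    first
}

fn main() {
    let s = first_hello();
    println!("Still reachable: {}", s);
    // A String, by contrast, allocates on the heap at run time; that memory
    // is freed when the String is dropped.
    let owned = String::from(s);
    println!("Owned copy: {}", owned);
}
```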
I'm trying to subscribe to Windows events using EvtSubscribe from the winapi crate, but I'm getting ERROR_INVALID_PARAMETER.
I cannot find an example in Rust, but I did find a C++ example.
My code that produces ERROR_INVALID_PARAMETER:
fn main() {
    unsafe {
        let mut callback: winapi::um::winevt::EVT_SUBSCRIBE_CALLBACK = None;
        let mut session = std::ptr::null_mut();
        let mut signal_event = std::ptr::null_mut();
        let mut bookmark = std::ptr::null_mut();
        let mut context = std::ptr::null_mut();
        let channel_path = "Security";
        let channel_path: winnt::LPWSTR = to_wchar(channel_path);
        let query = "Event/System[EventID=4624]";
        let query: winnt::LPWSTR = to_wchar(query);
        let event_handle = winevt::EvtSubscribe(
            session,
            signal_event,
            channel_path,
            query,
            bookmark,
            context,
            callback,
            winevt::EvtSubscribeStartAtOldestRecord,
        );
        //println!("{:?}", &event_handle);
        println!("{:?}", &winapi::um::errhandlingapi::GetLastError());
    } //unsafe end
}
fn to_vec(str: &str) -> Vec<u16> {
    return OsStr::new(str)
        .encode_wide()
        .chain(Some(0).into_iter())
        .collect();
}
fn to_wchar(str: &str) -> *mut u16 {
    return to_vec(str).as_mut_ptr();
}
The documentation for EvtSubscribe states:
SignalEvent
[...] This parameter must be NULL if the Callback parameter is not
NULL.
Callback
[...] This parameter must be NULL if the SignalEvent parameter is
not NULL.
The unstated implication here is that exactly one of these parameters must be provided. Passing both is explicitly disallowed, but passing neither would not make sense, as otherwise there would be no way for your code to receive the event.
Passing one of these values should cause the code to start working.
Editorially, this is a good example of where a Rust enum would have been a better way to model the API. This would clearly show that the two options are mutually exclusive and one is required:
enum Subscriber {
    EventObject(HANDLE),
    Callback(EVT_SUBSCRIBE_CALLBACK),
}
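A sketch of how such an enum could be lowered to the two nullable EvtSubscribe arguments. Handle and RawCallback are hypothetical local aliases that only approximate winapi's HANDLE and EVT_SUBSCRIBE_CALLBACK, and on_event is a dummy callback; nothing here calls into Windows.

```rust
use std::ffi::c_void;
use std::ptr;

// Hypothetical aliases approximating winapi's HANDLE and the function pointer
// inside EVT_SUBSCRIBE_CALLBACK; they are not imported from the winapi crate.
type Handle = *mut c_void;
type RawCallback = unsafe extern "system" fn(u32, *mut c_void, *mut c_void) -> u32;

enum Subscriber {
    EventObject(Handle),
    Callback(RawCallback),
}

impl Subscriber {
    // Produce the (SignalEvent, Callback) argument pair. Exactly one of the
    // two is set, which is the contract the documentation describes.
    fn to_raw_args(&self) -> (Handle, Option<RawCallback>) {
        match self {
            Subscriber::EventObject(h) => (*h, None),
            Subscriber::Callback(cb) => (ptr::null_mut(), Some(*cb)),
        }
    }
}

// Dummy callback with the assumed shape (action, user context, event).
unsafe extern "system" fn on_event(_action: u32, _ctx: *mut c_void, _event: *mut c_void) -> u32 {
    0
}

fn main() {
    let (signal, callback) = Subscriber::Callback(on_event).to_raw_args();
    println!("signal is null: {}, callback set: {}", signal.is_null(), callback.is_some());
}
```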
Incidentally, your implementation of to_wchar is incorrect and likely leads to memory unsafety. to_vec allocates memory, you take a pointer to it, and then that memory is deallocated when the Vec is dropped at the end of to_wchar, leaving a dangling pointer. The C code then reads that bad pointer inside the unsafe block, which is part of the reason unsafe is needed.
You either need to use mem::forget, as shown in How to expose a Rust `Vec<T>` to FFI? (and then you need to prevent leaking the memory somehow), or you need to keep the Vec alive in the caller for as long as the C code uses it and take a pointer into it there, instead of returning a raw pointer from to_wchar.
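A sketch of the second option, assuming str::encode_utf16 in place of OsStrExt::encode_wide so it runs on any platform (for a &str both produce the same UTF-16 code units), and a hypothetical wide_len function standing in for the C side that reads the pointer.

```rust
// Build a null-terminated UTF-16 buffer. str::encode_utf16 works on every
// platform; the question's OsStrExt::encode_wide is Windows-only.
fn to_wide(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

// Hypothetical stand-in for a C function (such as EvtSubscribe) that reads
// a null-terminated wide string through a raw pointer.
unsafe fn wide_len(p: *const u16) -> usize {
    let mut n = 0;
    // SAFETY: the caller guarantees `p` points to a live, null-terminated buffer.
    unsafe {
        while *p.add(n) != 0 {
            n += 1;
        }
    }
    n
}

fn main() {
    // Keep the Vec in a named binding so its buffer outlives every use of the pointer.
    let channel_path = to_wide("Security");
    let len = unsafe { wide_len(channel_path.as_ptr()) };
    println!("{} UTF-16 code units", len);
    // `channel_path` is freed at the end of main, after the pointer's last use.
}
```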
I'm writing a Windows Phone application with C++/CX. The function tries to copy the input array to the output array asynchronously:
IAsyncAction CopyAsync(const Platform::Array<byte, 1>^ input, Platform::WriteOnlyArray<byte, 1>^ output)
{
    byte *inputData = input->Data;
    byte *outputData = output->Data;
    int byteCount = input->Length;
    // if I put it here, there is no error
    //memcpy_s(outputData, byteCount, inputData, byteCount);
    return concurrency::create_async([&]() -> void {
        memcpy_s(outputData, byteCount, inputData, byteCount); // access violation exception
        return;
    });
}
This function compiles but cannot run correctly and produces an "Access violation exception". How can I modify values in the output array?
This is Undefined Behaviour: by the time the lambda uses your three variables captured by reference (inputData, outputData and byteCount), you have already returned from CopyAsync and the stack has been trashed.
It's really the same issue as if you returned a reference to a local variable from a function (which we know is evil), except that here the references are hidden inside the lambda so it's a bit harder to see at first glance.
If you are sure that input and output won't change and will still be reachable between the moment you call CopyAsync and the moment you run the asynchronous action, you can capture your variables by value instead of by reference:
return concurrency::create_async([=]() -> void {
    //                            ^ here
    memcpy_s(outputData, byteCount, inputData, byteCount);
    return;
});
Since they're only pointers (and an int), you won't be copying the pointed-to data, only the pointers themselves.
Or you could just capture input and output themselves by value: since they're reference-counted handles (^), the copies held by the lambda keep the objects alive, so they are still reachable by the time the lambda runs:
return concurrency::create_async([=]() -> void {
    memcpy_s(output->Data, input->Length, input->Data, input->Length);
    return;
});
I, for one, prefer this second solution: it provides more guarantees (namely, object reachability) than the first one.
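For comparison, here is a rough Rust analog (not C++/CX; copy_async is a made-up name). thread::spawn requires a 'static closure, so capturing the caller's locals by reference, as [&] does, is rejected at compile time. Moving reference-counted Arc handles into the closure plays the role of capturing the ^ handles by value.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// The closure owns its own Arc clones, so both buffers stay alive until the
// background copy finishes, however early the caller returns.
fn copy_async(input: Arc<Vec<u8>>, output: Arc<Mutex<Vec<u8>>>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut out = output.lock().unwrap();
        let n = input.len().min(out.len()); // never write past either buffer
        out[..n].copy_from_slice(&input[..n]);
    })
}

fn main() {
    let input = Arc::new(vec![1u8, 2, 3, 4]);
    let output = Arc::new(Mutex::new(vec![0u8; 4]));
    copy_async(Arc::clone(&input), Arc::clone(&output)).join().unwrap();
    println!("{:?}", *output.lock().unwrap());
}
```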