Converting an immutable variable to mutable by taking ownership - for-loop

I was going through the Rust book from the official rust website and came across the following paragraph:
Note that we needed to make v1_iter mutable: calling the next method on an iterator changes internal state that the iterator uses to keep track of where it is in the sequence. In other words, this code consumes, or uses up, the iterator. Each call to next eats up an item from the iterator. We didn’t need to make v1_iter mutable when we used a for loop because the loop took ownership of v1_iter and made it mutable behind the scenes.
Notice the last line: it says the for loop makes an immutable variable mutable behind the scenes. If that's possible, then is it possible for us as programmers to do the same?
Like I know it's not safe and we're not supposed to do that and stuff, but just wondering if that would be possible.

It is completely fine to rebind an immutable variable as a mutable one; e.g., the following code works:
let x = 5;
let mut x = x;
After the last statement we can mutate the variable x. Note, though, that there are really two variables: the first is moved into the second. The same can be done with a function parameter as well:
fn f(mut x: i32) {
    x += 1;
}
let y = 5;
f(y);
What Rust does prohibit is turning an immutable reference into a mutable one. That is an important difference, because an owned value is always safe to change, unlike a borrowed one.
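A minimal sketch of the distinction (the names here are just for illustration):

fn main() {
    let x = 5;      // immutable binding, but we own the value
    let mut y = x;  // taking ownership lets us rebind it mutably
    y += 1;         // fine: we own y

    let r = &y;     // an immutable borrow
    // *r += 1;     // error: cannot assign to `*r`, which is behind a `&` reference;
    //              // there is no safe way to turn `r: &i32` into a `&mut i32`
    println!("{}", r);
}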

Related

Is C++11 operator[] equivalent to emplace on map insertion?

For C++11, is there still a performance difference between the following?
(for std::map<Foo, std::vector<Bar> > as an example)
map[key] = myVector and map.emplace(key, myVector)
The part I'm not figuring out is the exact internals of operator[]. My understanding so far has been (when the key doesn't exist):
1. Create a new key and the associated empty default vector in place inside the map
2. Return a reference to the associated empty vector
3. Assign myVector to the reference???
Point 3 is the part I couldn't understand: how can you assign a new value to a reference in the first place?
Though I can't sort through point 3, I think there's just a copy/move required somewhere. Assuming C++11 is smart enough to know it's going to be a move operation, is this whole "[]" assignment then already cheaper than insert()? Is it almost equivalent to emplace() - i.e. default construction and moving the content over, versus constructing the vector with its content directly in place?
There are a lot of differences between the two.
If you use operator[], the map will default-construct the value. operator[] returns a reference to this default-constructed object, and the assignment then uses operator= on it.
If you use emplace, the map will directly construct the value with the parameters you provide.
So the operator[] method will always use two-stage construction. If the default constructor is slow, or if copy/move construction is faster than copy/move assignment, then it could be problematic.
However, emplace will not replace the value if the provided key already exists. Whereas operator[] followed by operator= will always replace the value, whether there was one there or not.
There are other differences too. If copying/moving throws, emplace guarantees that the map will not be changed. By contrast, operator[] will always insert a default constructed element. So if the later copy/move assignment fails, then the map has already been changed. That key will exist with a default constructed value_type.
Really, performance is not the first thing you should be thinking about when deciding which one to use. You need to focus first on whether it has the desired behavior.
C++17 will provide insert_or_assign, which has the effect of map[] = v;, but with the exception safety of insert/emplace.
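A quick sketch of these behavioral differences (illustrative only; assumes std::map<std::string, std::vector<int>> and, for the last call, a C++17 compiler):

#include <map>
#include <string>
#include <vector>

int main() {
    std::map<std::string, std::vector<int>> m;
    std::vector<int> v{1, 2, 3};

    m["a"] = v;                  // default-constructs the mapped vector,
                                 // then copy-assigns v over it
    m.emplace("a", v);           // "a" already exists, so this does nothing
    m.insert_or_assign("a", v);  // C++17: assigns over an existing value,
                                 // inserts if the key is absent
}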
how can you assign a new value to a reference in the first place?
It's fundamentally no different from assigning to any non-const reference:
int i = 5;
int &j = i;
j = 30;
i == 30; //This is true.

Why exactly is std::move needed on a function parameter which is already an rvalue reference? [duplicate]

In the Artima article about C++ rvalue references (http://www.artima.com/cppsource/rvalue.html) there are these words: That's why it is necessary to say move(x) instead of just x when passing down to the base class. This is a key safety feature of move semantics designed to prevent accidentally moving twice from some named variable.
I can't think of a situation in which such a double move could happen. Can you give an example? In other words, what would go wrong if all mentions of a T&& variable were treated as rvalue references and not just references?
Consider this scenario:
void foo(std::string x) {}
void bar(std::string y) {}

void test(std::string&& str)
{
    // to be determined
}
We want to call foo with str, then bar with str, both with the same value. The best way to do this is:
foo(str); // copy str to x
bar(std::move(str)); // move str to y; we move it because we're done with it
It would be a mistake to do this:
foo(std::move(str)); // move str to x
bar(std::move(str)); // move str to y...er, except now it's empty
Because after the first move the value of str is unspecified.
So in the design of rvalue references, this implicit move is not there. If it were, our best way above would not work because the first mention of str would be std::move(str) instead.
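Put together as a compilable sketch (the printed lines are only there to make the copy and the move visible):

#include <iostream>
#include <string>
#include <utility>

void foo(std::string x) { std::cout << "foo got: " << x << '\n'; }
void bar(std::string y) { std::cout << "bar got: " << y << '\n'; }

void test(std::string&& str)
{
    // str has a name, so inside test it is an lvalue: nothing moves implicitly.
    foo(str);            // copies; str keeps its value
    bar(std::move(str)); // moves; we are done with str
}

int main() { test("hello"); }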
The way I see it is that if we had some rvalue reference x:
T&& x = ...;
and we called some function using x as a parameter:
f(x)
We need some way to tell f whether or not it can damage x (or "take ownership of x", or "is the last client to use x").
One way to design this would be to qualify every call:
f(yours_now(x)) // ok to damage
f(still_mine(x)) // dont damage
and make the unqualified call illegal.
Another way would be to make one way the default:
Either:
f(yours_now(x)) // ok to damage
f(x) // dont damage
or
f(x) // ok to damage
f(still_mine(x)) // dont damage
So if we agree that qualifying every use is too bulky and we should default to one way, which is best? Well, let's look at the cost of accidentally picking the default in each case:
In the first case it was OK to damage, but we accidentally said it wasn't. Here we lose performance because an unnecessary copy was made, but other than that, no big deal.
In the second case it was not OK to damage an object, but we accidentally said it was. This may cause a difficult-to-detect logical bug in the program, as x is now in a damaged state when f returns, but the author expected it not to be.
So the first case is what was chosen, because it's "safer".

Purity of Memoized Functions in D

Are there any clever ways of preserving purity when memoizing functions in D?
I want this when caching SHA1-calculations of large datasets kept in RAM.
Short answer: Pick memoization or purity. Don't try and have both.
Long answer: I don't see how it would be possible to preserve purity with memoization, unless you used casts to lie to the compiler and claim that a function is pure when it isn't. In order to memoize, you have to store the arguments and the result, which breaks purity, since the number one guarantee of pure functions is that they don't access mutable global or static variables - which is the only way that you'd be able to memoize anything.
So, if you did something like
alias pure nothrow Foo function() FuncType;
auto result = (cast(FuncType)&theFunc)();
then you can treat theFunc as if it were pure when it isn't, but then it's up to you to ensure that the function acts pure from the outside - including dealing with the fact that the compiler thinks that it can change the mutability of the return type of a strongly pure function which returns a mutable type. For instance, this code will compile just fine
char[] makeString(size_t len) pure
{
    return new char[](len);
}

void main()
{
    char[] a = makeString(5);
    const(char)[] b = makeString(5);
    const(char[]) c = makeString(5);
    immutable(char)[] d = makeString(5);
    immutable(char[]) e = makeString(5);
}
even though the return type is always mutable. And that's because the compiler knows that makeString is strongly pure and returns a value which could not have been passed to it - so it's guaranteed to be a new value every time - and therefore changing the mutability of the return type to const or immutable doesn't violate the type system.
If you were to do something inside of makeString that involved casting a function to pure when it violated the guarantee that makeString always returned a new value, then you'd have broken the type system, and you'd be risking having very buggy code depending on what you did with the values returned from makeString.
The only way that I'm aware of getting purity when you don't have it is to cast a function pointer so that it's pure, but if you do that, then you must fully understand what guarantees a pure function makes and what the compiler thinks that it can do with it so that you fully mimic that behavior. That's easier if you're returning immutable data or a value type, because then you don't have the issue of the compiler changing the mutability of the return type, but it's still very tricky business.
So, if you're thinking about casting something to pure, think again. Yes, it's possible to do some stuff that way that you couldn't otherwise, but it's very risky. Personally, I'd advise that you decide whether purity matters more to you or memoization matters more to you and that you drop the other. Anything else is highly risky.
What D does allow you to express within the type system is an impure function that memoizes a pure one.
Conceptually the memoizer is also pure, but the type system is not sufficiently expressive to allow that. You'd need to cheat somewhere.
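A minimal sketch of that idea (illustrative; expensive() stands in for something like a SHA1 computation over a large buffer):

import std.stdio;

// A pure function: the same input always produces the same result.
int expensive(int n) pure
{
    return n * n; // stand-in for a costly computation
}

// An impure memoizer wrapping the pure function: the cache is mutable
// module-level state, so memoized() itself cannot be marked pure.
int[int] cache;

int memoized(int n)
{
    if (auto p = n in cache)
        return *p;
    return cache[n] = expensive(n);
}

void main()
{
    writeln(memoized(12)); // computed
    writeln(memoized(12)); // served from the cache
}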

Ruby equivalent of C#'s 'yield' keyword, or, creating sequences without preallocating memory

In C#, you could do something like this:
public IEnumerable<int> GetItems()
{
    for (int i = 0; i < 10000000; i++) {
        yield return i;
    }
}
This returns an enumerable sequence of 10 million integers without ever allocating a collection in memory of that length.
Is there a way of doing an equivalent thing in Ruby? The specific example I am trying to deal with is the flattening of a rectangular array into a sequence of values to be enumerated. The return value does not have to be an Array or Set, but rather some kind of sequence that can only be iterated/enumerated in order, not by index. Consequently, the entire sequence need not be allocated in memory concurrently. In .NET, this is IEnumerable and IEnumerable<T>.
Any clarification on the terminology used here in the Ruby world would be helpful, as I am more familiar with .NET terminology.
EDIT
Perhaps my original question wasn't really clear enough -- I think the fact that yield has very different meanings in C# and Ruby is the cause of confusion here.
I don't want a solution that requires my method to use a block. I want a solution that has an actual return value. A return value allows convenient processing of the sequence (filtering, projection, concatenation, zipping, etc).
Here's a simple example of how I might use get_items:
things = obj.get_items.select { |i| !i.thing.nil? }.map { |i| i.thing }
In C#, any method returning IEnumerable that uses a yield return causes the compiler to generate a finite state machine behind the scenes that caters for this behaviour. I suspect something similar could be achieved using Ruby's continuations, but I haven't seen an example and am not quite clear myself on how this would be done.
It does indeed seem possible that I might use Enumerable to achieve this. A simple solution would be to use an Array (which includes module Enumerable), but I do not want to create an intermediate collection with N items in memory when it's possible to just provide them lazily and avoid any memory spike at all.
If this still doesn't make sense, then consider the above code example. get_items returns an enumeration, upon which select is called. What is passed to select is an instance that knows how to provide the next item in the sequence whenever it is needed. Importantly, the whole collection of items hasn't been calculated yet. Only when select needs an item will it ask for it, and the latent code in get_items will kick into action and provide it. This laziness carries along the chain, such that select only draws the next item from the sequence when map asks for it. As such, a long chain of operations can be performed on one data item at a time. In fact, code structured in this way can even process an infinite sequence of values without any kinds of memory errors.
So, this kind of laziness is easily coded in C#, and I don't know how to do it in Ruby.
I hope that's clearer (I'll try to avoid writing questions at 3AM in future.)
It's supported by Enumerator since Ruby 1.9 (and back-ported to 1.8.7). See Generator: Ruby.
Cliche example:
fib = Enumerator.new do |y|
  y.yield i = 0
  y.yield j = 1
  while true
    k = i + j
    y.yield k
    i = j
    j = k
  end
end

100.times { puts fib.next }
Your specific example is equivalent to 10000000.times, but let's assume for a moment that the times method didn't exist and you wanted to implement it yourself, it'd look like this:
class Integer
  def my_times
    return enum_for(:my_times) unless block_given?
    i = 0
    while i < self
      yield i
      i += 1
    end
  end
end

10000.my_times # Returns an Enumerator which will let you iterate
               # over the numbers from 0 to 10000 (exclusive)
Edit: To clarify my answer a bit:
In the above example my_times can be (and is) used without a block and it will return an Enumerable object, which will let you iterate over the numbers from 0 to n. So it is exactly equivalent to your example in C#.
This works using the enum_for method. The enum_for method takes as its argument the name of a method, which will yield some items. It then returns an instance of class Enumerator (which includes the module Enumerable), which when iterated over will execute the given method and give you the items which were yielded by the method. Note that if you only iterate over the first x items of the enumerable, the method will only execute until x items have been yielded (i.e. only as much as necessary of the method will be executed) and if you iterate over the enumerable twice, the method will be executed twice.
In 1.8.7+ it has become common to define methods which yield items so that, when called without a block, they return an Enumerator which lets the user iterate over those items lazily. This is done by adding the line return enum_for(:name_of_this_method) unless block_given? to the beginning of the method, like I did in my example.
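For what it's worth, newer Rubies (2.0+) also let you chain such an Enumerator lazily via Enumerable#lazy, which keeps a whole select/map pipeline working one item at a time, much like the C# version; a sketch:

# An endless source of numbers; nothing is computed up front.
naturals = Enumerator.new do |y|
  i = 0
  loop { y.yield(i += 1) }
end

# .lazy keeps select/map from materializing intermediate arrays.
p naturals.lazy.select(&:even?).map { |n| n * n }.first(5)
# => [4, 16, 36, 64, 100]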
I don't have much Ruby experience, but what C# does with yield return is usually known as lazy evaluation or lazy execution: providing answers only as they are needed. It's not about allocating memory; it's about deferring computation until it is actually needed, expressed in a way similar to simple linear execution (rather than the underlying iterator-with-state-saving).
A quick google turned up a ruby library in beta. See if it's what you want.
C# ripped the 'yield' keyword right out of Ruby - see Implementing Iterators here for more.
As for your actual problem: you presumably have an array of arrays and you want to create a one-way iteration over the complete length of the list? It's perhaps worth looking at Array#flatten as a starting point - if the performance is alright then you probably don't need to go much further.

Using function arguments as local variables

Something like this (yes, this doesn't deal with some edge cases - that's not the point):
int CountDigits(int num) {
    int count = 1;
    while (num >= 10) {
        count++;
        num /= 10;
    }
    return count;
}
What's your opinion about this? That is, using function arguments as local variables.
Both are placed on the stack, and performance-wise they are pretty much identical; I'm wondering about the best-practices aspect of this.
I feel like an idiot when I add an additional and quite redundant line to that function consisting of int numCopy = num; however, it does bug me.
What do you think? Should this be avoided?
As a general rule, I wouldn't use a function parameter as a local processing variable, i.e. I treat function parameters as read-only.
In my mind, intuitively understandable code is paramount for maintainability, and modifying a function parameter to use as a local processing variable tends to run counter to that goal. I have come to expect that a parameter will have the same value in the middle and at the bottom of a method as it does at the top. Plus, an aptly named local processing variable may improve understandability.
Still, as @Stewart says, this rule is more or less important depending on the length and complexity of the function. For short, simple functions like the one you show, simply using the parameter itself may be easier to understand than introducing a new local variable (very subjective).
Nevertheless, if I were to write something as simple as countDigits(), I'd tend to use a remainingBalance local processing variable in lieu of modifying the num parameter as part of local processing - just seems clearer to me.
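For instance, a version with a local processing variable might look like this (a sketch; the variable name is my own):

int countDigits(final int num) {
    int remaining = num; // local processing copy; num keeps its original value
    int count = 1;
    while (remaining >= 10) {
        count++;
        remaining /= 10;
    }
    return count;
}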
Sometimes I will modify a parameter at the beginning of a method in order to normalize it:
void saveName(String name) {
    name = (name != null ? name.trim() : "");
    ...
}
I rationalize that this is okay because:
a. it is easy to see at the top of the method,
b. the parameter maintains its original conceptual intent, and
c. the parameter is stable for the rest of the method
Then again, half the time, I'm just as apt to use a local variable anyway, just to get a couple of extra finals in there (okay, that's a bad reason, but I like final):
void saveName(final String name) {
    final String normalizedName = (name != null ? name.trim() : "");
    ...
}
If, 99% of the time, the code leaves function parameters unmodified (i.e. mutating parameters are unintuitive or unexpected for this code base), then, during that other 1% of the time, dropping a quick comment about a mutating parameter at the top of a long/complex function could be a big boon to understandability:
int CountDigits(int num) {
    // num is consumed
    int count = 1;
    while (num >= 10) {
        count++;
        num /= 10;
    }
    return count;
}
P.S. :-)
parameters vs arguments
http://en.wikipedia.org/wiki/Parameter_(computer_science)#Parameters_and_arguments
These two terms are sometimes loosely used interchangeably; in particular, "argument" is sometimes used in place of "parameter". Nevertheless, there is a difference. Properly, parameters appear in procedure definitions; arguments appear in procedure calls.
So,
int foo(int bar)
bar is a parameter.
int x = 5;
int y = foo(x);
The value of x is the argument for the bar parameter.
It always feels a little funny to me when I do this, but that's not really a good reason to avoid it.
One reason you might potentially want to avoid it is for debugging purposes. Being able to tell the difference between "scratchpad" variables and the input to the function can be very useful when you're halfway through debugging.
I can't say it's something that comes up very often in my experience - and often you can find that it's worth introducing another variable just for the sake of having a different name, but if the code which is otherwise cleanest ends up changing the value of the variable, then so be it.
One situation where this can come up and be entirely reasonable is where you've got some value meaning "use the default" (typically a null reference in a language like Java or C#). In that case I think it's entirely reasonable to modify the value of the parameter to the "real" default value. This is particularly useful in C# 4 where you can have optional parameters, but the default value has to be a constant:
For example:
public static void WriteText(string file, string text, Encoding encoding = null)
{
    // Null means "use the default", which we would document to be UTF-8
    encoding = encoding ?? Encoding.UTF8;
    // Rest of code here
}
About C and C++:
My opinion is that using the parameter as a local variable of the function is fine because it is a local variable already. Why then not use it as such?
I feel silly too when copying the parameter into a new local variable just to have a modifiable variable to work with.
But I think this is pretty much a matter of personal opinion. Do it as you like. If you feel silly copying the parameter just because of this, it indicates that your personality doesn't like it, and then you shouldn't do it.
If I don't need a copy of the original value, I don't declare a new variable.
IMO, I don't think mutating parameter values is a bad practice in general; it depends on how you're going to use them in your code.
My team coding standard recommends against this because it can get out of hand. To my mind for a function like the one you show, it doesn't hurt because everyone can see what is going on. The problem is that with time functions get longer, and they get bug fixes in them. As soon as a function is more than one screen full of code, this starts to get confusing which is why our coding standard bans it.
The compiler ought to be able to get rid of the redundant variable quite easily, so it has no efficiency impact. It is probably just between you and your code reviewer whether this is OK or not.
I would generally not change the parameter value within the function. If at some point later in the function you need to refer to the original value, you still have it. In your simple case there is no problem, but if you add more code later, you may refer to num without realizing it has been changed.
The code needs to be as self-sufficient as possible. What I mean by that is that you now have a dependency on what is being passed in as part of your algorithm. If another member of your team decides to change this to pass-by-reference, then you might have big problems.
The best practice is definitely to copy the inbound parameters if you expect them to be immutable.
I typically don't modify function parameters, unless they're pointers, in which case I might alter the value that's pointed to.
I think the best practice here varies by language. For example, in Perl you can localize any variable or even part of a variable to a local scope, so that changing it in that scope will not have any effect outside of it:
our ($arg1, $arg2);

sub my_function
{
    local ($arg1, $arg2) = @_;  # localize the package variables: changes to
                                # them will not be visible outside this scope
    $arg1++;
    local $arg2->{key1};        # only the key1 slot of the hash referenced
                                # by $arg2 is localized
    $arg2->{key1}->{key2} = 'foo'; # this change is not visible outside the function
}
Occasionally I have been bitten by forgetting to localize a data structure that was passed by reference to a function and that I changed inside the function. Conversely, I have also returned a data structure as a function result that was shared among multiple systems, and the caller then proceeded to change the data by mistake, affecting those other systems in a difficult-to-trace problem usually called action at a distance. The best thing to do here is to make a clone of the data before returning it*, or to make it read-only**.
* In Perl, see the function dclone() in the built-in Storable module.
** In Perl, see lock_hash() or lock_hash_ref() in the built-in Hash::Util module.
