Java-8 stream expression to 'OR' several enum values together

I am aggregating a bunch of enum values (different from the ordinal values) in a foreach loop.
int output = 0;
for (TestEnum testEnum: setOfEnums) {
output |= testEnum.getValue();
}
Is there a way to do this in streams API?
If I use a lambda like this on a Stream<TestEnum>:
setOfEnums.stream().forEach(testEnum -> output |= testEnum.getValue());
I get a compile time error that says, 'variable used in lambda should be effectively final'.

A Predicate represents a boolean-valued function, which won't help here; to aggregate a bunch of enum values you need the stream's reduce method.
If we consider that you have a HashSet named setOfEnums:
// int initialValue = 0; // effectively final for the stream pipeline, as long as you don't modify it later
final int initialValue = 0; // or make it explicitly final
int output = setOfEnums.stream()
    .map(TestEnum::getValue)
    .reduce(initialValue, (e1, e2) -> e1 | e2);

You need to reduce the stream of enums like this:
int output = Arrays.stream(TestEnum.values())
    .mapToInt(TestEnum::getValue)
    .reduce(0, (acc, value) -> acc | value);

I like the recommendations to use reduction, but perhaps a more complete answer would illustrate why it is a good idea.
In a lambda expression, you can reference variables like output that are in scope where the lambda expression is defined, but you cannot modify the values. The reason for that is that, internally, the compiler must be able to implement your lambda, if it chooses to do so, by creating a new function with your lambda as its body. The compiler may choose to add parameters as needed so that all of the values used in this generated function are available in the parameter list. In your case, such a function would definitely have the lambda's explicit parameter, testEnum, but because you also reference the local variable output in the lambda body, it could add that as a second parameter to the generated function. Effectively, the compiler might generate this function from your lambda:
private void generatedFunction1(TestEnum testEnum, int output) {
    output |= testEnum.getValue();
}
As you can see, the output parameter is a copy of the output variable used by the caller, and the OR operation would only be applied to the copy. Since the original output variable wouldn't be modified, the language designers decided to prohibit modification of values passed implicitly to lambdas.
To get around the problem in the most direct way, setting aside for the moment that the use of reduction is a far better approach, you could wrap the output variable in a wrapper (e.g. an int[] array of size 1, or an AtomicInteger). The wrapper's reference would be passed by value to the generated function, and since you would now update the contents of output, not the value of output, output remains effectively final, so the compiler won't complain. For example:
AtomicInteger output = new AtomicInteger();
setOfEnums.stream().forEach(testEnum -> output.set(output.get() | testEnum.getValue()));
or, since we're using AtomicInteger, we may as well make it thread-safe in case you later choose to use a parallel Stream,
AtomicInteger output = new AtomicInteger();
setOfEnums.stream().forEach(testEnum -> output.getAndUpdate(prev -> prev | testEnum.getValue()));
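Once the accumulator is thread-safe, switching to a parallel stream is a one-word change (a sketch of the same pipeline run in parallel):
AtomicInteger output = new AtomicInteger();
setOfEnums.parallelStream().forEach(testEnum -> output.getAndUpdate(prev -> prev | testEnum.getValue()));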
Now that we've gone over an answer that most resembles what you asked about, we can talk about the superior solution of using reduction, that other answers have already recommended.
There are two kinds of reduction offered by Stream: stateless reduction (reduce()) and stateful reduction (collect()). To visualize the difference, consider a conveyor belt delivering hamburgers, where your goal is to collect all of the hamburger patties into one big hamburger. With stateful reduction, you would start with a new hamburger bun, collect the patty out of each hamburger as it arrives, and add it to the stack of patties in the hamburger bun you set up to collect them. In stateless reduction, you start out with an empty hamburger bun (called the "identity", since that empty hamburger bun is what you end up with if the conveyor belt is empty), and as each hamburger arrives on the belt, you make a copy of the previous accumulated burger and add the patty from the new one that just arrived, discarding the previous accumulated burger.
The stateless reduction may seem like a huge waste, but there are cases when copying the accumulated value is very cheap. One such case is when accumulating primitive types: primitive types are very cheap to copy, so stateless reduction is ideal when crunching primitives in applications such as summing, ORing, etc.
So, using stateless reduction, your example might become:
int output = setOfEnums.stream()
    .mapToInt(TestEnum::getValue) // or .mapToInt(testEnum -> testEnum.getValue())
    .reduce(0, (resultSoFar, value) -> resultSoFar | value);
Some points to ponder:
Your original for loop is probably faster than using streams, except perhaps if your set is very large and you use parallel streams. Don't use streams for the sake of using streams. Use them if they make sense.
In my first example, I showed the use of Stream.forEach(). If you ever find yourself creating a Stream and just calling forEach(), it is more efficient to call forEach() on the collection directly (see the sketch after this list).
You didn't mention what kind of Set you are using, but I hope you are using EnumSet<TestEnum>. Because it is implemented as a bit field, it performs much better (O(1)) than any other kind of Set for all operations, even copying. EnumSet.noneOf(TestEnum.class) creates an empty set, EnumSet.allOf(TestEnum.class) gives you a set of all enum values, etc.
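To illustrate the last two points together, here is a minimal self-contained sketch; the TestEnum constants and their bit values are invented for the example:

import java.util.EnumSet;
import java.util.concurrent.atomic.AtomicInteger;

enum TestEnum {
    A(1), B(2), C(4); // hypothetical flag values

    private final int value;
    TestEnum(int value) { this.value = value; }
    int getValue() { return value; }
}

public class EnumSetDemo {
    public static void main(String[] args) {
        EnumSet<TestEnum> setOfEnums = EnumSet.allOf(TestEnum.class); // {A, B, C}
        EnumSet<TestEnum> empty = EnumSet.noneOf(TestEnum.class);     // {}

        // forEach called directly on the collection, no stream needed
        AtomicInteger output = new AtomicInteger();
        setOfEnums.forEach(e -> output.getAndUpdate(prev -> prev | e.getValue()));
        System.out.println(output.get()); // prints 7
        System.out.println(empty);        // prints []
    }
}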

Related

Game Data Structure in OCaml

I am currently working on a game / simulation of a computer-like logistics system (like the Minecraft mod Applied Energistics).
The main part of the game is the 2d grid of blocks.
All the blocks have common properties like a position.
But then there are supposed to be different kinds of Blocks like:
Item-Containers,
Input and Export Busses,
etc.
In an imperative object-oriented language (like Java) I would implement this with a main block class with the common properties like position, and then subclasses that inherit from the block class. These subclasses would implement the different properties of the different block types.
In OCaml I am a little bit lost. I could create objects which inherit, but this does not work like in Java. For example, I can't put objects of different subclasses together in one list.
I also wanted to approach the data structure differently by separating data from logic. I would not add methods to the objects. I tried using records instead of objects.
I don't know how to implement the different block types.
I tried using custom data types like this:
type blockType = Container | Input | Output | Air
type block = {blockType : blockType; pos : int * int}
I struggled to add the individual additional properties. I tried to add an entity field to the block record type which would hold the additional properties:
type entity = Container of inventory | OutputEntity of facing | InputEntity of facing | NoEntity
(Where inventory and facing are also custom types)
This solution doesn't really feel fitting.
One problem I have is that there are logic operations I want to perform on blocks of type Input and Output. I have to repeat code like this:
let rotateBlock block =
  match block.blockType with
  | Input -> { block with entity = nextDirection block.entity }
  | Output -> { block with entity = nextDirection block.entity }
  | _ -> block
That's not that bad with two types but I plan to add much more so it is a big negative in terms of scalability.
Another point of criticism of this kind of structure is that it is a little bit inconsistent: I use a field in the record to implement the different types on the block level, but multiple constructors on the entity level. I did it to be able to access the position of every block easily with block.pos instead of using pattern matching.
I am not really happy with this solution.
Request
I hope somebody can point me in the right direction regarding the data structure.
You're trying to satisfy competing goals. You can't have both a rigid static model of blocks and a dynamically extensible block type, so you need to choose. Fortunately, OCaml provides solutions for both, and even for something in between, but as always with middle-ground solutions, they are kind of bad at both. So let's try.
Rigid static hierarchy using ADT
We can use sum types to represent a static hierarchy of objects. In this case, it is easy for us to add new methods, but hard to add new kinds of objects. As the base type, we will use a polymorphic record that is parameterized with a concrete block type (the concrete block type could be polymorphic itself, which allows us to build a third layer of the hierarchy, and so on).
type pos = {x : int; y : int}
type 'a block = {pos : pos; info : 'a}
type block_info = Container of container | Input of facing | Air | Solid
where info is an additional concrete block specific payload, i.e., a value of type block_info. This solution allows us to write polymorphic functions that accept different blocks, e.g.,
let distance b1 b2 =
  sqrt ((float (b1.pos.x - b2.pos.x)) ** 2. +. (float (b1.pos.y - b2.pos.y)) ** 2.)
The distance function has type 'a block -> 'b block -> float and will compute the distance between two blocks of any kind.
This solution has several drawbacks:
Hard to extend. Adding a new kind of block is hard; you basically need to design beforehand what blocks you need and hope that you won't need to add a new block in the future. It looks like you're expecting to add new block types, so this solution might not suit you. However, I believe that you will actually need only a very small number of block kinds: if you treat each block as a syntactical element of the world grammar, you will soon notice that the minimal set of block kinds is quite small, especially if you make your blocks recursive, i.e., if you allow block composition (mixtures of different blocks in the same block).
You can't put blocks of different kinds in the same container, because to do this you need to forget the type of the block. If you do that, you will eventually end up with a container of positions. We will try to alleviate this in our middle-ground solution by using existential types.
Your type model doesn't impose the right constraints. The world constraint is that the world is composed of blocks, and each coordinate either has a block or doesn't have one (i.e., it is the void). In your model, two blocks may have the same coordinates.
Not so rigid hierarchy using GADT
We may relax a few restrictions of the previous solution by using an existential GADT. The idea of an existential is that you can forget the kind of a block and later recover it. This is essentially the same as a variant type (or dynamic type in C#). With existentials, you can have an infinite number of block kinds and even put them all in the same container. Essentially, an existential is defined as a GADT constructor that forgets its type parameter, e.g., as a first approximation:
type block = Block : {pos : pos; info : 'a} -> block
So now we have a unified block type that is locally quantified with the type of the block payload. You may even go further and make the block_info type extensible, e.g.,
type block_info = ..
type block_info += Air
Instead of building an existential representation by yourself (it's a nice exercise in GADTs), you may opt to use an existing library. Search for "universal values" or "universals" in the OPAM repositories; there are a couple of solutions.
This solution is more dynamic and allows us to store blocks of different kinds in the same container. The hierarchy is extensible. This comes at a price, of course, as now we can't have one single point of definition for a particular method; in fact, method definitions will be scattered around your program (somewhat close to the Common Lisp CLOS model). However, this is an expected price for an extensible dynamic system. Also, we lose the static property of our model, so we will use lots of wildcards in pattern matching, and we can't rely on the type system to check that we covered all possible combinations. And the main problem is still there: our model is not right.
Not so rigid structure with OO
OCaml has the Object Oriented Layer (hence the name) so you can build classical OO hierarchies. E.g.,
class block x y = object
  val x = x
  val y = y
  method x = x
  method y = y
  method with_x x = {< x = x >}
  method with_y y = {< y = y >}
end

class input_block x y facing = object
  inherit block x y
  val facing = facing
  method facing = facing
  method with_facing f = {< facing = f >}
end
This solution is essentially close to the first solution, except that your hierarchy is now extensible, at the price that the set of methods is now fixed. And although you can put different blocks in the same container by forgetting the concrete type of a block using upcasting, this won't make much sense, since OCaml doesn't have a downcast operator, so you will end up with a container of coordinates. And we still have the same problem: our model is not right.
Dynamic world structure using Flyweights
This solution kills two bunnies at the same time (and I believe that this is the way it is implemented in Minecraft). Let's start with the second problem. If you represent every item in your world with a concrete record that has all the attributes of that item, you will end up with lots of duplicates and extreme memory consumption. That's why real-world applications use a pattern called Flyweight. So, if you think about scalability, you will end up using this approach anyway. The idea of the Flyweight pattern is that your objects share attributes through finite mappings, and the objects themselves are represented as identifiers, e.g.,
type block = int
type world = {
  map : pos Int.Map.t;
  facing : facing Int.Map.t;
  air : Int.Set.t;
}
where 'a Int.Map.t is a mapping from int to 'a, and Int.Set.t is a set of integers (I'm using the Core library here).
In fact, you may even decide that you don't need a closed world type, and just have a bunch of finite mappings, where each particular module adds and maintains its own set of mappings. You can use abstract types to store these mappings in a central repository.
You may also consider the following representation of the block type: instead of one integer, use two integers, where the first integer denotes the identity of a block and the second denotes its equality.
type block = {id : int; eq : int}
The idea is that every block in the game will have a unique id that distinguishes it from other blocks even if they are equal "as two drops of water", while eq denotes structural equality: two blocks with exactly the same attributes will have the same eq number. This solution is hard to implement if your world structure is not closed, though (as in that case the set of attributes is not closed either).
The main drawback of this solution is that it is so dynamic that it sort of leaves the OCaml type system out of work. That's a reasonable penalty; in fact, you can't have a dynamic system that is fully verified statically. (Unless you have a language with dependent types, but that is a completely different story.)
To summarize: if I were devising this kind of game, I would use the last solution, mainly because it scales well to a large number of blocks thanks to hash-consing (another name for Flyweight). If scalability is not an issue, then I would build a static structure of blocks with different composition operators, e.g.,
type block =
| Regular of regular
| ...
| Compose of compose_kind * block * block
type compose_kind = Horizontal | Vertical | Inplace
And now the world is just a block. This solution is purely mathematical, though, and doesn't really scale to larger worlds.
Sounds like fun.
I can't put objects of different subclasses in one list together.
You can actually. Suppose you had lots of different block-objects that all had a 'decay' method. You could have a function "get me the decayables", and it could put all those blocks in a list for you; you could then, at timed intervals, iterate over the list and apply the decay method on each of those blocks. This is all well typed and easy to do with OCaml's object system. What you can't do is take a 'decayable' out of that list and say: actually, this was always also an AirBlock, and I want to treat it like a full-fledged AirBlock now, instead of a decayable.
...
type blockType = Container | Input | Output | Air
You can only have 240 or so variants per type. If you plan on having more blocks than this, an easy way to gain extra space would be to categorize your blocks and work with e.g. Solid Rock | Liquid Lava rather than Rock | Lava.
type block = {blockType : blockType; pos : int * int}
What's the position of a block in your inventory? What's the position of a block that's been mined out of its place in the world, and is now sort of sitting on the ground, waiting to be picked up? Why not keep the positions in the array indices or map key that you're using to represent the locations of blocks in the world? Otherwise you also have to consider what it means for blocks to have the same position, or impossible positions.
let rotateBlock block =
  match block.blockType with
  | Input -> { block with entity = nextDirection block.entity }
  | Output -> { block with entity = nextDirection block.entity }
  | _ -> block
I don't really follow this input/output stuff, but it seems that in this function you're interested in some kind of property like "has a next direction to face, if rotated". Why not name that property and make it what you match on?
type block = {
  id : blockType;
  burnable : bool;
  consumable : bool;
  wearable : bodypart option;   (* None - not wearable *)
  hitpoints : int option;       (* None - not destructible *)
  oriented : direction option;  (* None - doesn't have distinct faces *)
}
let rotateBlock block =
  match block.oriented with
  | None -> block
  | Some dir -> { block with oriented = Some (nextDirection dir) }

let burn block =
  match block.burnable, block.hitpoints with
  | false, _ | true, None -> block
  | true, Some hp when hp > 5 -> { block with hitpoints = Some (hp - 5) }
  | true, Some _ -> ash
The block type is interesting because each kind will have different operations.
Item-Containers,
Input and Export Busses,
etc.
type container = { specificinfo : int ; etc ...}
type bus .....
type block =
| Item of position * container
| Bus of position * bus
type inventory = block list
My intuition tells me that you could perhaps use a GADT to create a type for the operations on blocks and easily implement an evaluator for the simulator.
UPDATE
To answer your comment:
If all your variants share common information, you need to factor it out; you can imagine something like:
type block_info =
| Item of specific_item_type ....
| Bus of specific_bus_type
type block = {position : Vector.t; information : block_info}
let get_block_position b = b.position

Why don't Java Streams have intermediate combining operations?

Note: I am not necessarily looking for solutions to the concrete example problems described below. I am genuinely interested why this isn't possible out of the box in java 8.
Java streams are lazy. At the very end they have a single terminal operation.
My interpretation is that this terminal operation will pull all the values through the stream. None of the intermediate operations can do that. Why are there no intermediate operations that pull an arbitrary number of elements through the stream? Something like this:
stream
.mapMultiple(this::consumeMultipleElements) // or groupAndMap or combine or intermediateCollect or reverseFlatMap
.collect(Collectors.toList());
When a downstream operation tries to advance the stream once, the intermediate operation might try to advance the upstream multiple times (or not at all).
I would see a couple of use cases:
(These are just examples. So you can see that it is certainly possible to handle these use cases but it is "not the streaming way" and these solutions lack the desirable laziness property that Streams have.)
Combine multiple elements into a single new element to be passed down the rest of the stream. (E.g., making pairs (1,2,3,4,5,6) ➔ ((1,2),(3,4),(5,6)))
// Something like this,
// without needing to consume the entire stream upfront,
// and also more generic. (The combiner should decide for itself how many elements to consume/combine per resulting element. Maybe the combiner is a Consumer<Iterator<E>> or a Consumer<Supplier<E>>)
public <E, R> Stream<R> combine(Stream<E> stream, BiFunction<E, E, R> combiner) {
    List<E> completeList = stream.collect(toList());
    return IntStream.range(0, completeList.size() / 2)
            .mapToObj(i -> combiner.apply(
                    completeList.get(2 * i),
                    completeList.get(2 * i + 1)));
}
Determine if the Stream is empty (mapping the Stream to an Optional non-empty Stream)
// Something like this, without needing to consume the entire stream
public <E> Optional<Stream<E>> toNonEmptyStream(Stream<E> stream) {
    List<E> elements = stream.collect(toList());
    return elements.isEmpty()
            ? Optional.empty()
            : Optional.of(elements.stream());
}
Having a lazy Iterator that doesn't terminate the stream (allowing you to skip elements based on more complicated logic than just skip(long n)).
Iterator<E> iterator = stream.iterator();
// Allow this without throwing a "java.lang.IllegalStateException: stream has already been operated upon or closed"
stream.collect(toList());
When they designed Streams and everything around them, did they forget about these use cases or did they explicitly leave this out?
I understand that these might give unexpected results when dealing with parallel streams but in my opinion this is a risk that can be documented.
Well, all of the operations that you want are actually achievable in the Stream API, but not out of the box.
Combining multiple elements into pairs of elements - you need a custom Spliterator for that. Here is Tagir Valeev doing that. He has an absolute beast of a library called StreamEx that does many other useful things that are not supported out of the box.
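To give a flavor of what such a custom implementation looks like, here is a minimal sketch (this is not StreamEx's code, just an illustration) that pairs elements lazily, pulling from the upstream spliterator two at a time:

import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class Pairing {
    // Lazily combines consecutive elements: (1,2,3,4,5,6) -> (f(1,2), f(3,4), f(5,6)).
    // A trailing unpaired element is silently dropped in this sketch.
    public static <E, R> Stream<R> pairwise(Stream<E> source, BiFunction<E, E, R> combiner) {
        Spliterator<E> sp = source.spliterator();
        Spliterator<R> paired = new Spliterators.AbstractSpliterator<R>(Long.MAX_VALUE, 0) {
            @Override
            public boolean tryAdvance(Consumer<? super R> action) {
                Object[] box = new Object[2];
                if (!sp.tryAdvance(e -> box[0] = e)) return false; // upstream exhausted
                if (!sp.tryAdvance(e -> box[1] = e)) return false; // odd element left over: drop it
                @SuppressWarnings("unchecked")
                R result = combiner.apply((E) box[0], (E) box[1]);
                action.accept(result);
                return true;
            }
        };
        return StreamSupport.stream(paired, false).onClose(source::close);
    }
}

For example, pairwise(Stream.of(1, 2, 3, 4, 5, 6), (a, b) -> "(" + a + "," + b + ")") produces "(1,2)", "(3,4)", "(5,6)", pulling from the source only as the downstream demands elements.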
I did not understand your second example, but I bet it's doable also.
Skipping based on more complicated logic than skip(long n) is possible in java-9 via dropWhile and takeWhile, which take a Predicate as input.
Just notice that when you say that none of the intermediate operations can pull multiple elements through, that is not accurate: there are sorted and distinct that do exactly that; they can't work otherwise. There's also flatMap that acts like that, but that is treated more like a bug.
One more thing is that intermediate operations for parallel streams have no defined encounter order, so such a stateful intermediate operation would see elements in an unpredictable order for a parallel stream. On the other hand, you always have the option to abuse things like:
List<Something> list = Collections.synchronizedList(new ArrayList<>());
stream.map(x -> {
    list.add(x); // side effect: stash each element as it flows by
    return x;    // your mapping here
})
I would not do that if I were you and really think if I might need that, but just in case...
Not every terminal operation will "pull all the values through the stream". The terminal operations iterator() and spliterator() do not immediately fetch all values and allow lazy processing, including constructing a new Stream again. For the latter, it's strongly recommended to use spliterator(), as this allows more meta information to be passed to the new stream and also implies less wrapping of objects.
E.g. your second example could be implemented as
public static <T> Stream<T> replaceWhenEmpty(Stream<T> s, Supplier<Stream<T>> fallBack) {
    boolean parallel = s.isParallel();
    Spliterator<T> sp = s.spliterator();
    Stream.Builder<T> firstElement;
    if(sp.getExactSizeIfKnown() == 0 || !sp.tryAdvance(firstElement = Stream.builder())) {
        s.close();
        return fallBack.get();
    }
    return Stream.concat(firstElement.build(), StreamSupport.stream(sp, parallel))
                 .onClose(s::close);
}
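A quick usage sketch (the fallback contents here are made up):
Stream<String> result = replaceWhenEmpty(
        Stream.<String>empty(),           // the original stream happens to be empty
        () -> Stream.of("default"));      // lazily supplied fallback
result.forEach(System.out::println);      // prints "default"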
For your general question, I don’t see how a general abstraction of these examples should look like, except like the spliterator() method that already exist. As the documentation puts it
However, if the provided stream operations do not offer the desired functionality, the BaseStream.iterator() and BaseStream.spliterator() operations can be used to perform a controlled traversal.

Is c++11 operator[] equivalent to emplace on map insertion?

For C++11, is there still a performance difference between the following?
(for std::map<Foo, std::vector<Bar> > as an example)
map[key] = myVector and map.emplace(key, myVector)
The part I'm not figuring out is the exact internals of operator[]. My understanding so far has been (when the key doesn't exist):
Create a new key and the associated empty default vector in place inside the map
Return the reference of the associated empty vector
Assign myVector to the reference???
The point 3 is the part I couldn't understand, how can you assign a new value to a reference in the first place?
Though I cannot sort through point 3, I think somehow there's just a copy/move required. Assuming C++11 is smart enough to know it's going to be a move operation, is this whole "[]" assignment then already cheaper than insert()? Is it almost equivalent to emplace()? (default construction and moving the content over, versus constructing the vector with its content directly in place)
There are a lot of differences between the two.
If you use operator[], the map will default construct the value. The return value from operator[] is a reference to this default constructed object, to which you then assign using operator=.
If you use emplace, the map will directly construct the value with the parameters you provide.
So the operator[] method will always use two-stage construction. If the default constructor is slow, or if copy/move construction is faster than copy/move assignment, then it could be problematic.
However, emplace will not replace the value if the provided key already exists, whereas operator[] followed by operator= will always replace the value, whether there was one there or not.
There are other differences too. If copying/moving throws, emplace guarantees that the map will not be changed. By contrast, operator[] will always insert a default constructed element first, so if the later copy/move assignment fails, the map has already been changed: that key will exist with a default constructed value_type.
Really, performance is not the first thing you should be thinking about when deciding which one to use. You need to focus first on whether it has the desired behavior.
C++17 will provide insert_or_assign, which has the effect of map[] = v;, but with the exception safety of insert/emplace.
how can you assign a new value to a reference in the first place?
It's fundamentally no different from assigning to any non-const reference:
int i = 5;
int &j = i;
j = 30;
i == 30; //This is true.

Initialize member variables in a method and not the constructor

I have a public method which uses a variable (only in the scope of the public method) that I pass as a parameter; we will call it a. This method calls a private method multiple times, and that private method also requires the parameter.
At present I am passing the parameter every time, but it looks weird. Is it bad practice to make this a member variable of the class, or would the uncertainty about whether it is initialized outweigh the advantage of not having to pass it?
Simplified pseudo code:
public_method(parameter a)
    do something with a
    private_method(string_a, a)
    private_method(string_b, a)
    private_method(string_c, a)

private_method(String, parameter a)
    do something with String and a
Additional information: parameter a is a read only map with over 100 entries and in reality I will be calling private_method about 50 times
I had this same problem myself.
I implemented it in 3 different contexts to see, hands-on, what the results of 3 different strategies are; see below.
Note that I am the type of programmer that makes many changes to the code, always trying to improve it. Thus I settle only for code that is amenable to change and readable; you could call this "flexible" code. I settle only for very clear code.
After experimentation, I came to these results:
Passing a as a parameter is perfectly OK if you have one or two (a small number) of such values. Passing parameters gives very good visibility and clarity, clear passing lines, a well visible lifetime (initialization points, destruction points); it is amenable to change and easy to track.
If the number of such values grows to 5-6 or more, I switch to approach #3 below.
Passing values through class members did not do the clarity of my code any good, and eventually I got rid of it. It makes for less clear code; the code becomes muddled. I did not like it. It had no advantages.
As an alternative to (1) and (2), I adopted the inner-class approach for cases when the number of such values is > 5 (which makes for too long an argument list).
I pack those values into a small inner class and pass such an object by reference as an argument to all internal members (see the sketch at the end of this answer).
A public function of the class usually creates an object of the inner class (I call it Impl or Ctx or Args) and passes it down to the private functions.
This combines the clarity of argument passing with brevity. It's perfect.
Good luck
Edit
Consider preparing an array of strings and using a loop rather than writing 50 almost-identical calls, something like char *strings[] = {...} (C/C++).
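Here is a minimal Java sketch combining the inner context-object approach with the loop suggestion from the edit; all names (Ctx, publicMethod, privateMethod, the keys) are hypothetical stand-ins for the pseudocode above:

import java.util.Collections;
import java.util.Map;

public class Example {

    // Small context object bundling the values shared by the private helpers
    private static final class Ctx {
        final Map<String, String> a; // the read-only map from the question

        Ctx(Map<String, String> a) {
            this.a = Collections.unmodifiableMap(a);
        }
    }

    public void publicMethod(Map<String, String> a) {
        Ctx ctx = new Ctx(a);
        // a loop over prepared keys instead of ~50 near-identical calls
        for (String key : new String[] { "string_a", "string_b", "string_c" }) {
            privateMethod(key, ctx);
        }
    }

    private void privateMethod(String key, Ctx ctx) {
        // do something with key and ctx.a
        System.out.println(key + " -> " + ctx.a.get(key));
    }
}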
This really depends on your use case. Does a represent state that your application/object cares about? Then you might want to make it a member of your object. Evaluate the big picture, and think about maintenance and extensibility when designing structures.
If your parameter a is of a class of your own, you might consider moving private_method onto a's class as a public method.
Otherwise, I do not think this looks weird. If you only need a in just one function, making it a private variable of your class would be silly (at least to me). However, if you'd need it like 20 times I would do so :P Or even better, just make a an object of your own that has that certain function you need.
A method should ideally not take more than 7 parameters. Needing more than 6-7 parameters usually indicates a problem with the design (do the 7 parameters represent an object of a nested class?).
As for your question, if you want to make the parameter a member only for the sake of passing it between private methods, without the parameter having anything to do with the current state of the object (or some information about the object), then it is not recommended that you do so.
From a performance point of view (memory consumption), reference parameters can be passed around as method parameters without any significant impact on memory consumption, as they are passed by reference rather than by value (i.e. a copy of the data is not created). A small number of parameters that belong together can be grouped into a struct; for example, if the parameters represent the x and y coordinates of a point, pass them in a single Point structure (see the sketch below).
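A trivial Java sketch of that grouping idea (the Point and Renderer classes are invented for illustration):

// Group related parameters into one small value object
final class Point {
    final int x;
    final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

class Renderer {
    // instead of draw(int x, int y):
    void draw(Point p) {
        System.out.println("drawing at (" + p.x + ", " + p.y + ")");
    }
}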
Bottomline
Ask yourself this question: does the parameter that you are making a member represent any information (data) about the object? (Data can be state or unique identification information.) If the answer to this question is a clear no, then do not include the parameter as a member of the class.
More information
Limit number of parameters per method?
Parameter passing in C#

Using function arguments as local variables

Something like this (yes, this doesn't deal with some edge cases - that's not the point):
int CountDigits(int num) {
    int count = 1;
    while (num >= 10) {
        count++;
        num /= 10;
    }
    return count;
}
What's your opinion about this? That is, using function arguments as local variables.
Both are placed on the stack and are pretty much identical performance-wise; I'm wondering about the best-practices aspect of this.
I feel like an idiot when I add an additional and quite redundant line to that function consisting of int numCopy = num, but modifying the parameter does bug me.
What do you think? Should this be avoided?
As a general rule, I wouldn't use a function parameter as a local processing variable, i.e. I treat function parameters as read-only.
In my mind, intuitively understandable code is paramount for maintainability, and modifying a function parameter to use it as a local processing variable tends to run counter to that goal. I have come to expect that a parameter will have the same value in the middle and at the bottom of a method as it does at the top. Plus, an aptly-named local processing variable may improve understandability.
Still, as @Stewart says, this rule is more or less important depending on the length and complexity of the function. For short, simple functions like the one you show, simply using the parameter itself may be easier to understand than introducing a new local variable (very subjective).
Nevertheless, if I were to write something as simple as countDigits(), I'd tend to use a remainingBalance local processing variable in lieu of modifying the num parameter as part of the local processing - it just seems clearer to me (see the sketch below).
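In code, that preference might look like this (a sketch; the local variable name follows the answer's suggestion):

int countDigits(int num) {
    int remainingBalance = num; // local processing variable; num stays read-only
    int count = 1;
    while (remainingBalance >= 10) {
        count++;
        remainingBalance /= 10;
    }
    return count;
}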
Sometimes, I will modify a local parameter at the beginning of a method to normalize the parameter:
void saveName(String name) {
    name = (name != null ? name.trim() : "");
    ...
}
I rationalize that this is okay because:
a. it is easy to see at the top of the method,
b. the parameter maintains its original conceptual intent, and
c. the parameter is stable for the rest of the method
Then again, half the time, I'm just as apt to use a local variable anyway, just to get a couple of extra finals in there (okay, that's a bad reason, but I like final):
void saveName(final String name) {
    final String normalizedName = (name != null ? name.trim() : "");
    ...
}
If, 99% of the time, the code leaves function parameters unmodified (i.e. mutating parameters are unintuitive or unexpected for this code base) , then, during that other 1% of the time, dropping a quick comment about a mutating parameter at the top of a long/complex function could be a big boon to understandability:
int CountDigits(int num) {
    // num is consumed
    int count = 1;
    while (num >= 10) {
        count++;
        num /= 10;
    }
    return count;
}
P.S. :-)
parameters vs arguments
http://en.wikipedia.org/wiki/Parameter_(computer_science)#Parameters_and_arguments
These two terms are sometimes loosely used interchangeably; in particular, "argument" is sometimes used in place of "parameter". Nevertheless, there is a difference. Properly, parameters appear in procedure definitions; arguments appear in procedure calls.
So,
int foo(int bar)
bar is a parameter.
int x = 5
int y = foo(x)
The value of x is the argument for the bar parameter.
It always feels a little funny to me when I do this, but that's not really a good reason to avoid it.
One reason you might potentially want to avoid it is for debugging purposes. Being able to tell the difference between "scratchpad" variables and the input to the function can be very useful when you're halfway through debugging.
I can't say it's something that comes up very often in my experience - and often you can find that it's worth introducing another variable just for the sake of having a different name, but if the code which is otherwise cleanest ends up changing the value of the variable, then so be it.
One situation where this can come up and be entirely reasonable is where you've got some value meaning "use the default" (typically a null reference in a language like Java or C#). In that case I think it's entirely reasonable to modify the value of the parameter to the "real" default value. This is particularly useful in C# 4 where you can have optional parameters, but the default value has to be a constant:
For example:
public static void WriteText(string file, string text, Encoding encoding = null)
{
    // Null means "use the default" which we would document to be UTF-8
    encoding = encoding ?? Encoding.UTF8;
    // Rest of code here
}
About C and C++:
My opinion is that using the parameter as a local variable of the function is fine, because it is a local variable already. Why not use it as such, then?
I feel silly too when copying the parameter into a new local variable just to have a modifiable variable to work with.
But I think this is pretty much a personal opinion. Do it as you like. If you feel silly copying the parameter just because of this, it indicates your personality doesn't like it, and then you shouldn't do it.
If I don't need a copy of the original value, I don't declare a new variable.
IMO, I don't think mutating parameter values is a bad practice in general; it depends on how you're going to use it in your code.
My team's coding standard recommends against this because it can get out of hand. To my mind, for a function like the one you show, it doesn't hurt, because everyone can see what is going on. The problem is that with time functions get longer, and they get bug fixes in them. As soon as a function is more than one screenful of code, this starts to get confusing, which is why our coding standard bans it.
The compiler ought to be able to get rid of the redundant variable quite easily, so it has no efficiency impact. It is probably just between you and your code reviewer whether this is OK or not.
I would generally not change the parameter value within the function. If at some point later in the function you need to refer to the original value, you still have it. In your simple case there is no problem, but if you add more code later, you may refer to num without realizing it has been changed.
The code needs to be as self-sufficient as possible. What I mean by that is that you now have a dependency on what is being passed in as part of your algorithm. If another member of your team decides to change this to pass by reference, you might have big problems.
The best practice is definitely to copy the inbound parameters if you expect them to be immutable.
I typically don't modify function parameters, unless they're pointers, in which case I might alter the value that's pointed to.
I think the best practices for this vary by language. For example, in Perl you can localize any variable or even part of a variable to a local scope, so that changing it in that scope will not have any effect outside of it:
sub my_function
{
    my ($arg1, $arg2) = @_;  # get the local variables off the stack
    local $arg1;             # changing $arg1 here will not be visible outside this scope
    $arg1++;
    local $arg2->{key1};     # only the key1 portion of the hashref referenced by $arg2 is localized
    $arg2->{key1}->{key2} = 'foo';  # this change is not visible outside the function
}
Occasionally I have been bitten by forgetting to localize a data structure that was passed by reference to a function, that I changed inside the function. Conversely, I have also returned a data structure as a function result that was shared among multiple systems and the caller then proceeded to change the data by mistake, affecting these other systems in a difficult-to-trace problem usually called action at a distance. The best thing to do here would be to make a clone of the data before returning it*, or make it read-only**.
* In Perl, see the function dclone() in the built-in Storable module.
** In Perl, see lock_hash() or lock_hash_ref() in the built-in Hash::Util module.
