Help me refactor this loop - refactoring

I am working on the redesign of an existing class. In this class there is a while loop of about 400 lines that does most of the work. The body of the loop is a minefield of if statements and variable assignments, and there is a "continue" somewhere in the middle. The purpose of the loop is hard to understand.
In pseudocode, here's where I am with the redesign:
/* Some code here to create the objects based on config parameters */
/* Rather than having if statements scattered through the loop I */
/* create instances of the appropriate classes. The constructors */
/* take a database connection. */
FOR EACH row IN mySourceOfData
    int p = batcher.FindOrCreateBatch( row );
    int s = supplierBatchEntryCreator.CreateOrUpdate( row, p );
    int b = buyerBatchEntryCreator.CreateOrUpdate( row, p );
    mySourceOfData.UpdateAsIncludedInBatch( p, s, b );
NEXT
/* Allow things to complete their last item */
mySupplierBatchEntry.finish();
myBuyerBatchEntry.finish();
myBatcher.finish();
/* Some code here to dispose of things */
RETURN myBatch.listOfBatches();
Inside FindOrCreateBatch() it figures out using some rules if a new batch needs to be created or if an existing one can be used. The different implementations of this interface will have different rules for how it finds them, etc. The return value is the surrogate key (id) from the database of the payment batch that it found or created. This id is required by following processes that take p as a parameter.
This is an improvement over where I started, but I have an uneasy feeling about the class containing this loop.
It doesn't seem to be a domain object; it's more of a "Manager" or "Controller" type class.
It seems to be getting in between batcher and supplierBatchEntryCreator (and the other classes). At the moment only an int is passed, but if that changes all three classes need to change. This seems like a Law of Demeter violation.
Any suggestions, or is this ok? The actual language is Java.

I have a couple of questions to ask you:
Does it work?
Is it fast enough?
Is it readable/maintainable?
If the answer to all three is yes then, beyond that, further changes are really just wasted effort in my opinion. Don't refactor just for the sake of refactoring.
Far too often people change things in anticipation of what might be (your "changing int" for example). I prefer to subscribe to the YAGNI school of thought. The right time to worry about that change is when it actually happens.
And the Law of Demeter is a design guideline, not a rule. In the real world, pragmatism usually beats dogmatism :-)

What is the relationship between each XXXEntryCreator and XXXEntry? I feel like I am missing something, since the "Creators" only return integers.
Beyond that, you took 400 lines of crud down to something that fits on a screen, and has a reasonably visible data flow between steps. Kudos. (I have experienced strong resistance in the past for trying to make such changes -- why do people write N-100/1000 line run-on else-if drivel?)

FindOrCreate and CreateOrUpdate suggest to me that maybe multiple passes would be simpler (and not knowing the rest of the code, I can't know if it would degrade performance, which is a common concern raised when multiple passes are suggested).
If you had one loop to create any missing batches, suppliers, and buyers (or three loops), then this loop could be reduced to
FOR EACH row IN mySourceOfData
    int p = batcher.FindBatch( row );
    int s = supplierBatchEntryCreator.Update( row, p );
    int b = buyerBatchEntryCreator.Update( row, p );
    mySourceOfData.UpdateAsIncludedInBatch( p, s, b );
NEXT
Now I see that the Creators are updating - is that right? Does splitting the creation and update responsibility into two classes make sense, perhaps?
It's starting to look a little simpler to me. Does it help?
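As a rough illustration of the multi-pass idea, here is a minimal sketch in Python (chosen only for brevity; the question's code is Java). The Batcher class, its grouping rule (one batch per "supplier" value) and the row dictionaries are all hypothetical stand-ins, not the asker's real classes.

```python
from dataclasses import dataclass, field


@dataclass
class Batcher:
    """Hypothetical stand-in: one batch per supplier name."""
    batch_ids: dict = field(default_factory=dict)

    def create_missing(self, row):
        # Pass 1: make sure a batch exists for this row's key.
        key = row["supplier"]
        if key not in self.batch_ids:
            self.batch_ids[key] = len(self.batch_ids) + 1

    def find_batch(self, row):
        # Pass 2: pure lookup, no creation logic left here.
        return self.batch_ids[row["supplier"]]


def process(rows):
    batcher = Batcher()
    for row in rows:                 # first pass: create anything missing
        batcher.create_missing(row)
    for row in rows:                 # second pass: look up and update only
        row["batch_id"] = batcher.find_batch(row)
    return sorted(batcher.batch_ids.values())


rows = [{"supplier": "acme"}, {"supplier": "zenith"}, {"supplier": "acme"}]
batches = process(rows)
```

Whether the extra pass is affordable depends on how expensive it is to iterate the real data source twice, which the question does not say.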

Related

Vectorize object oriented implementation in MATLAB

I'm trying to optimize a given object oriented code in MATLAB. It is an economic model and consists of a Market and Agents. The time-consuming part is updating certain attributes of all Agents during each timestep, which is implemented in a for loop.
However, I fail to vectorize the object oriented code.
Here is an example (note: the second thing that slows down the code so far is the fact that new entries are appended to the end of the vector. I'm aware of that and will fix that also):
for i=1:length(obj.traders)
    obj.traders(i).update(obj.Price,obj.Sentiment(end),obj.h);
end
Where update looks like
function obj=update(obj,price,s,h)
    obj.pos(end+1)=obj.p;
    obj.wealth(end+1)=obj.w(1,1,1);
    obj.g(end+1)=s;
    obj.price=price;
    obj.Update_pos(s,h);
    if (obj.c)
        obj.Switch_Pos;
    end
...
My first idea was to try something like
obj.traders(:).update(obj.Price,obj.Sentiment(end),obj.h);
Which didn't work. If someone has any suggestions how to vectorize this code, while keeping the object oriented implementation, I would be very happy.
I cannot provide a complete solution as this depends on the details of your implementation, but here are some tips which you could use to improve your code:
Remembering that a MATLAB object generally behaves like a struct, assignment of a constant value to a field can be done using [obj.field] = deal(val); e.g.:
[obj.traders.price] = deal(obj.Price);
This can also be extended to non-constant RHS, using cell, like so:
[aStruct.(fieldNamesCell{idx})] = deal(valueCell{:}); %// or deal(numericVector(:));
To improve the update function, I would suggest writing several lines where you create the RHS vectors/cells, followed by "simultaneous" assignment to all relevant fields of the objects in the array.
Other than that consider:
setfield: s = setfield(s,{sIndx1,...,sIndxM},'field',{fIndx1,...,fIndxN},value);
structfun:
s = structfun(@(x)x(1:3), s, 'UniformOutput', false, 'ErrorHandler', @errfn);
"A loop-based solution can be flexible and easily readable".
P.S.
On a side note, I'd suggest you name the obj in your functions according to the class name, which would make it more readable to others, i.e.:
function obj=update(obj,price,s,h) => function traderObj=update(traderObj,price,s,h)

Initialize member variables in a method and not the constructor

I have a public method that takes a parameter, which we will call A, used only within the scope of that method. The method calls a private method multiple times, and the private method also requires the parameter.
At present I am passing the parameter every time, but it looks weird. Is it bad practice to make this a member variable of the class, or would the uncertainty about whether it is initialized outweigh the advantage of not having to pass it?
Simplified pseudo code:
public_method(parameter a)
    do something with a
    private_method(string_a, a)
    private_method(string_b, a)
    private_method(string_c, a)
private_method(String, parameter a)
    do something with String and a
Additional information: parameter a is a read only map with over 100 entries and in reality I will be calling private_method about 50 times
I had this same problem myself.
I implemented it differently in 3 different contexts to see hands-on what the results are using 3 different strategies; see below.
Note that I am the type of programmer who makes many changes to the code, always trying to improve it. Thus I settle only for code that is readable and amenable to change; you might call this "flexible" code. I settle only for very clear code.
After experimentation, I came to these results:
1. Passing a as a parameter is perfectly OK if you have only one or two such values. Passing in parameters has very good visibility and clarity: clear passing lines, a well-visible lifetime (initialization points, destruction points), amenable to changes, easy to track.
If the number of such values grows to 5-6 or more, I switch to approach #3 below.
2. Passing values through class members did not do any good to the clarity of my code, and eventually I got rid of it. It makes for less clear code. Code becomes muddled. I did not like it. It had no advantages.
3. As an alternative to (1) and (2), I adopted the inner class approach for cases where the number of such values is > 5 (which makes for too long an argument list).
I pack those values into a small inner class and pass such an object by reference as an argument to all internal members.
A public function of the class usually creates an object of the inner class (I call it Impl or Ctx or Args) and passes it down to the private functions.
This combines the clarity of argument passing with brevity. It's perfect.
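The inner-class approach in (3) might look roughly like this. It is a minimal sketch in Python rather than the answerer's own code; the Report class, the _Ctx fields and the output format are hypothetical, and only three values are bundled to keep the example short.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class _Ctx:
    """Hypothetical bundle of values shared by the private helpers."""
    lookup: MappingProxyType  # read-only view of the big map
    prefix: str
    strict: bool


class Report:
    def build(self, lookup, names):
        # The public method creates the context once ...
        ctx = _Ctx(lookup=MappingProxyType(lookup), prefix="item:", strict=False)
        # ... and passes that single object down to every private helper.
        return [self._describe(name, ctx) for name in names]

    def _describe(self, name, ctx):
        value = ctx.lookup.get(name)
        if value is None and ctx.strict:
            raise KeyError(name)
        return f"{ctx.prefix}{name}={value}"


lines = Report().build({"a": 1, "b": 2}, ["a", "b", "c"])
```

The context lives only as long as the public call, so nothing about the object's own state depends on whether it has been initialized.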
Good luck
Edit
Consider preparing array of strings and using a loop rather than writing 50 almost-identical calls. Something like char *strings[] = {...} (C/C++).
This really depends on your use case. Does 'a' represent a state that your application/object care about? Then you might want to make it a member of your object. Evaluate the big picture, think about maintenance, extensibility when designing structures.
If your parameter a is of a class of your own, you might consider making the private_method a public method of that class.
Otherwise, I do not think this looks weird. If you only need a in just 1 function, making it a private variable of your class would be silly (at least to me). However, if you'd need it like 20 times I would do so :P Or even better, just make 'a' an object of your own that has that certain function you need.
A method should ideally not take more than 7 parameters. Needing more than 6-7 parameters usually indicates a problem with the design (do the 7 parameters represent an object of a nested class?).
As for your question, if you want to make the parameter private only for the sake of passing between private methods without the parameter having anything to do with the current state of the object (or some information about the object), then it is not recommended that you do so.
From a performance point of view (memory consumption), reference parameters can be passed around as method parameters without any significant impact on the memory consumption as they are passed by reference rather than by value (i.e. a copy of the data is not created). For small number of parameters that can be grouped together you can use a struct. For example, if the parameters represent x and y coordinates of a point, then pass them in a single Point structure.
Bottom line
Ask yourself this question: does the parameter that you are making a member represent any information (data) about the object? (Data can be state or unique identification information.) If the answer to this question is a clear no, then do not include the parameter as a member of the class.
More information
Limit number of parameters per method?
Parameter passing in C#

Using function arguments as local variables

Something like this (yes, this doesn't deal with some edge cases - that's not the point):
int CountDigits(int num) {
    int count = 1;
    while (num >= 10) {
        count++;
        num /= 10;
    }
    return count;
}
What's your opinion about this? That is, using function arguments as local variables.
Both are placed on the stack, and pretty much identical performance wise, I'm wondering about the best-practices aspects of this.
I feel like an idiot when I add an additional and quite redundant line to that function consisting of int numCopy = num, however it does bug me.
What do you think? Should this be avoided?
As a general rule, I wouldn't use a function parameter as a local processing variable, i.e. I treat function parameters as read-only.
In my mind, intuitively understandable code is paramount for maintainability, and modifying a function parameter to use as a local processing variable tends to run counter to that goal. I have come to expect that a parameter will have the same value in the middle and bottom of a method as it does at the top. Plus, an aptly-named local processing variable may improve understandability.
Still, as @Stewart says, this rule is more or less important depending on the length and complexity of the function. For short simple functions like the one you show, simply using the parameter itself may be easier to understand than introducing a new local variable (very subjective).
Nevertheless, if I were to write something as simple as countDigits(), I'd tend to use a remainingBalance local processing variable in lieu of modifying the num parameter as part of local processing - just seems clearer to me.
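A minimal sketch of that variant, written in Python for brevity; the name remaining is illustrative only, and like the original it assumes a non-negative input.

```python
def count_digits(num: int) -> int:
    """Count the decimal digits of a non-negative integer."""
    remaining = num          # local working copy; `num` stays untouched
    count = 1
    while remaining >= 10:
        count += 1
        remaining //= 10     # integer division, like `num /= 10` on a C int
    return count
```

Because num is never reassigned, it still holds the caller's value at every point in the function, which is the property argued for above.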
Sometimes, I will modify a local parameter at the beginning of a method to normalize the parameter:
void saveName(String name) {
    name = (name != null ? name.trim() : "");
    ...
}
I rationalize that this is okay because:
a. it is easy to see at the top of the method,
b. the parameter maintains its original conceptual intent, and
c. the parameter is stable for the rest of the method
Then again, half the time, I'm just as apt to use a local variable anyway, just to get a couple of extra finals in there (okay, that's a bad reason, but I like final):
void saveName(final String name) {
    final String normalizedName = (name != null ? name.trim() : "");
    ...
}
If, 99% of the time, the code leaves function parameters unmodified (i.e. mutating parameters are unintuitive or unexpected for this code base), then, during that other 1% of the time, dropping a quick comment about a mutating parameter at the top of a long/complex function could be a big boon to understandability:
int CountDigits(int num) {
    // num is consumed
    int count = 1;
    while (num >= 10) {
        count++;
        num /= 10;
    }
    return count;
}
P.S. :-)
parameters vs arguments
http://en.wikipedia.org/wiki/Parameter_(computer_science)#Parameters_and_arguments
These two terms are sometimes loosely used interchangeably; in particular, "argument" is sometimes used in place of "parameter". Nevertheless, there is a difference. Properly, parameters appear in procedure definitions; arguments appear in procedure calls.
So,
int foo(int bar)
bar is a parameter.
int x = 5
int y = foo(x)
The value of x is the argument for the bar parameter.
It always feels a little funny to me when I do this, but that's not really a good reason to avoid it.
One reason you might potentially want to avoid it is for debugging purposes. Being able to tell the difference between "scratchpad" variables and the input to the function can be very useful when you're halfway through debugging.
I can't say it's something that comes up very often in my experience - and often you can find that it's worth introducing another variable just for the sake of having a different name, but if the code which is otherwise cleanest ends up changing the value of the variable, then so be it.
One situation where this can come up and be entirely reasonable is where you've got some value meaning "use the default" (typically a null reference in a language like Java or C#). In that case I think it's entirely reasonable to modify the value of the parameter to the "real" default value. This is particularly useful in C# 4 where you can have optional parameters, but the default value has to be a constant.
For example:
public static void WriteText(string file, string text, Encoding encoding = null)
{
    // Null means "use the default" which we would document to be UTF-8
    encoding = encoding ?? Encoding.UTF8;
    // Rest of code here
}
About C and C++:
My opinion is that using the parameter as a local variable of the function is fine because it is a local variable already. Why then not use it as such?
I feel silly too when copying the parameter into a new local variable just to have a modifiable variable to work with.
But I think this is pretty much a personal opinion. Do it as you like. If you feel silly copying the parameter just because of this, it indicates your personality doesn't like it, and then you shouldn't do it.
If I don't need a copy of the original value, I don't declare a new variable.
IMO I don't think mutating the parameter values is a bad practice in general,
it depends on how you're going to use it in your code.
My team coding standard recommends against this because it can get out of hand. To my mind for a function like the one you show, it doesn't hurt because everyone can see what is going on. The problem is that with time functions get longer, and they get bug fixes in them. As soon as a function is more than one screen full of code, this starts to get confusing which is why our coding standard bans it.
The compiler ought to be able to get rid of the redundant variable quite easily, so it has no efficiency impact. It is probably just between you and your code reviewer whether this is OK or not.
I would generally not change the parameter value within the function. If at some point later in the function you need to refer to the original value, you still have it. In your simple case, there is no problem, but if you add more code later, you may refer to 'num' without realizing it has been changed.
The code needs to be as self sufficient as possible. What I mean by that is you now have a dependency on what is being passed in as part of your algorithm. If another member of your team decides to change this to a pass by reference then you might have big problems.
The best practice is definitely to copy the inbound parameters if you expect them to be immutable.
I typically don't modify function parameters, unless they're pointers, in which case I might alter the value that's pointed to.
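The distinction the last few answers draw, between reassigning a parameter and altering what it refers to, can be sketched as follows. This is a Python illustration with made-up function names; Python passes object references by value, which is only loosely analogous to passing a pointer in C.

```python
def rebind(values):
    # Rebinding the parameter name only changes the local variable.
    values = values + [99]
    return values


def mutate_in_place(values):
    # Mutating the object the parameter refers to is visible to the caller.
    values.append(99)
    return values


original = [1, 2, 3]
rebound = rebind(original)
after_rebind = list(original)      # snapshot: still [1, 2, 3]

mutate_in_place(original)
after_mutation = list(original)    # snapshot: now [1, 2, 3, 99]
```

Only the second function can surprise a caller, which is why a later switch from pass-by-value to pass-by-reference turns a harmless habit into a bug.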
I think the best practice for this varies by language. For example, in Perl you can localize any variable or even part of a variable to a local scope, so that changing it in that scope will not have any effect outside of it:
sub my_function
{
    my ($arg1, $arg2) = @_; # get the local variables off the stack
    local $arg1; # changing $arg1 here will not be visible outside this scope
    $arg1++;
    local $arg2->{key1}; # only the key1 portion of the hashref referenced by $arg2 is localized
    $arg2->{key1}->{key2} = 'foo'; # this change is not visible outside the function
}
Occasionally I have been bitten by forgetting to localize a data structure that was passed by reference to a function, that I changed inside the function. Conversely, I have also returned a data structure as a function result that was shared among multiple systems and the caller then proceeded to change the data by mistake, affecting these other systems in a difficult-to-trace problem usually called action at a distance. The best thing to do here would be to make a clone of the data before returning it*, or make it read-only**.
* In Perl, see the function dclone() in the built-in Storable module.
** In Perl, see lock_hash() or lock_hash_ref() in the built-in Hash::Util module.

Is there a case where parameter validation may be considered redundant?

The first thing I do in a public method is validate every single parameter before it gets any chance to be used, passed around or referenced, and throw an exception if any of them violates the contract. I've found this to be a very good practice, as it lets you catch the offender the moment the infraction is committed. But quite often I write a very simple getter/indexer such as this:
private List<Item> m_items = ...;
public Item GetItemByIdx( int idx )
{
    if( (idx < 0) || (idx >= m_items.Count) )
    {
        throw new ArgumentOutOfRangeException( "idx", "Invalid index" );
    }
    return m_items[ idx ];
}
In this case the index parameter directly relates to the indexes in the list, and I know for a fact (e.g. documentation) that the list itself will do exactly the same and will throw the same exception. Should I remove this verification, or had I better leave it alone?
I wanted to know what you guys think, as I'm now in the middle of refactoring a big project and I've found many cases like the above.
Thanks in advance.
It's not just a matter of taste, consider
if (!File.Exists(fileName)) throw new ArgumentException("...");
var s = File.OpenText(fileName);
This looks similar to your example but there are several reasons (concurrency, access rights) why the OpenText() method could still fail, even with a FileNotFound error. So the Exists-check is just giving a false feeling of security and control.
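One way to act on that observation is to skip the existence check and let the open call itself report the failure. The sketch below is in Python, not C#, and read_text is a hypothetical helper; it only illustrates the shape of the idea.

```python
import os
import tempfile


def read_text(path):
    # No exists() pre-check: the file could vanish or become unreadable
    # between a check and the open, so open() is the only reliable test.
    with open(path, encoding="utf-8") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    present_path = os.path.join(tmp, "present.txt")
    with open(present_path, "w", encoding="utf-8") as handle:
        handle.write("hello")
    present = read_text(present_path)

    try:
        read_text(os.path.join(tmp, "absent.txt"))
        missing_raised = False
    except FileNotFoundError:
        # The lower level (open) detected and reported the problem.
        missing_raised = True
```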
It is a mind-set thing, when you are writing the GetItemByIdx method it probably looks quite sensible. But if you look around in a random piece of code there are usually lots of assumptions you could check before proceeding. It's just not practical to check them all, over and over. We have to be selective.
So in a simple pass-along method like GetItemByIdx I would argue against redundant checks. But as soon as the function adds more functionality or if there is a very explicit specification that says something about idx that argument turns around.
As a rule of thumb an exception should be thrown when a well defined condition is broken and that condition is relevant at the current level. If the condition belongs to a lower level, then let that level handle it.
I would only do parameter verification where it would lead to some improvement in code behavior. Since you know, in this case, that the check will be performed by the List itself, then your own check is redundant and provides no extra value, so I wouldn't bother.
It's true that you have possibly duplicated work that's already been done in the API, but it's there now. If your error handling framework works and is solid, and isn't causing performance issues (profiling is your friend), then I reckon leave it, and gradually phase it out if you have time. It doesn't sound like a top priority!

What are the pros and cons of putting as much logic as possible in a minimum(one-liners) piece of code?

Is it cool?
IMO one-liners reduce readability and make debugging/understanding more difficult.
Maximize understandability of the code.
Sometimes that means putting (simple, easily understood) expressions on one line in order to get more code in a given amount of screen real-estate (i.e. the source code editor).
Other times that means taking small steps to make it obvious what the code means.
One-liners should be a side-effect, not a goal (nor something to be avoided).
If there is a simple way of expressing something in a single line of code, that's great. If it's just a case of stuffing in lots of expressions into a single line, that's not so good.
To explain what I mean - LINQ allows you to express quite complicated transformations in relative simplicity. That's great - but I wouldn't try to fit a huge LINQ expression onto a single line. For instance:
var query = from person in employees
            where person.Salary > 10000m
            orderby person.Name
            select new { person.Name, person.Department };
is more readable than:
var query = from person in employees where person.Salary > 10000m orderby person.Name select new { person.Name, person.Department };
It's also more readable than doing all the filtering, ordering and projection manually. It's a nice sweet-spot.
Trying to be "clever" is rarely a good idea - but if you can express something simply and concisely, that's good.
One-liners, when used properly, transmit your intent clearly and make the structure of your code easier to grasp.
A python example is list comprehensions:
new_lst = [i for i in lst if some_condition]
instead of:
new_lst = []
for i in lst:
    if some_condition:
        new_lst.append(i)
This is a commonly used idiom that makes your code much more readable and compact. So, the best of both worlds can be achieved in certain cases.
This is by definition subjective, and due to the vagueness of the question, you'll likely get answers all over the map. Are you referring to a single physical line or logical line? EG, are you talking about:
int x = BigHonkinClassName.GetInstance().MyObjectProperty.PropertyX.IntValue.This.That.TheOther;
or
int x = BigHonkinClassName.GetInstance().
    MyObjectProperty.PropertyX.IntValue.
    This.That.TheOther;
One-liners, to me, are a matter of "what feels right." In the case above, I'd probably break that into both physical and logical lines, getting the instance of BigHonkinClassName, then pulling the full path to .TheOther. But that's just me. Other people will disagree. (And there's room for that. Like I said, subjective.)
Regarding readability, bear in mind that, for many languages, even "one-liners" can be broken out into multiple lines. If you have a long set of conditions for the conditional ternary operator (? :), for example, it might behoove you to break it into multiple physical lines for readability:
int x = (/* some long condition */) ?
/* some long method/property name returning an int */ :
/* some long method/property name returning an int */ ;
At the end of the day, the answer is always: "It depends." Some frameworks (such as many DAL generators, e.g. SubSonic) almost require obscenely long one-liners to get any real work done. Other times, breaking that into multiple lines is quite preferable.
Given concrete examples, the community can provide better, more practical advice.
In general, I definitely don't think you should ever "squeeze" a bunch of code onto a single physical line. That doesn't just hurt legibility, it smacks of someone who has outright disdain for the maintenance programmer. As I used to teach my students: always code for the maintenance programmer, because it will often be you.
:)
Oneliners can be useful in some situations
int value = bool ? 1 : 0;
But for the most part they make the code harder to follow. I think you only should put things on one line when it is easy to follow, the intent is clear, and it won't affect debugging.
One-liners should be treated on a case-by-case basis. Sometimes it can really hurt readability and a more verbose (read: easy-to-follow) version should be used.
There are times, however when a one-liner seems more natural. Take the following:
int Total = (Something ? 1 : 2)
+ (SomethingElse ? (AnotherThing ? x : y) : z);
Or the equivalent (slightly less readable?):
int Total = Something ? 1 : 2;
Total += SomethingElse ? (AnotherThing ? x : y) : z;
IMHO, I would prefer either of the above to the following:
int Total;
if (Something)
    Total = 1;
else
    Total = 2;
if (SomethingElse)
    if (AnotherThing)
        Total += x;
    else
        Total += y;
else
    Total += z;
With the nested if-statements, I have a harder time figuring out the final result without tracing through it. The one-liner feels more like the math formula it was intended to be, and consequently easier to follow.
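To check that the two styles really compute the same thing, here is a small Python sketch that writes the calculation both ways and compares them over every combination of the three flags. The function names and the sample values 10, 20, 30 are illustrative only.

```python
from itertools import product


def total_expression(something, something_else, another_thing, x, y, z):
    # Single-expression form, mirroring the ternary version above.
    return (1 if something else 2) + (
        (x if another_thing else y) if something_else else z
    )


def total_nested(something, something_else, another_thing, x, y, z):
    # Statement form, mirroring the nested if/else version.
    if something:
        total = 1
    else:
        total = 2
    if something_else:
        if another_thing:
            total += x
        else:
            total += y
    else:
        total += z
    return total


# Compare both forms over every combination of the three flags.
mismatches = [
    flags
    for flags in product([False, True], repeat=3)
    if total_expression(*flags, 10, 20, 30) != total_nested(*flags, 10, 20, 30)
]
```

An empty mismatches list means the choice between the two forms is purely one of readability.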
As far as the cool factor, there is a certain feeling of accomplishment / show-off factor in "Look Ma, I wrote a whole program in one line!". But I wouldn't use it in any context other than playing around; I certainly wouldn't want to have to go back and debug it!
Ultimately, with real (production) projects, whatever makes it easiest to understand is best. Because there will come a time that you or someone else will be looking at the code again. What they say is true: time is precious.
That's true in most cases, but in some cases where one-liners are common idioms, then it's acceptable. ? : might be an example. Closure might be another one.
No, it is annoying.
One liners can be more readable and they can be less readable. You'll have to judge from case to case.
And, of course, on the prompt one-liners rule.
VASTLY more important is developing and sticking to a consistent style.
You'll find bugs MUCH faster, be better able to share code with others, and even code faster if you merely develop and stick to a pattern.
One aspect of this is to make a decision on one-liners. Here's one example from my shop (I run a small coding department) - how we handle IFs:
Ifs shall never be all on one line if they overflow the visible line length, including any indentation.
Thou shalt never have else clauses on the same line as the if even if it comports with the line-length rule.
Develop your own style and STICK WITH IT (or, refactor all code in the same project if you change style).
The main drawback of "one liners" in my opinion is that it makes it hard to break on the code and debug. For example, pretend you have the following code:
a().b().c(d() + e())
If this isn't working, it's hard to inspect the intermediate values. However, it's trivial to break with gdb (or whatever other tool you may be using) in the following, and check each individual variable and see precisely what is failing:
A = a();
B = A.b();
D = d();
E = e(); // here I can query A, B, D and E
B.c(D + E);
One rule of thumb: a one-liner is fine if you can express its concept in plain language in a very short sentence, such as "If it's true, set it to this, otherwise set it to that."
For a code construct where the ultimate objective of the entire structure is to decide what value to assign to a single variable, it is, with appropriate formatting, almost always clearer to put multiple conditionals into a single statement. With multiple nested if/else blocks, the overall objective, to set the variable...
" variableName = "
must be repeated in every nested clause, and the eye must read all of them to see this. With a single statement it is much clearer, and with the appropriate formatting the complexity is more easily managed as well:
decimal cost =
    usePriority? PriorityRate * weight:
    useAirFreight? AirRate * weight:
    crossMultRegions? MultRegionRate:
    SingleRegionRate;
The pro is an easily understood one-liner that works.
The con is the concatenation of obfuscated gibberish on one line.
Generally, I'd call it a bad idea (although I do it myself on occasion): it strikes me as something that's done more to show off how clever someone is than to make good code. "Clever tricks" of that sort are generally very bad.
That said, I personally aim to have one "idea" per line of code; if this burst of logic is easily encapsulated in a single thought, then go ahead. If you have to stop and puzzle it out a bit, best to break it up.
