Refactoring methods in existing code base with huge number of parameters

Refactoring methods in existing code base with huge number of parameters - methods

I have inherited an existing code base where the "features" are as follows:
huge monolithic classes with
(literally) 100's of member variables
and methods that go one for pages
(er. screens)
public and private methods with a large number of arguments.
I am trying to clean up and refactor the code, to leave it a little better
than how I found it. So my questions
is worth it (or do you) refactor methods with 10 or so arguments so that they are more readable ?
are there best practices on how long methods should be ? How long do you usually keep them?
are monolithic classes bad ?

is worth it (or do you) refactor methods with 10 or so arguments so that they are more readable ?
Yes, it is worth it. It is typically more important to refactor methods that are not "reasonable" than ones that already are nice, short, and have a small argument list.
Typically, if you have many arguments, it's because a method does too much - most likely, it should be a class of it's own, not a method.
That being said, in those cases when many parameters are required, it's best to encapsulate the parameters into a single class (ie: SpecificAlgorithmOptions), and pass one instance of that class. This way, you can provide clean defaults, and its very obvious which methods are essential vs. optional (based on what is required to construct the options class).
are there best practices on how long methods should be ? How long do you usually keep them?
A method should be as short as possible. It should have one purpose, and be used for one task, whenver possible. If it's possible to split it into separate methods, where each as a real, qualitative "task", then do so when refactoring.
are monolithic classes bad ?
Yes.

if the code is working and there is no need to touch it, i wouldn't refactor. i only refactor very problematic cases if i anyway have to touch them (either for extending them for functionality or bug-fixing). I favor the pragmatic way: Only (in 95%) touch, what you change.
Some first thoughts on your specific problem (though in detail it is difficult without knowing the code):
start to group instance variables, these groups will then be target to do 'extract class'
when having grouped these variables you hopefully can group some methods, which also be moved when doing 'extract class'
often there are many methods which aren't using any fields. make them static (they most likely are helper methods, which can be extracted to helper-classes.
in case non-related instance fields are mixed in many methods, do loads of 'extract method'
use automatic refactoring tools as much as possible, because you most likely have no tests in place and automation is more safe.
Regarding your other concrete questions.
is worth it (or do you) refactor methods with 10 or so arguments so that they are more readable?
definetely. 10 parameters are too many to grasp for us humans. most likely the method is doing too much.
are there best practices on how long methods should be ? How long do you usually keep them?
it depends... on preferences. i stated some things on this thread (though the question was PHP). still i would apply these numbers/metrics to any language.
are monolithic classes bad ?
it depends, what you mean with monolithic. if you mean many instance variables, endless methods, a lot of if/else complexity, yes.
also have a look at a real gem (to me a must have for every developer): working effectively with legacy code

Assuming the code is functioning I would suggest you think about these questions first:
is the code well documented?
do you understand the code?
how often are new features being added?
how often are bugs reported and fixed?
how difficult is it to modify and fix the code?
what is the expected life of the code?
how many versions of the compiler are you behind (if at all)?
is the OS it runs on expected to change during its lifetime?
If the system will be replaced in five years, is documented well, will undergo few changes, and bugs are easy to fix - leave it alone regardless of the size of the classes and the number of parameters. If you are determined to refactor make a list of your refactoring proposals in the order of maximum benefit with minimum changes and attack it incrementally.

Related

Monolithic Methods - Disadvantages

Let's say I am creating a MATH class and need to provide a method to process two numbers. [
Instead of providing the traditional mechanism of have methods for each possible operation I provide a single method eval: float eval(ArgObj); where ArgObj is an object which can hold two numbers and an operator. Thus now with a single method I can do multiple operations.
What are the disadvantages of this design?
Two certain disadvantages are maintenance and documentation as eval get the ability to process more operations.
What are the other disadvantages that I am missing out here?
Update:
What I am trying to figure out are negatives of a large monolithic method, the above example is just hypothetical another similar example would be a method like
float doSomething(int basedOn)
where doSomething can do a bunch of operations.

Reading code should be a pleasurable experience, knowing what a method does should be blatantly obvious.
Would you also like to reduce the English language to 10 words? Of course not...
Learn from well-used and well-loved APIs, and ensure your API is easy to learn, easy to use, and difficult to misuse.
I would suggest doSomething doesn't come close. What happens when you need a different action, do you call the new method doSomething2? Hopefully, you don't see that as a viable option...

How do you go about splitting a class with TDD?

I feel pretty skilled in TDD, and I'm even consired the "TDD expert" in my company, but nevertheless, there are some cases that I feel I don't know how to handle properly, so I would like to hear other's opinions.
My problems is as follows:
Even though in general TDD helps me think of the core responsibility of a class, and extract every other responsibility to dependent classes, there are cases that after some time I realize that one of the classes has multiple responsibilities and it needs to be refactored and split it into 2 classes. This conclusion often comes because the tests of that class start to become complicated or repetitive. I can pretty easily do refactoring to split this class to the design I want (and I do it in small steps, keeping on the green bar). My problem is that I end up with the same complicated and repetitive tests that now tests the 2 classes together, while I would like to have seperate tests for each class.
The only (more-or-less safe) manner I could think of for doing that, is to do the following for each test (after I completed the refactoring of the production code):
Duplicate the test case
Change one copy of the test to use a mock instead of the 1st class, and the other copy of the test to use a mock instead of the 2nd class.
Then if I see that an identical test already exists for one of the copies, I delete it.
I think that sometimes its possible to do the following:
start by creating the 2 classes from scratch (using TDD of course)
Change the old tests to use the new classes instead of the old one
Delete the old class
Delete the old tests
Both of these techniques seems pretty cumbersome and time consuming, so I wonder: how do the "real experts" go about this issue?

Without an actual example I can't be sure I know what exactly you mean. But it sounds like you try to test every class (and maybe even every method) in isolation.
When I get to a point where I want to/have to split a class into multiple classes, I tend to still view the resulting collection of classes as a unit and test it as a whole. Only when they stop building a functional whole and start to become independent units, I test them independently of each other.

I can pretty easily do refactoring to split this class to the design I
want (and I do it in small steps, keeping on the green bar). My
problem is that I end up with the same complicated and repetitive
tests that now tests the 2 classes together, while I would like to
have seperate tests for each class.
I've gotten to this point as well. Here I start refactoring the tests, using the same techniques as for the non-test code - convert variable to field, move field, extract method, move variable, etc etc. Naming is of course very important and provides a lot of design guidance.
eg http://www.kdgregory.com/index.php?page=junit.refactoring
eg http://www.natpryce.com/articles/000686.html
eg http://www-public.it-sudparis.eu/~gibson/Teaching/CSC7302/ReadingMaterial/vanDeursenMdenBK01.pdf
That last article has some example smells and refactorings common to refactoring tests specifically.

I start with asking myself (as you have) what are the responsibilities of a class. Let's say for example that your class is responsible to aggregate weather data and generate a weather report.
At this point I make three (3) lists:
Data aggregation members (attributes, behaviors)
Report generation members
Common members
The first two are easy, the members that exclusively belong in one class exclusively become part of one of the two new classes. I will keep the original dual-responsibility class as a facade, whose members are a pass-through to the new classes, so that tests and functionality will not be broken while refactoring. Depending on circumstances, I may eventually remove the facade, and refactor the tests and dependent objects to use the new classes.
As for the members that are common to both responsibilities - I will move them to a helper class (usually scoped as internal), that the new classes (and any others may use). The functionality has proven to be reused, and may be reused again. Note that the common members might not necessarily all land in one helper class; the helping functionality might be added to one new class, multiple (depending, of course on responsibilities) classes, and some functionality may be added to existing helper classes, if one fits the bill.

I wondered about this a while back, and couldn't really find a satisfactory answer. Here are some discussions I found on the topic:
http://tech.groups.yahoo.com/group/testdrivendevelopment/message/27199
and
http://tech.groups.yahoo.com/group/testdrivendevelopment/message/16227
Personally, I've adopted a "hair-trigger" approach to moving responsibilites into dependencies, and while "spinning off" a new dependency before there is a clear need for it smacks of YAGNI, I've found that re-absorbing a dependency that turned out to be too anemic to warrant being a separate class is much easier than the rigmarole involved with splitting out a separate class from a class that already has a significant battery of tests written for it.
Edit:
Oh - and I should probably point out that I'm not at all a "real expert" ;)

howto struct a code? e.g. at the top all variables, then all methods and last all event handlers

i'm currently working on a big projekt and i loose many time searching the right thing in the code. i need to get e.g. a method which makes someting special. so i scroll the whole code.
are there any common and effective methods to struct a file of code? e.g.
1. all global variables
2. constructor etc.
3. all methods
4. all event handlers
do you know common methods to do this??

It's more usual to break large projects into several source files, with logically related functionality. This helps with speeding up compilation and reducing coupling in your design as well as helping you navigate the code.
An example might be to have separate files for
UI functionality
helper classes (such as geometric/maths stuff)
file I/O
core functionality that connects the rest together
Design is a large topic, the book Code Complete by Steve McConnell might be a good starting point for you.

You shouldnt use global variables :)
Try spreading things out over different classes and files. Maks sure each class has only one purpose, instead of 1 class that manages a whole lot of different tasks.

That sounds like a sensible enough structure to me, what would really benefit you though is learning to use the tools you have available — whatever editor you're using it will have a search function, you can use that to quickly find what you're looking for.
Some editors will also include bookmarks too, and most offer a way to move back and forward through recent positions in the file.

Seen this sort of things started, never seen it kept on under the pressure to turn out code though.
Basically my rule of thumb is, if I feel the need to do this, break the code file up.

Typing similar code

I've occasionally found myself in situations where I have to type out redundant code... where only one variable or two will change in each block of code. Usually I'll copy and paste this block and make the necessary changes on each block of code... but is there a better way to handle this?

Heavy use of cut and paste usually means there's something not quite right in the design of the code. Think about how you could refactor such as breaking out the cut/paste functionality into commonly called methods.

Yes. There is always a better way to do it than copy-and-paste. You should always get a little uneasy (kind of like you feel when you're about to give a speech in front of a huge crowd) when you're about to hit "Ctrl-V."
In almost any introductory class you're likely to be using a language that has functions, methods, or sub procedures. (What they're called and what they do depends on the language in question). Any variable that changes needs to be a parameter to that function/method/subprocedure.
When you do that (and the method/function/sub is accessible) you can replace the HUGE chunks of code with a single call to your new m-f-s.
There are a lot of other ways to do this, but when you're just getting started this is probably the way to go.

you have a lot of approaches to this situation. I don't know if you're working with OO or structured programming but you can build methods or functions and give them cohesion and unique responsibilities. I think it's an easy way of thinking...
In the OO paradigm we use some therms on how to avoid this situation: cohesion and low decoupling (you could search for them over the Internet). If you can apply both of them in your code, it will be easier to read and maintain.
That's all

How to name factory like methods?

I guess that most factory-like methods start with create. But why are they called "create"? Why not "make", "produce", "build", "generate" or something else? Is it only a matter of taste? A convention? Or is there a special meaning in "create"?
createURI(...)
makeURI(...)
produceURI(...)
buildURI(...)
generateURI(...)
Which one would you choose in general and why?

Some random thoughts:
'Create' fits the feature better than most other words. The next best word I can think of off the top of my head is 'Construct'. In the past, 'Alloc' (allocate) might have been used in similar situations, reflecting the greater emphasis on blocks of data than objects in languages like C.
'Create' is a short, simple word that has a clear intuitive meaning. In most cases people probably just pick it as the first, most obvious word that comes to mind when they wish to create something. It's a common naming convention, and "object creation" is a common way of describing the process of... creating objects.
'Construct' is close, but it is usually used to describe a specific stage in the process of creating an object (allocate/new, construct, initialise...)
'Build' and 'Make' are common terms for processes relating to compiling code, so have different connotations to programmers, implying a process that comprises many steps and possibly a lot of disk activity. However, the idea of a Factory "building" something is a sensible idea - especially in cases where a complex data-structure is built, or many separate pieces of information are combined in some way.
'Generate' to me implies a calculation which is used to produce a value from an input, such as generating a hash code or a random number.
'Produce', 'Generate', 'Construct' are longer to type/read than 'Create'. Historically programmers have favoured short names to reduce typing/reading.

Joshua Bloch in "Effective Java" suggests the following naming conventions
valueOf — Returns an instance that has, loosely speaking, the same value
as its parameters. Such static factories are effectively
type-conversion methods.
of — A concise alternative to valueOf, popularized by EnumSet (Item 32).
getInstance — Returns an instance that is described by the parameters
but cannot be said to have the same value. In the case of a singleton,
getInstance takes no parameters and returns the sole instance.
newInstance — Like getInstance, except that newInstance guarantees that
each instance returned is distinct from all others.
getType — Like getInstance, but used when the factory method is in a
different class. Type indicates the type of object returned by the
factory method.
newType — Like newInstance, but used when the factory method is in a
different class. Type indicates the type of object returned by the
factory method.

Wanted to add a couple of points I don't see in other answers.
Although traditionally 'Factory' means 'creates objects', I like to think of it more broadly as 'returns me an object that behaves as I expect'. I shouldn't always have to know whether it's a brand new object, in fact I might not care. So in suitable cases you might avoid a 'Create...' name, even if that's how you're implementing it right now.
Guava is a good repository of factory naming ideas. It is popularising a nice DSL style. examples:
Lists.newArrayListWithCapacity(100);
ImmutableList.of("Hello", "World");

"Create" and "make" are short, reasonably evocative, and not tied to other patterns in naming that I can think of. I've also seen both quite frequently and suspect they may be "de facto standards". I'd choose one and use it consistently at least within a project. (Looking at my own current project, I seem to use "make". I hope I'm consistent...)
Avoid "build" because it fits better with the Builder pattern and avoid "produce" because it evokes Producer/Consumer.
To really continue the metaphor of the "Factory" name for the pattern, I'd be tempted by "manufacture", but that's too long a word.

I think it stems from “to create an object”. However, in English, the word “create” is associated with the notion “to cause to come into being, as something unique that would not naturally evolve or that is not made by ordinary processes,” and “to evolve from one's own thought or imagination, as a work of art or an invention.” So it seems as “create” is not the proper word to use. “Make,” on the other hand, means “to bring into existence by shaping or changing material, combining parts, etc.” For example, you don’t create a dress, you make a dress (object). So, in my opinion, “make” by meaning “to produce; cause to exist or happen; bring about” is a far better word for factory methods.

Partly convention, partly semantics.
Factory methods (signalled by the traditional create) should invoke appropriate constructors. If I saw buildURI, I would assume that it involved some computation, or assembly from parts (and I would not think there was a factory involved). The first thing that I thought when I saw generateURI is making something random, like a new personalized download link. They are not all the same, different words evoke different meanings; but most of them are not conventionalised.

I'd call it UriFactory.Create()
Where,
UriFactory is the name of the class type which is provides method(s) that create Uri instances.
and Create() method is overloaded for as many as variations you have in your specs.
public static class UriFactory
{
//Default Creator
public static UriType Create()
{
}
//An overload for Create()
public static UriType Create(someArgs)
{
}
}

I like new. To me
var foo = newFoo();
reads better than
var foo = createFoo();
Translated to english we have foo is a new foo or foo is create foo. While I'm not a grammer expert I'm pretty sure the latter is grammatically incorrect.

I'd point out that I've seen all of the verbs but produce in use in some library or other, so I wouldn't call create being an universal convention.
Now, create does sound better to me, evokes the precise meaning of the action.
So yes, it is a matter of (literary) taste.

Personally I like instantiate and instantiateWith, but that's just because of my Unity and Objective C experiences. Naming conventions inside the Unity engine seem to revolve around the word instantiate to create an instance via a factory method, and Objective C seems to like with to indicate what the parameter/s are. This only really works well if the method is in the class that is going to be instantiated though (and in languages that allow constructor overloading, this isn't so much of a 'thing').
Just plain old Objective C's initWith is also a good'un!

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio