Is there a good way to use polymorphism to remove this switch statement? - refactoring

I've been reading on refactoring and replacing conditional statements with polymorphism. The trouble I have is that it only seems to make sense to me when you have a more complex case where, without polymorphism, you would have to repeat the same switch statements or if-elses many times. I don't see how it makes sense if you're only doing it once - you have to have that conditional somewhere, right?
As an example, I recently wrote the following class, which is responsible for reading a XML file and converting its data into the program's objects. There are 2 possible formats for the file that we are supporting, so I simply wrote a method in the class for handling each one, and used a case-switch to determine which one to use:
public class ComponentXmlReader
{
public IEnumerable<UserComponent> ImportComponentsFromXml(string path)
{
var xmlFile = XElement.Load(path);
switch (xmlFile.Name.LocalName)
{
case "CaseDefinition":
return ImportComponentsFromA(xmlFile);
case "Root":
return ImportComponentsFromB(xmlFile);
}
}
private IEnumerable<UserComponent> ImportComponentsFromA(XContainer file)
{
//do stuff
}
private IEnumerable<UserComponent> ImportComponentsFromB(XContainer file)
{
//do stuff
}
}
As far as I can tell, I could write a class hierarchy for this to do the parsing, but I don't see the advantage here - I'd still have to use a case-switch to determine which class to instantiate. It looks to me like it would be extra complexity for no benefit. If I was going to keep these classes around and do more things with them that depended on the file type, then it would eliminate doing the same switch in multiple places, but this is single-use. Is this right, or is there some reason or technique I'm not seeing that makes it a good idea to use a polymorphic class hierarchy to do this?

If you had, say, an abstract ComponentImporter class, with concrete subclasses FromA and FromB, you could instantiate one of each, and put it in a Map. Then you could call componentImporterMap.get(xmlFile.Name.LocalName).importComponents() and avoid the switch.

As with all design choices, context is key. In this case, you have what seems to be a fairly simple class handling two very similar tasks. If the two Import methods contained very little duplicate code, then including them in a single class is perhaps the best choice since, as you say, it reduces complexity.
However, it's possible you'll use this class in the future, and even add new types of imports. In that case, the class would be more reusable if it was polymorphic.
Additionally, since these methods sound very similar, you're likely to have a bunch of duplicate code, which you could keep in a base class and only put import-specific code in the child classes.
Plus, as Carl mentions, there are numbers of ways to implement this logic without using a case statement.

Related

When to use Encapsulate Collection?

In the smell Data Class as Martin Fowler described in Refactoring, he suggests if I have a collection field in my class I should encapsulate it.
The pattern Encapsulate Collection(208) says we should add following methods:
get_unmodified_collection
add_item
remove_item
and remove these:
get_collection
set_collection
To make sure any changes on this collection need go through the class.
Should I refactor every class which has a collection field with this pattern? Or it depends on some other reasons like frequency of usage?
I use C++ in my project now.
Any suggestion would be helpful. Thanks.
These are well formulated questions and my answer is:
Should I refactor every class which has a collection field with this
pattern?
No, you should not refactor every class which has a collection field. Every fundamentalism is a way to hell. Use common sense and do not make your design too good, just good enough.
Or it depends on some other reasons like frequency of usage?
The second question comes from a common mistake. The reason why we refactor or use design pattern is not primarily the frequency of use. We do it to make the code more clear, more maintainable, more expandable, more understandable, sometimes (but not always!) more effective. Everything which adds to these goals is good. Everything which does not, is bad.
You might have expected a yes/no answer, but such one is not possible here. As said, use your common sense and measure your solution from the above mentioned viewpoints.
I generally like the idea of encapsulating collections. Also encapsulating plain Strings into named business classes. I do it almost always when the classes are meaningful in the business domain.
I would always prefer
public class People {
private final Collection<Man> people;
... // useful methods
}
over the plain Collection<Man> when Man is a business class (a domain object). Or I would sometimes do it in this way:
public class People implements Collection<Man> {
private final Collection<Man> people;
... // delegate methods, such as
#Override
public int size() {
return people.size();
}
#Override
public Man get(int index) {
// Here might also be some manipulation with the returned data etc.
return people.get(index);
}
#Override
public boolean add(Man man) {
// Decoration - added some validation
if (/* man does not match some criteria */) {
return false;
}
return people.add(man);
}
... // useful methods
}
Or similarly I prefer
public class StreetAddress {
private final String value;
public String getTextValue() { return value; }
...
// later I may add more business logic, such as parsing the street address
// to street name and house number etc.
}
over just using plain String streetAddress - thus I keep the door opened to any future change of the underlying logic and to adding any useful methods.
However, I try not to overkill my design when it is not needed so I am as well as happy with plain collections and plain Strings when it is more suited.
I think it depends on the language you are developing with. Since there are already interfaces that do just that C# and Java for example. In C# we have ICollection, IEnumerable, IList. In Java Collection, List, etc.
If your language doesn't have an interface to refer to a collection regarless of their inner implementation and you require to have your own abstraction of that class, then it's probably a good idea to do so. And yes, you should not let the collection to be modified directly since that completely defeats the purpose.
It would really help if you tell us which language are you developing with. Granted, it is kind of a language-agnostic question, but people knowledgeable in that language might recommend you the best practices in it and if there's already a way to achieve what you need.
The motivation behind Encapsulate Collection is to reduce the coupling of the collection's owning class to its clients.
Every refactoring tries to improve maintainability of the code, so future changes are easier. In this case changing the collection class from vector to list for example, changes all the clients' uses of the class. If you encapsulate this with this refactoring you can change the collection without changes to clients. This follows on of SOLID principles, the dependency inversion principle: Depend upon Abstractions. Do not depend upon concretions.
You have to decide for your own code base, whether this is relevant for you, meaning that your code base is still being changed and has to be maintained (then yes, do it for every class) or not (then no, leave the code be).

Replace lots of switches with polymorphism but no type code

I have a class which could benefit with the state pattern. However the common "Replace Type Code with State/Strategy" refactoring does not seem to fit well in my case: the state is calculated by watching other objects, there is no type code variable.
Most of my class code is just "calculating" some state when it is called, and running the functions for that state.
Forcing a type code variable feels wrong because:
I will be forced to call an "updateState()" function in every place where the polymorphic functions are used.
My class will no longer be 100% behavior, which I would rather habe instead of some internal state.
Since the state must be calculated every single time its functions are called, I am wonder if I am thinking about the wrong pattern.
Normally I refactor this:
if (this.someOtherThingIsRunning()) {
...
} else {
...
}
like this:
typecode.doSomething()
// that being polymorphic
it seems strange doing:
updateTypeCode()
typecode.doSomething()
Does the state pattern applies to this case? Is there any alternative strategy pull from polymorphism without a type code?
While writing my question, I realized that maybe I could just make the type code a function and return a temporal (function scope) type code. Like:
typecode().doSomething()
This solution would never store the state, which is what I want to avoid. However I am still wondering if my problem started because I am using the wrong pattern.
If you're open to storing the state, maybe think about combining State and Observer to modify the state as the dependent classes change (rather than checking on every call). There's only certain models that this will work for though.
Otherwise you might as well say object.doSomething() and have the checks inside doSomething(). In this case using design patterns doesn't present any significant advantages (though if you loosen up slightly on the definitions of design patterns, many things would be considered such). I'd probably go with:
doSomething()
{
if (someOtherThingIsRunning())
doOneThing();
else
doAnotherThing();
}
The alternative (that you already suggested) is to have the above checks in typecode() and to return another class that contains the method doSomething().

Alternatives to testing private methods in TDD

I'm trying to use TDD when writing a class that needs to parse an XML document. Let's say the class is called XMLParser, and its constructor takes in a string for the path to the XML file to parse. I would like to have a Load() method that tries to load this XML into memory, and performs a few checks on the file such as file system errors, whether or not its an XML file, etc.
My question is about alternatives: I've read that it's bad practice to have private methods that you need to test, and that you should be able to just test the public interface and let the private methods do their thing. But in this case, this functionality is pretty important, and I don't think it should be public.
Does anyone have good advice for a scenario like this?
I suggest to redesign your architecture a bit. Currently, you have one high level class with low level functionality embedded. Split that into multiple classes that belong to different layers (I use the term "layer" very loosely here).
Example:
Have one class with the public interface of your current class. (-> High level layer)
Have one class responsible for loading files from disk and handling IO errors (-> Low level layer)
Have one class responsible for validating XML documents (-> Inbetween)
Now you can test all three of these classes independently!
You will see that your high level class will do not much more than just composing the two lower level classes.
Use no access modifier (which is the next up to private) and write the test in the same package.
Good OOD is important but for really important functionality testing is more important. Good practices are always only a guideline and they are good in the general scenario.
You could also try to encapsulate that specific file-checking behaviour in another object and have your parser instantiate it and use it. This is probably what I would do. In this way you could also even use this functionality somewhere else with minimal effort.
You can make a subclass as part of your test package that exposes public accessors to the private methods (which should then be protected).
Public class TestableClass : MyClass
{
public someReturnType TestMethod() {
return base.PrivateMethod();
}
}

method name for a long method

The good style (Clean Code book) says that a method's name should describe what the method does. So for example if I have a method that verifies an address, stores it in a database, and sends an email, should the name be something such as verifyAddressAndStoreToDatabaseAndSendEmail(address);
or
verifyAddress_StoreToDatabase_SendEmail(address);
although I can divide that functionality in 3 methods, I'll still need a method to call these 3 methods. So a large method name is inevitable.
Having And named methods certainly describes what the method does, but IMO it's not very readable as names can be very very large. How would you solve it?
EDIT: Maybe I could use fluent style to decompose the method name such as:
verifyAddress(address).storeToDatabase().sendEmail();
but I need a way to ensure the order of invocation. Maybe by using the state pattern, but this causes the code to grow.
How I approach this is to make the 3 smaller methods as you mentioned and then in the higher method that calls the 3 smaller ones, I name it after the "why" I need to do those three things.
Try to define why you need to do those steps and use that as the basis of the method name.
A single method should not do 3 things. Thus divide the work into 3 methods:
verifyAddress
storeAddress
sendEmail
I'm following up on my previous comment, but I've got more here than would fit reasonably in a comment so I'm answering.
The details of the method belong in the documentation not in the name of the method (in my opinion). Think of it this way... By putting SendEmail in the name of the method, you're committing implementation details to the method name. What if a decision is made down the road to send notification via SMS or twitter or something else instead of email? Do you change the name of the method and break your API, or do you have a method name that misleads the consumers of the API? Something to consider.
If you insist on keeping the functionality of the method in its name, I'd urge you to find something more generic. Perhaps something along the lines of VerifySaveAndNotify(Address address). That way, the method name tells you what it's doing without specifying how it does it. The parameter of type Address let's you know what is being verified and saved. All of that works together to make your method name informative, flexible, and terse.
EDIT: Maybe I could use fluent style to decompose the method name such as:
verifyAddress(address).storeToDatabase().sendEmail();
but I need a way to ensure the order of invocation. Maybe by using the state pattern, but this causes the code to grow.
To ensure ordering of commands in a fluent style, each result would be an object that exposes only the functionality required by the next step. For example:
public class Verifier
{
public DataStorer VerifyAddress(string address)
{
...
return new DataStorer(address);
}
}
public class DataStorer
{
public Emailer StoreToDataBase()
{
...
return new Emailer(...);
}
}
public class Emailer
{
public void SendEmail()
{
...
}
}
This is handy if you need to create a very granular design and want to optimise your classes for reuseability, but is likely to be design overkill under most circumstances. Better probably as others have said to choose a name that represents what the whole process is supposed to represent. You could simply call it "StoreAndEmail", making an assumption that verification is something you do routinely before committing data to any destination. The alternative if you don't mind names being long is to simply describe it in full and accept that a long name is necessary. In the end, it really doesn't cost you anything, but can certainly make you code more specific in its intent.

Method naming convention

If a method takes a class/struct as an input parameter, what is the best way to name it?
Example:
class Person{}
class Address{}
class Utility{
//name **style 1** - use method overloading
public void Save(Person p){}
public void Save(Address a){}
*//name **style 2** - use unique names that define what they are doing
//or public void SavePerson(Person p){}
//and public void SaveAddress(Address a){}*
}
I personally like style 1 (Use the languages features - in this case overloading).
If you like style 1, can you point me to any "official" documentation, that states this to be a standard?
I would say your challenge is not in the field of method naming, but rather type design. A type that is responsible for saving both Person objects and Address objects seems like a type with more than one responsibility. Such a type will tend to grow and grow and grow and will eventually get hard to maintain. If you instead create more specialized types, method naming may automatically become a simpler task.
If you would still want to collect these methods in the same type, it's mostly a matter of style. One thing to perhaps think about is whether this type may be consumed by code written in another language, and that does not support method overloading. In such cases the longer names is the way to go. Otherwise just stick to what feels best (or whatever is the ruling convention at your workplace).
It is a matter of style.
If you don't like long method names, go with 1.
If you don't like long overload lists, go with 2.
The important bit is to keep consistent, so do not mix the two styles in one project.
If you are seeing that you have many such methods, you may need to rethink your design - perhaps a solution involving inheritance would be more appropriate.
Distinct names avoid entirely any problems associated with method overloading. For example:
Ambiguity is avoided if an argument's type matches more than one of the candidates.
In C++, overloaded methods can hide those of the same name in a superclass.
In Java, type erasure prevents overloaded methods differing only by type parameterization.
It would also be worthwhile to ask whether polymorphism could be used instead of overloading.

Resources