When to use Encapsulate Collection? - refactoring

In the smell Data Class as Martin Fowler described in Refactoring, he suggests if I have a collection field in my class I should encapsulate it.
The pattern Encapsulate Collection(208) says we should add following methods:
get_unmodified_collection
add_item
remove_item
and remove these:
get_collection
set_collection
To make sure any changes on this collection need go through the class.
Should I refactor every class which has a collection field with this pattern? Or it depends on some other reasons like frequency of usage?
I use C++ in my project now.
Any suggestion would be helpful. Thanks.

These are well formulated questions and my answer is:
Should I refactor every class which has a collection field with this
pattern?
No, you should not refactor every class which has a collection field. Every fundamentalism is a way to hell. Use common sense and do not make your design too good, just good enough.
Or it depends on some other reasons like frequency of usage?
The second question comes from a common mistake. The reason why we refactor or use design pattern is not primarily the frequency of use. We do it to make the code more clear, more maintainable, more expandable, more understandable, sometimes (but not always!) more effective. Everything which adds to these goals is good. Everything which does not, is bad.
You might have expected a yes/no answer, but such one is not possible here. As said, use your common sense and measure your solution from the above mentioned viewpoints.
I generally like the idea of encapsulating collections. Also encapsulating plain Strings into named business classes. I do it almost always when the classes are meaningful in the business domain.
I would always prefer
public class People {
private final Collection<Man> people;
... // useful methods
}
over the plain Collection<Man> when Man is a business class (a domain object). Or I would sometimes do it in this way:
public class People implements Collection<Man> {
private final Collection<Man> people;
... // delegate methods, such as
#Override
public int size() {
return people.size();
}
#Override
public Man get(int index) {
// Here might also be some manipulation with the returned data etc.
return people.get(index);
}
#Override
public boolean add(Man man) {
// Decoration - added some validation
if (/* man does not match some criteria */) {
return false;
}
return people.add(man);
}
... // useful methods
}
Or similarly I prefer
public class StreetAddress {
private final String value;
public String getTextValue() { return value; }
...
// later I may add more business logic, such as parsing the street address
// to street name and house number etc.
}
over just using plain String streetAddress - thus I keep the door opened to any future change of the underlying logic and to adding any useful methods.
However, I try not to overkill my design when it is not needed so I am as well as happy with plain collections and plain Strings when it is more suited.

I think it depends on the language you are developing with. Since there are already interfaces that do just that C# and Java for example. In C# we have ICollection, IEnumerable, IList. In Java Collection, List, etc.
If your language doesn't have an interface to refer to a collection regarless of their inner implementation and you require to have your own abstraction of that class, then it's probably a good idea to do so. And yes, you should not let the collection to be modified directly since that completely defeats the purpose.
It would really help if you tell us which language are you developing with. Granted, it is kind of a language-agnostic question, but people knowledgeable in that language might recommend you the best practices in it and if there's already a way to achieve what you need.

The motivation behind Encapsulate Collection is to reduce the coupling of the collection's owning class to its clients.
Every refactoring tries to improve maintainability of the code, so future changes are easier. In this case changing the collection class from vector to list for example, changes all the clients' uses of the class. If you encapsulate this with this refactoring you can change the collection without changes to clients. This follows on of SOLID principles, the dependency inversion principle: Depend upon Abstractions. Do not depend upon concretions.
You have to decide for your own code base, whether this is relevant for you, meaning that your code base is still being changed and has to be maintained (then yes, do it for every class) or not (then no, leave the code be).

Related

What's the better coding style in oop: one method with one parameter VS two methods without parameters?

What's the better way for clean code from the oop point of view? Having two related methods with different names or one common method with an extra parameter?
(Simplified) Example:
1.) public void LogError() { ... }
public void LogWarning() { ... }
VS
2.) public void Log(LogType logType) { ... } //LogType.Error vs LogType.Warning
Both are good choices. Maybe a few examples can make it more clear. Usually, I try to think who is gonna use the library (me or someone else) and what programming language I use.
For example:
If I use strongly typed language like Java, C#, etc then I prefer choice 2.
If I use something else like PHP or Python, then I prefer choice 1.
If I want to make a simplified interface for other developers that are gonna use my library, for example, then I prefer choice 1 too.
When you have LogType enum for example, then it really doesn’t matter. Just try to think about how to describe the intent and make it clear.
Watch out boolean parameters that can be confusing many times. For example:
public void SaveProduct(bool cache) { ... }
In those situations, choice 1 is usually better because it can be very hard to understand what boolean values do. (How it changes the behavior) Also, it usually tells that the method is doing two different actions so possibly there is a way to refactor it. For example, splitting it into two methods and then the developer does not need to know about the implementation details.

Confused about the Interface and Class coding guidelines for TypeScript

I read through the TypeScript Coding guidelines
And I found this statement rather puzzling:
Do not use "I" as a prefix for interface names
I mean something like this wouldn't make a lot of sense without the "I" prefix
class Engine implements IEngine
Am I missing something obvious?
Another thing I didn't quite understand was this:
Classes
For consistency, do not use classes in the core compiler pipeline. Use
function closures instead.
Does that state that I shouldn't use classes at all?
Hope someone can clear it up for me :)
When a team/company ships a framework/compiler/tool-set they already have some experience, set of best practices. They share it as guidelines. Guidelines are recommendations. If you don't like any you can disregard them.
Compiler still will compile your code.
Though when in Rome...
This is my vision why TypeScript team recommends not I-prefixing interfaces.
Reason #1 The times of the Hungarian notation have passed
Main argument from I-prefix-for-interface supporters is that prefixing is helpful for immediately grokking (peeking) whether type is an interface. Statement that prefix is helpful for immediately grokking (peeking) is an appeal to Hungarian notation. I prefix for interface name, C for class, A for abstract class, s for string variable, c for const variable, i for integer variable. I agree that such name decoration can provide you type information without hovering mouse over identifier or navigating to type definition via a hot-key. This tiny benefit is outweighed by Hungarian notation disadvantages and other reasons mentioned below. Hungarian notation is not used in contemporary frameworks. C# has I prefix (and this the only prefix in C#) for interfaces due to historical reasons (COM). In retrospect one of .NET architects (Brad Abrams) thinks it would have been better not using I prefix. TypeScript is COM-legacy-free thereby it has no I-prefix-for-interface rule.
Reason #2 I-prefix violates encapsulation principle
Let's assume you get some black-box. You get some type reference that allows you to interact with that box. You should not care if it is an interface or a class. You just use its interface part. Demanding to know what is it (interface, specific implementation or abstract class) is a violation of encapsulation.
Example: let's assume you need to fix API Design Myth: Interface as Contract in your code e.g. delete ICar interface and use Car base-class instead. Then you need to perform such replacement in all consumers. I-prefix leads to implicit dependency of consumers on black-box implementation details.
Reason #3 Protection from bad naming
Developers are lazy to think properly about names. Naming is one of the Two Hard Things in Computer Science. When a developer needs to extract an interface it is easy to just add the letter I to the class name and you get an interface name. Disallowing I prefix for interfaces forces developers to strain their brains to choose appropriate names for interfaces. Chosen names should be different not only in prefix but emphasize intent difference.
Abstraction case: you should not not define an ICar interface and an associated Car class. Car is an abstraction and it should be the one used for the contract. Implementations should have descriptive, distinctive names e.g. SportsCar, SuvCar, HollowCar.
Good example: WpfeServerAutosuggestManager implements AutosuggestManager, FileBasedAutosuggestManager implements AutosuggestManager.
Bad example: AutosuggestManager implements IAutosuggestManager.
Reason #4 Properly chosen names vaccinate you against API Design Myth: Interface as Contract.
In my practice, I met a lot of people that thoughtlessly duplicated interface part of a class in a separate interface having Car implements ICar naming scheme. Duplicating interface part of a class in separate interface type does not magically convert it into abstraction. You will still get concrete implementation but with duplicated interface part. If your abstraction is not so good, duplicating interface part will not improve it anyhow. Extracting abstraction is hard work.
NOTE: In TS you don't need separate interface for mocking classes or overloading functionality.
Instead of creating a separate interface that describes public members of a class you can use TypeScript utility types. E.g. Required<T> constructs a type consisting of all public members of type T.
export class SecurityPrincipalStub implements Required<SecurityPrincipal> {
public isFeatureEnabled(entitlement: Entitlement): boolean {
return true;
}
public isWidgetEnabled(kind: string): boolean {
return true;
}
public areAdminToolsEnabled(): boolean {
return true;
}
}
If you want to construct a type excluding some public members then you can use combination of Omit and Exclude.
Clarification regarding the link that you reference:
This is the documentation about the style of the code for TypeScript, and not a style guideline for how to implement your project.
If using the I prefix makes sense to you and your team, use it (I do).
If not, maybe the Java style of SomeThing (interface) with SomeThingImpl (implementation) then by all means use that.
I find #stanislav-berkov's a pretty good answer to the OP's question. I would only share my 2 cents adding that, in the end it is up to your Team/Department/Company/Whatever to get to a common understanding and set its own rules/guidelines to follow across.
Sticking to standards and/or conventions, whenever possible and desirable, is a good practice and it keeps things easier to understand. On the other side, I do like to think we are still free to choose the way how we write our code.
Thinking a bit on the emotional side of it, the way we write code, or our coding style, reflects our personality and in some cases even our mood. This is what keeps us humans and not just coding machines following rules. I believe coding can be a craft not just an industrialized process.
I personally quite like the idea of turning a noun into an adjective by adding the -able suffix. It sounds very impropper, but I love it!
interface Walletable {
inPocket:boolean
cash:number
}
export class Wallet implements Walletable {
//...
}
}
The guidelines that are suggested in the Typescript documentation aren't for the people who use typescript but rather for the people who are contributing to the typescript project. If you read the details at the begging of the page it clearly defines who should use that guideline. Here is a link to the guidelines.
Typescript guidelines
In conclusion as a developer you can name you interfaces the way you see fit.
I'm trying out this pattern similar to other answers, but exporting a function that instantiates the concrete class as the interface type, like this:
export interface Engine {
rpm: number;
}
class EngineImpl implements Engine {
constructor() {
this.rpm = 0;
}
}
export const createEngine = (): Engine => new EngineImpl();
In this case the concrete implementation is never exported.
I do like to add a Props suffix.
interface FormProps {
some: string;
}
const Form:VFC<FormProps> = (props) => {
...
}
The type being an interface is an implementation detail. Implementation details should be hidden in API:s. That is why you should avoid I.
You should avoid both prefix and suffix. These are both wrong:
ICar
CarInterface
What you should do is to make a pretty name visible in the API and have a the implemtation detail hidden in the implementation. That is why I propose:
Car - An interface that is exposed in the API.
CarImpl - An implementation of that API, that is hidden from the consumer.

Is there a good way to use polymorphism to remove this switch statement?

I've been reading on refactoring and replacing conditional statements with polymorphism. The trouble I have is that it only seems to make sense to me when you have a more complex case where, without polymorphism, you would have to repeat the same switch statements or if-elses many times. I don't see how it makes sense if you're only doing it once - you have to have that conditional somewhere, right?
As an example, I recently wrote the following class, which is responsible for reading a XML file and converting its data into the program's objects. There are 2 possible formats for the file that we are supporting, so I simply wrote a method in the class for handling each one, and used a case-switch to determine which one to use:
public class ComponentXmlReader
{
public IEnumerable<UserComponent> ImportComponentsFromXml(string path)
{
var xmlFile = XElement.Load(path);
switch (xmlFile.Name.LocalName)
{
case "CaseDefinition":
return ImportComponentsFromA(xmlFile);
case "Root":
return ImportComponentsFromB(xmlFile);
}
}
private IEnumerable<UserComponent> ImportComponentsFromA(XContainer file)
{
//do stuff
}
private IEnumerable<UserComponent> ImportComponentsFromB(XContainer file)
{
//do stuff
}
}
As far as I can tell, I could write a class hierarchy for this to do the parsing, but I don't see the advantage here - I'd still have to use a case-switch to determine which class to instantiate. It looks to me like it would be extra complexity for no benefit. If I was going to keep these classes around and do more things with them that depended on the file type, then it would eliminate doing the same switch in multiple places, but this is single-use. Is this right, or is there some reason or technique I'm not seeing that makes it a good idea to use a polymorphic class hierarchy to do this?
If you had, say, an abstract ComponentImporter class, with concrete subclasses FromA and FromB, you could instantiate one of each, and put it in a Map. Then you could call componentImporterMap.get(xmlFile.Name.LocalName).importComponents() and avoid the switch.
As with all design choices, context is key. In this case, you have what seems to be a fairly simple class handling two very similar tasks. If the two Import methods contained very little duplicate code, then including them in a single class is perhaps the best choice since, as you say, it reduces complexity.
However, it's possible you'll use this class in the future, and even add new types of imports. In that case, the class would be more reusable if it was polymorphic.
Additionally, since these methods sound very similar, you're likely to have a bunch of duplicate code, which you could keep in a base class and only put import-specific code in the child classes.
Plus, as Carl mentions, there are numbers of ways to implement this logic without using a case statement.

Encapsulation Aggregation / Composition

The Wikipedia article about encapsulation states:
"Encapsulation also protects the integrity of the component, by preventing users from setting the internal data of the component into an invalid or inconsistent state"
I started a discussion about encapsulation on a forum, in which I asked whether you should always clone objects inside setters and/or getters as to preserve the above rule of encapsulation. I figured that, if you want to make sure the objects inside a main object aren't tampered with outside the main object, you should always clone it.
One discussant argued that you should make a distinction between aggregation and composition in this matter. Basically what I think he ment is this:
If you want to return an object that is part of a composition (for instance, a Point of a Rectangle), clone it.
If you want to return an object that is part of aggregation (for instance, a User as part of a UserManager), just return it without breaking the reference.
That made sense to me too. But now I'm a bit confused. And would like to have your opinions on the matter.
Strictly speaking, does encapulation always mandate cloning?
PS.: I program in PHP, where resource management might be a little more relevant, since it's a scripted language.
Strictly speaking, does encapulation always mandate cloning?
No, it does not.
The person you mention is probably confusing the protection of the state of an object with the protection of the implementation details of an object.
Remember this: Encapsulation is a technique to increase the flexibility of our code. A well encapsulated class can change its implementation without impacting its clients. This is the essence of encapsulation.
Suppose the following class:
class PayRoll {
private List<Employee> employees;
public void addEmployee(Employee employee) {
this.employees.add(employee);
}
public List<Employee> getEmployees() {
return this.employees;
}
}
Now, this class has low encapsulation. You can say the method getEmployees breaks encapsulation because by returning the type List you can no longer change this detail of implementation without affecting the clients of the class. I could not change it for instance for a Map collection without potentially affecting client code.
By cloning the state of your object, you are potentially changing the expected behavior from clients. This is a harmful way to interpret encapsulation.
public List<Employee> getEmployees() {
return this.employees.clone();
}
One could say the code above improves encapsulation in the sense that now addEmployee is the only place where the internal List can be modified from. So If I have a design decision to add the new Employee items at the head of the List instead of at the tail. I can do this modification:
public void addEmployee(Employee employee) {
this.employees.insert(employee); //note "insert" is used instead of "add"
}
However, that is a small increment of the encapsulation for a big price. Your clients are getting the impression of having access to the employees when in fact they only have a copy. So If I wanted to update the telephone number of employee John Doe I could mistakenly access the Employee object expecting the changes to be reflected at the next call to to the PayRoll.getEmployees.
A implementation with higher encapsulation would do something like this:
class PayRoll {
private List<Employee> employees;
public void addEmployee(Employee employee) {
this.employees.add(employee);
}
public Employee getEmployee(int position) {
return this.employees.get(position);
}
public int count() {
return this.employees.size();
}
}
Now, If I want to change the List for a Map I can do so freely.
Furthermore, I am not breaking the behavior the clients are probably expecting: When modifying the Employee object from the PayRoll, these modifications are not lost.
I do not want to extend myself too much, but let me know if this is clear or not. I'd be happy to go on to a more detailed example.
No, encapsulation simply mandates the ability to control state by creating a single access point to that state.
For example if you had a field in a class that you wanted to encapsulate you could create a public method that would be the single access point for getting the value that field contains. Encapsulation is simply this process of creating a single access point around that field.
If you wish to change how that field's value is returned (cloning, etc.) you are free to do so since you know that you control the single avenue to that field.

Where do you add new methods?

When you add a new method to a class where do you put it? At the end of the class...the top? Do you organize methods into specific groupings? Sorted alphabetically?
Just looking for general practices in keeping class methods organized.
Update When grouped where do you add the new method in the group? Just tack on the end or do you use some sort of sub-grouping, sorting?
Update 2 Mmmm...guess the question isn't as clear as I thought. I'm not really looking for class organization. I'm specifically interested in adding a new method to an existing class. For example:
public class Attendant
{
public void GetDrinks(){}
public void WelcomeGuests(){}
public void PickUpTrask(){}
public void StrapIn(){}
}
Now we're going to add a new method PrepareForCrash(). Where does it go? At the top of the list, bottom, alphabetically or near the StrapIn() method since it's related.
Near "StrapIn" because it's related. That way if you refactor later, all related code is nearby.
Most code editors allow you to browse method names alphabetically in another pane, so organizing your code functionally makes sense within the actual code itself. Group functional methods together, makes life easier when navigating through the class.
For goodness sake, not alphabetically!
I tend to group my functions in the order I expect them to be called during the life of the object, so that a top to bottom read of the header file tends to explain the operation of the class.
I think it's a personal choice.
However I like to organise my classes as such.
public class classname
{
<member variables>
<constructors>
<destructor>
<public methods>
<protected methods>
<private methods>
}
The reason for this is as such.
Member variables at the top
To see what member variables exist and if they are initialised.
Constructors
To see if the member variables are setup/initialised as well as what are all the construction options for the class.
Destructor
To see the how the class is cleaned up and verify it with the constructors and member variables.
Public methods
To see what are the available contracts callers of the object can use.
Protected methods
To see what inherited classes would be using.
Private methods
As it's information about the internals of the class if you needed to know about the internals you can just scroll straight to the end quickly. But to know the interface for the class it's all at the start.
UPDATE - Based on OP's update
Logically a good way would be to organise the methods by categories of what they do.
This way you get the readabilty of categorising your methods as well as the alphabetical search from you IDE (provided this is in your IDE).
However in a practical sense I think placing the methods at the end of that section is the best way. It would be quite hard to continually police where each method goes, as it's subjective, for every method if the code is shared by more than yourself.
If you were to make this a standard it'd be quite hard to provide the boundaries for where to put each method.
What I like about C# and VB.net is the ability to use #region tags, so generally my classes look like this
class MyClass
{
#region Constructors
public MyClass()
{
}
public MyClass(int x)
{
_x = x;
}
#endregion
#region Members
private int _x;
#endregion
#region methods
public void DoSomething()
{
}
#endregion
#region Properties
public int Y {get; private set;}
#endregion
}
So basically You put similar things together so you can collapse everything to definition and get to your stuff really faster.
Generally, it depends on the existing grouping; if there's an existing grouping that the new method fits into, I'll put it there. For example, if there's a grouping of operators, I'll put the new method with the operators if it's an operator.
Of course, if there is no good grouping, adding a method may suggest a new grouping; I treat that as an opportunity for refactoring, and try to regroup the existing operators where reasonable.
I organize all methods into regions like public methods, private methods or sometimes by features like Saving methods, etc..
IMHO:
If you organize your methods alphabetically, put a new one depends on its name. Otherwise put it at the bottom of related group. This helps to know, what method is newer. The bigger problem is how to organize methods in groups, e.g. depend on what properties, but this is more individual for everyone and depends on a specific class.

Resources