Is there such a thing as ‘class bloat’ - i.e. too many classes causing inefficiencies?

Is there such a thing as ‘class bloat’ - i.e. too many classes causing inefficiencies? - parse-platform

E.g. let’s consider I have the following classes:
Item
ItemProperty which would include objects such as Colour and Size. There's a relation-property of the Item class which lists all of the ItemProperty objects applicable to this Item (i.e. for one item you might need to specify the Colour and for another you might want to specify the Size).
ItemPropertyOption would include objects such as Red, Green (for Colour) and Big, Small (for Size).
Then an Item Object would relate to an ItemProperty, whereas an ItemChoice Object would relate to an ItemPropertyOption (and the ItemProperty which the ItemPropertyOption refers to could be inferred).
The reason for this is so I could then make use of queries much more effectively. i.e. give me all item-choices which are Red. It would also allow me to use the Parse Dashboard to quickly add elements to the site as I could easily specify more ItemPropertys and ItemPropertyOptions, rather than having to add them in the codebase.
This is just a small example and there's many more instances where I'd like to use classes so that 'options' for various drop-downs in forms are in the database and can easily be added and edited by me, rather than hard-coded.
1) I’ll probably be doing this in a similar way for 5+ more similar kinds of class-structures
2) there could be hundreds of nested properties that I want to access via ‘inverse querying’
So, I can think of 2 potential causes of inefficiency and wanted to know if they’re founded:
Is having lots of classes inefficient?
Is back-querying against nested classes inefficient?
The other option I can think of — if ‘class-bloat’ really is a problem — is to make fields on parent classes that, instead of being nested across other classes (that represent further properties, as above), just representing them as a nested JSON property directly.

The job of designing is to render in object descriptions truths about the world that are relevent to the system's requirements. In the world of the OP's "items", it's a fact that items have color, and it's a relevant fact because users care about an item's color. You'd only call a system inefficient if it consumes computing resources that it doesn't need to consume.
So, for something like a configurator, the fact that we have items, and that those items have properties, and those properties have an enumerable set of possible values sounds like a perfectly rational design.
Is it inefficient or "bloated"? The only place I'd raise doubt is in the explicit assertion that items have properties. Of course they do, but that's natively true of javascript objects and parse entities.
In other words, you might be able to get along with just item and several flavors of propertyOptions: e.g. Item has an attribute called "colorProperty" that is a pointer to an instance of "ColorProperty" (whose instances have a name property like 'red', 'green', etc. and maybe describe other pertinent facts, like a more precise description in RGB form).
There's nothing wrong with lots of classes if they represent relevant truth. Do that first. You might discover empirically that your design is too resource consumptive (I doubt you will in this case), at which point we'd start looking for cheats to be somehow skinnier. But do it the right way first, cheat later only if you must.

Is having lots of classes inefficient?
It's certainly inefficient for poor humans who have to remember what all those classes do and how they're related to each other. It takes time to write all those classes in the first place, and every line that you write is a line that has to be maintained.
Beyond that, there's certainly some cost for each class in any OOP language, and creating more classes than you really need will mean that you're paying more than you need to for the work that you're doing, which is pretty much the definition of inefficient.
I’ll probably be doing this in a similar way for 5+ more similar kinds of class-structures
Maybe you could spend some time thinking about the similarity between these cases and come up with a single set of more flexible classes that you can use in all those cases. Writing general code is harder than writing very specific code, but if you do a good job you'll recoup the extra effort many times over through reuse.

Related

Refactoring methods in existing code base with huge number of parameters

I have inherited an existing code base where the "features" are as follows:
huge monolithic classes with
(literally) 100's of member variables
and methods that go one for pages
(er. screens)
public and private methods with a large number of arguments.
I am trying to clean up and refactor the code, to leave it a little better
than how I found it. So my questions
is worth it (or do you) refactor methods with 10 or so arguments so that they are more readable ?
are there best practices on how long methods should be ? How long do you usually keep them?
are monolithic classes bad ?

is worth it (or do you) refactor methods with 10 or so arguments so that they are more readable ?
Yes, it is worth it. It is typically more important to refactor methods that are not "reasonable" than ones that already are nice, short, and have a small argument list.
Typically, if you have many arguments, it's because a method does too much - most likely, it should be a class of it's own, not a method.
That being said, in those cases when many parameters are required, it's best to encapsulate the parameters into a single class (ie: SpecificAlgorithmOptions), and pass one instance of that class. This way, you can provide clean defaults, and its very obvious which methods are essential vs. optional (based on what is required to construct the options class).
are there best practices on how long methods should be ? How long do you usually keep them?
A method should be as short as possible. It should have one purpose, and be used for one task, whenver possible. If it's possible to split it into separate methods, where each as a real, qualitative "task", then do so when refactoring.
are monolithic classes bad ?
Yes.

if the code is working and there is no need to touch it, i wouldn't refactor. i only refactor very problematic cases if i anyway have to touch them (either for extending them for functionality or bug-fixing). I favor the pragmatic way: Only (in 95%) touch, what you change.
Some first thoughts on your specific problem (though in detail it is difficult without knowing the code):
start to group instance variables, these groups will then be target to do 'extract class'
when having grouped these variables you hopefully can group some methods, which also be moved when doing 'extract class'
often there are many methods which aren't using any fields. make them static (they most likely are helper methods, which can be extracted to helper-classes.
in case non-related instance fields are mixed in many methods, do loads of 'extract method'
use automatic refactoring tools as much as possible, because you most likely have no tests in place and automation is more safe.
Regarding your other concrete questions.
is worth it (or do you) refactor methods with 10 or so arguments so that they are more readable?
definetely. 10 parameters are too many to grasp for us humans. most likely the method is doing too much.
are there best practices on how long methods should be ? How long do you usually keep them?
it depends... on preferences. i stated some things on this thread (though the question was PHP). still i would apply these numbers/metrics to any language.
are monolithic classes bad ?
it depends, what you mean with monolithic. if you mean many instance variables, endless methods, a lot of if/else complexity, yes.
also have a look at a real gem (to me a must have for every developer): working effectively with legacy code

Assuming the code is functioning I would suggest you think about these questions first:
is the code well documented?
do you understand the code?
how often are new features being added?
how often are bugs reported and fixed?
how difficult is it to modify and fix the code?
what is the expected life of the code?
how many versions of the compiler are you behind (if at all)?
is the OS it runs on expected to change during its lifetime?
If the system will be replaced in five years, is documented well, will undergo few changes, and bugs are easy to fix - leave it alone regardless of the size of the classes and the number of parameters. If you are determined to refactor make a list of your refactoring proposals in the order of maximum benefit with minimum changes and attack it incrementally.

MVC Views: Display Logic?

I've been reading this paper: Enforcing Strict Model-View Separation in Template Engines (PDF).
It contends that you only need four constructs within the views:
attribute reference
conditional template inclusion based upon presence/absence of an attribute
recursive template references
template application to a multi-valued attribute similar to lambda functions
and LISP’s map operator
No other logic is allowed within the view - for example if (attr < 5) would not be allowed, only if (valueLessThanFive)
Is this reasonable? Most MVC frameworks allow any logic within the views, which can lead to business logic creeping into the view. But is it really possible to get by with only those four constructs?
For "zebra striped" tables, the paper suggests mapping a list of templates onto the list of data - for example:
map(myList, [oddRowTemplate, evenRowTemplate])
But what if you want to make the first and last rows different styles? What if the 3rd row should be purple on Tuesdays because your graphic designer is mad? These are view-specific (for example if I was outputting the same data as XML I wouldn't use them) so don't seem to belong in the model or controller.
In summary:
Do you need logic above the four constructs listed above within views?
If so, how do you restrict business logic creeping in?
If not, how would you make the 3rd row purple on Tuesdays?

Terence Parr is a very smart guy and his paper has much to commend it, but from a practical point of view I have found using just these constructs somewhat limiting.
It is difficult to prevent business logic creeping in especially if you have the ability to do anything, as for example ASP.NET and JSP would give you. It boils down to how you spend your time:
Allow limited additional functionality (I'm not an advocate for "anything goes") and use code review mechanisms to ensure correct usage, or
Restrict to the four constructs above, and spend more time providing attributes like valueLessThanFive (remembering to rename it to valueLessThanSix when the business requirement changes, or adding valueMoreThanThree - it's a bit facetious as an example but I think you'll know what I'm getting at).
In practice, I find that allowing conditionals and looping constructs is beneficial, as is allowing property traversal such as attr[index].value in template expressions. That allows presentational logic to be effectively managed, while incurring only a small risk of misuse.
Allowing more functionality such as arbitrary method calls gets progressively more "dangerous" (in terms of facilitating misuse). It comes down, to some extent, to the development culture in place in your environment, the development processes, and the level of skill and experience in your team.
Another factor is whether, in your environment, you have the luxury of enforcing strict separation of work between presentation and logic, in terms of having dedicated non-programmer designers who would be fazed by advanced constructs in the template. In that event, you would likely be better off with the more restricted template functionality.

To answer your question about the 3rd row purple on Tuesdays:
The original (or one of the very early) MVC patterns had the 'View' being a data-view, there was no concept of a UI in the pattern. Modern versions of the MVC pattern have felt the need for this data-view which is why we have things like MVVM, MVP, even MV-poo. Once you can create a 'view' of your data that's specific to the UI View it's easier to solve lots of the concerns.
In our case our 'model' is going to get extra properties on it, such as Style or Colour (style is better as it still lets the view define how that style is represented). The controller will take raw 'model' items and present to the view custom 'model' items with this extra Style, giving the 3rd row on Tuesdays a style of 'MadDesignerSpecial', which the view uses to apply the purple colour.

Question about object oriented design with Ruby

I'm thinking of writing a CLI Monopoly game in Ruby. This would be the first large project I've done in Ruby. Most of my experience in programming has been with functional programming languages like Clojure and Haskell and such. I understand Object Orientation pretty well, but I have no experience with designing object oriented programs.
Now, here's the deal.
In Monopoly, there are lots of spaces spread around the board. Most spaces are properties, and others do other things.
Would it be smart to have a class for each of the spaces? I was thinking of having a Space class that all other spaces inherit from, and having a Property class that inherits from Space, and then having a class for every property that inherits from Property. This would mean a lot of classes, which leads me to believe that this is an awful way to do what I'm trying to do.
What I also intended to do, was use the 'inherited' hook method to keep track of all the properties, so that I could search them and remove them from the unbought list when needed.
This sort of problem seems to arise in a lot of programs, so my question is: Is there a better way to do this, and am I missing something very crucial to Object Oriented design?
I'm sorry if this is a stupid question, I'm just clueless when it comes to OOPD.
Thanks.

You're on the right track, but you've gone too far, which is a common beginner's mistake with OOP. Every property should not be a separate class; they should all be instances of the Property class. I'd do classes with attributes along these lines:
Space
Name
Which pieces are on it
Which space is next (and maybe previous?)
Any special actions which take place when landing on it
Property (extends Space)
Who owns it
How many houses/hotels are on it
Property value
Property monopoly group
Rent rates
Whether it is mortgaged
So, for example, Boardwalk would be an object of type Property, with specific values that apply to it, such as belonging to the dark blue monopoly group.

Classes for spaces is probably not a bad idea; but let's see why that's the case.
If you were writing this in a procedural language, where would the majority of the 'if' and 'switch' statements be? They'd be in determining what happened to each player when they landed on a space. In OO, we want to prevent as many if/switch statements as possible, and replace them with polymorphism. So, clearly, if we create many derivatives of Space, we will prevent a large number of if/switch statements.
But before we make that leap, let's look at the metaphor of the game. The game is all about a city. Addresses, streets, utilities, public works, jails, etc. Each square on the board represents some kind of location in the city. Indeed the players travel through the city paying rent (or some other action) anywhere they happen to land.
So let's call the spaces, CityBlocks. Each CityBlock has an address, like "Boardwalk" or "Electric Company", or "Community Chest". Are there different types of CityBlock objects? Or should we simply consider CityBlocks to be locations that have a certain kind of ZoningOrdinance?
I kind of like that. |CityBlock|---->|ZoningOrdinance|
Now we could have several different derivatives of ZoningOrdinance. This is the "Strategy" Pattern. I rather like the flexibility this give us, and the separation of 'where' and 'what' that it provides.
You'll also need a Game class that understand the rules of dice, tokens, FreeParking, Passing Go, etc. That class will manipulate the players, and invoke the methods on CityBlock. CityBlock will invoke methods on ZoningOrdinance.
Anyway, that's one way...

What is an elegant way to track the size of a set of objects without a single authoritative collection to reference?

Update: Please read this question in the context of design principles, elegance, expression of intent, and especially the "signals" sent to other programmers by design choices.
I have two "views" of a set of objects. One is a dictionary/map indexing the objects by a string value. The other is a dictionary/map indexing the objects by an ordinal (ordering integer). There is no "master" collection of the objects by themselves that can serve as the authoritative source for the number of objects, but the two dictionaries should always both contain references to all the objects.
When a new item is added to the set a reference is added to both dictionaries, and then some processing needs to be done which is affected by the new total number of objects.
What should I use as the authoritative source to reference for the current size of the set of objects? It seems that all my options are flawed in one dimension or another. I can just consistently reference one of the dictionaries, but that would codify an implication of that dictionary's superiority over the other. I could add a 3rd collection, a simple list of the objects to serve as the authoritative list, but that increases redundancy. Storing a running count seems simplest, but also increases redundancy and is more brittle than referencing a collection's self-tracked count on the fly.
Is there another option that will allow me to avoid choosing the lesser evil, or will I have to accept a compromise on elegance?

I would create a class that has (at least) two collections.
A version of the collection that is
sorted by string
A version of the
collection that is sorted by ordinal
(Optional) A master collection
The class would handle the nitty gritty management:
The syncing of the contents for the collections
Standard collection actions (e.g. Allow users get the size, Add or retrieve items)
Let users get by string or ordinal
That way you can use the same collection wherever you need either behavior, but still abstract away the "indexing" behavior you are going for.
The separate class gives you a single interface with which to explain your intent regarding how this class is to be used.

I'd suggest encapsulation: create a class that hides the "management" details (such as the current count) and use it to expose immutable "views" of the two collections.
Clients will ask the "manglement" object for an appropriate reference to one of the collections.
Clients adding a "term" (for lack of a better word) to the collections will do so through the "manglement" object.
This way your assumptions and implementation choices are "hidden" from clients of the service and you can document that the choice of collection for size/count was arbitrary. Future maintainers can change how the count is managed without breaking clients.
BTW, yes, I meant "manglement" - my favorite malapropism for management (in any context!)

If both dictionaries contain references to every object, the count should be the same for both of them, correct? If so, just pick one and be consistent.

I don't think it is a big deal at all. Just reference the sets in the same order each time
you need to get access to them.
If you really are concerned about it you could encapsulate the collections with a wrapper that exposes the public interfaces - like
Add(item)
Count()
This way it will always be consistent and atomic - or at least you could implement it that way.
But, I don't think it is a big deal.

How do you feel about code folding? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
For those of you in the Visual Studio environment, how do you feel about wrapping any of your code in #regions? (or if any other IDE has something similar...)

9 out of 10 times, code folding means that you have failed to use the SoC principle for what its worth.
I more or less feel the same thing about partial classes. If you have a piece of code you think is too big you need to chop it up in manageable (and reusable) parts, not hide or split it up.It will bite you the next time someone needs to change it, and cannot see the logic hidden in a 250 line monster of a method.
Whenever you can, pull some code out of the main class, and into a helper or factory class.
foreach (var item in Items)
{
//.. 100 lines of validation and data logic..
}
is not as readable as
foreach (var item in Items)
{
if (ValidatorClass.Validate(item))
RepositoryClass.Update(item);
}
My $0.02 anyways.

This was talked about on Coding Horror.
My personal belief is that is that they are useful, but like anything in excess can be too much.
I use it to order my code blocks into:
Enumerations
Declarations
Constructors
Methods
Event Handlers
Properties

Sometimes you might find yourself working on a team where #regions are encouraged or required. If you're like me and you can't stand messing around with folded code you can turn off outlining for C#:
Options -> Text Editor -> C# -> Advanced Tab
Uncheck "Enter outlining mode when files open"

I use #Region to hide ugly and useless automatically generated code, which really belongs in the automatically generated part of the partial class. But, when working with old projects or upgraded projects, you don't always have that luxury.
As for other types of folding, I fold Functions all the time. If you name the function well, you will never have to look inside unless you're testing something or (re-)writing it.

While I understand the problem that Jeff, et. al. have with regions, what I don't understand is why hitting CTRL+M,CTRL+L to expand all regions in a file is so difficult to deal with.

I use Textmate (Mac only) which has Code folding and I find it really useful for folding functions, I know what my "getGet" function does, I don't need it taking up 10 lines of oh so valuable screen space.
I never use it to hide a for loop, if statement or similar unless showing the code to someone else where I will hide code they have seen to avoid showing the same code twice.

I prefer partial classes as opposed to regions.
Extensive use of regions by others also give me the impression that someone, somewhere, is violating the Single Responsibility Principle and is trying to do too many things with one object.

#Tom
Partial classes are provided so that you can separate tool auto-generated code from any customisations you may need to make after the code gen has done its bit. This means your code stays intact after you re-run the codegen and doesn't get overwritten. This is a good thing.

I'm not a fan of partial classes - I try to develop my classes such that each class has a very clear, single issue for which it's responsible. To that end, I don't believe that something with a clear responsibility should be split across multiple files. That's why I don't like partial classes.
With that said, I'm on the fence about regions. For the most part, I don't use them; however, I work with code every day that includes regions - some people go really heavy on them (folding up private methods into a region and then each method folded into its own region), and some people go light on them (folding up enums, folding up attributes, etc). My general rule of thumb, as of now, is that I only put code in regions if (a) the data is likely to remain static or will not be touched very often (like enums), or (b) if there are methods that are implemented out of necessity because of subclassing or abstract method implementation, but, again, won't be touched very often.

Regions must never be used inside methods. They may be used to group methods but this must be handled with extreme caution so that the reader of the code does not go insane. There is no point in folding methods by their modifiers. But sometimes folding may increase readability. For e.g. grouping some methods that you use for working around some issues when using an external library and you won't want to visit too often may be helpful. But the coder must always seek for solutions like wrapping the library with appropriate classes in this particular example. When all else fails, use folding for improving readibility.

This is just one of those silly discussions that lead to nowhere. If you like regions, use them. If you don't, configure your editor to turn them off. There, everybody is happy.

I generally find that when dealing with code like Events in C# where there's about 10 lines of code that are actually just part of an event declaration (the EventArgs class the delegate declaration and the event declaration) Putting a region around them and then folding them out of the way makes it a little more readable.

Region folding would be fine if I didn't have to manually maintain region groupings based on features of my code that are intrinsic to the language. For example, the compiler already knows it's a constructor. The IDE's code model already knows it's a constructor. But if I want to see a view of the code where the constructors are grouped together, for some reason I have to restate the fact that these things are constructors, by physically placing them together and then putting a group around them. The same goes for any other way of slicing up a class/struct/interface. What if I change my mind and want to see the public/protected/private stuff separated out into groups first, and then grouped by member kind?
Using regions to mark out public properties (for example) is as bad as entering a redundant comment that adds nothing to what is already discernible from the code itself.
Anyway, to avoid having to use regions for that purpose, I wrote a free, open source Visual Studio 2008 IDE add-in called Ora. It provides a grouped view automatically, making it far less necessary to maintain physical grouping or to use regions. You may find it useful.

I think that it's a useful tool, when used properly. In many cases, I feel that methods and enumerations and other things that are often folded should be little black boxes. Unless you must look at them for some reason, their contents don't matter and should be as hidden as possible. However, I never fold private methods, comments, or inner classes. Methods and enums are really the only things I fold.

My approach is similar to a few others here, using regions to organize code blocks into constructors, properties, events, etc.
There's an excellent set of VS.NET macros by Roland Weigelt available from his blog entry, Better Keyboard Support for #region ... #endregion. I've been using these for years, mapping ctrl+. to collapse the current region and ctrl++ to expand it. Find that it works a lot better that the default VS.NET functionality which folds/unfolds everything.

I personally use #Regions all the time. I find that it helps me to keep things like properties, declarations, etc separated from each other.
This is probably a good answer, too!
Coding Horror
Edit: Dang, Pat beat me to this!

The Coding Horror article actual got me thinking about this as well.
Generally, I large classes I will put a region around the member variables, constants, and properties to reduce the amount of text I have to scroll through and leave everything else outside of a region. On forms I will generally group things into "member variables, constants, and properties", form functions, and event handlers. Once again, this is more so I don't have to scroll through a lot of text when I just want to review some event handlers.

I prefer #regions myself, but an old coworker couldn't stand to have things hidden. I understood his point once I worked on a page with 7 #regions, at least 3 of which had been auto-generated and had the same name, but in general I think they're a useful way of splitting things up and keeping everything less cluttered.

I really don't have a problem with using #region to organize code. Personally, I'll usually setup different regions for things like properties, event handlers, and public/private methods.

Eclipse does some of this in Java (or PHP with plugins) on its own. Allows you to fold functions and such. I tend to like it. If I know what a function does and I am not working on it, I dont need to look at it.

Emacs has a folding minor mode, but I only fire it up occasionally. Mostly when I'm working on some monstrosity inherited from another physicist who evidently had less instruction or took less care about his/her coding practices.

Using regions (or otherwise folding code) should have nothing to do with code smells (or hiding them) or any other idea of hiding code you don't want people to "easily" see.
Regions and code folding is really all about providing a way to easily group sections of code that can be collapsed/folded/hidden to minimize the amount of extraneous "noise" around what you are currently working on. If you set things up correctly (meaning actually name your regions something useful, like the name of the method contained) then you can collapse everything except for the function you are currently editing and still maintain some level of context without having to actually see the other code lines.
There probably should be some best practice type guidelines around these ideas, but I use regions extensively to provide a standard structure to my code files (I group events, class-wide fields, private properties/methods, public properties/methods). Each method or property also has a region, where the region name is the method/property name. If I have a bunch of overloaded methods, the region name is the full signature and then that entire group is wrapped in a region that is just the function name.

I personally hate regions. The only code that should be in regions in my opinion is generated code.
When I open file I always start with Ctrl+M+O. This folds to method level. When you have regions you see nothing but region names.
Before checking in I group methods/fields logically so that it looks ok after Ctrl+M+O.
If you need regions you have to much lines in your class. I also find that this is very common.
region ThisLooksLikeWellOrganizedCodeBecauseIUseRegions
// total garbage, no structure here
endregion

Enumerations
Properties
.ctors
Methods
Event Handlers
That's all I use regions for. I had no idea you could use them inside of methods.
Sounds like a terrible idea :)

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio