What is the name of this anti-pattern? - performance

Surely some of you have dealt with this one. It tends to happen when programmers get a bit too taken with OO and forget about performance and the fact that there is a database underneath.
For example, let's say we have an Email table, and the emails need to be sent by this program. At start-up, it looks for anything that needs to be sent as follows:
Emails = find_every_damn_email_in_the_database();
FOR Email in Emails
    IF !Email.IsSent() THEN Email.Send()
This is nice from a don't-repeat-yourself perspective, but sometimes a dedicated query is unavoidable, and it should be:
Emails = find_unsent_emails();
FOR Email in Emails
    Email.Send()
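For concreteness, here is a minimal sketch of what find_unsent_emails() might do, assuming a hypothetical run_query helper and a sent_at column (neither is implied by the pseudocode above); the only point is that the WHERE clause moves the filter into the database:

#include <string>
#include <vector>

struct Email {
    void Send() { /* hand the message to the mail subsystem */ }
};

// Hypothetical helper standing in for whatever data-access layer is in use
// (an ORM call, sqlite3, ODBC, ...): it runs a query and maps rows to Emails.
std::vector<Email> run_query(const std::string& sql) {
    (void)sql;   // a real implementation would execute the SQL here
    return {};
}

// The filter lives in the database, so only unsent rows ever cross the wire.
std::vector<Email> find_unsent_emails() {
    return run_query("SELECT * FROM emails WHERE sent_at IS NULL");
}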
Is there a name for this one?

I'll have a go at it and coin the name "the lazy filter (anti) pattern".

I saw that once. That programmer wasn't around too long.
We called that the "firehose method".

To me it's Joel Spolsky's leaky abstraction.
It's not exactly an anti-pattern, but whoever wrote this code didn't really understand where the Active Record pattern's abstraction leaks.

I call that "The Shotgun Approach".

I'm not sure this is necessarily database related, since you could have a complex and expensive procedure (e.g., more than a flag) for applying a filter to a group.
I don't think there's a name for it, since the first design is simply not good: it violates the single-responsibility principle. If you search, filter, and then print the filtered results, you are doing multiple things, so you need to refactor it into "search filtered" and then print.
The only thing that distinguishes this from a simple refactoring is that it also affects performance, in the same way that inner loops can be designed in ways that harm performance.

It appears to have derived from the following anti-patterns:
Standing On The Shoulders Of Midgets
If It Is Working, Don't Change
The original developer may not have been allowed to write the find_unsent_emails() implementation, and would therefore have reused the midget function. And then, why change it after development and testing?

This is frequently due to it being a lot easier to use an existing query and then filter in code than to get a new SQL query added. Maybe because the DBAs control all queries and getting a new query approved takes days, or maybe because the ORM tool you're using makes it very difficult to define your own custom queries.
If I were to name it I'd call it the "Easy Way Out" (anti)pattern. Whether it's an antipattern or not really depends on the individual situation. If it will always be a fairly small number of items you need to retrieve, doing the filtering in code really isn't a big problem. But if the number of items is large and has the potential to continually grow, then obviously the filtering should be done on the server.

I've seen similar issues elsewhere, where instead of a simple array of things to do, there was a "transaction cluster" based on a "list cluster" based on a "collection cluster" based on a "memory cluster". Needless to say, the simplest thing turned into a great big freakin' deal.
I called it galloping generality.

Stoopid Amateurs.
Seriously, I've only seen this one in people with Computer Science degrees and no professional experience at all. When I was teaching at Duke, my advisor and I ran a "Large Scale Programming" class where we made people look at exactly these sorts of errors.

The performance of the first one can actually be fine, depending on the type of Emails. If it's just an iterator (think of std::vector::begin() in C++) then it's fine and better than storing all unsent e-mails in some container first.
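To make that distinction concrete, here is a small C++ sketch; Email and EmailCursor are made-up stand-ins for a real result-set iterator, not anything from the question:

#include <cstdio>
#include <optional>
#include <vector>

struct Email {
    int id = 0;
    bool sent = false;
    void Send() { std::printf("sending email %d\n", id); }
};

// Eager: every row is copied into memory before any filtering happens.
std::vector<Email> find_every_damn_email_in_the_database() {
    return { {1, true}, {2, false}, {3, false} };   // stand-in for a full table scan
}

// Lazy: a forward-only cursor that produces one row per call, standing in for a
// real database cursor or result-set iterator.
struct EmailCursor {
    int next_id = 1;
    std::optional<Email> next() {
        if (next_id > 3) return std::nullopt;   // pretend the result set has 3 rows
        Email e{ next_id, next_id == 1 };       // row 1 already sent, the rest unsent
        ++next_id;
        return e;
    }
};

int main() {
    EmailCursor cursor;
    // Only one Email is held at a time; no container of all rows is ever built.
    while (auto email = cursor.next()) {
        if (!email->sent) email->Send();
    }
}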

This antipattern has several possible names.
"Don't-know-SQL" antipattern
"Fascist-DBA" antipattern
"What-does-'latency'-mean?" antipattern

There is a nice example at The Daily WTF.

Inspired partly by 1800's "the lazy filter (anti) pattern", how about "dysfunctional programming" (i.e., the opposite of functional programming)?

Related

Bootstrapping complex data sets

The application I am working on has "default lists", one of which has already been created in the app. A list has events, and events touch 2-3 other models, which would make seeding very time consuming due to the complexity of the lists and the associated models the list has data in.
Due to the complexity of the lists, I would prefer to build them through the UI and then extract them for later use.
Is there any worthwhile way of extracting the aforementioned list object and, for lack of a better term, "bootstrapping" it?
Thanks for your help in advance.
I think what you are trying to get at is seed data. Take a look at this Railscast on just that.
Solution: https://github.com/rhalff/seed_dump
I highly enjoy the comments on the github page
It mainly exists for people who are too lazy writing create statements
in db/seeds.rb themselves and need something (seed_dump) to dump data
from the table(s) into seeds.rb
My response to that is "work smart, not hard": no need for me to spend a day or two writing out long seeds instead of doing actual work.
Unless I'm hungover; then I'll just pretend seed_dump is on the fritz ;)

Is it easy to abuse the observer pattern?

I have a project where I am using the observer pattern extensively for the first time. One thing I've found, though, is that if I inspect a typical object in this project, it tends to be astonishingly large with all of the observers and observables, especially when an observer has observers of its own, and so on.
That seems to be beside the point, since the performance is fine. But I've found that occasionally, when I'm in the debugger, trying to print an instance variable will lock up my machine until I kill the process. This has me concerned that there is some opportunity for this to happen while the code is in production, or that this is just a warning that I am abusing the pattern.
Any tips, suggestions?
TL;DR: Yep, but that doesn't mean it isn't a perfect fit sometimes.
"Astonishingly large" implies... it's pretty large; what does that actually mean? How many observers/observables are there? Are they deeply nested?
IMO the correlation between doing stuff in a debugger and "real life" isn't particularly strong; has it ever locked up in production or testing? I'd be more likely to think it's an artifact of the debugging process/app.
"Spooky action at a distance" creates non-locality that must be understood in order to reason correctly about code and behavior. This kind of development needs to be groomed aggressively; rather than saying "I'll just create a new observer", architect it in, and keep reasoning as linear as possible.
You could override the inspect method to be less verbose.
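The question doesn't name a language, but as a generic illustration, here is a minimal C++ sketch (all type names are made up) of an observer that is itself observable - which is where object graphs start to balloon - plus a debug summary that prints a count instead of dumping the whole graph, in the spirit of overriding inspect:

#include <cstdio>
#include <string>
#include <vector>

// Minimal observer wiring: a Subject keeps pointers to its Observers.
struct Observer {
    virtual ~Observer() = default;
    virtual void notify(const std::string& event) = 0;
};

struct Subject {
    std::vector<Observer*> observers;
    void attach(Observer* o) { observers.push_back(o); }
    void emit(const std::string& event) {
        for (Observer* o : observers) o->notify(event);
    }
};

// An observer that is itself observable: this is where object graphs
// (and naive debug dumps of them) start to explode.
struct RelayObserver : Observer, Subject {
    void notify(const std::string& event) override {
        emit(event);   // forward the event downstream
    }
    // Analogue of overriding inspect: summarize instead of recursively
    // printing every observer of every observer.
    void debug_summary() const {
        std::printf("RelayObserver with %zu downstream observer(s)\n",
                    observers.size());
    }
};

int main() {
    struct Printer : Observer {
        void notify(const std::string& event) override {
            std::printf("got: %s\n", event.c_str());
        }
    } printer;

    RelayObserver relay;
    relay.attach(&printer);

    Subject source;
    source.attach(&relay);
    source.emit("hello");
    relay.debug_summary();
}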

Web site performance - what is it about? Readability and ignorance Vs. Performance

Let me cut to the chase...
On one hand, much of the programming advice given (here and in other places) emphasizes the notion that code should always be as readable and as clear as possible, at (almost?!) any performance cost.
On the other hand there are SO many slow web sites (at least one of which I know from personal experience).
Obviously round trips and DB access are issues a web developer should always keep in mind, but the trade-off between readability and what not to do because it slows things down is, for me, very unclear.
My questions are: 1. What else? 2. Is there a rule (preferably simple, but probably quite general) one should adhere to in order to make sure one's code does not slow things down too much?
General best practices as well as specific advice would be much appreciated. Advice based on experience would be especially appreciated.
Thanks.
Edit: A little clarification: general performance advice isn't hard to find. That's not what I'm looking for. I'm asking about two things: 1. While trying to make my code as readable as possible, when should I stop and say: "Now I'm hurting performance too much"? 2. Little, less-known things, like: is selecting just one column faster than selecting all (thanks Otávio)... Thanks again!
See the Stack Overflow discussion here:
What is the most important effect on performance in a database-backed web application?
The top voted answer was, "write it clean, and use a profiler to identify real problems and address them."
In my experience, the biggest mistake (using C#/asp.net/linq) is over-querying due to LINQ's ease-of-use. One huge query is usually much faster than 10000 small ones.
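To picture the over-querying trap in a language-agnostic way, here is a sketch in C++ with a made-up exec_query helper standing in for whatever data-access layer (LINQ or otherwise) is in use:

#include <string>
#include <vector>

// Made-up helper standing in for the real data-access layer.
void exec_query(const std::string& sql) { (void)sql; }

int main() {
    std::vector<int> order_ids(10000);   // imagine these came from an earlier query

    // Over-querying: one round trip per id - 10000 small queries.
    for (int id : order_ids) {
        exec_query("SELECT id, total FROM order_lines WHERE order_id = " + std::to_string(id));
    }

    // Batched: one round trip that lets the database do the joining and filtering.
    exec_query("SELECT l.id, l.total FROM order_lines l JOIN orders o ON o.id = l.order_id");
}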
The other ASP.NET gotcha I see a lot is when the viewstate gets extremely fat and bloated. EnableViewState=false is your best friend, start every new project with it!
For web applications that have a database back end, it is extremely important that:
indexing is done properly
retrieval is done for what is needed (avoid SELECT * when selecting specific fields will do - even more so if they are part of a covered index; see the sketch below)
Also, whenever possible, an appropriate caching strategy can help performance.
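A small sketch of the indexing and narrow-select points above, with made-up table, column, and helper names:

#include <string>

// Made-up helper standing in for the data-access layer.
void exec_sql(const std::string& sql) { (void)sql; }

int main() {
    // Index the columns the hot query filters on and returns, so the query can be
    // answered from the index alone (a covered query).
    exec_sql("CREATE INDEX idx_orders_customer ON orders (customer_id, status, total)");

    // Retrieve only what is needed instead of SELECT *.
    exec_sql("SELECT status, total FROM orders WHERE customer_id = 42");
}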
Optimizing your code:
While making your code as readable as possible is very important, optimizing it is equally important. I've listed some items that will hopefully point you in the right direction.
For example in regards to Databases:
When you define the schema of your database, you should make sure that it is normalized and the indexes of fields are defined properly.
When running a query, specifically SELECT, only select the fields you need.
You should only make one connection to the database per page load.
Re-factor. This is probably the most important factor in producing clean, optimized code. Always go back and look at your code and see what can be done to improve it.
PHP Code:
Always test your work with a tool like PHPUnit.
echo is faster than print.
Wrapping your string in single quotes (') instead of double quotes (") is faster, because PHP searches for variables inside "..." and not inside '...'; use this when there are no variables that need evaluating in your string.
Use echo's multiple parameters (or stacked echo statements) instead of string concatenation.
Unset or null your variables to free memory, especially large arrays.
Use strict code, avoid suppressing errors, notices, and warnings, resulting in cleaner code and less overhead. Consider having error_reporting(E_ALL) always on.
Incrementing an undefined local variable is 9-10 times slower than a pre-initialized one.
Methods in derived classes run faster than ones defined in the base class.
Error suppression with @ is very slow.
Website Optimization
A good place to start is here (http://developer.yahoo.com/performance/rules.html)
Performance is a huge topic and there are a lot of things that you can do to help improve the performance of your website. It's something that takes time and experience.
Best,
Richard Castera
Scott and Rcastera did a good job covering DB and querying optimization. To address your question from an HTML / CSS / JavaScript standpoint:
CSS:
Readability is key. CSS is rendered so fast that you should never feel it is necessary to sacrifice readability for performance. As such, focus on adding as many comments as necessary to document the code, why certain rules (like hacks) are there, and whatever else floats your comment boat. In CSS there are a few obvious rules to follow: 1) Use external stylesheets. 2) Limit the number of external stylesheets to limit GET requests.
HTML: Like CSS, HTML is read so fast by the browser that you should really only focus on writing clean code. Use whitespace, indentation, and comments to properly document your work. The only major things to remember in HTML are: 1) Declare the <meta charset /> early within the head section. 2) Follow this guy's advice to minimize browser reflows. *This rule actually applies to CSS as well.
JavaScript: Most optimizations for JavaScript are really well known by now, so these will seem obvious: initializing variables outside of loops, pushing JavaScript to the bottom of the body so the DOM loads before scripts start tying up all of the resources, avoiding costly statements like eval() or with(). Not to sound like a broken record, but keeping a well-commented and easily readable script should still be a priority when developing JavaScript code, especially since you can just minify and compress away all the excess when you deploy it.

Stories and scenarios that imply a UI

I am trying to learn how to use BDD for our development process, and I sometimes end up writing things that imply a UI design; for brand-new development or new features, the UI does not always exist yet.
For example, if I say this in a scenario "When a column header is clicked" it implies that this feature is based on some sort of table or grid, but at this point we are still just writing user-stories so there is no UI yet.
That leaves me confused about at what point in the process we come up with a UI design.
Keep in mind, I have only read articles about BDD, and I think it would help our team a lot, but I'm still very new at this! Thanks!
If you write your scenarios with a focus on the capabilities of the system, you'll be able to refactor the underlying steps within those scenarios more easily. It keeps them flexible. So I'd ask - what does clicking the column get for you? Are you selecting something? What are you going to do with the selection? Are you searching for something and sorting by a value?
I like to see scenarios which say things like:
When I look for the entry
When I go to the diary for January
When I look at the newest entries
When I look at the same T-shirt in black
These could all involve clicking on a column header, but the implementation detail doesn't matter. It's the capability of the system.
Beneath these high-level scenarios and steps I like to create a screen or page with the smaller steps like clicking buttons in it. This makes it easy to refactor.
I wrote this in a DSL rather than English, but it works with the same idea - you can't tell from the steps whether it's a GUI or a web page, and some of the steps involve multiple UI actions:
http://code.google.com/p/wipflash/source/browse/Example.PetShop.Scenarios/PetRegistrationAndPurchase.cs
Hope you find it interesting and maybe it helps. Good luck!
I guess you can write around that by saying "when I sort the information by X, then..." But then you would have to adjust your scenario to remove any mention of the data being displayed in a grid format, which could lead to some rather obtuse writing.
I think it's a good idea to start with UI design as soon as you possibly can. In the case you mentioned above, I think it would be perfectly valid to augment the user story with a sketch of the relevant UI as you would imagine it, and then refine it as you go along. A pencil sketch on a piece of paper should be fine. Or you could use a tablet and SketchBook Pro if you want something all digital.
My point is that I don't see a real reason for the UI design to be left out of user stories. You probably already know that you're going to build a Windows, WPF, or Web application. And it's safe to assume that when you want to display tabular data, you'll be using a grid. Keeping these assumptions out of the requirements obfuscates them without adding any real value.
User stories benefit from the fact that you describe concrete interactions, and once you know the concrete data and behaviour of the system, you might as well add more information about the way you interact. This allows you to use tools like Cucumber, which together with Selenium enables you to translate a story into a test. You might go even further and, for web apps, capture all the pages at which a concrete story starts and collect all interactions with each page, resulting in a sort of information architecture you might use for documentation, prototyping, and later UI testing.
On the other hand, this makes your stories somewhat brittle when it comes to UI changes. I think the agile way of thinking about this is the same as for design changes: do not design for the future, do the simplest possible thing; in the future you might need to change it anyway.
If you stripped your user stories of all concrete things (even inputs), you would end up with use cases (at least in their simplest form; it depends on how you write your stories). Use cases are in this respect not brittle at all; they specify only goals. This makes them resistant to change, but it's harder to transfer information automatically using tools.
As for the process, RUP/UP derives the UI from use cases, but I think agile is by nature incremental (I will not say iterative; that would exclude agile methods like FDD and Kanban). This means that as you implement a new story, you add to your UI what is necessary. This only makes adding UI specifics to stories more reasonable. The problem is that this is not a very good way to create a UI or, more generally, UX (user experience). This is exactly what one might call a weak point of agile. The Agile manifesto concentrates on functional software, but that is it. There are, as far as I know, no agile techniques for designing UI or UX.
I think you just need to step back a bit.
BAD: When I click the column header, the rows get sorted by the column I clicked.
GOOD: Then I sort the rows by name, or sometimes by ZIP code if the name is very common, like "Smith".
A user story / workflow is a sequence of what the user wants to achieve, not a sequence of the actions by which they achieve it. You are collecting the what's so you can determine the best how's for all users and use cases.
Looking at a singular aspect of your post:
if I say this in a scenario "When a column header is clicked" it implies that this feature is based on some sort of table or grid, but at this point we are still just writing user-stories so there is no UI yet.
If this came from a user, not from you, it would show a hidden expectation that there actually is a table or grid with column headers. Even coming from you it's not entirely without value, as you might be a user, too. It might be short-sighted, thinking of a grid just because the data comes from an SQL query, or it might be spot-on because it's the presentation you expect the data in. A creative UI isn't a bad thing as such, but ignoring user expectations is.

How "defensive" should my code be?

I was having a discussion with one of my colleagues about how defensive your code should be. I am all for defensive programming, but you have to know where to stop. We are working on a project that will be maintained by others, but this doesn't mean we have to check for ALL the crazy things a developer could do. Of course, you could do that, but it would add a very big overhead to your code.
How do you know where to draw the line?
Anything a user enters directly or indirectly, you should always sanity-check. Beyond that, a few asserts here and there won't hurt, but you can't really do much about crazy programmers editing and breaking your code, anyway!-)
I tend to change the amount of defense I put in my code based on the language. Today I'm primarily working in C++ so my thoughts are drifting in that direction.
When working in C++ there cannot be enough defensive programming. I treat my code as if I'm guarding nuclear secrets and every other programmer is out to get them. Asserts, throws, compile-time template error hacks, argument validation, eliminating pointers, in-depth code reviews, and general paranoia are all fair game. C++ is an evil, wonderful language that I both love and severely mistrust.
I'm not a fan of the term "defensive programming". To me it suggests code like this:
void MakePayment( Account * a, const Payment * p ) {
    if ( a == 0 || p == 0 ) {
        return;
    }
    // payment logic here
}
This is wrong, wrong, wrong, but I must have seen it hundreds of times. The function should never have been called with null pointers in the first place, and it is utterly wrong to quietly accept them.
The correct approach here is debatable, but a minimal solution is to fail noisily, either by using an assert or by throwing an exception.
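For contrast, a minimal fail-noisily version of the same function (a sketch only; whether you keep the assert, the throw, or both is a matter of project policy):

#include <cassert>
#include <stdexcept>

struct Account {};        // stand-in types so the sketch compiles on its own
struct Payment {};

void MakePayment( Account * a, const Payment * p ) {
    // Option 1: document the precondition and trap it in debug builds.
    assert( a != nullptr && p != nullptr );

    // Option 2: refuse loudly in all builds.
    if ( a == nullptr || p == nullptr ) {
        throw std::invalid_argument( "MakePayment called with a null pointer" );
    }

    // payment logic here
}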
Edit: I disagree with some other answers and comments here - I do not think that all functions should check their parameters (for many functions this is simply impossible). Instead, I believe that all functions should document the values that are acceptable and state that other values will result in undefined behaviour. This is the approach taken by the most successful and widely used libraries ever written - the C and C++ standard libraries.
And now let the downvotes begin...
I don't know that there's really any way to answer this. It's just something that you learn from experience. You just need to ask yourself how common a potential problem is likely to be and make a judgement call. Also consider that you don't necessarily have to always code defensively. Sometimes it's acceptable just to note any potential problems in your code's documentation.
Ultimately though, I think this is just something that a person has to follow their intuition on. There's no right or wrong way to do it.
If you're working on the public APIs of a component, then it's worth doing a good amount of parameter validation. This led me to a habit of doing validation everywhere. That's a mistake. All that validation code never gets tested and potentially makes the system more complicated than it needs to be.
Now I prefer to validate by unit testing. Validation definitely happens for data coming from external sources, but not for calls from non-external developers.
I always Debug.Assert my assumptions.
My personal ideology: the defensiveness of a program should be proportional to the maximum naivety/ignorance of the potential user base.
Being defensive against developers consuming your API code is not that different from being defensive against regular users.
Check the parameters to make sure they are within appropriate bounds and of expected types
Verify that the number of API calls that can be made is within your terms of service. Generally called throttling, this usually only applies to web services and password-checking functions (see the sketch after this list).
Beyond that there's not much else to do except make sure your app recovers well in the event of a problem and that you always give ample information to the developer so that they understand what's going on.
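Here is a minimal sketch of the throttling idea from the list above - a fixed-window limiter with made-up names and limits:

#include <chrono>
#include <cstdio>

// Very small fixed-window rate limiter: allow at most `capacity` calls per window.
class Throttle {
public:
    Throttle(int capacity, std::chrono::seconds window)
        : capacity_(capacity), window_(window),
          window_start_(std::chrono::steady_clock::now()) {}

    bool allow() {
        auto now = std::chrono::steady_clock::now();
        if (now - window_start_ >= window_) {   // new window: reset the count
            window_start_ = now;
            used_ = 0;
        }
        if (used_ >= capacity_) return false;   // over the limit for this window
        ++used_;
        return true;
    }

private:
    int capacity_;
    std::chrono::seconds window_;
    std::chrono::steady_clock::time_point window_start_;
    int used_ = 0;
};

int main() {
    Throttle throttle(3, std::chrono::seconds(60));   // e.g. 3 password attempts per minute
    for (int i = 0; i < 5; ++i) {
        std::printf("call %d %s\n", i, throttle.allow() ? "allowed" : "rejected");
    }
}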
Defensive programming is only one way of honouring a contract in a design-by-contract manner of coding.
The other two are
total programming and
nominal programming.
Of course you shouldn't defend yourself against every crazy thing a developer could do, but you should state, using preconditions, in which context the code will do what is expected.
// precondition: par is so and so and so
function doSth(par)
{
    debug.assert(par is so and so and so)
    // do stuff with par
    return result
}
I think you have to bring in the question of whether you're creating tests as well. You should be defensive in your coding, but as pointed out by JaredPar, I also believe it depends on the language you're using. If it's unmanaged code, then you should be extremely defensive. If it's managed, I believe you have a little bit of wiggle room.
If you have tests, and some other developer tries to decimate your code, the tests will fail. But then again, it depends on the test coverage of your code (if there is any).
I try to write code that is not just defensive but downright hostile. If something goes wrong and I can fix it, I will. If not, throw or pass on the exception and make it someone else's problem. Anything that interacts with a physical device - file system, database connection, network connection - should be considered unreliable and prone to failure. Anticipating these failures and trapping them is critical.
Once you have this mindset, the key is to be consistent in your approach. Do you expect to hand back status codes to communicate problems up the call chain, or do you prefer exceptions? Mixed models will kill you, or at least drive you to drink. Heavily. If you are using someone else's API, then isolate these things behind mechanisms that trap/report in the terms you use. Use these wrapping interfaces.
If the discussion here is how to code defensively against future (possibly malevolent or incompetent) maintainers, there is a limit to what you can do. Enforcing contracts through test coverage and liberal use of asserting your assumptions is probably the best you can do, and it should be done in a way that ideally doesn't clutter the code and make the job harder for the future non-evil maintainers of the code. Asserts are easy to read and understand and make it clear what the assumptions of a given piece of code are, so they're usually a great idea.
Coding defensively against user actions is another issue entirely, and the approach that I use is to think that the user is out to get me. Every input is examined as carefully as I can manage, and I make every effort to have my code fail safe - try not to persist any state that isn't rigorously vetted, correct where you can, exit gracefully if you cannot, etc. If you just think about all the bozo things that could be perpetrated on your code by outside agents, it gets you in the right mindset.
Coding defensively against other code, such as your platform or other modules, is exactly the same as users: they're out to get you. The OS is always going to swap out your thread at an inopportune time, networks are always going to go away at the wrong time, and in general, evil abounds around every corner. You don't need to code against every potential problem out there - the cost in maintenance might not be worth the increase in safety - but it sure doesn't hurt to think about it. And it usually doesn't hurt to explicitly comment in the code if there's a scenario you thought of but regard as unimportant for some reason.
Systems should have well-designed boundaries where defensive checking happens. There should be a decision about where user input is validated (at what boundary) and where other potential defensive issues require checking (for example, third-party integration points, publicly available APIs, rules-engine interaction, or different units coded by different teams of programmers). More defensive checking than that violates DRY in many cases, and just adds maintenance cost for very little benefit.
That being said, there are certain points where you cannot be too paranoid. Potential for buffer overflows, data corruption and similar issues should be very rigorously defended against.
I recently had a scenario in which user input data was propagated through a remote facade interface, then a local facade interface, then some other class, to finally reach the method where it was actually used. I was asking myself: when should the value be validated? I added validation code only to the final class, where the value was actually used. Adding other validation snippets to the classes lying on the propagation path would be too much defensive programming for me. One exception could be the remote facade, but I skipped it too.
Good question. I've flip-flopped between doing sanity checks and not doing them. It's a 50/50 situation; I'd probably take a middle ground where I would only "bullet-proof" routines that:
(a) are called from more than one place in the project,
(b) have logic that is LIKELY to change,
(c) cannot fall back on default values,
(d) cannot fail gracefully.
Darknight

Resources