Related
What are some of the common and notable performance issues/bottlenecks that are typically encountered in a web application in both, the front-end layer, and the back-end layer?
An example of what I mean in a database is not having something you are querying on be an index. That would slow down the query. On the front-end it might be something funky going on with JavaScript that makes your application seem slow.
What are the general rules of thumb that help navigate such issues? And what are some good to-do's?
Thanks,
Alex
On front-end:
-push all of your assets - css files, images, static content - to a CDN. Edgecast is pretty good and reasonably priced.
-don't use load entire javascript frameworks when you only need a few features from it. only load what's needed.
On back-end
-memcache the results from all database calls by using a hash of the sql query as the key name, and the result set as the value
-make sure you are not making your database tables really 'wide' - tons of columns and column types like 'text' and 'blob'
For the front-end, there are well-known guidelines/rules you can follow, and there are some great tools like YSlow that can help you pinpoint the bottlenecks.
For the back-end, as you've noted, efficient use of indexes is a must. Other optimizations usually involve caching, and basic stuff like avoiding doing stuff within loops that can be done once. I'm sure people here will have suggestions, but remember "premature optimization is the root of all evil!" :-)
Millhouse is on to it. I can also add:
Batch expensive operations up. For example: don't make lots of individual calls to a database if you can do it all in one hit.
Avoid server hops where you can.
Process in parallel if you can (not so common for your 'average' web app but quite possible in larger Enterprise scale apps).
Pre-process: crunching data, pre-puiblishing content etc, the more you can do before it's needed the better.
Use a CQRS-based architecture. CQRS stands for Command/Query Responsability Segregation; it basically means that you have different code (services) for reading from the DB and writing to the DB. A good practice for scalability is to have separate DB's for reading and writing (it actually does make sense, if you read more about CQRS), and you can scale out the reading database by having copies run on multiple servers.
CQRS is not only interesting from a scalability point of view, but also from a code maintenance and clarity point of view. It does take some effort to learn about CQRS and understand it, though.
Check out these links:
http://www.slideshare.net/skillsmatter/ddd-exchange-2010-udi-dahan-on-architectural-innovation-cqrs
http://www.slideshare.net/pjvdsande/rethink-your-architecture-with-cqrs
convert dynamic contents to static contents. regenerate those static contents if their dependent objects changed. I saw one article said that more than 80 percent contents are static on Amazon website.
Have you applied test driven development to purely sql scripts? if so what has been your experience. Is it worth it? What are the rewards? disadvantages ? etc.
I have played around a bit, to be honest I would rather generate my regular DB code. I was this awhile ago and thought it was interesting. http://sourceforge.net/apps/trac/tsqlunit/
Most (all?) of my database "scripts" are generated, not hand-written. And, I avoid stored procedures and views. I basically treat my database as a file. The testing and logic stays in the application layer (where it belongs, IMO).
This approach works pretty well for me and the kinds of applications that I develop. It might not work so well in other situations.
For me the answer to your question is "not applicable".
SQL was the vehicle for one of my very first TDD collaborations. This was in a setting where I was an application developer (C++, I think, but it's been a while) and we had a DBA with responsibility for all queries. I wouldn't choose to go that route again, but that's another story. The time came when I needed a new query, so I wrote up some test data and expected results and sent it to the DBA; he wrote the script and thanked me for making the requirements so clear and precise.
TDD as it's usually practiced doesn't fit well with SQL (or maybe the other way 'round), but it's not really all that hard to adapt the practice to work well with the language. One-button testing may be a little harder to bring into the mix, but it's rarely hard to run a query.
Let me cut to the chase...
On one hand, many of the programming advices given (here and on other places) emphasize the notion that code should always be as readable and as clear as possible, at (almost?!) any pefromance cost.
On the other hand there are SO many slow web sites (at least one of whom, I know from personal experience).
Obviously round trips and DB access, are issues a web developer should always keep in mind. But the trade-off between readability and what not to do because it slows things down, for me is very unclear.
Question are- 1.What else? 2.Is there a rule (preferably simple, but probably quite general) one should adhere to in order to make sure his code does not slow things down too much?
General best practices as well as specific advices would be much appreciated. Advices based on experience would be especially appreciated.
Thanks.
Edit: A little clarification: General peformance advices aren't hard to find. That's not what I'm looking for. I'm asking about two things- 1. While trying to make my code as readable as possible, when should I stop and say: "Now I'm hurting performance too much". 2. Little, less known things like- is selecting just one column faster than selecting all (Thanks Otávio)... Thanks again!
See the Stack Overflow discussion here:
What is the most important effect on performance in a database-backed web application?
The top voted answer was, "write it clean, and use a profiler to identify real problems and address them."
In my experience, the biggest mistake (using C#/asp.net/linq) is over-querying due to LINQ's ease-of-use. One huge query is usually much faster than 10000 small ones.
The other ASP.NET gotcha I see a lot is when the viewstate gets extremely fat and bloated. EnableViewState=false is your best friend, start every new project with it!
For web applications that have a database back end, it is extremely important that:
indexing is done properly
retrieval is done for what is needed (avoid select * when selecting specific fields will do - even more so if they are part of a covered index)
Also, whenever possible an appropriate caching strategy can help performance
Optimizing your code.
While making your code as readable as possible is very important. Optimizing it is equally as important. I've listed some items that will hopefully get you in the right direction.
For example in regards to Databases:
When you define the schema of your database, you should make sure that it is normalized and the indexes of fields are defined properly.
When running a query, specifically SELECT, only select the fields you need.
You should only make one connection to the database per page load.
Re-factor. This is probably the most important factor in producing clean, optimized code. Always go back and look at your code and see what can be done to improve it.
PHP Code:
Always test your work with a tool like PHPUnit.
echo is faster than print.
Wrap your string in single quotes (‘) instead of double quotes (“) is faster because PHP searches for variables inside “…” and not in ‘…’, use this when you’re not using variables you need evaluating in your string.
Use echo’s multiple parameters (or stacked) instead of string concatenation.
Unset or null your variables to free memory, especially large arrays.
Use strict code, avoid suppressing errors, notices and warnings thus resulting in cleaner code and less overheads. Consider having error_reporting(E_ALL) always on.
Incrementing an undefined local variable is 9-10 times slower than a pre-initialized one.
Methods in derived classes run faster than ones defined in the base class.
Error suppression with # is very slow.
Website Optimization
A good place to start is here (http://developer.yahoo.com/performance/rules.html)
Performance is a huge topic and there are a lot of things that you can do to help improve the performance of your website. It's something that takes time and experience.
Best,
Richard Castera
Scott and Rcastera did a good job covering DB and querying optimization. To address your question from a HTML / CSS / JavaScript standpoint:
CSS:
Readability is key. CSS is rendered so fast that you should never feel it is necessary to sacrifice readability for performance. As such, focus on adding in as many comments as necessary to document the code, why certain rules (like hacks) are there, and whatever else floats your comment boat. In CSS there are a few obvious rules to follow: 1) Use external stylsheets. 2) Limit external stylesheets to limit GET requests.
HTML: Like CSS, HTML is read so fast by the browser you should really only focus on writing clean code. Use whitespace, indentation, and comments to properly document your work. Only major things in HTML to remember are: 1) declare the <meta charset /> early within the head section. 2) Follow this guys advice to minimize browser reflows. *this rule actually applies to CSS as well.
JavaScript: Most optimizations for JavaScript are really well known by now so these'll seem obvious, like initializing variables outside of loops, pushing javascript to bottom of body so DOM loads before scripts start tying up all of the resources, avoiding costly statements like eval() or with(). Not to sound like a broken record, but keeping a well commented and easily readable script should still be a priority when developing JavaScript code. Especially since you can just minimize and compress away all the excess when you deploy it.
I have been playing with some LINQ ORM (LINQ directly to SQL) and I have to admit I like its expressive powers . For small utility-like apps, It also works quite fast: dropping a SQL server on some surface and you're set to linq away.
For larger apps however, the DAL never was that big of an issue to me to setup, nor maintain, and more often than not, once it was set, all the programming was not happening there anyway...
My, honest - I am an ORM newbie - question : what is the big advantage of ORM over writing a decent DAL by hand?
(seems like a double, couldn't find it though)
UPDATE : OK its a double :-) I found it myself eventually :
ORM vs Handcoded Data Access Layer
Strong-typing
No need to write the DAL yourself => time savings
No need to write SQL code yourself =>
less error-prone
I've used Hibernate in the past to dynamically create quite complex queries. The logic involved to create the appropriate SQL would have been very time-consuming to implement, compared with the logic to build the appropriate Criteria. Additionally, Hibernate knew how to work with various different databases, so I didn't need to put any of that logic in our code. We had to test against different databases of course, and I needed to write an extension to handle "like" queries appropriately, but then it ran against SQL Server, Oracle and HSqldb (for testing) with no issues.
There's also the fact that it's more code you don't have to write, which is always a nice thing :) I can't say I've used LINQ to SQL in anything big, but where I've used it for a "quick and dirty" web-site (very small, rarely updated, little benefit from full layer abstraction) it was lovely.
I used JPA in a project, and at first I was extremely impressed. Gosh it saved me all that time writing SQL! Gradually, however, I became a bit disenchanted.
Difficulty defining tables without surrogate keys. Sometimes we need tables that don't have surrogate keys. Sometimes we want a multicolumn primary key. TopLink had difficulties with that.
Forced datastructure relationships. JPA uses annotations to describe the relationship between a field and the container or referencing class. While this may seem great at first site, what do you do when you reference the objects differently in the application? Say for example, you need just specific objects that reference specific records based on some specific criteria (and it needs to be high-performance with no unnecessary object allocation or record retrieval). The effort to modify Entity classes will almost always exceed the effort that would have existed had you never used JPA in the first place (assuming you are at all successful getting JPA to do what you want).
Caching. JPA defines the notion of caches for your objects. It must be remembered that the database has its own cache, typically optimized around minimizing disk reads. Now you're caching your data twice (ignoring the uncollected GC heap). How this can be an advantage is beyond me.
Data != Objects. For high-performance applications, the retrieval of data from the DB must be done very efficiently. Forcing object creation is not always a good thing. For example, sometimes you may want arrays of primitives. This is about 30 minutes of work for an experienced programmer working with straight JDBC.
Performance, debugging.
It is much more difficult to gauge the performance of an application with complex things going on in the (sub-optimal, autogenerated) caching subsystem, further straining project resources and budgets.
Most developers don't really understand the impedence mismatch problem that has always existed when mapping objects to tables. This fact ensures that JPA and friends will probably enjoy considerable (cough cough) success for the forseeable future.
Well, for me it is a lot about not having to reinvent/recreate the wheel each time I need to implement a new domain model. It is simply a lot more efficient to use for instance nHibernate (my ORM of choice) for creating, using and maintaining the data access layer.
You don't specify exactly how you build your DAL, but for me I used to spend quite some time doing the same stuff over and over again. I used to start with the database model and work my way up from there, creating stored procedures etc. Even if I sometimes used little tools to generate parts of the setup, it was a lot of repetitive coding.
Nowadays I start with the domain. I model it in UML, and for most of the time I'm able to generate everything from that model, including the database schema. It need a few tweaks here and there, but with my current setup I get 95% of the job with the data access done in no time at all. The time I save I can use to fine tune the parts that need tuning. I seldom need to write any SQL statements.
That's my two cents. :-)
Portability between different db vendors.
My, honest - i am an ORM newbie - question : what is the big advance of ORM over writing a decent DAL by hand?
Not all programmers are willing or even capable of writing "a decent DAL". Those who can't or get scared from the mere thought of it, find LINQ or any other ORM a blessing.
I personally use LINQ to manipulate collections in the code because of its expressiveness. It offers a very compact and transparent way to perform some common tasks on collections directly in code.
LINQ will stop being useful to you when you will want to create very specific and optimized queries by hand. Then you are likely to get a mixture of LINQ queries intermingled with custom stored procedures wired into it. Because of this considerations, I decided against LINQ to SQL in my current project (since I have a decent (imho) DAL layer). But I'm sure LINW will do just fine for simple sites like maybe your blog (or SO for that matter).
With LINQ/ORM there may also be a consideration of lagging for high traffic sites (since each incoming query will have to be compiled all over again). Though I have to admit I do not see any performance issues on SO.
You can also consider waiting for the Entity Framework v2. It should be more powerful than LINQ (and hopefully not that bad as v1 (according to some people)).
Transparent persistence - changes get saved (and cascaded) without you having to call Save(). At first glance this seems like a nightmare, but once you get used to working with it rather than against it, your domain code can be freed of persistence concerns almost completely. I don't know of any ORM other than Hibernate / NHibernate that does this, though there might be some...
The best way to answer the question is to understand exactly what libraries like Hibernate are actually accomplishing on your behalf. Most of the time abstractions exist for a reason, often to make certain problems less complex, or in the case Hibernate is almost a DSL for expression certain persistance concepts in a simple terse manner.
One can easily change the fetch strategy for collections by changing an annotation rather than writing up lots of code.
Hibernate and Linq are proven and tested by many, there is little chance you can achieve this quality without lots of work.
Hibernate addresses many features that would take you months and years to code.
Also, while the JPA documentation says that composite keys are supported, it can get very (very) tricky quickly. You can easily spend hours (days?) trying to get something quite simple working. If JPA really makes things simpler then developers should be freed from thinking too much about these details. It doesn't, and we are left with having to understand two levels of abstraction, the ORM (JPA) and JDBC. For my current project I'm using a very simple implementation that uses a package protected static get "constructor" that takes a ResultSet and returns an Object. This is about 4 lines of code per class, plus one line of code for each field. It's simple, high-performance, and quite effective, and I retain total control. If I need to access objects differently I can add another method that reads different fields (leaving the others null, for example). I don't require a spec that tells me "how ORMs must (!) be done". If I require caching for that class, I can implement it precisely as required.
I have used Linq, I found it very useful. I saves a lot of your time writing data access code. But for large applications you need more than DAL, for them you can easily extent classes created by it. Believe me, it really improves your productivity.
How do you like your CRUD programs. Code-generated, framework-driven, or manually written?
My experience with code generators is that they're a good start but after the changes have settled down I usually want to rewrite the modules by hand. Of course, that can become a maintenance problem. But it really turns into a "how long is a piece of rope" question. Which generators, frameworks, and resources are you dealing with? Some of them are horrors to deal with, others work all right.
I like code generators with custom templates for the following reasons:
Reduces coding effort
Easy to make global changes
Embed architecture in templates ensures developer compliance.
Less chance of coding errors.
Consistent functionality
Less to test.
In fact, using code generators I was able to create, or recreate, the store procedures, entity classes, and DAL from a modified database with 60+ tables in minutes when the schema was updated. By using custom templates, I was ensured that the all layers worked with my naming rules and ensured proper error handling and prevention of double insertion.
Great for fixed price contracts. If it is hourly, then you might want to do it by hand :-)
I like a mixture of framework driven and manually written. I've done a little bit with NHibernate and LinqtoSql and sometimes the queries they generate for me need a little bit of help.
This really depends on the size of your application. Hand-crafted Data Access Layers make the most sense for a very small application as you have ultimate control but for any medium to large size application I would recommend a code generator. I've had various experience with APEX SQL (not great), LINQ and Subsonic (both very good). I'm just about to evaluate a Telerik ORM shortly but I imagine that will be pretty good also.
If you use .Net use Linq, then it is easy to maintain. LinqToSql makes it easy to update your data model with out having to change the code a whole lot.
In my opinion code generators are a sign of bad design and violate DRY. Where as a good framework will have you maintaining less code. With frameworks you also end up extending and refactoring code rather than a code template.
Frameworks are choice one, if I need to use a code generator I like to throw together a quick Perl script that generates the code so I understand exactly what is getting generated and why.
They are useful if you view your users as data entry clerks to maintain your database tables for you. They help minimize the programming time required to meet minimum requirements.
If you want the quality of your work to reflect something better than that, the best that can be said for them is they might give you a jumpstart if you're not too sure how to do simple consistent UI screens yourself.
Personally I find that refactoring them into something useful and attractive based on real Use Cases takes longer than doing it from scratch. They're the kind of technique Dilbert's pointy-haired boss would love.
I find a good framework for CRUD logic better than code generators. I have run into situations when a complex set of tables generated a terribly slow query to produce the result.