I am currently investigating an ASP.NET MVC web application that is reported to have poor performance under load (and the load is only a few requests per second).
We are using MySQL + NHibernate + Castle ActiveRecord for the mapping. An NHibernate session is opened at the beginning of every request and kept open until the view has rendered (the open-session-in-view pattern).
I already optimized the data access pattern to avoid Select N+1 problems where possible.
Now what I'm thinking about is this: on each request a database transaction is opened and committed at the end, yet in 99% of our requests (MVC actions) no data has to be written to the database.
Is it possible, and do you see any benefit, in closing sessions/transactions earlier, or even marking sessions as read-only?
Could database locking be a bottleneck, and if so, is it possible to explicitly avoid locking, at least for the read-only transactions?
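To illustrate what I mean, roughly this shape per request (just a sketch against NHibernate's ISession API; the session factory wiring and the Product entity are placeholders for our actual Castle ActiveRecord setup):

    using System.Collections.Generic;
    using NHibernate;

    public class Product { public virtual int Id { get; set; } }

    public class ReadOnlyRequestSketch
    {
        private readonly ISessionFactory sessionFactory;

        public ReadOnlyRequestSketch(ISessionFactory sessionFactory)
        {
            this.sessionFactory = sessionFactory;
        }

        // For the ~99% of actions that only read: open the session read-only
        // and disable automatic flushing, so NHibernate never snapshots
        // entities for dirty-checking or issues UPDATEs at commit time.
        public IList<Product> GetProducts()
        {
            using (ISession session = sessionFactory.OpenSession())
            {
                session.DefaultReadOnly = true;       // entities load read-only
                session.FlushMode = FlushMode.Never;  // called FlushMode.Manual in newer NHibernate

                using (ITransaction tx = session.BeginTransaction())
                {
                    var products = session.CreateQuery("from Product")
                                          .SetMaxResults(50)
                                          .List<Product>();
                    tx.Commit();  // the transaction is closed here, before the view renders
                    return products;
                }
            }
        }
    }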
You should verify that your application is not loading huge amounts of data from the DB. Even with all select N+1 problems resolved, you can still load millions of records, and that is going to be very slow.
Verify your pages with NHibernate Profiler; it will come up with optimization suggestions. If it doesn't, NHibernate is probably not your bottleneck.
If you only have a few requests per second, then the overhead of opening transactions is not the reason for the poor performance. Try letting NHibernate log all the SQL that is sent to the server; this can give you some idea of why things are slow. Perhaps it is sending an enormous number of queries for each HTTP request, or perhaps some well-chosen indexes on your tables could help.
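If you configure NHibernate in code, turning that logging on is a couple of properties; a minimal sketch (the same settings can live in the XML or ActiveRecord configuration instead):

    using NHibernate.Cfg;

    // Sketch: enable SQL logging when building the session factory.
    // show_sql echoes every statement; format_sql makes the output readable.
    var cfg = new Configuration().Configure();                     // reads hibernate.cfg.xml
    cfg.SetProperty(NHibernate.Cfg.Environment.ShowSql, "true");   // the "show_sql" property
    cfg.SetProperty(NHibernate.Cfg.Environment.FormatSql, "true"); // the "format_sql" property
    var sessionFactory = cfg.BuildSessionFactory();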
Correct me if I'm wrong, but from my understanding, "database caches" are usually implemented with an in-memory database that is local to the web server (on the same machine as the web server). Also, these "database caches" store the actual results of queries. I have also read up on the various caching strategies: Cache Aside, Read Through, Write Through, Write Behind, and Write Around.
For some context: in the Write Through strategy, the application writes to the cache and the cache writes on to the database; in the Cache Aside strategy, the application reads from the cache and falls back to the database itself.
I believe that the "Application" refers to a backend server with a REST API.
My first question is, in the Write Through strategy (application writes to cache, cache then writes to database), how does this work? From my understanding, the most commonly used database caches are Redis or Memcached - which are just key-value stores. Suppose you have a relational database as the main database, how are these key-value stores going to write back to the relational database? Do these strategies only apply if your main database is also a key-value store?
In a Write Through (or Read Through) strategy, the cache sits in between the application and the database. How does that even work? How do you get the cache to talk to the database server? From my understanding, the web server (the application) is always the one facilitating the communication between the cache and the main database - which is basically a Cache Aside strategy. Unless Redis has some kind of functionality that allows it to talk to another database, I don't quite understand how this works.
Isn't it possible to mix and match caching strategies? From how I see it, Cache Aside and Read Through are caching strategies for application reads (user wants to read data), while Write Through and Write Behind are caching strategies for application writes (user wants to write data). Couldn't you have a strategy that uses both Cache Aside and Write Through? Why do most articles always seem to portray them as independent strategies?
What happens if you have a cluster of web servers? Do they each have their own local in-memory database that acts as a cache?
Could you implement a cache using a normal (not in-memory) database? I suppose this would still be somewhat useful, since you would not need to make the additional network hop to the database server (the cache living on the same machine as the web server)?
Introduction & clarification
I guess you have one misunderstood point: the cache is NOT necessarily stored on the same server as the web server. Sometimes not even the database is separated onto its own server away from the web server. If you think of APIs, like HTTP REST APIs, you can use caching to avoid spending too many resources on database connections and queries. Generally, you want to use as few database connections and queries as possible. Now imagine the following setting:
You have a web server that serves your application, and a REST API which is used by the web server to work with some resources. Those resources come from a database (let's say a relational database) which is also stored on the same server. Now there is one endpoint which serves, e.g., a list of posts (like blog posts). Every user can fetch all posts (to keep this example simple). Now one can say that this API request could be cached, so that users don't all hit the database just to query the same resources (via the REST API) over and over again. Here comes caching. Redis is one of many tools which can be used for caching. Since Redis is a simple in-memory key-value store, you can just put all of your posts (remember the REST API) into the cache after the first DB query. All future requests for the posts list would first check whether the posts are already cached or not; if they are, the API will return the cache content for this specific request.
This is one simple example to show what caching can be used for.
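In code, the read path of that example boils down to this minimal cache-aside sketch (a plain dictionary stands in for Redis, and LoadPostsFromDb is a hypothetical stand-in for the real query):

    using System.Collections.Generic;

    public class PostsCache
    {
        // A plain dictionary stands in for Redis; the value is the
        // serialized response (e.g. the posts list as JSON).
        private readonly Dictionary<string, string> cache = new Dictionary<string, string>();

        public string GetPosts()
        {
            const string key = "posts:all";
            if (cache.TryGetValue(key, out var cached))
                return cached;               // cache hit: no database round trip

            var posts = LoadPostsFromDb();   // cache miss: one DB query...
            cache[key] = posts;              // ...then fill the cache for everyone else
            return posts;
        }

        private string LoadPostsFromDb()
        {
            // Hypothetical stand-in for the real "SELECT ... FROM posts" serialized to JSON.
            return "[{\"id\":1,\"title\":\"Hello\"}]";
        }
    }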
Answers to your questions
My first question is, why would you ever write to a cache?
To reduce the number of database connections and queries.
how is writing to these key-value stores going to help with updating the relational database?
It does not help you with updating; instead, it helps you spend fewer resources. It also helps you in terms of "temporarily backing up" some data, but only as a very small side effect, and there are more attractive solutions for that (Redis is also not persistent by default, though it does support persistence).
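To make that concrete: in an application-side write-through, the relational write still happens in your own code, and the cache write simply rides along so the next read is fresh. A sketch (SavePostToDb is a hypothetical stand-in):

    using System.Collections.Generic;

    public class PostWriter
    {
        private readonly Dictionary<string, string> cache = new Dictionary<string, string>();

        public void SavePost(int id, string json)
        {
            SavePostToDb(id, json);       // the authoritative write goes to the relational DB
            cache["post:" + id] = json;   // the cache is updated in the same operation
        }

        private void SavePostToDb(int id, string json)
        {
            // Hypothetical stand-in for an INSERT/UPDATE against the relational database.
        }
    }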
Do these cache writing strategies only apply if your main database is also a key-value store?
No, it is not important which database you use, whether it's a NoSQL or a SQL DB. It strongly depends on what you want to cache and how the database and its tables are set up. Do you have frequent changes in your resources? Do resources get updated manually, or only on user-initiated actions? Those are the questions that lead you to the right caching implementation.
Isn't it possible to mix and match caching strategies?
I am not an expert at caching strategies, but let me try:
I guess it is possible, but it also highly depends on what you are doing in your DB and what kind of application you have. If you figure out what kind of application you are building, then you will know which strategy you have to use. I also suspect it is not recommended to mix those strategies, because each strategy is coupled to your application type; in other words, it will not work out well.
What happens if you have a cluster of web servers? Do they each have their own local in-memory database that acts as a cache?
I guess both are possible. Usually you have one database, maybe clustered or synchronized with copies, to which your web servers (e.g., REST APIs) make their requests. Then each of your API servers might have its own cache, so it doesn't have to query the database at all (in cloud-based applications your database may also be on another, separate server, so that's another "hop" in terms of networking). Or (something I can also imagine) you have another middleware between your APIs (clustered) and your DB (maybe also clustered), but I guess nobody would do that because of the network traffic; it would result in a higher response time, which is usually what you want to prevent.
Could you implement a cache using a normal (not in-memory) database?
Yes, you could, but it would be way slower. A machine can access in-memory data faster than it can build up another (local) connection to a database and query your cached entries. It is also slower because the database has to write the entries into files on your machine to persist the data.
Conclusion
All in all, it is about keeping response times fast and avoiding unnecessary network traffic. I hope I could help you out a little bit.
I'm trying to work out if I can take advantage of a caching layer in my web application or not (and if so, which technology).
Our web app has an internal and an external component, and I would like, if possible, to add an in-memory cache tier between the web app and DB tiers for the public external component. We are suffering DB performance issues and I want to alleviate stress on the DB as much as possible (plus make the public-facing side of the component lightning fast).
The external component offers a location search facility based on a postcode: enter the postcode for an area and you get 50 results back each time. The data is relatively stale; the DB might change by about one new record added per day, so I was thinking that if a cache tier were possible, I could invalidate the cache nightly and then load it again (as opposed to the cache-aside pattern).
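Something like this is the shape I have in mind (sketched in C# purely for illustration, since our front end is actually PHP; all names are made up):

    using System;
    using System.Collections.Generic;

    public class PostcodeCache
    {
        // Postcode -> serialized list of ~50 result records (e.g. JSON).
        private readonly Dictionary<string, string> byPostcode =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Called from a nightly job: throw everything away and reload the
        // whole data set (~100 MB), instead of invalidating entry by entry.
        public void RebuildNightly(IEnumerable<KeyValuePair<string, string>> allRows)
        {
            byPostcode.Clear();
            foreach (var row in allRows)
                byPostcode[row.Key] = row.Value;
        }

        // Every search during the day is then served straight from memory.
        public string Lookup(string postcode)
        {
            return byPostcode.TryGetValue(postcode, out var json) ? json : null;
        }
    }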
Questions:
Based on my overview above (a postcode mapping to multiple records, as JSON or serializable objects), can I use a cache tier to store the data in memory (total size of data ~100 MB, heaps of RAM free) and retrieve multiple records back per postcode from a key-value data store?
If number 1 above is feasible, which caching technology? We are using a PHP front end; Zend Server has an in-memory cache, but it doesn't look mature. I would prefer Redis over Memcached for caching. Thoughts?
If pre-loading the cache nightly is not achievable, thoughts on a better approach to utilise the cache?
If in-memory caching is not achievable at all (based on my requirements), should I look at optimising the DB (it's SQL Server), e.g. loading the search table into the SQL Server cache at start-up?
Other, something I'm missing?
Thanks in advance, all comments welcome!
Cheers,
Okay, so I have an old ASP Classic website. I've determined I can reduce a huge number of DB calls by caching the data daily. Our site data is read only, and changes very slowly. I think based on our site usage, I would be able to cache pages by query string for every visit each day, without a hit to our server.
My first thought was to use Output Caching, but the problem I discovered right away was that it wasn't until the third page request was generated that I gained any performance. I verified this using SQL profiler, but I'm not sure why.
My second thought was to add the ObjPageCache include file from https://web.archive.org/web/20211020131054/https://www.4guysfromrolla.com/webtech/032002-1.shtml. After some research, I discovered that this could cause more issues than it might solve: http://support.microsoft.com/kb/316451
I'm hoping someone on here will tell me that since 2002 the issue with sending ServerXMLHTTP or WinHTTP requests to the same server has been resolved by Microsoft.
Depending on how your data is maintained you could choose from a number of ways to cache it.
If your data is changed and saved in one single place, you could choose to generate an HTML file which you save to the server's disk and refer to in your links. This will require write access for the process running your site, though (e.g. NETWORK SERVICE). It will produce fast pages, as the server serves these pages without any scripting engine getting involved.
Another option is reading the data into a DomDocument which you store in the Application object and refer to on the page that needs it (hence saving the round trip to the database). You could keep two timestamps together with the cached data: one for the caching time and one for the time the data last changed in the database. The timestamps allow a fast staleness check: if the cached timestamp differs from the database timestamp, refresh the data; otherwise use the cached copy. One thing to note about this approach is that Application does not accept anything other than free-threaded objects, so you will have to use MSXML2.FreeThreadedDomDocument.6.0.
Personally I prefer the last one, as it allows for more dynamic usage and I don't have to worry about write-access permissions for the process running my site (which would probably pose security risks anyway).
I am now writing a report about MS Access and I can't find any information about its performance compared to alternatives such as Microsoft SQL Server, MySQL, Oracle, etc. It's obvious that MS Access is going to be the slowest of them, but there are no solid documents confirming this other than forum threads, and I don't have the time or resources to do the research myself.
Access isn't always the slowest. For fairly simple queries with one user, it is actually quite fast.
But throw a few extra users in there, or use complex joins, and it will fall apart on you.
Here is what I could find quickly:
http://blog.nkadesign.com/2009/access-vs-sql-server-some-stats-part-1/
http://www.linuxtoday.com/news_story.php3?ltsn=2001-07-27-006-20-RV-SW
http://swik.net/MySQL/MySQL+vs+MS+SQL+Server
Oddly enough, few people compare Access to the "real" databases, since the user limit is such a limiting factor.
Here is Microsoft's reasons to upgrade to SQL Server from Access:
http://office.microsoft.com/en-us/access-help/move-access-data-to-a-sql-server-database-by-using-the-upsizing-wizard-HA010275537.aspx
I've actually seen an Access address book of over 3 million records performing very, very fast while being used by hundreds of users. This is, however, an exception. Access databases decrease in performance and stability as soon as the database is modified while in use, and especially if it is modified by more than a couple of users.
Yes, Access (Jet) is slow. It's best to move to Access Data Projects; this allows you to use Access forms and reports with SQL Server, a REAL database.
SQL Server is the most popular database anywhere; it has been installed more often than any other database, period.
I've got an old classic ASP / SQL Server app which is constantly throwing 500 errors/timeouts even though the load is not massive. Some of the DB queries are pretty intensive, but nothing that should be causing it to fall over.
Are there any good pieces of software I can install on my server which will show up precisely where the bottlenecks are in either the asp or the DB?
Some tools you can try:
HP (formerly Mercury) LoadRunner or Performance Center
Visual Studio Application Center Test (Enterprise Editions only?)
Microsoft Web Application Stress tool (aka WAS, aka "Homer"; predecessor to Application Center Test)
WebLoad
MS Visual Studio Analyzer, if you want to trace through the application code. This can show you how long the app waits on DB calls, and what SQL was used. You can then use SQL Profiler to tune the queries.
Where is the timeout occurring? Is it on the lines where ASP is connecting to or executing SQL? If so, your problem is either the connection to the DB server or the DB itself. Load up SQL Profiler in MSSQL to see how long the queries take. Perhaps it is due to locks in the database.
Do you use transactions? If so, make sure they do not lock your database for a long time, and make sure you use transactions in ADO, not around the entire ASP page. You can also ignore locks in SQL SELECTs by using the WITH (NOLOCK) hint on tables.
Make sure your database is optimized with indexes.
Also make sure you are connected to the DB for as short a time as possible, i.e. (illustrative, not working code): conn.Open : Set rs = conn.Execute(sql) : arr = rs.GetRows() : rs.Close : conn.Close. In other words, store the recordset's data in a variable instead of looping through it while holding the connection to the DB open. A good way to do this is the GetRows() function in ADO.
Always explicitly close ADO objects and set them to Nothing; failing to do so can cause the connection to the DB to remain open.
Enable connection pooling.
Load ADO constants in global.asa if you are using them
Do not store any objects in session or application scopes.
Upgrade to the latest versions of ADO, MDAC, SQL Server service packs, etc.
Are you sure the server can handle the load? Maybe upgrade it? Is it on shared hosting? Maybe your app is not the problem.
It is quite simple to measure a script's performance by timing it from the first line to the last. This way you can identify slow-running pages.
Have you tried running SQL Server Profiler on the server? It will highlight any unexpected activity hitting the database from the app, as well as help identify badly performing queries.
If you're satisfied that the DB queries are genuinely that intensive, then perhaps you need to set more appropriate timeouts on the pages that use them.
Set Server.ScriptTimeout to something larger; you may also need to set the timeout on the ADO Command objects used by the script.
Here's how I'd approach it.
Look at the running tasks on the server. Which is taking up more CPU time: SQL Server or IIS? Most of the time it will be SQL Server, and it certainly sounds that way based on your post. It's very rare that an ASP application actually does a lot of processing on the ASP side of things, as opposed to the COM or SQL sides.
Use SQL Profiler to check out all the queries hitting the database server.
Deal with the low-hanging fruit first. Generally you will have a few "problem" queries that hit the database frequently and chew up a lot of time. Deal with these. (A truism in software development is that 10% of the code chews up 90% of the execution time...)
In addition to looking at query costs with SQL Profiler and Query Analyzer/SQL Studio and doing the normal SQL performance detective work you might also want to check if your database calls are returning inordinate amounts of data to your ASP code. I've seen cases where innocuous-looking queries returned HUGE amounts of unneeded data to ASP - the classic ("select * from tablename") kind of query written by lazy/inexperienced programmers that returns 10,000 huge rows when the programmer really only needed 1 field from 1 row. The reason I mention this special case is because these sorts of queries often have low execution times and low query costs on the SQL side of things and can therefore slip under the radar.
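A hedged sketch of the fix (shown with ADO.NET for brevity; the table and column names are invented, and in classic ASP the cure is the same: narrow the SELECT):

    using System.Data.SqlClient;

    public static class CustomerQueries
    {
        // Instead of "SELECT * FROM Customers" and picking one field out of
        // thousands of rows in script code, ask the database for exactly
        // the one value you need.
        public static string GetCustomerEmail(string connectionString, int customerId)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "SELECT Email FROM Customers WHERE CustomerId = @id", conn))
            {
                cmd.Parameters.AddWithValue("@id", customerId);
                conn.Open();
                return (string)cmd.ExecuteScalar();   // one field from one row
            }
        }
    }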