I'm getting a little tired of all the UI demos of autocompletion in ASP.NET. I believe the UI portion of autocompletion has been solved many times over.
My question is: how do you best handle the queries hitting your web service? I'm currently implementing an autocompletion service for a musician database. The database is fairly small, only 20,000 rows, but autocompletion is extremely speed-sensitive. It needs to feel practically instant to be of any use.
I'm currently using NHibernate for my DAL, but I'm wondering if this is a place where I may want to bypass NHibernate. Perhaps projections on named queries would be the best strategy? Where do I cache? NHibernate's 2nd level cache? Let the web service cache?
I've already thought of a lot of naive ways to build this, but I would like to soak up any tips people have already learned in the wild. Also, what if you have many different types of entities you want autocompletion on? Do you spread those implementations around their different repositories, or do you design/implement a completely separate autocompletion service?
This depends on how large your site's traffic is. I generally suggest using a product such as memcached or memcached Win32, depending on what your environment makes available (memcached on cheap Linux boxes is best if you can manage it; all that is needed is a ton of memory!). You might also look at something like Velocity (MS's new cache cloud offering). This would then allow you to cache a key (whatever the query is) with the results efficiently. Keep your cache times short based on how frequently you update your dataset; if you don't update often, the cache time can be longer. If you find that your cache is growing like crazy, you might want to cache only what is most frequently asked for (though your cache implementation should handle this by evicting whatever is not accessed frequently).
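To make the caching part concrete, here is a minimal sketch of the pattern: the query text is the cache key and results expire after a short TTL. It uses the in-process System.Runtime.Caching.MemoryCache purely as a stand-in for memcached/Velocity, and the lookup delegate, key prefix and 30-second TTL are placeholder assumptions you would tune for your own data:

using System;
using System.Collections.Generic;
using System.Runtime.Caching;

public class AutocompleteService
{
    private readonly MemoryCache _cache = MemoryCache.Default;

    // Whatever actually hits your DAL (NHibernate, a projection query, raw SQL, ...).
    private readonly Func<string, IList<string>> _lookupMusicians;

    public AutocompleteService(Func<string, IList<string>> lookupMusicians)
    {
        _lookupMusicians = lookupMusicians;
    }

    public IList<string> Complete(string prefix)
    {
        // Use the normalized query text itself as the cache key.
        string key = "autocomplete:" + prefix.Trim().ToLowerInvariant();

        var cached = (IList<string>)_cache.Get(key);
        if (cached != null)
            return cached;

        IList<string> results = _lookupMusicians(prefix);

        // Short TTL: tune this to how often the musician table changes.
        _cache.Set(key, results, DateTimeOffset.UtcNow.AddSeconds(30));
        return results;
    }
}

With a distributed cache such as memcached the shape is the same, just with the client library's get/set calls instead, so every web server shares one set of cached results.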
Given: Each call to a BE module takes several seconds, even with an SSD drive. (A well-configured setup runs below 1 second for general BE tasks.)
What are likely bottlenecks?
How to check for them?
What options to speed up?
On purpose I don't give a specific configuration, but ask for a general checklist, so that the answer is suitable for many people as a first entry point.
General tips on performance tuning for TYPO3 can be found here: https://wiki.typo3.org/Performance_tuning
However, in my experience most general performance problems are due to one of a few reasons:
Bad/no caching. Usually this is a problem with one or more extensions (partly) disabling the cache. Try disabling all third-party extensions and enabling them one by one to see which causes the site to slow down the most. $GLOBALS['TSFE']->set_no_cache() will disable all caching, so you could search for that. USER_INT and COA_INT in TypoScript also disable the cache for anything that's configured inside them.
A lot of data. Check the database for any tables containing a lot of data. How much constitutes "a lot" depends on many factors, but generally anything below a million records shouldn't be too much of a problem, unless for example you run queries with things like LIKE '%...%' on fields containing a lot of data.
Not enough resources on the server. To fix this, add more memory and/or CPU cores to the server. Or if it's a shared server, reduce the number of sites running on it.
Heavy traffic. No matter how many resources a server has, it will always have a limit to the number of requests it can process in a given time. If this is your problem you will have to look into load balancing and caching servers. If you don't (normally) have a lot of visitors, high traffic can still be caused by robots crawling your site too quickly. These are usually easy to block by IP address in your firewall or web server configuration.
A slow backend on a server without any other traffic (you're the only one who can access it) rules out 1 (can only cause a slow backend if users are accessing the frontend and causing a high server load) and 4 (no other traffic).
One further aspect you could inspect: a lot of things are stored in the user record, for example the settings you used in the log module.
One setting which can consume a lot of memory (and time to serialize and deserialize) is the state of the page tree (which pages are expanded and which are not).
Cleaning the user settings can make the backend faster for this user.
If you have a large page tree and the user has to navigate through many pages, the effect will wear off again. Another drawback: you lose all settings, as there is still no selective cleaning.
Cannot comment here, but I need to say: the TSFE object does absolutely nothing in the TYPO3 backend. The backend is always uncached. The TYPO3 backend is a standalone module to edit and maintain the frontend output. There are tons of Google search results that ignore this fact.
Possible performance bottlenecks are poorly written extensions that do rendering or data processing. Hooks into core functions are usually no big deal, but rendering many elements for edit forms (especially with TYPO3's Fluid template engine) can cause performance problems.
The Extbase DBAL layer can also cause massive performance problems. The reason is that the database model does not know about indexes. It's simple but stupid. An SQL join on a big table of 2,000+ records will delay the output perceptibly, depending on the data model.
Also, the TYPO3 backend does not really depend on the TypoScript configuration, but because some output is controlled by it, or it is loaded by extensions, the full parsing of the *.ts files is still needed. And this parser is very slow.
If you want to speed things up you need to know what goes wrong. The only way to debug this behaviour is to inspect the runtime with a PHP profiling tool like Xdebug, because the TYPO3 framework is very complex. It uses a Doctrine-based database layer and loads tons of files on every request. Thus a well-configured OPcache is a must.
The main reason the whole thing is slow is that it is poorly written. You can confirm that by inspecting the runtime.
In addition to what has already been said, put the runtime environment onto your checklist:
Memory:
If a heavy IDE and other tools are open at the same time, available memory can become an issue. To check the memory profile, you can start a tool that monitors the memory usage of the machine.
If virtualization is used, check the memory assigned to the box. Check whether assigning more memory improves behaviour.
If required and possible, give your machine more memory. This should not be a fix for poorly written code, though; bad code can blow through any amount of memory.
File access:
TYPO3 reads and writes thousands of files. If you work with a contemporary SSD, this is surprisingly fast. I did measure this. Loading all class files of TYPO3 takes just a fraction of a second.
However this may look different if you do not work with a standard setup. Many factors may slow you down:
USB sticks as storage.
Memory cards as storage.
All kinds of external storage may be limited by slow drivers.
Virtualization can become an issue. Again, it's a question of drivers.
If in doubt, store your files and DB on a different drive and compare the behaviour.
Routing:
The database itself may be fast, but bad routing of your requests may still slow you down. Think of firewalls, proxies etc., even on your local machine, and especially if virtualisation is used.
Database connection:
A fast database connection is crucial. If database access is slow, TYPO3 can't be fast.
Especially due to Extbase, TYPO3 often queries much more data than really required, and more often than really required, because a lot of relations are resolved in the PHP layer instead of the DB layer itself. Loading data structures like the root line may cause a lot of ping-pong between the PHP and the DB layer.
I can't give advice on how to measure your DB connection; you have to ask your admin for that. What you can always do is test and compare with another DB from a completely different environment.
The speed of the database may depend on the type of database itself. Typically you use MySQL/MariaDB, which should be fast. It also depends on the factors mentioned above: memory, file access and routing.
Strategy:
Even without being an admin or knowing all the performance tools, you can always exchange parts of your system and check whether matters improve. With this approach you can localise the culprit without being an expert. Once you have spotted the culprit, Google may help you find more information.
When it comes to a clean and performant setup of routing or virtualisation, it's still best to ask an experienced admin.
Summary
This is all in addition to what others have already pointed out.
What would be really helpful would be a BE plugin that analyses and measures the environment. Maybe there are some out there that I don't know of.
I'm working on an ASP.NET MVC3 web application that is facing scalability issues.
To improve performance I want to store dynamically generated pages as HTML and serve them directly from that generated HTML, rather than querying the database for each page request.
I'm sure this will dramatically increase performance.
Can anyone share any hints/examples/tutorials on how to do it? And what are the challenges?
I would also like to know how others are handling performance issues for large e-commerce sites with at least a thousand categories, 200k products, and 200-500 concurrent visitors. What are the best approaches?
Thanks in advance.
You should have a look at Improving Performance with Output Caching.
It provides several ways to cache the output of your controllers like this:
[OutputCache(Duration=10, VaryByParam="none")]
public ActionResult Index()
{
    return View();
}
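If the output depends on a route or query-string parameter, the cached copy has to vary by it as well. A small sketch of that (the action name and parameter here are just placeholders):

[OutputCache(Duration = 60, VaryByParam = "id")]
public ActionResult Details(int id)
{
    // One cached copy is kept per distinct id value for 60 seconds.
    return View();
}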
You don't need to do that; just enable the output cache. The effect will be the same: instead of running all your page-generation logic, you will be serving a static copy, only from the cache instead of from disk.
I would look at implementing a cache system for commonly viewed pages. The .NET platform has some really nice caching libraries that allow you to manage the cache in real time.
Cache Best Practices: msdn.microsoft.com/en-us/library/aa478965.aspx
I might also take a look at writing a RESTful API that can be load balanced across multiple nodes of a cluster.
Do you know where your capacity bottleneck is?
My guess is your DB is the bottleneck, but unless you measure to find out where the bottleneck is you're likely to spend a lot of time optimising things that may not make much difference.
The first thing to do is get hold of a copy of PAL and monitor the web server and DB to see what it tells you.
Also run SQL diagnostics to find the most expensive and most frequent queries.
Measure your actual page generation times and follow them down the stack to see what calls are being made, which are expensive, and which can be cached.
Rather than output caching, I'd generally introduce a caching layer in front of the web servers and a caching layer between the app server and the DB, but without measuring it's very hard to judge.
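For the layer between the app server and the DB, a read-through wrapper is usually enough to start with. Here is a rough sketch of the idea; it uses the in-process MemoryCache, and ReadThroughCache, the key format and db.Products.Find are all illustrative names, not a specific library:

using System;
using System.Runtime.Caching;

// Read-through helper: check the cache first, fall back to the real data source on a miss.
public static class ReadThroughCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static T GetOrLoad<T>(string key, TimeSpan ttl, Func<T> loadFromDatabase) where T : class
    {
        var cached = Cache.Get(key) as T;
        if (cached != null)
            return cached;                  // cache hit: the DB is never touched

        T value = loadFromDatabase();       // cache miss: run the expensive query once
        Cache.Set(key, value, DateTimeOffset.UtcNow.Add(ttl));
        return value;
    }
}

Used from a controller or repository it would look something like:

var product = ReadThroughCache.GetOrLoad(
    "product:" + id,
    TimeSpan.FromMinutes(5),
    () => db.Products.Find(id));

But as said above: measure first, because the win depends entirely on where the time is actually going.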
If you want to know more about caching in general, I'd read this answer I wrote a while back How to get started with web caching, CDNs, and proxy servers?
If you need caching in your website to reduce database load, do you have to do it using memcache or memcached (in PHP, for example), or can you achieve this by using services like CloudFlare, Incapsula or others like that, which do some caching for you?
Services like Cloudflare cache your HTML and/or assets like images and CSS files in a CDN, so that your entire server is hit less often. This is great for semi-static sites but may not be the best fit for highly dynamic sites.
Local caches like memcached just store any data in a way that's fast to access. You can use that to cache database queries and lower your database activity, but you can also use it to store pre-computed data that would be expensive to re-create all the time or whatever else you may want to store non-permanently in a fast-to-access way.
Both solutions solve different problems. You may use both together, or either, or neither. It really depends on where exactly your bottleneck is and which solution fits your problem better.
I'm the CEO of CloudFlare and I'd say: more (intelligent) caching is almost always a good thing. While we can significantly decrease the load coming to your web server, to get the best performance it's still extremely important to optimize your web application and its interaction with your database. To that end, memcache and other fast caching layers can play an important role and I'd never discourage them.
PS - we work great with dynamic sites. 95%+ of our sites are highly dynamic web applications.
Since this doesn't touch a real problem of mine, I'm somewhat uncertain whether it is even worth asking here. However, maybe some of you would like to share your opinion on it.
In general I have to admit that 'better' can mean anything and nothing at the same time, so I probably should be more specific, but I tried not to overload the topic. In a regular hosted environment on one of those cheap web hosts (like Dreamhost), with around 1,000 articles in Joomla, a couple of users and a few hundred visitors a day, would an SQLite database with a persistent connection (sqlite_popen) perform noticeably faster than the MySQL equivalent (with the TCP/IP overhead etc.)?
Or in short: Would it be wise to call Joomla to support SQLite?
I have never used SQLite on a website, but I have used it extensively for other purposes and I quite like it. The truth is, you won't know till you try. If you try, I recommend creating a DB abstraction layer first so that you can easily swap in other DBs.
The downside of SQLite is that it's not really meant to be a multi-user database. If you rarely write to the DB but do lots of reading, SQLite will probably be fine. If you find that you need multiple processes writing to the same DB, I believe SQLite uses file-level locking to maintain database consistency. So, if all your tables are in the same file, you'll lock the whole file while it's being written to, even if another process wants to modify a completely different table.
In my opinion it's not the big multi-user databases of the world that should be worried about competition from SQLite... it's all the regular files out there (and their custom file formats) that applications create and use that should be shaking in their boots about SQLite...
Linux ISPs for whatever reason seem to have settled on MySQL. This is what they offer, and you will lock yourself into a limited number of service providers if you wander outside the norm.
I'm starting to step into unfamiliar territory with regards to performance improvement and our RIA (Rich Internet Application) built with GWT. For those unfamiliar with GWT, essentially when deployed it's just pure JavaScript. We're interfacing with the server side using a REST-style XML web service via XMLHttpRequest.
Our XML is un-marshalled into JavaScript objects and used within the application to represent the data model behind the interface. When changes occur, the model is updated and marshalled back to XML and sent back to the server.
I've learned the number one rule of performance (in terms of user experience) is to make as few requests as possible. Obviously this brings up the possibility of caching. Caching is great for static data, but things get tricky in a multi-user system where data on the server may be changing. Also, using "Last-Modified" and "If-Modified-Since" headers doesn't quite do enough, since we'd like to avoid unnecessary requests altogether.
I'm trying to figure out if caching data in the browser is even right for us before researching the approaches. I hope someone has trodden this path before. I'm looking for similar approaches, lessons learned, things to avoid, etc.
I'm happy to provide more specific info if needed...
For GWT, if performance matters that much to you, you get better performance by sending all the data you need in a single request instead of making many small requests. I would recommend against client-side data caching, as there are lots of issues, like keeping the data in sync with the database.
Besides, you already have a good advantage with GWT over traditional HTML apps. Unless you are dealing with special data (e.g. data that does not become stale too quickly, which implies mostly-read queries), I have found that there is no special need for client-side caching. You are better off doing service-layer caching, since most of the time should be spent in server-side processing.
If you can provide more details about the nature of the app, maybe some different conclusions can be taken.