I would like to know if any of you have found, or can propose, a solution to increase the efficiency of an online store based on the Magento platform.
We currently use a multi-front architecture (each front is a separate server) with load balancing and two Memcached servers.
We're considering giving each front its own Memcached server, but then a problem arises with Memcached synchronization, so that each instance stores the same values.
Any advice appreciated :)
If you are running memcached on its own hardware, there is no benefit to giving each front end its own memcached server.
Configure all front ends to use both memcached instances. This way, all front ends will go to the same memcached instance for a given key. Plus, you get automatic fail-over if one instance croaks, and you can scale up almost linearly as the demand for cache increases.
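As a concrete sketch, with the php-memcached extension that configuration could look something like this (host names are illustrative); consistent hashing is what sends every front end to the same instance for a given key:

```php
<?php
// Every front end gets the same server list, so the client hashes a given
// key to the same memcached instance no matter which front end handles it.
$cache = new Memcached();
$cache->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
$cache->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true);
$cache->addServers(array(
    array('memcache1.example.com', 11211),
    array('memcache2.example.com', 11211),
));
```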
One of the biggest improvements I've seen for more advanced setups like these is to use a PHP opcode cache like XCache. Since Magento uses many PHP files, the opcode cache will save a lot of recompilation between requests. Also make sure that all your caches are turned on and leveraged as much as possible. Enable the flat catalog and flat product tables as well.
I am a new developer and am trying to implement Laravel's (5.1) caching facility to improve the speed of my app. I started out caching a large DB table that my app constantly references - but it got too large so I have backed away from that and am now 'forever' caching smaller chunks of data - for example, for each page only the portions of that large DB table that are relevant.
I have watched 'Caching Essentials' on Laracasts, done some Googling and had a search in this forum (and Laracasts') but I still have a couple of questions:
I am not totally clear on how the cache size limits work when you are using Laravel's file-based system - is there an overall in-app size limit for the cache or is one limited size-wise only per key and by your server size?
What are the signs you should switch from file-based caching to something like Memcached or Redis - and what are the benefits of using one of those services? Is it the fact that your caching is handled on a different server (thereby lightening the load on your own)? Do you switch over to one of these services when your local, file-based cache gets too big for your server?
My app utilizes several tables that have 3,000-4,000 rows - the data in these tables is constantly referenced and will remain static unless I decide to add new options. I am basically looking for the best way to speed up queries to the data in these tables.
Thanks!
I don't think Laravel imposes any limitations on its file I/O at all - the limitations will be with how much PHP can read/write to a file at once, or hold in memory and process at any one time.
It does serialise the data that you cache, and unserialise it when you reload it, so your PHP environment would have to be able to process the entire cache file (which is equivalent to the top level cache key) at once. So, if you are getting cacheduser.firstname, it would have to load the whole cacheduser key from the file, unserialise it, then get the firstname key from that.
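Concretely, with the file driver something like this loads and deserialises the whole entry just to read one field (key and field names are illustrative):

```php
use Illuminate\Support\Facades\Cache;

// The file driver reads and unserialises the entire 'cacheduser' payload...
$user = Cache::get('cacheduser');
// ...and only then can you pick out the single field you wanted.
$firstname = $user['firstname'];
```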
I would take the PHP memory limit (classic, I know!) as the first thing to investigate if you want to keep going down this road.
Caching services like Redis or memcached are bespoke, optimised caching solutions. They take some of the logic and responsibility out of your PHP environment.
They can, for example, retrieve sub-keys from items without having to process the whole thing, so can retrieve part of some cached data in a memory efficient way. So, when you request cacheduser.firstname from redis, it just returns you the firstname attribute.
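For example, a minimal sketch using Laravel's Redis facade and a Redis hash (key and field names are illustrative):

```php
use Illuminate\Support\Facades\Redis;

// Store each attribute as a separate field of one Redis hash...
Redis::hmset('cacheduser', ['firstname' => 'Jane', 'lastname' => 'Doe']);

// ...then fetch just the field you need; Redis never sends the rest.
$firstname = Redis::hget('cacheduser', 'firstname');
```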
They have other advantages regarding tagging / clearing out subsets of caches (see [the cache tags Laravel docs](https://laravel.com/docs/5.4/cache#cache-tags)).
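For instance (tags work on the redis and memcached drivers, not the file driver; names are illustrative):

```php
use Illuminate\Support\Facades\Cache;

// Tag related entries as a group...
Cache::tags(['people'])->put('john', $john, 60); // TTL in minutes in Laravel 5.x
// ...then clear the whole group without touching the rest of the cache.
Cache::tags(['people'])->flush();
```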
Another thing to think about is scaling. If your site is large enough, and is load-balanced across multiple servers, the filesystem caches may differ across those servers, as each server can only check its local filesystem for the cache files. A caching service can live on a different server (many hosts have separate redis / memcached services available), so it isn't subject to this issue.
Also - as I understand it (and this might be the most important thing), the file cache driver in Laravel is mainly for local development and testing. Although it can work fine for simple applications with basic caching needs, it's not intended for large scalable production environments.
Personally, I develop locally and test with file caching, as I'm only dealing with small amounts of data then, and use Redis to cache on production environments.
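In stock Laravel that split is just a per-environment setting, since the default driver is read from the environment:

```php
// config/cache.php: set CACHE_DRIVER=file in your local .env
// and CACHE_DRIVER=redis in production.
'default' => env('CACHE_DRIVER', 'file'),
```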
It doesn't necessarily need to be on a separate server to get the benefits. If you are never going to scale to multiple application servers, then using a caching service on the same server will already be a large improvement to caching large documents.
I understand that memcached is a distributed caching system. However, is it entirely necessary for memcached to replicate? The objective is to persist sessions in a clustered environment.
For example if we have memcached running on say 2 servers, both with data on it, and server #1 goes down, could we potentially lose session data that was stored on it? In other words, what should we expect to see happen should any memcached server (storing data) goes down and how would it affect our sessions in a clustered environment?
At the end of the day, will it be up to us to add some fault tolerance to our application? For example, if a key doesn't exist, possibly because the server it was on went down, should we re-query and store the value back in memcached?
From what I'm reading, it appears to lean in this direction but would like confirmation: https://developers.google.com/appengine/articles/scaling/memcache#transient
Thanks in advance!
Memcached has its own fault tolerance built in, so you don't need to add it to your application. I think providing an example will show why this is the case. Let's say you have 2 memcached servers set up in front of your database (say it's MySQL). Initially, when you start your application, there will be nothing in memcached. When your application needs data, it will first check memcached; if the value doesn't exist there, it will read the data from the database and insert it into memcached before returning it to the user. For writes, you will make sure to insert the data into both your database and memcached. As your application continues to run, it will populate the memcached servers with data and take load off your database.
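That read path is the classic cache-aside pattern; a minimal sketch (table and key names are illustrative):

```php
<?php
// Check memcached first, fall back to MySQL on a miss, then repopulate
// the cache so the next read is served from memory.
function get_user(Memcached $cache, PDO $db, $id)
{
    $user = $cache->get("user:$id");
    if ($user === false && $cache->getResultCode() === Memcached::RES_NOTFOUND) {
        $stmt = $db->prepare('SELECT * FROM users WHERE id = ?');
        $stmt->execute(array($id));
        $user = $stmt->fetch(PDO::FETCH_ASSOC);
        $cache->set("user:$id", $user, 300); // keep for 5 minutes
    }
    return $user;
}
```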
Now suppose one of your memcached servers crashes and you lose half of your cached data. Your application will go to the database more frequently right after the crash, and your application logic will continue to insert data into memcached, except that everything will now go to the server that didn't crash. The only consequence is that your cache is smaller, and your database might need to do a bit more work if everything doesn't fit into the cache. Your memcached client should also be able to handle the crash, since it can figure out where your remaining healthy memcached servers are and will automatically hash values onto them accordingly. So, in short, you don't need any extra logic for failure situations, since the memcached client takes care of this for you. You just need to understand that memcached servers going down might mean your database has to do a lot of extra work. I also wouldn't recommend re-populating the cache after a failure; just let the cache warm itself back up, since there's no point in loading items that you aren't going to use in the near future.
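With the php-memcached extension, that client-side failover behaviour is opt-in; one way to configure it (thresholds here are illustrative):

```php
// Let the client mark a dead server as failed, eject it from the pool,
// and rehash its keys onto the remaining healthy servers.
$cache->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true);
$cache->setOption(Memcached::OPT_SERVER_FAILURE_LIMIT, 2);   // failures before ejection
$cache->setOption(Memcached::OPT_REMOVE_FAILED_SERVERS, true);
$cache->setOption(Memcached::OPT_RETRY_TIMEOUT, 1);          // seconds before retrying
```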
m03geek also made a post where he mentioned that you could use Couchbase, and this is true, but I want to add a few things about the pros and cons. First off, Couchbase has two bucket (database) types: the Memcached bucket and the Couchbase bucket. The Memcached bucket is plain memcached, and everything I wrote above is valid for it. The only reasons to go with Couchbase if you are going to use the Memcached bucket are that you get a nice web UI with stats about your memcached cluster, adding and removing servers is easy, and you can get paid support down the road.
The Couchbase bucket is totally different in that it is not a cache but an actual database. You can completely drop your backend database and just use this bucket type. One nice thing about the Couchbase bucket is that it provides replication and therefore prevents the cold-cache problem that memcached has. I would suggest reading the Couchbase documentation if this sounds interesting to you, since there are a lot of features you get with the Couchbase bucket.
This paper about how Facebook uses memcached might be interesting too.
https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final170_update.pdf
Couchbase's embedded memcached and "vanilla" memcached have some differences. One of them, as far as I know, is that Couchbase's memcached servers act as one. This means that if you store your key-value pair on one server, you'll be able to retrieve it from another server in the cluster. "Vanilla" memcached clusters are usually built with a sharding technique, which means that on the app side you should know which server contains the desired key.
My opinion is that replicating memcached data is unnecessary. Modern datacenters provide almost 99% uptime, so if one of your memcached servers goes down someday, only some of your online users will need to log in again.
Also, on many websites you can see a "Remember me" checkbox that sets a cookie which can be used to restore the session. If your users have that cookie, they will not even notice that one of your servers was down. (That's the answer to your question about adding some fault tolerance to your application.)
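A minimal sketch of that fallback (lookup_user_by_token() is a hypothetical helper that validates the cookie token against persistent storage):

```php
<?php
session_start();
// If the memcached-backed session was lost, rebuild it from the cookie.
if (empty($_SESSION['user_id']) && isset($_COOKIE['remember_me'])) {
    $userId = lookup_user_by_token($_COOKIE['remember_me']); // hypothetical helper
    if ($userId !== null) {
        $_SESSION['user_id'] = $userId; // the user never notices the outage
    }
}
```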
But you can always use something like HAProxy and replicate all your session data on 2 or more independent servers. In that case, storing 1 user session needs N times more RAM, where N is the number of replicas.
Another way is to use Couchbase to store sessions. A Couchbase cluster supports replicas out of the box, and it also stores data on disk, so if your node (or all nodes) suddenly shuts down or reboots, session data will not be lost.
Short answer: memcached with "remember me" cookie and without replication should be enough.
At work, we've recently started designing an application to be "large scale" (we're engineering for the potential to serve many millions of hits a day). One of the senior devs and the sysadmin have set up memcache on the server.
As I understand it, Memcache will hold query results and certain tables in memory for X amount of time and keep everything hunky dory.
A drawback of memcache, it seems, is that I just can't for the life of me manage to set it up on my local dev environment. I've followed a few different guides on how to compile it yourself. Most, if not all, of the steps seem to work properly, but I get this error when PHP loads:
[11-Sep-2010 16:02:30] PHP Warning: PHP Startup: Unable to load dynamic library '/Applications/MAMP/bin/php5.3/lib/php/extensions/no-debug-non-zts-20090626/memcached.so' - dlopen(/Applications/MAMP/bin/php5.3/lib/php/extensions/no-debug-non-zts-20090626/memcached.so, 9): image not found in Unknown on line 0
Not the primary question, but incidentally, if you've been able to compile Memcache for MAMP 1.9 on Snow Leopard, please let me know the trick.
My primary question is about what the differences are between the various web caching technologies. I've seen mention of Memcache, APC and Xcache (here: Cache results of a mysql query manually to a txt file) but don't know the pros, cons and differences between each.
To my mind, Memcache has the advantage of being the one that the project's lead dev and our sysadmin chose. It has the disadvantage of being utter foobar to try and set up and compile on a Mac. :-^)
I'd love to hear from anyone who can enumerate the pros and cons of each (or even one) of the other caching technologies: where they are best used, how they are best used, and so on.
It's all useful information I think.
Thanks so much for lending your time to expanding my knowledge.
- Alex.
First, a list of opcode cachers for PHP.
Second, Memcache/Memcached is not an opcode cacher. It is a distributed memory caching system. It does not improve the speed/performance of your PHP code; it can be used to store data only.
APC, eAccelerator, XCache and the others are non-distributed, meaning you can only store data on the local web server. However, all of these are opcode cachers and can improve the performance of your PHP app. Most of them, excluding eAccelerator (in the current version), can also store data.
I generally choose APC as the opcode cacher (it reportedly will be included in the core of PHP 6). However, if I also have more than one web server for the site, I will also make use of Memcached.
Edit 1: I agree it is very annoying to set up APC or Memcache on MAMP. There are, however, tutorials out there dealing with this.
Edit 2: With regard to the best opcode cacher for your app, it really depends on which server you are using; some work better on some systems. It also depends on the size and scale of your app how the cachers perform.
Edit 3: A very interesting article here compares the performance of a few different cachers. (The article appears to have been written in 2006 and should not really be used as a current reference.)
APC is an opcode cache. It stores compiled PHP code so that your PHP files do not need to be parsed on every request.
Memcache is a data cache. It stores data as key-value pairs.
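To make the contrast concrete, a minimal sketch using the (era-appropriate) APC and Memcache extensions; key names and TTLs are illustrative:

```php
<?php
$settings = array('theme' => 'dark', 'per_page' => 25); // example payload

// APC stores data in the local web server's shared memory...
apc_store('settings', $settings, 300);
$settings = apc_fetch('settings');

// ...while Memcache keeps key-value pairs on a daemon any server can reach.
$memcache = new Memcache();
$memcache->connect('localhost', 11211);
$memcache->set('settings', $settings, 0, 300); // flags = 0, TTL = 300 s
$settings = $memcache->get('settings');
```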
We have 3 front-end servers each running multiple web applications. Each web application has an in memory cache.
Recreating the cache is very expensive (>1 min). Therefore we repopulate it using a web service call to each web application on each front-end server every 5 minutes.
The main problems with this setup are maintaining the target list for updates and the cost of re-creating the cache several times every few minutes.
We are considering using AppFabric or something similar, but I am unsure how time-consuming it is to get up and running. Also, we really need the easiest solution.
How would you update an expensive in memory cache across multiple front-end servers?
The problem with memory caching is that it's unique to the server. I'm going with the idea that this is why you want to use AppFabric. I'm also assuming that you're re-creating the cache every few minutes to keep the in memory caches in sync across all servers. With all this work, I can well appreciate that caching is expensive for you.
It sounds like you're doing a lot of work that probably isn't necessary. This article has some detail about the caching mechanisms available within SharePoint. You may be interested in the output cache discussed near the top of the article. You may also want to read the linked TechNet article and the linked article called "Custom Caching Overview".
The only SharePoint way to do that is to use the Service Application infrastructure. The only problem is that it takes some time to understand how it works, and it's too complicated to build from scratch. You might consider downloading an existing sample application and renaming classes/GUIDs to match your naming conventions. I used this one: http://www.parago.de/2011/09/paragoservices-a-sharepoint-2010-service-application-sample/. In this case you can have a single cache per N front-end servers.
I'm building an application with multiple servers involved (4 servers, where each one has a database and a web server; 1 master database and 3 slaves, plus one load balancer).
There are several approaches to enable caching. Right now it's fairly simple and not efficient at all.
All the caching is done on an NFS partition shared between all servers. NFS is the bottleneck in the architecture.
I have several ideas to implement caching:

1. It can be done at the server level (local file system), but the problem is invalidating a cache file on all servers when the content has been updated. It can be done by having a short cache lifetime (not efficient, because the cache will be refreshed sooner than it should be most of the time).
2. It can also be done with a messaging system (XMPP, for example) where the servers communicate with each other: the server responsible for invalidating the cache sends a request to all the others to let them know that the cache has been invalidated. Latency is probably higher (it takes more time for everybody to learn that the cache has been invalidated), but my application doesn't require atomic cache invalidation.
3. A third approach is to use a cloud system to store the cache (like CouchDB), but I have no idea of the performance of this one. Is it faster than using a SQL database?
I planned to use Zend Framework, but I don't think it's really relevant (except that packages to deal with XMPP or CouchDB probably exist in other frameworks too).
Requirements: a persistent cache (if a server restarts, the cache shouldn't be lost, to avoid bringing down the server while re-creating the cache).
http://www.danga.com/memcached/
Memcached covers most of the requirements you lay out - message-based read, commit and invalidation. High availability and high speed, but very little atomic reliability (sacrificed for performance).
(Also, memcached powers things like YouTube, Wikipedia, Facebook, so I think it can be fairly well-established that organizations with the time, money and talent to seriously evaluate many distributed caching options settle with memcached!)
Edit (in response to comment)
The idea of a cache is for it to be relatively transitory compared to your backing store. If you need to persist the cache data long-term, I recommend looking at either (a) denormalizing your data tier to get more performance, or (b) adding a middle-tier database server that stores high-volume data in straight key-value-pair tables, or something closely approximating that.
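A minimal sketch of such a key-value table (schema and names are illustrative, not any particular product's layout):

```php
<?php
// A bare key-value table: one indexed key column, one opaque value blob.
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$db->exec('CREATE TABLE IF NOT EXISTS kv_store (
    k VARCHAR(250) NOT NULL PRIMARY KEY,
    v MEDIUMBLOB NOT NULL
) ENGINE=InnoDB');

// Reads are a single primary-key lookup, about as cheap as SQL gets.
$stmt = $db->prepare('SELECT v FROM kv_store WHERE k = ?');
$stmt->execute(array('user:42'));
$value = unserialize($stmt->fetchColumn());
```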
In defence of memcached as a cache store, if you want high performance with low impact from a server reboot, why not just have 4 memcached servers? Or 8? Each "reboot" would then have correspondingly less effect on the database server.
I think I found a relatively good solution.
I use Zend_Cache to store each cache file locally.
I've created a small daemon based on nanoserver which also manages the cache files locally.
When one server creates/modifies/deletes a cache file locally, it sends the same action to all the other servers through the daemon, which then performs the same action.
That means I have local cache files and remote actions at the same time.
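Roughly, the broadcast step looks like this (peer list, port and message format are illustrative):

```php
<?php
// After applying a cache action locally, notify every other server's daemon
// so it can replay the same create/modify/delete on its own file system.
function broadcast_cache_action($action, $key, array $peers)
{
    foreach ($peers as $peer) { // e.g. 'front2.example.com:9999'
        $sock = @stream_socket_client("tcp://$peer", $errno, $errstr, 1);
        if ($sock === false) {
            continue; // unreachable peer; skip rather than block the request
        }
        fwrite($sock, json_encode(array('action' => $action, 'key' => $key)) . "\n");
        fclose($sock);
    }
}
```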
Probably not perfect, but should work for now.
CouchDB was too slow and NFS is not reliable enough.