I'm currently storing generated HTML pages in a memcached in-memory cache. This works great, but I want to increase the cache's storage capacity beyond available memory. What I would really like is:
memcached semantics (i.e. not reliable, just a cache)
memcached api preferred (but not required)
large in-memory first level cache (MRU)
huge on-disk second level cache (main)
eviction from the on-disk cache at maximum storage using LRU or LFU
proven implementation
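To make the wish list above concrete, here is a minimal sketch of the two-level arrangement: a small in-memory LRU in front of a larger on-disk store that also evicts by LRU once it reaches its limit. It is not any of the products discussed here. The class name, the item-count limits (rather than byte limits), and the one-file-per-key layout are all assumptions made for illustration, and the disk limit is assumed to be at least as large as the memory limit.

```python
import os
import tempfile
from collections import OrderedDict
from hashlib import sha256


class TwoLevelCache:
    """Hypothetical write-through cache: memory LRU over a capped disk LRU."""

    def __init__(self, directory, max_memory_items, max_disk_items):
        self.directory = directory
        self.max_memory_items = max_memory_items
        self.max_disk_items = max_disk_items
        self.memory = OrderedDict()  # key -> bytes, most recently used last
        self.disk = OrderedDict()    # key -> file path, most recently used last

    def _path(self, key):
        # Hash the key so any string is a safe file name.
        return os.path.join(self.directory, sha256(key.encode("utf-8")).hexdigest())

    def _remember(self, key, value):
        # Put the value in the first level and trim it back to its limit.
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.max_memory_items:
            self.memory.popitem(last=False)

    def set(self, key, value):
        path = self._path(key)
        with open(path, "wb") as handle:
            handle.write(value)
        self.disk[key] = path
        self.disk.move_to_end(key)
        while len(self.disk) > self.max_disk_items:
            _, old_path = self.disk.popitem(last=False)  # least recently used
            os.remove(old_path)
        self._remember(key, value)

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            if key in self.disk:
                self.disk.move_to_end(key)  # keep disk recency in step
            return self.memory[key]
        path = self.disk.get(key)
        if path is None:
            return None  # miss: the caller regenerates the page
        self.disk.move_to_end(key)
        with open(path, "rb") as handle:
            value = handle.read()
        self._remember(key, value)  # promote to the first level
        return value
```

Because this is only a cache, a miss at both levels simply returns None and the caller is expected to regenerate the page. A real implementation would also need locking, byte-based limits, and recovery of the disk index after a restart.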
In searching for a solution I've found the options listed below, but they all miss the mark in some way. Does anyone know of either:
other options that I haven't considered
a way to make memcachedb do evictions
Already considered are:
memcachedb
  best fit, but doesn't do evictions: explicitly "not a cache"
  can't see any way to do evictions (either manual or automatic)
tugela cache
  abandoned, no support
  don't want to recommend it to customers
nmdb
  doesn't use the memcache API
  new and unproven
  don't want to recommend it to customers
Tokyo Cabinet/Tokyo Tyrant?
It seems that later versions of memcachedb can be cleaned up manually, if desired, by using the rget command and storing the expiry time in the data record. Of course, this means that I pound both the server and the network with requests for the entire data block even though I only want the expiry time. Not the best solution, but seemingly the only one currently available.
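The manual cleanup described above could look roughly like the sketch below. It assumes the expiry time is stored as an 8-byte header in front of the payload, and it uses a plain dict as a stand-in for the range scan over memcachedb; the function names and the record layout are hypothetical, not part of memcachedb.

```python
import struct
import time

# Assumed record layout: 8-byte big-endian float (expiry as Unix time) + payload.
HEADER = struct.Struct(">d")


def pack_record(payload, ttl_seconds, now=None):
    """Prefix the payload with its absolute expiry time."""
    now = time.time() if now is None else now
    return HEADER.pack(now + ttl_seconds) + payload


def unpack_record(record):
    """Return (expires_at, payload) for a stored record."""
    (expires_at,) = HEADER.unpack_from(record)
    return expires_at, record[HEADER.size:]


def sweep(store, now=None):
    """Delete expired records from a dict-like store; return the deleted keys.

    The whole record is read just to inspect its header, which is the
    cost the post complains about.
    """
    now = time.time() if now is None else now
    expired = [key for key, record in store.items()
               if unpack_record(record)[0] <= now]
    for key in expired:
        del store[key]
    return expired
```

Against a real server the loop over `store.items()` would be replaced by whatever range query the client exposes, and each record would still travel over the network in full.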
I have worked with EhCache and it works very well. It has an in-memory cache and disk storage with different eviction policies. It's a mature library with good support. There is a memcached API that wraps EhCache, developed specifically for GAE support.
Regards,
Jonathan.
I am a new developer and am trying to implement Laravel's (5.1) caching facility to improve the speed of my app. I started out caching a large DB table that my app constantly references - but it got too large so I have backed away from that and am now 'forever' caching smaller chunks of data - for example, for each page only the portions of that large DB table that are relevant.
I have watched 'Caching Essentials' on Laracasts, done some Googling and had a search in this forum (and Laracasts') but I still have a couple of questions:
I am not totally clear on how the cache size limits work when you are using Laravel's file-based system - is there an overall in-app size limit for the cache or is one limited size-wise only per key and by your server size?
What are the signs you should switch from file-based caching to something like Memcached or Redis - and what are the benefits of using one of those services? Is it the fact that your caching is handled on a different server (thereby lightening the load on your own)? Do you switch over to one of these services when your local, file-based cache gets too big for your server?
My app utilizes several tables that have 3,000-4,000 rows - the data in these tables is constantly referenced and will remain static unless I decide to add new options. I am basically looking for the best way to speed up queries to the data in these tables.
Thanks!
I don't think Laravel imposes any limitations on its file I/O at all. The limitations will come from how much PHP can read from or write to a file at once, or hold in memory and process at any one time.
It does serialise the data that you cache, and unserialise it when you reload it, so your PHP environment would have to be able to process the entire cache file (which is equivalent to the top level cache key) at once. So, if you are getting cacheduser.firstname, it would have to load the whole cacheduser key from the file, unserialise it, then get the firstname key from that.
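The behaviour described above can be illustrated with a small analogue. This is not Laravel's code: it uses Python's pickle in place of PHP's serialize, and the function names and the `.cache` file naming are made up for the example. The point is only that reading one field means deserialising the entire value stored under the key.

```python
import os
import pickle
import tempfile


def file_cache_put(directory, key, value):
    """Serialise the whole value into one file named after the key."""
    path = os.path.join(directory, key + ".cache")
    with open(path, "wb") as handle:
        pickle.dump(value, handle)
    return path


def file_cache_get(directory, key, field):
    """Read a single field; the entire value is loaded into memory first."""
    path = os.path.join(directory, key + ".cache")
    with open(path, "rb") as handle:
        whole_value = pickle.load(handle)  # cost grows with the whole value
    return whole_value[field]
```

With a value of a few kilobytes this is unnoticeable; with a cached copy of a large table, every lookup pays for loading the entire table.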
I would take the PHP memory limit (classic, I know!) as the first thing to investigate if you want to continue down this road.
Caching services like Redis or memcached are bespoke, optimised caching solutions. They take some of the logic and responsibility out of your PHP environment.
They can, for example, retrieve sub-keys from items without having to process the whole thing, so they can retrieve part of some cached data in a memory-efficient way. So, when you request cacheduser.firstname from Redis, it returns just the firstname attribute.
They have other advantages regarding tagging / clearing out subsets of caches (see the cache tags section of the Laravel docs: https://laravel.com/docs/5.4/cache#cache-tags).
Another thing to think about is scaling. If your site is large enough to be load-balanced across multiple servers, the filesystem cache may differ between those servers, as each server can only check its own local filesystem for the cache files. A caching service can run on a different server (many hosts have a separate Redis / memcached service available), so it doesn't suffer from this issue.
Also - as I understand it (and this might be the most important thing), the file cache driver in Laravel is mainly for local development and testing. Although it can work fine for simple applications with basic caching needs, it's not intended for large scalable production environments.
Personally, I develop and test locally with file caching, as I'm only dealing with small amounts of data then, and use Redis to cache in production environments.
It doesn't necessarily need to be on a separate server to get the benefits. If you are never going to scale to multiple application servers, then using a caching service on the same server will already be a large improvement for caching large documents.
This question is directed towards Jeroen and is a follow-up to this answer: https://stackoverflow.com/a/12482918/177984
Jeroen wrote "the server does caching" .. "so if enough memory is available it will automatically be available from memory."
How can I confirm if an object is cached 'in-memory' or not? From what I can tell (by performance) all of my objects are being read from disk. I'd like to have things read from memory to speed up data load times. Is there a way to view what's in the in-memory cache? Is there a way to force caching objects in-memory?
Thanks for your help.
The OpenCPU project is rapidly evolving. Things have changed in OpenCPU 1.0. Have a look at the website for the latest information: http://www.opencpu.org.
The answer that you cited is outdated. Currently indeed all the caching is done on disk. In a previous version, OpenCPU used Varnish to do caching, which is completely in-memory. However this turned out to make things more complicated (especially https), and performance was a bit disappointing (especially in comparison with fast disks these days). So now we're back at nginx which caches on disk, but is much more mature and configurable as a web server, and has other performance benefits.
What should one use (open source) for in-memory Java caching when the caching is used for a stock market project? I used Hazelcast, but it consumes too much memory.
Well there are some good alternatives out there (I have never worked with hazelcast). EHCache is a very good option.
The Guava library is not as full-featured, but it is certainly a good foundation. Or you can simply build an LRU cache yourself.
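As an illustration of how little is needed for a hand-built LRU cache, here is a minimal sketch. It is written in Python rather than Java to keep it short; in Java the same idea is commonly built on a LinkedHashMap in access order with removeEldestEntry overridden. The class name and the item-count capacity are assumptions, and the sketch is not thread-safe.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # least recently used entry first

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry

    def __len__(self):
        return len(self._items)
```

Bounding the cache by entry count is the simplest way to cap memory use; bounding by estimated size in bytes needs a size function per value.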
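If memory is a concern, you can use a ConcurrentHashMap that maps each key to its cached value, with the value wrapped in a weak reference so that the garbage collector can reclaim it. Note that a WeakReference may be cleared at any collection once nothing else strongly references the value; a SoftReference is the variant the JVM clears in response to memory demand, which is closer to the goal of making sure your application always has sufficient memory under high load.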
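The weak-reference idea can be sketched in Python with `weakref.WeakValueDictionary`, which drops an entry once nothing else holds a strong reference to the value. This shows the weak variant only; Python's standard library has no direct counterpart to Java's SoftReference. The `Quote` class and the `remember` helper are hypothetical names for the example.

```python
import gc
import weakref


class Quote:
    """Hypothetical cached value; instances of plain classes can be weakly referenced."""

    def __init__(self, symbol, price):
        self.symbol = symbol
        self.price = price


def make_quote_cache():
    # Entries disappear automatically when their value is garbage collected.
    return weakref.WeakValueDictionary()


def remember(cache, quote):
    """Store a quote under its symbol without keeping it alive."""
    cache[quote.symbol] = quote
    return quote
```

The trade-off is that such a cache only keeps what the rest of the program is already holding on to, so on its own it gives no guarantee about hit rate.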
Memcached evicts data per slab class: LRU runs separately within each slab class (one class per range of item sizes). As a result, keys can be evicted even when free space is still available elsewhere in memcached.
I want to build a monitoring system to check which keys are being evicted prematurely because of the slab allocation algorithm.
I am thinking of creating a system that queries memcached at regular intervals for all the keys that have been inserted into it. I already have a logging system that records every key inserted into memcached; this log data is stored in Mongo.
Is my approach correct, or is there a better alternative?
Your approach is correct in the sense that it is workable. The problem is that it can hurt the performance of your app, because it continually hits memcached to fetch all the keys.
As far as alternatives are concerned, there are three alternative eviction policies:
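A minimal sketch of the polling approach from the question, with two changes aimed at the performance concern: keys whose TTL has already passed are skipped, and the scan can pause between batches. The function name, the `(key, expires_at)` log format, and the `cache_get` callable are assumptions; any memcached client's get method could be passed in. Keep in mind that reading a key may itself count as a use for LRU purposes, so the scan can change what gets evicted. If per-slab totals are enough, memcached's `stats items` output includes an evicted counter per slab class, which avoids touching the keys at all.

```python
import time


def find_missing_keys(logged, cache_get, now, batch_size=100, pause_seconds=0.0):
    """Return logged keys that should still be cached but are not.

    logged: iterable of (key, expires_at) pairs from the insertion log.
    cache_get: callable returning the cached value, or None on a miss.
    """
    missing = []
    for index, (key, expires_at) in enumerate(logged, start=1):
        if expires_at > now and cache_get(key) is None:
            missing.append(key)  # gone before its TTL: evicted or deleted
        if pause_seconds and index % batch_size == 0:
            time.sleep(pause_seconds)  # throttle load on the cache server
    return missing
```

A key reported as missing was not necessarily evicted by the slab allocator; explicit deletes and server restarts look the same from the outside, so the insertion log would need to record deletes as well.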
1) Least Frequently Used
2) Least Recently Used
3) Priority based Eviction
These are the eviction policies offered by NCache which is an enterprise level distributed cache for .NET and Java and also provides a fast and reliable storage for ASP.NET and JSP Sessions. To learn more about these eviction policies, please check the following link,
http://www.alachisoft.com/resources/docs/ncache/help-4-1/eviction-policy.html?mw=NDE2&st=MQ==&sct=NTAw&ms=CQYAAAABAAAAACIBBAQC
I'm building an application that involves multiple servers (4 servers, each with a database and a web server: 1 master database and 3 slaves, plus one load balancer).
There are several approaches to enabling caching. Right now it's fairly simple and not efficient at all.
All the caching is done on an NFS partition shared between all the servers. NFS is the bottleneck in the architecture.
I have several ideas for implementing caching. It can be done at the server level (local file system), but the problem is invalidating a cache file on all servers when the content has been updated. That can be done by using a short cache lifetime (not efficient, because most of the time the cache will be refreshed sooner than it needs to be).
It can also be done with a messaging system (XMPP, for example) where the servers communicate with each other. The server responsible for invalidating the cache sends a request to all the others to let them know that the cache has been invalidated. Latency is probably higher (it takes more time for every server to learn that the cache has been invalidated), but my application doesn't require atomic cache invalidation.
A third approach is to use a cloud system to store the cache (like CouchDB), but I have no idea how it performs. Is it faster than using a SQL database?
I plan to use Zend Framework, but I don't think that's really relevant (except that packages for dealing with XMPP or CouchDB probably exist in other frameworks).
Requirements: persistent cache (if a server restarts, the cache shouldn't be lost, to avoid bringing down the server while the cache is re-created)
http://www.danga.com/memcached/
Memcached covers most of the requirements you lay out - message-based read, commit and invalidation. High availability and high speed, but very little atomic reliability (sacrificed for performance).
(Also, memcached powers things like YouTube, Wikipedia, Facebook, so I think it can be fairly well-established that organizations with the time, money and talent to seriously evaluate many distributed caching options settle with memcached!)
Edit (in response to comment)
The idea of a cache is for it to be relatively transitory compared to your backing store. If you need to persist the cache data long-term, I recommend looking at either (a) denormalizing your data tier to get more performance, or (b) adding a middle-tier database server that stores high-volume data in straight key-value-pair tables, or something closely approximating that.
In defence of memcached as a cache store, if you want high performance with low impact from a server reboot, why not just have 4 memcached servers? Or 8? Each 'reboot' would have correspondingly less effect on the database server.
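The arithmetic behind that suggestion can be checked with a small sketch: if keys are spread evenly over N cache servers, losing one server loses roughly 1/N of the cached keys. The hashing scheme below (SHA-256 modulo the server count) is an assumption chosen for the example; real memcached clients use their own key distribution, often consistent hashing.

```python
from hashlib import sha256


def server_for(key, server_count):
    """Map a key to a server index with a simple hash-modulo scheme."""
    digest = sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % server_count


def fraction_lost(keys, server_count, failed_server=0):
    """Share of keys that lived on the server that rebooted."""
    lost = sum(1 for key in keys
               if server_for(key, server_count) == failed_server)
    return lost / len(keys)


# Hypothetical page keys, for illustration only.
PAGE_KEYS = [f"page:{number}" for number in range(1000)]
```

Calling `fraction_lost(PAGE_KEYS, 4)` and `fraction_lost(PAGE_KEYS, 8)` should give values near one quarter and one eighth respectively, which is the share of requests that would fall through to the database after a single reboot.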
I think I found a relatively good solution.
I use Zend_Cache to store each cache file locally.
I've created a small daemon based on nanoserver, which also manages cache files locally.
When one server creates, modifies or deletes a cache file locally, it sends the same action through the daemon to all the other servers, which perform the same action.
That means I have local cache files and remote actions at the same time.
It's probably not perfect, but it should work for now.
CouchDB was too slow and NFS is not reliable enough.
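The local-cache-plus-broadcast arrangement described above can be sketched as follows. This is an in-process simulation, not the daemon from the post: the class and function names are hypothetical, a dict stands in for the local cache files, and a direct method call stands in for the network message between daemons.

```python
class CacheNode:
    """One server with its own local cache and a list of peer servers."""

    def __init__(self, name):
        self.name = name
        self.local = {}   # stand-in for the local cache files
        self.peers = []   # other CacheNode instances

    def apply(self, action, key, value=None):
        """Perform an action on the local cache only."""
        if action == "set":
            self.local[key] = value
        elif action == "delete":
            self.local.pop(key, None)
        else:
            raise ValueError("unknown action: " + action)

    def publish(self, action, key, value=None):
        """Apply the action locally, then replay it on every peer."""
        self.apply(action, key, value)
        for peer in self.peers:
            peer.apply(action, key, value)  # a network call in a real daemon


def connect_all(nodes):
    """Make every node a peer of every other node."""
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]
```

In a real deployment the peer call can fail or arrive late, so nodes can briefly disagree; that matches the stated requirement that invalidation does not need to be atomic.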