My class caches fine in the development environment, but can I be sure about production, where memcache, redis, or whatever may be running? I wonder which data types are cacheable with the low-level Rails.cache.write('mykey', myobj) besides strings, numbers, and arrays of them. Is there some criterion for telling whether a given class is safe to cache, at least with the typical cache stores?
Short answer: ANY object.
Longer answer: By default, objects over 1 KB will be compressed. And by default, the compression is done via Marshal.dump.
The documentation for Marshal.dump states:
Marshal can't dump following objects:
anonymous Class/Module.
objects which are related to system (ex: Dir, File::Stat, IO, File, Socket and so on)
an instance of MatchData, Data, Method, UnboundMethod, Proc, Thread, ThreadGroup, Continuation
objects which define singleton methods
So in order to cache large objects that fall under the above categories, you'd need to either increase the compress_threshold, set compress: false or define an alternate way of "compressing" the data (?!).
Related: How to estimate size for Apache Shiro permissions cache?
For example, permission strings are implemented in the format:
<domain>:<resource_group>:<resource_name>:<permission>
for instance:
my-domain:resource-group-0001:resource-0001:permission-001
Would Shiro store all those strings as plain text?
In our case, we have 10,000+ users, 10,000+ resources, and up to 100 possible permissions. Of course only a fraction of all permutations would be present, but even then we are looking at 200M+ entries; at roughly 58 bytes per permission string (the length of the example above), that is potentially 10+ GB of data, which would be taxing for an in-memory cache.
The data would not be coming from a database in plain form, so no ehcache here. However, we do have to make this cache distributed, so the current (smaller-scale) implementation uses Redis.
Estimating sizes is really difficult. We (the Shiro team) haven't done it. You might be better off with a cache that will "forget" old entries.
Shiro will store more than just the permission String. You can see it here (1.9.x branch): AuthorizingRealm.java:317-337
The method will retrieve (and store) an AuthorizationInfo object. That means it will serialize this object, which contains:
Collection<String> getRoles();
Collection<String> getStringPermissions();
Collection<Permission> getObjectPermissions();
Now, how many Roles and String- or ObjectPermissions a user has differs from user to user, and it may vary greatly even within the same application.
The Permission is yet another nested structure. The default implementation, WildcardPermission, will internally tear the String apart into multiple collections:
private List<Set<String>> parts;
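To make that structure concrete, here is a small illustration using the permission string from the question and Shiro's standard WildcardPermission; the broader comparison permission with the trailing wildcard is an assumption added just for the example:

import org.apache.shiro.authz.permission.WildcardPermission;

public class PermissionPartsDemo {
    public static void main(String[] args) {
        // Each colon-delimited part of the string becomes one Set<String> in the
        // internal "parts" list; comma-separated values within a part become the
        // members of that set.
        WildcardPermission stored =
                new WildcardPermission("my-domain:resource-group-0001:resource-0001:permission-001");

        // A broader permission (wildcard over all permissions on that resource)
        // implies the narrower one, but not the other way around.
        WildcardPermission broader =
                new WildcardPermission("my-domain:resource-group-0001:resource-0001:*");

        System.out.println(broader.implies(stored)); // true
        System.out.println(stored.implies(broader)); // false
    }
}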
Then, the last thing to store is the cache key, which is a PrincipalCollection. However, for most applications it is usually just a single Principal (i.e. a collection of size one).
If you need an estimate, you could extend the Realm you are using and override the method protected AuthorizationInfo getAuthorizationInfo(PrincipalCollection principals); to print the serialized size. However, this should only be done in a test environment.
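As a hedged sketch of that idea (the class name, the JdbcRealm base class, and the logging are illustrative assumptions; extend whichever Realm you actually configure, and keep in mind that Java serialization only approximates what your cache serializer will produce):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.apache.shiro.authz.AuthorizationInfo;
import org.apache.shiro.realm.jdbc.JdbcRealm;
import org.apache.shiro.subject.PrincipalCollection;

// Test-only realm that logs roughly how large each cached AuthorizationInfo is.
public class SizeLoggingRealm extends JdbcRealm {

    @Override
    protected AuthorizationInfo getAuthorizationInfo(PrincipalCollection principals) {
        AuthorizationInfo info = super.getAuthorizationInfo(principals);
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(info);
            out.flush();
            System.out.println("AuthorizationInfo for " + principals
                    + " is ~" + bytes.size() + " bytes when serialized");
        } catch (IOException e) {
            // Size measurement is best-effort; never fail authorization over it.
        }
        return info;
    }
}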
I hope you now have an overview of what is being saved (and serialized). Let us know whether this helps you run the numbers!
I'm learning about redis/memcache, and redis is clearly the more popular option. My question is about supported data types. At my company we use the MemCachier library, which is built on memcached. We store temporary user data in memcache while they're making a purchase, and we can easily update this object as things are added to the cart or as more info about the user is given. This appears to be the same functionality as a hash in redis. I don't understand how this is only a basic string data type and how it's less powerful than a hash.
If you are using strings, that's fine - but any change involves loading the data into your application, parsing it, modifying it, and serializing it back to Redis/Memcache.
This has two problems: it's slow, and it's not atomic. Two servers modifying the same object can leave it in an inconsistent state - such as doubled or missing items in a shopping cart. And again, it's slow.
With a Redis hash key, you can atomically modify specific fields of the object without loading the entire object into memory. Instead of read, parse, modify, save - you just update.
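A small sketch of that difference, using the Jedis Java client since the question is language-agnostic (the key and field names are made up for illustration):

import redis.clients.jedis.Jedis;

public class CartHashExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String cartKey = "cart:user:42";

            // Each cart item is one hash field; no need to read the whole cart first.
            jedis.hset(cartKey, "sku:1001", "1");

            // Atomically increment a single field (HINCRBY); safe even if two
            // app servers update the same cart concurrently. With a plain string
            // value you would have to GET, deserialize, modify, and SET it back.
            jedis.hincrBy(cartKey, "sku:1001", 1);

            // Fetch a single field, or the whole hash when rendering the cart.
            System.out.println(jedis.hgetAll(cartKey));
        }
    }
}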
Besides, Redis has many, many data structures that let you build very flexible data stores with different properties, whereas Memcache can only store strings.
BTW Redis has a module that allows you to store JSON objects just as you would a string, and manipulate them directly and atomically without getting them to the client. See Rejson.io for details.
Memcached doesn't support complex data structures.
In Redis you have Lists, Sets, Sorted Sets, Hashes, and more.
Each of these data structures supports atomic mutation of one or more of its elements, without replacing the entire value.
Memcached, on the other hand, is a simple key-value store: every change to an attribute within a complex object is a read-modify-write of the whole value. If you just go around blindly replacing objects, you risk race conditions and atomicity issues (which you can avoid by using CAS).
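For completeness, here is a hedged sketch of that CAS-based read-modify-write with the spymemcached Java client (the key name, the cart type, and the expiry are illustrative assumptions; the point is the retry loop around gets/cas):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;
import net.spy.memcached.CASResponse;
import net.spy.memcached.CASValue;
import net.spy.memcached.MemcachedClient;

public class CasUpdateExample {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws IOException {
        MemcachedClient client =
                new MemcachedClient(new InetSocketAddress("localhost", 11211));
        String key = "cart:user:42"; // illustrative key

        CASResponse result;
        do {
            // Read the current value together with its CAS token.
            CASValue<Object> current = client.gets(key);
            Map<String, Integer> cart = (current == null)
                    ? new HashMap<>()
                    : new HashMap<>((Map<String, Integer>) current.getValue());

            cart.merge("sku:1001", 1, Integer::sum); // the "modify" step

            if (current == null) {
                // No existing entry yet: a plain set is enough (30 s expiry for the demo).
                client.set(key, 30, cart);
                break;
            }
            // Write back only if nobody changed the value in the meantime;
            // EXISTS means the CAS token no longer matches, so retry.
            // (A NOT_FOUND response, i.e. the entry expired meanwhile, is ignored here.)
            result = client.cas(key, current.getCas(), cart);
        } while (result == CASResponse.EXISTS);

        client.shutdown();
    }
}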
If the library abstracts that complexity away, great - but it's still less efficient than mutating only the relevant field(s).
This answer only relates to your use case. Redis has many other advantages over memcached, but they are not relevant to this question.
I have some Lua code embedded in nginx. In this code I get some small data from a Redis cache. Now I wonder whether it is good practice to cache this data (already cached in some sense) in nginx itself, using the ngx.shared construct. What are the pros and cons of doing it this way? In pseudo-code I expect something like:
local cache = ngx.shared.cache  -- requires a lua_shared_dict zone named "cache" in the nginx config
local value = cache:get("cached_key")
if value == nil then
    -- cache miss: fetch the data from Redis (e.g. via lua-resty-redis);
    -- get_data_from_redis is a placeholder for that call
    value = get_data_from_redis("cached_key")
    cache:set("cached_key", value)
end
As stated in the documentation, ngx.shared is a memory zone shared among all the workers of the nginx server.
All the listed operations are atomic, so you only have to worry about race conditions if you chain two operations on ngx.shared one after the other. In that case, they should be protected using ngx.semaphore.
The pros:
Using ngx.shared provides faster access to the data, because you avoid a request/response round trip to the Redis server.
Even if you need an ngx.semaphore, you can still expect faster access to the data (but I have no benchmark to back this up).
The cons:
The ngx.shared cache can serve stale data, as your local copy does not necessarily reflect the current Redis value. This is not always a crucial point, as there can always be a delta between the value used in the worker and the value stored in Redis.
Data stored in ngx.shared can be inconsistent, which is more important. For instance, it can hold x=true and y=false even though in Redis x and y always have the same value. It depends on how you update your local cache.
You have to manage the cache yourself, updating the values in your local cache whenever they are sent to Redis. This can easily be done by wrapping the Redis functions. Expect bugs if you instead scatter a cache update after every call that writes to Redis, because you (or someone) will forget one.
You also have to handle reads: whenever a value is not found in your ngx.shared cache, you have to fall back to reading it from Redis. Expect bugs if you handle this by scattering a Redis read after each call to cache.get, because you (or someone) will forget one.
For the last two points, you can easily write a small wrapper module.
As a conclusion:
If your server runs only one instance, with one or several workers, using ngx.shared is worthwhile, as you can keep a cache of your Redis data that is always up to date.
If your server runs several instances and an always up-to-date cache is mandatory, or if you could have consistency problems, then you should avoid caching with ngx.shared.
In all cases, if your data can grow large, make sure to provide a way to evict entries before memory consumption gets too high. If you cannot provide eviction, then you should not use ngx.shared.
Also, do not forget to store the cached value in a local variable, to avoid fetching it again and again and thus improve efficiency.
OK, so Realm (.NET) doesn't support async queries in its current version.
If the underlying table for a certain RealmObject contains a lot of records, say hundreds of thousands or millions, what is the preferred approach (given the current no-async limitation)?
My current options (none tested thus far):
On the UI thread, use Realm.GetInstance().All<T> and filter it (and then enumerate the IEnumerable). My assumption is that the UI thread will block waiting for this possibly lengthy operation.
Do the previous on a worker thread. The downside is that all RealmObjects would need to be mapped to some auxiliary domain model (or even the same model, but disconnected from Realm), because Realm objects cannot be shared/marshaled between threads.
Is there any recommended approach (by the Realm creators, of course)? I'm aware this doesn't completely fit the question model for this site, but so be it.
Realm enumerators are truly lazy, and All<T> is a further special case, so it is certainly fast enough to do on the UI thread.
Even queries are so fast that, most of the time, we recommend people do them on the UI thread.
To enlarge on my comment on the question, RealmObject subclasses are woven at compile time, with the property getters and setters mapped to call directly through to the C++ core, reading memory-mapped data.
That keeps updates between threads lightning fast, as well as delivering our incredible column-scanning speed. Most cases do not require indexes, nor do they need to run on separate threads.
If you create a standalone RealmObject subclass, e.g. new Dog(), it has the flag IsManaged == false, which means the getter and setter methods still use the compiler-generated backing field.
If you create an object with CreateObject, or take a standalone object into the Realm with Realm.Manage, then IsManaged == true and the backing field is ignored.
I am trying to create a custom caching mechanism where I return a weak_ptr to the cached object. Internally, I hold a shared_ptr to control the lifetime of the object.
When the preset cache limit is reached, the disposer looks for cached objects that have not been accessed for a long time and cleans them up.
Unfortunately this may not be ideal. If it were possible to check how many weak_ptrs still refer to a cached object, that could be a criterion for deciding whether or not to clean it up.
It turns out there is no way to check how many weak_ptrs hold a handle to the resource.
But when I look at the shared_ptr documentation and implementation notes, "the number of weak_ptrs that refer to the managed object" is explicitly part of the implementation. Why is this not exposed through an API?