Shiro cache size - caching

How to estimate size for Apache Shiro permissions cache?
For example, permissions strings are implemented in format:
<domain>:<resource_group>:<resource_name>:<permission>
for example
my-domain:resource-group-0001:resource-0001:permission-001
Would Shiro store all those strings as plain text?
In our case, we have 10,000+ users, 10,000+ resources and up 100 possible permissions. Of course only a fraction of all permutations would be present, but even then we are looking at 200M+ entries with potentially 10+ GB of data, which would be taxing for an in-memory cache.
The data would not be coming from a database in plain form, so no ehcache here. However, we do have to make this cache distributed, so current (smaller scale) implementation uses Redis.

Estimating sizes really difficult. We (the SHIRO Team) haven't done it. You might be better of with a cache that will "forget" old entries.
Shiro will store more than just the permission String. You can see it here (1.9.x branch): AuthorizingRealm.java:317-337
The method will retrieve (and store) an AuthorizationInfo object. That means, it will serialize this object containing:
Collection<String> getRoles();
Collection<String> getStringPermissions();
Collection<Permission> getObjectPermissions();
Now, it is different per user how many Roles and String- or ObjectPermissions they have. It may vary greatly, even within the same application.
The Permission is yet another nested structure. The default implementation, WildcardPermission, will internally it tear the String apart into multiple Collections:
private List<Set<String>> parts;
Then, the last thing to store is the cache Key, which is a PrincipalCollection. However, it is usually just a single Principal for most applications (ie a collection of size one).
If you need an estimate, you could extend the Realm you are using and override the method protected AuthorizationInfo getAuthorizationInfo(PrincipalCollection principals); to print the serialized size. However, this should only be done in a test environment.
I hope you now have an overview of what is being saved (and serialized). Let us know whether this helps to do the numbers!

Related

What is Redis ValueOperations?

What is Redis Value operations in Spring boot?
Is it like we can directly store Key-value pair in Redis database without creating the entity and stuff just by using RedisTemplate<String, Object> ?
Also, if we use ValueOperations how will it impact the performance?
When using Redis, you should think about what data format/datatype suits your needs best, similar to what you would do when coding in any general programming language. All those operations, ValueOperations, ListOperations, SetOperations, HashOperations, StreamOperations are the support provided for interacting with the mentioned datatypes. They are provided by the RedisTemplate.
When you are using ValueOperations, you are more or less treating your whole Redis instance as a giant hash map. For example, you can store entries in Redis like current_user = "John Doe". However, you can also do something silly such as keeping a string representation of a huge hashmap against a key, top_users = <huge_string_representing_a_hash_map> when thinking from the perspective of the second case, what if you want to get the value for one key in the mentioned hash map. Then, the task becomes more or less impossible without transferring the whole hash map in RAM. Yet, if you have used Redis Hashes and HashOperations that would have been a more trivial task.
Going back to your question, if you want to store a simple object using ValueOperations. That wouldn't degrade the performance. In contrast, if you are moving huge maps around, you'll utilise a lot of your network bandwidth and RAM capacity.
In summary, choose your Redist data types carefully to suit your needs.
https://redis.io/topics/data-types

My company uses memcache as object just fine, can't see need for redis in caching

I'm learning about redis/memcache and redis is clearly the more popular option. My question is about supported data types. At my company we use the memcashier library which is built in memcached. We store temporary user data when they're making a purchase in memcache. We can easily update this object as things are added to the cart or more info about the user is given. This appears to be the same functionality as a hash in redis. I don't understand how this is only a basic string data type and how it's less powerful than a hash.
If you are using strings, that's fine - but any change involves loading the data to your application, parsing it, modifying it, and serializing it back to Redis/Memcache.
This has two problems: it's slow and non atomic. You can have two servers modifying the same object arriving in an inconsistent state - such as double or missing items in a shopping cart. And again, it's slow.
With a Redis hash key, you can atomically modify specific fields of the object without loading the entire object into memory. Instead of read, parse, modify, save - you just update.
Besides, Redis has many many data structures that can create very flexible data stores with different properties, whereas Memcache can only store strings.
BTW Redis has a module that allows you to store JSON objects just as you would a string, and manipulate them directly and atomically without getting them to the client. See Rejson.io for details.
Memcached doesn't support complex datastructures
In redis you have Lists, Sets, SortedSets, HashTables , and more.
Each data-structure mentioned above supports mutation of one or more of its elements atomically and without replacing the entire data-structure/value.
Memcached on the other hand , is a simple key-value store - that means every operation involving an attribute change within a complex object is a read-modify-write. If you just go around blindly replacing fields in objects then you are risking race-conditions and operations atomicity issues (which you can get away from by using CAS )
If the library abstracts that complexity, well - that's great but it's still less efficient than mutating only the relevant field(s)
This answer only relates to your usecase. Redis holds many other virtues over memcached, which are not relevant to this question.

What should be stored in cache for web app?

I realize that this might be a vague question the bequests a vague answer, but I'm in need of some real world examples, thoughts, &/or best practices for caching data for a web app. All of the examples I've read are more technical in nature (how to add or remove cache data from the respective cache store), but I've not been able to find a higher level strategy for caching.
For example, my web app has an inbox/mail feature for each user. What I've been doing to date is storing typical session data in the cache. In this example, when the user logs in I go to the database and retrieve the user's mail messages and store them in cache. I'm beginning to wonder if I should just maintain a copy of all users' messages in the cache, all the time, and just retrieve them from cache when needed, instead of loading from the database upon login. I have a bunch of other data that's loaded on login (product catalogs and related entities) and login is starting to slow down.
So I guess my question to the community, is what would you do/recommend as an approach in this scenario?
Thanks.
This might be better suited to https://softwareengineering.stackexchange.com/, but generally you want to cache:
Metadata/configuration data that does not change frequently. E.g. country/state lists, external resource addresses, logic/branching settings, product/price/tax definitions, etc.
Data that is costly to retrieve or generate and that does not need to frequently change. E.g. historical data sets for reports.
Data that is unique to the current user's session.
The last item above is where you need to be careful as you can drastically increase your app's memory usage, by adding a few megabytes to the data for every active session. It also implies different levels of caching -- application wide, user session, etc.
Generally you should NOT cache data that is under active change.
In larger systems you also need to think about where the cache(s) will sit. Is it possible to have one central cache server, or is it good enough for each server/process to handle its own caching?
Also: you should have some method to quickly reset/invalidate the cached data. For a smaller or less mission-critical app, this could be as simple as restarting the web server. For the large system that I work on, we use a 12 hour absolute expiration window for most cached data, but we have a way of forcing immediate expiration if we need it.
This is a really broad question, and the answer depends heavily on the specific application/system you are building. I don't know enough about your specific scenario to say if you should cache all the users' messages, but instinctively it seems like a bad idea since you would seem to be effectively caching your entire data set. This could lead to problems if new messages come in or get deleted. Would you then update them in the cache? Would that not simply duplicate the backing store?
Caching is only a performance optimization technique, and as with any optimization, measure first before making substantial changes, to avoid wasting time optimizing the wrong thing. Maybe you don't need much caching, and it would only complicate your app. Maybe the data you are thinking of caching can be retrieved in a faster way, or less of it can be retrieved at once.
Cache anything that causes duplicate database queries.
Client side file caching is important as well. Assuming files are marked with an id in your database, cache them on every network request to avoid many network requests for the same file. A resource to do this can be found here (https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API). If you don't need to cache files, web storage, local storage and cookies are good for smaller pieces of data.
//if file is in cache
//refer to cache
//else
//make network request and push file to cache

Cache Management with Numerous Similar Database Queries

I'm trying to introduce caching into an existing server application because the database is starting to become overloaded.
Like many server applications we have the concept of a data layer. This data layer has many different methods that return domain model objects. For example, we have an employee data access object with methods like:
findEmployeesForAccount(long accountId)
findEmployeesWorkingInDepartment(long accountId, long departmentId)
findEmployeesBySearch(long accountId, String search)
Each method queries the database and returns a list of Employee domain objects.
Obviously, we want to try and cache as much as possible to limit the number of queries hitting the database, but how would we go about doing that?
I see a couple possible solutions:
1) We create a cache for each method call. E.g. for findEmployeesForAccount we would add an entry with a key account-employees-accountId. For findEmployeesWorkingInDepartment we could add an entry with a key department-employees-accountId-departmentId and so on. The problem I see with this is when we add a new employee into the system, we need to ensure that we add it to every list where appropriate, which seems hard to maintain and bug-prone.
2) We create a more generic query for findEmployeesForAccount (with more joins and/or queries because more information will be required). For other methods, we use findEmployeesForAccount and remove entries from the list that don't fit the specified criteria.
I'm new to caching so I'm wondering what strategies people use to handle situations like this? Any advice and/or resources on this type of stuff would be greatly appreciated.
I've been struggling with the same question myself for a few weeks now... so consider this a half-answer at best. One bit of advice that has been working out well for me is to use the Decorator Pattern to implement the cache layer. For example, here is an article detailing this in C#:
http://stevesmithblog.com/blog/building-a-cachedrepository-via-strategy-pattern/
This allows you to literally "wrap" your existing data access methods without touching them. It also makes it very easy to swap out the cached version of your DAL for the direct access version at runtime quite easily (which can be useful for unit testing).
I'm still struggling to manage my cache keys, which seem to spiral out of control when there are numerous parameters involved. Inevitably, something ends up not being properly cleared from the cache and I have to resort to heavy-handed ClearAll() approaches that just wipe out everything. If you find a solution for cache key management, I would be interested, but I hope the decorator pattern layer approach is helpful.

Caching Pattern: What do you call (and how do you replace) OpenSymphony OsCache "group" paradigm

A caching issue for you cache gurus.
Context
We have used OpenSymphony's OsCache for several years and consider moving to a better/stronger/faster/actively-developed caching product.
Problem
We have used OsCache's "group entry" feature and have not found it elsewhere.
In short, OsCache allows you to specify one or more groups at 'entry
insertion time'. Later you can invalidate a "group of entries", without knowing the keys for each entry.
OsCache Example
Here is example code using this mechanism:
Object[] groups = {"mammal", "Northern Hemisphere", "cloven-feet"}
myCache.put(myKey, myValue , groups );
// later you can flush all 'mammal' entries
myCache.flushGroup("mammal")
// or flush all 'cloven-foot'
myCache.flushGroup("cloven-foot")
Alternative: Matcher Mechanism
We use another home-grown cache written by a former team member which uses a 'key matcher' pattern for invalidating entries
In this approach you would define your 'key' and matcher' class as follows:
public class AnimalKey
{
String fRegion;
String fPhylum;
String fFootType;
..getters and setters go here
}
Matcher:
public class RegionMatcher implements ICacheKeyMatcher
{
String fRegion;
public RegionMatcher(String pRegion)
{
fRegion=pRegion;
}
public boolean isMatch(Obect pKey)
{
boolean bMatch=false;
if (pKey instanceof AnimalKey)
{
AnimalKey key = (AninmalKey) pKey);
bMatch=(fRegion.equals(key.getRegion());
}
}
}
Usage:
myCache.put(new AnimalKey("North America","mammal", "chews-the-cud");
//remove all entries for 'north america'
IKeyMatcher myMatcher= new AnimalKeyMatcher("North America");
myCache.removeMatching(myMatcher);
This mechanism has simple implementation, but has a performance
downside: it has to spin through each entry to invalidate a group. (Though it's still faster than spinning through a database).
The question
(Warning: this may sound stupid) What do you call this functionality? OsCache calls it "cache groups". Neither JbossCache nor EhCache doesn't seem to neither define nor implement it. Realm? Region? Kingdom?
Do standard patterns exist for this "cache groups/region" paradigm?
How do rising-star caching products (e.g. ehcache, coherence, jbosscache) handle this problem
This paradigm isn't in the jcache spec, right? (JSR-107)
How do you handle "mass invalidation"? Caches are great until they grow stale. An API which allows you to invalidate wide swaths is a big help. (E.g. administrator wants to press a button and clear all the cached post entries for, say, a particular forum)
thanks
will
I too implemented a matcher approach when trying to scale a legacy system with an ad hoc invalidation process. The O(n) nature wasn't a problem since the caches were small, the invalidation was performed on a non-user facing thread, and it didn't hold the locks so there wasn't a contention penalty. This was needed for matching against keys that cross cut caches, such as to invalidate all data for a company in caches spread across the application. This was really a problem of having no design centers so the application was monolithic and poorly decomposed.
When we rewrote it based on domain services, I adopted a different strategy. We now had the domain for specific data centralized into specific caches, such as for configurations, so it became a desire for multi-lookup. In this case we realized that the key was just a subset of the value, so we could extract all of the keys after load from metadata (e.g. annotations). This allowed for fine grained grouping and a convenient programming model through our cache abstraction. I published the core data structure, IndexMap, in a tutorial on the idea. Its not meant for direct usage outside of an abstraction, but better solves the grouping problem we faced.
http://code.google.com/p/concurrentlinkedhashmap/wiki/IndexableCache

Resources