State vs cookie/localstorage read performance - performance

I am developing a app in React + Redux and I have a constant doubt and I can't find documentation about it. Is there any performance downside if, let's say in a saga, I read data from a cookie/localStorage instead from the state? This read process would only happen once on each load.
The key thing here is the performance, without taking into consideration if it's good or bad practice.
Thank in advance.

First of all - what do you mean state ? In redux - state is just a plain object (plus some methods, but still). So when you read data from there - you just read props from object.
While cookies, localstorage - it's DOM api, which first of all slower, plus you need not only read data, but also parse it (cause both cookies, storage work with serialized data). So definitely storage/cookie slower than state.
You can check http://jsben.ch/nvo5G
BUT! - you can't save in-memory object state between page reloads. So for this, you can use storage (pattern named persistent state. And there is probably no other way to implement this functionality (or client DB) - in case you need to restore some state on reload - you have just two options - save state on a client (cookies, storage/db), or on server (and do fetch request).
It's MICRO optimisations, mostly you shouldn't care about it (in the case of reading just on start app)

Related

What are the eviction rule for apollo-cache-inmemory?

From what I understand, anything in cache is ephemeral and is subjected to some kind of eviction rule, like LRU. In this case, if we are using the in-memory cache and apollo-link-state to replace redux or vuex, how do we guarantee that some states don't get evicted in the middle of running the application?
As of Apollo Client v2, there is no eviction whatsoever. Based on the comments it might be on the roadmap for v3.
You can check these Github issues for discussion:
https://github.com/apollographql/apollo-client/issues/3965
https://github.com/apollographql/apollo-feature-requests/issues/4
https://github.com/apollographql/apollo-client/issues/621
As for the more general question - in most cases there is no need of such a guarantee. The reason is that cache is completely transparent to the app due to the Apollo Client and React design. When you use a Query component, your subcomponent will receive data. At that point, you decide what to do if the data is available or not.
For example, if you decide to render a loading spinner if data is not available, then theoretically each time data is evicted, your component will be re-rendered and will show spinner.
I can imagine a case where you might have a long running, async operation (if it's not async then again data cannot be evicted in the middle of it, due to JavaScript execution model). In such a case (rare, but possible), you could potentially copy the data first to local variables etc.

Dealing with concurrency issues when caching for high-traffic sites

I was asked this question in an interview:
For a high traffic website, there is a method (say getItems()) that gets called frequently. To prevent going to the DB each time, the result is cached. However, thousands of users may be trying to access the cache at the same time, and so locking the resource would not be a good idea, because if the cache has expired, the call is made to the DB, and all the users would have to wait for the DB to respond. What would be a good strategy to deal with this situation so that users don't have to wait?
I figure this is a pretty common scenario for most high-traffic sites these days, but I don't have the experience dealing with these problems--I have experience working with millions of records, but not millions of users.
How can I go about learning the basics used by high-traffic sites so that I can be more confident in future interviews? Normally I would start a side project to learn some new technology, but it's not possible to build out a high-traffic site on the side :)
The problem you were asked on the interview is the so-called Cache miss-storm - a scenario in which a lot of users trigger regeneration of the cache, hitting in this way the DB.
To prevent this, first you have to set soft and hard expiration date. Lets say the hard expiration date is 1 day, and the soft 1 hour. The hard is one actually set in the cache server, the soft is in the cache value itself (or in another key in the cache server). The application reads from cache, sees that the soft time has expired, set the soft time 1 hour ahead and hits the database. In this way the next request will see the already updated time and won't trigger the cache update - it will possibly read stale data, but the data itself will be in the process of regeneration.
Next point is: you should have procedure for cache warm-up, e.g. instead of user triggering cache update, a process in your application to pre-populate the new data.
The worst case scenario is e.g. restarting the cache server, when you don't have any data. In this case you should fill cache as fast as possible and there's where a warm-up procedure may play vital role. Even if you don't have a value in the cache, it would be a good strategy to "lock" the cache (mark it as being updated), allow only one query to the database, and handle in the application by requesting the resource again after a given timeout
You could probably be better of using some distributed cache repository, as memcached, or others depending your access pattern.
You could use the Cache implementation of Google's Guava library if you want to store the values inside the application.
From the coding point of view, you would need something like
public V get(K key){
V value = map.get(key);
if (value == null) {
synchronized(mutex){
value = map.get(key);
if (value == null) {
value = db.fetch(key);
map.put(key, value);
}
}
}
return value;
}
where the map is a ConcurrentMap and the mutex is just
private static Object mutex = new Object();
In this way, you will have just one request to the db per missing key.
Hope it helps! (and don't store null's, you could create a tombstone value instead!)
Cache miss-storm or Cache Stampede Effect, is the burst of requests to the backend when cache invalidates.
All high concurrent websites I've dealt with used some kind of caching front-end. Bein Varnish or Nginx, they all have microcaching and stampede effect suppression.
Just google for Nginx micro-caching, or Varnish stampede effect, you'll find plenty of real world examples and solutions for this sort of problem.
All boils down to whether or not you'll allow requests pass through cache to reach backend when it's in Updating or Expired state.
Usually it's possible to actively refresh cache, holding all requests to the updating entry, and then serve them from cache.
But, there is ALWAYS the question "What kind of data are you supposed to be caching or not", because, you see, if it is just plain text article, which get an edit/update, delaying cache update is not as problematic than if your data should be exactly shown on thousands of displays (real-time gaming, financial services, and so on).
So, the correct answer is, microcache, suppression of stampede effect/cache miss storm, and of course, knowing which data to cache when, how and why.
It is worse to consider particular data type for caching only if data consumers are ready for getting stale date (in reasonable bounds).
In such case you could define invalidation/eviction/update policy to keep you data up-to-date (in business meaning).
On update you just replace data item in cache and all new requests will be responsed with new data
Example: Stocks info system. If you do not need real-time price info it is reasonable to keep in cache stock and update it every X mils/secs with expensive remote call.
Do you really need to expire the cache. Can you have an incremental update mechanism using which you can always increment the data periodically so that you do not have to expire your data but keep on refreshing it periodically.
Secondly, if you want to prevent too many users from hiting the db in one go, you can have a locking mechanism in your stored proc (if your db supports it) that prevents too many people hitting the db at the same time. Also, you can have a caching mechanism in your db so that if someone is asking for the exact same data from the db again, you can always return a cached value
Some applications also use a third service layer between the application and the database to protect the database from this scenario. The service layer ensures that you do not have the cache miss storm in the db
The answer is to never expire the Cache and have a background process update cache periodically. This avoids the wait and the cache-miss storms, but then why use cache in this scenario?
If your app will crash with a "Cache miss" scenario, then you need to rethink your app and what is cache verses needed In-Memory data. For me, I would use an In Memory database that gets updated when data is changed or periodically, not a Cache at all and avoid the aforementioned scenario.

Plone 4.2 how to make PAS cache external usera data

I'm implementing a PAS plugin that handles authentications against mailservers. Actually only DBMail is implemented.
I realized, that the enumerateUsers function from the PAS plugin is called numerous times per request and requires my plugin to open/close an SQL connections for every (subsequent) request. Of course, this is very expensive.
The connections itself are handled in a plone tool, which is able to handle multiple different mailservers and delegeates the enumerateUsers call to wrapper objects that represent registered servers.
My question is now, what sort of cache (OOBTree, Session?) I should use to provide a temporary local storage for repeating enumerations and avoid subsequent SQL connections?
Another idea was, to hook into the user creation process that takes place on the first login, an external user issues and completely "localize" the users.
Third idea was, to store the needed data in the specific member, if possible.
What would be best practice here?
I'd cache the query results, indeed. You need to make a decision on how long to cache the results, and if stored long term, how to invalidate that cache or check for changes.
There are no best practices for these decisions, as they depend entirely on the type of data stored and the APIs of the backends. If they support some kind of freshness query, for example, then you store everything forever and poll the backend to see if the cache needs updating.
You can start with a simple request cache; query once per request, store it on the request object. Your cache will automatically be invalidated at the end of the request as the request object is cleaned up, the next request will be a clean slate.
If your backend users rarely change, you can cache information for longer, in a local cache. I'd use a volatile attribute on the plugin. Any attribute starting with _v_ is ignored by the persistence machinery. Thus, anything stored in a _v_ volatile attribute is both thread-local and only exists for the lifetime of the process, a restart of the server clears these automatically.
At the very least you should use an _v_ volatile attribute to store your backend SQL connections. That way they can stay open between requests, and can be re-used. Something like the following method would do nicely:
def _connection(self):
# Return a backend connection
if getattr(self, '_v_connection', None) is None:
# Create connection here
self._v_connection = yourdatabaseconnection
return self._v_connection
You could also use a persistent attribute on your plugin to store your cache. This cache would be committed to the ZODB and persist across restarts. You then really need to work out how to invalidate the contents; store timestamps and evict data when to old, etc.
Your cache datastructure depends entirely on your application needs. If you don't persist information, a dictionary (username -> information) could be more than enough. Persisted caches could benefit from using a OOBTree instead of a dictionary as they reduce chances of conflicts between different threads and are more efficient when it comes to large sets of data.
Whatever you do, you do not need to use a Session. Sessions are prone to conflicts, do not scale well, and are in any case not the place to store a cache of this kind.

Where To Save Messages For Chat/Shout-Box?

I'm going to create public chat/shout-box that will refresh after several seconds (almost like IRC or something).
My plan:
1) HTML form,
2) With JavaScript disallow to normally submit that form,
3) Save submitted message (here come problems),
4) Show new messages (with AJAX, I guess);
I'm not sure where to save those messages! I could save them in database, but... that may be very slow, because each user request new messages each several seconds, right? I could try optimize, but I'm not sure how... Maybe I could save those messages somewhere else?
I don't see why the database access would be slow (assuming it's local to the webserver) - I mean the data would certainly be stored in RAM if the data is accessed permanently anyhow and databases are quite optimized to handle queries efficiently.
Obviously you could store the data in your own datastructure and then save that to the db regularly, but you're reinventing the wheel and if your server crashes you may lose data.
What I personally would do is push data to your clients and NOT pull data from them (which seems what you're planning to do), that way you'd only have to send data whenever someone logs in or a new message appears - both situations that should happen not that often.

Coldfusion: is it better to keep just the user_id in the session, or the whole user object?

I've got a cfc to handle the user object. My question is: is it better to store just the user_id in the session and create the user object anew with each request? Or is is better to store the whole user object in the session?
Here are my thoughts either way:
If I store the whole object in the session:
There will be potentially less processor overhead
There will be potentially more memory overhead
all of the methods/functions are stored in the actual object, and new functions that I update in the cfc will not be available unless users logout and back in, or if I devise some way to make it refresh itself.
There could potentially be mutex or lock problems if I'm messing with the object via concurrent ajax calls
If I store just the user_id in the session:
I'll have to create the user object with each page request (potentially more processor overhead)
There will be potentially less memory overhead
There won't be a chance for mutex/lock/race conditions since each request will have its own copy of the user object
Updates to the CFC model itself will be immediately recognized across the system and users wouldn't have to log out and back in
Is there a normal practice for this sort of thing? Am I over-thinking it?
All of the CF apps I've written were targeted at high traffic levels and high availability, so we never had the luxury of being able to think about single-server practices.
So, in my experience, I always had to a) allow for multiple load-balanced servers, and b) avoid sticky-sessions on the load balancer for a number of reasons. Therefore, we needed to, at the very least, have a server become part of a cluster on the fly and pick up mid-session traffic.
So, we always pulled "session" data from a shared datastore on every request.
My suggestion is to implement a session facade.
This affords you the option to change how you persist session data (like the user record) without changing the rest of your app.
You can choose, behind the scenes, to store everything in the session scope, load it up for every request, do a hybrid, use a key-value store, whatever.
You can choose whether to eager-load data, or lazy-load data, or any mix in between, and the rest of the app doesn't need to be aware of what you've done.
On Race Conditions
If you're concerned about race conditions then I would suggest using named locks around data commit and access. This is another bonus of using a facade - your application code doesn't need to know about this, and you can choose to put locks around certain objects, as opposed to locking the whole session.
You haven't indicated whether you're using an ORM, so this is a general answer.
For typical applications, I recommend instantiating the user object into the session scope. There's a big downside to creating the object anew with each request that you didn't include in your list: changes to the user object's properties and state will not persist across requests unless you intend to flush the user object's state to your persistence layer (e.g. database) on every hit. That is likely to be a much more expensive operation than object instantiation, and it doesn't necessarily insulate you from the kinds of problems you're thinking about with respect to ajax calls, race conditions, etc -- it just transfers the manifestation of those problems to the persistence layer, where your object's data could be in an unpredictable state.
Since every new request would be an "implicit save", you would also have to design your "ephemeral" object to be able to persist itself regardless of whether it's in a valid state (imagine the case of a multi-page form that modifies some aspect of the user object).
For session-stored objects, your concerns about memory can be mitigated by careful design practices. For instance, if your user has many tasks, and each task has many items, it might be a bad idea to instantiate and compose all those objects into your user object (i.e., lazy loading would be a better approach than eager loading).
If you really must to be able to change your CFCs on the fly, you can achieve that goal even with session-stored objects. One way is to store a version flag in both the application and session. With each request, your app would compare those flags. When they differ, the app would run a session-reload routine that snapshots current properties, rebuilds the session-stored objects, and finally updates the session flag to match the application flag.
This is piggy-backing partially off Ken Redler's answer but I don't have enough reputation to comment.
The way we do it, and the way I prefer, is to store the user data in Session as a struct. Then on request start, our Auth Model creates the user object in the Request scope and overrides any default values with the Session data. There are a few advantages to this:
Less hits to the database, less CPU
Always run newest code without a complex custom system ensuring that
Clustered environment friendly (complex objects in Session can't be clustered)
Can add or remove properties without corruption (assuming your User object only updates dirty columns)
Also, if you're using CF9, one of the features they were really proud of is how much they optimized object instantiation. If you haven't, test it yourself!
It depends.
If you have a lot of traffic - in the thousands of unique visitors per minute range - the memory overhead of storing your User.cfc in the session will eventually weigh you down. This can be easily overcome by throwing hardware at it (more memory for a while, eventually more servers and a hardware load balancer). Of course popularity is a good problem to have.
If you seem to have a CPU, network or other bottleneck in your database space, you may want to have the object cached in session memory so that you have fewer hits to the database.
Why do I mention these scenarios? You may be prematurely optimizing - don't fix a problem that you don't have. Don't optimize your memory, CPU and database access until those are, or soon will be, problems.
Now from an architectural best practice - not from an optimized "what's best for my processor" - well, I can only say: It depends.
Truthfully, neither way is wrong. If you are going to find yourself needing to check credentials against your database on every request, don't cache it. If you like the feel of an object in the session, then cache it. Because you know your own domain, you can probably go back and forth all day on why you should or should not cache the user object in the session. If it's going to make it easier, do it. If it's going to make it harder, don't.
I would just warn you against doing something incredibly convoluted or anything that is not immediately obvious to a developer looking at your application - the more you write, the more you have to maintain forever, the more your co-workers will associate your name with evil.
Finally, last note, if this is a vote - I say you cache it. It makes sense and always feels good to call session.user.hasRole("xyz") or the like.

Resources