Bulk publish option (Publish incl. Subnodes) with cache in Magnolia - caching

I am using Magnolia Enterprise Standard version 5.3. We have the Publish and Publish incl. Subnodes options in different apps. Can someone please tell me how the cache works when we publish a tree structure? That is, does it publish each node one by one and flush the public cache after each node, or does it publish the whole tree first and then flush the public cache?
Actually, I want to apply a wait time for bulk publishing; before that, I want to understand the cache's role while a tree structure is published.
Can we add a wait time for bulk publishing?
I am not talking about multisite cache configuration.

It depends on how you configured the cache (or the flush policy, or actually the observer that triggers the flush policy). IIRC, by default it is configured such that when an event ("something was published") arrives, it will wait and collect all other incoming activations that arrive within one second. If nothing arrives for one second after the last event, the event with the aggregated messages is passed on to the flush policy. If, on the other hand, events keep arriving, observation will keep collecting and aggregating them for a maximum of four seconds before reacting and flushing the cache. (I hope 1 second and 4 seconds are the correct intervals; it has been a couple of years since I last dug into that area, so they might have changed slightly since.)
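For illustration, here is a minimal, hypothetical sketch of that debounce behaviour in plain Java (these are not Magnolia's actual observation classes, and the 1 s / 4 s values are the assumed defaults from above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Magnolia's actual classes: activation events are
// buffered until either the quiet period (1 s) passes with no new event, or
// the maximum delay (4 s) since the first buffered event is reached; only
// then is the aggregated batch handed to the flush policy in one call.
public class AggregatingFlushObserver {
    private static final long QUIET_MS = 1_000; // assumed default
    private static final long MAX_MS = 4_000;   // assumed default

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final List<String> buffer = new ArrayList<>();
    private long firstEventAt = -1;
    private ScheduledFuture<?> pending;

    public synchronized void onActivation(String nodePath) {
        buffer.add(nodePath);
        long now = System.currentTimeMillis();
        if (firstEventAt < 0) firstEventAt = now;
        if (pending != null) pending.cancel(false);
        // Fire after the quiet period, but never later than MAX_MS
        // after the first event of the current batch.
        long delay = Math.min(QUIET_MS, Math.max(0, firstEventAt + MAX_MS - now));
        pending = scheduler.schedule(this::flush, delay, TimeUnit.MILLISECONDS);
    }

    private synchronized void flush() {
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        firstEventAt = -1;
        // Stand-in for handing the aggregated event to the flush policy.
        System.out.println("Flushing cache once for " + batch.size() + " published nodes");
    }
}
```

The practical consequence for bulk publishing is that a tree published in quick succession typically results in a handful of aggregated flushes rather than one flush per node.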
In EE you also have the possibility to configure other caching policies: you can have a dual cache where one is always pre-heated with new content before the other is flushed, or you can write a completely custom policy that suits your needs.

Related

Can Digital Ocean Stateless Servers (I have 4 servers running on Digital Ocean) work with the Caching Policy I implemented on Spring Boot?

I have implemented caching in my Spring Boot REST application. My policy includes a time-based cache eviction strategy and an update-based cache eviction strategy. I am worried that, since I employ stateless servers, if a method call updating certain data is handled by server instance A, the corresponding caches in server instances B, C and D are not updated as well.
Is this an issue I would face, and is there a way to overcome it?
This is one of the oldest problems in software development: cache invalidation when you have multiple servers.
One way to handle it is to move your cache out of the individual servers to somewhere shared, like another instance that holds the cache entries every other app refers to, or something like Redis [a centralized cache] (see the sketch after these options).
A second way is to broadcast a message so that each server knows to invalidate the entry once the data has been modified or deleted; here you run the risk of the message not being processed, leaving a stale entry on some server[s].
Another option is to have some sort of write-ahead log [like Kafka or Redis Streams] that is processed by each server, so they all process the events deterministically and end up with the same cache state.
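For the centralized-cache option, a minimal sketch of what that could look like in Spring Boot, assuming spring-boot-starter-data-redis is on the classpath (class and bean names here are illustrative):

```java
import java.time.Duration;

import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;

// Sketch: all four instances share one cache living in Redis, so an update
// handled by instance A is immediately visible to B, C and D -- there is no
// per-instance copy left to go stale.
@Configuration
@EnableCaching
public class SharedCacheConfig {

    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        RedisCacheConfiguration defaults = RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(10)); // keeps the time-based eviction part of the policy
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(defaults)
                .build();
    }
}
```

With this in place, the existing @Cacheable / @CachePut / @CacheEvict annotations operate on the shared Redis store instead of per-instance memory, so the update-based eviction works across all four servers.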
Let me know if you need more help - we can set up some time outside of SO.

Caffeine cache refresh without downtime Spring Boot

Background
I am working on a Spring Boot application. In this application, we have two different Caffeine caches - dealDetailsCache and trackingDetailsCache.
Both of these Caffeine caches have a single key each, let us say X and Y respectively. We just keep updating the values for the same keys while refreshing.
Application Flow
Every 5 minutes, a scheduled job runs.
This job fetches some data from an external source and, upon successful retrieval, updates the two Caffeine caches mentioned above.
Currently, I am manually doing a put operation on each of the above Caffeine caches to refresh the data for their respective keys (a simplified sketch follows below):
put(X, <<data>>) for updating dealDetailsCache
put(Y, <<data>>) for updating trackingDetailsCache
Expected QPS is about 50 per second.
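For reference, the current refresh looks roughly like this (a simplified sketch; the value types and fetch calls are placeholders, and scheduling is assumed to be enabled via @EnableScheduling):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Simplified sketch of the current setup; the fetch methods stand in for
// the calls to the external source.
@Component
public class CacheRefresher {

    private final Cache<String, Object> dealDetailsCache =
            Caffeine.newBuilder().build();
    private final Cache<String, Object> trackingDetailsCache =
            Caffeine.newBuilder().build();

    @Scheduled(fixedRate = 5 * 60 * 1000) // every 5 minutes
    public void refresh() {
        Object deals = fetchDealDetails();        // external call
        Object tracking = fetchTrackingDetails(); // external call
        // put() replaces the value under the existing key in one step, so
        // readers see either the old value or the new one -- never a miss.
        dealDetailsCache.put("X", deals);
        trackingDetailsCache.put("Y", tracking);
    }

    private Object fetchDealDetails() { return new Object(); }     // placeholder
    private Object fetchTrackingDetails() { return new Object(); } // placeholder
}
```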
What am I looking for
I am looking for a way to refresh the Caffeine caches (just like buildAsync) such that it does not impact the application and there is no downtime.
If put is not the right way to do this, would someone please suggest the right way to update the cache so that there is absolutely no downtime?
I read about CacheEvict, but there is a risk associated with it: it evicts and then refreshes, there can be some time between the two operations, and any requests that come in during this window (after eviction and before the new data is loaded) would fail.
The ultimate aim is that requests should always find the data, even if it is old for the time being. Would someone please suggest a clean mechanism for manual cache refreshes?

Process Laravel/Redis jobs from multiple servers

We are building a reporting app on Laravel that needs to fetch user data from a third-party server that allows 1 request per second.
We need to fetch 100K to 1000K rows per user, and we can fetch at most 250 rows per request.
So the restrictions are:
1. We can send 1 request per second
2. 250 rows per request
So it requires 400-4000 requests/jobs to fetch one user's data, which makes loading data for multiple users very time-consuming, and the server gets slow.
So now we are planning to load the data using multiple servers, like 4-10 servers, so we can send 10 requests per second from 10 servers.
How can we design the system and process jobs from multiple servers?
Is it possible to use a dedicated server for hosting Redis and connect to that Redis server from multiple servers and execute jobs? Can any conflict/race-condition happen?
Any hint or prior experience related to this would be really helpful.
The short answer is yes, this is absolutely possible and is something I've implemented in production apps many times before.
Redis is just like any other service and can run anywhere, with clients connecting to it from anywhere. It's all up to your configuration of the server to dictate how exactly that happens (adding passwords, configuring spiped, limiting access via the firewall, etc.). I'd recommend reading up on the documentation in the Administration section here: https://redis.io/documentation
Also, when you do make the move to a dedicated Redis host with multiple clients accessing it, you'll likely want to look into running more than one Redis server for reliability, high availability, etc. Redis has efficient and easy replication available with a few simple configuration commands, which you can read more about here: https://redis.io/topics/replication
One last thing on Redis: if you do end up implementing a master-slave setup, you may want to look into high availability and auto-failover in case your master instance goes down. Redis has a really great utility built into the application that can monitor your master and slaves, detect when the master is down, and automatically reconfigure your servers to promote one of the slaves to the new master. The utility is called Redis Sentinel, and you can read about it here: https://redis.io/topics/sentinel
For your question about race conditions, it depends on how exactly you write the jobs that are pushed onto the queue. For your use case it doesn't sound like this would be too much of an issue, but it really depends on the constraints of the third-party system. Either way, if you are subject to a race condition, you can still implement a solution for it, but you would likely need to use something like a Redis lock (https://redis.io/topics/distlock). Taylor recently added a new feature to the upcoming Laravel 5.6 that I believe implements a version of the Redis lock in the scheduler (https://medium.com/@taylorotwell/laravel-5-6-preview-single-server-scheduling-54df8e0e139b). You can look into how that was implemented and adapt it for your use case if you end up needing it.
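The core of that lock pattern is just two Redis operations and is language-agnostic; here is a minimal sketch using the Java Jedis client purely for illustration (the same SET NX PX / compare-and-delete sequence can be issued from Laravel's Redis facade):

```java
import java.util.Collections;
import java.util.UUID;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

// Minimal single-instance Redis lock, the building block of the distlock
// pattern linked above. Shown in Java only for illustration.
public class SimpleRedisLock {
    private final Jedis jedis;
    private final String key;
    private final String token = UUID.randomUUID().toString(); // proves our ownership

    public SimpleRedisLock(Jedis jedis, String key) {
        this.jedis = jedis;
        this.key = key;
    }

    /** Try to acquire the lock for ttlMillis; returns false if another worker holds it. */
    public boolean tryAcquire(long ttlMillis) {
        // SET key token NX PX ttl -- succeeds only if the key does not already exist.
        return "OK".equals(jedis.set(key, token, SetParams.setParams().nx().px(ttlMillis)));
    }

    /** Release only if we still own the lock (atomic compare-and-delete via Lua). */
    public void release() {
        String script =
                "if redis.call('get', KEYS[1]) == ARGV[1] then " +
                "  return redis.call('del', KEYS[1]) " +
                "else return 0 end";
        jedis.eval(script, Collections.singletonList(key), Collections.singletonList(token));
    }
}
```

A worker that fails to acquire the lock for a given fetch window simply skips or retries later, which is one way to keep the combined request rate within the third party's 1-request-per-second limit.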

JBoss Data Grid library mode with multiple applications sharing caches - what is an efficient way?

What is the most efficient way of having Infinispan/JBoss Data Grid in library mode with several applications using the same caches?
I currently have JBoss Data Grid set up in library mode on EAP 6.3, with about 10 applications and 6 different caches configured.
The cache mode is Replication.
Each application has a cache manager which instantiates the caches required by the application. Each cache is used by at least 2 applications.
I hooked up hawtio and can see from the JMX beans that multiple cache managers are created, with duplicated cache instances.
From the logs, I see:
ISPN000094: Received new cluster view: [pcu-18926|10] (12) [pcu-18926, pcu-24741, pcu-57265, pcu-18397, pcu-26495, pcu-56892, pcu-59913, pcu-53108, pcu-34661, pcu-43165, pcu-32195, pcu-28641]
Does it have a lot of overhead, with cache managers talking to each other all the time?
I eventually want to set up 4 cluster nodes with JBoss Data Grid in library mode, so how can I configure things so that all applications on one node share the same cache manager, hence reducing noise?
I can't use JBoss Data Grid in server mode, which I am aware would fulfil my requirements.
Thanks for any advice.
First of all, I may be misunderstanding your setup: this log says that there are 10 'nodes'. How many servers do you actually use? If you use the cache to communicate among 10 applications on the same machine, that is a very suboptimal approach; you keep 10 copies of all data and use many RPCs to propagate writes between the caches. You should have a single local-mode cache and just retrieve a reference to it (probably through JNDI).
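As a sketch of that shared-reference approach: assuming one deployment creates the cache manager and binds it into JNDI (the JNDI name below is illustrative), every other application on the node would only do a lookup instead of building its own manager:

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;

import org.infinispan.Cache;
import org.infinispan.manager.EmbeddedCacheManager;

// Sketch: the EmbeddedCacheManager is created once per node and bound into
// JNDI by a single deployment; the other applications look it up, so each
// cache exists exactly once per node instead of once per application.
public class SharedCacheLookup {

    public static Cache<String, Object> getCache(String cacheName) throws NamingException {
        InitialContext ctx = new InitialContext();
        EmbeddedCacheManager manager = (EmbeddedCacheManager)
                ctx.lookup("java:jboss/infinispan/myCacheManager"); // illustrative JNDI name
        return manager.getCache(cacheName);
    }
}
```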
Cache managers don't talk to each other, and caches do so only when an operation is executed, or when a node is joining/leaving/crashing (then the caches have to rebalance).
It's the JGroups channel that keeps the view and exchanges messages to detect whether the other nodes are alive, along with other synchronization messages, but these messages are sent only once every few seconds, so the overhead is very low.
On the other hand, each channel keeps several thread pools, and the cache manager has a thread pool as well, so there is some memory overhead. From a CPU point of view, there is a thread that iterates through the cache and purges expired entries (the task is started every minute), so even an idle cache full of entries consumes some cycles. If the cache is empty, consumption is very low (there's not much to iterate through).

Recommend cache updating strategy

Our site was recently divided into several smaller sites, which are distributed across different IDCs.
One of these sites serves user authentication and other user-related services; the other sites access it through web services.
On every site that fetches data remotely, we keep a local cache so that we don't have to go remote every time user information is needed.
What cache updating strategy would you recommend to ensure data integrity?
Since you need the update policy to be close to real time, you definitely need a cache-invalidation notification engine.
There are 2 possible implementation models for it:
1. Push
The main server pushes notification messages to the child servers, like "resourceID=34392 is no longer valid in your cache".
This message should be sent on each data update on the main server.
2. Poll
Each child server asks the main server about a cache item's validity right before serving it to the user.
Of course, in this case the main server should keep a list of the objects updated during the last cache-lifetime period, and respond to "If-object-was-updated" requests very quickly.
As you can see, in both cases your main server should trigger an event on each data change.
In the first case this event is transferred via a 'notification bus' to the child servers, and in the second case it is stored in a recently-updated-objects list.
So both options need some code changes on the main server.
As for me, the second option is much easier to implement in general, but it depends a lot on the software stack you're using; a rough sketch follows below.
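A rough sketch of the poll variant (all names are illustrative; shown in Java, but the idea maps to any stack):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch, all names illustrative. The main server records the
// last-modified time of each object; before serving its local copy, a
// child server asks "was this object updated after the moment I cached it?"
public class UpdateRegistry {

    private final Map<String, Long> lastModified = new ConcurrentHashMap<>();

    /** Called on the main server on every data change. */
    public void markUpdated(String resourceId) {
        lastModified.put(resourceId, System.currentTimeMillis());
    }

    /** Answers the child servers' "If-object-was-updated" requests. */
    public boolean updatedSince(String resourceId, long cachedAtMillis) {
        Long ts = lastModified.get(resourceId);
        return ts != null && ts > cachedAtMillis;
    }
}

// Child-server side, before serving a cached entry:
//   if (registry.updatedSince("resourceID=34392", entry.cachedAt)) { refetch(); }
```

Entries older than the cache lifetime can be pruned from the registry, since no child can still be holding a copy that old.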
