Understand how caching helps in auto-refreshing Superset dashboards - caching

I have created a couple of Superset dashboards in a production environment, and I have read that it's recommended to use the Redis cache in production environments. A similar StackOverflow question here.
Firstly, I would like to understand what am I going to achieve by adding the following code in the superset_config.py
FILTER_STATE_CACHE_CONFIG = {
'CACHE_TYPE': 'RedisCache',
'CACHE_DEFAULT_TIMEOUT': 86400,
'CACHE_KEY_PREFIX': 'superset_filter_',
'CACHE_REDIS_URL': 'redis://localhost:6379/2'
}
Secondly, I would like to know how can I auto-refresh a superset dashboard permanently and not only for the current session option which is available. Is there any way? (similar question in this thread).
Thanks in advance for your help.
-- UPDATE 26.07.2022 (after more research)
Reference links: official doc, issue 390
I have added the following dictionaries in my superset_config.py file:
CACHE_CONFIG: CacheConfig = {
'CACHE_TYPE': 'RedisCache',
'CACHE_DEFAULT_TIMEOUT': int(timedelta(days=1).total_seconds()),
'CACHE_KEY_PREFIX': 'superset_cache_',
'CACHE_REDIS_URL': 'redis://redis:6379/2'
}
# Cache for datasource metadata and query results
DATA_CACHE_CONFIG: CacheConfig = {
'CACHE_TYPE': 'RedisCache',
'CACHE_DEFAULT_TIMEOUT': int(timedelta(days=1).total_seconds()),
'CACHE_KEY_PREFIX': 'superset_data_',
'CACHE_REDIS_URL': 'redis://redis:6379/3'
}
FILTER_STATE_CACHE_CONFIG: CacheConfig = {
'CACHE_TYPE': 'RedisCache',
'CACHE_DEFAULT_TIMEOUT': int(timedelta(days=1).total_seconds()),
'CACHE_KEY_PREFIX': 'superset_filter_',
'CACHE_REDIS_URL': 'redis://redis:6379/4'
}
EXPLORE_FORM_DATA_CACHE_CONFIG: CacheConfig = {
'CACHE_TYPE': 'RedisCache',
'CACHE_DEFAULT_TIMEOUT': int(timedelta(days=1).total_seconds()),
'CACHE_KEY_PREFIX': 'superset_explore_',
'CACHE_REDIS_URL': 'redis://redis:6379/5'
}
The superset app started successfully and when I do a dashboard refresh I can see the query running in redis-cli. My concern is that every time I apply filters on the dashboards the data is re-cached. Shouldn't the caching be applied once for every filter in the datasource, so when I apply filters superset won't have to hit the DB to fetch new records.

I'm new to Superset and caching, but here's my understanding.
If a reader opens a report that has recently been cached, instead of the report being run again, a cached version is fetched from Redis. Redis can store and transmit data faster than a relational database - that's one advantage. This reduces traffic on the database as well.
The other advantage is that the cached report doesn't need to be re-computed, saving the need to run a potentially long-running and expensive query on your source database.
You can watch this happen by entering your Redis instance (for Docker, that means entering the container with docker exec -it superset_worker /bin/bash) and then running redis-cli. You can then run MONITOR to see all the transactions that are happening. And in the main Superset app, you can see when a report is fetched from the cache. Open those and then reload reports in your browser to generate the activity.
Your block of code would set up caching only of filters. You need similar blocks of code to set up filters for data, etc. Check the config.py file to see all the cache values that can be overwritten, then supply blocks of code similar to the one you have in your post for the other caches.
Here's a related post from Chartio, a Business Intelligence company that closed after being acquired, about the role of caching in serving their reports more efficiently.

Related

How do I control the cacheability of a node rendered through JSON:API in Drupal 8/9?

We have a headless Drupal site that is exposing an API using JSON:API. One of the resources we're exposing is a content type that has a calculated/computed field inside that generates a time-limited link. The link is only valid for 120 seconds, but it seems that JSON:API wants to cache responses containing the field data for significantly longer.
Ideally, we don't want to completely turn off caching for these nodes, since they do get requested very frequently. Instead, we just want to limit caching to about 60 seconds. So, \Drupal::service('page_cache_kill_switch')->trigger() is not an ideal solution.
I've looked into altering the cache-ability of the response on the way out via an EventSubscriber on KernelEvents::RESPONSE but that only affects the top level of processing. As in, JSON:API caches all the pieces of the response (the "normalizations") separately, for maximum cache efficiency, so even though the response isn't served from cache, the JSON:API version of each node is.
I've also tried using service injection to wrap JSON:API's normalizers so that I can adjust the cache max age. It looks like I would need to do this at the field normalization level, but the JSON:API module discourages this by throwing a LogicException with an error of JSON:API does not allow adding more normalizers!.
So, after a lot of struggling with this, the solution is surprisingly simple -- instead of manipulating caching at the response level or the JSON:API level, manipulate it at the entity level!
This worked:
/**
* Implements hook_ENTITY_TYPE_load() for nodes.
*/
function my_module_node_load(array $entities) {
foreach ($entities as $entity) {
assert($entity instanceof NodeInterface);
if ($entity->bundle() === 'my_content_type') {
// Cache "my_content_type" entities for up to 60 seconds.
$entity->addCacheableDependency(
(new CacheableMetadata())->setCacheMaxAge(60)
);
}
}
}
As a bonus, this uses core's normal caching mechanism; so, these nodes will only be cached for up to 60 secs even when being displayed on admin pages for us.

Hazelcast persisting and loading data on all nodes

I have a 2 node setup distributed cache setup which needs persistence setup for both members.
I have MapSore and Maploader implemented and the same code is deployed on both nodes.
The MapStore and MapLoader work absolutely ok on a single member setup, but after another member joins, MapStore and Maploader continue to work on the first member and all insert or updates by the second member are persisted to disk via the first member.
My requirement is that each member should be able to persist to disk independently so that distributed cache is backed up on all members and not just the first member.
Is there a setting I can change to achieve this.
Here is my Hazlecast Spring Configuration.
#Bean
public HazelcastInstance hazelcastInstance(H2MapStorage h2mapStore) throws IOException{
MapStoreConfig mapStoreConfig = new MapStoreConfig();
mapStoreConfig.setImplementation(h2mapStore);
mapStoreConfig.setWriteDelaySeconds(0);
YamlConfigBuilder configBuilder=null;
if(new File(hazelcastConfiglocation).exists()) {
configBuilder = new YamlConfigBuilder(hazelcastConfiglocation);
}else {
configBuilder = new YamlConfigBuilder();
}
Config config = configBuilder.build();
config.setProperty("hazelcast.jmx", "true");
MapConfig mapConfig = config.getMapConfig("requests");
mapConfig.setMapStoreConfig(mapStoreConfig);
return Hazelcast.newHazelcastInstance(config);
}
Here is my hazlecast yml config - This is placed in /opt/hazlecast.yml which is picked up by my spring config up above.
hazelcast:
group:
name: tsystems
management-center:
enabled: false
url: http://localhost:8080/hazelcast-mancenter
network:
port:
auto-increment: true
port-count: 100
port: 5701
outbound-ports:
- 0
join:
multicast:
enabled: false
multicast-group: 224.2.2.3
multicast-port: 54327
tcp-ip:
enabled: true
member-list:
- 192.168.1.13
Entire code is available here :
[https://bitbucket.org/samrat_roy/hazelcasttest/src/master/][1]
This might just be bad luck and low data volumes, rather than an actual error.
On each node, try the running the localKeySet() method and printing the results.
This will tell you which keys are on which node in the cluster. The node that owns key "X" will invoke the map store for that key, even if the update was initiated by another node.
If you have low data volumes, it may not be a 50/50 data split. At an extreme, 2 data records in a 2-node cluster could have both data records on the same node.
If you have a 1,000 data records, it's pretty unlikely that they'll all be on the same node.
So the other thing to try is add more data and update all data, to see if both nodes participate.
Ok after struggling a lot I noticed a teeny tiny buy critical detail.
Datastore needs to be a centralized system that is accessible from all Hazelcast members. Persistence to a local file system is not supported.
This is absolutely in line with what I was observing
[https://docs.hazelcast.org/docs/latest/manual/html-single/#loading-and-storing-persistent-data]
However not be discouraged, I found out that I could use event listeners to do the same thing I needed to do.
#Component
public class HazelCastEntryListner
implements EntryAddedListener<String,Object>, EntryUpdatedListener<String,Object>, EntryRemovedListener<String,Object>,
EntryEvictedListener<String,Object>, EntryLoadedListener<String,Object>, MapEvictedListener, MapClearedListener {
#Autowired
#Lazy
private RequestDao requestDao;
I created this class and hooked it into the config as so
MapConfig mapConfig = config.getMapConfig("requests");
mapConfig.addEntryListenerConfig(new EntryListenerConfig(entryListner, false, true));
return Hazelcast.newHazelcastInstance(config);
This worked flawlessly, I am able to replicate data over to both the embedded databases on each node.
My use case was to cover HA failover edge-cases. During HA failover, The slave node needed to know the working memory of the active node.
I am not using hazelcast as a cache, rather I am using as a data syncing mechanism.

using Redis in Openstack Keystone, some Rubbish in redis

Recently, I'm using Redis to cache token for OpenStack Keystone. The function is fine, but some expired cache data still in Redis.
my Keystone config:
[cache]
enabled=true
backend=dogpile.cache.redis
backend_argument=url:redis://127.0.0.1:6379
[token]
provider = uuid
caching=true
cache_time= 3600
driver = kvs
expiration = 3600
but some expired data in Redis:
Data was over expiration time, but still in here, because the TTL is -1.
My question:
How can I change settings to stop this rubbish data created?
Is some gracefully way to clean it up?
I was trying to use command 'keystone-manage token_flush', but after reading code, I realized this command just clean up the expired tokens in Mysql
I hope this question still relevant.
I'm trying to do the same thing as you are, and for now the only option I found working is the argument on dogpile.cache.redis: redis_expiration_time.
Checkout the backend dogpile.redis API or source code.
http://dogpilecache.readthedocs.io/en/latest/api.html#dogpile.cache.backends.redis.RedisBackend.params.redis_expiration_time
The only problem with this argument is that it does not let you choose a different TTL for different categories, for example you want tokens for 10 minutes and catalog for 24 hours or so. The other parameters on keystone.conf just don't work from my experience (expiration_time and cache_time on each category)... Anyway this problem isn't relevant if you are using redis to store only keystone tokens.
[cache]
enabled=true
backend=dogpile.cache.redis
backend_argument=url:redis://127.0.0.1:6379
// Add this line
backend_argument=redis_expiration_time:[TTL]
Just replace the [TTL] with your wanted ttl and you'll start noticing keys with ttl in redis and after a while you will see that they are no more.
about the second question:
This is maybe not the best answer you'll see, but you can use OBJECT idletime [key] command on redis-cli to see how much time the specific key wasn't used (even GET reset idletime). You can delete the keys that have bigger idletime than your token revocation using a simple script.
Remember that the data on Redis isn't persistent data, meaning you can always use FLUSHALL and your OpenStack and keystone will work as usual, but ofc the first authentications will take longer.

Run #Scheduled task only on one WebLogic cluster node?

We are running a Spring 3.0.x web application (.war) with a nightly #Scheduled job in a clustered WebLogic 10.3.4 environment. However, as the application is deployed to each node (using the deployment wizard in the AdminServer's web console), the job is started on each node every night thus running multiple times concurrently.
How can we prevent this from happening?
I know that libraries like Quartz allow coordinating jobs inside clustered environment by means of a database lock table or I could even implement something like this myself. But since this seems to be a fairly common scenario I wonder if Spring does not already come with an option how to easily circumvent this problem without having to add new libraries to my project or putting in manual workarounds.
We are not able to upgrade to Spring 3.1 with configuration profiles, as mentioned here
Please let me know if there are any open questions. I also asked this question on the Spring Community forums. Thanks a lot for your help.
We only have one task that send a daily summary email. To avoid extra dependencies, we simply check whether the hostname of each node corresponds with a configured system property.
private boolean isTriggerNode() {
String triggerHostmame = System.getProperty("trigger.hostname");;
String hostName = InetAddress.getLocalHost().getHostName();
return hostName.equals(triggerHostmame);
}
public void execute() {
if (isTriggerNode()) {
//send email
}
}
We are implementing our own synchronization logic using a shared lock table inside the application database. This allows all cluster nodes to check if a job is already running before actually starting it itself.
Be careful, since in the solution of implementing your own synchronization logic using a shared lock table, you always have the concurrency issue where the two cluster nodes are reading/writing from the table at the same time.
Best is to perform the following steps in one db transaction:
- read the value in the shared lock table
- if no other node is having the lock, take the lock
- update the table indicating you take the lock
I solved this problem by making one of the box as master.
basically set an environment variable on one of the box like master=true.
and read it in your java code through system.getenv("master").
if its present and its true then run your code.
basic snippet
#schedule()
void process(){
boolean master=Boolean.parseBoolean(system.getenv("master"));
if(master)
{
//your logic
}
}
you can try using TimerManager (Job Scheduler in a clustered environment) from WebLogic as TaskScheduler implementation (TimerManagerTaskScheduler). It should work in a clustered environment.
Andrea
I've recently implemented a simple annotation library, dlock, to execute a scheduled task only once over multiple nodes. You can simply do something like below.
#Scheduled(cron = "59 59 8 * * *" /* Every day at 8:59:59am */)
#TryLock(name = "emailLock", owner = NODE_NAME, lockFor = TEN_MINUTE)
public void sendEmails() {
List<Email> emails = emailDAO.getEmails();
emails.forEach(email -> sendEmail(email));
}
See my blog post about using it.
You don't neeed to synchronize your job start using a DB.
On a weblogic application you can get the instanze name where the application is running:
String serverName = System.getProperty("weblogic.Name");
Simply put a condition two execute the job:
if (serverName.equals(".....")) {
execute my job;
}
If you want to bounce your job from one machine to the other, you can get the current day in the year, and if it is odd you execute on a machine, if it is even you execute the job on the other one.
This way you load a different machine every day.
We can make other machines on cluster not run the batch job by using the following cron string. It will not run till 2099.
0 0 0 1 1 ? 2099

How to implement "Distributed cache clearing" in Ofbiz?

We have multiple instances of Ofbiz/Opentaps running. All the instances talk to the same database. There are many tables that are rarely updated hence they are cached and all the instances maintain their individual copies of cache as a standard Ofbiz cache mechanism. But in rare situations when we update some entity using one of many instances then all other instances keep showing dirty cache data. So it requires a manual action to go and clear all the cache copies on other instances as well.
I want this cache clearing operation on all the instances to happen automatically. On Ofbiz confluence page here there is a very brief mention of "Distributed cache clearing". It relies on JMS it seems so whenever an instance's cache is cleared it sends notification over JMS to a topic and other instances subscribing to the same JMS topic clear their corresponding copies of cache upon this notification. But I could not find any other reference or documentation on how to do that? What are the files that need to be updated to set it all up in Ofbiz? An example page/link is what I'm looking for.
Alright I believe I've figured it all out. I have used ActiveMQ as my JMS broker to set it up so here are the steps in Ofbiz to make it working:
1. Copy activemq-all.jar to framework/base/lib folder inside your Ofbiz base directory.
2. Edit File base/config/jndiservers.xml: Add following definition inside <jndi-config> tag:
<jndi-server name="activemq"
context-provider-url="failover:(tcp://jms.host1:61616,tcp://jms.host2:61616)?jms.useAsyncSend=true&timeout=5000"
initial-context-factory="org.apache.activemq.jndi.ActiveMQInitialContextFactory"
url-pkg-prefixes=""
security-principal=""
security-credentials=""/>
3. Edit File base/config/jndi.properties: Add this line at the end:
topic.ofbiz-cache=ofbiz-cache
4. Edit File service/config/serviceengine.xml: Add following definition inside <service-engine> tag:
<jms-service name="serviceMessenger" send-mode="all">
<server jndi-server-name="activemq"
jndi-name="ConnectionFactory"
topic-queue="ofbiz-cache"
type="topic"
listen="true"/>
</jms-service>
5. Edit File entityengine.xml: Change default delegator to enable distributed caching:
<delegator name="default" entity-model-reader="main" entity-group-reader="main" entity-eca-reader="main" distributed-cache-clear-enabled="true">
6. Edit File framework/service/src/org/ofbiz/service/jms/AbstractJmsListener.java: This one is probably a bug in the Ofbiz code
Change following line from:
this.dispatcher = GenericDispatcher.getLocalDispatcher("JMSDispatcher", null, null, this.getClass().getClassLoader(), serviceDispatcher);
To:
this.dispatcher = GenericDispatcher.getLocalDispatcher("entity-default", null, null, this.getClass().getClassLoader(), serviceDispatcher);
7. And finally build the serviceengine code by issuing following command:
ant -f framework/service/build.xml
With this entity data changes in Ofbiz on one instances are immediately propagated to all the other Ofbiz instances clearing cache line item on its own without any need of manual cache clearing.
Cheers.
I have a added a page on this subject in OFBiz wiki https://cwiki.apache.org/OFBIZ/distributed-entity-cache-clear-mechanism.html. Though it's well explained here, the OFBiz wiki page adds other important information.
Note that the bug reported here has been fixed since, but another is currently pending, I should fix it soon https://issues.apache.org/jira/browse/OFBIZ-4296
Jacques
Yes, I fixed this behaviour sometimes ago at http://svn.apache.org/viewvc?rev=1090961&view=rev. But it still needs another fix related to https://issues.apache.org/jira/browse/OFBIZ-4296.
The patch below fixes this issue locally, but still creates 2 listeners on clusters, not sure why... Still investigating (not a priority)...
Index: framework/entity/src/org/ofbiz/entity/DelegatorFactory.java
===================================================================
--- framework/entity/src/org/ofbiz/entity/DelegatorFactory.java (revision 1879)
+++ framework/entity/src/org/ofbiz/entity/DelegatorFactory.java (revision 2615)
## -39,10 +39,10 ##
if (delegator != null) {
+ // setup the distributed CacheClear
+ delegator.initDistributedCacheClear();
+
// setup the Entity ECA Handler
delegator.initEntityEcaHandler();
//Debug.logInfo("got delegator(" + delegatorName + ") from cache", module);
-
- // setup the distributed CacheClear
- delegator.initDistributedCacheClear();
return delegator;
Please notify me using #JacquesLeRoux in your post, if ever you have something new to share.

Resources