Pentaho CDA cache scheduler with Dynamic data - caching

I have a SQL query with two parameters, fromDate and toDate, in my dashboard.
Users can select any date range, but they mostly use Last 30 Days, Current Week, or Current Month.
The dashboard works on a large volume of data, so the SQL queries are slow and the dashboard takes too long to load.
I enabled the CDA cache to make the dashboard load faster. However, the data needs to be updated hourly, so the cache has to be refreshed every hour.
When I clear the cache, the dashboard is very slow the first time it is loaded, so I tried to schedule the query with the CDA cache manager.
Refer to this URL : How to Reload CDA and Mondrian cache in Pentaho CE 4.8?
Unfortunately, I am unable to schedule the queries with dynamic parameters.
How can I schedule the queries with my dynamic parameters?
Also, is there any way to clear the CDA cache for specific queries only?
Kindly suggest your solution.
Cheers,

Related

Cloudera Hadoop Impala - Extracting last refresh date

Is there a way to get a list of all tables in a database, with their last refresh dates, in Cloudera Hadoop Impala?
I'm trying to write a custom SQL query for this so I can build a dashboard (in Tableau) that tracks whether each table has been refreshed, so we can take action accordingly. I tried it using a join, but there are so many tables that I believe there must be a better way. (The database is named Core_research and has more than 500 tables.)
I used to run a script that refreshed column stats on tables every Sunday. We couldn't run all the tables but we did as many as time permitted. You could do the same but actually record when the script ran in database/table. This would give you the functionality you are looking for.
Another option would be to create a table out of the Impala logs and keep track of things that way (with some fancy regex to find the refreshes).

update ignite cache with time-stamp data

My question is how to update the cache with new entries from a database table.
Suppose my cache holds my Cassandra table data up to 3 p.m.
By that time a user has purchased 3 items, so my cache has 3 item entries associated with that user.
But what if the user purchases 2 more items some time later (say 30 minutes)?
Since the cache already has 3 entries, it won't query the database. How do I get those 2 new entries when calculating the final bill?
One option is to call cache.loadCache(null, null) every 15 minutes, but calling it that often may not be feasible.
The better option is to write the data through Ignite rather than directly to Cassandra. That way the cache always holds up-to-date data without any additional synchronization with the database.
If you do choose to run loadCache periodically, you can add a timestamp to your objects in the database and implement your own CacheStore with an additional method that loads only new data. Here is a link to the documentation, which will help you implement your own CacheStore.
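The timestamp idea above can be sketched in plain Java. This is only an illustration of the "load rows newer than the last one seen" technique, not Ignite's CacheStore API: Purchase, PurchaseTable and IncrementalLoader are hypothetical names, and the in-memory list stands in for the Cassandra table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row type: one purchase plus the time it was written to the database.
final class Purchase {
    final long id;
    final String userId;
    final long createdAtMillis;

    Purchase(long id, String userId, long createdAtMillis) {
        this.id = id;
        this.userId = userId;
        this.createdAtMillis = createdAtMillis;
    }
}

// Stand-in for the database table. A real store would run a query that
// filters on the timestamp column instead of scanning a list.
final class PurchaseTable {
    private final List<Purchase> rows = new ArrayList<>();

    void insert(Purchase p) {
        rows.add(p);
    }

    List<Purchase> findCreatedAfter(long sinceMillis) {
        List<Purchase> result = new ArrayList<>();
        for (Purchase p : rows) {
            if (p.createdAtMillis > sinceMillis) {
                result.add(p);
            }
        }
        return result;
    }
}

// Keeps a watermark (the newest timestamp already loaded) and pulls only newer rows.
final class IncrementalLoader {
    private final PurchaseTable table;
    private final Map<Long, Purchase> cache = new HashMap<>();
    private long watermarkMillis = Long.MIN_VALUE;

    IncrementalLoader(PurchaseTable table) {
        this.table = table;
    }

    // Loads rows newer than the watermark into the cache; returns how many were added.
    int loadNew() {
        List<Purchase> fresh = table.findCreatedAfter(watermarkMillis);
        for (Purchase p : fresh) {
            cache.put(p.id, p);
            watermarkMillis = Math.max(watermarkMillis, p.createdAtMillis);
        }
        return fresh.size();
    }

    int size() {
        return cache.size();
    }
}
```

One caveat with this sketch: a row written later with a timestamp equal to the current watermark would be skipped, so timestamps need to be strictly increasing, or the comparison has to be relaxed and duplicates tolerated.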

Update database records based on date column

I'm working on an app where some entities in the database have a column holding the date until which the entity is available for certain actions. When that date passes, I need to change the entity's state, which means updating a column that represents its state.
So far, whenever I ask the database for those entities in order to do something with them, I first check whether they have expired and, if they have, I update them. I don't particularly like this approach, since it leaves a bunch of records in the database in the wrong state just because I haven't queried them. Another approach would be a periodic task that runs over those records and updates them as necessary. I don't like that either, since I would again have records in an inconsistent state, and in that respect the first approach seems more reasonable.
Is there another way of doing this? Am I missing something? I should mention that I use spring-boot + hibernate for my application, and the underlying database is PostgreSQL. Is there any technology-specific trick I can use to get what I want?
A database has no trigger type that fires when a row expires. If something expires and you need to act on it, there are two solutions (the ones you described): handle expired rows just before you use the data, or run a cron job/scheduled task (at the database level or on the server side).
I recommend the cron approach. Here is the explanation:
Handle expired rows before you read the data:
update before select
+: expired data is updated before you need it. This raises a question, though: do you update only the rows you requested, or everything that has expired? Updating everything can be time-consuming if you need just 2 records and end up updating 2000 records unrelated to your working dataset.
-: updating all records takes a long time; if the database is shared and accessed by more than your application, the expiry logic is not executed for those other clients; you need to control every entry point and decide where expired rows must be handled and where they must not; if expiry is measured in minutes or seconds, new records may expire a second after the expiry logic has run; and if the expiry workflow changes, a cron job keeps that logic in one place, whereas with update-before-select you have to change every place that does the update.
CRON/TASK
-: you have to spend time configuring it, but only once (30-60 minutes at most) :)
+: it runs in the background; if the database is used by more than your application, the expiry logic still applies; you don't have to check for stale data in your Java code before every select (or remember to, or explain it to new employees); the logic that takes care of stale data is separated from the normal queries to the database.
You can run 'select for update' in the cron job. A server-side query that also locks those rows (for example another 'select for update') will then wait until the stale-data logic completes and will see up-to-date data. Note that in PostgreSQL a plain select does not wait on row locks.
For Spring:
the Spring scheduling documentation, and the simple spring-quartz-schedule example
At the database level: a PostgreSQL job scheduler
A scheduler/cron job is best practice for this kind of thing.
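To make the scheduled approach concrete, here is a minimal sketch using only the JDK's ScheduledExecutorService in place of Spring scheduling or Quartz. Offer, ExpiryJob and the "ACTIVE"/"EXPIRED" state values are hypothetical names chosen for illustration, and the in-memory list stands in for the table; against a real database the expiry pass would normally be a single UPDATE statement.

```java
import java.time.Instant;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical entity with an availability deadline and a state flag.
final class Offer {
    final Instant availableUntil;
    volatile String state = "ACTIVE";

    Offer(Instant availableUntil) {
        this.availableUntil = availableUntil;
    }
}

final class ExpiryJob {
    // Marks every ACTIVE offer whose deadline has passed; returns the number changed.
    static int expire(List<Offer> offers, Instant now) {
        int changed = 0;
        for (Offer o : offers) {
            if ("ACTIVE".equals(o.state) && o.availableUntil.isBefore(now)) {
                o.state = "EXPIRED";
                changed++;
            }
        }
        return changed;
    }

    // Runs the expiry pass at a fixed interval on a background thread.
    // The caller is responsible for shutting the returned scheduler down.
    static ScheduledExecutorService start(List<Offer> offers, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> expire(offers, Instant.now()), 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Passing the current time into expire as a parameter keeps the expiry rule testable without waiting for the scheduler to fire.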

How to increase mdx Query speed in pentaho cde and how to clear Mondrian Schema cache

I have a problem with MDX queries. I developed a dashboard that has 23 MDX queries, and it takes 2 minutes to run. How can I solve this?
Another issue:
I modified some data in the database, but when the dashboard runs, the modified data isn't shown. It shows only the previous data. How can I solve this?
1) 23 queries on first load may be a bit too much. Can't you simplify that? Also, is every query already as fast as it can be and there are simply too many of them, or are there slow queries that need to be improved? Check the priority of the components as well. You may have components rendering more than once. Example: you have a Country selector and a City selector. If the City selector was added before the Country selector and they have the same priority (default=5), the City selector runs first and retrieves the full list of cities. Then the Country selector runs and picks the first value as its parameter value. As your City selector most likely listens to the Country parameter, it fires again because the Country parameter was fireChange'd.
2) Cache. You're changing the data but either Mondrian or CDA (or both) are getting data from their cache. Two options here:
- Clear Mondrian cache and clear CDA cache after the data is updated (suitable for large updates that affect most of the database);
- Disable the cache on the query definition and the cube cache on the Mondrian schema.

How to refresh iBatis Cache with database operations

We have a Java EE web application that uses iBatis for ORM. One of the dropdowns (select boxes) shows master data that is refreshed daily (say at 4:00 AM) by cron jobs that load a flat file into an Oracle database table.
Since the dropdown has to list ~1000 records and the data is static for 24 hours, we used the CacheModel feature in iBatis. The select query uses a CacheModel with the settings "ReadOnly=true & Serialized=true flushInterval=24 hours", so that a single cache is shared across all users.
There will be no insert/update/delete operations from the application that modify this master data.
Question:
If the external job that loads data into this Oracle table fails, and the iBatis cache is flushed for the day before we manually load the data into the table, how can I get the iBatis cache flushed again in the middle of the day when I rerun the failed cron job?
Please note that there will not be any insert/update/delete operations from the application.
You can flush the cache programmatically.
The SqlMapClient interface has 2 methods for this:
void flushDataCache()
Flushes all data caches.
and
void flushDataCache(java.lang.String cacheId)
Flushes the data cache that matches the cache model ID provided.
http://ibatis.apache.org/docs/java/user/com/ibatis/sqlmap/client/SqlMapClient.html
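One way to tie the flush to the rerun is to flush only after the reload has succeeded. The sketch below is an assumption about how that could be wired, not part of iBatis: ReloadAndFlush is a hypothetical helper, and the flush call is passed in as a Consumer<String> so that the real application could supply sqlMapClient::flushDataCache. It has to run inside the web application's JVM (for example behind an admin-only endpoint), because that is where the cache lives.

```java
import java.util.function.Consumer;

// Runs a data load and flushes one named cache only if the load succeeded.
final class ReloadAndFlush {
    private final Runnable loader;               // the reload step (hypothetical)
    private final Consumer<String> cacheFlusher; // e.g. sqlMapClient::flushDataCache
    private final String cacheId;                // cache model ID to flush

    ReloadAndFlush(Runnable loader, Consumer<String> cacheFlusher, String cacheId) {
        this.loader = loader;
        this.cacheFlusher = cacheFlusher;
        this.cacheId = cacheId;
    }

    // Returns true when the load completed and the cache was flushed.
    boolean run() {
        try {
            loader.run();
        } catch (RuntimeException e) {
            // The load failed: leave the existing cached data in place.
            return false;
        }
        cacheFlusher.accept(cacheId);
        return true;
    }
}
```

If the load itself stays in the external cron job, the loader can be a no-op and the helper reduces to flushing the named cache on demand.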
