How to increase mdx Query speed in pentaho cde and how to clear Mondrian Schema cache - pentaho-cde

I have a problem with mdx query. Actually I developed one dashboard has 23 mdx queries. if we run these dashboard it take 2 minute to run.How to solve this problem.
Another issue
i modify some data in database.If we run these dashboard modified data isn't shown. It show previous data only.How to solve this problem.

1) 23 queries on first load may be a bit too much. Can't you simplify that? Also, are the queries all as fast as possible but it's just too many of them? Or are there slower queries that need to be improved? Check also the priority of components. You may have components rendering more than once. Example: you have a Country selector and a City selector. Because the city selector was put in befor the country selector, if they have the same priority (default=5), it'll run first, retrieving the full list of cities; Then the country selector runs and picks the first value as parameter value. As your City selector most likely listens to the Country parameter, it'll fire again because the Country was fireChange'd.
2) Cache. You're changing the data but either Mondrian or CDA (or both) are getting data from their cache. Two options here:
- Clear Mondrian cache and clear CDA cache after the data is updated (suitable for large updates that affect most of the database);
- Disable the cache on the query definition and the cube cache on the Mondrian schema.

Related

Filter a Data Source from a Different Data Source

I have two chart tables both with different data sources. I want one table to act as the filter to the other table.
Here is the problem...
I tried a custom query for my data source which used the email parameter to filter the data source.
The problem is every time a user changes a filter on any page a query is executed in BigQuery, slowing the results and exponentially increasing my BigQuery monthly charges.
I tried blending the two tables.
The problem is the blended data feature only allows for 10 dimensions to be added to the resulting blended data source and is very slow.
I tried creating a control filter using a custom field on the "location" column on each table sharing the same "Field Id".
The problem is that the results table returns all the stores until you click on a location in the control list. And I cannot let a user see other locations.
Here is a link to a data studio sample report you can clearly see what I am trying to do.
https://datastudio.google.com/reporting/dd33be45-ab13-4881-8a3b-cabafa8c0dbb
Thanks
One solution which i can recommend to over come your first challenge, i.e. High cost. You can customize cost by using GCP-Memorystore, depending on frequency of data that is getting updated.
Moreover, Bigquery also cashes data for a query if you are not using Wild cards on tables and Time partitioned tables. So try to customize your solution over analysis cost if it is feasible over your solution. Bigquery Partition and Clusting may also help you in reducing BQ analysis cost.

Searching/selecting query in cache

I have been using cache for a long time. We store data against some key and fetch it from cache whenever required. I know that StackOverflow and many other sites heavily rely on cache. My question is do they always use key-value mechanism for caching or do they form some sql like query within a cache? For instance, I want to view last week report. This report's content will vary each day. Do i need to store different reports against each day (where day as a key) or can I get this result from forming some query that aggregate result across different key? Does any caching product (like redis) provide this functionality?
Thanks In Advance
Cache is always done as a key-value hash table. This is how it stays so fast. If you're doing querying then you're not doing cache.
What you may be trying to ask is... you could have in your database a table that contains agregated report data. And you could query against that pre-calculated table.
One of the reasons for cache (e.g. memcached ) being fast is its simplicity of data access and querying protocol.
The more functionality you add, more tradeoff you will have to do on the efficiency part. A full fledged SQL engine in a "caching" database is not a good design. Though you can utilize a data structures oriented database like Redis to design your cache data to suit your querying needs. For example: one set or one hash for each date.
A step further, you can use databases like MongoDb , or memsql which are pretty fast and have rich querying support.So an aggregation report once a while won't be an issue.
However, as a design decision, you will have to accept that their caching throughput will not be as much as memcached or redis.

What is the most efficient way to filter a search?

I am working with node.js and mongodb.
I am going to have a database setup and use socket.io to have real-time updates that will have the db queried again as well or push the new update to the client.
I am trying to figure out what is the best way to filter the database?
Some more information in regards to what is being queried and what the real time updates are:
A document in the database will include information such as an address, city, time, number of packages, name, price.
Filters include city/price/name/time (meaning only to see addresses within the same city, or within the same time period)
Real-time info: includes adding a new document to the database which will essentially update the admin on the website with a notification of a new address added.
Method 1: Query the db with the filters being searched?
Method 2: Query the db for all searches and then filter it on the client side (Javascript)?
Method 3: Query the db for all searches then store it in localStorage then query localStorage for what the filters are?
Trying to figure out what is the fastest way for the user to filter it?
Also, if it is different than what is the most cost effective way, then the most cost effective as well (which I am assuming is less db queries)...
It's hard to say because we don't see exact conditions of the filter, but in general:
Mongo can use only 1 index in a query condition. Thus whatever fields are covered by this index can be used in an efficient filtering. Otherwise it might do full table scan which is slow. If you are using an index then you are probably doing the most efficient query. (Mongo can still use another index for sorting though).
Sometimes you will be forced to do processing on client side because Mongo can't do what you want or it takes too many queries.
The least efficient option is to store results somewhere just because IO is slow. This would only benefit you if you use them as cache and do not recalculate.
Also consider overhead and latency of networking. If you have to send lots of data back to the client it will be slower. In general Mongo will do better job filtering stuff than you would do on the client.
According to you if you can filter by addresses within time period then you could have an index that cuts down lots of documents. You most likely need a compound index - multiple fields.

optimizing large selects in hibernate/jpa with 2nd level cache

I have a user object represented in JPA which has specific sub-types. Eg, think of User and then a subclass Admin, and another subclass Power User.
Let's say I have 100k users. I have successfully implemented the second level cache using Ehcache in order to increase performance and have validated that it's working.
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache
I know it does work (ie, you load the object from the cache rather than invoke an sql query) when you call the load method. I've verified this via logging at the hibernate level and also verifying that it's quicker.
However, I actually want to select a subset of all the users...for example, let's say I want to do a count of how many Power Users there are.
Furthermore, my users have an associated ZipCode object...the ZipCode objects are also second level cached...what I'd like to do is actually be able to ask queries like...how many Power Users do i have in New York state...
However, my question is...how do i write a query to do this that will hit the second level cache and not the database. Note that my second level cache is configured to be read/write...so as new users are added to the system they should automatically be added to the cache...also...note that I have investigated the Query cache briefly but I'm not sure it's applicable as this is for queries that are run multiple times...my problem is more a case of...the data should be in the second level cache anyway so what do I have to do so that the database doesn't get hit when I write my query.
cheers,
Brian
(...) the data should be in the second level cache anyway so what do I have to do so that the database doesn't get hit when I write my query.
If the entities returned by your query are cached, have a look at Query#iterate(). This will trigger a first query to retrieve a list of IDs and then subsequent queries for each ID... that would hit the L2 cache.

Should I store reference data in my application memory, or in the database?

I am faced with the choice where to store some reference data (essentially drop down values) for my application. This data will not change (or if it does, I am fine with needing to restart the application), and will be frequently accessed as part of an AJAX autocomplete widget (so there may be several queries against this data by one user filling out one field).
Suppose each record looks something like this:
category
effective_date
expiration_date
field_A
field_B
field_C
field_D
The autocomplete query will need to check the input string against 4 fields in each record and discrete parameters against the category and effective/expiration dates, so if this were a SQL query, it would have a where clause that looks something like:
... WHERE category = ?
AND effective_date < ?
AND expiration_date > ?
AND (colA LIKE ? OR colB LIKE ? OR colC LIKE ?)
I feel like this might be a rather inefficient query, but I suppose I don't know enough about how databases optimize their indexes, etc. I do know that a lot of really smart people work really hard to make database engines really fast at this exact type of thing.
The alternative I see is to store it in my application memory. I could have a list of these records for each category, and then iterate over each record in the category to see if the filter criteria is met. This is definitely O(n), since I need to examine every record in the category.
Has anyone faced a similar choice? Do you have any insight to offer?
EDIT: Thanks for the insight, folks. Sending the entire data set down to the client is not really an option, since the data set is so large (several MB).
Definitely cache it in memory if it's not changing during the lifetime of the application. You're right, you don't want to be going back to the database for each call, because it's completely unnecessary.
There's can be debate about exactly how much to cache on the server (I tend to cache as little as possible until I really need to), but for information that will not change and will be accessed repeatedly, you should almost always cache that in the Application object.
Given the number of directions you're coming at this data (filtering on 6 or more columns), I'm not sure how much more you'll be able to optimize the information in memory. The first thing I would try is to store it in a list in the Application object, and query it using LINQ-to-objects. Or, if there is one field that is used significantly more than the others, or try using a Dictionary instead of a list. If the performance continues to be a problem, try using storing it in a DataSet and setting indexes on it (but of course you loose some code-simplicity and maintainability this way).
I do not think there is a one size fits all answer to your question. Depending on the data size and usage patterns the answer will vary. More than that the answer may change over time.
This is why in my development I built some intermediate layer which allows me to change how the caching is done by changing configuration (with no code changes). Every while we analyze various stats (cache hit ratio, etc.) and decide if we want to change cache behavior.
BTW there is also a third layer - you can push your static data to the browser and cache it there too
Can you just hard-wire it into the program (as long as you stick to DRY)? Changing it only requires a rebuild.

Resources