In Google Data Studio, you can set the Data Freshness on each data source.
It can be set to 1 hour, 4 hours, or 12 hours, but not to an explicit point in time.
Does this mean the cache is invalidated at each of these intervals, or does it first check whether the cache is stale with respect to the source data and only update the cache when new data has actually arrived?
For example, if my source data is updated only once every 24 hours and I set the data freshness to 1 hour, will it actually invalidate the cache 24 times a day, or just once a day?
I am asking because when the source data is refreshed, it should be reflected in the dashboards as soon as possible. On the other hand, I don't want to incur extra costs by setting the freshness to 1 hour if that would recompute the cache unnecessarily 23 times out of 24.
The Data Freshness options represent the refresh frequency (checking and updating the report with new data from the Data Set), thus:
15 Minutes = Refreshing 96 times in a day (24 Hours)
1 Hour = Refreshing 24 times in a day (24 Hours)
4 Hours = Refreshing 6 times in a day (24 Hours)
12 Hours = Refreshing 2 times in a day (24 Hours)
Quoting from the Refresh the cache section of the Data Freshness article:
When the cache refreshes, all the old cached data is discarded. New queries generated by the report go directly to the underlying platform and the responses are added to the cache.
I am currently working with large amounts of data that I'm storing in DynamoDB. Once data enters the database it never changes, but new data flows into the database consistently. My question is: how can I cache data (using DAX if possible) to limit the amount of data that I have to query directly from the database?
For example, if I want the data from 10:00 AM to 11:00 AM then I can query with the parameters of:
start_time = 10:00 AM,
end_time = 11:00 AM
The response from this query will be cached in DAX for later use. My problem is that when I then request data between 10:00 AM and 1:00 PM, I have to query for data that is already in my cache (because the caching is based on the query parameters, and these are new parameters).
My first thought was to cache the data in small sections and just make many queries. For example:
Request the 10:00 - 10:15 AM data and cache it, then request the 10:15 - 10:30 AM data and cache it, and so on. By doing this I would make many smaller queries but would not have overlapping data in my cache. Is this the best approach, or should I cache the overlapping data? Any help is appreciated.
If I understood correctly:
start_time = 10:00 AM, end_time = 11:00 AM ( Cache has no data, hits DynamoDB )
start_time = 10:00 AM, end_time = 11:00 AM ( Cache has this data, doesn't hit DynamoDB )
start_time = 10:00 AM, end_time = 10:30 AM ( Difference in cache keys, hits DynamoDB )
Basically, you could have the full set of data in the cache, but unless you use the same cache keys (which is what produces a cache hit), the cache can never intelligently return a "subset" of the full data to you.
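To make the cache-key problem concrete, here is a minimal, hypothetical Python sketch (the in-memory dict and the fetch_events_from_dynamodb stub are illustrative only, not DAX itself): a cache keyed on the raw query parameters treats (10:00, 11:00) and (10:00, 10:30) as unrelated entries, so the second call misses even though its data is a subset of the first.

# Illustrative only: a tiny in-memory cache keyed by the exact query parameters.
# DAX's query cache behaves analogously: the cache key is derived from the
# request parameters, so a different time range means a different key.
query_cache = {}

def fetch_events_from_dynamodb(start_time, end_time):
    # Stand-in for a real DynamoDB Query call; returns dummy rows here.
    return [{"range": (start_time, end_time)}]

def get_events(start_time, end_time):
    key = (start_time, end_time)          # cache key == the raw parameters
    if key in query_cache:
        return query_cache[key]           # cache hit: identical parameters
    result = fetch_events_from_dynamodb(start_time, end_time)
    query_cache[key] = result
    return result

get_events("10:00", "11:00")   # miss -> hits DynamoDB
get_events("10:00", "11:00")   # hit  -> served from the cache
get_events("10:00", "10:30")   # miss -> hits DynamoDB again, even though the
                               #         data is a subset of the first result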
DynamoDB DAX Item Cache
DynamoDB DAX also provides an item cache, where individual items are stored in and returned from DAX. However, the item cache is limited to GetItem and BatchGetItem operations:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.concepts.html#DAX.concepts.item-cache
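As a rough sketch of what the item cache path can look like with the amazondax Python client (the cluster endpoint, the events table name, and its key attributes below are assumptions for illustration; see the linked guide for the authoritative setup):

from amazondax import AmazonDaxClient  # pip install amazondax (assumed available)

# Assumed cluster endpoint and table/key names; replace with your own values.
DAX_ENDPOINT = "daxs://my-cluster.xxxxxx.dax-clusters.us-east-1.amazonaws.com"

# The DAX resource mirrors the boto3 DynamoDB resource interface, so repeated
# GetItem calls for the same key are served from the item cache.
dax = AmazonDaxClient.resource(endpoint_url=DAX_ENDPOINT)
table = dax.Table("events")

# GetItem / BatchGetItem go through the item cache; Query / Scan go through a
# separate query cache that is keyed on the request parameters.
item = table.get_item(Key={"device_id": "sensor-1", "ts": "2023-01-01T10:00:00Z"})
print(item.get("Item"))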
Fragmenting DDB Query
If DynamoDB DAX is not an option, or Query and Scan operations are needed, then the next best, least invasive technique is to fragment / partition the DDB query into "smaller" queries so that they result in more cache hits,
e.g.
start_time = 10:00 AM, end_time = 10:15 AM
start_time = 10:15 AM, end_time = 10:30 AM
start_time = 10:30 AM, end_time = 10:45 AM
There are a few good third-party application libraries you can use to partition your query keys, and you can choose the granularity (from 15-minute blocks down to 1-minute or even 1-second blocks) to suit your performance needs.
This technique is not without cons, though: the additional number of hops / queries it must now make needs to be taken into consideration.
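As a rough illustration of the fragmentation idea in Python with boto3 (the events table name, the device_id / ts key schema, and the 15-minute block size are all assumptions): the requested window is snapped to fixed block boundaries and each block is queried separately, so repeated or overlapping windows always reuse the same per-block parameters, which is what lets a parameter-keyed cache such as DAX's query cache (or an application-side cache) hit on them.

from datetime import datetime, timedelta, timezone

import boto3
from boto3.dynamodb.conditions import Key

BLOCK = timedelta(minutes=15)  # assumed block granularity

table = boto3.resource("dynamodb").Table("events")  # assumed table name

def block_boundaries(start, end):
    """Yield (block_start, block_end) pairs aligned to fixed 15-minute blocks."""
    # Snap the start down to the nearest block boundary so the same wall-clock
    # block always produces the same query parameters (and hence cache key).
    aligned = start - timedelta(minutes=start.minute % 15,
                                seconds=start.second,
                                microseconds=start.microsecond)
    cur = aligned
    while cur < end:
        yield cur, cur + BLOCK
        cur += BLOCK

def query_range(device_id, start, end):
    """Query one fixed-size block at a time; identical blocks repeat cache keys."""
    items = []
    for b_start, b_end in block_boundaries(start, end):
        resp = table.query(
            # BETWEEN is inclusive on both ends; fine for a sketch, but dedupe
            # boundary items if that matters for your data.
            KeyConditionExpression=Key("device_id").eq(device_id)
            & Key("ts").between(b_start.isoformat(), b_end.isoformat())
        )
        items.extend(resp["Items"])
    return items

if __name__ == "__main__":
    start = datetime(2023, 1, 1, 10, 0, tzinfo=timezone.utc)
    # 10:00-11:00 issues four block queries; a later 10:00-13:00 request reuses
    # the same four parameter sets for its first hour.
    rows = query_range("sensor-1", start, start + timedelta(hours=1))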
Application ORM
Solving problems like these is what application ORMs are really good at, for example Hibernate in the case of Java development (though the last I checked, Hibernate does not support DynamoDB yet, although it is possible to extend it and build custom strategies).
You could check whether your application ORM has support for DynamoDB:
https://www.baeldung.com/hibernate-second-level-cache
I have an Elasticsearch server with log data; right now I have 3 years of data (50 GB). I have checked that data older than 1 year is rarely required.
If I change all the queries to fetch data only for the last 1 year, how will it impact performance? Or should I store data older than 1 year on another server?
I did some digging but could not find an exact answer.
https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
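For what it's worth, restricting the queries to the last year is just a range filter on the timestamp field. A minimal sketch with the Python elasticsearch client (an 8.x client, the logs-* index pattern, and the @timestamp field are all assumptions about the setup):

from elasticsearch import Elasticsearch  # pip install elasticsearch (8.x assumed)

es = Elasticsearch("http://localhost:9200")  # assumed local cluster address

# Only documents from the last year are considered; everything older is
# filtered out before scoring.
resp = es.search(
    index="logs-*",
    query={
        "bool": {
            "filter": [
                {"range": {"@timestamp": {"gte": "now-1y/d"}}}
            ]
        }
    },
    size=100,
)
print(resp["hits"]["total"])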
Setup:
Entity Framework 4 with lazy loading enabled (model-first, table-per-hierarchy).
The number of tables is about 40 (and no table has more than 15-20 fields).
SQL Server Express 2008 (not r2).
No database triggers or any other stuff like this exist - it is only used for storage. All the logic is in the code.
Database size at the moment is approx. 2 GB.
(Primary keys are Guids and are generated in code via Guid.NewGuid() - if this matters)
Saving a complex operation result (which produces a complex object graph) takes anywhere from 40 to 60 seconds (the number returned by SaveChanges is approx. 8000 - mostly added objects and some modified ones).
Saving the same operation result with an empty (or an almost empty) database usually takes around 1 second on the same computer.
The only variable that seems to affect this issue is the database size. But please note that I am only measuring the Context.SaveChanges() call (so even if I have some weird sluggish queries somewhere, they should not affect this issue).
Any suggestions as to why this operation may last this long are appreciated.
Update 1
Just to clarify, the code that takes 40-60 seconds to execute is the following (it takes this long only when the DB size is around 2 GB):
Stopwatch sw = Stopwatch.StartNew();
int count = objectContext.SaveChanges(); // this method is not overridden
Debug.Write(sw.ElapsedMilliseconds); // prints out 40000 - 60000 ms
Debug.Write(count); // I am testing with exactly the same operation and the
// result always gives the same count for it (8460)
The same operation with an empty DB takes around 1000 ms (while still giving the same count, 8460). Thus the question is: how could database size affect SaveChanges()?
Update 2
Running a perf profiler shows that the main bottleneck (from a "code perspective") is the following method:
Method: static SNINativeMethodWrapper.SNIReadSync
Called: 3251 times
Avg: 10.56 ms
Max: 264.25 ms
Min: 0.01 ms
Total: 34338.51 ms
Update 3
There are non-clustered indexes for all PKs and FKs in the database. We are using random Guids as surrogate keys (not sequential), thus fragmentation is always at very high levels. I tried executing the operation in question right after rebuilding all DB indexes (fragmentation was less than 2-3% for all indexes), but it did not seem to improve the situation in any way.
In addition, I must say that during the operation in question, one table involved in the process has approximately 4 million rows (this table gets lots of inserts). SQL Profiler shows that inserts into that table can take anywhere from 1 to 200 ms (this is a "spike"). Yet again, this does not seem to change when the indexes are freshly rebuilt.
In any case, it seems (at the moment) that the problem is on the SQL Server side, since the main thing taking up time is that SNIReadSync method. Correct me if I am being completely ignorant.
It is hard to guess without a profiler, but 8000 records is definitely too many. Usually EF 4 works fine with up to a couple of hundred objects. I would not be surprised if it turns out that change tracking takes most of this time. EF 5 and 6 have some performance optimizations, so if you cannot decrease the number of tracked objects somehow, you could experiment with them.
I am still new to the CodeIgniter framework. Today I read about database caching (http://codeigniter.com/user_guide/database/caching.html) and web page caching (http://codeigniter.com/user_guide/general/caching.html).
I am a bit confused about whether database caching makes much sense once the page view is already cached; if the page is in the cache, it won't go to the database anyway.
The only point I see is in the following scenario:
If I load 30 results from the DB, then use PHP to shuffle the results and pull 10 of them from the array. The next time the page cache is deleted, I will still have the 30 DB results in the cache, but this time the shuffle of those 30 results will produce different output.
Am I missing something? Is there any other scenario where having a database cache would bring any benefit when also using page caching?
Database caching can also benefit you when using page caching, for example if your page is generated by several database queries where some data is constant while other data changes frequently.
In this case you will want to set the page caching to a short time period, retrieving the new data from the database each time while reusing the constant data without querying the database.
Example: let's say your frequently changing data needs to be refreshed every 5 minutes while the constant data changes every 24 hours. In this case you set the page caching to 5 minutes. Over a period of 24 hours you query the database 288 times for the frequent data but only once for the constant data. That totals 289 queries instead of the 576 you would have made without database caching.
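A rough, framework-agnostic sketch of that layering in Python (this is not CodeIgniter's API; the TTLs and the two query functions are placeholders): the rendered page is reused for 5 minutes at a time, while the slow-changing query result is cached separately for 24 hours, so most page rebuilds only re-run the frequently changing query.

import time

# Tiny TTL cache used for both layers; in CodeIgniter the page cache and the
# database cache play these two roles.
class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.value = None
        self.expires_at = 0.0

    def get_or_compute(self, compute):
        if time.time() >= self.expires_at:
            self.value = compute()
            self.expires_at = time.time() + self.ttl
        return self.value

page_cache = TTLCache(ttl_seconds=5 * 60)               # page cache: 5 minutes
constant_query_cache = TTLCache(ttl_seconds=24 * 3600)  # DB cache: 24 hours

def query_frequent_data():
    return "fresh stats"        # placeholder for the fast-changing query

def query_constant_data():
    return "site-wide config"   # placeholder for the slow-changing query

def render_page():
    # Re-runs the frequent query on every rebuild, but reuses the cached
    # constant-data result for up to 24 hours.
    frequent = query_frequent_data()
    constant = constant_query_cache.get_or_compute(query_constant_data)
    return f"<html>{frequent} / {constant}</html>"

def handle_request():
    # The whole rendered page is reused for 5 minutes at a time.
    return page_cache.get_or_compute(render_page)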
"Cache a temporary copy of the report. Expire copy of report after a number of minutes: 5"
Does this expire after 5 minutes from the first request that generated the cached report, or after 5 minutes with no request?
For example, if a report is set to expire after 5 minutes, and I make a request every minute, do I ever get the latest data?
Thanks!
Cache expiration is triggered based only on the selected cache duration. The cache will expire a set amount of time after the data was first retrieved. It doesn't matter how many times you refresh the report; all that matters is the cache duration expiring.