I'm working with a database where I'll have to make a query for a certain ID as requests come in. My issue is that the DBAs have stipulated that I should simply take a batch copy of the entirety of the table for a given day and cache that.
This would mean I have to do a periodic SELECT *, keep the result in memory, and serve any request for an individual userId from the cached copy. However, if the cache has expired, I then need to run the large query again.
This all sounds achievable in theory but I don't know what API I should be using.
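One way to picture what the DBAs are asking for is a read-through snapshot: load the whole table once, index it by user id, and reload it when the snapshot is older than the TTL. The sketch below is in Python and is only an illustration of that pattern. `load_all` is a hypothetical stand-in for whatever runs the `SELECT *` in your stack, and the injectable `clock` exists only so the expiry can be tested.

```python
import time
from typing import Callable, Dict, Iterable, Optional


class DailySnapshotCache:
    """Keeps a full-table snapshot in memory, indexed by user_id.

    `load_all` is a hypothetical callable standing in for the periodic
    `SELECT *`; it must return row dicts that each contain a "user_id" key.
    """

    def __init__(self, load_all: Callable[[], Iterable[dict]],
                 ttl_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_all = load_all
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows: Dict[object, dict] = {}
        self._loaded_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            # Snapshot missing or expired: re-run the large query and re-index it.
            self._rows = {row["user_id"]: row for row in self._load_all()}
            self._loaded_at = now

    def get(self, user_id: object) -> Optional[dict]:
        """Serve a single-user lookup from the snapshot (None if absent)."""
        self._refresh_if_stale()
        return self._rows.get(user_id)
```

With this shape, the first request after expiry pays for the large query. The sketch is also not thread-safe. In a server handling concurrent requests, you would probably guard the refresh with a lock or reload on a timer instead of on demand.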
I am wondering if there is a way to expire cached items after a certain time period, e.g., 24 hours.
I know that Apollo Client v3 provides methods such as cache.evict and cache.gc, which are a good start and which I am already using; however, I want a way to delete cache items after a given time period.
What I am doing at the minute is adding a TimeToLive field to every object in my Apollo schema. When the backend returns an object, this field is populated with the current time + 24 hours (i.e. the time 24 hours from now). Then, when I query the data on the front end, I check whether the TimeToLive field of the returned data is in the future. If it is not, the data must have been retrieved from the cache, in which case I call the refetch function, which forces the query to fetch the data from the server. However, this doesn't seem like the best way to do things, mainly because I have to iterate over every result in the returned data and check whether any of the returned objects have expired; if any have, everything is refetched.
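The approach described above boils down to two small pieces: the server stamps an expiry on each object, and the client scans the result for anything already expired. Here is a language-neutral sketch in Python, not Apollo code. `stamp_ttl` and `needs_refetch` are hypothetical names. The scan is linear in the number of returned objects, which is exactly the per-query cost the question objects to.

```python
from datetime import datetime, timedelta
from typing import Iterable, Mapping

TTL = timedelta(hours=24)


def stamp_ttl(obj: dict, now: datetime) -> dict:
    """Server side: attach an expiry timestamp 24 hours after `now`."""
    return {**obj, "TimeToLive": now + TTL}


def needs_refetch(items: Iterable[Mapping], now: datetime) -> bool:
    """Client side: True if any returned object has already expired.

    This walks every object, so the cost grows with the size of the result.
    """
    return any(item["TimeToLive"] <= now for item in items)
```

Note that a single expired object triggers a full refetch, which matches the behaviour described in the question.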
Another solution I thought of was to use something like React Native Queue and have a background task that periodically checks the cache and deletes items that have expired. But again, I am not totally sold on this solution.
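The background-task idea amounts to a periodic sweep over entries that carry their own expiry time. The sketch below illustrates that in Python with a plain thread. It is not React Native Queue, and `SweptCache` is a hypothetical name. One weakness is worth noting: because `get` does not check expiry, an expired entry stays readable until the next sweep runs.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SweptCache:
    """Each entry stores (value, expires_at); sweep() deletes expired entries."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[Any, float]] = {}
        self._lock = threading.Lock()
        self._clock = clock

    def put(self, key: str, value: Any, ttl_seconds: float) -> None:
        with self._lock:
            self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        # No expiry check here: stale entries remain readable until the next sweep.
        with self._lock:
            entry = self._data.get(key)
        return None if entry is None else entry[0]

    def sweep(self) -> int:
        """Delete every expired entry and return how many were removed."""
        now = self._clock()
        with self._lock:
            expired = [k for k, (_, expires_at) in self._data.items() if expires_at <= now]
            for k in expired:
                del self._data[k]
        return len(expired)

    def start_sweeper(self, interval_seconds: float) -> threading.Event:
        """Run sweep() every interval on a daemon thread; set the returned Event to stop."""
        stop = threading.Event()

        def run() -> None:
            while not stop.wait(interval_seconds):
                self.sweep()

        threading.Thread(target=run, daemon=True).start()
        return stop
```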
For a little bit of context: I am building a cooking/recipes app, and recipes/posts are cached on the device. My concern is that a user could delete a post, but everyone else who has that post cached would still be able to see it; by expiring cached items, they would at least only be able to see it for a number of hours before it is removed. However, there might be a better way to do this altogether, e.g. having the server contact the clients that have the item cached (though I couldn't think of any low-lift solutions at the time of writing this).
apollo-invalidation-policies replaces the Apollo Client InMemoryCache with an InvalidationPolicyCache, and within the typePolicies you can specify a timeToLive field. If an object is accessed after its TTL has passed, it is evicted and no data is returned.
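The behaviour described above (a TTL configured per type, enforced lazily when an object is read) can be illustrated independently of the library. The following Python sketch is an illustration of evict-on-access only, NOT the apollo-invalidation-policies API. `TypeTTLCache`, `write` and `read` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TypeTTLCache:
    """Per-type TTLs that are checked lazily on read (evict-on-access)."""

    def __init__(self, ttl_by_type: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_by_type = ttl_by_type  # e.g. {"Recipe": 24 * 3600}
        self._clock = clock
        self._store: Dict[Tuple[str, Any], Tuple[Any, float]] = {}

    def write(self, typename: str, obj_id: Any, value: Any) -> None:
        # Remember when the object was cached.
        self._store[(typename, obj_id)] = (value, self._clock())

    def read(self, typename: str, obj_id: Any) -> Optional[Any]:
        key = (typename, obj_id)
        entry = self._store.get(key)
        if entry is None:
            return None
        value, written_at = entry
        ttl = self._ttl_by_type.get(typename)  # types without a TTL never expire
        if ttl is not None and self._clock() - written_at >= ttl:
            del self._store[key]  # expired: evict and return no data
            return None
        return value
```

Compared with a background sweep, nothing runs while the app is idle. The expiry check happens only when the data is actually requested.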
I'm thinking about tracking page views for dynamic pages on my website for pages like the url below:
example.com/things/12456
I'm currently using Ruby on Rails with postgresql, on Heroku.
If I store EVERY GET request in a table, one row each time a user views a page, the database could grow extremely large very quickly. Ideally, I'd like to track the timestamp, user id and user role of each request as well, so each view would have to be a row in the table, as opposed to having a "count" column for each resource.
I'd also like to run aggregate queries on this large table, such as the total count per resource over a time period.
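The aggregate in question ("total count per resource over a time period") is a filtered GROUP BY over the raw view rows. To make it concrete, here is the same computation in Python over an in-memory list, with the equivalent SQL in a comment. The table name `page_views` and its columns are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, NamedTuple


class PageView(NamedTuple):
    viewed_at: datetime
    user_id: int
    user_role: str
    resource_id: int


def views_per_resource(views: Iterable[PageView], start: datetime, end: datetime) -> Counter:
    """Total views per resource in the half-open window [start, end).

    Roughly equivalent SQL against a hypothetical page_views table:
      SELECT resource_id, COUNT(*) FROM page_views
      WHERE viewed_at >= :start AND viewed_at < :end
      GROUP BY resource_id;
    """
    return Counter(v.resource_id for v in views if start <= v.viewed_at < end)
```

Using a half-open window means that consecutive periods (e.g. day by day) never double-count a view that lands exactly on a boundary.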
In terms of performance and cost, would this make sense to do? Are there better alternatives out there?
EDIT: Let's say I have 1,000 views a day, with each user viewing 10 pages. And I'm making 500 aggregate requests/day.
Would this be expensive or non-scalable?
(I'd also need to store POST, PUT and DELETE requests in an actions table, which raises this very same problem.)
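The numbers in the EDIT can be read two ways: 1,000 views per day in total, or 1,000 users per day each viewing 10 pages. The arithmetic below works out the yearly row count for both readings, assuming one row per view and ignoring the separate actions table.

```python
def rows_per_year(views_per_day: int, days: int = 365) -> int:
    """One row per view, accumulated over a year."""
    return views_per_day * days


# Reading A: 1,000 page views per day in total.
reading_a = rows_per_year(1_000)
# Reading B: 1,000 users per day x 10 pages each = 10,000 views per day.
reading_b = rows_per_year(1_000 * 10)
# 500 aggregate requests per day, over a year.
aggregate_queries_per_year = 500 * 365
```

So the raw table grows by roughly 365,000 rows a year under reading A, or 3,650,000 under reading B, while serving 182,500 aggregate queries a year.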
I am working with Node.js and MongoDB.
I am going to set up a database and use socket.io for real-time updates, which will either query the db again or push the new update to the client.
I am trying to figure out the best way to filter the database.
Some more information about what is being queried and what the real-time updates are:
A document in the database will include information such as an address, city, time, number of packages, name, price.
Filters include city/price/name/time (e.g. only showing addresses within the same city, or within the same time period).
Real-time info: this includes adding a new document to the database, which will notify the admin on the website that a new address has been added.
Method 1: Query the db with the filters being searched?
Method 2: Query the db for all documents and then filter them on the client side (JavaScript)?
Method 3: Query the db for all documents, store them in localStorage, and then query localStorage with the filters?
What is the fastest way for the user to filter?
Also, if it is different, what is the most cost-effective way (which I assume means fewer db queries)?
It's hard to say because we can't see the exact filter conditions, but in general:
Mongo generally uses only one index for a query condition, so only the fields covered by that index can be filtered efficiently. Otherwise it may do a full collection scan, which is slow. If your query uses an index, you are probably already running the most efficient query. (Mongo can also use an index to support sorting.)
Sometimes you will be forced to do processing on client side because Mongo can't do what you want or it takes too many queries.
The least efficient option is to store the results somewhere (such as localStorage), simply because I/O is slow. This only benefits you if you use the stored results as a cache and do not recompute them.
Also consider networking overhead and latency. If you have to send lots of data back to the client, it will be slower. In general, Mongo will do a better job of filtering than you would on the client.
From what you describe, if you can filter addresses by time period, an index on that could cut out lots of documents. You most likely need a compound index (multiple fields).
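Method 1 in practice means building the query document from whichever filters the user set and letting a compound index do the work. The Python sketch below uses field names taken from the question (city, name, price, time). The index choice is an assumption: it follows MongoDB's commonly cited equality-before-range guideline. With pymongo, the index would be created with `collection.create_index(INDEX_SPEC)` and the query run with `collection.find(build_filter(...))`.

```python
from datetime import datetime
from typing import Any, Dict, Optional

# Hypothetical compound index: equality field (city) first, then the range field (time).
INDEX_SPEC = [("city", 1), ("time", 1)]


def build_filter(city: Optional[str] = None,
                 name: Optional[str] = None,
                 min_price: Optional[float] = None,
                 max_price: Optional[float] = None,
                 start: Optional[datetime] = None,
                 end: Optional[datetime] = None) -> Dict[str, Any]:
    """Build a MongoDB query document containing only the filters that were set."""
    query: Dict[str, Any] = {}
    if city is not None:
        query["city"] = city
    if name is not None:
        query["name"] = name
    price: Dict[str, float] = {}
    if min_price is not None:
        price["$gte"] = min_price
    if max_price is not None:
        price["$lte"] = max_price
    if price:
        query["price"] = price
    time_range: Dict[str, datetime] = {}
    if start is not None:
        time_range["$gte"] = start
    if end is not None:
        time_range["$lt"] = end  # half-open window on the time field
    if time_range:
        query["time"] = time_range
    return query
```

Because the server does the filtering, only matching documents cross the network. This addresses the answer's point about networking overhead.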
Let me start by describing the scenario. I have an MVC 3 application with SQL Server 2008. In one of the pages we display a list of Products that is returned from the database and is UNIQUE per logged in user.
The SQL query (actually a VIEW) used to return the list of products is VERY expensive.
It is based on very complex business requirements which cannot be changed at this stage.
The database schema cannot be changed or redesigned as it is used by other applications.
There are 50k products and 5k users (each user may have access to 1 up to 50k products).
In order to display the Products page for the logged in user we use:
SELECT TOP X * FROM [VIEW] WHERE UserID = #UserId -- where 'X' is the size of the page
The query above returns a maximum of 50 rows (maximum page size). The WHERE clause restricts the number of rows to a maximum of 50k (products that the user has access to).
The page takes about 5 to 7 seconds to load, which is exactly how long the SQL query above takes to run in SQL Server.
Problem:
The user goes to the Products page and very likely uses paging, re-sorts the results, goes to the details page, etc., and then goes back to the list. Every time, it takes 5-7s to display the results.
That is unacceptable, but at the same time the business team has accepted that the first time the Products page is loaded it can take 5-7s. Therefore, we thought about CACHING.
We now have two options to choose from, the most "obvious" one, at least to me, is using .Net Caching (in memory / in proc). (Please note that Distributed Cache is not allowed at the moment for technical constraints with our provider / hosting partner).
But I'm not very comfortable with this. We could end up with lots of products in memory (when there are 50 or 100 users logged in simultaneously) which could cause other issues on the server, like .Net constantly removing cache items to free up space while our code inserts new items.
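The memory concern with in-process caching is usually addressed by capping the cache instead of letting it grow with the number of logged-in users. The sketch below shows a least-recently-used cap in Python. It is not the .NET caching API, and `BoundedUserCache` and `load_for_user` are hypothetical. Memory is bounded by `max_users` product lists, and the least recently active user is evicted first.

```python
from collections import OrderedDict
from typing import Callable, List


class BoundedUserCache:
    """Caches each user's product list, keeping at most `max_users` lists (LRU)."""

    def __init__(self, load_for_user: Callable[[int], List[str]], max_users: int) -> None:
        self._load = load_for_user  # hypothetical: runs the expensive 5-7s view query
        self._max = max_users
        self._entries: OrderedDict = OrderedDict()

    def get(self, user_id: int) -> List[str]:
        if user_id in self._entries:
            self._entries.move_to_end(user_id)  # mark as most recently used
            return self._entries[user_id]
        result = self._load(user_id)  # cache miss: pay the slow query once
        self._entries[user_id] = result
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict the least recently used user
        return result

    def invalidate_all(self) -> None:
        """Call when products or permissions change."""
        self._entries.clear()
```

An evicted user simply pays the 5-7s again on their next request, which the business has already accepted for a first load.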
The SECOND option:
The main problem here is that it is very EXPENSIVE to generate the User x Product x Access view, so we thought we could create a flat table (or in other words a CACHE of all products x users in the database). This table would be exactly the result of the view.
However the results can change at any time if new products are added, user permissions are changed, etc. So we would need to constantly refresh the table (which could take a few seconds) and this started to get a little bit complex.
Similarly, we thought we could implement some sort of Cache Provider and, upon request from a user, we would run the original SQL query and select the products from the view (5-7s, acceptable only once) and save that result in a flat table called ProductUserAccessCache in SQL. On the next request, we would get the values from this cached table (as we could easily identify the results were cached for that particular user) with a fast query without calculations in SQL.
Any time a product was added or a permission changed, we would truncate the cached-table and upon a new request the table would be repopulated for the requested user.
It doesn't seem too complex to me, but what we are doing here basically is creating a NEW cache "provider".
Does any one have any experience with this kind of issue?
Would it be better to use .Net Caching (in proc)?
Any suggestions?
We were facing a similar issue some time ago and were thinking of using EF caching to avoid the delay in retrieving the information. Our problem was a 1-2 second delay. Here is some info that might help on how to cache a table by extending EF. One of the drawbacks of caching is how fresh you need the information to be, so you set your cache expiration accordingly. Depending on that expiration, users might have to wait longer than they would like to get fresh info, but if your users can accept that they might be seeing outdated info in order to avoid the delay, then the tradeoff would be worth it.
In our scenario, we decided we would rather have fresh info than fast info, but as I said before, our waiting period wasn't that long.
Hope it helps
We have a fantasy football application that uses memcached and the classic memcached-object-read-with-sql-server-fallback. This works fairly well, but recently I've been contemplating the overhead involved and whether or not this is the best approach.
Case in point: we need to generate a drop-down list of the user's teams, so we follow this pattern:
Get a list of the users teams from memcached
If not available get the list from SQL server and store in memcached.
Do a multiget to get the team objects.
Fall back to loading the missing objects from SQL and store them in memcached.
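The four steps above can be written as one read-through function, with a fake client that counts round trips so the cost of a cold cache is visible. This is a Python sketch. `FakeMemcached`, its `get_multi` method, `query_team_ids` and `query_teams_by_ids` are hypothetical stand-ins, not a specific memcached client's API.

```python
from typing import Any, Callable, Dict, List


class FakeMemcached:
    """In-memory stand-in for a memcached client; counts network round trips."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.round_trips = 0

    def get(self, key: str) -> Any:
        self.round_trips += 1
        return self.data.get(key)

    def get_multi(self, keys: List[str]) -> Dict[str, Any]:
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}

    def set(self, key: str, value: Any) -> None:
        self.round_trips += 1
        self.data[key] = value


def load_user_teams(mc: FakeMemcached, user_id: int,
                    query_team_ids: Callable[[int], List[int]],
                    query_teams_by_ids: Callable[[List[int]], List[dict]]) -> List[dict]:
    # Steps 1-2: the list of team ids, falling back to SQL on a miss.
    ids_key = f"user:{user_id}:team_ids"
    team_ids = mc.get(ids_key)
    if team_ids is None:
        team_ids = query_team_ids(user_id)
        mc.set(ids_key, team_ids)
    # Step 3: multiget the team objects.
    keys = [f"team:{tid}" for tid in team_ids]
    found = mc.get_multi(keys)
    missing = [tid for tid, key in zip(team_ids, keys) if key not in found]
    # Step 4: one SELECT ... WHERE Id IN (...) for the misses, then one store per team.
    if missing:
        for team in query_teams_by_ids(missing):
            key = f"team:{team['id']}"
            mc.set(key, team)
            found[key] = team
    return [found[k] for k in keys if k in found]
```

With the id list cached but all 20 team objects expired, the counter reaches 22 memcached round trips (id lookup, multiget, 20 stores) plus the IN query. A fully warm cache needs only 2.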
This is all very well - each cached piece of data is relatively easily cached and invalidated, but there are two major downsides to this:
1) Because we are operating on objects we are incurring a rather large overhead - a single team occupies some hundred bytes in memcached and what we really just need for this case is a list of team names and ids - not all the other stuff in the team objects.
2) Due to the fallback to loading individual objects, the number of SQL queries generated on an empty cache or when the items expire can be massive:
1 x Memcached multiget (which misses, and therefore causes)
1 x SELECT ... FROM Team WHERE Id IN (...)
20 x Store in memcached
So that's 22 network requests (1 + 1 + 20) just for this one query, and the IN query is also slower than a specific join.
Obviously we could just do a simple
SELECT Id, Name FROM Teams WHERE UserId = XYZ
And cache that result, but this would mean that this data would need to be specifically invalidated whenever the user creates a new team. In this case it might seem relatively simple, but we have many of these types of queries, and many of them operate on axes that are not easily invalidated (like a list of ids and names of the teams that your friends have created in a specific game).
Sooo... My question is: do any of you have ideas for resolving the drawbacks mentioned, or should I just accept that there is an overhead and that cache misses are bad, and live with it?
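Caching the narrow `SELECT Id, Name` result and invalidating it on team creation looks roughly like the Python sketch below. A plain dict stands in for memcached, and `TeamNameCache` and its `query` callable are hypothetical. The sketch makes the tradeoff concrete: the cached list stays stale until every code path that creates a team remembers to call `on_team_created`.

```python
from typing import Callable, Dict, List, Tuple

TeamRow = Tuple[int, str]  # (Id, Name)


class TeamNameCache:
    """Caches only the (Id, Name) projection per user; a dict stands in for memcached."""

    def __init__(self, query: Callable[[int], List[TeamRow]]) -> None:
        # Hypothetical: runs SELECT Id, Name FROM Teams WHERE UserId = ?
        self._query = query
        self._cache: Dict[str, List[TeamRow]] = {}

    @staticmethod
    def _key(user_id: int) -> str:
        return f"user:{user_id}:team_names"

    def get(self, user_id: int) -> List[TeamRow]:
        key = self._key(user_id)
        if key not in self._cache:
            self._cache[key] = self._query(user_id)  # one query, one small cache entry
        return self._cache[key]

    def on_team_created(self, user_id: int) -> None:
        # Must be called from every code path that creates a team for this user.
        self._cache.pop(self._key(user_id), None)
```

Keys that span several users (such as a friends-in-a-game list) have no single owner to invalidate from, which is exactly the difficulty raised above.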
First, cache only what you need, maybe just those two fields, not a complete record.
Second, again cache only what you need: break the result set into records and cache them separately.
about caching:
You generally use caching to offload the slower disk-based storage, in this case MySQL. The memory cache scales up rather easily; MySQL scales less easily.
Given that, even if you double the CPU/network/memory usage of the cache by splitting results up and putting them back together, it will still offload the db. Adding another Node.js instance or another memcached server is easy.
back to your question
You say it's a user's team, so you could fetch it when the user logs in and keep it updated in the cache while the user changes it throughout their session.
I presume the team members' names do not change; if so, you can load all team members by id and name and store those in the cache, or even locally in Node.js, using the same fallback strategy as you do now. Only steps 1, 2 and 4 will be left then.
Personally, I usually try to split the SQL results into smaller ready-made pieces, cache those, and keep the cache updated for as long as possible, ultimately trying to use MySQL only as storage and never read from it.
Usually you will run some logic on the rows returned from MySQL anyway; there's no need to keep repeating that.
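The answer's suggestion (load once at login, then keep the cached piece updated on every write instead of invalidating it) is a write-through pattern. The following is a minimal Python sketch. `WriteThroughTeamCache` and `insert_team` are hypothetical, and a dict stands in for memcached. The database remains the source of truth; the cache is patched right after each successful write.

```python
from typing import Callable, Dict, List, Tuple

TeamRow = Tuple[int, str]  # (id, name)


class WriteThroughTeamCache:
    """Keeps a per-user (id, name) list in sync on every write instead of evicting it."""

    def __init__(self, insert_team: Callable[[int, str], int]) -> None:
        self._insert_team = insert_team  # hypothetical: INSERT into SQL, returns the new id
        self._cache: Dict[str, List[TeamRow]] = {}  # stands in for memcached

    @staticmethod
    def _key(user_id: int) -> str:
        return f"user:{user_id}:team_names"

    def load_on_login(self, user_id: int, rows: List[TeamRow]) -> None:
        """Populate the cache once, e.g. when the user logs in."""
        self._cache[self._key(user_id)] = list(rows)

    def create_team(self, user_id: int, name: str) -> int:
        team_id = self._insert_team(user_id, name)  # write to the database first
        key = self._key(user_id)
        if key in self._cache:
            # Patch the cached list with the new row instead of invalidating it.
            self._cache[key] = self._cache[key] + [(team_id, name)]
        return team_id

    def teams(self, user_id: int) -> List[TeamRow]:
        # Empty if never loaded; a real version would fall back to SQL here.
        return self._cache.get(self._key(user_id), [])
```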