Apollo Client v3: Delete cache entries after a given time period

I am wondering if there is a way to expire cached items after a certain time period, e.g., 24 hours.
I know that Apollo Client v3 provides methods such as cache.evict and cache.gc, which are a good start and which I am already using; however, I want a way to delete cache items after a given time period.
What I am doing at the minute is adding a TimeToLive field to every object in my Apollo schema; when the backend returns an object, the field is populated with the current time + 24 hours (i.e. the time 24 hours from now). Then, when I query the data in the front end, I check to see if the TimeToLive field of the returned data is in the future (if not, the data was definitely retrieved from the cache), and if it has expired I call the refetch function, which forces the query to fetch the data from the server. However, this doesn't seem like the best way to do things, mainly because I have to iterate over every result in the returned data and check whether any of the returned objects are expired; and if so, everything is refetched.
Another solution I thought of was to use something like React Native Queue and have a background task that periodically checks the cache and deletes items that have expired. But again, I am not totally sold on this solution.
For a little bit of context here: I am building a cooking / recipes app, and recipes / posts are cached on the device. My concern is that a user could delete a post, but everyone else who has that post cached would still be able to see it; by expiring the cached item, at least they would only be able to see it for a number of hours before it is removed. However, there might be a better way to do this altogether, e.g. have the server contact clients that have the item cached (though I couldn't think of any low-lift solutions at the time of writing this).

apollo-invalidation-policies replaces the Apollo Client InMemoryCache with InvalidationPolicyCache, and within the typePolicies you can specify a timeToLive field. If an object is accessed beyond its TTL, it is evicted and no data is returned.
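A minimal sketch of what that setup might look like (the import path follows the library name above, and the Recipe type name and 24-hour TTL are placeholders for this recipes app; check the library's README for the exact option shape):

import { ApolloClient } from '@apollo/client';
import { InvalidationPolicyCache } from 'apollo-invalidation-policies';

const cache = new InvalidationPolicyCache({
  invalidationPolicies: {
    types: {
      Recipe: {
        // Hypothetical: evict cached Recipe objects 24 hours after they are written.
        timeToLive: 24 * 60 * 60 * 1000,
      },
    },
  },
});

const client = new ApolloClient({ uri: '/graphql', cache });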

Related

Specify cache policy for parts of a graphQL query

In Apollo's GraphQL client, there are fetch policies that specify whether a fetch query should obtain data from the server or use the local cache (if any data is available).
In addition, cache normalization allows usage of the cache to cut down on the amount of data that needs to be obtained from the server. For example, if I am requesting object A and object B, but earlier I had requested A and C, then my current query will get A from the cache and B from the server.
However, these specify cache policies for the entire query. I want to know if there is a method for specifying TTLs on individual fields.
From a developer standpoint, I want to be able to specify in my query that I want to go to cache for some information that I am requesting, but not others. For example, take the below query:
query PersonInfo($id: String) {
  person(id: $id) {
    # Once this is cached, it is cached forever. I should always get this info from the cache if it is available.
    birthCertificate
    # I want this to have a TTL of a day before invalidating the cached value and going to the network.
    age
    # I want to always go to the network for this information.
    legalName
  }
}
In other words, for a fixed id value (and assuming this is the only query that touches the person object or its fields):
the first time I make this query, I get all three fields from the server.
now if I make this query again within a few seconds, I should only get the third field (legalName) from the server, and the first two from the cache.
now, if I then wait more than a day, and then make this query again, I get birthCertificate from the cache, and age + legalName from the server.
Currently, to do this the way I would want to, I end up writing three different queries, one for each TTL. Is there a better way?
Update: there is some progress on cache timing done on the iOS client (https://github.com/apollographql/apollo-ios/issues/142), but nothing specifically on this?
It would be a nice feature, but AFAIK [for now, speaking for the js/react client, and probably the same for ios]:
there is no query normalization, only cache normalization
if any requested field doesn't exist in the cache, then the entire query is fetched from the network
no timestamps are stored in cache [normalized] entries (per query / per type)
For now, the [only?] solution is to store timestamps for each/all/some queries/responses [e.g. in local state] (f.e. in onCompleted) and use them to invalidate/evict entries before fetching. This could probably be automated, f.e. by starting timers within some field policy fn (see the sketch below).
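A rough sketch of that idea with a field policy (assuming a one-day TTL on Person.age from the example query; storage is Apollo's per-field scratch object, and returning undefined from read makes a cache-first query treat the field as missing and go to the network):

import { InMemoryCache } from '@apollo/client';

const MAX_AGE_MS = 24 * 60 * 60 * 1000; // assumed one-day TTL

const cache = new InMemoryCache({
  typePolicies: {
    Person: {
      fields: {
        age: {
          merge(_existing, incoming, { storage }) {
            storage.fetchedAt = Date.now(); // record when the value was written
            return incoming;
          },
          read(existing, { storage }) {
            // Once the value is older than the TTL, report a cache miss so
            // the next cache-first query falls through to the network.
            if (storage.fetchedAt && Date.now() - storage.fetchedAt > MAX_AGE_MS) {
              return undefined;
            }
            return existing;
          },
        },
      },
    },
  },
});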
You can fetch the person data once at the start of a session, just after login ... any following, more granular person(id: $id) { birthCertificate } query (e.g. in a react subcomponent) can have its "own" 'cache-only' policy. If you always need a fresh legalName, fetch it [separately or not] with a network-only policy.
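For example, a sketch of those per-query policies (reusing the PersonInfo fields from the question):

import { gql, useQuery } from '@apollo/client';

const BIRTH_CERTIFICATE = gql`
  query BirthCertificate($id: String) {
    person(id: $id) { id birthCertificate }
  }
`;

const LEGAL_NAME = gql`
  query LegalName($id: String) {
    person(id: $id) { id legalName }
  }
`;

function usePersonInfo(id: string) {
  // Already fetched at session start, so never touch the network again.
  const birth = useQuery(BIRTH_CERTIFICATE, { variables: { id }, fetchPolicy: 'cache-only' });
  // Must always be fresh, so always skip the cache.
  const legal = useQuery(LEGAL_NAME, { variables: { id }, fetchPolicy: 'network-only' });
  return { birth, legal };
}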

Caching the results of a query and making smaller queries against it

I'm working with a database where I'll have to make a query for a certain ID as requests come in. My issue is that the DBAs have stipulated that I should simply take a batch copy of the entirety of the table for a given day and cache that.
This would mean I have to do a periodic select *, keep the results in memory, and any time a request comes in for an individual userId, point it to the cached version. However, if the cache has expired, I need to do the large query again.
This all sounds achievable in theory but I don't know what API I should be using.
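In case it helps, a minimal sketch of that pattern in TypeScript (loadAllUsers is a hypothetical wrapper around the bulk select *, and the User shape and daily TTL are assumptions):

type User = { userId: string; name: string };

const TTL_MS = 24 * 60 * 60 * 1000; // assumed daily refresh window
let snapshot = new Map<string, User>();
let loadedAt = 0;

declare function loadAllUsers(): Promise<User[]>; // hypothetical batch query

async function getUser(userId: string): Promise<User | undefined> {
  if (Date.now() - loadedAt > TTL_MS) {
    // Snapshot expired: rerun the large query and rebuild the in-memory index.
    const rows = await loadAllUsers();
    snapshot = new Map(rows.map(u => [u.userId, u]));
    loadedAt = Date.now();
  }
  return snapshot.get(userId);
}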

Solr capacity to handle delta import frequency

I wanted to arrange a system where a new item gets indexed in Solr as soon as it is created in the db, to avoid the few-minute delay of time-based delta polling. So I tweaked the delta import a little and made it work based on a query parameter. In my c# code, when a new item is saved, I construct a deltaimport url, pass the newsid to be indexed, and invoke it via httpwebrequest. Solr then uses the delta query to fetch the details from the db and index them.
http://localhost:89983/solr/mycore/dataimport?command=deltaimport&clean=false&newsid=1234
This works as expected. But now the issue comes when the flow of news gets higher, say 5 news items at a time. The url is hit by the code for each item in a loop, but this happens so fast that only one (the first) or sometimes two items get indexed. The rest are missed.
So I believe that Solr can't handle multiple delta-import hits at nearly the same time. How can I overcome this situation?
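One possible mitigation (a sketch only, in TypeScript rather than the question's c#): queue the trigger calls and fire them one at a time, waiting for the handler to go idle between imports. The status field below follows the DataImportHandler's status response as I recall it; verify it against your Solr version.

const base = 'http://localhost:89983/solr/mycore/dataimport';
const pending: string[] = [];
let draining = false;

async function waitForIdle(): Promise<void> {
  // DIH runs one import at a time and reports "busy" while doing so.
  for (;;) {
    const res = await fetch(`${base}?command=status&wt=json`);
    const body = await res.json();
    if (body.status !== 'busy') return;
    await new Promise(resolve => setTimeout(resolve, 200));
  }
}

async function triggerDeltaImport(newsId: string): Promise<void> {
  pending.push(newsId);
  if (draining) return; // an earlier call is already draining the queue
  draining = true;
  try {
    while (pending.length > 0) {
      const id = pending.shift()!;
      await waitForIdle(); // don't fire while a previous import is running
      await fetch(`${base}?command=deltaimport&clean=false&newsid=${id}`);
    }
  } finally {
    draining = false;
  }
}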

Redis cache strategy

I'm developing a website to display articles, much like stackoverflow.
Each article contains a title, a description, and some frequently changing fields (like view_count).
The website also supports cursor paging (max_id, since_id) and filtering (by category, tags).
I want to add a cache layer using Redis. There are some choices in my mind:
Option 1. Use a zset to store the list of the top 1000 article ids. Each filter has one zset, like articles:category:{category_id} and articles:tag:{tag_id}. Each zset is updated when a new article is published.
Use a hash to store each article, like article:{article_id}.
Update view_count directly in the cache on every view, and sync it to the db at some point.
Implement cursor paging using ZRANGEBYSCORE, where the score is the publishing timestamp (see the sketch after this option).
Pros: Never need to expire the cache; a new article is shown immediately.
Cons: Difficult to implement and may be error prone. Needs some messaging mechanism like rabbitmq.
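For reference, the ZRANGEBYSCORE paging from this option might look roughly like this (a sketch assuming the ioredis client; the key names follow the ones above):

import Redis from 'ioredis';

const redis = new Redis();

// Add a newly published article to a filter zset, capped at the top 1000.
async function publishArticle(categoryId: string, articleId: string, publishedAt: number) {
  const key = `articles:category:${categoryId}`;
  await redis
    .multi()
    .zadd(key, publishedAt, articleId)
    .zremrangebyrank(key, 0, -1001) // keep only the 1000 newest ids
    .exec();
}

// Cursor paging: fetch ids older than the max_id cursor's score, newest first.
// The '(' prefix makes the bound exclusive, so the cursor item itself is skipped.
async function pageByCategory(categoryId: string, maxScore?: number, pageSize = 20) {
  const max = maxScore === undefined ? '+inf' : `(${maxScore}`;
  return redis.zrevrangebyscore(`articles:category:${categoryId}`, max, '-inf', 'LIMIT', 0, pageSize);
}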
Option 2. Use the database to do the filtering and paging and return only the id list; cache the id list in redis with a short TTL (10 seconds). Still use a hash to cache each article, so view_count can be updated immediately.
Pros: Easy to implement; no messaging mechanism needed. Only the id list needs to be re-queried when the cache expires.
Cons: Need to query the database for a new id list every 10 seconds, and a new article is only shown after 10 seconds, though view_count is updated immediately.
Option 3. Use redis to cache the article list for each query, serialize it to json, and set a TTL.
Pros: Easiest to implement (use spring @Cacheable).
Cons: The cache expires frequently and the database must be queried again. view_count is only updated when the cache expires.
I don't know which one is the better option for performance and stability. Thank you for your help.

REST Api for Infinite scrolled query results

I'm building an internal server which contains a database of customer events. The webpage which allows access to the events is going to use an infinite scroll/dynamic loading scheme, both for displaying live events and for browsing the results of queries against the database. So, you might query the database and get 200k results. The webpage would display the 'first' 50 and allow you to scroll and scroll and scroll to see more and more results (loading perhaps 50 more at a time).
I'm supposed to be using a REST api for the database access (a C# server). I'm unsure what the API should be so that it remains RESTful. I've come up with 3 options. The question is: are any of them RESTful, and which is most RESTful (if there is such a thing; if not, I'll pick one of the RESTful ones)?
Option 1.
GET /events?query=asdfasdf&first=1&last=50
This simply does the query and specifies the range of results to return. The server, unable to keep state, would have to requery the database each time the infinite scroll occurs (though perhaps utilizing the first/last hints to stop early). This seems bad, and there isn't any feedback about how many results are forthcoming.
Option 2.
GET /events/?query=asdfasdf
GET /events/details?id1=asdf&id2=qwer&id3=zxcv&id4=tyui&...&id50=vbnm
This option first does a query which returns the list of event ids but no further details. The webpage simply has the list of all the ids (so at least it knows the count). The webpage holds onto the event id list and, as infinite scroll/dynamic load is needed, makes another query for the event details of the specified ids. Each id would nominally be a guid, so about 36 characters per id (plus &id##= for 41 characters). At 50 ids per hit, the URL would be quite long, 2000+ characters. The URL limit mentioned elsewhere on SO is around 2k, so maybe if I limit it to 40 ids per query this would be fine. It'd be nice to simply have a comma-separated list instead of all the query parameters. Can you make a query parameter like ?ids=qwer,asdf,zxcv,wert,sdfg,rtyu,gfhj, ... ,vbnm ?
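Yes, a comma-separated ids parameter is a common convention; the server just splits the value itself. A sketch of the parsing (Express/TypeScript purely for illustration, since the question's server is C#; lookupEvents is hypothetical):

import express from 'express';

const app = express();

// GET /events/details?ids=qwer,asdf,zxcv (one parameter, split server-side)
app.get('/events/details', (req, res) => {
  const ids = String(req.query.ids ?? '').split(',').filter(Boolean);
  if (ids.length === 0 || ids.length > 50) {
    res.status(400).json({ error: 'expected between 1 and 50 ids' });
    return;
  }
  // lookupEvents is a hypothetical batch fetch against the events database.
  res.json({ events: lookupEvents(ids) });
});

declare function lookupEvents(ids: string[]): unknown[];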
Option 3.
POST /events/?query=asdfasdf
GET /events/results/{id}?first=1&last=50
This would POST the query to the server and cause it to create a results resource. The ID of the results resource would be returned and would then be used to get blocks of the query results, which in turn contain the event details needed for the webpage. The XML returned from the POST could contain the number of records and other useful information besides the ID. Either the webpage would have to delete the resource later, when the query page closes, or the server would have to clean resources up once they expire (days or weeks later).
My concern with Option 1 is that, while RESTful, it is horrible for the server. I'm not sure requesting so many simultaneous resources, like the second GET in Option 2, is really RESTful or practical (it seems like there has to be a better way). And I'm not sure Option 3 is RESTful at all, or if it is, whether it's sort of cheating the REST thing by creating state via a POST (or should that be a PUT?).
Update: Option 3 worked out fine. It required the server to maintain the query results, and there was a bit of debate about how many queries (from various users) should be kept alive simultaneously, as there was no way to know when a user was actually done with a query.
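A sketch of what that Option 3 server might look like (again Express/TypeScript for illustration only, since the real server is C#; runQuery and the 30-minute expiry are assumptions):

import express from 'express';
import { randomUUID } from 'crypto';

const app = express();

// Each POSTed query becomes a results resource that GETs can page through.
const results = new Map<string, { rows: unknown[]; createdAt: number }>();
const TTL_MS = 30 * 60 * 1000; // assumed expiry window for abandoned queries

app.post('/events', async (req, res) => {
  const rows = await runQuery(String(req.query.query ?? ''));
  const id = randomUUID();
  results.set(id, { rows, createdAt: Date.now() });
  // Return the resource id plus the total count up front.
  res.status(201).json({ id, count: rows.length });
});

app.get('/events/results/:id', (req, res) => {
  const r = results.get(req.params.id);
  if (!r) {
    res.status(404).end();
    return;
  }
  const first = Number(req.query.first ?? 1);
  const last = Number(req.query.last ?? first + 49);
  res.json(r.rows.slice(first - 1, last));
});

// Periodic sweep so results from abandoned queries are eventually cleaned up.
setInterval(() => {
  const now = Date.now();
  for (const [id, r] of results) {
    if (now - r.createdAt > TTL_MS) results.delete(id);
  }
}, 60_000);

declare function runQuery(q: string): Promise<unknown[]>;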
