Understanding the AWS CloudFront cache key for a specific caching scenario

Imagine the following scenario:
A customer is using AWS CloudFront to serve a site that expects, and ordinarily receives, no query strings. Essentially the entire site is static, and the CDN exists purely to improve performance by caching and distributing the content.
On occasion, we, as a solutions provider for the customer, wish to refresh a page (via JavaScript) on the customer's site and, within that refresh request, append a query string with parameter values unique to each visitor. This event happens only once per visitor in a set period (say, for argument's sake, 365 days).
With the existing cache configuration, the cache can be busted and performance degraded when x new visitors (x being sizeable) come to the site and trigger the refresh event.
I wish to understand how we can configure the cache key so that any page request that includes our query string is sent to the origin but not stored in the cache once served. The cache should store the page only when the static URL is requested, not when the dynamic URL (dynamic because our query string has been appended) is requested.
Any assistance would be appreciated.

You can specify which query strings are included in the cache key; have a look at this document: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/controlling-the-cache-key.html#cache-policy-query-strings. With the "include specified query strings" cache behaviour, CloudFront will send the request to the origin with the query strings, and then you probably want your origin to send a Cache-Control header such as no-cache so that CloudFront will not cache the response.
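The origin-side half of that setup can be sketched as a small header-selection function. This is illustrative only: the parameter name `visitor_id` and the max-age value are assumptions, not anything from the original question.

```typescript
// Sketch of the origin-side logic (visitor_id is a hypothetical parameter
// name): when the per-visitor query string is present, tell CloudFront not
// to store the response; otherwise allow normal edge caching.
function cacheControlFor(url: URL): string {
  const hasVisitorParam = url.searchParams.has("visitor_id"); // assumed param
  return hasVisitorParam
    ? "no-store"                // forwarded to the origin, never cached
    : "public, max-age=86400";  // static pages cached at the edge for a day
}
```

Note that the query string still has to reach the origin at all, which is why it must be included in the cache key (or forwarded via an origin request policy) in the first place.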

Related

Specify cache policy for parts of a graphQL query

In Apollo's GraphQL client, there are fetch policies that specify whether a fetch query should obtain data from the server or use the local cache (if any data is available).
In addition, cache normalization allows usage of the cache to cut down on the amount of data that needs to be obtained from the server. For example, if I am requesting object A and object B, but earlier I had requested A and C, then in my current query it will get A from cache, and get B from server.
However, these specify cache policies for the entire query. I want to know whether there is a method for specifying TTLs on individual fields.
From a developer standpoint, I want to be able to specify in my query that I want to go to cache for some information that I am requesting, but not others. For example, take the below query:
query PersonInfo($id: String) {
  person(id: $id) {
    # Once this is cached, it is cached forever: always read it from the cache if available.
    birthCertificate
    # This should have a TTL of a day before invalidating the cached value and going to network.
    age
    # Always go to the network for this information.
    legalName
  }
}
In other words, for a fixed id value (and assuming this is the only query that touches the person object or its fields):
the first time I make this query, I get all three fields from the server.
now if I make this query again within a few seconds, I should only get the third field (legalName) from the server, and the first two from the cache.
now, if I then wait more than a day, and then make this query again, I get birthCertificate from the cache, and age + legalName from the server.
Currently, to do this the way I would want to, I end up writing three different queries, one for each TTL. Is there a better way?
Update: there is some progress on cache timing done on the iOS client (https://github.com/apollographql/apollo-ios/issues/142), but nothing specifically on this?
It would be a nice feature, but AFAIK (for now, speaking of the js/react client; probably the same for iOS):
there is no query normalization, only cache normalization;
if any requested field does not exist in the cache, the entire query is fetched from the network;
no timestamp is stored in the (normalized) cache entries (per query or per type).
For now the (only?) solution is to save timestamps in local state for each/all/some queries/responses (e.g. in onCompleted) and use them to invalidate/evict entries before fetching. This could probably be automated, e.g. by starting timers within some field policy function.
You can fetch the person data at the start of the session, just after login; any subsequent, more granular person(id: $id) { birthCertificate } query (e.g. in a React subcomponent) can have its "own" 'cache-only' policy. If you always need a fresh legalName, fetch it (separately or not) with a 'network-only' policy.
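The timestamp bookkeeping described above can be sketched roughly like this. All names here are made up; the returned strings happen to match Apollo's fetchPolicy values, but the TTL tracking itself is hand-rolled, not an Apollo API.

```typescript
// Record when each query's data was last fetched, then choose a fetch
// policy based on a per-query TTL before re-running the query.
type FetchPolicy = "cache-first" | "network-only";

const lastFetched = new Map<string, number>(); // queryKey -> epoch ms

// Call this e.g. from the query's onCompleted callback.
function recordFetch(queryKey: string, now: number = Date.now()): void {
  lastFetched.set(queryKey, now);
}

// Fresh within the TTL: serve from cache; otherwise force the network.
function policyFor(queryKey: string, ttlMs: number, now: number = Date.now()): FetchPolicy {
  const at = lastFetched.get(queryKey);
  return at !== undefined && now - at < ttlMs ? "cache-first" : "network-only";
}
```

You would then pass `policyFor(...)` as the `fetchPolicy` option when issuing the query, with a long TTL for `birthCertificate`, a day for `age`, and a zero TTL for `legalName`.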

Looking for help understanding Apollo Client local state and cache

I'm working with Apollo Client local state and cache, and although I've gone through the docs (https://www.apollographql.com/docs/react/essentials/local-state), a couple of tutorials (for example, https://www.robinwieruch.de/react-apollo-link-state-tutorial/) and looked at some examples, I'm a bit befuddled. In addition to any insight you might be able to provide on the specific questions below, any links to good additional docs/resources to put things in context would be much appreciated.
In particular, I understand how to store local client side data and retrieve it, but I'm not seeing how things integrate with data retrieved from and sent back to the server.
Taking the simple 'todo app' as a starting point, I have a couple of questions.
1) If you download a set of data (in this case 'todos') from the server using a query, what is the relationship between the cached data and the server-side data? That is, I grab the data with a query and it's stored in the cache automatically. Now if I want to grab that data locally and, say, modify it (in this case, add or modify a todo), how do I do that? I know how to do it for data I've created, but not for data I've downloaded, such as, in this case, my set of todos. For instance, some tutorials reference the __typename: in the case of data downloaded from the server, what would this __typename be? And if I used readQuery to grab the data downloaded from the server and stored in the cache, what query would I use? The same one I used to download the data originally?
2) Once I've modified this local data (for instance, in the case of todos, setting one todo as 'completed'), and written it back to the cache with writeData, how does it get sent back to the server, so that the local copy and the remote copy are in sync? With a mutation? So I'm responsible for storing a copy to the local cache and sending it to the server in two separate operations?
3) As I understand it, unless you specify otherwise, if you make a query from Apollo Client, it will first check to see whether the data you requested is in the cache, and otherwise it will call the server. Why, then, do you need to add @client in the example code to get the todos? Because these were not downloaded from the server with a prior query, but are instead only local data?
const GET_TODOS = gql`
  {
    todos @client {
      id
      completed
      text
    }
    visibilityFilter @client
  }
`;
If they were in fact downloaded with an earlier query, can't you just use the same query that you used originally to get the data from the server, without the @client directive, and if the data is in the cache, you'll get the cached data?
4) Lastly, I've read that Apollo Client will update things 'automagically': that is, if you send modified data to the server (say, in our case, a modified todo), Apollo Client will make sure that piece of data is modified in the cache, referencing it by ID. Are there any rules as to when it does and when it doesn't? If Apollo Client is keeping things in sync with the server using IDs, when do we need to handle it manually, as above, and when not?
Thanks for any insights, and if you have links to other docs than those above, or a good tutorial, I'd be grateful.
The __typename is Apollo's built-in auto-magic way to track and cache results from queries. By default you can look up items in your cache by using the __typename and id of your items. You usually don't need to worry about __typename until you manually need to tweak the cache. For the most part, just re-run your server queries to pull from the cache after the original request. The server responses are cached by default, so the next time you run a query it will pull from the cache.
It depends on your situation, but most of the time if you set your IDs properly Apollo client will automatically sync up changes from a mutation. All you should need to do is return the id property and any changed fields in your mutation query and Apollo will update the cache auto-magically. So, in the case you are describing where you mark a todo as completed, you should probably just send the mutation to the server, then in the mutation response you request the completed field and the id. The client will automatically update.
You can use the original query. Apollo client essentially caches things using a query + variable -> results map. As long as you submit the same query with the same variables it will pull from the cache (unless you explicitly tell it not to).
See my answer to #2 above, but Apollo client will handle it for you as long as you include the id and any modified data in your mutation. It won't handle it for you if you add new data, such as adding a todo to a list. Same for removing data.
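As a rough illustration of the id-based merge described in #2 and #4, here is a toy model of a normalized cache, not Apollo's actual implementation: entries are keyed by `__typename:id`, so a mutation response containing only the id and the changed fields is enough to patch the cached object.

```typescript
// Toy normalized cache: entities keyed by `__typename:id`.
type Entity = { __typename: string; id: string; [field: string]: unknown };

const cache = new Map<string, Entity>();

// Shallow-merge an incoming entity into any existing cached entry.
function writeEntity(entity: Entity): void {
  const key = `${entity.__typename}:${entity.id}`;
  cache.set(key, { ...cache.get(key), ...entity });
}

// Initial query result:
writeEntity({ __typename: "Todo", id: "1", text: "buy milk", completed: false });
// Mutation response containing only the id and the changed field;
// the untouched `text` field survives the merge.
writeEntity({ __typename: "Todo", id: "1", completed: true });
```

This is why returning `id` plus the modified fields from a mutation is sufficient for an in-place update, while adding or removing list items needs manual handling: the list itself is a separate cached result that no id-based merge touches.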

Reduce data size returned from an API through GraphQL?

We use one endpoint that returns a massive amount of data, and sometimes the page takes 5-10s to load. We don't have control over the backend API.
Is there a way to reduce the size that's going to be downloaded from the API?
We have already enabled compression.
I have heard that GraphQL lets you define a data schema before querying. Would GraphQL help in this case?
GraphQL could help, assuming:
Your existing API request is doing a lot of overfetching, and you don't actually need a good chunk of the data being returned
You have the resources to set up an additional GraphQL server to serve as a proxy to the REST endpoint
The REST endpoint response can be modeled as a GraphQL schema (this might be difficult or outright impossible if the object keys in the returned JSON are subject to change)
The response from the REST endpoint can be cached
The extra latency introduced by adding the GraphQL server as an intermediary is sufficiently offset by the reduction in response size
Your GraphQL server would have to expose a query that makes a request to the REST endpoint and then caches the response server-side. The cached response would be used for subsequent queries to the server until it expires or is invalidated. Caching the response is key: otherwise, merely proxying the request through the GraphQL server will make all your queries slower, since getting the data from the REST endpoint to the server will itself take approximately as long as your request does currently.
GraphQL can then be used to cut down the size of your response in two ways:
By not requesting certain fields that aren't needed by your client (or omitting these fields from your schema altogether)
By introducing pagination. If the reason for the bloated size of your response is the sheer number of records returned, you can add pagination to your schema and return smaller chunks of the total list of records one at a time.
Note: the latter can be a significant optimization, but can also be tricky if your cache is frequently invalidated
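The pagination point boils down to the resolver slicing the cached record list instead of returning it whole. A minimal sketch (names are illustrative, and this uses simple 1-based offset pagination rather than cursor-style connections):

```typescript
// Return one page of the cached REST response instead of the full list.
function pageOf<T>(records: T[], page: number, pageSize: number): T[] {
  const start = (page - 1) * pageSize; // pages are 1-based here
  return records.slice(start, start + pageSize);
}
```

A GraphQL field like `items(page: Int!, pageSize: Int!)` would call this against the cached REST payload, so the client downloads only the chunk it displays.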

jsp(springMVC) store data in session or request to retrieve between same page requests

I have a JSP page, let's say events_index.jsp, that gets all the events in the system. I am using Spring MVC's PagedListHolder to implement pagination. Do I need to store the data source in the request or the session? If I store it in the session, newly created events will not appear in the list unless I close the browser before creating a new event. If I store it in the request, it fetches the entire data set from the database every time, as it cannot find the data in the next request object. I need the data to be retained only between events_index.jsp requests, not for the entire session.
Any suggestions?
What I understood from your question is that you need to show the latest data every time you paginate.
Even when you are paginating within the same page, every 'next' or 'prev' sends an HTTP request to the server, and the server returns data according to the pageSize and offset settings.
So this happens for every request, and you will get the latest data.
If you store paging data in the session and serve subsequent requests from it, you may not have the latest state of the data.
So I suggest: use the request to show the latest data (may hit performance), or
use the session to avoid refetching (may not show the latest data).

What is better in this scenario? ViewBag or TempData or Session?

I have created news website in MVC.
I have search functionality on it.
When Index Action of Search Controller is called, it fetches records from database, it returns Search View.
This Search View has AJAX Pager for paging, when Next or Previous button of Pager is clicked, AJAX request is made to Paging Action of Search Controller.
Now I don't want to make another call to my database. I want to reuse the results that were fetched during the Index action of the Search controller.
For now I have used Session[""] object.
I want to know what is better to used for state management in this scenario.
Results fetched from the database can number around 1000-5000, each with an ArticleName and an ArticleShortDescription (~200 characters).
ViewBag or ViewData are only persistent in the current request. As such, they are not usable.
TempData persists until the next request, but this could be anything, so there's no guarantee it persists long enough for you to make your Ajax call (or subsequent ajax calls).
Realistically, Session is your only decent option in this case, though it's still not optimal.
You'll be storing a lot of information, which may not even be requested by the client. Even then, cleaning it up after it's no longer needed might prove hard as well.
Your best bet would be to make calls to the database which take paging into account, so you only ever return a subset of the data each request, rather than just pulling out all the data.
You should not use any of those. Sessions are created per user: if you are storing 1000-5000 articles for each user running a search, you are going to have a bad time. TempData is fundamentally a Session object with a nice wrapper, so it's also bad for your use case.
Let's say you decide to use HttpRuntime.Cache instead, so that you are not putting all the result on a per-user basis, then you have to worry about how long to store the objects in cache.
The logical approach would be to query the database with pagination
To prevent hitting your database so frequently, then you should cache the paged result with the search term + page number + page size (optional) as your cache key and store your result objects as the cache value, ideally with the cache expiration set. (You wouldn't want to serve stale search results till your cache gets evicted right?)
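A minimal sketch of that cache-key scheme (the key format and normalization are made up for illustration):

```typescript
// Build a cache key from the search term plus paging parameters, so each
// page of results is cached and expires independently. Normalizing the
// term avoids duplicate entries for "News" vs "news ".
function searchCacheKey(term: string, page: number, pageSize: number): string {
  return `search:${term.trim().toLowerCase()}:${page}:${pageSize}`;
}
```

The value stored under this key would be the page of result objects, with an expiration set when it is inserted into the application cache.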
I avoid using session state as it affects how your application scales in a load balanced environment. You have to ensure a user always has requests served from the same server because that is where session state is stored (unless you put it in the database, but that defeats the point in your situation).
I would try to use application caching. It does mean, if the user clicks Next or Prev and that request is served from another server, you'll have to go to the database again - but personally I would prefer to take that hit.
Have a look at this page, in particular scroll down to the Application Caching section.
Hope this helps.
