With OkHttp, the cache key is the URL. However, every time I request the same URL I have to add a "distinctRequestId" parameter to distinguish the request, so the URL is different every time even though the content is the same. I would like to set a custom cache key so that all of these requests map to the same cache entry. Thanks.
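One way to think about the custom key is as the URL with the per-request parameter removed. The following is a minimal sketch using only the JDK; the class and method names are made up for illustration, it assumes absolute http(s) URLs, and how the normalized key would be wired into OkHttp's cache is not shown because that depends on the OkHttp version and setup.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class StableCacheKey {
    /**
     * Returns the URL without the given query parameter (and without any fragment),
     * so that requests differing only in that parameter share one key.
     * Assumes an absolute URL such as https://host/path?a=b.
     */
    public static String stripParam(String url, String param) {
        URI uri = URI.create(url);
        String base = uri.getScheme() + "://" + uri.getRawAuthority() + uri.getRawPath();
        String query = uri.getRawQuery();
        if (query == null) {
            return base;
        }
        List<String> kept = new ArrayList<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            if (!name.equals(param)) {
                kept.add(pair); // keep every parameter except the per-request one
            }
        }
        return kept.isEmpty() ? base : base + "?" + String.join("&", kept);
    }
}
```

With this, two requests that differ only in distinctRequestId produce the same key string, which is the property the question asks for.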
I have the following endpoints:
/orders
/orders?status=open
/orders?status=open&my_orders=true
The third example uses headers to determine user and return their specific items.
Obviously, this is a poor API design, but we want to cache the first two and not the third. The caching policy can be modified to either whitelist or exclude query string params, but based on my understanding this won't be helpful. If we include the user-specific header, then the first two URIs will both be cached per user as well.
Is there an option I am missing that allows me to avoid caching the 3rd endpoint, while still caching the first two? Another option is to cache the 3rd but include the user specific headers in the cache key.
If you exclude the my_orders query string from the cache policy, CloudFront will not include that value in the cache key. That means all else held equal, these two URI paths will share the same cache key:
/orders?status=open
/orders?status=open&my_orders=true
That doesn't sound like it's what you want: you do want requests with my_orders=true to get separate cache keys, but you also need to account for a specific request header whose value changes the cache key. If that's the case, you need to include the request header as part of your cache key (which will also ensure CloudFront passes it through to your origin).
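To make the effect of the policy concrete, here is a toy model of how a cache key could be composed from the path plus only the query parameters and headers named in the policy. It is not CloudFront's actual implementation; the class name, the key layout, and the X-User-Id header name are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CacheKeyModel {
    /** Builds a key from the path plus only the query params and headers the policy names. */
    public static String cacheKey(String path,
                                  Map<String, String> query,
                                  Map<String, String> headers,
                                  List<String> policyQueryParams,
                                  List<String> policyHeaders) {
        StringBuilder key = new StringBuilder(path);
        // Sorting makes the key independent of parameter order (a simplification of this model).
        for (Map.Entry<String, String> e : new TreeMap<>(query).entrySet()) {
            if (policyQueryParams.contains(e.getKey())) {
                key.append("|q:").append(e.getKey()).append('=').append(e.getValue());
            }
        }
        for (String name : policyHeaders) {
            if (headers.containsKey(name)) {
                key.append("|h:").append(name).append('=').append(headers.get(name));
            }
        }
        return key.toString();
    }
}
```

In this model, leaving my_orders out of the policy collapses the second and third URIs into one key, while adding the user header to the policy separates users on every URI that carries the header, which matches the concern raised in the question.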
ExpireAfter will only purge the item but will not re-create the item. So what I need to do is, after a predefined interval, I need to purge a particular item from the cache and at the same time I need to recreate it. It might recreate with same data if there is no change in the data. Assuming the data was changed, the recreating will give the latest object.
My idea was to retrieve the latest item from the cache all the time. In contrast, the Refresh feature (https://github.com/ben-manes/caffeine/wiki/Refresh) will provide the stale item for the first request and do an asynchronous load, so the cache will provide the latest object for the second request.
Asynchronous removal listener that re-fetches the expired entry
should work in my case. Can you please provide me some information on
how to achieve this?
I'm also curious to know how the scheduled task can do it?
Assuming cache can address the following two cases:
Subsequent requests case:
I understand the refreshAfterWrite will provide the stale entry for
the first time but for the second request, what happens if the cache
hasn't yet completed loading the expired entry?
Does the cache block the second request, complete the re-fetch, and
then provide the latest value to the second request?
The idea is to make the cache provide the latest data after the
defined entry expiry time.
In the case where the cache has to load values equal to its capacity at one shot:
Let's say the cache size is 100 and the time to load all the 100 items
is 2 minutes.
Assuming the first request would load 100 items into the cache at the
same time, after the defined expiry time, the cache should evict and
re-fetch all the 100 elements.
For the second request to access items from those 100 items, how can
I make the cache smart enough so that it returns the entries that
have been re-loaded and asynchronously re-loads the other entries?
The idea is not to block any request for an existing entry. Serve the
request for an existing entry and do the re-load for the remaining
expired entries.
Asynchronous removal listener that re-fetches the expired entry should work in my case. Can you please provide me some information on how to achieve this?
The removal listener requires a reference to the cache, but that is not available during construction. If it calls a private method instead then the uninitialized field isn't captured and it can be resolved at runtime.
Cache<K, V> cache = Caffeine.newBuilder()
    .expireAfterWrite(1, TimeUnit.HOURS)
    .removalListener((K key, V value, RemovalCause cause) -> {
      if (cause == RemovalCause.EXPIRED) {
        reload(key);
      }
    }).build();

private void reload(K key) {
  cache.get(key, k -> /* load */);
}
I'm also curious to know how the scheduled task can do it?
If you are reloading all entries then you might not even need a key-value cache. In that case the simplest approach would be to reload an immutable map.
volatile ImmutableMap<K, V> data = load();
scheduledExecutorService.scheduleAtFixedRate(() -> data = load(),
    /* initial */ 1, /* period */ 1, TimeUnit.HOURS);
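For reference, a self-contained variant of the same idea using only the JDK might look like the sketch below. It swaps Guava's ImmutableMap for Map.copyOf, and the class name ReloadingSnapshot and its methods are hypothetical. The catch block is there because scheduleAtFixedRate stops scheduling further runs once a run throws.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class ReloadingSnapshot<K, V> implements AutoCloseable {
    private final Supplier<Map<K, V>> loader;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "snapshot-reloader");
            t.setDaemon(true); // do not keep the JVM alive just for reloads
            return t;
        });
    private volatile Map<K, V> data;

    public ReloadingSnapshot(Supplier<Map<K, V>> loader, long period, TimeUnit unit) {
        this.loader = loader;
        this.data = Map.copyOf(loader.get()); // initial load happens on the caller's thread
        scheduler.scheduleAtFixedRate(this::reload, period, period, unit);
    }

    /** Replaces the whole snapshot; readers see either the old or the new map, never a mix. */
    public void reload() {
        try {
            data = Map.copyOf(loader.get());
        } catch (RuntimeException e) {
            // Keep serving the previous snapshot; an uncaught exception would cancel future runs.
        }
    }

    public V get(K key) {
        return data.get(key);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Readers never block here: a lookup always reads the current immutable snapshot while the next one is built in the background.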
I understand the refreshAfterWrite will provide the stale entry for the first time but for the second request, what happens if the cache hasn't yet completed loading the expired entry?
The subsequent requests obtain the stale entry until either (a) the refresh completes and updates the mappings or (b) the entry was removed and the caller must reload. The case of (b) can occur if the entry expired while the refresh was in progress, where returning the stale value is no longer an option.
Does the cache block the second request, complete the re-fetch, and then provide the latest value to the second request?
No, the stale but valid value is returned. This is to let the refresh hide the latency of reloading a popular entry. For example an application configuration that is used by all requests would block when expired, causing periodic delays. The refresh would be triggered early, reload, and the callers would never observe it absent. This hides latencies, while also allowing idle entries to expire and fade away.
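The behaviour described above can be illustrated for a single value with plain JDK classes. This is only a sketch of the serve-stale-while-refreshing idea, not Caffeine's implementation; the class RefreshingValue and its methods are made up for the example, and case (b), where the entry expires during the refresh, is not modelled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class RefreshingValue<V> {
    private final Supplier<V> loader;
    private final long refreshAfterNanos;
    private final AtomicReference<CompletableFuture<V>> inFlight = new AtomicReference<>();
    private volatile V value;
    private volatile long writtenAt;

    public RefreshingValue(V initial, Supplier<V> loader, long refreshAfterNanos) {
        this.value = initial;
        this.loader = loader;
        this.refreshAfterNanos = refreshAfterNanos;
        this.writtenAt = System.nanoTime();
    }

    /** Never blocks: returns the current value and starts at most one background reload. */
    public V get() {
        boolean due = System.nanoTime() - writtenAt >= refreshAfterNanos;
        if (due) {
            CompletableFuture<V> marker = new CompletableFuture<>();
            if (inFlight.compareAndSet(null, marker)) { // only one caller starts the reload
                CompletableFuture.supplyAsync(loader).whenComplete((loaded, error) -> {
                    if (error == null) {
                        value = loaded;
                        writtenAt = System.nanoTime();
                    }
                    inFlight.set(null);
                    if (error == null) {
                        marker.complete(loaded);
                    } else {
                        marker.completeExceptionally(error);
                    }
                });
            }
        }
        return value; // stale but valid until the reload finishes
    }

    /** The reload currently in progress, or null if none. Exposed for observation only. */
    public CompletableFuture<V> pendingRefresh() {
        return inFlight.get();
    }
}
```

A second call to get() while the reload is still running returns the old value right away; only calls made after the reload completes see the new one.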
In the case where the cache has to load values equal to its capacity at one shot... after the defined expiry time, the cache should evict and re-fetch all the 100 elements.
The unclear part of your description is whether the cache should reload only the entries being accessed within the refresh period or reload its entire contents. The former is what Caffeine offers, while the latter is better served with an explicit scheduling thread.
I'm trying to find the technical term for the following (and potential solutions), in a distributed system with a shared cache:
request A comes in, cache miss, so we begin to generate the response
for A
request B comes in with the same cache key, since A is not
completed yet and hasn't written the result to cache, B is also a
cache miss and begins to generate a response as well
request A completes and stores value in cache
request B completes and stores value in cache (over-writing request A's cache value)
You can see how this can be a problem at scale, if instead of two requests, you have many that all get a cache miss and attempt to generate a cache value as soon as the cache entry expires. Ideally, there would be a way for request B to know that request A is generating a value for the cache, and wait until that is complete and use that value.
I'd like to know the technical term for this phenomenon; it's a cache race of sorts.
It's a kind of Thundering Herd
Solution: when the first request A arrives, it sets a flag. If request B arrives and finds the flag, it waits. After A has loaded the data into the cache, it removes the flag.
If all the waiting requests are woken up by the cache-loaded event, the wake-up itself triggers a thundering herd of threads, so the solution needs to take care of that as well.
For example, in the Linux kernel only one process is woken up, even if several processes are waiting on the event.
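The flag-and-wait idea can be sketched inside a single JVM, with the in-flight future playing the role of the flag. This is an assumption-laden illustration: the class name SingleFlight is made up, the loader must not return null, and a distributed system with a shared cache would need the flag to live in the shared store (for example as a lock entry) rather than in process memory.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class SingleFlight<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> loader) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> leader = inFlight.putIfAbsent(key, mine); // try to set the flag
        if (leader != null) {
            return leader.join(); // request B: wait for request A's result
        }
        try {
            // Re-check: another request may have filled the cache just before we took the flag.
            V value = cache.get(key);
            if (value == null) {
                value = loader.apply(key);
                cache.put(key, value);
            }
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e); // waiters fail too instead of hanging
            throw e;
        } finally {
            inFlight.remove(key, mine); // remove the flag
        }
    }
}
```

Waiters are released by completing the future, and they simply read the finished value, so waking them does not cause another round of loads.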
When you configure a cache in Edge you give it some key fragments (e.g. request.uri, request.header.Accept, request.header.Accept-Language, etc.). To clear that key you pass the same key fragments.
If I have 5,000 elements cached, how can I clear the entire cache without generating 5,000 calls to my API with all the possible cache keys?
You can use the clear all cache entries API call, documented here. If you don't pass in the prefix query parameter, it should remove all.
The Invalidate Cache policy is used to explicitly invalidate the cache entry for a given CacheKey (where the CacheKey is the combination of Prefix and KeyFragment), not to clear all entries associated with the given cache resource. Please go through the document here to understand more about 'Invalidate Cache'.
The cache can also be cleared from the UI.
Log in to the UI, go to the APIs tab, and open the "Environment Configuration" tab under it.
There you will find the option to clear the entire cache.
The following API call will also delete all your cache entries:
curl -v -u admin 'https://api.enterprise.apigee.com/v1/organizations/{org-name}/environments/{env-name}/caches/{cache-name}/entries?action=clear' -X POST
I am trying to use Varnish to cache a page that has some user specific text and links on it. The best way to cache such pages is via Edge Side Includes.
Context
My web application is RESTful and does not support sessions or even cookies for that matter. Every source URL is complete in a sense that it contains a user specific query parameter to be able to identify a unique user. The pages which see most visits in the web application are listing pages. I just need to show the user's email in the header and the links on the page must also carry the user specific query parameter ahead so as to simulate a logged in behavior. Page contents are supposed to be the same for each user except for the header and those internal links.
I tried to use <esi:include /> for such areas on the page but obviously, could not include the user specific parameter in the page source (else the first user specific hit would be cached with the first user's parameter and be served the same for every subsequent user). Further, I tried to strip user specific parameter in vcl_recv subroutine of Varnish and store it temporarily in a header such as req.http.X-User just before a lookup. Each source URL gets hashed with a req.url that doesn't contain any user specific parameters and hence, does not create duplicate cache objects for each unique user.
Question
I would like to read the user specific parameter from req.http.X-User and hash user specific ESI requests by adding this user specific value to each ESI URL as a query parameter. I do not see a way in which one could share query parameters between a source request and its included ESI requests. Could someone help?
I have tried to depict my objective in the following diagram:
I guess your problem is that the ESI call itself is going to be cached, including any query strings in the URL.
I can't remember the specifics, but I think you can get Varnish to pass cookies through to the ESI requests, so you could store the value in a cookie (encrypted?) and have it read by whatever is handling the ESI call.
Or maybe you can get it to pass the HTTP headers through? In that case the value can be read directly from the HTTP header.