Google WebRisk API TTL - caching

How does Google Web Risk evaluate the expireTime and negativeExpireTime for a particular Update API request?
Sometimes the Web Risk API returns a response with an expireTime only 5-10 minutes in the future, so we have to make another API call after 5-10 minutes, which is pretty soon.
Is there any way to optimize the expireTime to reduce the cost?

In the Update API, the expireTime and negativeExpireTime fields indicate how long the returned hashes must be considered unsafe or safe, respectively.
To reduce the overall number of hashes.search requests sent to Google using the Update API, clients are required to maintain a local cache. The API establishes two types of caching: positive and negative.
Positive caching (expireTime)
To prevent clients from repeatedly asking about the state of a particular unsafe full hash, each returned ThreatHash contains a positive cache time (defined by the expireTime field). The full hash can be considered unsafe until this time. A positive cache entry should be created or updated for the full hash per the expireTime field.
Negative caching (negativeExpireTime)
To prevent clients from repeatedly asking about the state of a particular safe full hash, the response defines a negative cache duration for the requested prefix (defined by the negativeExpireTime field). All full hashes with the requested prefix are to be considered safe for the requested threat types until this time, except for those returned by the server as unsafe. The hash prefix's negative cache duration should also be created or updated per the response's negativeExpireTime field.
For example:
Assume a client with an empty cache visits example.com/ and sees that h(example.com/) is in the local database. The client requests the full-length hashes for hash prefix h(example.com/) and receives back the full-length hash H(example.com/) together with a positive cache expireTime of 5 minutes from now and a negative cache expireTime of 1 hour from now.
The positive cache duration of 5 minutes tells the client how long the full-length hash H(example.com/) must be considered unsafe without sending another hashes.search request. After 5 minutes the client must issue another hashes.search request for that prefix h(example.com/) if the client visits example.com/ again. The client should reset the hash prefix's negative cache expireTime per the new response.
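For illustration, here is a minimal sketch of a client-side cache honoring both fields. The class and method names are hypothetical, and eviction of stale entries is omitted; when lookup returns null, the client issues a new hashes.search request and feeds the response back through recordResponse.

import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical local positive/negative cache for hashes.search results.
public class WebRiskLocalCache {
    // full hash -> expireTime: treat the hash as unsafe until then
    private final Map<String, Instant> positiveCache = new ConcurrentHashMap<>();
    // hash prefix -> negativeExpireTime: treat non-listed hashes as safe until then
    private final Map<String, Instant> negativeCache = new ConcurrentHashMap<>();

    public void recordResponse(String prefix, Map<String, Instant> unsafeFullHashes,
                               Instant negativeExpireTime) {
        positiveCache.putAll(unsafeFullHashes);          // per-hash expireTime
        negativeCache.put(prefix, negativeExpireTime);   // per-prefix negativeExpireTime
    }

    /** TRUE = unsafe, FALSE = safe, null = cache expired: issue a new hashes.search. */
    public Boolean lookup(String prefix, String fullHash) {
        Instant now = Instant.now();
        Instant unsafeUntil = positiveCache.get(fullHash);
        if (unsafeUntil != null && now.isBefore(unsafeUntil)) {
            return Boolean.TRUE;    // positive cache hit: still unsafe
        }
        Instant safeUntil = negativeCache.get(prefix);
        if (safeUntil != null && now.isBefore(safeUntil)) {
            return Boolean.FALSE;   // negative cache hit: safe for this prefix
        }
        return null;                // expired or unknown: re-query the API
    }
}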
Refer to the Update API documentation for more information.

Related

Public subscriber list pages not matching batch size and total counts

I am trying to get a list of my public subscribers. When I execute the request below I get weird and inconsistent results. When I ask for a max result size of 50 I get two pages back, one with 27 and another with 9. Also when I look at my web page subscribers it says I only have 24 public subscribers. I have 95 total subscribers.
Why is it paging in buckets less than my max page size?
Why are the reported numbers so far off?
https://www.googleapis.com/youtube/v3/subscriptions?part=subscriberSnippet&mySubscribers=true&key=[MY_API_KEY]
According to the official docs, to call the Subscriptions.list API endpoint with the parameter mySubscribers=true, you have to pass proper authorization credentials to the endpoint:
mySubscribers (boolean)
This parameter can only be used in a properly authorized request. Set this parameter's value to true to retrieve a feed of the subscribers of the authenticated user in no particular order.
Therefore, passing on only an application key via the key parameter is not sufficient. If you pass proper authorization credentials (i.e. a valid access token), then the key parameter is superfluous.
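As a sketch, an authorized version of the request could look like the following in Java, using the JDK's built-in HttpClient. The YT_ACCESS_TOKEN environment variable is a placeholder for a valid OAuth 2.0 access token obtained through the usual OAuth flow:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SubscribersRequest {
    public static void main(String[] args) throws Exception {
        // Placeholder: a real OAuth 2.0 access token must be obtained first.
        String accessToken = System.getenv("YT_ACCESS_TOKEN");

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://www.googleapis.com/youtube/v3/subscriptions"
                        + "?part=subscriberSnippet&mySubscribers=true&maxResults=50"))
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}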

Does EX second impact performance in Redis?

I tried googling something similar, but wasn't able to find anything on the topic.
I'm just curious: does the magnitude of the number of seconds set on a key impact performance in Redis?
For example:
set mykey "foobarValue" EX 100
versus
set mykey "foobarValue" EX 2592000
To answer this question, we need to look at how Redis works.
Redis maintains a table of key-value pairs, each with an expiry time, so every entry can be translated to
<Key: <Value, Expiry>>
There can be other metadata associated with an entry as well. During GET, SET, DEL, EXPIRE, and similar operations, Redis computes the hash of the given key(s) and tries to perform the operation. Since it's a hash table, it needs to probe during any operation, and while probing it may encounter expired keys. If you have subscribed to keyspace notifications, a notification is sent and the entry is removed or updated based on the operation being performed. Redis also performs rehashing, and during rehashing it may find expired keys as well. In addition, Redis runs background tasks to clean up expired keys; if the TTL is too small, more keys expire in a given period, and because this sampling process is random, more expiry events are generated.
https://github.com/antirez/redis/blob/a92921da135e38eedd89138e15fe9fd1ffdd9b48/src/expire.c#L98
There is a small performance cost when the TTL is small, since Redis needs to free the memory and fix up pointers more often. It can also happen that you run short of memory because expired keys are still present in the database until they are reclaimed. Similarly, if you use a longer expiry time, the key stays in the system longer, which can create memory pressure as well.
A smaller TTL also means more cache misses for the client application, so the client will see performance issues too.
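To make the two cases concrete, here is a small sketch using the Jedis client (the client choice is an assumption; the question doesn't name one). The cost of the SET itself is the same in both cases; only the stored expiry timestamp differs:

import redis.clients.jedis.Jedis;

public class TtlExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Same SET cost regardless of the EX value; only the stored
            // expiry timestamp differs.
            jedis.setex("shortLived", 100, "foobarValue");      // 100 seconds
            jedis.setex("longLived", 2592000, "foobarValue");   // 30 days

            System.out.println(jedis.ttl("shortLived"));  // ~100
            System.out.println(jedis.ttl("longLived"));   // ~2592000
        }
    }
}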

Java Redis Rate limiting

I just want to do rate limiting on a REST API using Redis. Could you please suggest which data structure in Redis would be appropriate? I used RedisTemplate, but with it it is not feasible to expire an element once a key and value have been updated.
There are multiple approaches, depending on what exactly you are trying to achieve: from general "ops per second" limiting to fine-grained limits at a lower resolution, such as how many posts a specific user can make per day.
One very simple and elegant approach I like is an expiring counter. The technique takes advantage of the fact that INCR does not change a key's expiration time in Redis. So if you want, say, 1000 requests per second on a resource, just create a key holding the number 1 (by running INCR) and set it to expire in a second. Then for each request, check whether the counter has reached 1000; if not, increment it, and if it has, block the request. When the time window has passed, the key expires automatically and is recreated on the next request.
In terms of pseudocode, the algorithm is:

def limit(resource_key):
    current = GET(resource_key)
    if current != NULL and current >= 1000:
        return ERROR
    else:
        value = INCR(resource_key)
        if value == 1:
            # first request of the window: start the one-second expiry clock
            EXPIRE(resource_key, 1)
        return OK
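A minimal Java version of the same idea, here using the Jedis client (an assumption; any Redis client exposing INCR and EXPIRE works):

import redis.clients.jedis.Jedis;

// Sketch of a fixed-window rate limiter backed by a Redis counter.
public class RedisRateLimiter {
    private static final int LIMIT = 1000;        // max requests per window
    private static final int WINDOW_SECONDS = 1;  // window length

    private final Jedis jedis;

    public RedisRateLimiter(Jedis jedis) {
        this.jedis = jedis;
    }

    /** Returns true if the request is allowed, false if the limit is hit. */
    public boolean allow(String resourceKey) {
        long count = jedis.incr(resourceKey);
        if (count == 1) {
            // First request of this window: start the expiry clock.
            // INCR on later requests does not touch the TTL.
            jedis.expire(resourceKey, WINDOW_SECONDS);
        }
        return count <= LIMIT;
    }
}

One caveat: INCR followed by a conditional EXPIRE is not atomic, so if the process dies between the two calls the key never expires. Wrapping both commands in a Lua script, or creating the window with SET ... NX EX, closes that gap.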

What is it called when two requests are being served from the same cache?

I'm trying to find the technical term for the following (and potential solutions), in a distributed system with a shared cache:
- request A comes in, cache miss, so we begin to generate the response for A
- request B comes in with the same cache key; since A is not completed yet and hasn't written the result to cache, B is also a cache miss and begins to generate a response as well
- request A completes and stores its value in the cache
- request B completes and stores its value in the cache (overwriting request A's cache value)
You can see how this can be a problem at scale, if instead of two requests, you have many that all get a cache miss and attempt to generate a cache value as soon as the cache entry expires. Ideally, there would be a way for request B to know that request A is generating a value for the cache, and wait until that is complete and use that value.
I'd like to know the technical term for this phenomenon; it's a cache race of sorts.
It's a kind of thundering herd (the cache-specific variant is often called a cache stampede).
Solution: when the first request A comes in, it sets a flag; if request B comes in and finds the flag, it waits. After A has loaded the data into the cache, it removes the flag.
Be aware that if all the waiting requests are woken up at once by the cache-loaded event, the wake-up itself triggers a thundering herd of threads, so the solution needs to take care of that as well.
In the Linux kernel, for example, only one process is woken up even when several processes depend on the same event.
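A minimal single-process sketch of the flag idea in Java, using one in-flight future per key so that only the first request runs the expensive generation and later requests wait on its result (class and method names are hypothetical; a distributed cache would need a shared lock or flag instead):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Coalesces concurrent cache misses: only the first caller runs the loader,
// everyone else waits on the same in-flight future.
public class CoalescingCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    public V get(K key, Supplier<V> loader) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> theirs = inFlight.putIfAbsent(key, mine);
        if (theirs != null) {
            // Another request already holds the "flag": wait for its result.
            return theirs.join();
        }
        try {
            V value = loader.get();     // expensive generation, done exactly once
            cache.put(key, value);
            mine.complete(value);       // wake up the waiters
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);
            throw e;
        } finally {
            inFlight.remove(key);       // remove the flag once loading is over
        }
    }
}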

Guava Cache: How to access without it counting for the eviction policy?

I have a Guava cache which I would like to expire after X minutes have passed since the last access of a key. However, I also periodically perform an action on all the current key-value pairs (much more frequently than every X minutes), and I wouldn't like this to count as an access to the key-value pair, because then the keys would never expire.
Is there some way to read the value of a key without influencing the internal state of the cache? I.e. cache._secretvalues.get(key), where I could conceivably subclass Cache to StealthCache and do getStealth(key)? I know relying on internal stuff is non-ideal; I'm just wondering if it's possible at all. I think when I do cache.asMap().get() it still counts as an access internally.
From the official Guava documentation:
Access time is reset by all cache read and write operations (including Cache.asMap().get(Object) and Cache.asMap().put(K, V)), but not by containsKey(Object), nor by operations on the collection-views of Cache.asMap(). So, for example, iterating through cache.entrySet() does not reset access time for the entries you retrieve.
So, what I would have to do is iterate through the entrySet instead to do my stealth operations.
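For example, a stealth pass over the entries could look like this (a sketch; the setup values and the process method are illustrative):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class StealthReadExample {
    public static void main(String[] args) {
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .expireAfterAccess(10, TimeUnit.MINUTES)
                .build();
        cache.put("a", "1");

        // Periodic pass over all entries: iterating the asMap() view
        // does not reset the per-entry access time.
        for (Map.Entry<String, String> e : cache.asMap().entrySet()) {
            process(e.getKey(), e.getValue());
        }
    }

    private static void process(String key, String value) {
        System.out.println(key + " -> " + value);
    }
}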

Resources