Does Google Guava Cache do deduplication when refreshing the value of the same key?

I implemented a non-blocking cache using Google Guava. There is only one key in the cache, and the value for that key is only refreshed asynchronously (by overriding reload()).
My question: does Guava Cache handle de-duplication if the first reload() task hasn't finished and a new get() request comes in?
// The cache is defined as below
this.cache = CacheBuilder
.newBuilder()
.maximumSize(1)
.refreshAfterWrite(10, TimeUnit.MINUTES)
.recordStats()
.build(loader);
// reload is overridden to run asynchronously
@Override
public ListenableFuture<Map<String, CertificateInfo>> reload(final String key, Map<String, CertificateInfo> prevMap) throws IOException {
LOGGER.info("Refreshing certificate cache.");
ListenableFutureTask<Map<String, CertificateInfo>> task = ListenableFutureTask.create(new Callable<Map<String, CertificateInfo>>() {
@Override
public Map<String, CertificateInfo> call() throws Exception {
return actuallyLoad();
}
});
executor.execute(task);
return task;
}

Yes, see the documentation for LoadingCache.get(K) (and its sibling, Cache.get(K, Callable)):
If another call to get(K) or getUnchecked(K) is currently loading the value for key, simply waits for that thread to finish and returns its loaded value.
So if a cache entry is currently being computed, other threads that try to retrieve that entry simply wait for the computation to finish; they do not kick off their own redundant load. The same holds for a refresh: only one reload() runs per key at a time, and per the CacheBuilder.refreshAfterWrite documentation, get() keeps returning the old value while an asynchronous reload is in progress.
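To make the idea of de-duplication concrete, here is a minimal JDK-only sketch. It is not Guava's implementation; the class name InFlightLoads and its load method are hypothetical, and the sketch only shows how concurrent requests for one key can share a single in-flight future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical helper: coalesces concurrent loads of one key into a single in-flight future. */
final class InFlightLoads<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, CompletableFuture<V>> loader;

    InFlightLoads(Function<K, CompletableFuture<V>> loader) {
        this.loader = loader;
    }

    CompletableFuture<V> load(K key) {
        // computeIfAbsent invokes the loader at most once per absent key,
        // so callers arriving during a load share the same future.
        CompletableFuture<V> future = inFlight.computeIfAbsent(key, loader);
        // Forget the future once it completes so that a later call starts a fresh load.
        future.whenComplete((value, error) -> inFlight.remove(key, future));
        return future;
    }
}
```

Two calls to load("certs") made before the first future completes return the same future and trigger the loader once; a call made after completion starts a new load.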

Project Reactor/Webflux: limit subscription time and pass another object downstream

I have a method that accepts "infinite" subscriptions:
@GetMapping("/sse")
public Flux<ServerSentEvent<UserUpdateResponse>> handleSse(String id) {
return usersSink.asFlux()
.filter(update -> id.equals(update.getId()))
.map(this::wrapIntoSse);
}
I want to limit the duration of the subscription and, when the timer expires, produce an object that is passed downstream.
Basically, I want takeUntilOther() with a way to change the object: instead of waiting until the filter matches, I want to create an object myself and pass it to the consumers of the above Flux.
Essentially you need to cancel the subscription, but I don't think such an operator exists. Also, as far as I know, WebFlux doesn't provide any mechanism to access active subscriptions. For example, with Netty the subscription happens in HttpServer.
I'm not sure about side effects, but you could get access to the subscription using doOnSubscribe and keep it in a cache that allows setting a TTL for entries. The cache's removal listener can then cancel the subscription.
Here is an example with a Caffeine cache, but you could use a custom implementation with a background thread that monitors entries and evicts expired values.
@Slf4j
@RestController
public class StreamingController {
private final Cache<String, Subscription> cache = Caffeine.newBuilder()
.expireAfterWrite(3, TimeUnit.SECONDS)
.removalListener((String key, Subscription subscription, RemovalCause cause) -> {
log.info("Canceling subscription: {}", key);
subscription.cancel();
})
.build();
@GetMapping("/sse")
public Flux<ServerSentEvent<UserUpdateResponse>> handleSse(String id) {
return usersSink.asFlux()
.filter(update -> id.equals(update.getId()))
.map(this::wrapIntoSse)
.doOnSubscribe(s -> {
this.cache.put(UUID.randomUUID().toString(), s);
});
}
}

Updating global store from data within transform

I currently have a simple topology:
KStream<String, Event> eventsStream = builder.stream(sourceTopic);
eventsStream.transformValues(processorSupplier, "nameCache")
.to(destinationTopic);
My events sometimes have a key/value pair and other times have just the key. I want to be able to add the value to those events that are missing the value. I have this working fine with a local state store but when I add more tasks, sometimes the key/value events and the value events are in different threads and so they aren't updated correctly.
I'd like to use a global state store for this but I'm having difficulty figuring out how to update the global store when new key/value pairs come in. I've created a global state store with the following code:
builder.addGlobalStore(stateStore, "global_store", Consumed.with(Serdes.String(), Serdes.String()), new ProcessorSupplier<String, String>() {
@Override
public Processor<String, String> get() {
return new Processor<String, String>() {
private ProcessorContext context;
@Override
public void init(final ProcessorContext processorContext) {
this.context = processorContext;
}
@Override
public void process(final String key, final String value) {
context.forward(key, value);
}
@Override
public void close() {
}
};
}
});
As far as I can tell, it is working but since there is no data in the topic, I'm not sure.
So my question is how do I update the global store from inside of the transformValues? store.put() fails with an error that global store is read only.
I found Write to GlobalStateStore on Kafka Streams but the accepted answer just says to update the underlying topic but I don't see how I can do that since the topic isn't in my stream.
---Edited---
I updated the code per #1 in the accepted answer. I see the new key/value pairs show up in global_store. But the globalStore doesn't seem to see the new keys. If I restart the application, it fills the cache with the data in the topic but new keys aren't visible until after I stop/start the application.
I added logging to the process(String, String) in the global store processor and it shows new keys being processed. Any ideas?
You can only get read-only access to a global state store inside transformValues. If you want to update a global state store, then yes, you have to send the update to the global state store's underlying input topic, and the store picks up the new value when that update message is consumed. The reason is that global state stores are populated on all application instances and use this input topic for fault tolerance. You can do this by branching your topology:
KStream<String, Event> eventsStream = builder.stream(sourceTopic);
// process messages as normal
eventsStream.transformValues(processorSupplier, "nameCache")
.to(destinationTopic);
// transform the message into an update for the global state store
eventsStream.transform(updateGlobalStateProcessorSupplier, "nameCache")
.to("global_store");
Alternatively, use the low-level API to construct your topology manually, so that you can forward to both your destinationTopic topic and the global_store topic, using ProcessorContext.forward with the name of the target sink processor node.

How to refresh the key and value in cache after they are expired in Guava (Spring)

So, I was looking at caching methods in Java (Spring). And Guava looked like it would solve the purpose.
This is the usecase -
I query for some data from a remote service. Kind of configuration field for my application. This field will be used by every inbound request to my application. And it would be expensive to call the remote service everytime as it's kind of constant which changes periodically.
So, on the first request inbound to my application, when I call remote service, I would cache the value. I set an expiry time of this cache as 30 mins. After 30 mins when the cache is expired and there is a request to retrieve the key, I would like a callback or something to do the operation of calling the remote service and setting the cache and return the value for that key.
How can I do it in Guava cache?
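Here is an example of how to use a Guava cache. With expireAfterWrite, get() calls CacheLoader.load again once the entry has expired, so the loader is the place to call the remote service. Guava only runs removal listeners during cache maintenance, so call cleanUp() if you need them to fire promptly; here a background thread calls cleanUp() every 30 minutes.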
import com.google.common.cache.*;
import org.springframework.stereotype.Component;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
@Component
public class Cache {
public static LoadingCache<String, String> REQUIRED_CACHE;
public Cache(){
RemovalListener<String,String> REMOVAL_LISTENER = new RemovalListener<String, String>() {
@Override
public void onRemoval(RemovalNotification<String, String> notification) {
if(notification.getCause() == RemovalCause.EXPIRED){
//do as per your requirement
}
}
};
CacheLoader<String,String> LOADER = new CacheLoader<String, String>() {
@Override
public String load(String key) throws Exception {
return null; // placeholder: return the value fetched from the remote service here (Guava rejects a null result with InvalidCacheLoadException)
}
};
REQUIRED_CACHE = CacheBuilder.newBuilder().maximumSize(100000000)
.expireAfterWrite(30, TimeUnit.MINUTES)
.removalListener(REMOVAL_LISTENER)
.build(LOADER);
Executors.newSingleThreadExecutor().submit(()->{
while (true) {
REQUIRED_CACHE.cleanUp(); // need to call clean up for removal listener
TimeUnit.MINUTES.sleep(30L);
}
});
}
}
put & get data:
Cache.REQUIRED_CACHE.get("key");
Cache.REQUIRED_CACHE.put("key","value");
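The behaviour the question asks for (reload on the first access after expiry) can be illustrated without any library. The following is a minimal JDK-only sketch, not Guava code; ExpiringValue, its constructor parameters, and the injectable clock are all hypothetical names chosen for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical single-value cache that reloads on the first access after the TTL has elapsed. */
final class ExpiringValue<V> {
    private final Supplier<V> loader;   // e.g. the call to the remote service
    private final long ttl;             // time to live, in the clock's unit
    private final LongSupplier clock;   // injectable so the expiry logic can be tested
    private V value;
    private long loadedAt;
    private boolean loaded;

    ExpiringValue(Supplier<V> loader, long ttl, LongSupplier clock) {
        this.loader = loader;
        this.ttl = ttl;
        this.clock = clock;
    }

    synchronized V get() {
        long now = clock.getAsLong();
        // Load on the first access, or when the cached value is older than the TTL.
        if (!loaded || now - loadedAt >= ttl) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

In production code the clock could be System::nanoTime with the TTL given in nanoseconds; callers simply call get() and never need to know whether the value was cached or freshly loaded.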

Difference Between cacheNames and Key in @Cacheable

I am new to caching and Spring, and I can't work out the difference between cacheNames and key in the example below, taken from the Spring docs:
@Cacheable(cacheNames="books", key="#isbn")
public Book findBook(ISBN isbn, boolean checkWarehouse, boolean includeUsed)
As I understand cache is simply a key-value pair stored in memory. So in the above example on first invocation the returned Book value will be stored in cache using the value of isbn parameter as key. On subsequent invocations where isbn value is the same as it was first requested the Book stored in cache will be returned. This Book in cache will be found using the Key. So what is cacheNames?
Am I correct in saying cache is stored as key values like this:
isbn111111 ---> Book,
isbn122222 ---> Book2,
isbn123333 ---> Book3
Thanks in advance.
A cacheName is more like a group of cache keys. If you open the class
org.springframework.cache.interceptor.AbstractCacheResolver
you will find this method, which looks up caches by cacheName:
@Override
public Collection<? extends Cache> resolveCaches(CacheOperationInvocationContext<?> context) {
Collection<String> cacheNames = getCacheNames(context);
if (cacheNames == null) {
return Collections.emptyList();
}
Collection<Cache> result = new ArrayList<>(cacheNames.size());
for (String cacheName : cacheNames) {
Cache cache = getCacheManager().getCache(cacheName);
if (cache == null) {
throw new IllegalArgumentException("Cannot find cache named '" +
cacheName + "' for " + context.getOperation());
}
result.add(cache);
}
return result;
}
So later, in org.springframework.cache.interceptor.CacheAspectSupport, Spring gets the value by cache key from that cache object:
private Object execute(final CacheOperationInvoker invoker, Method method, CacheOperationContexts contexts) {
// Special handling of synchronized invocation
if (contexts.isSynchronized()) {
CacheOperationContext context = contexts.get(CacheableOperation.class).iterator().next();
if (isConditionPassing(context, CacheOperationExpressionEvaluator.NO_RESULT)) {
Object key = generateKey(context, CacheOperationExpressionEvaluator.NO_RESULT);
Cache cache = context.getCaches().iterator().next();
try {
return wrapCacheValue(method, cache.get(key, () -> unwrapReturnValue(invokeOperation(invoker))));
}
catch (Cache.ValueRetrievalException ex) {
// The invoker wraps any Throwable in a ThrowableWrapper instance so we
// can just make sure that one bubbles up the stack.
throw (CacheOperationInvoker.ThrowableWrapper) ex.getCause();
}
}
//...other logic
The cacheNames are the names of the caches themselves, where the data is stored. You can have multiple caches, e.g. different caches for different entity types, or depending on replication needs etc.
One significance of cacheNames would be helping with default key generation for @Cacheable, used when explicit keys aren't passed to the method. It's very unclear from the Spring documentation what would go seriously wrong or be inaccurate if cacheNames is not supplied at class level or method level when using Spring Cache.
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/cache/annotation/CacheConfig.html#cacheNames--
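The relationship described in these answers (the cache name selects a cache, the key selects an entry inside it) can be sketched with plain maps. This is only a conceptual illustration, not how Spring's CacheManager is implemented; NamedCaches and its methods are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical two-level lookup: cache name -> cache, then key -> value. */
final class NamedCaches {
    private final Map<String, Map<Object, Object>> caches = new ConcurrentHashMap<>();

    /** Returns the cache registered under the given name, creating it on first use. */
    Map<Object, Object> getCache(String cacheName) {
        return caches.computeIfAbsent(cacheName, name -> new ConcurrentHashMap<>());
    }

    void put(String cacheName, Object key, Object value) {
        getCache(cacheName).put(key, value);
    }

    Object get(String cacheName, Object key) {
        return getCache(cacheName).get(key);
    }
}
```

With this picture, put("books", "isbn111111", book) and put("reviews", "isbn111111", review) do not collide: the same key lives in two separate caches.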

Does CompletableFuture have a corresponding Local context?

In the olden days, we had ThreadLocal for programs to carry data along the request path, since all request processing was done on one thread, and libraries like Logback used this via MDC.put("requestId", getNewRequestId());
Then Scala and functional programming came along, and with them Futures and Local.scala (at least, I know Twitter Futures have this class). Future.scala knows about Local.scala and transfers the context through map, flatMap, and so on, so that I can still call Local.set("requestId", getNewRequestId()); and then, downstream, after the work has travelled over many threads, still access it with Local.get(...)
So my question is: in Java, can I do the same thing with the new CompletableFuture, using a LocalContext or some similar object (I'm not sure of the name)? That way I could modify the Logback MDC to store its state in that context instead of a ThreadLocal, so that I don't lose the request id and all my logs across thenApply, thenAccept, etc. still work with the %X{requestId} conversion word in the Logback configuration.
EDIT:
As an example: if a request comes in and you are using Log4j or Logback, you set MDC.put("requestId", requestId) in a filter, and then in your app you log many statements like this:
log.info("request came in for url="+url);
log.info("request is complete");
Now, in the log output it will show this:
INFO {time}: requestId425 request came in for url=/mypath
INFO {time}: requestId425 request is complete
This relies on a ThreadLocal trick. At Twitter, we use Scala and Twitter Futures along with the Local.scala class. Local.scala and Future.scala are tied together so that the above scenario still works, which is very nice: all our log statements can log the request id, the developer never has to remember to log it, and you can trace a single customer's request/response cycle with that id.
I don't see this in Java :( which is very unfortunate as there are many use cases for that. Perhaps there is something I am not seeing though?
If you come across this, just poke the thread here
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-May/047867.html
to implement something like Twitter Futures, which transfer Locals (much like ThreadLocal, but the state is transferred across threads).
See the def respond() method in here and how it calls Local.save() and Local.restore()
https://github.com/simonratner/twitter-util/blob/master/util-core/src/main/scala/com/twitter/util/Future.scala
If the Java authors would fix this, then the MDC in Logback would work across all third-party libraries. Until then, it will not work unless you can change the third-party library (doubtful you can do that).
My approach (it works with JDK 9+, since a couple of overridable methods were exposed in that version) is to make the complete ecosystem aware of MDC.
For that, we need to address the following scenarios:
Where do we get new instances of CompletableFuture from within this class? → We need to return an MDC-aware version instead.
Where do we get new instances of CompletableFuture from outside this class? → We need to return an MDC-aware version instead.
Which executor is used, and when, in the CompletableFuture class? → In all circumstances, we need to make sure that all executors are MDC-aware.
To do so, let's create an MDC-aware version of CompletableFuture by extending it. My version looks like this:
import org.slf4j.MDC;
import java.util.Map;
import java.util.concurrent.*;
import java.util.function.Function;
import java.util.function.Supplier;
public class MDCAwareCompletableFuture<T> extends CompletableFuture<T> {
public static final ExecutorService MDC_AWARE_ASYNC_POOL = new MDCAwareForkJoinPool();
@Override
public <U> CompletableFuture<U> newIncompleteFuture() {
return new MDCAwareCompletableFuture<>();
}
@Override
public Executor defaultExecutor() {
return MDC_AWARE_ASYNC_POOL;
}
public static <T> CompletionStage<T> getMDCAwareCompletionStage(CompletableFuture<T> future) {
return new MDCAwareCompletableFuture<>()
.completeAsync(() -> null)
.thenCombineAsync(future, (aVoid, value) -> value);
}
public static <T> CompletionStage<T> getMDCHandledCompletionStage(CompletableFuture<T> future,
Function<Throwable, T> throwableFunction) {
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return getMDCAwareCompletionStage(future)
.handle((value, throwable) -> {
MDCUtility.setMDCContext(contextMap);
if (throwable != null) {
return throwableFunction.apply(throwable);
}
return value;
});
}
}
The MDCAwareForkJoinPool class would look like this (the methods with ForkJoinTask parameters are skipped for simplicity):
public class MDCAwareForkJoinPool extends ForkJoinPool {
//Override constructors which you need
@Override
public <T> ForkJoinTask<T> submit(Callable<T> task) {
return super.submit(MDCUtility.wrapWithMdcContext(task));
}
@Override
public <T> ForkJoinTask<T> submit(Runnable task, T result) {
return super.submit(MDCUtility.wrapWithMdcContext(task), result);
}
@Override
public ForkJoinTask<?> submit(Runnable task) {
return super.submit(MDCUtility.wrapWithMdcContext(task));
}
@Override
public void execute(Runnable task) {
super.execute(MDCUtility.wrapWithMdcContext(task));
}
}
The wrapping utility methods (in the MDCUtility class) would look like this:
public static <T> Callable<T> wrapWithMdcContext(Callable<T> task) {
//save the current MDC context
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return () -> {
setMDCContext(contextMap);
try {
return task.call();
} finally {
// once the task is complete, clear MDC
MDC.clear();
}
};
}
public static Runnable wrapWithMdcContext(Runnable task) {
//save the current MDC context
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return () -> {
setMDCContext(contextMap);
try {
task.run();
} finally {
// once the task is complete, clear MDC
MDC.clear();
}
};
}
public static void setMDCContext(Map<String, String> contextMap) {
MDC.clear();
if (contextMap != null) {
MDC.setContextMap(contextMap);
}
}
Below are some guidelines for usage:
Use the class MDCAwareCompletableFuture rather than the class CompletableFuture.
Several methods in CompletableFuture instantiate a plain CompletableFuture themselves (new CompletableFuture...). For such methods (most of the public static methods), use an alternative that yields an instance of MDCAwareCompletableFuture. For example, rather than CompletableFuture.supplyAsync(...), you can use new MDCAwareCompletableFuture<>().completeAsync(...)
When you are stuck with a plain CompletableFuture, say because an external library returns one, convert it to an MDCAwareCompletableFuture with getMDCAwareCompletionStage. Obviously you can't retain the context inside that library, but this method still retains the context once execution returns to your application code.
When supplying an executor as a parameter, make sure that it is MDC-aware, such as MDCAwareForkJoinPool. You could also create an MDCAwareThreadPoolExecutor by overriding the execute method to serve your use case. You get the idea!
You can find a detailed explanation of all of the above here in a post about the same.
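The capture-and-restore technique behind wrapWithMdcContext can be tried without SLF4J on the classpath. The sketch below is a JDK-only illustration in which a plain ThreadLocal stands in for MDC; ContextPropagation, REQUEST_ID, contextAware, and readOnWorker are hypothetical names, not part of the answer's code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ContextPropagation {
    // Stand-in for MDC: one value bound to the current thread.
    static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    /** Captures the submitting thread's value and installs it while the task runs. */
    static Runnable wrapWithContext(Runnable task) {
        String captured = REQUEST_ID.get(); // read on the submitting thread
        return () -> {
            String previous = REQUEST_ID.get(); // whatever the worker thread had before
            REQUEST_ID.set(captured);
            try {
                task.run();
            } finally {
                // Restore the worker's earlier state so nothing leaks into the next task.
                if (previous == null) {
                    REQUEST_ID.remove();
                } else {
                    REQUEST_ID.set(previous);
                }
            }
        };
    }

    /** Decorates an executor so that every task it runs sees the submitter's context. */
    static Executor contextAware(Executor delegate) {
        return task -> delegate.execute(wrapWithContext(task));
    }

    /** Sets a request id on the caller thread and reads it back on a worker thread. */
    static String readOnWorker(String requestId, boolean propagate) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Executor executor = propagate ? contextAware(pool) : pool;
        REQUEST_ID.set(requestId);
        try {
            return CompletableFuture.supplyAsync(REQUEST_ID::get, executor).join();
        } finally {
            REQUEST_ID.remove();
            pool.shutdown();
        }
    }
}
```

With propagate set to true the worker thread sees the caller's request id; with false it sees null, which is the situation the question describes for plain CompletableFuture usage.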
