Kotlin + Spring Boot - Add TTL to your map

So I have an empty map referenced like:
private var labelsForGroupId: Map<GroupId, Label> = emptyMap()
to reduce the number of calls through a network API. After the first call I cache the response in the map.
However, I would love to add a TTL to that map (for example, every hour it should be emptied again). I am quite new to Kotlin, so I am wondering what the best approach here would be, with some examples?

Instead of using a Map, you could use a Guava Cache. It works like a Map (key-value) and has expiration policies.
Expiration by time example:
CacheBuilder.newBuilder()
    .expireAfterAccess(200, TimeUnit.MILLISECONDS)
    .build(loader);
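For the hourly TTL asked about here, a minimal Java sketch, assuming the asker's GroupId and Label types and a hypothetical labelService that performs the network call:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

LoadingCache<GroupId, Label> labelsForGroupId = CacheBuilder.newBuilder()
        .expireAfterWrite(1, TimeUnit.HOURS) // each entry expires one hour after being cached
        .build(new CacheLoader<GroupId, Label>() {
            @Override
            public Label load(GroupId groupId) {
                // invoked only on a cache miss
                return labelService.fetchLabel(groupId); // hypothetical network call
            }
        });

// First access goes over the network; later accesses within the hour are served from memory.
Label label = labelsForGroupId.getUnchecked(someGroupId);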
If you are not interested in caches at all, then you could try to set up a coroutine with a ScheduledExecutorService as dispatcher. I have never done this before, but it is a way out. Take a look at the Executors documentation - Coroutine context and dispatchers:
If the given [ExecutorService] is an instance of [ScheduledExecutorService], then all time-related coroutine operations such as [delay], [withTimeout] and time-based [Flow] operators will be scheduled on this executor using the [schedule][ScheduledExecutorService.schedule] method. If the corresponding coroutine is cancelled, [ScheduledFuture.cancel] will be invoked on the corresponding future.
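Without coroutines at all: if the whole map should simply be emptied every hour (no per-entry expiry needed), a plain ScheduledExecutorService is enough. A minimal Java sketch, reusing the asker's GroupId and Label types:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

Map<GroupId, Label> labelsForGroupId = new ConcurrentHashMap<>();

// Clear the whole map once per hour; ConcurrentHashMap makes the clear safe
// while other threads are still reading or writing.
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(labelsForGroupId::clear, 1, 1, TimeUnit.HOURS);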

Related

Spring Boot Caching auto refresh using @PostConstruct

I currently have a Spring Boot based application with no active cache. Our application is heavily dependent on key-value configurations which we maintain in an Oracle DB. Currently, without a cache, every time I want to get any value from that table it is a database call. This is, expectedly, causing a lot of overhead due to the high number of transactions to the DB. Hence the need for a cache arrived.
On searching for caching solutions for Spring Boot, I mostly found links where objects are cached when a CRUD operation is performed via the application code itself, using annotations like @Cacheable, @CachePut, @CacheEvict, etc., but this is not applicable for me. I have master data of key-value pairs in the DB; any change needs approvals, and hence the access is not directly provided to the user - once approved, the change is made directly in the DB.
I want these key-values to be loaded at startup time and kept in memory, so I tried to implement that using @PostConstruct and the ConcurrentHashMap class, something like this:
public ConcurrentHashMap<String, String> cacheMap = new ConcurrentHashMap<>();

@PostConstruct
public void initialiseCacheMap() {
    List<MyEntity> list = myRepository.findAll();
    for (int i = 0; i < list.size(); i++) {
        cacheMap.put(list.get(i).getKey(), list.get(i).getValue());
    }
}
In my service class, whenever I want to get something, I first check if the data is available in the map; if not, I check the DB, as sketched below.
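A hedged sketch of that lookup, using computeIfAbsent to cover both branches (findByKey is a hypothetical repository method returning Optional<MyEntity>):

public String getConfigValue(String key) {
    // serve from the map when present; otherwise load from the DB and cache the result
    // (a missing row yields null, which computeIfAbsent deliberately does not store)
    return cacheMap.computeIfAbsent(key, k ->
            myRepository.findByKey(k)
                    .map(MyEntity::getValue)
                    .orElse(null));
}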
My purpose is getting fulfilled and I am able to drastically improve the performance of the application. A certain set of transactions that earlier took 6.28 seconds to complete now completes in a mere 562 milliseconds! However, there is just one problem which I am not able to figure out:
@PostConstruct is called once by Spring, on startup, after dependency injection. This means I have no way to re-trigger the cache build without a restart or application downtime, which is unfortunately not acceptable. Further, as of now, I do not have the liberty to use any existing caching frameworks or libraries like Ehcache or Redis.
How can I achieve periodic refreshing of this cache (let's say every 30 minutes) with only plain old Java/Spring classes/libraries?
Thanks in advance for any ideas!
You can do this in several ways; one way to achieve it is by doing something in this direction:
private const val everyThirtyMinutes = "0 0/30 * * * ?"

@Component
class TheAmazingPreloader {

    @Scheduled(cron = everyThirtyMinutes)
    @EventListener(ApplicationReadyEvent::class)
    fun refreshCachedEntries() {
        // the preloading happens here
    }
}
Then you have the preloading bits when the application has started, and also the refreshing mechanism in place that triggers, say, every 30 minutes.
You will need to add the @EnableScheduling annotation on some @Configuration class or the @SpringBootApplication class:
@EnableScheduling
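Applied to the question's Java code, a hedged sketch of what the refresh could look like, rebuilding the map and swapping it in with a single volatile write so readers never observe a half-built cache (MyRepository and MyEntity are the asker's names):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ConfigCache {

    private final MyRepository myRepository;
    // volatile so the freshly built map is immediately visible to reader threads
    private volatile Map<String, String> cacheMap = Collections.emptyMap();

    public ConfigCache(MyRepository myRepository) {
        this.myRepository = myRepository;
    }

    @EventListener(ApplicationReadyEvent.class) // initial load once the app is up
    @Scheduled(cron = "0 0/30 * * * ?")         // refresh every 30 minutes
    public void refreshCachedEntries() {
        Map<String, String> fresh = new HashMap<>();
        for (MyEntity entity : myRepository.findAll()) {
            fresh.put(entity.getKey(), entity.getValue());
        }
        cacheMap = fresh; // atomic reference swap; in-flight readers keep the old map
    }

    public String get(String key) {
        return cacheMap.get(key);
    }
}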

Spring WebFlux: Refactoring blocking API with Reactive API, or should I?

I have a legacy Spring Boot REST app that interacts with downstream services that block. I'm new to reactive programming, and am unsure how to handle these blocking requests. Most Webflux examples I've seen are pretty trivial. Here's the flow-of-control of my app:
User queries MyApp at http://myapp.com
MyApp then queries partner REST API, which is BLOCKING.
Depending on account type, data from the blocking app needs to be queried to make another call to another blocking REST application.
All data is enriched and rendered by MyApp to the browser.
Where to start? I'm using WebClient currently, so that part's done. I know I should perform the blocking steps on a different scheduler (parallel or boundedElastic?). Should I use a Flux or Mono, since the partner APIs return the data all at once?
Both apps return thousands of rows of data, and the user just waits... Steps 1-2 take about 4 secs; add in step 3, and we're looking at over 30 seconds due to the inefficiency of the API. Can Flux help my users' wait time at all?
EDIT: Below is a (long) example of what my application is doing. Notice that I block on my first call to the API to get a count of what's being returned, then I fetch the rest in batches of TASK_QUERY_LIMIT.
@Bean
public WebClient authWebClient(WebClient.Builder builder) {
    MultiValueMap<String, String> map = new LinkedMultiValueMap<>();
    map.set(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE);

    final int size = 48 * 1024 * 1024;
    final ExchangeStrategies strategies = ExchangeStrategies.builder()
            .codecs(codecs -> codecs.defaultCodecs().maxInMemorySize(size))
            .build();

    return builder.baseUrl(configProperties.getUrl())
            .exchangeStrategies(strategies)
            .defaultHeaders(httpHeaders -> httpHeaders.addAll(map))
            .filters(exchangeFilterFunctions -> {
                exchangeFilterFunctions.add(logResponseStatus());
                exchangeFilterFunctions.add(logRequest());
            })
            .build();
}
public Mono<Response<Task>> getTasksMono() {
    return getAuthWebClient() // base URL and headers come from the bean above
            .get()
            .accept(MediaType.APPLICATION_JSON)
            .retrieve()
            .onStatus(HttpStatus::isError, this::onHttpStatusError)
            .bodyToMono(new ParameterizedTypeReference<Response<Task>>() {});
}
// Service method
public List<Task> getTasks() {
    Mono<Response<Task>> monoTasks = getTasksMono();
    Response<Task> tasks = monoTasks.block();
    int taskCount = tasks.getCount();
    List<Task> returnTasks = new ArrayList<>(tasks.getData());
    List<Mono<Response<Task>>> tasksMonoList = new ArrayList<>();

    // query API-ONE for all remaining tasks
    if (taskCount > TASK_QUERY_LIMIT) {
        retrieveAdditionalTasks(key, taskCount, tasksMonoList);
    }

    // Send out all of the calls at once, and subscribe to their results.
    Flux.mergeSequential(tasksMonoList)
            .map(Response::getData)
            .doOnNext(returnTasks::addAll)
            .blockLast();

    return returnTasks.stream()
            .map(this::transform) // This method performs business logic on the data before returning to user
            .collect(Collectors.toList());
}
private void retrieveAdditionalTasks(String key, int taskCount,
                                     List<Mono<Response<Task>>> tasksMonoList) {
    int offset = TASK_QUERY_LIMIT;
    int numRequests = (taskCount - offset) / TASK_QUERY_LIMIT + 1;
    for (int i = 0; i < numRequests; i++) {
        tasksMonoList.add(getTasksMono(processDefinitionKey, encryptedIacToken,
                TASK_QUERY_LIMIT, offset));
        offset += TASK_QUERY_LIMIT;
    }
}
There are multiple questions here. I will try to highlight the main points.
1. Does it make sense to refactor to a reactive API?
At first glance, your application is IO-bound, and typically reactive applications are much more efficient because all IO operations are async and non-blocking. A reactive application will not be faster, but it will need fewer resources to handle the same workload. The only caveat is that in order to get all the benefits of the reactive API, your app should be reactive end-to-end (reactive drivers for the DB, reactive WebClient, …). All reactive logic is executed on Schedulers.parallel(), and a small number of threads (by default, the number of CPU cores) executes the non-blocking logic. It's still possible to use blocking APIs by "offloading" them to Schedulers.boundedElastic(), but that should be the exception (not the rule) to keep your app efficient. For more details, check Flight of the Flux 3 - Hopping Threads and Schedulers.
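For the "offloading" just mentioned, a minimal Java sketch, assuming a hypothetical blockingPartnerClient and PartnerData type:

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

// Wrap the blocking call so it runs on boundedElastic instead of an event-loop thread.
Mono<PartnerData> data = Mono
        .fromCallable(() -> blockingPartnerClient.fetch(accountId)) // blocking I/O happens here
        .subscribeOn(Schedulers.boundedElastic());                  // ... on an elastic worker thread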
2. Blocking vs non-blocking.
It looks like there is some misunderstanding of what a blocking API is. It's not about response time but about the underlying API. By default, Spring WebFlux uses Reactor Netty as the underlying HTTP client library, which itself is a reactive implementation of a Netty client that uses an event loop instead of the thread-per-request model. Even if a request takes 30-60 seconds to get a response, the thread will not be blocked, because all IO operations are async. For such APIs, reactive applications behave much better, because the non-reactive (thread-per-request) model would need a large number of threads and, as a result, much more memory to handle the same workload.
To quantify efficiency, we could apply Little's Law to calculate the required number of threads in a "traditional" thread-per-request model:
workers >= throughput x latency, where workers = number of threads
For example, to handle 100 QPS with 30 sec latency we would need 100 x 30 = 3000 threads. In a reactive app the same workload could be handled by only a few threads and, as a result, much less memory. For scalability this means that IO-bound reactive apps typically scale by CPU usage, and "traditional" ones most probably by memory.
Sometimes it's not obvious which code is blocking. One very useful tool while testing reactive code is BlockHound, which you can integrate into unit tests.
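A minimal sketch of wiring it into a test class, assuming the blockhound artifact is on the test classpath:

import reactor.blockhound.BlockHound;

public class BlockingDetectionTest {

    static {
        // Instruments the JVM: any blocking call made on a non-blocking thread
        // (e.g. one from Schedulers.parallel()) will fail with an error.
        BlockHound.install();
    }

    // ... reactive tests go here ...
}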
3. How to refactor?
I would migrate layer by layer, but block only once. Moving remote calls to WebClient could be a first step to refactoring the app to a reactive API. I would create all request/response logic using the reactive API and then block (if required) at the very top level (e.g. in the controller). Do's and Don'ts: Avoiding First-Time Reactive Programmer Mines is a great overview of the common pitfalls and a possible migration strategy.
4. Flux vs Mono.
Flux will not help you improve performance. It's more about the downstream logic: if you process record-by-record, use Flux<T>; if you process data in batches, use Mono<List<T>>.
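To illustrate the two shapes, a hedged sketch (Task is the asker's type, webClient a configured WebClient):

// Record-by-record: each Task is emitted and processed individually.
Flux<Task> perRecord = webClient.get().uri("/tasks")
        .retrieve()
        .bodyToFlux(Task.class);

// Batches: a single signal carrying the whole list at once.
Mono<List<Task>> asBatch = perRecord.collectList();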
Your current code is not really reactive and is very hard to understand, mixing the reactive API, the Stream API, and blocking multiple times. As a first step, try to rewrite it as a single flow using the reactive API and block only once.
I am not really sure about your internal types, but here is a skeleton that could give you an idea of the flow.
// Service method
public Flux<Task> getTasks() {
    return getTasksMono()
            .flatMapMany(response -> {
                List<Mono<Response<Task>>> taskRequests = new ArrayList<>();
                taskRequests.add(Mono.just(response));
                if (response.getCount() > TASK_QUERY_LIMIT) {
                    retrieveAdditionalTasks(key, response.getCount(), taskRequests);
                }
                return Flux.mergeSequential(taskRequests);
            })
            .flatMapIterable(Response::getData)
            .map(this::transform); // use flatMap in case transform is async
}
As I mentioned before, try to keep the internal API reactive, returning Mono or Flux, and block only once in the upper layer.

Spring Boot @Cacheable - how to find out expire datetime at runtime?

When using @Cacheable in Spring, is there any way to find out at runtime whether the next method call would be a cache hit or a cache miss? Or at what datetime the cache expires?
The background is that it would be nice to have a scheduled job which refreshes caches just before they expire. Also, this could allow us to find out if caches are expired and show the user a message that the system is now refreshing the caches - "please hold on, we are refreshing your caches for XY" - to let the user know what's going on. Please note we use the Cacheable feature to cache method calls which collect data from multiple calls to a 3rd-party system, which take up to several minutes.
There is no such thing in the Spring Cache abstraction.
As the data collection takes minutes, and users might need the last value while new data is being collected, you can't wait for the cache eviction to collect the new data.
One solution would be to add a @Scheduled task that populates the cache by calling a method annotated with @CachePut.
CachedService.java
@CachePut(cacheNames = "book", key = "#id")
public Book updateBook(int id, Book book) {
    return book; // whatever is returned here is what gets stored in the cache
}
Config.java
@Scheduled(fixedRate = 60000)
public void reportCurrentTime() {
    log.info("refreshing cache");
    // ... long data collection ...
    // Call the method that has @CachePut
    service.updateBook(bookId, book);
    log.info("cache refreshed");
}
The example uses 60000 ms (1 minute), i.e. it updates the cache every minute. You have to choose a rate shorter than the Spring Cache expiration time, i.e. (expire time - request time).
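For this to run, both scheduling and caching have to be switched on somewhere; a minimal configuration sketch (the default in-memory cache manager is assumed here):

@Configuration
@EnableCaching     // activates @Cacheable / @CachePut processing
@EnableScheduling  // activates @Scheduled tasks
public class CacheConfig {
}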

Project reactor processors v3.X

We are trying to migrate from 2.X to 3.X.
https://github.com/reactor/reactor-core/issues/375
We have used the EventBus as the event manager in our application (a low-latency FX system) and it works very well for us.
After the change we decided to give every module its own processor to handle events.
1. Does this usage seem correct from your point of view? Because of the lack of documentation at the current stage, and after reviewing everything we could, we don't really know what to do here.
2. We have tried to use Flux in order to perform an action every X interval.
For example: market data arrives 1000 times in a second, but we want to process an update only 4 times a second. After upgrading we are using:
A processor with a buffer that sends to another method.
In this method we have a Flux that gets a list and tries to work in parallel in order to complete its task.
We had 2 major problems:
1. Sometimes we receive a null event, and we cannot find where in our system it is sent from; I suppose maybe we are misusing the processor:
//Definition of the processor
ReplayProcessor<Event> classAEventProcessor = ReplayProcessor.create();

//Event handler subscribing
public void onMyEventX(Consumer<Event> consumer) {
    Flux<Event> handler = classAEventProcessor.filter(event -> event.getType().equals(EVENT_X));
    handler.subscribe(consumer);
}
In the example above, the event in the handler sometimes gets null. Once it does, the stream stops working until we restart the server (because the processor is only created on restart).
2. We have tried to use parallel, but sometimes some of the messages disappeared, so maybe we are misusing the framework:
//On the constructor
tickProcessor.buffer(1024, Duration.of(250, ChronoUnit.MILLIS))
        .subscribe(markets -> handleMarkets(markets));

//Handler
Flux.fromIterable(getListToProcess())
        .parallel()
        .runOn(Schedulers.parallel())
        .doOnNext(entryMap -> {
            DoBlockingWork(entryMap);
        })
        .sequential()
        .subscribe();
The intention is that the processor will wake up every 250 ms and invoke the handler. The handler will then use a parallel Flux for better and faster processing.
*In case DoBlockingWork takes more than 250 ms, I couldn't figure out what the behavior would be.
UPDATE:
We wrapped the EventBus ourselves, and every event is subscribed through the wrapped event manager.
Now we have tried to create an event processor for every module, but it works very slowly. We have used a TopicProcessor with a ThreadExecutor and it is still very slow; the EventBus did the same work at high speed.
Anyone have any idea? BTW, when I tried to use a DirectProcessor it seemed to work much better than the TopicProcessor.
Reactor 3 is built around the concept that you should avoid blocking as much as you can, so in your second snippet DoBlockingWork doesn't look good.
How are the events generated? Do you maybe have a listener-based asynchronous API to get them? If so, you could try using Flux.create.
For your use case of "we have 1000 events in 1 second, but only want to process 4", I'd chain a sample operator. For instance, sample(Duration.ofMillis(250)) will divide each second into 4 windows, from which it will only emit the last element.
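A minimal sketch combining both suggestions, assuming a hypothetical listener-based marketFeed API, a MarketListener functional interface and a MarketEvent type:

import java.time.Duration;
import reactor.core.publisher.Flux;

// Bridge the listener-based API into a Flux.
Flux<MarketEvent> events = Flux.create(sink -> {
    MarketListener listener = sink::next;                      // push each incoming event into the Flux
    marketFeed.addListener(listener);
    sink.onDispose(() -> marketFeed.removeListener(listener)); // clean up on cancellation
});

// 1000 events/sec in, at most 4 out: keep only the last element of each 250 ms window.
events.sample(Duration.ofMillis(250))
      .subscribe(event -> handleMarket(event));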
The reference guide is being written, as well as a page where you can find links to external articles and learning material. There's a preview of the WIP reference guide here, and the learning resources page here.

Distributed caching in storm

How do you store temporary data in Apache Storm?
In a Storm topology, a bolt needs to access previously processed data.
E.g.: the bolt processes variable1 with a result of 20 at 10:00 AM.
If variable1 then arrives as 50 at 10:15 AM, the result should be 30 (50-20).
Later, if variable1 arrives as 70 at 10:30, the result should be 20 (70-50).
How can this functionality be achieved?
In short, you want to do micro-batching calculations within Storm's running tuples.
First you need to define/find the key in the tuple set.
Do fields grouping (don't use shuffle grouping) between bolts using that key; a wiring sketch follows below. This guarantees that related tuples for the same key will always be sent to the same task of the downstream bolt.
Define a class-level List/Map collection to maintain the old values and add the new values to it for the calculation; don't worry about thread safety, since each task of the bolt gets its own instance.
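A minimal sketch of that wiring, with hypothetical VariableSpout/DiffBolt names (the grouping field "variable" must match a field declared by the spout):

import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("variable-spout", new VariableSpout());

// fieldsGrouping on "variable" routes all tuples carrying the same variable
// to the same task of the diff bolt, so its local map stays consistent.
builder.setBolt("diff-bolt", new DiffBolt(), 4)
       .fieldsGrouping("variable-spout", new Fields("variable"));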
I'm afraid there is no such built-in functionality as of today.
But you can use any kind of distributed cache, like memcached or Redis. Those caching solutions are really easy to use.
There are a couple of approaches to do this, but it depends on your system requirements, your team's skills and your infrastructure.
You could use Apache Cassandra to store your events and pass the row's key in the tuple so the next bolt can retrieve it.
If your data is time-series in nature, then maybe you would like to have a look at OpenTSDB or InfluxDB.
You could of course fall back to something like Software Transactional Memory, but I think that would need a good amount of crafting.
You can use CacheBuilder to remember your data within your extended BaseRichBolt (put this in the prepare method):
// init your cache.
this.cache = CacheBuilder.newBuilder()
        .maximumSize(maximumCacheSize)
        .expireAfterWrite(expireAfterWrite, TimeUnit.SECONDS)
        .build();
Then in execute, you can use the cache to see if you have already seen that key entry or not. From there you can add your business logic:
// if we haven't seen it before, we can emit it.
if (this.cache.getIfPresent(key) == null) {
    cache.put(key, nearlyEmptyList);
    this.collector.emit(input, input.getValues());
}
this.collector.ack(input);
This question is a good candidate to demonstrate Apache Spark's in-memory computation over micro-batches. However, your use case is trivial to implement in Storm.
Make sure the bolt uses fields grouping. It will consistently hash incoming tuples with the same key to the same bolt task, so we do not lose any tuples.
Maintain a Map<String, Integer> in the bolt's local cache. This map will keep the last known value of a "variable".
class CumulativeDiffBolt extends InstrumentedBolt {

    Map<String, Integer> lastKnownVariableValue;

    @Override
    public void prepare() {
        this.lastKnownVariableValue = new HashMap<>();
        // ....
    }

    @Override
    public void instrumentedNextTuple(Tuple tuple, Collector collector) {
        // .... extract variable from tuple
        // .... extract current value from tuple
        Integer lastValue = lastKnownVariableValue.getOrDefault(variable, 0);
        Integer newValue = currValue - lastValue;
        // store the current value (not the diff) so the next diff is computed against it
        lastKnownVariableValue.put(variable, currValue);
        emit(new Fields(variable, newValue));
        // ...
    }
}
