Make sure that data is loaded before application startup | Spring WebFlux

I have a Spring WebFlux application.
I am loading a list from the database into a bean. I have two ways of implementing the loading of this bean.
Approach 1: Reactive way
@Bean
public List<Item> getItemList() throws IOException {
    List<Item> itemList = new ArrayList<>();
    itemRepository.findAll().collectList().subscribe(itemList::addAll);
    return itemList;
}
Approach 2: Blocking way
@Bean
public List<Item> getItemList() throws IOException {
    List<Item> itemList = itemRepository.findAll().collectList().block();
    return itemList;
}
Now, as I want my application to be reactive, I don't want to use the blocking way.
But the endpoints which I expose through my controller depend on this bean's data.
@RestController
public class SomeController {

    @Autowired
    private List<Item> itemList;

    @GetMapping("/endpoint")
    public void process() {
        List<Item> list = itemList; // this may not be initialized yet, as the bean loading is reactive
        // some more code
    }
}
So with the reactive approach, it may happen that somebody calls my endpoint (as the application has already started and is ready to serve requests) while the list has not yet been retrieved from the database (for whatever reason, e.g. slowness of the database server), producing inconsistent results for the users calling my endpoint (which in turn depends on this bean).
I am looking for a solution for this scenario.
EDIT: More precisely, the question is: should I reactively load the beans on which my exposed endpoints depend?

The application architecture presented here is a typical example of a design that is inherently blocking.
If the first request made to the API needs the items to be in place, then we must make sure that they are there before we can take on requests. And the only way to ensure that is to block until the items have in fact been fetched and stored.
Since the design is inherently blocking, we need to rethink our approach.
What we want is to make the service available for requests as quickly as possible. We can solve this by using a cache that gets filled when the first request is made.
This means the application starts up with an empty cache. The cache could for instance be a @Component, as Spring beans are singletons by default.
The steps would be (a code sketch follows the two lists below):
service starts up, cache is empty
service receives its first request
checks if there is data in the cache
if data is stale, evict the cache
if cache is empty, fetch the data from our source
fill the cache with our fetched data
set a TTL (time to live) on the data placed in the cache
return the data to the calling client
Second request:
request comes in to the service
checks if there is data in the cache
checks if the data is stale
if not, grab the data and return it to the calling subscriber
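A minimal sketch of this idea in WebFlux, using Reactor's cache operator; the ItemCache class, the reactive ItemRepository from the question, and the 10-minute TTL are illustrative assumptions:

import java.time.Duration;
import java.util.List;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Mono;

@Component
public class ItemCache {

    private final Mono<List<Item>> cachedItems;

    public ItemCache(ItemRepository itemRepository) {
        // nothing is fetched at startup; the first subscriber triggers the query,
        // the result is replayed to later subscribers, and once the TTL expires
        // the next subscriber re-fetches, so stale data is effectively evicted
        this.cachedItems = itemRepository.findAll()
                .collectList()
                .cache(Duration.ofMinutes(10));
    }

    public Mono<List<Item>> getItems() {
        return cachedItems;
    }
}

A controller would then compose on itemCache.getItems() instead of depending on a pre-loaded List<Item> bean, so the application can start serving requests right away.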
There are several cache solutions out there: Spring has its @Cacheable annotation, which by default is backed by a simple key-value store but can be paired with an external solution like Redis.
Another option is Google Guava, which has a very good read-up on its GitHub.
This type of solution is called trading memory for CPU: we gain startup time and fast requests (CPU), but the cost is that we spend some more memory to hold the data in a cache.

Related

Optimizing findAll in Spring Data JPA

I have a table which holds a list of lookup values (max 50 rows).
Currently, I query this table every time I look for a particular value, which is not efficient.
So I am planning to optimize this by loading all the values at once as a List from the repository using findAll.
List<CardedList> findAll();
My question here is:
Class A -> Class B, where Class B holds this repository. Will it call findAll every time Class A calls Class B?
class A {
    @Autowired
    B b;

    void process() {
        // for each item in the list, call Class B
        b.someMethod();
    }
}

class B {
    @Autowired
    CardedListRepository cardRepo;

    void someMethod() {
        cardRepo.findAll();
    }
}
What is the best way to achieve this?
If it is just 50 rows, you could cache them in an instance variable of a service and check like this:
class B {
    @Autowired
    CardedListRepository cardRepo;

    List<CardedList> cardedList = new ArrayList<>();

    void someMethod() {
        if (cardedList.isEmpty()) {
            cardedList = cardRepo.findAll();
        }
        // do the rest of someMethod
    }
}
The proposed "solution" by @Juliyanage Silva (to "cache" the findAll query result as a simple instance variable of service B) can be very dangerous and should not be implemented without checking very carefully that it works under all circumstances.
Just imagine the same service instance being called from a subsequent transaction: you would end up with a (probably outdated) list of detached entities,
e.g. leading to LazyInitializationExceptions when accessing uninitialized properties.
Hibernate already provides several caching mechanisms, e.g. the standard first-level cache, which avoids unnecessary DB round trips when looking up an already loaded entity by ID within the same transaction.
However, query results (as returned by findAll) are not cached by default, as explained in the documentation:
Caching of query results introduces some overhead in terms of your application's normal transactional processing. For example, if you cache results of a query against Person, Hibernate will need to keep track of when those results should be invalidated because changes have been committed against any Person entity.
That, coupled with the fact that most applications simply gain no benefit from caching query results, leads Hibernate to disable caching of query results by default.
To enable the Hibernate query cache, the second-level cache needs to be configured. To prevent ending up with stale entries when running multiple application instances, this calls for a distributed cache (like Hazelcast or EhCache).
There are also various discussions on using Spring's caching mechanisms for this purpose. However, there are various pitfalls when it comes to caching collections, and when running multiple application instances you may need a distributed cache or another global invalidation mechanism, too:
How to add cache feature in Spring Data JPA CRUDRepository
Spring Cache with collection of items/entities
Spring Caching not working for findAll method
So depending on your use case, it may be easiest to just avoid unnecessary calls to service B by storing the result in a local variable within the calling method of service A.
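For illustration, a minimal sketch of that last suggestion; getCardedList() is a hypothetical accessor on service B that returns the list:

import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;

class A {
    @Autowired
    B b;

    void process() {
        // fetch once, then reuse the local variable inside the loop
        List<CardedList> cards = b.getCardedList(); // hypothetical accessor on B
        for (CardedList card : cards) {
            // work with each card instead of calling B on every iteration
        }
    }
}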

Reactive streaming approach of file upload in Spring (Boot)

We have spent a lot of hours on the internet and on Stack Overflow, but none of the findings satisfied us in the way we planned a file upload in a Spring context.
A few words about our architecture. We have a Node.js client which uploads files into a Spring Boot app. Let us call this REST endpoint our "client endpoint". Our Spring Boot application acts as middleware and calls endpoints of a "foreign system", so to distinguish them we call that endpoint the "foreign" one. The main purpose is the file handling between these two endpoints, plus some business logic in between.
Actually, the interface to our client looks like this:
public class FileDO {
private String id;
private byte[] file;
...
}
Here we are very flexible, because it is our client and our interface definition.
Due to the issue that our system has sometimes run out of memory under load, we plan to reorganize our code into a more stream-based, reactive approach. When I write "under load", I mean heavily under load, e.g. hundreds of file uploads at the same time with big files from at least a few MB up to 1 GB. We know that these tests don't represent real application use cases, but we want to be prepared.
We put some research into our challenge, and profiler tools showed us that our REST endpoints store the files completely in memory as byte arrays. That works, but it is not efficient.
Currently we are facing the requirement to deliver a REST endpoint for file upload and to push these files into another REST endpoint of some foreign system. In doing so, our application's main intention is to be a middle tier for file upload. Given this initial situation, we aim not to hold those files as a whole in memory. Best would be a stream, maybe reactive. We are already partially reactive with some business functions, but at the very beginning of becoming familiar with all that stuff.
So, what are our steps so far? We introduced a new client (Node.js --> Spring Boot) interface, shown below. This works so far. But is it really a stream-based approach? First metrics have shown that it doesn't reduce memory utilization.
@PostMapping(value = "/uploadFile", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
@ResponseStatus(HttpStatus.CREATED)
public Mono<Void> upload(@RequestPart(name = "id") String id, @RequestPart(name = "file") Mono<FilePart> file) {
    fileService.save(id, file);
    ...
}
First question: is the type Mono<FilePart> right here? Or should we rather have a Flux of DataBuffer or something else? And if so, how should the client behave and deliver data in such a format that it is really a streaming approach?
The FileService class should then post the file(s) into the foreign system, and perhaps do something else with the given data, at least log the id and the file name. :-)
Our code in FileService.save(..) currently looks like the following in between:
...
MultipartBodyBuilder bodyBuilder = new MultipartBodyBuilder();
bodyBuilder.asyncPart(...take mono somehow...);
bodyBuilder.part("id", id);
return WebClient.create("url-of-foreign-system")
        .post()
        .uri("/uploadFile")
        .syncBody(bodyBuilder.build())
        .retrieve()
        .bodyToMono(Result.class);
...
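For illustration, a hedged sketch of how the elided asyncPart call might be wired so that the file is forwarded chunk by chunk rather than as one byte[]; Result, the URLs, and the part names are the question's own placeholders:

import org.springframework.core.io.buffer.DataBuffer;
import org.springframework.http.client.MultipartBodyBuilder;
import org.springframework.http.codec.multipart.FilePart;
import org.springframework.http.codec.multipart.Part;
import org.springframework.web.reactive.function.BodyInserters;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

public Mono<Result> save(String id, Mono<FilePart> file) {
    MultipartBodyBuilder bodyBuilder = new MultipartBodyBuilder();
    // stream the part's DataBuffer chunks straight through instead of
    // collecting them into a byte array first
    bodyBuilder.asyncPart("file", file.flatMapMany(Part::content), DataBuffer.class);
    bodyBuilder.part("id", id);
    return WebClient.create("url-of-foreign-system")
            .post()
            .uri("/uploadFile")
            .body(BodyInserters.fromMultipartData(bodyBuilder.build()))
            .retrieve()
            .bodyToMono(Result.class);
}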
Unfortunately, the second REST endpoint, the one of our foreign system, looks a little different from our first one. It will be enriched by data from another system. It takes some FileDO2 with an id, a byte array, and some other metadata specific to the second foreign system.
As said, our approach should be to minimize the memory footprint of the actions between client and foreign system. Sometimes we not only have to deliver data to that system, but also do some business logic that may slow down the whole streaming process.
Any ideas on how to do that as a whole? Currently we have no clue how to do it all...
We appreciate any help or ideas.

Spring JCache logging cache hits

I have a method on which I added a cache by adding the @CacheResult annotation (I actually created a proxy because I can't change the original implementation of SomethingService):
@Service
public class SomethingServiceProxyImpl implements SomethingService {

    @Autowired
    @Qualifier("somethingService")
    SomethingService somethingService;

    @Override
    @CacheResult(cacheName = "somethingCache", exceptionCacheName = "somethingExceptionCache", cachedExceptions = { SomeException.class })
    public SomePojo someMethod(String someArg) {
        return somethingService.someMethod(someArg);
    }
}
What I need now is to be able to log cache hits, meaning cases where the result returned came from the cache. I've looked at Spring Cache, at JCache, and at EhCache (the implementation I use), and I've only found ways to listen (with listeners) to the following events: CREATED, UPDATED, REMOVED, EVICTED, EXPIRED, but none of them have an event for when the cache returned a result (not null).
I don't really want to have to change the implementation to use the cache programmatically instead of using the annotations (I actually have a lot of services to change, not just this one). Is there a good way to log those events anyway?
Some thoughts on that topic. Probably the first two are the most relevant:
Don't: The code that gets executed in Spring and the respective cache on a cache hit is the most performance-critical path. That's why it is not so clever to call additional code in that case, or even to have an option for that. Wiring in a log will impact your performance massively. Usually there is already logging in an application for everything that leads to a cache request (e.g. incoming web requests). To get an idea of whether the cache is working correctly, a counter of the hits is enough; that is available via the JCache JMX statistics.
Logging adapter: Using Spring, you can write a cache adapter which does the logging as you need it and wire it in via configuration. Rough idea: look at the CacheManager and Cache interfaces. Wrap the CacheManager's cache lookup and return a wrapped cache that logs, as sketched below.
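A minimal sketch of that idea, assuming Spring's caching abstraction is what dispatches the JCache annotations; the class names and the log message are illustrative:

import java.util.Collection;
import java.util.concurrent.Callable;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;

public class LoggingCacheManager implements CacheManager {
    private final CacheManager delegate;

    public LoggingCacheManager(CacheManager delegate) {
        this.delegate = delegate;
    }

    @Override
    public Cache getCache(String name) {
        Cache cache = delegate.getCache(name);
        return cache == null ? null : new LoggingCache(cache);
    }

    @Override
    public Collection<String> getCacheNames() {
        return delegate.getCacheNames();
    }

    static class LoggingCache implements Cache {
        private static final Logger log = LoggerFactory.getLogger(LoggingCache.class);
        private final Cache delegate;

        LoggingCache(Cache delegate) {
            this.delegate = delegate;
        }

        @Override
        public ValueWrapper get(Object key) {
            ValueWrapper value = delegate.get(key);
            if (value != null) {
                // this is the "hit" event the listeners don't expose
                log.debug("Cache hit in '{}' for key {}", getName(), key);
            }
            return value;
        }

        // the remaining methods simply delegate
        @Override public String getName() { return delegate.getName(); }
        @Override public Object getNativeCache() { return delegate.getNativeCache(); }
        @Override public <T> T get(Object key, Class<T> type) { return delegate.get(key, type); }
        @Override public <T> T get(Object key, Callable<T> valueLoader) { return delegate.get(key, valueLoader); }
        @Override public void put(Object key, Object value) { delegate.put(key, value); }
        @Override public ValueWrapper putIfAbsent(Object key, Object value) { return delegate.putIfAbsent(key, value); }
        @Override public void evict(Object key) { delegate.evict(key); }
        @Override public void clear() { delegate.clear(); }
    }
}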
Hack via ExpiryPolicy: When a custom ExpiryPolicy is specified, a JCache implementation calls the method getExpiryForAccess on every cache access. However, you don't get any information about the actual key being requested. I also recommend staying away from your own ExpiryPolicy implementations for performance reasons, so this is just for completeness.
Logging cache / log every access: In case you specify multiple caches, Spring calls them one after another. You could wire in a dummy cache as the first cache, which just logs the access.

Store data in BEAN (Spring Boot REST)

I am creating a REST API with Spring Boot. Most of the data comes from a database, but some of it is fetched from third-party APIs. The problem is that some of them have access limitations, like max 10 requests per minute or so.
Now I am looking for a way to cache that data in my Spring application and only update it every few seconds. Storing it in the DB and updating it every 10 seconds is a bit too much, since the fetched data is about 1000 rows. So I thought I'd simply store it in my service bean.
This is my approach so far. (Coded in Kotlin)
@Service
class MyService {
    lateinit var myData: CustomDataObject

    fun getData() = myData

    fun updateData() {
        // call the API and store the result in myData
    }
}
It works but it seems kinda hacky to me. Not really a clean solution, is it?
If someone has a better approach to this, I would be very thankful.
Well, for me the answer is pretty obvious: use Spring Cache.
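For illustration, a minimal Java sketch of the Spring Cache approach, assuming @EnableCaching and @EnableScheduling are configured; the cache name, the 10-second interval, and ThirdPartyClient are hypothetical:

import java.util.List;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class MyDataService {

    private final ThirdPartyClient thirdPartyClient; // hypothetical client for the rate-limited API

    public MyDataService(ThirdPartyClient thirdPartyClient) {
        this.thirdPartyClient = thirdPartyClient;
    }

    // the first call hits the external API; subsequent calls are served from the cache
    @Cacheable("thirdPartyData")
    public List<CustomDataObject> getData() {
        return thirdPartyClient.fetchAll();
    }

    // drop the cached entries every 10 seconds so the next call refreshes them
    @CacheEvict(value = "thirdPartyData", allEntries = true)
    @Scheduled(fixedRate = 10_000)
    public void evictData() {
    }
}

This keeps the caching out of the service's state, so there is no hand-rolled, "hacky" field holding the data.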

Problems with Spring and Hibernate SessionFactory: Domain object scope restricted to session

I have been using the SessionFactory (a singleton bean injected into the DAO objects) in my Spring/Hibernate application. I am using the service-layer architecture, and I have the following issue:
Any time I get a domain object from the database, it uses a new session provided by the Hibernate SessionFactory. When requesting the same row several times, this leads to having multiple instances of that same domain object. (With a single session, it would return multiple objects pointing to the same reference.) Thus, any changes made to one of those domain objects are not reflected in the other domain objects representing the same row.
I am developing a Swing application with multiple views, and I get the same DB row from different locations (and queries), so I need to obtain domain objects pointing to the same instance.
My question then is: is there a way to make this happen using the SessionFactory? If not, is it good practice to use a single session for my whole application? In that case, how and where should I declare this session? (Should it be a bean injected into the DAO objects just like the SessionFactory?)
Thank you in advance for your help.
The Hibernate session (I will call it h-session) in Spring is usually bound to a thread (see the JavaDoc for HibernateTransactionManager), so an h-session is acquired once per thread.
The first-level cache (the h-session cache, always turned on) is used to return the same object if you call "get" or "load" several times on one h-session. But this cache doesn't work for queries.
Also, you shouldn't forget about problems related to transaction isolation. In most applications the "read committed" isolation level is used, and this isolation level suffers from the phenomenon known as "non-repeatable reads". Basically, you could receive several versions of the same row in one transaction if you query for this row several times (because the row could be updated between queries by another transaction).
So, you shouldn't query several times for the same data in one h-session/transaction.
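For illustration, a small sketch of the first-level cache behavior; Person is a hypothetical mapped entity, and session/anotherSession are assumed to be two open Hibernate sessions:

// repeated lookups by ID within one h-session return the same instance
Person first = session.get(Person.class, 1L);   // hits the database
Person second = session.get(Person.class, 1L);  // served from the first-level cache
assert first == second;                         // same reference

// a different h-session yields a different instance for the same row
Person third = anotherSession.get(Person.class, 1L);
assert first != third;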
You're looking for the Open Session in View pattern. Essentially, you want to bind a Session to your thread on application startup and use the same Session throughout the lifetime of the application. You can do this by creating a singleton util class which keeps a session, like so (note that the example uses an EntityManager instead of a Session, but your code will be essentially the same):
private static EntityManagerFactory entityManagerFactory; // assumed to be initialized elsewhere
private static EntityManager entityManager;

public static synchronized void setupEntityManager() {
    if (entityManager == null) {
        entityManager = entityManagerFactory.createEntityManager();
    }
    if (!TransactionSynchronizationManager.hasResource(entityManagerFactory)) {
        TransactionSynchronizationManager.bindResource(entityManagerFactory, new EntityManagerHolder(entityManager));
    }
}

public static synchronized void tearDownEntityManager() {
    if (entityManager != null) {
        if (entityManager.isOpen()) {
            entityManager.close();
        }
        if (TransactionSynchronizationManager.hasResource(entityManagerFactory)) {
            TransactionSynchronizationManager.unbindResource(entityManagerFactory);
        }
        if (entityManagerFactory.isOpen()) {
            entityManagerFactory.close();
        }
    }
}
Note that there are inherent risks associated with the Open Session in View pattern. For example, I noticed in the comments that you intend to use threading in your application. Sessions are not thread-safe, so you'll have to make sure you aren't trying to access the database in a threaded manner.*
You'll also have to be more aware of your fetching strategy for collections. With an open session and lazy loading, there's always the chance that you'll put undue load on your database.
*I've used this approach in a NetBeans application before, which I know uses threading for certain tasks. We never had any problems with it, but you need to be aware of the risks, of which there are many.
Edit
Depending on your situation, it may also be possible to evict your domain objects from the Session and cache the detached objects for later use. This strategy would, of course, require that your domain objects not be updated very often, otherwise your application would become unnecessarily complicated.
