Spring Boot business reports

I have an order fulfillment application built mostly with Spring Data REST and Spring Boot. I need to take every product, find it in the orders, and calculate how many units were sold in total and at what price over a given time period. I have other requirements like this.
The data represented in the report will not look like any of the business entities. It will have sums, totals, a field from one entity, a field from another entity, and so on. It will not be persisted; it will be generated only for the client to consume. It should be pageable too.
What is the correct approach to tackle this? Do I use POJOs to represent the report lines? How do I manage paging with this? Does it make sense to have a repository for each report? Is it possible with HQL to return a report line that is not a persisted entity from a repository method?
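To make the "POJO per report line" idea concrete, here is a minimal sketch in plain Java with no Spring involved. The class names (ProductSalesReport, OrderLine, ReportLine) and the sample data are hypothetical, and the aggregation is done in memory only to show the shape of the result. In a real application the grouping and paging would more likely be pushed down to the database, for example with a JPQL constructor expression (select new ...) and a Pageable parameter on a repository query method.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ProductSalesReport {

    /** Hypothetical input: one product line of one order. */
    static final class OrderLine {
        final String product;
        final int quantity;
        final BigDecimal unitPrice;
        final LocalDate orderedOn;

        OrderLine(String product, int quantity, String unitPrice, LocalDate orderedOn) {
            this.product = product;
            this.quantity = quantity;
            this.unitPrice = new BigDecimal(unitPrice);
            this.orderedOn = orderedOn;
        }
    }

    /** One line of the report: a plain value object that is never persisted. */
    static final class ReportLine {
        final String product;
        final long unitsSold;
        final BigDecimal revenue;

        ReportLine(String product, long unitsSold, BigDecimal revenue) {
            this.product = product;
            this.unitsSold = unitsSold;
            this.revenue = revenue;
        }
    }

    /** Sums units and revenue per product for orders dated within [from, to], sorted by product name. */
    static List<ReportLine> build(List<OrderLine> lines, LocalDate from, LocalDate to) {
        Map<String, ReportLine> byProduct = new TreeMap<>();
        for (OrderLine line : lines) {
            if (line.orderedOn.isBefore(from) || line.orderedOn.isAfter(to)) {
                continue; // outside the requested period
            }
            BigDecimal amount = line.unitPrice.multiply(BigDecimal.valueOf(line.quantity));
            byProduct.merge(line.product,
                    new ReportLine(line.product, line.quantity, amount),
                    (a, b) -> new ReportLine(a.product, a.unitsSold + b.unitsSold, a.revenue.add(b.revenue)));
        }
        return new ArrayList<>(byProduct.values());
    }

    /** Returns one zero-based page of the report; an out-of-range page is empty. */
    static List<ReportLine> page(List<ReportLine> report, int pageNumber, int pageSize) {
        int start = Math.min(pageNumber * pageSize, report.size());
        int end = Math.min(start + pageSize, report.size());
        return report.subList(start, end);
    }

    public static void main(String[] args) {
        List<OrderLine> lines = List.of(
                new OrderLine("widget", 2, "10.00", LocalDate.of(2024, 1, 5)),
                new OrderLine("gadget", 1, "25.50", LocalDate.of(2024, 1, 10)),
                new OrderLine("widget", 3, "10.00", LocalDate.of(2024, 1, 20)));
        List<ReportLine> report = build(lines, LocalDate.of(2024, 1, 1), LocalDate.of(2024, 1, 31));
        for (ReportLine line : page(report, 0, 10)) {
            System.out.println(line.product + " " + line.unitsSold + " " + line.revenue);
        }
    }
}
```

The point of the sketch is that the report line is an ordinary immutable class with no identity and no mapping to a table, so one class per report is cheap to add.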

Related

How to handle data migrations in distributed microservice databases

I'm learning about microservices and common patterns, and I can't seem to find how to address this one issue.
Let's say that my customer needs a module managing customers and a module managing purchase orders.
I believe that when dealing with microservices it's pretty natural to split these two functionalities into separate services, each having its own data.
CustomerService
PurchaseOrderService
He also wants a table of purchase orders displaying data from both customers and purchase orders, i.e. customer name and order number.
Now, I don't want to use the API Composition pattern, because the user must be able to sort by any column he wants, which (AFAIK) is impossible with that pattern without slaughtering performance.
Instead, I chose the CQRS pattern:
after every purchase order / customer update a message is sent to the message broker
message broker notifies the third service about that message
the third service updates its projection in its own database
So, our third service is:
PurchaseOrderTableService
It stores all the required data in a single database, so now we can query it and sort by any column we like while still maintaining good performance.
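The projection described above can be sketched in plain Java, with two in-memory maps standing in for the service's own database and plain method calls standing in for the broker's message delivery. All names here (PurchaseOrderTableProjection, Row, the two handler methods) are hypothetical and do not come from Kafka or Spring.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PurchaseOrderTableProjection {

    /** One row of the read model: order data plus a denormalized copy of the customer name. */
    static final class Row {
        final String orderNumber;
        final String customerId;
        String customerName;

        Row(String orderNumber, String customerId, String customerName) {
            this.orderNumber = orderNumber;
            this.customerId = customerId;
            this.customerName = customerName;
        }
    }

    // Stand-ins for the projection service's own tables.
    private final Map<String, String> customerNames = new HashMap<>();
    private final Map<String, Row> rowsByOrderNumber = new HashMap<>();

    /** Handler for a (hypothetical) customer-updated message. */
    void onCustomerUpdated(String customerId, String name) {
        customerNames.put(customerId, name);
        for (Row row : rowsByOrderNumber.values()) {
            if (row.customerId.equals(customerId)) {
                row.customerName = name; // keep the denormalized copy in sync
            }
        }
    }

    /** Handler for a (hypothetical) purchase-order-updated message. */
    void onPurchaseOrderUpdated(String orderNumber, String customerId) {
        // The name stays null until a customer message for this id has been seen.
        rowsByOrderNumber.put(orderNumber, new Row(orderNumber, customerId, customerNames.get(customerId)));
    }

    /** Sorting on any column is a local operation because all the data lives in one place. */
    List<Row> sortedBy(Comparator<Row> order) {
        List<Row> rows = new ArrayList<>(rowsByOrderNumber.values());
        rows.sort(order);
        return rows;
    }

    public static void main(String[] args) {
        PurchaseOrderTableProjection projection = new PurchaseOrderTableProjection();
        projection.onCustomerUpdated("c1", "Zoe");
        projection.onCustomerUpdated("c2", "Adam");
        projection.onPurchaseOrderUpdated("PO-1", "c1");
        projection.onPurchaseOrderUpdated("PO-2", "c2");
        Comparator<Row> byCustomerName = Comparator.comparing((Row r) -> r.customerName);
        for (Row row : projection.sortedBy(byCustomerName)) {
            System.out.println(row.customerName + " " + row.orderNumber);
        }
    }
}
```

In a real service the maps would be tables in PostgreSQL and the sort would be an ORDER BY, but the flow of "message in, local copy updated, query served locally" is the same.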
And now, the tricky part:
In the future, the client can change his mind and say "Hey, I need the purchase orders table to display an additional column, 'Customer country'".
How does one handle that data migration? So far, the PurchaseOrderTableService knows about only two columns: 'Customer name' and 'Order number'.
I imagine that this is probably a pretty common problem, so what can I do to avoid reinventing the wheel?
I can of course make CustomerService generate a 'CustomerUpdatedMessage' for every existing customer, which would force PurchaseOrderTableService to update all its projections, but that seems like a workaround.
If it matters, the stack I have in mind is Java, Spring, Kafka and PostgreSQL.
Divide the problem in two:
Keeping live data in sync: your projection service from now on also needs to persist Customer Country, so all new orders will have the country as expected.
Backfill the older orders: this is a one-off operation, so how you implement it really depends on your organization, technologies, etc. For example, you or a DBA can use whatever database tools you have to extract the data from the source database and do a bulk update of the target database. In other cases, you might have to solve it programmatically, for example by creating a process in the projection microservice that queries the Customer microservice's API to get the data and updates the local copy.
Also note that in most cases you will already have a process to backfill data, because the need for the projection microservice might arise months or years after the orders and customers services were created. Other times, the search service is a third-party search engine, like Elasticsearch, instead of a database. In those cases, I would always keep on hand a process to fully reindex the data.
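The programmatic variant of the backfill might look roughly like the following sketch. It assumes the projection rows already have a nullable customerCountry column and that the Customer service can be asked for a customer's country; that call is represented here by a plain Function, and every name in the block (CountryBackfill, Row, countryLookup) is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class CountryBackfill {

    /** Projection row after the schema change: the new column starts out null for old orders. */
    static final class Row {
        final String orderNumber;
        final String customerId;
        String customerCountry;

        Row(String orderNumber, String customerId, String customerCountry) {
            this.orderNumber = orderNumber;
            this.customerId = customerId;
            this.customerCountry = customerCountry;
        }
    }

    /**
     * Fills in the missing country on old rows and returns how many rows were updated.
     * countryLookup stands in for a call to the Customer service (assumed, not a real API).
     * Rows that already have a country are skipped, so the job can be re-run safely.
     */
    static int backfill(List<Row> projection, Function<String, String> countryLookup) {
        Map<String, String> cache = new HashMap<>(); // at most one lookup per known customer
        int updated = 0;
        for (Row row : projection) {
            if (row.customerCountry != null) {
                continue; // already populated by live events or an earlier run
            }
            String country = cache.computeIfAbsent(row.customerId, countryLookup);
            if (country != null) {
                row.customerCountry = country;
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        List<Row> projection = new ArrayList<>();
        projection.add(new Row("PO-1", "c1", null));
        projection.add(new Row("PO-2", "c2", "DE"));
        Map<String, String> customerService = Map.of("c1", "PL", "c2", "DE");
        int updated = backfill(projection, customerService::get);
        System.out.println("rows updated: " + updated);
    }
}
```

Skipping rows that are already populated is what lets the backfill run alongside the live event stream without overwriting newer data.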

Spring batch fetch huge amount of data from DB-A and store them in DB-B

I have the following scenario. In database A I have a table with a huge number of records (several million); the number grows very rapidly day by day (by as many as 100,000 records a day).
I need to fetch these records, check whether they are valid, and import them into my own database. On the first run I should take all the stored records; after that I can take only the newly saved records. I have a timestamp column I can use for this filter, but I can't figure out how to create a JpaPagingItemReader or a JdbcPagingItemReader and pass it a dynamic filter based on the date (e.g. select all records where the timestamp is greater than the job's last execution date).
I'm using Spring Boot, Spring Data JPA and Spring Batch. I'm configuring the Job instance in chunks of size 1000. I can also use a paging query (is that useful if I use chunks?).
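The "only records newer than the last run" logic can be sketched independently of Spring Batch. The block below is plain Java with a list standing in for the source table; IncrementalImport, SourceRecord, readPage and runImport are hypothetical names. In Spring Batch itself, one common way to get the dynamic value into the reader is to pass the last execution date as a job parameter and bind it in a step-scoped reader bean, then hand it to the reader through setParameterValues; treat that as a pointer to check against the documentation rather than a recipe.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class IncrementalImport {

    /** Hypothetical source row with the timestamp column used as the filter. */
    static final class SourceRecord {
        final long id;
        final Instant createdAt;
        final String payload;

        SourceRecord(long id, Instant createdAt, String payload) {
            this.id = id;
            this.createdAt = createdAt;
            this.payload = payload;
        }
    }

    /** Stand-in for a paging query: created_at > :since, ordered by created_at and id, one page at a time. */
    static List<SourceRecord> readPage(List<SourceRecord> table, Instant since, int page, int pageSize) {
        return table.stream()
                .filter(r -> r.createdAt.isAfter(since))
                .sorted(Comparator.comparing((SourceRecord r) -> r.createdAt).thenComparingLong(r -> r.id))
                .skip((long) page * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    /**
     * Reads everything newer than 'since' page by page, writes the valid records to 'target',
     * and returns the newest timestamp seen, to be stored and used as 'since' on the next run.
     */
    static Instant runImport(List<SourceRecord> table, Instant since, int pageSize,
                             Predicate<SourceRecord> isValid, List<SourceRecord> target) {
        Instant watermark = since;
        for (int page = 0; ; page++) {
            List<SourceRecord> chunk = readPage(table, since, page, pageSize);
            if (chunk.isEmpty()) {
                break;
            }
            for (SourceRecord record : chunk) {
                if (isValid.test(record)) {
                    target.add(record);
                }
                if (record.createdAt.isAfter(watermark)) {
                    watermark = record.createdAt;
                }
            }
        }
        return watermark;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        List<SourceRecord> table = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            table.add(new SourceRecord(i, start.plusSeconds(i), "row-" + i));
        }
        List<SourceRecord> target = new ArrayList<>();
        Instant watermark = runImport(table, Instant.EPOCH, 2, r -> true, target);
        System.out.println("imported " + target.size() + " records up to " + watermark);
    }
}
```

The first run uses a very old 'since' value so that every stored record qualifies; later runs pass the watermark returned by the previous run.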
I have a micro service (let's call this MSA) with all the business logic needed to check if records are valid and insert the valid records.
I have another service on a separate server. This service contains all the batch operation (let's call this MSB).
I'm wondering what the best approach to the batch is. I was thinking of these solutions:
in MSB I duplicate all the entities, repositories and services I use in MSA. Then in MSB I can make all the needed queries
in MSA I create all the REST APIs needed. The ItemProcessor of MSB will call these REST APIs to perform checks on the items to be processed, and finally in the ItemWriter I'll call the REST API for saving data
The first solution would avoid the HTTP calls, but it forces me to duplicate all repositories and services between the two microservices. Sadly, I can't use a common project to hold all the shared objects.
The second solution, on the other hand, would avoid the code duplication, but it would imply a lot of HTTP calls (above all in the ItemProcessor, to check whether an item is valid or not).
Do you have any other suggestion? Is there a better approach?
Thank you
Angelo

Spring HATEOAS, how to handle conversion links to entities without flooding the DB

I'm using Spring boot 2.3, Spring Data REST, Spring HATEOAS, Hibernate.
Let's think of a simple use case, like a user creating an invoice in a web client, or an inventory list for a warehouse. When the user submits the form, hundreds of rows could be sent, and these rows can have links to other entities.
In the case of the invoice, for example, each row can have a product reference that will be passed to the server as a link.
That link is translated by Spring into an entity using the Repository. My point is that a query to get the product runs for every row.
This means that everything will be really slow during insert (the n+1 select problem).
I've probably missed something in the logic, but I haven't seen concrete examples that focus on how to handle a large number of link -> entity translations.
Do you have any hint about it?
Is your concern that many entities will be created if linked entities are returned to the server? Hibernate (as well as Spring) has a lazy loading mechanism (https://blog.ippon.tech/boost-the-performance-of-your-spring-data-jpa-application/), so only the necessary entities will be populated. Please correct me if I misunderstand your question.
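For the n+1 concern raised in the question, one possible direction (not something the answer above proposes) is to resolve all the links of a request in one bulk lookup instead of one query per row, for example in a custom endpoint that receives the raw links. The sketch below is plain Java: BulkLinkResolver and its methods are hypothetical, the assumption that the id is the last path segment of the link is mine, and the bulkLookup function stands in for a single repository call such as findAllById.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class BulkLinkResolver {

    /** Extracts the numeric id, assuming it is the last path segment of the link. */
    static long idFromLink(String link) {
        return Long.parseLong(link.substring(link.lastIndexOf('/') + 1));
    }

    /**
     * Resolves every link with a single bulk lookup and returns the results in row order.
     * bulkLookup stands in for one "fetch all by ids" query (assumed, e.g. findAllById).
     * A link whose id is unknown resolves to null.
     */
    static List<String> resolve(List<String> links, Function<Set<Long>, Map<Long, String>> bulkLookup) {
        Set<Long> ids = new LinkedHashSet<>();
        for (String link : links) {
            ids.add(idFromLink(link)); // duplicates collapse, so each id is fetched once
        }
        Map<Long, String> byId = bulkLookup.apply(ids); // the only round trip
        List<String> resolved = new ArrayList<>();
        for (String link : links) {
            resolved.add(byId.get(idFromLink(link)));
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<Long, String> productTable = Map.of(1L, "Widget", 2L, "Gadget");
        List<String> rows = List.of(
                "http://localhost/products/1",
                "http://localhost/products/2",
                "http://localhost/products/1");
        System.out.println(resolve(rows, ids -> productTable));
    }
}
```

With hundreds of invoice rows this turns hundreds of single-row selects into one query with an IN clause, at the cost of handling the link parsing yourself.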

How can I divide one database to multi databases?

I want to decompose my application to adopt a microservices architecture, and I will need to come up with a solid strategy to split my database (MySQL) into multiple small databases (MySQL) aligned with my applications.
TL;DR: It depends on the scenario and on what each service will do.
Although there is no clear answer to this, since it really depends on your needs and on what each service should do, you can come up with a general starting point (assuming you don't need to keep the existing database type).
Let's assume you have a monolithic e-commerce application, and you want to split it into smaller services, each one with its own database.
One approach is to create services that each handle a part of the website: for example, you could have one service that handles user authentication, one for the orders, one for the products, one for the invoices, and so on.
Now, each service will have its own database, and here comes another question: which database should a specific service have? One of the advantages of this kind of architecture is that each service can have its own kind of database. For example, the products service can have a non-relational database, such as MongoDB, since all it does is fetch details about products, so you don't have to manage any relations.
The orders service, on the other hand, could have a relational database, since you want to keep a relation between the order and the invoice for that order. But wait, invoices are handled by the invoice service, so how can you keep the relation between these two without sharing the database? Well, that's one of the "issues" of this approach: you have to keep services independent while also letting them communicate with each other. How can we do this? There is no clear answer here either. One approach could be to just pass all the invoice details to the orders service as well. Or you can pass only the invoice ID when saving the order and later retrieve the invoice via an API call to the invoice service. Or you can pass just the relevant invoice details to an API endpoint in the order service that stores them in a specific table in its database (since most of the time you don't need the entire object). The possibilities are endless.

Microservices "JOINS"

Let's say we want to create the app with microservices.
We have some page where we display some items (products).
These products have multiple joins (categories, tags, users, and so on).
If the users and categories data live in other services, how can we manage and filter the results?
For example, in SQL you write 3 or 4 joins and get the result.
With microservices, I have to filter the categories, then filter the tags, and then the products. This could be 10 times slower than the SQL query.
Also, if I have a "products_categories" table which sets the categories for each product, which service is responsible for it? The Product service or the Category service?
Thank you
In a microservices architecture there are two ways to deal with it.
The API composition pattern: this is the simplest approach and should be used whenever possible. It works by making clients of the services that own the data responsible for invoking the services and combining the results.
The Command Query Responsibility Segregation (CQRS) pattern: this is more powerful than the API composition pattern, but it's also more complex. It maintains one or more view databases whose sole purpose is to support queries.
I would prefer CQRS: define a view database, which is a read-only replica built specifically to support that query. The rest of the services keep the replica up to date by subscribing to the events (create, update, insert) published by the services that own the data.
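As a rough illustration of the first option, API composition, the sketch below joins products with category names in memory. It is plain Java: ProductCatalogComposer, Product and ProductView are hypothetical names, and the categoryService function stands in for one bulk HTTP call to the service that owns the categories.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class ProductCatalogComposer {

    /** Data owned by the product service: it only knows the category id. */
    static final class Product {
        final long id;
        final String name;
        final long categoryId;

        Product(long id, String name, long categoryId) {
            this.id = id;
            this.name = name;
            this.categoryId = categoryId;
        }
    }

    /** Combined view returned to the client. */
    static final class ProductView {
        final String productName;
        final String categoryName;

        ProductView(String productName, String categoryName) {
            this.productName = productName;
            this.categoryName = categoryName;
        }
    }

    /**
     * The composer does the "join" itself: it collects the category ids it needs,
     * asks the category service for them in one call, and merges the two results.
     */
    static List<ProductView> compose(List<Product> products,
                                     Function<Set<Long>, Map<Long, String>> categoryService) {
        Set<Long> categoryIds = new HashSet<>();
        for (Product product : products) {
            categoryIds.add(product.categoryId);
        }
        Map<Long, String> categoryNames = categoryService.apply(categoryIds);
        List<ProductView> views = new ArrayList<>();
        for (Product product : products) {
            views.add(new ProductView(product.name, categoryNames.get(product.categoryId)));
        }
        return views;
    }

    public static void main(String[] args) {
        List<Product> products = List.of(new Product(1, "Laptop", 10), new Product(2, "Novel", 20));
        Map<Long, String> categories = Map.of(10L, "Electronics", 20L, "Books");
        for (ProductView view : compose(products, ids -> categories)) {
            System.out.println(view.productName + " / " + view.categoryName);
        }
    }
}
```

Filtering or sorting on a field owned by the other service (here, the category name) has to happen after the merge, over data already fetched, which is why this pattern gets expensive for large result sets and why a CQRS view can be the better fit for that case.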
This is a very standard problem whenever a microservice is built. People tend to feel that microservices are the solution for everything, which is not true.
The solution to this problem is better design: designing so that there is a balance between performance and redundancy of data. Higher performance (lower latency) means more duplication of data across the different microservices' databases. You should not aim for performance as good as SQL joins, but you should also not duplicate data too much. A balance is needed.
Most importantly, the requirement needs to be divided into the right set of microservices.
I assume you created a "microservice" per database table. Those are not microservices, those are just HTTP-based CRUD interfaces to your database.
First, know why you need microservices. (Is there an actual reason?) Second, you have to create microservices that encompass at least one full (business) functionality for your software. Meaning it doesn't need other services to do it.
If you need a table that needs data from multiple microservices, you have by definition made the wrong microservices. If a microservice can't provide its own UI without the help of other services, it doesn't fully contain its own functionality.
What's stopping you from having multiple services for reading / writing to the same database / table? For example:
One service to write to categories
One service to write to tags
One service to write to products
You could then write another service to read the data written by all three of these services. However, this might not happen at the HTTP level; instead, you could read from the same database within your read service and leverage the power of SQL.
The service that reads could encompass your join logic which would mean you wouldn't need to consume the other services around it.

Resources