Scaled microservice instances need to update one table - spring-boot

I have a unique problem and am trying to figure out the best implementation for it.
I have a table with half a million rows. Each row represents a business entity. I need to fetch information about each entity from the internet and update it back into the table asynchronously (this process takes about 2 to 3 minutes per entity).
I cannot get all these rows updated efficiently with one instance of the microservice, so I am planning to scale up to multiple instances.
Each microservice instance is an async daemon that fetches one business entity at a time, processes the data, and finally updates the data back into the table.
Here is my problem: with multiple instances, how do I ensure that no two microservice instances work on the same business entity (the same row) during the update process? I want to implement an optimal solution, preferably without having to maintain any state in the application layer.

You have to use an external system (database/cache) to share lock information between instances.
Example: ShedLock. It creates a table or document in the database where it stores information about the current locks.
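The essence of the lock-table approach can be sketched in plain Java. Here a `ConcurrentHashMap` stands in for the database table that ShedLock would maintain, and the class and method names are illustrative only; the point is that claiming an entity must be a single atomic operation, which in a real database would be a conditional INSERT/UPDATE:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the lock-table idea: an instance may process a business entity
// only if it can atomically record a lock for it. A ConcurrentHashMap stands
// in for the shared database table here.
public class EntityLockSketch {
    private final ConcurrentHashMap<Long, String> locks = new ConcurrentHashMap<>();

    /** Returns true only if this instance acquired the lock for the entity. */
    public boolean tryLock(long entityId, String instanceId) {
        return locks.putIfAbsent(entityId, instanceId) == null;
    }

    /** Releases the lock, but only if this instance owns it. */
    public void unlock(long entityId, String instanceId) {
        locks.remove(entityId, instanceId);
    }
}
```

Because `putIfAbsent` is atomic, two instances racing for the same entity id cannot both succeed, which is exactly the guarantee the lock table gives across JVMs.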

I would suggest you use a worker queue, which looks like a perfect fit for your problem. Just load the whole data set (or the ids of the data) into the queue once, then let the consumers consume them.
You can see a clear explanation here:
https://www.rabbitmq.com/tutorials/tutorial-two-python.html
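The work-queue idea can be sketched without a broker. Here a `BlockingQueue` stands in for RabbitMQ and all names are illustrative; the point is that the queue hands every id to exactly one consumer, so no row is processed twice:

```java
import java.util.*;
import java.util.concurrent.*;

// Work-queue sketch: entity ids are loaded into the queue once, and competing
// consumers each claim distinct ids because the queue hands an item out only once.
public class WorkQueueSketch {
    public static Map<String, List<Long>> distribute(List<Long> ids, int consumers)
            throws InterruptedException {
        BlockingQueue<Long> queue = new LinkedBlockingQueue<>(ids);
        Map<String, List<Long>> claimed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            String name = "consumer-" + i;
            pool.submit(() -> {
                Long id;
                while ((id = queue.poll()) != null) { // each poll removes the id for good
                    claimed.computeIfAbsent(name, k -> new ArrayList<>()).add(id);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return claimed;
    }
}
```

With a real broker the consumers would additionally acknowledge each message after the update, so an id claimed by a crashed instance gets redelivered instead of being lost.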

Related

Cache and update regularly complex data

Let's start with the background. I have an API endpoint that I have to query every 15 minutes and that returns complex data. Unfortunately, this endpoint does not provide information about what exactly changed, so it requires me to compare the data that I have in the DB against everything returned and then execute updates, inserts, or deletes. This is pretty tedious...
I came to the idea that I could simply remove all data from certain tables and rebuild everything from scratch... But I also have to return this cached data to my clients, so there might be a situation where the DB is empty during a client request because it is being "refreshed/rebuilt". And that can't happen, because I have to return something.
So I came to the idea to either:
Lock the certain DB tables so that the client has to wait for the refresh,
or
CQRS: https://martinfowler.com/bliki/CQRS.html
Do you have any suggestions for how to solve this problem?
It sounds like you're using a relational database, so I'll try to outline a solution using database terms. The idea, however, is more general than that. In general, it's similar to Blue-Green deployment.
Have two data tables (or two databases, for that matter); one is active, and one is inactive.
When the software starts the update process, it can wipe the inactive table and write new data into it. During this process, the system keeps serving data from the active table.
Once the data update is entirely done, the system can begin to serve data from the previously inactive table. In other words, the inactive table becomes the active table, and vice versa.
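The swap itself can be sketched in a few lines; the table names `data_a` and `data_b` are hypothetical, and in a relational setup the same effect is often achieved by renaming tables or repointing a view:

```java
import java.util.concurrent.atomic.AtomicReference;

// Blue-green table sketch: reads always go to the active table while the
// rebuild writes into the inactive one; an atomic swap flips the roles once
// the rebuild is complete, so clients never see an empty table.
public class TableSwap {
    private final AtomicReference<String> active = new AtomicReference<>("data_a");

    public String activeTable()   { return active.get(); }

    public String inactiveTable() {
        return active.get().equals("data_a") ? "data_b" : "data_a";
    }

    /** Call only after the inactive table has been fully rebuilt. */
    public void swap() { active.set(inactiveTable()); }
}
```

Requests in flight during the swap still read consistent data, because each request resolves the active table name once and the old table is not wiped until the next refresh cycle.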

Implementing static shared counter in microservice architecture

I have a use case where I want to record data in rows and display it to the user.
Multiple users can add these records, and they have to be displayed in order of insertion AND - MOST IMPORTANTLY - with a sequence number starting from 1.
I have a Spring Boot microservice architecture on the backend, which obviously means I cannot hold state in my Boot application, as I am going to have multiple running instances.
Another approach was to fetch all existing records from the DB, count them, increment the count by 1, and use that as my sequence number. I would need to do this on every insert.
But the problem with this second approach is parallel requests, which could result in the same sequence number being given to two records.
A third approach is to maintain the counter in the DB, but since I am using Cosmos DB, apparently that is not an option either.
Any suggestions as to how I can implement a static, shared counter?

How to lock on select and release the lock after the update is committed using Spring?

I started using Spring a few months ago and I have a question about transactions. I have a Java method inside my Spring Batch job which first does a select to get the first 100 rows with status 'NOT COMPLETED', and then updates the selected rows to change the status to 'IN PROGRESS'. Since I'm processing around 10 million records, I want to run multiple instances of my batch job, and each instance has multiple threads. For a single instance, to make sure two threads do not fetch the same set of records, I have made my method synchronized. But if I run multiple instances of my batch job (multiple JVMs), there is a high probability that the same set of records will be fetched by both instances, even if I use optimistic or pessimistic locking or "select for update", since we cannot lock records during selection.
Below is an example. Transaction 1 has fetched 100 records, and meanwhile Transaction 2 also fetched 100 records. If I enable locking, Transaction 2 waits until Transaction 1 has updated and committed, but Transaction 2 then performs the same update again.
Is there any way in spring to make transaction 2's select operation to wait until transaction 1's select is completed ?
Transaction1              Transaction2
fetch 100 records
                          fetch 100 records
update 100 records
commit
                          update 100 records
                          commit
@Transactional
public synchronized List<Student> processStudentRecords() {
    List<Student> students = getNotCompletedRecords();
    if (students != null && !students.isEmpty()) {
        updateStatusToInProgress(students);
    }
    return students;
}
Note: I cannot perform the update first and then the select. I would appreciate it if an alternative approach could be suggested.
Transaction synchronization should be left to the database server and not managed at the application level. From the database server point of view, no matter how many JVMs (threads) you have, those are concurrent database clients asking for read/write operations. You should not bother yourself with such concerns.
What you should do though is try to minimize contention as much as possible in the design of your solution, for example, by using the (remote) partitioning technique.
if I run multiple instances of my batch job (multiple JVMs), there is a high probability that the same set of records will be fetched by both instances, even if I use optimistic or pessimistic locking or "select for update", since we cannot lock records during selection
Partitioning data will by design remove all these problems. If you give each instance its own set of data to work on, there is no chance that a worker would select the same records as another worker. Michael gave a detailed example in this answer: https://stackoverflow.com/a/54889092/5019386.
(Logical) partitioning, however, will not solve the contention problem, since all workers would still read/write from/to the same table, but that is the nature of the problem you are trying to solve. What I am saying is that you do not need to start locking/unlocking the table in your design; leave this to the database. Some database servers like Oracle can write data of the same table to different partitions on disk to optimize concurrent access (which might help if you use partitioning), but again, that is Oracle's business, not Spring's (or any other framework's).
Not everybody can afford Oracle, so I would look for a solution at the conceptual level. I have successfully applied the following solution ("pseudo" physical partitioning) to a problem similar to yours:
Step 1 (in serial): copy/partition unprocessed data into temporary tables.
Step 2 (in parallel): run multiple workers on these tables instead of on the source table with millions of rows.
Step 3 (in serial): copy/update processed data back into the original table.
Step 2 removes the contention problem. Usually, the cost of (Step 1 + Step 3) is negligible compared to Step 2 (and even more negligible than Step 2 done in serial). This works well if the processing is the bottleneck.
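Step 1 can be sketched as a simple id-range split; the class name is illustrative, and in a Spring Batch setup a custom `Partitioner` would produce similar disjoint ranges for the workers:

```java
import java.util.*;

// Sketch of step 1: split an unprocessed id range into disjoint partitions,
// one per worker, so no two workers can ever select the same records.
public class IdPartitioner {
    /** Returns inclusive [lo, hi] ranges covering minId..maxId without overlap. */
    public static List<long[]> partition(long minId, long maxId, int workers) {
        List<long[]> ranges = new ArrayList<>();
        long total = maxId - minId + 1;
        long base = total / workers, remainder = total % workers;
        long start = minId;
        for (int i = 0; i < workers; i++) {
            long size = base + (i < remainder ? 1 : 0); // spread the remainder
            if (size == 0) continue;
            ranges.add(new long[]{start, start + size - 1});
            start += size;
        }
        return ranges;
    }
}
```

Each worker then only ever issues queries bounded by its own range, so no locking between workers is needed at all.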
Hope this helps.

Parallel processing of records from database table

I have a relational table that is being populated by an application. There is a column named o_number which can be used to group the records.
I have another application that basically consists of a Spring Scheduler. This application is deployed on multiple servers. I want to understand whether there is a way to make sure that each scheduler instance processes a unique group of records in parallel. If a set of records is being processed by one server, it should not be picked up by another one. Also, in order to scale, we want to be able to increase the number of instances of the scheduler application.
Thanks
Anup
This is a general question, so here are my general 2 cents on the matter.
You create a new layer that manages the requests originating from your application instances to the database. You would probably build this as a new code/project running on the same server as the database (or some other server). The application instances then talk to that managing layer instead of to the database directly.
The manager keeps track of which records have been requested, and hence fetches only records that are yet to be processed upon each new request.
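The manager's bookkeeping can be sketched like this; `GroupDispatcher` is a hypothetical name, and in practice the pending/in-progress state would live in the database rather than in memory so it survives a manager restart:

```java
import java.util.*;

// Sketch of the managing layer: it hands each scheduler instance the next
// unclaimed o_number group, so no group is processed by two servers at once.
public class GroupDispatcher {
    private final Deque<String> pending;
    private final Set<String> inProgress = new HashSet<>();

    public GroupDispatcher(Collection<String> oNumbers) {
        this.pending = new ArrayDeque<>(oNumbers);
    }

    /** Hands out the next unclaimed group, or empty when none remain. */
    public synchronized Optional<String> claimNext() {
        String group = pending.poll();
        if (group != null) inProgress.add(group);
        return Optional.ofNullable(group);
    }

    /** Called by an instance when it has finished a group. */
    public synchronized void complete(String group) {
        inProgress.remove(group);
    }
}
```

Because `claimNext` is the single synchronized entry point, each o_number group is given out exactly once, no matter how many scheduler instances ask.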

Distributed database design style for microservice-oriented architecture

I am trying to convert a monolithic application into a microservice-oriented architecture. On the back end I am using the Spring and Spring Boot frameworks; on the front end I am using Angular 2; and I am using PostgreSQL as the database.
My confusion is this: when I design my databases as distributed, according to functionality, I may end up with 5 databases. That is, I am designing according to vertical partitioning, and then planning to implement inter-microservice communication to achieve the full functionality.
The other option I am considering is to horizontally partition the current structure. My domain is based on educational universities, so half of the universities would go under one DB and the rest under another, with services deployed across two regions (one for each set of universities).
Currently I have decided to continue with the last-mentioned approach. I am new to these kinds of architecture tasks, and a beginner in the microservice and distributed-database world. Would someone confirm that my approach will solve my issue? Can I continue with my second approach - horizontal partitioning of databases according to domain object?
Can I continue with my second approach - Horizontal partitioning of
databases according to domain object?
Temporarily yes, if based on that you are able to scale your current system to meet your needs.
Now let's think about why you want to move to microservices as a development style in the first place:
Small components - easier to manage
Independently deployable - continuous delivery
Multiple languages
Code organized around business capabilities
and .....
When moving to microservices, you should not have multiple services reading directly from each other's databases; that makes them tightly coupled.
One service should be completely ignorant of how another service designed its internal structure.
Now, if you want to move towards microservices and take full advantage of them, you should use vertical partitioning, as you say, and have the services talk to each other.
Also, while moving towards microservices you will run into lots and lots of other problems. I tried to compile how one should get started with microservices at this link.
How to separate services which read data from the same table:
Let's first create a dummy example: we have three services - Order, Shipping, Customer - all three different microservices.
The following are ways in which multiple services may require data from the same table:
One service needs to read data from another service for things like validation.
The Order and Shipping services might need some data from the Customer service to complete their operations.
E.g.: while placing an order, one will call the Order Service API with a customer id; the Order Service might then need to validate whether it is a valid customer or not.
One approach: database-level exposure - not recommended - use the same customer table, which binds the Order service to the Customer service's implementation.
Another approach: call the other service to get the data.
Variation 1: call the Customer service to check whether the customer exists, get some customer data (like the name), and save it in the Order service.
Variation 2: do not validate while placing the order; on the OrderPlaced event, check asynchronously with the Customer service, validate, and update the state of the order if required.
I recommend calling the other service to get the data, choosing the variation based on the consistency you want.
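Variation 1 can be sketched as follows; `CustomerClient` is a hypothetical interface that in a Spring app could be backed by `RestTemplate`, `WebClient`, or a Feign client, and the names are illustrative only:

```java
import java.util.Optional;

// The order service validates the customer through the customer service's
// API rather than reading its table directly.
interface CustomerClient {
    Optional<String> findName(long customerId); // empty if the customer does not exist
}

class OrderService {
    private final CustomerClient customers;

    OrderService(CustomerClient customers) { this.customers = customers; }

    String placeOrder(long customerId) {
        // validate via the other service's API instead of its database
        String name = customers.findName(customerId)
                .orElseThrow(() -> new IllegalArgumentException("unknown customer " + customerId));
        return "order for " + name; // a snapshot of the name is stored with the order
    }
}
```

Because the order service depends only on the interface, the customer service can change its internal schema freely, which is exactly the decoupling the database-level approach loses.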
In some use cases you want a single transaction spanning data from multiple services.
For example: deleting a customer - you might want all orders of that customer to be deleted as well.
In this case you need to deal with eventual consistency: service one raises an event, and then service two reacts accordingly.
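The eventual-consistency flow can be sketched like this; the event and handler names are illustrative, and in a real system the event would arrive through a message-broker listener rather than a direct method call:

```java
import java.util.*;

// Event raised by the customer service when a customer is removed.
record CustomerDeleted(long customerId) {}

// The order service reacts to the event by removing that customer's orders.
class OrderEventHandler {
    private final Map<Long, List<String>> ordersByCustomer;

    OrderEventHandler(Map<Long, List<String>> ordersByCustomer) {
        this.ordersByCustomer = ordersByCustomer;
    }

    /** React to the event: delete all orders of the deleted customer. */
    void on(CustomerDeleted event) {
        ordersByCustomer.remove(event.customerId());
    }
}
```

Between the delete in the customer service and the handler running, the orders briefly still exist; that window is the eventual-consistency trade-off the answer describes.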
Now if this answers your question, then OK; otherwise, specify in what kind of scenario multiple services need to call another service.
If it is still not solved, you could email me at puneetjindal.11#gmail.com and I will answer.
Currently I am decided to continue with the last mentioned approach.
If you want horizontal scalability (scaling for an increasingly large number of client connections) for your database, you may be better off with a technology that was designed to work as a scalable, distributed system, such as CockroachDB or a NoSQL database. CockroachDB, for example, has built-in data sharding and replication and allows you to grow by adding server nodes as required.
when I am designing my databases as distributed, according to functionalities it may contain 5 databases
This sounds like you had the right general idea - split by domain functionality. Here's a link to a previous answer regarding general DB design with microservices.
In the microservices world, each microservice owns a set of functionalities and the data manipulated by those functionalities. If a microservice needs data owned by another microservice, it cannot go directly to the database maintained/owned by the other microservice; rather, it calls an API exposed by the other microservice.
Now, regarding the placement of data, there are various options: you can store data owned by a microservice in a NoSQL database like MongoDB, DynamoDB, or Cassandra (it really depends on the microservice's use case), OR you can have a different table for each microservice in a single instance of a SQL database. BUT remember, if you choose a single instance of a SQL database with multiple tables, then there must be no joins (basically no interaction) between tables owned by different microservices.
I would suggest you start small and think about database scaling issues when the usage of the system grows.
