Parallel processing of records from database table

I have a relational table that is populated by an application. It has a column named o_number that can be used to group the records.
I have another application that is essentially a Spring scheduler, deployed on multiple servers. Is there a way to make sure that each scheduler instance processes a unique group of records in parallel? If a set of records is being processed by one server, it should not be picked up by another. Also, in order to scale, we would want to increase the number of instances of the scheduler application.
Thanks
Anup

This is a general question, so here are my general two cents on the matter.
You create a new layer that manages the requests going from your application instances to the database. So you will probably be building a new project that runs on the same server as the database (or on some other server). The application instances talk to that managing layer instead of talking to the database directly.
The manager keeps track of which records have already been requested, so on each new request it fetches only records that are yet to be processed.
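As a rough illustration of that managing layer, here is a minimal in-memory sketch in Python. The class name GroupManager, the method names, and the sample o_number values are all made up for the example; a real manager would sit behind a network API and persist its state, and threads stand in for the scheduler instances here.

```python
import threading


class GroupManager:
    """Hypothetical manager: hands each o_number group to at most one worker."""

    def __init__(self, pending_groups):
        self._lock = threading.Lock()
        self._pending = list(pending_groups)  # groups not yet handed out
        self._in_progress = {}                # o_number -> worker id

    def claim_next(self, worker_id):
        """Return an unclaimed o_number, or None when nothing is left."""
        with self._lock:
            if not self._pending:
                return None
            o_number = self._pending.pop(0)
            self._in_progress[o_number] = worker_id
            return o_number

    def complete(self, o_number):
        """Mark a group as finished so it is no longer tracked as in progress."""
        with self._lock:
            self._in_progress.pop(o_number, None)


manager = GroupManager(["A100", "A200", "A300"])
processed = []  # (worker id, o_number) pairs, filled in by the workers


def scheduler_instance(worker_id):
    # Each scheduler instance keeps asking the manager for the next free group.
    while (o_number := manager.claim_next(worker_id)) is not None:
        processed.append((worker_id, o_number))  # real processing would go here
        manager.complete(o_number)


threads = [
    threading.Thread(target=scheduler_instance, args=(f"server-{i}",))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every claim goes through one lock, no group can be handed to two workers, and adding more scheduler instances only means more callers of claim_next.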

Related

Horizontal scaling a microservice that processes a lot of data

Let’s say I have a microservice that needs to generate millions of reports with even more rows of data.
Business rules:
One client generates 0 to many reports on a single run
Many clients can be generating reports in a single run
Any request to generate a report for a client that is currently processing should throw an error
The reports are generated on a schedule.
The schedule is stored in the database of the microservice (a) for each client. The schedule is managed by a separate microservice (b) and the data is replicated via integration events to microservice a.
Ex:
Client A, Schedule = today
Client B, Schedule = 3 days from now
Only client A will have a report generated.
Now, let’s say the microservice gets a request to generate all reports for clients configured to generate today. Since it has to generate millions of reports, we want it to horizontally scale.
However, I’m having a hard time identifying a great way to do this. Some ideas:
Only let one instance of the microservice a retrieve the clients that need to generate today. This can be polled in case that service fails and another can pick it up.
Insert this data into a shared cache, or into a topic or queue, that all other instances will process from. Scale based on the number of messages in the topic.
Let another microservice (b) make the request for generation and pass in each request into a topic or queue that microservice (a) reads. However this introduces a dependency between services and can cause some data ownership ambiguities

scaled microservices instances needs to update 1

I have a unique problem and am trying to see what the best implementation for it is.
I have a table with half a million rows. Each row represents a business entity. I need to fetch information about this entity from the internet and update it back on the table asynchronously (this process takes about 2 to 3 minutes).
I cannot get all these rows updated efficiently with one instance of the microservice, so I am planning to scale up to multiple instances.
Each microservice instance is an async daemon that fetches one business entity at a time, processes the data, and finally updates the data back to the table.
Here is my problem: with multiple instances, how do I ensure that no two microservice instances work on the same business entity (same row) during the update process? I want to implement an optimal solution, preferably without having to maintain any state in the application layer.

You have to use an external system (database or cache) to save information about each instance.
Example: ShedLock. It creates a table or document in the database where it stores information about the current locks.
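To make the lock-table idea concrete, here is a small Python sketch using SQLite. This is not ShedLock's API or schema; the table layout, the try_lock function, and the entity and instance names are assumptions chosen only to show the general technique of a lock row with an expiry time.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locks (name TEXT PRIMARY KEY, locked_by TEXT, lock_until REAL)"
)


def try_lock(conn, name, owner, ttl_seconds, now=None):
    """Hypothetical helper: return True if `owner` acquired the lock `name`."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        # Take over the lock if a previous holder's lease has expired.
        cur = conn.execute(
            "UPDATE locks SET locked_by = ?, lock_until = ? "
            "WHERE name = ? AND lock_until <= ?",
            (owner, now + ttl_seconds, name, now),
        )
        if cur.rowcount == 1:
            return True
        # Otherwise try to create the lock row; the primary key rejects duplicates.
        cur = conn.execute(
            "INSERT OR IGNORE INTO locks (name, locked_by, lock_until) "
            "VALUES (?, ?, ?)",
            (name, owner, now + ttl_seconds),
        )
        return cur.rowcount == 1


# Explicit timestamps keep the example deterministic.
first = try_lock(conn, "entity-42", "instance-1", 300, now=1000.0)
second = try_lock(conn, "entity-42", "instance-2", 300, now=1010.0)
after_expiry = try_lock(conn, "entity-42", "instance-2", 300, now=2000.0)
```

The expiry time matters because an instance can crash while holding a lock; with a lease, another instance can eventually take the row over instead of waiting forever.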

I would suggest using a worker queue, which looks like a perfect fit for your problem. Load the whole data, or just the IDs of the rows, into the queue once, then let the consumers consume them.
You can see a clear explanation here:
https://www.rabbitmq.com/tutorials/tutorial-two-python.html
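The worker-queue idea can be sketched in a single process with Python's standard library. Here queue.Queue stands in for a real broker such as RabbitMQ and threads stand in for microservice instances; the entity IDs and instance names are invented for the example.

```python
import queue
import threading

# Load the IDs of the rows to process into the queue once.
work = queue.Queue()
for entity_id in range(1, 11):
    work.put(entity_id)

results = []  # (instance name, entity id) pairs


def consumer(name):
    # Each consumer pulls one ID at a time; the queue delivers every ID
    # to exactly one consumer, so no row is processed twice.
    while True:
        try:
            entity_id = work.get_nowait()
        except queue.Empty:
            return
        results.append((name, entity_id))  # fetch-and-update would happen here
        work.task_done()


workers = [
    threading.Thread(target=consumer, args=(f"instance-{i}",)) for i in range(3)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

With a real broker the same shape applies: one producer enqueues the IDs, and scaling out means starting more consumers against the same queue.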

How can I divide one database to multi databases?

I want to decompose my application to adopt a microservices architecture, and I will need to come up with a solid strategy to split my database (MySQL) into multiple small databases (MySQL) aligned with my applications.

TL;DR: It depends on the scenario and on what each service will do.
There is no clear answer to this, since it really depends on your needs and on what each service should do, but you can come up with a general starting point (assuming you don't need to keep the existing database type).
Let's assume you have a monolithic e-commerce application and you want to split it into smaller services, each with its own database.
One approach is to create services that each handle one part of the website: for example, one service for user authentication, one for orders, one for products, one for invoices, and so on.
Now each service will have its own database, which raises another question: which database should a specific service have? One of the advantages of this kind of architecture is that each service can have its own kind of database. For example, the products service could use a non-relational database such as MongoDB, since all it does is return details about products, so there are no relations to manage.
The orders service, on the other hand, could use a relational database, since you want to keep a relation between an order and the invoice for that order. But wait, invoices are handled by the invoice service, so how can you keep the relation between the two without sharing the database? That is one of the "issues" of this approach: you have to keep services independent while still letting them communicate with each other. How can we do this? There is no clear answer here either. One approach is to pass all invoice details to the orders service as well. Another is to pass only the invoice ID when saving the order and later retrieve the invoice via an API call to the invoice service. A third is to pass just the details you need from the invoice to an API endpoint in the orders service, which stores them in a dedicated table in its own database (since most of the time you don't need the entire object). The possibilities are endless.
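As an illustration of the second option (store only the invoice ID with the order, then look the invoice up through the invoice service), here is a minimal Python sketch. The class and method names are hypothetical, and a plain method call stands in for what would be an HTTP request between services.

```python
class InvoiceService:
    """Owns invoice data; a stand-in for a remote service."""

    def __init__(self):
        self._invoices = {}

    def create_invoice(self, invoice_id, amount):
        self._invoices[invoice_id] = {"id": invoice_id, "amount": amount}

    def get_invoice(self, invoice_id):
        # In a real deployment this would be a network call to the invoice service.
        return self._invoices[invoice_id]


class OrderService:
    """Owns order data and keeps only a reference (the ID) to the invoice."""

    def __init__(self, invoice_client):
        self._orders = {}
        self._invoice_client = invoice_client

    def save_order(self, order_id, item, invoice_id):
        self._orders[order_id] = {
            "id": order_id,
            "item": item,
            "invoice_id": invoice_id,
        }

    def get_order_with_invoice(self, order_id):
        # Copy the stored order, then resolve the invoice on demand.
        order = dict(self._orders[order_id])
        order["invoice"] = self._invoice_client.get_invoice(order["invoice_id"])
        return order


invoices = InvoiceService()
orders = OrderService(invoices)
invoices.create_invoice("inv-1", 49.90)
orders.save_order("ord-1", "keyboard", "inv-1")
full_order = orders.get_order_with_invoice("ord-1")
```

The orders database never stores invoice details, so the invoice service stays free to change its internal schema; the cost is an extra call whenever the full order is needed.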

Microservices "JOINS"

Let's say we want to create the app with microservices.
We have some page where we display some items (products).
These products have multiple joins (categories, tags, users, and so on).
If the users and categories data live in other services, how can we manage and filter the results?
For example, in SQL you write 3 or 4 joins and get the result.
With microservices, I have to filter the categories, then filter the tags, and then the products. This could be 10 times slower than the SQL query.
Also, if I have a table "products_categories" that sets the categories for each product, which service is responsible for it? The Product service or the Category service?
Thank you

In a microservices architecture there are two ways to deal with this.
The API composition pattern: this is the simplest approach and should be used whenever possible. It works by making clients of the services that own the data responsible for invoking the services and combining the results.
The Command Query Responsibility Segregation (CQRS) pattern: this is more powerful than the API composition pattern, but it is also more complex. It maintains one or more view databases whose sole purpose is to support queries.
I would prefer CQRS: define a view database, which is a read-only replica built specifically to support that query. The replica is kept up to date by subscribing to the (create, update, insert) events published by the services that own the data.
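A rough sketch of the CQRS idea in Python: a read-only view that is built purely from events published by the owning services. The event names, field names, and the ProductCatalogView class are assumptions made up for this example, and a plain list of dicts stands in for a message broker.

```python
class ProductCatalogView:
    """Hypothetical query-side view, updated only through events."""

    def __init__(self):
        self._category_names = {}  # category_id -> name
        self._products = {}        # product_id -> {"name", "category_id"}

    def handle(self, event):
        kind = event["type"]
        if kind in ("CategoryCreated", "CategoryUpdated"):
            self._category_names[event["category_id"]] = event["name"]
        elif kind in ("ProductCreated", "ProductUpdated"):
            self._products[event["product_id"]] = {
                "name": event["name"],
                "category_id": event["category_id"],
            }

    def products_in_category(self, category_name):
        # The "join" happens locally in the view, with no calls to other services.
        return sorted(
            p["name"]
            for p in self._products.values()
            if self._category_names.get(p["category_id"]) == category_name
        )


view = ProductCatalogView()
events = [
    {"type": "CategoryCreated", "category_id": 1, "name": "Books"},
    {"type": "ProductCreated", "product_id": 10, "name": "SQL Primer", "category_id": 1},
    {"type": "ProductCreated", "product_id": 11, "name": "Desk Lamp", "category_id": 2},
    {"type": "CategoryCreated", "category_id": 2, "name": "Home"},
    {"type": "CategoryUpdated", "category_id": 1, "name": "Literature"},
]
for event in events:
    view.handle(event)

literature = view.products_in_category("Literature")
home = view.products_in_category("Home")
```

Category names are resolved at query time in this sketch, so the view still gives the right answer when a product event arrives before its category event.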

This is a very standard problem whenever a microservice is built. People tend to feel that microservices are the solution for everything, which is not true.
The solution to this problem is better design: finding a balance between performance and data redundancy. Higher performance (lower latency) means more duplication of data across the databases of different microservices. You should not aim for performance as good as SQL joins, but you should not duplicate too much data either. A balance is needed.
Most importantly, the requirements need to be divided into the right set of microservices.

I assume you created a "microservice" per database table. Those are not microservices, those are just HTTP-based CRUD interfaces to your database.
First, know why you need microservices. (Is there an actual reason?) Second, you have to create microservices that each encompass at least one full (business) functionality of your software, meaning a service doesn't need other services to do its job.
If you need a table that requires data from multiple microservices, you have by definition made the wrong microservices. If a microservice can't provide its own UI without the help of other services, it doesn't fully contain its own functionality.

What's stopping you from having multiple services for reading / writing to the same database / table? For example:
One service to write to categories
One service to write to tags
One service to write to products
You could then write another service to read the data behind all three of these services. However, this need not happen at the HTTP level; instead, your read service could read from the same database and leverage the power of SQL.
The service that reads could encompass your join logic which would mean you wouldn't need to consume the other services around it.
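Here is a small Python sketch of that layout, using an in-memory SQLite database. The three write functions stand in for the separate write services and products_with_categories stands in for the read service; the function names and sample rows are assumptions for illustration only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products_categories (product_id INTEGER, category_id INTEGER);
    """
)


# "Write services": each one only touches its own table.
def add_category(category_id, name):
    with db:
        db.execute("INSERT INTO categories VALUES (?, ?)", (category_id, name))


def add_product(product_id, name):
    with db:
        db.execute("INSERT INTO products VALUES (?, ?)", (product_id, name))


def assign_category(product_id, category_id):
    with db:
        db.execute(
            "INSERT INTO products_categories VALUES (?, ?)",
            (product_id, category_id),
        )


# "Read service": owns the join logic and reads the shared database directly.
def products_with_categories():
    return db.execute(
        "SELECT p.name, c.name FROM products p "
        "JOIN products_categories pc ON pc.product_id = p.id "
        "JOIN categories c ON c.id = pc.category_id "
        "ORDER BY p.name, c.name"
    ).fetchall()


add_category(1, "Books")
add_category(2, "Gifts")
add_product(10, "SQL Primer")
assign_category(10, 1)
assign_category(10, 2)
rows = products_with_categories()
```

The trade-off is that the read service is coupled to the table layout of the write services, so schema changes have to be coordinated.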

Distributed database design style for microservice-oriented architecture

I am trying to convert a monolithic application to a microservice-oriented architecture. On the back end I am using Spring and Spring Boot, on the front end Angular 2, and PostgreSQL as the database.
My confusion is that when I am designing my databases as distributed, according to functionalities it may contain 5 databases. That means I would be designing according to vertical partitioning, and I would then implement inter-microservice communication to achieve the entire functionality.
The other way I am considering is to horizontally partition the current structure. My domain is educational universities, so half of the universities would go in one DB and the remainder in another DB, with services deployed per region (two deployments for two sets of universities).
Currently I am decided to continue with the last mentioned approach. I am new to this kind of architecture task, and I am also a beginner in the world of microservices and distributed databases. Could someone confirm that my approach will solve my issue? Can I continue with my second approach - horizontal partitioning of databases according to domain object?

Can I continue with my second approach - Horizontal partitioning of databases according to domain object?
Temporarily yes, if based on that you are able to scale your current system to meet your needs.
Now let's think about why you want to move to microservices as a development style in the first place.
Small components - easier to manage
Independently deployable - continuous delivery
Multiple Languages
The code is organized around business capabilities
and .....
When moving to microservices, you should not have multiple services reading directly from each other's databases, because that makes them tightly coupled.
One service should be completely ignorant of how another service has designed its internal structure.
If you want to move towards microservices and take full advantage of them, you should partition vertically, as you say, and have the services talk to each other.
Also, while moving towards microservices you will run into lots of other problems. I tried compiling how one should start on microservices on this link.
How to separate services that read data from the same table:
Let's first create a dummy example: we have three services, Order, Shipping, and Customer, each a separate microservice.
The following are ways in which multiple services might require data from the same table:
One service needs to read data from another service for things like validation.
The Order and Shipping services might need some data from the Customer service to complete their operations.
E.g.: when placing an order, a client calls the Order service API with a customer ID, and the Order service might need to validate whether it is a valid customer.
One approach is database-level exposure (not recommended): use the same customer table, which binds the Order service to the Customer service implementation.
Another approach is to call the other service to get the data.
Variation 1: call the Customer service to check whether the customer exists and get some customer data such as the name, and save this in the Order service.
Variation 2: do not validate while placing the order; on the OrderPlaced event, check asynchronously against the Customer service, validate, and update the state of the order if required.
I recommend calling the other service to get the data, choosing the variation based on the consistency you want.
In some use cases you want a single transaction spanning data from multiple services.
E.g.: deleting a customer. You might want all orders of that customer to be deleted as well.
In this case you need to deal with eventual consistency: the first service raises an event and the second service reacts accordingly.
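The customer-deletion case could look roughly like the following Python sketch. The EventBus class, the CustomerDeleted event name, and the sample data are assumptions for illustration; the bus here delivers events synchronously in-process, whereas a real message broker would deliver them asynchronously, which is where the eventual consistency comes from.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process stand-in for a message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)


class CustomerService:
    def __init__(self, bus):
        self._bus = bus
        self.customers = {1: "Alice", 2: "Bob"}

    def delete_customer(self, customer_id):
        # Change own data first, then announce what happened.
        del self.customers[customer_id]
        self._bus.publish("CustomerDeleted", {"customer_id": customer_id})


class OrderService:
    def __init__(self, bus):
        self.orders = {100: 1, 101: 1, 102: 2}  # order_id -> customer_id
        bus.subscribe("CustomerDeleted", self.on_customer_deleted)

    def on_customer_deleted(self, event):
        # React to the event by removing that customer's orders.
        self.orders = {
            order_id: customer_id
            for order_id, customer_id in self.orders.items()
            if customer_id != event["customer_id"]
        }


bus = EventBus()
customer_service = CustomerService(bus)
order_service = OrderService(bus)
customer_service.delete_customer(1)
```

Neither service touches the other's data store: the Customer service only publishes the fact that a customer was deleted, and the Order service decides on its own how to react.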
If this answers your question, fine; otherwise, please specify in what kind of scenario multiple services need to call another service.
If still not solved, you could email me on puneetjindal.11#gmail.com, will answer you

Currently I am decided to continue with the last mentioned approach.
If you want horizontal scalability (scaling for an increasingly large number of client connections) for your database, you may be better off with a technology that was designed to work as a scalable, distributed system, such as CockroachDB or a NoSQL database. CockroachDB, for example, has built-in data sharding and replication and lets you grow by adding server nodes as required.
when I am designing my databases as distributed, according to functionalities it may contain 5 databases
This sounds like you had the right general idea - split by domain functionality. Here's a link to a previous answer regarding general DB design with micro services.

In the microservices world, each microservice owns a set of functionalities and the data manipulated by those functionalities. If a microservice needs data owned by another microservice, it cannot go directly to the database owned by the other microservice; instead, it calls an API exposed by that microservice.
Regarding the placement of data, there are various options. You can store the data owned by a microservice in a NoSQL database such as MongoDB, DynamoDB, or Cassandra (it really depends on the microservice's use case), or you can have a different table for each microservice in a single instance of a SQL database. But remember: if you choose a single SQL database instance with multiple tables, there must be no joins (basically no interaction) between tables owned by different microservices.
I would suggest you start small and then think about database scaling issues when the usage of the system grows.
