Can Data Replication Deliver/Push One of Two Sets of Data to Client Nodes? - oracle

I recently joined a retail system merge project. A retail chain company acquired a far smaller retail chain company in a different business. The company decided to modify its retail system so that it can also be used in the acquired retail stores. The retail system is built with the SAP retail application and Oracle data replication, together with a store inventory application. There is one set of DB tables under one schema that is read-only in the store application, and another set of DB tables under another schema for data generated by the store application. In other words, from a store's point of view, the first set of tables is for inbound data and the second set is for both outbound and inbound data.
The SDEs who built the store application suggest adding a new column, store type, to several of the inbound-data tables to differentiate the data of the two retail businesses. For example, they want to add a store-type column to the vendor table. To my understanding, data replication can be set up so that only the relevant data is sent to a client node. For example, a store belonging to one retail business would receive vendor inbound data for that business, but no vendor data for the other. If so, why is a new column needed? Those SDEs are not experts in data replication, and I knew nothing about data replication until three weeks ago, so I don't know whether I'm missing something here.
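For what it's worth, my current understanding is that the subsetting can happen on the replication side, e.g. via an Oracle materialized view defined with a WHERE clause on the business attribute, so a store node never receives the other business's rows. A minimal, language-agnostic sketch of that idea (table and column names are invented for illustration, not the real schema):

```python
# Hypothetical sketch: row-level filtering during replication, so each store
# node receives only the vendor rows for its own retail business.
# In Oracle this would be a materialized view / replication filter with a
# WHERE clause; plain Python lists stand in for the tables here.

def rows_for_node(vendor_rows, node_business):
    """Return only the vendor rows belonging to the node's business."""
    return [row for row in vendor_rows if row["business"] == node_business]

vendors = [
    {"vendor_id": 1, "business": "CHAIN_A", "name": "Acme Foods"},
    {"vendor_id": 2, "business": "CHAIN_B", "name": "Beta Goods"},
]

# A CHAIN_A store node gets only CHAIN_A vendors; the store-side table does
# not need an extra column if the replication filter does the subsetting.
chain_a_rows = rows_for_node(vendors, "CHAIN_A")
```

The open question for the SDEs would then be whether the filter attribute already exists somewhere upstream, or whether their proposed store-type column is meant to be exactly that filter attribute.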

Related

How to handle data migrations in distributed microservice databases

So I'm learning about microservices and common patterns, and I can't seem to find how to address this one issue.
Let's say that my customer needs a module managing customers, and a module managing purchase orders.
I believe that when dealing with microservices it's pretty natural to split these two functionalities into separate services, each having its own data.
CustomerService
PurchaseOrderService
Also, he wants to have a table of purchase orders displaying data from both customers and purchase orders, i.e.: Customer name, Order number.
Now, I don't want to use the API Composition pattern because the user must be able to sort by any column he wants, which (AFAIK) is impossible to do with that pattern without slaughtering performance.
Instead, I chose the CQRS pattern:
After every purchase order / customer update, a message is sent to the message broker.
The message broker notifies the third service about that message.
The third service updates its projection in its own database.
So, our third service:
PurchaseOrderTableService
It stores all the required data in a single database - now we can query it and sort by any column we like while still maintaining good performance.
And now, the tricky part:
In the future, the client can change his mind and say, "Hey, I need the purchase orders table to display an additional column - 'Customer country'".
How does one handle that data migration? So far, PurchaseOrderTableService knows only about two columns - 'Customer name' and 'Order number'.
I imagine this is probably a pretty common problem, so what can I do to avoid reinventing the wheel?
I can of course make CustomerService generate a 'CustomerUpdatedMessage' for every existing customer, which would force PurchaseOrderTableService to update all its projections, but that seems like a workaround.
If that matters, the stack I have in mind is Java, Spring, Kafka, and PostgreSQL.
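To make the intended flow concrete, here is a minimal sketch of the projection side, with plain Python standing in for the Spring/Kafka consumer and the Postgres table; event shapes and handler names are invented for illustration:

```python
# Minimal CQRS projection sketch: the read-model service consumes update
# events and maintains a denormalized row per purchase order.
# (Dicts stand in for Kafka messages and the projection table.)

projection = {}  # order_id -> denormalized row
customers = {}   # customer_id -> name (local copy kept by the projector)

def handle_customer_updated(event):
    """Remember the customer's name and propagate it to existing rows."""
    customers[event["customer_id"]] = event["name"]
    for row in projection.values():
        if row["customer_id"] == event["customer_id"]:
            row["customer_name"] = event["name"]

def handle_order_created(event):
    """Insert a projection row, denormalizing the customer name into it."""
    projection[event["order_id"]] = {
        "customer_id": event["customer_id"],
        "customer_name": customers.get(event["customer_id"], ""),
        "order_number": event["order_number"],
    }

handle_customer_updated({"customer_id": 7, "name": "Alice"})
handle_order_created({"order_id": 1, "customer_id": 7, "order_number": "PO-001"})
```

Adding a 'Customer country' column would mean extending both handlers plus backfilling the rows that already exist, which is the split the answer below the question makes.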
Divide the problem in two:
Keeping live data in sync: your projection service from now on also needs to persist Customer Country, so all new orders will have the country as expected.
Backfill the older orders: this is a one-off operation, so how you implement it really depends on your organization, technologies, etc. For example, you or a DBA can use whatever database tools you have to extract the data from the source database and do a bulk update on the target database. In other cases, you might have to solve it programmatically, for example by creating a process in the projection microservice that queries the Customers microservice API to get the data and update the local copy.
Also note that in most cases you will already have a process to backfill data, because the need for the projection microservice might arise months or years after the orders and customers services were created. Other times, the search service is a 3rd-party search engine, like Elasticsearch, instead of a database. In those cases, I would always keep on hand a process to fully reindex the data.
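The programmatic variant of the backfill could look roughly like this; `fetch_customer` is a hypothetical stand-in for an HTTP call to the Customers service API:

```python
# One-off backfill sketch: for every existing projection row that lacks the
# new attribute, fetch it from the Customers service and update the local copy.

def fetch_customer(customer_id):
    # placeholder for e.g. GET /customers/{id} against the Customers service
    return {"id": customer_id, "country": "DE"}

def backfill_country(projection_rows):
    """Fill in customer_country on rows that don't have it yet."""
    for row in projection_rows:
        if "customer_country" not in row:
            row["customer_country"] = fetch_customer(row["customer_id"])["country"]
    return projection_rows

rows = [{"customer_id": 7, "customer_name": "Alice", "order_number": "PO-001"}]
backfill_country(rows)
```

Because the check is "only fill missing values", the process is safe to re-run if it is interrupted partway through.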

Advice on Setup

I started my first data analysis job a few months ago, and I am in charge of a SQL database and of taking that data and creating dashboards in Power BI. Our SQL database is replicated from an online web portal we use for data entry. We do not add data to the database ourselves; instead, the data is put into tables based on what is entered into the web portal. Since this database is replicated by another company, I created our own database connected via a linked server. I have built many views to pull only the needed data from the initial database (I did this to limit the amount of data sent to Power BI, for performance). My view count is climbing, and I'm wondering whether, in terms of performance, this is the best way forward. The highest row count of a view is 32,000 and the lowest is around 1,000 rows.
Some of the views I am writing end up joining 5-6 tables together due to the structure built by the data web portal company that controls the database.
My suggestion would be to create a data warehouse schema (star schema), keeping as a principle one star schema per domain. For example, one for sales, one for subscriptions, one for purchases, etc. Use the logic of data marts.
Identify your dimensions and your facts and keep evolving that schema. You will find that you end up with far fewer tables.
Your data is not that big, so you can use whatever ETL strategy you like: truncate-and-load or incremental.
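For illustration, here is a sketch of the two load strategies just mentioned; in-memory lists stand in for the warehouse table, and the column names are invented:

```python
# Sketch of the two ETL load strategies: full refresh vs. incremental append.

warehouse = []

def truncate_and_load(source_rows):
    """Full refresh: wipe the target and reload everything from the source."""
    warehouse.clear()
    warehouse.extend(source_rows)

def incremental_load(source_rows, last_loaded_id):
    """Append only rows newer than the last load's high-water mark."""
    new_rows = [r for r in source_rows if r["id"] > last_loaded_id]
    warehouse.extend(new_rows)
    # return the new high-water mark for the next run
    return max((r["id"] for r in new_rows), default=last_loaded_id)

source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
truncate_and_load(source)
hwm = incremental_load(source + [{"id": 3, "amount": 30}], last_loaded_id=2)
```

At the row counts described (tens of thousands), truncate-and-load is usually simple and fast enough; incremental loading mostly pays off once full refreshes get slow.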

Can I keep a copy of a table of one database in another database in a microservice architecture?

I am new to microservice architecture, so thanks in advance.
I have two different services, a User service and a Footballer service, each with its own database (User database and Footballer database).
The Footballer service has a database with a single table storing footballer information.
The User service has a database that stores user details along with other user-related data.
Now a user can add footballers to their team by querying the Footballer service, and I need to store them somewhere so they can be displayed later.
Currently I'm storing the footballers for each user in a table in the User database: I make a call to the Footballer service to get the details of a specific footballer by ID and save them in the User database, mapped against the user ID.
So, is this a good idea, and does it mean I'm replicating data between two services?
And if it is, what other ways can I achieve the same functionality?
"Caching" is a fairly common pattern. From the perspective of the User microservice, the data from Footballer is just another input which you might save or not. If you are caching, you'll usually want to have some sort of timestamp/version on the cached data.
Caching identifiers is pretty normal - we often need some kind of correlation identifier to connect data in two different places.
If you find yourself using Footballer data in your User domain logic (that is to say, the way that User changes depends on the Footballer data available)... that's more suspicious, and may indicate that your boundaries are incorrectly drawn / some of your capabilities are in the wrong place.
If you are expecting the User Service to be autonomous - that is to say, to be able to continue serving its purpose even when Footballer is out of service, then your code needs to be able to work from cached copies of the data from Footballer and/or be able to suspend some parts of its work until fresh copies of that data are available.
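The timestamped-cache idea from this answer can be sketched as follows; `fetch_footballer` is a hypothetical stand-in for the call to the Footballer service, and the TTL value is arbitrary:

```python
# Sketch: the User service keeps a local copy of Footballer data with a
# fetched-at timestamp, and refreshes an entry only when it has gone stale.
import time

CACHE_TTL_SECONDS = 3600  # illustrative staleness threshold
cache = {}  # footballer_id -> {"data": ..., "fetched_at": ...}

def fetch_footballer(footballer_id):
    # stand-in for a call to the Footballer service
    return {"id": footballer_id, "name": "Player " + str(footballer_id)}

def get_footballer(footballer_id, now=None):
    """Return cached data if fresh; otherwise (re-)fetch and cache it."""
    now = time.time() if now is None else now
    entry = cache.get(footballer_id)
    if entry is None or now - entry["fetched_at"] > CACHE_TTL_SECONDS:
        entry = {"data": fetch_footballer(footballer_id), "fetched_at": now}
        cache[footballer_id] = entry
    return entry["data"]
```

This is also what makes the User service autonomous in the sense described above: while the Footballer service is down, stale cache entries can still be served rather than failing the request.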
People usually follow DDD (domain-driven design) in the case of microservices.
So here in your case there are two domains, i.e. two services:
Users
Footballers
So, the User service should only do user-specific tasks; it should not be concerned with footballer data.
Hence, according to DDD, the footballers linked to a user should be stored in the Footballer service.
Replicating just the ID wouldn't be considered data replication in a microservices architecture.

How can I divide one database to multi databases?

I want to decompose my application to adopt a microservices architecture, and I will need to come up with a solid strategy to split my database (MySQL) into multiple small databases (MySQL) aligned with my applications.
TL;DR: It depends on the scenario and on what each service will do.
Although there is no clear answer to this, since it really depends on your needs and on what each service should do, you can come up with a general starting point (assuming you don't need to keep the existing database type).
Let's assume you have a monolithic application for an e-commerce site, and you want to split this application into smaller services, each one with its own database.
The approach you could use is to create some services that handle parts of the website: for example, you could have one service that handles user authentication, one for the orders, one for the products, one for the invoices, and so on.
Now, each service will have its own database, and here comes another question: which database should a specific service have? One of the advantages of this kind of architecture is that each service can have its own kind of database, so for example the products service can have a non-relational database, such as MongoDB, since all it does is get details about products, so you don't have to manage any relations.
The orders service, on the other hand, could have a relational database, since you want to keep a relation between the order and the invoice for that order. But wait - invoices are handled by the invoice service, so how can you keep the relation between these two without sharing the database? Well, that's one of the "issues" of this approach: you have to keep services independent while also letting them communicate with each other. How can we do this? There is no clear answer here either... One approach could be to pass all invoice details to the orders service as well; or you can pass just the invoice ID when saving the order and later retrieve the invoice via an API call to the invoice service; or you can pass the relevant invoice details to an API endpoint in the order service that stores them in a specific table in its database (since most of the time you don't need the entire actual object); etc. The possibilities are endless...
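The ID-plus-lookup option can be sketched in a few lines; the in-memory `invoices_service` dict is a hypothetical stand-in for the Invoices service and its API:

```python
# Sketch: the Orders service stores only the invoice ID, and resolves the
# full invoice on demand via the (here faked) Invoices service API.

invoices_service = {42: {"id": 42, "total": 99.0}}  # stands in for the other service

orders = {}  # the Orders service's own table

def save_order(order_id, invoice_id):
    """Persist the order with a reference to its invoice, not a copy of it."""
    orders[order_id] = {"id": order_id, "invoice_id": invoice_id}

def get_order_with_invoice(order_id):
    """Join the order with its invoice at read time via an API call."""
    order = dict(orders[order_id])
    order["invoice"] = invoices_service.get(order["invoice_id"])  # API call in reality
    return order

save_order(1, 42)
```

The trade-off versus copying the invoice details into the orders database is the usual one: less duplicated data, but reads now depend on the invoice service being reachable.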

Parallel processing of records from database table

I have a relational table that is being populated by an application. There is a column named o_number that can be used to group the records.
I have another application that basically runs a Spring Scheduler. This application is deployed on multiple servers. I want to understand whether there is a way to make sure that each scheduler instance processes a unique group of records in parallel. If a set of records is being processed by one server, it should not be picked up by another one. Also, in order to scale, we would want to increase the number of instances of the scheduler application.
Thanks, Anup
This is a general question, so here are my general two cents on the matter.
You create a new layer managing the requests originating from your application instances to the database. You would probably build this as a new code project running on the same server as the database (or some other server). The application instances will talk to that managing layer instead of to the database directly.
The manager keeps track of which records have been requested, and hence fetches only records that are yet to be processed upon each new request.
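The claim logic of such a managing layer can be sketched like this; the `GroupManager` class and group values are invented, and a `threading.Lock` stands in for whatever the real layer uses to serialize claims:

```python
# Sketch of the managing layer: each scheduler instance asks the manager for
# the next unclaimed o_number group, so no two instances process the same one.
import threading

class GroupManager:
    def __init__(self, groups):
        self._lock = threading.Lock()
        self._pending = list(groups)   # o_number groups not yet handed out
        self._claimed = set()          # groups currently being processed

    def claim_next_group(self):
        """Hand out one unprocessed group, or None if all are claimed."""
        with self._lock:  # serialize claims across concurrent callers
            if not self._pending:
                return None
            group = self._pending.pop(0)
            self._claimed.add(group)
            return group

manager = GroupManager(["ON-1", "ON-2", "ON-3"])
a = manager.claim_next_group()  # one scheduler instance
b = manager.claim_next_group()  # another instance gets a different group
```

As an alternative to a dedicated layer, databases such as PostgreSQL and Oracle also support `SELECT ... FOR UPDATE SKIP LOCKED`, which lets each instance claim unlocked rows directly in the table with the same "no two workers pick the same records" semantics.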
