How do I access data that my microservice does not own?


I have a microservice that needs some data it does not own. It needs a read-only cache of data that is owned by another service. I am looking for guidance on how to implement this.
I don't want my microservice to call another microservice. I have too much data that is used in a join for this to be successful. In addition, I don't want my service to be dependent on another service (which may be dependent on another ...).
Currently, I am publishing an event to a queue. Then my service subscribes and maintains a copy of the data. I am having problems staying in sync with the source system. Plus, our DBAs are complaining about data duplication. I don't see a lot of information on this topic.
Is there a pattern for this? What is it called?

First of all, there are a couple of ways to share data, and you mention two of them.
One service calls another service to get the data when it is required. This is good because you get up-to-date data and the consuming service needs no extra management. The problem is that if you call the other service too often, its performance may suffer.
Another solution is to maintain a local copy of that data in the consuming service using a pub/sub mechanism.
Depending on your requirements and architecture, you can keep this copy in the consuming service's actual database or in some type of persisted cache.
The downside here is consistency. In a distributed architecture you will not get strong consistency; you have to rely on eventual consistency.
Another solution, depending on your requirements, is to move the tables that need to be joined into a separate service. It depends on your use case.
If you still want consistency, then instead of having the first service update the data and then publish an event, create a mediator component that calls both services synchronously. Here things get complicated, because you are now trying to implement a transaction across a distributed system.
One more point: when a product is built around a microservice architecture, it is not only a technical move. As an organization and as a team, you need to understand that something that works in a monolith does not work the same way in microservices. The DBAs need to understand that too: in microservices, duplication of data across schemas (and of other things, such as code) is preferred over reuse.
Last but not least, if one service always has to call another service to get data, it is worth checking the service boundaries as well. Sometimes services need to be merged because the business functionality belongs together.
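
To make the local-copy option more concrete, here is a minimal sketch of one way the consuming service could stay in sync even when events arrive late or twice. It assumes the owning service stamps every event with a version number that increases on each change; the names CustomerChanged and CustomerReplica and the version field are hypothetical and do not come from the question.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerChanged:
    """Hypothetical event published by the service that owns the data."""
    customer_id: int
    version: int          # assumed to increase on every change at the owner
    name: str
    deleted: bool = False


@dataclass
class CustomerReplica:
    """Read-only local copy kept by the consuming service."""
    rows: dict = field(default_factory=dict)   # customer_id -> (version, name)

    def apply(self, event: CustomerChanged) -> bool:
        """Apply an event; return False if it was stale or a duplicate."""
        current = self.rows.get(event.customer_id)
        if current is not None and current[0] >= event.version:
            return False  # we already hold this version or a newer one
        if event.deleted:
            # Keep a tombstone so a late, older event cannot resurrect the row.
            self.rows[event.customer_id] = (event.version, None)
        else:
            self.rows[event.customer_id] = (event.version, event.name)
        return True

    def name_of(self, customer_id):
        row = self.rows.get(customer_id)
        return None if row is None else row[1]


replica = CustomerReplica()
# Events may arrive twice or out of order; the version check absorbs both.
events = [
    CustomerChanged(1, 1, "Ada"),
    CustomerChanged(1, 3, "Ada L."),
    CustomerChanged(1, 2, "Ada Lovelace"),   # late arrival, ignored
    CustomerChanged(1, 3, "Ada L."),         # duplicate, ignored
]
results = [replica.apply(e) for e in events]
```

Because applying the same event twice has no effect, the consumer can safely re-read the queue after a crash. The copy is still only eventually consistent, as described above.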

Related

Sharing entity with other microservice

There is a microservice in Spring with PostgreSQL database responsible for some Product entity.
As there are a lot of Products and their number is still growing exponentially, we want to archive this data to another database (also PostgreSQL, as we know it best and are limited by the support of some other tool). A lot is already happening in our main microservice (Product), so we want to extract the archiving into a separate job/microservice. We use a migration tool in the main microservice, which is responsible for Product table changes.
Question: how do we keep our Product entity in sync with this new technical (archiving) microservice, so that it can always read data from the DB and push it in the same state to the other DB with the same schema?

Don't.
The point of microservices is that each service has a narrow, clearly defined set of responsibilities, allowing them to be deployed independently of each other.
If you do this then your entity code will have two sets of responsibilities, and changes that might help it do something in one service might be unneeded or even cause issues in another. It complicates deployment and testing.
Better to keep separate code bases, allow the two services to evolve independently, and live with some duplication.
There is also the question of why an archive job would need JPA entities. This sounds more like a job for a bulk copy tool or a replication service than for JPA. Very likely this isn't the right technical choice: you'll end up with a very slow archive process that gets rewritten to not use JPA, and the effort to reuse the entity will have been wasted.
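
As a rough illustration of the set-based alternative to entity-by-entity copying, the sketch below moves old rows with a single INSERT ... SELECT. It uses SQLite from the Python standard library only as a stand-in for the two PostgreSQL databases in the question. The created column, the cutoff date, the function name archive_older_than, and the choice to delete rows after copying are all assumptions made for the example.

```python
import sqlite3

# One connection with a second in-memory database attached as "archive".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS archive")

# Same schema on both sides, as the question requires.
schema = "(id INTEGER PRIMARY KEY, name TEXT, created TEXT)"
conn.execute(f"CREATE TABLE main.product {schema}")
conn.execute(f"CREATE TABLE archive.product {schema}")
conn.executemany(
    "INSERT INTO main.product VALUES (?, ?, ?)",
    [(1, "chair", "2019-01-05"), (2, "table", "2023-06-01"), (3, "lamp", "2018-11-20")],
)


def archive_older_than(conn, cutoff):
    """Copy rows older than cutoff in one statement, then remove them."""
    with conn:  # commits on success, rolls back on error
        copied = conn.execute(
            "INSERT INTO archive.product "
            "SELECT * FROM main.product WHERE created < ?",
            (cutoff,),
        ).rowcount
        # Assumption: archived rows are removed from the live table.
        conn.execute("DELETE FROM main.product WHERE created < ?", (cutoff,))
    return copied


moved = archive_older_than(conn, "2020-01-01")
```

No entity classes are involved, so the archive job does not need to share code with the Product service. It depends only on the table layout, which the migration tool already controls.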

What approach we should follow to create relationship between two microservices without duplicating?

Our microservice architecture is Docker-based. One microservice (the transaction database, with userId) is written in Node.js, and the other (the user database) is written in Rust. We need to create a common API or function to retrieve data from both microservices. MongoDB is used as the database for both microservices.

There are several approaches to do that.
One possible solution is to make one of the microservices responsible for aggregating this data: it calls the other service to obtain its data, combines that with its own data, and returns the result to the caller. This makes sense when the operation is part of the domain of one of the microservices. For example, if the consumer needs user information, it is natural to call the user service, which then makes whatever calls to other services are needed to return all the information.
Another possibility is to use the BFF (Backend For Frontend) pattern. This makes sense when the consumer (for example, a frontend) needs information from different domains to populate the UI. In this case, you create an additional service that exposes an API with all the information the consumer needs, and this service aggregates the information. In certain cases, this can be done directly in the API gateway if you are using one.
The third way is similar to the first one, but it requires duplicating data, so I don't know whether it will suit you. It consists of keeping a read-only copy of the data owned by one service in the other service and updating it asynchronously, using events, whenever this data is modified. The benefit of this approach is better performance, because no communication between services is needed at read time. The disadvantage is eventual consistency.
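
The aggregation idea behind the first two options can be sketched as follows. The two fetch functions are in-memory stand-ins for network calls to the user and transaction services. Their names, the sample records, and the shape of the combined response are assumptions made for the example.

```python
# In-memory stand-ins for the data owned by each service (hypothetical records).
users_db = {"u1": {"userId": "u1", "name": "Ada"}}
transactions_db = [
    {"id": "t1", "userId": "u1", "amount": 40},
    {"id": "t2", "userId": "u1", "amount": 60},
    {"id": "t3", "userId": "u2", "amount": 5},
]


def fetch_user(user_id):
    """Stand-in for a call to the user service."""
    return users_db.get(user_id)


def fetch_transactions(user_id):
    """Stand-in for a call to the transaction service."""
    return [t for t in transactions_db if t["userId"] == user_id]


def user_overview(user_id, get_user=fetch_user, get_transactions=fetch_transactions):
    """Aggregate both sources into one response; neither store is copied."""
    user = get_user(user_id)
    if user is None:
        return None
    transactions = get_transactions(user_id)
    return {
        "user": user,
        "transactions": transactions,
        "total": sum(t["amount"] for t in transactions),
    }


overview = user_overview("u1")
```

The same function could live inside the user service (first option) or in a separate BFF service (second option). Passing the fetch functions as parameters keeps the aggregation logic testable without the other services running.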

Microservice architecture - is database shared across all instances of the service?

I understand that microservice architecture suggests that each service should have its own private database. But when such a service is scaled, then is it one db per service instance or one db shared by all service instances?

Your first statement may be misleading to some: "each service should have its own private database."
Your architecture should be careful about sharing a single set of tables across multiple services: that sharing frequently leads to a shared schema dependency, which creates a tight coupling that makes it difficult to update the schema without updating many of the services that share that schema at the same time.
However, sharing a single database instance (or database cluster) doesn't mean your services are accessing the same tables or even the same schema within the database. And if they aren't accessing the same tables, they aren't coupled. (Relying on the same database instance isn't coupling any more than relying on the same network. Don't confuse coupling with shared infrastructure.)
Frequently, multiple instances of the same service share the same database. In my opinion, there is nothing inherently wrong with this, but there are some things to be aware of. If you go this route, you need to be very careful when making changes to the data schema. Because multiple versions of that service may be accessing the data at the same time during updates, any schema change needs to be compatible with at least any two adjacent versions. If you add a column or table, that's fine. The older version won't attempt to use it, so there will be no problem. (Note, too, that the older version won't populate it either.) Removing a column or table is another problem entirely, and to make that kind of breaking change you will likely need to do it in several smaller steps to ensure that the older version of the service isn't broken. It can be done, it's just tougher.
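
The additive case can be demonstrated with a small sketch. It uses SQLite from the Python standard library as a stand-in for the shared database; the product table, the sku column, and the two insert functions representing an older and a newer service version are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def insert_v1(conn, name):
    """Older service version: it names its columns and knows only `name`."""
    conn.execute("INSERT INTO product (name) VALUES (?)", (name,))


def insert_v2(conn, name, sku):
    """Newer service version: it also writes the new `sku` column."""
    conn.execute("INSERT INTO product (name, sku) VALUES (?, ?)", (name, sku))


insert_v1(conn, "chair")

# Additive change: a new nullable column, applied while v1 is still running.
conn.execute("ALTER TABLE product ADD COLUMN sku TEXT")

insert_v1(conn, "table")        # the old version keeps working unchanged
insert_v2(conn, "lamp", "L-1")  # the new version uses the new column

rows = conn.execute("SELECT name, sku FROM product ORDER BY id").fetchall()
```

The rows written by the older version have no sku, which is the "won't populate it either" caveat above. The sketch only works because the older code lists its columns explicitly; code that relies on column positions or on SELECT * can still break on an additive change.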

A general rule of microservice development is that each microservice
should manage its own data. In an ideal world, the data managed by
each service would be completely independent. There would be no need
to propagate data changes made in one service to other services.
In the real world, however, complete data independence is impossible.
There will always be overlaps between the data used in different
services. Consequently, as an architect, you need to think carefully about
sharing data and managing data consistency. You need to think about
the microservices as an interacting system rather than as individual
units.
This means:
You should isolate data within each system service with as little
data sharing as possible.
If data sharing is unavoidable, you should design microservices so
that most sharing is read-only, with a minimal number of
services responsible for data updates.
If services are replicated in your system, you must include a
mechanism that can keep the database copies used by replica
services consistent.

Good question indeed. I would answer it like: "at least a database per microservice (not instance)"
A concern is the scalability of the database itself, i.e. can service instances outscale the database?
If so, you could opt for e.g. an in-memory database or a sidecar for your microservice. The database would be ephemeral, and you would need to populate it after the pod/container (re)starts. So the state does not really live in the database.
Apache Kafka is a tool that fits this spot, as it would allow you to populate the database after the service comes up, and it also provides the tooling to synchronize state across all currently running and future instances. Successfully implementing event sourcing with Kafka is not a trivial task, but you could come to the conclusion that you don't need databases at all.
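
The idea of repopulating an ephemeral store after a restart can be sketched without any Kafka client code. Below, a plain Python list stands in for an ordered, replayable topic. The EphemeralStore class, the (key, value) record shape, and the use of None as a delete marker are assumptions for the example and are not a Kafka API.

```python
class EphemeralStore:
    """In-memory state that is rebuilt by replaying an ordered log."""

    def __init__(self):
        self.state = {}
        self.next_offset = 0   # position of the first record not yet applied

    def catch_up(self, log):
        """Apply every record from next_offset to the end of the log."""
        for key, value in log[self.next_offset:]:
            if value is None:
                self.state.pop(key, None)   # None marks a deletion
            else:
                self.state[key] = value
        self.next_offset = len(log)


# Hypothetical log of product changes, oldest first.
log = [
    ("p1", {"name": "chair"}),
    ("p2", {"name": "table"}),
    ("p1", {"name": "armchair"}),
    ("p2", None),
]

running = EphemeralStore()
running.catch_up(log)

log.append(("p3", {"name": "lamp"}))
running.catch_up(log)          # a running instance applies only the new record

restarted = EphemeralStore()   # a fresh instance starts empty...
restarted.catch_up(log)        # ...and replays the whole log
```

Both instances end up with the same state, which is the property that lets the log, rather than the database, hold the state.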
So the question remains, can service instances really outscale the database?
The answer would be "no" more often than not.
So having a database instance per microservice (physically or logically) already gives you a lot in terms of "loose coupling and cohesive behaviour", as you don't share databases.
Another concern is breaking changes to the database between versions of the microservice. If things go wrong, you could find yourself unable to roll back. An ephemeral database could sync itself up in a compatible way.
Some say they change database technologies throughout the lifetime of a microservice. I never had the necessity to do so, but an in-memory/sidecar approach would fit here very well.

I presume you share one database among all instances of one microservice, so that an update is immediately available to every instance of the same microservice. You could use one database instance per microservice instance to avoid the database being a single point of failure, but then you would have to keep every database in sync, which seems like unnecessary overhead for the database and the application. I assume the database is able to keep a group of DB instances in sync (every insert, update, and delete is properly propagated).

how to handle duplicated data in a micro service architecture

I am working on a jobs site where I am thinking of breaking out the jobs matching section into a micro service - everything else is a monolith.
But when thinking about how the microservice should have its own separate database, that would mean having the microservice have a separate copy of all the jobs, given the monolith would still handle all job crud functionality.
Am I thinking about this the right way and is it normal to have multiple copies of the same data spread out across different microservices?
The idea of having different databases with the same data scares me a bit, since that creates the potential for things to get out of sync.

You are trying to move away from the monolith, and the approach you are taking is very common: carve out a part of the monolith that can be converted into a microservice. The monolith shrinks over time and the number of microservices (MSs) grows.
Coming to your question about data duplication: yes, this is a challenge, and some data needs to be duplicated, but this varies from case to case and is difficult to judge without looking at the application.
You may expose an API so that the monolith can get or create the data when needed. I strongly suggest not sacrificing or compromising the microservice's data model to avoid duplication, because the MS is going to be more important than your monolith in the future. Keep in mind that you should avoid adding any new code to the monolith, and even if you have to, ask the MS for data instead of the monolith.

One more thing you can try: instead of REST API calls between microservices, use a caching mechanism with an event bus. Every microservice publishes its CRUD changes to the event bus, and interested microservices consume those events and update their local caches accordingly.
The problem with a REST call is that when the service being depended on is down, it cannot be queried, which can sometimes become a bottleneck.

Microservices and isolated persistence - how should the data be stored/fetched?

At my company, we're about to move to the microservices architecture. I read a lot about it, and there are tons of obscure areas where it's specific to the project built, but one area seems to get everyone to agree: microservices need to have isolated persistence, or to put it another way, they need to have their own database.
Now I love the idea, that means every microservice has its own database schema, its own domain objects and is 100% independent of any other microservice data structure.
There are things I don't quite understand though.
The "Customer Service" is obviously central to the application, and we can see that basically any other microservice will need some data about the user at some point. Whether it'd be the user's credit amount, its ID, or its name.
But since other microservices can't directly read into the Customer Service database, they'll need to query this service over and over again. This is fine (I guess) for simple stuff like getting the name of current logged user, but when we need to display 60 users on a page and we can't do any SQL join, it feels like we're missing something. This is even worse when microservices depend upon tons of microservices.
So I found out that some people actually queried microservices X times a day to get data into their own microservices.
So if microservice "Search" needs data from "Product", "Customer", it'll actually query these microservices and will persist the data with its own data structure.
The question I have is should it be "Search" that queries "Product" and "Customer", or should "Product" and "Customer" send data to "Search" ?
The first option looks a bit easier to do, we only need to have this logic on one side, and that's where the data is needed. But we'll only get static freshness of data which is not very smart, but could definitely work.
The second option looks a bit more difficult but more scalable too, because we could have very fresh data when we need it, since the data changed where it's sent, it could also be more granular.

I think you correctly identified downsides to the microservices approach! And there are no elegant solutions to these specific problems. You will have to eat the additional work and architecture deterioration that this brings.
Concretely addressing your question now:
The question I have is should it be "Search" that queries "Product" and "Customer", or should "Product" and "Customer" send data to "Search" ?
You seem to be looking for a data synchronization service. You want to decide between push and pull. You are concerned about data freshness and logic duplication.
The key point here is that the source service cannot know about its consumers. This is to prevent an unwanted reverse dependency. This would break architectural isolation. Any data sync process that maintains this is fine. You can do what is most convenient.
For example, you could make the data source expose two APIs:
An API to get the whole data set. This would be called periodically by the destination (e.g. nightly). It can also be used to seed the destination at will and to fix data errors there.
A feed of changes in the source database keyed by the date and time the change occurred. The destination can now poll that change feed very frequently (e.g. every few seconds or minutes) and apply the small delta that occurred.
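
The two APIs can be sketched as below. To keep the example deterministic it uses an increasing sequence number as the feed cursor in place of the date and time mentioned above. The class and method names (SourceService, DestinationCopy, snapshot, changes_since) are hypothetical, and method calls stand in for network calls.

```python
class SourceService:
    """Owns the data and exposes a full snapshot plus a change feed."""

    def __init__(self):
        self._rows = {}
        self._changes = []   # append-only list of (sequence, key, value)

    def write(self, key, value):
        """Create or update a row; a value of None deletes it."""
        if value is None:
            self._rows.pop(key, None)
        else:
            self._rows[key] = value
        self._changes.append((len(self._changes) + 1, key, value))

    def snapshot(self):
        """API 1: the whole data set and the sequence it is current as of."""
        return dict(self._rows), len(self._changes)

    def changes_since(self, sequence):
        """API 2: every change recorded after the given sequence number."""
        return self._changes[sequence:]


class DestinationCopy:
    """Pulls from the source; the source knows nothing about it."""

    def __init__(self, source):
        self.source = source
        self.rows = {}
        self.cursor = 0

    def reseed(self):
        """Full reload, e.g. nightly or to repair data errors."""
        self.rows, self.cursor = self.source.snapshot()

    def poll(self):
        """Frequent incremental sync; returns how many changes were applied."""
        applied = 0
        for sequence, key, value in self.source.changes_since(self.cursor):
            if value is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = value
            self.cursor = sequence
            applied += 1
        return applied


source = SourceService()
source.write("c1", "Ada")
source.write("c2", "Bob")

dest = DestinationCopy(source)
dest.reseed()                  # seed from the full data set

source.write("c2", None)       # later changes at the source
source.write("c3", "Cy")
applied = dest.poll()          # apply only the small delta
```

All calls go from the destination to the source, so the source never needs to know who its consumers are.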
You can even build a realtime change feed through publish-subscribe middleware. Many message queue products can do that. The source would just send out changes to the middleware.
Building all of this is conceptually simple but takes a lot of work. It also creates lots of ongoing work and increases the potential for bugs. Debugging becomes much harder. I have worked on systems like that.
I'm going to add a subjective note: Microservices are not well understood by many teams. The downsides are often ignored. You identified a few of the downsides correctly and they are nasty! Given what I read on the web I believe many teams do not realize the mess they are getting themselves into. Managing disparate data stores can be a nightmare. This is not a one-time "mess" but an ongoing one.
As an alternative I'd recommend using a common data store and building services simply as classes or projects that live in the same process. This gives you the microservices code structuring with the convenience of normal development. It also leaves a few of the upsides of microservices on the table.

Your identification of the problem is correct.
But the solution to your problem will vary from use case to use case.
In your example of the search service, the product service and the customer service should publish their events to Kafka or a similar messaging system, and the search service should listen to them and update its own data.
In another case, say the order service wants to check that a customer exists while creating an order for that customer. You might do that by calling the synchronous API of the customer service, but there are various other approaches for that too; I have answered that here: "linking Microservices and allowing for one to be unavailable".
From my perspective, synchronous communication between services should be avoided, and there are ways around it; the link above should help.
You can use the domain-driven design philosophy to split your services and their contracts correctly.
