Elasticsearch in microservices architecture, design question [closed] - elasticsearch

Closed. This question is opinion-based. It is not currently accepting answers. Closed 3 years ago.
I am designing a system, where I have several microservices communicating via middleware.
Every blueprint about microservices underlines that they must be autonomous and that each must own its data. Currently, each microservice in my system stores its data in a relational database.
I have a new requirement to implement full-text search, and each of my microservices stores potentially searchable entities.
I was thinking of using an Elasticsearch cluster, with several indexes serving as boundaries that separate the data coming from the various microservices. I would like to underscore that I plan to use ES only as a search engine, not as a system of record.
Here is my dilemma:
1. Should I allow each microservice to handle ES interactions directly (just as it handles caching and persistence)?
2. Or should I create a separate microservice (let's call it "search") that is the only one to interact with the ES cluster?
I am leaning towards option 1: since each microservice already has to be autonomous about persistence and caching, it can handle full-text search too.
It would be interesting to hear different opinions.
UPDATE:
Here is why I think each microservice should handle their searches individually:
To me, full-text search capability is similar to the persistence and caching layers: each microservice knows its own business model best and is responsible for implementing those layers individually.
If I introduce one more microservice just for doing searches, I'll have one extra possible point of failure; the same goes for using Pub/Sub as a middleman if we do not want direct interaction between the search microservice and the rest of the pack.
On the contrary, using ES directly, which is a highly available SaaS, eliminates that single point of failure.
All write requests will be fast and there will be no lag. Information will be searchable right away. This will guarantee a seamless user experience.
I do not see search as another business process (maybe my understanding is flawed). To me, it is just a nice-to-have feature, not part of core functionality. However, once implemented, I want it to provide a great user experience.
This model of having an individual search microservice reminds me of the CQRS (command query responsibility segregation) architectural pattern: I would first push the data to the DB in my microservice A, then publish it to the messaging broker (command); a message would be picked up from the queue by a consumer and pushed into ES. The frontend, on the read path (query), would go directly to the search microservice.
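A minimal sketch of that command path (assuming RabbitMQ via the pika client and elasticsearch-py 8.x; the queue and index names are made up for illustration):

```python
# Command path: microservice A persists to its own DB, then publishes an event.
# Query path: a consumer picks the event off the queue and indexes it into ES.
import json
import pika
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def publish_entity_saved(entity: dict) -> None:
    """Called by microservice A right after its own DB commit."""
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="entity-indexing", durable=True)
    channel.basic_publish(
        exchange="",
        routing_key="entity-indexing",
        body=json.dumps(entity),
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message
    )
    conn.close()

def run_indexer() -> None:
    """Consumer side: picks events off the queue and writes them to ES."""
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="entity-indexing", durable=True)

    def on_message(ch, method, properties, body):
        entity = json.loads(body)
        # Idempotent: re-indexing the same id just overwrites the document.
        es.index(index="service-a-entities", id=entity["id"], document=entity)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="entity-indexing", on_message_callback=on_message)
    channel.start_consuming()
```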
I have never seen this pattern implemented for searching. It makes sense in a big-data world, where one microservice ingests the data, a worker process aggregates it for analytics and pushes it into an aggregated data table or a separate data store, and only then does the data become queryable via a separate microservice that enables fetching the analytics data.
Are there any publications or successful implementations of the CQRS pattern for ES out there (taking into consideration that ES is not used as the primary system of record but as a full-text search engine)?

Another search service would be over-abstracting it.
What I would do:
Use X-Pack security RBAC, which is now free, to lock each microservice's indices down to an account that the service is configured to use (see the sketch after this list): https://www.elastic.co/blog/security-for-elasticsearch-is-now-free
Use search templates in Elasticsearch to move the search logic out of the services and into ES, then have the services call the templates.
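A rough sketch of both steps (assuming elasticsearch-py 8.x against a cluster with security enabled; the role, user, index, and template names are illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", basic_auth=("elastic", "changeme"))

# 1) RBAC: a dedicated account for the Reviews service, restricted to its own indices.
es.security.put_role(
    name="reviews_service",
    indices=[{"names": ["reviews-*"], "privileges": ["read", "write", "create_index"]}],
)
es.security.put_user(
    username="reviews_svc", password="a-strong-secret", roles=["reviews_service"],
)

# 2) Search template: the query logic lives in ES; services only pass parameters.
es.put_script(
    id="reviews-fulltext",
    script={
        "lang": "mustache",
        "source": {
            "query": {"match": {"text": "{{search_term}}"}},
            "size": "{{size}}",
        },
    },
)

# A service call is then just the template id plus parameters:
resp = es.search_template(
    index="reviews-*",
    id="reviews-fulltext",
    params={"search_term": "delivery was late", "size": 20},
)
print(resp["hits"]["hits"])
```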

I would go with a separate Search service. There are several reasons for that.
It's another (business) process, so you can be more flexible. Say you have a CustomerMasterData service and a CustomerAddress service, but the search requirement is to find a customer either by name or by address. Having two different services/ES indexes will not make your life easier, whereas a separate search service can build an index that holds data from both sources (sketched below).
A service should own its data. That means Search should be the only service with direct access to the ES index.
Filling the ES index can be decoupled and done via communication with the other services. I would do it through a messaging system: for instance, the Search service sends a Sync request, and the other services listening on the queue send their data out. This keeps things independent.
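A rough sketch of such a combined index (assuming elasticsearch-py; the index, field, and event names are invented). Partial upserts let messages from the two sources arrive in any order:

```python
# The Search service owns one combined index; each source service's events
# are merged into the same per-customer document.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def on_master_data_event(event: dict) -> None:
    # From CustomerMasterData: contributes the name fields.
    es.update(
        index="customer-search",
        id=event["customer_id"],
        doc={"name": event["name"]},
        doc_as_upsert=True,
    )

def on_address_event(event: dict) -> None:
    # From CustomerAddress: contributes the address fields.
    es.update(
        index="customer-search",
        id=event["customer_id"],
        doc={"address": event["address"]},
        doc_as_upsert=True,
    )

# One query can now match on either name or address:
# es.search(index="customer-search",
#           query={"multi_match": {"query": "baker street",
#                                  "fields": ["name", "address"]}})
```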

Related

How do I access data that my microservice does not own?

I have a microservice that needs some data it does not own: a read-only cache of data that is owned by another service. I am looking for guidance on how to implement this.
I don't want my microservice to call the other microservice: too much data is used in a join for that to be successful. In addition, I don't want my service to be dependent on another service (which may itself depend on another, and so on).
Currently, an event is published to a queue, and my service subscribes and maintains a copy of the data. I am having problems staying in sync with the source system, and our DBAs are complaining about the data duplication. I don't see a lot of information on this topic.
Is there a pattern for this? What is it called?
First of all, there are a couple of ways to share data, and you mention two of them.
One service calls another to get the data when it is required. This is good because you get up-to-date data and no extra management is required in the consuming service. The problem is that if you call it too often, the other service's performance may be impacted.
Another solution is to maintain a local copy of that data in the consuming service using a Pub/Sub mechanism.
Depending on your requirements and architecture, you can keep this copy in the consuming service's actual DB or in some type of (persisted) cache.
The downside here is consistency: in a distributed architecture you will not get strong consistency; you have to rely on eventual consistency.
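A minimal sketch of such a local copy (using SQLite for illustration, and assuming the events carry a monotonically increasing version; a timestamp would also work). The version guard makes the consumer idempotent and tolerant of out-of-order delivery, which is usually what breaks staying in sync:

```python
import sqlite3

db = sqlite3.connect("local_copy.db")
db.execute("""CREATE TABLE IF NOT EXISTS customer_copy (
    id TEXT PRIMARY KEY, name TEXT, version INTEGER)""")

def apply_customer_event(event: dict) -> None:
    """Upsert, but only if the event is newer than what we already have."""
    db.execute(
        """INSERT INTO customer_copy (id, name, version)
           VALUES (:id, :name, :version)
           ON CONFLICT(id) DO UPDATE
           SET name = excluded.name, version = excluded.version
           WHERE excluded.version > customer_copy.version""",
        event,
    )
    db.commit()

# Replaying the same event, or delivering events out of order,
# cannot regress the copy.
apply_customer_event({"id": "c1", "name": "Alice", "version": 2})
apply_customer_event({"id": "c1", "name": "Al", "version": 1})  # ignored
```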
Another solution, depending on your requirements, is to extract the tables that need to be joined into a separate service; whether that helps depends on your use case.
If you still want consistency, then instead of having the first service update the data and then publish, create a mediator component that calls the two services in a synchronous fashion. Things get complicated here, because you are now trying to implement a transaction over a distributed system.
One more point: building a product around a microservice architecture is not only a technical move. As an organization and as a team, you need to understand that what works in a monolith does not work the same way in microservices. The DBAs need to understand that in microservices, duplication of data across schemas (and of other things, like code) is preferred over reusability.
Last but not least, if one service always has to call another to get its data, it is worth re-checking the service boundaries. Sometimes services need to be merged because the business functionality belongs together.

What approach should we follow to create a relationship between two microservices without duplicating data?

The microservice architecture is Docker-based: one microservice (a transaction database keyed by userId) is in Node.js, and the other (a user database) is in Rust. We need to create a common API or function to retrieve data from both microservices. MongoDB is the database for both.
There are several approaches to do that.
One possible solution is for one of the microservices to be responsible for aggregating this data: it calls the other to obtain the data, combines it with its own, and returns the result to the caller. This makes sense when the operation belongs to the domain of one of the microservices. For example, if the consumer needs user information, it is natural to call the user service, and that service makes whatever calls it needs to other services to return all the information.
Another possibility is the BFF (Backend For Frontend) pattern, which makes sense when the consumer (for example, a frontend) needs information from different domains to populate the UI. In this case you create an additional service that exposes an API with all the information the consumer needs and does the aggregation itself (see the sketch after these options). In certain cases this can be done directly in the API gateway, if you are using one.
The third way is similar to the first one, but it requires duplicating data, so I don't know whether it is suitable for you. It consists of keeping a read-only copy of the data owned by one service in the other service and updating it asynchronously, via events, whenever the data is modified. The benefit is better performance, because no inter-service call is needed at read time; the disadvantage is eventual consistency.
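A minimal sketch of the second (BFF) approach, assuming FastAPI and httpx; the service URLs, routes, and field names are invented:

```python
import httpx
from fastapi import FastAPI

app = FastAPI()

USER_SVC = "http://user-service:8080"        # the Rust user service
TX_SVC = "http://transaction-service:3000"   # the Node.js transaction service

@app.get("/bff/users/{user_id}/overview")
async def user_overview(user_id: str) -> dict:
    async with httpx.AsyncClient() as client:
        # Kept sequential for clarity; the two fetches could also run
        # concurrently with asyncio.gather.
        user = (await client.get(f"{USER_SVC}/users/{user_id}")).json()
        txs = (await client.get(
            f"{TX_SVC}/transactions", params={"userId": user_id}
        )).json()
    # The BFF shapes the combined payload exactly as the UI needs it.
    return {"user": user, "transactions": txs}
```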

Update/Add as separate service and Get as separate service

We started to migrate our existing project to a microservice architecture. After going through a lot of videos and lectures, we concluded that a service should do one task, and only one task, and should be great at it, and that services should be designed around nouns and verbs.
We have an entity with basically CRUD operations. Add, update, and delete are the least used operations, while GET requests are far more frequent. Typically, update/add/delete are done by the admins.
What we thought of is breaking the CRUD entity into two services:
EntityCUDService (create/update/delete)
EntityLookupService (get)
Both of these services point to the same collection in Mongo (or, say, the same SQL table).
Now, if EntityCUDService makes some changes to the collection/table, EntityLookupService breaks.
We have heard of maintaining semantic versioning, which sounds okay, but we have also heard that microservices should not share a model/data source. So what would be the optimal solution here, where we have tons of gets but only tens of updates/adds of the same entity?
Any help is greatly appreciated.
Typically, a microservice should manage a single entity, so in your case you can have one microservice manage the entity (with its various operations). If you then want to split that service on the basis of read and write operations, you are following the CQRS pattern: you split your microservice into a command service and a query service over the same entity. I suggest starting with one service to manage the entity, and splitting it into separate read and write services only if required. If you do go with CQRS, also have a look at event sourcing, as it fits nicely with CQRS in a microservices design.
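To get a feel for why the two fit together, here is a tiny in-memory sketch of event sourcing under CQRS (the event names and fields are invented):

```python
# Command side appends events; query side folds them into read state.
from dataclasses import dataclass, field

@dataclass
class EntityEvent:
    entity_id: str
    kind: str          # "created" | "updated" | "deleted"
    data: dict = field(default_factory=dict)

event_log: list[EntityEvent] = []  # the append-only source of truth

def handle_command(event: EntityEvent) -> None:
    """Command side: validate, then append. Never update in place."""
    event_log.append(event)

def build_read_model() -> dict[str, dict]:
    """Query side: replay the log into the current state (the GET view)."""
    state: dict[str, dict] = {}
    for ev in event_log:
        if ev.kind in ("created", "updated"):
            state.setdefault(ev.entity_id, {}).update(ev.data)
        elif ev.kind == "deleted":
            state.pop(ev.entity_id, None)
    return state

handle_command(EntityEvent("e1", "created", {"title": "hello"}))
handle_command(EntityEvent("e1", "updated", {"title": "hello world"}))
assert build_read_model()["e1"]["title"] == "hello world"
```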

How to handle duplicated data in a microservice architecture

I am working on a jobs site where I am thinking of breaking the job-matching section out into a microservice; everything else would remain a monolith.
But since the microservice should have its own separate database, that would mean keeping a separate copy of all the jobs in it, given that the monolith would still handle all job CRUD functionality.
Am I thinking about this the right way and is it normal to have multiple copies of the same data spread out across different microservices?
The idea of having different databases with the same data scares me a bit, since that creates the potential for things to get out of sync.
You are trying to move away from the monolith, and the approach you are taking is very common: carve out the parts of the monolith that can be converted into microservices. The monolith shrinks over time, and you end up with more and more microservices.
Coming to your question about data duplication: yes, this is a challenge, and some data does need to be duplicated, but how much varies case by case and is difficult to say without looking at the application.
You may expose an API so the monolith can get/create the data when needed, and I strongly suggest not sacrificing or compromising the microservice's data model just to avoid duplication, because the microservice is going to be more important than your monolith in the future. Keep in mind that you should avoid adding any new code to the monolith, and even if you have to, ask the microservice for data rather than the monolith.
One more thing you can try: instead of REST API calls between microservices, use a caching mechanism with an event bus. Every microservice publishes its CRUD changes to the event bus, and interested microservices consume those events and update their local caches accordingly.
The problem with REST calls is that whenever the main microservice is down, its dependents cannot query it, so it can become a bottleneck.

Sharing huge data between microservices

I am designing a review analysis platform with a microservices architecture.
The application works like this:
All product reviews are retrieved from ecommerce-site-a (site-a) as an Excel file.
Reviews are uploaded to the system via that Excel file.
An analysis agent can list all reviews and can edit, delete, or approve them.
An analysis agent can export all reviews for site-a.
Automated regexp-based checks are applied to each review on upload and on editing.
I have 3 microservices.
Reviews: responsible for review CRUD operations, plus operations such as approve/reject.
Validations: responsible for defining validation rules and applying them to reviews.
Export/Import: the export service exports huge files for a given site name (like site-a).
The problem is that at some point the validation service needs to get all reviews for site-a, apply the validation rules, and generate errors if there are any. I know that sharing database schemas and entities breaks the microservices architecture.
One possible solution is
Whenever the validation service requires the reviews for a site, it asks the gateway, the gateway redirects the request to the Reviews service, and the response is returned.
Two possible drawbacks of this approach are:
The validation service has to know about the gateway; doesn't that introduce a dependency?
If I have 1B reviews for a site, getting all of them via a REST request may be a problem (or not; I can make paginated requests from the validation service to the gateway).
So what is the best practice for sharing huge amounts of data between microservices without sharing entities and without duplicating data?
I have read a lot about using messaging queues, but I do not think it is a good idea in my case to push gigabytes of data through a messaging queue.
Edit 1: Instead of sharing entities, could using data stores with a REST API be a solution? Assume I am using MongoDB: instead of sharing my entity object between microservices, I could use a REST interface for Mongo (http://restheart.org/) and query the data whenever needed.
Your problem here is not "sharing huge data", but rather the boundaries you chose when separating your microservices.
I can tell from your requirements that the three microservices you chose (Reviews, Validations, Import/Export) actually operate on the same context and business domain, which is Reviews.
I would encourage you to reconsider your design decision and treat Reviews as a single microservice that handles all review operations and logic as a black box.
I assume that reviews are independent from each other and that validating a review therefore requires only that review, and no other reviews.
You don't want to share entities, which rules out things like shared databases, Hadoop clusters, or data stores like Redis. You also don't want to duplicate data, which rules out plain file copies or trigger-based replication at the database level.
In summary, I'd say your aim should be a stream: let the Validator request everything about site A from Reviews, not as one bulk response but as a stream of single reviews or small packages of reviews.
The Validator can then process the reviews one after the other, with constant memory and processor consumption. To gain performance, you can run multiple instances of the Validator that pull different, disjoint pieces of the stream at the same time. Similarly, you can run multiple instances of the Reviews microservice if one alone cannot answer the pulls fast enough.
The Validator does not persist this stream; it only produces the errors and stores or sends them somewhere. This should fulfill your no-sharing, no-duplication requirements.
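A minimal sketch of such a streaming pull, assuming the Reviews service exposes a cursor-paginated endpoint (the endpoint, its parameters, and the response shape are invented for illustration):

```python
# The Validator consumes reviews for a site in small pages at constant memory,
# instead of fetching one gigabyte-sized response.
from typing import Iterator
import httpx

REVIEWS_SVC = "http://reviews-service:8000"

def stream_reviews(site: str, page_size: int = 500) -> Iterator[dict]:
    """Yield reviews one by one, pulling a page at a time."""
    cursor = None
    with httpx.Client() as client:
        while True:
            params = {"site": site, "limit": page_size}
            if cursor:
                params["cursor"] = cursor
            page = client.get(f"{REVIEWS_SVC}/reviews", params=params).json()
            yield from page["items"]
            cursor = page.get("next_cursor")
            if not cursor:
                break  # stream exhausted

def validate(review: dict) -> None:
    # Placeholder for the regexp-based rules; emit/store errors elsewhere.
    ...

# The Validator holds only one review in memory at a time; several Validator
# instances could pull disjoint shards of the stream in parallel.
for review in stream_reviews("site-a"):
    validate(review)
```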
